Description
I'm having trouble getting Scrapy + Playwright to respect caches when crawling, when using a persistent context. I've tried to get it down to a minimal example, which you can see here:
https://github.com/pjlsergeant/scrapy-playwright-cache-bug
app.py is a minimal Flask app to demonstrate; if you start it (`flask run`) and then run the scrape (`scrapy crawl crawl`), you can see that the PNG at `/pixel` doesn't get cached, both from the Flask logs and from the final body output (`<html><head></head><body>count:6</body></html>`, signifying 6 hits).
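For readers who don't want to open the repo, a minimal sketch of what such a test app might look like (the real app.py is in the linked repo; the route names, counter, and markup here are assumptions, and the pixel bytes are a placeholder rather than a real PNG):

```python
from flask import Flask, Response

app = Flask(__name__)
hit_count = 0

# Placeholder bytes standing in for a real 1x1 PNG.
PIXEL = b"\x89PNG fake-pixel"

@app.route("/")
def index():
    # Page that embeds the pixel and reports how many times it has been fetched.
    return f"<html><head></head><body>count:{hit_count}<img src='/pixel'></body></html>"

@app.route("/pixel")
def pixel():
    global hit_count
    hit_count += 1
    resp = Response(PIXEL, mimetype="image/png")
    # Long max-age: a client that honours the HTTP cache should only
    # request this once per cache lifetime.
    resp.headers["Cache-Control"] = "public, max-age=86400"
    return resp
```

With headers like these, every extra `count` increment means the browser went back to the network instead of serving the pixel from its cache.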
Interestingly, if you then manually load Playwright using the persistent config (something like `browser_context = chromium.launch_persistent_context(userDataDir)`), you'll see the image is already cached. So the image is being written to the cache during the Scrapy + Playwright run; it's just not being read back from the cache while Playwright is driven by Scrapy.
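You can also corroborate the "written to the cache" half without launching a browser at all, by looking inside the persistent user data dir. A small sketch (the `Default/Cache` layout is an assumption; Chromium's on-disk cache location can vary by platform and version):

```python
from pathlib import Path

def cache_entries(user_data_dir: str) -> list[Path]:
    """List files under Chromium's on-disk HTTP cache inside a
    persistent user data dir. Returns [] if the cache dir is absent."""
    cache_dir = Path(user_data_dir) / "Default" / "Cache"
    if not cache_dir.is_dir():
        return []
    return sorted(p for p in cache_dir.rglob("*") if p.is_file())
```

If the list grows after a Scrapy + Playwright run, responses are being stored; the bug is purely on the read path.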
Any help gratefully received
Activity
elacuesta commented on Sep 4, 2023
It looks like this is caused by the use of `Page.route`; the Playwright docs note that enabling routing disables the HTTP cache. Unfortunately, routing is necessary for some of the functionality of this integration, as I've explained elsewhere.
Seems like this is a known limitation and a lot of people are eager to have it removed from upstream Playwright: microsoft/playwright#7220.
alembiewski commented on Dec 9, 2024
It's been over a year since this issue was opened - is it still impossible to enable caching of static resources like JS or CSS to speed up scraping? Are there any workarounds to allow this with scrapy-playwright?
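One workaround pattern that's sometimes used (not a scrapy-playwright feature, just a general technique): since `Page.route` already intercepts every request, the route handler can serve repeat requests for static assets from its own in-memory store instead of going back to the network. A framework-agnostic sketch of that idea; the `ResponseCache` name and API are hypothetical:

```python
from urllib.parse import urlparse

# Extensions treated as static assets; adjust to taste.
CACHEABLE_EXTENSIONS = {".js", ".css", ".png", ".jpg", ".gif", ".woff2"}

class ResponseCache:
    """In-memory store for static-asset responses, keyed by URL."""

    def __init__(self):
        self._store = {}

    def is_cacheable(self, url: str) -> bool:
        path = urlparse(url).path
        return any(path.endswith(ext) for ext in CACHEABLE_EXTENSIONS)

    def get(self, url: str):
        # Returns (status, headers, body) or None on a miss.
        return self._store.get(url)

    def put(self, url: str, status: int, headers: dict, body: bytes) -> None:
        if self.is_cacheable(url):
            self._store[url] = (status, headers, body)

# Inside a Playwright route handler one would then, roughly:
#   hit = cache.get(request.url)
#   if hit is not None: fulfill the route from the stored tuple
#   else: fetch over the network, cache.put(...), and fulfill with the response
```

This doesn't restore real HTTP cache semantics (no revalidation, no expiry), but for immutable JS/CSS bundles it avoids re-downloading them on every page.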
elacuesta commented on Dec 28, 2024
No progress here, my previous comment still applies.