Cannot close spider through SIGINT (ctrl+c)
My code:
```python
import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy_playwright.page import PageMethod

meta = {
    'playwright': True,
    'playwright_include_page': True,
    'playwright_page_methods': [PageMethod('wait_for_load_state', 'networkidle')],
}


async def error_back(failure):
    page = failure.request.meta["playwright_page"]
    await page.close()


class MsuSpider(scrapy.Spider):
    name = "msu"
    allowed_domains = ["msu.ru"]
    start_urls = ["https://msu.ru"]
    custom_settings = {
        'PLAYWRIGHT_BROWSER_TYPE': 'chromium',
        'HTTPCACHE_ENABLED': False,
        "TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor",
        "DOWNLOAD_HANDLERS": {
            "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
            "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
        },
    }

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.link_extractor = LinkExtractor(unique=True)

    def start_requests(self):
        start_urls = ["https://msu.ru"]
        for url in start_urls:
            yield scrapy.Request(
                url=url,
                meta=meta,
                callback=self.parse,
                errback=error_back,
            )

    async def parse(self, response, **kwargs):
        page = response.meta["playwright_page"]
        screenshot = await page.screenshot(path=f"{response.url}.png", full_page=True)
        await page.close()
        links_on_current_page = self.link_extractor.extract_links(response)
        for link in links_on_current_page:
            yield response.follow(link.url, callback=self.parse, meta=meta, errback=error_back)
```
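Note that `error_back` assumes `"playwright_page"` is always present in `request.meta`; the logs below show a `KeyError` when it is missing during shutdown. A minimal defensive sketch (the `.get()` guard is an assumed workaround, not confirmed to fix the hang):

```python
async def error_back(failure):
    # During shutdown scrapy-playwright can fail before a page is ever
    # created, so "playwright_page" may be absent from request.meta.
    page = failure.request.meta.get("playwright_page")
    if page is not None:
        await page.close()
```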
Logs:
```
2024-10-17 14:08:04 [scrapy.crawler] INFO: Received SIGINT, shutting down gracefully. Send again to force
2024-10-17 14:08:04 [scrapy.core.engine] INFO: Closing spider (shutdown)
2024-10-17 14:25:12 [scrapy-playwright] INFO: Launching browser chromium
2024-10-17 14:25:12 [scrapy.core.scraper] ERROR: Error downloading <GET https://msu.ru/ch>
Traceback (most recent call last):
  File "/home/../../../venv/lib/python3.12/site-packages/twisted/internet/defer.py", line 1999, in _inlineCallbacks
    result = context.run(
  File "/home/../../../venv/lib/python3.12/site-packages/twisted/python/failure.py", line 519, in throwExceptionIntoGenerator
    return g.throw(self.value.with_traceback(self.tb))
  File "/home/../../../venv/lib/python3.12/site-packages/scrapy/core/downloader/middleware.py", line 54, in process_request
    return (yield download_func(request=request, spider=spider))
  File "/home/../../../venv/lib/python3.12/site-packages/twisted/internet/defer.py", line 1251, in adapt
    extracted: _SelfResultT | Failure = result.result()
  File "/home/../../../venv/lib/python3.12/site-packages/scrapy_playwright/handler.py", line 378, in _download_request
    return await self._download_request_with_retry(request=request, spider=spider)
  File "/home/../../../venv/lib/python3.12/site-packages/scrapy_playwright/handler.py", line 397, in _download_request_with_retry
    page = await self._create_page(request=request, spider=spider)
  File "/home/../../.../venv/lib/python3.12/site-packages/scrapy_playwright/handler.py", line 296, in _create_page
    ctx_wrapper = await self._create_browser_context(
  File "/home/../../../venv/lib/python3.12/site-packages/scrapy_playwright/handler.py", line 257, in _create_browser_context
    await self._maybe_launch_browser()
  File "/home/../../../venv/lib/python3.12/site-packages/scrapy_playwright/handler.py", line 205, in _maybe_launch_browser
    self.browser = await self.browser_type.launch(**self.config.launch_options)
  File "/home/../../../venv/lib/python3.12/site-packages/playwright/async_api/_generated.py", line 14115, in launch
    await self._impl_obj.launch(
  File "/home/../../../venv/lib/python3.12/site-packages/playwright/_impl/_browser_type.py", line 95, in launch
    Browser, from_channel(await self._channel.send("launch", params))
  File "/home/../../../venv/lib/python3.12/site-packages/playwright/_impl/_connection.py", line 59, in send
    return await self._connection.wrap_api_call(
  File "/home/../../../venv/lib/python3.12/site-packages/playwright/_impl/_connection.py", line 514, in wrap_api_call
    raise rewrite_error(error, f"{parsed_st['apiName']}: {error}") from None
Exception: BrowserType.launch: Connection closed while reading from the driver
2024-10-17 14:25:12 [scrapy.core.scraper] ERROR: Spider error processing <GET https://msu.ru/ch> (referer: https://msu.ru/)
Traceback (most recent call last):
  File "/home/../../../venv/lib/python3.12/site-packages/twisted/internet/defer.py", line 1251, in adapt
    extracted: _SelfResultT | Failure = result.result()
                                        ^^^^^^^^^^^^^^^
  File "/home/../../../../crawlers/spiders/msu.py", line 28, in error_back
    page = failure.request.meta["playwright_page"]
           ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^
KeyError: 'playwright_page'
playwright._impl._errors.TargetClosedError: Page.screenshot: Target page, context or browser has been closed
Call log:
  taking page screenshot
    - waiting for fonts to load...
    - fonts loaded
/home/../../../venv/lib/python3.12/site-packages/playwright/driver/package/lib/server/chromium/crPage.js:491
    this._firstNonInitialNavigationCommittedReject(new _errors.TargetClosedError());
                                                   ^
TargetClosedError: Target page, context or browser has been closed
    at FrameSession.dispose (/home/../../../venv/lib/python3.12/site-packages/playwright/driver/package/lib/server/chromium/crPage.js:491:52)
    at CRPage.didClose (/home/../../../venv/lib/python3.12/site-packages/playwright/driver/package/lib/server/chromium/crPage.js:162:60)
    at CRBrowser._onDetachedFromTarget (/home/../../../venv/lib/python3.12/site-packages/playwright/driver/package/lib/server/chromium/crBrowser.js:200:14)
    at CRSession.emit (node:events:519:28)
    at /home/../../../venv/lib/python3.12/site-packages/playwright/driver/package/lib/server/chromium/crConnection.js:160:14
Node.js v20.17.0
2024-10-17 14:08:25 [asyncio] WARNING: pipe closed by peer or os.write(pipe, data) raised exception.
2024-10-17 14:08:28 [asyncio] WARNING: pipe closed by peer or os.write(pipe, data) raised exception.
...
2024-10-17 14:13:58 [scrapy.extensions.logstats] INFO: Crawled 66 pages (at 0 pages/min), scraped 62 items (at 0 items/min)
```
The last lines continue forever.
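The `Page.screenshot: Target page, context or browser has been closed` error comes from the `parse` callback racing the browser teardown after SIGINT. A sketch of guarding those Playwright calls, assuming it is acceptable to skip screenshots for pages caught mid-shutdown (`PlaywrightError` here is just `playwright.async_api.Error`, Playwright's base exception):

```python
from playwright.async_api import Error as PlaywrightError

async def parse(self, response, **kwargs):
    page = response.meta["playwright_page"]
    try:
        await page.screenshot(path=f"{response.url}.png", full_page=True)
        await page.close()
    except PlaywrightError:
        # Browser, context or page already closed (e.g. mid-SIGINT shutdown).
        pass
    links_on_current_page = self.link_extractor.extract_links(response)
    for link in links_on_current_page:
        yield response.follow(link.url, callback=self.parse, meta=meta, errback=error_back)
```

This only silences the noisy tracebacks; it does not explain why the crawl keeps logging after "Closing spider (shutdown)".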
Is this the same as #62?