Overridden method for Playwright request to original=POST new=GET #239

Closed
tommylge opened this issue Oct 27, 2023 · 1 comment

tommylge commented Oct 27, 2023

Hello, I'm facing an issue with the method override described in the title.
I don't understand why the method is overridden, and it causes an error in my script.

I've replaced
overrides["method"] = method with overrides["method"] = playwright_request.method.upper() and it works fine. I've seen some issues that might be related, but I'm not sure about the solution or answer you provided there.

The line in question is in scrapy_playwright/handler.py, line 598.
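
To make the difference between the two assignments concrete, here is a minimal, self-contained sketch. It is not the handler's actual code: `FakeBrowserRequest` and `build_overrides` are hypothetical names, and the scenario (a page opened with a GET Scrapy request that later navigates with a POST) is an assumption chosen to match the "original=POST new=GET" log message.

```python
from dataclasses import dataclass


@dataclass
class FakeBrowserRequest:
    """Hypothetical stand-in for the request Playwright passes to a route handler."""

    url: str
    method: str
    is_navigation: bool


def build_overrides(
    scrapy_method: str,
    browser_request: FakeBrowserRequest,
    keep_browser_method: bool,
) -> dict:
    """Return the overrides that would be applied when continuing the route."""
    overrides = {}
    if browser_request.is_navigation:
        if keep_browser_method:
            # The replacement described above: reuse the browser's own method.
            overrides["method"] = browser_request.method.upper()
        else:
            # The behaviour being questioned: force the Scrapy request's method.
            overrides["method"] = scrapy_method.upper()
    return overrides


# Assumed scenario: the Scrapy request was a GET, the browser navigates with a POST.
later_navigation = FakeBrowserRequest("https://www.example.com/", "POST", True)
forced = build_overrides("GET", later_navigation, keep_browser_method=False)
kept = build_overrides("GET", later_navigation, keep_browser_method=True)
print(forced)  # {'method': 'GET'}
print(kept)    # {'method': 'POST'}
```

One caveat with the replacement, as far as I understand it: Playwright's `page.goto` issues a GET, so always keeping the browser's method would presumably prevent a Scrapy POST request from being sent as a POST.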

The error that occurs when I do not replace your code:

DEBUG    [14:02:00]    DEBUG     [Context=unblocked] Overridden method for Playwright request to https://www.example.com/: original=POST new=GET  handler.py:611
ERROR    [14:02:00]    ERROR     Error downloading <GET https://www.example.com>                                                                  scraper.py:328
         Traceback (most recent call last):                                                                                                                        
           File "/Users/x/Desktop/driver_tester/.venv/lib/python3.11/site-packages/twisted/internet/defer.py", line 1693, in _inlineCallbacks               
             result = context.run(                                                                                                                                 
           File "/Users/x/Desktop/driver_tester/.venv/lib/python3.11/site-packages/twisted/python/failure.py", line 518, in                                 
         throwExceptionIntoGenerator                                                                                                                               
             return g.throw(self.type, self.value, self.tb)                                                                                                        
           File "/Users/x/Desktop/driver_tester/.venv/lib/python3.11/site-packages/scrapy/core/downloader/middleware.py", line 54, in                       
         process_request                                                                                                                                           
             return (yield download_func(request=request, spider=spider))                                                                                          
           File "/Users/x/Desktop/driver_tester/.venv/lib/python3.11/site-packages/twisted/internet/defer.py", line 1065, in adapt                          
             extracted = result.result()                                                                                                                           
           File "/Users/x/Desktop/driver_tester/.venv/lib/python3.11/site-packages/scrapy_playwright/handler.py", line 324, in                              
         _download_request                                                                                                                                         
             return await self._download_request_with_page(request, page, spider)                                                                                  
           File "/Users/x/Desktop/driver_tester/.venv/lib/python3.11/site-packages/scrapy_playwright/handler.py", line 376, in                              
         _download_request_with_page                                                                                                                               
             await self._apply_page_methods(page, request, spider)                                                                                                 
           File "/Users/x/Desktop/driver_tester/.venv/lib/python3.11/site-packages/scrapy_playwright/handler.py", line 490, in                              
         _apply_page_methods                                                                                                                                       
             pm.result = await _maybe_await(method(*pm.args, **pm.kwargs))                                                                                         
           File "/Users/x/Desktop/driver_tester/.venv/lib/python3.11/site-packages/scrapy_playwright/_utils.py", line 16, in _maybe_await                   
             return await obj                                                                                                                                      
           File "/Users/x/Desktop/driver_tester/.venv/lib/python3.11/site-packages/playwright/async_api/_generated.py", line 9408, in                       
         wait_for_url                                                                                                                                              
             await self._impl_obj.wait_for_url(                                                                                                                    
           File "/Users/x/Desktop/driver_tester/.venv/lib/python3.11/site-packages/playwright/_impl/_page.py", line 498, in wait_for_url                    
             return await self._main_frame.wait_for_url(**locals_to_params(locals()))                                                                              
           File "/Users/x/Desktop/driver_tester/.venv/lib/python3.11/site-packages/playwright/_impl/_frame.py", line 226, in wait_for_url                   
             async with self.expect_navigation(                                                                                                                    
           File "/Users/x/Desktop/driver_tester/.venv/lib/python3.11/site-packages/playwright/_impl/_event_context_manager.py", line 33, in                 
         __aexit__                                                                                                                                                 
             await self._future                                                                                                                                    
           File "/Users/x/Desktop/driver_tester/.venv/lib/python3.11/site-packages/playwright/_impl/_frame.py", line 203, in continuation                   
             raise Error(event["error"])                                                                                                                           
         playwright._impl._api_types.Error: resource exceeds maximum size

My script:

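import logging

import scrapy
from scrapy.http import Response
from scrapy_playwright.page import PageMethod

# Assumption: LOGGER is a standard module-level logger.
LOGGER = logging.getLogger(__name__)


class TestSpider(scrapy.Spider):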
    name = "test"

    def __init__(self, url: str, wait_url: str | None, *args, **kwargs) -> None:
        self.url = url
        self.wait_url = wait_url

        if not self.url:
            raise Exception('Missing url in spider.')

        super().__init__(*args, **kwargs)

    def start_requests(self):
        yield scrapy.Request(url=self.url, meta={
            'playwright': True,
            'playwright_include_page': True,
            'playwright_context': 'custom',
            'playwright_page_goto_kwargs': {
                'wait_until': 'load',
            },
            'playwright_page_methods': (
                PageMethod('wait_for_url', self.wait_url),
            ),
        })

    def parse(self, response: Response, **kwargs):
        LOGGER.info(f'[Spider] Parsing page: {response.url}')

My related settings:

PLAYWRIGHT_PROCESS_REQUEST_HEADERS = None
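
For context, this is roughly where that setting would sit in a `settings.py`. Only the last line comes from this report; the download handler and reactor entries are an assumption based on the standard scrapy-playwright setup and may differ from the actual project.

```python
# settings.py (sketch; everything except the last setting is assumed)

# Route HTTP(S) downloads through the scrapy-playwright handler.
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

# scrapy-playwright requires the asyncio-based Twisted reactor.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

# From this report. As I understand the README, None disables header
# processing, so only the headers set by Playwright itself are sent.
PLAYWRIGHT_PROCESS_REQUEST_HEADERS = None
```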

Possible related issues: #176

@elacuesta
Member

See this comment for an explanation of why it's necessary to override the method for certain requests. A number of things need to happen for this to occur. Make sure you're using the latest version of this package, because this was modified not long ago (#177).

@elacuesta elacuesta closed this as not planned Won't fix, can't repro, duplicate, stale Dec 18, 2023