Cookies not transferring between page.goto() calls - stay logged in using same playwright page #149
Comments
By default, headers (including cookies) are not handled by the browser; instead they are overridden (source) with the headers that come from the Scrapy request. This means cookies are the result of the processing done by Scrapy's built-in
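As a rough illustration of the override described in the comment above, here is a toy model in plain Python. It is not scrapy-playwright's actual code: the function `headers_sent`, the `override` flag, and the `terms_accepted` cookie are all hypothetical names made up for this sketch. It only shows why a cookie held by the browser can go missing when the Scrapy request's headers replace the browser's.

```python
def headers_sent(browser_headers, scrapy_headers, override=True):
    """Return the headers a navigation request would carry in this toy model.

    override=True  -> the Scrapy request's headers replace the browser's
    override=False -> the browser's own headers (and its cookies) are used
    """
    chosen = scrapy_headers if override else browser_headers
    # Header names are case-insensitive, so normalise them for lookups.
    return {name.lower(): value for name, value in chosen.items()}


# Cookie the browser holds after a (hypothetical) terms popup was accepted.
browser_headers = {"User-Agent": "browser-ua", "Cookie": "terms_accepted=1"}
# Headers built by Scrapy for the follow-up request: no cookie in its jar.
scrapy_headers = {"User-Agent": "scrapy-ua"}

overridden = headers_sent(browser_headers, scrapy_headers, override=True)
untouched = headers_sent(browser_headers, scrapy_headers, override=False)

print("cookie" in overridden)  # False
print(untouched["cookie"])     # terms_accepted=1
```

In this model the second navigation loses the cookie simply because the Scrapy-side headers never contained it, which matches the symptom reported in this issue.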
Thank you for your response. I understand now that the cookies are handled by Scrapy and its middlewares and not within the browser. I tried setting PLAYWRIGHT_PROCESS_REQUEST_HEADERS=None, but that did not seem to work: the cookies were still not loaded. I then tried the PLAYWRIGHT_CONTEXTS option. I had to do some rigging to get it working and am curious as to why it works. My spider:
The issue I ran into is with the page methods. I had to enter the username and password, then click a button that redirects from the login page to the website's main page. After the click, the page would not load properly: it hung in a loading state until it timed out. I found that commenting out lines 267-277 in the handler.py file made it work, but then it would not wait for the page to load, which is why I added: "load1": PageMethod("wait_for_url", url='https://www.testwebsite.com/main/'). (Screenshot: handler.py with those lines commented out.)

So my question is: is this part of the code needed? Could removing it cause issues in the future? I do hope this all makes sense. I have been working on this all week and am learning it as fast as I can. I do appreciate all the help and advice.

-Benz
That bit of code is necessary because:
That said, I'd actually expect the cookies from the context not to be sent because of the header overriding that happens in there, but it seems it is not working the way I thought it was. I've opened microsoft/playwright-python#1686 upstream regarding this.
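For reference, the two settings discussed in this thread could be written along these lines in settings.py. This is only a sketch under assumptions: the context name `logged_in` and the `terms_accepted` cookie are hypothetical, and the cookie domain is taken from the placeholder URL used earlier in the thread. Whether cookies stored in the context actually reach the server is the open question here, so treat this as a starting point for experiments rather than a confirmed fix.

```python
# settings.py (sketch; names and values are illustrative)

# None turns off scrapy-playwright's header processing, so the browser
# sends its own headers instead of the ones from the Scrapy request.
PLAYWRIGHT_PROCESS_REQUEST_HEADERS = None

# Named browser contexts created on startup. Each value is passed as
# keyword arguments to Playwright's Browser.new_context().
PLAYWRIGHT_CONTEXTS = {
    "logged_in": {  # hypothetical context name
        "storage_state": {
            "cookies": [
                {
                    "name": "terms_accepted",  # hypothetical cookie
                    "value": "1",
                    "domain": "www.testwebsite.com",
                    "path": "/",
                    "expires": -1,  # session cookie
                    "httpOnly": False,
                    "secure": False,
                    "sameSite": "Lax",
                }
            ],
            "origins": [],
        }
    }
}
```

A request would then opt into that context with `meta={"playwright": True, "playwright_context": "logged_in"}`.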
Hey, I'm experiencing a similar issue.
There has been no work related to this issue. If you think there's a bug, please provide a minimal, reproducible example.
Yes, I think there might be a bug, or there could be another way of applying cookies that I'm not seeing.
In the first case, some cookies were sent but not the one I was interested in. In the second case, no cookies were sent.
Closing due to inactivity.
I am running into an issue where, if I set cookies on a page, the webpage first loads as if the cookies are there (i.e. the terms and conditions popup is already accepted). When I then load the same webpage again using the same Playwright page, it loads with the terms and conditions popup, as if the cookies are not there.
The following code is a simple way of accessing the URL with cookies and then using the same page to reload the same URL. Unfortunately, the original URL I was using is a webpage only I have access to, so I could not provide it. I am also using the PLAYWRIGHT_LAUNCH_OPTIONS = {"headless": False} option in Scrapy's settings.py file.
I also tried the above using only Playwright, and the webpage loads both times as if it has cookies set.
I understand I could just add the cookies in the yield scrapy.Request() inside of the parse1 function. The reason I am doing it this way is because in the yield scrapy.Request() inside of start_requests I will be using a PageMethod call in order to log into a website. I want to stay logged in throughout the entire scrapy session, using a single page to load all the urls I need.
What I also found interesting is that I went directly into the code of handler.ScrapyPlaywrightDownloadHandler and added a second page.goto() call right after the original page.goto() call, but still got the same outcome: no cookies loaded on the second page.goto() call. The change is shown below, starting at line 296:
Is there a way to log into the initial playwright page and stay logged in? Or is there a part of the code within the handler.py file that is preventing the cookies and login information from staying with each page call?
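One way to express "keep using the same page" in scrapy-playwright is through request meta keys: `playwright_include_page` hands the page to the callback, and `playwright_page` asks the handler to reuse an existing page instead of creating a new one. The sketch below only builds those meta dictionaries; the helper names `login_request_meta` and `same_page_meta` are hypothetical. Reusing the page does not by itself change how headers and cookies are processed, so whether the login survives still depends on the header handling discussed in the comments.

```python
def login_request_meta(context_name="default"):
    """Meta for the first request: render with Playwright and
    expose the page to the callback via response.meta."""
    return {
        "playwright": True,
        "playwright_context": context_name,
        "playwright_include_page": True,
    }


def same_page_meta(page):
    """Meta for a follow-up request that should navigate the
    page we already hold instead of opening a new one."""
    return {
        "playwright": True,
        "playwright_page": page,
        "playwright_include_page": True,
    }


# Inside a spider this would be used roughly as follows:
#   yield scrapy.Request(login_url, meta=login_request_meta(), callback=self.parse1)
#   ...
#   page = response.meta["playwright_page"]
#   yield scrapy.Request(next_url, meta=same_page_meta(page), callback=self.parse2)
# A page obtained this way should be closed once it is no longer needed.
```

Keeping the meta construction in small helpers makes it easy to see which request creates the page and which ones reuse it.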