Scrapy Playwright load chrome extensions and configure them #310

Open
milan-cp-dev opened this issue Aug 7, 2024 · 3 comments
Labels
support Support questions

Comments

@milan-cp-dev

Is there a ready-made way to initialize a captcha service's Chrome extension and configure it before visiting the page and obtaining the page context?

@elacuesta
Member

playwright_page_init_callback might be useful. According to the upstream Playwright docs on Chrome extensions, you need to access the browser context, which you can do with the following page init callback:

async def init_page(page, request):
    context = page.context
    if len(context.background_pages) == 0:
        background_page = await context.wait_for_event('backgroundpage')
    else:
        background_page = context.background_pages[0]

Otherwise you'll need to elaborate on your use case.
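
As a minimal sketch of how such a callback could be attached to a request: scrapy-playwright reads it from the `playwright_page_init_callback` key in the request meta. The helper name `build_playwright_meta` and the context name `"extensions"` are assumptions made for illustration, and returning the background page from the callback is only there to make the example easy to inspect.

```python
async def init_page(page, request):
    """Page init callback: wait for the extension's background page."""
    context = page.context
    if len(context.background_pages) == 0:
        # No background page yet: wait until the extension creates one.
        background_page = await context.wait_for_event("backgroundpage")
    else:
        background_page = context.background_pages[0]
    # Any extension configuration would go here, before the page navigates.
    return background_page


def build_playwright_meta(context_name="extensions"):
    """Hypothetical helper: request meta that routes a request through
    Playwright and registers the init callback above."""
    return {
        "playwright": True,
        "playwright_context": context_name,  # assumed context name
        "playwright_page_init_callback": init_page,
    }
```

In a spider this would be used as `scrapy.Request(url, meta=build_playwright_meta())`. Background pages only exist for extensions that declare one, so this is a starting point rather than a confirmed fix.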

@milan-cp-dev
Author

Thanks! Will look into it.

@elacuesta elacuesta added the support Support questions label Aug 8, 2024
@milan-cp-dev
Author

Hello,

The goal is to load Chrome extensions. I have a minimal reproducible example, but I still can't figure out how to load extensions. An example that loads any extension would be greatly appreciated.

My code uses scrapy-playwright to make a request with a persistent context and attempts to load a Chrome extension.

The Chrome extension was obtained from:
https://antcpt.com/eng/home.html
https://anti-captcha.com/
https://github.com/anti-captcha-plugin/anti-captcha-plugin?tab=readme-ov-file
The API key was updated in the config_ac_api_key.js file inside the js folder of anticaptcha-plugin_v0.67.zip:
anticaptcha-plugin_v0.67.zip

The following steps were executed:
scrapy startproject playwrightextensions
cd playwrightextensions
(CaptchaSpider.py added to the spiders directory)
CaptchaSpider.py.zip
xvfb-run -a scrapy crawl CaptchaSpider

Expected:
The extension loads and an attempt to solve the captcha is recorded.
Actual:
The extension doesn't load.
CaptchaSpider

Test done with plain Playwright (without Scrapy):
playwrightextensions.py
playwrightextensions.py.zip

Tests done with the same anticaptcha-plugin_v0.67 folder in plain Playwright, as well as in a regular Chrome browser:
The extension loads and an attempt to solve the captcha is recorded.
playwrightextensions
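
For comparison, here is a hedged sketch of scrapy-playwright settings that mirror a plain Playwright persistent-context launch with an extension. It assumes the behavior described in the scrapy-playwright README: a context whose kwargs include `user_data_dir` is launched as a persistent context, independently of the main browser, so the `--load-extension` arguments and `headless` go in `PLAYWRIGHT_CONTEXTS` rather than `PLAYWRIGHT_LAUNCH_OPTIONS`. The function name `extension_settings`, the context name `"extensions"`, and the paths are illustrative assumptions. Nothing in this thread confirms that this resolves the problem.

```python
from pathlib import Path


def extension_settings(extension_dir, user_data_dir, context_name="extensions"):
    """Hypothetical helper: build Scrapy settings for a persistent Chromium
    context that loads one unpacked extension."""
    ext = str(Path(extension_dir).resolve())  # Chromium expects an absolute path
    handler = "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler"
    return {
        "DOWNLOAD_HANDLERS": {"http": handler, "https": handler},
        "TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor",
        "PLAYWRIGHT_BROWSER_TYPE": "chromium",
        "PLAYWRIGHT_CONTEXTS": {
            context_name: {
                # Presence of user_data_dir makes the context persistent.
                "user_data_dir": str(user_data_dir),
                # Headed mode, matching the xvfb-run invocation above.
                "headless": False,
                "args": [
                    f"--disable-extensions-except={ext}",
                    f"--load-extension={ext}",
                ],
            },
        },
    }
```

A spider could set `custom_settings = extension_settings("anticaptcha-plugin_v0.67", "/tmp/profile")` and send requests with `meta={"playwright": True, "playwright_context": "extensions"}` so they run in the context that has the extension loaded.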

Versions:

playwright --version
Version 1.39.0
python -c "import scrapy_playwright; print(scrapy_playwright.__version__)"
0.0.36

scrapy version -v
INFO:scrapy.utils.log:Scrapy 2.11.2 started (bot: playwrightextensions)
INFO:scrapy.utils.log:Versions: lxml 4.9.2.0, libxml2 2.9.14, cssselect 1.2.0, parsel 1.9.1, w3lib 2.2.1, Twisted 24.3.0, Python 3.11.5 (main, Jun 26 2024, 21:00:36) [GCC 11.4.1 20231218 (Red Hat 11.4.1-3)], pyOpenSSL 24.1.0 (OpenSSL 3.2.2 4 Jun 2024), cryptography 42.0.8, Platform Linux-5.14.0-284.25.1.el9_2.x86_64-x86_64-with-glibc2.34
Scrapy : 2.11.2
lxml : 4.9.2.0
libxml2 : 2.9.14
cssselect : 1.2.0
parsel : 1.9.1
w3lib : 2.2.1
Twisted : 24.3.0
Python : 3.11.5 (main, Jun 26 2024, 21:00:36) [GCC 11.4.1 20231218 (Red Hat 11.4.1-3)]
pyOpenSSL : 24.1.0 (OpenSSL 3.2.2 4 Jun 2024)
cryptography : 42.0.8
Platform : Linux-5.14.0-284.25.1.el9_2.x86_64-x86_64-with-glibc2.34
