Scrapy Playwright load chrome extensions and configure them #310

milan-cp-dev · 2024-08-07T14:52:15Z

Do you have a ready-to-go method to initialize a captcha service's Chrome extension and configure it before visiting the page and obtaining the page context?

elacuesta · 2024-08-07T16:08:38Z

playwright_page_init_callback might be useful. According to the upstream Playwright docs, you need to access the browser context, which you can do with the following page init callback:

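async def init_page(page, request):
    # The browser context owns the extension's background page
    context = page.context
    if len(context.background_pages) == 0:
        # Not loaded yet: wait until the extension's background page appears
        background_page = await context.wait_for_event('backgroundpage')
    else:
        background_page = context.background_pages[0]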

Otherwise you'll need to elaborate on your use case.
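
As a rough sketch of how a callback like the one above could be attached to a request: this assumes a context named "persistent" is defined in PLAYWRIGHT_CONTEXTS. The context name and the helpers get_background_page and build_request_meta are illustrative, not part of scrapy-playwright.

```python
async def get_background_page(context):
    """Return the extension's background page, waiting for it if needed.

    Note: background pages apply to Manifest V2 extensions; Manifest V3
    extensions expose service workers instead.
    """
    if context.background_pages:
        return context.background_pages[0]
    return await context.wait_for_event("backgroundpage")


async def init_page(page, request):
    # scrapy-playwright calls this once for each newly created page
    background_page = await get_background_page(page.context)
    # Extension-specific configuration would go here (left out on purpose,
    # since it depends on the extension being loaded).
    return background_page


def build_request_meta(context_name="persistent"):
    # "persistent" is an assumed context name, not a required value
    return {
        "playwright": True,
        "playwright_context": context_name,
        "playwright_page_init_callback": init_page,
    }
```

A spider could then yield scrapy.Request(url, meta=build_request_meta()).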

milan-cp-dev · 2024-08-07T16:50:50Z

Thanks! Will look into it.

milan-cp-dev · 2024-08-27T23:06:19Z

Hello,

The goal is to load Chrome extensions. I have a minimal reproducible example, but I still can’t figure out how to load extensions. An example that loads any extension would be greatly appreciated.

My code uses scrapy-playwright to make a request with a persistent context and attempts to load a Chrome extension.
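
For reference, a minimal sketch of settings that might load an unpacked extension through a persistent context. It assumes that scrapy-playwright forwards the keyword arguments of a PLAYWRIGHT_CONTEXTS entry containing user_data_dir to Playwright's launch_persistent_context, and that Chromium runs headful (as it would under xvfb-run). The helper name extension_context_settings and the chrome-profile directory are hypothetical; the two Chromium flags are the ones shown in Playwright's Chrome extensions guide.

```python
from pathlib import Path


def extension_context_settings(extension_dir, user_data_dir):
    """Build Scrapy settings for a persistent context with one extension."""
    ext = str(Path(extension_dir).resolve())
    return {
        "PLAYWRIGHT_BROWSER_TYPE": "chromium",  # extensions are Chromium-only
        "PLAYWRIGHT_CONTEXTS": {
            # "persistent" is an arbitrary context name chosen for this sketch
            "persistent": {
                "user_data_dir": str(Path(user_data_dir).resolve()),
                "headless": False,  # assumption: headful Chromium
                "args": [
                    f"--disable-extensions-except={ext}",
                    f"--load-extension={ext}",
                ],
            }
        },
    }


# Hypothetical paths: the unpacked extension folder and a profile directory
custom_settings = extension_context_settings(
    "anticaptcha-plugin_v0.67", "chrome-profile"
)
```

Requests would also need "playwright_context": "persistent" in their meta so that they run in this context rather than the default one.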

The Chrome extension is obtained from:
https://antcpt.com/eng/home.html
https://anti-captcha.com/
https://github.com/anti-captcha-plugin/anti-captcha-plugin?tab=readme-ov-file
In the Chrome extension, I updated the API key in the config_ac_api_key.js file inside the js folder of anticaptcha-plugin_v0.67.zip:
anticaptcha-plugin_v0.67.zip

The following steps are executed:
scrapy startproject playwrightextensions
cd playwrightextensions
# add CaptchaSpider.py (attached as CaptchaSpider.py.zip) to the spiders folder
xvfb-run -a scrapy crawl CaptchaSpider

Expectations:
The extension loads and an attempt to solve the captcha is recorded.
Reality:
The extension doesn’t load.
CaptchaSpider

Test done with plain Playwright (without Scrapy):
playwrightextensions.py
playwrightextensions.py.zip

Tests done with the same anticaptcha-plugin_v0.67 folder in plain Playwright as well as in a regular Chrome browser:
The extension loads and an attempt to solve the captcha is recorded.
playwrightextensions

Versions:

playwright --version
Version 1.39.0
python -c "import scrapy_playwright; print(scrapy_playwright.__version__)"
0.0.36

scrapy version -v
INFO:scrapy.utils.log:Scrapy 2.11.2 started (bot: playwrightextensions)
INFO:scrapy.utils.log:Versions: lxml 4.9.2.0, libxml2 2.9.14, cssselect 1.2.0, parsel 1.9.1, w3lib 2.2.1, Twisted 24.3.0, Python 3.11.5 (main, Jun 26 2024, 21:00:36) [GCC 11.4.1 20231218 (Red Hat 11.4.1-3)], pyOpenSSL 24.1.0 (OpenSSL 3.2.2 4 Jun 2024), cryptography 42.0.8, Platform Linux-5.14.0-284.25.1.el9_2.x86_64-x86_64-with-glibc2.34
Scrapy : 2.11.2
lxml : 4.9.2.0
libxml2 : 2.9.14
cssselect : 1.2.0
parsel : 1.9.1
w3lib : 2.2.1
Twisted : 24.3.0
Python : 3.11.5 (main, Jun 26 2024, 21:00:36) [GCC 11.4.1 20231218 (Red Hat 11.4.1-3)]
pyOpenSSL : 24.1.0 (OpenSSL 3.2.2 4 Jun 2024)
cryptography : 42.0.8
Platform : Linux-5.14.0-284.25.1.el9_2.x86_64-x86_64-with-glibc2.34

elacuesta added the support (Support questions) label on Aug 8, 2024