Commit

Fix spider: cuya_northeast_ohio_coordinating
Use Playwright and a different user agent to avoid 403 responses from the agency's website, which are likely caused by bot-detection software.
SimmonsRitchie committed Aug 13, 2024
1 parent 6283296 commit e4027c9
Showing 6 changed files with 620 additions and 464 deletions.
4 changes: 4 additions & 0 deletions .github/workflows/ci.yml
@@ -42,6 +42,10 @@ jobs:
        env:
          PIPENV_DEFAULT_PYTHON_VERSION: ${{ matrix.python-version }}

      - name: Set up playwright
        run: |
          pipenv run playwright install firefox
      - name: Check imports with isort
        run: pipenv run isort . --check-only

4 changes: 4 additions & 0 deletions .github/workflows/cron.yml
@@ -58,6 +58,10 @@ jobs:
        env:
          PIPENV_DEFAULT_PYTHON_VERSION: 3.9

      - name: Set up playwright
        run: |
          pipenv run playwright install firefox
      - name: Run scrapers
        run: |
          export PYTHONPATH=$(pwd):$PYTHONPATH
1 change: 1 addition & 0 deletions Pipfile
@@ -11,6 +11,7 @@ python-dateutil = "*"
pdfminer-six = "*"
scrapy-sentry-errors = "1.0.0"
pytz = "*"
scrapy-playwright = "*"

[dev-packages]
freezegun = "*"
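
The spider changes themselves are not shown in this excerpt. As a rough illustration of the approach the commit message describes (routing requests through Playwright with a browser-like user agent), the following is a minimal sketch and not the commit's actual code. The setting keys are the ones documented by scrapy-playwright; the helper names and the user-agent string are hypothetical and chosen only for this example.

```python
# Hypothetical sketch: the user-agent value and helper names are assumptions,
# not taken from the commit.
BROWSER_USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0"
)


def build_playwright_settings(user_agent=BROWSER_USER_AGENT):
    """Return Scrapy settings that send HTTP(S) downloads through scrapy-playwright."""
    handler = "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler"
    return {
        # Replace Scrapy's default download handlers with the Playwright one.
        "DOWNLOAD_HANDLERS": {"http": handler, "https": handler},
        # scrapy-playwright requires the asyncio-based Twisted reactor.
        "TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor",
        # Matches the browser installed in the workflows
        # ("playwright install firefox").
        "PLAYWRIGHT_BROWSER_TYPE": "firefox",
        # Standard Scrapy setting for the User-Agent header.
        "USER_AGENT": user_agent,
    }


def playwright_meta():
    """Return request meta that asks scrapy-playwright to render the request."""
    return {"playwright": True}
```

In a spider, the dictionary would typically be assigned to the class attribute `custom_settings`, and each request would be created with `meta=playwright_meta()` so that it is fetched by the browser rather than by Scrapy's default downloader.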