👾 Replace scrapy-sentry package #67

Merged
merged 3 commits into from
Jan 24, 2024


@SimmonsRitchie SimmonsRitchie commented Jan 21, 2024

What's this PR do?

Replaces scrapy-sentry with a new custom-built package called scrapy-sentry-errors.

Why are we doing this?

scrapy-sentry worked great for City Bureau's city-scraper repos for many years, but it appears to be no longer maintained. The package relies on older dependencies and Python packaging processes that increasingly caused unexpected conflicts with dependencies in city-scraper repos, including this one, which meant our CI process couldn't run.

This PR replaces scrapy-sentry with scrapy-sentry-errors, a package built by City Bureau that uses modern Python packaging practices and upgraded dependencies. Long term, using our own Sentry integration lets us better tailor Sentry monitoring to our specific needs.

Steps to manually test

It's critical that our Sentry monitoring continues to work as expected. Locally, you can ensure Sentry monitoring is working correctly by doing the following:

  1. Get the Sentry DSN from City Bureau's Sentry account or one of our secrets managers.
  2. Open city_scrapers/settings/base.py and add the following setting, with the DSN as its value:
     SENTRY_DSN = "<SENTRY_DSN_VALUE>"
  3. In the same file, go to the EXTENSIONS setting and add the new Sentry integration:
     EXTENSIONS = {
         "scrapy_sentry_errors.extensions.Errors": 10,  # <- Add this line
         "scrapy.extensions.closespider.CloseSpider": None,
     }
  4. Ensure scrapy-sentry-errors (our new package) is installed in your virtual environment:
     pipenv install
  5. Trigger a spider in the repo that is known to error. At the time of writing, the cuya_administrative_rules spider is a good choice. Run it with this command:
     scrapy crawl cuya_administrative_rules
  6. Locally, the spider should raise an exception. Check the issue dashboard of our Sentry account and confirm that the same error was logged. Make sure its timestamp matches the time you ran the spider, so you know it wasn't logged by another PR reviewer or the author of this PR. You should see an error that looks something like this:
     TypeError: to_bytes must receive a str or bytes object, got NoneType
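
If you'd rather not paste the DSN into base.py while testing, the two settings edits above could be driven from an environment variable instead. This is only a sketch, not what this PR does: the SENTRY_DSN environment variable name and the sentry_settings helper are assumptions made for illustration, and skipping the extension when no DSN is set is a design choice of the sketch rather than documented behavior of scrapy-sentry-errors.

```python
import os

# Dotted path of the extension, copied from the EXTENSIONS snippet above.
ERRORS_EXTENSION = "scrapy_sentry_errors.extensions.Errors"


def sentry_settings(env=None):
    """Build the Sentry-related settings from an environment mapping.

    Hypothetical helper: neither the function nor the environment
    variable name is defined by this PR or by scrapy-sentry-errors.
    """
    env = os.environ if env is None else env
    dsn = env.get("SENTRY_DSN")  # assumed variable name
    extensions = {"scrapy.extensions.closespider.CloseSpider": None}
    if dsn:
        # Enable the Sentry extension only when a DSN is available.
        extensions[ERRORS_EXTENSION] = 10
    return {"SENTRY_DSN": dsn, "EXTENSIONS": extensions}


# Possible usage in city_scrapers/settings/base.py:
#   _sentry = sentry_settings()
#   SENTRY_DSN = _sentry["SENTRY_DSN"]
#   EXTENSIONS = _sentry["EXTENSIONS"]
```

With this approach you would export the variable in your shell before running the scrapy crawl command, and nothing secret needs to be edited into the settings file.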

Are there any smells or added technical debt to note?

  • Our new package, scrapy-sentry-errors, captures exceptions slightly differently than scrapy-sentry, so the output in our Sentry dashboard might look a bit different. In particular, issue titles are formatted like "" rather than "[]: ". Future upgrades of scrapy-sentry-errors may tweak the format of these messages.
  • After this PR is merged, we will need to monitor Sentry to ensure that error logging continues to work as expected. This repo has quite a lot of issues with its spiders, so it should provide good insight into the new package's behavior in production.

@SimmonsRitchie SimmonsRitchie changed the title Replace scrapy-sentry package 👾 Replace scrapy-sentry package Jan 22, 2024
@SimmonsRitchie SimmonsRitchie requested a review from a team January 22, 2024 15:39
@SimmonsRitchie SimmonsRitchie marked this pull request as ready for review January 22, 2024 15:39
@SimmonsRitchie SimmonsRitchie merged commit 6638a63 into main Jan 24, 2024
2 checks passed
@SimmonsRitchie SimmonsRitchie deleted the sentry-upgrade branch January 24, 2024 01:36