Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix spider: cuya_audit #81

Merged
merged 3 commits into from
Jan 29, 2024
Merged

Conversation

haileyhoyat
Copy link
Contributor

What's this PR do?
Fixes our Cuyahoga County Audit Advisory Committee spider (aka. cuya_audit), which broke due to URL and page structure changes across Cuyahoga County's website.

Why are we doing this?
To ensure our Cuyahoga County Audit spider works. The changes in this PR include use of a new class mixin for scraping the Cuyahoga County website.

Steps to manually test
Clone this repo, run the command: cd city_scrapers && scrapy crawl cuya_audit

Are there any smells or added technical debt to note?
No

@haileyhoyat
Copy link
Contributor Author

@SimmonsRitchie I think you'll be proud of me on this one. I utilized the CuyaMixin2, DRY, all tests pass, no dead code, noqa only for long lines.

Comment on lines 14 to 17
location = {
"name": "County Headquarters, 4-407 Conference Room B",
"address": "2079 East 9th St Cleveland, OH 44115",
}
Copy link
Contributor

@SimmonsRitchie SimmonsRitchie Jan 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small request here, Hails. Could you please delete this class variable? In this case, CuyaCountyMixin2 is handling the parsing of location info from the page – so this variable is essentially unused.

Just in case there's any confusion here: I know in the city-scrapers-indianapolis PR we discussed treating location info as a constant via class variables. I think that approach makes sense for certain agencies that don't display location information on their meeting detail pages and we generally know a meeting is always going to be in the same place. In cases where location information is reliably displayed on meeting detail pages – which appears to be the case with Cuyahoga County agencies – I think it makes sense to parse it from the page. Since meeting venues can occasionally shift around (especially between years), I think this increases the chance we're scraping good data longterm.

Copy link
Contributor

@SimmonsRitchie SimmonsRitchie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, @haileyhoyat! Great job. The code looks nice and clean and the new tests pass 🥇

I have only one little comment, re: class var. If you can remove that var, I'll approve the PR and you can merge it. 🚀

Copy link
Contributor

@SimmonsRitchie SimmonsRitchie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Merge away 🚀

@haileyhoyat haileyhoyat merged commit c2cb3fa into City-Bureau:main Jan 29, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants