-
-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🏗 Build Spider: Cincinnati Board of Education #12
base: main
Are you sure you want to change the base?
Conversation
WalkthroughA new spider for scraping Cincinnati Board of Education meeting data has been developed. The spider is designed to retrieve meeting information from an API endpoint, parse JSON responses, and extract relevant meeting details. It focuses on meetings within the last 12 months, creating structured Changes
Sequence DiagramsequenceDiagram
participant Spider as CinohBoardOfEdSpider
participant API as Meeting Data API
participant Parser as JSON Parser
participant Meeting as Meeting Object
Spider->>API: Request meeting data
API-->>Spider: Return JSON response
Spider->>Parser: Parse JSON data
Parser-->>Spider: Extract meeting details
Spider->>Meeting: Create Meeting objects
Meeting-->>Spider: Return parsed meetings
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
city_scrapers/spiders/cinoh_Board_of_Ed.py (2)
73-81
: Consider distinguishing multiple document links.If future requirements expand (e.g., separate Zoom link, minutes, attachments), consider returning multiple link entries with more specific titles.
19-23
: Potential future enhancement to parse second API endpoint.You mentioned there's a second endpoint that’s currently unused. If it provides additional meeting details or ensures completeness, consider adding a second request/parse flow in the future.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
city_scrapers/spiders/cinoh_Board_of_Ed.py
(1 hunks)tests/files/cinoh_Board_of_Ed.json
(1 hunks)tests/test_cinoh_Board_of_Ed.py
(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- tests/files/cinoh_Board_of_Ed.json
🧰 Additional context used
🪛 Ruff (0.8.2)
tests/test_cinoh_Board_of_Ed.py
5-5: city_scrapers_core.constants.NOT_CLASSIFIED
imported but unused
Remove unused import
(F401)
5-5: city_scrapers_core.constants.BOARD
imported but unused
Remove unused import
(F401)
🔇 Additional comments (3)
city_scrapers/spiders/cinoh_Board_of_Ed.py (3)
23-26
: Good use of FormRequest
to handle form data.
This approach ensures correct server interaction and clean request handling.
28-45
: Validate date filtering and iteration logic.
The logic to skip meetings older than 12 months helps keep the dataset relevant. Confirm this aligns with project requirements (some boards may post older meeting data retroactively).
65-71
: Classification logic looks appropriate.
The classification covers committee, board, or default. This improves clarity if more categories are introduced in the future.
from os.path import dirname, join | ||
|
||
import pytest | ||
from city_scrapers_core.constants import COMMITTEE, NOT_CLASSIFIED, BOARD |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove unused imports.
NOT_CLASSIFIED
and BOARD
aren't used anywhere, so please remove them to avoid clutter and improve maintainability.
-from city_scrapers_core.constants import COMMITTEE, NOT_CLASSIFIED, BOARD
+from city_scrapers_core.constants import COMMITTEE
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
from city_scrapers_core.constants import COMMITTEE, NOT_CLASSIFIED, BOARD | |
from city_scrapers_core.constants import COMMITTEE |
🧰 Tools
🪛 Ruff (0.8.2)
5-5: city_scrapers_core.constants.NOT_CLASSIFIED
imported but unused
Remove unused import
(F401)
5-5: city_scrapers_core.constants.BOARD
imported but unused
Remove unused import
(F401)
What's this PR do?
This PR creates a new scraper for the Cincinnati Board of Education website at URL: https://go.boarddocs.com/oh/cps/Board.nsf/Public#tab-welcome
Why are we doing this?
Scraper requested from this spreadsheet.
Steps to manually test
Monitor the output and ensure no errors are raised.
Inspect
test_output.csv
to ensure the data looks valid.Are there any smells or added technical debt to note?
It scrapes one of two API endpoints found after clicking on a meeting within the table.
Summary by CodeRabbit
New Features
Tests