This Python-based web scraper collects publicly available teacher data from high school websites near Drexel University.
It searches using Google (via SerpAPI), scans potential staff directories, and extracts:
- Teacher names
- Emails
- Subjects (if listed)
- School source URLs
Everything is exported to a clean Excel file, ready for analysis, outreach, or research.
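For pages that list staff as plain text, email extraction can be approximated with a regular expression. The sketch below is an assumption, not the scraper's actual code: `EMAIL_RE` and `extract_emails` are hypothetical names, and the pattern is a pragmatic approximation rather than a full RFC 5322 validator.

```python
import re

# Hypothetical pattern: local part, "@", then a domain ending in a 2+ letter TLD.
# Deliberately loose; it is not a complete RFC 5322 email validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text: str) -> list:
    """Return unique emails, lowercased, in order of first appearance."""
    seen = {}  # dicts preserve insertion order, so this doubles as an ordered set
    for match in EMAIL_RE.findall(text):
        seen.setdefault(match.lower(), None)
    return list(seen)
```

Lowercasing before deduplication means `JSmith@School.org` and `jsmith@school.org` count as one contact. Obfuscated forms such as `jsmith [at] school.org` are not matched and would need extra handling.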
- Searches Google for high schools near a target location (via SerpAPI)
- Follows multiple link types (staff, about, directory, contact, etc.)
- Smart content detection:
  - Recognizes pages with actual teacher info
  - Handles table-based layouts & plain text
- Scalable architecture:
  - External `keywords.txt` for custom logic
  - Modular functions
  - Retry-safe requests
- Export to Excel with timestamped filenames
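Table-based staff directories can be handled with the standard library's `html.parser` alone. This is a minimal sketch assuming well-formed tables with explicit closing `</td>` and `</tr>` tags; `TableRowParser` and `staff_rows` are hypothetical names, and keeping only rows that contain an `@` is an illustrative heuristic, not necessarily the rule the scraper uses.

```python
from html.parser import HTMLParser
from typing import List, Optional


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._row: Optional[List[str]] = None   # cells of the row being read
        self._cell: Optional[List[str]] = None  # text chunks of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Collapse runs of whitespace/newlines inside the cell to single spaces.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None


def staff_rows(html: str) -> List[List[str]]:
    """Keep only table rows with an '@' in some cell, a cheap hint of an email."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return [row for row in parser.rows if any("@" in cell for cell in row)]
```

Header rows such as `Name | Email` are dropped by the `@` filter, so the result contains only candidate staff entries.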
- Loads `keywords.txt` to identify potential staff, directory, and contact links.
- Uses SerpAPI to find nearby school websites.
- Follows each link and scans for useful teacher data (tables, emails, titles).
- Stops as soon as valid teacher data is found, or after every candidate link has been tried.
- Exports final results to an Excel spreadsheet.
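The link-following steps can be sketched as keyword matching on each anchor's URL and text, followed by a loop that stops at the first page that yields data. The one-keyword-per-line format for `keywords.txt` is an assumption, and `parse_keywords`, `load_keywords`, `candidate_links`, and `first_useful_page` are hypothetical helpers. `fetch` and `extract` are passed in as functions so that the HTTP layer (for example, a retrying client) stays separate from the parsing logic.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urljoin


def parse_keywords(text):
    """One keyword per line; blank lines and '#' comments are skipped (assumed format)."""
    words = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            words.append(line.lower())
    return words


def load_keywords(path="keywords.txt"):
    return parse_keywords(Path(path).read_text(encoding="utf-8"))


class LinkCollector(HTMLParser):
    """Collect (absolute URL, anchor text) pairs for every <a href=...>."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())
            # Resolve relative hrefs like "/staff" against the page URL.
            self.links.append((urljoin(self.base_url, self._href), text))
            self._href = None


def candidate_links(html, base_url, keywords):
    """Return unique links whose URL or anchor text contains any (lowercase) keyword."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    hits = []
    for url, text in collector.links:
        haystack = f"{url} {text}".lower()
        if any(kw in haystack for kw in keywords) and url not in hits:
            hits.append(url)
    return hits


def first_useful_page(urls, fetch, extract):
    """Visit URLs in order and stop at the first page that yields records."""
    for url in urls:
        records = extract(fetch(url))
        if records:
            return url, records
    return None, []
```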
| Name | Email | Position | School Website |
|---|---|---|---|
| John Smith | jsmith@school.org | Math Teacher | https://examplehigh.org |
| Amanda Lee | alee@school.org | Principal | https://anotherhigh.org |
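The output columns above map naturally onto a small record type. In this sketch, `TeacherRecord`, `timestamped_filename`, `export_to_excel`, and the `teachers_YYYYMMDD_HHMMSS.xlsx` naming scheme are assumptions rather than the project's actual names. The export uses pandas' `DataFrame.to_excel`, which needs an Excel engine such as `openpyxl` installed to write `.xlsx` files.

```python
from dataclasses import dataclass
from datetime import datetime

# Column headers matching the sample output table.
COLUMNS = ["Name", "Email", "Position", "School Website"]


@dataclass
class TeacherRecord:
    name: str
    email: str
    position: str
    school_website: str

    def as_row(self):
        return [self.name, self.email, self.position, self.school_website]


def timestamped_filename(prefix="teachers", now=None):
    """Build e.g. teachers_20240501_093000.xlsx (hypothetical naming scheme)."""
    now = now or datetime.now()
    return f"{prefix}_{now:%Y%m%d_%H%M%S}.xlsx"


def export_to_excel(records, path=None):
    """Write records to .xlsx and return the path; needs pandas plus openpyxl."""
    import pandas as pd  # lazy import so the helpers above work without pandas

    path = path or timestamped_filename()
    df = pd.DataFrame([r.as_row() for r in records], columns=COLUMNS)
    df.to_excel(path, index=False)
    return path
```

For example, `export_to_excel([TeacherRecord("John Smith", "jsmith@school.org", "Math Teacher", "https://examplehigh.org")])` would write a one-row spreadsheet named after the current time and return that filename.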
```bash
git clone https://github.com/yourusername/highschool-teacher-scraper.git
cd highschool-teacher-scraper
```