This Python-based web scraper collects publicly available teacher data from high school websites near Drexel University.
It searches using Google (via SerpAPI), scans potential staff directories, and extracts:
- 👤 Teacher names
- 📧 Emails
- 📚 Subjects (if listed)
- 🌐 School source URLs
Everything is exported into a clean Excel file — perfect for analysis, outreach, or research.
- 🔍 Searches Google for high schools near a target location (via SerpAPI)
- 🌐 Follows multiple link types (staff, about, directory, contact, etc.)
- 🧠 Smart content detection:
  - Recognizes pages with actual teacher info
  - Handles table-based layouts & plain text
- 📈 Scalable architecture:
  - External `keywords.txt` for custom logic
  - Modular functions
  - Retry-safe requests
- 📊 Export to Excel with timestamped filenames
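
As an illustration of the "retry-safe requests" idea, here is a minimal sketch of a retry wrapper with exponential backoff. The function name `fetch_with_retries` and its parameters are hypothetical and not taken from this project's code; the actual download function is passed in as `fetch`, so the sketch makes no assumption about which HTTP library the scraper uses.

```python
import time
from typing import Callable, Optional


def fetch_with_retries(
    fetch: Callable[[str], str],
    url: str,
    attempts: int = 3,
    base_delay: float = 1.0,
) -> Optional[str]:
    """Call fetch(url) up to `attempts` times, backing off between failures.

    Returns the page text, or None if every attempt raised an exception.
    """
    for attempt in range(attempts):
        try:
            return fetch(url)
        except Exception:
            # Wait 1x, 2x, 4x ... base_delay before the next try,
            # but do not sleep after the final failure.
            if attempt < attempts - 1:
                time.sleep(base_delay * 2 ** attempt)
    return None
```

If the project relies on `requests`, `fetch` could be something like `lambda u: requests.get(u, timeout=10).text`. Returning `None` instead of raising lets the caller simply move on to the next candidate link.
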
- Loads `keywords.txt` to identify potential staff/directory/contact links.
- Uses SerpAPI to find nearby school websites.
- Follows each link and scans for useful teacher data (tables, emails, titles).
- Stops when valid info is found or all options are exhausted.
- Exports final results to an Excel spreadsheet.
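
The keyword and scanning steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: it assumes `keywords.txt` holds one keyword per line, and it uses simple regular expressions where the real scraper may well use a proper HTML parser. The names `parse_keywords`, `candidate_links`, and `extract_emails` are hypothetical.

```python
import re
from urllib.parse import urljoin

# Deliberately simple patterns; good enough for a sketch, not for every page.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
HREF_RE = re.compile(r'href=["\']([^"\']+)["\']', re.IGNORECASE)


def parse_keywords(text):
    """Assumed format: one keyword per line; blank lines skipped, case ignored."""
    return [line.strip().lower() for line in text.splitlines() if line.strip()]


def candidate_links(base_url, html, keywords):
    """Return absolute URLs whose href contains any keyword, in page order."""
    found = []
    for href in HREF_RE.findall(html):
        if any(k in href.lower() for k in keywords):
            url = urljoin(base_url, href)
            if url not in found:  # skip links already queued
                found.append(url)
    return found


def extract_emails(html):
    """Return unique email addresses in the order they appear."""
    return list(dict.fromkeys(EMAIL_RE.findall(html)))
```

A caller would read the file once, for example `parse_keywords(open("keywords.txt").read())`, then visit each candidate link in turn and stop at the first page where `extract_emails` returns something.
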
| Name | Email | Position | School Website |
|---|---|---|---|
| John Smith | jsmith@school.org | Math Teacher | https://examplehigh.org |
| Amanda Lee | alee@school.org | Principal | https://anotherhigh.org |
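
For the export step, a minimal sketch of how rows like the ones above might be prepared and given a timestamped filename. The column names are inferred from the sample rows, and the filename pattern, `timestamped_filename`, and `to_row` are assumptions made for illustration; the project's real naming scheme may differ.

```python
from datetime import datetime
from typing import Optional

# Column order assumed from the sample output rows.
COLUMNS = ["Name", "Email", "Position", "School Website"]


def timestamped_filename(prefix: str = "teachers", when: Optional[datetime] = None) -> str:
    """Build a name like teachers_20240131_142530.xlsx (pattern is an assumption)."""
    when = when or datetime.now()
    return f"{prefix}_{when.strftime('%Y%m%d_%H%M%S')}.xlsx"


def to_row(record: dict) -> list:
    """Order one scraped record by COLUMNS, leaving missing fields blank."""
    return [record.get(col, "") for col in COLUMNS]
```

The resulting rows and filename can then be handed to whichever Excel writer the project uses, for instance pandas' `DataFrame.to_excel`. Including the time down to the second keeps repeated runs from overwriting each other.
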
git clone https://github.com/yourusername/highschool-teacher-scraper.git
cd highschool-teacher-scraper