🕵️‍♂️ High School Teacher Web Scraper

This Python-based web scraper collects publicly available teacher data from the websites of high schools near Drexel University.
It searches Google (via SerpAPI), scans likely staff directory pages, and extracts:

👤 Teacher names
📧 Emails
📚 Subjects or positions (if listed)
🌐 School source URLs

All results are exported to a single Excel file, ready for analysis, outreach, or research.

📦 Features

🔍 Searches Google for high schools near a target location (via SerpAPI)
🌐 Follows multiple link types (staff, about, directory, contact, etc.)
🧠 Smart content detection:
- Recognizes pages with actual teacher info
- Handles table-based layouts & plain text
📈 Scalable architecture:
- External keywords.txt for custom logic
- Modular functions
- Retry-safe requests
📊 Export to Excel with timestamped filenames
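
The "retry-safe requests" feature could look roughly like the sketch below. It is an illustration only, not the project's actual code: the names `fetch_with_retry` and `_download`, the exponential backoff schedule, and the use of the standard library instead of a package such as `requests` are all assumptions.

```python
import time
import urllib.error
import urllib.request


def fetch_with_retry(url, fetch=None, retries=3, backoff=1.0, sleep=time.sleep):
    """Return the page body for `url`, retrying transient network failures.

    `fetch` is any callable that takes a URL and returns text. Between
    attempts the function waits backoff, 2 * backoff, 4 * backoff, ... seconds.
    """
    if fetch is None:
        fetch = _download
    last_error = None
    for attempt in range(retries):
        try:
            return fetch(url)
        except (urllib.error.URLError, TimeoutError, ConnectionError) as exc:
            last_error = exc
            # Only wait if another attempt is still to come.
            if attempt < retries - 1:
                sleep(backoff * 2 ** attempt)
    raise RuntimeError(f"giving up on {url} after {retries} attempts") from last_error


def _download(url, timeout=10):
    # Hypothetical default fetcher; the real project may use a different client.
    request = urllib.request.Request(url, headers={"User-Agent": "teacher-scraper-sketch"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Passing `fetch` and `sleep` as parameters keeps the retry logic testable without touching the network.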

🚀 How It Works

1. Loads keywords.txt, which lists the terms used to identify potential staff, directory, and contact links.
2. Uses SerpAPI to find nearby school websites.
3. Follows each matching link and scans the page for teacher data (tables, emails, titles).
4. Stops when valid info is found or when all candidate links are exhausted.
5. Exports the final results to an Excel spreadsheet.
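
The keyword and extraction steps above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the repository's implementation: the function names (`load_keywords`, `candidate_links`, `extract_emails`), the one-keyword-per-line format of keywords.txt, and the email regex are all hypothetical.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Deliberately simple pattern; it will not cover every valid address.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def load_keywords(path="keywords.txt"):
    """Read one keyword per line, skipping blanks (file format is assumed)."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip().lower() for line in handle if line.strip()]


class LinkCollector(HTMLParser):
    """Collects (href, link text) pairs from anchor tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def candidate_links(html, base_url, keywords):
    """Return absolute URLs whose href or link text contains any keyword."""
    collector = LinkCollector()
    collector.feed(html)
    matches = []
    for href, text in collector.links:
        haystack = f"{href} {text}".lower()
        url = urljoin(base_url, href)
        if any(word in haystack for word in keywords) and url not in matches:
            matches.append(url)
    return matches


def extract_emails(html):
    """Return unique email addresses in order of first appearance."""
    return list(dict.fromkeys(EMAIL_RE.findall(html)))
```

A typical call would be `candidate_links(homepage_html, "https://examplehigh.org", load_keywords())`, followed by `extract_emails` on each page that is fetched.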

🧪 Example Output

| Name | Email | Position | School Website |
|------|-------|----------|----------------|
| John Smith | jsmith@school.org | Math Teacher | `https://examplehigh.org` |
| Amanda Lee | alee@school.org | Principal | `https://anotherhigh.org` |
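
Rows like the ones above could be written to a timestamped Excel file along these lines. This is a hedged sketch: the helper names, the `teachers_YYYYMMDD_HHMMSS.xlsx` filename pattern, and the choice of pandas (with openpyxl as the writer engine) are assumptions, not details confirmed by the project.

```python
from datetime import datetime


def timestamped_filename(prefix="teachers", now=None):
    """Build a name such as teachers_20240131_154500.xlsx (pattern is assumed)."""
    now = now or datetime.now()
    return f"{prefix}_{now:%Y%m%d_%H%M%S}.xlsx"


def export_to_excel(rows, path=None):
    """Write a list of row dicts to an Excel file and return the path used."""
    # Imported lazily so the filename helper works without pandas installed;
    # writing .xlsx files with pandas also requires openpyxl.
    import pandas as pd

    path = path or timestamped_filename()
    columns = ["Name", "Email", "Position", "School Website"]
    pd.DataFrame(rows, columns=columns).to_excel(path, index=False)
    return path
```

For example, `export_to_excel([{"Name": "John Smith", "Email": "jsmith@school.org", "Position": "Math Teacher", "School Website": "https://examplehigh.org"}])` would write one row and return the generated filename.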

⚙️ Setup Instructions

1. Clone the Repo

git clone https://github.com/yourusername/highschool-teacher-scraper.git
cd highschool-teacher-scraper
