- A single Python script (site_searcher.py) that:
  - fetches a page (requests) or, optionally, loads JS-rendered pages with Selenium,
  - extracts elements by CSS selector or XPath,
  - returns element text or attributes (href/src/etc.) and writes results to CSV or JSON.
- A requirements.txt and the PyInstaller command to create the .exe.
- Usage examples, packaging tips, and legal/ethical notes.
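
The extract-and-save steps in the list above can be sketched as follows. This is a minimal, standard-library-only illustration, not the code in site_searcher.py: `SimpleExtractor`, `extract`, and `save_results` are hypothetical names, and the tiny matcher only understands `tag`, `.class`, and `tag.class`, as a stand-in for a real CSS selector or XPath engine. Picking the output format from the file extension is also an assumption.

```python
import csv
import json
from html.parser import HTMLParser
from pathlib import Path


class SimpleExtractor(HTMLParser):
    """Collect text or one attribute from elements matching 'tag.class'.

    Hypothetical stand-in for a real CSS/XPath engine: it supports only
    'tag', '.class', and 'tag.class' selectors.
    """

    def __init__(self, selector, attr=None):
        super().__init__()
        tag, _, cls = selector.partition(".")
        self.tag = tag or None   # None means "any tag"
        self.cls = cls or None   # None means "any class"
        self.attr = attr         # None means "collect text"
        self.results = []
        self._open_tag = None    # tag name of the matched element being read
        self._depth = 0          # nested elements with the same tag name
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if self._open_tag is not None:
            # Already inside a matched element: only track nesting.
            if tag == self._open_tag:
                self._depth += 1
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if self.tag and tag != self.tag:
            return
        if self.cls and self.cls not in classes:
            return
        if self.attr is not None:
            if attrs.get(self.attr) is not None:
                self.results.append(attrs[self.attr])
        else:
            self._open_tag = tag
            self._buffer = []

    def handle_data(self, data):
        if self._open_tag is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._open_tag:
            return
        if self._depth:
            self._depth -= 1
            return
        self.results.append("".join(self._buffer).strip())
        self._open_tag = None


def extract(html, selector, attr=None):
    """Return the text (default) or one attribute of each matching element."""
    parser = SimpleExtractor(selector, attr)
    parser.feed(html)
    parser.close()
    return parser.results


def save_results(values, output):
    """Write values as JSON if the path ends in .json, otherwise as CSV."""
    path = Path(output)
    if path.suffix.lower() == ".json":
        path.write_text(json.dumps(values, indent=2), encoding="utf-8")
    else:
        with path.open("w", newline="", encoding="utf-8") as fh:
            writer = csv.writer(fh)
            writer.writerow(["value"])  # assumed single-column header
            writer.writerows([value] for value in values)
```

For example, `save_results(extract(html, "a.storylink", attr="href"), "hn_links.csv")` would write one link per row, given `html` already fetched as a string.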
#!/usr/bin/env python3
...
# check ../webScraperExe/site_searcher.py
# check ../webScraperExe/requirements.txt
# check ../webScraperExe/virtualEnv.bash
# check ../webScraperExe/pyinstaller.bash
- Extract links:
site_searcher.exe --url "https://news.ycombinator.com" --selector "a.storylink" --attr "href" --output hn_links.csv
- Save titles as JSON:
site_searcher.exe --url "https://example.com/blog" --selector ".post-title" --text --output titles.json
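
The flags used in the two commands above could be parsed roughly as follows. This is a sketch with argparse: the flag names come from the examples, but `build_parser`, the help strings, which flags are required, and making `--attr` and `--text` mutually exclusive are all assumptions, not confirmed behavior of site_searcher.py.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags shown in the usage examples."""
    parser = argparse.ArgumentParser(
        prog="site_searcher",
        description="Extract element text or attributes from a web page.",
    )
    parser.add_argument("--url", required=True, help="page to fetch")
    parser.add_argument("--selector", required=True, help="elements to match")
    # Assumption: a run returns either one attribute or the element text.
    what = parser.add_mutually_exclusive_group()
    what.add_argument("--attr", help="attribute to return, e.g. href or src")
    what.add_argument("--text", action="store_true", help="return element text")
    parser.add_argument("--output", required=True, help="CSV or JSON file")
    return parser


def parse_cli(argv=None):
    """Parse argv (defaults to sys.argv[1:]) into a namespace."""
    return build_parser().parse_args(argv)
```

Passing a list to `parse_cli` makes the parser easy to exercise without a shell, for example `parse_cli(["--url", "https://example.com/blog", "--selector", ".post-title", "--text", "--output", "titles.json"])`.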
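- Use --onefile for a single .exe; it unpacks itself to a temporary folder on every run, so startup is slower.
- Use --onedir for a folder build that starts faster; the .exe itself is smaller, but it must ship together with the rest of the folder.
- The Selenium + webdriver-manager path requires Chrome to be installed on the machine that runs the .exe.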
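
To make the choice between the two build modes explicit, the PyInstaller invocation can be assembled from Python. This is only a sketch: the contents of pyinstaller.bash are not shown here, so `pyinstaller_command`, `build`, and the `--name site_searcher` default are assumptions. `--onefile`, `--onedir`, and `--name` are standard PyInstaller options, and `python -m PyInstaller` is a supported way to run it.

```python
import subprocess
import sys


def pyinstaller_command(script, onefile=True, name="site_searcher"):
    """Return the argument list for a PyInstaller build of `script`."""
    mode = "--onefile" if onefile else "--onedir"
    # Run PyInstaller as a module of the current interpreter so the build
    # uses the same virtual environment as the script's dependencies.
    return [sys.executable, "-m", "PyInstaller", mode, "--name", name, script]


def build(script, onefile=True):
    """Run the build; requires PyInstaller to be installed."""
    subprocess.run(pyinstaller_command(script, onefile=onefile), check=True)
```

Calling `build("site_searcher.py")` would produce a single-file executable, and `build("site_searcher.py", onefile=False)` the folder variant.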
- Respect robots.txt and TOS.
- Do not overload servers.
- Prefer official APIs for production use.
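
The first two notes above can be partly automated. The sketch below uses the standard library's `urllib.robotparser` to check URLs against robots.txt rules and pauses between requests. The helper names, the `site_searcher` user agent string, and the one-second default delay are assumptions for illustration; terms of service still need to be read by a person.

```python
import time
from urllib.robotparser import RobotFileParser


def make_robots_checker(robots_lines, user_agent="site_searcher"):
    """Return a function url -> bool built from the lines of a robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return lambda url: parser.can_fetch(user_agent, url)


def polite_fetch(urls, fetch, allowed, delay_seconds=1.0):
    """Fetch only allowed URLs, sleeping between consecutive requests.

    `fetch` is any callable that takes a URL and returns the page content.
    """
    pages = {}
    for url in urls:
        if not allowed(url):
            continue  # skip anything robots.txt disallows
        if pages:
            time.sleep(delay_seconds)  # pause before every request after the first
        pages[url] = fetch(url)
    return pages
```

Against a live site, the rules would normally be loaded with `RobotFileParser.set_url(...)` followed by `read()` instead of passing lines in by hand.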