- takes a .csv as input containing the following fields ( see the example after this list )
    - id
    - website name ( usually the company name that maps to this domain )
    - url ( e.g. http://www.website.com )
- crawls these sites
- extracts basic info about each site
    - emails
    - addresses
    - trigger terms
- puts extracted data into .csvs
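A hypothetical targets.csv might look like the following, assuming the columns appear in the order listed above ( id, website name, url ) and the file uses linux line endings. The rows are made up for illustration, and whether a header row is expected isn't stated here, so check config.php.

```csv
id,name,url
1,Acme Widgets,http://www.acmewidgets.com
2,Example Co,http://www.example.com
```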
- make sure your settings are correct ( config.php )
- put a list of website ids, names, and urls into the targets.csv file
    - make sure it has linux line endings ( \n )
        - or make sure you update the line endings setting in config.php
    - excel generally mangles csvs when creating them, so use Numbers or some other basic csv editor ( or just a text editor )
- execute scrape_targets.php
    - this will crawl all sites, extract data, and push the contents into results.csv ( a rough sketch of the extraction step follows this list )
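The following is a minimal sketch, in plain PHP, of what one crawl-and-extract pass could look like; it is not the actual scrape_targets.php code, and the trigger terms, output columns, and helper name ( scrape_site ) are illustrative assumptions.

```php
<?php

// hypothetical trigger terms to look for in page text
$triggerTerms = array( 'careers', 'contact', 'wholesale' );

function scrape_site( $id, $name, $url, $triggerTerms )
{
    // fetch the raw html ( the real tool may use curl, follow redirects, etc. )
    $html = @file_get_contents( $url );
    if ( $html === false )
        return array( $id, $name, $url, '', '' );

    // pull anything that looks like an email address
    preg_match_all( '/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/', $html, $matches );
    $emails = implode( ';', array_unique( $matches[0] ) );

    // record which trigger terms appear anywhere in the page
    $found = array();
    foreach ( $triggerTerms as $term )
        if ( stripos( $html, $term ) !== false )
            $found[] = $term;

    return array( $id, $name, $url, $emails, implode( ';', $found ) );
}

// append one row per site to results.csv
$out = fopen( 'results.csv', 'a' );
fputcsv( $out, scrape_site( 1, 'Example Co', 'http://www.example.com', $triggerTerms ) );
fclose( $out );
```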
- this tool uses the public "Data Science Toolkit API" by default
    - if you run lots of addresses through it, it will block you ( the specific rate limits aren't documented )
    - see the submodule ( https://github.com/tbrianjones/data_science_toolkit_php_api_client ) to learn more about the Data Science Toolkit and about running your own server of it for mass geocoding ( a rough example of calling it directly follows )
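As a rough sketch of geocoding a single address, the snippet below calls the DSTK street2coordinates endpoint over plain HTTP rather than going through the submodule's client class ( whose exact API isn't shown here ); the endpoint path and response shape are assumptions based on the public DSTK documentation, so verify them against the server you actually run.

```php
<?php

// hostname of the DSTK server; point this at your own instance for mass geocoding
$dstkHost = 'http://www.datasciencetoolkit.org';
$address  = '2543 Graystone Place, Simi Valley, CA 93065';

// street2coordinates path and JSON shape are assumptions from the public DSTK docs
$response = @file_get_contents( $dstkHost . '/street2coordinates/' . urlencode( $address ) );
if ( $response === false )
    die( "request failed\n" );

// the response is assumed to be a JSON object keyed by the submitted address
$data   = json_decode( $response, true );
$result = isset( $data[ $address ] ) ? $data[ $address ] : null;

if ( $result === null )
    echo "address could not be geocoded\n";
else
    echo $result['latitude'] . ', ' . $result['longitude'] . "\n";
```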
- Improve HTML Scraper to allow crawling of sites with JS redirects and Frames ( see the sketch after this list )
    - recycle cortex crawler html_file_processor code for the items below
    - add frame scraper for links
        - store links as regular links
    - add javascript redirects scraping
        - store as regular links
    - combine all link scraping into one link scrape method
        - regular links, redirects, and frames
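Below is a hedged sketch of what a combined link-scrape method could look like, covering regular links, frame/iframe sources, and simple javascript redirects in one pass. It is a starting point only, does not reuse the cortex crawler html_file_processor code mentioned above, and the function name scrape_all_links is made up.

```php
<?php

function scrape_all_links( $html )
{
    $links = array();

    $dom = new DOMDocument();
    @$dom->loadHTML( $html ); // suppress warnings from malformed markup

    // regular <a href> links
    foreach ( $dom->getElementsByTagName( 'a' ) as $node )
        if ( $node->hasAttribute( 'href' ) )
            $links[] = $node->getAttribute( 'href' );

    // frame and iframe sources, stored as regular links
    foreach ( array( 'frame', 'iframe' ) as $tag )
        foreach ( $dom->getElementsByTagName( $tag ) as $node )
            if ( $node->hasAttribute( 'src' ) )
                $links[] = $node->getAttribute( 'src' );

    // simple javascript redirects, e.g. window.location = "..." or location.href = '...'
    preg_match_all( '/(?:window\.)?location(?:\.href)?\s*=\s*[\'"]([^\'"]+)[\'"]/i', $html, $matches );
    $links = array_merge( $links, $matches[1] );

    return array_unique( $links );
}

// example usage
print_r( scrape_all_links( '<html><frameset><frame src="main.html"></frameset></html>' ) );
```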