- asynchronous programming
- stable; also useful for exception handling
- based on asyncio streams
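The asyncio streams API mentioned above can be sketched with a self-contained echo example; the local server, port choice, and message here are illustrative only, not part of this project:

```python
import asyncio

async def handle(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    # Read one line from the client and echo it back uppercased.
    line = await reader.readline()
    writer.write(line.upper())
    await writer.drain()
    writer.close()
    await writer.wait_closed()

async def main() -> bytes:
    # Start a throwaway local server on a random free port.
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    # Connect as a client using the same streams API the scraper builds on.
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(b"hello\n")
    await writer.drain()
    reply = await reader.readline()
    writer.close()
    await writer.wait_closed()
    server.close()
    await server.wait_closed()
    return reply

reply = asyncio.run(main())
print(reply)  # b'HELLO\n'
```

The same reader/writer pair pattern applies when opening connections to remote hosts during crawling.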
- coroutine web scraper:
    run_web_scrapper.py
    in test
- coroutine Selenium scraper:
    run_selenium.py
    in test
- db processor:
    DBConnector
    in db_connector
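The document does not show DBConnector's actual interface, so the following is only a hypothetical sketch of what such a wrapper might look like, using sqlite3 as the backend; the class name matches the note above, but every method name and the choice of sqlite3 are assumptions:

```python
import sqlite3

class DBConnector:
    """Hypothetical sketch: a thin wrapper over sqlite3.
    The real DBConnector's API and backend are not documented here."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)

    def execute(self, sql: str, params: tuple = ()) -> list:
        # Run a statement, commit, and return any fetched rows.
        cur = self._conn.execute(sql, params)
        self._conn.commit()
        return cur.fetchall()

    def close(self) -> None:
        self._conn.close()

db = DBConnector()
db.execute("CREATE TABLE pages (url TEXT, status INTEGER)")
db.execute("INSERT INTO pages VALUES (?, ?)", ("https://example.com", 200))
rows = db.execute("SELECT url, status FROM pages")
print(rows)  # [('https://example.com', 200)]
db.close()
```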
- query builder:
    under development: SQL query builder & HTTP query builder
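Since both builders are still in development, this is only a hypothetical sketch of what a chainable SQL builder and an HTTP query builder could look like; all class, method, and function names below are assumptions, not the project's API:

```python
from urllib.parse import urlencode

class SQLQueryBuilder:
    """Hypothetical chainable SELECT builder (illustrative only)."""

    def __init__(self, table: str) -> None:
        self._table = table
        self._columns = ["*"]
        self._wheres: list = []

    def select(self, *columns: str) -> "SQLQueryBuilder":
        self._columns = list(columns)
        return self

    def where(self, condition: str) -> "SQLQueryBuilder":
        self._wheres.append(condition)
        return self

    def build(self) -> str:
        # Assemble the final statement from the accumulated parts.
        sql = f"SELECT {', '.join(self._columns)} FROM {self._table}"
        if self._wheres:
            sql += " WHERE " + " AND ".join(self._wheres)
        return sql

def build_http_query(base_url: str, params: dict) -> str:
    """Hypothetical HTTP query builder: URL-encode params onto a base URL."""
    return f"{base_url}?{urlencode(params)}"

sql = SQLQueryBuilder("pages").select("url", "status").where("status = 200").build()
print(sql)  # SELECT url, status FROM pages WHERE status = 200
url = build_http_query("https://example.com/search", {"q": "asyncio"})
print(url)  # https://example.com/search?q=asyncio
```

A chainable (fluent) interface keeps query construction readable when conditions are added incrementally, which is a common design for both SQL and HTTP query builders.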
- run the code inside the test dir
- when using tasks.csv:
python run_web_scrapper.py --tasks path/to/tasks.csv --save_file crawling \
--result_path result --result_type text
- when editing the URL list inside the code, omit the --tasks option:
python run_web_scrapper.py --save_file crawling \
--result_path result --result_type text
- sample run shell script:
python run_web_scrapper.py --tasks ../tasks.csv --save_file crawling \
--result_path result --result_type text
python run_selenium.py --save_file selenium \
--result_path result --result_type text
- shell (in development):
python run.py
- stream with the request module:
    Reader, Writer, Stream, Session
    in stream.map
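The Reader, Writer, Stream, and Session interfaces are not documented here, so the following is only a hypothetical sketch of a map-style async Stream such as stream.map might contain; the class shape and method names are assumptions:

```python
import asyncio
from typing import AsyncIterator, Callable

class Stream:
    """Hypothetical async stream: wraps an async iterator, supports map()."""

    def __init__(self, source: AsyncIterator) -> None:
        self._source = source

    def map(self, fn: Callable) -> "Stream":
        # Lazily apply fn to each item as it flows through.
        async def mapped():
            async for item in self._source:
                yield fn(item)
        return Stream(mapped())

    async def collect(self) -> list:
        # Drain the stream into a list.
        return [item async for item in self._source]

async def main() -> list:
    async def numbers():
        for n in range(3):
            yield n
    return await Stream(numbers()).map(lambda n: n * n).collect()

result = asyncio.run(main())
print(result)  # [0, 1, 4]
```

In a scraper, the same shape lets a Session yield responses that are transformed stage by stage without buffering the whole crawl in memory.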
- Selenium scraper
- db processor code
- web scraper code
- Dev crawler using API