A new web scraper for scraping the Waseda University syllabus.

```python
my_crawler = SyllabusCrawler(configs...)
results = my_crawler.execute()
print(list(results))
```
- `dept`: The name of the school you want to scrape.
- `task`: To be defined...
- `engine`: The syllabus-scraper engine you want to use:
  - `thread-only`: the default engine; uses traditional worker threads to scrape each course.
  - `hybrid`: uses threads together with coroutines. The task of scraping the courses on a single page is assigned to a thread, and for each course on that page a coroutine is created to scrape it. Use with caution! (A sketch of this pattern follows the parameter list.)
- `worker`: Number of worker threads; the default value is 8.
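To make the hybrid engine's model concrete, below is a minimal, self-contained sketch of the pattern described above: one worker thread per page, one coroutine per course on that page. Everything in it (the use of aiohttp, the function names, the page/course URL structure) is an illustrative assumption, not the actual syllabus-scraper implementation.

```python
# Sketch of the "hybrid" pattern: a thread per page, a coroutine per course.
# aiohttp and all names here are assumptions for illustration only.
import asyncio
from concurrent.futures import ThreadPoolExecutor

import aiohttp


async def fetch_course(session: aiohttp.ClientSession, url: str) -> str:
    # One coroutine per course: download a single course page.
    async with session.get(url) as resp:
        return await resp.text()


async def scrape_page(course_urls: list[str]) -> list[str]:
    # All courses on one syllabus page are fetched concurrently via coroutines.
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(
            *(fetch_course(session, url) for url in course_urls)
        )


def scrape_page_in_thread(course_urls: list[str]) -> list[str]:
    # Each worker thread runs its own event loop for one page.
    return asyncio.run(scrape_page(course_urls))


def hybrid_scrape(pages: list[list[str]], workers: int = 8) -> list[list[str]]:
    # Pages are distributed over a pool of `workers` threads.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(scrape_page_in_thread, pages))
```

This is also why the engine warrants caution: with many pages and many courses per page, the number of simultaneous requests grows quickly, which is consistent with the connection-refused result in the benchmark table below.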
| engine | number of courses | number of workers | execution time (s) |
|---|---|---|---|
| thread-only | 454 | 1 | 178 |
| thread-only | 454 | 4 | 60 |
| thread-only | 454 | 8 | 32 |
| thread-only | 454 | 32 | 14 |
| thread-only | 100 | 32 | 5 |
| hybrid | 100 | 1 | 4 |
| hybrid | 200 | 2 | 6 |
| hybrid | 454 | 5 | ??? (Connection refused) |
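As a usage example tying the parameters above to the benchmark table, a call along the following lines should correspond to a thread-only run with 8 workers. This is a hedged sketch: the import path and the `dept` value are assumptions, and the constructor is assumed to accept the documented parameters as keyword arguments.

```python
# Hedged sketch: the import path and the dept value are assumptions,
# not taken from the project's actual code.
from syllabus_scraper import SyllabusCrawler  # hypothetical module path

crawler = SyllabusCrawler(
    dept="FSE",            # school to scrape (hypothetical identifier)
    task=None,             # "To be defined..." above
    engine="thread-only",  # default engine: worker threads only
    worker=8,              # number of worker threads (default is 8)
)
results = crawler.execute()  # returns an iterable of scraped courses
print(list(results))
```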