Skip to content

A high-performance web scraper for obtaining the syllabus of Waseda University.

License

Notifications You must be signed in to change notification settings

wasedatime/syllabus-scraper

Repository files navigation

syllabus-scraper

A new web scraper for scraping Waseda University syllabus.

Usage

my_crawler = SyllabusCrawler(configs...)
results = my_crawler.execute()
print(list(results))

Configuration

  • dept

    The name of the school you want to scrape

  • task

    To be defined...

  • engine

    The syllabus-scraper engine you want to use:

    thread-only default engine, use traditional worker threads to scrape each course

    hybrid use threads with coroutines, that is, the task of scraping courses in a single page is assigned to a thread , for each course in the page, a coroutine is created to scrape the course. Use with caution!

  • worker

    Number of worker threads, the default value is 8

Benchmarks

engine number of courses number of workers execution time (s)
thread-only 454 1 178
thread-only 454 4 60
thread-only 454 8 32
thread-only 454 32 14
thread-only 100 32 5
hybrid 100 1 4
hybrid 200 2 6
hybrid 454 5 ???(Connection refused)