iis-MarkKuang/douban_crawler
The original idea is to consider only books with more than 1000 comments and to keep track of the lowest rating among the books collected so far. Whenever I hit a new book record, I compare its rating to that lowest rating. If the new rating is higher, I delete the lowest-rated book from the list and add the new one; otherwise, I continue crawling. At the end of the crawl, I sort the list of books and write the data to an Excel file.

I used two external libraries: jsoup for parsing the HTML of each page, and jxl for writing the data to the Excel file.

Total run time with 5 threads: 113 seconds.
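The selection step described above can be sketched as a bounded collection backed by a min-heap, so the lowest-rated book is always at the head and is cheap to compare and evict. This is only an illustration under assumptions, not the repository's actual code: the class names `Book` and `TopRatedBooks`, the field names, and the `capacity` parameter are hypothetical, since the README does not state how many books are kept or how they are stored. The methods are `synchronized` on the assumption that the 5 crawler threads share one list, and the final sort is assumed to be descending by rating.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical book record; the field names are illustrative. */
final class Book {
    final String title;
    final double rating;
    final int commentCount;

    Book(String title, double rating, int commentCount) {
        this.title = title;
        this.rating = rating;
        this.commentCount = commentCount;
    }
}

/** Keeps at most {@code capacity} of the highest-rated books seen so far. */
final class TopRatedBooks {
    // Threshold taken from the README: only books with more than 1000 comments count.
    private static final int MIN_COMMENTS = 1000;

    private final int capacity; // assumed list size; not given in the README
    // Min-heap ordered by rating, so peek() returns the lowest-rated kept book.
    private final PriorityQueue<Book> heap =
            new PriorityQueue<>(Comparator.comparingDouble((Book b) -> b.rating));

    TopRatedBooks(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Offers a crawled book; synchronized so several crawler threads can share one instance. */
    synchronized void offer(Book book) {
        if (book.commentCount <= MIN_COMMENTS) {
            return; // not enough comments, skip
        }
        if (heap.size() < capacity) {
            heap.add(book); // list not full yet, always keep
            return;
        }
        if (book.rating > heap.peek().rating) {
            heap.poll();    // drop the current lowest-rated book
            heap.add(book); // and keep the new one instead
        }
    }

    /** Returns the kept books sorted from highest to lowest rating. */
    synchronized List<Book> sortedDescending() {
        List<Book> result = new ArrayList<>(heap);
        result.sort(Comparator.comparingDouble((Book b) -> b.rating).reversed());
        return result;
    }
}
```

Each crawler thread would call `offer` for every book it parses, and `sortedDescending` would be called once after all threads finish, just before the rows are written to the spreadsheet.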