iis-MarkKuang/douban_crawler


The original idea is to keep only books with more than 1000 comments and to track the lowest 
rating among all the books I have crawled so far. Whenever I hit a new book record, I compare its 
rating to that lowest rating. If the new rating is greater, I delete the book with the lowest rating 
and add the new book; otherwise, I continue browsing. At the end of the crawl, I sort the list of 
books and write the data to an Excel file.
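
The selection logic described above can be sketched as a bounded min-heap keyed on rating. This is 
only an illustration, not the code in this repository: the class name, the field names, and the list 
capacity are all assumptions (the text does not say how many books are kept), and the methods are 
synchronized only because the crawl is reported to run on several threads.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: keeps the highest-rated books seen so far.
class TopRatedBooks {
    // Threshold taken from the description: more than 1000 comments.
    static final int MIN_COMMENTS = 1000;

    // Assumed minimal record of a crawled book.
    static final class Book {
        final String title;
        final double rating;
        final int comments;

        Book(String title, double rating, int comments) {
            this.title = title;
            this.rating = rating;
            this.comments = comments;
        }
    }

    private final int capacity; // assumed: maximum number of books to keep
    // Min-heap on rating, so the lowest-rated kept book is always at the head.
    private final PriorityQueue<Book> heap =
            new PriorityQueue<>(Comparator.comparingDouble((Book b) -> b.rating));

    TopRatedBooks(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be at least 1");
        }
        this.capacity = capacity;
    }

    // Returns true if the book was added to the list.
    synchronized boolean offer(Book book) {
        if (book.comments <= MIN_COMMENTS) {
            return false; // filtered out by comment count
        }
        if (heap.size() < capacity) {
            heap.add(book); // list not full yet, keep everything
            return true;
        }
        if (book.rating > heap.peek().rating) {
            heap.poll();    // delete the book with the lowest rating
            heap.add(book); // add the new book
            return true;
        }
        return false;       // not better than the current lowest, continue
    }

    // Final step: copy the kept books and sort them, best rating first.
    synchronized List<Book> sortedByRatingDescending() {
        List<Book> result = new ArrayList<>(heap);
        result.sort(Comparator.comparingDouble((Book b) -> b.rating).reversed());
        return result;
    }
}
```

With a heap, finding and replacing the lowest-rated book costs O(log n) per new record rather than a 
scan of the whole list; a plain list with a tracked minimum would also work for small sizes.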
 
I used two external libraries: jsoup, for parsing the HTML of each page, and jxl, for writing 
the data to an Excel file.

Total run time with 5 threads: 113 seconds
