Naver and Daum news web crawler via Jsoup + Selenium.
It crawls all news from Naver and Daum, or, if you specify the categories you want, only those categories.
Unfortunately, Naver uses Ajax to refresh the page with updated news every few minutes. A page updated via Ajax is loaded dynamically, and Jsoup cannot read dynamically loaded content, so I had to use Selenium. For this reason, you have to install the Firefox browser and download its driver. The crawler opens a new browser instance and uses it for crawling.
- Jsoup : 1.11.3
- Selenium Standalone : 3.141.0
- You have to install the Firefox web browser.
- ...and the Firefox driver too, from here.
- Download the core libraries above from the referenced links, and download this repository.
- Move all jar files of the core libraries to the repository directory.
- Import the project into Eclipse Photon, or just use the NaverCrawler.java or DaumCrawler.java files.
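
Once the jars are in place, a quick check along the following lines can confirm that they are visible on the classpath before running a crawler. This is only a sketch: `DependencyCheck` and `isOnClasspath` are hypothetical names that are not part of this repository. The class names looked up are the public entry points of Jsoup and Selenium's Firefox driver, and `webdriver.gecko.driver` is the system property Selenium reads to locate the Firefox driver when it is not on the `PATH`.

```java
// Hypothetical helper, not part of this repository.
class DependencyCheck {

    // Returns true if the named class can be located on the classpath.
    // The class is looked up without being initialized.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, DependencyCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("Jsoup jar found:    "
                + isOnClasspath("org.jsoup.Jsoup"));
        System.out.println("Selenium jar found: "
                + isOnClasspath("org.openqa.selenium.firefox.FirefoxDriver"));
        // Prints null if the property is unset; in that case the driver
        // executable has to be reachable through the PATH instead.
        System.out.println("webdriver.gecko.driver = "
                + System.getProperty("webdriver.gecko.driver"));
    }
}
```

If either line prints `false`, the corresponding jar is probably missing from the repository directory or from the project's build path.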
- Naver : Breaking, Politics, Economic, Society, Culture, World, Science
- Daum : Politics, Economic, Society, Culture, Foreign(=World), Digital(=Science), Sports, Entertain
To crawl only selected categories, follow the code below.
public static void main(String args[]) {
    NaverCrawler ncrawler = new NaverCrawler();
    // Combine category flags with bitwise OR to select several categories at once.
    ncrawler.setCategory(NaverCrawler.CAT_CULTURE | NaverCrawler.CAT_SOCIETY | NaverCrawler.CAT_SCIENCE);
    ncrawler.run();
}
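
Because the categories are combined with `|`, they presumably behave as bit flags. The sketch below shows how such flags can be combined and tested. It is an illustrative stand-in, not the repository's `NaverCrawler`: the numeric values, the `isSelected` helper, and every constant name other than `CAT_CULTURE`, `CAT_SOCIETY` and `CAT_SCIENCE` are assumptions.

```java
// Illustrative stand-in, NOT the repository's NaverCrawler.
// Flag values and most constant names are assumed for the example.
class CategoryFlags {
    // Each category occupies its own bit, so any subset fits in one int.
    static final int CAT_BREAKING = 1;
    static final int CAT_POLITICS = 1 << 1;
    static final int CAT_ECONOMIC = 1 << 2;
    static final int CAT_SOCIETY  = 1 << 3;
    static final int CAT_CULTURE  = 1 << 4;
    static final int CAT_WORLD    = 1 << 5;
    static final int CAT_SCIENCE  = 1 << 6;

    // True if the given category bit is set in the mask.
    static boolean isSelected(int mask, int category) {
        return (mask & category) != 0;
    }

    public static void main(String[] args) {
        int mask = CAT_CULTURE | CAT_SOCIETY | CAT_SCIENCE;
        System.out.println("culture:  " + isSelected(mask, CAT_CULTURE));  // prints "culture:  true"
        System.out.println("politics: " + isSelected(mask, CAT_POLITICS)); // prints "politics: false"
    }
}
```

With this layout, a crawler can loop over the known categories and skip every one whose bit is not set in the mask passed to `setCategory`.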