Using JavaScript, I have created a simple web crawler using scraping techniques to extract the first 30 entries from the website https://news.ycombinator.com/.
The requirements for this project are:
Important
The program should be able to extract the following information from the website:
- Title
- Points
- Author
- Number of comments
Note
For this project, I have to filter all entries with more than five words in the title ordered by the number of comments first.
Note
For this project, I have to filter all entries with less than or equal five words in the title ordered by the points first.
- Node.js v18.7.0 or later
- JavaScript ES6
To run this project, install it locally using npm:
$ cd ../web-crawler
$ npm install
$ npm start https://news.ycombinator.com/
If you want to create an Excel file with the output, you can use the following command:
$ npm start https://news.ycombinator.com fileName
This is an example of the output console of the program: This is an example of the output file of the program:
To run the tests, use the following command:
$ npm test
This project was created as part of a test of Software Development Intership program.