This is a web scraping project that extracts match information from the Cricinfo website. It carries out the following three activities:
- Print the last ball commentary.
- Print the name of the winning team and the bowler from that team who took the most wickets (with the bowler's name and number of wickets).
- Print the birthday of every batsman who played.
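For the second activity, once the bowling figures of the winning team have been extracted, choosing the bowler comes down to a maximum over wicket counts. The snippet below is a minimal sketch of that step only. The function name `topWicketTaker`, the row shape `{ name, wickets }`, and the sample data are assumptions made for illustration and are not taken from this repository.

```javascript
// Hypothetical row shape: { name: string, wickets: number }, one entry per
// bowler of the winning team, as it might look after extraction.
function topWicketTaker(bowlers) {
  if (bowlers.length === 0) return null;
  // Strict ">" keeps the first bowler seen when two bowlers are tied.
  return bowlers.reduce((best, b) => (b.wickets > best.wickets ? b : best));
}

// Illustrative data only, not taken from a real match.
const sample = [
  { name: 'Bowler A', wickets: 1 },
  { name: 'Bowler B', wickets: 3 },
  { name: 'Bowler C', wickets: 3 },
];

const top = topWicketTaker(sample);
console.log(`${top.name} - ${top.wickets}`); // prints "Bowler B - 3"
```

Scraped cell text arrives as a string, so it would need converting with `Number(...)` before being compared this way.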
- Clone this repository in your local environment.
- Run `npm install` to install all the required packages.
- Run each file in the activities directory one by one to get the desired output.
- Each activity is implemented in its own file.
- The cheerio module is used for web scraping.
- Limitation of cheerio: it parses only the initially loaded HTML and does not run the page's scripts, so content that the page loads later, such as the first-ball commentary, cannot be extracted with it.
- HTML segregation is done using a separate file (table.html) to make information extraction easier.
- Printing the birthday of every batsman requires scraping multiple pages.
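The multi-page scraping mentioned in the last note can be pictured as a two-level flow: load one page that lists the batsmen, then visit each player's page and read the birthday from it. The sketch below shows only that control flow. `collectBirthdays`, `fetchHtml`, `extractPlayerLinks`, and `extractBirthday` are hypothetical names, and the page contents are made up so the example runs offline. In this project the two extraction callbacks would presumably wrap cheerio selectors, but the real selectors are not shown here.

```javascript
// Two-level scrape: listing page -> player pages -> birthday.
// All three callbacks are hypothetical and supplied by the caller; none of
// them is part of cheerio or of this repository.
async function collectBirthdays(listUrl, { fetchHtml, extractPlayerLinks, extractBirthday }) {
  const listHtml = await fetchHtml(listUrl);
  const links = extractPlayerLinks(listHtml); // expected shape: [{ name, url }]
  const results = [];
  // Visit player pages one at a time so the site is not hit with parallel requests.
  for (const { name, url } of links) {
    const playerHtml = await fetchHtml(url);
    results.push({ name, birthday: extractBirthday(playerHtml) });
  }
  return results;
}

// Stubbed usage with made-up pages (plain text standing in for HTML).
const pages = {
  '/scorecard': 'Player One|/p1\nPlayer Two|/p2',
  '/p1': 'Born: 1 January 1990',
  '/p2': 'Born: 2 February 1992',
};

const stubs = {
  fetchHtml: async (url) => pages[url],
  extractPlayerLinks: (text) =>
    text.split('\n').map((line) => {
      const [name, url] = line.split('|');
      return { name, url };
    }),
  extractBirthday: (text) => text.replace('Born: ', ''),
};

const birthdaysPromise = collectBirthdays('/scorecard', stubs);
birthdaysPromise.then((rows) => {
  for (const { name, birthday } of rows) console.log(`${name}: ${birthday}`);
});
```

Passing the fetch and extraction steps in as parameters keeps the flow independent of the site's markup, which can change without notice.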