You have an HTML document that you want to extract data from, and you know the general structure of that document.
jsoup is a Java library for working with real-world HTML. It provides a convenient API for fetching URLs and for extracting and manipulating data, using HTML5 DOM methods and CSS selectors. jsoup can parse HTML from files, input streams, URLs, or strings. It eases data extraction by offering Document Object Model (DOM) traversal methods along with CSS and jQuery-like selectors. jsoup can also manipulate the content: the HTML elements themselves, their attributes, or their text.
Visit https://jsoup.org/ for more details.
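As a rough illustration of the parsing, selection, and manipulation described above, here is a minimal sketch. It assumes jsoup is on the classpath (for example through the `org.jsoup:jsoup` Maven dependency). The class name `JsoupSketch`, the helper `extractLinks`, and the sample HTML are hypothetical and are not taken from this repository.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class JsoupSketch {

    // Hypothetical helper: collects "text -> href" for every link in the given HTML.
    static List<String> extractLinks(String html) {
        // Parse an HTML string into a DOM tree.
        Document doc = Jsoup.parse(html);
        List<String> links = new ArrayList<>();
        // Select all <a> elements that carry an href attribute (CSS selector).
        for (Element a : doc.select("a[href]")) {
            links.add(a.text() + " -> " + a.attr("href"));
        }
        return links;
    }

    public static void main(String[] args) {
        // Sample HTML, invented for this sketch.
        String html = "<html><head><title>Demo</title></head><body>"
                + "<p class=\"intro\">Hello</p>"
                + "<a href=\"https://jsoup.org/\">jsoup</a>"
                + "</body></html>";

        // Extraction: print each link found in the document.
        for (String link : extractLinks(html)) {
            System.out.println(link);
        }

        // Manipulation: change the text and an attribute of the first matching element.
        Document doc = Jsoup.parse(html);
        Element intro = doc.selectFirst("p.intro");
        if (intro != null) {
            intro.text("Hello, jsoup");
            intro.attr("data-edited", "true");
            System.out.println(intro.outerHtml());
        }
    }
}
```

To scrape a live page instead of a string, `Jsoup.connect(url).get()` returns the same kind of `Document`, so the selection code would stay unchanged.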
- Clone the repository
$ git clone https://github.com/Arham-12336/Web-Scrapping-Java-.git
- Change into the cloned repository
$ cd Web-Scrapping-Java-
- Install the dependencies and package the application
$ mvn package
- Run the web scraper
Run the xml file in your IDE
Please feel free to raise issues on the repository and I'll get back to you.
You can also fork the repository, make changes and submit a Pull Request.