BeautifulSoup, requests, wordpress-xmlrpc 2.3
pip install beautifulsoup4
pip install requests
pip install python-wordpress-xmlrpc 2.3
- It is a demonstration of using modules like BeautifulSoup and Requests which helps in Web Scraping in Python.
- You can just scrape the content of any desired page into a
.txt
or.cvs
file on your system. - It has Real Time monitoring that means it will keep checking for any new content that needs to scraped and posted on the website.
- The Project also uses wordpress-xmlrpc 2.3. It is a Python library to interface with a WordPress blog's XML-RPC API.
- The two scripts
WebScraperMonitor.py
(Real Time Monitoring) andWebScraper_NoMonitor.py
(No Real Time Monitoring) scrapes the data from government sets and save it on your system in the form of a.txt
file. - The script
ImportingToWordpress.py
iterate through the scraped text files on your system and post a New Post for every file on the website. - The script
Web-Scraper.py
scrapes the content of the new link available(on the government owned website) directly into a New Post on the website. It reads the suffix of the link(from which data needs to be scraped) and suffix of the heading for every post for the website from two text file.
To Automate the Web-Scraper I have made a batch
file which runs Web-Scraper.py
script or Web-Scraper.exe
(can be made by using Pyinstaller).
Open command line and type:
>pip install pyinstaller
>pyinstaller Web-Scraper.py
Then set task for the created batch
file using Task Scheduler (for Windows) or Cron Job (for Linux).
Web-Scraper will run on the desired time and day and will scrape the new data on the website.
This Project was part of my Internship. You can see the Scraped Data from Government owned website judis.nic.in on the website of the Employer LegalWiki.in.
The source from where the data is scraped is unavailable right now maybe because it has been shifted to a new address. This is one of the Links.