
Web scraping to identify and download the latest PDF documents, and classify them into pre-defined categories.

  • This repository helps you scrape data from multiple websites. It downloads the latest PDF files published on a website into a folder of your choice, which can be used to automate various operations involved in market research (see the sketch after this list).
  • Once the PDFs are downloaded, they are classified into oil/no_oil/foreign_language categories based on a string-based rule (see the sketch under Reference).
  • You can customize these classification rules as needed.
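A minimal sketch of the scraping and download step, assuming a hypothetical source page URL and output folder; the actual logic lives in radar_automation.py and may differ in detail. It uses urllib and beautifulsoup4, as listed under Reference.

```python
# Sketch only: TARGET_URL and DOWNLOAD_DIR are hypothetical placeholders.
import os
from urllib.parse import urljoin
from urllib.request import urlopen, urlretrieve

from bs4 import BeautifulSoup

TARGET_URL = "https://example.com/reports"   # hypothetical source page
DOWNLOAD_DIR = "downloads"                   # hypothetical output folder

os.makedirs(DOWNLOAD_DIR, exist_ok=True)

# Parse the page and collect every link that points to a PDF file
html = urlopen(TARGET_URL).read()
soup = BeautifulSoup(html, "html.parser")
pdf_links = [
    urljoin(TARGET_URL, a["href"])
    for a in soup.find_all("a", href=True)
    if a["href"].lower().endswith(".pdf")
]

# Download each PDF into the chosen folder
for url in pdf_links:
    filename = os.path.join(DOWNLOAD_DIR, url.rsplit("/", 1)[-1])
    urlretrieve(url, filename)
    print(f"Downloaded {filename}")
```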

Instructions

  • pip install -r requirements
  • Run radar_automation.py

Reference

I devised the solution from the following documentation:

  • [urllib] - a package that collects several modules for working with URLs
  • [beautifulsoup4] - to scrape information from web pages
  • [PDFMiner] - a text extraction tool for PDF documents
  • [NLTK] - for natural language processing
  • Keyword-based search in the extracted text for rule-based classification (see the sketch below)
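A minimal sketch of the classification step, assuming pdfminer.six for text extraction; the keyword lists and the language check shown here are hypothetical stand-ins for the rules shipped in the repository.

```python
# Sketch only: keyword sets are hypothetical, not the project's actual rules.
from pdfminer.high_level import extract_text

OIL_KEYWORDS = {"oil", "petroleum", "crude"}     # hypothetical keyword rule
ENGLISH_MARKERS = {"the", "and", "of", "to"}     # crude language heuristic


def classify_pdf(path: str) -> str:
    """Return 'oil', 'no_oil', or 'foreign_language' for one PDF file."""
    text = extract_text(path).lower()
    tokens = set(text.split())   # NLTK tokenization could be used here instead

    # Documents with almost no common English words are flagged as foreign
    if not tokens & ENGLISH_MARKERS:
        return "foreign_language"
    # Any keyword hit puts the document in the oil category
    return "oil" if tokens & OIL_KEYWORDS else "no_oil"


print(classify_pdf("downloads/report.pdf"))
```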