A framework to crawl and scrape all of the PaytmMall products and their associated information (categories, coupons, etc) and save them in a relational database (MySQL) with an extended well-defined API to control the workflow of crawling and scraping.
-
Paytm Mall doesn't provides you with a feature of searching/querying products on the basis of some coupon-code (obviously).
-
So I took it upon myself to crawl and scrape all of their products and interlink each of their offers with their given products. Result of doing so?
- I found some of their hidden steal deals giving me discount of as much as 100% in certain cases which I ended up giving many of the steal products for free to the people in need that I bought for effectively free.
- Crawl and scrape all of the products from paytm-mall app.
- Data stored in DBMS for effective retrieval by writing powerful SQL queries.
- Batch retrieval and Batch Multi-Insert.
- Can be ported to async requests by using a library like asyncio with ease (but need to use proxies to avoid getting rate-limited)
- Find steal-deals by searching products on the basis of coupon-codes (Search all products with coupon-code, FMCGSAVE45 for example)
- Python3
- pip
- virtualenv
- Clone the repo:
git clone https://github.com/slapbot/paytm-scraping-offers
- Cd into the directory:
cd paytm-scraping-offers
- Create a virtualenv for python:
virtualenv paytm-scraping-offers-env
- Activate the virtualenv:
- Linux:
source paytm-scraping-offers-env/bin/activate
- Windows:
sources paytm-scraping-offers-env/Scripts/activate
- Linux:
- Upgrade your pip to latest version: pip install --upgrade pip
- Install the application dependencies:
pip install -r requirements.txt
- Install MySQL dependencies depending on your OS: https://dev.mysql.com/downloads/installer/
- Create a database
- Plug in the configuration of MySQL you've just setup at:
config.ini
- Run
python migrate.py
to migrate the database with correct tables.
from core.db_manager import DB_Manager
from core.scrape import Scrape
from time import sleep
s = Scrape()
categories = s.get_all_categories()
dbm = DB_Manager()
# print(categories) -> Will show you all the available categories in paytm mall.
# crawl and scrape category: Speakers -> existing at 3rd index position of categores list.
for i in range(0, 30):
print("Crawling and Scraping for Category: %s" % categories[3]['name'])
data = s.scrape(categories[3]['url'], s.get_query_string(page_number=i + 1, sort_type="popular"))
dbm.insert_data(data)
sleep(5)
- Run
python examples/crawl_and_scrape_category.py
to scrape away all of the products with their steal deals fromspeakers
category - Run
python examples/crawl_and_scrape_everything.py
to scrape away all of the products listed in Paytm Mall. - Run
python examples/crawl_and_scrape_predefined_urls.py
to scrape away all of the products of a given offer going (Super Sunday) etc. - Run
python examples/crawl_and_scrape_template.py
to get an understanding of how to create your own scrapers.
from core.db_manager import DB_Manager
from core.query_manager import QueryManager
import pprint
dbm = DB_Manager()
qm = QueryManager(dbm)
pp = pprint.PrettyPrinter(indent=4)
print("FOUND COUPON CODES: ")
coupon_codes = qm.get_distinct_coupon_codes()
pp.pprint(coupon_codes)
coupon_code = coupon_codes[0]['code']
print("FOUND PRODUCTS FOR COUPON CODE {0}: ".format(coupon_code))
pp.pprint(qm.get_products_by_coupon_code(coupon_code))
- Run
python query.py
to query results from what you've scraped.
The API is super straightforward and intuitive to understand and consume,
taking a look at the examples
should simply tell you all about the basics of the workflow of the framework.
A classic scraper might look like this:
from core.db_manager import DB_Manager
from core.scrape import Scrape
from time import sleep
s = Scrape()
categories = s.get_all_categories()
dbm = DB_Manager()
page_start = 0
page_end = 30
# Loop to fetch pages from start to end.
for i in range(page_start, page_end):
# Write your own logic to meet the needs of your scraping requirements.
print("Crawling and Scraping for Category: %s" % categories[0]['name'])
data = s.scrape(categories[0]['url'], s.get_query_string(page_number=i + 1, sort_type="popular"))
# insert the scraped data to a database.
dbm.insert_data(data)
# sleep to avoid rate limiting.
sleep(5)
core/configurer.py
: Managing configuration of the application by retrieving information fromconfig.ini
file.core/db_manager.py
: Abstraction over database query calls spread across the application.core/migrator.py
: Usingdb_manager.py
to create migrations that you do while installing the project.core/query_manager.py
: Another consumer ofdb_manager.py
to showcase SQL queries of various kinds.core/scrape.py
: Responsible to scrape the paytm products.sql_queries
: Some SQL queries that might come super handy to use and get an overview of how to write your own.
- Scraping offers coupons and getting some of the best steal-deals
- Creating a database of products, categories, prices, discounts, reviews to create ML models.
- Analysis of products listed by paytm-mall compared to amazon, flipkart and e-commerce alternatives.
- And your imagination to consume data for any kind of analytics or visualisation.
Its one of the projects that I wrote quite back when Paytm-Mall had just started and I've already bought enough steal-deals for myself ranging me discounts from (60-100% yes 100%). But I've recently embraced a minimalistic lifestyle so the project is no longer needed for my interests.
So I've figured to make the code open-source for anyone to create a database, find steal-deals, create comparison aggregator of various e-commerce websites, etc.