PyProduct
PyProduct is a Python + Selenium + BeautifulSoup based web catalog scraper, programmatic OMS, and all-around e-commerce interaction bot solution.

What is PyProduct

In short, PyProduct aims to be the one-stop shop for e-commerce interaction, from catalog scraping to programmatic OMS and inventory tracking. PyProduct is built with Python, Selenium, and BeautifulSoup. It allows for quick and efficient data capture from online e-commerce catalogs like Nike, Adidas, and others using headless browser bot interactions.

The goals of PyProduct are as follows:

  • Integration-free, clone-and-go setup
  • Efficient and reliable online e-commerce catalog data capture
  • Secure and fast OMS order placement
  • Reliable inventory and shipping option tracking

Getting Started

To get started with PyProduct, simply clone, install dependencies, get a driver, and go:

  1. Clone the repo and cd into the directory:

     git clone https://github.com/elginbeloy/PyProduct
     cd PyProduct

  2. Install requirements.txt into a virtual environment of your choice:

     virtualenv venv
     source ./venv/bin/activate
     pip install -r requirements.txt

  3. Download a ChromeDriver and place it in the PyProduct directory as chromedriver. NOTE: Ensure you download the correct driver for your version of Chrome.

  4. Run PyProduct with --help to see options:

     python pyproduct.py --help

NOTE: Programmatic OMS functionality will not work without a .env file in the project root that contains the necessary purchase environment variables.
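
As a rough illustration only (the variable names below are hypothetical; check the OMBot source for the exact keys PyProduct reads), a .env might look like:

```
# Hypothetical example .env; the actual variable names are defined by PyProduct's OMS code.
PURCHASE_EMAIL=you@example.com
PURCHASE_FIRST_NAME=Jane
PURCHASE_LAST_NAME=Doe
PURCHASE_ADDRESS="123 Example St, Springfield"
PURCHASE_CC_NUMBER=4111111111111111
PURCHASE_CC_EXP=01/30
PURCHASE_CC_CVV=123
```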

How PyProduct Works

PyProduct has three main components:

  1. Scraping products from e-commerce websites (using Spider)
  2. Tracking product availability (in progress)
  3. Purchasing products automatically (using OMBot)

The functionality of each of these parts of PyProduct (both developed and WIP) is outlined below:

NOTE: Some websites (namely Adidas) have a policy against bots and implement IP blocking, which limits the number of requests that can be made from a single IP before it is blacklisted for a period of time. This can be bypassed with rerouting of some sort, such as a VPN or onion routing; however, this slows down scraping significantly.
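
If you do reroute traffic, a minimal sketch of pointing Selenium's headless Chrome at a proxy looks like the following (this is not part of PyProduct, and the proxy address is a placeholder for your own VPN or Tor endpoint):

```python
from selenium import webdriver

# Route the headless Chrome instance through a proxy. The address below is a
# placeholder; substitute your own VPN/SOCKS endpoint (9050 is Tor's default).
options = webdriver.ChromeOptions()
options.add_argument("--headless")
options.add_argument("--proxy-server=socks5://127.0.0.1:9050")

driver = webdriver.Chrome(options=options)
driver.get("https://shop.nike.com")
print(driver.title)
driver.quit()
```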

Spider (Scraping)

The scraping part of PyProduct is done with a spider that crawls catalog URLs from a base domain URL (e.g. https://shop.nike.com). The spider avoids scraping specific product URLs (looking only for catalogs, not specific products) using a regex check of the URL. This saves the scraper the time cost of scraping single product pages, focusing only on catalog URLs.
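
As a hedged sketch of the idea (the pattern below is invented for illustration; it is not PyProduct's actual regex), the URL filter might look like:

```python
import re

# Illustrative only: product URLs on many shops contain segments like
# /t/ or /product/, while catalog listing URLs do not.
PRODUCT_URL_RE = re.compile(r"/(t|product|pd)/")

def is_catalog_url(url: str) -> bool:
    """Keep catalog-style URLs; skip individual product pages."""
    return PRODUCT_URL_RE.search(url) is None

urls = [
    "https://shop.nike.com/w/mens-shoes",       # catalog -> kept
    "https://shop.nike.com/t/air-max-90-shoe",  # product -> skipped
]
print([u for u in urls if is_catalog_url(u)])   # ['https://shop.nike.com/w/mens-shoes']
```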

These scraped URLs are then passed to a bot that scrolls the page (using smooth-scroll JS for lazy-loaded SPAs) and then finds each product by its container xpath. Each attribute of the product is then found within this container using cached xpaths. For an example of this, see cached_xpaths.json.
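
A minimal sketch of that flow, assuming hypothetical xpaths (the real cached values live in cached_xpaths.json), might look like:

```python
import json
import time

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://shop.nike.com/w/mens-shoes")  # one of the crawled catalog URLs

# Smooth-scroll to the bottom so the SPA lazy-loads its full product grid.
driver.execute_script(
    "window.scrollTo({top: document.body.scrollHeight, behavior: 'smooth'});"
)
time.sleep(3)  # crude wait for lazy-loaded content to render

# Hypothetical cached xpaths, invented for illustration.
xpaths = {
    "container": "//div[contains(@class, 'product-card')]",
    "name": ".//div[contains(@class, 'product-card__title')]",
    "price": ".//div[contains(@class, 'product-price')]",
}

products = []
for container in driver.find_elements(By.XPATH, xpaths["container"]):
    products.append({
        "name": container.find_element(By.XPATH, xpaths["name"]).text,
        "price": container.find_element(By.XPATH, xpaths["price"]).text,
    })

print(json.dumps(products, indent=2))
driver.quit()
```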

Automatic xpath generation using NNs is a WIP, but it will be vital for automatically adding websites (without having to manually add and verify xpaths) and for fail-safe checking.

Fail-safe checks are also a WIP.

OMBot (OMS Interaction)

OMBot aims to encapsulate the OMS checkout process into a single bot that can go through checkout on a variety of sites using cached xpaths (similar to the spider). Again, automatic xpath generation would be vital here but is still a WIP.
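
As a hedged illustration of that idea (all xpaths, the product URL, and the environment variable below are invented; a real run would load the xpaths from a cache like cached_xpaths.json), a checkout pass might step through cached selectors like so:

```python
import os

from selenium import webdriver
from selenium.webdriver.common.by import By

# Hypothetical cached checkout xpaths for one site, invented for illustration.
checkout_xpaths = {
    "add_to_cart": "//button[contains(@class, 'add-to-cart')]",
    "checkout": "//a[contains(@href, '/checkout')]",
    "email": "//input[@name='email']",
    "place_order": "//button[@type='submit']",
}

driver = webdriver.Chrome()
driver.get("https://shop.nike.com/t/example-product")  # placeholder product URL

driver.find_element(By.XPATH, checkout_xpaths["add_to_cart"]).click()
driver.find_element(By.XPATH, checkout_xpaths["checkout"]).click()

# Purchase details come from the .env-provided environment variables
# (hypothetical key; see the NOTE in Getting Started).
driver.find_element(By.XPATH, checkout_xpaths["email"]).send_keys(
    os.environ["PURCHASE_EMAIL"]
)
driver.find_element(By.XPATH, checkout_xpaths["place_order"]).click()
driver.quit()
```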

Core To-Dos

  • Add x-out-of-popup xpaths to close popups when they appear
  • Skip scrolling and extraction if no product containers are found
  • Add a deep option for scraping URLs past the initial page
  • Add MSRP to scraped data
  • Support multiple images in scraped data
  • Add size and color availability to scraped data
  • Add shipping options to scraped data
  • Standardize and expand the programmatic OMS
  • Add testing, code coverage, and badges for each to README.md
  • Add automatic cached xpath and URL generation
  • Add a cached xpath test to ensure the UI has not changed (if it has, regenerate or alert)
  • Build NN-based xpath generation using NLP or a CNN
  • Create a system for scraping products on an interval and saving to PostgreSQL

Contributing

Don't for now 🤷 Gotta get it to a good starting place first.