google-patents-scraper

A simple scraper for the Google Patents website that I wrote as a freelance project. It saves each patent's HTML, images, and PDF in a directory.

Requirements

Python 2.7 - https://www.python.org/download/releases/2.7/
pip - https://pip.pypa.io/en/latest/installing.html#install-pip
lxml - run pip install lxml

Command line parameters:

  -h, --help            show this help message and exit
  --start START         start patent id (default: None)
  --end END             end patent id (inclusive) (default: None)
  --output_dir OUTPUT_DIR
                        output directory (default: ./)
  --org {EP,US,WO,DE}   prefix of the organization publishing the patent
                        (default: EP)

Example command line:
python scraper.py --start 234 --end 1872
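
For readers who want to see how options like these are typically declared, here is a minimal sketch using argparse from the standard library. It is not taken from scraper.py. The `type=int` conversions, the `build_parser` function, and the `patent_ids` helper (including the "prefix plus number" id format) are assumptions made for illustration only.

```python
import argparse


def build_parser():
    # ArgumentDefaultsHelpFormatter appends "(default: ...)" to each help string,
    # which matches the style of the help text shown above.
    parser = argparse.ArgumentParser(
        formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    # Assumption: patent ids are parsed as integers.
    parser.add_argument('--start', type=int, help='start patent id')
    parser.add_argument('--end', type=int, help='end patent id (inclusive)')
    parser.add_argument('--output_dir', default='./', help='output directory')
    parser.add_argument('--org', choices=['EP', 'US', 'WO', 'DE'], default='EP',
                        help='prefix of the organization publishing the patent')
    return parser


def patent_ids(org, start, end):
    # Hypothetical helper: yields prefixed ids for an inclusive range.
    # The id format the real scraper uses may differ.
    for number in range(start, end + 1):
        yield '%s%d' % (org, number)


if __name__ == '__main__':
    args = build_parser().parse_args(['--start', '234', '--end', '236'])
    print(list(patent_ids(args.org, args.start, args.end)))
```

Because `--end` is documented as inclusive, the sketch iterates up to `end + 1`. Omitting `--org` falls back to EP, as in the help text.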

amnonkhen/google-patents-scraper

Folders and files

Latest commit

History

Repository files navigation

google-patents-scraper

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages