WebImageScraper

Scrape images from Google Images

Dependencies:

Requires the Requests and BeautifulSoup libraries.

Usage:

Downloads up to 100 images per run (the current limit). To use it, download scraper.py and run:

python3 scraper.py

Arguments:

-s - (s)earch term.

-c - Include this flag to (c)ache the search query. Searches are cached as a pickle file in a caches subfolder next to scraper.py. The file is named after the search query, with a .cache extension, and holds a plain list of URLs.

-p - If you wish to download from a pre-existing cache file, include the (p)ath of the cache file after this argument.

-d - Choose the location to (d)ownload the files. A downloads folder is created at the location and the image files are stored in a subdirectory with the name of the search term.

-v - (V)erbose mode. Prints the intermediate steps.
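
The flags above could be parsed with argparse roughly as follows. This is a minimal sketch, not the parsing code in scraper.py: the dest names (search_term, cache, cache_path, download_location, verbose) and the "." default for -d are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the flags listed above.

    The dest names and the default download location are hypothetical.
    """
    parser = argparse.ArgumentParser(description="Scrape images from Google Images")
    parser.add_argument("-s", dest="search_term", help="search term")
    parser.add_argument("-c", dest="cache", action="store_true",
                        help="cache the search query")
    parser.add_argument("-p", dest="cache_path",
                        help="path of a pre-existing cache file")
    parser.add_argument("-d", dest="download_location", default=".",
                        help="location in which the downloads folder is created")
    parser.add_argument("-v", dest="verbose", action="store_true",
                        help="verbose mode")
    return parser


# Parse an explicit argument list so the sketch runs without a command line.
args = build_parser().parse_args(["-s", "red panda", "-c", "-d", "/tmp", "-v"])
print(args.search_term, args.cache, args.cache_path, args.download_location, args.verbose)
```

Passing the list to parse_args stands in for running python3 scraper.py -s "red panda" -c -d /tmp -v.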

Class:

The implementation lives in a single class, downloader, which is initialized with the following:

search_term - The term to search for.
verbose_mode - Whether verbose mode is enabled.

The class provides the following methods:

get_urls - Takes the cache status as a parameter and collects the image URLs into downloadurls.
printprogress - Prints the progress of the download. Takes the number of the file currently being downloaded as a parameter.
download - Downloads the images into the location given by the download_location parameter.
load_from_cache - Loads the download URLs from the cache file into downloadurls.
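
The cache round trip behind get_urls and load_from_cache can be sketched as below. It assumes only what is described here: a pickle file holding a list of URLs, stored as caches/<search term>.cache. The standalone functions cache_path and save_to_cache are hypothetical helpers, and the real methods on downloader may be organized differently.

```python
import os
import pickle
import tempfile


def cache_path(base_dir, search_term):
    # Layout described in this README: <base_dir>/caches/<search term>.cache
    return os.path.join(base_dir, "caches", search_term + ".cache")


def save_to_cache(base_dir, search_term, urls):
    """Pickle the list of URLs and return the path of the cache file."""
    path = cache_path(base_dir, search_term)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as fh:
        pickle.dump(list(urls), fh)
    return path


def load_from_cache(path):
    """Read the pickled list of URLs back from a cache file."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Round trip in a temporary directory, using placeholder URLs.
base = tempfile.mkdtemp()
saved = save_to_cache(base, "red panda",
                      ["https://example.com/a.jpg", "https://example.com/b.jpg"])
downloadurls = load_from_cache(saved)
```

Unpickling can execute arbitrary code, so only cache files you created yourself should be passed to -p.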

Todo:

Auto search the cache first for a cache file. Access Google Images only upon a cache miss
Parallelize the downloads in threads
Increase downloaded image count per run (currently 100)
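
The first two Todo items could be approached along these lines. This is a sketch under stated assumptions rather than planned code: urls_cache_first, download_all, and fake_fetch are hypothetical names, and the fetch step is a stand-in so the example runs without network access.

```python
import os
import pickle
from concurrent.futures import ThreadPoolExecutor


def urls_cache_first(cache_file, search):
    """Return the cached URLs if cache_file exists; call search() only on a cache miss."""
    if os.path.exists(cache_file):
        with open(cache_file, "rb") as fh:
            return pickle.load(fh)
    return search()


def download_all(urls, fetch, max_workers=8):
    """Call fetch(url) for every URL on a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))


def fake_fetch(url):
    # Stand-in for a real download (for example with Requests); returns a dummy value.
    return len(url)


# No cache file exists at this path, so the search callable is used.
urls = urls_cache_first(os.path.join("no-such-dir", "red panda.cache"),
                        lambda: ["https://example.com/a.jpg", "https://example.com/bb.jpg"])
sizes = download_all(urls, fake_fetch)
```

Threads suit this workload because image downloads spend most of their time waiting on the network rather than on the CPU.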

mmshivesh/WebImageScraper
