Reverse-Image-Search-Scraper

This Python notebook searches three reverse image search engines simultaneously to scrape image data.

Deployment

Xvfb (X virtual framebuffer) lets you run graphical applications without a monitor or input device attached. It performs all graphical operations in virtual memory, which allows the program to run headlessly.

sudo apt install xvfb
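
Once Xvfb is installed, a virtual display can be started from Python before any graphical code runs. The following is a minimal sketch, not code taken from the notebook: the helper names `xvfb_command` and `start_xvfb`, the display number `:99`, and the screen size are all assumptions chosen for illustration.

```python
import os
import shutil
import subprocess


def xvfb_command(display=":99", width=1280, height=1024, depth=24):
    """Build the argument list that starts Xvfb on the given display.

    The display number and screen geometry are illustrative defaults.
    """
    return ["Xvfb", display, "-screen", "0", f"{width}x{height}x{depth}"]


def start_xvfb(display=":99"):
    """Start Xvfb if it is on PATH and point DISPLAY at it.

    Returns the running process, or None when Xvfb is not installed.
    """
    if shutil.which("Xvfb") is None:
        return None
    proc = subprocess.Popen(xvfb_command(display))
    # Graphical programs started afterwards will draw to the virtual display.
    os.environ["DISPLAY"] = display
    return proc
```

Call `start_xvfb()` once at the start of a session and call `terminate()` on the returned process when you are done.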

git clone https://github.com/Cescollino/Reverse-Image-Search-Scraper.git
cd Reverse-Image-Search-Scraper
Dependencies

Conda: The requirements.txt can be used to create a conda virtual environment with:

conda create --name <env> --file requirements.txt

Virtualenv:

python3 -m venv scraper
source scraper/bin/activate
pip install -r requirements.txt

Once the environment is created, start Jupyter Lab from a terminal:

jupyter lab

Open Reverse Image Search Scraper.ipynb in Jupyter Lab. Uncomment the first cell, run it once, then comment it out again. If there are no errors, you are good to go.

You should see this output:

Running

Defining Paths

Go to the "# Hardcoded File Paths & Run" group of cells.

File Path

Set the file_path variable to the absolute path of the .jpg image you want to upload, e.g. .../Reverse-Image-Search-Scraper/IMAGES/INPUT/1.jpg

Result Path

Set result_path to the absolute path of the directory where the downloaded images should be stored, e.g. .../Reverse-Image-Search-Scraper/IMAGES/OUTPUT
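
Before running the notebook, it can help to confirm that both paths are usable. The sketch below is not part of the notebook; `check_paths` is a hypothetical helper that assumes the input must be an existing .jpg file and that the output directory may be created if missing.

```python
from pathlib import Path


def check_paths(file_path, result_path):
    """Validate the input image path and prepare the output directory.

    Hypothetical helper: raises if either path is relative, if the input
    is not a .jpg file, or if the input file does not exist.
    """
    image = Path(file_path)
    output_dir = Path(result_path)
    if not image.is_absolute() or not output_dir.is_absolute():
        raise ValueError("file_path and result_path must be absolute paths")
    if image.suffix.lower() != ".jpg":
        raise ValueError(f"expected a .jpg image, got {image.name!r}")
    if not image.is_file():
        raise FileNotFoundError(f"input image not found: {image}")
    # Create the output directory (and any parents) if it does not exist yet.
    output_dir.mkdir(parents=True, exist_ok=True)
    return image, output_dir
```

Calling `check_paths(file_path, result_path)` in a cell just above the run cell would surface a mistyped path before any search starts.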

Number of Results

To change the number of results gathered by each engine, edit "max_count=100" in the last cell of the notebook.
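
As an illustration of what a cap like max_count does, the sketch below limits a stream of results to at most that many items. It is not the notebook's code: `collect_results` and its `result_urls` argument are hypothetical names, and the real scraper may apply the limit differently.

```python
from itertools import islice


def collect_results(result_urls, max_count=100):
    """Return at most max_count items from an iterable of result URLs.

    If the iterable ends early, fewer than max_count items are returned.
    """
    return list(islice(result_urls, max_count))
```

With this approach, an engine that yields only 30 results simply returns 30 items, even when max_count is 100.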

Running

Press "run all cells". The scraper may run out of images to download. In that case, press the square (stop) button at the top of the notebook to interrupt the kernel.

Output

The program writes image files to the result directory using the following naming scheme:

jpg_"Image Number"_EngineID_"Search Engine ID".jpg
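
The naming scheme above can be sketched as a pair of helpers that build and parse such file names. This assumes that both the image number and the search engine ID are plain integers and that the name contains no spaces; `build_output_name` and `parse_output_name` are hypothetical names, not functions from the notebook.

```python
import re

# Matches names such as "jpg_12_EngineID_2.jpg" (assumes integer fields).
_NAME_PATTERN = re.compile(r"^jpg_(\d+)_EngineID_(\d+)\.jpg$")


def build_output_name(image_number, engine_id):
    """Build an output file name from an image number and an engine ID."""
    return f"jpg_{image_number}_EngineID_{engine_id}.jpg"


def parse_output_name(name):
    """Return (image_number, engine_id), or None if the name does not match."""
    match = _NAME_PATTERN.match(name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

A parser like this makes it easy to group downloaded files by engine, for example when comparing how many results each engine returned.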

Deduplicating Images

To deduplicate near-identical images using perceptual hashing, please refer to imgdupes.
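
To show the idea behind perceptual hashing, here is a minimal sketch of an average hash, one simple perceptual hash. It is not how imgdupes is implemented. It assumes the image has already been converted to grayscale and downscaled to a small grid (for example 8x8), passed in as a list of rows of pixel values.

```python
def average_hash(pixels):
    """Compute an average hash from a small grayscale pixel grid.

    Each bit is 1 when the pixel is brighter than the grid's mean value.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(hash_a, hash_b):
    """Count the bits that differ between two hashes."""
    return bin(hash_a ^ hash_b).count("1")
```

Two images whose hashes differ in only a few bits are likely near-duplicates; the exact cutoff is a tuning choice. Because each pixel is compared against the mean, a uniform brightness shift leaves the hash unchanged.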
