Memefinder Documentation

Adding to documentation


Script: run.sh

Runs the Flask server on localhost:5000 in development mode.

Script: add.sh

Adds subreddits to the web scraping list.

Script: collect.sh

Collects memes from the subreddits in Meme.txt.

Script: tkrun.sh (DEPRECATED)

Runs the deprecated tkinter application.


Directory: lib/

Contains the files required for searching through the database.

File: search.py

generate_query_db(query)

Extends the input query with synonyms of its terms, using the nltk package.
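A minimal sketch of this kind of query expansion follows. It is not the project's code: `expand_query`, `synonyms_for` and `TOY_SYNONYMS` are hypothetical names, and a small hard-coded table stands in for the nltk lookup so the example runs without any corpus data.

```python
def expand_query(query, synonyms_for):
    """Return the query's words plus their synonyms, in order, without duplicates."""
    expanded = []
    for word in query.lower().split():
        # Keep the original word first, then whatever the lookup suggests.
        for candidate in [word, *synonyms_for(word)]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


# Toy synonym table standing in for an nltk lookup (hypothetical data).
TOY_SYNONYMS = {"cat": ["kitty", "feline"], "sad": ["unhappy"]}

keywords = expand_query("Sad cat", lambda word: TOY_SYNONYMS.get(word, []))
print(keywords)
```

Passing the lookup as a function keeps the expansion logic independent of where the synonyms come from, so an nltk-backed lookup could presumably be swapped in without touching `expand_query`.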

create_index_db()

Creates a new collection of all memes stored in the database, using the filename as the key and the associated text as the value. A score field is also added to hold the ranking computed by get_score_db().

update_required()

Checks whether the data has changed and, if so, calls create_index_db().
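How the change check is actually performed is not documented here. One possible approach, shown purely as an assumption, is to compare a fingerprint of the current data with the one recorded when the index was last built. The names `fingerprint` and `needs_update` are hypothetical.

```python
import hashlib


def fingerprint(memes):
    """Hash a {filename: text} mapping in a stable (sorted) order."""
    digest = hashlib.sha256()
    for name in sorted(memes):
        digest.update(name.encode("utf-8"))
        digest.update(b"\0")  # separator so ("ab", "c") differs from ("a", "bc")
        digest.update(memes[name].encode("utf-8"))
        digest.update(b"\0")
    return digest.hexdigest()


def needs_update(memes, last_fingerprint):
    """True when the data no longer matches the fingerprint of the last index build."""
    return fingerprint(memes) != last_fingerprint


memes = {"a1.jpg": "one does not simply"}
saved = fingerprint(memes)          # recorded when the index was built
memes["b2.jpg"] = "grumpy cat"      # new meme collected afterwards
print(needs_update(memes, saved))   # the index would be rebuilt in this case
```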

get_score_db(keywords)

Creates a relevance-based score list for the given keywords, matched against the filenames in the INDEX collection.
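The index-and-score flow described for create_index_db() and get_score_db() can be sketched with plain dictionaries. This is an illustration under stated assumptions, not the project's implementation: the real functions work on a database collection, and the scoring rule here (counting keyword occurrences in the text) is an assumed stand-in. `create_index` and `get_scores` are hypothetical names.

```python
def create_index(memes):
    """Map each filename to its lower-cased text and a score initialised to 0."""
    return {name: {"text": text.lower(), "score": 0} for name, text in memes.items()}


def get_scores(index, keywords):
    """Score each entry by keyword occurrences and return matches, best first."""
    for entry in index.values():
        words = entry["text"].split()
        entry["score"] = sum(words.count(keyword.lower()) for keyword in keywords)
    ranked = [(name, entry["score"]) for name, entry in index.items() if entry["score"] > 0]
    # Highest score first; ties are broken by filename for a stable order.
    return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))


memes = {
    "a1.jpg": "One does not simply walk",
    "b2.jpg": "Grumpy cat says no cat",
    "c3.jpg": "cat",
}
index = create_index(memes)
print(get_scores(index, ["cat", "grumpy"]))
```

Combined with an expanded keyword list, a ranking like this would let a meme match even when its text uses a synonym of the word the user typed.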

File: util.py

Used by the Flask application. Use the functions here as a higher-level abstraction of the project.

get_memes(query)

Returns list of meme files from the database based on the query.

File: meme_gui.py (DEPRECATED)

Starts the deprecated Tkinter GUI of the project. Will be removed in the future.

File: meme_gui_support.py (DEPRECATED)

Will be removed in the future.

Class: meme

Holds key state such as memeList and currentImage. The GUI depends on an object of this class to function.

getMemeList(query)

Gets the list of memes which match the given query.

display(canvas, image_path)

Displays the image at image_path on the canvas in the GUI.

go(canvas, query)

Starts everything the GUI needs to function: builds memeList from the entered query and displays the first meme on the canvas.

prev(canvas)

Displays the previous image on the canvas.

next(canvas)

Displays the next image on the canvas.


Directory: scraper/

Contains files related to the web scraping part of the project.

File: api.py

Finds a list of meme subreddits using the Reddit REST API. Stores the list in scraper/Meme.txt.

File: scraper.py

Scrapes meme images from the subreddits in scraper/Meme.txt and stores them in the processed/ directory.

File: standard.py

Renames each meme in the raw/ folder to a unique filename generated from a hex digest and moves it to the processed/ folder.
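As a rough illustration of digest-based renaming, the sketch below hashes each file's contents with SHA-256 and uses the hex digest as the new name. Which hash the project uses, and what exactly it hashes, is not stated here, so both are assumptions, and `standardise` is a hypothetical function name.

```python
import hashlib
import shutil
from pathlib import Path


def standardise(raw_dir, processed_dir):
    """Move every file in raw_dir to processed_dir under a digest-based name."""
    raw_dir, processed_dir = Path(raw_dir), Path(processed_dir)
    processed_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in sorted(raw_dir.iterdir()):
        if not path.is_file():
            continue
        # Assumed scheme: SHA-256 of the file contents, original extension kept.
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        target = processed_dir / f"{digest}{path.suffix.lower()}"
        shutil.move(str(path), str(target))
        moved.append(target.name)
    return moved
```

Calling `standardise("raw", "processed")` would empty raw/ and fill processed/. Because the name is derived from the contents, two byte-identical images end up under the same filename, which removes exact duplicates as a side effect.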

File: ocr.py

Extracts text from the memes in the raw/ folder using Tesseract OCR.


Directory: database/

Stores the txt-based, comma-separated databases.

Directory: temp/

Temporary cache for storing images. Will be created and removed by collect.sh.

Directory: processed/

Stores image files with standardised file names. Will be created after running collect.sh.

Directory: raw/

Stores image files downloaded by the web scraper. Will be created and removed by collect.sh.