Full Text Search - (https://fulltext-search-dino.herokuapp.com/)
It takes in multiple paragraphs of text, assigns a unique ID to each paragraph, and stores the word-to-paragraph mappings in an inverted index. This is similar to what Elasticsearch does. Each paragraph is also referred to as a 'document'.
- Tokenize to words by splitting at whitespace
- Convert all words to lowercase
- Index these words against the documents they come from.
- Generate a unique ID for every document that is indexed.
- Paragraphs are separated by two consecutive newline characters.
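The processing steps above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the repository's actual code: the function names `split_paragraphs`, `tokenize`, and `build_index`, and the use of `uuid4` for document IDs, are hypothetical.

```python
import re
import uuid
from collections import defaultdict


def split_paragraphs(text):
    """Split raw input into paragraphs at two or more consecutive newlines."""
    parts = re.split(r"\n{2,}", text.replace("\r\n", "\n"))
    return [p.strip() for p in parts if p.strip()]


def tokenize(paragraph):
    """Lowercase the paragraph and split it into words at whitespace."""
    return paragraph.lower().split()


def build_index(text):
    """Return (documents, index).

    documents maps a generated ID to the paragraph text;
    index maps each word to the set of document IDs that contain it.
    """
    documents = {}
    index = defaultdict(set)
    for paragraph in split_paragraphs(text):
        doc_id = uuid.uuid4().hex  # assumed ID scheme; any unique value works
        documents[doc_id] = paragraph
        for word in tokenize(paragraph):
            index[word].add(doc_id)
    return documents, index


documents, index = build_index("The quick fox\n\nThe lazy dog")
```

Because tokens are produced by whitespace splitting alone, punctuation stays attached to words, so "dog." and "dog" would be indexed as different terms.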
- clear - Clear the index and all indexed documents.
- index - Index a given document (after the input has been split into paragraphs, a.k.a. documents).
- search - Given a word, search for it and retrieve the top 10 paragraphs (documents) that contain it.
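The three operations can be sketched as a small in-memory class. This is an assumption-laden illustration rather than the project's implementation: the class name `InvertedIndex`, the integer ID counter, and ranking the "top 10" by how often the word occurs in each paragraph are all hypothetical choices, since the source does not say how results are ordered.

```python
from collections import Counter, defaultdict
from itertools import count


class InvertedIndex:
    def __init__(self):
        self.clear()

    def clear(self):
        """Drop every indexed document and reset the ID counter."""
        self.documents = {}                    # doc_id -> paragraph text
        self.postings = defaultdict(Counter)   # word -> {doc_id: occurrences}
        self._next_id = count(1)

    def index(self, text):
        """Split text into paragraphs, index each one, return the new IDs."""
        doc_ids = []
        for paragraph in text.split("\n\n"):
            paragraph = paragraph.strip()
            if not paragraph:
                continue
            doc_id = next(self._next_id)
            self.documents[doc_id] = paragraph
            for word in paragraph.lower().split():
                self.postings[word][doc_id] += 1
            doc_ids.append(doc_id)
        return doc_ids

    def search(self, word, limit=10):
        """Return up to `limit` paragraphs containing word, most occurrences first."""
        hits = self.postings.get(word.lower())
        if not hits:
            return []
        # Assumed ranking: term frequency within the paragraph.
        return [self.documents[doc_id] for doc_id, _ in hits.most_common(limit)]


idx = InvertedIndex()
idx.index("apple pie\n\nApple apple tart\n\nbanana bread")
results = idx.search("apple")
```

Storing per-document counts instead of a plain set of IDs costs little and gives `search` something to rank by; a plain set would work equally well if any 10 matching paragraphs are acceptable.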
- Step 0 - Clone the Tap Search repository and cd into the directory.
git clone https://github.com/dinolinjob/tap-search-dino.git
cd tap-search-dino
- Step 1 - Open a terminal and enter the following commands to set up a virtual environment.
sudo apt-get install python3
virtualenv -p python3 tap_venv
. tap_venv/bin/activate
- Step 2 - Install the dependencies using pip.
pip3 install -r requirements.txt
- Click==7.0
- Flask==1.1.1
- gunicorn==20.0.3
- itsdangerous==1.1.0
- Jinja2==2.10.3
- MarkupSafe==1.1.1
- Werkzeug==0.16.0
- Step 3 - To run the application, type
python app.py
- Step 4 - Go to localhost:5000 in your web browser to see the application live.
