The aim of this project is to find text similar to a given input text.
This project works with textual data. Basic text preprocessing removes URLs, emoticons, punctuation and digits. Text vectorization is performed using CountVectorizer from the scikit-learn
library.
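The preprocessing and vectorization steps can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the project's actual code: the regular expressions, `clean_text` and `count_vectors` are hypothetical names and patterns chosen for this example, and `count_vectors` is a simplified stand-in for CountVectorizer (it keeps single-character tokens, which CountVectorizer drops by default).

```python
import re
import string
from collections import Counter

# Assumed patterns for illustration; real data may need broader ones.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
EMOTICON_RE = re.compile(r"[:;=][\-^']?[)(\]\[DPp/\\|]")       # crude text emoticons, e.g. :) ;-( :D
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")  # common emoji code points
DIGIT_RE = re.compile(r"\d+")
PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def clean_text(text):
    """Remove URLs, emoticons, digits and punctuation, then lowercase."""
    text = URL_RE.sub(" ", text)         # URLs first, before punctuation is stripped
    text = EMOTICON_RE.sub(" ", text)
    text = EMOJI_RE.sub(" ", text)
    text = DIGIT_RE.sub(" ", text)
    text = text.translate(PUNCT_TABLE)   # each punctuation character becomes a space
    return " ".join(text.lower().split())  # collapse repeated whitespace


def count_vectors(docs):
    """Build a sorted vocabulary and one term-count vector per document."""
    tokenized = [clean_text(doc).split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


vocab, vectors = count_vectors(["The cat sat.", "The cat, cat!"])
```

Each row of `vectors` holds the term counts of one document, with columns in the order given by `vocab`.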
Next, the recommendations are built using different similarity algorithms. There are several ways to measure how close two texts are, including Hamming distance, cosine similarity, Euclidean distance, Jaccard coefficient and Manhattan distance. This project uses only the measures named above to find the similarity between texts.
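As a rough sketch of how these measures behave on count vectors, the functions below compute each one in plain Python. The function names and the example vectors are assumptions made for illustration. The Hamming and Jaccard definitions used here (positions that differ, and overlap of the terms present) are one common way to apply them to count vectors, and the project may define them differently.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Straight-line distance between u and v."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan_distance(u, v):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(u, v))


def hamming_distance(u, v):
    """Number of positions at which u and v differ."""
    return sum(1 for a, b in zip(u, v) if a != b)


def jaccard_coefficient(u, v):
    """Shared terms divided by all terms present in either vector."""
    present_u = {i for i, a in enumerate(u) if a > 0}
    present_v = {i for i, b in enumerate(v) if b > 0}
    union = present_u | present_v
    if not union:
        return 0.0
    return len(present_u & present_v) / len(union)


# Two example count vectors over the same three-term vocabulary.
u = [1, 1, 1]
v = [2, 0, 1]
scores = {
    "cosine": cosine_similarity(u, v),
    "euclidean": euclidean_distance(u, v),
    "manhattan": manhattan_distance(u, v),
    "hamming": hamming_distance(u, v),
    "jaccard": jaccard_coefficient(u, v),
}
```

Cosine similarity and the Jaccard coefficient are similarities (higher means more alike), while the Hamming, Euclidean and Manhattan measures are distances (lower means more alike), so rankings built from them need to sort in opposite directions.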