Developed a Streamlit-based web application that checks whether two Quora questions are duplicates. Used a pre-trained Random Forest classifier to predict whether a question pair is a duplicate, achieving 76.56% accuracy on the validation dataset. Implemented natural language processing (NLP) techniques to preprocess the question data and engineer features.
The objective of the application is to give users a convenient and efficient way to identify duplicate questions on Quora. This is useful for several reasons:
- To improve the quality of the Quora platform by reducing the number of duplicate questions.
- To help users save time by avoiding answering questions that have already been answered.
- To help users find the most relevant answers to their questions by directing them to the existing duplicate question.
The web application uses a pre-trained Random Forest classifier to predict whether two questions are duplicates. The classifier was trained on a dataset of labeled question pairs and reaches 76.56% accuracy on the validation dataset, meaning it labels roughly three out of four validation pairs correctly. This is enough to flag likely duplicates, but about a quarter of pairs are still misclassified, so predictions are best treated as suggestions rather than definitive answers.
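To make the accuracy figure concrete, here is a minimal sketch of how validation accuracy could be computed for a loaded classifier. It assumes the model was saved with `pickle` and exposes a scikit-learn style `predict` method. The `load_model` helper, the `ThresholdStub` class, and the toy data are all hypothetical and not taken from the project; the stub only stands in for the trained Random Forest so the example runs without a model file.

```python
import pickle


def load_model(path):
    """Hypothetical loader: assumes the classifier was saved with pickle."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def accuracy(model, X, y_true):
    """Fraction of pairs whose predicted label matches the true label."""
    y_pred = model.predict(X)
    correct = sum(int(p == t) for p, t in zip(y_pred, y_true))
    return correct / len(y_true)


class ThresholdStub:
    """Stand-in for the trained Random Forest (assumption, for illustration).

    Labels a pair as duplicate (1) when its first feature is at least 0.5.
    """

    def predict(self, X):
        return [1 if row[0] >= 0.5 else 0 for row in X]


# Toy validation set: one similarity feature per question pair.
X_val = [[0.9], [0.2], [0.6], [0.4]]
y_val = [1, 0, 0, 0]

val_acc = accuracy(ThresholdStub(), X_val, y_val)
print(f"Validation accuracy: {val_acc:.2%}")
```

In the real application, `ThresholdStub()` would be replaced by the object returned from `load_model`, and `X_val` by the engineered features of the labeled validation pairs.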
The web application also applies natural language processing (NLP) techniques to preprocess the question text and engineer features from it. These features are the inputs to the Random Forest classifier, and cleaner, more informative features help improve its accuracy.
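The source does not list the exact preprocessing steps or features, so the following is only a sketch of what this stage might look like. It assumes lowercasing, punctuation removal, and whitespace tokenization, followed by a few simple word-overlap features. The function names `preprocess` and `pair_features` and the chosen features are illustrative, not the project's actual pipeline.

```python
import re
from typing import Dict, List


def preprocess(question: str) -> List[str]:
    """Lowercase, replace non-alphanumeric characters with spaces, and tokenize."""
    text = question.lower().strip()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return text.split()


def pair_features(q1: str, q2: str) -> Dict[str, float]:
    """Illustrative features for one question pair (assumed, not from the source)."""
    t1, t2 = preprocess(q1), preprocess(q2)
    s1, s2 = set(t1), set(t2)
    common = len(s1 & s2)
    total = len(s1 | s2)
    return {
        "len_diff": abs(len(t1) - len(t2)),           # difference in token counts
        "common_words": common,                       # distinct words shared by both
        "jaccard": common / total if total else 0.0,  # shared / all distinct words
    }


features = pair_features(
    "How do I learn Python?",
    "How can I learn python quickly?",
)
print(features)
```

A feature dictionary like this would be converted to a numeric vector, in a fixed column order, before being passed to the classifier.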