ReviewLens is a hybrid, AI-powered web application that detects deceptive hotel reviews by combining lexical NLP features (TF-IDF) with stylometric analysis.
Fake reviews are a significant problem in the digital economy. This project uses machine learning to distinguish between truthful and deceptive reviews by analyzing not just what is said, but how it is written.
- Hybrid Detection Engine: Combines TF-IDF vectorization with handcrafted stylometric features.
- Stylometric Metrics:
  - Vocabulary Diversity: Measures the richness of the reviewer's vocabulary.
  - Punctuation Intensity: Tracks excessive use of exclamation and question marks.
  - Personal Pronoun Ratio: Analyzes the self-focus of the reviewer (often higher in fake reviews).
- Interactive Web Interface: Built with Flask for real-time analysis.
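
The three stylometric metrics listed above could be computed roughly as follows. This is a minimal sketch, not the project's actual code: the function names, the tokenizer, the first-person pronoun list, and the exact definition of each ratio are assumptions made for illustration.

```python
import re

# Hypothetical pronoun list; the project's own list is not documented here.
FIRST_PERSON_PRONOUNS = {
    "i", "me", "my", "mine", "myself",
    "we", "us", "our", "ours", "ourselves",
}


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def vocabulary_diversity(text):
    """Type-token ratio: unique words divided by total words."""
    tokens = tokenize(text)
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def punctuation_intensity(text):
    """Exclamation and question marks per character of text."""
    if not text:
        return 0.0
    return (text.count("!") + text.count("?")) / len(text)


def personal_pronoun_ratio(text):
    """Share of tokens that are first-person pronouns."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(t in FIRST_PERSON_PRONOUNS for t in tokens) / len(tokens)


def stylometric_features(text):
    """Return the three ratios as one feature vector."""
    return [
        vocabulary_diversity(text),
        punctuation_intensity(text),
        personal_pronoun_ratio(text),
    ]
```

Each function returns a ratio between 0 and 1, so the values can sit alongside TF-IDF weights without extra scaling, although a real model may still benefit from normalization.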
- Backend: Python, Flask
- Machine Learning: Scikit-learn, Joblib
- NLP: NLTK, Regex
- Frontend: HTML5, CSS3 (Vanilla)
- Data Handling: Pandas, NumPy, Scipy
├── data/ # Datasets used for training
├── models/ # Saved ML models (.pkl)
├── notebooks/ # Jupyter notebooks for EDA and Training
├── templates/ # HTML templates for the Flask app
├── app.py # Main Flask application
├── requirements.txt # Project dependencies
└── .gitignore # Files excluded from version control
git clone https://github.com/Ankush-22/ReviewLens.git
cd ReviewLens
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
python app.py
Open your browser and navigate to http://127.0.0.1:5000.
- Preprocessing: Text is cleaned and lemmatized, and stop words are removed.
- Feature Extraction:
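  - TF-IDF: Weights each word by how frequent it is in a review and how rare it is across the dataset.
  - Stylometry: Computes ratios that describe writing style, such as vocabulary diversity, punctuation intensity, and personal pronoun use.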
- Classification: A pre-trained hybrid model (Random Forest/SVM) predicts the likelihood of the review being fake.
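
The steps above can be sketched end to end with scikit-learn and SciPy. This is an illustrative outline under stated assumptions, not the code in `app.py` or the notebooks: the helper names `style_vector` and `build_features`, the toy reviews and labels, and the choice of a Random Forest (one of the two model types named above) are all hypothetical. Lemmatization is omitted because NLTK needs a separate data download; stop words are removed by the vectorizer's built-in English list.

```python
import re

import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer


def style_vector(text):
    """Three illustrative stylometric ratios (assumed definitions)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    n_tokens = max(len(tokens), 1)
    diversity = len(set(tokens)) / n_tokens
    punctuation = (text.count("!") + text.count("?")) / max(len(text), 1)
    pronouns = sum(t in {"i", "me", "my", "we", "our"} for t in tokens) / n_tokens
    return [diversity, punctuation, pronouns]


def build_features(texts, vectorizer, fit=False):
    """Stack TF-IDF columns and stylometric columns side by side."""
    tfidf = vectorizer.fit_transform(texts) if fit else vectorizer.transform(texts)
    style = csr_matrix(np.array([style_vector(t) for t in texts]))
    return hstack([tfidf, style]).tocsr()


# Toy training data, invented for this example only.
train_texts = [
    "The room was clean and the staff checked us in quickly.",
    "My husband and I had the most amazing stay ever!!! I loved it!",
    "Breakfast was average, but the location near the station was handy.",
    "I cannot wait to come back! My family and I adored this hotel!!",
]
train_labels = [0, 1, 0, 1]  # 0 = truthful, 1 = deceptive

vectorizer = TfidfVectorizer(stop_words="english")
X_train = build_features(train_texts, vectorizer, fit=True)

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, train_labels)

# Score an unseen review: column 1 of predict_proba is the "deceptive" class.
X_new = build_features(["I loved my stay! Best hotel ever!"], vectorizer)
fake_probability = model.predict_proba(X_new)[0, 1]
print(f"Estimated probability of a fake review: {fake_probability:.2f}")
```

The same fitted vectorizer must be reused at prediction time so that the TF-IDF columns line up with those seen during training; in a deployed app both the vectorizer and the model would typically be saved together, for example with Joblib.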