From f4e0aebdc66baa8f8053b2cce3a39112b57c509f Mon Sep 17 00:00:00 2001
From: Sheryar Adil <66489218+Sherryyy00@users.noreply.github.com>
Date: Thu, 3 Oct 2024 10:34:55 +0500
Subject: [PATCH] Update README.md

---
 README.md | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/README.md b/README.md
index 17338ff..3748fa9 100644
--- a/README.md
+++ b/README.md
@@ -43,6 +43,40 @@  Monthly Bar Chart

+## Recommendation System Implementation
+
+The recommendation system in **Scholarship Spy** employs a **content-based filtering** approach to provide personalized scholarship recommendations based on users' personal statements. This process involves several key steps to ensure accurate and relevant suggestions.
+
+### Key Techniques:
+
+1. **Text Cleaning**:
+   - The input personal statement undergoes preprocessing to enhance data quality. This involves:
+     - Converting all text to lowercase to maintain uniformity.
+     - Removing non-alphanumeric characters and punctuation to focus on the content.
+     - Tokenization, which breaks the text into individual words for analysis.
+     - Filtering out common stopwords (e.g., "and," "the," "is") using the NLTK library to reduce noise in the data.
+
+2. **Word Embeddings**:
+   - The system utilizes pre-trained **GloVe** (Global Vectors for Word Representation) embeddings (`glove.6B.50d.txt`), which convert words into numerical vector representations. This allows the model to capture semantic meanings and relationships between words.
+   - Each word in the user's cleaned personal statement is converted into a vector, and the average of these vectors creates a single vector representation for the entire statement.
+
+3. **Clustering**:
+   - The dataset is processed to form clusters of scholarships based on their textual features. The centroids of these clusters are stored in a file named `cluster_centers.npy`.
+   - Each centroid represents a distinct group of scholarships, allowing the model to categorize scholarships based on similarities in their descriptions and titles.
+
+4. **Recommendation**:
+   - When a user inputs their personal statement, the system generates a vector representation of the statement using the techniques mentioned above.
+   - It calculates the Euclidean distance between the user's statement vector and the centroids of the scholarship clusters.
+   - The system identifies the `n` closest centroids (scholarships) to the user's vector and retrieves the corresponding scholarship titles, universities, and links from the dataset.
+
+### Output:
+The code outputs the following information for each recommended scholarship:
+- **University**: The name of the university offering the scholarship.
+- **Scholarship**: The title of the scholarship.
+- **Link**: A URL linking directly to the scholarship application page.
+
+This recommendation system enhances the user experience by providing tailored scholarship opportunities that align closely with individual aspirations and qualifications.
+
 ## Installation
 
 1. Clone the repository: