Update README.md

Sherryyy00 · Oct 3, 2024 · f4e0aeb
1 parent 446d959
commit f4e0aeb
Showing 1 changed file with 34 additions and 0 deletions.
diff --git a/README.md b/README.md
@@ -43,6 +43,40 @@
   <img src="https://github.com/Sherryyy00/Scholarship-Spy/blob/main/images/admin.jpeg", alt=" Monthly Bar Chart " width="50%" height="50%">
 </p>
 
+## Recommendation System Implementation
+
+The recommendation system in **Scholarship Spy** uses **content-based filtering** to recommend scholarships based on a user's personal statement. The pipeline has four stages: text cleaning, word embedding, clustering, and recommendation by distance to the cluster centroids.
+
+### Key Techniques:
+
+1. **Text Cleaning**:
+   - The input personal statement undergoes preprocessing to enhance data quality. This involves:
+     - Converting all text to lowercase to maintain uniformity.
+     - Removing non-alphanumeric characters (including punctuation) to focus on the content.
+     - Tokenization, which breaks the text into individual words for analysis.
+     - Filtering out common stopwords (e.g., "and," "the," "is") using the NLTK library to reduce noise in the data.
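The cleaning steps above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `clean_text` is hypothetical, and the small hard-coded `STOPWORDS` set is a stand-in (an assumption) for the NLTK English stopword list that the README mentions.

```python
import re

# Stand-in stopword list (assumption). The README says the project uses NLTK,
# where the list would come from nltk.corpus.stopwords.words("english").
STOPWORDS = {"a", "an", "and", "the", "is", "in", "of", "to", "for", "i", "am"}


def clean_text(statement: str) -> list[str]:
    """Lowercase, strip non-alphanumerics, tokenize, and drop stopwords."""
    lowered = statement.lower()
    # Replace anything that is not a letter, digit, or whitespace with a space.
    alnum_only = re.sub(r"[^a-z0-9\s]", " ", lowered)
    # Whitespace tokenization: split into individual words.
    tokens = alnum_only.split()
    return [tok for tok in tokens if tok not in STOPWORDS]


print(clean_text("I am passionate about Machine-Learning, and the sciences!"))
# ['passionate', 'about', 'machine', 'learning', 'sciences']
```

Replacing punctuation with a space rather than deleting it keeps hyphenated terms such as "Machine-Learning" as two separate tokens; whether the project does the same is not stated in the README.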
+
+2. **Word Embeddings**:
+   - The system uses pre-trained 50-dimensional **GloVe** (Global Vectors for Word Representation) embeddings (`glove.6B.50d.txt`), which map each word to a numerical vector. This lets the model capture semantic relationships between words.
+   - Each word in the user's cleaned personal statement is converted into a vector, and the average of these vectors creates a single vector representation for the entire statement.
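A rough sketch of the averaging step, assuming the standard GloVe text format (one word per line followed by its space-separated vector components). The helper names `load_glove` and `average_vector` are hypothetical, and two behaviours are assumptions not stated in the README: words missing from the vocabulary are skipped, and an all-zero vector is returned when no word is known.

```python
def load_glove(path: str) -> dict[str, list[float]]:
    """Parse a GloVe text file such as glove.6B.50d.txt into a word -> vector map."""
    embeddings = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            word, *values = line.rstrip().split(" ")
            embeddings[word] = [float(v) for v in values]
    return embeddings


def average_vector(tokens, embeddings, dim):
    """Mean of the vectors of known tokens; zero vector if none are known."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    # zip(*vectors) groups the i-th components of all vectors together.
    return [sum(component) / len(vectors) for component in zip(*vectors)]


# Toy 3-dimensional embeddings for illustration; the real file has 50 dimensions.
toy = {"data": [1.0, 0.0, 2.0], "science": [3.0, 2.0, 0.0]}
print(average_vector(["data", "science", "unknownword"], toy, dim=3))
# [2.0, 1.0, 1.0]
```

With the real file, the call would look like `average_vector(tokens, load_glove("glove.6B.50d.txt"), dim=50)`.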
+
+3. **Clustering**:
+   - The dataset is processed to form clusters of scholarships based on their textual features. The centroids of these clusters are stored in a file named `cluster_centers.npy`.
+   - Each centroid represents a distinct group of scholarships, allowing the model to categorize scholarships based on similarities in their descriptions and titles.
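The README does not name the clustering algorithm. As one plausible illustration, the sketch below uses plain k-means (an assumption) on toy 2-dimensional vectors; the function name `kmeans`, the first-`k`-points initialisation, and the iteration count are all hypothetical choices.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; returns k centroids as lists of floats."""
    # Deterministic initialisation: use the first k points as starting centroids.
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            groups[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(groups):
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = [sum(c) / len(members) for c in zip(*members)]
    return centroids


# Toy stand-ins for the averaged embedding vectors of scholarship texts.
scholarship_vectors = [[0.0, 0.0], [10.0, 10.0], [0.0, 2.0], [10.0, 12.0]]
centers = kmeans(scholarship_vectors, k=2)
print(centers)  # [[0.0, 1.0], [10.0, 11.0]]
```

In a NumPy-based project, the resulting centroids could be written to `cluster_centers.npy` with `numpy.save`.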
+
+4. **Recommendation**:
+   - When a user inputs their personal statement, the system generates a vector representation of the statement using the techniques mentioned above.
+   - It calculates the Euclidean distance between the user's statement vector and the centroids of the scholarship clusters.
+   - The system identifies the `n` centroids closest to the user's vector and retrieves the scholarship titles, universities, and links associated with them from the dataset.
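The distance ranking can be sketched as below. The function name `closest_centroids` and the toy `records` list are hypothetical, and the one-record-per-centroid lookup is an assumption made for illustration; the README does not spell out how centroids map back to dataset rows. In the project itself the centroids would presumably be loaded from `cluster_centers.npy` (for example with `numpy.load`).

```python
import math


def closest_centroids(user_vector, centroids, n):
    """Indices of the n centroids nearest to user_vector (Euclidean distance)."""
    ranked = sorted(
        range(len(centroids)),
        key=lambda i: math.dist(user_vector, centroids[i]),
    )
    return ranked[:n]


centroids = [[0.0, 1.0], [10.0, 11.0], [5.0, 5.0]]

# Hypothetical records, one per centroid index (assumption).
records = [
    {"University": "University A", "Scholarship": "Scholarship A", "Link": "https://example.com/a"},
    {"University": "University B", "Scholarship": "Scholarship B", "Link": "https://example.com/b"},
    {"University": "University C", "Scholarship": "Scholarship C", "Link": "https://example.com/c"},
]

top = closest_centroids([4.0, 4.0], centroids, n=2)
print(top)  # [2, 0]
for idx in top:
    print(records[idx]["University"], records[idx]["Scholarship"], records[idx]["Link"])
```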
+
+### Output:
+The code outputs the following information for each recommended scholarship:
+- **University**: The name of the university offering the scholarship.
+- **Scholarship**: The title of the scholarship.
+- **Link**: A URL linking directly to the scholarship application page.
+
+This recommendation system enhances the user experience by providing tailored scholarship opportunities that align closely with individual aspirations and qualifications.
+
 ## Installation
 
 1. Clone the repository: