Commit

Update README.md
Sherryyy00 authored Oct 3, 2024
1 parent 446d959 commit f4e0aeb
Showing 1 changed file with 34 additions and 0 deletions.
34 changes: 34 additions & 0 deletions README.md
@@ -43,6 +43,40 @@
<img src="https://github.com/Sherryyy00/Scholarship-Spy/blob/main/images/admin.jpeg" alt="Monthly Bar Chart" width="50%" height="50%">
</p>

## Recommendation System Implementation

The recommendation system in **Scholarship Spy** uses **content-based filtering** to recommend scholarships based on each user's personal statement. The pipeline has four steps, described below.

### Key Techniques:

1. **Text Cleaning**:
   - The input personal statement is preprocessed before embedding:
     - The text is converted to lowercase for uniformity.
     - Non-alphanumeric characters, including punctuation, are removed.
     - The text is tokenized into individual words.
     - Common stopwords (e.g., "and," "the," "is") are filtered out using the NLTK library to reduce noise.

2. **Word Embeddings**:
   - The system uses pre-trained **GloVe** (Global Vectors for Word Representation) embeddings (`glove.6B.50d.txt`), which map each word to a 50-dimensional numerical vector. These vectors capture semantic relationships between words.
   - Each word in the user's cleaned personal statement is converted into a vector, and the average of these vectors gives a single vector representation of the entire statement.

3. **Clustering**:
   - The scholarships in the dataset are clustered based on their textual features (descriptions and titles). The cluster centroids are stored in `cluster_centers.npy`.
   - Each centroid represents a distinct group of similar scholarships.

4. **Recommendation**:
   - When a user submits a personal statement, the system cleans it and converts it into a vector using the steps above.
   - It calculates the Euclidean distance between the statement vector and each cluster centroid.
   - It selects the `n` closest centroids and retrieves the corresponding scholarship titles, universities, and links from the dataset.
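
The four steps above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the function names (`clean_text`, `load_glove`, `embed`, `nearest_centroids`) are hypothetical, the small `STOPWORDS` set stands in for NLTK's English stopword list so the example runs without downloads, skipping out-of-vocabulary words and falling back to a zero vector are assumed behaviours, and the toy data uses 2-dimensional vectors instead of the 50-dimensional GloVe vectors. It is written in pure Python so it runs on the standard library alone (Python 3.8+ for `math.dist`).

```python
import math
import re

# Tiny stand-in stopword list so the sketch runs without downloads; the README
# says the project uses NLTK's list (nltk.corpus.stopwords.words("english")).
STOPWORDS = {"and", "the", "is", "a", "an", "in", "of", "to", "i"}


def clean_text(text, stopwords=STOPWORDS):
    """Lowercase, replace non-alphanumeric characters, tokenize, drop stopwords."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return [tok for tok in text.split() if tok not in stopwords]


def load_glove(path):
    """Parse a GloVe text file (one 'word v1 v2 ...' entry per line)."""
    vectors = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            word, *values = line.rstrip().split(" ")
            vectors[word] = [float(v) for v in values]
    return vectors


def embed(tokens, vectors, dim):
    """Average the vectors of known tokens; unknown tokens are skipped (assumption)."""
    known = [vectors[tok] for tok in tokens if tok in vectors]
    if not known:
        return [0.0] * dim  # assumption: zero vector when no token is in the vocabulary
    return [sum(column) / len(known) for column in zip(*known)]


def nearest_centroids(vector, centroids, n):
    """Return the indices of the n centroids closest to vector (Euclidean distance)."""
    order = sorted(range(len(centroids)), key=lambda i: math.dist(vector, centroids[i]))
    return order[:n]


# Toy 2-dimensional data so the result can be checked by hand.
toy_vectors = {"biology": [1.0, 0.0], "research": [0.0, 1.0]}
toy_centroids = [[0.5, 0.5], [5.0, 5.0], [0.0, 1.0]]

tokens = clean_text("I love Biology, and the research!")
statement = embed(tokens, toy_vectors, dim=2)
closest = nearest_centroids(statement, toy_centroids, n=2)

print(tokens)     # ['love', 'biology', 'research']
print(statement)  # [0.5, 0.5]
print(closest)    # [0, 2]
```

With the real files, the vectors would presumably come from `load_glove("glove.6B.50d.txt")` with `dim=50`, and the centroids from `numpy.load("cluster_centers.npy")`. How a centroid index maps back to a row in the scholarship dataset is not specified in this README, so that lookup is left out of the sketch.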

### Output:
The code outputs the following information for each recommended scholarship:
- **University**: The name of the university offering the scholarship.
- **Scholarship**: The title of the scholarship.
- **Link**: A URL linking directly to the scholarship application page.
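
As a rough illustration of that output shape, one recommendation could be represented and printed as shown here. The `Recommendation` class, its field names, the `format_recommendation` helper, and the placeholder values are all hypothetical; the README does not show the project's actual output code.

```python
from dataclasses import dataclass


@dataclass
class Recommendation:
    """One recommended scholarship (hypothetical container, not from the project)."""
    university: str
    scholarship: str
    link: str


def format_recommendation(rec):
    """Render the three documented fields, one per line."""
    return (
        f"University: {rec.university}\n"
        f"Scholarship: {rec.scholarship}\n"
        f"Link: {rec.link}"
    )


# Placeholder values for illustration only.
example = Recommendation(
    university="Example University",
    scholarship="Example Merit Scholarship",
    link="https://example.com/apply",
)
text = format_recommendation(example)
print(text)
```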

These recommendations point users to scholarships that closely match the aspirations and qualifications described in their personal statements.

## Installation

1. Clone the repository: