PROJECT : ALUMNI PROFILE MATCHING

Motivation: Students get better advice from alumni who share their interests and background. Today, current students usually reach out only to alumni who are introduced at special events or seminars. That can overwhelm those few alumni, and they may not be able to answer every student's questions. For example, consider two students: one with a Finance background and several years of experience, and another with a Computer Science background and little or no experience. Alumni with related or similar backgrounds can give each of them better suggestions. This project computes similarity scores between students and alumni and picks the top 5 most similar alumni profiles for each current student to reach out to.
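The ranking step described above can be sketched as follows. This is a minimal illustration, not the project's implementation: the README does not say which similarity measure is used, so cosine similarity over hypothetical numeric feature vectors is assumed here, and `top_k_alumni`, the feature layout, and the sample names are made up for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def top_k_alumni(student_vec, alumni_vecs, k=5):
    """Return the k (name, score) pairs with the highest similarity to the student."""
    scored = [
        (name, cosine_similarity(student_vec, vec))
        for name, vec in alumni_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical features (assumption): [finance, computer science, scaled years of experience]
alumni = {
    "alum_a": [1.0, 0.0, 0.8],
    "alum_b": [0.0, 1.0, 0.1],
    "alum_c": [0.5, 0.5, 0.4],
}
student = [0.0, 1.0, 0.0]  # Computer Science background, no experience
ranking = top_k_alumni(student, alumni, k=2)
```

With `k=5` and real profile features, the same call would yield the five alumni a student is pointed to; how profiles are turned into vectors is left open here.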

Data sources:

List of alumni profiles by cohort, maintained on GCP
Scraped LinkedIn profile data (in JSON) per cohort

In phase 1, we set up an Airflow DAG pipeline to extract data from the data sources above, transform it with Spark, and load it into MongoDB. The modules that implement this ETL setup are listed below.

DAG pipeline

msds697_task2.py

The DAG is set up so that it dynamically generates a task workflow for each cohort. The main file (source_file#1) is read in the get_cohorts() function. Each workflow consists of two tasks: one for extract and one for transform and load.
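The per-cohort layout can be illustrated without Airflow itself. The sketch below only builds the task names and dependencies that such a DAG would contain. `build_cohort_flows`, the task-id pattern, and the cohort labels are hypothetical and not taken from msds697_task2.py; in an actual DAG file each pair would become two operators chained with `extract >> transform_load`.

```python
def build_cohort_flows(cohorts):
    """Map each cohort to an (extract, transform_load) pair of task ids."""
    flows = {}
    for cohort in cohorts:
        extract_id = f"extract_{cohort}"  # hypothetical naming scheme
        transform_load_id = f"transform_load_{cohort}"
        flows[cohort] = (extract_id, transform_load_id)
    return flows


# Hypothetical cohort labels; the project obtains them via get_cohorts().
cohorts = ["cohort_2021", "cohort_2022"]
flows = build_cohort_flows(cohorts)

for cohort, (upstream, downstream) in flows.items():
    # Each cohort gets its own independent two-task chain.
    print(f"{cohort}: {upstream} >> {downstream}")
```

Because each cohort has its own chain, adding a cohort to the main file adds a new two-task workflow without touching the others.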

Extract:

alumni_list.py
alumni_profiles.py

Transform & Load

aggregates_to_mongo.py

Load

mongodb.py

User definition

user_definition.py

MohanaMeher/alumni_profile_matching

Folders and files

Latest commit

History

Repository files navigation

PROJECT : ALUMNI PROFILE MATCHING

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 4

Languages

Packages