Datathon

Repository for DSA Datathon

GW Data Science Association Spring Datathon Guidelines

Dates:
● GitHub Repository with the Report due at 5 PM on Thursday, March 30th
● In-Person Presentation on Saturday, April 1st at 11 AM
● Attend Data Science Week events to earn bonus points. Each team/participant can receive up to 10 bonus points. Event locations and times are posted on our Instagram, @gwu_dsa.
○ Trivia Night (March 27, 2023): +2 points
○ Data Science Working Session (March 28, 2023): +3 points
■ Please Read: This time is provided for you to meet with your group and to ask questions about setup/deliverables. We encourage you to come to this working session because a professor will be there to assist you.
○ Game Night & Networking (March 29, 2023): +3 points
○ DSA Logo Competition (March 30, 2023): +2 points
○ Guest Speaker Event (March 31, 2023): +5 points
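
The five events above add up to 15 points, while the stated maximum is 10. The sketch below illustrates one reading of that rule: points from attended events are summed and then capped at 10. The capping behavior and the `bonus_points` helper are assumptions made for illustration, not an official scoring script.

```python
# Bonus points per event, taken from the schedule above.
EVENT_POINTS = {
    "Trivia Night": 2,
    "Data Science Working Session": 3,
    "Game Night & Networking": 3,
    "DSA Logo Competition": 2,
    "Guest Speaker Event": 5,
}
MAX_BONUS = 10  # stated cap per team/participant


def bonus_points(attended):
    """Sum points for attended events, capped at MAX_BONUS (assumed rule)."""
    # set() ensures an event listed twice is only counted once.
    earned = sum(EVENT_POINTS[name] for name in set(attended))
    return min(earned, MAX_BONUS)


print(bonus_points(["Trivia Night", "Guest Speaker Event"]))  # 2 + 5 = 7
print(bonus_points(list(EVENT_POINTS)))  # all five events: 15 earned, capped at 10
```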

Data: You can access the dataset for Datathon at this link
Problem Statement:
● We are interested in identifying, understanding, and predicting the drinkability of water based on its features. Your task is to:
○ Identify patterns and show how different water qualities may be related
○ Build a model that predicts the drinkability of water as those qualities change
● Please cite any additional research done to support your conclusions
Report: The report, in the form of a Jupyter notebook or Google Colab notebook, should include the following sections:
● Abstract: Summarize the key findings
● Pipeline: Implement the full pipeline of the project, including:
○ Data cleaning/preprocessing
○ Exploratory data analysis (EDA)
○ Hyperparameter tuning and/or model selection
○ Interpretation, e.g., each feature’s predictive power over the target
○ Discussion/Conclusion
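
The pipeline stages listed for the report can be sketched roughly as follows with pandas and scikit-learn. This is only a minimal illustration under stated assumptions: the data is synthetic, the column names (`ph`, `hardness`, `turbidity`, `potable`) are hypothetical stand-ins for whatever the Datathon dataset actually contains, and the random forest is one possible model choice rather than a recommended one.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; all column names here are hypothetical.
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "ph": rng.normal(7.0, 1.5, n),
    "hardness": rng.normal(200.0, 30.0, n),
    "turbidity": rng.normal(4.0, 1.0, n),
})
df["potable"] = ((df["ph"] - 7.0).abs() < 1.0).astype(int)
df.loc[rng.choice(n, 20, replace=False), "ph"] = np.nan  # simulate missing values

# EDA: missing-value counts and pairwise correlations.
print(df.isna().sum())
print(df.corr().round(2))

X = df.drop(columns="potable")
y = df["potable"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Cleaning/preprocessing and the model live in one pipeline so that
# imputation and scaling are fit on training folds only.
pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("model", RandomForestClassifier(random_state=0)),
])

# Hyperparameter tuning via cross-validated grid search.
search = GridSearchCV(
    pipeline,
    param_grid={"model__n_estimators": [50, 100], "model__max_depth": [3, None]},
    cv=3,
)
search.fit(X_train, y_train)

# Interpretation: held-out accuracy and per-feature importance.
test_accuracy = search.score(X_test, y_test)
importances = pd.Series(
    search.best_estimator_.named_steps["model"].feature_importances_,
    index=X.columns,
).sort_values(ascending=False)
print(search.best_params_, round(test_accuracy, 3))
print(importances)
```

Keeping the preprocessing steps inside the `Pipeline` means the grid search refits them on each training fold, which avoids leaking information from the validation folds into the imputer or scaler.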
Presentation:
● Prepare a 5-10 minute presentation, which should include a focus on the interpretation and a discussion of the report
● PowerPoint Presentation
○ The slides can include data cleaning and preprocessing steps, feature engineering, EDA, modeling, etc.
○ Highlight any interesting patterns and insights
○ Explain why you chose the model and the steps you took to increase its performance
○ Make sure not to exceed the time limit
Deliverables:
● Each team should submit a Jupyter notebook or Google Colab file and include:
○ The GitHub Repo link of the report
● Email the submission to arugupta17@gwu.edu by 5 PM on Thursday, March 30th
Judging Criteria:
● In addition to evaluating the GitHub repository organization and the in-person presentation, the judges will score each team/participant using the rubric below:

| Category | 4 - Excellent | 3 - Good | 2 - Acceptable | 1 - Needs Improvement |
| --- | --- | --- | --- | --- |
| Data Cleaning/Preprocessing | Techniques used are effective and well thought out so as not to remove important information | Most of the data is cleaned and preprocessed | Better data cleaning techniques could have been used | No data cleaning or preprocessing was done |
| Visualizations | Visuals tell the story about the data and are appropriately chosen | Visuals partially describe the data | Visuals don’t fully describe the data and are not the best choice | Visuals are not related to the data at all |
| Models | The appropriate model is chosen and participants maximized its potential | Participants seem mostly knowledgeable about why they chose the model and have used it correctly | The model chosen may not have been the best one, but one was implemented | No modeling was attempted |
| Analysis | A thorough discussion of the findings and the impact they can create | A brief discussion of the findings but limited analysis of how they are helpful | Minor discussion that lacks clarity and effort | Results were simply stated |

Prizes:
● Prize money will be awarded to the three highest-scoring teams: $300 for 1st place, $200 for 2nd place, and $100 for 3rd place.

If you have any questions, please feel free to email Aru Gupta at arugupta17@gwu.edu.
