This project is part of my studies at HSE's Data Science and Business Analytics (DSBA) program. The analysis explores the Student Performance Factors dataset from Kaggle, focusing on the various factors that may influence student outcomes. The objective is to uncover meaningful insights and patterns within the data that could help in understanding what factors contribute most to student success.
Dataset: Student Performance Factors on Kaggle
- Project Overview
- Dataset Description
- Installation and Setup
- Usage
- Analysis Summary
- Project Structure
- Contributions
- License
The dataset includes various student-related features, such as demographic information, academic background, and other factors. These attributes can provide insights into potential performance outcomes.
- Hours_Studied: Number of hours spent studying per week.
- Attendance: Percentage of classes attended.
- Parental_Involvement: Level of parental involvement in the student's education (Low, Medium, High).
- Access_to_Resources: Availability of educational resources (Low, Medium, High).
- Extracurricular_Activities: Participation in extracurricular activities (Yes, No).
- Sleep_Hours: Average number of hours of sleep per night.
- Previous_Scores: Scores from previous exams.
- Motivation_Level: Student's level of motivation (Low, Medium, High).
- Internet_Access: Availability of internet access (Yes, No).
- Tutoring_Sessions: Number of tutoring sessions attended per month.
- Family_Income: Family income level (Low, Medium, High).
- Teacher_Quality: Quality of the teachers (Low, Medium, High).
- School_Type: Type of school attended (Public, Private).
- Peer_Influence: Influence of peers on academic performance (Positive, Neutral, Negative).
- Physical_Activity: Average number of hours of physical activity per week.
- Learning_Disabilities: Presence of learning disabilities (Yes, No).
- Parental_Education_Level: Highest education level of parents (High School, College, Postgraduate).
- Distance_from_Home: Distance from home to school (Near, Moderate, Far).
- Gender: Gender of the student (Male, Female).
- Exam_Score: Final exam score.
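Many of the columns above are text categories with a natural order (Low < Medium < High) or simple Yes/No flags. A minimal sketch of turning them into numbers for correlation or modelling, assuming the level orderings implied by the value sets listed above; the helper name `encode_ordinal` and the sample rows are hypothetical and not taken from the notebook or the real data:

```python
import pandas as pd

# Assumed orderings, based on the value sets in the dataset description.
LOW_MED_HIGH = {"Low": 0, "Medium": 1, "High": 2}
ORDINAL_MAPS = {
    "Parental_Involvement": LOW_MED_HIGH,
    "Access_to_Resources": LOW_MED_HIGH,
    "Motivation_Level": LOW_MED_HIGH,
    "Family_Income": LOW_MED_HIGH,
    "Teacher_Quality": LOW_MED_HIGH,
    "Peer_Influence": {"Negative": 0, "Neutral": 1, "Positive": 2},
    "Parental_Education_Level": {"High School": 0, "College": 1, "Postgraduate": 2},
    "Distance_from_Home": {"Near": 0, "Moderate": 1, "Far": 2},
}
BINARY_COLUMNS = ["Extracurricular_Activities", "Internet_Access", "Learning_Disabilities"]


def encode_ordinal(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with ordered categories mapped to integers and Yes/No mapped to 1/0."""
    out = df.copy()
    for col, mapping in ORDINAL_MAPS.items():
        if col in out.columns:
            # Labels missing from the mapping (and existing NaNs) become NaN.
            out[col] = out[col].map(mapping)
    for col in BINARY_COLUMNS:
        if col in out.columns:
            out[col] = out[col].map({"No": 0, "Yes": 1})
    return out


# Hand-made rows for illustration only, not records from the Kaggle dataset.
sample = pd.DataFrame({
    "Motivation_Level": ["Low", "High", "Medium"],
    "Internet_Access": ["Yes", "No", "Yes"],
    "Distance_from_Home": ["Far", "Near", "Moderate"],
})
print(encode_ordinal(sample))
```

Mapping to integers keeps the order information in a single column, which suits ordered levels better than one indicator column per level.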
The goal is to identify significant predictors of student performance using exploratory data analysis and possibly some machine learning techniques.
To get started, clone this repository and make sure you have Jupyter Notebook installed, along with the required libraries such as pandas, numpy, matplotlib, and seaborn.
- Clone the repository:
  git clone https://github.com/MattoYuzuru/student_performance_factors_analysis.git
- Navigate to the project folder:
  cd student_performance_factors_analysis
- Install the dependencies:
  pip install -r requirements.txt
- Open the Jupyter Notebook:
  jupyter notebook main.ipynb
Alternatively, open main.ipynb in your preferred IDE and run it there.
- Load the Data: The notebook provides code to load and inspect the dataset.
- Run Analysis Cells: Execute the cells in order to explore and analyze the data step by step. The notebook includes sections for:
  - Data Cleaning and Preprocessing
  - Exploratory Data Analysis (EDA)
  - Feature Engineering
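The README does not say how the notebook handles columns without a natural order. One common feature-engineering step, shown here as an assumption rather than the notebook's actual code, is one-hot encoding with `pandas.get_dummies`; the helper `one_hot_nominal` and the sample rows are hypothetical:

```python
import pandas as pd

# Columns treated as unordered (nominal); this choice is an assumption.
NOMINAL_COLUMNS = ["School_Type", "Gender"]


def one_hot_nominal(df: pd.DataFrame) -> pd.DataFrame:
    """Replace each nominal column with 0/1 indicator columns."""
    present = [c for c in NOMINAL_COLUMNS if c in df.columns]
    # drop_first=True keeps k-1 indicators per feature; the dropped level is implied
    # when all remaining indicators are 0, so no information is lost.
    return pd.get_dummies(df, columns=present, drop_first=True, dtype=int)


# Hand-made rows for illustration only.
sample = pd.DataFrame({
    "School_Type": ["Public", "Private", "Public"],
    "Gender": ["Male", "Female", "Female"],
    "Hours_Studied": [10, 20, 15],
})
print(one_hot_nominal(sample))
```

Dropping one indicator per feature avoids perfectly collinear columns, which matters if a linear model is fitted later.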
This section will include a summary of key findings from the analysis, including:
- Significant correlations between study habits and performance
- Visualization of the impact of attendance on exam scores
- Exploration of demographic influences on student success
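A first look at "correlations between study habits and performance" can be a ranking of numeric columns by their Pearson correlation with Exam_Score. A minimal sketch, assuming the categorical columns have already been encoded or are simply skipped; the function name and the sample values are hypothetical, not results from the analysis:

```python
import pandas as pd


def correlations_with_target(df: pd.DataFrame, target: str = "Exam_Score") -> pd.Series:
    """Pearson correlation of every numeric column with the target, strongest first."""
    numeric = df.select_dtypes(include="number")
    corr = numeric.corr()[target].drop(target)
    # Rank by absolute strength but keep the sign in the returned values.
    return corr.reindex(corr.abs().sort_values(ascending=False).index)


# Made-up values for illustration only.
sample = pd.DataFrame({
    "Hours_Studied": [5, 10, 15, 20, 25],
    "Attendance": [90, 70, 85, 60, 95],
    "Exam_Score": [60, 64, 69, 72, 78],
})
print(correlations_with_target(sample))
```

Correlation only measures linear association, so a scatter plot of each top-ranked feature against Exam_Score is a useful follow-up.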
This is my first analysis project. I tried to keep the structure simple, but it may still look a bit scattered.
- After loading the dataset, I cleaned the cells containing NaN (missing) values.
- Before forming any hypotheses, I carried out EDA.
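The README does not specify how the NaN cells were cleaned. A minimal sketch of one option, assuming rows with missing values are simply dropped (filling them, for example with each column's most frequent value, is a common alternative); the helper names and sample rows are hypothetical:

```python
import numpy as np
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.Series:
    """Count missing values per column, keeping only columns that have any."""
    counts = df.isna().sum()
    return counts[counts > 0]


def drop_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Drop every row with at least one NaN and renumber the index from 0."""
    return df.dropna().reset_index(drop=True)


# Hand-made rows for illustration only.
sample = pd.DataFrame({
    "Teacher_Quality": ["High", np.nan, "Low", "Medium"],
    "Distance_from_Home": ["Near", "Far", np.nan, "Near"],
    "Exam_Score": [67, 61, 74, 70],
})
print(missing_report(sample))
print(drop_missing(sample))
```

Checking the report first shows how many rows dropping would cost; if only a small share of rows is affected, dropping them is usually the simpler choice.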
Contributions are welcome! Please feel free to submit a pull request or open an issue if you have suggestions for improving the analysis or visualizations.
This project is licensed under the MIT License. See the LICENSE file for details.