A data analysis project examining factors that influence student exam scores, including study time, attendance, and sleep patterns.
This project analyzes student performance data to identify key factors affecting academic success. The analysis includes data cleaning, statistical analysis, and visualization to provide actionable insights.
The dataset contains 1000 student records with the following variables:
- hours_studied: Daily study hours
- previous_score: Previous exam score
- attendance: Attendance percentage
- sleep_hours: Average sleep hours per night
- internet_usage: Daily internet usage in hours
- final_score: Final exam score (0-100)
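As a quick sanity check on the variables listed above, a minimal sketch like the following could confirm that a loaded table has the expected columns. The column names come from the list; the helper `check_columns` and the sample rows are hypothetical and are not taken from `analysis.py`.

```python
import pandas as pd

# Column names as listed in the dataset description.
EXPECTED_COLUMNS = [
    "hours_studied", "previous_score", "attendance",
    "sleep_hours", "internet_usage", "final_score",
]

def check_columns(df):
    """Return the expected columns that are missing from df (hypothetical helper)."""
    return [col for col in EXPECTED_COLUMNS if col not in df.columns]

# Invented sample rows; in the project the table would presumably come from
# pd.read_csv("student_performance.csv").
sample = pd.DataFrame({
    "hours_studied": [2.5, 4.0],
    "previous_score": [70, 85],
    "attendance": [80.0, 95.0],
    "sleep_hours": [7.0, 6.5],
    "internet_usage": [3.0, 1.5],
    "final_score": [72, 91],
})

print(check_columns(sample))
```

An empty list means every expected column is present.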
Install the required packages:

    pip install pandas matplotlib numpy seaborn jupyter

For quick analysis and results:

    python analysis.py

This will generate:
- Cleaned dataset (student_performance_cleaned.csv)
- Visualization charts (student_analysis_charts.png)
- Statistical summary in console
For interactive analysis with detailed visualizations:

    jupyter notebook

Then open student_analysis.ipynb and run all cells.
Data cleaning:
- Handling missing values
- Removing invalid data
- Data validation
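The three cleaning steps above could look roughly like this in pandas. This is only a sketch: the median fill, the valid ranges (0-100 for scores and attendance, 0-24 for daily hours), and the function name `clean` are assumptions, not the rules used in `analysis.py`.

```python
import numpy as np
import pandas as pd

def clean(df):
    """Hypothetical cleaning pass: fill gaps, drop out-of-range rows, validate."""
    cleaned = df.copy()

    # Handling missing values: drop rows without a final score, then fill the
    # remaining numeric gaps with each column's median (an assumed strategy).
    cleaned = cleaned.dropna(subset=["final_score"])
    cleaned = cleaned.fillna(cleaned.median(numeric_only=True))

    # Removing invalid data: keep only rows inside the assumed valid ranges.
    valid = (
        cleaned["final_score"].between(0, 100)
        & cleaned["attendance"].between(0, 100)
        & cleaned["hours_studied"].between(0, 24)
    )
    cleaned = cleaned[valid].reset_index(drop=True)

    # Data validation: nothing missing should remain.
    assert not cleaned.isna().any().any()
    return cleaned

# Invented rows that exercise each rule.
raw = pd.DataFrame({
    "hours_studied": [2.0, np.nan, 5.0, 30.0, 3.0],
    "attendance": [80.0, 90.0, 120.0, 70.0, 60.0],
    "final_score": [75.0, 88.0, 92.0, 65.0, np.nan],
})

result = clean(raw)
print(result)
```

In this sample, the row with no final score, the row with 120% attendance, and the row with 30 study hours are dropped, leaving two rows.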
Statistical analysis:
- Average, maximum, and minimum scores
- Pass/fail rates (passing grade: 60)
- Study hours analysis
- Attendance patterns
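The score summary and pass rate listed above can be computed in a few lines. The passing grade of 60 is from this README; treating a score of exactly 60 as a pass, and the helper name `summarize`, are assumptions.

```python
import pandas as pd

PASSING_GRADE = 60  # passing grade stated in this README

def summarize(scores):
    """Return average, maximum, minimum, and pass rate for a Series of scores."""
    passed = scores >= PASSING_GRADE  # assumption: exactly 60 counts as a pass
    return {
        "average": round(float(scores.mean()), 2),
        "maximum": float(scores.max()),
        "minimum": float(scores.min()),
        "pass_rate_pct": round(float(passed.mean()) * 100, 1),
    }

# Invented scores for illustration only.
scores = pd.Series([55, 60, 72, 88, 95])
summary = summarize(scores)
print(summary)
```

With the five sample scores, the average is 74.0 and four of five students pass, a pass rate of 80.0%.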
Python Script (4 charts):
- Study Hours vs Exam Score
- Pass vs Fail Distribution
- Attendance vs Score
- Score Distribution
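One way to lay out the four script charts is a 2x2 matplotlib grid. The chart titles are from the list above; the chart types (scatter, bar, histogram), the figure size, and the function name `build_charts` are assumptions, and the actual script may draw them differently.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

def build_charts(df, passing_grade=60):
    """Build a 2x2 figure with the four charts named in this README."""
    fig, axes = plt.subplots(2, 2, figsize=(12, 9))

    axes[0, 0].scatter(df["hours_studied"], df["final_score"])
    axes[0, 0].set_title("Study Hours vs Exam Score")

    passed = df["final_score"] >= passing_grade
    axes[0, 1].bar(["Pass", "Fail"], [int(passed.sum()), int((~passed).sum())])
    axes[0, 1].set_title("Pass vs Fail Distribution")

    axes[1, 0].scatter(df["attendance"], df["final_score"])
    axes[1, 0].set_title("Attendance vs Score")

    axes[1, 1].hist(df["final_score"], bins=10)
    axes[1, 1].set_title("Score Distribution")

    fig.tight_layout()
    return fig

# Invented rows for illustration only.
demo = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5, 6],
    "attendance": [60, 70, 75, 80, 90, 95],
    "final_score": [55, 65, 70, 85, 90, 95],
})

fig = build_charts(demo)
# To write the image to disk: fig.savefig("student_analysis_charts.png")
```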
Jupyter Notebook (10+ charts):
- All script visualizations plus:
- Correlation heatmap
- Box plots by study groups
- Violin plots
- Pair plots
- Missing data visualization
Areas of analysis:
- Impact of study time on performance
- Relationship between attendance and scores
- Pass rate analysis
- Correlation analysis
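For the correlation analysis, a minimal sketch is to rank each variable by its Pearson correlation with `final_score`. The helper name `score_correlations` and the sample rows are hypothetical; the notebook may use a different method or a full correlation matrix.

```python
import pandas as pd

def score_correlations(df, target="final_score"):
    """Pearson correlation of every other column with the target, strongest first."""
    corr = df.corr()[target].drop(target)  # DataFrame.corr defaults to Pearson
    return corr.sort_values(key=abs, ascending=False)

# Invented rows for illustration only.
demo = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5],
    "sleep_hours": [6, 8, 7, 5, 9],
    "internet_usage": [6, 5, 5, 3, 2],
    "final_score": [50, 60, 70, 80, 90],
})

ranking = score_correlations(demo)
print(ranking)
```

Sorting by absolute value keeps strong negative relationships (for example, heavy internet usage in this invented sample) near the top alongside strong positive ones.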
Key insights from the analysis:
- Students studying more than 3 hours per day score significantly higher
- Attendance above 75% correlates with better performance
- Overall pass rate: 98%
- Average score: 88.49
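Threshold-based insights such as the 3-hour and 75% comparisons above can be checked by splitting students into two groups and comparing mean scores. The thresholds are from this README; the helper `compare_groups` and the sample rows are hypothetical, and the figures printed here are not the project's results.

```python
import pandas as pd

def compare_groups(df, column, threshold, target="final_score"):
    """Compare mean target values for rows above vs. at or below a threshold."""
    above = df.loc[df[column] > threshold, target]
    below = df.loc[df[column] <= threshold, target]
    return {
        "above_mean": float(above.mean()),
        "below_mean": float(below.mean()),
        "difference": float(above.mean() - below.mean()),
    }

# Invented rows for illustration only.
demo = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5, 6],
    "attendance": [60, 70, 75, 80, 90, 95],
    "final_score": [60, 65, 70, 85, 90, 95],
})

study = compare_groups(demo, "hours_studied", 3)
attend = compare_groups(demo, "attendance", 75)
print(study)
print(attend)
```

A difference in group means alone does not establish statistical significance; a formal test would be needed to support that claim.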
See REPORT.md for detailed findings and recommendations.
.
├── analysis.py # Python script for analysis
├── student_analysis.ipynb # Jupyter notebook
├── student_performance.csv # Original dataset
├── student_performance_cleaned.csv # Cleaned dataset (generated)
├── student_analysis_charts.png # Visualizations (generated)
├── README.md # This file
└── REPORT.md # Detailed analysis report
Feel free to fork this project and submit pull requests for improvements.
This project is open source and available for educational purposes.
Author: Biruk D. (GitHub: @birukd1)