Welcome to my GitHub!
My name is Jonathan Daniel. I'm an aspiring Data Scientist with a deep interest in uncovering insights, solving real-world problems, and using data to support better decisions. I'm transitioning into tech and building a strong foundation in Data Analytics, Python, and Machine Learning through study and hands-on projects.
- Data exploration and visualisation using Power BI
- Writing efficient queries and transformations with SQL
- Building predictive models and performing exploratory data analysis (EDA) in Python
- Documenting and sharing end-to-end workflows for learning and collaboration
| Project Title | Description | Tools Used |
|---|---|---|
| Stroke Risk Prediction | Predicts stroke risk based on health and lifestyle data | Python (scikit-learn, pandas, matplotlib and seaborn) |
| Calorie Expenditure Prediction | Streamlit app powered by a machine learning model that estimates the calories burned during an exercise session from its duration and other relevant factors | Python and Streamlit |
| Return to Space Challenge | Uses space mission data from 1957 to 2022 to tell the thrilling story of humanity’s journey to the stars | Power BI |
⚡ More projects coming soon...
- Use data to support better business decisions and everyday activities.
- Launch a career in Data Science.
- Contribute to open-source or socially impactful data projects.
Programming & Data Analysis
- Python: NumPy, Pandas, Matplotlib, Seaborn, Plotly
- Scikit-learn: Regression, Classification, Model Evaluation, Hyperparameter Tuning
- Streamlit: Interactive dashboards & ML app deployment
Data Visualisation & BI
- Power BI: DAX, Power Query, Interactive Reports, Power Pivot
- Excel: Data Cleaning, Formulas, Pivot Tables, Dashboard Reporting
Databases & Querying
- SQL: Table Creation & Schema Design, Data Extraction, Joins, Aggregations, Filtering, Window Functions
- Relational Databases: MySQL, PostgreSQL
Machine Learning & AI
- Supervised Learning: Linear/Logistic Regression, Decision Trees, Random Forest, Gradient Boosting
- Anomaly Detection: Isolation Forest & DBSCAN
- Unsupervised Learning: Clustering with KMeans, Hierarchical Clustering, DBSCAN
- Model Interpretation: SHAP, Permutation Importance
- Pipeline Implementation: Preprocessing, Feature Engineering & Model Training
Version Control & Collaboration
- Git & GitHub: Branching, Pull Requests, Project Documentation
Other Tools
- Jupyter Notebook & VS Code for experimentation and development
- Render & GitHub Pages for deployment
I'm always excited to connect with fellow data enthusiasts, so feel free to reach out.
Let's learn and grow together on this data analysis journey! If you have questions or suggestions, or would like to collaborate on a project, please don't hesitate to get in touch.