Hi, I'm Jan from Germany. Here is a selection of my Python, SQL and data science projects.
Below is a short overview of my education and work experience.
Physics & Economy, Diploma
- IT project manager for a team of 20 developers, working with Scrum, Kanban and modern DevOps practices
- Business process management for customer-facing service processes and internal supporting processes with data mining experience in SQL
- Quality management and business intelligence tasks with SQL and PowerBI
- Owned a small company for database/cloud consulting and analytics, focused on performance benchmarking. Responsible for all entrepreneurial tasks such as funding, staffing, OKRs, financial planning, marketing and international sales.
As you can see, I'm not a formally trained software engineer or data scientist. I came to data science through my passion for mathematics and data, and through several years of working in analytical jobs. While I learned Java and C at university, I taught myself Python for my data science projects.
What sets me apart from many other data scientists is my experience with and understanding of business contexts and processes. The expected business impact determines the goal and the justified effort of any IT project.
Let's have a look at my recent projects.
- [Python, Postgres DBMS, SQL, Streamlit, MistralAI Systems]
- A passion project of mine: transferring my fundamental value investing approach from Excel to Python, SQL and an AI system.
- The project is still under construction, but the first steps and the roadmap can be found here: Link
- [Python, Tensorflow, Keras]
- TensorFlow model on a LendingClub loan dataset, including data analysis, data cleaning and data preparation. Link
- [Python, PyTorch, Matplotlib]
- PyTorch neural network, built with classes and functions, for the MNIST dataset. Evaluating hidden layers, activation functions, learning rates, loss functions, optimization functions and epochs to understand how each affects NN performance. Link
- [Python, numpy]
- Building a 3-layer neural network with numpy step by step, with forward propagation and backpropagation using numpy matrices. Link
- Also creating a reusable class and functions for this 3-layer neural network. Link
- [Python, sklearn, pandas, CountVectorizer, MistralAI]
- Building a test dataset with MistralAI, a European ChatGPT alternative, to train a simple chatbot based on a bag-of-words model and a DecisionTree that identifies the category and possible answers. Link
- [Python, pandas, sklearn, seaborn]
- Multidimensional KMeans cluster segmentation on "University data" with PCA. Link
- [Python, pandas, sklearn, seaborn]
- Multidimensional KMeans cluster segmentation on Kaggle dataset "Mall_Customers" with PCA and 3d plotting. Link
- [Python, pandas, sklearn, seaborn]
- Creating random customer data, then visualizing and clustering it with KMeans. Identifying the optimal number of clusters with the elbow method and the silhouette score. Link
- [Python, sklearn, pandas, matplotlib]
- Comparing the results of several supervised learning algorithms (DecisionTree, LogisticRegression, KNearestNeighbor, NaiveBayes) on identical test data. Using different scaling algorithms and optimization approaches to fine-tune accuracy. Link
- [Python, sklearn, pandas]
- Identifying digit images from the MNIST dataset and using GaussianMixture to generate new images. Link
- [Python, pandas, sklearn, seaborn]
- Small training project that classifies random product features into pricing classes with RandomForestClassifier. Link
- [Python, pandas, sklearn, seaborn]
- Straightforward logistic regression to detect tumors based on features. Link
- [Python, pandas, sklearn, seaborn]
- Cleaning data and fitting a polynomial regression on salary data. Link
- [Python, sklearn]
- Using K-Fold cross-validation to identify the best polynomial regression degree. Link
- [Python, pandas, seaborn]
- Data cleaning and data analytics as preparation for training a neural network on a large loan dataset. Link
- [Python, pandas, plotly]
- Extensive data analytics project with pandas: data cleaning, data grouping and pivoting. Interactive visualisations with plotly. Link
- Note: The Plotly graphs can be found in the "Results" folder, because they are not displayed correctly in GitHub's notebook view.
- [Python, pandas, requests, seaborn]
- Using the FinancialModelingAPI to get stock data for Hensoldt, Rheinmetall and other defense stocks for reindexing, return analysis and momentum visualisation. Link
- [Python, pandas, seaborn]
- Simple data cleaning project as a warm-up exercise: datetime corrections, filling NA values, handling outliers and duplicates. Link
- [Python, pandas, sklearn]
- Calculating feature entropy and information gain together with basic statistical information about the dataset. Link
- [Python]
- Visualizing the distance methods used for clustering in dendrograms. Link
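
To give a flavour of the kind of calculation behind the entropy and information gain project listed above, here is a minimal sketch in plain Python. It is not taken from the linked notebook: the function names `entropy` and `information_gain` and the toy data are assumptions made purely for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def information_gain(feature_values, labels):
    """Entropy reduction obtained by splitting `labels` on `feature_values`."""
    total = len(labels)
    # Group the labels by the feature value they occur with.
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the groups after the split.
    weighted = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - weighted


# Hypothetical toy data, invented for this example only.
weather = ["sunny", "sunny", "rainy", "rainy"]
buys = ["yes", "yes", "no", "no"]
noise = ["a", "b", "a", "b"]

print(entropy(buys))                    # 1.0
print(information_gain(weather, buys))  # 1.0
print(information_gain(noise, buys))    # 0.0
```

In this toy data, `weather` separates the labels perfectly and therefore removes the full bit of uncertainty, while `noise` is unrelated to the label and yields no gain.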