To transition my career from Chemist & Data Automation Specialist to Data Scientist, I followed the self-guided curriculum laid out by DataQuest.io. I skipped the first course because I was already very strong in the basics of Python: when I began, I had 2+ years of experience with Python/SQLite data extraction, cleansing, and analysis. The code for each DataQuest.io course is included in this repo.
Unfortunately, the raw CSV files used in the courses far exceed the file size limit imposed by GitHub. Therefore, I have added *.csv to the .gitignore file, so the data files are not part of this repo.
- NumPy
- Pandas
- Jupyter Notebook
- matplotlib
- seaborn
- basemap
- regular expressions (re)
- Running a Linux VM
- Navigation
- Working with files
- Running python scripts from the command line
- Piping & redirecting output
- csv toolkit
- git
- git remotes (GitHub, .gitignore)
- APIs (requests)
- JSON (JavaScript Object Notation)
- Authentication (OAuth2)
- Web Scraping (BeautifulSoup)
- SQL (Joins, WITH VIEW, UNION, INTERSECT, EXCEPT, etc.)
- SQLite (sqlite3)
- Database normalization
- PostgreSQL (psycopg2, PostgreSQL Command-line, .pgpass)
- Database indexing
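
As a minimal sketch of the SQLite topics listed above (a join plus an index, driven through `sqlite3`), assuming a hypothetical two-table schema and made-up rows that are not taken from the course material:

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        department_id INTEGER REFERENCES departments(id)
    );
    -- Index the foreign key so the join can look up rows by department.
    CREATE INDEX idx_employees_department ON employees(department_id);
""")
conn.executemany(
    "INSERT INTO departments VALUES (?, ?)",
    [(1, "Chemistry"), (2, "Data")],
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "Ada", 2), (2, "Marie", 1), (3, "Grace", 2)],
)

# Count employees per department with an inner join.
rows = conn.execute("""
    SELECT d.name, COUNT(e.id) AS headcount
    FROM departments AS d
    JOIN employees AS e ON e.department_id = d.id
    GROUP BY d.name
    ORDER BY d.name
""").fetchall()
conn.close()

print(rows)
```

Splitting departments into their own table is a small instance of normalization: each department name is stored once and referenced by id.
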
- Standard Deviation & Correlation
- Linear Regression
- Distributions & Sampling
- Probabilities
- Probability Distributions
- Chi-Squared Tests
- Multi Category Chi-Squared Tests
- Major Python Libraries Learned/Utilized in the Probability & Statistics Course:
- scipy.stats
- skew
- kurtosis
- norm
- pearsonr
- linregress
- binom
- chisquare
- chi2_contingency
- linspace (note: this is numpy.linspace; older SciPy releases re-exported it at the top level of scipy, not in scipy.stats)
- math
- functools
- operator
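
To illustrate how `math`, `functools`, and `operator` from the list above can fit together, here is a small sketch that computes a binomial probability by hand. The helper names `n_choose_k` and `binomial_pmf` are hypothetical, chosen for this example; `scipy.stats.binom.pmf` should give matching values.

```python
import math
from functools import reduce
from operator import mul


def n_choose_k(n, k):
    """Number of ways to choose k items from n (hypothetical helper)."""
    k = min(k, n - k)  # use the symmetry C(n, k) == C(n, n - k)
    if k < 0:
        return 0
    # Product of the k largest factors of n!, divided by k!.
    numerator = reduce(mul, range(n - k + 1, n + 1), 1)
    return numerator // math.factorial(k)


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return n_choose_k(n, k) * p**k * (1 - p) ** (n - k)


# Probability of exactly 3 heads in 10 fair coin flips.
prob = binomial_pmf(3, 10, 0.5)
print(prob)
```
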
- Fundamentals
- Introduction to K-Nearest Neighbors
- Evaluating Model Performance
- Multivariate K-Nearest Neighbors
- Hyperparameter Optimization
- Cross Validation
- Calculus For Machine Learning
- Understanding Linear & Nonlinear Functions
- Understanding Limits
- Finding Extreme Points
- Linear Algebra For Machine Learning
- Linear Systems
- Vectors
- Matrix Algebra
- Solution Sets
- Linear Regression For Machine Learning
- The Linear Regression Model
- Feature Selection
- Gradient Descent
- Ordinary Least Squares
- Processing & Transforming Features
- Machine Learning in Python Intermediate Course
- Logistic Regression
- Binary Classifiers
- Multiclass Classification
- Intermediate Linear Regression
- Overfitting
- Clustering Basics
- K-Means Clustering
- Gradient Descent
- Intro to Neural Networks
- Decision Trees
- Entropy
- Information Gain
- ID3 Algorithm
- Applying & Tweaking Decision Trees
- Random Forests
- Machine Learning Final Project
- Data Cleaning
- Preparing the features
- Making Predictions
- Major Python Libraries Learned in the Machine Learning Course
- scipy.spatial
- distance
- sklearn.neighbors
- KNeighborsRegressor
- sklearn.cluster
- KMeans
- sklearn.linear_model
- LinearRegression
- LogisticRegression
- sklearn.metrics
- mean_squared_error
- sklearn.metrics.pairwise
- euclidean_distances
- sklearn.model_selection
- cross_val_score
- KFold
- SymPy
- symbols
- limit
- NumPy
- linalg.inv
- linalg.det
- dot
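
As a rough sketch of how the NumPy routines listed above (`linalg.inv`, `linalg.det`, `dot`) relate to the Ordinary Least Squares topic, the snippet below solves the normal equations for a straight-line fit. The data is made up for illustration and lies exactly on y = 1 + 2x.

```python
import numpy as np

# Made-up data on the line y = 1 + 2x, for illustration only.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])

# Design matrix: a column of ones (intercept) next to the feature column.
X = np.column_stack([np.ones_like(x), x])

# Normal equations: coefficients = (X^T X)^-1 X^T y
XtX = np.dot(X.T, X)
if np.isclose(np.linalg.det(XtX), 0.0):
    raise ValueError("X^T X is singular, so the inverse does not exist")
coefficients = np.dot(np.linalg.inv(XtX), np.dot(X.T, y))

print(coefficients)  # intercept first, then slope
```

Inverting the matrix explicitly keeps the example close to the listed functions; in practice `np.linalg.solve` or `np.linalg.lstsq` is usually the more numerically stable choice.
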