Data Science Massive Open Online Course

Below is a list of topics covered in meticulous detail.

To transition my career from Chemist & Data Automation Specialist to Data Scientist, I followed the self-guided curriculum laid out by DataQuest.io. I skipped the first course because I was already comfortable with the basics of Python: when I began, I had 2+ years of experience in Python/SQLite data extraction, cleansing, and analysis. The code for each DataQuest.io course is included in this repo.

Unfortunately, the raw CSV files used are too numerous and too large for the file size limit imposed by GitHub. Therefore, I've added *.csv to the .gitignore file.

2. Data Analysis, Visualization, & Cleaning

  • NumPy
  • Pandas
  • Jupyter Notebook
  • matplotlib
  • seaborn
  • basemap
  • regular expressions (re)

3. Linux Command Line

  • Running a Linux VM
  • Navigation
  • Working with files
  • Running Python scripts from the command line
  • Piping & redirecting output
  • csv toolkit
  • git
  • git remotes (GitHub, .gitignore)

4. Working with Data Sources

  • APIs (requests)
  • JSON (JavaScript Object Notation)
  • Authentication (OAuth2)
  • Web Scraping (BeautifulSoup)
  • SQL (Joins, WITH, VIEW, UNION, INTERSECT, EXCEPT, etc.)
  • SQLite (sqlite3)
  • Database normalization
  • PostgreSQL (psycopg2, PostgreSQL Command-line, .pgpass)
  • Database indexing
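
As a rough illustration of several of the SQL topics above (joins, a WITH clause, EXCEPT, and an index), here is a minimal sketch using the standard-library `sqlite3` module. The `authors` and `books` tables and their contents are hypothetical and are not taken from the course data.

```python
import sqlite3

# Hypothetical in-memory schema, used purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE books (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        author_id INTEGER REFERENCES authors(id)
    );
    -- Index on the foreign key that the join below filters on.
    CREATE INDEX idx_books_author ON books(author_id);
    INSERT INTO authors VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
    INSERT INTO books VALUES (1, 'Notes', 1), (2, 'Compilers', 2), (3, 'Engines', 1);
""")

# INNER JOIN wrapped in a WITH clause (common table expression).
book_counts = conn.execute("""
    WITH counts AS (
        SELECT a.name AS name, COUNT(b.id) AS n_books
        FROM authors AS a
        INNER JOIN books AS b ON b.author_id = a.id
        GROUP BY a.name
    )
    SELECT name, n_books FROM counts ORDER BY name
""").fetchall()

# EXCEPT: author ids that never appear in the books table.
authors_without_books = conn.execute("""
    SELECT id FROM authors
    EXCEPT
    SELECT author_id FROM books
""").fetchall()

conn.close()
print(book_counts)
print(authors_without_books)
```

The same queries should carry over to PostgreSQL through `psycopg2` with only the connection code changed, though that is not shown here.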

5. Probability & Statistics

  • Standard Deviation & Correlation
  • Linear Regression
  • Distributions & Sampling
  • Probabilities
  • Probability Distributions
  • Chi-Squared Tests
  • Multi Category Chi-Squared Tests
  • Major Python Libraries Learned/Utilized in the Probability & Statistics Course:
    • scipy.stats
      • skew
      • kurtosis
      • norm
      • pearsonr
      • linregress
      • binom
      • chisquare
      • chi2_contingency
      • linspace (note: this is a NumPy function, numpy.linspace, not part of scipy.stats; older SciPy releases re-exported it at the top level)
    • math
    • functools
    • operator

6. Machine Learning

  1. Fundamentals
    1. Introduction to K-Nearest Neighbors
    2. Evaluating Model Performance
    3. Multivariate K-Nearest Neighbors
    4. Hyperparameter Optimization
    5. Cross Validation
  2. Calculus For Machine Learning
    1. Understanding Linear & Nonlinear Functions
    2. Understanding Limits
    3. Finding Extreme Points
  3. Linear Algebra For Machine Learning
    1. Linear Systems
    2. Vectors
    3. Matrix Algebra
    4. Solution Sets
  4. Linear Regression For Machine Learning
    1. The Linear Regression Model
    2. Feature Selection
    3. Gradient Descent
    4. Ordinary Least Squares
    5. Processing & Transforming Features
  5. Machine Learning in Python Intermediate Course
    1. Logistic Regression
    2. Binary Classifiers
    3. Multiclass Classification
    4. Intermediate Linear Regression
    5. Overfitting
    6. Clustering Basics
    7. K-Means Clustering
    8. Gradient Descent
    9. Intro to Neural Networks
  6. Decision Trees
    1. Entropy
    2. Information Gain
    3. ID3 Algorithm
    4. Applying & Tweaking Decision Trees
    5. Random Forests
  7. Machine Learning Final Project
    1. Data Cleaning
    2. Preparing the Features
    3. Making Predictions
  8. Major Python Libraries Learned in the Machine Learning Course
    1. scipy.spatial
      • distance
    2. sklearn.neighbors
      • KNeighborsRegressor
    3. sklearn.cluster
      • KMeans
    4. sklearn.linear_model
      • LinearRegression
      • LogisticRegression
    5. sklearn.metrics
      • mean_squared_error
    6. sklearn.metrics.pairwise
      • euclidean_distances
    7. sklearn.model_selection
      • cross_val_score
      • KFold
    8. SymPy
      • symbols
      • limit
    9. NumPy
      • linalg.inv
      • linalg.det
      • dot
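
For a sense of what the K-Nearest Neighbors material covers, here is a from-scratch sketch of KNN regression in plain Python. It is not the course code and not scikit-learn's implementation; the `knn_predict` helper and the toy data are hypothetical. With its default uniform weights, `sklearn.neighbors.KNeighborsRegressor` likewise averages the targets of the nearest neighbors.

```python
import math


def knn_predict(train_features, train_targets, query, k=3):
    """Predict a target as the mean target of the k nearest training rows."""
    # Pair each training row's Euclidean distance to the query with its target.
    distances = sorted(
        (math.dist(row, query), target)
        for row, target in zip(train_features, train_targets)
    )
    nearest = distances[:k]
    return sum(target for _, target in nearest) / len(nearest)


# Toy data: two features per row, one numeric target per row.
features = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (10.0, 10.0)]
targets = [100.0, 200.0, 300.0, 1000.0]

prediction = knn_predict(features, targets, query=(2.0, 2.0), k=3)
print(prediction)  # 200.0
```

The far-away row at (10.0, 10.0) is excluded when k=3, which is why the prediction is the mean of the three nearby targets. Raising k to 4 would pull that outlier into the average.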