wongkhoon/Professional-Data-Scientist-in-Python

An overview of the former version of the DataCamp Data Scientist Certification in Python

Personal certification page: https://www.datacamp.com/certificate/DS0017278428696

23 courses, 6 projects, 3 skill assessments

  1. Introduction to Python
    Master the basics of data analysis in Python. Expand your skillset by learning scientific computing with NumPy.
  2. Intermediate Python
    Level up your data science skills by creating visualizations using Matplotlib and manipulating DataFrames with pandas.

    PROJECT: Investigating Netflix Movies and Guest Stars in The Office

  3. Data Manipulation with pandas
    Use the world’s most popular Python data science package to manipulate data and calculate summary statistics.

    PROJECT: The Android App Market on Google Play

  4. Joining Data with Pandas
    Learn to combine data from multiple tables by joining data together using pandas.

    PROJECT: The Github History of the Scala Language

  5. Introduction to Data Visualization with Matplotlib
    Learn how to create, customize, and share data visualizations using Matplotlib.
  6. Introduction to Data Visualization with Seaborn
    Learn how to create informative and attractive visualizations in Python using the Seaborn library.
  7. Python Data Science Toolbox (Part 1)
    Learn the art of writing your own functions in Python, as well as key concepts like scoping and error handling.
  8. Python Data Science Toolbox (Part 2)
    Continue to build your modern Data Science skills by learning about iterators and list comprehensions.
  9. Intermediate Data Visualization with Seaborn
    Use Seaborn’s sophisticated visualization tools to make beautiful, informative visualizations with ease.

    PROJECT: A Visual History of Nobel Prize Winners

    SKILL ASSESSMENT: Data Manipulation with Python

  10. Introduction to Importing Data in Python
    Learn to import data into Python from various sources, such as Excel, SQL, SAS and right from the web.
  11. Intermediate Importing Data in Python
    Improve your Python data importing skills and learn to work with web and API data.
  12. Cleaning Data in Python
    Learn to diagnose and treat dirty data and develop the skills needed to transform your raw data into accurate insights!
  13. Working with Dates and Times in Python
    Learn how to work with dates and times in Python.

    SKILL ASSESSMENT: Importing & Cleaning Data with Python

  14. Writing Functions with Python
    Learn to use best practices to write maintainable, reusable, complex functions with good documentation.

    SKILL ASSESSMENT: Python Programming

  15. Exploratory Data Analysis in Python
    Learn how to explore, visualize, and extract insights from data.
  16. Analyzing Police Activity with pandas
    Explore the Stanford Open Policing Project dataset and analyze the impact of gender on police behaviour using pandas.
  17. Statistical Thinking in Python (Part 1)
    Build the foundation you need to think statistically and to speak the language of your data.
  18. Statistical Thinking in Python (Part 2)
    Learn to perform the two key tasks in statistical inference: parameter estimation and hypothesis testing.

    PROJECT: Dr. Semmelweis and the Discovery of Handwashing

  19. Machine Learning with scikit-learn
    Learn how to build and tune predictive models and evaluate how well they’ll perform on unseen data.

    PROJECT: Predicting Credit Card Approvals

  20. Unsupervised Learning in Python
    Learn how to cluster, transform, visualize, and extract insights from unlabelled datasets using scikit-learn and SciPy.
  21. Machine Learning with Tree-Based Models in Python
    In this course, you’ll learn how to use tree-based models and ensembles for regression and classification using scikit…
  22. Case Study: School Budgeting with Machine Learning in Python
    Learn how to build a model to automatically classify items in a school budget.
  23. Cluster Analysis in Python
    In this course, you will be introduced to unsupervised learning through techniques such as hierarchical and k-means c…

Four 40-minute timed assessments

  1. Coding for Production
  2. Statistical Experimentation
  3. Exploratory Analysis with PostgreSQL
  4. Model Development

One coding challenge

Final case study [1]

  • Two parts:
    • Technical report for a data science manager
    • Presentation for a non-technical audience
  • Problem type: Binary classification
  • Work environment: DataCamp Workspace
  • Libraries/modules used: pandas, numpy, seaborn, matplotlib, BeautifulSoup, nltk, WordCloud, operator, plotly, sklearn, time, imblearn
  • Classes/functions used: PorterStemmer, TfidfVectorizer, LogisticRegressionCV, RandomForestClassifier, classification_report, confusion_matrix, balanced_accuracy_score, matthews_corrcoef, geometric_mean_score, compute_class_weight, GridSearchCV
  • Workflow:
    • Read CSV data (more than 40,000 entries)
    • Data exploration
    • Subset data for NLP analysis
    • Data quality checks: duplicates, missing data, and other quality issues (e.g. mojibake)
    • Feature engineering, e.g. combining non-numerical columns
    • Text preprocessing (e.g. stemming, tokenization, TF-IDF)
    • Data summary and visualizations, e.g. word clouds and distributions
    • Stratified train-test split
    • Machine learning algorithms with class weights: compare metrics, select the best model, and tune it (GridSearchCV plus decision-threshold adjustment) to answer the business question and meet the business success criteria
    • Summary, results and discussion: findings, final model, metric trade-offs
    • Recommendations for future work: subject-matter expert (SME) involvement, better data quality (e.g. well-defined features), better data representativeness (e.g. additional features), and other methods (e.g. deep learning)
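
The modelling steps in the workflow above (stratified split, class weights, grid search, decision-threshold adjustment) can be illustrated with a minimal sketch. It assumes scikit-learn and a tiny synthetic corpus invented here purely for illustration, since the actual case-study data and code are not shared. It swaps in a single class-weighted LogisticRegression inside a TF-IDF Pipeline and omits stemming, the RandomForestClassifier comparison, and the imblearn metrics to keep dependencies light. The helper names (make_toy_corpus, build_search, pick_threshold, run) are hypothetical.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score, matthews_corrcoef
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline


def make_toy_corpus(seed=0):
    """Hypothetical imbalanced text dataset (160 negatives, 40 positives); NOT the case-study data."""
    rng = np.random.default_rng(seed)
    common = ["data", "report", "team", "value", "project"]
    neg_vocab = common + ["salary", "office", "benefits", "remote"]
    pos_vocab = common + ["urgent", "wire", "transfer", "fee"]
    texts = [" ".join(rng.choice(neg_vocab, size=8)) for _ in range(160)]
    texts += [" ".join(rng.choice(pos_vocab, size=8)) for _ in range(40)]
    labels = np.array([0] * 160 + [1] * 40)
    return texts, labels


def build_search():
    """TF-IDF features + class-weighted logistic regression, tuned by grid search."""
    pipe = Pipeline([
        ("tfidf", TfidfVectorizer(lowercase=True)),
        # class_weight="balanced" weights each class inversely to its frequency
        ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
    ])
    param_grid = {"clf__C": [0.1, 1.0, 10.0]}
    return GridSearchCV(pipe, param_grid, scoring="balanced_accuracy", cv=3)


def pick_threshold(y_true, proba, candidates=None):
    """Return the first candidate threshold with the highest balanced accuracy."""
    if candidates is None:
        candidates = np.linspace(0.1, 0.9, 17)  # 0.10, 0.15, ..., 0.90
    scores = [balanced_accuracy_score(y_true, (proba >= t).astype(int)) for t in candidates]
    return float(candidates[int(np.argmax(scores))])


def run(texts, labels, seed=0):
    # Stratified hold-out test set, so both splits keep the class ratio
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.25, stratify=labels, random_state=seed)
    # Carve a stratified validation split out of the training data so the
    # decision threshold is chosen without looking at the test set
    X_fit, X_val, y_fit, y_val = train_test_split(
        X_train, y_train, test_size=0.25, stratify=y_train, random_state=seed)
    search = build_search().fit(X_fit, y_fit)
    threshold = pick_threshold(y_val, search.predict_proba(X_val)[:, 1])
    y_pred = (search.predict_proba(X_test)[:, 1] >= threshold).astype(int)
    return {
        "best_params": search.best_params_,
        "threshold": threshold,
        "balanced_accuracy": balanced_accuracy_score(y_test, y_pred),
        "mcc": matthews_corrcoef(y_test, y_pred),
    }


if __name__ == "__main__":
    texts, labels = make_toy_corpus()
    print(run(texts, labels))
```

Tuning the threshold on a validation split rather than on the test set keeps the reported test metrics honest. Balanced accuracy and the Matthews correlation coefficient are used here only because they appear in the libraries list above; on this toy corpus the scores will be high simply because the two synthetic vocabularies barely overlap, which says nothing about real-world performance.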

Footnotes

  1. Sharing restrictions apply, as advised by DataCamp.