Repository for the Data Science learning track to host assignments.
Find PowerPoints and other helpful resources in the course_material folder of this repo! You'll need to clone the repo to see most of them.
Find a great quick Python reference here: https://www.w3schools.com/python/
Think ‘process’ not ‘product’. The goal is to learn. The goal is not to hand in a perfect assignment.
Skim your homework assignment BEFORE you do the readings. It will help focus your attention!
SQ3R: Scan, Question, Read, Recall, Review!
- Finish any installs not completed in class.
- Skim the Survival Guide presentation. We will discuss this in more detail throughout the first 8 weeks.
- Submit the in-class activity to Canvas. You can submit a link to your repo or the ipynb file itself.
- Intro to Git - Please complete 1-2
- Intro to Python - Please complete 1-2
- Go through the provided python_click_through.ipynb.
  - Open another notebook, copy each cell into it, and play with it there.
  - Ask yourself a question and experiment.
    - What if I change this variable? What's the outcome?
    - What if I intentionally write code I think will fail? Does it fail?
    - What if I combine the concept in the cell above with this cell?
- Complete the week 1 homework notebook found here. You can also find it in the week 1 folder inside the course materials folder here on the GitHub page.
- Submit a link to your week 1 homework on Canvas. For week 1 you may submit the file itself, but in future weeks you will have to submit a link.
https://www.youtube.com/watch?v=YYXdXT2l-Gg&list=PL-osiE80TeTskrapNbzXhwoFUiLCjGgY7
- We suggest only videos 2-7 and 9.
- Read the following:
  - http://swcarpentry.github.io/shell-novice/01-intro/index.html
  - http://swcarpentry.github.io/shell-novice/02-filedir/index.html
- Create your own week 02 repository. (If you have not done so in class)
- Skim the 'In a nutshell' links for 'Learning how to Learn' and 'Deep Work' in the Survival Guide.
  - Find something in those readings that interests you and explore it further.
  - These topics can have profound effects outside the classroom as well.
- Submit a link to your group activity on Canvas.
- Loops
  - In DataCamp, complete Intermediate Python, Chapter 4: Loops (Click here to start)
- Functions
  - Intro to Python - Please complete 3
- Classes
  - Working with classes can be challenging. Focus your attention on:
    - Creating classes.
    - Adding attributes.
    - Creating class methods (methods that operate on the class as a whole).
    - Creating instance methods (methods that act only on a single instance).
    - Creating objects from classes (`foo = MyClass(attr1, attr2)`).
  - Focus less on (but be aware of):
    - Inheritance
  - Read this introduction to classes. (Don't worry about the exercises or any notes about Python 2.7.)
  - Read this and complete the exercise at the end. You do not need to submit these, but they will prepare you for the homework.
  - Read Python's Methods Demystified.
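The sketch below touches each item from the classes checklist: class and instance attributes, an instance method, class methods, object creation, and a little inheritance. It uses a hypothetical `Dog` class whose names were chosen only for illustration; they do not come from the course notebooks.

```python
class Dog:
    # Class attribute: shared by every Dog object
    species = "Canis familiaris"
    count = 0

    def __init__(self, name, age):
        # Instance attributes: each object gets its own values
        self.name = name
        self.age = age
        Dog.count += 1

    # Instance method: works on one object, reached through `self`
    def describe(self):
        return f"{self.name}, age {self.age}"

    # Class methods: receive the class itself (`cls`) instead of an instance
    @classmethod
    def from_birth_year(cls, name, birth_year, current_year):
        # An alternative constructor built on top of the class
        return cls(name, current_year - birth_year)

    @classmethod
    def how_many(cls):
        return cls.count


# Inheritance (less of a focus this week): Puppy reuses Dog and overrides describe()
class Puppy(Dog):
    def describe(self):
        return super().describe() + " (puppy)"


rex = Dog("Rex", 4)                                # creating an object from a class
buddy = Dog.from_birth_year("Buddy", 2020, 2024)   # object created via a class method
pip = Puppy("Pip", 1)

print(rex.describe())    # Rex, age 4
print(buddy.age)         # 4
print(pip.describe())    # Pip, age 1 (puppy)
print(Dog.how_many())    # 3
```

The class method sees the class, so `how_many()` can report a class-wide count. The instance method `describe()` only knows about the one dog it was called on.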
- Intro to Git - Please complete 3
- Complete the week_02_homework.ipynb found here. You can also find it in the week 2 folder in the course materials folder at the top of the GitHub page. Submit a link to your repo or submit the .ipynb file.
https://www.youtube.com/watch?v=YYXdXT2l-Gg&list=PL-osiE80TeTskrapNbzXhwoFUiLCjGgY7
- Only videos 7 & 8
https://www.youtube.com/watch?v=tJxcKyFMTGo&list=PL-osiE80TeTskrapNbzXhwoFUiLCjGgY7&index=11
https://www.youtube.com/watch?v=ZDa-Z5JzLYM&list=PL-osiE80TeTsqhIuOqKhwlXsIBIdSeYtc
- Only videos 1,2 & 3
Optional Reading: Only do this if you have completed your homework, deleted it, and done it again.
Introduction to functions. Read more on functions here.
- Submit your group activity on Canvas. Make sure it works, first!
- OPTIONAL: DataCamp PIP Tutorial
- DataCamp: NumPy
  - Intro to Python - Please complete 4
  - Complete the whole chapter: “NumPy” through “Blend it all together”
- Cheat Sheets (just for your reference)
- Readings (The Unix Shell)
  - Read these, but spend most of your time this week on NumPy.
  - Introducing the Shell
  - Navigating Files and Directories
  - Working with Files and Directories
  - Pipes and Filters
- Intro to Git - Please complete 4-5
- Complete the week_03_homework.ipynb here. You can also find the notebook and the csv file you'll need in the week 4 folder in the course materials folder at the top of the page.
- Embed screenshots at the end of your Jupyter notebook to show you completed the Intro to Git and Intro to Python DataCamp courses.
  - Do your own research on how to do this. There are a couple of "correct" ways to embed images in ipynb files.
  - Make sure they render when you push your notebook to GitHub.
- List three things you learned about the Unix shell in Markdown below the screenshots. Title the section "The Unix Shell".
- Submit your group activity to Canvas
- Read, click through, and digest: pandas_part_1.ipynb
- Read, click through, and digest: pandas_part_2.ipynb
- Pandas DataFrames - please read and review as needed
- Time Series tutorial with Pandas - please read and review as needed
- In DataCamp, Data Manipulation with Pandas - please complete
- In DataCamp, Intro to DataViz - Matplotlib - Please complete 1-2
- Complete the week_04_starter.ipynb. You can find it in the week 4 folder in the course materials folder at the top of the GitHub page. Submit a link to your repo.
- See README.md in the week_04/homework folder for full homework instructions. NOTE: Best viewed on GitHub.
- Output_examples.ipynb is provided as a reference.
  - View it in Jupyter or on GitHub. (GitHub sometimes mis-formats documents.)
  - NOTE: Your numerical results should be very close to the examples.
  - Your formatting may be very different from the provided examples. Focus on getting the data right and less on the formatting.
- Create a simple graph (any type) using Matplotlib and any of the data in the dataframe. Briefly explain what the graph shows.
- Embed an image indicating that you completed Data Manipulation with Pandas from DataCamp
- Submit your group activity to Canvas using the git url!
- Read the REST API Tutorial if you did not read it in class or need a refresher.
- In DataCamp, complete the rest of Intro to DataViz - Matplotlib
- In DataCamp, complete all of Intro to DataViz - Seaborn
- Complete the WeatherAPI_homework_starter.ipynb. You can find it in the week 5 folder in the course materials folder at the top of the GitHub page. Submit a link to your repo.
- This homework is likely your first opportunity to build your portfolio.
- Start early, make it neat.
- This is a real project you can showcase!
- API calls can be really slow (it is a free service), so limit the number of calls you make while testing.
- Embed screenshots at the end of your jupyter notebook to show you completed the Intro to Data Visualization with Matplotlib and Intro to Data Visualization with Seaborn from DataCamp
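One way to limit API calls while testing is to save each response to disk and reuse it on later runs. The sketch below is a minimal version of that idea. It assumes a hypothetical `fetch` function standing in for whatever request your notebook makes, and the cache file name `api_cache.json` is also an assumption.

```python
import json
from pathlib import Path


def cached_call(key, fetch, cache_path="api_cache.json"):
    """Return the saved response for `key`; call `fetch()` only if it is not saved yet.

    `fetch` is a hypothetical zero-argument function that makes the real
    (slow) API request and returns JSON-serializable data.
    """
    path = Path(cache_path)
    cache = json.loads(path.read_text()) if path.exists() else {}
    if key not in cache:
        cache[key] = fetch()                # only hit the API on a cache miss
        path.write_text(json.dumps(cache))  # save to disk so re-runs skip the call
    return cache[key]
```

For example, `cached_call("denver", lambda: requests.get(url, params=params).json())` makes the request only the first time. Here `url` and `params` stand for your own request details. Delete the cache file when you want fresh data.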
- Submit your group activity to Canvas using the git url!
- Install PostgreSQL and pgAdmin
- What is a Database?
- Overview (Only First page)
- RDBMS Concepts (Only First Page)
- Intro to SQL - Please complete 1-4
- Joining Data - Please complete 1 and 2
- Create a file called week_6_hw.sql.
- Answer all the questions in week_6_sql_hw.docx.
- For every problem, do the following:
  - Copy and paste the problem into your week_6_hw.sql file.
  - Use PostgreSQL in pgAdmin on your computer to solve the problem.
  - Paste your query into week_6_hw.sql.
  - Write an explanation of what is happening in each query (as a SQL comment or in the readme). Be sure to reference the data model in your explanations as needed.
- Commit the week_6_hw.sql file to your own repo.
- In the repo's readme, briefly explain what an RDBMS is and what SQL is (under 250 words).
- Also in the readme, embed a screenshot indicating you have completed the Introduction to SQL in DataCamp
- Submit a link to your repo.
- Submit your group activity to Canvas using the git url!
- Subquery vs join
- SQL Autoincrementing
- Joining Data - Please complete 3 and 4
- Intermediate SQL - Please complete 1-4
- Create a file called week_7_hw.sql.
- Answer all the questions in week_7_sql_hw.docx.
- For every problem, do the following:
  - Copy and paste the problem into your week_7_hw.sql file.
  - Use PostgreSQL in pgAdmin on your computer to solve the problem.
  - Paste your query into week_7_hw.sql.
  - Write an explanation of what is happening in each query (as a SQL comment). Be sure to reference the data model in your explanations as needed.
- Commit the week_7_hw.sql file to your own repo.
- In the readme, explain what autoincrementing is. Also explain the difference between creating a join and a subquery. This section should be under 300 words.
- Also in the readme, embed a screenshot indicating you have completed DataCamp's Joining Data in PostgreSQL
- Submit a link to your repo.
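Here is a way to experiment with this week's two ideas, autoincrementing and join vs subquery, outside pgAdmin. It uses Python's built-in `sqlite3` module with made-up `customers` and `orders` tables. SQLite's syntax differs from PostgreSQL in places: PostgreSQL declares autoincrementing columns with `SERIAL` or `GENERATED ... AS IDENTITY` rather than `AUTOINCREMENT`. Treat this as an illustration, not as answers to the homework.

```python
import sqlite3

# In-memory SQLite database, so the sketch runs without a PostgreSQL server
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Autoincrementing primary keys: the database assigns each new row the next id.
# SQLite syntax shown; in PostgreSQL you would write e.g. `id SERIAL PRIMARY KEY`.
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)")
cur.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "customer_id INTEGER REFERENCES customers(id), amount REAL)"
)
# No id values are supplied; they are generated as 1, 2, 3, ...
cur.executemany("INSERT INTO customers (name) VALUES (?)", [("Ana",), ("Ben",), ("Cy",)])
cur.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(1, 20.0), (1, 35.0), (2, 15.0)],
)

# JOIN: matches rows across tables. DISTINCT is needed because Ana has two
# orders, so the join alone would list her twice.
join_rows = cur.execute(
    "SELECT DISTINCT c.name FROM customers c "
    "JOIN orders o ON o.customer_id = c.id ORDER BY c.name"
).fetchall()

# Subquery: the inner SELECT builds a list of customer ids that filters the outer query
sub_rows = cur.execute(
    "SELECT name FROM customers "
    "WHERE id IN (SELECT customer_id FROM orders) ORDER BY name"
).fetchall()

print(join_rows)  # [('Ana',), ('Ben',)]
print(sub_rows)   # [('Ana',), ('Ben',)]

# A join also puts columns from both tables side by side in the result
pairs = cur.execute(
    "SELECT c.name, o.amount FROM customers c "
    "JOIN orders o ON o.customer_id = c.id ORDER BY o.id"
).fetchall()
print(pairs)  # [('Ana', 20.0), ('Ana', 35.0), ('Ben', 15.0)]
```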
- Submit your group activity to Canvas using the git url!
- Intro to Stats in Python - Please complete 1 - 4
- Refer to the SciPy documentation
In lieu of a coding assignment, you need to work on your class project and turn it in under week 8 in Canvas.
Please make sure you have a GitHub repo where your team is storing and working on your project together.
We will make sure everyone on the team is contributing evenly to the same repository.
Use what you learned this week in your project!
- Submit your group activity to Canvas using the git url!
- Make sure you actually completed the Intro to Stats in DataCamp from last week
- Foundations of Probability DataCamp - Please complete 1-2. 3-4 are optional.
- Khan Academy: basic theoretical probability section
- Create a markdown heading and explanation for each question in the probability_hw.docx file under week 9.
- Put the code answer for each question under the markdown heading.
- Embed a screenshot into your jupyter notebook showing you completed DataCamp's Intro to Stats
- Upload your completed notebook to github and submit the link to Canvas.
- The Khan Academy: Statistics & Probability course is a great resource to get another video on any concepts that are challenging. Seek out what you need more help on, and your TAs are here to help.
- One-hot encoding - This is a technique you will use a lot going forward, and it requires knowledge of linear algebra to use in many scenarios.
- Submit your group activity to Canvas using the git url!
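Here is a minimal sketch of one-hot encoding using pandas' `get_dummies` on a made-up `color` column. Each category becomes its own 0/1 column. The second call shows one place where linear algebra shows up. A full set of dummy columns always sums to 1 in every row, so it is linearly dependent with a model's intercept column. `drop_first=True` drops one category to avoid that.

```python
import pandas as pd

# Made-up data with one categorical column and one numeric column
df = pd.DataFrame({"color": ["red", "green", "blue", "green"], "price": [3, 5, 4, 6]})

# One 0/1 column per category (categories appear in sorted order)
encoded = pd.get_dummies(df, columns=["color"], dtype=int)
print(encoded)

# k - 1 columns: "blue" becomes the baseline encoded as all zeros
encoded_k_minus_1 = pd.get_dummies(df, columns=["color"], drop_first=True, dtype=int)
print(encoded_k_minus_1)
```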
- Linear Algebra in python - Go through all the subsections from Basics of Linear Algebra to Summary. Do the TRY IT! sections, as they will help you with your homework.
- Dot product vs cross product
- PCA in python - Remember, PCA uses linear algebra, which is why it's relevant here.
- Be sure to copy the example code into your own Jupyter notebook and run it as you go through the reading.
- LaTeX formatting - this can be done in a jupyter notebook markdown section!
- Create a markdown heading and explanation for each question in the linear_algebra_hw.docx file under week 9.
- Put the code answer for each question under the markdown heading.
- Upload your completed notebook to github and submit the link to Canvas.
- Linear Algebra
- Python linear algebra basics
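The snippet below contrasts the dot product and the cross product in NumPy. It then sketches PCA from scratch on made-up data to show where the linear algebra enters. This is not necessarily how the linked PCA reading does it; that reading may use scikit-learn's `PCA` class instead. Here the principal axes are the eigenvectors of the covariance matrix, sorted by eigenvalue (the variance each axis explains).

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

# Dot product: a scalar, the sum of element-wise products (1*4 + 2*5 + 3*6 = 32)
dot = np.dot(a, b)

# Cross product (3-D vectors only): a new vector perpendicular to both a and b
cross = np.cross(a, b)   # [-3, 6, -3]

# PCA sketch on made-up data: two strongly correlated columns
rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = np.column_stack([x, 2 * x + rng.normal(scale=0.1, size=200)])

Xc = X - X.mean(axis=0)                    # center each column
cov = np.cov(Xc, rowvar=False)             # 2x2 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)     # eigh: symmetric matrix, ascending eigenvalues
order = np.argsort(eigvals)[::-1]          # sort largest variance first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained = eigvals / eigvals.sum()        # fraction of variance per component
X_1d = Xc @ eigvecs[:, :1]                 # project onto the first principal component
print(dot, cross, explained)
```

Because the second column is almost exactly twice the first, nearly all the variance lies along one direction. The first component's share in `explained` is therefore close to 1.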
- Recursion in Python
- DataCamp's Hierarchical and Recursive Queries in SQL Server
- DataCamp's Intermediate Python - you should already have completed some of this from earlier in the class, but the rest is useful to make sure you have mastered the basics
- Fourier Analysis
- Submit your group activity to Canvas using the git url!
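A recursive function has a base case that stops the recursion and a recursive case that calls the function on a smaller version of the problem. The two illustrative functions below, `factorial` and `flatten`, are not from the course materials. `flatten` walks a nested list, which is the same shape of problem as the hierarchical data behind recursive SQL queries.

```python
def factorial(n):
    """Return n! recursively."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n <= 1:                     # base case: stops the recursion
        return 1
    return n * factorial(n - 1)    # recursive case: same problem, smaller n


def flatten(nested):
    """Recursively flatten arbitrarily nested lists (e.g. a tree of categories)."""
    flat = []
    for item in nested:
        if isinstance(item, list):
            flat.extend(flatten(item))   # recurse into the sub-list
        else:
            flat.append(item)
    return flat


print(factorial(5))                  # 120
print(flatten([1, [2, [3, 4]], 5]))  # [1, 2, 3, 4, 5]
```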
- Time Series Analysis - Complete 1-4
- Fourier Analysis - Reread this to make sure you fundamentally understand the Fourier transform.
- SARIMA in Python
- Create a markdown heading and explanation for each question in the time_series_hw.docx file under week 10.
- Put the code answer for each question under the markdown heading.
- Embed a screenshot into your jupyter notebook showing you completed DataCamp's Time Series Analysis in Python
- Upload your completed notebook to github and submit the link to Canvas.
- More Fourier Analysis
- DataCamp's SQL Server Time Series
- Autoregression - this online book also digs into the math of many other time series models and components. Please note, the code examples in this book are in R, not Python, but the concepts are expressed well.
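As a concrete check on the Fourier readings, this sketch builds a made-up signal from a 5 Hz and a 12 Hz sine wave. It then uses NumPy's FFT to recover those two frequencies. The sampling rate and signal length are arbitrary choices: 2 seconds at 100 Hz places both frequencies exactly on FFT bins, so no energy leaks into neighboring bins.

```python
import numpy as np

fs = 100                         # sampling rate in Hz (samples per second)
n = 200                          # 200 samples = 2 seconds of signal
t = np.arange(n) / fs
# Made-up signal: a 5 Hz sine wave plus a weaker 12 Hz sine wave
signal = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 12 * t)

spectrum = np.fft.rfft(signal)              # FFT of a real-valued signal
freqs = np.fft.rfftfreq(n, d=1 / fs)        # frequency (Hz) of each FFT bin
amplitude = np.abs(spectrum) * 2 / n        # rescale so a sine of amplitude A shows up as A

# The two strongest frequency components
dominant = sorted(freqs[np.argsort(amplitude)[-2:]].tolist())
print(dominant)  # [5.0, 12.0]
```

The rescaled `amplitude` values at those bins come out as about 1.0 and 0.5, matching the amplitudes used to build the signal.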
- Lambda, Apply, Assign
- Map, Reduce, Lambda
- Submit your group activity to Canvas using the git url!
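The snippet below puts the week's four tools side by side on made-up data. It uses `lambda` with the built-in `map` and `filter`, then `functools.reduce`, then pandas' `Series.apply` and `DataFrame.assign`. The column names are illustrative only.

```python
from functools import reduce

import pandas as pd

nums = [1, 2, 3, 4]

# lambda: a small anonymous function, handy as an argument to map/filter/reduce
squares = list(map(lambda x: x ** 2, nums))        # [1, 4, 9, 16]
evens = list(filter(lambda x: x % 2 == 0, nums))   # [2, 4]
total = reduce(lambda acc, x: acc + x, nums, 0)    # ((((0 + 1) + 2) + 3) + 4) = 10

# pandas: apply() runs a function on each value; assign() returns a copy with new columns
df = pd.DataFrame({"name": ["ana", "ben"], "temp_c": [20.0, 30.0]})
df["name"] = df["name"].apply(lambda s: s.title())
df = df.assign(temp_f=lambda d: d["temp_c"] * 9 / 5 + 32)
print(df)
```

`assign` also accepts a plain Series or value instead of a lambda. The lambda form is convenient in method chains, because it receives the DataFrame as it exists at that step.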
- Machine learning for business Complete 1-4
- Python Data Science Toolbox Part 1 Complete 1-3
- Preprocessing for Machine Learning Complete 1
- Correlation matrix: https://datatofish.com/correlation-matrix-pandas/
- Create a markdown heading and explanation for each question in the intro_to_ml.docx file under week 12.
- Put the code answer for each question under the markdown heading.
- Embed screenshots into your jupyter notebook showing you completed DataCamp's Machine Learning for Business AND Python Data Science Toolbox Part 1
- Upload your completed notebook to github and submit the link to Canvas.
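Here is a small sketch of a correlation matrix with pandas' `DataFrame.corr()` on made-up columns. `y` rises perfectly with `x`, and `z` falls perfectly with `x`, so their correlations with `x` are +1 and -1.

```python
import pandas as pd

# Made-up data: y increases with x, z decreases with x
df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "y": [2, 4, 6, 8, 10],
    "z": [5, 4, 3, 2, 1],
})

corr = df.corr()   # Pearson correlation by default; method="spearman" ranks the data first
print(corr)
```

The matrix is symmetric with 1s on the diagonal, since every column is perfectly correlated with itself. On real data, a Seaborn heatmap of `corr` is a common way to scan many pairs at once.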
Read about how lambda works: https://realpython.com/python-lambda/
More review of linear regression: https://www.w3schools.com/python/python_ml_linear_regression.asp (there is also a useful section on polynomial regression)
- Intro to Supervised Learning in Python
- Submit your group activity to Canvas using the git url!
- Preprocessing for Machine Learning Complete 2-4
- Supervised Learning in sklearn Complete 1-4
- Create a markdown heading and explanation for each question in the supervised_learning.docx file under week 13.
- Put the code answer for each question under the markdown heading.
- Embed screenshots into your jupyter notebook showing you completed DataCamp's Preprocessing for Machine Learning in Python AND Supervised Learning with Scikit Learn
- Upload your completed notebook to github and submit the link to Canvas.
Google ML Crash Course - up to Regularization
- Logistic Regression - Read section 4.2 on logistic regression
- Submit your group activity to Canvas using the git url!
- Intermediate Regression with statsmodels Complete 1-4
- Python Data Science Toolbox Part 2 - Please complete 1
- Multiple Linear Regression - This is another way to do multiple regression. It differs from the DataCamp course but matches what we discussed in class. You can use either approach.
- Preprocessing Reading - Please read this article to solidify your understanding of preprocessing
- Create a markdown heading and explanation for each question in the regression_hw.docx file under week 14.
- Put the code answer for each question under the markdown heading.
- Embed screenshots into your jupyter notebook showing you completed DataCamp's Intermediate Regression with statsmodels
- Upload your completed notebook to github and submit the link to Canvas.
Intro to Regression with statsmodels
Some additional statistical concepts https://data-flair.training/blogs/python-statistics
Read section 4.1 on Linear Regression https://christophm.github.io/interpretable-ml-book/limo.html
Read https://www.investopedia.com/terms/m/mlr.asp on multiple linear regression
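Here is a minimal multiple linear regression sketch. It uses scikit-learn's `LinearRegression`, not the statsmodels approach from the DataCamp course. The data are made up and generated exactly from y = 1 + 2·x1 + 3·x2 with no noise, so the fitted intercept and coefficients should recover 1, 2, and 3.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data from y = 1 + 2*x1 + 3*x2 (no noise)
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 3], [3, 1]], dtype=float)
y = 1 + 2 * X[:, 0] + 3 * X[:, 1]

model = LinearRegression().fit(X, y)   # fits one coefficient per column plus an intercept
print(model.intercept_, model.coef_)   # approximately 1.0 and [2.0, 3.0]
print(model.predict([[4.0, 2.0]]))     # approximately [15.0], since 1 + 8 + 6 = 15
```

With real, noisy data the coefficients will not be exact. statsmodels' `OLS` adds standard errors and p-values for each coefficient, which scikit-learn does not report.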
- Oversampling and Undersampling
- SVM Sklearn Documentation
- Submit your group activity to Canvas using the git url!
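Here is a minimal sketch of random oversampling and undersampling with `sklearn.utils.resample` on a made-up imbalanced DataFrame. This is the simplest form of the technique. Libraries such as imbalanced-learn offer more sophisticated variants, and in practice you would resample only the training split so duplicated rows do not leak into the test set.

```python
import pandas as pd
from sklearn.utils import resample

# Made-up imbalanced data: 8 rows of class 0, 2 rows of class 1
df = pd.DataFrame({"feature": range(10), "label": [0] * 8 + [1] * 2})
majority = df[df["label"] == 0]
minority = df[df["label"] == 1]

# Random oversampling: draw minority rows WITH replacement until the classes match
minority_up = resample(minority, replace=True, n_samples=len(majority), random_state=0)
oversampled = pd.concat([majority, minority_up])

# Random undersampling: draw majority rows WITHOUT replacement down to the minority size
majority_down = resample(majority, replace=False, n_samples=len(minority), random_state=0)
undersampled = pd.concat([majority_down, minority])

print(oversampled["label"].value_counts())   # 8 of each class
print(undersampled["label"].value_counts())  # 2 of each class
```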
- Linear Classifiers in sklearn Please complete 1-3
- Python Data Science Toolbox Part 2 - Please complete 2-3
- You do not have a notebook to complete this week. Your ETL projects are due next week and should be submitted under week 15 on canvas.
No optional readings this week. Make sure you understand very well what an SVM is and the kinds of problems it can be used to solve!
- Decision Trees with python
- Submit your group activity to Canvas using the git url!
- Linear Classifiers in sklearn Please complete 4
- Machine Learning with Tree-Based Models Please complete 1-3
- Create a markdown heading and explanation for each question in the svm_over_under_sampling_hw.docx file under week 16.
- Put the code answer for each question under the markdown heading.
- Embed screenshots into your jupyter notebook showing you completed DataCamp's Linear Classifiers in Python AND Python Data Science Toolbox Part 2 (from last week)
- Upload your completed notebook to github and submit the link to Canvas.
Gentle Introduction to Information Theory
Decision Tree Classification in Python
Decision Trees for Decision Making
- ROC Curve
- Submit your group activity to Canvas using the git url!
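An ROC curve plots the true positive rate (TPR) against the false positive rate (FPR) as the classification threshold varies. The sketch below uses made-up labels and scores with scikit-learn's `roc_curve` and `roc_auc_score`. It also computes one point on the curve by hand, at a threshold of 0.5, to show what each point means.

```python
import numpy as np
from sklearn.metrics import auc, roc_auc_score, roc_curve

# Made-up true labels and model scores (e.g. predicted probabilities of class 1)
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

fpr, tpr, thresholds = roc_curve(y_true, y_score)   # one (FPR, TPR) point per threshold
auc_score = roc_auc_score(y_true, y_score)          # area under that curve
print(auc_score)  # 0.75

# One point on the curve by hand: predict class 1 when score >= 0.5
pred = (y_score >= 0.5).astype(int)
tpr_at_05 = ((pred == 1) & (y_true == 1)).sum() / (y_true == 1).sum()   # 1 of 2 positives caught
fpr_at_05 = ((pred == 1) & (y_true == 0)).sum() / (y_true == 0).sum()   # 0 of 2 negatives flagged
```

An AUC of 0.75 means that, for a randomly chosen positive and a randomly chosen negative, the model scores the positive higher 75% of the time. Here that holds for 3 of the 4 positive/negative pairs.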
- Machine Learning with Tree-Based Models Please complete 4-5
- Parallel Random Forest Paper - You do not need to understand 100% of this, but it's important that you know what the industry is doing.
- Intro to XGBoost
- Intro to Deep Learning please complete 1-2
- Create a markdown heading and explanation for each question in the tree_based_models_hw.docx file under week 17.
- Put the code answer for each question under the markdown heading.
- Embed screenshots into your jupyter notebook showing you completed DataCamp's Machine Learning with Tree-Based Models in Python
- Upload your completed notebook to github and submit the link to Canvas.
- PyTorch Loss Functions
- Submit your group activity to Canvas using the git url!
- Intro to Deep Learning please complete 3-4
- Unsupervised Learning in python Please complete 1-2
- Create a markdown heading and explanation for each question in the neural_networks_hw.docx file under week 18.
- Put the code answer for each question under the markdown heading.
- Embed screenshots into your jupyter notebook showing you completed DataCamp's Intro to Deep Learning.
- Upload your completed notebook to github and submit the link to Canvas.
Make sure you have picked your group for your final project. No issues if you want to mix them up. Notify the instructor of your project topic and group.
PyTorch vs Keras vs TensorFlow
- Neural Network Scaling
- Submit your group activity to Canvas using the git url!
- Unsupervised Learning in python Please complete 3-4
- Create a markdown heading and explanation for each question in the unsupervised_learning_hw.docx file under week 19.
- Put the code answer for each question under the markdown heading.
- Embed screenshots into your jupyter notebook showing you completed DataCamp's Unsupervised Learning in Python
- Upload your completed notebook to github and submit the link to Canvas.
Unsupervised Learning Cheat Sheet
K-Means Ideal Number of Clusters
Dimensionality Reduction Algorithms
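A common heuristic for choosing the number of K-means clusters is the "elbow" method. You fit K-means for several values of k and plot the inertia (within-cluster sum of squared distances) against k. Then look for the point where adding clusters stops helping much. The sketch below applies it with scikit-learn's `KMeans` to made-up data drawn from three blobs, so the elbow should appear at k = 3.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
# Made-up data: three well-separated blobs of 50 points each in 2-D
centers = np.array([[0, 0], [5, 5], [0, 5]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

ks = range(1, 7)
inertias = []
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(km.inertia_)   # within-cluster sum of squared distances

for k, inertia in zip(ks, inertias):
    print(k, round(inertia, 1))
# Inertia drops sharply up to k=3, then flattens: that bend is the "elbow"
```

Inertia always decreases as k grows, so the elbow is a judgment call on real data. The silhouette score is a common second opinion.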
- Tokenization
- Submit your group activity to Canvas using the git url!
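Tokenization splits raw text into units (tokens) that later NLP steps can count or model. The sketch below is a deliberately simple regex tokenizer of our own, not the method from the DataCamp course. It lowercases the text and keeps runs of letters, digits, and apostrophes. Libraries such as NLTK and spaCy provide more careful tokenizers that handle punctuation and contractions differently.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a simple regex tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


tokens = tokenize("Data science isn't magic: it's 80% cleaning data!")
print(tokens)
# ['data', 'science', "isn't", 'magic', "it's", '80', 'cleaning', 'data']

# A first use for tokens: counting word frequencies
print(Counter(tokens).most_common(1))  # [('data', 2)]
```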
- Intro to NLP Complete 1-4
- Webscraping in python Complete 1
- Create your free Azure account for next class
- Answer each question in the natural_language_processing_hw.docx file under week 20. Document as needed.
- Embed screenshots into your jupyter notebook showing you completed DataCamp's Intro to NLP
- Upload your completed repo to github and submit the link to Canvas.
- CRUD vs REST APIs
- Submit your group activity to Canvas using the git url!
- Cloud Computing for Everyone Please complete 1-3. This should take less than 2 hours.
- Review Flask documentation as needed to complete HW
- Create a python application to answer the questions in the APIs_hw.docx file under week 21.
- Embed screenshots into your jupyter notebook showing you completed DataCamp's Cloud Computing for Everyone.
- Write any conceptual questions, clarifications, and notes in comments. Be sure to number appropriately.
Flask APIs GETting and POSTing
- Selenium Web Scraping
- Submit your group activity to Canvas using the git url!
- Web Scraping in python Please complete 2-4
- MongoDB in Python Please complete 1-2
- Install MongoDB for next class: https://www.mongodb.com/try/download/community (Instructions). To confirm it is working, follow the instructions to run the command-line tool, then type "show dbs" once it is running. If you see some results (usually admin, config, and local), it should be properly installed.
- How we learnt to stop worrying and love web scraping
- Create a python application to answer the questions in the webscraping_hw.docx file under week 22.
- Embed screenshots into your readme showing you completed DataCamp's Web Scraping in Python and 1-2 in MongoDB in Python.
- Write any conceptual questions, clarifications, and notes in a readme.md file. Be sure to number appropriately.
- NoSQL Explained
- Submit your group activity to Canvas using the git url!
- Delete any running services in your Azure account so you don't incur unwanted charges.
- MongoDB in Python Please complete 3-4
There is no notebook assignment this week. Please complete your final projects with your group and be ready to present next week!
[NoSQL vs SQL](https://www.mongodb.com/nosql-explained/nosql-vs-sql)