Data team notes on each project can be found here (for better or for worse).
Last year's repository is also available, which contains a list of resources and tutorials (from the first week of the workshop).
First, a list of documentation for major packages:
- scikit-learn for scientific computation and some machine learning
- pandas for dataframes
- numpy for fast matrix operations
- dask for distributed dataframes
- PyTorch for dynamically constructed neural networks in Python. Well documented.
- keras for neural networks built on TensorFlow or Theano. Visit the keras blog for myriad tutorials. Documentation is only so-so.
Also, a long list of useful Python-related tutorials.
- nltk and nltk book (there's a whole GitHub organization for it)
- torchtext for NLP. New and maybe a little flaky as a result.
- "Working with text data" in sklearn
- Tutorial for creating a chatbot with deep learning and TensorFlow in Python
- PyTextRank
- A tutorial by Dhananjay Bhaskar (formerly UBC, now Brown) on using nltk to analyze movie ratings.
- Git documentation
- git and bash tutorial by Patrick Walls
- Accessing a GPU on Google Colab. Another one here.
- Kalman filters
- Latent Dirichlet allocation
- Visit the sklearn documentation for several tutorials on popular machine learning methods like Naive Bayes.
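
For a quick feel of what the sklearn text and Naive Bayes tutorials cover, here is a minimal sketch. It is not taken from any of the linked tutorials: the toy sentences, the labels, and the helper name `build_classifier` are made up for illustration, and a real project would need far more data and a proper train/test split.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline


def build_classifier(texts, labels):
    """Fit a bag-of-words + multinomial Naive Bayes pipeline (hypothetical helper)."""
    # CountVectorizer turns each text into word counts;
    # MultinomialNB models those counts per class.
    model = make_pipeline(CountVectorizer(), MultinomialNB())
    model.fit(texts, labels)
    return model


# Toy training data, invented for this example only.
TRAIN_TEXTS = [
    "great movie loved it",
    "wonderful acting great plot",
    "terrible movie hated it",
    "awful plot boring acting",
]
TRAIN_LABELS = ["pos", "pos", "neg", "neg"]

model = build_classifier(TRAIN_TEXTS, TRAIN_LABELS)

# Classify two unseen snippets.
predictions = model.predict(["great wonderful movie", "boring awful movie"])
print(list(predictions))
```

Wrapping both steps in a pipeline keeps the vectorizer and the classifier together, so the same vocabulary is applied at training and prediction time.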