Syllabus

Course Syllabus

Note: The specific topics and pacing will be adjusted to best fit the needs of the students.

###Unit 1: The Basics

####Introduction to Data Exploration

Describe the data mining workflow and the key traits of a successful data scientist.
Extract, format, and preprocess data using UNIX command-line tools.
Explore & visualize data using R and ggplot2.

####Introduction to Machine Learning

Explain the concepts and applications of supervised & unsupervised learning techniques.
Describe categorical and continuous feature spaces, including examples and techniques for each.
Discuss the purpose of machine learning and the interpretation of predictive modeling results.

###Unit 2: Fundamental Modeling Techniques

####K-Nearest Neighbors Classification

Describe the setting and goal of a classification task.
Minimize prediction error using training & test sets, and optimize predictive performance using cross-validation.
Understand the kNN classification algorithm, its intuition and implementation.
Implement the "hello world" of machine learning (kNN classification of the iris dataset).
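
As a rough illustration of the kNN objective above, here is a minimal sketch in plain Python. The function name `knn_predict`, the use of Euclidean distance, and the tiny two-class dataset are all assumptions made for this example; the data is made up and is not the real iris measurements, and the course may use a different language or library.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest training points."""
    # Pair each training label with the Euclidean distance from its point to the query.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the labels of the k closest points.
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # The most common label among the neighbors is the prediction.
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical toy data: two well-separated clusters labeled "a" and "b".
train_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_labels = ["a", "a", "a", "b", "b", "b"]

prediction = knn_predict(train_points, train_labels, (1.1, 1.0), k=3)
```

With an odd `k` and two classes the vote cannot tie; for more classes a tie-breaking rule would need to be chosen.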

####Naive Bayes Classification

Outline the basic principles of probability, including conditional probability and Bayes' theorem.
Describe inference in the Bayesian setting, including the prior and posterior distributions and the likelihood function.
Understand the naive Bayes classifier and its assumptions.
Implement a spam filter using the naive Bayes technique.

####Regression & Regularization

Explain the concepts of regression models, including their assumptions and applications.
Discuss the motivation for regularization techniques and their use.
Implement a regularized fit.
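
The syllabus does not say which regularization technique is used, so the sketch below assumes ridge (L2) regression on a single feature with no intercept, where the penalized least-squares slope has the closed form w = Σxy / (Σx² + λ). The function name `ridge_fit_1d` and the data are hypothetical.

```python
def ridge_fit_1d(xs, ys, penalty):
    """Slope w minimizing sum((y - w*x)**2) + penalty * w**2, with no intercept."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    # Setting the derivative of the penalized loss to zero gives this closed form.
    return sxy / (sxx + penalty)

# Hypothetical data lying exactly on the line y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

unregularized = ridge_fit_1d(xs, ys, penalty=0.0)   # ordinary least squares slope
regularized = ridge_fit_1d(xs, ys, penalty=14.0)    # slope shrunk toward zero
```

A larger `penalty` shrinks the fitted slope toward zero, which is the trade-off regularization introduces: some bias in exchange for lower variance.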

####Logistic Regression

Describe the applications of logistic regression to classification problems and probability estimation.
Introduce the concepts underlying logistic regression, including its relation to other regression models.
Predict the probability of a user action on a website using logistic regression.

###Unit 3: Further Modeling Techniques

####K-Means Clustering with Python

Introduce Python and its usefulness for data analysis tasks.
Experiment with scikit-learn, a general-purpose machine learning library for Python.

####Decision Trees & Random Forests

Describe the use and construction of decision trees for classification tasks.
Create a random forest model for ensemble classification.

####Dimensionality Reduction

Explain the practical and conceptual difficulties in working with very high-dimensional data.
Understand the application and use of dimensionality reduction techniques.
Draw inferences from high-dimensional datasets using principal components analysis.

####Recommendation Systems

Explain the use of recommendation systems, and discuss several familiar examples.
Understand the underlying concepts, including collaborative & content-based filtering.
Implement a recommendation system.

###Unit 4: Other Tools

####Database Technologies

Introduce the concepts and use of relational databases, alternative database technologies such as NoSQL, and popular examples of each.

####Map-Reduce

Describe the concepts of parallel computing and applications to problems in big data.
Introduce the map-reduce framework.
Implement and explore examples of map-reduce tasks.
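
To give a feel for the map-reduce objectives above, here is a minimal single-process sketch of a word count, the customary first map-reduce example. It only imitates the map, shuffle, and reduce phases in plain Python; the function names and input documents are assumptions for illustration, and a real framework would distribute these phases across machines.

```python
from collections import defaultdict

def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)

# Hypothetical input: each string stands in for a separate input split.
documents = ["big data is big", "data is everywhere"]

mapped = [pair for doc in documents for pair in map_phase(doc)]
word_counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
```

Because each `map_phase` call sees only one document and each `reduce_phase` call sees only one key, both phases could run in parallel without coordination.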
