
This GitHub course teaches essential data mining skills: managing large datasets, interpreting relevant data, and applying knowledge flexibly to gain expertise in statistical techniques. It covers pre-processing, similarity metrics, basket analysis, association rule mining, recommender systems, and streaming data handling.


ialexmp/Massive-Datasets-Mining


Data Mining Course

Instructor: Teodora Sandra Buda

UPF course GitHub: https://github.com/chatox/data-mining-course/

Presentation

Finding patterns in large datasets is one of the main tasks a professional data scientist performs. Data mining sits at the intersection of databases and statistics and spans several steps, from managing data to pre-processing, cleaning, modeling, and drawing inferences from it.

Data mining can be challenging. Data may not be formatted ideally for the task at hand, or it may include noisy or missing data points. Datasets can be so large that even quadratic-time algorithms become impractical. In many cases the size of a dataset is unbounded, and answers must be provided while new data elements keep arriving.
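To make the unbounded-stream setting concrete, here is a minimal Python sketch of reservoir sampling (Algorithm R), one classic way to maintain a uniform random sample of fixed size k while items keep arriving, using memory proportional to k rather than to the length of the stream. It is an illustration chosen for this README, not necessarily the method taught in the course, and the function name `reservoir_sample` is hypothetical.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Return a uniform random sample of k items from a stream of unknown length.

    Hypothetical helper for illustration: it makes a single pass over the
    stream and never holds more than k items in memory (Algorithm R).
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = rng if rng is not None else random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # The first k items fill the reservoir directly.
            reservoir.append(item)
        else:
            # Draw j uniformly from 0..i; the new item overwrites slot j only
            # when j < k, i.e. it is kept with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


# Usage: sample 5 values from a generator without materializing it.
sample = reservoir_sample((x * x for x in range(100_000)), k=5, rng=random.Random(0))
print(sample)
```

After n items have been seen (n ≥ k), each of them is in the reservoir with probability k/n, so a valid sample can be read off at any point in the stream.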

This course gives students the opportunity to learn fundamental data mining algorithms.

Associated competences

Basic competences

CB3. Students are able to collect and interpret relevant data (normally within their area of study) in order to make judgements that include reflection on relevant social, scientific, or ethical issues.

Transversal competences

CT3. Applying acquired knowledge flexibly and creatively, and adapting it to new contexts and situations.

Specific competences

RA.CE7.2 Recognizing the statistical techniques applied to data mining.

RA.CE9.2 Recognizing and applying data mining techniques.

Learning outcomes

By the end of the course, students will have acquired:

  • Knowledge of typical data mining pipelines.
  • Techniques for pre-processing data.
  • Knowledge of similarity metrics.
  • Techniques for fast similarity searches.
  • Knowledge of the basics of market basket analysis.
  • Methods for association rule mining.
  • Knowledge of the basics of recommender systems.
  • Methods for building recommender systems.
  • Knowledge of the streaming data model.
  • Methods for handling data streams and time series.
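As a small illustration of the basket-analysis and association-rule outcomes listed above, the Python sketch below computes the support of an itemset, the confidence of a rule X → Y, and the frequent item pairs found by brute-force counting. The basket data is a made-up toy example, the helper names are hypothetical, and support is expressed here as a fraction of baskets (some texts, including Mining of Massive Datasets, use the raw count instead). Scalable algorithms such as A-Priori avoid counting pairs that cannot be frequent; this naive version only demonstrates the definitions.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Made-up toy transactions: each basket is the set of items bought together.
baskets: List[Set[str]] = [
    {"bread", "milk"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "cola"},
    {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper", "cola"},
]


def count(itemset: Set[str], data: List[Set[str]]) -> int:
    """Number of baskets that contain every item of `itemset`."""
    return sum(1 for b in data if itemset <= b)


def support(itemset: Set[str], data: List[Set[str]]) -> float:
    """Fraction of baskets that contain `itemset`."""
    return count(itemset, data) / len(data) if data else 0.0


def confidence(lhs: Set[str], rhs: Set[str], data: List[Set[str]]) -> float:
    """Confidence of the rule lhs -> rhs: count(lhs | rhs) / count(lhs)."""
    n_lhs = count(lhs, data)
    return count(lhs | rhs, data) / n_lhs if n_lhs else 0.0


def frequent_pairs(data: List[Set[str]], min_support: float) -> Dict[Tuple[str, str], float]:
    """Brute force: count every pair in every basket, keep pairs meeting min_support."""
    pair_counts: Counter = Counter()
    for b in data:
        # Sorting gives each unordered pair a single canonical tuple form.
        for pair in combinations(sorted(b), 2):
            pair_counts[pair] += 1
    n = len(data)
    return {p: c / n for p, c in pair_counts.items() if c / n >= min_support}


print(support({"diaper", "beer"}, baskets))       # 0.6
print(confidence({"diaper"}, {"beer"}, baskets))  # 0.75
print(sorted(frequent_pairs(baskets, 0.6)))
```

With this toy data, {diaper, beer} appears in 3 of the 5 baskets (support 0.6), and beer is present in 3 of the 4 baskets that contain diaper, so the rule {diaper} → {beer} has confidence 0.75.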

Main bibliography

📘 Data Mining: The Textbook (2015) by Charu C. Aggarwal. ISBN 978-3-319-14142-8. Free download

📒 Mining of Massive Datasets, 2nd edition (2014) by Leskovec et al. ISBN 978-1107077232. Online materials: http://www.mmds.org/. Free download

Additional bibliography

📙 Introduction to Data Mining, 2nd edition (2019) by Tan et al. ISBN 978-0-13-312890-1. Online materials: https://www-users.cs.umn.edu/~kumar001/dmbook/index.php

📘 Data Mining and Machine Learning, 2nd edition (2020) by Zaki and Meira. ISBN 978-1108473989.

📓 Data Mining: Concepts and Techniques, 3rd edition (2011) by Han et al. ISBN 978-0123814791.
