
Videos, Slides, Notebooks, and Papers about Some Important Tools in Data Science


hhaji/Tools-in-Data-Science


The Webpage of This Repository: Tools in Data Science
Data Science Center, Shahid Beheshti University


In this repository, we introduce videos, slides, notebooks, and papers about some important 
tools in data science, along with tools for writing and sharing your projects. 

Index:


Command Line:

Additional Reading:

Anaconda:

Anaconda Distribution: With over 6 million users, the open source Anaconda Distribution is the fastest and easiest way to do Python and R data science and machine learning on Linux, Windows, and Mac OS X. It's the industry standard for developing, testing, and training on a single machine.

Additional Reading:

Integrated Development Environment:

Python IDEs and Code Editors (Guide) by Jon Fincher

  • IDE: An IDE (or Integrated Development Environment) is a program dedicated to software development. As the name implies, IDEs integrate several tools specifically designed for software development. These tools usually include:
    • An editor designed to handle code (with, for example, syntax highlighting and auto-completion)
    • Build, execution, and debugging tools
    • Some form of source control
    Most IDEs support many different programming languages and contain many more features. They can, therefore, be large and take time to download and install. You may also need advanced knowledge to use them properly.
  • Top Python IDEs For Data Science (My Recommendation):

Colaboratory (a WEB IDE):

Jupyter and IPython (a WEB IDE):

The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations, and narrative text. Uses include data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more. IPython provides a rich architecture for interactive computing in multiple programming languages.

Additional Reading:

Jupyter Lab (a WEB IDE):

Additional Reading:

R NoteBook (a WEB IDE):

Markdown:

Markdown is a lightweight markup language that you can use to add formatting elements to plain-text documents. Created by John Gruber in 2004, Markdown is now one of the world’s most popular markup languages.

Additional Reading:

R Markdown:

Working with Data:

Git:

Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency. Git is easy to learn and has a tiny footprint with lightning fast performance.

Git Resources:

Additional Reading:

Videos:

Git Cheat Sheets:

Slides:

GitHub:

Docker:

Docker provides a simple and powerful developer experience, workflows and collaboration for creating applications.

Programming Languages:

Python:

You can learn Python via SoloLearn, a great website for getting started with coding. It offers easy-to-follow lessons interspersed with quizzes to help you retain what you are learning. We also recommend the following references:

Additional Reading:

Useful Tricks in Python:

Useful Modules in Python:

R:

Additional Reading:

Useful R Sites:

Useful R Tricks:

Machine Learning in R:

Useful Machine Learning Sites in R:

Practice Code:

If you want to solve interesting problems to practice Python or R, we recommend working through the following problems:

SQL:

SQL is a domain-specific language for managing data in databases.
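
As a small illustration of what "managing data" looks like in practice, the sketch below runs a few SQL statements from Python using the standard-library `sqlite3` module. The `measurements` table and its rows are made up for this example and are not part of the repository.

```python
import sqlite3

# In-memory database; the table name and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (city TEXT, temperature REAL)")
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?)",
    [("Tehran", 31.0), ("Tehran", 29.0), ("Shiraz", 27.5)],
)

# Aggregate with plain SQL: average temperature per city, sorted by city.
rows = conn.execute(
    "SELECT city, AVG(temperature) FROM measurements "
    "GROUP BY city ORDER BY city"
).fetchall()
conn.close()

print(rows)
```

The same `SELECT ... GROUP BY` statement would work largely unchanged against other relational databases; only the connection code differs.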

Python Libraries for Data Science:

Python continues to hold a leading position in solving data science tasks and challenges. KDnuggets introduced 20 Python libraries for data science. The following resource list was adapted from Applied Machine Learning and Deep Learning, created by Cuixian Chen. Here are five of the most important libraries:

  • Python Overview [Word]
  • Python Tutorial [PDF] [Code]
  • Numpy [PDF] [Code]
    • User Guide [Link]
    • Quickstart [Link]
    • Reference [Link]
    • Practice Numpy in LabEx [Link]
    • Cheatsheet [Link]
  • Matplotlib [PDF] [Code]
    • Example [Link]
    • Tutorials [Link]
    • Reference [Link]
    • Practice Matplotlib in LabEx [Link]
    • Cheatsheet [Link]
  • Pandas [Code]
    • 10 Min to Pandas [Link]
    • Cookbook [Link]
    • Tutorials [Link]
    • Reference [Link]
    • Practice Pandas in LabEx [Link]
    • Cheatsheet [Link]
  • Seaborn: Statistical Data Visualization [Link]
    • Example [Link]
    • Tutorials [Link]
    • Reference [Link]
    • Cheatsheet [Link]
  • Scikit Learn [Link]
    • Scikit Image [Link]
    • Scikit Tutorial #1 [Code]
    • Scikit Tutorial #2 [Code]
    • Cheatsheet [Link]

Numpy:

NumPy is the fundamental package for scientific computing with Python. Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data-types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases.
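
To make the "arbitrary data-types" remark concrete, here is a minimal sketch of a NumPy structured array. The `person` dtype and the two records are invented for illustration only.

```python
import numpy as np

# A structured dtype: each element is a record with a name, an age, and a weight.
# The field names and values here are hypothetical.
person = np.dtype([("name", "U10"), ("age", "i4"), ("weight", "f8")])
people = np.array([("Ada", 36, 58.5), ("Alan", 41, 70.0)], dtype=person)

# Each field behaves like an ordinary array, so vectorized operations still apply.
mean_age = people["age"].mean()
heavier = people[people["weight"] > 60.0]["name"]

print(mean_age)
print(heavier)
```

Because every record has a fixed binary layout, arrays like this map naturally onto rows read from a database or a binary file.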

Additional Reading:
  • Exercises: Practice Numpy in LabEx

Pandas:

Pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
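
A short sketch of the kind of data structure and analysis step Pandas provides: a `DataFrame` built from a dictionary, followed by a group-wise mean. The column names and values are made up for this example.

```python
import pandas as pd

# A small table with a categorical column and a numeric column (hypothetical data).
df = pd.DataFrame({
    "species": ["a", "a", "b", "b"],
    "length": [1.0, 3.0, 2.0, 6.0],
})

# Split the rows by species, then average the length within each group.
mean_length = df.groupby("species")["length"].mean()

print(mean_length)
```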

Additional Reading:

Matplotlib:

Matplotlib is a Python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms. Matplotlib can be used in Python scripts, the Python and IPython shells, the Jupyter notebook, web application servers, and four graphical user interface toolkits.
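
The following sketch shows one way to use Matplotlib from a plain script: it selects the non-interactive `Agg` backend, draws a simple line plot, and renders it to PNG bytes in memory. The data points are arbitrary, and writing to a `BytesIO` buffer rather than a file is just a choice made for this example.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no display is required
import matplotlib.pyplot as plt

# Arbitrary sample data for the plot.
xs = [0, 1, 2, 3, 4]
ys = [x ** 2 for x in xs]

fig, ax = plt.subplots()
ax.plot(xs, ys, marker="o", label="y = x^2")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()

# Render the figure to PNG bytes in memory instead of a file on disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
png_bytes = buffer.getvalue()
```

In a Jupyter notebook the backend line is usually unnecessary, since the figure is displayed inline.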

Additional Reading:
  • Exercises: Practice Matplotlib in LabEx

Scikit-Learn:

Scikit-Learn provides simple and efficient tools for data mining and data analysis. It is built on NumPy, SciPy, and Matplotlib.
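
As a rough illustration of the typical Scikit-Learn workflow (load data, split, fit, score), the sketch below trains a logistic regression classifier on the bundled iris dataset. The choice of model, split size, and random seed are assumptions made for this example, not recommendations from the repository.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load the bundled iris dataset as feature matrix X and label vector y.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the samples for evaluation; the seed is arbitrary.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit a classifier and measure accuracy on the held-out samples.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)

print(accuracy)
```

Most Scikit-Learn estimators follow the same `fit` / `predict` / `score` pattern, so swapping in a different model usually changes only the constructor line.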

SciPy:

SciPy (pronounced "Sigh Pie") is open-source software for mathematics, science, and engineering. It includes modules for statistics, optimization, integration, linear algebra, Fourier transforms, signal and image processing, ODE solvers, and more.
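
Two of the modules mentioned above, integration and optimization, can be sketched in a few lines. The integrand and the objective function are simple textbook choices picked for this example.

```python
import math

from scipy import integrate, optimize

# Numerical integration: the integral of sin(x) over [0, pi] is exactly 2.
area, abs_error = integrate.quad(math.sin, 0.0, math.pi)

# Scalar minimization: (x - 3)^2 + 1 has its minimum at x = 3.
result = optimize.minimize_scalar(lambda x: (x - 3.0) ** 2 + 1.0)

print(area, abs_error)
print(result.x, result.fun)
```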

Additional Reading:

Probabilistic Programming in Python:

PyMC3 allows you to write down models using an intuitive syntax to describe a data generating process.

A Fascinating Guide For Machine Learning:
