This is a tutorial about Python for Data Science.
This repo contains tutorials for the Python data science toolkit, including NumPy, SciPy, Pandas, Matplotlib, IPython, and Jupyter Notebook.
Requirements:
- Python 2.7, or Python 3.6 or higher
- pip (or easy_install) to install all tools
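To check that your interpreter meets the version requirement, here is a minimal sketch. `meets_requirement` is a hypothetical helper, and it only encodes the rule stated above (2.7, or 3.6 and later). It uses `%` formatting so it also runs on Python 2.7.

```python
import sys


def meets_requirement(version_info):
    """Return True if (major, minor) is 2.7, or 3.6 or higher (hypothetical helper)."""
    major, minor = version_info[0], version_info[1]
    if major == 2:
        return minor >= 7
    # Tuple comparison: (3, 5) < (3, 6) <= (3, 12) < (4, 0)
    return (major, minor) >= (3, 6)


if __name__ == "__main__":
    current = sys.version_info[:2]
    status = "OK" if meets_requirement(current) else "unsupported"
    print("Python %d.%d: %s" % (current[0], current[1], status))
```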
You can install packages using:
pip install <pkg>
or
easy_install <pkg>
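If you have several Python installations, running pip as `python -m pip` ensures packages go into the interpreter you are actually using. The sketch below builds and runs that command from Python. `pip_install_args` and `pip_install` are hypothetical helpers for illustration, not part of pip's API. The only pip flag used is `-U`, the same one shown in the update commands.

```python
import subprocess
import sys


def pip_install_args(package, version=None, upgrade=False):
    """Build a `python -m pip install` command for the running interpreter (hypothetical helper)."""
    spec = package if version is None else "%s==%s" % (package, version)
    args = [sys.executable, "-m", "pip", "install"]
    if upgrade:
        args.append("-U")  # same as `pip install -U`
    args.append(spec)
    return args


def pip_install(package, version=None, upgrade=False):
    """Run the install; raises subprocess.CalledProcessError if pip fails."""
    subprocess.check_call(pip_install_args(package, version, upgrade))


# Example (not executed here): pip_install("numpy", "1.9.1", upgrade=True)
```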
You can update packages using:
pip install -U numpy==1.9.1 # update to version 1.9.1
or
pip install -U numpy # update to the latest version
or
easy_install --upgrade numpy==1.9.1
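After an install or upgrade, you can confirm which version actually ended up installed. The following is a minimal sketch using the standard-library `importlib.metadata` module, which requires Python 3.8 or later and is therefore not available on Python 2.7. `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version  # Python 3.8+


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints e.g. the version pip just installed, or None if numpy is missing.
    print("numpy:", installed_version("numpy"))
```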
- Alternatively, you can use a Python distribution that comes with many of these packages preinstalled, such as Anaconda, Enthought Canopy, Python(x,y), or WinPython.
If you need a Python refresher, check out my other Python repository.
Common conda commands:
conda install <package_name> # install a package
conda remove <package_name> # remove a package
conda install <pkg1> <pkg2> # install multiple packages
conda search "*beautiful*" # search for packages whose names match a pattern
conda create -n <env_name> [list of packages] # create a new virtual environment with the listed packages
conda create -n <env_name> python=2 [list of packages] # create a virtual environment with Python 2
conda env export > environment.yaml # export the environment to a file (similar to requirements.txt)
conda env create -f environment.yaml # create a virtual environment from an environment file
conda env list # list all virtual environments
conda env remove -n <env_name> # remove a virtual environment
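When working with several conda environments, it helps to confirm from inside Python (for example, in a Jupyter notebook) which environment is active. This minimal sketch assumes the environment was activated with `conda activate`, which sets the `CONDA_DEFAULT_ENV` variable. `active_conda_env` is a hypothetical helper.

```python
import os
import sys


def active_conda_env(environ=None):
    """Return the name of the activated conda environment, or None if none is active."""
    env = os.environ if environ is None else environ
    # `conda activate <env_name>` sets CONDA_DEFAULT_ENV to the environment's
    # name (or to its path for environments created with a prefix).
    return env.get("CONDA_DEFAULT_ENV")


if __name__ == "__main__":
    print("conda env:", active_conda_env())
    # sys.prefix shows which installation this interpreter belongs to.
    print("interpreter prefix:", sys.prefix)
```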
Theory
Problem Solving Approach to Data Science problems
Data Requirements and Collection
Python Basics
Loading and Viewing Data using Pandas
SQL
Connecting to IBM DB2 in Jupyter
Data Science Libraries
Data Analysis intro using Pandas
Area Plots, Histograms and Bar charts
Pie Charts, Scatter plots and Bubble Plots
Waffle Charts, Word clouds and Regression Plots
Pandas and Matplotlib plotting basics
Web scraping
This lab uses Pandas to read CSV files from the MovieLens datasets.
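Here is a minimal sketch of reading a ratings CSV with Pandas. It assumes a `userId,movieId,rating,timestamp` layout with Unix-second timestamps, as in the MovieLens `ratings.csv` files. The inline rows are made up for illustration and are not real MovieLens data. `load_ratings` is a hypothetical helper. In practice you would pass it the path to the downloaded file instead of a `StringIO`.

```python
import io

import pandas as pd

# Made-up sample rows in the assumed MovieLens ratings.csv layout.
SAMPLE_CSV = """userId,movieId,rating,timestamp
1,10,4.0,964982703
1,20,3.5,964981247
2,10,5.0,964982224
"""


def load_ratings(source):
    """Read a ratings CSV (path, URL, or file-like object) into a DataFrame."""
    ratings = pd.read_csv(source)
    # Convert Unix seconds to datetimes so they are readable when viewed.
    ratings["timestamp"] = pd.to_datetime(ratings["timestamp"], unit="s")
    return ratings


if __name__ == "__main__":
    ratings = load_ratings(io.StringIO(SAMPLE_CSV))
    print(ratings.head())
    # Average rating per movie.
    print(ratings.groupby("movieId")["rating"].mean())
```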