Get your umbrellas and brace the storm, we'll be diving right into it: making pretty things from data.
Not considering data curation (long term management of data) there will be a few key skills to practice here:
- Interpreting questions
- Visually organizing data
- Using Python & common packages
- Organizing basic code into reusable blocks
- Working development environment
pip install numpy pandas scipy plotly
Datasets taken from this article
Click around a bit in this repository until you stumble upon the datasets, and a way to download them. The best practices for obtaining files from repositories are in the [git lesson].
Once you found a way to download the datasets, put them in a working directory and navigate to the folder (see [shell basics]).
In the folder, save the following snippet as starter.py
, modify and run it:
# Some conventional imports that make your
# life super easy when getting started!
import pandas as pd
import numpy as np
file = "my_dataset.csv"
column = "my_column"
dataset = pd.read_csv(file)
column_data = dataset[column]
print("My dataset column has", len(column_data), "datapoints")
You should thoroughly understand every statement in this script (and ofcourse, exactly what statement
s and expression
s are!).
If you do not, look for a nice tutorial and introduction to Python. Make sure it teaches you what statements, expressions,
variables, literals, operators, attributes and functions are.
Let's list what you have to understand at this point to continue:
- The working directory, shell and Python interpreter
- How to run Python scripts
- Variables and literals
- What is the difference between
hello
and"hello"
? - What is a
NameError: hello is undefined
? - Data structures such as lists, dictionaries, arrays (
numpy
,pandas
) - Indexing of data structures
- What is the difference between
- Statements (what do these do?)
# comments
x = y
x.y
x(y, z)
-
Use a search engine to find your answers (see [search guide])
-
If you encounter any errors, see the [error guide] to deal with them!
-
Plot with Plotly, and use the
simple_white
theme, I like it, it makes me happy and lenient 😇 -
Factor shared code into functions (see [function guide]).
-
Give your figures titles that overlap with the question:
Q:
Show a line plot with error bars (stdev) of the price vs the year built, and in the same figure, with legend, price vs year renovated.
Title:
Relationship price, year built and renovated
- Give your figures axis titles and units:
Speed (m/s)
- With multiple traces per figure, give descriptive legend names.
- Group or discriminate data by using the color, markers or other properties.
From here on out it's up to you (and Google) to find answers. Make the following exercises and send me your results!
- Show a histogram of the distribution of
floors
in the dataset. - Show a line plot of the average price vs the amount of floors.
- Show a line of the average price per floor vs the amount of floors.
- Show a line plot with error bars (stdev) of the price vs the year built, and in the same figure, with legend, price vs year renovated.