Understand the difference between a Python script and a Jupyter
+notebook.
+
Create Markdown cells in a notebook.
+
Create and run Python cells in a notebook.
+
+
+
+
+
+
To run Python, we are going to use Jupyter Notebooks via JupyterLab for
+the remainder of this workshop. Jupyter notebooks are common in data
+science and visualization and serve as a convenient common-denominator
+experience for running Python code interactively where we can easily
+view and share the results of our Python code.
+
There are other ways of editing, managing, and running code. Software
+developers often use an integrated development environment (IDE) like PyCharm or Visual Studio Code, or text
+editors like Vim or Emacs, to create and edit their Python programs.
+After editing and saving your Python programs you can execute those
+programs within the IDE itself or directly on the command line. In
+contrast, Jupyter notebooks let us execute and view the results of our
+Python code immediately within the notebook.
+
JupyterLab has several other handy features:
+
You can easily type, edit, and copy and paste blocks of code.
+
Tab complete allows you to easily access the names of things you are
+using and learn more about them.
+
It allows you to annotate your code with links, different sized
+text, bullets, etc. to make it more accessible to you and your
+collaborators.
+
It allows you to display figures next to the code that produces them
+to tell a complete story of the analysis.
+
Each notebook contains one or more cells that contain code, text, or
+images.
+
Getting Started with JupyterLab
+
JupyterLab is an application server with a web user interface from Project Jupyter that enables one to work
+with documents and activities such as Jupyter notebooks, text editors,
+terminals, and even custom components in a flexible, integrated, and
+extensible manner. JupyterLab requires a reasonably up-to-date browser
+(ideally a current version of Chrome, Safari, or Firefox); Internet
+Explorer versions 9 and below are not supported.
+
JupyterLab is included as part of the Anaconda Python distribution.
+If you have not already installed the Anaconda Python distribution, see
+the setup instructions.
+
In this lesson we will run JupyterLab locally on our own machines, so
+it will not require an internet connection beyond the initial
+connection to download and install Anaconda and JupyterLab.
+
Start the JupyterLab server on your machine
+
Use a web browser to open a special localhost URL that connects to
+your JupyterLab server
+
The JupyterLab server does the work and the web browser renders the
+result
+
Type code into the browser and see the results after your JupyterLab
+server has finished executing your code
Experienced users of Jupyter notebooks interested in a more detailed
+discussion of the similarities and differences between the JupyterLab
+and Jupyter notebook user interfaces can find more information in the JupyterLab
+user interface documentation.
+
+
+
+
Starting JupyterLab
+
You can start the JupyterLab server through the command line or
+through an application called Anaconda Navigator. Anaconda
+Navigator is included as part of the Anaconda Python distribution.
+
+
macOS - Command Line
+
To start the JupyterLab server you will need to access the command
+line through the Terminal. There are two ways to open Terminal on
+Mac.
+
In your Applications folder, open Utilities and double-click on
+Terminal
+
Press Command + spacebar to launch Spotlight.
+Type Terminal and then double-click the search result or
+hit Enter
+
+
After you have launched Terminal, type the command to launch the
+JupyterLab server.
+
+
BASH
+
+
$ jupyter lab
+
+
+
+
Windows Users - Command Line
+
To start the JupyterLab server you will need to access the Anaconda
+Prompt.
+
Press the Windows Logo Key and search for
+Anaconda Prompt, then click the result or press Enter.
+
After you have launched the Anaconda Prompt, type the command:
+
+
BASH
+
+
$ jupyter lab
+
+
+
+
Anaconda Navigator
+
To start a JupyterLab server from Anaconda Navigator you must first
+start Anaconda Navigator (detailed instructions are available for macOS,
+Windows, and Linux). You can search for Anaconda Navigator via Spotlight on
+macOS (Command + spacebar) or the Windows search
+function (Windows Logo Key), or open a terminal shell and
+execute the anaconda-navigator executable from the
+command line.
+
After you have launched Anaconda Navigator, click the
+Launch button under JupyterLab. You may need to scroll down
+to find it.
+
Here is a screenshot of an Anaconda Navigator page similar to the one
+that should open on either macOS or Windows.
+
+
+
And here is a screenshot of a JupyterLab landing page that should be
+similar to the one that opens in your default web browser after starting
+the JupyterLab server on either macOS or Windows.
+
+
+
+
The JupyterLab Interface
+
JupyterLab has many features found in traditional integrated
+development environments (IDEs) but is focused on providing flexible
+building blocks for interactive, exploratory computing.
+
The JupyterLab
+Interface consists of the Menu Bar, a collapsible Left Side Bar, and
+the Main Work Area which contains tabs of documents and activities.
+
+
Menu Bar
+
The Menu Bar at the top of JupyterLab has the top-level menus that
+expose various actions available in JupyterLab along with their keyboard
+shortcuts (where applicable). The following menus are included by
+default.
+
+File: Actions related to files and directories such
+as New, Open, Close, Save, etc. The
+File menu also includes the Shut Down action used to
+shut down the JupyterLab server.
+
+Edit: Actions related to editing documents and
+other activities such as Undo, Cut, Copy,
+Paste, etc.
+
+View: Actions that alter the appearance of
+JupyterLab.
+
+Run: Actions for running code in different
+activities such as notebooks and code consoles (discussed below).
+
+Kernel: Actions for managing kernels. Kernels in
+Jupyter will be explained in more detail below.
+
+Tabs: A list of the open documents and activities
+in the main work area.
+
+Settings: Common JupyterLab settings can be
+configured using this menu. There is also an Advanced Settings
+Editor option in the dropdown menu that provides more fine-grained
+control of JupyterLab settings and configuration options.
+
+Help: A list of JupyterLab and kernel help
+links.
+
+
+
+
+
+
Kernels
+
+
The JupyterLab docs
+define kernels as “separate processes started by the server that runs
+your code in different programming languages and environments.” When we
+open a Jupyter Notebook, that starts a kernel - a process - that is
+going to run the code. In this lesson, we’ll be using the Jupyter
+IPython kernel, which lets us run Python 3 code interactively.
+
Using other Jupyter kernels
+for other programming languages would let us write and execute code
+in other programming languages in the same JupyterLab interface, like R,
+Java, Julia, Ruby, JavaScript, Fortran, etc.
+
+
+
+
A screenshot of the default Menu Bar is provided below.
+
+
+
+
+
Left Sidebar
+
The left sidebar contains a number of commonly used tabs, such as a
+file browser (showing the contents of the directory where the JupyterLab
+server was launched), a list of running kernels and terminals, the
+command palette, and a list of open tabs in the main work area. A
+screenshot of the default Left Side Bar is provided below.
+
+
+
The left sidebar can be collapsed or expanded by selecting “Show Left
+Sidebar” in the View menu or by clicking on the active sidebar tab.
+
+
+
Main Work Area
+
The main work area in JupyterLab enables you to arrange documents
+(notebooks, text files, etc.) and other activities (terminals, code
+consoles, etc.) into panels of tabs that can be resized or subdivided. A
+screenshot of the default Main Work Area is provided below.
+
If you do not see the Launcher tab, click the blue plus sign under
+the “File” and “Edit” menus and it will appear.
+
+
+
Drag a tab to the center of a tab panel to move the tab to the panel.
+Subdivide a tab panel by dragging a tab to the left, right, top, or
+bottom of the panel. The work area has a single current activity. The
+tab for the current activity is marked with a colored top border (blue
+by default).
+
+
Creating a Python script
+
To start writing a new Python program click the Text File icon under
+the Other header in the Launcher tab of the Main Work Area.
+
You can also create a new plain text file by selecting New
+-> Text File from the File menu in the Menu Bar.
+
+
To convert this plain text file to a Python program, select the
+Save File As action from the File menu in the Menu Bar
+and give your new text file a name that ends with the .py
+extension.
+
The .py extension lets everyone (including the
+operating system) know that this text file is a Python program.
+
This is convention, not a requirement.
+
+
Creating a Jupyter Notebook
+
To open a new notebook click the Python 3 icon under the
+Notebook header in the Launcher tab in the main work area. You
+can also create a new notebook by selecting New -> Notebook
+from the File menu in the Menu Bar.
+
Additional notes on Jupyter notebooks.
+
Notebook files have the extension .ipynb to distinguish
+them from plain-text Python programs.
+
Notebooks can be exported as Python scripts that can be run from the
+command line.
+
Below is a screenshot of a Jupyter notebook running inside
+JupyterLab. If you are interested in more details, then see the official
+notebook documentation.
+
+
+
+
+
+
+
+
How It’s Stored
+
+
The notebook file is stored in a format called JSON.
+
Just like a webpage, what’s saved looks different from what you see
+in your browser.
+
But this format allows Jupyter to mix source code, text, and images,
+all in one file.
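To make the storage format concrete, the sketch below builds a tiny notebook-like structure by hand, serializes it with Python's standard json module, and reads it back. The field names (cells, cell_type, source, nbformat) are assumed to follow version 4 of the notebook format. The cell contents are made up for illustration, and real notebooks carry more metadata than this.

```python
import json

# A hand-written, minimal notebook-like structure (illustrative only).
notebook = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# My analysis"]},
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["print(6 * 7)"],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

# This text is roughly what a saved .ipynb file contains.
saved_text = json.dumps(notebook, indent=1)

# Reading it back gives ordinary Python dictionaries and lists.
loaded = json.loads(saved_text)
cell_types = [cell["cell_type"] for cell in loaded["cells"]]
print(cell_types)
```

Because the file is plain JSON, any tool that can read JSON can inspect a notebook, although editing it through JupyterLab is far less error-prone.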
+
+
+
+
+
+
+
+
+
Arranging Documents into Panels of Tabs
+
+
In the JupyterLab Main Work Area you can arrange documents into
+panels of tabs. Here is an example from the official
+documentation.
+
+
+
First, create a text file, Python console, and terminal window and
+arrange them into three panels in the main work area. Next, create a
+notebook, terminal window, and text file and arrange them into three
+panels in the main work area. Finally, create your own combination of
+panels and tabs. What combination of panels and tabs do you think will
+be most useful for your workflow?
+
+
+
+
+
+
+
+
+
After creating the necessary tabs, you can drag one of the tabs to
+the center of a panel to move the tab to the panel; next you can
+subdivide a tab panel by dragging a tab to the left, right, top, or
+bottom of the panel.
+
+
+
+
+
+
+
+
+
+
Code vs. Text
+
+
Jupyter mixes code and text in different types of blocks, called
+cells. We often use the term “code” to mean “the source code of software
+written in a language such as Python”. A “code cell” in a Notebook is a
+cell that contains software; a “text cell” is one that contains ordinary
+prose written for human beings.
+
+
+
+
The Notebook has Command and Edit modes.
+
If you press Esc and Return alternately, the
+outer border of your code cell will change from gray to blue.
+
These are the Command (gray) and
+Edit (blue) modes of your notebook.
+
Command mode allows you to edit notebook-level features, and Edit
+mode changes the content of cells.
+
When in Command mode (esc/gray),
+
The b key will make a new cell below the currently
+selected cell.
+
The a key will make one above.
+
The x key will delete the current cell.
+
The z key will undo your last cell operation (which could
+be a deletion, creation, etc).
+
+
All actions can be done using the menus, but there are lots of
+keyboard shortcuts to speed things up.
+
+
+
+
+
+
Command Vs. Edit
+
+
In the Jupyter notebook page are you currently in Command or Edit
+mode?
+Switch between the modes. Use the shortcuts to generate a new cell. Use
+the shortcuts to delete a cell. Use the shortcuts to undo the last cell
+operation you performed.
+
+
+
+
+
+
+
+
+
Command mode has a gray border and Edit mode has a blue border. Use
+Esc and Return to switch between modes. For each of the remaining
+steps you need to be in Command mode (press Esc if your cell is
+blue): type b or a to create a new cell, type x to delete a cell,
+and type z to undo the last cell operation.
+
+
+
+
+
+
Use the keyboard and mouse to select and edit cells.
+
Pressing the Return key turns the border blue and engages
+Edit mode, which allows you to type within the cell.
+
Because we want to be able to write many lines of code in a single
+cell, pressing the Return key when in Edit mode (blue) moves
+the cursor to the next line in the cell just like in a text editor.
+
We need some other way to tell the Notebook we want to run what’s in
+the cell.
+
Pressing Shift+Return together will execute
+the contents of the cell.
+
Notice that the Return and Shift keys on the
+right of the keyboard are right next to each other.
+
+
+
The Notebook will turn Markdown into pretty-printed
+documentation.
Create a nested list in a Markdown cell in a notebook that looks like
+this:
+
1. Get funding.
+
2. Do work.
+
  * Design experiment.
+
  * Collect data.
+
  * Analyze.
+
3. Write up.
+
4. Publish.
+
+
+
+
+
+
+
+
+
This challenge integrates both the numbered list and bullet list.
+Note that the bullet list is indented 2 spaces so that it is in line with
+the items of the numbered list.
What is displayed when a Python cell in a notebook that contains
+several calculations is executed? For example, what happens when this
+cell is executed?
+
+
PYTHON
+
+
7 * 3
+2 + 1
+
+
+
+
+
+
+
+
+
+
Python returns the output of the last calculation.
+
+
PYTHON
+
+
3
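A notebook cell displays only the value of its last expression. As a small sketch, storing each result in a variable (the names first and second are illustrative) and passing each one to print shows both values. The same approach works in a plain Python script.

```python
first = 7 * 3
second = 2 + 1

# print displays every value it is given, not just the last one.
print(first)
print(second)
```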
+
+
+
+
+
+
+
+
+
+
+
Change an Existing Cell from Code to Markdown
+
+
What happens if you write some Python in a code cell and then you
+switch it to a Markdown cell? For example, put the following in a code
+cell:
+
+
PYTHON
+
+
x = 6 * 7 + 12
+print(x)
+
+
And then run it with Shift+Return to be sure
+that it works as a code cell. Now go back to the cell and use
+Esc then m to switch the cell to Markdown and
+“run” it with Shift+Return. What happened and how
+might this be useful?
+
+
+
+
+
+
+
+
+
The Python code gets treated like Markdown text. The lines appear as
+if they are part of one contiguous paragraph. This could be useful to
+temporarily turn on and off cells in notebooks that get used for
+multiple purposes.
+
+
PYTHON
+
+
x = 6 * 7 + 12 print(x)
+
+
+
+
+
+
+
+
+
+
+
Equations
+
+
Standard Markdown (such as we’re using for these notes) won’t render
+equations, but the Notebook will. Create a new Markdown cell and enter
+the following:
+
$\sum_{i=1}^{N} 2^{-i} \approx 1$
+
(It’s probably easier to copy and paste.) What does it display? What
+do you think the underscore, _, circumflex, ^,
+and dollar sign, $, do?
+
+
+
+
+
+
+
+
+
The notebook shows the equation as it would be rendered from LaTeX
+equation syntax. The dollar sign, $, is used to tell
+Markdown that the text in between is a LaTeX equation. If you’re not
+familiar with LaTeX, underscore, _, is used for subscripts
+and circumflex, ^, is used for superscripts. A pair of
+curly braces, { and }, is used to group text
+together so that the statement i=1 becomes the subscript
+and N becomes the superscript. Similarly, -i
+is in curly braces to make the whole statement the superscript for
+2. \sum and \approx are LaTeX
+commands for “sum over” and “approximate” symbols.
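As an optional check on why the equation says "approximately 1", the sketch below computes the partial sum in Python for a hypothetical choice of N = 10. The closed form 1 - 2^(-N) is the standard result for this geometric series. The variable names are illustrative.

```python
N = 10  # hypothetical upper limit of the sum

# Add up 2**-i for i = 1, 2, ..., N.
partial_sum = sum(2 ** -i for i in range(1, N + 1))

# For this geometric series the partial sum equals 1 - 2**-N.
closed_form = 1 - 2 ** -N

print(partial_sum)
print(closed_form)
```

The larger N becomes, the closer the sum gets to 1, which is what the approximation symbol expresses.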
+
+
+
+
+
+
Closing JupyterLab
+
From the Menu Bar select the “File” menu and then choose “Shut Down”
+at the bottom of the dropdown menu. You will be prompted to confirm that
+you wish to shut down the JupyterLab server (don’t forget to save your
+work!). Click “Shut Down” to shut down the JupyterLab server.
+
To restart the JupyterLab server you will need to re-run the
+following command from a shell.
+
$ jupyter lab
+
+
+
+
+
+
Closing JupyterLab
+
+
Practice closing and restarting the JupyterLab server.
+
+
+
+
+
+
+
+
+
Key Points
+
+
Python scripts are plain text files.
+
Use the Jupyter Notebook for editing and running Python.
+
The Notebook has Command and Edit modes.
+
Use the keyboard and mouse to select and edit cells.
+
The Notebook will turn Markdown into pretty-printed
+documentation.
+
+
diff --git a/02-variables.html b/02-variables.html
new file mode 100644
index 000000000..53b78721a
--- /dev/null
+++ b/02-variables.html
@@ -0,0 +1,1080 @@
+
+Plotting and Programming in Python: Variables and Assignment
+
Write programs that assign scalar values to variables and perform
+calculations with those values.
+
Correctly trace value changes in programs that use scalar
+assignment.
+
+
+
+
+
+
Use variables to store values.
+
Variables are names for values.
+
+
Variable names
+
can only contain letters, digits, and underscore
+_ (typically used to separate words in long variable
+names)
+
cannot start with a digit
+
are case sensitive (age, Age and AGE are three
+different variables)
+
+
The name should also be meaningful so you or another programmer
+know what it is.
+
Variable names that start with underscores like
+__alistairs_real_age have a special meaning so we won’t do
+that until we understand the convention.
+
In Python the = symbol assigns the value on the
+right to the name on the left.
+
The variable is created when a value is assigned to it.
+
+
Here, Python assigns an age to a variable age and a
+name in quotes to a variable first_name.
+
+
PYTHON
+
+
age = 42
+first_name = 'Ahmed'
+
+
+
Use print to display values.
+
Python has a built-in function called print that prints
+things as text.
+
Call the function (i.e., tell Python to run it) by using its
+name.
+
Provide values to the function (i.e., the things to print) in
+parentheses.
+
To add a string to the printout, wrap the string in single or double
+quotes.
+
The values passed to the function are called
+arguments.
+
+
+
PYTHON
+
+
print(first_name, 'is', age, 'years old')
+
+
+
OUTPUT
+
+
Ahmed is 42 years old
+
+
+print automatically puts a single space between items
+to separate them.
+
It also ends the output with a new line.
+
Variables must be created before they are used.
+
If a variable doesn’t exist yet, or if the name has been
+mis-spelled, Python reports an error. (Unlike some languages, which
+“guess” a default value.)
+
+
PYTHON
+
+
print(last_name)
+
+
+
ERROR
+
+
---------------------------------------------------------------------------
+NameError Traceback (most recent call last)
+<ipython-input-1-c1fbb4e96102> in <module>()
+----> 1 print(last_name)
+
+NameError: name 'last_name' is not defined
+
+
The last line of an error message is usually the most
+informative.
Be aware that it is the order of execution of cells that is
+important in a Jupyter notebook, not the order in which they appear.
+Python will remember all the code that was run previously,
+including any variables you have defined, irrespective of the order in
+the notebook. Therefore if you define variables lower down the notebook
+and then (re)run cells further up, those defined further down will still
+be present. As an example, create two cells with the following content,
+in this order:
+
+
PYTHON
+
+
print(myval)
+
+
+
PYTHON
+
+
myval = 1
+
+
If you execute this in order, the first cell will give an error.
+However, if you run the first cell after the second cell it
+will print out 1. To prevent confusion, it can be helpful
+to use the Kernel -> Restart & Run All
+option which clears the interpreter and runs everything from a clean
+slate going top to bottom.
+
+
+
+
Variables can be used in calculations.
+
We can use variables in calculations just as if they were values.
+
Remember, we assigned the value 42 to age
+a few lines ago.
+
+
+
PYTHON
+
+
age = age + 3
+print('Age in three years:', age)
+
+
+
OUTPUT
+
+
Age in three years: 45
+
+
Use an index to get a single character from a string.
+
The characters (individual letters, numbers, and so on) in a string
+are ordered. For example, the string 'AB' is not the same
+as 'BA'. Because of this ordering, we can treat the string
+as a list of characters.
+
Each position in the string (first, second, etc.) is given a number.
+This number is called an index or sometimes a
+subscript.
+
Indices are numbered from 0.
+
Use the position’s index in square brackets to get the character at
+that position.
+
+
PYTHON
+
+
atom_name = 'helium'
+print(atom_name[0])
+
+
+
OUTPUT
+
+
h
+
+
Use a slice to get a substring.
+
A part of a string is called a substring. A
+substring can be as short as a single character.
+
An item in a list is called an element. Whenever we treat a string
+as if it were a list, the string’s elements are its individual
+characters.
+
A slice is a part of a string (or, more generally, a part of any
+list-like thing).
+
We take a slice with the notation [start:stop], where
+start is the integer index of the first element we want and
+stop is the integer index of the element just
+after the last element we want.
+
The difference between stop and start is
+the slice’s length.
+
Taking a slice does not change the contents of the original string.
+Instead, taking a slice returns a copy of part of the original
+string.
+
+
PYTHON
+
+
atom_name = 'sodium'
+print(atom_name[0:3])
+
+
+
OUTPUT
+
+
sod
+
+
Use the built-in function len to find the length of a
+string.
+
+
PYTHON
+
+
print(len('helium'))
+
+
+
OUTPUT
+
+
6
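With len available, the earlier claims about slices can be checked: a slice's length is stop minus start, and taking a slice leaves the original string unchanged. This is a minimal sketch. The names start, stop, and piece are illustrative.

```python
atom_name = 'sodium'
start = 0
stop = 3

piece = atom_name[start:stop]

print(piece)                      # the slice itself
print(len(piece), stop - start)   # the length equals stop - start
print(atom_name)                  # the original string is unchanged
```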
+
+
Nested functions are evaluated from the inside out, like in
+mathematics.
+
Python is case-sensitive.
+
Python thinks that upper- and lower-case letters are different, so
+Name and name are different variables.
+
There are conventions for using upper-case letters at the start of
+variable names so we will use lower-case letters for now.
+
Use meaningful variable names.
+
Python doesn’t care what you call variables as long as they obey the
+rules (alphanumeric characters and the underscore).
Use meaningful variable names to help other people understand what
+the program does.
+
The most important “other person” is your future self.
+
+
+
+
+
+
Swapping Values
+
+
Fill the table showing the values of the variables in this program
+after each statement is executed.
+
+
PYTHON
+
+
# Command  # Value of x # Value of y # Value of swap #
+x = 1.0    #            #            #                 #
+y = 3.0    #            #            #                 #
+swap = x   #            #            #                 #
+x = y      #            #            #                 #
+y = swap   #            #            #                 #
+
+
+
+
+
+
+
+
+
+
+
OUTPUT
+
+
# Command  # Value of x # Value of y  # Value of swap #
+x = 1.0    # 1.0        # not defined # not defined   #
+y = 3.0    # 1.0        # 3.0         # not defined   #
+swap = x   # 1.0        # 3.0         # 1.0           #
+x = y      # 3.0        # 3.0         # 1.0           #
+y = swap   # 3.0        # 1.0         # 1.0           #
+
+
These three lines exchange the values in x and
+y using the swap variable for temporary
+storage. This is a fairly common programming idiom.
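Running the swap and printing the result confirms the final row of the table. This is a minimal sketch of the same statements with a print call added:

```python
x = 1.0
y = 3.0

# Exchange the values, using swap as temporary storage.
swap = x
x = y
y = swap

print('x:', x, 'y:', y, 'swap:', swap)
```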
+
+
+
+
+
+
+
+
+
+
Predicting Values
+
+
What is the final value of position in the program
+below? (Try to predict the value without running the program, then check
+your prediction.)
The initial variable is assigned the value
+'left'. In the second line, the position
+variable also receives the string value 'left'. In the third
+line, the initial variable is given the value
+'right', but the position variable retains its
+string value of 'left'.
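The program itself is not reproduced in this text. Reconstructed from the explanation above (so its exact form is an assumption), it would be three assignments along these lines:

```python
initial = 'left'
position = initial   # position now holds the string 'left'
initial = 'right'    # reassigning initial does not affect position

print(position)
```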
+
+
+
+
+
+
+
+
+
+
Challenge
+
+
If you assign a = 123, what happens if you try to get
+the second digit of a via a[1]?
+
+
+
+
+
+
+
+
+
Numbers are not strings or sequences and Python will raise an error
+if you try to perform an index operation on a number. In the next lesson on types and type
+conversion we will learn more about types and how to convert between
+different types. If you want the Nth digit of a number you can convert
+it into a string using the str built-in function and then
+perform an index operation on that string.
+
+
PYTHON
+
+
a = 123
+print(a[1])
+
+
+
ERROR
+
+
TypeError: 'int' object is not subscriptable
+
+
+
PYTHON
+
+
a = str(123)
+print(a[1])
+
+
+
OUTPUT
+
+
2
+
+
+
+
+
+
+
+
+
+
+
Choosing a Name
+
+
Which is a better variable name, m, min, or
+minutes? Why? Hint: think about which code you would rather
+inherit from someone who is leaving the lab:
+
ts = m * 60 + s
+
tot_sec = min * 60 + sec
+
total_seconds = minutes * 60 + seconds
+
+
+
+
+
+
+
+
+
minutes is better because min might mean
+something like “minimum” (and actually is an existing built-in function
+in Python that we will cover later).
+species_name[11:] (without a value after the
+colon)
+
+species_name[:4] (without a value before the
+colon)
+
+species_name[:] (just a colon)
+
species_name[11:-3]
+
species_name[-5:-3]
+
What happens when you choose a stop value which is out
+of range? (i.e., try species_name[0:20] or
+species_name[:103])
+
+
+
+
+
+
+
+
+
+species_name[2:8] returns the substring
+'acia b'
+
+
+species_name[11:] returns the substring
+'folia', from position 11 until the end
+
+species_name[:4] returns the substring
+'Acac', from the start up to but not including position
+4
+
+species_name[:] returns the entire string
+'Acacia buxifolia'
+
+
+species_name[11:-3] returns the substring
+'fo', from the 11th position to the third last
+position
+
+species_name[-5:-3] also returns the substring
+'fo', from the fifth last position to the third last
+
If a part of the slice is out of range, the operation does not fail.
+species_name[0:20] gives the same result as
+species_name[0:], and species_name[:103] gives
+the same result as species_name[:]
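These answers can be verified directly. The sketch assumes species_name holds 'Acacia buxifolia', the full string quoted in the answers above.

```python
species_name = 'Acacia buxifolia'  # assumed from the answers above

print(species_name[2:8])
print(species_name[11:])
print(species_name[:4])
print(species_name[:])
print(species_name[11:-3])
print(species_name[-5:-3])

# Out-of-range stop values are clipped to the end of the string.
print(species_name[0:20] == species_name[0:])
print(species_name[:103] == species_name[:])
```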
+
+
+
+
+
+
+
+
+
+
+
Key Points
+
+
Use variables to store values.
+
Use print to display values.
+
Variables persist between cells.
+
Variables must be created before they are used.
+
Variables can be used in calculations.
+
Use an index to get a single character from a string.
+
Use a slice to get a substring.
+
Use the built-in function len to find the length of a
+string.
+
+
diff --git a/03-types-conversion.html b/03-types-conversion.html
new file mode 100644
index 000000000..b90832c12
--- /dev/null
+++ b/03-types-conversion.html
@@ -0,0 +1,1158 @@
+
+Plotting and Programming in Python: Data Types and Type Conversion
+
Explain key differences between integers and floating point
+numbers.
+
Explain key differences between numbers and character strings.
+
Use built-in functions to convert between integers, floating point
+numbers, and strings.
+
+
+
+
+
+
Every value has a type.
+
Every value in a program has a specific type.
+
Integer (int): represents positive or negative whole
+numbers like 3 or -512.
+
Floating point number (float): represents real numbers
+like 3.14159 or -2.5.
+
Character string (usually called “string”, str): text.
+
Written in either single quotes or double quotes (as long as they
+match).
+
The quote marks aren’t printed when the string is displayed.
+
+
Use the built-in function type to find the type of a
+value.
+
Use the built-in function type to find out what type a
+value has.
+
Works on variables as well.
+
But remember: the value has the type — the
+variable is just a label.
+
+
+
PYTHON
+
+
print(type(52))
+
+
+
OUTPUT
+
+
<class 'int'>
+
+
+
PYTHON
+
+
fitness = 'average'
+print(type(fitness))
+
+
+
OUTPUT
+
+
<class 'str'>
+
+
Types control what operations (or methods) can be performed on a
+given value.
+
A value’s type determines what the program can do to it.
+
+
PYTHON
+
+
print(5 - 3)
+
+
+
OUTPUT
+
+
2
+
+
+
PYTHON
+
+
print('hello' - 'h')
+
+
+
ERROR
+
+
---------------------------------------------------------------------------
+TypeError Traceback (most recent call last)
+<ipython-input-2-67f5626a1e07> in <module>()
+----> 1 print('hello' - 'h')
+
+TypeError: unsupported operand type(s) for -: 'str' and 'str'
+
+
You can use the “+” and “*” operators on strings.
+
“Adding” character strings concatenates them.
+
+
PYTHON
+
+
full_name = 'Ahmed' + ' ' + 'Walsh'
+print(full_name)
+
+
+
OUTPUT
+
+
Ahmed Walsh
+
+
Multiplying a character string by an integer N creates a
+new string that consists of that character string repeated N
+times.
+
Since multiplication is repeated addition.
+
+
+
PYTHON
+
+
separator = '=' * 10
+print(separator)
+
+
+
OUTPUT
+
+
==========
+
+
Strings have a length (but numbers don’t).
+
The built-in function len counts the number of
+characters in a string.
+
+
PYTHON
+
+
print(len(full_name))
+
+
+
OUTPUT
+
+
11
+
+
But numbers don’t have a length (not even zero).
+
+
PYTHON
+
+
print(len(52))
+
+
+
ERROR
+
+
---------------------------------------------------------------------------
+TypeError Traceback (most recent call last)
+<ipython-input-3-f769e8e8097d> in <module>()
+----> 1 print(len(52))
+
+TypeError: object of type 'int' has no len()
+
+
Must convert numbers to strings or vice versa when operating on
+them.
+
Cannot add numbers and strings.
+
+
PYTHON
+
+
print(1 + '2')
+
+
+
ERROR
+
+
---------------------------------------------------------------------------
+TypeError Traceback (most recent call last)
+<ipython-input-4-fe4f54a023c6> in <module>()
+----> 1 print(1 + '2')
+
+TypeError: unsupported operand type(s) for +: 'int' and 'str'
+
+
Not allowed because it’s ambiguous: should 1 + '2' be
+3 or '12'?
+
Some types can be converted to other types by using the type name as
+a function.
+
+
PYTHON
+
+
print(1 + int('2'))
+print(str(1) + '2')
+
+
+
OUTPUT
+
+
3
+12
+
+
Can mix integers and floats freely in operations.
+
Integers and floating-point numbers can be mixed in arithmetic.
+
Python 3 automatically converts integers to floats as needed.
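A brief sketch of mixed arithmetic follows. The values and the names half and three are arbitrary examples.

```python
half = 1 / 2.0       # an integer divided by a float
three = 1 + 2.0      # an integer added to a float

print('half is', half)
print('three is', three)
print(type(three))
```

In both cases the result is a float, because the integer operand is converted before the operation is carried out.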
The computer reads the value of variable_one when doing
+the multiplication, creates a new value, and assigns it to
+variable_two.
+
Afterwards, the value of variable_two is set to the new
+value and not dependent on variable_one so its
+value does not automatically change when variable_one
+changes.
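A minimal sketch of this behaviour, assuming a starting value of 1 for variable_one and a multiplier of 5 (both chosen only for illustration):

```python
variable_one = 1
variable_two = 5 * variable_one   # computed once, from the current value
variable_one = 2                  # later changes do not propagate

print('first is', variable_one, 'and second is', variable_two)
```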
+
+
+
+
+
+
Fractions
+
+
What type of value is 3.4? How can you find out?
+
+
+
+
+
+
+
+
+
It is a floating-point number (often abbreviated “float”). It is
+possible to find out by using the built-in function
+type().
+
+
PYTHON
+
+
print(type(3.4))
+
+
+
OUTPUT
+
+
<class 'float'>
+
+
+
+
+
+
+
+
+
+
+
Automatic Type Conversion
+
+
What type of value is 3.25 + 4?
+
+
+
+
+
+
+
+
+
It is a float: integers are automatically converted to floats as
+necessary.
+
+
PYTHON
+
+
result = 3.25 + 4
+print(result, 'is', type(result))
+
+
+
OUTPUT
+
+
7.25 is <class 'float'>
+
+
+
+
+
+
+
+
+
+
+
Choose a Type
+
+
What type of value (integer, floating point number, or character
+string) would you use to represent each of the following? Try to come up
+with more than one good answer for each problem. For example, in #1,
+when would counting days with a floating point variable make more sense
+than using an integer?
+
Number of days since the start of the year.
+
Time elapsed from the start of the year until now in days.
+
Serial number of a piece of lab equipment.
+
A lab specimen’s age.
+
Current population of a city.
+
Average population of a city over time.
+
+
+
+
+
+
+
+
+
The answers to the questions are:
+
Integer, since the number of days would lie between 1 and 365.
+
Floating point, since fractional days are required.
+
Character string if the serial number contains letters and numbers,
+otherwise integer if the serial number consists only of numerals.
+
This will vary! How do you define a specimen’s age? Whole days since
+collection (integer)? Date and time (string)?
+
Choose floating point to represent population as large aggregates
+(e.g., millions), or integer to represent population in units of
+individuals.
+
Floating point number, since an average is likely to have a
+fractional part.
+
+
+
+
+
+
+
+
+
+
Division Types
+
+
In Python 3, the // operator performs integer
+(whole-number) floor division, the / operator performs
+floating-point division, and the % (or modulo)
+operator calculates and returns the remainder from integer division:
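For instance, with the arbitrary operands 5 and 3 (the result names are illustrative):

```python
floor_result = 5 // 3    # floor division
true_result = 5 / 3      # floating-point division
remainder = 5 % 3        # remainder from integer division

print('5 // 3:', floor_result)
print('5 / 3:', true_result)
print('5 % 3:', remainder)
```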
If num_subjects is the number of subjects taking part in
+a study, and num_per_survey is the number that can take
+part in a single survey, write an expression that calculates the number
+of surveys needed to reach everyone once.
+
+
+
+
+
+
+
+
+
We want the minimum number of surveys that reaches everyone once,
+which is the rounded up value of
+num_subjects / num_per_survey. One way to compute this is to
+subtract 1 from num_subjects, perform a floor division
+with //, and then add 1. Subtracting 1 before the division deals
+with the case where num_subjects is evenly divisible by
+num_per_survey.
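The description above can be sketched as follows. The numbers of subjects and the survey size are invented values used only to exercise both cases (with and without a remainder):

```python
# Hypothetical study sizes, chosen only for illustration.
num_per_survey = 42

num_subjects = 600   # not evenly divisible by 42
num_surveys = (num_subjects - 1) // num_per_survey + 1
print(num_subjects, 'subjects,', num_per_survey, 'per survey:', num_surveys)

num_subjects_even = 84   # evenly divisible by 42
num_surveys_even = (num_subjects_even - 1) // num_per_survey + 1
print(num_subjects_even, 'subjects,', num_per_survey, 'per survey:', num_surveys_even)
```

Without the subtraction, the evenly divisible case would come out one survey too high.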
Where reasonable, float() will convert a string to a
+floating point number, and int() will convert a floating
+point number to an integer:
+
+
PYTHON
+
+
print("string to float:", float("3.4"))
+print("float to int:", int(3.4))
+
+
+
OUTPUT
+
+
string to float: 3.4
+float to int: 3
+
+
If the conversion doesn’t make sense, however, Python will raise
+an error.
+
+
PYTHON
+
+
print("string to float:", float("Hello world!"))
+
+
+
ERROR
+
+
---------------------------------------------------------------------------
+ValueError Traceback (most recent call last)
+<ipython-input-5-df3b790bf0a2> in <module>
+----> 1 print("string to float:", float("Hello world!"))
+
+ValueError: could not convert string to float: 'Hello world!'
+
+
Given this information, what do you expect the following program to
+do?
+
What does it actually do?
+
Why do you think it does that?
+
+
PYTHON
+
+
print("fractional string to int:", int("3.4"))
+
+
+
+
+
+
+
+
+
+
What do you expect this program to do? It would not be so
+unreasonable to expect the Python 3 int function to convert
+the string “3.4” to the float 3.4 and then, with an additional type
+conversion, to 3. After all, Python 3 performs a lot of other magic - isn’t
+that part of its charm?
+
+
PYTHON
+
+
int("3.4")
+
+
+
OUTPUT
+
+
---------------------------------------------------------------------------
+ValueError Traceback (most recent call last)
+<ipython-input-2-ec6729dfccdc> in <module>
+----> 1 int("3.4")
+ValueError: invalid literal for int() with base 10: '3.4'
+
+
However, Python 3 raises an error. Why? To be consistent, possibly.
+If you want Python to perform two consecutive type conversions, you must
+request each one explicitly in code.
+
+
PYTHON
+
+
int(float("3.4"))
+
+
+
OUTPUT
+
+
3
+
+
+
+
+
+
+
+
+
+
+
Arithmetic with Different Types
+
+
Which of the following will return the floating point number
+2.0? Note: there may be more than one right answer.
+
+
PYTHON
+
+
first = 1.0
+second = "1"
+third = "1.1"
+
+
first + float(second)
+
float(second) + float(third)
+
first + int(third)
+
first + int(float(third))
+
int(first) + int(float(third))
+
2.0 * second
+
+
+
+
+
+
+
+
+
Answer: 1 and 4
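One way to check this answer is to evaluate all six options and record either the value and its type or the kind of error raised. This is only a sketch: the `options` and `results` dictionaries are helper names introduced here, and the lambdas simply delay evaluation so that one failing option does not stop the rest.

```python
first = 1.0
second = "1"
third = "1.1"

# Each option is wrapped in a lambda so it is only evaluated inside the try block.
options = {
    1: lambda: first + float(second),
    2: lambda: float(second) + float(third),
    3: lambda: first + int(third),
    4: lambda: first + int(float(third)),
    5: lambda: int(first) + int(float(third)),
    6: lambda: 2.0 * second,
}

results = {}
for number, expression in options.items():
    try:
        value = expression()
        results[number] = (value, type(value).__name__)
    except (TypeError, ValueError) as err:
        results[number] = ('error', type(err).__name__)
    print(number, results[number])
```

Only options 1 and 4 produce the float 2.0; option 5 produces the integer 2, and options 3 and 6 raise errors.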
+
+
+
+
+
+
+
+
+
+
Complex Numbers
+
+
Python provides complex numbers, which are written as
+1.0+2.0j. If val is a complex number, its real
+and imaginary parts can be accessed using dot notation as
+val.real and val.imag.
Why do you think Python uses j instead of
+i for the imaginary part?
+
What do you expect 1 + 2j + 3 to produce?
+
What do you expect 4j to be? What about
+4 j or 4 + j?
+
+
+
+
+
+
+
+
+
Standard mathematics treatments typically use i to
+denote an imaginary number. However, Python follows an early
+convention from electrical engineering, where i usually denotes current,
+and this would now be technically expensive to change. Stack
+Overflow provides additional explanation and discussion.
+
+
(4+2j)
+
+4j for 4j, and Syntax Error: invalid syntax for 4 j. In
+4 + j, j is considered a variable, so the
+statement raises a NameError if j is not defined and
+otherwise uses its assigned value.
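These answers can be checked with a short sketch. The variable names `val` and `outcome` are introduced here for illustration; because a syntax error would stop the whole cell from running, the text `4 j` is handed to the built-in `compile` function as a string instead of being typed directly.

```python
val = 1 + 2j + 3
print(val)                  # the real parts 1 and 3 are added together
print(val.real, val.imag)   # real and imaginary parts, both floats

print(4j)                   # a number followed directly by j is a complex literal

# '4 j' is not valid Python, so compiling it raises SyntaxError.
try:
    compile('4 j', '<example>', 'eval')
    outcome = 'compiled'
except SyntaxError as err:
    outcome = type(err).__name__
print("'4 j' ->", outcome)
```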
+
+
+
+
+
+
+
+
+
+
Key Points
+
+
Every value has a type.
+
Use the built-in function type to find the type of a
+value.
+
Types control what operations can be done on values.
+
Strings can be added and multiplied.
+
Strings have a length (but numbers don’t).
+
Must convert numbers to strings or vice versa when operating on
+them.
+
Can mix integers and floats freely in operations.
+
Variables only change value when something is assigned to them.
+
+
diff --git a/04-built-in.html b/04-built-in.html
new file mode 100644
index 000000000..48fcc3e09
--- /dev/null
+++ b/04-built-in.html
@@ -0,0 +1,1062 @@
+
+Plotting and Programming in Python: Built-in Functions and Help
+
Use help to display documentation for built-in functions.
+
Correctly describe situations in which SyntaxError and NameError
+occur.
+
+
+
+
+
+
Use comments to add documentation to programs.
+
+
PYTHON
+
+
# This sentence isn't executed by Python.
+adjustment = 0.5  # Neither is this - anything after '#' is ignored.
+
+
A function may take zero or more arguments.
+
We have seen some functions already — now let’s take a closer
+look.
+
An argument is a value passed into a function.
+
+len takes exactly one.
+
+int, str, and float create a
+new value from an existing one.
+
+print takes zero or more.
+
+print with no arguments prints a blank line.
+
Must always use parentheses, even if they’re empty, so that Python
+knows a function is being called.
+
+
+
PYTHON
+
+
print('before')
+print()
+print('after')
+
+
+
OUTPUT
+
+
before
+
+after
+
+
Every function returns something.
+
Every function call produces some result.
+
If the function doesn’t have a useful result to return, it usually
+returns the special value None. None is a
+Python object that stands in anytime there is no value.
+
+
PYTHON
+
+
result = print('example')
+print('result of print is', result)
+
+
+
OUTPUT
+
+
example
+result of print is None
+
+
Commonly-used built-in functions include max,
+min, and round.
+
Use max to find the largest value of one or more
+values.
+
Use min to find the smallest.
+
Both work on character strings as well as numbers.
+
“Larger” and “smaller” use (0-9, A-Z, a-z) to compare letters.
+
+
+
PYTHON
+
+
print(max(1, 2, 3))
+print(min('a', 'A', '0'))
+
+
+
OUTPUT
+
+
3
+0
+
+
Functions may only work for certain (combinations of)
+arguments.
+
+max and min must be given at least one
+argument.
+
“Largest of the empty set” is a meaningless question.
+
+
And they must be given things that can meaningfully be
+compared.
+
+
PYTHON
+
+
print(max(1, 'a'))
+
+
+
ERROR
+
+
TypeError Traceback (most recent call last)
+<ipython-input-52-3f049acf3762> in <module>
+----> 1 print(max(1, 'a'))
+
+TypeError: '>' not supported between instances of 'str' and 'int'
+
+
Functions may have default values for some arguments.
+
+round will round off a floating-point number.
+
By default, rounds to zero decimal places.
+
+
PYTHON
+
+
round(3.712)
+
+
+
OUTPUT
+
+
4
+
+
We can specify the number of decimal places we want.
+
+
PYTHON
+
+
round(3.712, 1)
+
+
+
OUTPUT
+
+
3.7
+
+
Functions attached to objects are called methods
+
Functions take another form that will be common in the pandas
+episodes.
+
Methods have parentheses like functions, but come after the
+variable.
+
Some methods are used for internal Python operations, and are marked
+with double underscores.
+
+
PYTHON
+
+
my_string = 'Hello world!'  # creation of a string object
+
+print(len(my_string)) # the len function takes a string as an argument and returns the length of the string
+
+print(my_string.swapcase()) # calling the swapcase method on the my_string object
+
+print(my_string.__len__()) # calling the internal __len__ method on the my_string object, used by len(my_string)
+
+
+
OUTPUT
+
+
12
+hELLO WORLD!
+12
+
+
You might even see them chained together. They operate left to
+right.
+
+
PYTHON
+
+
print(my_string.isupper()) # Not all the letters are uppercase
+print(my_string.upper()) # This capitalizes all the letters
+
+print(my_string.upper().isupper()) # Now all the letters are uppercase
+
+
+
OUTPUT
+
+
False
+HELLO WORLD!
+True
+
+
Use the built-in function help to get help for a
+function.
+
Every built-in function has online documentation.
+
+
PYTHON
+
+
help(round)
+
+
+
OUTPUT
+
+
Help on built-in function round in module builtins:
+
+round(number, ndigits=None)
+ Round a number to a given precision in decimal digits.
+
+ The return value is an integer if ndigits is omitted or None. Otherwise
+ the return value has the same type as the number. ndigits may be negative.
+
+
The Jupyter Notebook has two ways to get help.
+
Option 1: Place the cursor near where the function is invoked in a
+cell (i.e., the function name or its parameters),
+
Hold down Shift, and press Tab.
+
Do this several times to expand the information returned.
+
+
Option 2: Type the function name in a cell with a question mark
+after it. Then run the cell.
+
Python reports a syntax error when it can’t understand the source of
+a program.
+
Won’t even try to run the program if it can’t be parsed.
+
+
PYTHON
+
+
# Forgot to close the quote marks around the string.
+name = 'Feng
+
+
+
ERROR
+
+
File "<ipython-input-56-f42768451d55>", line 2
+ name = 'Feng
+ ^
+SyntaxError: EOL while scanning string literal
+
+
+
PYTHON
+
+
# An extra '=' in the assignment.
+age = = 52
+
+
+
ERROR
+
+
File "<ipython-input-57-ccc3df3cf902>", line 2
+ age = = 52
+ ^
+SyntaxError: invalid syntax
+
+
Look more closely at the error message:
+
+
PYTHON
+
+
print("hello world"
+
+
+
ERROR
+
+
File "<ipython-input-6-d1cc229bf815>", line 1
+ print ("hello world"
+ ^
+SyntaxError: unexpected EOF while parsing
+
+
The message indicates a problem on the first line of the input (“line
+1”).
+
In this case the “ipython-input” section of the file name tells us
+that we are working with input into IPython, the Python interpreter used
+by the Jupyter Notebook.
+
+
The -6- part of the filename indicates that the error
+occurred in cell 6 of our Notebook.
+
Next is the problematic line of code, indicating the problem with a
+^ pointer.
+
Python reports a runtime error when something goes wrong while a
+program is executing.
+
+
PYTHON
+
+
age = 53
+remaining = 100 - aege # mis-spelled 'age'
+
+
+
ERROR
+
+
NameError Traceback (most recent call last)
+<ipython-input-59-1214fb6c55fc> in <module>
+ 1 age = 53
+----> 2 remaining = 100 - aege # mis-spelled 'age'
+
+NameError: name 'aege' is not defined
+
+
Fix syntax errors by reading the source and runtime errors by
+tracing execution.
+
+
+
+
+
+
What Happens When
+
+
Explain in simple terms the order of operations in the following
+program: when does the addition happen, when does the subtraction
+happen, when is each function called, etc.
max(len(rich), poor) raises a TypeError. The call becomes
+max(4, 'tin') and, as we discussed earlier, a string and an
+integer cannot meaningfully be compared.
+
+
ERROR
+
+
TypeError Traceback (most recent call last)
+<ipython-input-65-bc82ad05177a> in <module>
+----> 1 max(len(rich), poor)
+
+TypeError: '>' not supported between instances of 'str' and 'int'
+
+
+
+
+
+
+
+
+
+
+
Why Not?
+
+
Why is it that max and min do not return
+None when they are called with no arguments?
+
+
+
+
+
+
+
+
+
max and min raise a TypeError in this case
+because the required number of arguments was not supplied. If they just
+returned None, the error would be much harder to trace, as
+the None would likely be stored in a variable and used later in the program,
+only to cause a runtime error there.
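A small sketch of this behaviour, catching the error so that its type can be printed. The variable name `outcome` is introduced here only for illustration:

```python
# Calling max with no arguments fails immediately instead of returning None.
try:
    max()
    outcome = 'no error'
except TypeError as err:
    outcome = type(err).__name__
    print(outcome, '-', err)
```

The failure happens at the line containing the mistake, which is what makes it easy to trace.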
+
+
+
+
+
+
+
+
+
+
Last Character of a String
+
+
If Python starts counting from zero, and len returns the
+number of characters in a string, what index expression will get the
+last character in the string name? (Note: we will see a
+simpler way to do this in a later episode.)
+
+
+
+
+
+
+
+
+
name[len(name) - 1]
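As a quick check of this expression, assuming a hypothetical value for name:

```python
name = 'helium'   # hypothetical example value

last_index = len(name) - 1     # indices run from 0 to len(name) - 1
last_character = name[last_index]
print(last_index, last_character)
```

Using `name[len(name)]` instead would raise an IndexError, because that index is one past the end of the string.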
+
+
+
+
+
+
+
+
+
+
Explore the Python docs!
+
+
The official Python
+documentation is arguably the most complete source of information
+about the language. It is available in different languages and contains
+a lot of useful resources. The Built-in
+Functions page contains a catalogue of all of these functions,
+including the ones that we’ve covered in this lesson. Some of these are
+more advanced and unnecessary at the moment, but others are very simple
+and useful.
+
+
+
+
+
+
+
+
+
Key Points
+
+
Use comments to add documentation to programs.
+
A function may take zero or more arguments.
+
Commonly-used built-in functions include max,
+min, and round.
+
Functions may only work for certain (combinations of)
+arguments.
+
Functions may have default values for some arguments.
+
Use the built-in function help to get help for a
+function.
+
The Jupyter Notebook has two ways to get help.
+
Every function returns something.
+
Python reports a syntax error when it can’t understand the source of
+a program.
+
Python reports a runtime error when something goes wrong while a
+program is executing.
+
Fix syntax errors by reading the source code, and runtime errors by
+tracing the program’s execution.
How can I use software that other people have written?
+
How can I find out what that software does?
+
+
+
+
+
+
+
Objectives
+
Explain what software libraries are and why programmers create and
+use them.
+
Write programs that import and use modules from Python’s standard
+library.
+
Find and read documentation for the standard library interactively
+(in the interpreter) and online.
+
+
+
+
+
+
Most of the power of a programming language is in its
+libraries.
+
A library is a collection of files (called
+modules) that contains functions for use by other programs.
+
May also contain data values (e.g., numerical constants) and other
+things.
+
A library’s contents are supposed to be related, but there’s no way to
+enforce that.
+
+
The Python standard
+library is an extensive suite of modules that comes with Python
+itself.
+
Many additional libraries are available from PyPI (the Python Package
+Index).
+
We will see later how to write new libraries.
+
+
+
+
+
+
Libraries and modules
+
+
A library is a collection of modules, but the terms are often used
+interchangeably, especially since many libraries only consist of a
+single module, so don’t worry if you mix them.
+
+
+
+
A program must import a library module before using it.
+
Use import to load a library module into a program’s
+memory.
+
Then refer to things from the module as
+module_name.thing_name.
+
Python uses . to mean “part of”.
+
+
Using math, one of the modules in the standard
+library:
+
+
PYTHON
+
+
import math
+
+print('pi is', math.pi)
+print('cos(pi) is', math.cos(math.pi))
+
+
+
OUTPUT
+
+
pi is 3.141592653589793
+cos(pi) is -1.0
+
+
Have to refer to each item with the module’s name.
+
+math.cos(pi) won’t work: the reference to
+pi doesn’t somehow “inherit” the function’s reference to
+math.
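To see this in action, the sketch below catches the resulting error rather than letting it stop the program. The variable name `outcome` is introduced here only for illustration:

```python
import math

try:
    math.cos(pi)   # 'pi' on its own has not been defined or imported
    outcome = 'no error'
except NameError as err:
    outcome = type(err).__name__
    print(outcome, '-', err)

print(math.cos(math.pi))   # both names carry the module prefix
```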
+
+
Use help to learn about the contents of a library
+module.
+
Works just like help for a function.
+
+
PYTHON
+
+
help(math)
+
+
+
OUTPUT
+
+
Help on module math:
+
+NAME
+ math
+
+MODULE REFERENCE
+ http://docs.python.org/3/library/math
+
+ The following documentation is automatically generated from the Python
+ source files. It may be incomplete, incorrect or include features that
+ are considered implementation detail and may vary between Python
+ implementations. When in doubt, consult the module reference at the
+ location listed above.
+
+DESCRIPTION
+ This module is always available. It provides access to the
+ mathematical functions defined by the C standard.
+
+FUNCTIONS
+ acos(x, /)
+ Return the arc cosine (measured in radians) of x.
+⋮ ⋮ ⋮
+
+
Import specific items from a library module to shorten
+programs.
+
Use from ... import ... to load only specific items
+from a library module.
+
Then refer to them directly without library name as prefix.
+
+
PYTHON
+
+
from math import cos, pi
+
+print('cos(pi) is', cos(pi))
+
+
+
OUTPUT
+
+
cos(pi) is -1.0
+
+
Create an alias for a library module when importing it to shorten
+programs.
+
Use import ... as ... to give a library a short
+alias while importing it.
+
Then refer to items in the library using that shortened name.
+
+
PYTHON
+
+
import math as m
+
+print('cos(pi) is', m.cos(m.pi))
+
+
+
OUTPUT
+
+
cos(pi) is -1.0
+
+
Commonly used for libraries that are frequently used or have long
+names.
+
E.g., the matplotlib plotting library is often aliased
+as mpl.
+
+
But can make programs harder to understand, since readers must learn
+your program’s aliases.
+
+
+
+
+
+
Exploring the Math Module
+
+
What function from the math module can you use to
+calculate a square root without using sqrt?
+
Since the library contains this function, why does sqrt
+exist?
+
+
+
+
+
+
+
+
+
Using help(math) we see that we’ve got
+pow(x,y) in addition to sqrt(x), so we could
+use pow(x, 0.5) to find a square root.
+
The sqrt(x) function is arguably more readable than
+pow(x, 0.5) when implementing equations. Readability is a
+cornerstone of good programming, so it makes sense to provide a special
+function for this specific common case.
+
Also, the design of Python’s math library has its origin
+in the C standard, which includes both sqrt(x) and
+pow(x,y), so a little bit of the history of programming is
+showing in Python’s function names.
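A brief comparison of the two functions, using 16.0 as an arbitrary example input:

```python
import math

x = 16.0   # arbitrary example value

root_with_sqrt = math.sqrt(x)
root_with_pow = math.pow(x, 0.5)

print('sqrt:', root_with_sqrt)
print('pow: ', root_with_pow)
```

Both calls give the same result here; the difference is mainly in how clearly the code states its intent.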
+
+
+
+
+
+
+
+
+
+
Locating the Right Module
+
+
You want to select a random character from a string:
The string has 11 characters, each having a positional index from 0
+to 10. You could use the random.randrange
+or random.randint
+functions to get a random integer between 0 and 10, and then select the
+bases character at that index:
+
+
PYTHON
+
+
from random import randrange
+
+random_index = randrange(len(bases))
+print(bases[random_index])
+
+
or more compactly:
+
+
PYTHON
+
+
from random import randrange
+
+print(bases[randrange(len(bases))])
+
+
Perhaps you found the random.sample
+function? It allows for slightly less typing but might be a bit harder
+to understand just by reading:
+
+
PYTHON
+
+
from random import sample
+
+print(sample(bases, 1)[0])
+
+
Note that this function returns a list of values. We will learn about
+lists in episode 11.
+
The simplest and shortest solution is the random.choice
+function that does exactly what we want:
+
+
PYTHON
+
+
from random import choice
+
+print(choice(bases))
+
+
+
+
+
+
+
+
+
+
+
Jigsaw Puzzle (Parson’s Problem) Programming Example
+
+
Rearrange the following statements so that a random DNA base and
+its index in the string are printed. Not all statements may be needed.
+Feel free to use/add intermediate variables.
+
+
PYTHON
+
+
bases = "ACTTGCTTGAC"
+import math
+import random
+___ = random.randrange(n_bases)
+___ = len(bases)
+print("random base ", bases[___], "base index", ___)
+
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
import math
+import random
+bases = "ACTTGCTTGAC"
+n_bases = len(bases)
+idx = random.randrange(n_bases)
+print("random base", bases[idx], "base index", idx)
+
+
+
+
+
+
+
+
+
+
+
When Is Help Available?
+
+
When a colleague of yours types help(math), Python
+reports an error:
+
+
ERROR
+
+
NameError: name 'math' is not defined
+
+
What has your colleague forgotten to do?
+
+
+
+
+
+
+
+
+
Importing the math module (import math)
+
+
+
+
+
+
+
+
+
+
Importing With Aliases
+
+
Fill in the blanks so that the program below prints
+90.0.
+
Rewrite the program so that it uses import
+without as.
+
Which form do you find easier to read?
+
+
PYTHON
+
+
import math as m
+angle = ____.degrees(____.pi / 2)
+print(____)
+
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
import math as m
+angle = m.degrees(m.pi / 2)
+print(angle)
+
+
can be written as
+
+
PYTHON
+
+
import math
+angle = math.degrees(math.pi / 2)
+print(angle)
+
+
Since you just wrote the code and are familiar with it, you might
+actually find the first version easier to read. But when trying to read
+a huge piece of code written by someone else, or when getting back to
+your own huge piece of code after several months, non-abbreviated names
+are often easier, except where there are clear abbreviation
+conventions.
+
+
+
+
+
+
+
+
+
+
There Are Many Ways To Import Libraries!
+
+
Match the following print statements with the appropriate library
+calls.
+
Print commands:
+
print("sin(pi/2) =", sin(pi/2))
+
print("sin(pi/2) =", m.sin(m.pi/2))
+
print("sin(pi/2) =", math.sin(math.pi/2))
+
Library calls:
+
from math import sin, pi
+
import math
+
import math as m
+
from math import *
+
+
+
+
+
+
+
+
+
Library calls 1 and 4. In order to directly refer to
+sin and pi without the library name as prefix,
+you need to use the from ... import ... statement. Whereas
+library call 1 specifically imports the two names sin
+and pi, library call 4 imports all public names in the
+math module.
+
Library call 3. Here sin and pi are
+referred to with a shortened library name m instead of
+math. Library call 3 does exactly that using the
+import ... as ... syntax - it creates an alias for
+math in the form of the shortened name m.
+
Library call 2. Here sin and pi are
+referred to with the regular library name math, so the
+regular import ... call suffices.
+
Note: although library call 4 works, importing all
+names from a module using a wildcard import is not recommended as it makes it
+unclear which names from the module are used in the code. In general it
+is best to make your imports as specific as possible and to only import
+what your code uses. In library call 1, the import
+statement explicitly tells us that the sin function is
+imported from the math module, but library call 4 does not
+convey this information.
+
+
+
+
+
+
+
+
+
+
Importing Specific Items
+
+
Fill in the blanks so that the program below prints
+90.0.
+
Do you find this version easier to read than preceding ones?
+
Why wouldn’t programmers always use this form of
+import?
+
+
PYTHON
+
+
____ math import ____, ____
+angle = degrees(pi / 2)
+print(angle)
+
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
from math import degrees, pi
+angle = degrees(pi / 2)
+print(angle)
+
+
Most likely you find this version easier to read since it’s less
+dense. The main reason not to use this form of import is to avoid name
+clashes. For instance, you wouldn’t import degrees this way
+if you also wanted to use the name degrees for a variable
+or function of your own. Or if you were to also import a function named
+degrees from another library.
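The name clash described above can be sketched like this. Reusing degrees for a variable is a deliberate mistake made for illustration, and `outcome` is a helper name introduced here:

```python
from math import degrees, pi

angle = degrees(pi / 2)   # 'degrees' still refers to the imported function
print(angle)

degrees = 90              # our own variable now hides the imported function

try:
    degrees(pi / 2)       # this tries to call the integer 90
    outcome = 'no error'
except TypeError as err:
    outcome = type(err).__name__
    print(outcome, '-', err)
```

With `import math`, the function would still be reachable as `math.degrees` after the assignment.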
+
+
+
+
+
+
+
+
+
+
Reading Error Messages
+
+
Read the code below and try to identify what the errors are without
+running it.
+
Run the code, and read the error message. What type of error is
+it?
+
+
PYTHON
+
+
from math import log
+log(0)
+
+
+
+
+
+
+
+
+
+
+
OUTPUT
+
+
---------------------------------------------------------------------------
+ValueError Traceback (most recent call last)
+<ipython-input-1-d72e1d780bab> in <module>
+ 1 from math import log
+----> 2 log(0)
+
+ValueError: math domain error
+
+
The logarithm of x is only defined for
+x > 0, so 0 is outside the domain of the function.
+
You get an error of type ValueError, indicating that
+the function received an inappropriate argument value. The additional
+message “math domain error” makes it clearer what the problem is.
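If a program needs to keep running after such an error, the call can be wrapped in a try/except block. This is only a sketch of one way to inspect the error; `outcome` is a helper name introduced here, and the exact wording of the message may differ between Python versions.

```python
from math import log

try:
    log(0)   # 0 is outside the domain of the logarithm
    outcome = 'no error'
except ValueError as err:
    outcome = type(err).__name__
    print(outcome, '-', err)

print(log(1))   # 1 is inside the domain
```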
+
+
+
+
+
+
+
+
+
+
Key Points
+
+
Most of the power of a programming language is in its
+libraries.
+
A program must import a library module in order to use it.
+
Use help to learn about the contents of a library
+module.
+
Import specific items from a library to shorten programs.
+
Create an alias for a library when importing it to shorten
+programs.
+
+
diff --git a/07-reading-tabular.html b/07-reading-tabular.html
new file mode 100644
index 000000000..59a2f4048
--- /dev/null
+++ b/07-reading-tabular.html
@@ -0,0 +1,1081 @@
+
+Plotting and Programming in Python: Reading Tabular Data into DataFrames
+
The columns in a dataframe are the observed variables, and the rows
+are the observations.
+
Pandas uses backslash \ to show wrapped lines when
+output is too wide to fit the screen.
+
Using descriptive dataframe names helps us distinguish between
+multiple dataframes so we won’t accidentally overwrite a dataframe or
+read from the wrong one.
+
+
+
+
+
+
File Not Found
+
+
Our lessons store their data files in a data
+sub-directory, which is why the path to the file is
+data/gapminder_gdp_oceania.csv. If you forget to include
+data/, or if you include it but your copy of the file is
+somewhere else, you will get a runtime
+error that ends with a line like this:
+
+
ERROR
+
+
FileNotFoundError: [Errno 2] No such file or directory: 'data/gapminder_gdp_oceania.csv'
+
+
+
+
+
Use index_col to specify that a column’s values should
+be used as row headings.
+
Row headings are numbers (0 and 1 in this case).
+
Really want to index by country.
+
Pass the name of the column to read_csv as its
+index_col parameter to do this.
+
Naming the dataframe data_oceania_country tells us
+which region the data includes (oceania) and how it is
+indexed (country).
Use DataFrame.describe() to get summary statistics
+about data.
+
DataFrame.describe() gets the summary statistics of only
+the columns that have numerical data. All other columns are ignored,
+unless you use the argument include='all'.
Not particularly useful with just two records, but very helpful when
+there are thousands.
+
+
+
+
+
+
Reading Other Data
+
+
Read the data in gapminder_gdp_americas.csv (which
+should be in the same directory as
+gapminder_gdp_oceania.csv) into a variable called
+data_americas and display its summary statistics.
+
+
+
+
+
+
+
+
+
To read in a CSV, we use pd.read_csv and pass the
+filename 'data/gapminder_gdp_americas.csv' to it. We also
+once again pass the column name 'country' to the parameter
+index_col in order to index by country. The summary
+statistics can be displayed with the DataFrame.describe()
+method.
After reading the data for the Americas, use
+help(data_americas.head) and
+help(data_americas.tail) to find out what
+DataFrame.head and DataFrame.tail do.
+
What method call will display the first three rows of this
+data?
+
What method call will display the last three columns of this data?
+(Hint: you may need to change your view of the data.)
+
+
+
+
+
+
+
+
+
We can check out the first five rows of data_americas
+by executing data_americas.head() which lets us view the
+beginning of the DataFrame. We can specify the number of rows we wish to
+see by specifying the parameter n in our call to
+data_americas.head(). To view the first three rows,
+execute:
To check out the last three rows of data_americas, we
+would use the command data_americas.tail(n=3), analogous to
+head() used above. However, here we want to look at the
+last three columns so we need to change our view and then use
+tail(). To do so, we create a new DataFrame in which rows
+and columns are switched:
+
+
PYTHON
+
+
americas_flipped = data_americas.T
+
+
We can then view the last three columns of data_americas by
+viewing the last three rows of americas_flipped:
This shows the data that we want, but we may prefer to display three
+columns instead of three rows, so we can flip it back:
+
+
PYTHON
+
+
americas_flipped.tail(n=3).T
+
+
Note: we could have done the above in a single line
+of code by ‘chaining’ the commands:
+
+
PYTHON
+
+
data_americas.T.tail(n=3).T
+
+
+
+
+
+
+
+
+
+
+
Reading Files in Other Directories
+
+
The data for your current project is stored in a file called
+microbes.csv, which is located in a folder called
+field_data. You are doing analysis in a notebook called
+analysis.ipynb in a sibling folder called
+thesis:
What value(s) should you pass to read_csv to read
+microbes.csv in analysis.ipynb?
+
+
+
+
+
+
+
+
+
We need to specify the path to the file of interest in the call to
+pd.read_csv. We first need to ‘jump’ out of the folder
+thesis using ‘../’ and then into the folder
+field_data using ‘field_data/’. Then we can specify the
+filename microbes.csv. The result is as follows:
As well as the read_csv function for reading data from a
+file, Pandas provides a to_csv function to write dataframes
+to files. Applying what you’ve learned about reading from files, write
+one of your dataframes to a file called processed.csv. You
+can use help to get information on how to use
+to_csv.
+
+
+
+
+
+
+
+
+
In order to write the DataFrame data_americas to a file
+called processed.csv, execute the following command:
+
+
PYTHON
+
+
data_americas.to_csv('processed.csv')
+
+
For help on read_csv or to_csv, you could
+execute, for example:
+
+
PYTHON
+
+
help(data_americas.to_csv)
+help(pd.read_csv)
+
+
Note that help(to_csv) or help(pd.to_csv)
+raises an error! This is because to_csv is not
+a global Pandas function, but a member function of DataFrames. This
+means you can only call it on an instance of a DataFrame, e.g.,
+data_americas.to_csv or
+data_oceania.to_csv.
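One way to confirm where each name lives is to ask Python whether the attribute exists, using the built-in `hasattr` function. This is a sketch that assumes Pandas is installed; the three result variables are names introduced here.

```python
import pandas as pd

module_has_read_csv = hasattr(pd, 'read_csv')          # module-level function
module_has_to_csv = hasattr(pd, 'to_csv')              # not defined on the module
dataframe_has_to_csv = hasattr(pd.DataFrame, 'to_csv')  # defined as a DataFrame method

print(module_has_read_csv, module_has_to_csv, dataframe_has_to_csv)
```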
+
+
+
+
+
+
+
+
+
+
Key Points
+
+
Use the Pandas library to get basic statistics out of tabular
+data.
+
Use index_col to specify that a column’s values should
+be used as row headings.
+
Use DataFrame.info to find out more about a
+dataframe.
+
The DataFrame.columns variable stores information about
+the dataframe’s columns.
+
Use DataFrame.T to transpose a dataframe.
+
Use DataFrame.describe to get summary statistics about
+data.
How can I do statistical analysis of tabular data?
+
+
+
+
+
+
+
Objectives
+
Select individual values from a Pandas dataframe.
+
Select entire rows or entire columns from a dataframe.
+
Select a subset of both rows and columns from a dataframe in a
+single operation.
+
Select a subset of a dataframe by a single Boolean criterion.
+
+
+
+
+
+
Note about Pandas DataFrames/Series
+
A DataFrame
+is a collection of Series.
+The DataFrame is the way Pandas represents a table, and a Series is the
+data structure Pandas uses to represent a column.
+
Pandas is built on top of the Numpy library, which in practice means
+that most of the methods defined for Numpy Arrays apply to Pandas
+Series/DataFrames.
+
What makes Pandas so attractive is the powerful interface to access
+individual records of the table, proper handling of missing values, and
+relational-databases operations between DataFrames.
+
Selecting values
+
To access a value at the position [i,j] of a DataFrame,
+we have two options, depending on whether i and j are
+positions or labels. Remember that a DataFrame provides an index as a way to
+identify the rows of the table; a row, then, has a position
+inside the table as well as a label, which uniquely identifies
+its entry in the DataFrame.
+
Use DataFrame.iloc[..., ...] to select values by their
+(entry) position
+
We can specify a location by numerical index, analogous to a 2D version of
+character selection in strings.
+
+
PYTHON
+
+
import pandas as pd
+data = pd.read_csv('data/gapminder_gdp_europe.csv', index_col='country')
+print(data.iloc[0, 0])
+
+
+
OUTPUT
+
+
1601.056136
+
+
Use DataFrame.loc[..., ...] to select values by their
+(entry) label.
In the above code, we discover that slicing using
+loc is inclusive at both ends, which differs from
+slicing using iloc, where slicing
+indicates everything up to but not including the final index.
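The difference between the two kinds of slice can be sketched with a small stand-in table. The country labels, column names, and values below are invented for illustration and are not taken from the Gapminder files.

```python
import pandas as pd

# Hypothetical stand-in data; the values are invented.
df = pd.DataFrame(
    {'gdp_1952': [1.0, 2.0, 3.0],
     'gdp_1957': [4.0, 5.0, 6.0],
     'gdp_1962': [7.0, 8.0, 9.0]},
    index=['Albania', 'Austria', 'Belgium'],
)

# Positional slices stop before the final position.
by_position = df.iloc[0:2, 0:2]

# Label slices include the final label.
by_label = df.loc['Albania':'Belgium', 'gdp_1952':'gdp_1962']

print(by_position.shape)
print(by_label.shape)
```

The positional slice returns two rows and two columns, while the label slice returns all three rows and all three columns.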
+
Result of slicing can be used in further operations.
+
Usually don’t just print a slice.
+
All the statistical operators that work on entire dataframes work
+the same way on slices.
Returns a similarly-shaped dataframe of True and
+False.
+
+
PYTHON
+
+
# Use a subset of data to keep output readable.
+subset = data.loc['Italy':'Poland', 'gdpPercap_1962':'gdpPercap_1972']
+print('Subset of data:\n', subset)
+
+# Which values were greater than 10000?
+print('\nWhere are values large?\n', subset > 10000)
A frame full of Booleans is sometimes called a mask because
+of how it can be used.
+
+
PYTHON
+
+
mask = subset > 10000
+print(subset[mask])
+
+
+
OUTPUT
+
+
gdpPercap_1962 gdpPercap_1967 gdpPercap_1972
+country
+Italy NaN 10022.40131 12269.27378
+Montenegro NaN NaN NaN
+Netherlands 12790.84956 15363.25136 18794.74567
+Norway 13450.40151 16361.87647 18965.05551
+Poland NaN NaN NaN
+
+
Get the value where the mask is true, and NaN (Not a Number) where
+it is false.
+
Useful because NaNs are ignored by operations like max, min,
+average, etc.
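A short sketch of how masked-out values are skipped when computing statistics. The table below is a hypothetical two-row example with invented values, not the Gapminder subset shown above.

```python
import pandas as pd

# Hypothetical values chosen so that exactly one entry falls below the threshold.
small = pd.DataFrame(
    {'y1962': [8000.0, 12000.0], 'y1967': [11000.0, 15000.0]},
    index=['A', 'B'],
)

mask = small > 10000
masked = small[mask]          # entries where the mask is False become NaN
print(masked)

column_means = masked.mean()  # NaN entries are left out of the average
print(column_means)
```

The mean of y1962 is computed from the single remaining value, not from a zero standing in for the missing one.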
Pandas vectorizing methods and grouping operations are features that
+provide users much flexibility to analyse their data.
+
For instance, suppose we want a clearer view of how the European
+countries divide according to their GDP.
+
We can start by splitting the countries into two groups for each
+surveyed year: those with a GDP higher than the European average
+and those with a lower GDP.
+
We then estimate a wealth score based on the historical
+(from 1962 to 2007) values, by counting how many times each country
+appeared in the higher or the lower
+GDP group.
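
One way this idea could be sketched is shown below. It is an illustration under assumptions, not necessarily the lesson's own code: the frame `toy`, its column names, and the names `mask_higher`, `wealth_score`, and `grouped` are all hypothetical.

```python
import pandas as pd

# Hypothetical three-country table with two surveyed years.
toy = pd.DataFrame(
    {'gdp_1962': [1000.0, 5000.0, 9000.0], 'gdp_2007': [3000.0, 6000.0, 30000.0]},
    index=['A', 'B', 'C'],
)

# True where a country's value is above that year's mean across countries.
mask_higher = toy > toy.mean()

# Fraction of surveyed years in which each country was above the mean.
wealth_score = mask_higher.sum(axis=1) / len(toy.columns)
print(wealth_score)

# Group countries by score and total their values in each year.
grouped = toy.groupby(wealth_score).sum()
print(grouped)
```

Here the comparison is vectorized (it is applied to every cell at once), and the grouping step collects countries that share the same score.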
Clearly, the second statement produces an additional column and an
+additional row compared to the first statement.
+What conclusion can we draw? We see that a numerical slice, 0:2,
+omits the final index (i.e. index 2) in the range provided,
+while a named slice, ‘gdpPercap_1952’:‘gdpPercap_1962’,
+includes the final element.
+
+
+
+
+
+
+
+
+
+
Reconstructing Data
+
+
Explain what each line in the following short program does: what is
+in first, second, etc.?
first = pd.read_csv('data/gapminder_all.csv', index_col='country')
+
+
This line loads the dataset containing the GDP data from all
+countries into a dataframe called first. The
+index_col='country' parameter selects which column to use
+as the row labels in the dataframe.
+
+
PYTHON
+
+
second = first[first['continent'] == 'Americas']
+
+
This line makes a selection: only those rows of first
+for which the ‘continent’ column matches ‘Americas’ are extracted.
+Notice how the Boolean expression inside the brackets,
+first['continent'] == 'Americas', is used to select only
+those rows where the expression is true. Try printing this expression!
+Can you print also its individual True/False elements? (hint: first
+assign the expression to a variable)
+
+
PYTHON
+
+
third = second.drop('Puerto Rico')
+
+
As the syntax suggests, this line drops the row from
+second where the label is ‘Puerto Rico’. The resulting
+dataframe third has one row less than the original
+dataframe second.
+
+
PYTHON
+
+
fourth = third.drop('continent', axis=1)
+
+
Again we apply the drop function, but in this case we are dropping
+not a row but a whole column. To accomplish this, we also need to
+specify the axis parameter: axis=1 tells drop to look for the label
+among the columns rather than among the rows (axis=0, the default).
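
The two uses of drop can be compared side by side on a small table. This is a sketch with made-up data; the frame `toy` and the variable names are hypothetical.

```python
import pandas as pd

# Hypothetical two-row table with one text column and one numeric column.
toy = pd.DataFrame(
    {'continent': ['Americas', 'Americas'], 'gdp': [1.0, 2.0]},
    index=['Canada', 'Chile'],
)

without_row = toy.drop('Chile')              # axis=0 is the default: drop a row label
without_col = toy.drop('continent', axis=1)  # axis=1: drop a column label
same_thing = toy.drop(columns='continent')   # keyword form of the same operation

print(list(without_row.index))
print(list(without_col.columns))
print(toy.shape)  # the original frame is unchanged
```

drop returns a new dataframe each time, which is why the lesson assigns its result to a new variable.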
+
+
PYTHON
+
+
fourth.to_csv('result.csv')
+
+
The final step is to write the data that we have been working on to a
+csv file. Pandas makes this easy with the to_csv()
+function. The only required argument to the function is the filename.
+Note that the file will be written in the directory from which you
+started the Jupyter or Python session.
+
+
+
+
+
+
+
+
+
+
Selecting Indices
+
+
Explain in simple terms what idxmin and
+idxmax do in the short program below. When would you use
+these methods?
+
+
PYTHON
+
+
data = pd.read_csv('data/gapminder_gdp_europe.csv', index_col='country')
+print(data.idxmin())
+print(data.idxmax())
+
+
+
+
+
+
+
+
+
+
For each column in data, idxmin returns the index label (here, the
+country) of the row that holds that column’s minimum value;
+idxmax does the same for each column’s
+maximum value.
+
You can use these methods whenever you want to know where the
+minimum or maximum occurs rather than what its value is.
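
A short sketch on made-up numbers shows the difference between the value and its location. The frame `toy` and its column names are hypothetical.

```python
import pandas as pd

# Hypothetical values for three countries in two years.
toy = pd.DataFrame(
    {'gdp_1952': [1600.0, 6100.0, 8300.0], 'gdp_2007': [5900.0, 36100.0, 33700.0]},
    index=['Albania', 'Austria', 'Belgium'],
)

print(toy.min())     # the smallest value in each column
print(toy.idxmin())  # the row label where that smallest value occurs
print(toy.idxmax())  # the row label where the largest value occurs
```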
+
+
+
+
+
+
+
+
+
+
Practice with Selection
+
+
Assume Pandas has been imported and the Gapminder GDP data for Europe
+has been loaded. Write an expression to select each of the
+following:
+
GDP per capita for all countries in 1982.
+
GDP per capita for Denmark for all years.
+
GDP per capita for all countries for years after 1985.
+
GDP per capita for each country in 2007 as a multiple of GDP per
+capita for that country in 1952.
+
+
+
+
+
+
+
+
+
1:
+
+
PYTHON
+
+
data['gdpPercap_1982']
+
+
2:
+
+
PYTHON
+
+
data.loc['Denmark',:]
+
+
3:
+
+
PYTHON
+
+
data.loc[:,'gdpPercap_1985':]
+
+
This works without an error even though no column named
+gdpPercap_1985 actually exists. Because the column labels are in
+sorted order, Pandas starts the slice at the position where
+gdpPercap_1985 would fall, so every later column is selected. This is
+useful if new columns are added to the CSV file later.
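
This behavior can be checked on a toy frame. The sketch below uses made-up values and assumes the column labels are in sorted order; with unsorted labels, a slice bound that is not present would be expected to raise a KeyError instead.

```python
import pandas as pd

# Hypothetical one-row table; note that there is no 1985 column.
toy = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4.0]],
    index=['Denmark'],
    columns=['gdpPercap_1977', 'gdpPercap_1982', 'gdpPercap_1987', 'gdpPercap_1992'],
)

print(toy.columns.is_monotonic_increasing)  # True: labels are in sorted order

# Slice from a label that is missing from the columns.
later = toy.loc[:, 'gdpPercap_1985':]
print(list(later.columns))
```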
+
4:
+
+
PYTHON
+
+
data['gdpPercap_2007']/data['gdpPercap_1952']
+
+
+
+
+
+
+
+
+
+
+
Many Ways of Access
+
+
There are at least two ways of accessing a value or slice of a
+DataFrame: by name or index. However, there are many others. For
+example, a single column or row can be accessed either as a
+DataFrame or a Series object.
+
Suggest different ways of doing the following operations on a
+DataFrame:
+
Access a single column
+
Access a single row
+
Access an individual DataFrame element
+
Access several columns
+
Access several rows
+
Access a subset of specific rows and columns
+
Access a subset of row and column ranges
+
+
+
+
+
+
+
+
+
1. Access a single column:
+
+
PYTHON
+
+
# by name
+data["col_name"] # as a Series
+data[["col_name"]] # as a DataFrame
+
+# by name using .loc
+data.T.loc["col_name"] # as a Series
+data.T.loc[["col_name"]].T # as a DataFrame
+
+# Dot notation (Series)
+data.col_name
+
+# by index (iloc)
+data.iloc[:, col_index] # as a Series
+data.iloc[:, [col_index]] # as a DataFrame
+
+# using a mask
+data.T[data.T.index == "col_name"].T
+
+
2. Access a single row:
+
+
PYTHON
+
+
# by name using .loc
+data.loc["row_name"] # as a Series
+data.loc[["row_name"]] # as a DataFrame
+
+# by name
+data.T["row_name"] # as a Series
+data.T[["row_name"]].T # as a DataFrame
+
+# by index
+data.iloc[row_index] # as a Series
+data.iloc[[row_index]] # as a DataFrame
+
+# using mask
+data[data.index == "row_name"]
+
+
3. Access an individual DataFrame element:
+
+
PYTHON
+
+
# by column/row names
+data["col_name"]["row_name"] # as a value
+
+data[["col_name"]].loc["row_name"] # as a Series
+data[["col_name"]].loc[["row_name"]] # as a DataFrame
+
+data.loc["row_name"]["col_name"] # as a value
+data.loc[["row_name"]]["col_name"] # as a Series
+data.loc[["row_name"]][["col_name"]] # as a DataFrame
+
+data.loc["row_name", "col_name"] # as a value
+data.loc[["row_name"], "col_name"] # as a Series. Preserves index. Column name is moved to `.name`.
+data.loc["row_name", ["col_name"]] # as a Series. Index is moved to `.name`. Sets index to column name.
+data.loc[["row_name"], ["col_name"]] # as a DataFrame (preserves original index and column name)
+
+# by column/row names: Dot notation
+data.col_name.row_name
+
+# by column/row indices
+data.iloc[row_index, col_index] # as a value
+data.iloc[[row_index], col_index] # as a Series. Preserves index. Column name is moved to `.name`
+data.iloc[row_index, [col_index]] # as a Series. Index is moved to `.name`. Sets index to column name.
+data.iloc[[row_index], [col_index]] # as a DataFrame (preserves original index and column name)
+
+# column name + row index
+data["col_name"][row_index]
+data.col_name[row_index]
+data["col_name"].iloc[row_index]
+
+# column index + row name
+data.iloc[:, [col_index]].loc["row_name"] # as a Series
+data.iloc[:, [col_index]].loc[["row_name"]] # as a DataFrame
+
+# using masks
+data[data.index == "row_name"].T[data.T.index == "col_name"].T
+
+
4. Access several columns:
+
+
PYTHON
+
+
# by name
+data[["col1", "col2", "col3"]]
+data.loc[:, ["col1", "col2", "col3"]]
+
+# by index
+data.iloc[:, [col1_index, col2_index, col3_index]]
+
+
5. Access several rows
+
+
PYTHON
+
+
# by name
+data.loc[["row1", "row2", "row3"]]
+
+# by index
+data.iloc[[row1_index, row2_index, row3_index]]
+
+
6. Access a subset of specific rows and columns
+
+
PYTHON
+
+
# by names
+data.loc[["row1", "row2", "row3"], ["col1", "col2", "col3"]]
+
+# by indices
+data.iloc[[row1_index, row2_index, row3_index], [col1_index, col2_index, col3_index]]
+
+# column names + row indices
+data[["col1", "col2", "col3"]].iloc[[row1_index, row2_index, row3_index]]
+
+# column indices + row names
+data.iloc[:, [col1_index, col2_index, col3_index]].loc[["row1", "row2", "row3"]]
+
+
7. Access a subset of row and column ranges
+
+
PYTHON
+
+
# by name
+data.loc["row1":"row2", "col1":"col2"]
+
+# by index
+data.iloc[row1_index:row2_index, col1_index:col2_index]
+
+# column names + row indices
+data.loc[:, "col1_name":"col2_name"].iloc[row1_index:row2_index]
+
+# column indices + row names
+data.iloc[:, col1_index:col2_index].loc["row1":"row2"]
+
+
+
+
+
+
+
+
+
+
+
Exploring available methods using the
+dir() function
+
+
Python includes a dir() function that can be used to
+display all of the available methods (functions) that are built into a
+data object. In Episode 4, we used some methods with a string. But we
+can see many more are available by using dir():
+
+
PYTHON
+
+
my_string = 'Hello world!'  # creation of a string object
+dir(my_string)
You can use help() or Shift+Tab to
+get more information about what these methods do.
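
The full output of dir() is long because it also lists internal names that begin with an underscore. As a small sketch, the list can be filtered before reading it; the variable name `public_names` is made up for this example.

```python
my_string = 'Hello world!'

# Names starting with an underscore are internal; hide them to shorten the list.
public_names = [name for name in dir(my_string) if not name.startswith('_')]
print(public_names)

# help() on a single method prints its documentation.
help(my_string.upper)
```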
+
Assume Pandas has been imported and the Gapminder GDP data for Europe
+has been loaded as data. Then, use dir() to
+find the function that prints out the median per-capita GDP across all
+European countries for each year that information is available.
+
+
+
+
+
+
+
+
+
Among many choices, dir() lists the
+median() function as a possibility. Thus,
+
+
PYTHON
+
+
data.median()
+
+
+
+
+
+
+
+
+
+
+
Interpretation
+
+
Poland’s borders have been stable since 1945, but changed several
+times in the years before then. How would you handle this if you were
+creating a table of GDP per capita for Poland for the entire twentieth
+century?
+
+
+
+
+
+
+
+
+
Key Points
+
+
Use DataFrame.iloc[..., ...] to select values by
+integer location.
+
Use : on its own to mean all columns or all rows.
+
Select multiple columns or rows using DataFrame.loc and
+a named slice.
+
Result of slicing can be used in further operations.
In our Jupyter Notebook example, running the cell should generate the
+figure directly below the code. The figure is also included in the
+Notebook document for future viewing. However, other Python environments
+like an interactive Python session started from a terminal or a Python
+script executed via the command line require an additional command to
+display the figure.
+
Instruct matplotlib to show a figure:
+
+
PYTHON
+
+
plt.show()
+
+
This command can also be used within a Notebook - for instance, to
+display multiple figures if several are created by a single cell.
Before plotting, we convert the column headings from a
+string to integer data type, since they
+represent numerical values, using str.replace()
+to remove the gdpPercap_ prefix and then astype(int)
+to convert the series of string values
+(['1952', '1957', ..., '2007']) to a series of integers:
+[1952, 1957, ..., 2007].
+
+
PYTHON
+
+
import pandas as pd
+
+data = pd.read_csv('data/gapminder_gdp_oceania.csv', index_col='country')
+
+# Extract year from last 4 characters of each column name
+# The current column names are structured as 'gdpPercap_(year)',
+# so we want to keep the (year) part only for clarity when plotting GDP vs. years
+# To do this we use replace(), which removes from the string the characters stated in the argument
+# This method works on strings, so we use replace() from Pandas Series.str vectorized string functions
+
+years = data.columns.str.replace('gdpPercap_', '')
+
+# Convert year values to integers, saving results back to dataframe
+
+data.columns = years.astype(int)
+
+data.loc['Australia'].plot()
+
+
Select and transform data, then plot it.
+
By default, DataFrame.plot
+plots with the rows as the X axis.
+
We can transpose the data in order to plot multiple series.
+
+
PYTHON
+
+
data.T.plot()
+plt.ylabel('GDP per capita')
+
+
Many styles of plot are available.
+
For example, do a bar plot using a fancier style.
+
+
PYTHON
+
+
plt.style.use('ggplot')
+data.T.plot(kind='bar')
+plt.ylabel('GDP per capita')
+
+
Data can also be plotted by calling the matplotlib
+plot function directly.
+
The command is plt.plot(x, y)
+
+
The color and format of markers can also be specified as an
+additional optional argument e.g., b- is a blue line,
+g-- is a green dashed line.
+
Get Australia data from dataframe
+
+
PYTHON
+
+
years = data.columns
+gdp_australia = data.loc['Australia']
+
+plt.plot(years, gdp_australia, 'g--')
+
+
Can plot many sets of data together.
+
+
PYTHON
+
+
# Select two countries' worth of data.
+gdp_australia = data.loc['Australia']
+gdp_nz = data.loc['New Zealand']
+
+# Plot with differently-colored markers.
+plt.plot(years, gdp_australia, 'b-', label='Australia')
+plt.plot(years, gdp_nz, 'g-', label='New Zealand')
+
+# Create legend.
+plt.legend(loc='upper left')
+plt.xlabel('Year')
+plt.ylabel('GDP per capita ($)')
+
+
+
+
+
+
+
Adding a Legend
+
+
Often when plotting multiple datasets on the same figure it is
+desirable to have a legend describing the data.
By default matplotlib will attempt to place the legend in a suitable
+position. If you would rather specify a position, this can be done with
+the loc= argument, e.g., to place the legend in the upper
+left corner of the plot, specify loc='upper left'.
+
+
+
+
Plot a scatter plot correlating the GDP of Australia and New
+Zealand
+
Use either plt.scatter or
+DataFrame.plot.scatter
+
+
+
PYTHON
+
+
plt.scatter(gdp_australia, gdp_nz)
+
+
+
PYTHON
+
+
data.T.plot.scatter(x='Australia', y='New Zealand')
+
+
+
+
+
+
+
Minima and Maxima
+
+
Fill in the blanks below to plot the minimum GDP per capita over time
+for all the countries in Europe. Modify it again to plot the maximum GDP
+per capita over time for Europe.
Modify the example in the notes to create a scatter plot showing the
+relationship between the minimum and maximum GDP per capita among the
+countries in Asia for each year in the data set. What relationship do
+you see (if any)?
No particular correlation can be seen between the minimum and
+maximum GDP values year on year. It seems the fortunes of Asian
+countries do not rise and fall together.
+
+
+
+
+
+
+
+
+
+
Correlations (continued)
+
+
+
You might note that the variability in the maximum is much higher
+than that of the minimum. Take a look at the maximum and the max
+indexes:
Seems the variability in this value is due to a sharp drop after
+1972. Some geopolitics at play perhaps? Given the dominance of oil
+producing countries, maybe the Brent crude index would make an
+interesting comparison? Whilst Myanmar consistently has the lowest GDP,
+the highest GDP nation has varied more notably.
+
+
+
+
+
+
+
+
+
+
More Correlations
+
+
This short program creates a plot showing the correlation between GDP
+and life expectancy for 2007, normalizing marker size by population:
Using online help and other resources, explain what each argument to
+plot does.
+
+
+
+
+
+
+
+
+
A good place to look is the documentation for the plot function -
+help(data_all.plot).
+
kind - As seen already this determines the kind of plot to be
+drawn.
+
x and y - A column name or index that determines what data will be
+placed on the x and y axes of the plot
+
s - Details for this can be found in the documentation of
+plt.scatter. A single number or one value for each data point.
+Determines the size of the plotted points.
+
+
+
+
+
+
+
+
+
+
Saving your plot to a file
+
+
If you are satisfied with the plot you see you may want to save it to
+a file, perhaps to include it in a publication. There is a function in
+the matplotlib.pyplot module that accomplishes this: savefig.
+Calling this function, e.g. with
+
+
PYTHON
+
+
plt.savefig('my_figure.png')
+
+
will save the current figure to the file my_figure.png.
+The file format will automatically be deduced from the file name
+extension (other formats are pdf, ps, eps and svg).
+
Note that functions in plt refer to a global figure
+variable and after a figure has been displayed to the screen (e.g. with
+plt.show) matplotlib will make this variable refer to a new
+empty figure. Therefore, make sure you call plt.savefig
+before the plot is displayed to the screen, otherwise you may find a
+file with an empty plot.
+
When using dataframes, data is often generated and plotted to screen
+in one line. In addition to using plt.savefig, we can save
+a reference to the current figure in a local variable (with
+plt.gcf) and call the savefig method
+of that figure to save it to a file.
+
+
PYTHON
+
+
data.plot(kind='bar')
+fig = plt.gcf() # get current figure
+fig.savefig('my_figure.png')
+
+
+
+
+
+
+
+
+
+
Making your plots accessible
+
+
Whenever you are generating plots to go into a paper or a
+presentation, there are a few things you can do to make sure that
+everyone can understand your plots.
+
Always make sure your text is large enough to read. Use the
+fontsize parameter in xlabel,
+ylabel, title, and legend, and tick_params
+with labelsize to increase the text size of the numbers
+on your axes.
+
Similarly, you should make your graph elements easy to see. Use
+s to increase the size of your scatterplot markers and
+linewidth to increase the sizes of your plot lines.
+
Using color (and nothing else) to distinguish between different plot
+elements will make your plots unreadable to anyone who is colorblind, or
+who happens to have a black-and-white office printer. For lines, the
+linestyle parameter lets you use different types of lines.
+For scatterplots, marker lets you change the shape of your
+points. If you’re unsure about your colors, you can use Coblis
+or Color Oracle to simulate what
+your plots would look like to those with colorblindness.
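
The advice above can be combined in one short sketch. The data values and labels are made up, and the `matplotlib.use('Agg')` line is an assumption so that the example also runs without a display; in a notebook that line would normally be left out.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: draw off-screen; omit this line in a notebook
import matplotlib.pyplot as plt

years = [1952, 1957, 1962]

# Made-up values. The two lines differ in line style and marker shape,
# so they can be told apart without relying on color.
plt.plot(years, [10.0, 11.0, 12.5], color='black', linestyle='-', marker='o',
         linewidth=3, label='Country A')
plt.plot(years, [9.0, 10.5, 13.0], color='black', linestyle='--', marker='s',
         linewidth=3, label='Country B')

# Larger text for axis labels, tick labels, and the legend.
plt.xlabel('Year', fontsize=16)
plt.ylabel('GDP per capita', fontsize=16)
plt.tick_params(labelsize=14)
plt.legend(fontsize=14)
```

Printing such a figure in black and white should still leave the two series distinguishable.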
+
+
+
+
+
+
+
+
+
Key Points
+
+
+matplotlib is the
+most widely used scientific plotting library in Python.
print('zeroth item of pressures:', pressures[0])
+print('fourth item of pressures:', pressures[4])
+
+
+
OUTPUT
+
+
zeroth item of pressures: 0.273
+fourth item of pressures: 0.276
+
+
Lists’ values can be replaced by assigning to them.
+
Use an index expression on the left of assignment to replace a
+value.
+
+
PYTHON
+
+
pressures[0] = 0.265
+print('pressures is now:', pressures)
+
+
+
OUTPUT
+
+
pressures is now: [0.265, 0.275, 0.277, 0.275, 0.276]
+
+
Appending items to a list lengthens it.
+
Use list_name.append to add items to the end of a
+list.
+
+
PYTHON
+
+
primes = [2, 3, 5]
+print('primes is initially:', primes)
+primes.append(7)
+print('primes has become:', primes)
+
+
+
OUTPUT
+
+
primes is initially: [2, 3, 5]
+primes has become: [2, 3, 5, 7]
+
+
+append is a method of lists.
+
Like a function, but tied to a particular object.
+
+
Use object_name.method_name to call methods.
+
Deliberately resembles the way we refer to things in a library.
+
+
We will meet other methods of lists as we go along.
+
Use help(list) for a preview.
+
+
+extend is similar to append, but it allows
+you to combine two lists. For example:
+
+
PYTHON
+
+
teen_primes = [11, 13, 17, 19]
+middle_aged_primes = [37, 41, 43, 47]
+print('primes is currently:', primes)
+primes.extend(teen_primes)
+print('primes has now become:', primes)
+primes.append(middle_aged_primes)
+print('primes has finally become:', primes)
+
+
+
OUTPUT
+
+
primes is currently: [2, 3, 5, 7]
+primes has now become: [2, 3, 5, 7, 11, 13, 17, 19]
+primes has finally become: [2, 3, 5, 7, 11, 13, 17, 19, [37, 41, 43, 47]]
+
+
Note that while extend maintains the “flat” structure of
+the list, appending a list to a list means the last element in
+primes will itself be a list, not an integer. Lists can
+contain values of any type; therefore, lists of lists are possible.
+
Use del to remove items from a list entirely.
+
We use del list_name[index] to remove an element from a
+list (in the example, 9 is not a prime number) and thus shorten it.
+
+del is not a function or a method, but a statement in
+the language.
+
+
PYTHON
+
+
primes = [2, 3, 5, 7, 9]
+print('primes before removing last item:', primes)
+del primes[4]
+print('primes after removing last item:', primes)
+
+
+
OUTPUT
+
+
primes before removing last item: [2, 3, 5, 7, 9]
+primes after removing last item: [2, 3, 5, 7]
+
+
The empty list contains no values.
+
Use [] on its own to represent a list that doesn’t
+contain any values.
+
“The zero of lists.”
+
+
Helpful as a starting point for collecting values (which we will see
+in the next episode).
+
Lists may contain values of different types.
+
A single list may contain numbers, strings, and anything else.
If start and stop are both non-negative
+integers, how long is the list values[start:stop]?
+
+
+
+
+
+
+
+
+
The list values[start:stop] has up to
+stop - start elements. For example,
+values[1:4] has the 3 elements values[1],
+values[2], and values[3]. Why ‘up to’? As we
+saw in episode 2, if stop
+is greater than the total length of the list values, we
+will still get a list back but it will be shorter than expected.
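
The "up to" rule can be checked directly. This is a small sketch with a made-up list; the formula in it is one way of writing the rule for non-negative start and stop, and the name `expected` is hypothetical.

```python
values = [10, 20, 30, 40, 50]

print(len(values[1:4]))    # 3, which equals stop - start
print(len(values[1:100]))  # 4, because the slice is cut off at the end of the list

# One way to write the general rule for non-negative start and stop.
start, stop = 1, 100
expected = max(0, min(stop, len(values)) - min(start, len(values)))
print(expected == len(values[start:stop]))  # True
```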
+
+
+
+
+
+
+
+
+
+
From Strings to Lists and Back
+
+
Given this:
+
+
PYTHON
+
+
print('string to list:', list('tin'))
+print('list to string:', ''.join(['g', 'o', 'l', 'd']))
+
+
+
OUTPUT
+
+
string to list: ['t', 'i', 'n']
+list to string: gold
+
+
What does list('some string') do?
+
What does '-'.join(['x', 'y', 'z']) generate?
+
+
+
+
+
+
+
+
+
+list('some string')
+converts a string into a list containing all of its characters.
+
+join
+returns a string that concatenates the string
+elements of the list, placing the separator between each pair of
+neighboring elements. This results in x-y-z. The separator is the
+string on which the join method is called (here '-').
+
+
+
+
+
+
+
+
+
+
Working With the End
+
+
What does the following program print?
+
+
PYTHON
+
+
element = 'helium'
+print(element[-1])
+
+
How does Python interpret a negative index?
+
If a list or string has N elements, what is the most negative index
+that can safely be used with it, and what location does that index
+represent?
+
If values is a list, what does
+del values[-1] do?
+
How can you display all elements but the last one without changing
+values? (Hint: you will need to combine slicing and
+negative indexing.)
+
+
+
+
+
+
+
+
+
The program prints m.
+
Python interprets a negative index as starting from the end (as
+opposed to starting from the beginning). The last element is
+-1.
+
The most negative index that can safely be used with a list of N
+elements is -N, which represents the first element.
+
+del values[-1] removes the last element from the
+list.
+
values[:-1]
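
These answers can be verified with a short sketch; the list contents are made up for illustration.

```python
values = ['H', 'He', 'Li', 'Be']
n = len(values)

print(values[-1])   # Be, the last element
print(values[-n])   # H, the first element
print(values[:-1])  # ['H', 'He', 'Li']
print(values)       # ['H', 'He', 'Li', 'Be'], unchanged by the slice

# One step further than -n is no longer a legal index.
try:
    values[-n - 1]
except IndexError as error:
    print('IndexError:', error)
```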
+
+
+
+
+
+
+
+
+
+
Stepping Through a List
+
+
What does the following program print?
+
+
PYTHON
+
+
element = 'fluorine'
+print(element[::2])
+print(element[::-1])
+
+
If we write a slice as low:high:stride, what does
+stride do?
+
What expression would select all of the even-numbered items from a
+collection?
+
+
+
+
+
+
+
+
+
The program prints
+
+
OUTPUT
+
+
furn
+eniroulf
+
+
+stride is the step size of the slice.
+
The slice 1::2 selects all even-numbered items from a
+collection: it starts with element 1 (which is the second
+element, since indexing starts at 0), goes on until the end
+(since no end is given), and uses a step size of
+2 (i.e., selects every second element).
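
A list of numbers makes the effect of the stride easy to see. Note that "even-numbered" depends on whether you count items from zero or from one, so the sketch below shows both readings.

```python
numbers = list(range(10))  # [0, 1, 2, ..., 9]

print(numbers[::2])    # [0, 2, 4, 6, 8]: items at even indices
print(numbers[1::2])   # [1, 3, 5, 7, 9]: the second, fourth, ... items
print(numbers[::-1])   # a negative stride walks backwards
print(numbers[2:8:3])  # [2, 5]: start at 2, stop before 8, step 3
```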
+
+
+
+
+
+
+
+
+
+
Slice Bounds
+
+
What does the following program print?
+
+
PYTHON
+
+
element = 'lithium'
+print(element[0:20])
+print(element[-1:3])
+
+
+
+
+
+
+
+
+
+
+
OUTPUT
+
+
lithium
+
+
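The first statement prints the whole string, since a slice that
+extends beyond the end of the string is simply cut off at the end. The
+second statement prints an empty string: the slice starts at index -1
+(the last character, index 6) and stops before index 3, and with the
+default step of 1 a slice cannot move backwards, so it selects nothing.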
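
It may help to see that -1 is just another name for index 6 in this string. The sketch below prints each result with repr() so that the empty string is visible.

```python
element = 'lithium'

print(repr(element[-1:3]))     # '': starts at the last character, stops before index 3
print(repr(element[6:3]))      # '': the same slice written with a positive start
print(repr(element[-1:3:-1]))  # 'mui': a negative step walks backwards from the end
```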
+
+
+
+
+
+
+
+
+
+
Sort and Sorted
+
+
What do these two programs print? In simple terms, explain the
+difference between sorted(letters) and
+letters.sort().
+
+
PYTHON
+
+
# Program A
+letters = list('gold')
+result = sorted(letters)
+print('letters is', letters, 'and result is', result)
+
+
+
PYTHON
+
+
# Program B
+letters = list('gold')
+result = letters.sort()
+print('letters is', letters, 'and result is', result)
+
+
+
+
+
+
+
+
+
+
Program A prints
+
+
OUTPUT
+
+
letters is ['g', 'o', 'l', 'd'] and result is ['d', 'g', 'l', 'o']
+
+
Program B prints
+
+
OUTPUT
+
+
letters is ['d', 'g', 'l', 'o'] and result is None
+
+
sorted(letters) returns a sorted copy of the list
+letters (the original list letters remains
+unchanged), while letters.sort() sorts the list
+letters in-place and does not return anything.
+
+
+
+
+
+
+
+
+
+
Copying (or Not)
+
+
What do these two programs print? In simple terms, explain the
+difference between new = old and
+new = old[:].
+
+
PYTHON
+
+
# Program A
+old = list('gold')
+new = old  # simple assignment
+new[0] = 'D'
+print('new is', new, 'and old is', old)
+
+
+
PYTHON
+
+
# Program B
+old = list('gold')
+new = old[:]  # assigning a slice
+new[0] = 'D'
+print('new is', new, 'and old is', old)
+
+
+
+
+
+
+
+
+
+
Program A prints
+
+
OUTPUT
+
+
new is ['D', 'o', 'l', 'd'] and old is ['D', 'o', 'l', 'd']
+
+
Program B prints
+
+
OUTPUT
+
+
new is ['D', 'o', 'l', 'd'] and old is ['g', 'o', 'l', 'd']
+
+
new = old makes new a reference to the list
+old; new and old point towards
+the same object.
+
new = old[:] however creates a new list object
+new containing all elements from the list old;
+new and old are different objects.
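
The `is` operator reports whether two names refer to the same object, which gives a direct way to check the explanation above. In this sketch the names `alias` and `duplicate` are made up for illustration.

```python
old = list('gold')
alias = old        # a second name for the same list
duplicate = old[:] # a new list with the same elements

print(alias is old)       # True: one object, two names
print(duplicate is old)   # False: two separate objects
print(duplicate == old)   # True: the contents are equal

alias.append('!')
print(old)        # ['g', 'o', 'l', 'd', '!']
print(duplicate)  # ['g', 'o', 'l', 'd']
```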
+
+
+
+
+
+
+
+
+
+
Key Points
+
+
A list stores many values in a single structure.
+
Use an item’s index to fetch it from a list.
+
Lists’ values can be replaced by assigning to them.
+
Appending items to a list lengthens it.
+
Use del to remove items from a list entirely.
+
The empty list contains no values.
+
Lists may contain values of different types.
+
Character strings can be indexed like lists.
+
Character strings are immutable.
+
Indexing beyond the end of the collection is an error.
+
+
diff --git a/12-for-loops.html b/12-for-loops.html
new file mode 100644
index 000000000..090ca4358
--- /dev/null
+++ b/12-for-loops.html
@@ -0,0 +1,1178 @@
+
+Plotting and Programming in Python: For Loops
+
This error can be fixed by removing the extra spaces at the
+beginning of the second line.
+
Loop variables can be called anything.
+
As with all variables, loop variables are:
+
Created on demand.
+
Meaningless: their names can be anything at all.
+
+
+
PYTHON
+
+
for kitten in [2, 3, 5]:
+    print(kitten)
+
+
The body of a loop can contain many statements.
+
But no loop should be more than a few lines long.
+
Hard for human beings to keep larger chunks of code in mind.
+
+
PYTHON
+
+
primes = [2, 3, 5]
+for p in primes:
+    squared = p ** 2
+    cubed = p ** 3
+    print(p, squared, cubed)
+
+
+
OUTPUT
+
+
2 4 8
+3 9 27
+5 25 125
+
+
Use range to iterate over a sequence of numbers.
+
The built-in function range
+produces a sequence of numbers.
+
+Not a list: the numbers are produced on demand to make
+looping over large ranges more efficient.
+
+
+range(N) is the numbers 0..N-1
+
Exactly the legal indices of a list or character string of length
+N
+
+
+
PYTHON
+
+
print('a range is not a list: range(0, 3)')
+for number in range(0, 3):
+    print(number)
+
+
+
OUTPUT
+
+
a range is not a list: range(0, 3)
+0
+1
+2
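
Because range(N) produces exactly the legal indices of a sequence of length N, it can be paired with len to visit each position. This is a small sketch; the string `word` is made up.

```python
word = 'tin'

print(range(len(word)))        # range(0, 3): numbers are produced on demand
print(list(range(len(word))))  # [0, 1, 2]: converting to a list shows them all

# Each number from the range is a valid index into the string.
for index in range(len(word)):
    print(index, word[index])
```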
+
+
The Accumulator pattern turns many values into one.
+
A common pattern in programs is to:
+
Initialize an accumulator variable to zero, the empty
+string, or the empty list.
+
Update the variable with values from a collection.
+
+
+
PYTHON
+
+
# Sum the first 10 integers.
+total = 0
+for number in range(10):
+    total = total + (number + 1)
+print(total)
+
+
+
OUTPUT
+
+
55
+
+
Read total = total + (number + 1) as:
+
Add 1 to the current value of the loop variable
+number.
+
Add that to the current value of the accumulator variable
+total.
+
Assign that to total, replacing the current value.
+
+
We have to add number + 1 because range
+produces 0..9, not 1..10.
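
As a variation, range can be given a start value so that no "+ 1" is needed, and the same pattern works when the accumulator is a list instead of a number. This is a sketch; the name `squares` is made up for illustration.

```python
# Same sum as above, but asking range for 1..10 directly.
total = 0
for number in range(1, 11):
    total = total + number
print(total)  # 55

# The built-in sum function applies the same pattern in one call.
print(sum(range(1, 11)))  # 55

# Accumulating into an empty list instead of zero.
squares = []
for number in range(1, 4):
    squares.append(number ** 2)
print(squares)  # [1, 4, 9]
```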
+
+
+
+
+
+
Classifying Errors
+
+
Is an indentation error a syntax error or a runtime error?
+
+
+
+
+
+
+
+
+
An IndentationError is a syntax error. Programs with syntax errors
+cannot be started. A program with a runtime error will start but an
+error will be thrown under certain conditions.
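
The distinction can be demonstrated with the built-in compile function, which checks a program's syntax without running it. This is a sketch: the source strings and variable names are made up, and using compile and exec here is only a way to show both kinds of error in one script.

```python
# IndentationError is a special kind of SyntaxError.
print(issubclass(IndentationError, SyntaxError))  # True

# The loop body is not indented, so this text is rejected before it can run.
bad_source = "for n in [1, 2]:\nprint(n)\n"
try:
    compile(bad_source, '<example>', 'exec')
except SyntaxError as error:
    compile_error_name = type(error).__name__
    print('rejected before running:', compile_error_name)

# This text is valid Python: it starts, prints a line, and then fails.
good_source = "print('started')\nprint(1 / 0)\n"
try:
    exec(compile(good_source, '<example>', 'exec'))
except ZeroDivisionError as error:
    runtime_error_name = type(error).__name__
    print('failed while running:', runtime_error_name)
```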
+
+
+
+
+
+
+
+
+
+
Tracing Execution
+
+
Create a table showing the numbers of the lines that are executed
+when this program runs, and the values of the variables after each line
+is executed.
+
+
PYTHON
+
+
total = 0
+for char in "tin":
+    total = total + 1
+
+
+
+
+
+
+
+
+
+
Line no   Variables
+1         total = 0
+2         total = 0 char = ‘t’
+3         total = 1 char = ‘t’
+2         total = 1 char = ‘i’
+3         total = 2 char = ‘i’
+2         total = 2 char = ‘n’
+3         total = 3 char = ‘n’
+
+
+
+
+
+
+
+
+
+
Reversing a String
+
+
Fill in the blanks in the program below so that it prints “nit” (the
+reverse of the original character string “tin”).
+
+
PYTHON
+
+
original = "tin"
+result = ____
+for char in original:
+    result = ____
+print(result)
+
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
original = "tin"
+result = ""
+for char in original:
+    result = char + result
+print(result)
+
+
+
+
+
+
+
+
+
+
+
Practice Accumulating
+
+
Fill in the blanks in each of the programs below to produce the
+indicated result.
+
+
PYTHON
+
+
# Total length of the strings in the list: ["red", "green", "blue"] => 12
+total = 0
+for word in ["red", "green", "blue"]:
+    ____ = ____ + len(word)
+print(total)
+
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
total = 0
+for word in ["red", "green", "blue"]:
+    total = total + len(word)
+print(total)
+
+
+
+
+
+
+
+
+
+
+
Practice Accumulating
+(continued)
+
+
+
+
PYTHON
+
+
# List of word lengths: ["red", "green", "blue"] => [3, 5, 4]
+lengths = ____
+for word in ["red", "green", "blue"]:
+ lengths.____(____)
+print(lengths)
+
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
lengths = []
+for word in ["red", "green", "blue"]:
+ lengths.append(len(word))
+print(lengths)
words = ["red", "green", "blue"]
+result = ""
+for word in words:
+ result = result + word
+print(result)
+
+
+
+
+
+
+
+
+
+
+
Practice Accumulating
+(continued)
+
+
+
Create an acronym: Starting from the list
+["red", "green", "blue"], create the acronym
+"RGB" using a for loop.
+
Hint: You may need to use a string method to
+properly format the acronym.
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
acronym = ""
+for word in ["red", "green", "blue"]:
+ acronym = acronym + word[0].upper()
+print(acronym)
+
+
+
+
+
+
+
+
+
+
+
Cumulative Sum
+
+
Reorder and properly indent the lines of code below so that they
+print a list with the cumulative sum of data. The result should be
+[1, 3, 5, 10].
+
+
PYTHON
+
+
cumulative.append(total)
+for number in data:
+cumulative = []
+total = total + number
+total = 0
+print(cumulative)
+data = [1,2,2,5]
+
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
total = 0
+data = [1,2,2,5]
+cumulative = []
+for number in data:
+    total = total + number
+    cumulative.append(total)
+print(cumulative)
+
+
+
+
+
+
+
+
+
+
+
Identifying Variable Name Errors
+
+
Read the code below and try to identify what the errors are
+without running it.
+
Run the code and read the error message. What type of
+NameError do you think this is? Is it a string with no
+quotes, a misspelled variable, or a variable that should have been
+defined but was not?
+
Fix the error.
+
Repeat steps 2 and 3, until you have fixed all the errors.
+
+
PYTHON
+
+
for number in range(10):
+    # use a if the number is a multiple of 3, otherwise use b
+    if (Number % 3) == 0:
+        message = message + a
+    else:
+        message = message + "b"
+print(message)
+
+
+
+
+
+
+
+
+
+
Python variable names are case sensitive: number and
+Number refer to different variables.
+
The variable message needs to be initialized as an
+empty string.
+
We want to add the string "a" to message,
+not the undefined variable a.
+
+
PYTHON
+
+
message = ""
+for number in range(10):
+    # use a if the number is a multiple of 3, otherwise use b
+    if (number % 3) == 0:
+        message = message + "a"
+    else:
+        message = message + "b"
+print(message)
+
+
+
+
+
+
+
+
+
+
+
Identifying Item Errors
+
+
Read the code below and try to identify what the errors are
+without running it.
+
Run the code, and read the error message. What type of error is
+it?
+
Fix the error.
+
+
PYTHON
+
+
seasons = ['Spring', 'Summer', 'Fall', 'Winter']
+print('My favorite season is ', seasons[4])
+
+
+
+
+
+
+
+
+
+
This list has 4 elements and the index to access the last element in
+the list is 3.
+
+
PYTHON
+
+
seasons = ['Spring', 'Summer', 'Fall', 'Winter']
+print('My favorite season is ', seasons[3])
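
To avoid counting by hand, the last valid index can be computed from the length of the list, or the last item can be reached with a negative index. This is a small sketch; the name `last_index` is made up.

```python
seasons = ['Spring', 'Summer', 'Fall', 'Winter']

last_index = len(seasons) - 1
print(last_index)           # 3
print(seasons[last_index])  # Winter
print(seasons[-1])          # Winter

# Using the length itself as an index is always one step too far.
try:
    seasons[len(seasons)]
except IndexError as error:
    print('IndexError:', error)
```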
+
+
+
+
+
+
+
+
+
+
+
Key Points
+
+
A for loop executes commands once for each value in a
+collection.
+
A for loop is made up of a collection, a loop variable,
+and a body.
+
The first line of the for loop must end with a colon,
+and the body must be indented.
+
Indentation is always meaningful in Python.
+
Loop variables can be called anything (but it is strongly advised to
+give the loop variable a meaningful name).
+
The body of a loop can contain many statements.
+
Use range to iterate over a sequence of numbers.
+
The Accumulator pattern turns many values into one.
Often use conditionals in a loop to “evolve” the values of
+variables.
+
+
PYTHON
+
+
velocity = 10.0
+for i in range(5):  # execute the loop 5 times
+    print(i, ':', velocity)
+    if velocity > 20.0:
+        print('moving too fast')
+        velocity = velocity - 5.0
+    else:
+        print('moving too slow')
+        velocity = velocity + 10.0
+print('final velocity:', velocity)
+
+
+
OUTPUT
+
+
0 : 10.0
+moving too slow
+1 : 20.0
+moving too slow
+2 : 30.0
+moving too fast
+3 : 25.0
+moving too fast
+4 : 20.0
+moving too slow
+final velocity: 30.0
+
+
Create a table showing variables’ values to trace a program’s
+execution.
+
+i          0      .      1      .      2      .      3      .      4      .
+velocity   10.0   20.0   .      30.0   .      25.0   .      20.0   .      30.0
+
+
The program must have a print statement
+outside the body of the loop to show the final value of
+velocity, since its value is updated by the last iteration
+of the loop.
+
+
+
+
+
+
Compound Relations Using and,
+or, and Parentheses
+
+
Often, you want some combination of things to be true. You can
+combine relations within a conditional using and and
+or. Continuing the example above, suppose you have
+
+
PYTHON
+
+
mass = [ 3.54,  2.07,  9.22,  1.86,  1.71]
+velocity = [10.00, 20.00, 30.00, 25.00, 20.00]
+
+i = 0
+for i in range(5):
+    if mass[i] > 5 and velocity[i] > 20:
+        print("Fast heavy object. Duck!")
+    elif mass[i] > 2 and mass[i] <= 5 and velocity[i] <= 20:
+        print("Normal traffic")
+    elif mass[i] <= 2 and velocity[i] <= 20:
+        print("Slow light object. Ignore it")
+    else:
+        print("Whoa! Something is up with the data. Check it")
+
+
Just like with arithmetic, you can and should use parentheses
+whenever there is possible ambiguity. A good general rule is to
+always use parentheses when mixing and and
+or in the same condition. That is, instead of:
+
+
PYTHON
+
+
if mass[i] <= 2 or mass[i] >= 5 and velocity[i] > 20:
+
+
write one of these:
+
+
PYTHON
+
+
if (mass[i] <= 2 or mass[i] >= 5) and velocity[i] > 20:
+if mass[i] <= 2 or (mass[i] >= 5 and velocity[i] > 20):
+
+
so it is perfectly clear to a reader (and to Python) what you really
+mean.
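Python's `and` binds more tightly than `or`, so the unparenthesized condition is read as the second of the two parenthesized forms. The sketch below uses a single hypothetical measurement (not taken from the lists above) chosen so that the two groupings give different answers.

```python
# Hypothetical single measurement chosen so the two groupings disagree.
mass = 1.5
velocity = 10.0

# No parentheses: Python evaluates `and` before `or`.
unparenthesized = mass <= 2 or mass >= 5 and velocity > 20

# The two explicit groupings.
or_grouped_first = (mass <= 2 or mass >= 5) and velocity > 20
and_grouped_first = mass <= 2 or (mass >= 5 and velocity > 20)

print(unparenthesized, or_grouped_first, and_grouped_first)
```

The unparenthesized result matches `and_grouped_first`, which may or may not be what the author intended; writing the parentheses removes the doubt.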
Fill in the blanks so that this program creates a new list containing
+zeroes where the original list’s values were negative and ones where the
+original list’s values were zero or positive.
+
+
PYTHON
+
+
original = [-1.5, 0.2, 0.4, 0.0, -1.3, 0.4]
+result = ____
+for value in original:
+    if ____:
+        result.append(0)
+    else:
+        ____
+print(result)
+
+
+
OUTPUT
+
+
[0, 1, 1, 1, 0, 1]
+
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
original = [-1.5, 0.2, 0.4, 0.0, -1.3, 0.4]
+result = []
+for value in original:
+    if value < 0.0:
+        result.append(0)
+    else:
+        result.append(1)
+print(result)
+
+
+
+
+
+
+
+
+
+
+
Processing Small Files
+
+
Modify this program so that it only processes files with fewer than
+50 records.
+
+
PYTHON
+
+
import glob
+import pandas as pd
+for filename in glob.glob('data/*.csv'):
+    contents = pd.read_csv(filename)
+    ____:
+        print(filename, len(contents))
+
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
import glob
+import pandas as pd
+for filename in glob.glob('data/*.csv'):
+    contents = pd.read_csv(filename)
+    if len(contents) < 50:
+        print(filename, len(contents))
+
+
+
+
+
+
+
+
+
+
+
Initializing
+
+
Modify this program so that it finds the largest and smallest values
+in the list no matter what the range of values originally is.
+
+
PYTHON
+
+
values = [...some test data...]
+smallest, largest = None, None
+for v in values:
+    if ____:
+        smallest, largest = v, v
+    ____:
+        smallest = min(____, v)
+        largest = max(____, v)
+print(smallest, largest)
+
+
What are the advantages and disadvantages of using this method to
+find the range of the data?
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
values = [-2, 1, 65, 78, -54, -24, 100]
+smallest, largest = None, None
+for v in values:
+    if smallest is None and largest is None:
+        smallest, largest = v, v
+    else:
+        smallest = min(smallest, v)
+        largest = max(largest, v)
+print(smallest, largest)
+
+
If you wrote == None instead of is None,
+that works too, but Python programmers always write is None
+because of the special way None works in the language.
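To illustrate why `is None` is preferred: `None` is a single object, so `is None` checks identity, while `== None` asks the object's own equality method, which a class is free to override. The class `AlwaysEqual` below is a hypothetical example written only for this sketch.

```python
class AlwaysEqual:
    """Hypothetical class whose instances claim to be equal to everything."""

    def __eq__(self, other):
        return True


thing = AlwaysEqual()

# == calls AlwaysEqual.__eq__, so this comparison reports True.
equals_none = (thing == None)

# `is` compares identity and cannot be overridden, so this is False.
is_none = (thing is None)

print(equals_none, is_none)
```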
+
It can be argued that an advantage of using this method would be to
+make the code more readable. However, a disadvantage is that this code
+is not efficient because within each iteration of the for
+loop statement, there are two more loops that run over two numbers each
+(the min and max functions). It would be more
+efficient to iterate over each number just once:
+
+
PYTHON
+
+
values = [-2, 1, 65, 78, -54, -24, 100]
+smallest, largest = None, None
+for v in values:
+    if smallest is None or v < smallest:
+        smallest = v
+    if largest is None or v > largest:
+        largest = v
+print(smallest, largest)
+
+
Now we have one loop, but four comparison tests. There are two ways
+we could improve it further: either use fewer comparisons in each
+iteration, or use two loops that each contain only one comparison test.
+The simplest solution is often the best:
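One candidate for such a simple solution, assuming the built-in `min` and `max` functions are acceptable here, is sketched below. It uses the same test data as the solutions above.

```python
values = [-2, 1, 65, 78, -54, -24, 100]

# Each built-in makes one pass over the list.
smallest = min(values)
largest = max(values)

print(smallest, largest)
```

Note that `min` and `max` raise a `ValueError` when given an empty list, whereas the loop-based versions above would leave both variables set to `None`.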
+
+
diff --git a/14-looping-data-sets.html b/14-looping-data-sets.html
new file mode 100644
index 000000000..53cb68901
--- /dev/null
+++ b/14-looping-data-sets.html
@@ -0,0 +1,857 @@
+
+Plotting and Programming in Python: Looping Over Data Sets
+
Use glob.glob
+to find sets of files whose names match a pattern.
+
In Unix, the term “globbing” means “matching a set of files with a
+pattern”.
+
The most common patterns are:
+
+* meaning “match zero or more characters”
+
+? meaning “match exactly one character”
+
+
Python’s standard library contains the glob
+module to provide pattern matching functionality
+
The glob
+module contains a function also called glob to match file
+patterns
+
E.g., glob.glob('*.txt') matches all files in the
+current directory whose names end with .txt.
+
Result is a (possibly empty) list of character strings.
+
+
PYTHON
+
+
import glob
+print('all csv files in data directory:', glob.glob('data/*.csv'))
+
+
+
OUTPUT
+
+
all csv files in data directory: ['data/gapminder_all.csv', 'data/gapminder_gdp_africa.csv', \
+'data/gapminder_gdp_americas.csv', 'data/gapminder_gdp_asia.csv', 'data/gapminder_gdp_europe.csv', \
+'data/gapminder_gdp_oceania.csv']
+
+
+
PYTHON
+
+
print('all PDB files:', glob.glob('*.pdb'))
+
+
+
OUTPUT
+
+
all PDB files: []
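The `*` and `?` wildcards described above can be tried out safely on throwaway files. The following sketch creates a temporary directory with a few empty files; the file names are hypothetical and chosen only to show the difference between the two patterns.

```python
import glob
import os
import tempfile

# Create a throwaway directory holding a few empty, hypothetically named files.
with tempfile.TemporaryDirectory() as folder:
    for name in ['run1.csv', 'run2.csv', 'run10.csv', 'notes.txt']:
        open(os.path.join(folder, name), 'w').close()

    # * matches zero or more characters, so all three run files match.
    star_paths = glob.glob(os.path.join(folder, 'run*.csv'))
    star_matches = sorted(os.path.basename(path) for path in star_paths)

    # ? matches exactly one character, so run10.csv is excluded.
    question_paths = glob.glob(os.path.join(folder, 'run?.csv'))
    question_matches = sorted(os.path.basename(path) for path in question_paths)

print(star_matches)
print(question_matches)
```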
+
+
Use glob and for to process batches of
+files.
+
Helps a lot if the files are named and stored systematically and
+consistently so that simple patterns will find the right data.
+
+
PYTHON
+
+
import pandas as pd
+for filename in glob.glob('data/gapminder_*.csv'):
+    data = pd.read_csv(filename)
+    print(filename, data['gdpPercap_1952'].min())
You might have chosen to initialize the fewest variable
+with a number greater than the numbers you’re dealing with, but that
+could lead to trouble if you reuse the code with bigger numbers. Python
+lets you use positive infinity, which will work no matter how big your
+numbers are. What other special strings does the float
+function recognize?
+
+
+
+
+
+
+
+
+
+
Comparing Data
+
+
Write a program that reads in the regional data sets and plots the
+average GDP per capita for each region over time in a single chart.
+Pandas will raise an error if it encounters non-numeric columns in a
+dataframe computation so you may need to either filter out those columns
+or tell pandas to ignore them.
+
+
+
+
+
+
+
+
+
This solution builds a useful legend by using the string
+split method to extract the region from
+the path ‘data/gapminder_gdp_a_specific_region.csv’.
+
+
PYTHON
+
+
import glob
+import pandas as pd
+import matplotlib.pyplot as plt
+fig, ax = plt.subplots(1,1)
+for filename in glob.glob('data/gapminder_gdp*.csv'):
+    dataframe = pd.read_csv(filename)
+    # extract <region> from the filename, expected to be in the format 'data/gapminder_gdp_<region>.csv'.
+    # we will split the string using the split method and `_` as our separator,
+    # retrieve the last string in the list that split returns (`<region>.csv`),
+    # and then remove the `.csv` extension from that string.
+    # NOTE: the pathlib module covered in the next callout also offers
+    # convenient abstractions for working with filesystem paths and could solve this as well:
+    # from pathlib import Path
+    # region = Path(filename).stem.split('_')[-1]
+    region = filename.split('_')[-1][:-4]
+    # pandas raises errors when it encounters non-numeric columns in a dataframe computation
+    # but we can tell pandas to ignore them with the `numeric_only` parameter
+    dataframe.mean(numeric_only=True).plot(ax=ax, label=region)
+    # NOTE: another way of doing this selects just the columns with gdp in their name using the filter method
+    # dataframe.filter(like="gdp").mean().plot(ax=ax, label=region)
+
+plt.legend()
+plt.show()
+
+
+
+
+
+
+
+
+
+
+
Dealing with File Paths
+
+
The pathlib
+module provides useful abstractions for file and path manipulation
+like returning the name of a file without the file extension. This is
+very useful when looping over files and directories. In the example
+below, we create a Path object and inspect its
+attributes.
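A minimal sketch of such an inspection follows. It assumes one of the regional file paths used earlier in this lesson; the file does not need to exist on disk for these attributes to work, because they only examine the path text.

```python
from pathlib import Path

# Build a Path object from a string; nothing is read from disk here.
p = Path('data/gapminder_gdp_africa.csv')

print(p.parent)  # directory part of the path
print(p.name)    # file name including the extension
print(p.stem)    # file name without the extension
print(p.suffix)  # the extension, including the leading dot

# The region can be taken from the stem, as noted in the solution above.
region = p.stem.split('_')[-1]
print(region)
```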
A common refrain in software engineering is “Don’t Repeat Yourself”.
+How do the techniques we’ve learned in the last lessons help us avoid
+repeating ourselves? Note that in practice there is some nuance to
+this and should be balanced with doing the simplest thing that could
+possibly work.
+
+
What are the pros / cons of making a variable global or local to a
+function?
+
When would you consider turning a block of code into a function
+definition?
Explain and identify the difference between function definition and
+function call.
+
Write a function that takes a small, fixed number of arguments and
+produces a single result.
+
+
+
+
+
+
Break programs down into functions to make them easier to
+understand.
+
Human beings can only keep a few items in working memory at a
+time.
+
Understand larger/more complicated ideas by understanding and
+combining pieces.
+
Components in a machine.
+
Lemmas when proving theorems.
+
+
Functions serve the same purpose in programs.
+
+Encapsulate complexity so that we can treat it as a single
+“thing”.
+
+
Also enables re-use.
+
Write one time, use many times.
+
+
Define a function using def with a name, parameters,
+and a block of code.
+
Begin the definition of a new function with def.
+
Followed by the name of the function.
+
Must obey the same rules as variable names.
+
+
Then parameters in parentheses.
+
Empty parentheses if the function doesn’t take any inputs.
+
We will discuss this in detail in a moment.
+
+
Then a colon.
+
Then an indented block of code.
+
+
PYTHON
+
+
def print_greeting():
+    print('Hello!')
+    print('The weather is nice today.')
+    print('Right?')
+
+
Defining a function does not run it.
+
Like assigning a value to a variable.
+
+
Must call the function to execute the code it contains.
+
+
PYTHON
+
+
print_greeting()
+
+
+
OUTPUT
+
+
Hello!
+The weather is nice today.
+Right?
+
+
Arguments in a function call are matched to its defined
+parameters.
+
Functions are most useful when they can operate on different
+data.
+
Specify parameters when defining a function.
+
These become variables when the function is executed.
+
Are assigned the arguments in the call (i.e., the values passed to
+the function).
+
If you don’t name the arguments when using them in the call, the
+arguments will be matched to parameters in the order the parameters are
+defined in the function.
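As a sketch of positional matching, the block below reuses the `print_date` definition that appears later in this lesson (in the "Calling by Name" exercise) and calls it without naming the arguments.

```python
def print_date(year, month, day):
    joined = str(year) + '/' + str(month) + '/' + str(day)
    print(joined)


# Positional arguments are matched in order:
# 1871 -> year, 3 -> month, 19 -> day
print_date(1871, 3, 19)
```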
Or, we can name the arguments when we call the function, which allows
+us to specify them in any order and adds clarity to the call site;
+otherwise as one is reading the code they might forget if the second
+argument is the month or the day for example.
+
+
PYTHON
+
+
print_date(month=3, day=19, year=1871)
+
+
+
OUTPUT
+
+
1871/3/19
+
+
Via Twitter:
+() contains the ingredients for the function while the body
+contains the recipe.
+
Functions may return a result to their caller using
+return.
+
Use return ... to give a value back to the caller.
+
May occur anywhere in the function.
+
But functions are easier to understand if return
+occurs:
+
A function that doesn’t explicitly return a value
+automatically returns None.
+
+
PYTHON
+
+
result = print_date(1871, 3, 19)
+print('result of call is:', result)
+
+
+
OUTPUT
+
+
1871/3/19
+result of call is: None
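For contrast, here is a sketch of a function that does hand a value back with `return`. It uses the `average` function shown later in this lesson's docstring example; the variable name `a` is illustrative.

```python
def average(values):
    # An early return handles the special case of an empty list.
    if len(values) == 0:
        return None
    return sum(values) / len(values)


a = average([1, 3, 4])
print('average of actual values:', a)
print('average of empty list:', average([]))
```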
+
+
+
+
+
+
+
Identifying Syntax Errors
+
+
Read the code below and try to identify what the errors are
+without running it.
+
Run the code and read the error message. Is it a
+SyntaxError or an IndentationError?
+
Fix the error.
+
Repeat steps 2 and 3 until you have fixed all the errors.
+
+
PYTHON
+
+
def another_function
+  print("Syntax errors are annoying.")
+   print("But at least python tells us about them!")
+  print("So they are usually not too hard to fix.")
+
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
def another_function():
+    print("Syntax errors are annoying.")
+    print("But at least Python tells us about them!")
+    print("So they are usually not too hard to fix.")
A function call always needs parentheses; otherwise you get the
+function object itself (displayed with its memory address) rather than
+the result of calling it. So, if we wanted to call the function
+named report, and give it the value 22.5 to report on, we could have our
+function call as follows
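A minimal sketch of that call is shown here. The body of `report` is an assumption made for illustration, since its definition is not shown in this part of the lesson; the name `not_a_call` is likewise hypothetical.

```python
def report(pressure):
    # Hypothetical body: the lesson text here does not show how report is defined.
    print('pressure is', pressure)


# The parentheses make this a call, so the function body runs.
report(22.5)

# Without parentheses nothing runs; we only get another name for the function object.
not_a_call = report
print(callable(not_a_call))
```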
After fixing the problem above, explain why running this example
+code:
+
+
PYTHON
+
+
result = print_time(11, 37, 59)
+print('result of call is:', result)
+
+
gives this output:
+
+
OUTPUT
+
+
11:37:59
+result of call is: None
+
+
Why is the result of the call None?
+
+
+
+
+
+
+
+
+
The problem with the example is that the function
+print_time() is defined after the call to the
+function is made. Python doesn’t know how to resolve the name
+print_time since it hasn’t been defined yet and will raise
+a NameError e.g.,
+NameError: name 'print_time' is not defined
+
The first line of output 11:37:59 is printed by the
+first line of code, result = print_time(11, 37, 59) that
+binds the value returned by invoking print_time to the
+variable result. The second line is from the second print
+call to print the contents of the result variable.
+
print_time() does not explicitly return
+a value, so it automatically returns None.
+
+
+
+
+
+
+
+
+
+
Encapsulation
+
+
Fill in the blanks to create a function that takes a single filename
+as an argument, loads the data in the file named by the argument, and
+returns the minimum value in that data.
+
+
PYTHON
+
+
import pandas as pd
+
+def min_in_data(____):
+    data = ____
+    return ____
+
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
import pandas as pd
+
+def min_in_data(filename):
+    data = pd.read_csv(filename)
+    return data.min()
+
+
+
+
+
+
+
+
+
+
+
Find the First
+
+
Fill in the blanks to create a function that takes a list of numbers
+as an argument and returns the first negative value in the list. What
+does your function do if the list is empty? What if the list has no
+negative numbers?
+
+
PYTHON
+
+
def first_negative(values):
+    for v in ____:
+        if ____:
+            return ____
+
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
def first_negative(values):
+    for v in values:
+        if v < 0:
+            return v
+
+
If an empty list or a list with no negative values is passed to this
+function, it returns None:
+
+
PYTHON
+
+
my_list = []
+print(first_negative(my_list))
+
+
+
OUTPUT
+
+
None
+
+
+
+
+
+
+
+
+
+
+
Calling by Name
+
+
Earlier we saw this function:
+
+
PYTHON
+
+
def print_date(year, month, day):
+    joined = str(year) + '/' + str(month) + '/' + str(day)
+    print(joined)
+
+
We saw that we can call the function using named arguments,
+like this:
+
+
PYTHON
+
+
print_date(day=1, month=2, year=2003)
+
+
What does print_date(day=1, month=2, year=2003)
+print?
+
When have you seen a function call like this before?
+
When and why is it useful to call functions this way?
+
+
+
+
+
+
+
+
+
2003/2/1
+
We saw examples of using named arguments when working with
+the pandas library. For example, when reading in a dataset using
+data = pd.read_csv('data/gapminder_gdp_europe.csv', index_col='country'),
+the last argument index_col is a named argument.
+
Using named arguments can make code more readable since one can see
+from the function call what name the different arguments have inside the
+function. It can also reduce the chances of passing arguments in the
+wrong order, since by using named arguments the order doesn’t
+matter.
+
+
+
+
+
+
+
+
+
+
Encapsulation of an If/Print Block
+
+
The code below will run on a label-printer for chicken eggs. A
+digital scale will report a chicken egg mass (in grams) to the computer
+and then the computer will print a label.
+
+
PYTHON
+
+
import random
+for i in range(10):
+
+    # simulating the mass of a chicken egg
+    # the (random) mass will be 70 +/- 20 grams
+    mass = 70 + 20.0 * (2.0 * random.random() - 1.0)
+
+    print(mass)
+
+    # egg sizing machinery prints a label
+    if mass >= 85:
+        print("jumbo")
+    elif mass >= 70:
+        print("large")
+    elif mass < 70 and mass >= 55:
+        print("medium")
+    else:
+        print("small")
+
+
The if-block that classifies the eggs might be useful in other
+situations, so to avoid repeating it, we could fold it into a function,
+get_egg_label(). Revising the program to use the function
+would give us this:
+
+
PYTHON
+
+
# revised version
+import random
+for i in range(10):
+
+    # simulating the mass of a chicken egg
+    # the (random) mass will be 70 +/- 20 grams
+    mass = 70 + 20.0 * (2.0 * random.random() - 1.0)
+
+    print(mass, get_egg_label(mass))
+
+
Create a function definition for get_egg_label() that
+will work with the revised program above. Note that the
+get_egg_label() function’s return value will be important.
+Sample output from the above program would be
+71.23 large.
+
A dirty egg might have a mass of more than 90 grams, and a spoiled
+or broken egg will probably have a mass that’s less than 50 grams.
+Modify your get_egg_label() function to account for these
+error conditions. Sample output could be
+25 too light, probably spoiled.
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
def get_egg_label(mass):
+    # egg sizing machinery prints a label
+    egg_label = "Unlabelled"
+    if mass >= 90:
+        egg_label = "warning: egg might be dirty"
+    elif mass >= 85:
+        egg_label = "jumbo"
+    elif mass >= 70:
+        egg_label = "large"
+    elif mass < 70 and mass >= 55:
+        egg_label = "medium"
+    elif mass < 50:
+        egg_label = "too light, probably spoiled"
+    else:
+        egg_label = "small"
+    return egg_label
How would you generalize this function if you did not know
+beforehand which specific years occurred as columns in the data? For
+instance, what if we also had data from years ending in 1 and 9 for each
+decade? (Hint: use the columns to filter out the ones that correspond to
+the decade, instead of enumerating them in the code.)
+
+
+
+
+
+
+
+
+
The average GDP for Japan across the years reported for the 1980s is
+computed with:
To obtain the average for the relevant years, we need to loop over
+them:
+
+
PYTHON
+
+
import pandas as pd
+
+def avg_gdp_in_decade(country, continent, year):
+    data_countries = pd.read_csv('data/gapminder_gdp_' + continent + '.csv', index_col=0)
+    c = data_countries.loc[country]
+    gdp_decade = 'gdpPercap_' + str(year // 10)
+    total = 0.0
+    num_years = 0
+    for yr_header in c.index:  # c's index contains reported years
+        if yr_header.startswith(gdp_decade):
+            total = total + c.loc[yr_header]
+            num_years = num_years + 1
+    return total / num_years
+
+
The function can now be called by:
+
+
PYTHON
+
+
avg_gdp_in_decade('Japan','asia',1983)
+
+
+
OUTPUT
+
+
20880.023800000003
+
+
+
+
+
+
+
+
+
+
+
Simulating a dynamical system
+
+
In mathematics, a dynamical
+system is a system in which a function describes the time dependence
+of a point in a geometrical space. A canonical example of a dynamical
+system is the logistic map, a
+growth model that computes a new population density (between 0 and 1)
+based on the current density. In the model, time takes discrete values
+0, 1, 2, …
+
Define a function called logistic_map that takes two
+inputs: x, representing the current population (at time
+t), and a parameter r = 1. This function
+should return a value representing the state of the system (population)
+at time t + 1, using the mapping function:
+
f(t+1) = r * f(t) * [1 - f(t)]
+
Using a for or while loop, iterate the
+logistic_map function defined in part 1, starting from an
+initial population of 0.5, for a period of time
+t_final = 10. Store the intermediate results in a list so
+that after the loop terminates you have accumulated a sequence of values
+representing the state of the logistic map at times
+t = [0,1,...,t_final] (11 values in total). Print this list
+to see the evolution of the population.
+
Encapsulate the logic of your loop into a function called
+iterate that takes the initial population as its first
+input, the parameter t_final as its second input and the
+parameter r as its third input. The function should return
+the list of values representing the state of the logistic map at times
+t = [0,1,...,t_final]. Run this function for periods
+t_final = 100 and 1000 and print some of the
+values. Is the population trending toward a steady state?
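One possible way to approach these three parts is sketched below. It follows the function names and arguments given in the exercise, but the loop structure and the docstrings are just one choice among several.

```python
def logistic_map(x, r):
    """Return the population at time t + 1, given population x at time t."""
    return r * x * (1 - x)


def iterate(initial_population, t_final, r):
    """Return the populations at times 0, 1, ..., t_final as a list."""
    population = [initial_population]
    for t in range(t_final):
        population.append(logistic_map(population[t], r))
    return population


# Part 2: eleven values for t = 0 .. 10, starting from 0.5 with r = 1.
trajectory = iterate(0.5, 10, 1)
print(trajectory)

# Part 3: look at the final value for longer periods.
for period in (100, 1000):
    print(period, iterate(0.5, period, 1)[-1])
```

With `r = 1`, each value is smaller than the one before it, so the population shrinks slowly toward zero.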
Functions will often contain conditionals. Here is a short example
+that will indicate which quartile the argument is in based on hand-coded
+values for the quartile cut points.
+
+
PYTHON
+
+
def calculate_life_quartile(exp):
+    if exp < 58.41:
+        # This observation is in the first quartile
+        return 1
+    elif exp >= 58.41 and exp < 67.05:
+        # This observation is in the second quartile
+        return 2
+    elif exp >= 67.05 and exp < 71.70:
+        # This observation is in the third quartile
+        return 3
+    elif exp >= 71.70:
+        # This observation is in the fourth quartile
+        return 4
+    else:
+        # This observation has bad data
+        return None
+
+calculate_life_quartile(62.5)
+
+
+
OUTPUT
+
+
2
+
+
That function would typically be used within a for loop,
+but Pandas has a different, more efficient way of doing the same thing,
+and that is by applying a function to a dataframe or a portion
+of a dataframe. Here is an example, using the definition above.
+
+
PYTHON
+
+
data = pd.read_csv('data/gapminder_all.csv')
+data['life_qrtl'] = data['lifeExp_1952'].apply(calculate_life_quartile)
+
+
There is a lot in that second line, so let’s take it piece by piece.
+On the right side of the = we start with
+data['lifeExp_1952'], which is the column in the dataframe
+called data labeled lifeExp_1952. We use the
+apply() method to do what it says: apply the
+calculate_life_quartile function to the value of this column for
+every row in the dataframe.
+
+
+
+
+
+
+
+
+
Key Points
+
+
Break programs down into functions to make them easier to
+understand.
+
Define a function using def with a name, parameters,
+and a block of code.
+
Defining a function does not run it.
+
Arguments in a function call are matched to its defined
+parameters.
+
Functions may return a result to their caller using
+return.
Read a traceback and determine the file, function, and line number
+on which the error occurred, the type of error, and the error
+message.
+
+
+
+
+
+
The scope of a variable is the part of a program that can ‘see’ that
+variable.
+
There are only so many sensible names for variables.
+
People using functions shouldn’t have to worry about what variable
+names the author of the function used.
+
People writing functions shouldn’t have to worry about what variable
+names the function’s caller uses.
+
The part of a program in which a variable is visible is called its
+scope.
+
+
PYTHON
+
+
pressure = 103.9
+
+def adjust(t):
+    temperature = t * 1.43 / pressure
+    return temperature
+
+
+pressure is a global variable.
+
Defined outside any particular function.
+
Visible everywhere.
+
+
+t and temperature are local
+variables in adjust.
+
Defined in the function.
+
Not visible in the main program.
+
Remember: a function parameter is a variable that is automatically
+assigned a value when the function is called.
+
+
+
PYTHON
+
+
print('adjusted:', adjust(0.9))
+print('temperature after call:', temperature)
+
+
+
OUTPUT
+
+
adjusted: 0.01238691049085659
+
+
+
ERROR
+
+
Traceback (most recent call last):
+ File "/Users/swcarpentry/foo.py", line 8, in <module>
+ print('temperature after call:', temperature)
+NameError: name 'temperature' is not defined
+
+
+
+
+
+
+
Local and Global Variable Use
+
+
Trace the values of all variables in this program as it is executed.
+(Use ‘—’ as the value of variables before and after they exist.)
Read the traceback below, and identify the following:
+
How many levels does the traceback have?
+
What is the file name where the error occurred?
+
What is the function name where the error occurred?
+
On which line number in this function did the error occur?
+
What is the type of error?
+
What is the error message?
+
+
ERROR
+
+
---------------------------------------------------------------------------
+KeyError Traceback (most recent call last)
+<ipython-input-2-e4c4cbafeeb5> in <module>()
+ 1 import errors_02
+----> 2 errors_02.print_friday_message()
+
+/Users/ghopper/thesis/code/errors_02.py in print_friday_message()
+ 13
+ 14 def print_friday_message():
+---> 15 print_message("Friday")
+
+/Users/ghopper/thesis/code/errors_02.py in print_message(day)
+ 9 "sunday": "Aw, the weekend is almost over."
+ 10 }
+---> 11 print(messages[day])
+ 12
+ 13
+
+KeyError: 'Friday'
+
+
+
+
+
+
+
+
+
+
Three levels.
+
errors_02.py
+
print_message
+
Line 11
+
+KeyError. These errors occur when we are trying to look
+up a key that does not exist (usually in a data structure such as a
+dictionary). We can find more information about the
+KeyError and other built-in exceptions in the Python
+docs.
+
KeyError: 'Friday'
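The same kind of failure can be reproduced in a few lines. In this sketch only the "sunday" entry is taken from the traceback; the "friday" entry and the lower-casing fix are assumptions made for illustration, since the rest of `errors_02.py` is not shown.

```python
# Hypothetical dictionary: only the "sunday" entry is visible in the traceback.
messages = {
    "friday": "So glad it's Friday!",  # invented text for illustration
    "sunday": "Aw, the weekend is almost over.",
}

day = "Friday"

# Keys are case sensitive, so "Friday" is not found among the lowercase keys.
try:
    print(messages[day])
except KeyError as error:
    missing_key = error.args[0]
    print('No message for key:', missing_key)

# One possible fix: normalize the key before looking it up.
fixed_message = messages[day.lower()]
print(fixed_message)
```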
+
+
+
+
+
+
+
+
+
+
Key Points
+
+
The scope of a variable is the part of a program that can ‘see’ that
+variable.
Provide sound justifications for basic rules of coding style.
+
Refactor one-page programs to make them more readable and justify
+the changes.
+
Use Python community coding standards (PEP-8).
+
+
+
+
+
+
Coding style
+
A consistent coding style helps others (including our future selves)
+read and understand code more easily. Code is read much more often than
+it is written, and as the Zen of Python
+states, “Readability counts”. Python proposed a standard style through
+one of its first Python Enhancement Proposals (PEP), PEP8.
+
Some points worth highlighting:
+
document your code and ensure that assumptions, internal algorithms,
+expected inputs, expected outputs, etc., are clear
+
use clear, semantically meaningful variable names
+
use spaces, not tabs, to indent lines (tabs can cause
+problems across different text editors, operating systems, and version
+control systems)
+
Follow standard Python style in your code.
+
+PEP8: a style
+guide for Python that discusses topics such as how to name variables,
+how to indent your code, how to structure your import
+statements, etc. Adhering to PEP8 makes it easier for other Python
+developers to read and understand your code, and to understand what
+their contributions should look like.
+
To check your code for compliance with PEP8, you can use the pycodestyle application.
+Tools like the black code
+formatter can automatically format your code to conform to PEP8 and
+pycodestyle (a Jupyter notebook formatter, nb_black, also exists).
+
Some groups and organizations follow different style guidelines
+besides PEP8. For example, the Google style
+guide on Python makes slightly different recommendations. Google
+wrote an application that can help you format your code in either their
+style or PEP8 called yapf.
+
With respect to coding style, the key is consistency.
+Choose a style for your project, be it PEP8, the Google style, or
+something else, and do your best to ensure that you and anyone else you
+are collaborating with stick to it. Consistency within a project is
+often more impactful than the particular style used. A consistent style
+will make your software easier to read and understand for others and for
+your future self.
+
Use assertions to check for internal errors.
+
Assertions are a simple but powerful method for making sure that the
+context in which your code is executing is as you expect.
+
+
PYTHON
+
+
def calc_bulk_density(mass, volume):
+    '''Return dry bulk density = powder mass / powder volume.'''
+    assert volume > 0
+    return mass / volume
+
+
If the assertion is False, the Python interpreter raises
+an AssertionError runtime exception. The source code for
+the expression that failed will be displayed as part of the error
+message. To ignore assertions in your code run the interpreter with the
+‘-O’ (optimize) switch. Assertions should contain only simple checks and
+never change the state of the program. For example, an assertion should
+never contain an assignment.
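To see the assertion in action, the sketch below calls `calc_bulk_density` once with valid input and once with a volume of zero. The sample masses and volumes, and the name `assertion_failed`, are made up for illustration. It assumes the interpreter is run normally, without the -O switch.

```python
def calc_bulk_density(mass, volume):
    '''Return dry bulk density = powder mass / powder volume.'''
    assert volume > 0
    return mass / volume


# A valid call: the assertion passes silently.
density = calc_bulk_density(60.0, 20.0)
print(density)

# A volume of zero violates the assertion before the division is attempted.
try:
    calc_bulk_density(60.0, 0.0)
    assertion_failed = False
except AssertionError:
    assertion_failed = True

print('assertion failed:', assertion_failed)
```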
+
Use docstrings to provide builtin help.
+
If the first thing in a function is a character string that is not
+assigned directly to a variable, Python attaches it to the function,
+accessible via the builtin help function. This string that provides
+documentation is also known as a docstring.
+
+
PYTHON
+
+
def average(values):
+    "Return average of values, or None if no values are supplied."
+
+    if len(values) == 0:
+        return None
+    return sum(values) / len(values)
+
+help(average)
+
+
+
OUTPUT
+
+
Help on function average in module __main__:
+
+average(values)
+ Return average of values, or None if no values are supplied.
+
+
+
+
+
+
+
Multiline Strings
+
+
Multiline strings are often used for documentation. These start
+with three quote characters (either single or double) and end
+with three matching characters.
+
+
PYTHON
+
+
"""This string spans
+multiple lines.
+
+Blank lines are allowed."""
+
+
+
+
+
+
+
+
+
+
What Will Be Shown?
+
+
Highlight the lines in the code below that will be available as
+online help. Are there lines that should be made available, but won’t
+be? Will any lines produce a syntax error or a runtime error?
+
+
PYTHON
+
+
"Find maximum edit distance between multiple sequences."
+# This finds the maximum distance between all sequences.
+
+def overall_max(sequences):
+    '''Determine overall maximum edit distance.'''
+
+    highest = 0
+    for left in sequences:
+        for right in sequences:
+            '''Avoid checking sequence against itself.'''
+            if left != right:
+                this = edit_distance(left, right)
+                highest = max(highest, this)
+
+    # Report.
+    return highest
+
+
+
+
+
+
+
+
+
+
Document This
+
+
Use comments to describe and help others understand potentially
+unintuitive sections or individual lines of code. They are especially
+useful to whoever may need to understand and edit your code in the
+future, including yourself.
+
Use docstrings to document the acceptable inputs and expected outputs
+of a method or class, its purpose, assumptions and intended behavior.
+Docstrings are displayed when a user invokes the builtin
+help method on your method or class.
+
Turn the comment in the following function into a docstring and check
+that help displays it properly.
+
+
PYTHON
+
+
def middle(a, b, c):
+    # Return the middle value of three.
+    # Assumes the values can actually be compared.
+    values = [a, b, c]
+    values.sort()
+    return values[1]
+
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
def middle(a, b, c):
+    '''Return the middle value of three.
+    Assumes the values can actually be compared.'''
+    values = [a, b, c]
+    values.sort()
+    return values[1]
+
+
+
+
+
+
+
+
+
+
+
Clean Up This Code
+
+
Read this short program and try to predict what it does.
+
Run it: how accurate was your prediction?
+
Refactor the program to make it more readable. Remember to run it
+after each change to ensure its behavior hasn’t changed.
+
Compare your rewrite with your neighbor’s. What did you do the same?
+What did you do differently, and why?
+
+
PYTHON
+
+
n = 10
+s = 'et cetera'
+print(s)
+i = 0
+while i < n:
+    # print('at', j)
+    new = ''
+    for j in range(len(s)):
+        left = j-1
+        right = (j+1)%len(s)
+        if s[left]==s[right]: new = new + '-'
+        else: new = new + '*'
+    s=''.join(new)
+    print(s)
+    i += 1
+
+
+
+
+
+
+
+
+
+
Here’s one solution.
+
+
PYTHON
+
+
def string_machine(input_string, iterations):
+    """
+    Takes input_string and generates a new string with -'s and *'s
+    corresponding to characters that have identical adjacent characters
+    or not, respectively. Iterates through this procedure with the resultant
+    strings for the supplied number of iterations.
+    """
+    print(input_string)
+    input_string_length = len(input_string)
+    old = input_string
+    for i in range(iterations):
+        new = ''
+        # iterate through characters in previous string
+        for j in range(input_string_length):
+            left = j-1
+            right = (j+1) % input_string_length  # ensure right index wraps around
+            if old[left] == old[right]:
+                new = new + '-'
+            else:
+                new = new + '*'
+        print(new)
+        # store new string as old
+        old = new
+
+string_machine('et cetera', 10)
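One way to confirm that a refactoring has not changed behavior is to have both versions return their lines instead of printing them, and then compare the results. The sketch below does this; the helper names `original_lines` and `refactored_lines` are hypothetical and are not part of the exercise.

```python
def original_lines(s, n):
    """The unrefactored loop, collecting its lines instead of printing them."""
    lines = [s]
    i = 0
    while i < n:
        new = ''
        for j in range(len(s)):
            left = j - 1
            right = (j + 1) % len(s)
            if s[left] == s[right]:
                new = new + '-'
            else:
                new = new + '*'
        s = ''.join(new)
        lines.append(s)
        i += 1
    return lines


def refactored_lines(input_string, iterations):
    """The refactored logic, also returning its lines."""
    lines = [input_string]
    old = input_string
    for _ in range(iterations):
        new = ''
        for j in range(len(old)):
            left = j - 1
            right = (j + 1) % len(old)  # wrap around at the end
            new = new + ('-' if old[left] == old[right] else '*')
        lines.append(new)
        old = new
    return lines


# Both versions should produce identical output for the same input.
same = original_lines('et cetera', 10) == refactored_lines('et cetera', 10)
print('Behavior unchanged:', same)
```

Returning values rather than printing them makes this kind of comparison possible without reading the output by eye.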
Name and locate scientific Python community sites for software,
+workshops, and help.
+
+
+
+
+
+
Leslie Lamport once said, “Writing is nature’s way of showing you how
+sloppy your thinking is.” The same is true of programming: many things
+that seem obvious when we’re thinking about them turn out to be anything
+but when we have to explain them precisely.
+
Python supports a large and diverse community across academia and
+industry.
+
+
diff --git a/404.html b/404.html
new file mode 100644
index 000000000..e668c9f57
--- /dev/null
+++ b/404.html
@@ -0,0 +1,546 @@
+
+Plotting and Programming in Python: Page not found
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ Plotting and Programming in Python
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Page not found
+
+
Our apologies!
+
We cannot seem to find the page you are looking for. Here are some
+tips that may help:
to Share—copy and redistribute the material in any
+medium or format
+
to Adapt—remix, transform, and build upon the
+material
+
for any purpose, even commercially.
+
The licensor cannot revoke these freedoms as long as you follow the
+license terms.
+
Under the following terms:
+
Attribution—You must give appropriate credit
+(mentioning that your work is derived from work that is Copyright (c)
+The Carpentries and, where practical, linking to https://carpentries.org/), provide a link to the
+license, and indicate if changes were made. You may do so in any
+reasonable manner, but not in any way that suggests the licensor
+endorses you or your use.
+
No additional restrictions—You may not apply
+legal terms or technological measures that legally restrict others from
+doing anything the license permits. With the understanding
+that:
+
Notices:
+
You do not have to comply with the license for elements of the
+material in the public domain or where your use is permitted by an
+applicable exception or limitation.
+
No warranties are given. The license may not give you all of the
+permissions necessary for your intended use. For example, other rights
+such as publicity, privacy, or moral rights may limit how you use the
+material.
+
Software
+
Except where otherwise noted, the example programs and other software
+provided by The Carpentries are made available under the OSI-approved MIT
+license.
+
Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+“Software”), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+
The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
+CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+
Trademark
+
“The Carpentries”, “Software Carpentry”, “Data Carpentry”, and
+“Library Carpentry” and their respective logos are registered trademarks
+of Community Initiatives.
Understand the difference between a Python script and a Jupyter
+notebook.
+
Create Markdown cells in a notebook.
+
Create and run Python cells in a notebook.
+
+
+
+
+
+
+
To run Python, we are going to use Jupyter Notebooks via JupyterLab for
+the remainder of this workshop. Jupyter notebooks are common in data
+science and visualization and serve as a convenient common-denominator
+experience for running Python code interactively where we can easily
+view and share the results of our Python code.
+
There are other ways of editing, managing, and running code. Software
+developers often use an integrated development environment (IDE) like PyCharm or Visual Studio Code, or text
+editors like Vim or Emacs, to create and edit their Python programs.
+After editing and saving your Python programs you can execute those
+programs within the IDE itself or directly on the command line. In
+contrast, Jupyter notebooks let us execute and view the results of our
+Python code immediately within the notebook.
+
JupyterLab has several other handy features:
+
+
You can easily type, edit, and copy and paste blocks of code.
+
Tab complete allows you to easily access the names of things you are
+using and learn more about them.
+
It allows you to annotate your code with links, different sized
+text, bullets, etc. to make it more accessible to you and your
+collaborators.
+
It allows you to display figures next to the code that produces them
+to tell a complete story of the analysis.
+
+
Each notebook contains one or more cells that contain code, text, or
+images.
+
Getting Started with JupyterLab
+
+
+
JupyterLab is an application server with a web user interface from Project Jupyter that enables one to work
+with documents and activities such as Jupyter notebooks, text editors,
+terminals, and even custom components in a flexible, integrated, and
+extensible manner. JupyterLab requires a reasonably up-to-date browser
+(ideally a current version of Chrome, Safari, or Firefox); Internet
+Explorer versions 9 and below are not supported.
+
JupyterLab is included as part of the Anaconda Python distribution.
+If you have not already installed the Anaconda Python distribution, see
+the setup instructions.
+
In this lesson we will run JupyterLab locally on our own machines, so
+it will not require an internet connection apart from the initial
+connection to download and install Anaconda and JupyterLab.
+
+
Start the JupyterLab server on your machine
+
Use a web browser to open a special localhost URL that connects to
+your JupyterLab server
+
The JupyterLab server does the work and the web browser renders the
+result
+
Type code into the browser and see the results after your JupyterLab
+server has finished executing your code
Experienced users of Jupyter notebooks interested in a more detailed
+discussion of the similarities and differences between the JupyterLab
+and Jupyter notebook user interfaces can find more information in the JupyterLab
+user interface documentation.
+
+
+
+
Starting JupyterLab
+
+
+
You can start the JupyterLab server through the command line or
+through an application called Anaconda Navigator. Anaconda
+Navigator is included as part of the Anaconda Python distribution.
+
+
macOS - Command Line
+
+
To start the JupyterLab server you will need to access the command
+line through the Terminal. There are two ways to open Terminal on
+Mac.
+
+
In your Applications folder, open Utilities and double-click on
+Terminal
+
Press Command + spacebar to launch Spotlight.
+Type Terminal and then double-click the search result or
+hit Enter
+
+
+
After you have launched Terminal, type the command to launch the
+JupyterLab server.
+
+
BASH
+
+
$ jupyter lab
+
+
+
+
Windows Users - Command Line
+
+
To start the JupyterLab server you will need to access the Anaconda
+Prompt.
+
Press the Windows Logo Key and search for
+Anaconda Prompt, then click the result or press Enter.
+
After you have launched the Anaconda Prompt, type the command:
+
+
BASH
+
+
$ jupyter lab
+
+
+
+
Anaconda Navigator
+
+
To start a JupyterLab server from Anaconda Navigator you must first
+start Anaconda Navigator (detailed instructions are available for
+macOS, Windows, and Linux). You can find Anaconda Navigator by
+searching via Spotlight on macOS (Command + spacebar), by
+using the Windows search function (Windows Logo Key), or by
+opening a terminal shell and executing the anaconda-navigator
+executable from the command line.
+
After you have launched Anaconda Navigator, click the
+Launch button under JupyterLab. You may need to scroll down
+to find it.
+
Here is a screenshot of an Anaconda Navigator page similar to the one
+that should open on either macOS or Windows.
+
+
+
And here is a screenshot of a JupyterLab landing page that should be
+similar to the one that opens in your default web browser after starting
+the JupyterLab server on either macOS or Windows.
+
+
+
+
The JupyterLab Interface
+
+
+
JupyterLab has many features found in traditional integrated
+development environments (IDEs) but is focused on providing flexible
+building blocks for interactive, exploratory computing.
+
The JupyterLab
+Interface consists of the Menu Bar, a collapsible Left Sidebar, and
+the Main Work Area which contains tabs of documents and activities.
+
+
Menu Bar
+
+
The Menu Bar at the top of JupyterLab has the top-level menus that
+expose various actions available in JupyterLab along with their keyboard
+shortcuts (where applicable). The following menus are included by
+default.
+
+
+File: Actions related to files and directories such
+as New, Open, Close, Save, etc. The
+File menu also includes the Shut Down action used to
+shut down the JupyterLab server.
+
+Edit: Actions related to editing documents and
+other activities such as Undo, Cut, Copy,
+Paste, etc.
+
+View: Actions that alter the appearance of
+JupyterLab.
+
+Run: Actions for running code in different
+activities such as notebooks and code consoles (discussed below).
+
+Kernel: Actions for managing kernels. Kernels in
+Jupyter will be explained in more detail below.
+
+Tabs: A list of the open documents and activities
+in the main work area.
+
+Settings: Common JupyterLab settings can be
+configured using this menu. There is also an Advanced Settings
+Editor option in the dropdown menu that provides more fine-grained
+control of JupyterLab settings and configuration options.
+
+Help: A list of JupyterLab and kernel help
+links.
+
+
+
+
+
+
+
Kernels
+
+
The JupyterLab docs
+define kernels as “separate processes started by the server that runs
+your code in different programming languages and environments.” Opening
+a Jupyter notebook starts a kernel, a process that runs the notebook’s
+code. In this lesson, we’ll be using the IPython kernel, which lets us
+run Python 3 code interactively.
+
Using other Jupyter kernels
+would let us write and execute code in other programming languages,
+such as R, Java, Julia, Ruby, JavaScript, and Fortran, in the same
+JupyterLab interface.
+
+
+
+
A screenshot of the default Menu Bar is provided below.
+
+
+
+
+
Left Sidebar
+
+
The left sidebar contains a number of commonly used tabs, such as a
+file browser (showing the contents of the directory where the JupyterLab
+server was launched), a list of running kernels and terminals, the
+command palette, and a list of open tabs in the main work area. A
+screenshot of the default Left Side Bar is provided below.
+
+
+
The left sidebar can be collapsed or expanded by selecting “Show Left
+Sidebar” in the View menu or by clicking on the active sidebar tab.
+
+
+
Main Work Area
+
+
The main work area in JupyterLab enables you to arrange documents
+(notebooks, text files, etc.) and other activities (terminals, code
+consoles, etc.) into panels of tabs that can be resized or subdivided. A
+screenshot of the default Main Work Area is provided below.
+
If you do not see the Launcher tab, click the blue plus sign under
+the “File” and “Edit” menus and it will appear.
+
+
+
Drag a tab to the center of a tab panel to move the tab to the panel.
+Subdivide a tab panel by dragging a tab to the left, right, top, or
+bottom of the panel. The work area has a single current activity. The
+tab for the current activity is marked with a colored top border (blue
+by default).
+
+
Creating a Python script
+
+
+
+
To start writing a new Python program click the Text File icon under
+the Other header in the Launcher tab of the Main Work Area.
+
+
You can also create a new plain text file by selecting New
+-> Text File from the File menu in the Menu Bar.
+
+
+
To convert this plain text file to a Python program, select the
+Save File As action from the File menu in the Menu Bar
+and give your new text file a name that ends with the .py
+extension.
+
+
The .py extension lets everyone (including the
+operating system) know that this text file is a Python program.
+
This is a convention, not a requirement.
+
+
+
Creating a Jupyter Notebook
+
+
+
To open a new notebook click the Python 3 icon under the
+Notebook header in the Launcher tab in the main work area. You
+can also create a new notebook by selecting New -> Notebook
+from the File menu in the Menu Bar.
+
Additional notes on Jupyter notebooks.
+
+
Notebook files have the extension .ipynb to distinguish
+them from plain-text Python programs.
+
Notebooks can be exported as Python scripts that can be run from the
+command line.
+
+
Below is a screenshot of a Jupyter notebook running inside
+JupyterLab. If you are interested in more details, then see the official
+notebook documentation.
+
+
+
+
+
+
+
+
How It’s Stored
+
+
+
The notebook file is stored in a format called JSON.
+
Just like a webpage, what’s saved looks different from what you see
+in your browser.
+
But this format allows Jupyter to mix source code, text, and images,
+all in one file.
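To make the JSON storage concrete, here is a minimal sketch that builds a heavily simplified notebook-like structure and round-trips it through Python's standard json module. The field names follow the general layout of version 4 notebook files as we understand it; real notebooks carry more metadata, so treat this as an illustration rather than a specification.

```python
import json

# A heavily simplified notebook: one Markdown cell and one code cell.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# My analysis"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": ["print(6 * 7)"],
        },
    ],
}

# This text is roughly what would be saved in an .ipynb file.
text = json.dumps(notebook, indent=1)
print(text)

# Reading the text back recovers the same structure.
loaded = json.loads(text)
cell_types = [cell["cell_type"] for cell in loaded["cells"]]
print(cell_types)
```

Opening a real .ipynb file in a plain text editor shows the same kind of nested structure of braces, brackets, and quoted strings.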
+
+
+
+
+
+
+
+
+
+
Arranging Documents into Panels of Tabs
+
+
In the JupyterLab Main Work Area you can arrange documents into
+panels of tabs. Here is an example from the official
+documentation.
+
+
+
First, create a text file, Python console, and terminal window and
+arrange them into three panels in the main work area. Next, create a
+notebook, terminal window, and text file and arrange them into three
+panels in the main work area. Finally, create your own combination of
+panels and tabs. What combination of panels and tabs do you think will
+be most useful for your workflow?
+
+
+
+
+
+
+
+
+
After creating the necessary tabs, you can drag one of the tabs to
+the center of a panel to move the tab to the panel; next you can
+subdivide a tab panel by dragging a tab to the left, right, top, or
+bottom of the panel.
+
+
+
+
+
+
+
+
+
+
Code vs. Text
+
+
Jupyter mixes code and text in different types of blocks, called
+cells. We often use the term “code” to mean “the source code of software
+written in a language such as Python”. A “code cell” in a Notebook is a
+cell that contains software; a “text cell” is one that contains ordinary
+prose written for human beings.
+
+
+
+
The Notebook has Command and Edit modes.
+
+
+
+
If you press Esc and Return alternately, the
+outer border of your code cell will change from gray to blue.
+
These are the Command (gray) and
+Edit (blue) modes of your notebook.
+
Command mode allows you to edit notebook-level features, and Edit
+mode changes the content of cells.
+
When in Command mode (esc/gray),
+
+
The b key will make a new cell below the currently
+selected cell.
+
The a key will make one above.
+
The x key will delete the current cell.
+
The z key will undo your last cell operation (which could
+be a deletion, creation, etc).
+
+
+
All actions can be done using the menus, but there are lots of
+keyboard shortcuts to speed things up.
+
+
+
+
+
+
+
Command Vs. Edit
+
+
In the Jupyter notebook page are you currently in Command or Edit
+mode?
+Switch between the modes. Use the shortcuts to generate a new cell. Use
+the shortcuts to delete a cell. Use the shortcuts to undo the last cell
+operation you performed.
+
+
+
+
+
+
+
+
+
Command mode has a gray border and Edit mode has a blue border. Use
+Esc and Return to switch between modes. Each of the
+remaining steps requires Command mode (press Esc if your cell
+is blue). To generate a new cell, type b or a. To
+delete a cell, type x. To undo the last cell operation, type
+z.
+
+
+
+
+
+
Use the keyboard and mouse to select and edit cells.
+
+
+
Pressing the Return key turns the border blue and engages
+Edit mode, which allows you to type within the cell.
+
Because we want to be able to write many lines of code in a single
+cell, pressing the Return key when in Edit mode (blue) moves
+the cursor to the next line in the cell just like in a text editor.
+
We need some other way to tell the Notebook we want to run what’s in
+the cell.
+
Pressing Shift+Return together will execute
+the contents of the cell.
+
Notice that the Return and Shift keys on the
+right of the keyboard are right next to each other.
+
+
+
+
The Notebook will turn Markdown into pretty-printed
+documentation.
+
Create a nested list in a Markdown cell in a notebook that looks like
+this:
+
+
1. Get funding.
+
2. Do work.
+
+  * Design experiment.
+
+  * Collect data.
+
+  * Analyze.
+
3. Write up.
+
4. Publish.
+
+
+
+
+
+
+
+
+
+
This challenge combines a numbered list and a bulleted list.
+Note that the bulleted list is indented 2 spaces so that it is in line with
+the items of the numbered list.
What is displayed when a Python cell in a notebook that contains
+several calculations is executed? For example, what happens when this
+cell is executed?
+
+
PYTHON
+
+
7 * 3
+2 + 1
+
+
+
+
+
+
+
+
+
+
Python returns the output of the last calculation.
+
+
PYTHON
+
+
3
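If you want to see every result rather than only the last one, one option is to print each value explicitly. A small sketch, where the variable names `first` and `second` are illustrative:

```python
first = 7 * 3
second = 2 + 1

# print shows each value, regardless of its position in the cell.
print(first)
print(second)
```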
+
+
+
+
+
+
+
+
+
+
+
Change an Existing Cell from Code to Markdown
+
+
What happens if you write some Python in a code cell and then you
+switch it to a Markdown cell? For example, put the following in a code
+cell:
+
+
PYTHON
+
+
x = 6 * 7 + 12
+print(x)
+
+
And then run it with Shift+Return to be sure
+that it works as a code cell. Now go back to the cell and use
+Esc then m to switch the cell to Markdown and
+“run” it with Shift+Return. What happened and how
+might this be useful?
+
+
+
+
+
+
+
+
+
The Python code gets treated like Markdown text. The lines appear as
+if they are part of one contiguous paragraph. This could be useful to
+temporarily turn on and off cells in notebooks that get used for
+multiple purposes.
+
+
PYTHON
+
+
x = 6 * 7 + 12 print(x)
+
+
+
+
+
+
+
+
+
+
+
Equations
+
+
Standard Markdown (such as we’re using for these notes) won’t render
+equations, but the Notebook will. Create a new Markdown cell and enter
+the following:
+
$\sum_{i=1}^{N} 2^{-i} \approx 1$
+
(It’s probably easier to copy and paste.) What does it display? What
+do you think the underscore, _, circumflex, ^,
+and dollar sign, $, do?
+
+
+
+
+
+
+
+
+
The notebook shows the equation as it would be rendered from LaTeX
+equation syntax. The dollar sign, $, is used to tell
+Markdown that the text in between is a LaTeX equation. If you’re not
+familiar with LaTeX, underscore, _, is used for subscripts
+and circumflex, ^, is used for superscripts. A pair of
+curly braces, { and }, is used to group text
+together so that the statement i=1 becomes the subscript
+and N becomes the superscript. Similarly, -i
+is in curly braces to make the whole statement the superscript for
+2. \sum and \approx are LaTeX
+commands for “sum over” and “approximate” symbols.
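The claim in the equation can also be checked numerically. The sketch below computes partial sums of the series for a few values of N; the function name `partial_sum` is hypothetical. Each partial sum falls short of 1 by exactly 2 to the power of -N, so the approximation improves quickly as N grows.

```python
def partial_sum(n_terms):
    """Sum 2**-i for i = 1 .. n_terms."""
    return sum(2 ** -i for i in range(1, n_terms + 1))


for n_terms in (1, 2, 5, 10, 20):
    total = partial_sum(n_terms)
    # The gap to 1 halves with every extra term.
    print(n_terms, total, 1 - total)
```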
+
+
+
+
+
+
Closing JupyterLab
+
+
+
+
From the Menu Bar select the “File” menu and then choose “Shut Down”
+at the bottom of the dropdown menu. You will be prompted to confirm that
+you wish to shut down the JupyterLab server (don’t forget to save your
+work!). Click “Shut Down” to shut down the JupyterLab server.
+
To restart the JupyterLab server you will need to re-run the
+following command from a shell.
+
+
$ jupyter lab
+
+
+
+
+
+
Closing JupyterLab
+
+
Practice closing and restarting the JupyterLab server.
+
+
+
+
+
+
+
+
+
Key Points
+
+
+
Python scripts are plain text files.
+
Use the Jupyter Notebook for editing and running Python.
+
The Notebook has Command and Edit modes.
+
Use the keyboard and mouse to select and edit cells.
+
The Notebook will turn Markdown into pretty-printed
+documentation.
Write programs that assign scalar values to variables and perform
+calculations with those values.
+
Correctly trace value changes in programs that use scalar
+assignment.
+
+
+
+
+
+
+
Use variables to store values.
+
+
+
+
Variables are names for values.
+
+
Variable names
+
+
can only contain letters, digits, and underscore
+_ (typically used to separate words in long variable
+names)
+
cannot start with a digit
+
are case sensitive (age, Age and AGE are three
+different variables)
+
+
+
The name should also be meaningful so you or another programmer
+know what it is.
+
Variable names that start with underscores like
+__alistairs_real_age have a special meaning so we won’t do
+that until we understand the convention.
+
In Python the = symbol assigns the value on the
+right to the name on the left.
+
The variable is created when a value is assigned to it.
+
+
Here, Python assigns an age to a variable age and a
+name in quotes to a variable first_name.
+
+
PYTHON
+
+
age = 42
+first_name = 'Ahmed'
+
+
+
Use print to display values.
+
+
+
+
Python has a built-in function called print that prints
+things as text.
+
Call the function (i.e., tell Python to run it) by using its
+name.
+
Provide values to the function (i.e., the things to print) in
+parentheses.
+
To add a string to the printout, wrap the string in single or double
+quotes.
+
The values passed to the function are called
+arguments.
+
+
+
+
PYTHON
+
+
print(first_name, 'is', age, 'years old')
+
+
+
OUTPUT
+
+
Ahmed is 42 years old
+
+
+
+print automatically puts a single space between items
+to separate them.
+
It also moves to a new line at the end.
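To see the spaces and the line ending that print adds, the sketch below sends the output to an in-memory text buffer instead of the screen and then shows the raw characters. The use of `io.StringIO` and the `file` argument is only a way to inspect the result; the name `buffer` is illustrative.

```python
import io

buffer = io.StringIO()

# Same call as above, but written into the buffer.
print('Ahmed', 'is', 42, 'years old', file=buffer)
print('next line', file=buffer)

captured = buffer.getvalue()

# repr makes the spaces and newline characters visible.
print(repr(captured))
```

The `\n` characters in the result are the line endings that print appends after each call.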
+
Variables must be created before they are used.
+
+
+
+
If a variable doesn’t exist yet, or if the name has been
+misspelled, Python reports an error. (Unlike some languages, which
+“guess” a default value.)
+
+
+
PYTHON
+
+
print(last_name)
+
+
+
ERROR
+
+
---------------------------------------------------------------------------
+NameError Traceback (most recent call last)
+<ipython-input-1-c1fbb4e96102> in <module>()
+----> 1 print(last_name)
+
+NameError: name 'last_name' is not defined
+
+
+
The last line of an error message is usually the most
+informative.
Be aware that it is the order of execution of cells that is
+important in a Jupyter notebook, not the order in which they appear.
+Python will remember all the code that was run previously,
+including any variables you have defined, irrespective of the order in
+the notebook. Therefore if you define variables lower down the notebook
+and then (re)run cells further up, those defined further down will still
+be present. As an example, create two cells with the following content,
+in this order:
+
+
PYTHON
+
+
print(myval)
+
+
+
PYTHON
+
+
myval = 1
+
+
If you execute this in order, the first cell will give an error.
+However, if you run the first cell after the second cell it
+will print out 1. To prevent confusion, it can be helpful
+to use the Kernel -> Restart & Run All
+option which clears the interpreter and runs everything from a clean
+slate going top to bottom.
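The same effect can be reproduced in a single fresh session. In this sketch the first use of `myval` fails because nothing has assigned it yet, and the second succeeds once the assignment has run. The try/except wrapper and the `failed_first` flag are additions for illustration, so that the example continues after the error.

```python
failed_first = False

try:
    print(myval)  # myval has not been assigned yet
except NameError as error:
    failed_first = True
    print('Error:', error)

myval = 1

# After the assignment has been executed, the same statement works.
print(myval)
```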
+
+
+
+
Variables can be used in calculations.
+
+
+
+
We can use variables in calculations just as if they were values.
+
+
Remember, we assigned the value 42 to age
+a few lines ago.
+
+
+
+
+
PYTHON
+
+
age = age + 3
+print('Age in three years:', age)
+
+
+
OUTPUT
+
+
Age in three years: 45
+
+
Use an index to get a single character from a string.
+
+
+
+
The characters (individual letters, numbers, and so on) in a string
+are ordered. For example, the string 'AB' is not the same
+as 'BA'. Because of this ordering, we can treat the string
+as a list of characters.
+
Each position in the string (first, second, etc.) is given a number.
+This number is called an index or sometimes a
+subscript.
+
Indices are numbered from 0.
+
Use the position’s index in square brackets to get the character at
+that position.
+
+
+
PYTHON
+
+
atom_name = 'helium'
+print(atom_name[0])
+
+
+
OUTPUT
+
+
h
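Because indices start at 0, the last character of a string sits at an index one less than the string's length. A small sketch using the built-in function len, with illustrative variable names `first` and `last`:

```python
atom_name = 'helium'

first = atom_name[0]
# Six characters occupy indices 0 through 5.
last = atom_name[len(atom_name) - 1]

print(first, last)
```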
+
+
Use a slice to get a substring.
+
+
+
+
A part of a string is called a substring. A
+substring can be as short as a single character.
+
An item in a list is called an element. Whenever we treat a string
+as if it were a list, the string’s elements are its individual
+characters.
+
A slice is a part of a string (or, more generally, a part of any
+list-like thing).
+
We take a slice with the notation [start:stop], where
+start is the integer index of the first element we want and
+stop is the integer index of the element just
+after the last element we want.
+
The difference between stop and start is
+the slice’s length.
+
Taking a slice does not change the contents of the original string.
+Instead, taking a slice returns a copy of part of the original
+string.
+
+
+
PYTHON
+
+
atom_name = 'sodium'
+print(atom_name[0:3])
+
+
+
OUTPUT
+
+
sod
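The two properties described above, that a slice's length is stop minus start and that slicing leaves the original string untouched, can be checked directly. The names `start`, `stop`, and `piece` in this sketch are illustrative.

```python
atom_name = 'sodium'
start = 0
stop = 3

piece = atom_name[start:stop]

print(piece)
# The slice length equals the difference between stop and start.
print(len(piece), stop - start)
# The original string is unchanged.
print(atom_name)
```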
+
+
Use the built-in function len to find the length of a
+string.
+
+
+
+
PYTHON
+
+
print(len('helium'))
+
+
+
OUTPUT
+
+
6
+
+
+
Nested functions are evaluated from the inside out, like in
+mathematics.
+
Python is case-sensitive.
+
+
+
+
Python thinks that upper- and lower-case letters are different, so
+Name and name are different variables.
+
There are conventions for using upper-case letters at the start of
+variable names so we will use lower-case letters for now.
+
Use meaningful variable names.
+
+
+
+
Python doesn’t care what you call variables as long as they obey the
+rules (alphanumeric characters and the underscore).
Use meaningful variable names to help other people understand what
+the program does.
+
The most important “other person” is your future self.
+
+
+
+
+
+
+
Swapping Values
+
+
Fill the table showing the values of the variables in this program
+after each statement is executed.
+
+
PYTHON
+
+
# Command  # Value of x # Value of y  # Value of swap #
+x = 1.0    #            #             #                 #
+y = 3.0    #            #             #                 #
+swap = x   #            #             #                 #
+x = y      #            #             #                 #
+y = swap   #            #             #                 #
+
+
+
+
+
+
+
+
+
+
+
OUTPUT
+
+
# Command  # Value of x # Value of y  # Value of swap #
+x = 1.0    # 1.0        # not defined # not defined   #
+y = 3.0    # 1.0        # 3.0         # not defined   #
+swap = x   # 1.0        # 3.0         # 1.0           #
+x = y      # 3.0        # 3.0         # 1.0           #
+y = swap   # 3.0        # 1.0         # 1.0           #
+
+
The last three lines exchange the values in x and
+y using the swap variable for temporary
+storage. This is a fairly common programming idiom.
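For reference, here is the swap as a runnable sketch. It also shows Python's tuple assignment, which exchanges two values in a single statement without a temporary variable; the names `first` and `second` are illustrative.

```python
x = 1.0
y = 3.0

# Swap using a temporary variable.
swap = x
x = y
y = swap
print(x, y)

# Swap using tuple assignment.
first, second = 1.0, 3.0
first, second = second, first
print(first, second)
```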
+
+
+
+
+
+
+
+
+
+
Predicting Values
+
+
What is the final value of position in the program
+below? (Try to predict the value without running the program, then check
+your prediction.)
The initial variable is assigned the value
+'left'. In the second line, the position
+variable also receives the string value 'left'. In the third
+line, the initial variable is given the value
+'right', but the position variable retains its
+string value of 'left'.
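The three steps described in this explanation can be sketched as follows. This is a reconstruction from the description above, so the exact layout of the exercise's program is an assumption.

```python
initial = 'left'
position = initial   # position now refers to the value 'left'
initial = 'right'    # reassigning initial does not affect position

print(position)
```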
+
+
+
+
+
+
+
+
+
+
Challenge
+
+
If you assign a = 123, what happens if you try to get
+the second digit of a via a[1]?
+
+
+
+
+
+
+
+
+
Numbers are not strings or sequences and Python will raise an error
+if you try to perform an index operation on a number. In the next lesson on types and type
+conversion we will learn more about types and how to convert between
+different types. If you want the Nth digit of a number you can convert
+it into a string using the str built-in function and then
+perform an index operation on that string.
+
+
PYTHON
+
+
a = 123
+print(a[1])
+
+
+
ERROR
+
+
TypeError: 'int' object is not subscriptable
+
+
+
PYTHON
+
+
a = str(123)
+print(a[1])
+
+
+
OUTPUT
+
+
2
+
+
+
+
+
+
+
+
+
+
+
Choosing a Name
+
+
Which is a better variable name, m, min, or
+minutes? Why? Hint: think about which code you would rather
+inherit from someone who is leaving the lab:
+
+
ts = m * 60 + s
+
tot_sec = min * 60 + sec
+
total_seconds = minutes * 60 + seconds
+
+
+
+
+
+
+
+
+
+
minutes is better because min might mean
+something like “minimum” (and actually is an existing built-in function
+in Python that we will cover later).
+species_name[11:] (without a value after the
+colon)
+
+species_name[:4] (without a value before the
+colon)
+
+species_name[:] (just a colon)
+
species_name[11:-3]
+
species_name[-5:-3]
+
What happens when you choose a stop value which is out
+of range? (i.e., try species_name[0:20] or
+species_name[:103])
+
+
+
+
+
+
+
+
+
+
+
+species_name[2:8] returns the substring
+'acia b'
+
+
+species_name[11:] returns the substring
+'folia', from position 11 until the end
+
+species_name[:4] returns the substring
+'Acac', from the start up to but not including position
+4
+
+species_name[:] returns the entire string
+'Acacia buxifolia'
+
+
+species_name[11:-3] returns the substring
+'fo', from position 11 up to but not including the
+third-last position
+
+species_name[-5:-3] also returns the substring
+'fo', from the fifth-last position up to but not including
+the third-last position
+
If a part of the slice is out of range, the operation does not fail.
+species_name[0:20] gives the same result as
+species_name[0:], and species_name[:103] gives
+the same result as species_name[:]
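The results above can be reproduced with a short script. It assumes `species_name` holds the string 'Acacia buxifolia', as shown in the result for `species_name[:]`.

```python
species_name = 'Acacia buxifolia'

print(species_name[2:8])
print(species_name[11:])
print(species_name[:4])
print(species_name[:])
print(species_name[11:-3])
print(species_name[-5:-3])

# Out-of-range stop values are clipped to the end of the string.
print(species_name[0:20] == species_name[0:])
print(species_name[:103] == species_name[:])
```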
+
+
+
+
+
+
+
+
+
+
+
+
Key Points
+
+
+
Use variables to store values.
+
Use print to display values.
+
Variables persist between cells.
+
Variables must be created before they are used.
+
Variables can be used in calculations.
+
Use an index to get a single character from a string.
+
Use a slice to get a substring.
+
Use the built-in function len to find the length of a
+string.
Explain key differences between integers and floating point
+numbers.
+
Explain key differences between numbers and character strings.
+
Use built-in functions to convert between integers, floating point
+numbers, and strings.
+
+
+
+
+
+
+
Every value has a type.
+
+
+
+
Every value in a program has a specific type.
+
Integer (int): represents positive or negative whole
+numbers like 3 or -512.
+
Floating point number (float): represents real numbers
+like 3.14159 or -2.5.
+
Character string (usually called “string”, str): text.
+
+
Written in either single quotes or double quotes (as long as they
+match).
+
The quote marks aren’t printed when the string is displayed.
+
+
+
Use the built-in function type to find the type of a
+value.
+
+
+
+
Use the built-in function type to find out what type a
+value has.
+
Works on variables as well.
+
+
But remember: the value has the type — the
+variable is just a label.
+
+
+
+
+
PYTHON
+
+
print(type(52))
+
+
+
OUTPUT
+
+
<class 'int'>
+
+
+
PYTHON
+
+
fitness = 'average'
+print(type(fitness))
+
+
+
OUTPUT
+
+
<class 'str'>
+
+
Types control what operations (or methods) can be performed on a
+given value.
+
+
+
+
A value’s type determines what the program can do to it.
+
+
+
PYTHON
+
+
print(5 - 3)
+
+
+
OUTPUT
+
+
2
+
+
+
PYTHON
+
+
print('hello' - 'h')
+
+
+
ERROR
+
+
---------------------------------------------------------------------------
+TypeError Traceback (most recent call last)
+<ipython-input-2-67f5626a1e07> in <module>()
+----> 1 print('hello' - 'h')
+
+TypeError: unsupported operand type(s) for -: 'str' and 'str'
+
+
You can use the “+” and “*” operators on strings.
+
+
+
+
“Adding” character strings concatenates them.
+
+
+
PYTHON
+
+
full_name = 'Ahmed' + ' ' + 'Walsh'
+print(full_name)
+
+
+
OUTPUT
+
+
Ahmed Walsh
+
+
+
Multiplying a character string by an integer N creates a
+new string that consists of that character string repeated N
+times.
+
+
Since multiplication is repeated addition.
+
+
+
+
+
PYTHON
+
+
separator = '=' * 10
+print(separator)
+
+
+
OUTPUT
+
+
==========
+
+
Strings have a length (but numbers don’t).
+
+
+
+
The built-in function len counts the number of
+characters in a string.
+
+
+
PYTHON
+
+
print(len(full_name))
+
+
+
OUTPUT
+
+
11
+
+
+
But numbers don’t have a length (not even zero).
+
+
+
PYTHON
+
+
print(len(52))
+
+
+
ERROR
+
+
---------------------------------------------------------------------------
+TypeError Traceback (most recent call last)
+<ipython-input-3-f769e8e8097d> in <module>()
+----> 1 print(len(52))
+
+TypeError: object of type 'int' has no len()
+
+
Must convert numbers to strings or vice versa when operating on
+them.
+
+
+
+
Cannot add numbers and strings.
+
+
+
PYTHON
+
+
print(1 + '2')
+
+
+
ERROR
+
+
---------------------------------------------------------------------------
+TypeError Traceback (most recent call last)
+<ipython-input-4-fe4f54a023c6> in <module>()
+----> 1 print(1 + '2')
+
+TypeError: unsupported operand type(s) for +: 'int' and 'str'
+
+
+
Not allowed because it’s ambiguous: should 1 + '2' be
+3 or '12'?
+
Some types can be converted to other types by using the type name as
+a function.
+
+
+
PYTHON
+
+
print(1 + int('2'))
+print(str(1) + '2')
+
+
+
OUTPUT
+
+
3
+12
+
+
Can mix integers and floats freely in operations.
+
+
+
+
Integers and floating-point numbers can be mixed in arithmetic.
+
+
Python 3 automatically converts integers to floats as needed.
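A minimal sketch of mixed arithmetic, with variable names chosen here purely for illustration: whenever an integer meets a float, the result is a float.

```python
# Mixing an integer and a float in one expression produces a float.
half = 1 / 2.0            # int divided by float
three_squared = 3.0 ** 2  # float raised to an int power
total = 2 + 0.5           # int plus float

print('half is', half)
print('three squared is', three_squared)
print('total is', total, type(total))
```

All three results are floats, even where the mathematical value is a whole number.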
The computer reads the value of variable_one when doing
+the multiplication, creates a new value, and assigns it to
+variable_two.
+
Afterwards, the value of variable_two is set to the new
+value and not dependent on variable_one so its
+value does not automatically change when variable_one
+changes.
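The behavior described above can be sketched as follows. The names variable_one and variable_two come from the text; the specific numbers are assumptions picked for illustration.

```python
variable_one = 1
variable_two = 5 * variable_one  # computed once, from the current value 1
variable_one = 2                 # reassigning variable_one afterwards ...

# ... leaves variable_two untouched: it still holds 5, not 10.
print('first is', variable_one, 'and second is', variable_two)
```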
+
+
+
+
+
+
+
Fractions
+
+
What type of value is 3.4? How can you find out?
+
+
+
+
+
+
+
+
+
It is a floating-point number (often abbreviated “float”). It is
+possible to find out by using the built-in function
+type().
+
+
PYTHON
+
+
print(type(3.4))
+
+
+
OUTPUT
+
+
<class 'float'>
+
+
+
+
+
+
+
+
+
+
+
Automatic Type Conversion
+
+
What type of value is 3.25 + 4?
+
+
+
+
+
+
+
+
+
It is a float: integers are automatically converted to floats as
+necessary.
+
+
PYTHON
+
+
result = 3.25 + 4
+print(result, 'is', type(result))
+
+
+
OUTPUT
+
+
7.25 is <class 'float'>
+
+
+
+
+
+
+
+
+
+
+
Choose a Type
+
+
What type of value (integer, floating point number, or character
+string) would you use to represent each of the following? Try to come up
+with more than one good answer for each problem. For example, in # 1,
+when would counting days with a floating point variable make more sense
+than using an integer?
+
+
Number of days since the start of the year.
+
Time elapsed from the start of the year until now in days.
+
Serial number of a piece of lab equipment.
+
A lab specimen’s age
+
Current population of a city.
+
Average population of a city over time.
+
+
+
+
+
+
+
+
+
+
The answers to the questions are:
+
+
Integer, since the number of days would lie between 1 and 365 (366 in a
+leap year).
+
Floating point, since fractional days are required.
+
Character string if the serial number contains letters and numbers,
+otherwise integer if the serial number consists only of numerals.
+
This will vary! How do you define a specimen’s age? Whole days since
+collection (integer)? Date and time (string)?
+
Choose floating point to represent population as large aggregates
+(e.g., millions), or integer to represent population in units of
+individuals.
+
Floating point number, since an average is likely to have a
+fractional part.
+
+
+
+
+
+
+
+
+
+
+
Division Types
+
+
In Python 3, the // operator performs integer
+(whole-number) floor division, the / operator performs
+floating-point division, and the % (or modulo)
+operator calculates and returns the remainder from integer division:
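A short sketch comparing the three operators, using 5 and 3 as arbitrary operands (the variable names are illustrative, not part of the lesson):

```python
floor_quotient = 5 // 3  # floor division: whole-number part of the quotient
true_quotient = 5 / 3    # floating-point division
remainder = 5 % 3        # what is left over after floor division

print('5 // 3:', floor_quotient)
print('5 / 3:', true_quotient)
print('5 % 3:', remainder)
```

Note that 3 * floor_quotient + remainder gives back the original 5.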
If num_subjects is the number of subjects taking part in
+a study, and num_per_survey is the number that can take
+part in a single survey, write an expression that calculates the number
+of surveys needed to reach everyone once.
+
+
+
+
+
+
+
+
+
We want the minimum number of surveys that reaches everyone once,
+which is the rounded up value of
+num_subjects/ num_per_survey. This is equivalent to
+performing a floor division with // and adding 1. Before
+the division we need to subtract 1 from the number of subjects to deal
+with the case where num_subjects is evenly divisible by
+num_per_survey.
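One way to write this down, as a sketch: the variable names come from the exercise, while the values 600 and 42 are assumptions chosen for illustration.

```python
num_subjects = 600
num_per_survey = 42

# Subtract 1 first so an exact multiple does not get an extra survey,
# then floor-divide and add 1 to round up.
num_surveys = (num_subjects - 1) // num_per_survey + 1
print(num_subjects, 'subjects,', num_per_survey, 'per survey:', num_surveys)

# An evenly divisible case: 84 subjects need exactly 2 surveys, not 3.
even_case = (84 - 1) // num_per_survey + 1
print('84 subjects,', num_per_survey, 'per survey:', even_case)
```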
Where reasonable, float() will convert a string to a
+floating point number, and int() will convert a floating
+point number to an integer:
+
+
PYTHON
+
+
print("string to float:", float("3.4"))
+print("float to int:", int(3.4))
+
+
+
OUTPUT
+
+
string to float: 3.4
+float to int: 3
+
+
If the conversion doesn’t make sense, however, Python will report
+an error.
+
+
PYTHON
+
+
print("string to float:", float("Hello world!"))
+
+
+
ERROR
+
+
---------------------------------------------------------------------------
+ValueError Traceback (most recent call last)
+<ipython-input-5-df3b790bf0a2> in <module>
+----> 1 print("string to float:", float("Hello world!"))
+
+ValueError: could not convert string to float: 'Hello world!'
+
+
Given this information, what do you expect the following program to
+do?
+
What does it actually do?
+
Why do you think it does that?
+
+
PYTHON
+
+
print("fractional string to int:", int("3.4"))
+
+
+
+
+
+
+
+
+
+
What do you expect this program to do? It would not be so
+unreasonable to expect the Python 3 int function to convert
+the string “3.4” to 3.4 and then, with an additional type conversion, to 3.
+After all, Python 3 performs a lot of other magic - isn’t that part of its
+charm?
+
+
PYTHON
+
+
int("3.4")
+
+
+
ERROR
+
+
---------------------------------------------------------------------------
+ValueError Traceback (most recent call last)
+<ipython-input-2-ec6729dfccdc> in <module>
+----> 1 int("3.4")
+ValueError: invalid literal for int() with base 10: '3.4'
+
+
However, Python 3 raises an error. Why? To be consistent, possibly.
+If you want Python to perform two consecutive type conversions, you must
+write both of them explicitly in code.
+
+
PYTHON
+
+
int(float("3.4"))
+
+
+
OUTPUT
+
+
3
+
+
+
+
+
+
+
+
+
+
+
Arithmetic with Different Types
+
+
Which of the following will return the floating point number
+2.0? Note: there may be more than one right answer.
+
+
PYTHON
+
+
first = 1.0
+second = "1"
+third = "1.1"
+
+
+
first + float(second)
+
float(second) + float(third)
+
first + int(third)
+
first + int(float(third))
+
int(first) + int(float(third))
+
2.0 * second
+
+
+
+
+
+
+
+
+
+
Answer: 1 and 4
+
+
+
+
+
+
+
+
+
+
Complex Numbers
+
+
Python provides complex numbers, which are written as
+1.0+2.0j. If val is a complex number, its real
+and imaginary parts can be accessed using dot notation as
+val.real and val.imag.
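A minimal sketch of the dot notation, using the value 1.0+2.0j from the text:

```python
val = 1.0 + 2.0j

# The real and imaginary parts are each stored as ordinary floats.
print('real part:', val.real)
print('imaginary part:', val.imag)
print(type(val))
```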
Why do you think Python uses j instead of
+i for the imaginary part?
+
What do you expect 1 + 2j + 3 to produce?
+
What do you expect 4j to be? What about
+4 j or 4 + j?
+
+
+
+
+
+
+
+
+
+
+
Standard mathematics treatments typically use i to
+denote an imaginary number. Python instead follows a convention from
+electrical engineering, where i already denotes current and j is used
+for the imaginary unit; changing this now would be technically
+expensive. Stack
+Overflow provides additional explanation and discussion.
+
+
(4+2j)
+
+4j is a complex number with a zero real part, and 4 j is a
+SyntaxError: invalid syntax. In 4 + j, j is treated as a variable, so
+the result depends on whether j is defined (a NameError if it is not)
+and, if so, on its assigned value.
+
+
+
+
+
+
+
+
+
+
+
Key Points
+
+
+
Every value has a type.
+
Use the built-in function type to find the type of a
+value.
+
Types control what operations can be done on values.
+
Strings can be added and multiplied.
+
Strings have a length (but numbers don’t).
+
Must convert numbers to strings or vice versa when operating on
+them.
+
Can mix integers and floats freely in operations.
+
Variables only change value when something is assigned to them.
Use help to display documentation for built-in functions.
+
Correctly describe situations in which SyntaxError and NameError
+occur.
+
+
+
+
+
+
+
Use comments to add documentation to programs.
+
+
+
+
PYTHON
+
+
# This sentence isn't executed by Python.
+adjustment = 0.5  # Neither is this - anything after '#' is ignored.
+
+
A function may take zero or more arguments.
+
+
+
+
We have seen some functions already — now let’s take a closer
+look.
+
An argument is a value passed into a function.
+
+len takes exactly one.
+
+int, str, and float create a
+new value from an existing one.
+
+print takes zero or more.
+
+print with no arguments prints a blank line.
+
+
Must always use parentheses, even if they’re empty, so that Python
+knows a function is being called.
+
+
+
+
+
PYTHON
+
+
print('before')
+print()
+print('after')
+
+
+
OUTPUT
+
+
before
+
+after
+
+
Every function returns something.
+
+
+
+
Every function call produces some result.
+
If the function doesn’t have a useful result to return, it usually
+returns the special value None. None is a
+Python object that stands in any time there is no value.
+
+
+
PYTHON
+
+
result = print('example')
+print('result of print is', result)
+
+
+
OUTPUT
+
+
example
+result of print is None
+
+
Commonly-used built-in functions include max,
+min, and round.
+
+
+
+
Use max to find the largest value of one or more
+values.
+
Use min to find the smallest.
+
Both work on character strings as well as numbers.
+
+
“Larger” and “smaller” use (0-9, A-Z, a-z) to compare letters.
+
+
+
+
+
PYTHON
+
+
print(max(1, 2, 3))
+print(min('a', 'A', '0'))
+
+
+
OUTPUT
+
+
3
+0
+
+
Functions may only work for certain (combinations of)
+arguments.
+
+
+
+
+max and min must be given at least one
+argument.
+
+
“Largest of the empty set” is a meaningless question.
+
+
+
And they must be given things that can meaningfully be
+compared.
+
+
+
PYTHON
+
+
print(max(1, 'a'))
+
+
+
ERROR
+
+
TypeError Traceback (most recent call last)
+<ipython-input-52-3f049acf3762> in <module>
+----> 1 print(max(1, 'a'))
+
+TypeError: '>' not supported between instances of 'str' and 'int'
+
+
Functions may have default values for some arguments.
+
+
+
+
+round will round off a floating-point number.
+
By default, rounds to zero decimal places.
+
+
+
PYTHON
+
+
round(3.712)
+
+
+
OUTPUT
+
+
4
+
+
+
We can specify the number of decimal places we want.
+
+
+
PYTHON
+
+
round(3.712, 1)
+
+
+
OUTPUT
+
+
3.7
+
+
Functions attached to objects are called methods
+
+
+
+
Functions take another form that will be common in the pandas
+episodes.
+
Methods have parentheses like functions, but come after the
+variable.
+
Some methods are used for internal Python operations, and are marked
+with double underscores.
+
+
+
PYTHON
+
+
my_string = 'Hello world!'  # creation of a string object
+
+print(len(my_string)) # the len function takes a string as an argument and returns the length of the string
+
+print(my_string.swapcase()) # calling the swapcase method on the my_string object
+
+print(my_string.__len__()) # calling the internal __len__ method on the my_string object, used by len(my_string)
+
+
+
OUTPUT
+
+
12
+hELLO WORLD!
+12
+
+
+
You might even see them chained together. They operate left to
+right.
+
+
+
PYTHON
+
+
print(my_string.isupper()) # Not all the letters are uppercase
+print(my_string.upper()) # This capitalizes all the letters
+
+print(my_string.upper().isupper()) # Now all the letters are uppercase
+
+
+
OUTPUT
+
+
False
+HELLO WORLD!
+True
+
+
Use the built-in function help to get help for a
+function.
+
+
+
+
Every built-in function has online documentation.
+
+
+
PYTHON
+
+
help(round)
+
+
+
OUTPUT
+
+
Help on built-in function round in module builtins:
+
+round(number, ndigits=None)
+ Round a number to a given precision in decimal digits.
+
+ The return value is an integer if ndigits is omitted or None. Otherwise
+ the return value has the same type as the number. ndigits may be negative.
+
+
The Jupyter Notebook has two ways to get help.
+
+
+
+
Option 1: Place the cursor near where the function is invoked in a
+cell (i.e., the function name or its parameters),
+
+
Hold down Shift, and press Tab.
+
Do this several times to expand the information returned.
+
+
+
Option 2: Type the function name in a cell with a question mark
+after it. Then run the cell.
+
Python reports a syntax error when it can’t understand the source of
+a program.
+
+
+
+
Won’t even try to run the program if it can’t be parsed.
+
+
+
PYTHON
+
+
# Forgot to close the quote marks around the string.
+name = 'Feng
+
+
+
ERROR
+
+
File "<ipython-input-56-f42768451d55>", line 2
+ name = 'Feng
+ ^
+SyntaxError: EOL while scanning string literal
+
+
+
PYTHON
+
+
# An extra '=' in the assignment.
+age = = 52
+
+
+
ERROR
+
+
File "<ipython-input-57-ccc3df3cf902>", line 2
+ age = = 52
+ ^
+SyntaxError: invalid syntax
+
+
+
Look more closely at the error message:
+
+
+
PYTHON
+
+
print("hello world"
+
+
+
ERROR
+
+
File "<ipython-input-6-d1cc229bf815>", line 1
+ print ("hello world"
+ ^
+SyntaxError: unexpected EOF while parsing
+
+
+
The message indicates a problem on the first line of the input (“line
+1”).
+
+
In this case the “ipython-input” section of the file name tells us
+that we are working with input into IPython, the Python interpreter used
+by the Jupyter Notebook.
+
+
+
The -6- part of the filename indicates that the error
+occurred in cell 6 of our Notebook.
+
Next is the problematic line of code, indicating the problem with a
+^ pointer.
+
Python reports a runtime error when something goes wrong while a
+program is executing.
+
+
+
+
PYTHON
+
+
age = 53
+remaining = 100 - aege  # mis-spelled 'age'
+
+
+
ERROR
+
+
NameError Traceback (most recent call last)
+<ipython-input-59-1214fb6c55fc> in <module>
+ 1 age = 53
+----> 2 remaining = 100 - aege # mis-spelled 'age'
+
+NameError: name 'aege' is not defined
+
+
+
Fix syntax errors by reading the source and runtime errors by
+tracing execution.
+
+
+
+
+
+
+
What Happens When
+
+
+
Explain in simple terms the order of operations in the following
+program: when does the addition happen, when does the subtraction
+happen, when is each function called, etc.
max(len(rich), poor) raises a TypeError. This turns into
+max(4, 'tin') and, as we discussed earlier, a string and an
+integer cannot meaningfully be compared.
+
+
ERROR
+
+
TypeError Traceback (most recent call last)
+<ipython-input-65-bc82ad05177a> in <module>
+----> 1 max(len(rich), poor)
+
+TypeError: '>' not supported between instances of 'str' and 'int'
+
+
+
+
+
+
+
+
+
+
+
Why Not?
+
+
Why is it that max and min do not return
+None when they are called with no arguments?
+
+
+
+
+
+
+
+
+
max and min raise a TypeError in this case
+because the required number of arguments was not supplied. If they just
+returned None, the error would be much harder to trace, as
+None would likely be stored in a variable and used later in the program,
+only to cause a runtime error there.
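A small sketch of this behavior. It uses try and except, which this lesson has not introduced, only so that the example can continue after the error; the exact wording of the message may differ between Python versions.

```python
error_message = None
try:
    largest = max()  # no arguments: there is nothing to compare
except TypeError as error:
    error_message = str(error)

# The failure is reported at the call itself, not somewhere later on.
print('max() raised TypeError:', error_message)
```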
+
+
+
+
+
+
+
+
+
+
Last Character of a String
+
+
If Python starts counting from zero, and len returns the
+number of characters in a string, what index expression will get the
+last character in the string name? (Note: we will see a
+simpler way to do this in a later episode.)
+
+
+
+
+
+
+
+
+
name[len(name) - 1]
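A quick check of this expression, assuming a hypothetical value for name:

```python
name = 'helium'  # hypothetical example value

# Indices run from 0 to len(name) - 1, so the last one is len(name) - 1.
last_index = len(name) - 1
last_character = name[last_index]
print('last index:', last_index)
print('last character:', last_character)
```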
+
+
+
+
+
+
+
+
+
+
Explore the Python docs!
+
+
The official Python
+documentation is arguably the most complete source of information
+about the language. It is available in different languages and contains
+a lot of useful resources. The Built-in
+Functions page contains a catalogue of all of these functions,
+including the ones that we’ve covered in this lesson. Some of these are
+more advanced and unnecessary at the moment, but others are very simple
+and useful.
+
+
+
+
+
+
+
+
+
Key Points
+
+
+
Use comments to add documentation to programs.
+
A function may take zero or more arguments.
+
Commonly-used built-in functions include max,
+min, and round.
+
Functions may only work for certain (combinations of)
+arguments.
+
Functions may have default values for some arguments.
+
Use the built-in function help to get help for a
+function.
+
The Jupyter Notebook has two ways to get help.
+
Every function returns something.
+
Python reports a syntax error when it can’t understand the source of
+a program.
+
Python reports a runtime error when something goes wrong while a
+program is executing.
+
Fix syntax errors by reading the source code, and runtime errors by
+tracing the program’s execution.
How can I use software that other people have written?
+
How can I find out what that software does?
+
+
+
+
+
+
+
+
Objectives
+
+
Explain what software libraries are and why programmers create and
+use them.
+
Write programs that import and use modules from Python’s standard
+library.
+
Find and read documentation for the standard library interactively
+(in the interpreter) and online.
+
+
+
+
+
+
+
Most of the power of a programming language is in its
+libraries.
+
+
+
+
A library is a collection of files (called
+modules) that contains functions for use by other programs.
+
+
May also contain data values (e.g., numerical constants) and other
+things.
+
Library’s contents are supposed to be related, but there’s no way to
+enforce that.
+
+
+
The Python standard
+library is an extensive suite of modules that comes with Python
+itself.
+
Many additional libraries are available from PyPI (the Python Package
+Index).
+
We will see later how to write new libraries.
+
+
+
+
+
+
+
Libraries and modules
+
+
A library is a collection of modules, but the terms are often used
+interchangeably, especially since many libraries only consist of a
+single module, so don’t worry if you mix them.
+
+
+
+
A program must import a library module before using it.
+
+
+
+
Use import to load a library module into a program’s
+memory.
+
Then refer to things from the module as
+module_name.thing_name.
+
+
Python uses . to mean “part of”.
+
+
+
Using math, one of the modules in the standard
+library:
+
+
+
PYTHON
+
+
import math
+
+print('pi is', math.pi)
+print('cos(pi) is', math.cos(math.pi))
+
+
+
OUTPUT
+
+
pi is 3.141592653589793
+cos(pi) is -1.0
+
+
+
Have to refer to each item with the module’s name.
+
+
+math.cos(pi) won’t work: the reference to
+pi doesn’t somehow “inherit” the function’s reference to
+math.
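A sketch of what goes wrong with math.cos(pi). It uses try and except, not yet covered in this lesson, only so that the example keeps running after the error.

```python
import math

name_error_message = None
try:
    print(math.cos(pi))  # bare pi was never imported or defined
except NameError as error:
    name_error_message = str(error)
    print('NameError:', name_error_message)

# Both names need the math. prefix.
cos_of_pi = math.cos(math.pi)
print('cos(pi) is', cos_of_pi)
```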
+
+
+
Use help to learn about the contents of a library
+module.
+
+
+
+
Works just like help for a function.
+
+
+
PYTHON
+
+
help(math)
+
+
+
OUTPUT
+
+
Help on module math:
+
+NAME
+ math
+
+MODULE REFERENCE
+ http://docs.python.org/3/library/math
+
+ The following documentation is automatically generated from the Python
+ source files. It may be incomplete, incorrect or include features that
+ are considered implementation detail and may vary between Python
+ implementations. When in doubt, consult the module reference at the
+ location listed above.
+
+DESCRIPTION
+ This module is always available. It provides access to the
+ mathematical functions defined by the C standard.
+
+FUNCTIONS
+ acos(x, /)
+ Return the arc cosine (measured in radians) of x.
+⋮ ⋮ ⋮
+
+
Import specific items from a library module to shorten
+programs.
+
+
+
+
Use from ... import ... to load only specific items
+from a library module.
+
Then refer to them directly without library name as prefix.
+
+
+
PYTHON
+
+
from math import cos, pi
+
+print('cos(pi) is', cos(pi))
+
+
+
OUTPUT
+
+
cos(pi) is -1.0
+
+
Create an alias for a library module when importing it to shorten
+programs.
+
+
+
+
Use import ... as ... to give a library a short
+alias while importing it.
+
Then refer to items in the library using that shortened name.
+
+
+
PYTHON
+
+
import math as m
+
+print('cos(pi) is', m.cos(m.pi))
+
+
+
OUTPUT
+
+
cos(pi) is -1.0
+
+
+
Commonly used for libraries that are frequently used or have long
+names.
+
+
E.g., the matplotlib plotting library is often aliased
+as mpl.
+
+
+
But can make programs harder to understand, since readers must learn
+your program’s aliases.
+
+
+
+
+
+
+
Exploring the Math Module
+
+
+
What function from the math module can you use to
+calculate a square root without using sqrt?
+
Since the library contains this function, why does sqrt
+exist?
+
+
+
+
+
+
+
+
+
+
+
Using help(math) we see that we’ve got
+pow(x,y) in addition to sqrt(x), so we could
+use pow(x, 0.5) to find a square root.
+
The sqrt(x) function is arguably more readable than
+pow(x, 0.5) when implementing equations. Readability is a
+cornerstone of good programming, so it makes sense to provide a special
+function for this specific common case.
+
+
Also, the design of Python’s math library has its origin
+in the C standard, which includes both sqrt(x) and
+pow(x,y), so a little bit of the history of programming is
+showing in Python’s function names.
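A short comparison of the two functions, using 16 as an arbitrary input:

```python
import math

x = 16
root_from_sqrt = math.sqrt(x)
root_from_pow = math.pow(x, 0.5)  # raising to the power 0.5 also gives a square root

print('sqrt:', root_from_sqrt)
print('pow: ', root_from_pow)
```

Both calls return the float 4.0 here; sqrt simply states the intent more directly.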
+
+
+
+
+
+
+
+
+
+
Locating the Right Module
+
+
You want to select a random character from a string:
The string has 11 characters, each having a positional index from 0
+to 10. You could use the random.randrange
+or random.randint
+functions to get a random integer between 0 and 10, and then select the
+character of bases at that index:
+
+
PYTHON
+
+
from random import randrange
+
+random_index = randrange(len(bases))
+print(bases[random_index])
+
+
or more compactly:
+
+
PYTHON
+
+
from random import randrange
+
+print(bases[randrange(len(bases))])
+
+
Perhaps you found the random.sample
+function? It allows for slightly less typing but might be a bit harder
+to understand just by reading:
+
+
PYTHON
+
+
from random import sample
+
+print(sample(bases, 1)[0])
+
+
Note that this function returns a list of values. We will learn about
+lists in episode 11.
+
The simplest and shortest solution is the random.choice
+function that does exactly what we want:
+
+
PYTHON
+
+
from random import choice
+
+print(choice(bases))
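The solution mentions both random.randrange and random.randint. The two treat their upper limit differently, which the following sketch illustrates. It assumes the 11-character bases string used in the Parson's problem in this episode.

```python
import random

bases = 'ACTTGCTTGAC'  # assumed value, taken from the Parson's problem in this episode

# randrange(n) picks from 0 .. n-1: the stop value is excluded.
index_from_randrange = random.randrange(len(bases))

# randint(a, b) includes both ends, so the upper limit must be len(bases) - 1.
index_from_randint = random.randint(0, len(bases) - 1)

print(bases[index_from_randrange], bases[index_from_randint])
```

Calling random.randint(0, len(bases)) instead could return 11, which is not a valid index for this string.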
+
+
+
+
+
+
+
+
+
+
+
Jigsaw Puzzle (Parson’s Problem) Programming Example
+
+
Rearrange the following statements so that a random DNA base and
+its index in the string are printed. Not all statements may be needed.
+Feel free to use/add intermediate variables.
+
+
PYTHON
+
+
bases="ACTTGCTTGAC"
+import math
+import random
+___ = random.randrange(n_bases)
+___ = len(bases)
+print("random base ", bases[___], "base index", ___)
+
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
import math
+import random
+bases = "ACTTGCTTGAC"
+n_bases = len(bases)
+idx = random.randrange(n_bases)
+print("random base", bases[idx], "base index", idx)
+
+
+
+
+
+
+
+
+
+
+
When Is Help Available?
+
+
When a colleague of yours types help(math), Python
+reports an error:
+
+
ERROR
+
+
NameError: name 'math' is not defined
+
+
What has your colleague forgotten to do?
+
+
+
+
+
+
+
+
+
Importing the math module (import math)
+
+
+
+
+
+
+
+
+
+
Importing With Aliases
+
+
+
Fill in the blanks so that the program below prints
+90.0.
+
Rewrite the program so that it uses import
+without as.
+
Which form do you find easier to read?
+
+
+
PYTHON
+
+
import math as m
+angle = ____.degrees(____.pi / 2)
+print(____)
+
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
import math as m
+angle = m.degrees(m.pi / 2)
+print(angle)
+
+
can be written as
+
+
PYTHON
+
+
import math
+angle = math.degrees(math.pi / 2)
+print(angle)
+
+
Since you just wrote the code and are familiar with it, you might
+actually find the first version easier to read. But when trying to read
+a huge piece of code written by someone else, or when getting back to
+your own huge piece of code after several months, non-abbreviated names
+are often easier, except where there are clear abbreviation
+conventions.
+
+
+
+
+
+
+
+
+
+
There Are Many Ways To Import Libraries!
+
+
Match the following print statements with the appropriate library
+calls.
+
Print commands:
+
+
print("sin(pi/2) =", sin(pi/2))
+
print("sin(pi/2) =", m.sin(m.pi/2))
+
print("sin(pi/2) =", math.sin(math.pi/2))
+
+
Library calls:
+
+
from math import sin, pi
+
import math
+
import math as m
+
from math import *
+
+
+
+
+
+
+
+
+
+
+
Library calls 1 and 4. In order to directly refer to
+sin and pi without the library name as prefix,
+you need to use the from ... import ... statement. Whereas
+library call 1 specifically imports the two names sin
+and pi, library call 4 imports all public names in the
+math module.
+
Library call 3. Here sin and pi are
+referred to with a shortened library name m instead of
+math. Library call 3 does exactly that using the
+import ... as ... syntax - it creates an alias for
+math in the form of the shortened name m.
+
Library call 2. Here sin and pi are
+referred to with the regular library name math, so the
+regular import ... call suffices.
+
+
Note: although library call 4 works, importing all
+names from a module using a wildcard import is not recommended as it makes it
+unclear which names from the module are used in the code. In general it
+is best to make your imports as specific as possible and to only import
+what your code uses. In library call 1, the import
+statement explicitly tells us that the sin function is
+imported from the math module, but library call 4 does not
+convey this information.
+
+
+
+
+
+
+
+
+
+
Importing Specific Items
+
+
+
Fill in the blanks so that the program below prints
+90.0.
+
Do you find this version easier to read than preceding ones?
+
Why wouldn’t programmers always use this form of
+import?
+
+
+
PYTHON
+
+
____ math import ____, ____
+angle = degrees(pi / 2)
+print(angle)
+
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
from math import degrees, pi
+angle = degrees(pi / 2)
+print(angle)
+
+
Most likely you find this version easier to read since it’s less
+dense. The main reason not to use this form of import is to avoid name
+clashes. For instance, you wouldn’t import degrees this way
+if you also wanted to use the name degrees for a variable
+or function of your own. Or if you were to also import a function named
+degrees from another library.
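A sketch of such a name clash. The later assignment to degrees is a hypothetical mistake added here for illustration.

```python
from math import degrees, pi

angle = degrees(pi / 2)  # degrees is still the function from math
print(angle)

degrees = 45  # hypothetical variable of our own that reuses the name

# The imported function is no longer reachable under this name.
print(callable(degrees))
```

After the reassignment, a call such as degrees(pi / 2) would fail with a TypeError because an integer is not callable.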
+
+
+
+
+
+
+
+
+
+
Reading Error Messages
+
+
+
Read the code below and try to identify what the errors are without
+running it.
+
Run the code, and read the error message. What type of error is
+it?
+
+
+
PYTHON
+
+
from math import log
+log(0)
+
+
+
+
+
+
+
+
+
+
+
ERROR
+
+
---------------------------------------------------------------------------
+ValueError Traceback (most recent call last)
+<ipython-input-1-d72e1d780bab> in <module>
+ 1 from math import log
+----> 2 log(0)
+
+ValueError: math domain error
+
+
+
The logarithm of x is only defined for
+x > 0, so 0 is outside the domain of the function.
+
You get an error of type ValueError, indicating that
+the function received an inappropriate argument value. The additional
+message “math domain error” makes it clearer what the problem is.
+
+
+
+
+
+
+
+
+
+
+
Key Points
+
+
+
Most of the power of a programming language is in its
+libraries.
+
A program must import a library module in order to use it.
+
Use help to learn about the contents of a library
+module.
+
Import specific items from a library to shorten programs.
+
Create an alias for a library when importing it to shorten
+programs.
The columns in a dataframe are the observed variables, and the rows
+are the observations.
+
Pandas uses backslash \ to show wrapped lines when
+output is too wide to fit the screen.
+
Using descriptive dataframe names helps us distinguish between
+multiple dataframes so we won’t accidentally overwrite a dataframe or
+read from the wrong one.
+
+
+
+
+
+
+
File Not Found
+
+
Our lessons store their data files in a data
+sub-directory, which is why the path to the file is
+data/gapminder_gdp_oceania.csv. If you forget to include
+data/, or if you include it but your copy of the file is
+somewhere else, you will get a runtime
+error that ends with a line like this:
+
+
ERROR
+
+
FileNotFoundError: [Errno 2] No such file or directory: 'data/gapminder_gdp_oceania.csv'
+
+
+
+
+
Use index_col to specify that a column’s values should
+be used as row headings.
+
+
+
+
Row headings are numbers (0 and 1 in this case).
+
Really want to index by country.
+
Pass the name of the column to read_csv as its
+index_col parameter to do this.
+
Naming the dataframe data_oceania_country tells us
+which region the data includes (oceania) and how it is
+indexed (country).
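A sketch of the effect of index_col. It assumes pandas is installed and uses a small in-memory stand-in for the CSV file rather than the real data; the column names and numbers are illustrative only.

```python
import io
import pandas as pd

# Hypothetical two-row stand-in for data/gapminder_gdp_oceania.csv.
csv_text = """country,gdpPercap_1952,gdpPercap_2007
Australia,10039.6,34435.4
New Zealand,10556.6,25185.0
"""

# Without index_col the row headings are the numbers 0 and 1.
data_oceania = pd.read_csv(io.StringIO(csv_text))
# With index_col='country' the country names become the row headings.
data_oceania_country = pd.read_csv(io.StringIO(csv_text), index_col='country')

print(list(data_oceania.index))
print(list(data_oceania_country.index))
```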
Use DataFrame.describe() to get summary statistics
+about data.
+
+
+
DataFrame.describe() gets the summary statistics of only
+the columns that have numerical data. All other columns are ignored,
+unless you use the argument include='all'.
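A sketch of the difference, assuming pandas is installed and using a small made-up dataframe with one text column and one numeric column:

```python
import pandas as pd

# Hypothetical stand-in dataframe; the values are illustrative only.
small = pd.DataFrame(
    {'continent': ['Oceania', 'Oceania'], 'gdp': [10039.6, 10556.6]},
    index=['Australia', 'New Zealand'],
)

numeric_summary = small.describe()           # numeric columns only
full_summary = small.describe(include='all')  # text columns as well

print(list(numeric_summary.columns))
print(list(full_summary.columns))
```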
Not particularly useful with just two records, but very helpful when
+there are thousands.
+
+
+
+
+
+
+
Reading Other Data
+
+
Read the data in gapminder_gdp_americas.csv (which
+should be in the same directory as
+gapminder_gdp_oceania.csv) into a variable called
+data_americas and display its summary statistics.
+
+
+
+
+
+
+
+
+
To read in a CSV, we use pd.read_csv and pass the
+filename 'data/gapminder_gdp_americas.csv' to it. We also
+once again pass the column name 'country' to the parameter
+index_col in order to index by country. The summary
+statistics can be displayed with the DataFrame.describe()
+method.
After reading the data for the Americas, use
+help(data_americas.head) and
+help(data_americas.tail) to find out what
+DataFrame.head and DataFrame.tail do.
+
+
What method call will display the first three rows of this
+data?
+
What method call will display the last three columns of this data?
+(Hint: you may need to change your view of the data.)
+
+
+
+
+
+
+
+
+
+
+
We can check out the first five rows of data_americas
+by executing data_americas.head() which lets us view the
+beginning of the DataFrame. We can specify the number of rows we wish to
+see by specifying the parameter n in our call to
+data_americas.head(). To view the first three rows,
+execute:
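A minimal sketch of that call. Because the real file is not available here, data_americas is replaced by a small made-up dataframe; only the head(n=3) call itself reflects the text.

```python
import pandas as pd

# Hypothetical stand-in for data_americas: five rows, one made-up column.
data_americas = pd.DataFrame(
    {'value': [1.0, 2.0, 3.0, 4.0, 5.0]},
    index=['a', 'b', 'c', 'd', 'e'],
)

first_three = data_americas.head(n=3)  # n sets how many rows are returned
print(first_three)
```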
To check out the last three rows of data_americas, we
+would use the command data_americas.tail(n=3), analogous to
+head() used above. However, here we want to look at the
+last three columns so we need to change our view and then use
+tail(). To do so, we create a new DataFrame in which rows
+and columns are switched:
+
+
+
PYTHON
+
+
americas_flipped = data_americas.T
+
+
We can then view the last three columns of data_americas by
+viewing the last three rows of americas_flipped:
This shows the data that we want, but we may prefer to display three
+columns instead of three rows, so we can flip it back:
+
+
PYTHON
+
+
americas_flipped.tail(n=3).T
+
+
Note: we could have done the above in a single line
+of code by ‘chaining’ the commands:
+
+
PYTHON
+
+
data_americas.T.tail(n=3).T
+
+
+
+
+
+
+
+
+
+
+
Reading Files in Other Directories
+
+
The data for your current project is stored in a file called
+microbes.csv, which is located in a folder called
+field_data. You are doing analysis in a notebook called
+analysis.ipynb in a sibling folder called
+thesis:
What value(s) should you pass to read_csv to read
+microbes.csv in analysis.ipynb?
+
+
+
+
+
+
+
+
+
We need to specify the path to the file of interest in the call to
+pd.read_csv. We first need to ‘jump’ out of the folder
+thesis using ‘../’ and then into the folder
+field_data using ‘field_data/’. Then we can specify the
+filename microbes.csv. The result is as follows:
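A sketch of how that relative path is built and where it points. The read_csv call appears only as a comment because the file is not available here, and the parent folder name project is a hypothetical placeholder.

```python
import posixpath

# '..' steps out of thesis, then field_data/ steps into the sibling folder.
relative_path = '../field_data/microbes.csv'

# With pandas imported as pd, the call would be:
#     data_microbes = pd.read_csv(relative_path)

# Resolve the path from a hypothetical location of the notebook's folder.
resolved = posixpath.normpath(posixpath.join('project/thesis', relative_path))
print(resolved)
```

The resolved path ends in field_data/microbes.csv, a sibling of the thesis folder.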
As well as the read_csv function for reading data from a
+file, Pandas provides a to_csv function to write dataframes
+to files. Applying what you’ve learned about reading from files, write
+one of your dataframes to a file called processed.csv. You
+can use help to get information on how to use
+to_csv.
+
+
+
+
+
+
+
+
+
In order to write the DataFrame data_americas to a file
+called processed.csv, execute the following command:
+
+
PYTHON
+
+
data_americas.to_csv('processed.csv')
+
+
For help on read_csv or to_csv, you could
+execute, for example:
+
+
PYTHON
+
+
help(data_americas.to_csv)
+help(pd.read_csv)
+
+
Note that help(to_csv) or help(pd.to_csv)
+raises an error! This is because to_csv is not
+a global Pandas function, but a member function of DataFrames. This
+means you can only call it on an instance of a DataFrame, e.g.,
+data_americas.to_csv or
+data_oceania.to_csv.
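One way to check where each name lives, as a sketch that assumes pandas is installed. hasattr is a built-in function that reports whether an object has a given attribute.

```python
import pandas as pd

module_has_read_csv = hasattr(pd, 'read_csv')         # function of the pandas module
module_has_to_csv = hasattr(pd, 'to_csv')             # not defined at module level
dataframe_has_to_csv = hasattr(pd.DataFrame, 'to_csv')  # method of DataFrame

print(module_has_read_csv, module_has_to_csv, dataframe_has_to_csv)
```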
+
+
+
+
+
+
+
+
+
+
Key Points
+
+
+
Use the Pandas library to get basic statistics out of tabular
+data.
+
Use index_col to specify that a column’s values should
+be used as row headings.
+
Use DataFrame.info to find out more about a
+dataframe.
+
The DataFrame.columns variable stores information about
+the dataframe’s columns.
+
Use DataFrame.T to transpose a dataframe.
+
Use DataFrame.describe to get summary statistics about
+data.
How can I do statistical analysis of tabular data?
+
+
+
+
+
+
+
+
Objectives
+
+
Select individual values from a Pandas dataframe.
+
Select entire rows or entire columns from a dataframe.
+
Select a subset of both rows and columns from a dataframe in a
+single operation.
+
Select a subset of a dataframe by a single Boolean criterion.
+
+
+
+
+
+
+
Note about Pandas DataFrames/Series
+
+
+
A DataFrame
+is a collection of Series;
+the DataFrame is the way Pandas represents a table, and a Series is the
+data structure Pandas uses to represent a column.
+
Pandas is built on top of the NumPy library, which in practice means
+that most of the methods defined for NumPy arrays apply to Pandas
+Series/DataFrames.
+
What makes Pandas so attractive is its powerful interface for accessing
+individual records of the table, proper handling of missing values, and
+relational-database operations between DataFrames.
+
Selecting values
+
+
+
To access a value at the position [i,j] of a DataFrame,
+we have two options, depending on what i and j mean.
+Remember that a DataFrame provides an index as a way to
+identify the rows of the table; a row, then, has a position
+inside the table as well as a label, which uniquely identifies
+its entry in the DataFrame.
+
Use DataFrame.iloc[..., ...] to select values by their
+(entry) position
+
+
+
+
Can specify location by numerical index analogously to 2D version of
+character selection in strings.
+
+
+
PYTHON
+
+
import pandas as pd
+data = pd.read_csv('data/gapminder_gdp_europe.csv', index_col='country')
+print(data.iloc[0, 0])
+
+
+
OUTPUT
+
+
1601.056136
+
+
Use DataFrame.loc[..., ...] to select values by their
+(entry) label.
+
+
+
+
Can specify location by row and/or column name.
+
+
+
PYTHON
+
+
print(data.loc["Albania", "gdpPercap_1952"])
+
+
+
OUTPUT
+
+
1601.056136
+
+
Use : on its own to mean all columns or all rows.
+
In the above code, we discover that slicing using
+loc is inclusive at both ends, which differs from
+slicing using iloc, where slicing
+indicates everything up to but not including the final index.
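The inclusive/exclusive difference is easy to verify on a small table. The sketch below uses a made-up 3x3 dataframe (toy) with hypothetical labels rather than the Gapminder data:

```python
import pandas as pd

# Hypothetical toy table: rows A..C, columns y1952..y1962.
toy = pd.DataFrame(
    {'y1952': [1.0, 2.0, 3.0], 'y1957': [4.0, 5.0, 6.0], 'y1962': [7.0, 8.0, 9.0]},
    index=['A', 'B', 'C'],
)

# Positional slice: stops before position 2, so 2 rows and 2 columns.
by_position = toy.iloc[0:2, 0:2]

# Label slice: includes the end labels 'C' and 'y1962', so 3 rows and 3 columns.
by_label = toy.loc['A':'C', 'y1952':'y1962']

print(by_position.shape)
print(by_label.shape)
```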
+
Result of slicing can be used in further operations.
+
+
+
+
Usually don’t just print a slice.
+
All the statistical operators that work on entire dataframes work
+the same way on slices.
Returns a similarly-shaped dataframe of True and
+False.
+
+
+
PYTHON
+
+
# Use a subset of data to keep output readable.
+subset = data.loc['Italy':'Poland', 'gdpPercap_1962':'gdpPercap_1972']
+print('Subset of data:\n', subset)
+
+# Which values were greater than 10000 ?
+print('\nWhere are values large?\n', subset > 10000)
A frame full of Booleans is sometimes called a mask because
+of how it can be used.
+
+
+
PYTHON
+
+
mask = subset > 10000
+print(subset[mask])
+
+
+
OUTPUT
+
+
gdpPercap_1962 gdpPercap_1967 gdpPercap_1972
+country
+Italy NaN 10022.40131 12269.27378
+Montenegro NaN NaN NaN
+Netherlands 12790.84956 15363.25136 18794.74567
+Norway 13450.40151 16361.87647 18965.05551
+Poland NaN NaN NaN
+
+
+
Get the value where the mask is true, and NaN (Not a Number) where
+it is false.
+
Useful because NaNs are ignored by operations like max, min,
+average, etc.
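As a small sketch of why this is useful, the example below masks a made-up table (toy) and then takes the minimum and a count; the NaN entries are skipped in both. The values and labels are hypothetical.

```python
import pandas as pd

# Hypothetical toy data.
toy = pd.DataFrame({'a': [5.0, 15.0, 25.0], 'b': [12.0, 8.0, 30.0]},
                   index=['x', 'y', 'z'])

mask = toy > 10
masked = toy[mask]          # values <= 10 become NaN
print(masked)

# NaN entries are ignored, so these only consider values above 10.
column_minimum = masked.min()
column_count = masked.count()
print(column_minimum)
print(column_count)
```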
Pandas' vectorized methods and grouping operations give users a lot of
+flexibility when analysing their data.
+
For instance, let’s say we want a clearer view of how the
+European countries split themselves according to their GDP.
+
+
We can get a first look by splitting the countries into two groups for
+each year surveyed: those with a GDP higher than the
+European average and those with a lower GDP.
+
We then estimate a wealth score based on the historical
+(from 1962 to 2007) values, counting how many times a country
+has appeared in the group with lower or higher
+GDP.
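One way this could be sketched is shown below. The dataframe (toy), its country names, and its values are invented for illustration, and the variable names above_average and wealth_score are assumptions rather than part of the lesson's data.

```python
import pandas as pd

# Hypothetical toy data: three countries, two survey years.
toy = pd.DataFrame(
    {'gdp_1962': [10.0, 2.0, 6.0], 'gdp_2007': [20.0, 3.0, 13.0]},
    index=['Richland', 'Poorland', 'Midland'],
)

# True where a country's GDP is above that year's average.
above_average = toy > toy.mean()

# Fraction of years in which each country was above average.
wealth_score = above_average.sum(axis=1) / len(toy.columns)
print(wealth_score)

# Group the countries by their score and total the GDP within each group.
grouped = toy.groupby(wealth_score).sum()
print(grouped)
```

Each comparison and sum runs over the whole table at once, which is what "vectorized" refers to here.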
Clearly, the second statement produces an additional column and an
+additional row compared to the first statement.
+What conclusion can we draw? We see that a numerical slice, 0:2,
+omits the final index (i.e. index 2) in the range provided,
+while a named slice, ‘gdpPercap_1952’:‘gdpPercap_1962’,
+includes the final element.
+
+
+
+
+
+
+
+
+
+
Reconstructing Data
+
+
Explain what each line in the following short program does: what is
+in first, second, etc.?
first = pd.read_csv('data/gapminder_all.csv', index_col='country')
+
+
This line loads the dataset containing the GDP data from all
+countries into a dataframe called first. The
+index_col='country' parameter selects which column to use
+as the row labels in the dataframe.
+
+
PYTHON
+
+
second = first[first['continent'] == 'Americas']
+
+
This line makes a selection: only those rows of first
+for which the ‘continent’ column matches ‘Americas’ are extracted.
+Notice how the Boolean expression inside the brackets,
+first['continent'] == 'Americas', is used to select only
+those rows where the expression is true. Try printing this expression!
+Can you print also its individual True/False elements? (hint: first
+assign the expression to a variable)
+
+
PYTHON
+
+
third = second.drop('Puerto Rico')
+
+
As the syntax suggests, this line drops the row from
+second where the label is ‘Puerto Rico’. The resulting
+dataframe third has one row less than the original
+dataframe second.
+
+
PYTHON
+
+
fourth = third.drop('continent', axis=1)
+
+
Again we apply the drop function, but in this case we are dropping
+not a row but a whole column. To accomplish this, we also need to specify
+the axis parameter: axis=1 tells drop to look for the label
+'continent' among the columns (axis 1) rather than among the rows (axis 0).
+
+
PYTHON
+
+
fourth.to_csv('result.csv')
+
+
The final step is to write the data that we have been working on to a
+csv file. Pandas makes this easy with the to_csv()
+method. The only argument we need to supply is the filename.
+Note that a relative filename like this one is resolved against the
+current working directory of your Jupyter or Python session.
+
+
+
+
+
+
+
+
+
+
Selecting Indices
+
+
Explain in simple terms what idxmin and
+idxmax do in the short program below. When would you use
+these methods?
+
+
PYTHON
+
+
data = pd.read_csv('data/gapminder_gdp_europe.csv', index_col='country')
+print(data.idxmin())
+print(data.idxmax())
+
+
+
+
+
+
+
+
+
+
For each column in data, idxmin will return
+the index value corresponding to each column’s minimum;
+idxmax will do the same for each column’s
+maximum value.
+
You can use these functions whenever you want to get the row label of
+the minimum/maximum value and not the actual minimum/maximum value.
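A minimal sketch on a made-up table (toy, with hypothetical labels and values) shows the difference between the label of an extreme value and the value itself:

```python
import pandas as pd

# Hypothetical toy data.
toy = pd.DataFrame(
    {'gdp_1952': [3.0, 1.0, 2.0], 'gdp_2007': [4.0, 6.0, 5.0]},
    index=['A', 'B', 'C'],
)

lowest_rows = toy.idxmin()    # row label of each column's minimum
highest_rows = toy.idxmax()   # row label of each column's maximum
print(lowest_rows)
print(highest_rows)

# For comparison, min() returns the values rather than the labels.
print(toy.min())
```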
+
+
+
+
+
+
+
+
+
+
Practice with Selection
+
+
Assume Pandas has been imported and the Gapminder GDP data for Europe
+has been loaded. Write an expression to select each of the
+following:
+
+
GDP per capita for all countries in 1982.
+
GDP per capita for Denmark for all years.
+
GDP per capita for all countries for years after 1985.
+
GDP per capita for each country in 2007 as a multiple of GDP per
+capita for that country in 1952.
+
+
+
+
+
+
+
+
+
+
1:
+
+
PYTHON
+
+
data['gdpPercap_1982']
+
+
2:
+
+
PYTHON
+
+
data.loc['Denmark',:]
+
+
3:
+
+
PYTHON
+
+
data.loc[:,'gdpPercap_1985':]
+
+
Pandas does not give you an error, although no column named
+gdpPercap_1985 actually exists. Because the column labels are in sorted
+order, Pandas can start the slice at the position where
+gdpPercap_1985 would fall. Note that the labels are compared as
+strings, not as numbers. This is useful if new
+columns are added to the CSV file later.
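The behaviour can be reproduced on a tiny made-up table. This sketch assumes sorted column labels, as in the Gapminder files; the values are hypothetical.

```python
import pandas as pd

# Hypothetical one-row table with sorted column labels and no 1985 column.
toy = pd.DataFrame(
    [[1.0, 2.0, 3.0]],
    index=['A'],
    columns=['gdpPercap_1982', 'gdpPercap_1987', 'gdpPercap_1992'],
)

# The slice starts where 'gdpPercap_1985' would sort, so 1982 is left out.
later = toy.loc[:, 'gdpPercap_1985':]
print(list(later.columns))
```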
+
4:
+
+
PYTHON
+
+
data['gdpPercap_2007']/data['gdpPercap_1952']
+
+
+
+
+
+
+
+
+
+
+
Many Ways of Access
+
+
There are at least two ways of accessing a value or slice of a
+DataFrame: by name or index. However, there are many others. For
+example, a single column or row can be accessed either as a
+DataFrame or a Series object.
+
Suggest different ways of doing the following operations on a
+DataFrame:
+
+
Access a single column
+
Access a single row
+
Access an individual DataFrame element
+
Access several columns
+
Access several rows
+
Access a subset of specific rows and columns
+
Access a subset of row and column ranges
+
+
+
+
+
+
+
+
+
+
1. Access a single column:
+
+
PYTHON
+
+
# by name
+data["col_name"] # as a Series
+data[["col_name"]] # as a DataFrame
+
+# by name using .loc
+data.T.loc["col_name"] # as a Series
+data.T.loc[["col_name"]].T # as a DataFrame
+
+# Dot notation (Series)
+data.col_name
+
+# by index (iloc)
+data.iloc[:, col_index] # as a Series
+data.iloc[:, [col_index]] # as a DataFrame
+
+# using a mask
+data.T[data.T.index == "col_name"].T
+
+
2. Access a single row:
+
+
PYTHON
+
+
# by name using .loc
+data.loc["row_name"] # as a Series
+data.loc[["row_name"]] # as a DataFrame
+
+# by name
+data.T["row_name"] # as a Series
+data.T[["row_name"]].T # as a DataFrame
+
+# by index
+data.iloc[row_index] # as a Series
+data.iloc[[row_index]] # as a DataFrame
+
+# using mask
+data[data.index == "row_name"]
+
+
3. Access an individual DataFrame element:
+
+
PYTHON
+
+
# by column/row names
+data["col_name"]["row_name"] # as a value
+
+data[["col_name"]].loc["row_name"] # as a Series
+data[["col_name"]].loc[["row_name"]] # as a DataFrame
+
+data.loc["row_name"]["col_name"] # as a value
+data.loc[["row_name"]]["col_name"] # as a Series
+data.loc[["row_name"]][["col_name"]] # as a DataFrame
+
+data.loc["row_name", "col_name"] # as a value
+data.loc[["row_name"], "col_name"] # as a Series. Preserves index. Column name is moved to `.name`.
+data.loc["row_name", ["col_name"]] # as a Series. Index is moved to `.name`. Sets index to column name.
+data.loc[["row_name"], ["col_name"]] # as a DataFrame (preserves original index and column name)
+
+# by column/row names: Dot notation
+data.col_name.row_name
+
+# by column/row indices
+data.iloc[row_index, col_index] # as a value
+data.iloc[[row_index], col_index] # as a Series. Preserves index. Column name is moved to `.name`
+data.iloc[row_index, [col_index]] # as a Series. Index is moved to `.name`. Sets index to column name.
+data.iloc[[row_index], [col_index]] # as a DataFrame (preserves original index and column name)
+
+# column name + row index
+data["col_name"].iloc[row_index] # preferred: explicit positional access
+data["col_name"][row_index] # relies on positional fallback, deprecated in recent Pandas
+data.col_name[row_index] # same caveat as the line above
+
+# column index + row name
+data.iloc[:, [col_index]].loc["row_name"] # as a Series
+data.iloc[:, [col_index]].loc[["row_name"]] # as a DataFrame
+
+# using masks
+data[data.index == "row_name"].T[data.T.index == "col_name"].T
+
+
4. Access several columns:
+
+
PYTHON
+
+
# by name
+data[["col1", "col2", "col3"]]
+data.loc[:, ["col1", "col2", "col3"]]
+
+# by index
+data.iloc[:, [col1_index, col2_index, col3_index]]
+
+
5. Access several rows
+
+
PYTHON
+
+
# by name
+data.loc[["row1", "row2", "row3"]]
+
+# by index
+data.iloc[[row1_index, row2_index, row3_index]]
+
+
6. Access a subset of specific rows and columns
+
+
PYTHON
+
+
# by names
+data.loc[["row1", "row2", "row3"], ["col1", "col2", "col3"]]
+
+# by indices
+data.iloc[[row1_index, row2_index, row3_index], [col1_index, col2_index, col3_index]]
+
+# column names + row indices
+data[["col1", "col2", "col3"]].iloc[[row1_index, row2_index, row3_index]]
+
+# column indices + row names
+data.iloc[:, [col1_index, col2_index, col3_index]].loc[["row1", "row2", "row3"]]
+
+
7. Access a subset of row and column ranges
+
+
PYTHON
+
+
# by name
+data.loc["row1":"row2", "col1":"col2"]
+
+# by index
+data.iloc[row1_index:row2_index, col1_index:col2_index]
+
+# column names + row indices
+data.loc[:, "col1_name":"col2_name"].iloc[row1_index:row2_index]
+
+# column indices + row names
+data.iloc[:, col1_index:col2_index].loc["row1":"row2"]
+
+
+
+
+
+
+
+
+
+
+
Exploring available methods using the
+dir() function
+
+
Python includes a dir() function that can be used to
+display all of the available methods (functions) that are built into a
+data object. In Episode 4, we used some methods with a string. But we
+can see many more are available by using dir():
+
+
PYTHON
+
+
my_string = 'Hello world!'  # creation of a string object
+dir(my_string)
You can use help() or Shift+Tab to
+get more information about what these methods do.
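Because dir() returns an ordinary list of names, it can be filtered like any other list. The sketch below hides the names that start with an underscore; the variable public_names is introduced only for this example.

```python
my_string = 'Hello world!'

# Keep only the names that do not start with an underscore.
public_names = [name for name in dir(my_string) if not name.startswith('_')]

print(len(public_names), 'public names, for example:', public_names[:5])
print('upper' in public_names)
```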
+
Assume Pandas has been imported and the Gapminder GDP data for Europe
+has been loaded as data. Then, use dir() to
+find the function that prints out the median per-capita GDP across all
+European countries for each year that information is available.
+
+
+
+
+
+
+
+
+
Among many choices, dir() lists the
+median() function as a possibility. Thus,
+
+
PYTHON
+
+
data.median()
+
+
+
+
+
+
+
+
+
+
+
Interpretation
+
+
Poland’s borders have been stable since 1945, but changed several
+times in the years before then. How would you handle this if you were
+creating a table of GDP per capita for Poland for the entire twentieth
+century?
+
+
+
+
+
+
+
+
+
Key Points
+
+
+
Use DataFrame.iloc[..., ...] to select values by
+integer location.
+
Use : on its own to mean all columns or all rows.
+
Select multiple columns or rows using DataFrame.loc and
+a named slice.
+
Result of slicing can be used in further operations.
In our Jupyter Notebook example, running the cell should generate the
+figure directly below the code. The figure is also included in the
+Notebook document for future viewing. However, other Python environments
+like an interactive Python session started from a terminal or a Python
+script executed via the command line require an additional command to
+display the figure.
+
Instruct matplotlib to show a figure:
+
+
PYTHON
+
+
plt.show()
+
+
This command can also be used within a Notebook - for instance, to
+display multiple figures if several are created by a single cell.
Before plotting, we convert the column headings from a
+string to integer data type, since they
+represent numerical values, using str.replace()
+to remove the gdpPercap_ prefix and then astype(int)
+to convert the series of string values
+(['1952', '1957', ..., '2007']) to a series of integers:
+[1952, 1957, ..., 2007].
+
+
+
PYTHON
+
+
import pandas as pd
+
+data = pd.read_csv('data/gapminder_gdp_oceania.csv', index_col='country')
+
+# Extract year from last 4 characters of each column name
+# The current column names are structured as 'gdpPercap_(year)',
+# so we want to keep the (year) part only for clarity when plotting GDP vs. years
+# To do this we use replace(), which removes from the string the characters stated in the argument
+# This method works on strings, so we use replace() from Pandas Series.str vectorized string functions
+
+years = data.columns.str.replace('gdpPercap_', '')
+
+# Convert year values to integers, saving results back to dataframe
+
+data.columns = years.astype(int)
+
+data.loc['Australia'].plot()
+
+
Select and transform data, then plot it.
+
+
+
+
By default, DataFrame.plot
+plots with the rows as the X axis.
+
We can transpose the data in order to plot multiple series.
+
+
+
PYTHON
+
+
data.T.plot()
+plt.ylabel('GDP per capita')
+
+
Many styles of plot are available.
+
+
+
+
For example, do a bar plot using a fancier style.
+
+
+
PYTHON
+
+
plt.style.use('ggplot')
+data.T.plot(kind='bar')
+plt.ylabel('GDP per capita')
+
+
Data can also be plotted by calling the matplotlib
+plot function directly.
+
+
+
+
The command is plt.plot(x, y)
+
+
The color and format of markers can also be specified as an
+additional optional argument e.g., b- is a blue line,
+g-- is a green dashed line.
+
Get Australia data from dataframe
+
+
+
+
PYTHON
+
+
years = data.columns
+gdp_australia = data.loc['Australia']
+
+plt.plot(years, gdp_australia, 'g--')
+
+
Can plot many sets of data together.
+
+
+
+
PYTHON
+
+
# Select two countries' worth of data.
+gdp_australia = data.loc['Australia']
+gdp_nz = data.loc['New Zealand']
+
+# Plot with differently-colored markers.
+plt.plot(years, gdp_australia, 'b-', label='Australia')
+plt.plot(years, gdp_nz, 'g-', label='New Zealand')
+
+# Create legend.
+plt.legend(loc='upper left')
+plt.xlabel('Year')
+plt.ylabel('GDP per capita ($)')
+
+
+
+
+
+
+
Adding a Legend
+
+
Often when plotting multiple datasets on the same figure it is
+desirable to have a legend describing the data.
By default matplotlib will attempt to place the legend in a suitable
+position. If you would rather specify a position this can be done with
+the loc= argument, e.g., to place the legend in the upper
+left corner of the plot, specify loc='upper left'.
+
+
+
+
+
Plot a scatter plot correlating the GDP of Australia and New
+Zealand
+
Use either plt.scatter or
+DataFrame.plot.scatter
+
+
+
+
PYTHON
+
+
plt.scatter(gdp_australia, gdp_nz)
+
+
+
PYTHON
+
+
data.T.plot.scatter(x='Australia', y='New Zealand')
+
+
+
+
+
+
+
Minima and Maxima
+
+
Fill in the blanks below to plot the minimum GDP per capita over time
+for all the countries in Europe. Modify it again to plot the maximum GDP
+per capita over time for Europe.
Modify the example in the notes to create a scatter plot showing the
+relationship between the minimum and maximum GDP per capita among the
+countries in Asia for each year in the data set. What relationship do
+you see (if any)?
No particular correlations can be seen between the minimum and
+maximum GDP values year on year. It seems the fortunes of Asian
+countries do not rise and fall together.
+
+
+
+
+
+
+
+
+
+
Correlations (continued)
+
+
+
You might note that the variability in the maximum is much higher
+than that of the minimum. Take a look at the maximum and the max
+indexes:
Seems the variability in this value is due to a sharp drop after
+1972. Some geopolitics at play perhaps? Given the dominance of oil
+producing countries, maybe the Brent crude index would make an
+interesting comparison? Whilst Myanmar consistently has the lowest GDP,
+the highest GDP nation has varied more notably.
+
+
+
+
+
+
+
+
+
+
More Correlations
+
+
This short program creates a plot showing the correlation between GDP
+and life expectancy for 2007, normalizing marker size by population:
Using online help and other resources, explain what each argument to
+plot does.
+
+
+
+
+
+
+
+
+
A good place to look is the documentation for the plot function -
+help(data_all.plot).
+
kind - As seen already this determines the kind of plot to be
+drawn.
+
x and y - A column name or index that determines what data will be
+placed on the x and y axes of the plot
+
s - Details for this can be found in the documentation of
+plt.scatter. A single number or one value for each data point.
+Determines the size of the plotted points.
+
+
+
+
+
+
+
+
+
+
Saving your plot to a file
+
+
If you are satisfied with the plot you see you may want to save it to
+a file, perhaps to include it in a publication. There is a function in
+the matplotlib.pyplot module that accomplishes this: savefig.
+Calling this function, e.g. with
+
+
PYTHON
+
+
plt.savefig('my_figure.png')
+
+
will save the current figure to the file my_figure.png.
+The file format will automatically be deduced from the file name
+extension (other formats are pdf, ps, eps and svg).
+
Note that functions in plt refer to a global figure
+variable and after a figure has been displayed to the screen (e.g. with
+plt.show) matplotlib will make this variable refer to a new
+empty figure. Therefore, make sure you call plt.savefig
+before the plot is displayed to the screen, otherwise you may find a
+file with an empty plot.
+
When using dataframes, data is often generated and plotted to screen
+in one line. In addition to using plt.savefig, we can save
+a reference to the current figure in a local variable (with
+plt.gcf) and call the savefig method
+of that figure object to save the figure to file.
+
+
PYTHON
+
+
data.plot(kind='bar')
+fig = plt.gcf() # get current figure
+fig.savefig('my_figure.png')
+
+
+
+
+
+
+
+
+
+
Making your plots accessible
+
+
Whenever you are generating plots to go into a paper or a
+presentation, there are a few things you can do to make sure that
+everyone can understand your plots.
+
+
Always make sure your text is large enough to read. Use the
+fontsize parameter in xlabel,
+ylabel, title, and legend, and tick_params
+with labelsize to increase the text size of the numbers
+on your axes.
+
Similarly, you should make your graph elements easy to see. Use
+s to increase the size of your scatterplot markers and
+linewidth to increase the sizes of your plot lines.
+
Using color (and nothing else) to distinguish between different plot
+elements will make your plots unreadable to anyone who is colorblind, or
+who happens to have a black-and-white office printer. For lines, the
+linestyle parameter lets you use different types of lines.
+For scatterplots, marker lets you change the shape of your
+points. If you’re unsure about your colors, you can use Coblis
+or Color Oracle to simulate what
+your plots would look like to those with colorblindness.
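The suggestions above can be combined in one plot. This is only a sketch: the data values, labels, and file name are made up, and the off-screen 'Agg' backend is chosen here so the example runs without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use('Agg')  # draw off-screen so no display is needed
import matplotlib.pyplot as plt

years = [1952, 1957, 1962]
series_a = [10.0, 11.0, 12.5]  # made-up values
series_b = [9.0, 10.5, 13.0]   # made-up values

fig, ax = plt.subplots()
# Different line styles and marker shapes, not just different colors.
ax.plot(years, series_a, linestyle='-', marker='o', linewidth=3, label='Country A')
ax.plot(years, series_b, linestyle='--', marker='s', linewidth=3, label='Country B')

# Larger text for labels, title, tick numbers, and legend.
ax.set_xlabel('Year', fontsize=14)
ax.set_ylabel('GDP per capita', fontsize=14)
ax.set_title('Accessible line styles', fontsize=16)
ax.tick_params(labelsize=12)
ax.legend(fontsize=12)

out_path = os.path.join(tempfile.mkdtemp(), 'accessible_plot.png')
fig.savefig(out_path)
plt.close(fig)
```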
+
+
+
+
+
+
+
+
+
+
Key Points
+
+
+
+matplotlib is the
+most widely used scientific plotting library in Python.
print('zeroth item of pressures:', pressures[0])
+print('fourth item of pressures:', pressures[4])
+
+
+
OUTPUT
+
+
zeroth item of pressures: 0.273
+fourth item of pressures: 0.276
+
+
Lists’ values can be replaced by assigning to them.
+
+
+
+
Use an index expression on the left of assignment to replace a
+value.
+
+
+
PYTHON
+
+
pressures[0] = 0.265
+print('pressures is now:', pressures)
+
+
+
OUTPUT
+
+
pressures is now: [0.265, 0.275, 0.277, 0.275, 0.276]
+
+
Appending items to a list lengthens it.
+
+
+
+
Use list_name.append to add items to the end of a
+list.
+
+
+
PYTHON
+
+
primes = [2, 3, 5]
+print('primes is initially:', primes)
+primes.append(7)
+print('primes has become:', primes)
+
+
+
OUTPUT
+
+
primes is initially: [2, 3, 5]
+primes has become: [2, 3, 5, 7]
+
+
+
+append is a method of lists.
+
+
Like a function, but tied to a particular object.
+
+
+
Use object_name.method_name to call methods.
+
+
Deliberately resembles the way we refer to things in a library.
+
+
+
We will meet other methods of lists as we go along.
+
+
Use help(list) for a preview.
+
+
+
+extend is similar to append, but it allows
+you to combine two lists. For example:
+
+
+
PYTHON
+
+
teen_primes = [11, 13, 17, 19]
+middle_aged_primes = [37, 41, 43, 47]
+print('primes is currently:', primes)
+primes.extend(teen_primes)
+print('primes has now become:', primes)
+primes.append(middle_aged_primes)
+print('primes has finally become:', primes)
+
+
+
OUTPUT
+
+
primes is currently: [2, 3, 5, 7]
+primes has now become: [2, 3, 5, 7, 11, 13, 17, 19]
+primes has finally become: [2, 3, 5, 7, 11, 13, 17, 19, [37, 41, 43, 47]]
+
+
Note that while extend maintains the “flat” structure of
+the list, appending a list to a list means the last element in
+primes will itself be a list, not an integer. Lists can
+contain values of any type; therefore, lists of lists are possible.
+
Use del to remove items from a list entirely.
+
+
+
+
We use del list_name[index] to remove an element from a
+list (in the example, 9 is not a prime number) and thus shorten it.
+
+del is not a function or a method, but a statement in
+the language.
+
+
+
PYTHON
+
+
primes = [2, 3, 5, 7, 9]
+print('primes before removing last item:', primes)
+del primes[4]
+print('primes after removing last item:', primes)
+
+
+
OUTPUT
+
+
primes before removing last item: [2, 3, 5, 7, 9]
+primes after removing last item: [2, 3, 5, 7]
+
+
The empty list contains no values.
+
+
+
+
Use [] on its own to represent a list that doesn’t
+contain any values.
+
+
“The zero of lists.”
+
+
+
Helpful as a starting point for collecting values (which we will see
+in the next episode).
+
Lists may contain values of different types.
+
+
+
+
A single list may contain numbers, strings, and anything else.
If start and stop are both non-negative
+integers, how long is the list values[start:stop]?
+
+
+
+
+
+
+
+
+
The list values[start:stop] has up to
+stop - start elements. For example,
+values[1:4] has the 3 elements values[1],
+values[2], and values[3]. Why ‘up to’? As we
+saw in episode 2, if stop
+is greater than the total length of the list values, we
+will still get a list back but it will be shorter than expected.
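The "up to" rule can be checked with a short sketch. The helper slice_length is hypothetical, written only for this example, and assumes start and stop are non-negative.

```python
def slice_length(sequence, start, stop):
    """Length of sequence[start:stop] for non-negative start and stop."""
    size = len(sequence)
    return max(0, min(stop, size) - min(start, size))

values = [10, 20, 30, 40, 50]

# stop is inside the list: exactly stop - start elements.
print(len(values[1:4]), slice_length(values, 1, 4))

# stop is beyond the end: fewer than stop - start elements.
print(len(values[3:10]), slice_length(values, 3, 10))
```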
+
+
+
+
+
+
+
+
+
+
From Strings to Lists and Back
+
+
Given this:
+
+
PYTHON
+
+
print('string to list:', list('tin'))
+print('list to string:', ''.join(['g', 'o', 'l', 'd']))
+
+
+
OUTPUT
+
+
string to list: ['t', 'i', 'n']
+list to string: gold
+
+
+
What does list('some string') do?
+
What does '-'.join(['x', 'y', 'z']) generate?
+
+
+
+
+
+
+
+
+
+
+
+list('some string')
+converts a string into a list containing all of its characters.
+
+join
+returns a string that is the concatenation of each string
+element in the list, with the separator placed between each pair of
+elements. This results in x-y-z. The separator is the string on which
+the join method is called.
+
+
+
+
+
+
+
+
+
+
+
Working With the End
+
+
What does the following program print?
+
+
PYTHON
+
+
element = 'helium'
+print(element[-1])
+
+
+
How does Python interpret a negative index?
+
If a list or string has N elements, what is the most negative index
+that can safely be used with it, and what location does that index
+represent?
+
If values is a list, what does
+del values[-1] do?
+
How can you display all elements but the last one without changing
+values? (Hint: you will need to combine slicing and
+negative indexing.)
+
+
+
+
+
+
+
+
+
+
The program prints m.
+
+
Python interprets a negative index as starting from the end (as
+opposed to starting from the beginning). The last element is
+-1.
+
The most negative index that can safely be used with a list of N elements is
+-N, which represents the first element.
+
+del values[-1] removes the last element from the
+list.
+
values[:-1]
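These answers can be confirmed with a short sketch; the list values below is made up for the example.

```python
element = 'helium'
print(element[-1])             # last character
print(element[-len(element)])  # most negative safe index: the first character

values = [1, 2, 3, 4]
all_but_last = values[:-1]     # a new list; values itself is unchanged
print(all_but_last, values)

del values[-1]                 # this does change values
print(values)
```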
+
+
+
+
+
+
+
+
+
+
+
Stepping Through a List
+
+
What does the following program print?
+
+
PYTHON
+
+
element = 'fluorine'
+print(element[::2])
+print(element[::-1])
+
+
+
If we write a slice as low:high:stride, what does
+stride do?
+
What expression would select all of the even-numbered items from a
+collection?
+
+
+
+
+
+
+
+
+
+
The program prints
+
+
OUTPUT
+
+
furn
+eniroulf
+
+
+
+stride is the step size of the slice.
+
The slice 1::2 selects all even-numbered items from a
+collection: it starts with element 1 (which is the second
+element, since indexing starts at 0), goes on until the end
+(since no end is given), and uses a step size of
+2 (i.e., selects every second element).
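A brief sketch comparing the two starting points makes the effect of the first slice value visible:

```python
element = 'fluorine'

from_first = element[::2]    # positions 0, 2, 4, 6
from_second = element[1::2]  # positions 1, 3, 5, 7
backwards = element[::-1]    # a negative stride walks from the end

print(from_first)
print(from_second)
print(backwards)
```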
+
+
+
+
+
+
+
+
+
+
+
Slice Bounds
+
+
What does the following program print?
+
+
PYTHON
+
+
element = 'lithium'
+print(element[0:20])
+print(element[-1:3])
+
+
+
+
+
+
+
+
+
+
+
OUTPUT
+
+
lithium
+
+
The first statement prints the whole string, since the slice goes
+beyond the total length of the string. The second statement prints an
+empty string, because the slice starts at the last character (index -1)
+and its stop (index 3) lies before that start, so no characters are selected.
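To see why the second slice is empty, it helps to translate the negative index into its positive equivalent. A minimal sketch:

```python
element = 'lithium'
last = len(element) - 1   # the position that index -1 refers to

empty = element[-1:3]
print(repr(empty))                 # nothing lies between position 6 and position 3
print(empty == element[last:3])    # the same slice written with a positive start

# A negative start works fine when the stop comes after it.
middle = element[-4:6]
print(middle)
```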
+
+
+
+
+
+
+
+
+
+
Sort and Sorted
+
+
What do these two programs print? In simple terms, explain the
+difference between sorted(letters) and
+letters.sort().
+
+
PYTHON
+
+
# Program A
+letters = list('gold')
+result = sorted(letters)
+print('letters is', letters, 'and result is', result)
+
+
+
PYTHON
+
+
# Program B
+letters = list('gold')
+result = letters.sort()
+print('letters is', letters, 'and result is', result)
+
+
+
+
+
+
+
+
+
+
Program A prints
+
+
OUTPUT
+
+
letters is ['g', 'o', 'l', 'd'] and result is ['d', 'g', 'l', 'o']
+
+
Program B prints
+
+
OUTPUT
+
+
letters is ['d', 'g', 'l', 'o'] and result is None
+
+
sorted(letters) returns a sorted copy of the list
+letters (the original list letters remains
+unchanged), while letters.sort() sorts the list
+letters in-place and does not return anything.
+
+
+
+
+
+
+
+
+
+
Copying (or Not)
+
+
What do these two programs print? In simple terms, explain the
+difference between new = old and
+new = old[:].
+
+
PYTHON
+
+
# Program A
+old = list('gold')
+new = old  # simple assignment
+new[0] = 'D'
+print('new is', new, 'and old is', old)
+
+
+
PYTHON
+
+
# Program B
+old = list('gold')
+new = old[:]  # assigning a slice
+new[0] = 'D'
+print('new is', new, 'and old is', old)
+
+
+
+
+
+
+
+
+
+
Program A prints
+
+
OUTPUT
+
+
new is ['D', 'o', 'l', 'd'] and old is ['D', 'o', 'l', 'd']
+
+
Program B prints
+
+
OUTPUT
+
+
new is ['D', 'o', 'l', 'd'] and old is ['g', 'o', 'l', 'd']
+
+
new = old makes new a reference to the list
+old; new and old point towards
+the same object.
+
new = old[:] however creates a new list object
+new containing all elements from the list old;
+new and old are different objects.
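The is operator offers a direct way to check whether two names refer to the same object. In this sketch the names alias and copy are chosen only for illustration:

```python
old = list('gold')
alias = old      # another name for the same list
copy = old[:]    # a new list with the same elements

print(alias is old)   # same object
print(copy is old)    # different object
print(copy == old)    # but equal contents

alias[0] = 'D'
print(old, copy)      # old changed through alias; copy did not
```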
+
+
+
+
+
+
+
+
+
+
Key Points
+
+
+
A list stores many values in a single structure.
+
Use an item’s index to fetch it from a list.
+
Lists’ values can be replaced by assigning to them.
+
Appending items to a list lengthens it.
+
Use del to remove items from a list entirely.
+
The empty list contains no values.
+
Lists may contain values of different types.
+
Character strings can be indexed like lists.
+
Character strings are immutable.
+
Indexing beyond the end of the collection is an error.
This error can be fixed by removing the extra spaces at the
+beginning of the second line.
+
Loop variables can be called anything.
+
+
+
+
As with all variables, loop variables are:
+
+
Created on demand.
+
Meaningless: their names can be anything at all.
+
+
+
+
+
PYTHON
+
+
for kitten in [2, 3, 5]:
+    print(kitten)
+
+
The body of a loop can contain many statements.
+
+
+
+
But no loop should be more than a few lines long.
+
Hard for human beings to keep larger chunks of code in mind.
+
+
+
PYTHON
+
+
primes = [2, 3, 5]
+for p in primes:
+    squared = p ** 2
+    cubed = p ** 3
+    print(p, squared, cubed)
+
+
+
OUTPUT
+
+
2 4 8
+3 9 27
+5 25 125
+
+
Use range to iterate over a sequence of numbers.
+
+
+
+
The built-in function range
+produces a sequence of numbers.
+
+
+Not a list: the numbers are produced on demand to make
+looping over large ranges more efficient.
+
+
+
+range(N) is the numbers 0..N-1
+
+
Exactly the legal indices of a list or character string of length
+N
+
+
+
+
+
PYTHON
+
+
print('a range is not a list: range(0, 3)')
+for number in range(0, 3):
+    print(number)
+
+
+
OUTPUT
+
+
a range is not a list: range(0, 3)
+0
+1
+2
+
+
The Accumulator pattern turns many values into one.
+
+
+
+
A common pattern in programs is to:
+
+
Initialize an accumulator variable to zero, the empty
+string, or the empty list.
+
Update the variable with values from a collection.
+
+
+
+
+
PYTHON
+
+
# Sum the first 10 integers.
+total = 0
+for number in range(10):
+    total = total + (number + 1)
+print(total)
+
+
+
OUTPUT
+
+
55
+
+
+
Read total = total + (number + 1) as:
+
+
Add 1 to the current value of the loop variable
+number.
+
Add that to the current value of the accumulator variable
+total.
+
Assign that to total, replacing the current value.
+
+
+
We have to add number + 1 because range
+produces 0..9, not 1..10.
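An alternative sketch, not taken from the lesson, gives range an explicit start so that no adjustment is needed inside the loop. The built-in sum applies the same accumulator pattern in one call.

```python
# range(1, 11) produces 1..10, so the loop variable can be added directly.
total = 0
for number in range(1, 11):
    total = total + number
print(total)

# The built-in sum does the accumulation for us.
builtin_total = sum(range(1, 11))
print(builtin_total)
```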
+
+
+
+
+
+
+
Classifying Errors
+
+
Is an indentation error a syntax error or a runtime error?
+
+
+
+
+
+
+
+
+
An IndentationError is a syntax error. Programs with syntax errors
+cannot be started. A program with a runtime error will start but an
+error will be thrown under certain conditions.
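The distinction can be observed with the built-in compile and exec functions. This is a sketch with two made-up snippets held in strings (bad_indentation and runtime_problem): the first is rejected before any of it runs, the second starts and then fails.

```python
# A loop body that is not indented: rejected before anything runs.
bad_indentation = "for kitten in [2, 3, 5]:\nprint(kitten)\n"
try:
    compile(bad_indentation, '<example>', 'exec')
except SyntaxError as error:
    syntax_error_name = type(error).__name__
print(syntax_error_name)
print(issubclass(IndentationError, SyntaxError))

# Valid syntax: the first line runs, then the division fails at runtime.
runtime_problem = "print('starting')\nprint(1 / 0)\n"
code = compile(runtime_problem, '<example>', 'exec')
try:
    exec(code)
except ZeroDivisionError as error:
    runtime_error_name = type(error).__name__
print(runtime_error_name)
```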
+
+
+
+
+
+
+
+
+
+
Tracing Execution
+
+
Create a table showing the numbers of the lines that are executed
+when this program runs, and the values of the variables after each line
+is executed.
+
+
PYTHON
+
+
total = 0
+for char in "tin":
+    total = total + 1
+
+
+
+
+
+
+
+
+
+
+
+
Line no | Variables
+--------|----------------------
+1       | total = 0
+2       | total = 0 char = ‘t’
+3       | total = 1 char = ‘t’
+2       | total = 1 char = ‘i’
+3       | total = 2 char = ‘i’
+2       | total = 2 char = ‘n’
+3       | total = 3 char = ‘n’
+
+
+
+
+
+
+
+
+
+
+
+
+
Reversing a String
+
+
Fill in the blanks in the program below so that it prints “nit” (the
+reverse of the original character string “tin”).
+
+
PYTHON
+
+
original = "tin"
+result = ____
+for char in original:
+ result = ____
+print(result)
+
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
original = "tin"
+result = ""
+for char in original:
+    result = char + result
+print(result)
+
+
+
+
+
+
+
+
+
+
+
Practice Accumulating
+
+
Fill in the blanks in each of the programs below to produce the
+indicated result.
+
+
PYTHON
+
+
# Total length of the strings in the list: ["red", "green", "blue"] => 12
+total = 0
+for word in ["red", "green", "blue"]:
+    ____ = ____ + len(word)
+print(total)
+
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
total = 0
+for word in ["red", "green", "blue"]:
+    total = total + len(word)
+print(total)
+
+
+
+
+
+
+
+
+
+
+
Practice Accumulating
+(continued)
+
+
+
+
PYTHON
+
+
# List of word lengths: ["red", "green", "blue"] => [3, 5, 4]
+lengths = ____
+for word in ["red", "green", "blue"]:
+    lengths.____(____)
+print(lengths)
+
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
lengths = []
+for word in ["red", "green", "blue"]:
+    lengths.append(len(word))
+print(lengths)
words = ["red", "green", "blue"]
+result = ""
+for word in words:
+    result = result + word
+print(result)
+
+
+
+
+
+
+
+
+
+
+
Practice Accumulating
+(continued)
+
+
+
Create an acronym: Starting from the list
+["red", "green", "blue"], create the acronym
+"RGB" using a for loop.
+
Hint: You may need to use a string method to
+properly format the acronym.
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
acronym = ""
+for word in ["red", "green", "blue"]:
+    acronym = acronym + word[0].upper()
+print(acronym)
+
+
+
+
+
+
+
+
+
+
+
Cumulative Sum
+
+
Reorder and properly indent the lines of code below so that they
+print a list with the cumulative sum of data. The result should be
+[1, 3, 5, 10].
+
+
PYTHON
+
+
cumulative.append(total)
+for number in data:
+cumulative = []
+total = total + number
+total = 0
+print(cumulative)
+data = [1, 2, 2, 5]
+
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
total = 0
+data = [1, 2, 2, 5]
+cumulative = []
+for number in data:
+    total = total + number
+    cumulative.append(total)
+print(cumulative)
+
+
+
+
+
+
+
+
+
+
+
Identifying Variable Name Errors
+
+
+
Read the code below and try to identify what the errors are
+without running it.
+
Run the code and read the error message. What type of
+NameError do you think this is? Is it a string with no
+quotes, a misspelled variable, or a variable that should have been
+defined but was not?
+
Fix the error.
+
Repeat steps 2 and 3, until you have fixed all the errors.
+
+
+
PYTHON
+
+
for number in range(10):
+    # use a if the number is a multiple of 3, otherwise use b
+    if (Number % 3) == 0:
+        message = message + a
+    else:
+        message = message + "b"
+print(message)
+
+
+
+
+
+
+
+
+
+
+
Python variable names are case sensitive: number and
+Number refer to different variables.
+
The variable message needs to be initialized as an
+empty string.
+
We want to add the string "a" to message,
+not the undefined variable a.
+
+
+
PYTHON
+
+
message = ""
+for number in range(10):
+    # use a if the number is a multiple of 3, otherwise use b
+    if (number % 3) == 0:
+        message = message + "a"
+    else:
+        message = message + "b"
+print(message)
+
+
+
+
+
+
+
+
+
+
+
Identifying Item Errors
+
+
+
Read the code below and try to identify what the errors are
+without running it.
+
Run the code, and read the error message. What type of error is
+it?
+
Fix the error.
+
+
+
PYTHON
+
+
seasons = ['Spring', 'Summer', 'Fall', 'Winter']
+print('My favorite season is ', seasons[4])
+
+
+
+
+
+
+
+
+
+
This list has 4 elements and the index to access the last element in
+the list is 3.
+
+
PYTHON
+
+
seasons = ['Spring', 'Summer', 'Fall', 'Winter']
+print('My favorite season is ', seasons[3])
+
+
+
+
+
+
+
+
+
+
+
Key Points
+
+
+
A for loop executes commands once for each value in a
+collection.
+
A for loop is made up of a collection, a loop variable,
+and a body.
+
The first line of the for loop must end with a colon,
+and the body must be indented.
+
Indentation is always meaningful in Python.
+
Loop variables can be called anything (but it is strongly advised to
+have a meaningful name to the looping variable).
+
The body of a loop can contain many statements.
+
Use range to iterate over a sequence of numbers.
+
The Accumulator pattern turns many values into one.
Often use conditionals in a loop to “evolve” the values of
+variables.
+
+
+
PYTHON
+
+
velocity = 10.0
+for i in range(5): # execute the loop 5 times
+    print(i, ':', velocity)
+    if velocity > 20.0:
+        print('moving too fast')
+        velocity = velocity - 5.0
+    else:
+        print('moving too slow')
+        velocity = velocity + 10.0
+print('final velocity:', velocity)
+
+
+
OUTPUT
+
+
0 : 10.0
+moving too slow
+1 : 20.0
+moving too slow
+2 : 30.0
+moving too fast
+3 : 25.0
+moving too fast
+4 : 20.0
+moving too slow
+final velocity: 30.0
+
+
Create a table showing variables’ values to trace a program’s
+execution.
+
+
+
+
+
+| i        | 0    | .    | 1 | .    | 2 | .    | 3 | .    | 4 | .    |
+| velocity | 10.0 | 20.0 | . | 30.0 | . | 25.0 | . | 20.0 | . | 30.0 |
+
+
+
+
+
The program must have a print statement
+outside the body of the loop to show the final value of
+velocity, since its value is updated by the last iteration
+of the loop.
+
+
+
+
+
+
+
Compound Relations Using and,
+or, and Parentheses
+
+
Often, you want some combination of things to be true. You can
+combine relations within a conditional using and and
+or. Continuing the example above, suppose you have
+
+
PYTHON
+
+
mass = [ 3.54, 2.07, 9.22, 1.86, 1.71]
+velocity = [10.00, 20.00, 30.00, 25.00, 20.00]
+
+i = 0
+for i in range(5):
+    if mass[i] > 5 and velocity[i] > 20:
+        print("Fast heavy object. Duck!")
+    elif mass[i] > 2 and mass[i] <= 5 and velocity[i] <= 20:
+        print("Normal traffic")
+    elif mass[i] <= 2 and velocity[i] <= 20:
+        print("Slow light object. Ignore it")
+    else:
+        print("Whoa! Something is up with the data. Check it")
+
+
Just like with arithmetic, you can and should use parentheses
+whenever there is possible ambiguity. A good general rule is to
+always use parentheses when mixing and and
+or in the same condition. That is, instead of:
+
+
PYTHON
+
+
if mass[i] <= 2 or mass[i] >= 5 and velocity[i] > 20:
+
+
write one of these:
+
+
PYTHON
+
+
if (mass[i] <= 2 or mass[i] >= 5) and velocity[i] > 20:
+if mass[i] <= 2 or (mass[i] >= 5 and velocity[i] > 20):
+
+
so it is perfectly clear to a reader (and to Python) what you really
+mean.
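
To see why the grouping matters, here is a small sketch with hypothetical single values for mass and velocity (rather than the lists above). In Python, and binds more tightly than or, so the unparenthesized condition behaves like the second of the two parenthesized forms.

```python
mass = 1.5       # hypothetical light object
velocity = 10.0  # hypothetical slow object

# Without parentheses, `and` is evaluated before `or`, so this reads as
# mass <= 2 or (mass >= 5 and velocity > 20).
unparenthesized = mass <= 2 or mass >= 5 and velocity > 20

grouped_or_first = (mass <= 2 or mass >= 5) and velocity > 20
grouped_and_first = mass <= 2 or (mass >= 5 and velocity > 20)

print(unparenthesized)    # True
print(grouped_or_first)   # False
print(grouped_and_first)  # True
```

The two parenthesized forms give different answers for the same data, which is exactly the ambiguity the parentheses remove.
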
Fill in the blanks so that this program creates a new list containing
+zeroes where the original list’s values were negative and ones where the
+original list’s values were positive.
+
+
PYTHON
+
+
original = [-1.5, 0.2, 0.4, 0.0, -1.3, 0.4]
+result = ____
+for value in original:
+    if ____:
+        result.append(0)
+    else:
+        ____
+print(result)
+
+
+
OUTPUT
+
+
[0, 1, 1, 1, 0, 1]
+
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
original = [-1.5, 0.2, 0.4, 0.0, -1.3, 0.4]
+result = []
+for value in original:
+    if value < 0.0:
+        result.append(0)
+    else:
+        result.append(1)
+print(result)
+
+
+
+
+
+
+
+
+
+
+
Processing Small Files
+
+
Modify this program so that it only processes files with fewer than
+50 records.
+
+
PYTHON
+
+
import glob
+import pandas as pd
+for filename in glob.glob('data/*.csv'):
+    contents = pd.read_csv(filename)
+    ____:
+        print(filename, len(contents))
+
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
import glob
+import pandas as pd
+for filename in glob.glob('data/*.csv'):
+    contents = pd.read_csv(filename)
+    if len(contents) < 50:
+        print(filename, len(contents))
+
+
+
+
+
+
+
+
+
+
+
Initializing
+
+
Modify this program so that it finds the largest and smallest values
+in the list no matter what the range of values originally is.
+
+
PYTHON
+
+
values = [...some test data...]
+smallest, largest = None, None
+for v in values:
+    if ____:
+        smallest, largest = v, v
+    ____:
+        smallest = min(____, v)
+        largest = max(____, v)
+print(smallest, largest)
+
+
What are the advantages and disadvantages of using this method to
+find the range of the data?
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
values = [-2, 1, 65, 78, -54, -24, 100]
+smallest, largest = None, None
+for v in values:
+    if smallest is None and largest is None:
+        smallest, largest = v, v
+    else:
+        smallest = min(smallest, v)
+        largest = max(largest, v)
+print(smallest, largest)
+
+
If you wrote == None instead of is None,
+that works too, but Python programmers always write is None
+because of the special way None works in the language.
+
It can be argued that an advantage of using this method would be to
+make the code more readable. However, a disadvantage is that this code
+is not efficient because within each iteration of the for
+loop statement, there are two more loops that run over two numbers each
+(the min and max functions). It would be more
+efficient to iterate over each number just once:
+
+
PYTHON
+
+
values = [-2, 1, 65, 78, -54, -24, 100]
+smallest, largest = None, None
+for v in values:
+    if smallest is None or v < smallest:
+        smallest = v
+    if largest is None or v > largest:
+        largest = v
+print(smallest, largest)
+
+
Now we have one loop, but four comparison tests. There are two ways
+we could improve it further: either use fewer comparisons in each
+iteration, or use two loops that each contain only one comparison test.
+The simplest solution is often the best:
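
One plausible reading of "the simplest solution", offered as a sketch rather than as the lesson's own answer, is to let the built-in min and max functions each make one pass over the list:

```python
values = [-2, 1, 65, 78, -54, -24, 100]

# Each built-in loops over the list once, with one comparison per item.
smallest = min(values)
largest = max(values)
print(smallest, largest)
```

Note that min and max raise a ValueError for an empty list, so this form assumes there is at least one value.
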
Use glob.glob
+to find sets of files whose names match a pattern.
+
+
+
+
In Unix, the term “globbing” means “matching a set of files with a
+pattern”.
+
The most common patterns are:
+
+
+* meaning “match zero or more characters”
+
+? meaning “match exactly one character”
+
+
+
Python’s standard library contains the glob
+module to provide pattern matching functionality
+
The glob
+module contains a function also called glob to match file
+patterns
+
E.g., glob.glob('*.txt') matches all files in the
+current directory whose names end with .txt.
+
Result is a (possibly empty) list of character strings.
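
To try the two wildcards without touching any files, the sketch below applies them to a list of made-up file names using the standard library's fnmatch module, which follows the same shell-style wildcard rules on plain strings. The file names are hypothetical.

```python
import fnmatch

# Hypothetical file names; no files need to exist for this demonstration.
names = ["data.txt", "data1.txt", "data12.txt", "notes.md"]

# '*' matches zero or more characters.
star_matches = fnmatch.filter(names, "data*.txt")

# '?' matches exactly one character.
question_matches = fnmatch.filter(names, "data?.txt")

print(star_matches)      # ['data.txt', 'data1.txt', 'data12.txt']
print(question_matches)  # ['data1.txt']
```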
+
+
+
PYTHON
+
+
import glob
+print('all csv files in data directory:', glob.glob('data/*.csv'))
+
+
+
OUTPUT
+
+
all csv files in data directory: ['data/gapminder_all.csv', 'data/gapminder_gdp_africa.csv', \
+'data/gapminder_gdp_americas.csv', 'data/gapminder_gdp_asia.csv', 'data/gapminder_gdp_europe.csv', \
+'data/gapminder_gdp_oceania.csv']
+
+
+
PYTHON
+
+
print('all PDB files:', glob.glob('*.pdb'))
+
+
+
OUTPUT
+
+
all PDB files: []
+
+
Use glob and for to process batches of
+files.
+
+
+
+
Helps a lot if the files are named and stored systematically and
+consistently so that simple patterns will find the right data.
+
+
+
PYTHON
+
+
import pandas as pd
+for filename in glob.glob('data/gapminder_*.csv'):
+    data = pd.read_csv(filename)
+    print(filename, data['gdpPercap_1952'].min())
You might have chosen to initialize the fewest variable
+with a number greater than the numbers you’re dealing with, but that
+could lead to trouble if you reuse the code with bigger numbers. Python
+lets you use positive infinity, which will work no matter how big your
+numbers are. What other special strings does the float
+function recognize?
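
A minimal sketch of the positive-infinity idea, assuming a hypothetical list of record counts (the list and its values are made up for illustration):

```python
counts = [120, 45, 87, 300]   # hypothetical record counts

fewest = float('inf')         # larger than any finite number
for count in counts:
    if count < fewest:
        fewest = count
print(fewest)
```

Because every finite number is smaller than float('inf'), the first comparison always succeeds, whatever the size of the data.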
+
+
+
+
+
+
+
+
+
+
Comparing Data
+
+
Write a program that reads in the regional data sets and plots the
+average GDP per capita for each region over time in a single chart.
+Pandas will raise an error if it encounters non-numeric columns in a
+dataframe computation so you may need to either filter out those columns
+or tell pandas to ignore them.
+
+
+
+
+
+
+
+
+
This solution builds a useful legend by using the string
+split method to extract the region from
+the path ‘data/gapminder_gdp_a_specific_region.csv’.
+
+
PYTHON
+
+
import glob
+import pandas as pd
+import matplotlib.pyplot as plt
+fig, ax = plt.subplots(1, 1)
+for filename in glob.glob('data/gapminder_gdp*.csv'):
+    dataframe = pd.read_csv(filename)
+    # extract <region> from the filename, expected to be in the format 'data/gapminder_gdp_<region>.csv'.
+    # we will split the string using the split method and `_` as our separator,
+    # retrieve the last string in the list that split returns (`<region>.csv`),
+    # and then remove the `.csv` extension from that string.
+    # NOTE: the pathlib module covered in the next callout also offers
+    # convenient abstractions for working with filesystem paths and could solve this as well:
+    # from pathlib import Path
+    # region = Path(filename).stem.split('_')[-1]
+    region = filename.split('_')[-1][:-4]
+    # pandas raises errors when it encounters non-numeric columns in a dataframe computation
+    # but we can tell pandas to ignore them with the `numeric_only` parameter
+    dataframe.mean(numeric_only=True).plot(ax=ax, label=region)
+    # NOTE: another way of doing this selects just the columns with gdp in their name using the filter method
+    # dataframe.filter(like="gdp").mean().plot(ax=ax, label=region)
+
+plt.legend()
+plt.show()
+
+
+
+
+
+
+
+
+
+
+
Dealing with File Paths
+
+
The pathlib
+module provides useful abstractions for file and path manipulation
+like returning the name of a file without the file extension. This is
+very useful when looping over files and directories. In the example
+below, we create a Path object and inspect its
+attributes.
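
A minimal sketch of such an inspection, assuming one of the regional file paths used earlier; the file does not have to exist for these attributes to work.

```python
from pathlib import Path

# A Path is built from a string; nothing is read from disk here.
p = Path("data/gapminder_gdp_africa.csv")

print(p.parent)  # directory part of the path
print(p.name)    # file name with extension: gapminder_gdp_africa.csv
print(p.stem)    # file name without extension: gapminder_gdp_africa
print(p.suffix)  # extension: .csv
```
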
A common refrain in software engineering is “Don’t Repeat Yourself”.
+How do the techniques we’ve learned in the last lessons help us avoid
+repeating ourselves? Note that in practice there is some nuance to
+this and should be balanced with doing the simplest thing that could
+possibly work.
+
+
What are the pros / cons of making a variable global or local to a
+function?
+
When would you consider turning a block of code into a function
+definition?
Explain and identify the difference between function definition and
+function call.
+
Write a function that takes a small, fixed number of arguments and
+produces a single result.
+
+
+
+
+
+
+
Break programs down into functions to make them easier to
+understand.
+
+
+
+
Human beings can only keep a few items in working memory at a
+time.
+
Understand larger/more complicated ideas by understanding and
+combining pieces.
+
+
Components in a machine.
+
Lemmas when proving theorems.
+
+
+
Functions serve the same purpose in programs.
+
+
+Encapsulate complexity so that we can treat it as a single
+“thing”.
+
+
+
Also enables re-use.
+
+
Write one time, use many times.
+
+
+
Define a function using def with a name, parameters,
+and a block of code.
+
+
+
+
Begin the definition of a new function with def.
+
Followed by the name of the function.
+
+
Must obey the same rules as variable names.
+
+
+
Then parameters in parentheses.
+
+
Empty parentheses if the function doesn’t take any inputs.
+
We will discuss this in detail in a moment.
+
+
+
Then a colon.
+
Then an indented block of code.
+
+
+
PYTHON
+
+
def print_greeting():
+    print('Hello!')
+    print('The weather is nice today.')
+    print('Right?')
+
+
Defining a function does not run it.
+
+
+
+
Defining a function does not run it.
+
+
Like assigning a value to a variable.
+
+
+
Must call the function to execute the code it contains.
+
+
+
PYTHON
+
+
print_greeting()
+
+
+
OUTPUT
+
+
Hello!
+The weather is nice today.
+Right?
+
+
Arguments in a function call are matched to its defined
+parameters.
+
+
+
+
Functions are most useful when they can operate on different
+data.
+
Specify parameters when defining a function.
+
+
These become variables when the function is executed.
+
Are assigned the arguments in the call (i.e., the values passed to
+the function).
+
If you don’t name the arguments when using them in the call, the
+arguments will be matched to parameters in the order the parameters are
+defined in the function.
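
A short sketch of positional matching, using the print_date function that also appears in the exercises of this lesson:

```python
def print_date(year, month, day):
    joined = str(year) + '/' + str(month) + '/' + str(day)
    print(joined)

# Positional call: 1871 -> year, 3 -> month, 19 -> day.
print_date(1871, 3, 19)
```

The call prints 1871/3/19 because the arguments are assigned to year, month, and day in that order.
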
Or, we can name the arguments when we call the function, which allows
+us to specify them in any order and adds clarity to the call site;
+otherwise as one is reading the code they might forget if the second
+argument is the month or the day for example.
+
+
PYTHON
+
+
print_date(month=3, day=19, year=1871)
+
+
+
OUTPUT
+
+
1871/3/19
+
+
+
Via Twitter:
+() contains the ingredients for the function while the body
+contains the recipe.
+
Functions may return a result to their caller using
+return.
+
+
+
+
Use return ... to give a value back to the caller.
+
May occur anywhere in the function.
+
But functions are easier to understand if return
+occurs:
+
A function that doesn’t explicitly return a value
+automatically returns None.
+
+
+
PYTHON
+
+
result = print_date(1871, 3, 19)
+print('result of call is:', result)
+
+
+
OUTPUT
+
+
1871/3/19
+result of call is: None
+
+
+
+
+
+
+
Identifying Syntax Errors
+
+
+
Read the code below and try to identify what the errors are
+without running it.
+
Run the code and read the error message. Is it a
+SyntaxError or an IndentationError?
+
Fix the error.
+
Repeat steps 2 and 3 until you have fixed all the errors.
+
+
+
PYTHON
+
+
def another_function
+  print("Syntax errors are annoying.")
+   print("But at least python tells us about them!")
+  print("So they are usually not too hard to fix.")
+
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
def another_function():
+    print("Syntax errors are annoying.")
+    print("But at least Python tells us about them!")
+    print("So they are usually not too hard to fix.")
A function call always needs parentheses; without them you get the
+function object itself (displayed with its memory address) rather than a
+call. So, if we wanted to call the function named report and give it the
+value 22.5 to report on, we could write the function call as follows:
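
A minimal sketch of such a call. The body of report below is hypothetical, since its definition is not shown here; only the name report and the value 22.5 come from the text.

```python
def report(pressure):
    # Hypothetical body, for illustration only.
    print('pressure is', pressure)

report(22.5)          # parentheses: the function runs
not_a_call = report   # no parentheses: just another name for the function object
print(callable(not_a_call))
```
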
After fixing the problem above, explain why running this example
+code:
+
+
+
PYTHON
+
+
result = print_time(11, 37, 59)
+print('result of call is:', result)
+
+
gives this output:
+
+
OUTPUT
+
+
11:37:59
+result of call is: None
+
+
+
Why is the result of the call None?
+
+
+
+
+
+
+
+
+
+
+
The problem with the example is that the function
+print_time() is defined after the call to the
+function is made. Python doesn’t know how to resolve the name
+print_time since it hasn’t been defined yet and will raise
+a NameError e.g.,
+NameError: name 'print_time' is not defined
+
The first line of output 11:37:59 is printed by the
+first line of code, result = print_time(11, 37, 59) that
+binds the value returned by invoking print_time to the
+variable result. The second line is from the second print
+call to print the contents of the result variable.
+
print_time() does not explicitly return
+a value, so it automatically returns None.
+
+
+
+
+
+
+
+
+
+
+
Encapsulation
+
+
Fill in the blanks to create a function that takes a single filename
+as an argument, loads the data in the file named by the argument, and
+returns the minimum value in that data.
+
+
PYTHON
+
+
import pandas as pd
+
+def min_in_data(____):
+    data = ____
+    return ____
+
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
import pandas as pd
+
+def min_in_data(filename):
+    data = pd.read_csv(filename)
+    return data.min()
+
+
+
+
+
+
+
+
+
+
+
Find the First
+
+
Fill in the blanks to create a function that takes a list of numbers
+as an argument and returns the first negative value in the list. What
+does your function do if the list is empty? What if the list has no
+negative numbers?
+
+
PYTHON
+
+
def first_negative(values):
+    for v in ____:
+        if ____:
+            return ____
+
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
def first_negative(values):
+    for v in values:
+        if v < 0:
+            return v
+
+
If an empty list or a list with no negative values is passed to this
+function, it returns None:
+
+
PYTHON
+
+
my_list = []
+print(first_negative(my_list))
+
+
+
OUTPUT
+
+
None
+
+
+
+
+
+
+
+
+
+
+
Calling by Name
+
+
Earlier we saw this function:
+
+
PYTHON
+
+
def print_date(year, month, day):
+    joined = str(year) + '/' + str(month) + '/' + str(day)
+    print(joined)
+
+
We saw that we can call the function using named arguments,
+like this:
+
+
PYTHON
+
+
print_date(day=1, month=2, year=2003)
+
+
+
What does print_date(day=1, month=2, year=2003)
+print?
+
When have you seen a function call like this before?
+
When and why is it useful to call functions this way?
+
+
+
+
+
+
+
+
+
+
+
2003/2/1
+
We saw examples of using named arguments when working with
+the pandas library. For example, when reading in a dataset using
+data = pd.read_csv('data/gapminder_gdp_europe.csv', index_col='country'),
+the last argument index_col is a named argument.
+
Using named arguments can make code more readable since one can see
+from the function call what name the different arguments have inside the
+function. It can also reduce the chances of passing arguments in the
+wrong order, since by using named arguments the order doesn’t
+matter.
+
+
+
+
+
+
+
+
+
+
+
Encapsulation of an If/Print Block
+
+
The code below will run on a label-printer for chicken eggs. A
+digital scale will report a chicken egg mass (in grams) to the computer
+and then the computer will print a label.
+
+
PYTHON
+
+
import random
+for i in range(10):
+
+    # simulating the mass of a chicken egg
+    # the (random) mass will be 70 +/- 20 grams
+    mass = 70 + 20.0 * (2.0 * random.random() - 1.0)
+
+    print(mass)
+
+    # egg sizing machinery prints a label
+    if mass >= 85:
+        print("jumbo")
+    elif mass >= 70:
+        print("large")
+    elif mass < 70 and mass >= 55:
+        print("medium")
+    else:
+        print("small")
+
+
The if-block that classifies the eggs might be useful in other
+situations, so to avoid repeating it, we could fold it into a function,
+get_egg_label(). Revising the program to use the function
+would give us this:
+
+
PYTHON
+
+
# revised version
+import random
+for i in range(10):
+
+    # simulating the mass of a chicken egg
+    # the (random) mass will be 70 +/- 20 grams
+    mass = 70 + 20.0 * (2.0 * random.random() - 1.0)
+
+    print(mass, get_egg_label(mass))
+
+
+
Create a function definition for get_egg_label() that
+will work with the revised program above. Note that the
+get_egg_label() function’s return value will be important.
+Sample output from the above program would be
+71.23 large.
+
A dirty egg might have a mass of more than 90 grams, and a spoiled
+or broken egg will probably have a mass that’s less than 50 grams.
+Modify your get_egg_label() function to account for these
+error conditions. Sample output could be
+25 too light, probably spoiled.
+
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
def get_egg_label(mass):
+    # egg sizing machinery prints a label
+    egg_label = "Unlabelled"
+    if mass >= 90:
+        egg_label = "warning: egg might be dirty"
+    elif mass >= 85:
+        egg_label = "jumbo"
+    elif mass >= 70:
+        egg_label = "large"
+    elif mass < 70 and mass >= 55:
+        egg_label = "medium"
+    elif mass < 50:
+        egg_label = "too light, probably spoiled"
+    else:
+        egg_label = "small"
+    return egg_label
How would you generalize this function if you did not know
+beforehand which specific years occurred as columns in the data? For
+instance, what if we also had data from years ending in 1 and 9 for each
+decade? (Hint: use the columns to filter out the ones that correspond to
+the decade, instead of enumerating them in the code.)
+
+
+
+
+
+
+
+
+
+
+
The average GDP for Japan across the years reported for the 1980s is
+computed with:
To obtain the average for the relevant years, we need to loop over
+them:
+
+
+
PYTHON
+
+
def avg_gdp_in_decade(country, continent, year):
+    data_countries = pd.read_csv('data/gapminder_gdp_' + continent + '.csv', index_col=0)
+    c = data_countries.loc[country]
+    gdp_decade = 'gdpPercap_' + str(year // 10)
+    total = 0.0
+    num_years = 0
+    for yr_header in c.index: # c's index contains reported years
+        if yr_header.startswith(gdp_decade):
+            total = total + c.loc[yr_header]
+            num_years = num_years + 1
+    return total/num_years
+
+
The function can now be called by:
+
+
PYTHON
+
+
avg_gdp_in_decade('Japan','asia',1983)
+
+
+
OUTPUT
+
+
20880.023800000003
+
+
+
+
+
+
+
+
+
+
+
Simulating a dynamical system
+
+
In mathematics, a dynamical
+system is a system in which a function describes the time dependence
+of a point in a geometrical space. A canonical example of a dynamical
+system is the logistic map, a
+growth model that computes a new population density (between 0 and 1)
+based on the current density. In the model, time takes discrete values
+0, 1, 2, …
+
+
Define a function called logistic_map that takes two
+inputs: x, representing the current population (at time
+t), and a parameter r = 1. This function
+should return a value representing the state of the system (population)
+at time t + 1, using the mapping function:
+
+
f(t+1) = r * f(t) * [1 - f(t)]
+
+
Using a for or while loop, iterate the
+logistic_map function defined in part 1, starting from an
+initial population of 0.5, for a period of time
+t_final = 10. Store the intermediate results in a list so
+that after the loop terminates you have accumulated a sequence of values
+representing the state of the logistic map at times
+t = [0,1,...,t_final] (11 values in total). Print this list
+to see the evolution of the population.
+
Encapsulate the logic of your loop into a function called
+iterate that takes the initial population as its first
+input, the parameter t_final as its second input and the
+parameter r as its third input. The function should return
+the list of values representing the state of the logistic map at times
+t = [0,1,...,t_final]. Run this function for periods
+t_final = 100 and 1000 and print some of the
+values. Is the population trending toward a steady state?
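
One possible sketch of the three parts, under the assumption that r is passed explicitly on every call; it is an illustration, not the only valid answer.

```python
def logistic_map(x, r):
    # One step of the map: f(t+1) = r * f(t) * [1 - f(t)]
    return r * x * (1 - x)

def iterate(initial_population, t_final, r):
    # Collect the states at t = 0, 1, ..., t_final (t_final + 1 values).
    population = [initial_population]
    for t in range(t_final):
        population.append(logistic_map(population[t], r))
    return population

trajectory = iterate(0.5, 10, 1)
print(trajectory)
```

With r = 1 and an initial population of 0.5, the list has 11 values and begins 0.5, 0.25, 0.1875.
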
Functions will often contain conditionals. Here is a short example
+that will indicate which quartile the argument is in based on hand-coded
+values for the quartile cut points.
+
+
PYTHON
+
+
def calculate_life_quartile(exp):
+    if exp < 58.41:
+        # This observation is in the first quartile
+        return 1
+    elif exp >= 58.41 and exp < 67.05:
+        # This observation is in the second quartile
+        return 2
+    elif exp >= 67.05 and exp < 71.70:
+        # This observation is in the third quartile
+        return 3
+    elif exp >= 71.70:
+        # This observation is in the fourth quartile
+        return 4
+    else:
+        # This observation has bad data
+        return None
+
+calculate_life_quartile(62.5)
+
+
+
OUTPUT
+
+
2
+
+
That function would typically be used within a for loop,
+but Pandas has a different, more efficient way of doing the same thing,
+and that is by applying a function to a dataframe or a portion
+of a dataframe. Here is an example, using the definition above.
+
+
PYTHON
+
+
data = pd.read_csv('data/gapminder_all.csv')
+data['life_qrtl'] = data['lifeExp_1952'].apply(calculate_life_quartile)
+
+
There is a lot in that second line, so let’s take it piece by piece.
+On the right side of the = we start with
+data['lifeExp_1952'], which is the column in the dataframe
+called data labeled lifeExp_1952. We use
+apply() to do what it says: apply
+calculate_life_quartile to the value of this column for
+every row in the dataframe.
+
+
+
+
+
+
+
+
+
Key Points
+
+
+
Break programs down into functions to make them easier to
+understand.
+
Define a function using def with a name, parameters,
+and a block of code.
+
Defining a function does not run it.
+
Arguments in a function call are matched to its defined
+parameters.
+
Functions may return a result to their caller using
+return.
Read a traceback and determine the file, function, and line number
+on which the error occurred, the type of error, and the error
+message.
+
+
+
+
+
+
+
The scope of a variable is the part of a program that can ‘see’ that
+variable.
+
+
+
+
There are only so many sensible names for variables.
+
People using functions shouldn’t have to worry about what variable
+names the author of the function used.
+
People writing functions shouldn’t have to worry about what variable
+names the function’s caller uses.
+
The part of a program in which a variable is visible is called its
+scope.
+
+
+
PYTHON
+
+
pressure = 103.9
+
+def adjust(t):
+    temperature = t * 1.43 / pressure
+    return temperature
+
+
+
+pressure is a global variable.
+
+
Defined outside any particular function.
+
Visible everywhere.
+
+
+
+t and temperature are local
+variables in adjust.
+
+
Defined in the function.
+
Not visible in the main program.
+
Remember: a function parameter is a variable that is automatically
+assigned a value when the function is called.
+
+
+
+
+
PYTHON
+
+
print('adjusted:', adjust(0.9))
+print('temperature after call:', temperature)
+
+
+
OUTPUT
+
+
adjusted: 0.01238691049085659
+
+
+
ERROR
+
+
Traceback (most recent call last):
+ File "/Users/swcarpentry/foo.py", line 8, in <module>
+ print('temperature after call:', temperature)
+NameError: name 'temperature' is not defined
+
+
+
+
+
+
+
Local and Global Variable Use
+
+
Trace the values of all variables in this program as it is executed.
+(Use ‘—’ as the value of variables before and after they exist.)
Read the traceback below, and identify the following:
+
+
How many levels does the traceback have?
+
What is the file name where the error occurred?
+
What is the function name where the error occurred?
+
On which line number in this function did the error occur?
+
What is the type of error?
+
What is the error message?
+
+
+
ERROR
+
+
---------------------------------------------------------------------------
+KeyError Traceback (most recent call last)
+<ipython-input-2-e4c4cbafeeb5> in <module>()
+ 1 import errors_02
+----> 2 errors_02.print_friday_message()
+
+/Users/ghopper/thesis/code/errors_02.py in print_friday_message()
+ 13
+ 14 def print_friday_message():
+---> 15 print_message("Friday")
+
+/Users/ghopper/thesis/code/errors_02.py in print_message(day)
+ 9 "sunday": "Aw, the weekend is almost over."
+ 10 }
+---> 11 print(messages[day])
+ 12
+ 13
+
+KeyError: 'Friday'
+
+
+
+
+
+
+
+
+
+
+
Three levels.
+
errors_02.py
+
print_message
+
Line 11
+
+KeyError. These errors occur when we are trying to look
+up a key that does not exist (usually in a data structure such as a
+dictionary). We can find more information about the
+KeyError and other built-in exceptions in the Python
+docs.
+
KeyError: 'Friday'
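
The same kind of error can be reproduced in a few lines. The dictionary below is a shortened, partly hypothetical stand-in for the one in errors_02.py: only the "sunday" entry is visible in the traceback, and the "saturday" entry is made up.

```python
# Shortened stand-in for the messages dictionary.
messages = {
    "saturday": "Hooray for the weekend!",        # hypothetical entry
    "sunday": "Aw, the weekend is almost over.",
}

day = "Friday"
try:
    print(messages[day])
except KeyError as error:
    # The exception's argument is the key that was not found.
    missing_key = error.args[0]
    print("no message for", missing_key)
```
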
+
+
+
+
+
+
+
+
+
+
+
Key Points
+
+
+
The scope of a variable is the part of a program that can ‘see’ that
+variable.
Provide sound justifications for basic rules of coding style.
+
Refactor one-page programs to make them more readable and justify
+the changes.
+
Use Python community coding standards (PEP-8).
+
+
+
+
+
+
+
Coding style
+
+
+
A consistent coding style helps others (including our future selves)
+read and understand code more easily. Code is read much more often than
+it is written, and as the Zen of Python
+states, “Readability counts”. Python proposed a standard style through
+one of its first Python Enhancement Proposals (PEP), PEP8.
+
Some points worth highlighting:
+
+
document your code and ensure that assumptions, internal algorithms,
+expected inputs, expected outputs, etc., are clear
+
use clear, semantically meaningful variable names
+
use spaces, not tabs, to indent lines (tabs can cause
+problems across different text editors, operating systems, and version
+control systems)
+
Follow standard Python style in your code.
+
+
+
+
+PEP8: a style
+guide for Python that discusses topics such as how to name variables,
+how to indent your code, how to structure your import
+statements, etc. Adhering to PEP8 makes it easier for other Python
+developers to read and understand your code, and to understand what
+their contributions should look like.
+
To check your code for compliance with PEP8, you can use the pycodestyle application.
+Tools like the black code
+formatter can automatically format your code to conform to PEP8 and
+pycodestyle (a Jupyter notebook formatter, nb_black, also exists).
+
Some groups and organizations follow different style guidelines
+besides PEP8. For example, the Google style
+guide on Python makes slightly different recommendations. Google
+wrote an application that can help you format your code in either their
+style or PEP8 called yapf.
+
With respect to coding style, the key is consistency.
+Choose a style for your project, be it PEP8, the Google style, or
+something else, and do your best to ensure that you and anyone else you
+are collaborating with stick to it. Consistency within a project is
+often more impactful than the particular style used. A consistent style
+will make your software easier to read and understand for others and for
+your future self.
+
Use assertions to check for internal errors.
+
+
+
Assertions are a simple but powerful method for making sure that the
+context in which your code is executing is as you expect.
+
+
PYTHON
+
+
def calc_bulk_density(mass, volume):
+    '''Return dry bulk density = powder mass / powder volume.'''
+    assert volume > 0
+    return mass / volume
+
+
If the asserted condition is False, the Python interpreter raises
+an AssertionError runtime exception. The source code for
+the expression that failed will be displayed as part of the error
+message. To disable assertions, run the interpreter with the
+-O (optimize) switch. Assertions should contain only simple checks and
+never change the state of the program. For example, an assertion should
+never contain an assignment.
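+
A minimal sketch of a failing assertion follows. The optional message after the comma and the helper try_density are additions for this illustration, not part of the lesson's function; try/except is used only so the example can show the error without stopping.

```python
def calc_bulk_density(mass, volume):
    '''Return dry bulk density = powder mass / powder volume.'''
    # The text after the comma becomes the AssertionError message.
    assert volume > 0, 'volume must be positive'
    return mass / volume


def try_density(mass, volume):
    '''Hypothetical helper: return the density or a description of the failure.'''
    try:
        return calc_bulk_density(mass, volume)
    except AssertionError as error:
        return f'AssertionError: {error}'


print(try_density(2.5, 2.0))
print(try_density(2.5, 0))
```

The second call fails the check, so the function never reaches the division. Running the interpreter with -O would skip the check, which is why assertions should not be relied on to validate user input.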
+
Use docstrings to provide builtin help.
+
+
+
If the first thing in a function is a character string that is not
+assigned directly to a variable, Python attaches it to the function,
+accessible via the builtin help function. This string that provides
+documentation is also known as a docstring.
+
+
PYTHON
+
+
def average(values):
+    "Return average of values, or None if no values are supplied."
+
+    if len(values) == 0:
+        return None
+    return sum(values) / len(values)
+
+help(average)
+
+
+
OUTPUT
+
+
Help on function average in module __main__:
+
+average(values)
+ Return average of values, or None if no values are supplied.
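+
The docstring is also stored on the function object itself. As a small sketch (the __doc__ attribute is standard Python but is not mentioned in the lesson text), the string that help displays can be read directly:

```python
def average(values):
    "Return average of values, or None if no values are supplied."
    if len(values) == 0:
        return None
    return sum(values) / len(values)


# The docstring is available as an attribute of the function.
print(average.__doc__)

# The function behaves as its docstring promises.
print(average([1, 2, 3]))
print(average([]))
```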
+
+
+
+
+
+
+
Multiline Strings
+
+
We often use multiline strings for documentation. These start
+with three quote characters (either single or double) and end
+with three matching characters.
+
+
PYTHON
+
+
"""This string spans
+multiple lines.
+
+Blank lines are allowed."""
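+
To check that the line breaks really are part of the string, one option is to assign it to a variable and split it on the newline character. The variable names here are illustrative only.

```python
text = """This string spans
multiple lines.

Blank lines are allowed."""

# Each line break inside the triple quotes is stored as '\n'.
lines = text.split('\n')
print(len(lines))
print(lines)
```

The blank line shows up as an empty string in the resulting list.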
+
+
+
+
+
+
+
+
+
+
What Will Be Shown?
+
+
Highlight the lines in the code below that will be available as
+online help. Are there lines that should be made available, but won’t
+be? Will any lines produce a syntax error or a runtime error?
+
+
PYTHON
+
+
"Find maximum edit distance between multiple sequences."
+# This finds the maximum distance between all sequences.
+
+def overall_max(sequences):
+    '''Determine overall maximum edit distance.'''
+
+    highest = 0
+    for left in sequences:
+        for right in sequences:
+            '''Avoid checking sequence against itself.'''
+            if left != right:
+                this = edit_distance(left, right)
+                highest = max(highest, this)
+
+    # Report.
+    return highest
+
+
+
+
+
+
+
+
+
+
Document This
+
+
Use comments to describe and help others understand potentially
+unintuitive sections or individual lines of code. They are especially
+useful to whoever may need to understand and edit your code in the
+future, including yourself.
+
Use docstrings to document the acceptable inputs and expected outputs
+of a method or class, its purpose, assumptions and intended behavior.
+Docstrings are displayed when a user invokes the builtin
+help method on your method or class.
+
Turn the comment in the following function into a docstring and check
+that help displays it properly.
+
+
PYTHON
+
+
def middle(a, b, c):
+    # Return the middle value of three.
+    # Assumes the values can actually be compared.
+    values = [a, b, c]
+    values.sort()
+    return values[1]
+
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
def middle(a, b, c):
+    '''Return the middle value of three.
+    Assumes the values can actually be compared.'''
+    values = [a, b, c]
+    values.sort()
+    return values[1]
+
+
+
+
+
+
+
+
+
+
+
Clean Up This Code
+
+
+
Read this short program and try to predict what it does.
+
Run it: how accurate was your prediction?
+
Refactor the program to make it more readable. Remember to run it
+after each change to ensure its behavior hasn’t changed.
+
Compare your rewrite with your neighbor’s. What did you do the same?
+What did you do differently, and why?
+
+
+
PYTHON
+
+
n = 10
+s = 'et cetera'
+print(s)
+i = 0
+while i < n:
+    # print('at', j)
+    new = ''
+    for j in range(len(s)):
+        left = j-1
+        right = (j+1)%len(s)
+        if s[left]==s[right]: new = new + '-'
+        else: new = new + '*'
+    s=''.join(new)
+    print(s)
+    i += 1
+
+
+
+
+
+
+
+
+
+
Here’s one solution.
+
+
PYTHON
+
+
def string_machine(input_string, iterations):
+    """
+    Takes input_string and generates a new string with -'s and *'s
+    corresponding to characters that have identical adjacent characters
+    or not, respectively. Iterates through this procedure with the resultant
+    strings for the supplied number of iterations.
+    """
+    print(input_string)
+    input_string_length = len(input_string)
+    old = input_string
+    for i in range(iterations):
+        new = ''
+        # iterate through characters in previous string
+        for j in range(input_string_length):
+            left = j-1
+            right = (j+1) % input_string_length  # ensure right index wraps around
+            if old[left] == old[right]:
+                new = new + '-'
+            else:
+                new = new + '*'
+        print(new)
+        # store new string as old
+        old = new
+
+string_machine('et cetera', 10)
Name and locate scientific Python community sites for software,
+workshops, and help.
+
+
+
+
+
+
+
Leslie Lamport once said, “Writing is nature’s way of showing you how
+sloppy your thinking is.” The same is true of programming: many things
+that seem obvious when we’re thinking about them turn out to be anything
+but when we have to explain them precisely.
+
Python supports a large and diverse community across academia and
+industry.
+
We are filling in the exercises below in order to make the lesson plan
+more concrete. Contributions (both in the form of pull requests with
+filled-in exercises, and comments on specific exercises, ordering, and
+timings) are greatly appreciated.
+
+
+
+
Process Used
+
+
Michael Pollan’s advice if he taught R or Python programming:
This lesson was developed using a slimmed-down variant of the
+“Understanding by Design” process. The main sections are:
+
Assumptions about audience, time, etc. (The current draft also
+includes some conclusions and decisions in this section - that should be
+refactored.)
+
Desired results: overall goals, summative assessments at half-day
+granularity, what learners will be able to do, what learners will
+know.
+
Learning plan: each episode has a heading that summarizes what
+will be covered, then estimates time that will be spent on teaching and
+on exercises, while the exercises are given as bullet points.
+
Stage 1: Assumptions
+
Audience
+
Graduate students in numerate disciplines from cosmology to
+archaeology
+
Who have manipulated data in spreadsheets and with interactive tools
+like SAS
+
But have not programmed beyond CPD
+(copy-paste-despair)
+
+
Constraints
+
One full day 09:00-16:30
+
6:15 class time
+
0:45 lunch
+
0:30 total for two coffee breaks
+
+
Learners use native installs on their own machines
+
May use VMs or cloud resources at instructor’s discretion
+
But must keep native local install as an option
+
+
No dependence on other Carpentry modules
+
In particular, does not require knowledge of shell or version
+control
+
+
Use the Jupyter Notebook
+
Authentic tool used by many instructors
+
There isn’t really an alternative
+
And means that even people who have seen a bit of Python before will
+probably learn something
+
+
+
Motivating Example
+
Creating 2D plots suitable for inclusion in papers
+
Appeals to almost everyone
+
Makes lesson usable by both Carpentries
+
And means that even people who have seen a bit of Python before will
+probably learn something
+
+
+
Data
+
Use the gapminder data throughout
+
But break into multiple files by continent
+
To make display of output from examples tidier (e.g., use
+Australia/New Zealand, which is only two lines)
+
And allow examples showing use of multiple data sets
+
+
+
Focus on Pandas instead of NumPy
+
Makes lesson usable by both Data Carpentry and Software
+Carpentry
+
Genuine novices are likely to want data analysis
+
And people with some prior experience:
+
will accept data analysis as an authentic task,
+
and are unlikely to have encountered Pandas, so they’ll still get
+something useful out of the lesson
+
+
+
Challenges will mostly not be “write this code from
+scratch”
+
Want lots of short exercises that can reliably be finished in
+allotted time
+
So use MCQs, fill-in-the-blanks, Parsons Problems, “tweak this
+code”, etc.
+
+
Stage 2: Desired Results
+
+
Questions
+
How do I…
+
…read tabular data?
+
…plot a single vector of values?
+
…create a time series plot?
+
…create one plot for each of several data sets?
+
…get extra data from a single data set for plotting?
+
…write programs I can read and re-use in future?
+
+
+
Skills
+
I can…
+
…write short scripts using loops and conditionals.
+
…write functions with a fixed number of parameters that return a
+single result.
+
…import libraries using aliases and refer to those libraries’
+contents.
+
…do simple data extraction and formatting using Pandas.
+
+
+
Concepts
+
I know…
+
…that a program is a piece of lab equipment that implements an
+analysis
+
Needs to be validated/calibrated before/during use
+
Makes analysis reproducible, reviewable, shareable
+
+
…that programs are written for people, not for computers
+
Meaningful variable names
+
Modularity for readability as well as re-use
+
No duplication
+
Document purpose and use
+
+
…that there is no magic: the programs I use are no different in
+principle from those I build
+
…how to assign values to variables
+
…what integers, floats, strings, NumPy arrays, and Pandas dataframes
+are
+
…how to trace the execution of a for loop
+
…how to trace the execution of if/else
+statements
+
…how to create and index lists
+
…how to create and index NumPy arrays
+
…how to create and index Pandas dataframes
+
…how to create time series plots
+
…the difference between defining and calling a function
+
…where to find documentation on standard libraries
+
…how to find out what else scientific Python offers
+
+
Stage 3: Learning Plan
+
+
Summative Assessment
+
Midpoint: create time-series plot for each file in a directory.
+
Final: extract data from Pandas dataframe and create comparative
+multi-line time series plot.
Select entire rows or entire columns from a dataframe.
+
Select a subset of both rows and columns from a dataframe in a
+single operation.
+
Select a subset of a dataframe by a single Boolean criterion.
+
+
Challenges: 15 min
+
Write an expression to find the Per Capita GDP of Serbia in
+2007.
+
What rule governs what is (or isn’t) included in numerical and named
+slices in Pandas?
+
What does each line in the following short program do?
+
What do idxmin and idxmax do?
+
Write expressions to get the GDP per capita for all countries in
+1982, for all countries after 1985, etc.
+
Given the way its borders have changed since 1900, what would you do
+if asked to create a table of GDP per capita for Poland for the
+Twentieth Century?
Figure: A line of Python code, print(atom_name[0]), demonstrates that using the zero index will output just the initial letter, in this case ‘h’ for helium.
+
Figure: A line chart showing time (hr) relative to position (km), using the values provided in the preceding code block. By default, the plotted line is blue against a white background, and the axes have been scaled automatically to fit the range of the input data.
+
Figure 2: GDP plot for Australia
+
Figure 3: GDP plot for Australia and New Zealand
+
Figure 4: GDP barplot for Australia
+
Figure 5: GDP formatted plot for Australia
+
Figure 6: GDP formatted plot for Australia and New Zealand
+
Figure 7: GDP correlation using plt.scatter
+
Figure 8: GDP correlation using data.T.plot.scatter
+
+
+
+
diff --git a/index.html b/index.html
new file mode 100644
index 000000000..ca8694213
--- /dev/null
+++ b/index.html
@@ -0,0 +1,545 @@
+
+Plotting and Programming in Python: Summary and Setup
+ Plotting and Programming in Python
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Summary and Setup
+
+
+
This lesson is an introduction to programming in Python 3 for people
+with little or no previous programming experience. It uses plotting as
+its motivating example and is designed to be used in both Data Carpentry and Software Carpentry
+workshops. This lesson references JupyterLab but
+can be taught using alternative Python 3 interpreters as well (e.g.,
+repl.it, Anaconda).
+
+
+
+
+
+
Prerequisites
+
+
Learners need to understand what files and directories are, what
+a working directory is, and how to start a Python interpreter.
+
Learners must install Python 3 before the class starts.
The data we will be using is taken from the gapminder
+dataset. To obtain it, download and unzip the file python-novice-gapminder-data.zip.
+In order to follow the presented material, you should launch the
+JupyterLab server in the root directory (see Starting
+JupyterLab).
+
+
diff --git a/instructor-notes.html b/instructor-notes.html
new file mode 100644
index 000000000..cafbcf484
--- /dev/null
+++ b/instructor-notes.html
@@ -0,0 +1,630 @@
+
+
+
+
+
+Plotting and Programming in Python: Instructor Notes
+ Plotting and Programming in Python
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Instructor Notes
+
+
General Notes
+
+
+
+
It’s all right not to get through the whole lesson.
+
+This lesson is designed for people who have never programmed before, but
+any given class may include people with a wide range of prior
+experience. We have therefore included enough material to fill a full
+day if need be, but expect that many offerings will only get as far as
+the introduction to Pandas.
+
+
Don’t tell people to Google things.
+
+One of the goals of this lesson is to help novices build a workable
+mental model of how programming works. Until they have that model, they
+will not know what to search for or how to recognize a helpful answer.
+Telling them to Google can also give the impression that we think their
+problem is trivial. (That said, if learners have done enough programming
+before to be past these issues, having them search for solutions online
+can help them solidify their understanding.) It’s also worth quoting Trevor
+King’s comment about online search: “If you find anything, other
+folks were confused enough to bother with a blog or Stack Overflow post,
+so it’s probably not trivial.”
+
Understand the difference between a Python script and a Jupyter
+notebook.
+
Create Markdown cells in a notebook.
+
Create and run Python cells in a notebook.
+
+
+
+
+
+
To run Python, we are going to use Jupyter Notebooks via JupyterLab for
+the remainder of this workshop. Jupyter notebooks are common in data
+science and visualization and serve as a convenient common-denominator
+experience for running Python code interactively where we can easily
+view and share the results of our Python code.
+
There are other ways of editing, managing, and running code. Software
+developers often use an integrated development environment (IDE) like PyCharm or Visual Studio Code, or text
+editors like Vim or Emacs, to create and edit their Python programs.
+After editing and saving your Python programs you can execute those
+programs within the IDE itself or directly on the command line. In
+contrast, Jupyter notebooks let us execute and view the results of our
+Python code immediately within the notebook.
+
JupyterLab has several other handy features:
+
You can easily type, edit, and copy and paste blocks of code.
+
Tab complete allows you to easily access the names of things you are
+using and learn more about them.
+
It allows you to annotate your code with links, different sized
+text, bullets, etc. to make it more accessible to you and your
+collaborators.
+
It allows you to display figures next to the code that produces them
+to tell a complete story of the analysis.
+
Each notebook contains one or more cells that contain code, text, or
+images.
+
Getting Started with JupyterLab
+
JupyterLab is an application server with a web user interface from Project Jupyter that enables one to work
+with documents and activities such as Jupyter notebooks, text editors,
+terminals, and even custom components in a flexible, integrated, and
+extensible manner. JupyterLab requires a reasonably up-to-date browser
+(ideally a current version of Chrome, Safari, or Firefox); Internet
+Explorer versions 9 and below are not supported.
+
JupyterLab is included as part of the Anaconda Python distribution.
+If you have not already installed the Anaconda Python distribution, see
+the setup instructions for installation
+instructions.
+
In this lesson we will run JupyterLab locally on our own machines, so
+it will not require an internet connection besides the initial
+connection to download and install Anaconda and JupyterLab.
+
Start the JupyterLab server on your machine
+
Use a web browser to open a special localhost URL that connects to
+your JupyterLab server
+
The JupyterLab server does the work and the web browser renders the
+result
+
Type code into the browser and see the results after your JupyterLab
+server has finished executing your code
Experienced users of Jupyter notebooks interested in a more detailed
+discussion of the similarities and differences between the JupyterLab
+and Jupyter notebook user interfaces can find more information in the JupyterLab
+user interface documentation.
+
+
+
+
Starting JupyterLab
+
You can start the JupyterLab server through the command line or
+through an application called Anaconda Navigator. Anaconda
+Navigator is included as part of the Anaconda Python distribution.
+
+
macOS - Command Line
+
To start the JupyterLab server you will need to access the command
+line through the Terminal. There are two ways to open Terminal on
+Mac.
+
In your Applications folder, open Utilities and double-click on
+Terminal
+
Press Command + spacebar to launch Spotlight.
+Type Terminal and then double-click the search result or
+hit Enter
+
+
After you have launched Terminal, type the command to launch the
+JupyterLab server.
+
+
BASH
+
+
$ jupyter lab
+
+
+
+
Windows Users - Command Line
+
To start the JupyterLab server you will need to access the Anaconda
+Prompt.
+
Press the Windows Logo Key and search for
+Anaconda Prompt, then click the result or press Enter.
+
After you have launched the Anaconda Prompt, type the command:
+
+
BASH
+
+
$ jupyter lab
+
+
+
+
Anaconda Navigator
+
To start a JupyterLab server from Anaconda Navigator you must first
+start
+Anaconda Navigator (detailed instructions are available for macOS, Windows,
+and Linux). You can find Anaconda Navigator via Spotlight on
+macOS (Command + spacebar), via the Windows search
+function (Windows Logo Key), or by opening a terminal shell and
+executing the anaconda-navigator executable from the
+command line.
+
After you have launched Anaconda Navigator, click the
+Launch button under JupyterLab. You may need to scroll down
+to find it.
+
Here is a screenshot of an Anaconda Navigator page similar to the one
+that should open on either macOS or Windows.
+
+
+
And here is a screenshot of a JupyterLab landing page that should be
+similar to the one that opens in your default web browser after starting
+the JupyterLab server on either macOS or Windows.
+
+
+
+
The JupyterLab Interface
+
JupyterLab has many features found in traditional integrated
+development environments (IDEs) but is focused on providing flexible
+building blocks for interactive, exploratory computing.
+
The JupyterLab
+Interface consists of the Menu Bar, a collapsible Left Side Bar, and
+the Main Work Area which contains tabs of documents and activities.
+
+
Menu Bar
+
The Menu Bar at the top of JupyterLab has the top-level menus that
+expose various actions available in JupyterLab along with their keyboard
+shortcuts (where applicable). The following menus are included by
+default.
+
+File: Actions related to files and directories such
+as New, Open, Close, Save, etc. The
+File menu also includes the Shut Down action used to
+shut down the JupyterLab server.
+
+Edit: Actions related to editing documents and
+other activities such as Undo, Cut, Copy,
+Paste, etc.
+
+View: Actions that alter the appearance of
+JupyterLab.
+
+Run: Actions for running code in different
+activities such as notebooks and code consoles (discussed below).
+
+Kernel: Actions for managing kernels. Kernels in
+Jupyter will be explained in more detail below.
+
+Tabs: A list of the open documents and activities
+in the main work area.
+
+Settings: Common JupyterLab settings can be
+configured using this menu. There is also an Advanced Settings
+Editor option in the dropdown menu that provides more fine-grained
+control of JupyterLab settings and configuration options.
+
+Help: A list of JupyterLab and kernel help
+links.
+
+
+
+
+
+
Kernels
+
+
The JupyterLab docs
+define kernels as “separate processes started by the server that runs
+your code in different programming languages and environments.” When we
+open a Jupyter Notebook, that starts a kernel - a process - that is
+going to run the code. In this lesson, we’ll be using the Jupyter
+ipython kernel which lets us run Python 3 code interactively.
+
Using other Jupyter kernels
+for other programming languages would let us write and execute code
+in other programming languages in the same JupyterLab interface, like R,
+Java, Julia, Ruby, JavaScript, Fortran, etc.
+
+
+
+
A screenshot of the default Menu Bar is provided below.
+
+
+
+
+
Left Sidebar
+
The left sidebar contains a number of commonly used tabs, such as a
+file browser (showing the contents of the directory where the JupyterLab
+server was launched), a list of running kernels and terminals, the
+command palette, and a list of open tabs in the main work area. A
+screenshot of the default Left Side Bar is provided below.
+
+
+
The left sidebar can be collapsed or expanded by selecting “Show Left
+Sidebar” in the View menu or by clicking on the active sidebar tab.
+
+
+
Main Work Area
+
The main work area in JupyterLab enables you to arrange documents
+(notebooks, text files, etc.) and other activities (terminals, code
+consoles, etc.) into panels of tabs that can be resized or subdivided. A
+screenshot of the default Main Work Area is provided below.
+
If you do not see the Launcher tab, click the blue plus sign under
+the “File” and “Edit” menus and it will appear.
+
+
+
Drag a tab to the center of a tab panel to move the tab to the panel.
+Subdivide a tab panel by dragging a tab to the left, right, top, or
+bottom of the panel. The work area has a single current activity. The
+tab for the current activity is marked with a colored top border (blue
+by default).
+
+
Creating a Python script
+
To start writing a new Python program click the Text File icon under
+the Other header in the Launcher tab of the Main Work Area.
+
You can also create a new plain text file by selecting New
+-> Text File from the File menu in the Menu Bar.
+
+
To convert this plain text file to a Python program, select the
+Save File As action from the File menu in the Menu Bar
+and give your new text file a name that ends with the .py
+extension.
+
The .py extension lets everyone (including the
+operating system) know that this text file is a Python program.
+
This is convention, not a requirement.
+
+
Creating a Jupyter Notebook
+
To open a new notebook click the Python 3 icon under the
+Notebook header in the Launcher tab in the main work area. You
+can also create a new notebook by selecting New -> Notebook
+from the File menu in the Menu Bar.
+
Additional notes on Jupyter notebooks.
+
Notebook files have the extension .ipynb to distinguish
+them from plain-text Python programs.
+
Notebooks can be exported as Python scripts that can be run from the
+command line.
+
Below is a screenshot of a Jupyter notebook running inside
+JupyterLab. If you are interested in more details, then see the official
+notebook documentation.
+
+
+
+
+
+
+
+
How It’s Stored
+
+
The notebook file is stored in a format called JSON.
+
Just like a webpage, what’s saved looks different from what you see
+in your browser.
+
But this format allows Jupyter to mix source code, text, and images,
+all in one file.
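+
To make the storage format less abstract, here is a simplified, hand-written sketch of a notebook with one Markdown cell and one code cell, built with Python's json module. The field names follow the version 4 notebook format as far as we know it, but files written by Jupyter contain more metadata than shown, so treat this as an approximation and not a complete specification.

```python
import json

# A simplified notebook: a list of cells plus some format information.
notebook = {
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# My analysis"],
        },
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["print('hello')"],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}

# This text is roughly what would be saved in the .ipynb file.
saved_text = json.dumps(notebook, indent=1)
print(saved_text)

# Reading the text back recovers the same structure.
cell_types = [cell["cell_type"] for cell in json.loads(saved_text)["cells"]]
print(cell_types)
```

Opening an .ipynb file in a plain text editor shows the same kind of nested structure, which is why it looks so different from the rendered notebook.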
+
+
+
+
+
+
+
+
+
Arranging Documents into Panels of Tabs
+
+
In the JupyterLab Main Work Area you can arrange documents into
+panels of tabs. Here is an example from the official
+documentation.
+
+
+
First, create a text file, Python console, and terminal window and
+arrange them into three panels in the main work area. Next, create a
+notebook, terminal window, and text file and arrange them into three
+panels in the main work area. Finally, create your own combination of
+panels and tabs. What combination of panels and tabs do you think will
+be most useful for your workflow?
+
+
+
+
+
+
+
+
+
After creating the necessary tabs, you can drag one of the tabs to
+the center of a panel to move the tab to the panel; next you can
+subdivide a tab panel by dragging a tab to the left, right, top, or
+bottom of the panel.
+
+
+
+
+
+
+
+
+
+
Code vs. Text
+
+
Jupyter mixes code and text in different types of blocks, called
+cells. We often use the term “code” to mean “the source code of software
+written in a language such as Python”. A “code cell” in a Notebook is a
+cell that contains software; a “text cell” is one that contains ordinary
+prose written for human beings.
+
+
+
+
The Notebook has Command and Edit modes.
+
If you press Esc and Return alternately, the
+outer border of your code cell will change from gray to blue.
+
These are the Command (gray) and
+Edit (blue) modes of your notebook.
+
Command mode allows you to edit notebook-level features, and Edit
+mode changes the content of cells.
+
When in Command mode (esc/gray),
+
The b key will make a new cell below the currently
+selected cell.
+
The a key will make one above.
+
The x key will delete the current cell.
+
The z key will undo your last cell operation (which could
+be a deletion, creation, etc).
+
+
All actions can be done using the menus, but there are lots of
+keyboard shortcuts to speed things up.
+
+
+
+
+
+
Command Vs. Edit
+
+
In the Jupyter notebook page are you currently in Command or Edit
+mode?
+Switch between the modes. Use the shortcuts to generate a new cell. Use
+the shortcuts to delete a cell. Use the shortcuts to undo the last cell
+operation you performed.
+
+
+
+
+
+
+
+
+
Command mode has a gray border and Edit mode has a blue border.
+
Use Esc and Return to switch between modes.
+
To create a new cell, you need to be in Command mode (press Esc if
+your cell is blue). Type b or a.
+
To delete a cell, you need to be in Command mode (press Esc if your
+cell is blue). Type x.
+
To undo the last cell operation, you need to be in Command mode
+(press Esc if your cell is blue). Type z.
+
+
+
+
+
+
Use the keyboard and mouse to select and edit cells.
+
Pressing the Return key turns the border blue and engages
+Edit mode, which allows you to type within the cell.
+
Because we want to be able to write many lines of code in a single
+cell, pressing the Return key when in Edit mode (blue) moves
+the cursor to the next line in the cell just like in a text editor.
+
We need some other way to tell the Notebook we want to run what’s in
+the cell.
+
Pressing Shift+Return together will execute
+the contents of the cell.
+
Notice that the Return and Shift keys on the
+right of the keyboard are right next to each other.
+
+
+
The Notebook will turn Markdown into pretty-printed
+documentation.
Create a nested list in a Markdown cell in a notebook that looks like
+this:
+
1. Get funding.
+2. Do work.
+  * Design experiment.
+  * Collect data.
+  * Analyze.
+3. Write up.
+4. Publish.
+
+
+
+
+
+
+
+
+
This challenge integrates both the numbered list and bullet list.
+Note that the bullet list is indented 2 spaces so that it is in line with
+the items of the numbered list.
What is displayed when a Python cell in a notebook that contains
+several calculations is executed? For example, what happens when this
+cell is executed?
+
+
PYTHON
+
+
7*3
+2+1
+
+
+
+
+
+
+
+
+
+
Python returns the output of the last calculation.
+
+
PYTHON
+
+
3
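+
A notebook cell displays only the value of its last expression. One way to see every result, sketched here, is to pass each calculation to print:

```python
# Each print call writes its own line of output,
# so neither result is lost.
print(7 * 3)
print(2 + 1)
```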
+
+
+
+
+
+
+
+
+
+
+
Change an Existing Cell from Code to Markdown
+
+
What happens if you write some Python in a code cell and then you
+switch it to a Markdown cell? For example, put the following in a code
+cell:
+
+
PYTHON
+
+
x = 6 * 7 + 12
+print(x)
+
+
And then run it with Shift+Return to be sure
+that it works as a code cell. Now go back to the cell and use
+Esc then m to switch the cell to Markdown and
+“run” it with Shift+Return. What happened and how
+might this be useful?
+
+
+
+
+
+
+
+
+
The Python code gets treated like Markdown text. The lines appear as
+if they are part of one contiguous paragraph. This could be useful to
+temporarily turn on and off cells in notebooks that get used for
+multiple purposes.
+
+
PYTHON
+
+
x = 6 * 7 + 12 print(x)
+
+
+
+
+
+
+
+
+
+
+
Equations
+
+
Standard Markdown (such as we’re using for these notes) won’t render
+equations, but the Notebook will. Create a new Markdown cell and enter
+the following:
+
$\sum_{i=1}^{N} 2^{-i} \approx 1$
+
(It’s probably easier to copy and paste.) What does it display? What
+do you think the underscore, _, circumflex, ^,
+and dollar sign, $, do?
+
+
+
+
+
+
+
+
+
The notebook shows the equation as it would be rendered from LaTeX
+equation syntax. The dollar sign, $, is used to tell
+Markdown that the text in between is a LaTeX equation. If you’re not
+familiar with LaTeX, underscore, _, is used for subscripts
+and circumflex, ^, is used for superscripts. A pair of
+curly braces, { and }, is used to group text
+together so that the statement i=1 becomes the subscript
+and N becomes the superscript. Similarly, -i
+is in curly braces to make the whole statement the superscript for
+2. \sum and \approx are LaTeX
+commands for “sum over” and “approximate” symbols.
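+
To see why the equation uses "approximately equal", the sum can be evaluated numerically. This is only a sketch: the choice N = 10 is arbitrary, and the closed form 1 - 2^(-N) is the standard geometric-series result, not something stated in the lesson.

```python
N = 10

# Add up 2^(-1) + 2^(-2) + ... + 2^(-N).
total = sum(2 ** -i for i in range(1, N + 1))

print(total)
print(1 - 2 ** -N)
```

The two printed values agree, and both move closer to 1 as N grows.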
+
+
+
+
+
+
Closing JupyterLab
+
From the Menu Bar select the “File” menu and then choose “Shut Down”
+at the bottom of the dropdown menu. You will be prompted to confirm that
+you wish to shut down the JupyterLab server (don’t forget to save your
+work!). Click “Shut Down” to shut down the JupyterLab server.
+
To restart the JupyterLab server you will need to re-run the
+following command from a shell.
+
$ jupyter lab
+
+
+
+
+
+
Closing JupyterLab
+
+
Practice closing and restarting the JupyterLab server.
+
+
+
+
+
+
+
+
+
Key Points
+
+
Python scripts are plain text files.
+
Use the Jupyter Notebook for editing and running Python.
+
The Notebook has Command and Edit modes.
+
Use the keyboard and mouse to select and edit cells.
+
The Notebook will turn Markdown into pretty-printed
+documentation.
+
+
diff --git a/instructor/02-variables.html b/instructor/02-variables.html
new file mode 100644
index 000000000..4df30a26d
--- /dev/null
+++ b/instructor/02-variables.html
@@ -0,0 +1,1082 @@
+
+Plotting and Programming in Python: Variables and Assignment
+
Write programs that assign scalar values to variables and perform
+calculations with those values.
+
Correctly trace value changes in programs that use scalar
+assignment.
+
+
+
+
+
+
Use variables to store values.
+
Variables are names for values.
+
+
Variable names
+
can only contain letters, digits, and underscore
+_ (typically used to separate words in long variable
+names)
+
cannot start with a digit
+
are case sensitive (age, Age and AGE are three
+different variables)
+
+
The name should also be meaningful so you or another programmer
+know what it is
+
Variable names that start with underscores like
+__alistairs_real_age have a special meaning so we won’t do
+that until we understand the convention.
+
In Python the = symbol assigns the value on the
+right to the name on the left.
+
The variable is created when a value is assigned to it.
+
+
Here, Python assigns an age to a variable age and a
+name in quotes to a variable first_name.
+
+
PYTHON
+
+
age = 42
+first_name = 'Ahmed'
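+
The naming rules listed earlier can be checked from Python itself. The sketch below uses the string method isidentifier, which is standard Python but not part of the lesson; note that it also accepts reserved words such as for and many non-English letters, so it is only a rough test. The candidate names are made up for illustration.

```python
candidate_names = ['first_name', 'age2', '2nd_age', 'first-name']

# isidentifier reports whether a string follows the naming rules.
for name in candidate_names:
    print(name, name.isidentifier())

# Names are case sensitive: these are three different variables.
age = 42
Age = 43
AGE = 44
print(age, Age, AGE)
```

The name starting with a digit and the name containing a hyphen are both rejected.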
+
+
+
Use print to display values.
+
Python has a built-in function called print that prints
+things as text.
+
Call the function (i.e., tell Python to run it) by using its
+name.
+
Provide values to the function (i.e., the things to print) in
+parentheses.
+
To add a string to the printout, wrap the string in single or double
+quotes.
+
The values passed to the function are called
+arguments
+
+
+
PYTHON
+
+
print(first_name, 'is', age, 'years old')
+
+
+
OUTPUT
+
+
Ahmed is 42 years old
+
+
+print automatically puts a single space between items
+to separate them.
+
And wraps around to a new line at the end.
+
Variables must be created before they are used.
+
If a variable doesn’t exist yet, or if the name has been
+mis-spelled, Python reports an error. (Unlike some languages, which
+“guess” a default value.)
+
+
PYTHON
+
+
print(last_name)
+
+
+
ERROR
+
+
---------------------------------------------------------------------------
+NameError Traceback (most recent call last)
+<ipython-input-1-c1fbb4e96102> in <module>()
+----> 1 print(last_name)
+
+NameError: name 'last_name' is not defined
+
+
The last line of an error message is usually the most
+informative.
Be aware that it is the order of execution of cells that is
+important in a Jupyter notebook, not the order in which they appear.
+Python will remember all the code that was run previously,
+including any variables you have defined, irrespective of the order in
+the notebook. Therefore if you define variables lower down the notebook
+and then (re)run cells further up, those defined further down will still
+be present. As an example, create two cells with the following content,
+in this order:
+
+
PYTHON
+
+
print(myval)
+
+
+
PYTHON
+
+
myval = 1
+
+
If you execute this in order, the first cell will give an error.
+However, if you run the first cell after the second cell it
+will print out 1. To prevent confusion, it can be helpful
+to use the Kernel -> Restart & Run All
+option which clears the interpreter and runs everything from a clean
+slate going top to bottom.
+
+
+
+
Variables can be used in calculations.
+
We can use variables in calculations just as if they were values.
+
Remember, we assigned the value 42 to age
+a few lines ago.
+
+
+
PYTHON
+
+
age = age + 3
+print('Age in three years:', age)
+
+
+
OUTPUT
+
+
Age in three years: 45
+
+
Use an index to get a single character from a string.
+
The characters (individual letters, numbers, and so on) in a string
+are ordered. For example, the string 'AB' is not the same
+as 'BA'. Because of this ordering, we can treat the string
+as a list of characters.
+
Each position in the string (first, second, etc.) is given a number.
+This number is called an index or sometimes a
+subscript.
+
Indices are numbered from 0.
+
Use the position’s index in square brackets to get the character at
+that position.
+
+
PYTHON
+
+
atom_name = 'helium'
+print(atom_name[0])
+
+
+
OUTPUT
+
+
h
+
+
Use a slice to get a substring.
+
A part of a string is called a substring. A
+substring can be as short as a single character.
+
An item in a list is called an element. Whenever we treat a string
+as if it were a list, the string’s elements are its individual
+characters.
+
A slice is a part of a string (or, more generally, a part of any
+list-like thing).
+
We take a slice with the notation [start:stop], where
+start is the integer index of the first element we want and
+stop is the integer index of the element just
+after the last element we want.
+
The difference between stop and start is
+the slice’s length.
+
Taking a slice does not change the contents of the original string.
+Instead, taking a slice returns a copy of part of the original
+string.
+
+
PYTHON
+
+
atom_name = 'sodium'
+print(atom_name[0:3])
+
+
+
OUTPUT
+
+
sod
+
+
Use the built-in function len to find the length of a
+string.
+
+
PYTHON
+
+
print(len('helium'))
+
+
+
OUTPUT
+
+
6
+
+
Nested functions are evaluated from the inside out, like in
+mathematics.
+
Python is case-sensitive.
+
Python thinks that upper- and lower-case letters are different, so
+Name and name are different variables.
+
There are conventions for using upper-case letters at the start of
+variable names so we will use lower-case letters for now.
+
Use meaningful variable names.
+
Python doesn’t care what you call variables as long as they obey the
+rules (alphanumeric characters and the underscore).
Use meaningful variable names to help other people understand what
+the program does.
+
The most important “other person” is your future self.
+
+
+
+
+
+
Swapping Values
+
+
Fill the table showing the values of the variables in this program
+after each statement is executed.
+
+
PYTHON
+
+
# Command  # Value of x   # Value of y   # Value of swap #
+x = 1.0    #              #              #               #
+y = 3.0    #              #              #               #
+swap = x   #              #              #               #
+x = y      #              #              #               #
+y = swap   #              #              #               #
+
+
+
+
+
+
+
+
+
+
+
OUTPUT
+
+
# Command  # Value of x   # Value of y   # Value of swap #
+x = 1.0    # 1.0          # not defined  # not defined   #
+y = 3.0    # 1.0          # 3.0          # not defined   #
+swap = x   # 1.0          # 3.0          # 1.0           #
+x = y      # 3.0          # 3.0          # 1.0           #
+y = swap   # 3.0          # 1.0          # 1.0           #
+
+
These three lines exchange the values in x and
+y using the swap variable for temporary
+storage. This is a fairly common programming idiom.
+
+
+
+
+
+
+
+
+
+
Predicting Values
+
+
What is the final value of position in the program
+below? (Try to predict the value without running the program, then check
+your prediction.)
The initial variable is assigned the value
+'left'. In the second line, the position
+variable also receives the string value 'left'. In the third
+line, the initial variable is given the value
+'right', but the position variable retains its
+string value of 'left'.
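
The program itself is not shown with this exercise. A minimal reconstruction, assuming it consists of exactly the three assignments described in the explanation, might look like this:

```python
# Reconstructed from the explanation; the exact original program is an assumption.
initial = 'left'      # initial refers to the string 'left'
position = initial    # position receives the same string value
initial = 'right'     # rebinding initial does not change position
print(position)
```

Running it is a quick way to check a prediction: position keeps the value it was given on the second line.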
+
+
+
+
+
+
+
+
+
+
Challenge
+
+
If you assign a = 123, what happens if you try to get
+the second digit of a via a[1]?
+
+
+
+
+
+
+
+
+
Numbers are not strings or sequences and Python will raise an error
+if you try to perform an index operation on a number. In the next lesson on types and type
+conversion we will learn more about types and how to convert between
+different types. If you want the Nth digit of a number you can convert
+it into a string using the str built-in function and then
+perform an index operation on that string.
+
+
PYTHON
+
+
a = 123
+print(a[1])
+
+
+
ERROR
+
+
TypeError: 'int' object is not subscriptable
+
+
+
PYTHON
+
+
a = str(123)
+print(a[1])
+
+
+
OUTPUT
+
+
2
+
+
+
+
+
+
+
+
+
+
+
Choosing a Name
+
+
Which is a better variable name, m, min, or
+minutes? Why? Hint: think about which code you would rather
+inherit from someone who is leaving the lab:
+
ts = m * 60 + s
+
tot_sec = min * 60 + sec
+
total_seconds = minutes * 60 + seconds
+
+
+
+
+
+
+
+
+
minutes is better because min might mean
+something like “minimum” (and actually is an existing built-in function
+in Python that we will cover later).
+species_name[11:] (without a value after the
+colon)
+
+species_name[:4] (without a value before the
+colon)
+
+species_name[:] (just a colon)
+
species_name[11:-3]
+
species_name[-5:-3]
+
What happens when you choose a stop value which is out
+of range? (i.e., try species_name[0:20] or
+species_name[:103])
+
+
+
+
+
+
+
+
+
+species_name[2:8] returns the substring
+'acia b'
+
+
+species_name[11:] returns the substring
+'folia', from position 11 until the end
+
+species_name[:4] returns the substring
+'Acac', from the start up to but not including position
+4
+
+species_name[:] returns the entire string
+'Acacia buxifolia'
+
+
+species_name[11:-3] returns the substring
+'fo', from the 11th position to the third last
+position
+
+species_name[-5:-3] also returns the substring
+'fo', from the fifth last position to the third last
+
If a part of the slice is out of range, the operation does not fail.
+species_name[0:20] gives the same result as
+species_name[0:], and species_name[:103] gives
+the same result as species_name[:]
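
The answers can be checked with a short sketch. It assumes species_name holds 'Acacia buxifolia', the value implied by the species_name[:] answer:

```python
# Assumed value, taken from the species_name[:] answer in the solution.
species_name = 'Acacia buxifolia'

print(species_name[2:8])     # characters at indices 2 to 7
print(species_name[11:])     # index 11 to the end
print(species_name[:4])      # start up to, but not including, index 4
print(species_name[11:-3])   # index 11 up to the third-last character
print(species_name[-5:-3])   # fifth-last up to the third-last character

# An out-of-range stop is clipped to the end of the string instead of failing.
print(species_name[0:20] == species_name[0:])
print(species_name[:103] == species_name[:])
```

Only slices are forgiving in this way; indexing a single out-of-range position, such as species_name[20], raises an IndexError.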
+
+
+
+
+
+
+
+
+
+
+
Key Points
+
+
Use variables to store values.
+
Use print to display values.
+
Variables persist between cells.
+
Variables must be created before they are used.
+
Variables can be used in calculations.
+
Use an index to get a single character from a string.
+
Use a slice to get a substring.
+
Use the built-in function len to find the length of a
+string.
+
+
diff --git a/instructor/03-types-conversion.html b/instructor/03-types-conversion.html
new file mode 100644
index 000000000..f0327de72
--- /dev/null
+++ b/instructor/03-types-conversion.html
@@ -0,0 +1,1160 @@
+
+Plotting and Programming in Python: Data Types and Type Conversion
+
Explain key differences between integers and floating point
+numbers.
+
Explain key differences between numbers and character strings.
+
Use built-in functions to convert between integers, floating point
+numbers, and strings.
+
+
+
+
+
+
Every value has a type.
+
Every value in a program has a specific type.
+
Integer (int): represents positive or negative whole
+numbers like 3 or -512.
+
Floating point number (float): represents real numbers
+like 3.14159 or -2.5.
+
Character string (usually called “string”, str): text.
+
Written in either single quotes or double quotes (as long as they
+match).
+
The quote marks aren’t printed when the string is displayed.
+
+
Use the built-in function type to find the type of a
+value.
+
Use the built-in function type to find out what type a
+value has.
+
Works on variables as well.
+
But remember: the value has the type — the
+variable is just a label.
+
+
+
PYTHON
+
+
print(type(52))
+
+
+
OUTPUT
+
+
<class 'int'>
+
+
+
PYTHON
+
+
fitness = 'average'
+print(type(fitness))
+
+
+
OUTPUT
+
+
<class 'str'>
+
+
Types control what operations (or methods) can be performed on a
+given value.
+
A value’s type determines what the program can do to it.
+
+
PYTHON
+
+
print(5 - 3)
+
+
+
OUTPUT
+
+
2
+
+
+
PYTHON
+
+
print('hello' - 'h')
+
+
+
ERROR
+
+
---------------------------------------------------------------------------
+TypeError Traceback (most recent call last)
+<ipython-input-2-67f5626a1e07> in <module>()
+----> 1 print('hello' - 'h')
+
+TypeError: unsupported operand type(s) for -: 'str' and 'str'
+
+
You can use the “+” and “*” operators on strings.
+
“Adding” character strings concatenates them.
+
+
PYTHON
+
+
full_name = 'Ahmed' + ' ' + 'Walsh'
+print(full_name)
+
+
+
OUTPUT
+
+
Ahmed Walsh
+
+
Multiplying a character string by an integer N creates a
+new string that consists of that character string repeated N
+times.
+
Since multiplication is repeated addition.
+
+
+
PYTHON
+
+
separator = '=' * 10
+print(separator)
+
+
+
OUTPUT
+
+
==========
+
+
Strings have a length (but numbers don’t).
+
The built-in function len counts the number of
+characters in a string.
+
+
PYTHON
+
+
print(len(full_name))
+
+
+
OUTPUT
+
+
11
+
+
But numbers don’t have a length (not even zero).
+
+
PYTHON
+
+
print(len(52))
+
+
+
ERROR
+
+
---------------------------------------------------------------------------
+TypeError Traceback (most recent call last)
+<ipython-input-3-f769e8e8097d> in <module>()
+----> 1 print(len(52))
+
+TypeError: object of type 'int' has no len()
+
+
Must convert numbers to strings or vice versa when operating on
+them.
+
Cannot add numbers and strings.
+
+
PYTHON
+
+
print(1 + '2')
+
+
+
ERROR
+
+
---------------------------------------------------------------------------
+TypeError Traceback (most recent call last)
+<ipython-input-4-fe4f54a023c6> in <module>()
+----> 1 print(1 + '2')
+
+TypeError: unsupported operand type(s) for +: 'int' and 'str'
+
+
Not allowed because it’s ambiguous: should 1 + '2' be
+3 or '12'?
+
Some types can be converted to other types by using the type name as
+a function.
+
+
PYTHON
+
+
print(1 + int('2'))
+print(str(1) + '2')
+
+
+
OUTPUT
+
+
3
+12
+
+
Can mix integers and floats freely in operations.
+
Integers and floating-point numbers can be mixed in arithmetic.
+
Python 3 automatically converts integers to floats as needed.
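
A minimal sketch of mixed arithmetic (the variable names and values are illustrative assumptions):

```python
# Mixing an int with a float produces a float.
half = 1 / 2.0
three_squared = 3.0 ** 2
mixed_sum = 2 + 0.5

print('half is', half)
print('three squared is', three_squared)
print('mixed sum is', mixed_sum, 'of type', type(mixed_sum))
```
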
The computer reads the value of variable_one when doing
+the multiplication, creates a new value, and assigns it to
+variable_two.
+
Afterwards, the value of variable_two is set to the new
+value and not dependent on variable_one so its
+value does not automatically change when variable_one
+changes.
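
As a sketch of this behaviour (the starting values are assumptions chosen for illustration):

```python
variable_one = 1
variable_two = 5 * variable_one   # computed once, using the current value 1
variable_one = 2                  # reassigning variable_one leaves variable_two alone
print('first is', variable_one, 'and second is', variable_two)
```

This differs from a spreadsheet, where a cell that depends on another cell is recalculated automatically.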
+
+
+
+
+
+
Fractions
+
+
What type of value is 3.4? How can you find out?
+
+
+
+
+
+
+
+
+
It is a floating-point number (often abbreviated “float”). It is
+possible to find out by using the built-in function
+type().
+
+
PYTHON
+
+
print(type(3.4))
+
+
+
OUTPUT
+
+
<class 'float'>
+
+
+
+
+
+
+
+
+
+
+
Automatic Type Conversion
+
+
What type of value is 3.25 + 4?
+
+
+
+
+
+
+
+
+
It is a float: integers are automatically converted to floats as
+necessary.
+
+
PYTHON
+
+
result = 3.25 + 4
+print(result, 'is', type(result))
+
+
+
OUTPUT
+
+
7.25 is <class 'float'>
+
+
+
+
+
+
+
+
+
+
+
Choose a Type
+
+
What type of value (integer, floating point number, or character
+string) would you use to represent each of the following? Try to come up
+with more than one good answer for each problem. For example, in #1,
+when would counting days with a floating point variable make more sense
+than using an integer?
+
Number of days since the start of the year.
+
Time elapsed from the start of the year until now in days.
+
Serial number of a piece of lab equipment.
+
A lab specimen’s age
+
Current population of a city.
+
Average population of a city over time.
+
+
+
+
+
+
+
+
+
The answers to the questions are:
+
Integer, since the number of days would lie between 1 and 365.
+
Floating point, since fractional days are required.
+
Character string if the serial number contains letters and numbers,
+otherwise integer if the serial number consists only of numerals.
+
This will vary! How do you define a specimen’s age? Whole days since
+collection (integer)? Date and time (string)?
+
Choose floating point to represent population as large aggregates
+(e.g., millions), or integer to represent population in units of
+individuals.
+
Floating point number, since an average is likely to have a
+fractional part.
+
+
+
+
+
+
+
+
+
+
Division Types
+
+
In Python 3, the // operator performs integer
+(whole-number) floor division, the / operator performs
+floating-point division, and the % (or modulo)
+operator calculates and returns the remainder from integer division:
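
A short sketch of the three operators, using 5 and 3 as illustrative operands:

```python
quotient_floor = 5 // 3   # floor division: whole-number part only
quotient_true = 5 / 3     # true division: always a float in Python 3
remainder = 5 % 3         # modulo: what is left over after floor division

print('5 // 3:', quotient_floor)
print('5 / 3:', quotient_true)
print('5 % 3:', remainder)
```

The floor quotient and the remainder together recover the original number: 3 * (5 // 3) + 5 % 3 equals 5.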
If num_subjects is the number of subjects taking part in
+a study, and num_per_survey is the number that can take
+part in a single survey, write an expression that calculates the number
+of surveys needed to reach everyone once.
+
+
+
+
+
+
+
+
+
We want the minimum number of surveys that reaches everyone once,
+which is the rounded up value of
+num_subjects / num_per_survey. This is equivalent to
+performing a floor division with // and adding 1. Before
+the division we need to subtract 1 from the number of subjects to deal
+with the case where num_subjects is evenly divisible by
+num_per_survey.
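
The expression described above can be sketched as follows; the subject counts are hypothetical values chosen to show both the uneven and the evenly divisible case:

```python
# Hypothetical study sizes, chosen only for illustration.
num_subjects = 600
num_per_survey = 42

# Subtract 1 before the floor division, then add 1, to round up.
num_surveys = (num_subjects - 1) // num_per_survey + 1
print(num_subjects, 'subjects,', num_per_survey, 'per survey:', num_surveys)

# Evenly divisible case: 84 subjects need exactly 2 surveys, not 3.
exact_surveys = (84 - 1) // num_per_survey + 1
print(84, 'subjects,', num_per_survey, 'per survey:', exact_surveys)
```

Without the subtraction, 84 // 42 + 1 would give 3 and schedule one survey too many.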
Where reasonable, float() will convert a string to a
+floating point number, and int() will convert a floating
+point number to an integer:
+
+
PYTHON
+
+
print("string to float:", float("3.4"))
+print("float to int:", int(3.4))
+
+
+
OUTPUT
+
+
string to float: 3.4
+float to int: 3
+
+
If the conversion doesn’t make sense, however, an error message will
+occur.
+
+
PYTHON
+
+
print("string to float:", float("Hello world!"))
+
+
+
ERROR
+
+
---------------------------------------------------------------------------
+ValueError Traceback (most recent call last)
+<ipython-input-5-df3b790bf0a2> in <module>
+----> 1 print("string to float:", float("Hello world!"))
+
+ValueError: could not convert string to float: 'Hello world!'
+
+
Given this information, what do you expect the following program to
+do?
+
What does it actually do?
+
Why do you think it does that?
+
+
PYTHON
+
+
print("fractional string to int:", int("3.4"))
+
+
+
+
+
+
+
+
+
+
What do you expect this program to do? It would not be so
+unreasonable to expect the Python 3 int command to convert
+the string “3.4” to 3.4 and then, with an additional type conversion, to 3. After
+all, Python 3 performs a lot of other magic - isn’t that part of its
+charm?
+
+
PYTHON
+
+
int("3.4")
+
+
+
ERROR
+
+
---------------------------------------------------------------------------
+ValueError Traceback (most recent call last)
+<ipython-input-2-ec6729dfccdc> in <module>
+----> 1 int("3.4")
+ValueError: invalid literal for int() with base 10: '3.4'
+
+
However, Python 3 throws an error. Why? To be consistent, possibly.
+If you want Python to perform two consecutive type conversions, you must
+write both conversions explicitly in code.
+
+
PYTHON
+
+
int(float("3.4"))
+
+
+
OUTPUT
+
+
3
+
+
+
+
+
+
+
+
+
+
+
Arithmetic with Different Types
+
+
Which of the following will return the floating point number
+2.0? Note: there may be more than one right answer.
+
+
PYTHON
+
+
first = 1.0
+second = "1"
+third = "1.1"
+
+
first + float(second)
+
float(second) + float(third)
+
first + int(third)
+
first + int(float(third))
+
int(first) + int(float(third))
+
2.0 * second
+
+
+
+
+
+
+
+
+
Answer: 1 and 4
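
One way to check the answer is to evaluate the options that run without error and inspect their types. This sketch names the results option_1, option_4, and option_5 purely for illustration:

```python
first = 1.0
second = "1"
third = "1.1"

option_1 = first + float(second)             # float + float
option_4 = first + int(float(third))         # float + int gives a float
option_5 = int(first) + int(float(third))    # int + int stays an int

print(option_1, type(option_1))
print(option_4, type(option_4))
print(option_5, type(option_5))
```

Option 5 has the value 2 but is an integer, so it does not count. Options 3 and 6 raise a ValueError and a TypeError respectively, and option 2 evaluates to 2.1.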
+
+
+
+
+
+
+
+
+
+
Complex Numbers
+
+
Python provides complex numbers, which are written as
+1.0+2.0j. If val is a complex number, its real
+and imaginary parts can be accessed using dot notation as
+val.real and val.imag.
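
A minimal sketch of the dot notation, using the literal from the text as the assumed value of val:

```python
val = 1.0 + 2.0j   # a complex number written with the j suffix

print(val.real)    # the real part, as a float
print(val.imag)    # the imaginary part, as a float
print(type(val))
```
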
Why do you think Python uses j instead of
+i for the imaginary part?
+
What do you expect 1 + 2j + 3 to produce?
+
What do you expect 4j to be? What about
+4 j or 4 + j?
+
+
+
+
+
+
+
+
+
Standard mathematics treatments typically use i to
+denote the imaginary unit. However, j is reportedly an early
+convention adopted from electrical engineering (where i commonly
+denotes current), and it would now be technically expensive to change. Stack
+Overflow provides additional explanation and discussion.
+
+
(4+2j)
+
+4j is a complex number, while 4 j raises SyntaxError: invalid syntax. In
+4 + j, j is considered a variable, so the
+statement depends on whether j is defined and, if so, on its
+assigned value.
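
These answers can be checked with a short sketch. It assumes no variable named j has been defined earlier in the session:

```python
total = 1 + 2j + 3
print(total)            # the real parts 1 and 3 are added together

print(type(4j))         # 4j, with no space, is a complex literal

# With no variable j defined, 4 + j fails at runtime with a NameError.
name_error_seen = False
try:
    4 + j
except NameError:
    name_error_seen = True
print('4 + j raised NameError:', name_error_seen)
```

The 4 j case cannot be included in the same cell: a syntax error prevents Python from running any of the code.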
+
+
+
+
+
+
+
+
+
+
Key Points
+
+
Every value has a type.
+
Use the built-in function type to find the type of a
+value.
+
Types control what operations can be done on values.
+
Strings can be added and multiplied.
+
Strings have a length (but numbers don’t).
+
Must convert numbers to strings or vice versa when operating on
+them.
+
Can mix integers and floats freely in operations.
+
Variables only change value when something is assigned to them.
+
+
diff --git a/instructor/04-built-in.html b/instructor/04-built-in.html
new file mode 100644
index 000000000..74c0dea2c
--- /dev/null
+++ b/instructor/04-built-in.html
@@ -0,0 +1,1064 @@
+
+Plotting and Programming in Python: Built-in Functions and Help
+
Use help to display documentation for built-in functions.
+
Correctly describe situations in which SyntaxError and NameError
+occur.
+
+
+
+
+
+
Use comments to add documentation to programs.
+
+
PYTHON
+
+
# This sentence isn't executed by Python.
+adjustment = 0.5   # Neither is this - anything after '#' is ignored.
+
+
A function may take zero or more arguments.
+
We have seen some functions already — now let’s take a closer
+look.
+
An argument is a value passed into a function.
+
+len takes exactly one.
+
+int, str, and float create a
+new value from an existing one.
+
+print takes zero or more.
+
+print with no arguments prints a blank line.
+
Must always use parentheses, even if they’re empty, so that Python
+knows a function is being called.
+
+
+
PYTHON
+
+
print('before')
+print()
+print('after')
+
+
+
OUTPUT
+
+
before
+
+after
+
+
Every function returns something.
+
Every function call produces some result.
+
If the function doesn’t have a useful result to return, it usually
+returns the special value None. None is a
+Python object that stands in anytime there is no value.
+
+
PYTHON
+
+
result = print('example')
+print('result of print is', result)
+
+
+
OUTPUT
+
+
example
+result of print is None
+
+
Commonly-used built-in functions include max,
+min, and round.
+
Use max to find the largest value of one or more
+values.
+
Use min to find the smallest.
+
Both work on character strings as well as numbers.
+
“Larger” and “smaller” use (0-9, A-Z, a-z) to compare letters.
+
+
+
PYTHON
+
+
print(max(1, 2, 3))
+print(min('a', 'A', '0'))
+
+
+
OUTPUT
+
+
3
+0
+
+
Functions may only work for certain (combinations of)
+arguments.
+
+max and min must be given at least one
+argument.
+
“Largest of the empty set” is a meaningless question.
+
+
And they must be given things that can meaningfully be
+compared.
+
+
PYTHON
+
+
print(max(1, 'a'))
+
+
+
ERROR
+
+
TypeError Traceback (most recent call last)
+<ipython-input-52-3f049acf3762> in <module>
+----> 1 print(max(1, 'a'))
+
+TypeError: '>' not supported between instances of 'str' and 'int'
+
+
Functions may have default values for some arguments.
+
+round will round off a floating-point number.
+
By default, rounds to zero decimal places.
+
+
PYTHON
+
+
round(3.712)
+
+
+
OUTPUT
+
+
4
+
+
We can specify the number of decimal places we want.
+
+
PYTHON
+
+
round(3.712, 1)
+
+
+
OUTPUT
+
+
3.7
+
+
Functions attached to objects are called methods
+
Functions take another form that will be common in the pandas
+episodes.
+
Methods have parentheses like functions, but come after the
+variable.
+
Some methods are used for internal Python operations, and are marked
+with double underscores.
+
+
PYTHON
+
+
my_string = 'Hello world!'   # creation of a string object
+
+print(len(my_string)) # the len function takes a string as an argument and returns the length of the string
+
+print(my_string.swapcase()) # calling the swapcase method on the my_string object
+
+print(my_string.__len__()) # calling the internal __len__ method on the my_string object, used by len(my_string)
+
+
+
OUTPUT
+
+
12
+hELLO WORLD!
+12
+
+
You might even see them chained together. They operate left to
+right.
+
+
PYTHON
+
+
print(my_string.isupper()) # Not all the letters are uppercase
+print(my_string.upper()) # This capitalizes all the letters
+
+print(my_string.upper().isupper()) # Now all the letters are uppercase
+
+
+
OUTPUT
+
+
False
+HELLO WORLD!
+True
+
+
Use the built-in function help to get help for a
+function.
+
Every built-in function has online documentation.
+
+
PYTHON
+
+
help(round)
+
+
+
OUTPUT
+
+
Help on built-in function round in module builtins:
+
+round(number, ndigits=None)
+ Round a number to a given precision in decimal digits.
+
+ The return value is an integer if ndigits is omitted or None. Otherwise
+ the return value has the same type as the number. ndigits may be negative.
+
+
The Jupyter Notebook has two ways to get help.
+
Option 1: Place the cursor near where the function is invoked in a
+cell (i.e., the function name or its parameters),
+
Hold down Shift, and press Tab.
+
Do this several times to expand the information returned.
+
+
Option 2: Type the function name in a cell with a question mark
+after it. Then run the cell.
+
Python reports a syntax error when it can’t understand the source of
+a program.
+
Won’t even try to run the program if it can’t be parsed.
+
+
PYTHON
+
+
# Forgot to close the quote marks around the string.
+name = 'Feng
+
+
+
ERROR
+
+
File "<ipython-input-56-f42768451d55>", line 2
+ name = 'Feng
+ ^
+SyntaxError: EOL while scanning string literal
+
+
+
PYTHON
+
+
# An extra '=' in the assignment.
+age = = 52
+
+
+
ERROR
+
+
File "<ipython-input-57-ccc3df3cf902>", line 2
+ age = = 52
+ ^
+SyntaxError: invalid syntax
+
+
Look more closely at the error message:
+
+
PYTHON
+
+
print("hello world"
+
+
+
ERROR
+
+
File "<ipython-input-6-d1cc229bf815>", line 1
+ print ("hello world"
+ ^
+SyntaxError: unexpected EOF while parsing
+
+
The message indicates a problem on the first line of the input (“line
+1”).
+
In this case the “ipython-input” section of the file name tells us
+that we are working with input into IPython, the Python interpreter used
+by the Jupyter Notebook.
+
+
The -6- part of the filename indicates that the error
+occurred in cell 6 of our Notebook.
+
Next is the problematic line of code, indicating the problem with a
+^ pointer.
+
Python reports a runtime error when something goes wrong while a
+program is executing.
+
+
PYTHON
+
+
age = 53
+remaining = 100 - aege   # mis-spelled 'age'
+
+
+
ERROR
+
+
NameError Traceback (most recent call last)
+<ipython-input-59-1214fb6c55fc> in <module>
+ 1 age = 53
+----> 2 remaining = 100 - aege # mis-spelled 'age'
+
+NameError: name 'aege' is not defined
+
+
Fix syntax errors by reading the source and runtime errors by
+tracing execution.
+
+
+
+
+
+
What Happens When
+
+
Explain in simple terms the order of operations in the following
+program: when does the addition happen, when does the subtraction
+happen, when is each function called, etc.
max(len(rich), poor) raises a TypeError. The call turns into
+max(4, 'tin') and, as we discussed earlier, a string and an
+integer cannot meaningfully be compared.
+
+
ERROR
+
+
TypeError Traceback (most recent call last)
+<ipython-input-65-bc82ad05177a> in <module>
+----> 1 max(len(rich), poor)
+
+TypeError: '>' not supported between instances of 'str' and 'int'
+
+
+
+
+
+
+
+
+
+
+
Why Not?
+
+
Why is it that max and min do not return
+None when they are called with no arguments?
+
+
+
+
+
+
+
+
+
max and min raise TypeErrors in this case
+because the correct number of arguments was not supplied. If they just
+returned None, the mistake would be much harder to trace: the
+None would likely be stored in a variable and used later in the program,
+where it would probably cause a runtime error far from its source.
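
A small sketch of this behaviour; the try/except block is used here only so that the cell keeps running after the error:

```python
error_raised = False
try:
    max()                      # no arguments at all
except TypeError as error:
    error_raised = True
    print('TypeError:', error)

print('max() with no arguments raised an error:', error_raised)
```

The error appears at the exact line where the mistake was made, which is what makes it easy to trace.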
+
+
+
+
+
+
+
+
+
+
Last Character of a String
+
+
If Python starts counting from zero, and len returns the
+number of characters in a string, what index expression will get the
+last character in the string name? (Note: we will see a
+simpler way to do this in a later episode.)
+
+
+
+
+
+
+
+
+
name[len(name) - 1]
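
For example, with a hypothetical value for name:

```python
name = 'helium'                      # hypothetical example string
last_character = name[len(name) - 1]  # indices run from 0 to len(name) - 1
print(last_character)
```

Using name[len(name)] instead would raise an IndexError, because the highest valid index is one less than the length.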
+
+
+
+
+
+
+
+
+
+
Explore the Python docs!
+
+
The official Python
+documentation is arguably the most complete source of information
+about the language. It is available in different languages and contains
+a lot of useful resources. The Built-in
+Functions page contains a catalogue of all of these functions,
+including the ones that we’ve covered in this lesson. Some of these are
+more advanced and unnecessary at the moment, but others are very simple
+and useful.
+
+
+
+
+
+
+
+
+
Key Points
+
+
Use comments to add documentation to programs.
+
A function may take zero or more arguments.
+
Commonly-used built-in functions include max,
+min, and round.
+
Functions may only work for certain (combinations of)
+arguments.
+
Functions may have default values for some arguments.
+
Use the built-in function help to get help for a
+function.
+
The Jupyter Notebook has two ways to get help.
+
Every function returns something.
+
Python reports a syntax error when it can’t understand the source of
+a program.
+
Python reports a runtime error when something goes wrong while a
+program is executing.
+
Fix syntax errors by reading the source code, and runtime errors by
+tracing the program’s execution.
How can I use software that other people have written?
+
How can I find out what that software does?
+
+
+
+
+
+
+
Objectives
+
Explain what software libraries are and why programmers create and
+use them.
+
Write programs that import and use modules from Python’s standard
+library.
+
Find and read documentation for the standard library interactively
+(in the interpreter) and online.
+
+
+
+
+
+
Most of the power of a programming language is in its
+libraries.
+
A library is a collection of files (called
+modules) that contains functions for use by other programs.
+
May also contain data values (e.g., numerical constants) and other
+things.
+
A library’s contents are supposed to be related, but there’s no way to
+enforce that.
+
+
The Python standard
+library is an extensive suite of modules that comes with Python
+itself.
+
Many additional libraries are available from PyPI (the Python Package
+Index).
+
We will see later how to write new libraries.
+
+
+
+
+
+
Libraries and modules
+
+
A library is a collection of modules, but the terms are often used
+interchangeably, especially since many libraries only consist of a
+single module, so don’t worry if you mix them.
+
+
+
+
A program must import a library module before using it.
+
Use import to load a library module into a program’s
+memory.
+
Then refer to things from the module as
+module_name.thing_name.
+
Python uses . to mean “part of”.
+
+
Using math, one of the modules in the standard
+library:
+
+
PYTHON
+
+
import math
+
+print('pi is', math.pi)
+print('cos(pi) is', math.cos(math.pi))
+
+
+
OUTPUT
+
+
pi is 3.141592653589793
+cos(pi) is -1.0
+
+
Have to refer to each item with the module’s name.
+
+math.cos(pi) won’t work: the reference to
+pi doesn’t somehow “inherit” the function’s reference to
+math.
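
A minimal sketch of the difference, assuming no variable named pi has been defined or imported separately:

```python
import math

# Both the function and the constant are qualified with the module name.
print('cos(pi) is', math.cos(math.pi))

# A bare pi is just an undefined name, even inside a math.cos(...) call.
name_error_seen = False
try:
    math.cos(pi)
except NameError:
    name_error_seen = True
print('bare pi raised NameError:', name_error_seen)
```
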
+
+
Use help to learn about the contents of a library
+module.
+
Works just like help for a function.
+
+
PYTHON
+
+
help(math)
+
+
+
OUTPUT
+
+
Help on module math:
+
+NAME
+ math
+
+MODULE REFERENCE
+ http://docs.python.org/3/library/math
+
+ The following documentation is automatically generated from the Python
+ source files. It may be incomplete, incorrect or include features that
+ are considered implementation detail and may vary between Python
+ implementations. When in doubt, consult the module reference at the
+ location listed above.
+
+DESCRIPTION
+ This module is always available. It provides access to the
+ mathematical functions defined by the C standard.
+
+FUNCTIONS
+ acos(x, /)
+ Return the arc cosine (measured in radians) of x.
+⋮ ⋮ ⋮
+
+
Import specific items from a library module to shorten
+programs.
+
Use from ... import ... to load only specific items
+from a library module.
+
Then refer to them directly without library name as prefix.
+
+
PYTHON
+
+
from math import cos, pi
+
+print('cos(pi) is', cos(pi))
+
+
+
OUTPUT
+
+
cos(pi) is -1.0
+
+
Create an alias for a library module when importing it to shorten
+programs.
+
Use import ... as ... to give a library a short
+alias while importing it.
+
Then refer to items in the library using that shortened name.
+
+
PYTHON
+
+
import math as m
+
+print('cos(pi) is', m.cos(m.pi))
+
+
+
OUTPUT
+
+
cos(pi) is -1.0
+
+
Commonly used for libraries that are frequently used or have long
+names.
+
E.g., the matplotlib plotting library is often aliased
+as mpl.
+
+
But can make programs harder to understand, since readers must learn
+your program’s aliases.
+
+
+
+
+
+
Exploring the Math Module
+
+
What function from the math module can you use to
+calculate a square root without using sqrt?
+
Since the library contains this function, why does sqrt
+exist?
+
+
+
+
+
+
+
+
+
Using help(math) we see that we’ve got
+pow(x,y) in addition to sqrt(x), so we could
+use pow(x, 0.5) to find a square root.
+
The sqrt(x) function is arguably more readable than
+pow(x, 0.5) when implementing equations. Readability is a
+cornerstone of good programming, so it makes sense to provide a special
+function for this specific common case.
+
Also, the design of Python’s math library has its origin
+in the C standard, which includes both sqrt(x) and
+pow(x,y), so a little bit of the history of programming is
+showing in Python’s function names.
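
The two approaches can be compared side by side. The input value 16 is an arbitrary choice for illustration:

```python
import math

value = 16   # arbitrary example input

root_from_sqrt = math.sqrt(value)
root_from_pow = math.pow(value, 0.5)   # raising to the power 0.5 is a square root

print('sqrt:', root_from_sqrt)
print('pow: ', root_from_pow)
print('close:', math.isclose(root_from_sqrt, root_from_pow))
```

math.isclose is used for the comparison because two floating-point routes to the same mathematical result are not guaranteed to agree in every last digit.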
+
+
+
+
+
+
+
+
+
+
Locating the Right Module
+
+
You want to select a random character from a string:
The string has 11 characters, each having a positional index from 0
+to 10. You could use the random.randrange
+or random.randint
+functions to get a random integer between 0 and 10, and then select the
+bases character at that index:
+
+
PYTHON
+
+
from random import randrange
+
+random_index = randrange(len(bases))
+print(bases[random_index])
+
+
or more compactly:
+
+
PYTHON
+
+
from random import randrange
+
+print(bases[randrange(len(bases))])
+
+
Perhaps you found the random.sample
+function? It allows for slightly less typing but might be a bit harder
+to understand just by reading:
+
+
PYTHON
+
+
from random import sample
+
+print(sample(bases, 1)[0])
+
+
Note that this function returns a list of values. We will learn about
+lists in episode 11.
+
The simplest and shortest solution is the random.choice
+function that does exactly what we want:
+
+
PYTHON
+
+
from random import choice
+
+print(choice(bases))
+
+
+
+
+
+
+
+
+
+
+
Jigsaw Puzzle (Parson’s Problem) Programming Example
+
+
Rearrange the following statements so that a random DNA base and its
+index in the string are printed. Not all statements may be needed.
+Feel free to use/add intermediate variables.
+
+
PYTHON
+
+
bases="ACTTGCTTGAC"
+import math
+import random
+___ = random.randrange(n_bases)
+___ =len(bases)
+print("random base ", bases[___], "base index", ___)
+
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
import math
+import random
+bases = "ACTTGCTTGAC"
+n_bases = len(bases)
+idx = random.randrange(n_bases)
+print("random base", bases[idx], "base index", idx)
+
+
+
+
+
+
+
+
+
+
+
When Is Help Available?
+
+
When a colleague of yours types help(math), Python
+reports an error:
+
+
ERROR
+
+
NameError: name 'math' is not defined
+
+
What has your colleague forgotten to do?
+
+
+
+
+
+
+
+
+
Importing the math module (import math)
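The situation can be reproduced in a small sketch: using the name math before the import raises a NameError, and the same name works once the import has run. The variable lookup_failed is a hypothetical name used only for this illustration.

```python
try:
    math.pi  # 'math' has not been imported yet, so the name is unknown
    lookup_failed = False
except NameError as error:
    lookup_failed = True
    print("Python reported:", error)

import math  # after the import, the name exists and help(math) would work

print(math.pi)
```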
+
+
+
+
+
+
+
+
+
+
Importing With Aliases
+
+
Fill in the blanks so that the program below prints
+90.0.
+
Rewrite the program so that it uses import
+without as.
+
Which form do you find easier to read?
+
+
PYTHON
+
+
import math as m
+angle = ____.degrees(____.pi / 2)
+print(____)
+
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
import math as m
+angle = m.degrees(m.pi / 2)
+print(angle)
+
+
can be written as
+
+
PYTHON
+
+
import math
+angle = math.degrees(math.pi / 2)
+print(angle)
+
+
Since you just wrote the code and are familiar with it, you might
+actually find the first version easier to read. But when trying to read
+a huge piece of code written by someone else, or when getting back to
+your own huge piece of code after several months, non-abbreviated names
+are often easier, except where there are clear abbreviation
+conventions.
+
+
+
+
+
+
+
+
+
+
There Are Many Ways To Import Libraries!
+
+
Match the following print statements with the appropriate library
+calls.
+
Print commands:
+
print("sin(pi/2) =", sin(pi/2))
+
print("sin(pi/2) =", m.sin(m.pi/2))
+
print("sin(pi/2) =", math.sin(math.pi/2))
+
Library calls:
+
from math import sin, pi
+
import math
+
import math as m
+
from math import *
+
+
+
+
+
+
+
+
+
Library calls 1 and 4. In order to directly refer to
+sin and pi without the library name as prefix,
+you need to use the from ... import ... statement. Whereas
+library call 1 specifically imports the two names sin
+and pi, library call 4 imports all public names in the
+math module.
+
Library call 3. Here sin and pi are
+referred to with a shortened library name m instead of
+math. Library call 3 does exactly that using the
+import ... as ... syntax - it creates an alias for
+math in the form of the shortened name m.
+
Library call 2. Here sin and pi are
+referred to with the regular library name math, so the
+regular import ... call suffices.
+
Note: although library call 4 works, importing all
+names from a module using a wildcard import is not recommended as it makes it
+unclear which names from the module are used in the code. In general it
+is best to make your imports as specific as possible and to only import
+what your code uses. In library call 1, the import
+statement explicitly tells us that the sin function is
+imported from the math module, but library call 4 does not
+convey this information.
+
+
+
+
+
+
+
+
+
+
Importing Specific Items
+
+
Fill in the blanks so that the program below prints
+90.0.
+
Do you find this version easier to read than preceding ones?
+
Why wouldn’t programmers always use this form of
+import?
+
+
PYTHON
+
+
____ math import ____, ____
+angle = degrees(pi / 2)
+print(angle)
+
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
from math import degrees, pi
+angle = degrees(pi / 2)
+print(angle)
+
+
Most likely you find this version easier to read since it’s less
+dense. The main reason not to use this form of import is to avoid name
+clashes. For instance, you wouldn’t import degrees this way
+if you also wanted to use the name degrees for a variable
+or function of your own. Or if you were to also import a function named
+degrees from another library.
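The name clash described above can be sketched as follows. The second degrees function (a Celsius-to-Fahrenheit helper) is hypothetical and exists only to show how a later definition silently replaces the imported name.

```python
import math
from math import degrees, pi

before_clash = degrees(pi / 2)  # math.degrees: radians -> degrees


def degrees(celsius):
    """Hypothetical helper of our own: Celsius -> Fahrenheit."""
    return celsius * 9 / 5 + 32


after_clash = degrees(pi / 2)            # now calls our helper, not math.degrees
with_prefix = math.degrees(math.pi / 2)  # the module prefix stays unambiguous

print(before_clash, after_clash, with_prefix)
```

Keeping the module prefix (or an alias) avoids this kind of silent replacement.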
+
+
+
+
+
+
+
+
+
+
Reading Error Messages
+
+
Read the code below and try to identify what the errors are without
+running it.
+
Run the code, and read the error message. What type of error is
+it?
+
+
PYTHON
+
+
from math import log
+log(0)
+
+
+
+
+
+
+
+
+
+
+
OUTPUT
+
+
---------------------------------------------------------------------------
+ValueError Traceback (most recent call last)
+<ipython-input-1-d72e1d780bab> in <module>
+ 1 from math import log
+----> 2 log(0)
+
+ValueError: math domain error
+
+
The logarithm of x is only defined for
+x > 0, so 0 is outside the domain of the function.
+
You get an error of type ValueError, indicating that
+the function received an inappropriate argument value. The additional
+message “math domain error” makes it clearer what the problem is.
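If a program needs to keep running after such an error, the ValueError can be caught. This is a minimal sketch; safe_log is a hypothetical helper name, not part of the math module.

```python
from math import log


def safe_log(x):
    """Return log(x), or None when x is outside the function's domain."""
    try:
        return log(x)
    except ValueError as error:
        print("Cannot take the logarithm of", x, "-", error)
        return None


print(safe_log(1))  # inside the domain
print(safe_log(0))  # outside the domain, so None is returned
```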
+
+
+
+
+
+
+
+
+
+
Key Points
+
+
Most of the power of a programming language is in its
+libraries.
+
A program must import a library module in order to use it.
+
Use help to learn about the contents of a library
+module.
+
Import specific items from a library to shorten programs.
+
Create an alias for a library when importing it to shorten
+programs.
+
+
diff --git a/instructor/07-reading-tabular.html b/instructor/07-reading-tabular.html
new file mode 100644
index 000000000..67cdaee8a
--- /dev/null
+++ b/instructor/07-reading-tabular.html
@@ -0,0 +1,1083 @@
+
+Plotting and Programming in Python: Reading Tabular Data into DataFrames
+
The columns in a dataframe are the observed variables, and the rows
+are the observations.
+
Pandas uses backslash \ to show wrapped lines when
+output is too wide to fit the screen.
+
Using descriptive dataframe names helps us distinguish between
+multiple dataframes so we won’t accidentally overwrite a dataframe or
+read from the wrong one.
+
+
+
+
+
+
File Not Found
+
+
Our lessons store their data files in a data
+sub-directory, which is why the path to the file is
+data/gapminder_gdp_oceania.csv. If you forget to include
+data/, or if you include it but your copy of the file is
+somewhere else, you will get a runtime
+error that ends with a line like this:
+
+
ERROR
+
+
FileNotFoundError: [Errno 2] No such file or directory: 'data/gapminder_gdp_oceania.csv'
+
+
+
+
+
Use index_col to specify that a column’s values should
+be used as row headings.
+
Row headings are numbers (0 and 1 in this case).
+
Really want to index by country.
+
Pass the name of the column to read_csv as its
+index_col parameter to do this.
+
Naming the dataframe data_oceania_country tells us
+which region the data includes (oceania) and how it is
+indexed (country).
Use DataFrame.describe() to get summary statistics
+about data.
+
DataFrame.describe() gets the summary statistics of only
+the columns that have numerical data. All other columns are ignored,
+unless you use the argument include='all'.
Not particularly useful with just two records, but very helpful when
+there are thousands.
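The effect of include='all' can be seen on a small table. This sketch uses a hypothetical two-row dataframe with invented values rather than the Gapminder file.

```python
import pandas as pd

# Hypothetical two-row table mixing a text column and a numeric column.
data = pd.DataFrame(
    {"continent": ["Oceania", "Oceania"], "gdpPercap_2007": [30000.0, 25000.0]},
    index=["country_a", "country_b"],
)

numeric_summary = data.describe()            # numeric columns only
full_summary = data.describe(include="all")  # text columns are summarised too

print(numeric_summary)
print(full_summary)
```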
+
+
+
+
+
+
Reading Other Data
+
+
Read the data in gapminder_gdp_americas.csv (which
+should be in the same directory as
+gapminder_gdp_oceania.csv) into a variable called
+data_americas and display its summary statistics.
+
+
+
+
+
+
+
+
+
To read in a CSV, we use pd.read_csv and pass the
+filename 'data/gapminder_gdp_americas.csv' to it. We also
+once again pass the column name 'country' to the parameter
+index_col in order to index by country. The summary
+statistics can be displayed with the DataFrame.describe()
+method.
After reading the data for the Americas, use
+help(data_americas.head) and
+help(data_americas.tail) to find out what
+DataFrame.head and DataFrame.tail do.
+
What method call will display the first three rows of this
+data?
+
What method call will display the last three columns of this data?
+(Hint: you may need to change your view of the data.)
+
+
+
+
+
+
+
+
+
We can check out the first five rows of data_americas
+by executing data_americas.head() which lets us view the
+beginning of the DataFrame. We can specify the number of rows we wish to
+see by specifying the parameter n in our call to
+data_americas.head(). To view the first three rows,
+execute:
To check out the last three rows of data_americas, we
+would use the command data_americas.tail(n=3), analogous to
+head() used above. However, here we want to look at the
+last three columns, so we need to change our view and then use
+tail(). To do so, we create a new DataFrame in which rows
+and columns are switched:
+
+
PYTHON
+
+
americas_flipped = data_americas.T
+
+
We can then view the last three columns of data_americas by
+viewing the last three rows of americas_flipped:
This shows the data that we want, but we may prefer to display three
+columns instead of three rows, so we can flip it back:
+
+
PYTHON
+
+
americas_flipped.tail(n=3).T
+
+
Note: we could have done the above in a single line
+of code by ‘chaining’ the commands:
+
+
PYTHON
+
+
data_americas.T.tail(n=3).T
+
+
+
+
+
+
+
+
+
+
+
Reading Files in Other Directories
+
+
The data for your current project is stored in a file called
+microbes.csv, which is located in a folder called
+field_data. You are doing analysis in a notebook called
+analysis.ipynb in a sibling folder called
+thesis:
What value(s) should you pass to read_csv to read
+microbes.csv in analysis.ipynb?
+
+
+
+
+
+
+
+
+
We need to specify the path to the file of interest in the call to
+pd.read_csv. We first need to ‘jump’ out of the folder
+thesis using ‘../’ and then into the folder
+field_data using ‘field_data/’. Then we can specify the
+filename ‘microbes.csv’. The result is as follows:
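A minimal, self-contained sketch of this path: it builds the two sibling folders in a temporary directory and reads the file relative to thesis. The CSV contents (species and count columns) and the variable names are invented for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)
    (project / "thesis").mkdir()
    (project / "field_data").mkdir()
    # Invented file contents, used only so that there is something to read.
    (project / "field_data" / "microbes.csv").write_text("species,count\nE. coli,12\n")

    # Starting from thesis/, '..' goes up one level and 'field_data/' goes back down.
    notebook_dir = project / "thesis"
    microbes = pd.read_csv(notebook_dir / "../field_data/microbes.csv")

print(microbes)
```

In a notebook that is already running inside thesis, the relative path '../field_data/microbes.csv' alone would be passed to pd.read_csv.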
As well as the read_csv function for reading data from a
+file, Pandas provides a to_csv function to write dataframes
+to files. Applying what you’ve learned about reading from files, write
+one of your dataframes to a file called processed.csv. You
+can use help to get information on how to use
+to_csv.
+
+
+
+
+
+
+
+
+
In order to write the DataFrame data_americas to a file
+called processed.csv, execute the following command:
+
+
PYTHON
+
+
data_americas.to_csv('processed.csv')
+
+
For help on read_csv or to_csv, you could
+execute, for example:
+
+
PYTHON
+
+
help(data_americas.to_csv)
+help(pd.read_csv)
+
+
Note that help(to_csv) or help(pd.to_csv)
+throws an error! This is because to_csv is not
+a global Pandas function, but a method of DataFrames. This
+means you can only call it on an instance of a DataFrame, e.g.,
+data_americas.to_csv or
+data_oceania.to_csv.
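The difference between a module-level function and a DataFrame method can be checked directly. This sketch uses a tiny hypothetical dataframe named table.

```python
import pandas as pd

print(hasattr(pd, "read_csv"))          # module-level function
print(hasattr(pd, "to_csv"))            # no such module-level function
print(hasattr(pd.DataFrame, "to_csv"))  # defined on the DataFrame class

# A tiny hypothetical dataframe, used only to show the method call.
table = pd.DataFrame({"value": [1, 2]}, index=["a", "b"])
csv_text = table.to_csv()  # with no filename, to_csv returns the CSV as a string
print(csv_text)
```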
+
+
+
+
+
+
+
+
+
+
Key Points
+
+
Use the Pandas library to get basic statistics out of tabular
+data.
+
Use index_col to specify that a column’s values should
+be used as row headings.
+
Use DataFrame.info to find out more about a
+dataframe.
+
The DataFrame.columns variable stores information about
+the dataframe’s columns.
+
Use DataFrame.T to transpose a dataframe.
+
Use DataFrame.describe to get summary statistics about
+data.
How can I do statistical analysis of tabular data?
+
+
+
+
+
+
+
Objectives
+
Select individual values from a Pandas dataframe.
+
Select entire rows or entire columns from a dataframe.
+
Select a subset of both rows and columns from a dataframe in a
+single operation.
+
Select a subset of a dataframe by a single Boolean criterion.
+
+
+
+
+
+
Note about Pandas DataFrames/Series
+
A DataFrame
+is a collection of Series.
+The DataFrame is the way Pandas represents a table, and Series is the
+data structure Pandas uses to represent a column.
+
Pandas is built on top of the NumPy library, which in practice means
+that most of the methods defined for NumPy arrays apply to Pandas
+Series/DataFrames.
+
What makes Pandas so attractive is the powerful interface to access
+individual records of the table, proper handling of missing values, and
+relational-database operations between DataFrames.
+
Selecting values
+
To access a value at the position [i,j] of a DataFrame,
+we have two options, depending on what i and j mean.
+Remember that a DataFrame provides an index as a way to
+identify the rows of the table; a row, then, has a position
+inside the table as well as a label, which uniquely identifies
+its entry in the DataFrame.
+
Use DataFrame.iloc[..., ...] to select values by their
+(entry) position
+
Can specify location by numerical index analogously to 2D version of
+character selection in strings.
+
+
PYTHON
+
+
import pandas as pd
+data = pd.read_csv('data/gapminder_gdp_europe.csv', index_col='country')
+print(data.iloc[0, 0])
+
+
+
OUTPUT
+
+
1601.056136
+
+
Use DataFrame.loc[..., ...] to select values by their
+(entry) label.
In the above code, we discover that slicing using
+loc is inclusive at both ends, which differs from
+slicing using iloc, where slicing
+indicates everything up to but not including the final index.
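The difference between the two kinds of slice can be seen on a small table. This sketch uses a hypothetical 3x3 dataframe whose labels and values are invented for illustration.

```python
import pandas as pd

# Hypothetical 3x3 table; labels and values are invented for illustration.
data = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    index=["row_a", "row_b", "row_c"],
    columns=["col_x", "col_y", "col_z"],
)

by_position = data.iloc[0:2, 0:2]                      # stops before position 2
by_label = data.loc["row_a":"row_c", "col_x":"col_z"]  # includes row_c and col_z

print(by_position.shape)
print(by_label.shape)
```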
+
Result of slicing can be used in further operations.
+
Usually don’t just print a slice.
+
All the statistical operators that work on entire dataframes work
+the same way on slices.
Returns a similarly-shaped dataframe of True and
+False.
+
+
PYTHON
+
+
# Use a subset of data to keep output readable.
+subset = data.loc['Italy':'Poland', 'gdpPercap_1962':'gdpPercap_1972']
+print('Subset of data:\n', subset)
+
+# Which values were greater than 10000 ?
+print('\nWhere are values large?\n', subset > 10000)
A frame full of Booleans is sometimes called a mask because
+of how it can be used.
+
+
PYTHON
+
+
mask = subset > 10000
+print(subset[mask])
+
+
+
OUTPUT
+
+
gdpPercap_1962 gdpPercap_1967 gdpPercap_1972
+country
+Italy NaN 10022.40131 12269.27378
+Montenegro NaN NaN NaN
+Netherlands 12790.84956 15363.25136 18794.74567
+Norway 13450.40151 16361.87647 18965.05551
+Poland NaN NaN NaN
+
+
Get the value where the mask is true, and NaN (Not a Number) where
+it is false.
+
Useful because NaNs are ignored by operations like max, min,
+average, etc.
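To see that masked-out values are skipped rather than counted as zero, here is a minimal sketch. The dataframe values and the country_a to country_c labels are invented for illustration.

```python
import pandas as pd

# Hypothetical values, invented for illustration.
subset = pd.DataFrame(
    {
        "gdpPercap_1962": [8000.0, 12000.0, 5000.0],
        "gdpPercap_1967": [11000.0, 15000.0, 6000.0],
    },
    index=["country_a", "country_b", "country_c"],
)

mask = subset > 10000
large_only = subset[mask]  # values that fail the test become NaN

print(large_only)
print(large_only.mean())   # NaN entries are skipped, not treated as zero
```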
Learners often struggle here; many do not work with financial data
+and concepts, so they find the example difficult to get their
+head around. The biggest problem, though, is the line generating the
+wealth_score. This step needs to be talked through thoroughly:
+
It uses implicit conversion between Boolean and float values, which has
+not been covered in the course so far.
+
The axis=1 argument needs to be explained clearly.
+
+
+
+
+
Pandas vectorized methods and grouping operations are features that
+give users much flexibility to analyse their data.
+
For instance, let’s say we want to have a clearer view on how the
+European countries split themselves according to their GDP.
+
We may have a glance by splitting the countries in two groups during
+the years surveyed, those who presented a GDP higher than the
+European average and those with a lower GDP.
+
We then estimate a wealth score based on the historical
+(from 1962 to 2007) values, where we count how many times a country
+has participated in the groups of lower or higher
+GDP.
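The two steps above can be sketched on a small table. This is an illustrative sketch, not necessarily the lesson's exact code: the data values, the country labels, and the name mask_higher are assumptions made for the example.

```python
import pandas as pd

# Hypothetical table: three countries, two survey years, invented values.
data = pd.DataFrame(
    {"gdp_1962": [1.0, 3.0, 5.0], "gdp_1967": [6.0, 4.0, 5.0]},
    index=["country_a", "country_b", "country_c"],
)

# Step 1: compare every value with the mean of its own column (year).
mask_higher = data > data.mean()

# Step 2: sum the Booleans along each row (axis=1). True counts as 1 and
# False as 0, so dividing by the number of years gives the fraction of
# years in which a country was above the average.
wealth_score = mask_higher.sum(axis=1) / len(data.columns)

print(mask_higher)
print(wealth_score)
```

Here axis=1 means "combine the values across the columns of each row", which yields one score per country.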
Clearly, the second statement produces an additional column and an
+additional row compared to the first statement.
+What conclusion can we draw? We see that a numerical slice, 0:2,
+omits the final index (i.e. index 2) in the range provided,
+while a named slice, ‘gdpPercap_1952’:‘gdpPercap_1962’,
+includes the final element.
+
+
+
+
+
+
+
+
+
+
Reconstructing Data
+
+
Explain what each line in the following short program does: what is
+in first, second, etc.?
first = pd.read_csv('data/gapminder_all.csv', index_col='country')
+
+
This line loads the dataset containing the GDP data from all
+countries into a dataframe called first. The
+index_col='country' parameter selects which column to use
+as the row labels in the dataframe.
+
+
PYTHON
+
+
second = first[first['continent'] == 'Americas']
+
+
This line makes a selection: only those rows of first
+for which the ‘continent’ column matches ‘Americas’ are extracted.
+Notice how the Boolean expression inside the brackets,
+first['continent'] == 'Americas', is used to select only
+those rows where the expression is true. Try printing this expression!
+Can you print also its individual True/False elements? (hint: first
+assign the expression to a variable)
+
+
PYTHON
+
+
third = second.drop('Puerto Rico')
+
+
As the syntax suggests, this line drops the row from
+second where the label is ‘Puerto Rico’. The resulting
+dataframe third has one row less than the original
+dataframe second.
+
+
PYTHON
+
+
fourth = third.drop('continent', axis=1)
+
+
Again we apply the drop function, but in this case we are dropping
+not a row but a whole column. To accomplish this, we also need to
+specify the axis parameter: axis=1 tells drop to look for
+the label among the columns rather than the rows (axis=0, the default).
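The role of the axis parameter in drop can be checked on a small table. The dataframe below is hypothetical, with invented labels and values.

```python
import pandas as pd

# Hypothetical table, invented for illustration.
third = pd.DataFrame(
    {"continent": ["Americas", "Americas"], "gdpPercap_2007": [100.0, 200.0]},
    index=["country_a", "country_b"],
)

without_row = third.drop("country_a")             # axis=0 (default): search row labels
without_column = third.drop("continent", axis=1)  # axis=1: search column labels
same_thing = third.drop(columns="continent")      # keyword form of the same request

print(without_row)
print(without_column)
```

The columns= keyword form may be easier to read because it does not require remembering which axis number means what.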
+
+
PYTHON
+
+
fourth.to_csv('result.csv')
+
+
The final step is to write the data that we have been working on to a
+CSV file. Pandas makes this easy with the to_csv()
+method. The only argument we need to pass is the filename.
+Note that the file will be written in the directory from which you
+started the Jupyter or Python session.
+
+
+
+
+
+
+
+
+
+
Selecting Indices
+
+
Explain in simple terms what idxmin and
+idxmax do in the short program below. When would you use
+these methods?
+
+
PYTHON
+
+
data = pd.read_csv('data/gapminder_gdp_europe.csv', index_col='country')
+print(data.idxmin())
+print(data.idxmax())
+
+
+
+
+
+
+
+
+
+
For each column in data, idxmin will return
+the index value corresponding to each column’s minimum;
+idxmax will do the same for each column’s
+maximum value.
+
You can use these methods whenever you want to get the row label of
+the minimum/maximum value and not the actual minimum/maximum value.
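A small sketch makes the behaviour concrete. The dataframe is hypothetical; its values and country labels are invented for illustration.

```python
import pandas as pd

# Hypothetical table, invented for illustration.
data = pd.DataFrame(
    {"gdpPercap_1952": [3.0, 1.0, 2.0], "gdpPercap_2007": [5.0, 9.0, 7.0]},
    index=["country_a", "country_b", "country_c"],
)

lowest = data.idxmin()   # row label of the minimum, one entry per column
highest = data.idxmax()  # row label of the maximum, one entry per column

print(lowest)
print(highest)
```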
+
+
+
+
+
+
+
+
+
+
Practice with Selection
+
+
Assume Pandas has been imported and the Gapminder GDP data for Europe
+has been loaded. Write an expression to select each of the
+following:
+
GDP per capita for all countries in 1982.
+
GDP per capita for Denmark for all years.
+
GDP per capita for all countries for years after 1985.
+
GDP per capita for each country in 2007 as a multiple of GDP per
+capita for that country in 1952.
+
+
+
+
+
+
+
+
+
1:
+
+
PYTHON
+
+
data['gdpPercap_1982']
+
+
2:
+
+
PYTHON
+
+
data.loc['Denmark',:]
+
+
3:
+
+
PYTHON
+
+
data.loc[:,'gdpPercap_1985':]
+
+
No column named gdpPercap_1985 actually exists, but Pandas does not
+give you an error: because the column labels are in sorted order, a
+label slice starts at the first label that sorts at or after
+gdpPercap_1985 (here gdpPercap_1987). This is useful if new
+columns are added to the CSV file later.
+
4:
+
+
PYTHON
+
+
data['gdpPercap_2007']/data['gdpPercap_1952']
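The label slice from answer 3 can be tried on a small table. This sketch assumes, as in the Gapminder files, that the column labels are in sorted order, which is what allows slicing from a label that is absent. The single row of values is invented.

```python
import pandas as pd

# Hypothetical one-row table with sorted column labels; values are invented.
data = pd.DataFrame(
    [[1.0, 2.0, 3.0]],
    index=["country_a"],
    columns=["gdpPercap_1982", "gdpPercap_1987", "gdpPercap_1992"],
)

# 'gdpPercap_1985' is not a column, but the slice still has a defined start.
after_1985 = data.loc[:, "gdpPercap_1985":]

print(list(after_1985.columns))
```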
+
+
+
+
+
+
+
+
+
+
+
Many Ways of Access
+
+
There are at least two ways of accessing a value or slice of a
+DataFrame: by name or index. However, there are many others. For
+example, a single column or row can be accessed either as a
+DataFrame or a Series object.
+
Suggest different ways of doing the following operations on a
+DataFrame:
+
Access a single column
+
Access a single row
+
Access an individual DataFrame element
+
Access several columns
+
Access several rows
+
Access a subset of specific rows and columns
+
Access a subset of row and column ranges
+
+
+
+
+
+
+
+
+
1. Access a single column:
+
+
PYTHON
+
+
# by name
+data["col_name"] # as a Series
+data[["col_name"]] # as a DataFrame
+
+# by name using .loc
+data.T.loc["col_name"] # as a Series
+data.T.loc[["col_name"]].T # as a DataFrame
+
+# Dot notation (Series)
+data.col_name
+
+# by index (iloc)
+data.iloc[:, col_index] # as a Series
+data.iloc[:, [col_index]] # as a DataFrame
+
+# using a mask
+data.T[data.T.index == "col_name"].T
+
+
2. Access a single row:
+
+
PYTHON
+
+
# by name using .loc
+data.loc["row_name"] # as a Series
+data.loc[["row_name"]] # as a DataFrame
+
+# by name
+data.T["row_name"] # as a Series
+data.T[["row_name"]].T # as a DataFrame
+
+# by index
+data.iloc[row_index] # as a Series
+data.iloc[[row_index]] # as a DataFrame
+
+# using mask
+data[data.index == "row_name"]
+
+
3. Access an individual DataFrame element:
+
+
PYTHON
+
+
# by column/row names
+data["col_name"]["row_name"] # as a value
+
+data[["col_name"]].loc["row_name"] # as a Series
+data[["col_name"]].loc[["row_name"]] # as a DataFrame
+
+data.loc["row_name"]["col_name"] # as a value
+data.loc[["row_name"]]["col_name"] # as a Series
+data.loc[["row_name"]][["col_name"]] # as a DataFrame
+
+data.loc["row_name", "col_name"] # as a value
+data.loc[["row_name"], "col_name"] # as a Series. Preserves index. Column name is moved to `.name`.
+data.loc["row_name", ["col_name"]] # as a Series. Index is moved to `.name`. Sets index to column name.
+data.loc[["row_name"], ["col_name"]] # as a DataFrame (preserves original index and column name)
+
+# by column/row names: Dot notation
+data.col_name.row_name
+
+# by column/row indices
+data.iloc[row_index, col_index] # as a value
+data.iloc[[row_index], col_index] # as a Series. Preserves index. Column name is moved to `.name`
+data.iloc[row_index, [col_index]] # as a Series. Index is moved to `.name`. Sets index to column name.
+data.iloc[[row_index], [col_index]] # as a DataFrame (preserves original index and column name)
+
+# column name + row index
+data["col_name"][row_index]
+data.col_name[row_index]
+data["col_name"].iloc[row_index]
+
+# column index + row name
+data.iloc[:, [col_index]].loc["row_name"] # as a Series
+data.iloc[:, [col_index]].loc[["row_name"]] # as a DataFrame
+
+# using masks
+data[data.index == "row_name"].T[data.T.index == "col_name"].T
+
+
4. Access several columns:
+
+
PYTHON
+
+
# by name
+data[["col1", "col2", "col3"]]
+data.loc[:, ["col1", "col2", "col3"]]
+
+# by index
+data.iloc[:, [col1_index, col2_index, col3_index]]
+
+
5. Access several rows
+
+
PYTHON
+
+
# by name
+data.loc[["row1", "row2", "row3"]]
+
+# by index
+data.iloc[[row1_index, row2_index, row3_index]]
+
+
6. Access a subset of specific rows and columns
+
+
PYTHON
+
+
# by names
+data.loc[["row1", "row2", "row3"], ["col1", "col2", "col3"]]
+
+# by indices
+data.iloc[[row1_index, row2_index, row3_index], [col1_index, col2_index, col3_index]]
+
+# column names + row indices
+data[["col1", "col2", "col3"]].iloc[[row1_index, row2_index, row3_index]]
+
+# column indices + row names
+data.iloc[:, [col1_index, col2_index, col3_index]].loc[["row1", "row2", "row3"]]
+
+
7. Access a subset of row and column ranges
+
+
PYTHON
+
+
# by name
+data.loc["row1":"row2", "col1":"col2"]
+
+# by index
+data.iloc[row1_index:row2_index, col1_index:col2_index]
+
+# column names + row indices
+data.loc[:, "col1_name":"col2_name"].iloc[row1_index:row2_index]
+
+# column indices + row names
+data.iloc[:, col1_index:col2_index].loc["row1":"row2"]
+
+
+
+
+
+
+
+
+
+
+
Exploring available methods using the
+dir() function
+
+
Python includes a dir() function that can be used to
+display all of the available methods (functions) that are built into a
+data object. In Episode 4, we used some methods with a string. But we
+can see many more are available by using dir():
+
+
PYTHON
+
+
my_string = 'Hello world!'  # creation of a string object
+dir(my_string)
You can use help() or Shift+Tab to
+get more information about what these methods do.
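Because dir() also lists many internal names that start with an underscore, it can help to filter them out. This is a minimal sketch; public_names is a hypothetical variable name.

```python
my_string = 'Hello world!'

# Keep only the names that do not start with an underscore.
public_names = [name for name in dir(my_string) if not name.startswith('_')]
print(public_names)

# Print the docstring of one of the listed methods.
print(my_string.upper.__doc__)
```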
+
Assume Pandas has been imported and the Gapminder GDP data for Europe
+has been loaded as data. Then, use dir() to
+find the function that prints out the median per-capita GDP across all
+European countries for each year that information is available.
+
+
+
+
+
+
+
+
+
Among many choices, dir() lists the
+median() function as a possibility. Thus,
+
+
PYTHON
+
+
data.median()
+
+
+
+
+
+
+
+
+
+
+
Interpretation
+
+
Poland’s borders have been stable since 1945, but changed several
+times in the years before then. How would you handle this if you were
+creating a table of GDP per capita for Poland for the entire twentieth
+century?
+
+
+
+
+
+
+
+
+
Key Points
+
+
Use DataFrame.iloc[..., ...] to select values by
+integer location.
+
Use : on its own to mean all columns or all rows.
+
Select multiple columns or rows using DataFrame.loc and
+a named slice.
+
Result of slicing can be used in further operations.
In our Jupyter Notebook example, running the cell should generate the
+figure directly below the code. The figure is also included in the
+Notebook document for future viewing. However, other Python environments
+like an interactive Python session started from a terminal or a Python
+script executed via the command line require an additional command to
+display the figure.
+
Instruct matplotlib to show a figure:
+
+
PYTHON
+
+
plt.show()
+
+
This command can also be used within a Notebook - for instance, to
+display multiple figures if several are created by a single cell.
Before plotting, we convert the column headings from a
+string to integer data type, since they
+represent numerical values, using str.replace()
+to remove the gdpPercap_ prefix and then astype(int)
+to convert the series of string values
+(['1952', '1957', ..., '2007']) to a series of integers:
+[1952, 1957, ..., 2007].
+
+
PYTHON
+
+
import pandas as pd
+
+data = pd.read_csv('data/gapminder_gdp_oceania.csv', index_col='country')
+
+# Extract the year from each column name
+# The current column names are structured as 'gdpPercap_(year)',
+# so we want to keep the (year) part only for clarity when plotting GDP vs. years
+# To do this we use replace(), which here removes the 'gdpPercap_' prefix from each string
+# This method works on strings, so we use replace() from the Pandas .str vectorized string functions
+
+years = data.columns.str.replace('gdpPercap_', '')
+
+# Convert year values to integers, saving results back to dataframe
+
+data.columns = years.astype(int)
+
+data.loc['Australia'].plot()
+
+
Select and transform data, then plot it.
+
By default, DataFrame.plot
+plots with the rows as the X axis.
+
We can transpose the data in order to plot multiple series.
+
+
PYTHON
+
+
data.T.plot()
+plt.ylabel('GDP per capita')
+
+
Many styles of plot are available.
+
For example, do a bar plot using a fancier style.
+
+
PYTHON
+
+
plt.style.use('ggplot')
+data.T.plot(kind='bar')
+plt.ylabel('GDP per capita')
+
+
Data can also be plotted by calling the matplotlib
+plot function directly.
+
The command is plt.plot(x, y)
+
+
The color and format of markers can also be specified as an
+additional optional argument e.g., b- is a blue line,
+g-- is a green dashed line.
+
Get Australia data from dataframe
+
+
PYTHON
+
+
years = data.columns
+gdp_australia = data.loc['Australia']
+
+plt.plot(years, gdp_australia, 'g--')
+
+
Can plot many sets of data together.
+
+
PYTHON
+
+
# Select two countries' worth of data.
+gdp_australia = data.loc['Australia']
+gdp_nz = data.loc['New Zealand']
+
+# Plot with differently-colored markers.
+plt.plot(years, gdp_australia, 'b-', label='Australia')
+plt.plot(years, gdp_nz, 'g-', label='New Zealand')
+
+# Create legend.
+plt.legend(loc='upper left')
+plt.xlabel('Year')
+plt.ylabel('GDP per capita ($)')
+
+
+
+
+
+
+
Adding a Legend
+
+
Often when plotting multiple datasets on the same figure it is
+desirable to have a legend describing the data.
By default matplotlib will attempt to place the legend in a suitable
+position. If you would rather specify a position, this can be done with
+the loc= argument, e.g., to place the legend in the upper
+left corner of the plot, specify loc='upper left'.
+
+
+
+
Plot a scatter plot correlating the GDP of Australia and New
+Zealand
+
Use either plt.scatter or
+DataFrame.plot.scatter
+
+
+
PYTHON
+
+
plt.scatter(gdp_australia, gdp_nz)
+
+
+
PYTHON
+
+
data.T.plot.scatter(x='Australia', y='New Zealand')
+
+
+
+
+
+
+
Minima and Maxima
+
+
Fill in the blanks below to plot the minimum GDP per capita over time
+for all the countries in Europe. Modify it again to plot the maximum GDP
+per capita over time for Europe.
Modify the example in the notes to create a scatter plot showing the
+relationship between the minimum and maximum GDP per capita among the
+countries in Asia for each year in the data set. What relationship do
+you see (if any)?
No particular correlations can be seen between the minimum and
+maximum GDP values year on year. It seems the fortunes of Asian
+countries do not rise and fall together.
+
+
+
+
+
+
+
+
+
+
Correlations (continued)
+
+
+
You might note that the variability in the maximum is much higher
+than that of the minimum. Take a look at the maximum and the max
+indexes:
Seems the variability in this value is due to a sharp drop after
+1972. Some geopolitics at play perhaps? Given the dominance of oil
+producing countries, maybe the Brent crude index would make an
+interesting comparison? Whilst Myanmar consistently has the lowest GDP,
+the highest GDP nation has varied more notably.
+
+
+
+
+
+
+
+
+
+
More Correlations
+
+
This short program creates a plot showing the correlation between GDP
+and life expectancy for 2007, normalizing marker size by population:
Using online help and other resources, explain what each argument to
+plot does.
+
+
+
+
+
+
+
+
+
A good place to look is the documentation for the plot function -
+help(data_all.plot).
+
kind - As seen already this determines the kind of plot to be
+drawn.
+
x and y - A column name or index that determines what data will be
+placed on the x and y axes of the plot.
+
s - Details for this can be found in the documentation of
+plt.scatter. A single number or one value for each data point.
+Determines the size of the plotted points.
+
+
+
+
+
+
+
+
+
+
Saving your plot to a file
+
+
If you are satisfied with the plot you see you may want to save it to
+a file, perhaps to include it in a publication. There is a function in
+the matplotlib.pyplot module that accomplishes this: savefig.
+Calling this function, e.g. with
+
+
PYTHON
+
+
plt.savefig('my_figure.png')
+
+
will save the current figure to the file my_figure.png.
+The file format will automatically be deduced from the file name
+extension (other formats are pdf, ps, eps and svg).
+
Note that functions in plt refer to a global figure
+variable and after a figure has been displayed to the screen (e.g. with
+plt.show) matplotlib will make this variable refer to a new
+empty figure. Therefore, make sure you call plt.savefig
+before the plot is displayed to the screen, otherwise you may find a
+file with an empty plot.
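The ordering advice above can be sketched as follows. The plotted values are invented, the output goes to a temporary directory, and the non-interactive Agg backend is an assumption made so that the sketch also runs without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

plt.plot([1952, 1957, 1962], [1.0, 2.0, 4.0])  # invented values
plt.ylabel('GDP per capita')

out_path = os.path.join(tempfile.mkdtemp(), 'my_figure.png')
plt.savefig(out_path)  # save first, while the current figure still holds the plot
plt.show()             # then display; under Agg nothing is shown on screen
plt.close('all')

print(os.path.getsize(out_path), 'bytes written')
```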
+
When using dataframes, data is often generated and plotted to screen
+in one line. In addition to using plt.savefig, we can save
+a reference to the current figure in a local variable (with
+plt.gcf) and call the savefig method
+on that variable to save the figure to file.
+
+
PYTHON
+
+
data.plot(kind='bar')
+fig = plt.gcf() # get current figure
+fig.savefig('my_figure.png')
+
+
+
+
+
+
+
+
+
+
Making your plots accessible
+
+
Whenever you are generating plots to go into a paper or a
+presentation, there are a few things you can do to make sure that
+everyone can understand your plots.
+
Always make sure your text is large enough to read. Use the
+fontsize parameter in xlabel,
+ylabel, title, and legend, and tick_params
+with labelsize to increase the text size of the numbers
+on your axes.
+
Similarly, you should make your graph elements easy to see. Use
+s to increase the size of your scatterplot markers and
+linewidth to increase the sizes of your plot lines.
+
Using color (and nothing else) to distinguish between different plot
+elements will make your plots unreadable to anyone who is colorblind, or
+who happens to have a black-and-white office printer. For lines, the
+linestyle parameter lets you use different types of lines.
+For scatterplots, marker lets you change the shape of your
+points. If you’re unsure about your colors, you can use Coblis
+or Color Oracle to simulate what
+your plots would look like to those with colorblindness.
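The suggestions above can be combined in one short sketch. The data values and series labels are invented, the specific sizes are arbitrary choices, and the Agg backend is an assumption made so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

years = [1952, 1957, 1962]  # invented values
series_a = [1.0, 2.0, 4.0]
series_b = [1.5, 2.5, 3.0]

plt.figure()
# Different line styles and marker shapes distinguish the series without colour.
plt.plot(years, series_a, linestyle='-', marker='o', linewidth=3, label='Series A')
plt.plot(years, series_b, linestyle='--', marker='s', linewidth=3, label='Series B')

# Larger text for axis labels, legend, and tick labels.
plt.xlabel('Year', fontsize=16)
plt.ylabel('GDP per capita', fontsize=16)
plt.legend(fontsize=14)
plt.tick_params(labelsize=14)

line_styles = [line.get_linestyle() for line in plt.gca().get_lines()]
print(line_styles)
```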
+
+
+
+
+
+
+
+
+
Key Points
+
+
+matplotlib is the
+most widely used scientific plotting library in Python.
print('zeroth item of pressures:', pressures[0])
+print('fourth item of pressures:', pressures[4])
+
+
+
OUTPUT
+
+
zeroth item of pressures: 0.273
+fourth item of pressures: 0.276
+
+
Lists’ values can be replaced by assigning to them.
+
Use an index expression on the left of assignment to replace a
+value.
+
+
PYTHON
+
+
pressures[0] = 0.265
+print('pressures is now:', pressures)
+
+
+
OUTPUT
+
+
pressures is now: [0.265, 0.275, 0.277, 0.275, 0.276]
+
+
Appending items to a list lengthens it.
+
Use list_name.append to add items to the end of a
+list.
+
+
PYTHON
+
+
primes = [2, 3, 5]
+print('primes is initially:', primes)
+primes.append(7)
+print('primes has become:', primes)
+
+
+
OUTPUT
+
+
primes is initially: [2, 3, 5]
+primes has become: [2, 3, 5, 7]
+
+
+append is a method of lists.
+
Like a function, but tied to a particular object.
+
+
Use object_name.method_name to call methods.
+
Deliberately resembles the way we refer to things in a library.
+
+
We will meet other methods of lists as we go along.
+
Use help(list) for a preview.
+
+
+extend is similar to append, but it allows
+you to combine two lists. For example:
+
+
PYTHON
+
+
teen_primes = [11, 13, 17, 19]
+middle_aged_primes = [37, 41, 43, 47]
+print('primes is currently:', primes)
+primes.extend(teen_primes)
+print('primes has now become:', primes)
+primes.append(middle_aged_primes)
+print('primes has finally become:', primes)
+
+
+
OUTPUT
+
+
primes is currently: [2, 3, 5, 7]
+primes has now become: [2, 3, 5, 7, 11, 13, 17, 19]
+primes has finally become: [2, 3, 5, 7, 11, 13, 17, 19, [37, 41, 43, 47]]
+
+
Note that while extend maintains the “flat” structure of
+the list, appending a list to a list means the last element in
+primes will itself be a list, not an integer. Lists can
+contain values of any type; therefore, lists of lists are possible.
+
Use del to remove items from a list entirely.
+
We use del list_name[index] to remove an element from a
+list (in the example, 9 is not a prime number) and thus shorten it.
+
+del is not a function or a method, but a statement in
+the language.
+
+
PYTHON
+
+
primes = [2, 3, 5, 7, 9]
+print('primes before removing last item:', primes)
+del primes[4]
+print('primes after removing last item:', primes)
+
+
+
OUTPUT
+
+
primes before removing last item: [2, 3, 5, 7, 9]
+primes after removing last item: [2, 3, 5, 7]
+
+
The empty list contains no values.
+
Use [] on its own to represent a list that doesn’t
+contain any values.
+
“The zero of lists.”
+
+
Helpful as a starting point for collecting values (which we will see
+in the next episode).
+
Lists may contain values of different types.
+
A single list may contain numbers, strings, and anything else.
If start and stop are both non-negative
+integers, how long is the list values[start:stop]?
+
+
+
+
+
+
+
+
+
The list values[start:stop] has up to
+stop - start elements. For example,
+values[1:4] has the 3 elements values[1],
+values[2], and values[3]. Why ‘up to’? As we
+saw in episode 2, if stop
+is greater than the total length of the list values, we
+will still get a list back but it will be shorter than expected.
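+
The “up to” rule can be checked with a short sketch. The helper name
+slice_length and the sample list are hypothetical, and the formula
+assumes non-negative start and stop as in the question.
+
```python
values = [10, 20, 30, 40, 50]

def slice_length(sequence, start, stop):
    """Length of sequence[start:stop] for non-negative start and stop."""
    n = len(sequence)
    # Both bounds are clipped to the length of the sequence;
    # a slice whose start is not before its stop is empty.
    return max(0, min(stop, n) - min(start, n))

for start, stop in [(1, 4), (2, 10), (4, 2), (7, 9)]:
    print(start, stop, values[start:stop], slice_length(values, start, stop))
```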
+
+
+
+
+
+
+
+
+
+
From Strings to Lists and Back
+
+
Given this:
+
+
PYTHON
+
+
print('string to list:', list('tin'))
+print('list to string:', ''.join(['g', 'o', 'l', 'd']))
+
+
+
OUTPUT
+
+
string to list: ['t', 'i', 'n']
+list to string: gold
+
+
What does list('some string') do?
+
What does '-'.join(['x', 'y', 'z']) generate?
+
+
+
+
+
+
+
+
+
+list('some string')
+converts a string into a list containing all of its characters.
+
+join
+returns a string that concatenates the string elements of the list,
+inserting the separator between each pair of adjacent elements.
+This results in x-y-z. The separator is the string on which
+the join method is called.
+
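The round trip between strings and lists can be sketched as follows.
+The use of split as the inverse of join is an addition to the exercise,
+not part of it.
+
```python
symbol = 'gold'
letters = list(symbol)        # one list element per character
rejoined = ''.join(letters)   # an empty separator restores the original string
dashed = '-'.join(letters)    # the separator goes between elements only
print(letters, rejoined, dashed)

# split undoes join when a non-empty separator was used
print(dashed.split('-') == letters)
```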
+
+
+
+
+
+
+
+
+
Working With the End
+
+
What does the following program print?
+
+
PYTHON
+
+
element = 'helium'
+print(element[-1])
+
+
How does Python interpret a negative index?
+
If a list or string has N elements, what is the most negative index
+that can safely be used with it, and what location does that index
+represent?
+
If values is a list, what does
+del values[-1] do?
+
How can you display all elements but the last one without changing
+values? (Hint: you will need to combine slicing and
+negative indexing.)
+
+
+
+
+
+
+
+
+
The program prints m.
+
Python interprets a negative index as starting from the end (as
+opposed to starting from the beginning). The last element is
+-1.
+
The most negative index that can safely be used with a list of N elements is
+-N, which represents the first element.
+
+del values[-1] removes the last element from the
+list.
+
values[:-1]
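+
These answers can be checked with a short sketch; the list contents
+are made up for illustration.
+
```python
values = ['a', 'b', 'c', 'd']
n = len(values)

print(values[-1])    # last element
print(values[-n])    # first element: -n is the most negative valid index
print(values[:-1])   # everything but the last element; values is unchanged

try:
    values[-(n + 1)]  # one step further than -n
except IndexError as error:
    print('IndexError:', error)
```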
+
+
+
+
+
+
+
+
+
+
Stepping Through a List
+
+
What does the following program print?
+
+
PYTHON
+
+
element = 'fluorine'
+print(element[::2])
+print(element[::-1])
+
+
If we write a slice as low:high:stride, what does
+stride do?
+
What expression would select all of the even-numbered items from a
+collection?
+
+
+
+
+
+
+
+
+
The program prints
+
+
OUTPUT
+
+
furn
+eniroulf
+
+
+stride is the step size of the slice.
+
The slice 1::2 selects all even-numbered items from a
+collection when the items are counted from one (the second, fourth,
+sixth, and so on): it starts with element 1 (which is the second
+element, since indexing starts at 0), goes on until the end
+(since no end is given), and uses a step size of
+2 (i.e., selects every second element).
+
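A few more strides on the same string, as a sketch. The negative-stride
+slice with explicit bounds goes slightly beyond the exercise.
+
```python
element = 'fluorine'
print(element[1::2])    # indices 1, 3, 5, 7: the second, fourth, ... characters
print(element[0::2])    # indices 0, 2, 4, 6: the first, third, ... characters
print(element[5:1:-2])  # a negative stride walks backwards from index 5, stopping before index 1
```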
+
+
+
+
+
+
+
+
+
Slice Bounds
+
+
What does the following program print?
+
+
PYTHON
+
+
element = 'lithium'
+print(element[0:20])
+print(element[-1:3])
+
+
+
+
+
+
+
+
+
+
+
OUTPUT
+
+
lithium
+
+
The first statement prints the whole string, since the slice goes
+beyond the total length of the string. The second statement prints an
+empty string (the blank line above), because the start index -1 refers
+to the last character, which lies after the stop index 3, so the slice
+selects nothing.
+
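The difference between slicing and indexing past the end can be
+sketched like this; repr is used only to make the empty string visible.
+
```python
element = 'lithium'

print(repr(element[0:20]))   # slice bounds are clipped to the string
print(repr(element[-1:3]))   # start lies after stop, so nothing is selected
print(repr(element[3:-1]))   # index 3 up to, but not including, the last character

try:
    element[20]              # a single index is not clipped
except IndexError as error:
    print('IndexError:', error)
```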
+
+
+
+
+
+
+
+
+
Sort and Sorted
+
+
What do these two programs print? In simple terms, explain the
+difference between sorted(letters) and
+letters.sort().
+
+
PYTHON
+
+
# Program A
+letters = list('gold')
+result = sorted(letters)
+print('letters is', letters, 'and result is', result)
+
+
+
PYTHON
+
+
# Program B
+letters = list('gold')
+result = letters.sort()
+print('letters is', letters, 'and result is', result)
+
+
+
+
+
+
+
+
+
+
Program A prints
+
+
OUTPUT
+
+
letters is ['g', 'o', 'l', 'd'] and result is ['d', 'g', 'l', 'o']
+
+
Program B prints
+
+
OUTPUT
+
+
letters is ['d', 'g', 'l', 'o'] and result is None
+
+
sorted(letters) returns a sorted copy of the list
+letters (the original list letters remains
+unchanged), while letters.sort() sorts the list
+letters in-place and returns None.
+
+
+
+
+
+
+
+
+
+
Copying (or Not)
+
+
What do these two programs print? In simple terms, explain the
+difference between new = old and
+new = old[:].
+
+
PYTHON
+
+
# Program A
+old = list('gold')
+new = old # simple assignment
+new[0] = 'D'
+print('new is', new, 'and old is', old)
+
+
+
PYTHON
+
+
# Program B
+old = list('gold')
+new = old[:] # assigning a slice
+new[0] = 'D'
+print('new is', new, 'and old is', old)
+
+
+
+
+
+
+
+
+
+
Program A prints
+
+
OUTPUT
+
+
new is ['D', 'o', 'l', 'd'] and old is ['D', 'o', 'l', 'd']
+
+
Program B prints
+
+
OUTPUT
+
+
new is ['D', 'o', 'l', 'd'] and old is ['g', 'o', 'l', 'd']
+
+
new = old makes new a reference to the list
+old; new and old point towards
+the same object.
+
new = old[:] however creates a new list object
+new containing all elements from the list old;
+new and old are different objects.
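+
The distinction can be made visible with the is operator, as in this
+sketch. The names alias and duplicate are illustrative, and the final
+part about nested lists is an added caveat rather than something the
+exercise covers.
+
```python
old = list('gold')
alias = old          # a second name for the same list object
duplicate = old[:]   # a new list holding the same elements

print(alias is old)       # same object
print(duplicate is old)   # different object
print(duplicate == old)   # equal contents

# Caveat: a slice copy is shallow, so inner lists are still shared.
nested = [[1, 2], [3, 4]]
shallow = nested[:]
shallow[0].append(99)
print(nested)
```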
+
+
+
+
+
+
+
+
+
+
Key Points
+
+
A list stores many values in a single structure.
+
Use an item’s index to fetch it from a list.
+
Lists’ values can be replaced by assigning to them.
+
Appending items to a list lengthens it.
+
Use del to remove items from a list entirely.
+
The empty list contains no values.
+
Lists may contain values of different types.
+
Character strings can be indexed like lists.
+
Character strings are immutable.
+
Indexing beyond the end of the collection is an error.
+
+
diff --git a/instructor/12-for-loops.html b/instructor/12-for-loops.html
new file mode 100644
index 000000000..fc4379e51
--- /dev/null
+++ b/instructor/12-for-loops.html
@@ -0,0 +1,1180 @@
+
+Plotting and Programming in Python: For Loops
+
This error can be fixed by removing the extra spaces at the
+beginning of the second line.
+
Loop variables can be called anything.
+
As with all variables, loop variables are:
+
Created on demand.
+
Meaningless: their names can be anything at all.
+
+
+
PYTHON
+
+
for kitten in [2, 3, 5]:
+    print(kitten)
+
+
The body of a loop can contain many statements.
+
But no loop should be more than a few lines long.
+
Hard for human beings to keep larger chunks of code in mind.
+
+
PYTHON
+
+
primes = [2, 3, 5]
+for p in primes:
+    squared = p ** 2
+    cubed = p ** 3
+    print(p, squared, cubed)
+
+
+
OUTPUT
+
+
2 4 8
+3 9 27
+5 25 125
+
+
Use range to iterate over a sequence of numbers.
+
The built-in function range
+produces a sequence of numbers.
+
+Not a list: the numbers are produced on demand to make
+looping over large ranges more efficient.
+
+
+range(N) is the numbers 0..N-1
+
Exactly the legal indices of a list or character string of length
+N
+
+
+
PYTHON
+
+
print('a range is not a list: range(0, 3)')
+for number in range(0, 3):
+    print(number)
+
+
+
OUTPUT
+
+
a range is not a list: range(0, 3)
+0
+1
+2
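+
A short sketch of how a range behaves compared with a list; the
+variable names are illustrative.
+
```python
numbers = range(3)
print(numbers)            # the range object itself, not its contents
print(list(numbers))      # convert when an actual list is needed
print(len(range(10**9)))  # cheap: the values are never stored

word = 'tin'
for index in range(len(word)):   # range(len(x)) yields exactly the legal indices of x
    print(index, word[index])
```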
+
+
The Accumulator pattern turns many values into one.
+
A common pattern in programs is to:
+
Initialize an accumulator variable to zero, the empty
+string, or the empty list.
+
Update the variable with values from a collection.
+
+
+
PYTHON
+
+
# Sum the first 10 integers.
+total = 0
+for number in range(10):
+    total = total + (number + 1)
+print(total)
+
+
+
OUTPUT
+
+
55
+
+
Read total = total + (number + 1) as:
+
Add 1 to the current value of the loop variable
+number.
+
Add that to the current value of the accumulator variable
+total.
+
Assign that to total, replacing the current value.
+
+
We have to add number + 1 because range
+produces 0..9, not 1..10.
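+
An alternative sketch avoids the + 1 by giving range an explicit
+start. The comparison with the built-in sum is an addition to the
+lesson text, shown only as a cross-check.
+
```python
# range(1, 11) produces 1..10 directly.
total = 0
for number in range(1, 11):
    total = total + number
print(total)

# The built-in sum accumulates in the same way.
print(sum(range(1, 11)))
```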
+
+
+
+
+
+
Classifying Errors
+
+
Is an indentation error a syntax error or a runtime error?
+
+
+
+
+
+
+
+
+
An IndentationError is a syntax error. Programs with syntax errors
+cannot be started. A program with a runtime error will start but an
+error will be thrown under certain conditions.
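+
The distinction can be demonstrated with the built-in compile
+function, which parses source text without running it. This is a
+sketch: the source strings are made up, and compile is used here only
+as a convenient way to trigger the two kinds of error separately.
+
```python
bad_source = "for n in [1, 2]:\nprint(n)\n"   # the loop body is not indented

caught = None
try:
    compile(bad_source, '<example>', 'exec')  # parsing fails; nothing is executed
except SyntaxError as error:
    caught = type(error).__name__
print(caught)
print(issubclass(IndentationError, SyntaxError))

good_source = "values = [1, 2]\nprint(values[5])\n"
code = compile(good_source, '<example>', 'exec')  # parses without complaint
try:
    exec(code, {})                            # the error appears only while running
except IndexError as error:
    print('runtime error:', type(error).__name__)
```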
+
+
+
+
+
+
+
+
+
+
Tracing Execution
+
+
Create a table showing the numbers of the lines that are executed
+when this program runs, and the values of the variables after each line
+is executed.
+
+
PYTHON
+
+
total = 0
+for char in "tin":
+    total = total + 1
+
+
+
+
+
+
+
+
+
+
Line no    Variables
+1          total = 0
+2          total = 0, char = 't'
+3          total = 1, char = 't'
+2          total = 1, char = 'i'
+3          total = 2, char = 'i'
+2          total = 2, char = 'n'
+3          total = 3, char = 'n'
+
+
+
+
+
+
+
+
+
+
Reversing a String
+
+
Fill in the blanks in the program below so that it prints “nit” (the
+reverse of the original character string “tin”).
+
+
PYTHON
+
+
original = "tin"
+result = ____
+for char in original:
+    result = ____
+print(result)
+
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
original = "tin"
+result = ""
+for char in original:
+    result = char + result
+print(result)
+
+
+
+
+
+
+
+
+
+
+
Practice Accumulating
+
+
Fill in the blanks in each of the programs below to produce the
+indicated result.
+
+
PYTHON
+
+
# Total length of the strings in the list: ["red", "green", "blue"] => 12
+total = 0
+for word in ["red", "green", "blue"]:
+    ____ = ____ + len(word)
+print(total)
+
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
total = 0
+for word in ["red", "green", "blue"]:
+    total = total + len(word)
+print(total)
+
+
+
+
+
+
+
+
+
+
+
Practice Accumulating
+(continued)
+
+
+
+
PYTHON
+
+
# List of word lengths: ["red", "green", "blue"] => [3, 5, 4]
+lengths = ____
+for word in ["red", "green", "blue"]:
+    lengths.____(____)
+print(lengths)
+
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
lengths = []
+for word in ["red", "green", "blue"]:
+    lengths.append(len(word))
+print(lengths)
words = ["red", "green", "blue"]
+result = ""
+for word in words:
+    result = result + word
+print(result)
+
+
+
+
+
+
+
+
+
+
+
Practice Accumulating
+(continued)
+
+
+
Create an acronym: Starting from the list
+["red", "green", "blue"], create the acronym
+"RGB" using a for loop.
+
Hint: You may need to use a string method to
+properly format the acronym.
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
acronym = ""
+for word in ["red", "green", "blue"]:
+    acronym = acronym + word[0].upper()
+print(acronym)
+
+
+
+
+
+
+
+
+
+
+
Cumulative Sum
+
+
Reorder and properly indent the lines of code below so that they
+print a list with the cumulative sum of data. The result should be
+[1, 3, 5, 10].
+
+
PYTHON
+
+
cumulative.append(total)
+for number in data:
+cumulative = []
+total = total + number
+total = 0
+print(cumulative)
+data = [1,2,2,5]
+
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
total = 0
+data = [1,2,2,5]
+cumulative = []
+for number in data:
+    total = total + number
+    cumulative.append(total)
+print(cumulative)
+
+
+
+
+
+
+
+
+
+
+
Identifying Variable Name Errors
+
+
Read the code below and try to identify what the errors are
+without running it.
+
Run the code and read the error message. What type of
+NameError do you think this is? Is it a string with no
+quotes, a misspelled variable, or a variable that should have been
+defined but was not?
+
Fix the error.
+
Repeat steps 2 and 3, until you have fixed all the errors.
+
+
PYTHON
+
+
for number in range(10):
+    # use a if the number is a multiple of 3, otherwise use b
+    if (Number % 3) == 0:
+        message = message + a
+    else:
+        message = message + "b"
+print(message)
+
+
+
+
+
+
+
+
+
+
Python variable names are case sensitive: number and
+Number refer to different variables.
+
The variable message needs to be initialized as an
+empty string.
+
We want to add the string "a" to message,
+not the undefined variable a.
+
+
PYTHON
+
+
message = ""
+for number in range(10):
+    # use a if the number is a multiple of 3, otherwise use b
+    if (number % 3) == 0:
+        message = message + "a"
+    else:
+        message = message + "b"
+print(message)
+
+
+
+
+
+
+
+
+
+
+
Identifying Item Errors
+
+
Read the code below and try to identify what the errors are
+without running it.
+
Run the code, and read the error message. What type of error is
+it?
+
Fix the error.
+
+
PYTHON
+
+
seasons = ['Spring', 'Summer', 'Fall', 'Winter']
+print('My favorite season is ', seasons[4])
+
+
+
+
+
+
+
+
+
+
This list has 4 elements and the index to access the last element in
+the list is 3.
+
+
PYTHON
+
+
seasons = ['Spring', 'Summer', 'Fall', 'Winter']
+print('My favorite season is ', seasons[3])
+
+
+
+
+
+
+
+
+
+
+
Key Points
+
+
A for loop executes commands once for each value in a
+collection.
+
A for loop is made up of a collection, a loop variable,
+and a body.
+
The first line of the for loop must end with a colon,
+and the body must be indented.
+
Indentation is always meaningful in Python.
+
Loop variables can be called anything (but it is strongly advised to
+give the loop variable a meaningful name).
+
The body of a loop can contain many statements.
+
Use range to iterate over a sequence of numbers.
+
The Accumulator pattern turns many values into one.
Often use conditionals in a loop to “evolve” the values of
+variables.
+
+
PYTHON
+
+
velocity = 10.0
+for i in range(5): # execute the loop 5 times
+    print(i, ':', velocity)
+    if velocity > 20.0:
+        print('moving too fast')
+        velocity = velocity - 5.0
+    else:
+        print('moving too slow')
+        velocity = velocity + 10.0
+print('final velocity:', velocity)
+
+
+
OUTPUT
+
+
0 : 10.0
+moving too slow
+1 : 20.0
+moving too slow
+2 : 30.0
+moving too fast
+3 : 25.0
+moving too fast
+4 : 20.0
+moving too slow
+final velocity: 30.0
+
+
Create a table showing variables’ values to trace a program’s
+execution.
+
+i          0      .      1      .      2      .      3      .      4      .
+velocity   10.0   20.0   .      30.0   .      25.0   .      20.0   .      30.0
+
+
The program must have a print statement
+outside the body of the loop to show the final value of
+velocity, since its value is updated by the last iteration
+of the loop.
+
+
+
+
+
+
Compound Relations Using and,
+or, and Parentheses
+
+
Often, you want some combination of things to be true. You can
+combine relations within a conditional using and and
+or. Continuing the example above, suppose you have
+
+
PYTHON
+
+
mass = [ 3.54, 2.07, 9.22, 1.86, 1.71]
+velocity = [10.00, 20.00, 30.00, 25.00, 20.00]
+
+i = 0
+for i in range(5):
+    if mass[i] > 5 and velocity[i] > 20:
+        print("Fast heavy object. Duck!")
+    elif mass[i] > 2 and mass[i] <= 5 and velocity[i] <= 20:
+        print("Normal traffic")
+    elif mass[i] <= 2 and velocity[i] <= 20:
+        print("Slow light object. Ignore it")
+    else:
+        print("Whoa! Something is up with the data. Check it")
+
+
Just like with arithmetic, you can and should use parentheses
+whenever there is possible ambiguity. A good general rule is to
+always use parentheses when mixing and and
+or in the same condition. That is, instead of:
+
+
PYTHON
+
+
if mass[i] <= 2 or mass[i] >= 5 and velocity[i] > 20:
+
+
write one of these:
+
+
PYTHON
+
+
if (mass[i] <= 2 or mass[i] >= 5) and velocity[i] > 20:
+if mass[i] <= 2 or (mass[i] >= 5 and velocity[i] > 20):
+
+
so it is perfectly clear to a reader (and to Python) what you really
+mean.
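+
Python evaluates and before or, so the unparenthesized condition
+behaves like the second grouping. The sketch below checks this with
+one made-up light, slow object; the variable names are illustrative.
+
```python
mass, velocity = 1.5, 10.0   # made-up values: a light, slow object

unparenthesized = mass <= 2 or mass >= 5 and velocity > 20
grouped_or_first = (mass <= 2 or mass >= 5) and velocity > 20
grouped_and_first = mass <= 2 or (mass >= 5 and velocity > 20)

# The first and third results agree; the second differs.
print(unparenthesized, grouped_or_first, grouped_and_first)
```
+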
Fill in the blanks so that this program creates a new list containing
+zeroes where the original list’s values were negative and ones where the
+original list’s values were zero or positive.
+
+
PYTHON
+
+
original = [-1.5, 0.2, 0.4, 0.0, -1.3, 0.4]
+result = ____
+for value in original:
+    if ____:
+        result.append(0)
+    else:
+        ____
+print(result)
+
+
+
OUTPUT
+
+
[0, 1, 1, 1, 0, 1]
+
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
original = [-1.5, 0.2, 0.4, 0.0, -1.3, 0.4]
+result = []
+for value in original:
+    if value < 0.0:
+        result.append(0)
+    else:
+        result.append(1)
+print(result)
+
+
+
+
+
+
+
+
+
+
+
Processing Small Files
+
+
Modify this program so that it only processes files with fewer than
+50 records.
+
+
PYTHON
+
+
import glob
+import pandas as pd
+for filename in glob.glob('data/*.csv'):
+    contents = pd.read_csv(filename)
+    ____:
+        print(filename, len(contents))
+
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
import glob
+import pandas as pd
+for filename in glob.glob('data/*.csv'):
+    contents = pd.read_csv(filename)
+    if len(contents) < 50:
+        print(filename, len(contents))
+
+
+
+
+
+
+
+
+
+
+
Initializing
+
+
Modify this program so that it finds the largest and smallest values
+in the list no matter what the range of values originally is.
+
+
PYTHON
+
+
values = [...some test data...]
+smallest, largest = None, None
+for v in values:
+    if ____:
+        smallest, largest = v, v
+    ____:
+        smallest = min(____, v)
+        largest = max(____, v)
+print(smallest, largest)
+
+
What are the advantages and disadvantages of using this method to
+find the range of the data?
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
values = [-2,1,65,78,-54,-24,100]
+smallest, largest = None, None
+for v in values:
+    if smallest is None and largest is None:
+        smallest, largest = v, v
+    else:
+        smallest = min(smallest, v)
+        largest = max(largest, v)
+print(smallest, largest)
+
+
If you wrote == None instead of is None,
+that works too, but Python programmers conventionally write is None
+because None is a single, unique object, so an identity check is the idiomatic test.
+
It can be argued that an advantage of using this method would be to
+make the code more readable. However, a disadvantage is that this code
+is not efficient because within each iteration of the for
+loop statement, there are two more loops that run over two numbers each
+(the min and max functions). It would be more
+efficient to iterate over each number just once:
+
+
PYTHON
+
+
values = [-2,1,65,78,-54,-24,100]
+smallest, largest = None, None
+for v in values:
+    if smallest is None or v < smallest:
+        smallest = v
+    if largest is None or v > largest:
+        largest = v
+print(smallest, largest)
+
+
Now we have one loop, but four comparison tests. There are two ways
+we could improve it further: either use fewer comparisons in each
+iteration, or use two loops that each contain only one comparison test.
+The simplest solution is often the best:
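+
One plausible reading of “the simplest solution” is to let the
+built-in min and max functions each make one pass over the data. This
+is a sketch reusing the test values from above.
+
```python
values = [-2, 1, 65, 78, -54, -24, 100]
smallest = min(values)   # one pass, one comparison per element
largest = max(values)    # a second pass, again one comparison per element
print(smallest, largest)
```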
+
+
diff --git a/instructor/14-looping-data-sets.html b/instructor/14-looping-data-sets.html
new file mode 100644
index 000000000..9d9a7964a
--- /dev/null
+++ b/instructor/14-looping-data-sets.html
@@ -0,0 +1,859 @@
+
+Plotting and Programming in Python: Looping Over Data Sets
+
Use glob.glob
+to find sets of files whose names match a pattern.
+
In Unix, the term “globbing” means “matching a set of files with a
+pattern”.
+
The most common patterns are:
+
+* meaning “match zero or more characters”
+
+? meaning “match exactly one character”
+
+
Python’s standard library contains the glob
+module to provide pattern matching functionality
+
The glob
+module contains a function also called glob to match file
+patterns
+
E.g., glob.glob('*.txt') matches all files in the
+current directory whose names end with .txt.
+
Result is a (possibly empty) list of character strings.
+
+
PYTHON
+
+
import glob
+print('all csv files in data directory:', glob.glob('data/*.csv'))
+
+
+
OUTPUT
+
+
all csv files in data directory: ['data/gapminder_all.csv', 'data/gapminder_gdp_africa.csv', \
+'data/gapminder_gdp_americas.csv', 'data/gapminder_gdp_asia.csv', 'data/gapminder_gdp_europe.csv', \
+'data/gapminder_gdp_oceania.csv']
+
+
+
PYTHON
+
+
print('all PDB files:', glob.glob('*.pdb'))
+
+
+
OUTPUT
+
+
all PDB files: []
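+
The difference between * and ? can be tried safely in a temporary
+directory. This is a sketch: the file names are hypothetical, and
+tempfile and os are used only to create and name the sample files.
+
```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Create a few empty files with hypothetical names.
    for name in ['run1.csv', 'run2.csv', 'run10.csv', 'notes.txt']:
        open(os.path.join(folder, name), 'w').close()

    # * matches zero or more characters, ? matches exactly one.
    star = sorted(glob.glob(os.path.join(folder, 'run*.csv')))
    question = sorted(glob.glob(os.path.join(folder, 'run?.csv')))

    print([os.path.basename(path) for path in star])
    print([os.path.basename(path) for path in question])
```
+
glob.glob makes no promise about the order of its results, so the
+sketch sorts them before printing.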
+
+
Use glob and for to process batches of
+files.
+
Helps a lot if the files are named and stored systematically and
+consistently so that simple patterns will find the right data.
+
+
PYTHON
+
+
for filename in glob.glob('data/gapminder_*.csv'):
+    data = pd.read_csv(filename)
+    print(filename, data['gdpPercap_1952'].min())
You might have chosen to initialize the fewest variable
+with a number greater than the numbers you’re dealing with, but that
+could lead to trouble if you reuse the code with bigger numbers. Python
+lets you use positive infinity, which will work no matter how big your
+numbers are. What other special strings does the float
+function recognize?
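+
As a sketch of the idea, the float function accepts the strings
+'inf', '-inf', and 'nan' (in any letter case). The record counts
+below are made up.
+
```python
fewest = float('inf')        # larger than any finite number
for count in [12, 7, 31]:    # made-up record counts
    fewest = min(fewest, count)
print(fewest)

print(float('-inf') < -10**308)      # negative infinity is below every finite number
not_a_number = float('nan')
print(not_a_number == not_a_number)  # nan compares unequal even to itself
```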
+
+
+
+
+
+
+
+
+
+
Comparing Data
+
+
Write a program that reads in the regional data sets and plots the
+average GDP per capita for each region over time in a single chart.
+Pandas will raise an error if it encounters non-numeric columns in a
+dataframe computation so you may need to either filter out those columns
+or tell pandas to ignore them.
+
+
+
+
+
+
+
+
+
This solution builds a useful legend by using the string
+split method to extract the region from
+the path ‘data/gapminder_gdp_a_specific_region.csv’.
+
+
PYTHON
+
+
import glob
+import pandas as pd
+import matplotlib.pyplot as plt
+fig, ax = plt.subplots(1,1)
+for filename in glob.glob('data/gapminder_gdp*.csv'):
+    dataframe = pd.read_csv(filename)
+    # extract <region> from the filename, expected to be in the format 'data/gapminder_gdp_<region>.csv'.
+    # we will split the string using the split method and `_` as our separator,
+    # retrieve the last string in the list that split returns (`<region>.csv`),
+    # and then remove the `.csv` extension from that string.
+    # NOTE: the pathlib module covered in the next callout also offers
+    # convenient abstractions for working with filesystem paths and could solve this as well:
+    # from pathlib import Path
+    # region = Path(filename).stem.split('_')[-1]
+    region = filename.split('_')[-1][:-4]
+    # pandas raises errors when it encounters non-numeric columns in a dataframe computation
+    # but we can tell pandas to ignore them with the `numeric_only` parameter
+    dataframe.mean(numeric_only=True).plot(ax=ax, label=region)
+    # NOTE: another way of doing this selects just the columns with gdp in their name using the filter method
+    # dataframe.filter(like="gdp").mean().plot(ax=ax, label=region)
+
+plt.legend()
+plt.show()
+
+
+
+
+
+
+
+
+
+
+
Dealing with File Paths
+
+
The pathlib
+module provides useful abstractions for file and path manipulation
+like returning the name of a file without the file extension. This is
+very useful when looping over files and directories. In the example
+below, we create a Path object and inspect its
+attributes.
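+
A minimal sketch of such an inspection, using a path from this lesson.
+The file does not need to exist for these attributes to work.
+
```python
from pathlib import Path

path = Path('data/gapminder_gdp_europe.csv')
print(path.parent)   # directory part
print(path.name)     # file name with extension
print(path.stem)     # file name without extension
print(path.suffix)   # extension, including the dot
print(path.stem.split('_')[-1])  # the region part of the file name
```
+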
A common refrain in software engineering is “Don’t Repeat Yourself”.
+How do the techniques we’ve learned in the last lessons help us avoid
+repeating ourselves? Note that in practice there is some nuance to
+this principle, and it should be balanced against doing the simplest thing that could
+possibly work.
+
+
What are the pros / cons of making a variable global or local to a
+function?
+
When would you consider turning a block of code into a function
+definition?
Explain and identify the difference between function definition and
+function call.
+
Write a function that takes a small, fixed number of arguments and
+produces a single result.
+
+
+
+
+
+
Break programs down into functions to make them easier to
+understand.
+
Human beings can only keep a few items in working memory at a
+time.
+
Understand larger/more complicated ideas by understanding and
+combining pieces.
+
Components in a machine.
+
Lemmas when proving theorems.
+
+
Functions serve the same purpose in programs.
+
+Encapsulate complexity so that we can treat it as a single
+“thing”.
+
+
Also enables re-use.
+
Write one time, use many times.
+
+
Define a function using def with a name, parameters,
+and a block of code.
+
Begin the definition of a new function with def.
+
Followed by the name of the function.
+
Must obey the same rules as variable names.
+
+
Then parameters in parentheses.
+
Empty parentheses if the function doesn’t take any inputs.
+
We will discuss this in detail in a moment.
+
+
Then a colon.
+
Then an indented block of code.
+
+
PYTHON
+
+
def print_greeting():
+    print('Hello!')
+    print('The weather is nice today.')
+    print('Right?')
+
+
Defining a function does not run it.
+
Defining a function does not run it.
+
Like assigning a value to a variable.
+
+
Must call the function to execute the code it contains.
+
+
PYTHON
+
+
print_greeting()
+
+
+
OUTPUT
+
+
Hello!
+The weather is nice today.
+Right?
+
+
Arguments in a function call are matched to its defined
+parameters.
+
Functions are most useful when they can operate on different
+data.
+
Specify parameters when defining a function.
+
These become variables when the function is executed.
+
Are assigned the arguments in the call (i.e., the values passed to
+the function).
+
If you don’t name the arguments when using them in the call, the
+arguments will be matched to parameters in the order the parameters are
+defined in the function.
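+
Positional matching can be sketched with a small helper. The function
+format_date is hypothetical; it returns the text instead of printing
+it so the two calls are easy to compare.
+
```python
def format_date(year, month, day):
    # Hypothetical helper: builds the date text and returns it.
    return str(year) + '/' + str(month) + '/' + str(day)

print(format_date(1871, 3, 19))   # arguments fill year, month, day in that order
print(format_date(3, 19, 1871))   # same values, different order: they land in different parameters
```
+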
Or, we can name the arguments when we call the function, which allows
+us to specify them in any order and adds clarity to the call site;
+otherwise as one is reading the code they might forget if the second
+argument is the month or the day for example.
+
+
PYTHON
+
+
print_date(month=3, day=19, year=1871)
+
+
+
OUTPUT
+
+
1871/3/19
+
+
Via Twitter:
+() contains the ingredients for the function while the body
+contains the recipe.
+
Functions may return a result to their caller using
+return.
+
Use return ... to give a value back to the caller.
+
May occur anywhere in the function.
+
But functions are easier to understand if return
+occurs:
+
A function that doesn’t explicitly return a value
+automatically returns None.
+
+
PYTHON
+
+
result = print_date(1871, 3, 19)
+print('result of call is:', result)
+
+
+
OUTPUT
+
+
1871/3/19
+result of call is: None
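+
By contrast, a function that uses return hands a value back to its
+caller. The average function below is a sketch, not taken from the
+text above; it returns early for the special case of an empty list and
+returns the final result at the end.
+
```python
def average(values):
    # Handle the special case first.
    if len(values) == 0:
        return None
    # Return the final result at the very end.
    return sum(values) / len(values)

print(average([1, 3, 4]))
print(average([]))
```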
+
+
+
+
+
+
+
Identifying Syntax Errors
+
+
Read the code below and try to identify what the errors are
+without running it.
+
Run the code and read the error message. Is it a
+SyntaxError or an IndentationError?
+
Fix the error.
+
Repeat steps 2 and 3 until you have fixed all the errors.
+
+
PYTHON
+
+
def another_function
+print("Syntax errors are annoying.")
+print("But at least python tells us about them!")
+print("So they are usually not too hard to fix.")
+
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
def another_function():
+    print("Syntax errors are annoying.")
+    print("But at least Python tells us about them!")
+    print("So they are usually not too hard to fix.")
A function call always needs parentheses; without them you get a
+reference to the function object itself rather than the result of calling it. So, if we wanted to call the function
+named report, and give it the value 22.5 to report on, we could have our
+function call as follows
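+
A sketch of the difference between calling a function and merely
+naming it. The body of report is a hypothetical stand-in, since its
+definition is not shown here.
+
```python
def report(pressure):
    # Hypothetical stand-in for the function discussed above.
    print('pressure is', pressure)

report(22.5)             # parentheses call the function
print(report)            # without parentheses we only refer to the function object
print(callable(report))
```
+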
After fixing the problem above, explain why running this example
+code:
+
+
PYTHON
+
+
result = print_time(11, 37, 59)
+print('result of call is:', result)
+
+
gives this output:
+
+
OUTPUT
+
+
11:37:59
+result of call is: None
+
+
Why is the result of the call None?
+
+
+
+
+
+
+
+
+
The problem with the example is that the function
+print_time() is defined after the call to the
+function is made. Python doesn’t know how to resolve the name
+print_time since it hasn’t been defined yet and will raise
+a NameError e.g.,
+NameError: name 'print_time' is not defined
+
The first line of output 11:37:59 is printed by the
+first line of code, result = print_time(11, 37, 59) that
+binds the value returned by invoking print_time to the
+variable result. The second line is from the second print
+call to print the contents of the result variable.
+
print_time() does not explicitly return
+a value, so it automatically returns None.
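+
Both points can be reproduced in one sketch. The body of print_time
+is an assumption, written to match the output shown above.
+
```python
try:
    result = print_time(11, 37, 59)   # called before the def statement has run
except NameError as error:
    print('NameError:', error)

def print_time(hour, minute, second):
    # Assumed definition: prints the time and has no return statement.
    time_string = str(hour) + ':' + str(minute) + ':' + str(second)
    print(time_string)

result = print_time(11, 37, 59)       # works now that the function is defined
print('result of call is:', result)
```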
+
+
+
+
+
+
+
+
+
+
Encapsulation
+
+
Fill in the blanks to create a function that takes a single filename
+as an argument, loads the data in the file named by the argument, and
+returns the minimum value in that data.
+
+
PYTHON
+
+
import pandas as pd
+
+def min_in_data(____):
+    data = ____
+    return ____
+
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
import pandas as pd
+
+def min_in_data(filename):
+    data = pd.read_csv(filename)
+    return data.min()
+
+
+
+
+
+
+
+
+
+
+
Find the First
+
+
Fill in the blanks to create a function that takes a list of numbers
+as an argument and returns the first negative value in the list. What
+does your function do if the list is empty? What if the list has no
+negative numbers?
+
+
PYTHON
+
+
def first_negative(values):
+    for v in ____:
+        if ____:
+            return ____
+
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
def first_negative(values):
+    for v in values:
+        if v < 0:
+            return v
+
+
If an empty list or a list with no negative values is passed to this
+function, it returns None:
+
+
PYTHON
+
+
my_list = []
+print(first_negative(my_list))
+
+
+
OUTPUT
+
+
None
+
+
+
+
+
+
+
+
+
+
+
Calling by Name
+
+
Earlier we saw this function:
+
+
PYTHON
+
+
def print_date(year, month, day):
+    joined = str(year) + '/' + str(month) + '/' + str(day)
+    print(joined)
+
+
We saw that we can call the function using named arguments,
+like this:
+
+
PYTHON
+
+
print_date(day=1, month=2, year=2003)
+
+
What does print_date(day=1, month=2, year=2003)
+print?
+
When have you seen a function call like this before?
+
When and why is it useful to call functions this way?
+
+
+
+
+
+
+
+
+
2003/2/1
+
We saw examples of using named arguments when working with
+the pandas library. For example, when reading in a dataset using
+data = pd.read_csv('data/gapminder_gdp_europe.csv', index_col='country'),
+the last argument index_col is a named argument.
+
Using named arguments can make code more readable since one can see
+from the function call what name the different arguments have inside the
+function. It can also reduce the chances of passing arguments in the
+wrong order, since by using named arguments the order doesn’t
+matter.
+
+
+
+
+
+
+
+
+
+
Encapsulation of an If/Print Block
+
+
The code below will run on a label-printer for chicken eggs. A
+digital scale will report a chicken egg mass (in grams) to the computer
+and then the computer will print a label.
+
+
PYTHON
+
+
import random
+for i in range(10):
+
+    # simulating the mass of a chicken egg
+    # the (random) mass will be 70 +/- 20 grams
+    mass = 70 + 20.0 * (2.0 * random.random() - 1.0)
+
+    print(mass)
+
+    # egg sizing machinery prints a label
+    if mass >= 85:
+        print("jumbo")
+    elif mass >= 70:
+        print("large")
+    elif mass < 70 and mass >= 55:
+        print("medium")
+    else:
+        print("small")
+
+
The if-block that classifies the eggs might be useful in other
+situations, so to avoid repeating it, we could fold it into a function,
+get_egg_label(). Revising the program to use the function
+would give us this:
+
+
PYTHON
+
+
# revised version
+import random
+for i in range(10):
+
+    # simulating the mass of a chicken egg
+    # the (random) mass will be 70 +/- 20 grams
+    mass = 70 + 20.0 * (2.0 * random.random() - 1.0)
+
+    print(mass, get_egg_label(mass))
+
+
Create a function definition for get_egg_label() that
+will work with the revised program above. Note that the
+get_egg_label() function’s return value will be important.
+Sample output from the above program would be
+71.23 large.
+
A dirty egg might have a mass of more than 90 grams, and a spoiled
+or broken egg will probably have a mass that’s less than 50 grams.
+Modify your get_egg_label() function to account for these
+error conditions. Sample output could be
+25 too light, probably spoiled.
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
def get_egg_label(mass):
+    # egg sizing machinery prints a label
+    egg_label = "Unlabelled"
+    if mass >= 90:
+        egg_label = "warning: egg might be dirty"
+    elif mass >= 85:
+        egg_label = "jumbo"
+    elif mass >= 70:
+        egg_label = "large"
+    elif mass < 70 and mass >= 55:
+        egg_label = "medium"
+    elif mass < 50:
+        egg_label = "too light, probably spoiled"
+    else:
+        egg_label = "small"
+    return egg_label
How would you generalize this function if you did not know
+beforehand which specific years occurred as columns in the data? For
+instance, what if we also had data from years ending in 1 and 9 for each
+decade? (Hint: use the columns to filter out the ones that correspond to
+the decade, instead of enumerating them in the code.)
+
+
+
+
+
+
+
+
+
The average GDP for Japan across the years reported for the 1980s is
+computed with:
To obtain the average for the relevant years, we need to loop over
+them:
+
+
PYTHON
+
+
import pandas as pd
+
+def avg_gdp_in_decade(country, continent, year):
+    data_countries = pd.read_csv('data/gapminder_gdp_' + continent + '.csv', index_col=0)
+    c = data_countries.loc[country]
+    gdp_decade = 'gdpPercap_' + str(year // 10)
+    total = 0.0
+    num_years = 0
+    for yr_header in c.index:  # c's index contains reported years
+        if yr_header.startswith(gdp_decade):
+            total = total + c.loc[yr_header]
+            num_years = num_years + 1
+    return total / num_years
+
+
The function can now be called by:
+
+
PYTHON
+
+
avg_gdp_in_decade('Japan','asia',1983)
+
+
+
OUTPUT
+
+
20880.023800000003
+
+
+
+
+
+
+
+
+
+
+
Simulating a dynamical system
+
+
In mathematics, a dynamical
+system is a system in which a function describes the time dependence
+of a point in a geometrical space. A canonical example of a dynamical
+system is the logistic map, a
+growth model that computes a new population density (between 0 and 1)
+based on the current density. In the model, time takes discrete values
+0, 1, 2, …
+
Define a function called logistic_map that takes two
+inputs: x, representing the current population (at time
+t), and a parameter r = 1. This function
+should return a value representing the state of the system (population)
+at time t + 1, using the mapping function:
+
f(t+1) = r * f(t) * [1 - f(t)]
+
Using a for or while loop, iterate the
+logistic_map function defined in part 1, starting from an
+initial population of 0.5, for a period of time
+t_final = 10. Store the intermediate results in a list so
+that after the loop terminates you have accumulated a sequence of values
+representing the state of the logistic map at times
+t = [0,1,...,t_final] (11 values in total). Print this list
+to see the evolution of the population.
+
Encapsulate the logic of your loop into a function called
+iterate that takes the initial population as its first
+input, the parameter t_final as its second input and the
+parameter r as its third input. The function should return
+the list of values representing the state of the logistic map at times
+t = [0,1,...,t_final]. Run this function for periods
+t_final = 100 and 1000 and print some of the
+values. Is the population trending toward a steady state?
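
One possible sketch of the steps above is shown here; it is not the only way to structure the solution. It treats `r` as an ordinary positional parameter, which is an assumption about how the exercise intends `r` to be passed.

```python
def logistic_map(x, r):
    # One step of the map: f(t+1) = r * f(t) * [1 - f(t)]
    return r * x * (1 - x)

def iterate(initial_population, t_final, r):
    # Collect the state at times t = 0, 1, ..., t_final
    population = [initial_population]
    for t in range(t_final):
        population.append(logistic_map(population[t], r))
    return population

trajectory = iterate(0.5, 10, 1)
print(len(trajectory))
print(trajectory)
```

Calling `iterate(0.5, 100, 1)` or `iterate(0.5, 1000, 1)` and printing the last few values is one way to judge whether the population is settling toward a steady state.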
Functions will often contain conditionals. Here is a short example
+that will indicate which quartile the argument is in based on hand-coded
+values for the quartile cut points.
+
+
PYTHON
+
+
def calculate_life_quartile(exp):
+    if exp < 58.41:
+        # This observation is in the first quartile
+        return 1
+    elif exp >= 58.41 and exp < 67.05:
+        # This observation is in the second quartile
+        return 2
+    elif exp >= 67.05 and exp < 71.70:
+        # This observation is in the third quartile
+        return 3
+    elif exp >= 71.70:
+        # This observation is in the fourth quartile
+        return 4
+    else:
+        # This observation has bad data
+        return None
+
+calculate_life_quartile(62.5)
+
+
+
OUTPUT
+
+
2
+
+
That function would typically be used within a for loop,
+but Pandas has a different, more efficient way of doing the same thing,
+and that is by applying a function to a dataframe or a portion
+of a dataframe. Here is an example, using the definition above.
+
+
PYTHON
+
+
data = pd.read_csv('data/gapminder_all.csv')
+data['life_qrtl'] = data['lifeExp_1952'].apply(calculate_life_quartile)
+
+
There is a lot in that second line, so let’s take it piece by piece.
+On the right side of the = we start with
+data['lifeExp_1952'], which is the column in the dataframe
+called data labeled lifeExp_1952. We use the
+apply() method to do what it says: apply the
+calculate_life_quartile function to the value of this column for
+every row in the dataframe.
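
To make the comparison with a for loop concrete, here is a pure-Python sketch of what applying the function to each value amounts to. The list `life_expectancies` holds made-up illustrative values, not data from the gapminder file, and the function is repeated in compact form only so that the block runs on its own.

```python
def calculate_life_quartile(exp):
    # Same hand-coded cut points as the definition above
    if exp < 58.41:
        return 1
    elif exp < 67.05:
        return 2
    elif exp < 71.70:
        return 3
    elif exp >= 71.70:
        return 4
    else:
        return None

life_expectancies = [45.0, 62.5, 70.1, 80.3]  # made-up values for illustration

# Call the function once per value and collect the results
quartiles = []
for value in life_expectancies:
    quartiles.append(calculate_life_quartile(value))

print(quartiles)
```

The loop calls the function once per value and collects the results in order, which is the same element-by-element behavior that `apply()` gives for a column.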
+
+
+
+
+
+
+
+
+
Key Points
+
+
Break programs down into functions to make them easier to
+understand.
+
Define a function using def with a name, parameters,
+and a block of code.
+
Defining a function does not run it.
+
Arguments in a function call are matched to its defined
+parameters.
+
Functions may return a result to their caller using
+return.
Read a traceback and determine the file, function, and line number
+on which the error occurred, the type of error, and the error
+message.
+
+
+
+
+
+
The scope of a variable is the part of a program that can ‘see’ that
+variable.
+
There are only so many sensible names for variables.
+
People using functions shouldn’t have to worry about what variable
+names the author of the function used.
+
People writing functions shouldn’t have to worry about what variable
+names the function’s caller uses.
+
The part of a program in which a variable is visible is called its
+scope.
+
+
PYTHON
+
+
pressure = 103.9
+
+def adjust(t):
+    temperature = t * 1.43 / pressure
+    return temperature
+
+
+pressure is a global variable.
+
Defined outside any particular function.
+
Visible everywhere.
+
+
+t and temperature are local
+variables in adjust.
+
Defined in the function.
+
Not visible in the main program.
+
Remember: a function parameter is a variable that is automatically
+assigned a value when the function is called.
+
+
+
PYTHON
+
+
print('adjusted:', adjust(0.9))
+print('temperature after call:', temperature)
+
+
+
OUTPUT
+
+
adjusted: 0.01238691049085659
+
+
+
ERROR
+
+
Traceback (most recent call last):
+ File "/Users/swcarpentry/foo.py", line 8, in <module>
+ print('temperature after call:', temperature)
+NameError: name 'temperature' is not defined
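
The same behavior can be observed without stopping the program. The sketch below wraps the failing lookup in `try`/`except`; the variable `visible` is a hypothetical name introduced only for this illustration.

```python
pressure = 103.9  # global variable: defined outside any function

def adjust(t):
    temperature = t * 1.43 / pressure  # local variable: exists only inside adjust
    return temperature

adjusted = adjust(0.9)

try:
    print(temperature)
    visible = True
except NameError:
    # temperature was local to adjust, so it is not defined here
    visible = False

print('adjusted:', adjusted)
print('temperature visible outside adjust:', visible)
```

The function can read the global `pressure`, but the main program cannot see the local `temperature`.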
+
+
+
+
+
+
+
Local and Global Variable Use
+
+
Trace the values of all variables in this program as it is executed.
+(Use ‘—’ as the value of variables before and after they exist.)
Read the traceback below, and identify the following:
+
How many levels does the traceback have?
+
What is the file name where the error occurred?
+
What is the function name where the error occurred?
+
On which line number in this function did the error occur?
+
What is the type of error?
+
What is the error message?
+
+
ERROR
+
+
---------------------------------------------------------------------------
+KeyError Traceback (most recent call last)
+<ipython-input-2-e4c4cbafeeb5> in <module>()
+ 1 import errors_02
+----> 2 errors_02.print_friday_message()
+
+/Users/ghopper/thesis/code/errors_02.py in print_friday_message()
+ 13
+ 14 def print_friday_message():
+---> 15 print_message("Friday")
+
+/Users/ghopper/thesis/code/errors_02.py in print_message(day)
+ 9 "sunday": "Aw, the weekend is almost over."
+ 10 }
+---> 11 print(messages[day])
+ 12
+ 13
+
+KeyError: 'Friday'
+
+
+
+
+
+
+
+
+
+
Three levels.
+
errors_02.py
+
print_message
+
Line 11
+
+KeyError. These errors occur when we are trying to look
+up a key that does not exist (usually in a data structure such as a
+dictionary). We can find more information about the
+KeyError and other built-in exceptions in the Python
+docs.
+
KeyError: 'Friday'
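
A minimal sketch of how such a `KeyError` arises and one way to avoid it is given below. It assumes that the keys in `messages` are all lowercase, as the visible `"sunday"` entry suggests. The `"friday"` text, the fallback text, and the helper `get_message` are hypothetical and not taken from `errors_02.py`.

```python
messages = {
    "friday": "Hooray, it is almost the weekend.",  # hypothetical entry
    "sunday": "Aw, the weekend is almost over."
}

# Dictionary keys are case-sensitive, so "Friday" is not found
try:
    messages["Friday"]
    raised = False
except KeyError:
    raised = True

def get_message(day):
    # Normalize the key, and fall back to a default instead of raising
    return messages.get(day.lower(), "No message for this day.")

print(raised)
print(get_message("Friday"))
```

Using `dict.get` with a default is one option; whether to normalize keys or let the error surface depends on what the calling code expects.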
+
+
+
+
+
+
+
+
+
+
Key Points
+
+
The scope of a variable is the part of a program that can ‘see’ that
+variable.
Provide sound justifications for basic rules of coding style.
+
Refactor one-page programs to make them more readable and justify
+the changes.
+
Use Python community coding standards (PEP-8).
+
+
+
+
+
+
Coding style
+
A consistent coding style helps others (including our future selves)
+read and understand code more easily. Code is read much more often than
+it is written, and as the Zen of Python
+states, “Readability counts”. Python proposed a standard style through
+one of its first Python Enhancement Proposals (PEP), PEP8.
+
Some points worth highlighting:
+
document your code and ensure that assumptions, internal algorithms,
+expected inputs, expected outputs, etc., are clear
+
use clear, semantically meaningful variable names
+
use spaces, not tabs, to indent lines (tabs can cause
+problems across different text editors, operating systems, and version
+control systems)
+
Follow standard Python style in your code.
+
+PEP8: a style
+guide for Python that discusses topics such as how to name variables,
+how to indent your code, how to structure your import
+statements, etc. Adhering to PEP8 makes it easier for other Python
+developers to read and understand your code, and to understand what
+their contributions should look like.
+
To check your code for compliance with PEP8, you can use the pycodestyle application.
+Tools like the black code
+formatter can automatically format your code to conform to PEP8 and
+pycodestyle (a Jupyter notebook formatter, nb_black, also exists).
+
Some groups and organizations follow different style guidelines
+besides PEP8. For example, the Google style
+guide on Python makes slightly different recommendations. Google
+wrote an application that can help you format your code in either their
+style or PEP8 called yapf.
+
With respect to coding style, the key is consistency.
+Choose a style for your project, be it PEP8, the Google style, or
+something else, and do your best to ensure that you and anyone else you
+are collaborating with stick to it. Consistency within a project is
+often more impactful than the particular style used. A consistent style
+will make your software easier to read and understand for others and for
+your future self.
+
Use assertions to check for internal errors.
+
Assertions are a simple but powerful method for making sure that the
+context in which your code is executing is as you expect.
+
+
PYTHON
+
+
def calc_bulk_density(mass, volume):
+    '''Return dry bulk density = powder mass / powder volume.'''
+    assert volume > 0
+    return mass / volume
+
+
If the assertion is False, the Python interpreter raises
+an AssertionError runtime exception. The source code for
+the expression that failed will be displayed as part of the error
+message. To ignore assertions in your code run the interpreter with the
+‘-O’ (optimize) switch. Assertions should contain only simple checks and
+never change the state of the program. For example, an assertion should
+never contain an assignment.
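
The sketch below shows the assertion passing for a valid call and failing for an invalid one. Catching the `AssertionError` here is only for demonstration, and the variable `failed` is a hypothetical name used to record the outcome.

```python
def calc_bulk_density(mass, volume):
    '''Return dry bulk density = powder mass / powder volume.'''
    assert volume > 0
    return mass / volume

# A valid call: the assertion passes silently
print(calc_bulk_density(10.0, 4.0))

# An invalid call: the assertion fails before the division is attempted
try:
    calc_bulk_density(10.0, 0)
    failed = False
except AssertionError:
    failed = True

print('assertion failed:', failed)
```

If the interpreter is run with `-O`, the assertion is skipped and the second call raises `ZeroDivisionError` instead. This is why assertions are suited to checking internal assumptions rather than to validating user input.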
+
Use docstrings to provide builtin help.
+
If the first thing in a function is a character string that is not
+assigned directly to a variable, Python attaches it to the function,
+accessible via the builtin help function. This string that provides
+documentation is also known as a docstring.
+
+
PYTHON
+
+
def average(values):
+    "Return average of values, or None if no values are supplied."
+
+    if len(values) == 0:
+        return None
+    return sum(values) / len(values)
+
+help(average)
+
+
+
OUTPUT
+
+
Help on function average in module __main__:
+
+average(values)
+ Return average of values, or None if no values are supplied.
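
The docstring is also stored on the function object itself. As a small sketch, the standard `__doc__` attribute gives the same text that `help` displays:

```python
def average(values):
    "Return average of values, or None if no values are supplied."
    if len(values) == 0:
        return None
    return sum(values) / len(values)

# The docstring is available as plain text on the function object
print(average.__doc__)
print(average([1, 2, 3]))
```

Reading `__doc__` directly can be handy when you want the text without the extra formatting that `help` adds.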
+
+
+
+
+
+
+
Multiline Strings
+
+
Multiline strings are often used for documentation. These start
+with three quote characters (either single or double) and end
+with three matching characters.
+
+
PYTHON
+
+
"""This string spans
+multiple lines.
+
+Blank lines are allowed."""
+
+
+
+
+
+
+
+
+
+
What Will Be Shown?
+
+
Highlight the lines in the code below that will be available as
+online help. Are there lines that should be made available, but won’t
+be? Will any lines produce a syntax error or a runtime error?
+
+
PYTHON
+
+
"Find maximum edit distance between multiple sequences."
+# This finds the maximum distance between all sequences.
+
+def overall_max(sequences):
+    '''Determine overall maximum edit distance.'''
+
+    highest = 0
+    for left in sequences:
+        for right in sequences:
+            '''Avoid checking sequence against itself.'''
+            if left != right:
+                this = edit_distance(left, right)
+                highest = max(highest, this)
+
+    # Report.
+    return highest
+
+
+
+
+
+
+
+
+
+
Document This
+
+
Use comments to describe and help others understand potentially
+unintuitive sections or individual lines of code. They are especially
+useful to whoever may need to understand and edit your code in the
+future, including yourself.
+
Use docstrings to document the acceptable inputs and expected outputs
+of a method or class, its purpose, assumptions and intended behavior.
+Docstrings are displayed when a user invokes the builtin
+help method on your method or class.
+
Turn the comment in the following function into a docstring and check
+that help displays it properly.
+
+
PYTHON
+
+
def middle(a, b, c):
+    # Return the middle value of three.
+    # Assumes the values can actually be compared.
+    values = [a, b, c]
+    values.sort()
+    return values[1]
+
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
def middle(a, b, c):
+    '''Return the middle value of three.
+    Assumes the values can actually be compared.'''
+    values = [a, b, c]
+    values.sort()
+    return values[1]
+
+
+
+
+
+
+
+
+
+
+
Clean Up This Code
+
+
Read this short program and try to predict what it does.
+
Run it: how accurate was your prediction?
+
Refactor the program to make it more readable. Remember to run it
+after each change to ensure its behavior hasn’t changed.
+
Compare your rewrite with your neighbor’s. What did you do the same?
+What did you do differently, and why?
+
+
PYTHON
+
+
n = 10
+s = 'et cetera'
+print(s)
+i = 0
+while i < n:
+    # print('at', j)
+    new = ''
+    for j in range(len(s)):
+        left = j-1
+        right = (j+1)%len(s)
+        if s[left]==s[right]: new = new + '-'
+        else: new = new + '*'
+    s=''.join(new)
+    print(s)
+    i += 1
+
+
+
+
+
+
+
+
+
+
Here’s one solution.
+
+
PYTHON
+
+
def string_machine(input_string, iterations):
+    """
+    Takes input_string and generates a new string with -'s and *'s
+    corresponding to characters that have identical adjacent characters
+    or not, respectively. Iterates through this procedure with the resultant
+    strings for the supplied number of iterations.
+    """
+    print(input_string)
+    input_string_length = len(input_string)
+    old = input_string
+    for i in range(iterations):
+        new = ''
+        # iterate through characters in previous string
+        for j in range(input_string_length):
+            left = j-1
+            right = (j+1) % input_string_length  # ensure right index wraps around
+            if old[left] == old[right]:
+                new = new + '-'
+            else:
+                new = new + '*'
+        print(new)
+        # store new string as old
+        old = new
+
+string_machine('et cetera', 10)
Name and locate scientific Python community sites for software,
+workshops, and help.
+
+
+
+
+
+
Leslie Lamport once said, “Writing is nature’s way of showing you how
+sloppy your thinking is.” The same is true of programming: many things
+that seem obvious when we’re thinking about them turn out to be anything
+but when we have to explain them precisely.
+
Python supports a large and diverse community across academia and
+industry.
+
+
diff --git a/instructor/404.html b/instructor/404.html
new file mode 100644
index 000000000..6f995c79a
--- /dev/null
+++ b/instructor/404.html
@@ -0,0 +1,546 @@
+
+Plotting and Programming in Python: Page not found
+ Skip to main content
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ Plotting and Programming in Python
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Page not found
+
+
Our apologies!
+
We cannot seem to find the page you are looking for. Here are some
+tips that may help:
to Share—copy and redistribute the material in any
+medium or format
+
to Adapt—remix, transform, and build upon the
+material
+
for any purpose, even commercially.
+
The licensor cannot revoke these freedoms as long as you follow the
+license terms.
+
Under the following terms:
+
Attribution—You must give appropriate credit
+(mentioning that your work is derived from work that is Copyright (c)
+The Carpentries and, where practical, linking to https://carpentries.org/), provide a link to the
+license, and indicate if changes were made. You may do so in any
+reasonable manner, but not in any way that suggests the licensor
+endorses you or your use.
+
No additional restrictions—You may not apply
+legal terms or technological measures that legally restrict others from
+doing anything the license permits. With the understanding
+that:
+
Notices:
+
You do not have to comply with the license for elements of the
+material in the public domain or where your use is permitted by an
+applicable exception or limitation.
+
No warranties are given. The license may not give you all of the
+permissions necessary for your intended use. For example, other rights
+such as publicity, privacy, or moral rights may limit how you use the
+material.
+
Software
+
Except where otherwise noted, the example programs and other software
+provided by The Carpentries are made available under the OSI-approved MIT
+license.
+
Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+“Software”), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+
The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
+CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+
Trademark
+
“The Carpentries”, “Software Carpentry”, “Data Carpentry”, and
+“Library Carpentry” and their respective logos are registered trademarks
+of Community Initiatives.
Understand the difference between a Python script and a Jupyter
+notebook.
+
Create Markdown cells in a notebook.
+
Create and run Python cells in a notebook.
+
+
+
+
+
+
+
To run Python, we are going to use Jupyter Notebooks via JupyterLab for
+the remainder of this workshop. Jupyter notebooks are common in data
+science and visualization and serve as a convenient common-denominator
+experience for running Python code interactively where we can easily
+view and share the results of our Python code.
+
There are other ways of editing, managing, and running code. Software
+developers often use an integrated development environment (IDE) like PyCharm or Visual Studio Code, or text
+editors like Vim or Emacs, to create and edit their Python programs.
+After editing and saving your Python programs you can execute those
+programs within the IDE itself or directly on the command line. In
+contrast, Jupyter notebooks let us execute and view the results of our
+Python code immediately within the notebook.
+
JupyterLab has several other handy features:
+
+
You can easily type, edit, and copy and paste blocks of code.
+
Tab complete allows you to easily access the names of things you are
+using and learn more about them.
+
It allows you to annotate your code with links, different sized
+text, bullets, etc. to make it more accessible to you and your
+collaborators.
+
It allows you to display figures next to the code that produces them
+to tell a complete story of the analysis.
+
+
Each notebook contains one or more cells that contain code, text, or
+images.
+
Getting Started with JupyterLab
+
+
+
JupyterLab is an application server with a web user interface from Project Jupyter that enables one to work
+with documents and activities such as Jupyter notebooks, text editors,
+terminals, and even custom components in a flexible, integrated, and
+extensible manner. JupyterLab requires a reasonably up-to-date browser
+(ideally a current version of Chrome, Safari, or Firefox); Internet
+Explorer versions 9 and below are not supported.
+
JupyterLab is included as part of the Anaconda Python distribution.
+If you have not already installed the Anaconda Python distribution, see
+the setup instructions for installation
+instructions.
+
In this lesson we will run JupyterLab locally on our own machines, so
+it will not require an internet connection beyond the initial
+connection to download and install Anaconda and JupyterLab.
+
+
Start the JupyterLab server on your machine
+
Use a web browser to open a special localhost URL that connects to
+your JupyterLab server
+
The JupyterLab server does the work and the web browser renders the
+result
+
Type code into the browser and see the results after your JupyterLab
+server has finished executing your code
Experienced users of Jupyter notebooks interested in a more detailed
+discussion of the similarities and differences between the JupyterLab
+and Jupyter notebook user interfaces can find more information in the JupyterLab
+user interface documentation.
+
+
+
+
Starting JupyterLab
+
+
+
You can start the JupyterLab server through the command line or
+through an application called Anaconda Navigator. Anaconda
+Navigator is included as part of the Anaconda Python distribution.
+
+
macOS - Command Line
+
+
To start the JupyterLab server you will need to access the command
+line through the Terminal. There are two ways to open Terminal on
+Mac.
+
+
In your Applications folder, open Utilities and double-click on
+Terminal
+
Press Command + spacebar to launch Spotlight.
+Type Terminal and then double-click the search result or
+hit Enter
+
+
+
After you have launched Terminal, type the command to launch the
+JupyterLab server.
+
+
BASH
+
+
$ jupyter lab
+
+
+
+
Windows Users - Command Line
+
+
To start the JupyterLab server you will need to access the Anaconda
+Prompt.
+
Press the Windows Logo Key and search for
+Anaconda Prompt, then click the result or press Enter.
+
After you have launched the Anaconda Prompt, type the command:
+
+
BASH
+
+
$ jupyter lab
+
+
+
+
Anaconda Navigator
+
+
To start a JupyterLab server from Anaconda Navigator you must first
+start
+Anaconda Navigator (click for detailed instructions on macOS, Windows,
+and Linux). You can search for Anaconda Navigator via Spotlight on
+macOS (Command + spacebar), the Windows search
+function (Windows Logo Key) or opening a terminal shell and
+executing the anaconda-navigator executable from the
+command line.
+
After you have launched Anaconda Navigator, click the
+Launch button under JupyterLab. You may need to scroll down
+to find it.
+
Here is a screenshot of an Anaconda Navigator page similar to the one
+that should open on either macOS or Windows.
+
+
+
And here is a screenshot of a JupyterLab landing page that should be
+similar to the one that opens in your default web browser after starting
+the JupyterLab server on either macOS or Windows.
+
+
+
+
The JupyterLab Interface
+
+
+
JupyterLab has many features found in traditional integrated
+development environments (IDEs) but is focused on providing flexible
+building blocks for interactive, exploratory computing.
+
The JupyterLab
+Interface consists of the Menu Bar, a collapsible Left Side Bar, and
+the Main Work Area which contains tabs of documents and activities.
+
+
Menu Bar
+
+
The Menu Bar at the top of JupyterLab has the top-level menus that
+expose various actions available in JupyterLab along with their keyboard
+shortcuts (where applicable). The following menus are included by
+default.
+
+
+File: Actions related to files and directories such
+as New, Open, Close, Save, etc. The
+File menu also includes the Shut Down action used to
+shut down the JupyterLab server.
+
+Edit: Actions related to editing documents and
+other activities such as Undo, Cut, Copy,
+Paste, etc.
+
+View: Actions that alter the appearance of
+JupyterLab.
+
+Run: Actions for running code in different
+activities such as notebooks and code consoles (discussed below).
+
+Kernel: Actions for managing kernels. Kernels in
+Jupyter will be explained in more detail below.
+
+Tabs: A list of the open documents and activities
+in the main work area.
+
+Settings: Common JupyterLab settings can be
+configured using this menu. There is also an Advanced Settings
+Editor option in the dropdown menu that provides more fine-grained
+control of JupyterLab settings and configuration options.
+
+Help: A list of JupyterLab and kernel help
+links.
+
+
+
+
+
+
+
Kernels
+
+
The JupyterLab docs
+define kernels as “separate processes started by the server that runs
+your code in different programming languages and environments.” When we
+open a Jupyter Notebook, that starts a kernel - a process - that is
+going to run the code. In this lesson, we’ll be using the Jupyter
+ipython kernel which lets us run Python 3 code interactively.
+
Using other Jupyter kernels
+for other programming languages would let us write and execute code
+in other programming languages in the same JupyterLab interface, like R,
+Java, Julia, Ruby, JavaScript, Fortran, etc.
+
+
+
+
A screenshot of the default Menu Bar is provided below.
+
+
+
+
+
Left Sidebar
+
+
The left sidebar contains a number of commonly used tabs, such as a
+file browser (showing the contents of the directory where the JupyterLab
+server was launched), a list of running kernels and terminals, the
+command palette, and a list of open tabs in the main work area. A
+screenshot of the default Left Side Bar is provided below.
+
+
+
The left sidebar can be collapsed or expanded by selecting “Show Left
+Sidebar” in the View menu or by clicking on the active sidebar tab.
+
+
+
Main Work Area
+
+
The main work area in JupyterLab enables you to arrange documents
+(notebooks, text files, etc.) and other activities (terminals, code
+consoles, etc.) into panels of tabs that can be resized or subdivided. A
+screenshot of the default Main Work Area is provided below.
+
If you do not see the Launcher tab, click the blue plus sign under
+the “File” and “Edit” menus and it will appear.
+
+
+
Drag a tab to the center of a tab panel to move the tab to the panel.
+Subdivide a tab panel by dragging a tab to the left, right, top, or
+bottom of the panel. The work area has a single current activity. The
+tab for the current activity is marked with a colored top border (blue
+by default).
+
+
Creating a Python script
+
+
+
+
To start writing a new Python program click the Text File icon under
+the Other header in the Launcher tab of the Main Work Area.
+
+
You can also create a new plain text file by selecting New
+-> Text File from the File menu in the Menu Bar.
+
+
+
To convert this plain text file to a Python program, select the
+Save File As action from the File menu in the Menu Bar
+and give your new text file a name that ends with the .py
+extension.
+
+
The .py extension lets everyone (including the
+operating system) know that this text file is a Python program.
+
This is convention, not a requirement.
+
+
+
Creating a Jupyter Notebook
+
+
+
To open a new notebook click the Python 3 icon under the
+Notebook header in the Launcher tab in the main work area. You
+can also create a new notebook by selecting New -> Notebook
+from the File menu in the Menu Bar.
+
Additional notes on Jupyter notebooks.
+
+
Notebook files have the extension .ipynb to distinguish
+them from plain-text Python programs.
+
Notebooks can be exported as Python scripts that can be run from the
+command line.
+
+
Below is a screenshot of a Jupyter notebook running inside
+JupyterLab. If you are interested in more details, then see the official
+notebook documentation.
+
+
+
+
+
+
+
+
How It’s Stored
+
+
+
The notebook file is stored in a format called JSON.
+
Just like a webpage, what’s saved looks different from what you see
+in your browser.
+
But this format allows Jupyter to mix source code, text, and images,
+all in one file.
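
As a rough illustration of what such a file looks like, the sketch below builds a trimmed-down, notebook-like structure with one Markdown cell and one code cell and prints it as JSON. The cell contents are made up. The field names follow the general layout of version 4 notebook files as an assumption, but real notebooks carry more metadata, so this is not guaranteed to be a complete, valid `.ipynb` file.

```python
import json

# A minimal notebook-like structure (illustrative only)
notebook = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# My analysis"]},
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["print(6 * 7)"],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

# What gets saved is plain JSON text
text = json.dumps(notebook, indent=1)
print(text)

# Reading the text back gives the same structure
restored = json.loads(text)
print([cell["cell_type"] for cell in restored["cells"]])
```

Opening a real `.ipynb` file in a plain text editor shows the same kind of nested structure, with the source of every cell stored as text.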
+
+
+
+
+
+
+
+
+
+
Arranging Documents into Panels of Tabs
+
+
In the JupyterLab Main Work Area you can arrange documents into
+panels of tabs. Here is an example from the official
+documentation.
+
+
+
First, create a text file, Python console, and terminal window and
+arrange them into three panels in the main work area. Next, create a
+notebook, terminal window, and text file and arrange them into three
+panels in the main work area. Finally, create your own combination of
+panels and tabs. What combination of panels and tabs do you think will
+be most useful for your workflow?
+
+
+
+
+
+
+
+
+
After creating the necessary tabs, you can drag one of the tabs to
+the center of a panel to move the tab to the panel; next you can
+subdivide a tab panel by dragging a tab to the left, right, top, or
+bottom of the panel.
+
+
+
+
+
+
+
+
+
+
Code vs. Text
+
+
Jupyter mixes code and text in different types of blocks, called
+cells. We often use the term “code” to mean “the source code of software
+written in a language such as Python”. A “code cell” in a Notebook is a
+cell that contains software; a “text cell” is one that contains ordinary
+prose written for human beings.
+
+
+
+
The Notebook has Command and Edit modes.
+
+
+
+
If you press Esc and Return alternately, the
+outer border of your code cell will change from gray to blue.
+
These are the Command (gray) and
+Edit (blue) modes of your notebook.
+
Command mode allows you to edit notebook-level features, and Edit
+mode changes the content of cells.
+
When in Command mode (esc/gray),
+
+
The b key will make a new cell below the currently
+selected cell.
+
The a key will make one above.
+
The x key will delete the current cell.
+
The z key will undo your last cell operation (which could
+be a deletion, creation, etc).
+
+
+
All actions can be done using the menus, but there are lots of
+keyboard shortcuts to speed things up.
+
+
+
+
+
+
+
Command Vs. Edit
+
+
In the Jupyter notebook page are you currently in Command or Edit
+mode?
+Switch between the modes. Use the shortcuts to generate a new cell. Use
+the shortcuts to delete a cell. Use the shortcuts to undo the last cell
+operation you performed.
+
+
+
+
+
+
+
+
+
Command mode has a gray border and Edit mode has a blue border. Use
+Esc and Return to switch between modes. You need
+to be in Command mode (Press Esc if your cell is blue). Type
+b or a. You need to be in Command mode (Press
+Esc if your cell is blue). Type x. You need to be
+in Command mode (Press Esc if your cell is blue). Type
+z.
+
+
+
+
+
+
Use the keyboard and mouse to select and edit cells.
+
+
+
Pressing the Return key turns the border blue and engages
+Edit mode, which allows you to type within the cell.
+
Because we want to be able to write many lines of code in a single
+cell, pressing the Return key when in Edit mode (blue) moves
+the cursor to the next line in the cell just like in a text editor.
+
We need some other way to tell the Notebook we want to run what’s in
+the cell.
+
Pressing Shift+Return together will execute
+the contents of the cell.
+
Notice that the Return and Shift keys on the
+right of the keyboard are right next to each other.
+
+
+
+
The Notebook will turn Markdown into pretty-printed
+documentation.
+
Create a nested list in a Markdown cell in a notebook that looks like
+this:
+
+
Get funding.
+
Do work.
+
+
+
Design experiment.
+
Collect data.
+
Analyze.
+
+
+
Write up.
+
Publish.
+
+
+
+
+
+
+
+
+
+
This challenge combines a numbered list with a bulleted list.
+Note that the bulleted list is indented 2 spaces so that it lines up with
+the items of the numbered list.
What is displayed when a Python cell in a notebook that contains
+several calculations is executed? For example, what happens when this
+cell is executed?
+
+
PYTHON
+
+
7 * 3
+2 + 1
+
+
+
+
+
+
+
+
+
+
The notebook displays only the result of the last calculation in the cell.
+
+
PYTHON
+
+
3
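
To see every result rather than only the last one, each value can be passed to print. A minimal sketch (the variable names product and total are illustrative and do not come from the lesson):

```python
# A notebook cell echoes only the value of its final expression,
# so intermediate results need an explicit print to be shown.
product = 7 * 3
total = 2 + 1
print(product)  # 21
print(total)    # 3
```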
+
+
+
+
+
+
+
+
+
+
+
Change an Existing Cell from Code to Markdown
+
+
What happens if you write some Python in a code cell and then you
+switch it to a Markdown cell? For example, put the following in a code
+cell:
+
+
PYTHON
+
+
x = 6 * 7 + 12
+print(x)
+
+
And then run it with Shift+Return to be sure
+that it works as a code cell. Now go back to the cell and use
+Esc then m to switch the cell to Markdown and
+“run” it with Shift+Return. What happened and how
+might this be useful?
+
+
+
+
+
+
+
+
+
The Python code gets treated like Markdown text. The lines appear as
+if they are part of one contiguous paragraph. This could be useful to
+temporarily turn on and off cells in notebooks that get used for
+multiple purposes.
+
+
PYTHON
+
+
x = 6 * 7 + 12 print(x)
+
+
+
+
+
+
+
+
+
+
+
Equations
+
+
Standard Markdown (such as we’re using for these notes) won’t render
+equations, but the Notebook will. Create a new Markdown cell and enter
+the following:
+
$\sum_{i=1}^{N} 2^{-i} \approx 1$
+
(It’s probably easier to copy and paste.) What does it display? What
+do you think the underscore, _, circumflex, ^,
+and dollar sign, $, do?
+
+
+
+
+
+
+
+
+
The notebook shows the equation as it would be rendered from LaTeX
+equation syntax. The dollar sign, $, is used to tell
+Markdown that the text in between is a LaTeX equation. If you’re not
+familiar with LaTeX, underscore, _, is used for subscripts
+and circumflex, ^, is used for superscripts. A pair of
+curly braces, { and }, is used to group text
+together so that the statement i=1 becomes the subscript
+and N becomes the superscript. Similarly, -i
+is in curly braces to make the whole statement the superscript for
+2. \sum and \approx are LaTeX
+commands for “sum over” and “approximate” symbols.
+
+
+
+
+
+
Closing JupyterLab
+
+
+
+
From the Menu Bar select the “File” menu and then choose “Shut Down”
+at the bottom of the dropdown menu. You will be prompted to confirm that
+you wish to shut down the JupyterLab server (don’t forget to save your
+work!). Click “Shut Down” to shut down the JupyterLab server.
+
To restart the JupyterLab server you will need to re-run the
+following command from a shell.
+
+
$ jupyter lab
+
+
+
+
+
+
Closing JupyterLab
+
+
Practice closing and restarting the JupyterLab server.
+
+
+
+
+
+
+
+
+
Key Points
+
+
+
Python scripts are plain text files.
+
Use the Jupyter Notebook for editing and running Python.
+
The Notebook has Command and Edit modes.
+
Use the keyboard and mouse to select and edit cells.
+
The Notebook will turn Markdown into pretty-printed
+documentation.
Write programs that assign scalar values to variables and perform
+calculations with those values.
+
Correctly trace value changes in programs that use scalar
+assignment.
+
+
+
+
+
+
+
Use variables to store values.
+
+
+
+
Variables are names for values.
+
+
Variable names
+
+
can only contain letters, digits, and underscore
+_ (typically used to separate words in long variable
+names)
+
cannot start with a digit
+
are case sensitive (age, Age and AGE are three
+different variables)
+
+
+
The name should also be meaningful so that you or another programmer
+knows what it refers to.
+
Variable names that start with underscores like
+__alistairs_real_age have a special meaning so we won’t do
+that until we understand the convention.
+
In Python the = symbol assigns the value on the
+right to the name on the left.
+
The variable is created when a value is assigned to it.
+
+
Here, Python assigns an age to a variable age and a
+name in quotes to a variable first_name.
+
+
PYTHON
+
+
age = 42
+first_name = 'Ahmed'
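
The naming rules above can be checked from Python itself. The sketch below uses the string method isidentifier, which reports whether a piece of text would be accepted as a name. The candidate names are made up for illustration, and the for loop is used only to avoid repetition. Note that isidentifier also accepts some non-English letters and does not flag reserved words such as for, so treat it as a rough check rather than the full rule.

```python
# Candidate names (illustrative only) and whether Python accepts them.
candidates = ['age', 'first_name', 'Age', '2nd_place', 'first-name']
results = []
for candidate in candidates:
    is_valid = candidate.isidentifier()
    results.append(is_valid)
    print(candidate, '->', is_valid)
```

The last two candidates are rejected: one starts with a digit and the other contains a hyphen.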
+
+
+
Use print to display values.
+
+
+
+
Python has a built-in function called print that prints
+things as text.
+
Call the function (i.e., tell Python to run it) by using its
+name.
+
Provide values to the function (i.e., the things to print) in
+parentheses.
+
To add a string to the printout, wrap the string in single or double
+quotes.
+
The values passed to the function are called
+arguments
+
+
+
+
PYTHON
+
+
print(first_name, 'is', age, 'years old')
+
+
+
OUTPUT
+
+
Ahmed is 42 years old
+
+
+
+print automatically puts a single space between items
+to separate them.
+
And wraps around to a new line at the end.
+
Variables must be created before they are used.
+
+
+
+
If a variable doesn’t exist yet, or if the name has been
+mis-spelled, Python reports an error. (Unlike some languages, which
+“guess” a default value.)
+
+
+
PYTHON
+
+
print(last_name)
+
+
+
ERROR
+
+
---------------------------------------------------------------------------
+NameError Traceback (most recent call last)
+<ipython-input-1-c1fbb4e96102> in <module>()
+----> 1 print(last_name)
+
+NameError: name 'last_name' is not defined
+
+
+
The last line of an error message is usually the most
+informative.
Be aware that it is the order of execution of cells that is
+important in a Jupyter notebook, not the order in which they appear.
+Python will remember all the code that was run previously,
+including any variables you have defined, irrespective of the order in
+the notebook. Therefore if you define variables lower down the notebook
+and then (re)run cells further up, those defined further down will still
+be present. As an example, create two cells with the following content,
+in this order:
+
+
PYTHON
+
+
print(myval)
+
+
+
PYTHON
+
+
myval = 1
+
+
If you execute this in order, the first cell will give an error.
+However, if you run the first cell after the second cell it
+will print out 1. To prevent confusion, it can be helpful
+to use the Kernel -> Restart & Run All
+option which clears the interpreter and runs everything from a clean
+slate going top to bottom.
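
The effect can be imitated inside a single cell. This sketch uses try/except, which has not been introduced yet, purely so the cell keeps running after the first failure. It assumes myval has not been defined earlier in the session; the name first_attempt is hypothetical.

```python
# First attempt: myval does not exist yet, so Python raises NameError.
try:
    print(myval)
    first_attempt = 'worked'
except NameError as error:
    first_attempt = 'failed'
    print('first attempt failed:', error)

# Define the variable, as the second cell does.
myval = 1

# Second attempt: the name is now known, so this prints 1.
print(myval)
```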
+
+
+
+
Variables can be used in calculations.
+
+
+
+
We can use variables in calculations just as if they were values.
+
+
Remember, we assigned the value 42 to age
+a few lines ago.
+
+
+
+
+
PYTHON
+
+
age = age + 3
+print('Age in three years:', age)
+
+
+
OUTPUT
+
+
Age in three years: 45
+
+
Use an index to get a single character from a string.
+
+
+
+
The characters (individual letters, numbers, and so on) in a string
+are ordered. For example, the string 'AB' is not the same
+as 'BA'. Because of this ordering, we can treat the string
+as a list of characters.
+
Each position in the string (first, second, etc.) is given a number.
+This number is called an index or sometimes a
+subscript.
+
Indices are numbered from 0.
+
Use the position’s index in square brackets to get the character at
+that position.
+
+
+
PYTHON
+
+
atom_name = 'helium'
+print(atom_name[0])
+
+
+
OUTPUT
+
+
h
+
+
Use a slice to get a substring.
+
+
+
+
A part of a string is called a substring. A
+substring can be as short as a single character.
+
An item in a list is called an element. Whenever we treat a string
+as if it were a list, the string’s elements are its individual
+characters.
+
A slice is a part of a string (or, more generally, a part of any
+list-like thing).
+
We take a slice with the notation [start:stop], where
+start is the integer index of the first element we want and
+stop is the integer index of the element just
+after the last element we want.
+
The difference between stop and start is
+the slice’s length.
+
Taking a slice does not change the contents of the original string.
+Instead, taking a slice returns a copy of part of the original
+string.
+
+
+
PYTHON
+
+
atom_name = 'sodium'
+print(atom_name[0:3])
+
+
+
OUTPUT
+
+
sod
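
The two properties of slices described above (the length equals stop minus start, and the original string is left untouched) can be checked directly. A small sketch, in which the names start, stop, and piece are illustrative:

```python
atom_name = 'sodium'
start = 0
stop = 3
piece = atom_name[start:stop]

print(piece)                       # sod
print(len(piece) == stop - start)  # True
print(atom_name)                   # sodium (the original is unchanged)
```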
+
+
Use the built-in function len to find the length of a
+string.
+
+
+
+
PYTHON
+
+
print(len('helium'))
+
+
+
OUTPUT
+
+
6
+
+
+
Nested functions are evaluated from the inside out, like in
+mathematics.
+
Python is case-sensitive.
+
+
+
+
Python thinks that upper- and lower-case letters are different, so
+Name and name are different variables.
+
There are conventions for using upper-case letters at the start of
+variable names so we will use lower-case letters for now.
+
Use meaningful variable names.
+
+
+
+
Python doesn’t care what you call variables as long as they obey the
+rules (alphanumeric characters and the underscore).
Use meaningful variable names to help other people understand what
+the program does.
+
The most important “other person” is your future self.
+
+
+
+
+
+
+
Swapping Values
+
+
Fill the table showing the values of the variables in this program
+after each statement is executed.
+
+
PYTHON
+
+
# Command  # Value of x # Value of y # Value of swap #
+x = 1.0   #            #            #               #
+y = 3.0   #            #            #               #
+swap = x  #            #            #               #
+x = y     #            #            #               #
+y = swap  #            #            #               #
+
+
+
+
+
+
+
+
+
+
+
OUTPUT
+
+
# Command  # Value of x # Value of y  # Value of swap #
+x = 1.0   # 1.0        # not defined # not defined   #
+y = 3.0   # 1.0        # 3.0         # not defined   #
+swap = x  # 1.0        # 3.0         # 1.0           #
+x = y     # 3.0        # 3.0         # 1.0           #
+y = swap  # 3.0        # 1.0         # 1.0           #
+
+
These three lines exchange the values in x and
+y using the swap variable for temporary
+storage. This is a fairly common programming idiom.
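
Running the program with a print before and after the exchange is one way to confirm the table. A minimal sketch that adds only the two print calls to the statements from the challenge:

```python
x = 1.0
y = 3.0
print('before:', x, y)  # before: 1.0 3.0

swap = x  # remember the old value of x
x = y     # x takes the value of y
y = swap  # y takes the remembered value
print('after:', x, y)   # after: 3.0 1.0
```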
+
+
+
+
+
+
+
+
+
+
Predicting Values
+
+
What is the final value of position in the program
+below? (Try to predict the value without running the program, then check
+your prediction.)
The initial variable is assigned the value
+'left'. In the second line, the position
+variable also receives the string value 'left'. In the third
+line, the initial variable is given the value
+'right', but the position variable retains its
+string value of 'left'.
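
The program itself is not shown here. Judging from the explanation above, it is presumably along these lines (a reconstruction from the description, not necessarily the exact original):

```python
initial = 'left'
position = initial   # position now refers to the string 'left'
initial = 'right'    # reassigning initial does not affect position
print(position)      # left
```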
+
+
+
+
+
+
+
+
+
+
Challenge
+
+
If you assign a = 123, what happens if you try to get
+the second digit of a via a[1]?
+
+
+
+
+
+
+
+
+
Numbers are not strings or sequences and Python will raise an error
+if you try to perform an index operation on a number. In the next lesson on types and type
+conversion we will learn more about types and how to convert between
+different types. If you want the Nth digit of a number you can convert
+it into a string using the str built-in function and then
+perform an index operation on that string.
+
+
PYTHON
+
+
a = 123
+print(a[1])
+
+
+
ERROR
+
+
TypeError: 'int' object is not subscriptable
+
+
+
PYTHON
+
+
a = str(123)
+print(a[1])
+
+
+
OUTPUT
+
+
2
+
+
+
+
+
+
+
+
+
+
+
Choosing a Name
+
+
Which is a better variable name, m, min, or
+minutes? Why? Hint: think about which code you would rather
+inherit from someone who is leaving the lab:
+
+
ts = m * 60 + s
+
tot_sec = min * 60 + sec
+
total_seconds = minutes * 60 + seconds
+
+
+
+
+
+
+
+
+
+
minutes is better because min might mean
+something like “minimum” (and actually is an existing built-in function
+in Python that we will cover later).
+species_name[11:] (without a value after the
+colon)
+
+species_name[:4] (without a value before the
+colon)
+
+species_name[:] (just a colon)
+
species_name[11:-3]
+
species_name[-5:-3]
+
What happens when you choose a stop value which is out
+of range? (i.e., try species_name[0:20] or
+species_name[:103])
+
+
+
+
+
+
+
+
+
+
+
+species_name[2:8] returns the substring
+'acia b'
+
+
+species_name[11:] returns the substring
+'folia', from position 11 until the end
+
+species_name[:4] returns the substring
+'Acac', from the start up to but not including position
+4
+
+species_name[:] returns the entire string
+'Acacia buxifolia'
+
+
+species_name[11:-3] returns the substring
+'fo', from the 11th position to the third last
+position
+
+species_name[-5:-3] also returns the substring
+'fo', from the fifth last position to the third last
+
If a part of the slice is out of range, the operation does not fail.
+species_name[0:20] gives the same result as
+species_name[0:], and species_name[:103] gives
+the same result as species_name[:]
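
These answers can be verified in one cell. The sketch assumes species_name holds 'Acacia buxifolia', the full string given in the answers above:

```python
species_name = 'Acacia buxifolia'

print(species_name[2:8])    # acia b
print(species_name[11:])    # folia
print(species_name[:4])     # Acac
print(species_name[:])      # Acacia buxifolia
print(species_name[11:-3])  # fo
print(species_name[-5:-3])  # fo

# Out-of-range stop values are clipped to the end of the string.
print(species_name[0:20] == species_name[0:])  # True
print(species_name[:103] == species_name[:])   # True
```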
+
+
+
+
+
+
+
+
+
+
+
+
Key Points
+
+
+
Use variables to store values.
+
Use print to display values.
+
Variables persist between cells.
+
Variables must be created before they are used.
+
Variables can be used in calculations.
+
Use an index to get a single character from a string.
+
Use a slice to get a substring.
+
Use the built-in function len to find the length of a
+string.
Explain key differences between integers and floating point
+numbers.
+
Explain key differences between numbers and character strings.
+
Use built-in functions to convert between integers, floating point
+numbers, and strings.
+
+
+
+
+
+
+
Every value has a type.
+
+
+
+
Every value in a program has a specific type.
+
Integer (int): represents positive or negative whole
+numbers like 3 or -512.
+
Floating point number (float): represents real numbers
+like 3.14159 or -2.5.
+
Character string (usually called “string”, str): text.
+
+
Written in either single quotes or double quotes (as long as they
+match).
+
The quote marks aren’t printed when the string is displayed.
+
+
+
Use the built-in function type to find the type of a
+value.
+
+
+
+
Use the built-in function type to find out what type a
+value has.
+
Works on variables as well.
+
+
But remember: the value has the type — the
+variable is just a label.
+
+
+
+
+
PYTHON
+
+
print(type(52))
+
+
+
OUTPUT
+
+
<class 'int'>
+
+
+
PYTHON
+
+
fitness = 'average'
+print(type(fitness))
+
+
+
OUTPUT
+
+
<class 'str'>
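
Because the type belongs to the value rather than to the name, the same variable can refer to values of different types over time. A small sketch (the variable names measurement, first_type, and second_type are made up for illustration):

```python
measurement = 52
first_type = type(measurement)
print(first_type)   # <class 'int'>

# Assigning a new value changes what the label points to,
# and the type reported follows the new value.
measurement = 'fifty-two'
second_type = type(measurement)
print(second_type)  # <class 'str'>
```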
+
+
Types control what operations (or methods) can be performed on a
+given value.
+
+
+
+
A value’s type determines what the program can do to it.
+
+
+
PYTHON
+
+
print(5 - 3)
+
+
+
OUTPUT
+
+
2
+
+
+
PYTHON
+
+
print('hello' - 'h')
+
+
+
ERROR
+
+
---------------------------------------------------------------------------
+TypeError Traceback (most recent call last)
+<ipython-input-2-67f5626a1e07> in <module>()
+----> 1 print('hello' - 'h')
+
+TypeError: unsupported operand type(s) for -: 'str' and 'str'
+
+
You can use the “+” and “*” operators on strings.
+
+
+
+
“Adding” character strings concatenates them.
+
+
+
PYTHON
+
+
full_name = 'Ahmed' + ' ' + 'Walsh'
+print(full_name)
+
+
+
OUTPUT
+
+
Ahmed Walsh
+
+
+
Multiplying a character string by an integer N creates a
+new string that consists of that character string repeated N
+times.
+
+
Since multiplication is repeated addition.
+
+
+
+
+
PYTHON
+
+
separator = '=' * 10
+print(separator)
+
+
+
OUTPUT
+
+
==========
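
The "repeated addition" reading can be checked by comparing the two forms. A minimal sketch with an arbitrary two-character string (the names repeated and added are illustrative):

```python
repeated = 'ab' * 3
added = 'ab' + 'ab' + 'ab'
print(repeated)           # ababab
print(repeated == added)  # True
```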
+
+
Strings have a length (but numbers don’t).
+
+
+
+
The built-in function len counts the number of
+characters in a string.
+
+
+
PYTHON
+
+
print(len(full_name))
+
+
+
OUTPUT
+
+
11
+
+
+
But numbers don’t have a length (not even zero).
+
+
+
PYTHON
+
+
print(len(52))
+
+
+
ERROR
+
+
---------------------------------------------------------------------------
+TypeError Traceback (most recent call last)
+<ipython-input-3-f769e8e8097d> in <module>()
+----> 1 print(len(52))
+
+TypeError: object of type 'int' has no len()
+
+
Must convert numbers to strings or vice versa when operating on
+them.
+
+
+
+
Cannot add numbers and strings.
+
+
+
PYTHON
+
+
print(1 + '2')
+
+
+
ERROR
+
+
---------------------------------------------------------------------------
+TypeError Traceback (most recent call last)
+<ipython-input-4-fe4f54a023c6> in <module>()
+----> 1 print(1 + '2')
+
+TypeError: unsupported operand type(s) for +: 'int' and 'str'
+
+
+
Not allowed because it’s ambiguous: should 1 + '2' be
+3 or '12'?
+
Some types can be converted to other types by using the type name as
+a function.
+
+
+
PYTHON
+
+
print(1 + int('2'))
+print(str(1) + '2')
+
+
+
OUTPUT
+
+
3
+12
+
+
Can mix integers and floats freely in operations.
+
+
+
+
Integers and floating-point numbers can be mixed in arithmetic.
+
+
Python 3 automatically converts integers to floats as needed.
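
A short sketch of mixed arithmetic, with values chosen only for illustration. In each case the integer operand is converted to a float before the operation, so the result is a float:

```python
half = 1 / 2.0
three_squared = 3.0 ** 2
mixed_sum = 1 + 2.0

print('half is', half)                     # half is 0.5
print('three squared is', three_squared)   # three squared is 9.0
print(mixed_sum, 'is', type(mixed_sum))    # 3.0 is <class 'float'>
```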
The computer reads the value of variable_one when doing
+the multiplication, creates a new value, and assigns it to
+variable_two.
+
Afterwards, the value of variable_two is set to the new
+value and not dependent on variable_one so its
+value does not automatically change when variable_one
+changes.
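
The program this explanation refers to is not shown here. A sketch consistent with the description, in which the specific numbers are assumptions:

```python
variable_one = 1
variable_two = 5 * variable_one  # computed once, using the value 1
variable_one = 2                 # variable_two is not recalculated
print('first is', variable_one, 'and second is', variable_two)
```

The second variable keeps the value 5 even though the first one has changed to 2.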
+
+
+
+
+
+
+
Fractions
+
+
What type of value is 3.4? How can you find out?
+
+
+
+
+
+
+
+
+
It is a floating-point number (often abbreviated “float”). It is
+possible to find out by using the built-in function
+type().
+
+
PYTHON
+
+
print(type(3.4))
+
+
+
OUTPUT
+
+
<class 'float'>
+
+
+
+
+
+
+
+
+
+
+
Automatic Type Conversion
+
+
What type of value is 3.25 + 4?
+
+
+
+
+
+
+
+
+
It is a float: integers are automatically converted to floats as
+necessary.
+
+
PYTHON
+
+
result = 3.25 + 4
+print(result, 'is', type(result))
+
+
+
OUTPUT
+
+
7.25 is <class 'float'>
+
+
+
+
+
+
+
+
+
+
+
Choose a Type
+
+
What type of value (integer, floating point number, or character
+string) would you use to represent each of the following? Try to come up
+with more than one good answer for each problem. For example, in # 1,
+when would counting days with a floating point variable make more sense
+than using an integer?
+
+
Number of days since the start of the year.
+
Time elapsed from the start of the year until now in days.
+
Serial number of a piece of lab equipment.
+
A lab specimen’s age
+
Current population of a city.
+
Average population of a city over time.
+
+
+
+
+
+
+
+
+
+
The answers to the questions are:
+
+
Integer, since the number of days would lie between 1 and 365.
+
Floating point, since fractional days are required
+
Character string if serial number contains letters and numbers,
+otherwise integer if the serial number consists only of numerals
+
This will vary! How do you define a specimen’s age? Whole days since
+collection (integer)? Date and time (string)?
+
Choose floating point to represent population as large aggregates
+(e.g., millions), or integer to represent population in units of
+individuals.
+
Floating point number, since an average is likely to have a
+fractional part.
+
+
+
+
+
+
+
+
+
+
+
Division Types
+
+
In Python 3, the // operator performs integer
+(whole-number) floor division, the / operator performs
+floating-point division, and the % (or modulo)
+operator calculates and returns the remainder from integer division:
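
As a sketch of the three operators, using 5 and 3 as illustrative operands (the variable names are not from the lesson):

```python
quotient = 5 // 3   # floor division keeps only the whole-number part
ratio = 5 / 3       # floating-point division
remainder = 5 % 3   # what is left over after floor division

print('5 // 3:', quotient)   # 5 // 3: 1
print('5 / 3:', ratio)       # 5 / 3: 1.6666666666666667
print('5 % 3:', remainder)   # 5 % 3: 2
```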
If num_subjects is the number of subjects taking part in
+a study, and num_per_survey is the number that can take
+part in a single survey, write an expression that calculates the number
+of surveys needed to reach everyone once.
+
+
+
+
+
+
+
+
+
We want the minimum number of surveys that reaches everyone once,
+which is the rounded up value of
+num_subjects/ num_per_survey. This is equivalent to
+performing a floor division with // and adding 1. Before
+the division we need to subtract 1 from the number of subjects to deal
+with the case where num_subjects is evenly divisible by
+num_per_survey.
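
The rounding-up calculation described above can be written as follows. The counts 600, 42, and 84 are assumptions chosen to show both the uneven case and the evenly divisible case:

```python
num_subjects = 600
num_per_survey = 42

# Subtract 1 first so that an exact multiple does not gain an extra survey.
num_surveys = (num_subjects - 1) // num_per_survey + 1
print(num_subjects, 'subjects,', num_per_survey, 'per survey:', num_surveys)

# Evenly divisible case: 84 subjects at 42 per survey needs exactly 2 surveys.
even_surveys = (84 - 1) // num_per_survey + 1
print(84, 'subjects,', num_per_survey, 'per survey:', even_surveys)
```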
Where reasonable, float() will convert a string to a
+floating point number, and int() will convert a floating
+point number to an integer:
+
+
PYTHON
+
+
print("string to float:", float("3.4"))
+print("float to int:", int(3.4))
+
+
+
OUTPUT
+
+
string to float: 3.4
+float to int: 3
+
+
If the conversion doesn’t make sense, however, an error message will
+occur.
+
+
PYTHON
+
+
print("string to float:", float("Hello world!"))
+
+
+
ERROR
+
+
---------------------------------------------------------------------------
+ValueError Traceback (most recent call last)
+<ipython-input-5-df3b790bf0a2> in <module>
+----> 1 print("string to float:", float("Hello world!"))
+
+ValueError: could not convert string to float: 'Hello world!'
+
+
Given this information, what do you expect the following program to
+do?
+
What does it actually do?
+
Why do you think it does that?
+
+
PYTHON
+
+
print("fractional string to int:", int("3.4"))
+
+
+
+
+
+
+
+
+
+
What do you expect this program to do? It would not be so
+unreasonable to expect the Python 3 int command to convert
+the string “3.4” to 3.4 and an additional type conversion to 3. After
+all, Python 3 performs a lot of other magic - isn’t that part of its
+charm?
+
+
PYTHON
+
+
int("3.4")
+
+
+
ERROR
+
+
---------------------------------------------------------------------------
+ValueError Traceback (most recent call last)
+<ipython-input-2-ec6729dfccdc> in <module>
+----> 1 int("3.4")
+ValueError: invalid literal for int() with base 10: '3.4'
+
+
However, Python 3 raises an error. Why? Possibly for consistency:
+if you want two consecutive type conversions, you must write each one
+explicitly in code.
+
+
PYTHON
+
+
int(float("3.4"))
+
+
+
OUTPUT
+
+
3
+
+
+
+
+
+
+
+
+
+
+
Arithmetic with Different Types
+
+
Which of the following will return the floating point number
+2.0? Note: there may be more than one right answer.
+
+
PYTHON
+
+
first = 1.0
+second = "1"
+third = "1.1"
+
+
+
first + float(second)
+
float(second) + float(third)
+
first + int(third)
+
first + int(float(third))
+
int(first) + int(float(third))
+
2.0 * second
+
+
+
+
+
+
+
+
+
+
Answer: 1 and 4
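
Each option can be checked by running it. The sketch below stores the four options that run without error and uses try/except (not yet introduced in the lesson) only to show that options 3 and 6 fail. The option_N names are hypothetical.

```python
first = 1.0
second = "1"
third = "1.1"

option_1 = first + float(second)
option_2 = float(second) + float(third)
option_4 = first + int(float(third))
option_5 = int(first) + int(float(third))

print(option_1)  # 2.0
print(option_2)  # 2.1
print(option_4)  # 2.0
print(option_5)  # 2 (an integer, not the float 2.0)

# Option 3: int() cannot convert the string "1.1" directly.
try:
    first + int(third)
except ValueError as error:
    print('option 3 fails:', error)

# Option 6: a string cannot be multiplied by a float.
try:
    2.0 * second
except TypeError as error:
    print('option 6 fails:', error)
```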
+
+
+
+
+
+
+
+
+
+
Complex Numbers
+
+
Python provides complex numbers, which are written as
+1.0+2.0j. If val is a complex number, its real
+and imaginary parts can be accessed using dot notation as
+val.real and val.imag.
Why do you think Python uses j instead of
+i for the imaginary part?
+
What do you expect 1 + 2j + 3 to produce?
+
What do you expect 4j to be? What about
+4 j or 4 + j?
+
+
+
+
+
+
+
+
+
+
+
Standard mathematics treatments typically use i to
+denote an imaginary number. However, j is reportedly an early
+convention borrowed from electrical engineering, and changing it now
+would be technically expensive. Stack
+Overflow provides additional explanation and discussion.
+
+
(4+2j)
+
+4j and SyntaxError: invalid syntax. In
+the last case (4 + j), j is treated as a variable, so the
+result depends on whether j is defined and, if so, on its
+assigned value.
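
The expressions that are valid on their own can be run directly. A minimal sketch (the variable name val follows the challenge text, and 4 j is left out because it cannot be parsed at all):

```python
val = 1 + 2j + 3
print(val)       # (4+2j)
print(val.real)  # 4.0
print(val.imag)  # 2.0
print(4j)        # 4j
```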
+
+
+
+
+
+
+
+
+
+
+
Key Points
+
+
+
Every value has a type.
+
Use the built-in function type to find the type of a
+value.
+
Types control what operations can be done on values.
+
Strings can be added and multiplied.
+
Strings have a length (but numbers don’t).
+
Must convert numbers to strings or vice versa when operating on
+them.
+
Can mix integers and floats freely in operations.
+
Variables only change value when something is assigned to them.
Use help to display documentation for built-in functions.
+
Correctly describe situations in which SyntaxError and NameError
+occur.
+
+
+
+
+
+
+
Use comments to add documentation to programs.
+
+
+
+
PYTHON
+
+
# This sentence isn't executed by Python.
+adjustment = 0.5  # Neither is this - anything after '#' is ignored.
+
+
A function may take zero or more arguments.
+
+
+
+
We have seen some functions already — now let’s take a closer
+look.
+
An argument is a value passed into a function.
+
+len takes exactly one.
+
+int, str, and float create a
+new value from an existing one.
+
+print takes zero or more.
+
+print with no arguments prints a blank line.
+
+
Must always use parentheses, even if they’re empty, so that Python
+knows a function is being called.
+
+
+
+
+
PYTHON
+
+
print('before')
+print()
+print('after')
+
+
+
OUTPUT
+
+
before
+
+after
+
+
Every function returns something.
+
+
+
+
Every function call produces some result.
+
If the function doesn’t have a useful result to return, it usually
+returns the special value None. None is a
+Python object that stands in anytime there is no value.
+
+
+
PYTHON
+
+
result = print('example')
+print('result of print is', result)
+
+
+
OUTPUT
+
+
example
+result of print is None
+
+
Commonly-used built-in functions include max,
+min, and round.
+
+
+
+
Use max to find the largest value of one or more
+values.
+
Use min to find the smallest.
+
Both work on character strings as well as numbers.
+
+
“Larger” and “smaller” use (0-9, A-Z, a-z) to compare letters.
+
+
+
+
+
PYTHON
+
+
print(max(1, 2, 3))
+print(min('a', 'A', '0'))
+
+
+
OUTPUT
+
+
3
+0
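
The ordering comes from the numeric code of each character, which the built-in function ord reports. ord is not covered in this lesson and is used here only for illustration; the variable names are hypothetical.

```python
digit_code = ord('0')
upper_code = ord('A')
lower_code = ord('a')

print(digit_code, upper_code, lower_code)  # 48 65 97
print('0' < 'A' < 'a')                     # True
print(max('a', 'A', '0'))                  # a
```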
+
+
Functions may only work for certain (combinations of)
+arguments.
+
+
+
+
+max and min must be given at least one
+argument.
+
+
“Largest of the empty set” is a meaningless question.
+
+
+
And they must be given things that can meaningfully be
+compared.
+
+
+
PYTHON
+
+
print(max(1, 'a'))
+
+
+
ERROR
+
+
TypeError Traceback (most recent call last)
+<ipython-input-52-3f049acf3762> in <module>
+----> 1 print(max(1, 'a'))
+
+TypeError: '>' not supported between instances of 'str' and 'int'
+
+
Functions may have default values for some arguments.
+
+
+
+
+round will round off a floating-point number.
+
By default, rounds to zero decimal places.
+
+
+
PYTHON
+
+
round(3.712)
+
+
+
OUTPUT
+
+
4
+
+
+
We can specify the number of decimal places we want.
+
+
+
PYTHON
+
+
round(3.712, 1)
+
+
+
OUTPUT
+
+
3.7
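
The default also affects the type of the result: without a second argument round returns an integer, while with one it returns a float. A small sketch using the same number (the names whole and one_place are illustrative):

```python
whole = round(3.712)
one_place = round(3.712, 1)

print(whole, type(whole))          # 4 <class 'int'>
print(one_place, type(one_place))  # 3.7 <class 'float'>
```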
+
+
Functions attached to objects are called methods
+
+
+
+
Functions take another form that will be common in the pandas
+episodes.
+
Methods have parentheses like functions, but come after the
+variable.
+
Some methods are used for internal Python operations, and are marked
+with double underscores.
+
+
+
PYTHON
+
+
my_string = 'Hello world!'  # creation of a string object
+
+print(len(my_string)) # the len function takes a string as an argument and returns the length of the string
+
+print(my_string.swapcase()) # calling the swapcase method on the my_string object
+
+print(my_string.__len__()) # calling the internal __len__ method on the my_string object, used by len(my_string)
+
+
+
OUTPUT
+
+
12
+hELLO WORLD!
+12
+
+
+
You might even see them chained together. They operate left to
+right.
+
+
+
PYTHON
+
+
print(my_string.isupper()) # Not all the letters are uppercase
+print(my_string.upper()) # This capitalizes all the letters
+
+print(my_string.upper().isupper()) # Now all the letters are uppercase
+
+
+
OUTPUT
+
+
False
+HELLO WORLD!
+True
+
+
Use the built-in function help to get help for a
+function.
+
+
+
+
Every built-in function has online documentation.
+
+
+
PYTHON
+
+
help(round)
+
+
+
OUTPUT
+
+
Help on built-in function round in module builtins:
+
+round(number, ndigits=None)
+ Round a number to a given precision in decimal digits.
+
+ The return value is an integer if ndigits is omitted or None. Otherwise
+ the return value has the same type as the number. ndigits may be negative.
+
+
The Jupyter Notebook has two ways to get help.
+
+
+
+
Option 1: Place the cursor near where the function is invoked in a
+cell (i.e., the function name or its parameters),
+
+
Hold down Shift, and press Tab.
+
Do this several times to expand the information returned.
+
+
+
Option 2: Type the function name in a cell with a question mark
+after it. Then run the cell.
+
Python reports a syntax error when it can’t understand the source of
+a program.
+
+
+
+
Won’t even try to run the program if it can’t be parsed.
+
+
+
PYTHON
+
+
# Forgot to close the quote marks around the string.
+name = 'Feng
+
+
+
ERROR
+
+
File "<ipython-input-56-f42768451d55>", line 2
+ name = 'Feng
+ ^
+SyntaxError: EOL while scanning string literal
+
+
+
PYTHON
+
+
# An extra '=' in the assignment.
+age = = 52
+
+
+
ERROR
+
+
File "<ipython-input-57-ccc3df3cf902>", line 2
+ age = = 52
+ ^
+SyntaxError: invalid syntax
+
+
+
Look more closely at the error message:
+
+
+
PYTHON
+
+
print("hello world"
+
+
+
ERROR
+
+
File "<ipython-input-6-d1cc229bf815>", line 1
+ print ("hello world"
+ ^
+SyntaxError: unexpected EOF while parsing
+
+
+
The message indicates a problem on the first line of the input (“line
+1”).
+
+
In this case the “ipython-input” section of the file name tells us
+that we are working with input into IPython, the Python interpreter used
+by the Jupyter Notebook.
+
+
+
The -6- part of the filename indicates that the error
+occurred in cell 6 of our Notebook.
+
Next is the problematic line of code, indicating the problem with a
+^ pointer.
+
Python reports a runtime error when something goes wrong while a
+program is executing.
+
+
+
+
PYTHON
+
+
age = 53
+remaining = 100 - aege  # mis-spelled 'age'
+
+
+
ERROR
+
+
NameError Traceback (most recent call last)
+<ipython-input-59-1214fb6c55fc> in <module>
+ 1 age = 53
+----> 2 remaining = 100 - aege # mis-spelled 'age'
+
+NameError: name 'aege' is not defined
+
+
+
Fix syntax errors by reading the source and runtime errors by
+tracing execution.
+
+
+
+
+
+
+
What Happens When
+
+
+
Explain in simple terms the order of operations in the following
+program: when does the addition happen, when does the subtraction
+happen, when is each function called, etc.
max(len(rich), poor) throws a TypeError. This turns into
+max(4, 'tin') and as we discussed earlier a string and
+integer cannot meaningfully be compared.
+
+
ERROR
+
+
TypeError Traceback (most recent call last)
+<ipython-input-65-bc82ad05177a> in <module>
+----> 1 max(len(rich), poor)
+
+TypeError: '>' not supported between instances of 'str' and 'int'
+
+
+
+
+
+
+
+
+
+
+
Why Not?
+
+
Why is it that max and min do not return
+None when they are called with no arguments?
+
+
+
+
+
+
+
+
+
max and min raise a TypeError in this case
+because the required number of arguments was not supplied. If they simply
+returned None, the mistake would be much harder to trace: the
+None would probably be stored in a variable and used later in the program,
+where it would likely cause a less obvious runtime error.
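
The behaviour can be observed directly. This sketch uses try/except, which is outside the scope of this lesson, only so that the cell finishes after the error; the flag name raised_type_error is hypothetical.

```python
raised_type_error = False
try:
    max()  # no arguments supplied
except TypeError as error:
    raised_type_error = True
    print('max() with no arguments fails:', error)

print(raised_type_error)  # True
```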
+
+
+
+
+
+
+
+
+
+
Last Character of a String
+
+
If Python starts counting from zero, and len returns the
+number of characters in a string, what index expression will get the
+last character in the string name? (Note: we will see a
+simpler way to do this in a later episode.)
+
+
+
+
+
+
+
+
+
name[len(name) - 1]
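
A worked instance of this expression, assuming a sample value for name (the value 'helium' and the variable last_index are illustrative):

```python
name = 'helium'  # hypothetical example value
last_index = len(name) - 1

print(last_index)        # 5
print(name[last_index])  # m
```

Indices run from 0 to len(name) - 1, so using len(name) itself as the index would raise an IndexError.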
+
+
+
+
+
+
+
+
+
+
Explore the Python docs!
+
+
The official Python
+documentation is arguably the most complete source of information
+about the language. It is available in different languages and contains
+a lot of useful resources. The Built-in
+Functions page contains a catalogue of all of these functions,
+including the ones that we’ve covered in this lesson. Some of these are
+more advanced and unnecessary at the moment, but others are very simple
+and useful.
+
+
+
+
+
+
+
+
+
Key Points
+
+
+
Use comments to add documentation to programs.
+
A function may take zero or more arguments.
+
Commonly-used built-in functions include max,
+min, and round.
+
Functions may only work for certain (combinations of)
+arguments.
+
Functions may have default values for some arguments.
+
Use the built-in function help to get help for a
+function.
+
The Jupyter Notebook has two ways to get help.
+
Every function returns something.
+
Python reports a syntax error when it can’t understand the source of
+a program.
+
Python reports a runtime error when something goes wrong while a
+program is executing.
+
Fix syntax errors by reading the source code, and runtime errors by
+tracing the program’s execution.
How can I use software that other people have written?
+
How can I find out what that software does?
+
+
+
+
+
+
+
+
Objectives
+
+
Explain what software libraries are and why programmers create and
+use them.
+
Write programs that import and use modules from Python’s standard
+library.
+
Find and read documentation for the standard library interactively
+(in the interpreter) and online.
+
+
+
+
+
+
+
Most of the power of a programming language is in its
+libraries.
+
+
+
+
A library is a collection of files (called
+modules) that contains functions for use by other programs.
+
+
May also contain data values (e.g., numerical constants) and other
+things.
+
Library’s contents are supposed to be related, but there’s no way to
+enforce that.
+
+
+
The Python standard
+library is an extensive suite of modules that comes with Python
+itself.
+
Many additional libraries are available from PyPI (the Python Package
+Index).
+
We will see later how to write new libraries.
+
+
+
+
+
+
+
Libraries and modules
+
+
A library is a collection of modules, but the terms are often used
+interchangeably, especially since many libraries only consist of a
+single module, so don’t worry if you mix them.
+
+
+
+
A program must import a library module before using it.
+
+
+
+
Use import to load a library module into a program’s
+memory.
+
Then refer to things from the module as
+module_name.thing_name.
+
+
Python uses . to mean “part of”.
+
+
+
Using math, one of the modules in the standard
+library:
+
+
+
PYTHON
+
+
import math
+
+print('pi is', math.pi)
+print('cos(pi) is', math.cos(math.pi))
+
+
+
OUTPUT
+
+
pi is 3.141592653589793
+cos(pi) is -1.0
+
+
+
Have to refer to each item with the module’s name.
+
+
+math.cos(pi) won’t work: the reference to
+pi doesn’t somehow “inherit” the function’s reference to
+math.
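+
A minimal sketch of that failure, assuming a fresh session in which only the math module itself has been imported (the try/except wrapper is not part of the lesson so far; it is only there so the example keeps running after the error):

```python
import math

# With a plain `import math`, the bare name `pi` is not defined;
# only `math.pi` is.
try:
    print(math.cos(pi))
except NameError as error:
    print("NameError:", error)

# Prefixing both names with the module name works.
result = math.cos(math.pi)
print("cos(pi) is", result)
```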
+
+
+
Use help to learn about the contents of a library
+module.
+
+
+
+
Works just like help for a function.
+
+
+
PYTHON
+
+
help(math)
+
+
+
OUTPUT
+
+
Help on module math:
+
+NAME
+ math
+
+MODULE REFERENCE
+ http://docs.python.org/3/library/math
+
+ The following documentation is automatically generated from the Python
+ source files. It may be incomplete, incorrect or include features that
+ are considered implementation detail and may vary between Python
+ implementations. When in doubt, consult the module reference at the
+ location listed above.
+
+DESCRIPTION
+ This module is always available. It provides access to the
+ mathematical functions defined by the C standard.
+
+FUNCTIONS
+ acos(x, /)
+ Return the arc cosine (measured in radians) of x.
+⋮ ⋮ ⋮
+
+
Import specific items from a library module to shorten
+programs.
+
+
+
+
Use from ... import ... to load only specific items
+from a library module.
+
Then refer to them directly without library name as prefix.
+
+
+
PYTHON
+
+
from math import cos, pi
+
+print('cos(pi) is', cos(pi))
+
+
+
OUTPUT
+
+
cos(pi) is -1.0
+
+
Create an alias for a library module when importing it to shorten
+programs.
+
+
+
+
Use import ... as ... to give a library a short
+alias while importing it.
+
Then refer to items in the library using that shortened name.
+
+
+
PYTHON
+
+
import math as m
+
+print('cos(pi) is', m.cos(m.pi))
+
+
+
OUTPUT
+
+
cos(pi) is -1.0
+
+
+
Commonly used for libraries that are frequently used or have long
+names.
+
+
E.g., the matplotlib plotting library is often aliased
+as mpl.
+
+
+
But can make programs harder to understand, since readers must learn
+your program’s aliases.
+
+
+
+
+
+
+
Exploring the Math Module
+
+
+
What function from the math module can you use to
+calculate a square root without using sqrt?
+
Since the library contains this function, why does sqrt
+exist?
+
+
+
+
+
+
+
+
+
+
+
Using help(math) we see that we’ve got
+pow(x,y) in addition to sqrt(x), so we could
+use pow(x, 0.5) to find a square root.
+
The sqrt(x) function is arguably more readable than
+pow(x, 0.5) when implementing equations. Readability is a
+cornerstone of good programming, so it makes sense to provide a special
+function for this specific common case.
+
+
Also, the design of Python’s math library has its origin
+in the C standard, which includes both sqrt(x) and
+pow(x,y), so a little bit of the history of programming is
+showing in Python’s function names.
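+
A short sketch comparing the two spellings; the input value 16.0 is an arbitrary choice for illustration:

```python
import math

x = 16.0  # arbitrary example value

root_from_sqrt = math.sqrt(x)
root_from_pow = math.pow(x, 0.5)

print(root_from_sqrt, root_from_pow)

# Compare with a tolerance rather than ==, since the two functions
# are not guaranteed to round identically for every input.
print(math.isclose(root_from_sqrt, root_from_pow))
```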
+
+
+
+
+
+
+
+
+
+
Locating the Right Module
+
+
You want to select a random character from a string:
The string has 11 characters, each having a positional index from 0
+to 10. You could use the random.randrange
+or random.randint
+functions to get a random integer between 0 and 10, and then select the
+bases character at that index:
+
+
PYTHON
+
+
from random import randrange
+
+random_index = randrange(len(bases))
+print(bases[random_index])
+
+
or more compactly:
+
+
PYTHON
+
+
from random import randrange
+
+print(bases[randrange(len(bases))])
+
+
Perhaps you found the random.sample
+function? It allows for slightly less typing but might be a bit harder
+to understand just by reading:
+
+
PYTHON
+
+
from random import sample
+
+print(sample(bases, 1)[0])
+
+
Note that this function returns a list of values. We will learn about
+lists in episode 11.
+
The simplest and shortest solution is the random.choice
+function that does exactly what we want:
+
+
PYTHON
+
+
from random import choice
+
+print(choice(bases))
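+
The snippets above all rely on a string named bases. The following self-contained sketch puts the four approaches side by side; it assumes the 11-character string that appears in the next exercise, and the seed value is a hypothetical choice made only so that repeated runs pick the same characters:

```python
import random

# Assumed example string (the same sequence is used in the next exercise).
bases = "ACTTGCTTGAC"

random.seed(0)  # hypothetical seed, only for repeatable runs

# randrange(11) yields 0..10; randint(0, 10) includes both ends.
by_randrange = bases[random.randrange(len(bases))]
by_randint = bases[random.randint(0, len(bases) - 1)]

# sample returns a list, so take its only element.
by_sample = random.sample(bases, 1)[0]

# choice picks one element directly.
by_choice = random.choice(bases)

print(by_randrange, by_randint, by_sample, by_choice)
```

Note that randint needs len(bases) - 1 as its upper bound, because it includes the upper end of the range while randrange does not.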
+
+
+
+
+
+
+
+
+
+
+
Jigsaw Puzzle (Parsons Problem) Programming Example
+
+
Rearrange the following statements so that a random DNA base and its
+index in the string are printed. Not all statements may be needed.
+Feel free to use/add intermediate variables.
+
+
PYTHON
+
+
bases = "ACTTGCTTGAC"
+import math
+import random
+___ = random.randrange(n_bases)
+___ = len(bases)
+print("random base ", bases[___], "base index", ___)
+
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
import math
+import random
+bases = "ACTTGCTTGAC"
+n_bases = len(bases)
+idx = random.randrange(n_bases)
+print("random base", bases[idx], "base index", idx)
+
+
+
+
+
+
+
+
+
+
+
When Is Help Available?
+
+
When a colleague of yours types help(math), Python
+reports an error:
+
+
ERROR
+
+
NameError: name 'math' is not defined
+
+
What has your colleague forgotten to do?
+
+
+
+
+
+
+
+
+
Importing the math module (import math)
+
+
+
+
+
+
+
+
+
+
Importing With Aliases
+
+
+
Fill in the blanks so that the program below prints
+90.0.
+
Rewrite the program so that it uses import
+without as.
+
Which form do you find easier to read?
+
+
+
PYTHON
+
+
import math as m
+angle = ____.degrees(____.pi / 2)
+print(____)
+
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
import math as m
+angle = m.degrees(m.pi / 2)
+print(angle)
+
+
can be written as
+
+
PYTHON
+
+
import math
+angle = math.degrees(math.pi / 2)
+print(angle)
+
+
Since you just wrote the code and are familiar with it, you might
+actually find the first version easier to read. But when trying to read
+a huge piece of code written by someone else, or when getting back to
+your own huge piece of code after several months, non-abbreviated names
+are often easier, except where there are clear abbreviation
+conventions.
+
+
+
+
+
+
+
+
+
+
There Are Many Ways To Import Libraries!
+
+
Match the following print statements with the appropriate library
+calls.
+
Print commands:
+
+
print("sin(pi/2) =", sin(pi/2))
+
print("sin(pi/2) =", m.sin(m.pi/2))
+
print("sin(pi/2) =", math.sin(math.pi/2))
+
+
Library calls:
+
+
from math import sin, pi
+
import math
+
import math as m
+
from math import *
+
+
+
+
+
+
+
+
+
+
+
Library calls 1 and 4. In order to directly refer to
+sin and pi without the library name as prefix,
+you need to use the from ... import ... statement. Whereas
+library call 1 specifically imports the two names sin
+and pi, library call 4 imports all public names in the
+math module.
+
Library call 3. Here sin and pi are
+referred to with a shortened library name m instead of
+math. Library call 3 does exactly that using the
+import ... as ... syntax - it creates an alias for
+math in the form of the shortened name m.
+
Library call 2. Here sin and pi are
+referred to with the regular library name math, so the
+regular import ... call suffices.
+
+
Note: although library call 4 works, importing all
+names from a module using a wildcard import is not recommended as it makes it
+unclear which names from the module are used in the code. In general it
+is best to make your imports as specific as possible and to only import
+what your code uses. In library call 1, the import
+statement explicitly tells us that the sin function is
+imported from the math module, but library call 4 does not
+convey this information.
+
+
+
+
+
+
+
+
+
+
Importing Specific Items
+
+
+
Fill in the blanks so that the program below prints
+90.0.
+
Do you find this version easier to read than preceding ones?
+
Why wouldn’t programmers always use this form of
+import?
+
+
+
PYTHON
+
+
____ math import ____, ____
+angle = degrees(pi / 2)
+print(angle)
+
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
from math import degrees, pi
+angle = degrees(pi / 2)
+print(angle)
+
+
Most likely you find this version easier to read since it’s less
+dense. The main reason not to use this form of import is to avoid name
+clashes. For instance, you wouldn’t import degrees this way
+if you also wanted to use the name degrees for a variable
+or function of your own, or if you also needed to import a function named
+degrees from another library.
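+
A minimal sketch of such a name clash, assuming we carelessly reuse the name degrees for a variable of our own (the value 45 is arbitrary, and the try/except wrapper is only there so the example keeps running):

```python
from math import degrees, pi

# At this point `degrees` is the function imported from math.
print(degrees(pi / 2))

# Reusing the name for our own variable silently replaces the function.
degrees = 45
try:
    degrees(pi / 2)
except TypeError as error:
    print("TypeError:", error)

# With a plain import, the module prefix keeps the two names apart.
import math

angle = math.degrees(math.pi / 2)
print(angle)
```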
+
+
+
+
+
+
+
+
+
+
Reading Error Messages
+
+
+
Read the code below and try to identify what the errors are without
+running it.
+
Run the code, and read the error message. What type of error is
+it?
+
+
+
PYTHON
+
+
from math import log
+log(0)
+
+
+
+
+
+
+
+
+
+
+
OUTPUT
+
+
---------------------------------------------------------------------------
+ValueError Traceback (most recent call last)
+<ipython-input-1-d72e1d780bab> in <module>
+ 1 from math import log
+----> 2 log(0)
+
+ValueError: math domain error
+
+
+
The logarithm of x is only defined for
+x > 0, so 0 is outside the domain of the function.
+
You get an error of type ValueError, indicating that
+the function received an inappropriate argument value. The additional
+message “math domain error” makes it clearer what the problem is.
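+
If a program should keep running when this happens, the error can be caught by its type. The sketch below is one possible way to do that; safe_log is a hypothetical helper name, not something provided by the math module:

```python
from math import log


def safe_log(x):
    """Return log(x), or None when x is outside the function's domain.

    Hypothetical helper for illustration only.
    """
    try:
        return log(x)
    except ValueError as error:
        # The error message text is whatever the math module reports.
        print("ValueError for x =", x, ":", error)
        return None


print(safe_log(1))
print(safe_log(0))
```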
+
+
+
+
+
+
+
+
+
+
+
Key Points
+
+
+
Most of the power of a programming language is in its
+libraries.
+
A program must import a library module in order to use it.
+
Use help to learn about the contents of a library
+module.
+
Import specific items from a library to shorten programs.
+
Create an alias for a library when importing it to shorten
+programs.
The columns in a dataframe are the observed variables, and the rows
+are the observations.
+
Pandas uses backslash \ to show wrapped lines when
+output is too wide to fit the screen.
+
Using descriptive dataframe names helps us distinguish between
+multiple dataframes so we won’t accidentally overwrite a dataframe or
+read from the wrong one.
+
+
+
+
+
+
+
File Not Found
+
+
Our lessons store their data files in a data
+sub-directory, which is why the path to the file is
+data/gapminder_gdp_oceania.csv. If you forget to include
+data/, or if you include it but your copy of the file is
+somewhere else, you will get a runtime
+error that ends with a line like this:
+
+
ERROR
+
+
FileNotFoundError: [Errno 2] No such file or directory: 'data/gapminder_gdp_oceania.csv'
+
+
+
+
+
Use index_col to specify that a column’s values should
+be used as row headings.
+
+
+
+
Row headings are numbers (0 and 1 in this case).
+
Really want to index by country.
+
Pass the name of the column to read_csv as its
+index_col parameter to do this.
+
Naming the dataframe data_oceania_country tells us
+which region the data includes (oceania) and how it is
+indexed (country).
Use DataFrame.describe() to get summary statistics
+about data.
+
+
+
DataFrame.describe() gets the summary statistics of only
+the columns that have numerical data. All other columns are ignored,
+unless you use the argument include='all'.
Not particularly useful with just two records, but very helpful when
+there are thousands.
+
+
+
+
+
+
+
Reading Other Data
+
+
Read the data in gapminder_gdp_americas.csv (which
+should be in the same directory as
+gapminder_gdp_oceania.csv) into a variable called
+data_americas and display its summary statistics.
+
+
+
+
+
+
+
+
+
To read in a CSV, we use pd.read_csv and pass the
+filename 'data/gapminder_gdp_americas.csv' to it. We also
+once again pass the column name 'country' to the parameter
+index_col in order to index by country. The summary
+statistics can be displayed with the DataFrame.describe()
+method.
After reading the data for the Americas, use
+help(data_americas.head) and
+help(data_americas.tail) to find out what
+DataFrame.head and DataFrame.tail do.
+
+
What method call will display the first three rows of this
+data?
+
What method call will display the last three columns of this data?
+(Hint: you may need to change your view of the data.)
+
+
+
+
+
+
+
+
+
+
+
We can check out the first five rows of data_americas
+by executing data_americas.head() which lets us view the
+beginning of the DataFrame. We can specify the number of rows we wish to
+see by specifying the parameter n in our call to
+data_americas.head(). To view the first three rows,
+execute:
To check out the last three rows of data_americas, we
+would use the command data_americas.tail(n=3), analogous to
+head() used above. However, here we want to look at the
+last three columns, so we need to change our view and then use
+tail(). To do so, we create a new DataFrame in which rows
+and columns are switched:
+
+
+
PYTHON
+
+
americas_flipped = data_americas.T
+
+
We can then view the last three columns of data_americas by
+viewing the last three rows of americas_flipped:
This shows the data that we want, but we may prefer to display three
+columns instead of three rows, so we can flip it back:
+
+
PYTHON
+
+
americas_flipped.tail(n=3).T
+
+
Note: we could have done the above in a single line
+of code by ‘chaining’ the commands:
+
+
PYTHON
+
+
data_americas.T.tail(n=3).T
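+
The same calls can be tried without the Gapminder file. The sketch below uses a small made-up table (the labels and numbers are hypothetical stand-ins for data_americas) to show head, tail, and the transpose trick:

```python
import pandas as pd

# Hypothetical 4x4 table standing in for data_americas.
table = pd.DataFrame(
    [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]],
    index=["w", "x", "y", "z"],
    columns=["a", "b", "c", "d"],
)

first_three_rows = table.head(n=3)
last_three_rows = table.tail(n=3)

# Transpose, take the last three rows, then transpose back:
# the result is the last three columns of the original table.
last_three_columns = table.T.tail(n=3).T

print(first_three_rows)
print(last_three_columns)
```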
+
+
+
+
+
+
+
+
+
+
+
Reading Files in Other Directories
+
+
The data for your current project is stored in a file called
+microbes.csv, which is located in a folder called
+field_data. You are doing analysis in a notebook called
+analysis.ipynb in a sibling folder called
+thesis:
What value(s) should you pass to read_csv to read
+microbes.csv in analysis.ipynb?
+
+
+
+
+
+
+
+
+
We need to specify the path to the file of interest in the call to
+pd.read_csv. We first need to ‘jump’ out of the folder
+thesis using ‘../’ and then into the folder
+field_data using ‘field_data/’. Then we can specify the
+filename microbes.csv. The result is as follows:
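+
The two steps can be tried out on a throwaway copy of that folder layout. In this sketch the temporary directory and the contents of microbes.csv are made up for illustration; only the folder and file names come from the exercise:

```python
import tempfile
from pathlib import Path

import pandas as pd

# Build a temporary copy of the layout described in the exercise.
project = Path(tempfile.mkdtemp())
(project / "field_data").mkdir()
(project / "thesis").mkdir()
# Made-up file contents, just so there is something to read.
(project / "field_data" / "microbes.csv").write_text("sample,count\nA,3\nB,5\n")

# Starting from thesis/, '..' steps up to the parent folder and
# 'field_data' steps back down into the sibling folder.
notebook_dir = project / "thesis"
microbes = pd.read_csv(notebook_dir / ".." / "field_data" / "microbes.csv")
print(microbes)
```

Inside analysis.ipynb itself, the relative path '../field_data/microbes.csv' expresses the same two steps, provided the notebook's working directory is thesis.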
As well as the read_csv function for reading data from a
+file, Pandas provides a to_csv function to write dataframes
+to files. Applying what you’ve learned about reading from files, write
+one of your dataframes to a file called processed.csv. You
+can use help to get information on how to use
+to_csv.
+
+
+
+
+
+
+
+
+
In order to write the DataFrame data_americas to a file
+called processed.csv, execute the following command:
+
+
PYTHON
+
+
data_americas.to_csv('processed.csv')
+
+
For help on read_csv or to_csv, you could
+execute, for example:
+
+
PYTHON
+
+
help(data_americas.to_csv)
+help(pd.read_csv)
+
+
Note that help(to_csv) or help(pd.to_csv)
+throws an error! This is because to_csv is not
+a global Pandas function, but a member function of DataFrames. This
+means you can only call it on an instance of a DataFrame, e.g.,
+data_americas.to_csv or
+data_oceania.to_csv.
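+
One way to check where a name lives before asking for help is the built-in hasattr function. This is a small sketch with a made-up two-row dataframe, not part of the exercise itself:

```python
import pandas as pd

# read_csv is a function of the pandas module; to_csv is not.
print(hasattr(pd, "read_csv"))
print(hasattr(pd, "to_csv"))

# to_csv is a method of the DataFrame class instead.
print(hasattr(pd.DataFrame, "to_csv"))

# Made-up dataframe for illustration.
frame = pd.DataFrame({"value": [1, 2]}, index=["a", "b"])

# Called without a filename, to_csv returns the CSV text as a string.
csv_text = frame.to_csv()
print(csv_text)
```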
+
+
+
+
+
+
+
+
+
+
Key Points
+
+
+
Use the Pandas library to get basic statistics out of tabular
+data.
+
Use index_col to specify that a column’s values should
+be used as row headings.
+
Use DataFrame.info to find out more about a
+dataframe.
+
The DataFrame.columns variable stores information about
+the dataframe’s columns.
+
Use DataFrame.T to transpose a dataframe.
+
Use DataFrame.describe to get summary statistics about
+data.
How can I do statistical analysis of tabular data?
+
+
+
+
+
+
+
+
Objectives
+
+
Select individual values from a Pandas dataframe.
+
Select entire rows or entire columns from a dataframe.
+
Select a subset of both rows and columns from a dataframe in a
+single operation.
+
Select a subset of a dataframe by a single Boolean criterion.
+
+
+
+
+
+
+
Note about Pandas DataFrames/Series
+
+
+
A DataFrame
+is a collection of Series;
+the DataFrame is the way Pandas represents a table, and a Series is the
+data structure Pandas uses to represent a column.
+
Pandas is built on top of the NumPy library, which in practice means
+that most of the methods defined for NumPy arrays apply to Pandas
+Series/DataFrames.
+
What makes Pandas so attractive is the powerful interface to access
+individual records of the table, proper handling of missing values, and
+relational-database operations between DataFrames.
+
Selecting values
+
+
+
To access a value at the position [i,j] of a DataFrame,
+we have two options, depending on what i and j
+refer to. Remember that a DataFrame provides an index as a way to
+identify the rows of the table; a row, then, has a position
+inside the table as well as a label, which uniquely identifies
+its entry in the DataFrame.
+
Use DataFrame.iloc[..., ...] to select values by their
+(entry) position
+
+
+
+
Can specify location by numerical index analogously to 2D version of
+character selection in strings.
+
+
+
PYTHON
+
+
import pandas as pd
+data = pd.read_csv('data/gapminder_gdp_europe.csv', index_col='country')
+print(data.iloc[0, 0])
+
+
+
OUTPUT
+
+
1601.056136
+
+
Use DataFrame.loc[..., ...] to select values by their
+(entry) label.
+
+
+
+
Can specify location by row and/or column name.
+
+
+
PYTHON
+
+
print(data.loc["Albania", "gdpPercap_1952"])
+
+
+
OUTPUT
+
+
1601.056136
+
+
Use : on its own to mean all columns or all rows.
+
In the above code, we discover that slicing using
+loc is inclusive at both ends, which differs from
+slicing using iloc, where slicing
+indicates everything up to but not including the final index.
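+
The difference between the two kinds of slice can be checked on a small table. The sketch below uses made-up numbers with labels in the style of the Gapminder data:

```python
import pandas as pd

# Hypothetical miniature table; the values are invented.
small = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    index=["Albania", "Austria", "Belgium"],
    columns=["gdpPercap_1952", "gdpPercap_1957", "gdpPercap_1962"],
)

# Positional slice: stops before position 2 on both axes.
by_position = small.iloc[0:2, 0:2]

# Label slice: includes both end labels on both axes.
by_label = small.loc["Albania":"Belgium", "gdpPercap_1952":"gdpPercap_1962"]

# A bare colon selects everything along that axis.
all_columns = small.loc["Austria", :]

print(by_position.shape, by_label.shape)
print(all_columns)
```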
+
Result of slicing can be used in further operations.
+
+
+
+
Usually don’t just print a slice.
+
All the statistical operators that work on entire dataframes work
+the same way on slices.
Returns a similarly-shaped dataframe of True and
+False.
+
+
+
PYTHON
+
+
# Use a subset of data to keep output readable.
+subset = data.loc['Italy':'Poland', 'gdpPercap_1962':'gdpPercap_1972']
+print('Subset of data:\n', subset)
+
+# Which values were greater than 10000?
+print('\nWhere are values large?\n', subset > 10000)
A frame full of Booleans is sometimes called a mask because
+of how it can be used.
+
+
+
PYTHON
+
+
mask = subset > 10000
+print(subset[mask])
+
+
+
OUTPUT
+
+
gdpPercap_1962 gdpPercap_1967 gdpPercap_1972
+country
+Italy NaN 10022.40131 12269.27378
+Montenegro NaN NaN NaN
+Netherlands 12790.84956 15363.25136 18794.74567
+Norway 13450.40151 16361.87647 18965.05551
+Poland NaN NaN NaN
+
+
+
Get the value where the mask is true, and NaN (Not a Number) where
+it is false.
+
Useful because NaNs are ignored by operations like max, min,
+average, etc.
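+
A small sketch of that behaviour, using a made-up table (the labels and values are hypothetical) so that the effect of the NaNs is easy to follow:

```python
import pandas as pd

# Invented values: one entry per column is below the threshold.
subset = pd.DataFrame(
    {"year_a": [8000.0, 12000.0, 15000.0], "year_b": [9000.0, 11000.0, 20000.0]},
    index=["P", "Q", "R"],
)

mask = subset > 10000
masked = subset[mask]  # values where the mask is False become NaN

print(masked)
# The statistics skip the NaN entries instead of propagating them.
print(masked.min())
print(masked.count())
```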
Learners often struggle here; many do not work with financial data
+and concepts, so they find the example difficult to get their
+head around. The biggest problem, though, is the line generating the
+wealth_score. This step needs to be talked through thoroughly:
+
It uses implicit conversion between boolean and float values, which has not been
+covered in the course so far.
+
The axis=1 argument needs to be
+explained clearly.
+
+
+
+
+
Pandas vectorized methods and grouping operations are features that
+give users a great deal of flexibility when analysing their data.
+
For instance, let’s say we want a clearer view of how the
+European countries split according to their GDP.
+
+
We may get a first impression by splitting the countries into two groups for
+each of the years surveyed: those with a GDP higher than the
+European average and those with a lower GDP.
+
We then estimate a wealth score based on the historical
+(from 1962 to 2007) values, counting how many times a country
+has fallen into the lower-GDP or the higher-GDP
+group.
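+
One way such a score could be computed is sketched below. This is an illustration under stated assumptions, not necessarily the lesson's own code: the table is made up, the score is taken to be the fraction of years in which a country was above that year's average, and only the name wealth_score and the axis=1 argument are taken from the surrounding text.

```python
import pandas as pd

# Hypothetical GDP table: three countries, three years, invented values.
gdp = pd.DataFrame(
    {"year_1": [1.0, 6.0, 9.0], "year_2": [2.0, 6.0, 12.0], "year_3": [10.0, 2.0, 8.0]},
    index=["A", "B", "C"],
)

# True where a country's GDP is above that year's average.
mask_higher = gdp > gdp.mean()

# Summing Booleans counts the True values (True acts as 1, False as 0);
# axis=1 sums across the columns, giving one number per country.
wealth_score = mask_higher.sum(axis=1) / len(gdp.columns)
print(wealth_score)

# Countries with the same score can then be grouped together.
grouped = gdp.groupby(wealth_score).sum()
print(grouped)
```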
Clearly, the second statement produces an additional column and an
+additional row compared to the first statement.
+What conclusion can we draw? We see that a numerical slice, 0:2,
+omits the final index (i.e. index 2) in the range provided,
+while a named slice, ‘gdpPercap_1952’:‘gdpPercap_1962’,
+includes the final element.
+
+
+
+
+
+
+
+
+
+
Reconstructing Data
+
+
Explain what each line in the following short program does: what is
+in first, second, etc.?
first = pd.read_csv('data/gapminder_all.csv', index_col='country')
+
+
This line loads the dataset containing the GDP data from all
+countries into a dataframe called first. The
+index_col='country' parameter selects which column to use
+as the row labels in the dataframe.
+
+
PYTHON
+
+
second = first[first['continent'] == 'Americas']
+
+
This line makes a selection: only those rows of first
+for which the ‘continent’ column matches ‘Americas’ are extracted.
+Notice how the Boolean expression inside the brackets,
+first['continent'] == 'Americas', is used to select only
+those rows where the expression is true. Try printing this expression!
+Can you print also its individual True/False elements? (hint: first
+assign the expression to a variable)
+
+
PYTHON
+
+
third = second.drop('Puerto Rico')
+
+
As the syntax suggests, this line drops the row from
+second where the label is ‘Puerto Rico’. The resulting
+dataframe third has one row less than the original
+dataframe second.
+
+
PYTHON
+
+
fourth = third.drop('continent', axis=1)
+
+
Again we apply the drop function, but in this case we are dropping
+not a row but a whole column. To accomplish this, we also need to specify
+the axis parameter: axis=1 tells drop to look for the label among the
+columns rather than among the rows (axis 0, the default).
+
+
PYTHON
+
+
fourth.to_csv('result.csv')
+
+
The final step is to write the data that we have been working on to a
+csv file. Pandas makes this easy with the to_csv()
+method. The only argument we need to pass here is the filename.
+Note that a relative filename like this one is resolved against the current
+working directory of the Jupyter or Python session.
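+
The whole sequence can be traced on a tiny table. In this sketch the CSV text is made up and read from a string instead of data/gapminder_all.csv, and the result is kept as a string rather than written to result.csv, so that the example runs anywhere:

```python
import io

import pandas as pd

# Invented three-row stand-in for the Gapminder file.
csv_text = (
    "country,continent,gdp\n"
    "Canada,Americas,1\n"
    "Puerto Rico,Americas,2\n"
    "France,Europe,3\n"
)

first = pd.read_csv(io.StringIO(csv_text), index_col="country")

# The Boolean expression on its own is a Series of True/False values.
is_americas = first["continent"] == "Americas"
print(is_americas)

second = first[is_americas]               # keep rows where the mask is True
third = second.drop("Puerto Rico")        # drop a row by its label
fourth = third.drop("continent", axis=1)  # drop a column by its label
print(fourth)

# Without a filename, to_csv returns the text it would have written.
result_text = fourth.to_csv()
```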
+
+
+
+
+
+
+
+
+
+
Selecting Indices
+
+
Explain in simple terms what idxmin and
+idxmax do in the short program below. When would you use
+these methods?
+
+
PYTHON
+
+
data = pd.read_csv('data/gapminder_gdp_europe.csv', index_col='country')
+print(data.idxmin())
+print(data.idxmax())
+
+
+
+
+
+
+
+
+
+
For each column in data, idxmin will return
+the index value (row label) corresponding to that column’s minimum;
+idxmax will do the same for each column’s
+maximum value.
+
You can use these methods whenever you want to get the row label of
+the minimum/maximum value and not the actual minimum/maximum value.
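+
A quick sketch on a made-up table (the country labels X, Y, Z and the values are hypothetical) shows the kind of result to expect:

```python
import pandas as pd

# Invented values; the column names follow the Gapminder style.
small = pd.DataFrame(
    {"gdpPercap_1952": [3.0, 1.0, 2.0], "gdpPercap_2007": [5.0, 9.0, 4.0]},
    index=["X", "Y", "Z"],
)

lowest = small.idxmin()   # row label of each column's minimum
highest = small.idxmax()  # row label of each column's maximum

print(lowest)
print(highest)
```

Each result is a Series indexed by column name whose values are row labels, so lowest["gdpPercap_1952"] names the country with the smallest 1952 value.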
+
+
+
+
+
+
+
+
+
+
Practice with Selection
+
+
Assume Pandas has been imported and the Gapminder GDP data for Europe
+has been loaded. Write an expression to select each of the
+following:
+
+
GDP per capita for all countries in 1982.
+
GDP per capita for Denmark for all years.
+
GDP per capita for all countries for years after 1985.
+
GDP per capita for each country in 2007 as a multiple of GDP per
+capita for that country in 1952.
+
+
+
+
+
+
+
+
+
+
1:
+
+
PYTHON
+
+
data['gdpPercap_1982']
+
+
2:
+
+
PYTHON
+
+
data.loc['Denmark',:]
+
+
3:
+
+
PYTHON
+
+
data.loc[:,'gdpPercap_1985':]
+
+
This label-based slice does not give you an error, although no column named
+gdpPercap_1985 actually exists: because the column labels are in sorted order,
+Pandas selects every column whose label sorts at or after gdpPercap_1985.
+This is useful if new columns are added to the CSV file later.
+
4:
+
+
PYTHON
+
+
data['gdpPercap_2007']/data['gdpPercap_1952']
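+
The slice used in answer 3 starts at a label that is not among the columns. The sketch below, on a made-up one-row table, illustrates that this works when the column labels are in sorted order and fails when they are not:

```python
import pandas as pd

# Invented one-row table with sorted column labels.
small = pd.DataFrame(
    [[1, 2, 3, 4]],
    index=["Denmark"],
    columns=["gdpPercap_1977", "gdpPercap_1982", "gdpPercap_1987", "gdpPercap_1992"],
)

print("gdpPercap_1985" in small.columns)

# Sorted labels: the slice starts at the first label that sorts
# at or after the missing one.
later = small.loc[:, "gdpPercap_1985":]
print(list(later.columns))

# Unsorted labels: the same slice has no well-defined start.
shuffled = small[["gdpPercap_1992", "gdpPercap_1977", "gdpPercap_1987", "gdpPercap_1982"]]
raised = False
try:
    shuffled.loc[:, "gdpPercap_1985":]
except KeyError as error:
    raised = True
    print("KeyError:", error)
```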
+
+
+
+
+
+
+
+
+
+
+
Many Ways of Access
+
+
There are at least two ways of accessing a value or slice of a
+DataFrame: by name or index. However, there are many others. For
+example, a single column or row can be accessed either as a
+DataFrame or a Series object.
+
Suggest different ways of doing the following operations on a
+DataFrame:
+
+
Access a single column
+
Access a single row
+
Access an individual DataFrame element
+
Access several columns
+
Access several rows
+
Access a subset of specific rows and columns
+
Access a subset of row and column ranges
+
+
+
+
+
+
+
+
+
+
1. Access a single column:
+
+
PYTHON
+
+
# by name
+data["col_name"] # as a Series
+data[["col_name"]] # as a DataFrame
+
+# by name using .loc
+data.T.loc["col_name"] # as a Series
+data.T.loc[["col_name"]].T # as a DataFrame
+
+# Dot notation (Series)
+data.col_name
+
+# by index (iloc)
+data.iloc[:, col_index] # as a Series
+data.iloc[:, [col_index]] # as a DataFrame
+
+# using a mask
+data.T[data.T.index == "col_name"].T
+
+
2. Access a single row:
+
+
PYTHON
+
+
# by name using .loc
+data.loc["row_name"] # as a Series
+data.loc[["row_name"]] # as a DataFrame
+
+# by name
+data.T["row_name"] # as a Series
+data.T[["row_name"]].T # as a DataFrame
+
+# by index
+data.iloc[row_index] # as a Series
+data.iloc[[row_index]] # as a DataFrame
+
+# using mask
+data[data.index == "row_name"]
+
+
3. Access an individual DataFrame element:
+
+
PYTHON
+
+
# by column/row names
+data["col_name"]["row_name"] # as a value
+
+data[["col_name"]].loc["row_name"] # as a Series
+data[["col_name"]].loc[["row_name"]] # as a DataFrame
+
+data.loc["row_name"]["col_name"] # as a value
+data.loc[["row_name"]]["col_name"] # as a Series
+data.loc[["row_name"]][["col_name"]] # as a DataFrame
+
+data.loc["row_name", "col_name"] # as a value
+data.loc[["row_name"], "col_name"] # as a Series. Preserves index. Column name is moved to `.name`.
+data.loc["row_name", ["col_name"]] # as a Series. Index is moved to `.name.` Sets index to column name.
+data.loc[["row_name"], ["col_name"]] # as a DataFrame (preserves original index and column name)
+
+# by column/row names: Dot notation
+data.col_name.row_name
+
+# by column/row indices
+data.iloc[row_index, col_index] # as a value
+data.iloc[[row_index], col_index] # as a Series. Preserves index. Column name is moved to `.name`
+data.iloc[row_index, [col_index]] # as a Series. Index is moved to `.name.` Sets index to column name.
+data.iloc[[row_index], [col_index]] # as a DataFrame (preserves original index and column name)
+
+# column name + row index
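+data["col_name"][row_index] # integer positions via [] are deprecated in recent Pandas; prefer .iloc
+data.col_name[row_index] # same caveat as the line above
+data["col_name"].iloc[row_index]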
+
+# column index + row name
+data.iloc[:, [col_index]].loc["row_name"] # as a Series
+data.iloc[:, [col_index]].loc[["row_name"]] # as a DataFrame
+
+# using masks
+data[data.index == "row_name"].T[data.T.index == "col_name"].T
+
+
4. Access several columns:
+
+
PYTHON
+
+
# by name
+data[["col1", "col2", "col3"]]
+data.loc[:, ["col1", "col2", "col3"]]
+
+# by index
+data.iloc[:, [col1_index, col2_index, col3_index]]
+
+
5. Access several rows
+
+
PYTHON
+
+
# by name
+data.loc[["row1", "row2", "row3"]]
+
+# by index
+data.iloc[[row1_index, row2_index, row3_index]]
+
+
6. Access a subset of specific rows and columns
+
+
PYTHON
+
+
# by names
+data.loc[["row1", "row2", "row3"], ["col1", "col2", "col3"]]
+
+# by indices
+data.iloc[[row1_index, row2_index, row3_index], [col1_index, col2_index, col3_index]]
+
+# column names + row indices
+data[["col1", "col2", "col3"]].iloc[[row1_index, row2_index, row3_index]]
+
+# column indices + row names
+data.iloc[:, [col1_index, col2_index, col3_index]].loc[["row1", "row2", "row3"]]
+
+
7. Access a subset of row and column ranges
+
+
PYTHON
+
+
# by name
+data.loc["row1":"row2", "col1":"col2"]
+
+# by index
+data.iloc[row1_index:row2_index, col1_index:col2_index]
+
+# column names + row indices
+data.loc[:, "col1_name":"col2_name"].iloc[row1_index:row2_index]
+
+# column indices + row names
+data.iloc[:, col1_index:col2_index].loc["row1":"row2"]
+
+
+
+
+
+
+
+
+
+
+
Exploring available methods using the
+dir() function
+
+
Python includes a dir() function that can be used to
+display all of the available methods (functions) that are built into a
+data object. In Episode 4, we used some methods with a string. But we
+can see many more are available by using dir():
+
+
PYTHON
+
+
my_string = 'Hello world!'  # creation of a string object
+dir(my_string)
You can use help() or Shift+Tab to
+get more information about what these methods do.
+
Assume Pandas has been imported and the Gapminder GDP data for Europe
+has been loaded as data. Then, use dir() to
+find the function that prints out the median per-capita GDP across all
+European countries for each year that information is available.
+
+
+
+
+
+
+
+
+
Among many choices, dir() lists the
+median() function as a possibility. Thus,
+
+
PYTHON
+
+
data.median()
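+
Because dir() returns a long list, it can help to filter it. This sketch assumes a small made-up dataframe in place of the Europe data and searches for names containing "med"; the search string is just one possible guess:

```python
import pandas as pd

# Invented values standing in for the Europe data.
frame = pd.DataFrame(
    {"gdpPercap_1952": [1.0, 2.0, 6.0], "gdpPercap_1957": [3.0, 5.0, 4.0]},
    index=["X", "Y", "Z"],
)

# Keep only the names from dir() that contain the text we are guessing at.
candidates = [name for name in dir(frame) if "med" in name]
print(candidates)

# median() works column by column, giving one value per year.
medians = frame.median()
print(medians)
```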
+
+
+
+
+
+
+
+
+
+
+
Interpretation
+
+
Poland’s borders have been stable since 1945, but changed several
+times in the years before then. How would you handle this if you were
+creating a table of GDP per capita for Poland for the entire twentieth
+century?
+
+
+
+
+
+
+
+
+
Key Points
+
+
+
Use DataFrame.iloc[..., ...] to select values by
+integer location.
+
Use : on its own to mean all columns or all rows.
+
Select multiple columns or rows using DataFrame.loc and
+a named slice.
+
Result of slicing can be used in further operations.
In our Jupyter Notebook example, running the cell should generate the
+figure directly below the code. The figure is also included in the
+Notebook document for future viewing. However, other Python environments
+like an interactive Python session started from a terminal or a Python
+script executed via the command line require an additional command to
+display the figure.
+
Instruct matplotlib to show a figure:
+
+
PYTHON
+
+
plt.show()
+
+
This command can also be used within a Notebook - for instance, to
+display multiple figures if several are created by a single cell.
Before plotting, we convert the column headings from a
+string to integer data type, since they
+represent numerical values, using str.replace()
+to remove the gdpPercap_ prefix and then astype(int)
+to convert the series of string values
+(['1952', '1957', ..., '2007']) to a series of integers:
+[1952, 1957, ..., 2007].
+
+
+
PYTHON
+
+
import pandas as pd
+
+data = pd.read_csv('data/gapminder_gdp_oceania.csv', index_col='country')
+
+# Extract year from last 4 characters of each column name
+# The current column names are structured as 'gdpPercap_(year)',
+# so we want to keep the (year) part only for clarity when plotting GDP vs. years
+# To do this we use replace(), which removes from the string the characters stated in the argument
+# This method works on strings, so we use replace() from Pandas Series.str vectorized string functions
+
+years = data.columns.str.replace('gdpPercap_', '')
+
+# Convert year values to integers, saving results back to dataframe
+
+data.columns = years.astype(int)
+
+data.loc['Australia'].plot()
+
+
Select and transform data, then plot it.
+
+
+
+
By default, DataFrame.plot
+plots with the rows as the X axis.
+
We can transpose the data in order to plot multiple series.
+
+
+
PYTHON
+
+
data.T.plot()
+plt.ylabel('GDP per capita')
+
+
Many styles of plot are available.
+
+
+
+
For example, do a bar plot using a fancier style.
+
+
+
PYTHON
+
+
plt.style.use('ggplot')
+data.T.plot(kind='bar')
+plt.ylabel('GDP per capita')
+
+
Data can also be plotted by calling the matplotlib
+plot function directly.
+
+
+
+
The command is plt.plot(x, y)
+
+
The color and format of markers can also be specified as an
+additional optional argument e.g., b- is a blue line,
+g-- is a green dashed line.
+
Get Australia data from dataframe
+
+
+
+
PYTHON
+
+
years = data.columns
+gdp_australia = data.loc['Australia']
+
+plt.plot(years, gdp_australia, 'g--')
+
+
Can plot many sets of data together.
+
+
+
+
PYTHON
+
+
# Select two countries' worth of data.
+gdp_australia = data.loc['Australia']
+gdp_nz = data.loc['New Zealand']
+
+# Plot with differently-colored markers.
+plt.plot(years, gdp_australia, 'b-', label='Australia')
+plt.plot(years, gdp_nz, 'g-', label='New Zealand')
+
+# Create legend.
+plt.legend(loc='upper left')
+plt.xlabel('Year')
+plt.ylabel('GDP per capita ($)')
+
+
+
+
+
+
+
Adding a Legend
+
+
Often when plotting multiple datasets on the same figure it is
+desirable to have a legend describing the data.
By default, matplotlib will attempt to place the legend in a suitable
+position. If you would rather specify a position, this can be done with
+the loc= argument, e.g., to place the legend in the upper
+left corner of the plot, specify loc='upper left'.
+
+
+
+
+
Plot a scatter plot correlating the GDP of Australia and New
+Zealand
+
Use either plt.scatter or
+DataFrame.plot.scatter
+
+
+
+
PYTHON
+
+
plt.scatter(gdp_australia, gdp_nz)
+
+
+
PYTHON
+
+
data.T.plot.scatter(x='Australia', y='New Zealand')
+
+
+
+
+
+
+
Minima and Maxima
+
+
Fill in the blanks below to plot the minimum GDP per capita over time
+for all the countries in Europe. Modify it again to plot the maximum GDP
+per capita over time for Europe.
Modify the example in the notes to create a scatter plot showing the
+relationship between the minimum and maximum GDP per capita among the
+countries in Asia for each year in the data set. What relationship do
+you see (if any)?
No particular correlation can be seen between the minimum and
+maximum GDP values year on year. It seems the fortunes of Asian
+countries do not rise and fall together.
+
+
+
+
+
+
+
+
+
+
Correlations (continued)
+
+
+
You might note that the variability in the maximum is much higher
+than that of the minimum. Take a look at the maximum and the max
+indexes:
Seems the variability in this value is due to a sharp drop after
+1972. Some geopolitics at play perhaps? Given the dominance of oil
+producing countries, maybe the Brent crude index would make an
+interesting comparison? Whilst Myanmar consistently has the lowest GDP,
+the highest GDP nation has varied more notably.
+
+
+
+
+
+
+
+
+
+
More Correlations
+
+
This short program creates a plot showing the correlation between GDP
+and life expectancy for 2007, normalizing marker size by population:
Using online help and other resources, explain what each argument to
+plot does.
+
+
+
+
+
+
+
+
+
A good place to look is the documentation for the plot function -
+help(data_all.plot).
+
kind - As seen already this determines the kind of plot to be
+drawn.
+
x and y - A column name or index that determines what data will be
+placed on the x and y axes of the plot
+
s - Details for this can be found in the documentation of
+plt.scatter. A single number or one value for each data point.
+Determines the size of the plotted points.
+
+
+
+
+
+
+
+
+
+
Saving your plot to a file
+
+
If you are satisfied with the plot you see you may want to save it to
+a file, perhaps to include it in a publication. There is a function in
+the matplotlib.pyplot module that accomplishes this: savefig.
+Calling this function, e.g. with
+
+
PYTHON
+
+
plt.savefig('my_figure.png')
+
+
will save the current figure to the file my_figure.png.
+The file format will automatically be deduced from the file name
+extension (other formats are pdf, ps, eps and svg).
+
Note that functions in plt refer to a global figure
+variable and after a figure has been displayed to the screen (e.g. with
+plt.show) matplotlib will make this variable refer to a new
+empty figure. Therefore, make sure you call plt.savefig
+before the plot is displayed to the screen, otherwise you may find a
+file with an empty plot.
+
When using dataframes, data is often generated and plotted to screen
+in one line. In addition to using plt.savefig, we can save
+a reference to the current figure in a local variable (with
+plt.gcf) and call the savefig method
+on that variable to save the figure to file.
+
+
PYTHON
+
+
data.plot(kind='bar')
+fig = plt.gcf() # get current figure
+fig.savefig('my_figure.png')
+
+
+
+
+
+
+
+
+
+
Making your plots accessible
+
+
Whenever you are generating plots to go into a paper or a
+presentation, there are a few things you can do to make sure that
+everyone can understand your plots.
+
+
Always make sure your text is large enough to read. Use the
+fontsize parameter in xlabel,
+ylabel, title, and legend, and tick_params
+with labelsize to increase the text size of the numbers
+on your axes.
+
Similarly, you should make your graph elements easy to see. Use
+s to increase the size of your scatterplot markers and
+linewidth to increase the sizes of your plot lines.
+
Using color (and nothing else) to distinguish between different plot
+elements will make your plots unreadable to anyone who is colorblind, or
+who happens to have a black-and-white office printer. For lines, the
+linestyle parameter lets you use different types of lines.
+For scatterplots, marker lets you change the shape of your
+points. If you’re unsure about your colors, you can use Coblis
+or Color Oracle to simulate what
+your plots would look like to those with colorblindness.
+
+
+
+
+
+
+
+
+
+
Key Points
+
+
+
+matplotlib is the
+most widely used scientific plotting library in Python.
print('zeroth item of pressures:', pressures[0])
+print('fourth item of pressures:', pressures[4])
+
+
+
OUTPUT
+
+
zeroth item of pressures: 0.273
+fourth item of pressures: 0.276
+
+
Lists’ values can be replaced by assigning to them.
+
+
+
+
Use an index expression on the left of assignment to replace a
+value.
+
+
+
PYTHON
+
+
pressures[0] = 0.265
+print('pressures is now:', pressures)
+
+
+
OUTPUT
+
+
pressures is now: [0.265, 0.275, 0.277, 0.275, 0.276]
+
+
Appending items to a list lengthens it.
+
+
+
+
Use list_name.append to add items to the end of a
+list.
+
+
+
PYTHON
+
+
primes = [2, 3, 5]
+print('primes is initially:', primes)
+primes.append(7)
+print('primes has become:', primes)
+
+
+
OUTPUT
+
+
primes is initially: [2, 3, 5]
+primes has become: [2, 3, 5, 7]
+
+
+
+append is a method of lists.
+
+
Like a function, but tied to a particular object.
+
+
+
Use object_name.method_name to call methods.
+
+
Deliberately resembles the way we refer to things in a library.
+
+
+
We will meet other methods of lists as we go along.
+
+
Use help(list) for a preview.
+
+
+
+extend is similar to append, but it allows
+you to combine two lists. For example:
+
+
+
PYTHON
+
+
teen_primes = [11, 13, 17, 19]
+middle_aged_primes = [37, 41, 43, 47]
+print('primes is currently:', primes)
+primes.extend(teen_primes)
+print('primes has now become:', primes)
+primes.append(middle_aged_primes)
+print('primes has finally become:', primes)
+
+
+
OUTPUT
+
+
primes is currently: [2, 3, 5, 7]
+primes has now become: [2, 3, 5, 7, 11, 13, 17, 19]
+primes has finally become: [2, 3, 5, 7, 11, 13, 17, 19, [37, 41, 43, 47]]
+
+
Note that while extend maintains the “flat” structure of
+the list, appending a list to a list means the last element in
+primes will itself be a list, not an integer. Lists can
+contain values of any type; therefore, lists of lists are possible.
+
Use del to remove items from a list entirely.
+
+
+
+
We use del list_name[index] to remove an element from a
+list (in the example, 9 is not a prime number) and thus shorten it.
+
+del is not a function or a method, but a statement in
+the language.
+
+
+
PYTHON
+
+
primes = [2, 3, 5, 7, 9]
+print('primes before removing last item:', primes)
+del primes[4]
+print('primes after removing last item:', primes)
+
+
+
OUTPUT
+
+
primes before removing last item: [2, 3, 5, 7, 9]
+primes after removing last item: [2, 3, 5, 7]
+
+
The empty list contains no values.
+
+
+
+
Use [] on its own to represent a list that doesn’t
+contain any values.
+
+
“The zero of lists.”
+
+
+
Helpful as a starting point for collecting values (which we will see
+in the next episode).
+
Lists may contain values of different types.
+
+
+
+
A single list may contain numbers, strings, and anything else.
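As a small illustration (the list name `goals` and its contents are made up for this sketch), a single list can hold an integer, a string, and `None` side by side:

```python
# A hypothetical list mixing an integer, a string, and None.
goals = [1, 'Create lists.', None]

item_types = []
for item in goals:
    # type(item).__name__ gives the name of each value's type.
    item_types.append(type(item).__name__)
    print(item, 'has type', type(item).__name__)
```

Looping over the list works the same way regardless of what each element is.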
If start and stop are both non-negative
+integers, how long is the list values[start:stop]?
+
+
+
+
+
+
+
+
+
The list values[start:stop] has up to
+stop - start elements. For example,
+values[1:4] has the 3 elements values[1],
+values[2], and values[3]. Why ‘up to’? As we
+saw in episode 2, if stop
+is greater than the total length of the list values, we
+will still get a list back but it will be shorter than expected.
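The "up to" in the answer above can be checked with a short sketch. The list `values` and the particular indices are assumptions chosen only for illustration:

```python
values = [10, 20, 30, 40, 50]  # hypothetical example data
n = len(values)

in_range = values[1:4]     # stop is inside the list
past_end = values[3:10]    # stop is beyond the end of the list

print(len(in_range))   # 3, which equals stop - start
print(len(past_end))   # 2, fewer than stop - start = 7

# For non-negative start and stop, the length can also be computed directly.
predicted = max(0, min(10, n) - 3)
print(predicted == len(past_end))
```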
+
+
+
+
+
+
+
+
+
+
From Strings to Lists and Back
+
+
Given this:
+
+
PYTHON
+
+
print('string to list:', list('tin'))
+print('list to string:', ''.join(['g', 'o', 'l', 'd']))
+
+
+
OUTPUT
+
+
string to list: ['t', 'i', 'n']
+list to string: gold
+
+
+
What does list('some string') do?
+
What does '-'.join(['x', 'y', 'z']) generate?
+
+
+
+
+
+
+
+
+
+
+
+list('some string')
+converts a string into a list containing all of its characters.
+
+join
+returns a string that is the concatenation of the string
+elements in the list, with the separator inserted between each pair of
+elements. This results in x-y-z. The separator is the string on
+which the join method is called.
+
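The behavior described in this answer can be tried out directly. In this sketch the variable names are illustrative only:

```python
letters = ['x', 'y', 'z']

dashed = '-'.join(letters)   # the separator is the string before .join
spaced = ' '.join(letters)   # any string can act as the separator
print(dashed)   # x-y-z
print(spaced)   # x y z

# Converting a string to a list and joining with '' gets the string back.
round_trip = ''.join(list('tin'))
print(round_trip)   # tin
```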
+
+
+
+
+
+
+
+
+
+
Working With the End
+
+
What does the following program print?
+
+
PYTHON
+
+
element = 'helium'
+print(element[-1])
+
+
+
How does Python interpret a negative index?
+
If a list or string has N elements, what is the most negative index
+that can safely be used with it, and what location does that index
+represent?
+
If values is a list, what does
+del values[-1] do?
+
How can you display all elements but the last one without changing
+values? (Hint: you will need to combine slicing and
+negative indexing.)
+
+
+
+
+
+
+
+
+
+
The program prints m.
+
+
Python interprets a negative index as starting from the end (as
+opposed to starting from the beginning). The last element is
+-1.
+
The most negative index that can safely be used with a list of N
+elements is -N, which represents the first element.
+
+del values[-1] removes the last element from the
+list.
+
values[:-1]
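These answers can be confirmed with a short sketch. The list contents are an assumption made for illustration, and `try`/`except` is used only to show the error without stopping the program:

```python
values = ['a', 'b', 'c', 'd']  # hypothetical example list
n = len(values)

last = values[-1]           # the last element
first = values[-n]          # -N is the most negative safe index
all_but_last = values[:-1]  # slicing builds a new list; values is unchanged

# One step further than -N is out of range.
try:
    values[-n - 1]
except IndexError as error:
    message = str(error)

print(last, first, all_but_last, message)
```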
+
+
+
+
+
+
+
+
+
+
+
Stepping Through a List
+
+
What does the following program print?
+
+
PYTHON
+
+
element = 'fluorine'
+print(element[::2])
+print(element[::-1])
+
+
+
If we write a slice as low:high:stride, what does
+stride do?
+
What expression would select all of the even-numbered items from a
+collection?
+
+
+
+
+
+
+
+
+
+
The program prints
+
+
OUTPUT
+
+
furn
+eniroulf
+
+
+
+stride is the step size of the slice.
+
The slice 1::2 selects all even-numbered items from a
+collection: it starts with element 1 (which is the second
+element, since indexing starts at 0), goes on until the end
+(since no end is given), and uses a step size of
+2 (i.e., selects every second element).
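As a quick check of the slices discussed in this answer, the following sketch applies each stride to the same string:

```python
element = 'fluorine'

every_other_from_0 = element[::2]    # indices 0, 2, 4, 6
every_other_from_1 = element[1::2]   # indices 1, 3, 5, 7
backwards = element[::-1]            # a negative stride walks from the end

print(every_other_from_0)   # furn
print(every_other_from_1)   # loie
print(backwards)            # eniroulf
```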
+
+
+
+
+
+
+
+
+
+
+
Slice Bounds
+
+
What does the following program print?
+
+
PYTHON
+
+
element = 'lithium'
+print(element[0:20])
+print(element[-1:3])
+
+
+
+
+
+
+
+
+
+
+
OUTPUT
+
+
lithium
+
+
The first statement prints the whole string, since the slice goes
+beyond the total length of the string. The second statement prints an
+empty string, because the slice starts at the last character (index -1)
+and its stop (index 3) lies before that start, so nothing is selected.
+
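A minimal sketch of the same idea, with one extra slice (`element[3:-1]`, added here purely for comparison) whose start lies before its stop:

```python
element = 'lithium'

whole = element[0:20]   # stop past the end is silently clipped
empty = element[-1:3]   # start (last character) is after stop: nothing selected
tail = element[3:-1]    # start before stop: characters at indices 3, 4, 5

print(repr(whole))   # 'lithium'
print(repr(empty))   # ''
print(repr(tail))    # 'hiu'
```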
+
+
+
+
+
+
+
+
+
Sort and Sorted
+
+
What do these two programs print? In simple terms, explain the
+difference between sorted(letters) and
+letters.sort().
+
+
PYTHON
+
+
# Program A
+letters = list('gold')
+result = sorted(letters)
+print('letters is', letters, 'and result is', result)
+
+
+
PYTHON
+
+
# Program B
+letters = list('gold')
+result = letters.sort()
+print('letters is', letters, 'and result is', result)
+
+
+
+
+
+
+
+
+
+
Program A prints
+
+
OUTPUT
+
+
letters is ['g', 'o', 'l', 'd'] and result is ['d', 'g', 'l', 'o']
+
+
Program B prints
+
+
OUTPUT
+
+
letters is ['d', 'g', 'l', 'o'] and result is None
+
+
sorted(letters) returns a sorted copy of the list
+letters (the original list letters remains
+unchanged), while letters.sort() sorts the list
+letters in-place and does not return anything.
+
+
+
+
+
+
+
+
+
+
Copying (or Not)
+
+
What do these two programs print? In simple terms, explain the
+difference between new = old and
+new = old[:].
+
+
PYTHON
+
+
# Program A
+old = list('gold')
+new = old # simple assignment
+new[0] = 'D'
+print('new is', new, 'and old is', old)
+
+
+
PYTHON
+
+
# Program B
+old = list('gold')
+new = old[:] # assigning a slice
+new[0] = 'D'
+print('new is', new, 'and old is', old)
+
+
+
+
+
+
+
+
+
+
Program A prints
+
+
OUTPUT
+
+
new is ['D', 'o', 'l', 'd'] and old is ['D', 'o', 'l', 'd']
+
+
Program B prints
+
+
OUTPUT
+
+
new is ['D', 'o', 'l', 'd'] and old is ['g', 'o', 'l', 'd']
+
+
new = old makes new a reference to the list
+old; new and old point towards
+the same object.
+
new = old[:] however creates a new list object
+new containing all elements from the list old;
+new and old are different objects.
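One way to see the difference, sketched here with the `is` operator (which tests whether two names refer to the same object); the names `alias` and `duplicate` are illustrative:

```python
old = list('gold')
alias = old          # a second name for the same list object
duplicate = old[:]   # a new list with the same elements

print(alias is old)       # True: same object
print(duplicate is old)   # False: different objects
print(duplicate == old)   # True: equal contents

alias[0] = 'D'            # changes the one shared list
print(old)                # ['D', 'o', 'l', 'd']
print(duplicate)          # ['g', 'o', 'l', 'd']
```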
+
+
+
+
+
+
+
+
+
+
Key Points
+
+
+
A list stores many values in a single structure.
+
Use an item’s index to fetch it from a list.
+
Lists’ values can be replaced by assigning to them.
+
Appending items to a list lengthens it.
+
Use del to remove items from a list entirely.
+
The empty list contains no values.
+
Lists may contain values of different types.
+
Character strings can be indexed like lists.
+
Character strings are immutable.
+
Indexing beyond the end of the collection is an error.
This error can be fixed by removing the extra spaces at the
+beginning of the second line.
+
Loop variables can be called anything.
+
+
+
+
As with all variables, loop variables are:
+
+
Created on demand.
+
Meaningless: their names can be anything at all.
+
+
+
+
+
PYTHON
+
+
for kitten in [2, 3, 5]:
+    print(kitten)
+
+
The body of a loop can contain many statements.
+
+
+
+
But no loop should be more than a few lines long.
+
Hard for human beings to keep larger chunks of code in mind.
+
+
+
PYTHON
+
+
primes = [2, 3, 5]
+for p in primes:
+    squared = p ** 2
+    cubed = p ** 3
+    print(p, squared, cubed)
+
+
+
OUTPUT
+
+
2 4 8
+3 9 27
+5 25 125
+
+
Use range to iterate over a sequence of numbers.
+
+
+
+
The built-in function range
+produces a sequence of numbers.
+
+
+Not a list: the numbers are produced on demand to make
+looping over large ranges more efficient.
+
+
+
+range(N) is the numbers 0..N-1
+
+
Exactly the legal indices of a list or character string of length
+N
+
+
+
+
+
PYTHON
+
+
print('a range is not a list: range(0, 3)')
+for number in range(0, 3):
+    print(number)
+
+
+
OUTPUT
+
+
a range is not a list: range(0, 3)
+0
+1
+2
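Because `range(N)` produces exactly the legal indices of a collection of length `N`, it can be combined with `len` to visit each position. A minimal sketch, with an example list invented for illustration:

```python
letters = ['a', 'b', 'c']  # hypothetical example data

# Converting a range to a list shows the numbers it produces.
indices = list(range(len(letters)))
print(indices)   # [0, 1, 2]

# Each number produced is a valid index into the list.
for i in range(len(letters)):
    print(i, letters[i])
```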
+
+
The Accumulator pattern turns many values into one.
+
+
+
+
A common pattern in programs is to:
+
+
Initialize an accumulator variable to zero, the empty
+string, or the empty list.
+
Update the variable with values from a collection.
+
+
+
+
+
PYTHON
+
+
# Sum the first 10 integers.
+total = 0
+for number in range(10):
+    total = total + (number + 1)
+print(total)
+
+
+
OUTPUT
+
+
55
+
+
+
Read total = total + (number + 1) as:
+
+
Add 1 to the current value of the loop variable
+number.
+
Add that to the current value of the accumulator variable
+total.
+
Assign that to total, replacing the current value.
+
+
+
We have to add number + 1 because range
+produces 0..9, not 1..10.
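An alternative sketch, not part of the example above, gives `range` an explicit start so that no `+ 1` is needed; `range(1, 11)` produces 1..10:

```python
# Sum the first 10 integers using a range that starts at 1.
total = 0
for number in range(1, 11):
    total = total + number
print(total)   # 55
```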
+
+
+
+
+
+
+
Classifying Errors
+
+
Is an indentation error a syntax error or a runtime error?
+
+
+
+
+
+
+
+
+
An IndentationError is a syntax error. Programs with syntax errors
+cannot be started. A program with a runtime error will start but an
+error will be thrown under certain conditions.
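The distinction can be sketched with Python's built-in `compile` function, which checks syntax without running anything. The snippet in `source` and the use of `try`/`except` are assumptions made purely for illustration:

```python
# A loop whose body is not indented: this cannot even be compiled.
source = "for n in [1, 2]:\nprint(n)\n"
try:
    compile(source, '<example>', 'exec')
except SyntaxError as error:
    # IndentationError is a kind of SyntaxError, so it is caught here.
    caught = type(error).__name__
print(caught)   # IndentationError

# A runtime error: the program starts, then fails part-way through.
steps = []
try:
    steps.append('started')
    result = 1 / 0
except ZeroDivisionError:
    steps.append('failed at run time')
print(steps)
```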
+
+
+
+
+
+
+
+
+
+
Tracing Execution
+
+
Create a table showing the numbers of the lines that are executed
+when this program runs, and the values of the variables after each line
+is executed.
+
+
PYTHON
+
+
total = 0
+for char in "tin":
+    total = total + 1
+
+
+
+
+
+
+
+
+
+
+
+
Line no | Variables
+1 | total = 0
+2 | total = 0, char = 't'
+3 | total = 1, char = 't'
+2 | total = 1, char = 'i'
+3 | total = 2, char = 'i'
+2 | total = 2, char = 'n'
+3 | total = 3, char = 'n'
+
+
+
+
+
+
+
+
+
+
+
+
+
Reversing a String
+
+
Fill in the blanks in the program below so that it prints “nit” (the
+reverse of the original character string “tin”).
+
+
PYTHON
+
+
original = "tin"
+result = ____
+for char in original:
+    result = ____
+print(result)
+
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
original = "tin"
+result = ""
+for char in original:
+    result = char + result
+print(result)
+
+
+
+
+
+
+
+
+
+
+
Practice Accumulating
+
+
Fill in the blanks in each of the programs below to produce the
+indicated result.
+
+
PYTHON
+
+
# Total length of the strings in the list: ["red", "green", "blue"] => 12
+total = 0
+for word in ["red", "green", "blue"]:
+    ____ = ____ + len(word)
+print(total)
+
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
total = 0
+for word in ["red", "green", "blue"]:
+    total = total + len(word)
+print(total)
+
+
+
+
+
+
+
+
+
+
+
Practice Accumulating
+(continued)
+
+
+
+
PYTHON
+
+
# List of word lengths: ["red", "green", "blue"] => [3, 5, 4]
+lengths = ____
+for word in ["red", "green", "blue"]:
+    lengths.____(____)
+print(lengths)
+
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
lengths = []
+for word in ["red", "green", "blue"]:
+    lengths.append(len(word))
+print(lengths)
words = ["red", "green", "blue"]
+result = ""
+for word in words:
+    result = result + word
+print(result)
+
+
+
+
+
+
+
+
+
+
+
Practice Accumulating
+(continued)
+
+
+
Create an acronym: Starting from the list
+["red", "green", "blue"], create the acronym
+"RGB" using a for loop.
+
Hint: You may need to use a string method to
+properly format the acronym.
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
acronym = ""
+for word in ["red", "green", "blue"]:
+    acronym = acronym + word[0].upper()
+print(acronym)
+
+
+
+
+
+
+
+
+
+
+
Cumulative Sum
+
+
Reorder and properly indent the lines of code below so that they
+print a list with the cumulative sum of data. The result should be
+[1, 3, 5, 10].
+
+
PYTHON
+
+
cumulative.append(total)
+for number in data:
+cumulative = []
+total = total + number
+total = 0
+print(cumulative)
+data = [1, 2, 2, 5]
+
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
total = 0
+data = [1, 2, 2, 5]
+cumulative = []
+for number in data:
+    total = total + number
+    cumulative.append(total)
+print(cumulative)
+
+
+
+
+
+
+
+
+
+
+
Identifying Variable Name Errors
+
+
+
Read the code below and try to identify what the errors are
+without running it.
+
Run the code and read the error message. What type of
+NameError do you think this is? Is it a string with no
+quotes, a misspelled variable, or a variable that should have been
+defined but was not?
+
Fix the error.
+
Repeat steps 2 and 3, until you have fixed all the errors.
+
+
+
PYTHON
+
+
for number in range(10):
+    # use a if the number is a multiple of 3, otherwise use b
+    if (Number % 3) == 0:
+        message = message + a
+    else:
+        message = message + "b"
+print(message)
+
+
+
+
+
+
+
+
+
+
+
Python variable names are case sensitive: number and
+Number refer to different variables.
+
The variable message needs to be initialized as an
+empty string.
+
We want to add the string "a" to message,
+not the undefined variable a.
+
+
+
PYTHON
+
+
message = ""
+for number in range(10):
+    # use a if the number is a multiple of 3, otherwise use b
+    if (number % 3) == 0:
+        message = message + "a"
+    else:
+        message = message + "b"
+print(message)
+
+
+
+
+
+
+
+
+
+
+
Identifying Item Errors
+
+
+
Read the code below and try to identify what the errors are
+without running it.
+
Run the code, and read the error message. What type of error is
+it?
+
Fix the error.
+
+
+
PYTHON
+
+
seasons = ['Spring', 'Summer', 'Fall', 'Winter']
+print('My favorite season is ', seasons[4])
+
+
+
+
+
+
+
+
+
+
This list has 4 elements and the index to access the last element in
+the list is 3.
+
+
PYTHON
+
+
seasons = ['Spring', 'Summer', 'Fall', 'Winter']
+print('My favorite season is ', seasons[3])
+
+
+
+
+
+
+
+
+
+
+
Key Points
+
+
+
A for loop executes commands once for each value in a
+collection.
+
A for loop is made up of a collection, a loop variable,
+and a body.
+
The first line of the for loop must end with a colon,
+and the body must be indented.
+
Indentation is always meaningful in Python.
+
Loop variables can be called anything (but it is strongly advised to
+give the loop variable a meaningful name).
+
The body of a loop can contain many statements.
+
Use range to iterate over a sequence of numbers.
+
The Accumulator pattern turns many values into one.
Often use conditionals in a loop to “evolve” the values of
+variables.
+
+
+
PYTHON
+
+
velocity = 10.0
+for i in range(5): # execute the loop 5 times
+    print(i, ':', velocity)
+    if velocity > 20.0:
+        print('moving too fast')
+        velocity = velocity - 5.0
+    else:
+        print('moving too slow')
+        velocity = velocity + 10.0
+print('final velocity:', velocity)
+
+
+
OUTPUT
+
+
0 : 10.0
+moving too slow
+1 : 20.0
+moving too slow
+2 : 30.0
+moving too fast
+3 : 25.0
+moving too fast
+4 : 20.0
+moving too slow
+final velocity: 30.0
+
+
Create a table showing variables’ values to trace a program’s
+execution.
+
+
+
+
+
+i | 0 | . | 1 | . | 2 | . | 3 | . | 4 | .
+velocity | 10.0 | 20.0 | . | 30.0 | . | 25.0 | . | 20.0 | . | 30.0
+
+
+
+
+
The program must have a print statement
+outside the body of the loop to show the final value of
+velocity, since its value is updated by the last iteration
+of the loop.
+
+
+
+
+
+
+
Compound Relations Using and,
+or, and Parentheses
+
+
Often, you want some combination of things to be true. You can
+combine relations within a conditional using and and
+or. Continuing the example above, suppose you have
+
+
PYTHON
+
+
mass = [ 3.54, 2.07, 9.22, 1.86, 1.71]
+velocity = [10.00, 20.00, 30.00, 25.00, 20.00]
+
+i = 0
+for i in range(5):
+    if mass[i] > 5 and velocity[i] > 20:
+        print("Fast heavy object. Duck!")
+    elif mass[i] > 2 and mass[i] <= 5 and velocity[i] <= 20:
+        print("Normal traffic")
+    elif mass[i] <= 2 and velocity[i] <= 20:
+        print("Slow light object. Ignore it")
+    else:
+        print("Whoa! Something is up with the data. Check it")
+
+
Just like with arithmetic, you can and should use parentheses
+whenever there is possible ambiguity. A good general rule is to
+always use parentheses when mixing and and
+or in the same condition. That is, instead of:
+
+
PYTHON
+
+
if mass[i] <= 2 or mass[i] >= 5 and velocity[i] > 20:
+
+
write one of these:
+
+
PYTHON
+
+
if (mass[i] <= 2 or mass[i] >= 5) and velocity[i] > 20:
+if mass[i] <= 2 or (mass[i] >= 5 and velocity[i] > 20):
+
+
so it is perfectly clear to a reader (and to Python) what you really
+mean.
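To see why the parentheses matter, here is a small sketch with single values for `mass` and `velocity` (chosen for illustration) where the two groupings give different answers. Without parentheses, Python evaluates `and` before `or`:

```python
mass = 1.5        # hypothetical light object
velocity = 10.0   # hypothetical slow speed

# No parentheses: 'and' binds more tightly than 'or'.
unparenthesized = mass <= 2 or mass >= 5 and velocity > 20

# The two explicit groupings.
grouped_or_first = (mass <= 2 or mass >= 5) and velocity > 20
grouped_and_first = mass <= 2 or (mass >= 5 and velocity > 20)

print(unparenthesized)     # True
print(grouped_or_first)    # False
print(grouped_and_first)   # True
```

The unparenthesized condition matches the second grouping, which may not be what a reader expects.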
Fill in the blanks so that this program creates a new list containing
+zeroes where the original list’s values were negative and ones where the
+original list’s values were positive.
+
+
PYTHON
+
+
original = [-1.5, 0.2, 0.4, 0.0, -1.3, 0.4]
+result = ____
+for value in original:
+    if ____:
+        result.append(0)
+    else:
+        ____
+print(result)
+
+
+
OUTPUT
+
+
[0, 1, 1, 1, 0, 1]
+
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
original = [-1.5, 0.2, 0.4, 0.0, -1.3, 0.4]
+result = []
+for value in original:
+    if value < 0.0:
+        result.append(0)
+    else:
+        result.append(1)
+print(result)
+
+
+
+
+
+
+
+
+
+
+
Processing Small Files
+
+
Modify this program so that it only processes files with fewer than
+50 records.
+
+
PYTHON
+
+
import glob
+import pandas as pd
+for filename in glob.glob('data/*.csv'):
+    contents = pd.read_csv(filename)
+    ____:
+        print(filename, len(contents))
+
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
import glob
+import pandas as pd
+for filename in glob.glob('data/*.csv'):
+    contents = pd.read_csv(filename)
+    if len(contents) < 50:
+        print(filename, len(contents))
+
+
+
+
+
+
+
+
+
+
+
Initializing
+
+
Modify this program so that it finds the largest and smallest values
+in the list no matter what the range of values originally is.
+
+
PYTHON
+
+
values = [...some test data...]
+smallest, largest = None, None
+for v in values:
+    if ____:
+        smallest, largest = v, v
+    ____:
+        smallest = min(____, v)
+        largest = max(____, v)
+print(smallest, largest)
+
+
What are the advantages and disadvantages of using this method to
+find the range of the data?
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
values = [-2, 1, 65, 78, -54, -24, 100]
+smallest, largest = None, None
+for v in values:
+    if smallest is None and largest is None:
+        smallest, largest = v, v
+    else:
+        smallest = min(smallest, v)
+        largest = max(largest, v)
+print(smallest, largest)
+
+
If you wrote == None instead of is None,
+that works too, but Python programmers always write is None
+because of the special way None works in the language.
+
It can be argued that an advantage of using this method would be to
+make the code more readable. However, a disadvantage is that this code
+is not efficient because within each iteration of the for
+loop statement, there are two more loops that run over two numbers each
+(the min and max functions). It would be more
+efficient to iterate over each number just once:
+
+
PYTHON
+
+
values = [-2, 1, 65, 78, -54, -24, 100]
+smallest, largest = None, None
+for v in values:
+    if smallest is None or v < smallest:
+        smallest = v
+    if largest is None or v > largest:
+        largest = v
+print(smallest, largest)
+
+
Now we have one loop, but four comparison tests. There are two ways
+we could improve it further: either use fewer comparisons in each
+iteration, or use two loops that each contain only one comparison test.
+The simplest solution is often the best:
Use glob.glob
+to find sets of files whose names match a pattern.
+
+
+
+
In Unix, the term “globbing” means “matching a set of files with a
+pattern”.
+
The most common patterns are:
+
+
+* meaning “match zero or more characters”
+
+? meaning “match exactly one character”
+
+
+
Python’s standard library contains the glob
+module to provide pattern matching functionality
+
The glob
+module contains a function also called glob to match file
+patterns
+
E.g., glob.glob('*.txt') matches all files in the
+current directory whose names end with .txt.
+
Result is a (possibly empty) list of character strings.
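The two wildcards can be tried safely in a throwaway directory. This sketch uses the standard `tempfile` and `os` modules, and the file names are invented for illustration:

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Create a few empty-ish files to match against.
    for name in ['a1.txt', 'a22.txt', 'b1.txt', 'notes.csv']:
        with open(os.path.join(folder, name), 'w') as handle:
            handle.write('example\n')

    # '*' matches zero or more characters.
    star = sorted(os.path.basename(path)
                  for path in glob.glob(os.path.join(folder, '*.txt')))
    # '?' matches exactly one character, so 'a22.txt' is excluded.
    question = sorted(os.path.basename(path)
                      for path in glob.glob(os.path.join(folder, 'a?.txt')))
    # A pattern with no matches gives an empty list, not an error.
    nothing = glob.glob(os.path.join(folder, '*.pdb'))

print(star)       # ['a1.txt', 'a22.txt', 'b1.txt']
print(question)   # ['a1.txt']
print(nothing)    # []
```

The results are sorted here because `glob.glob` does not promise any particular order.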
+
+
+
PYTHON
+
+
import glob
+print('all csv files in data directory:', glob.glob('data/*.csv'))
+
+
+
OUTPUT
+
+
all csv files in data directory: ['data/gapminder_all.csv', 'data/gapminder_gdp_africa.csv', \
+'data/gapminder_gdp_americas.csv', 'data/gapminder_gdp_asia.csv', 'data/gapminder_gdp_europe.csv', \
+'data/gapminder_gdp_oceania.csv']
+
+
+
PYTHON
+
+
print('all PDB files:', glob.glob('*.pdb'))
+
+
+
OUTPUT
+
+
all PDB files: []
+
+
Use glob and for to process batches of
+files.
+
+
+
+
Helps a lot if the files are named and stored systematically and
+consistently so that simple patterns will find the right data.
+
+
+
PYTHON
+
+
import pandas as pd
+for filename in glob.glob('data/gapminder_*.csv'):
+    data = pd.read_csv(filename)
+    print(filename, data['gdpPercap_1952'].min())
You might have chosen to initialize the fewest variable
+with a number greater than the numbers you’re dealing with, but that
+could lead to trouble if you reuse the code with bigger numbers. Python
+lets you use positive infinity, which will work no matter how big your
+numbers are. What other special strings does the float
+function recognize?
+
+
+
+
+
+
+
+
+
+
Comparing Data
+
+
Write a program that reads in the regional data sets and plots the
+average GDP per capita for each region over time in a single chart.
+Pandas will raise an error if it encounters non-numeric columns in a
+dataframe computation so you may need to either filter out those columns
+or tell pandas to ignore them.
+
+
+
+
+
+
+
+
+
This solution builds a useful legend by using the string
+split method to extract the region from
+the path ‘data/gapminder_gdp_a_specific_region.csv’.
+
+
PYTHON
+
+
import glob
+import pandas as pd
+import matplotlib.pyplot as plt
+fig, ax = plt.subplots(1, 1)
+for filename in glob.glob('data/gapminder_gdp*.csv'):
+    dataframe = pd.read_csv(filename)
+    # extract <region> from the filename, expected to be in the format 'data/gapminder_gdp_<region>.csv'.
+    # we will split the string using the split method and `_` as our separator,
+    # retrieve the last string in the list that split returns (`<region>.csv`),
+    # and then remove the `.csv` extension from that string.
+    # NOTE: the pathlib module covered in the next callout also offers
+    # convenient abstractions for working with filesystem paths and could solve this as well:
+    # from pathlib import Path
+    # region = Path(filename).stem.split('_')[-1]
+    region = filename.split('_')[-1][:-4]
+    # pandas raises errors when it encounters non-numeric columns in a dataframe computation
+    # but we can tell pandas to ignore them with the `numeric_only` parameter
+    dataframe.mean(numeric_only=True).plot(ax=ax, label=region)
+    # NOTE: another way of doing this selects just the columns with gdp in their name using the filter method
+    # dataframe.filter(like="gdp").mean().plot(ax=ax, label=region)
+
+plt.legend()
+plt.show()
+
+
+
+
+
+
+
+
+
+
+
Dealing with File Paths
+
+
The pathlib
+module provides useful abstractions for file and path manipulation
+like returning the name of a file without the file extension. This is
+very useful when looping over files and directories. In the example
+below, we create a Path object and inspect its
+attributes.
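A minimal sketch of such an inspection follows. The path is only an example string (the file does not need to exist for these attributes to work), and the variable names are illustrative:

```python
from pathlib import Path

# Build a Path object from a relative path string.
p = Path('data/gapminder_gdp_europe.csv')

print(p.parent)   # data
print(p.name)     # gapminder_gdp_europe.csv
print(p.stem)     # gapminder_gdp_europe
print(p.suffix)   # .csv

# The stem is an ordinary string, so string methods apply to it.
region = p.stem.split('_')[-1]
print(region)     # europe
```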
A common refrain in software engineering is “Don’t Repeat Yourself”.
+How do the techniques we’ve learned in the last lessons help us avoid
+repeating ourselves? Note that in practice this principle has some
+nuance and should be balanced against doing the simplest thing that
+could possibly work.
+
+
What are the pros / cons of making a variable global or local to a
+function?
+
When would you consider turning a block of code into a function
+definition?
Explain and identify the difference between function definition and
+function call.
+
Write a function that takes a small, fixed number of arguments and
+produces a single result.
+
+
+
+
+
+
+
Break programs down into functions to make them easier to
+understand.
+
+
+
+
Human beings can only keep a few items in working memory at a
+time.
+
Understand larger/more complicated ideas by understanding and
+combining pieces.
+
+
Components in a machine.
+
Lemmas when proving theorems.
+
+
+
Functions serve the same purpose in programs.
+
+
+Encapsulate complexity so that we can treat it as a single
+“thing”.
+
+
+
Also enables re-use.
+
+
Write one time, use many times.
+
+
+
Define a function using def with a name, parameters,
+and a block of code.
+
+
+
+
Begin the definition of a new function with def.
+
Followed by the name of the function.
+
+
Must obey the same rules as variable names.
+
+
+
Then parameters in parentheses.
+
+
Empty parentheses if the function doesn’t take any inputs.
+
We will discuss this in detail in a moment.
+
+
+
Then a colon.
+
Then an indented block of code.
+
+
+
PYTHON
+
+
def print_greeting():
+    print('Hello!')
+    print('The weather is nice today.')
+    print('Right?')
+
+
Defining a function does not run it.
+
+
+
+
Defining a function does not run it.
+
+
Like assigning a value to a variable.
+
+
+
Must call the function to execute the code it contains.
+
+
+
PYTHON
+
+
print_greeting()
+
+
+
OUTPUT
+
+
Hello!
+The weather is nice today.
+Right?
+
+
Arguments in a function call are matched to its defined
+parameters.
+
+
+
+
Functions are most useful when they can operate on different
+data.
+
Specify parameters when defining a function.
+
+
These become variables when the function is executed.
+
Are assigned the arguments in the call (i.e., the values passed to
+the function).
+
If you don’t name the arguments when using them in the call, the
+arguments will be matched to parameters in the order the parameters are
+defined in the function.
Or, we can name the arguments when we call the function, which allows
+us to specify them in any order and adds clarity to the call site;
+otherwise as one is reading the code they might forget if the second
+argument is the month or the day for example.
+
+
PYTHON
+
+
print_date(month=3, day=19, year=1871)
+
+
+
OUTPUT
+
+
1871/3/19
+
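The definition of `print_date` is not shown in this part of the lesson. A minimal sketch that is consistent with the output above, assuming parameters named `year`, `month`, and `day`, might look like this:

```python
def print_date(year, month, day):
    # Assumed definition: join the three parts with slashes and print them.
    joined = str(year) + '/' + str(month) + '/' + str(day)
    print(joined)

# Positional arguments are matched to parameters in order.
print_date(1871, 3, 19)

# Named arguments can be given in any order.
print_date(month=3, day=19, year=1871)
```

Both calls print `1871/3/19`.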
+
+
Via Twitter:
+() contains the ingredients for the function while the body
+contains the recipe.
+
Functions may return a result to their caller using
+return.
+
+
+
+
Use return ... to give a value back to the caller.
+
May occur anywhere in the function.
+
But functions are easier to understand if return
+occurs:
+
A function that doesn’t explicitly return a value
+automatically returns None.
+
+
+
PYTHON
+
+
result = print_date(1871, 3, 19)
+print('result of call is:', result)
+
+
+
OUTPUT
+
+
1871/3/19
+result of call is: None
+
+
+
+
+
+
+
Identifying Syntax Errors
+
+
+
Read the code below and try to identify what the errors are
+without running it.
+
Run the code and read the error message. Is it a
+SyntaxError or an IndentationError?
+
Fix the error.
+
Repeat steps 2 and 3 until you have fixed all the errors.
+
+
+
PYTHON
+
+
def another_function
+print("Syntax errors are annoying.")
+print("But at least python tells us about them!")
+print("So they are usually not too hard to fix.")
+
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
def another_function():
+    print("Syntax errors are annoying.")
+    print("But at least Python tells us about them!")
+    print("So they are usually not too hard to fix.")
A function call always needs parentheses; otherwise you get the
+function object itself (displayed with its memory address) rather than
+the result of calling it. So, if we wanted to call the function named
+report, and give it the value 22.5 to report on, we could write the
+function call as report(22.5).
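As a minimal sketch of the difference, assume a hypothetical report function. Its body is invented here for illustration; the lesson only gives the function's name and the value 22.5.

```python
def report(pressure):
    # Hypothetical body: the lesson does not show how report is defined.
    return 'pressure is ' + str(pressure)

without_call = report       # no parentheses: this is the function object itself
with_call = report(22.5)    # parentheses: the function runs and returns a value

print(callable(without_call))
print(with_call)
```

Printing without_call directly would show something like a function object with a memory address, which is usually a sign that the parentheses were forgotten.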
After fixing the problem above, explain why running this example
+code:
+
+
+
PYTHON
+
+
result = print_time(11, 37, 59)
+print('result of call is:', result)
+
+
gives this output:
+
+
OUTPUT
+
+
11:37:59
+result of call is: None
+
+
+
Why is the result of the call None?
+
+
+
+
+
+
+
+
+
+
+
The problem with the example is that the function
+print_time() is defined after the call to the
+function is made. Python doesn’t know how to resolve the name
+print_time since it hasn’t been defined yet and will raise
+a NameError e.g.,
+NameError: name 'print_time' is not defined
+
The first line of output 11:37:59 is printed by the
+first line of code, result = print_time(11, 37, 59) that
+binds the value returned by invoking print_time to the
+variable result. The second line is from the second print
+call to print the contents of the result variable.
+
print_time() does not explicitly return
+a value, so it automatically returns None.
+
+
+
+
+
+
+
+
+
+
+
Encapsulation
+
+
Fill in the blanks to create a function that takes a single filename
+as an argument, loads the data in the file named by the argument, and
+returns the minimum value in that data.
+
+
PYTHON
+
+
import pandas as pd
+
+def min_in_data(____):
+    data = ____
+    return ____
+
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
import pandas as pd
+
+def min_in_data(filename):
+    data = pd.read_csv(filename)
+    return data.min()
+
+
+
+
+
+
+
+
+
+
+
Find the First
+
+
Fill in the blanks to create a function that takes a list of numbers
+as an argument and returns the first negative value in the list. What
+does your function do if the list is empty? What if the list has no
+negative numbers?
+
+
PYTHON
+
+
def first_negative(values):
+    for v in ____:
+        if ____:
+            return ____
+
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
def first_negative(values):
+    for v in values:
+        if v < 0:
+            return v
+
+
If an empty list or a list with no negative values is passed to this
+function, it returns None:
+
+
PYTHON
+
+
my_list = []
+print(first_negative(my_list))
+
+
+
OUTPUT
+
+
None
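The same behavior can be sketched for a non-empty list without negative values. The sample lists and the variable names no_negatives and some_negative are made up for illustration; the check with is None is one common way for a caller to detect that nothing was found.

```python
def first_negative(values):
    for v in values:
        if v < 0:
            return v
    # Falling off the end of the function returns None implicitly.

no_negatives = first_negative([3, 0, 7])
some_negative = first_negative([3, -2, 7, -5])

if no_negatives is None:
    print('no negative value found')
print(some_negative)
```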
+
+
+
+
+
+
+
+
+
+
+
Calling by Name
+
+
Earlier we saw this function:
+
+
PYTHON
+
+
def print_date(year, month, day):
+    joined = str(year) + '/' + str(month) + '/' + str(day)
+    print(joined)
+
+
We saw that we can call the function using named arguments,
+like this:
+
+
PYTHON
+
+
print_date(day=1, month=2, year=2003)
+
+
+
What does print_date(day=1, month=2, year=2003)
+print?
+
When have you seen a function call like this before?
+
When and why is it useful to call functions this way?
+
+
+
+
+
+
+
+
+
+
+
2003/2/1
+
We saw examples of using named arguments when working with
+the pandas library. For example, when reading in a dataset using
+data = pd.read_csv('data/gapminder_gdp_europe.csv', index_col='country'),
+the last argument index_col is a named argument.
+
Using named arguments can make code more readable since one can see
+from the function call what name the different arguments have inside the
+function. It can also reduce the chances of passing arguments in the
+wrong order, since by using named arguments the order doesn’t
+matter.
+
+
+
+
+
+
+
+
+
+
+
Encapsulation of an If/Print Block
+
+
The code below will run on a label-printer for chicken eggs. A
+digital scale will report a chicken egg mass (in grams) to the computer
+and then the computer will print a label.
+
+
PYTHON
+
+
import random
+for i in range(10):
+
+    # simulating the mass of a chicken egg
+    # the (random) mass will be 70 +/- 20 grams
+    mass = 70 + 20.0 * (2.0 * random.random() - 1.0)
+
+    print(mass)
+
+    # egg sizing machinery prints a label
+    if mass >= 85:
+        print("jumbo")
+    elif mass >= 70:
+        print("large")
+    elif mass < 70 and mass >= 55:
+        print("medium")
+    else:
+        print("small")
+
+
The if-block that classifies the eggs might be useful in other
+situations, so to avoid repeating it, we could fold it into a function,
+get_egg_label(). Revising the program to use the function
+would give us this:
+
+
PYTHON
+
+
# revised version
+import random
+for i in range(10):
+
+    # simulating the mass of a chicken egg
+    # the (random) mass will be 70 +/- 20 grams
+    mass = 70 + 20.0 * (2.0 * random.random() - 1.0)
+
+    print(mass, get_egg_label(mass))
+
+
+
Create a function definition for get_egg_label() that
+will work with the revised program above. Note that the
+get_egg_label() function’s return value will be important.
+Sample output from the above program would be
+71.23 large.
+
A dirty egg might have a mass of more than 90 grams, and a spoiled
+or broken egg will probably have a mass that’s less than 50 grams.
+Modify your get_egg_label() function to account for these
+error conditions. Sample output could be
+25 too light, probably spoiled.
+
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
def get_egg_label(mass):
+    # egg sizing machinery prints a label
+    egg_label = "Unlabelled"
+    if mass >= 90:
+        egg_label = "warning: egg might be dirty"
+    elif mass >= 85:
+        egg_label = "jumbo"
+    elif mass >= 70:
+        egg_label = "large"
+    elif mass < 70 and mass >= 55:
+        egg_label = "medium"
+    elif mass < 50:
+        egg_label = "too light, probably spoiled"
+    else:
+        egg_label = "small"
+    return egg_label
How would you generalize this function if you did not know
+beforehand which specific years occurred as columns in the data? For
+instance, what if we also had data from years ending in 1 and 9 for each
+decade? (Hint: use the columns to filter out the ones that correspond to
+the decade, instead of enumerating them in the code.)
+
+
+
+
+
+
+
+
+
+
+
The average GDP for Japan across the years reported for the 1980s is
+computed with:
To obtain the average for the relevant years, we need to loop over
+them:
+
+
+
PYTHON
+
+
import pandas as pd
+
+def avg_gdp_in_decade(country, continent, year):
+    data_countries = pd.read_csv('data/gapminder_gdp_' + continent + '.csv', index_col=0)
+    c = data_countries.loc[country]
+    gdp_decade = 'gdpPercap_' + str(year // 10)
+    total = 0.0
+    num_years = 0
+    for yr_header in c.index:  # c's index contains reported years
+        if yr_header.startswith(gdp_decade):
+            total = total + c.loc[yr_header]
+            num_years = num_years + 1
+    return total / num_years
+
+
The function can now be called by:
+
+
PYTHON
+
+
avg_gdp_in_decade('Japan','asia',1983)
+
+
+
OUTPUT
+
+
20880.023800000003
+
+
+
+
+
+
+
+
+
+
+
Simulating a dynamical system
+
+
In mathematics, a dynamical
+system is a system in which a function describes the time dependence
+of a point in a geometrical space. A canonical example of a dynamical
+system is the logistic map, a
+growth model that computes a new population density (between 0 and 1)
+based on the current density. In the model, time takes discrete values
+0, 1, 2, …
+
+
Define a function called logistic_map that takes two
+inputs: x, representing the current population (at time
+t), and a parameter r = 1. This function
+should return a value representing the state of the system (population)
+at time t + 1, using the mapping function:
+
+
f(t+1) = r * f(t) * [1 - f(t)]
+
+
Using a for or while loop, iterate the
+logistic_map function defined in part 1, starting from an
+initial population of 0.5, for a period of time
+t_final = 10. Store the intermediate results in a list so
+that after the loop terminates you have accumulated a sequence of values
+representing the state of the logistic map at times
+t = [0,1,...,t_final] (11 values in total). Print this list
+to see the evolution of the population.
+
Encapsulate the logic of your loop into a function called
+iterate that takes the initial population as its first
+input, the parameter t_final as its second input and the
+parameter r as its third input. The function should return
+the list of values representing the state of the logistic map at times
+t = [0,1,...,t_final]. Run this function for periods
+t_final = 100 and 1000 and print some of the
+values. Is the population trending toward a steady state?
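One possible way to approach the first three parts can be sketched as follows. This is only a sketch under the exercise's stated assumptions (initial population 0.5, r = 1, t_final = 10); the docstrings and the names populations and trajectory are illustrative choices, not part of the exercise.

```python
def logistic_map(x, r):
    """Return the population at time t + 1, given the population x at time t."""
    return r * x * (1 - x)

def iterate(initial_population, t_final, r):
    """Return the populations at times 0, 1, ..., t_final as a list."""
    populations = [initial_population]
    for t in range(t_final):
        # Each new state depends only on the most recent one.
        populations.append(logistic_map(populations[-1], r))
    return populations

trajectory = iterate(0.5, 10, 1)
print(trajectory)
```

Keeping logistic_map separate from iterate means the loop can be reused with a different mapping function or a different value of r without being rewritten.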
Functions will often contain conditionals. Here is a short example
+that will indicate which quartile the argument is in based on hand-coded
+values for the quartile cut points.
+
+
PYTHON
+
+
def calculate_life_quartile(exp):
+    if exp < 58.41:
+        # This observation is in the first quartile
+        return 1
+    elif exp >= 58.41 and exp < 67.05:
+        # This observation is in the second quartile
+        return 2
+    elif exp >= 67.05 and exp < 71.70:
+        # This observation is in the third quartile
+        return 3
+    elif exp >= 71.70:
+        # This observation is in the fourth quartile
+        return 4
+    else:
+        # This observation has bad data
+        return None
+
+calculate_life_quartile(62.5)
+
+
+
OUTPUT
+
+
2
+
+
That function would typically be used within a for loop,
+but Pandas has a different, more efficient way of doing the same thing,
+and that is by applying a function to a dataframe or a portion
+of a dataframe. Here is an example, using the definition above.
+
+
PYTHON
+
+
data = pd.read_csv('data/gapminder_all.csv')
+data['life_qrtl'] = data['lifeExp_1952'].apply(calculate_life_quartile)
+
+
There is a lot in that second line, so let’s take it piece by piece.
+On the right side of the = we start with
+data['lifeExp_1952'], which is the column in the dataframe
+called data labeled lifeExp_1952. We use
+apply() to do what it says: apply
+calculate_life_quartile to the value of this column for
+every row in the dataframe.
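To see apply() at work without the data file, here is a small self-contained sketch. The four life-expectancy values and the row labels are made up for illustration (they are not taken from gapminder_all.csv), and the function is a shortened equivalent of the one above for valid numeric input.

```python
import pandas as pd

def calculate_life_quartile(exp):
    # Same hand-coded cut points as in the lesson, in a shortened form.
    if exp < 58.41:
        return 1
    elif exp < 67.05:
        return 2
    elif exp < 71.70:
        return 3
    else:
        return 4

# Made-up values, one per quartile, standing in for the real column.
data = pd.DataFrame(
    {'lifeExp_1952': [28.8, 62.5, 69.1, 72.1]},
    index=['A', 'B', 'C', 'D'],
)

# apply() calls the function once per value and collects the results in a new column.
data['life_qrtl'] = data['lifeExp_1952'].apply(calculate_life_quartile)
print(data)
```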
+
+
+
+
+
+
+
+
+
Key Points
+
+
+
Break programs down into functions to make them easier to
+understand.
+
Define a function using def with a name, parameters,
+and a block of code.
+
Defining a function does not run it.
+
Arguments in a function call are matched to its defined
+parameters.
+
Functions may return a result to their caller using
+return.
Read a traceback and determine the file, function, and line number
+on which the error occurred, the type of error, and the error
+message.
+
+
+
+
+
+
+
The scope of a variable is the part of a program that can ‘see’ that
+variable.
+
+
+
+
There are only so many sensible names for variables.
+
People using functions shouldn’t have to worry about what variable
+names the author of the function used.
+
People writing functions shouldn’t have to worry about what variable
+names the function’s caller uses.
+
The part of a program in which a variable is visible is called its
+scope.
+
+
+
PYTHON
+
+
pressure = 103.9
+
+def adjust(t):
+    temperature = t * 1.43 / pressure
+    return temperature
+
+
+
+pressure is a global variable.
+
+
Defined outside any particular function.
+
Visible everywhere.
+
+
+
+t and temperature are local
+variables in adjust.
+
+
Defined in the function.
+
Not visible in the main program.
+
Remember: a function parameter is a variable that is automatically
+assigned a value when the function is called.
+
+
+
+
+
PYTHON
+
+
print('adjusted:', adjust(0.9))
+print('temperature after call:', temperature)
+
+
+
OUTPUT
+
+
adjusted: 0.01238691049085659
+
+
+
ERROR
+
+
Traceback (most recent call last):
+ File "/Users/swcarpentry/foo.py", line 8, in <module>
+ print('temperature after call:', temperature)
+NameError: name 'temperature' is not defined
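The same scoping rule can be observed without stopping the program by catching the error. This is a minimal sketch; the try/except block and the names adjusted and message are additions for illustration and are not part of the lesson's example.

```python
pressure = 103.9  # global: defined outside any function

def adjust(t):
    temperature = t * 1.43 / pressure  # local: exists only while adjust runs
    return temperature

adjusted = adjust(0.9)

try:
    temperature  # not visible here, so Python raises NameError
except NameError as error:
    message = str(error)

print('adjusted:', adjusted)
print('error:', message)
```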
+
+
+
+
+
+
+
Local and Global Variable Use
+
+
Trace the values of all variables in this program as it is executed.
+(Use ‘—’ as the value of variables before and after they exist.)
Read the traceback below, and identify the following:
+
+
How many levels does the traceback have?
+
What is the file name where the error occurred?
+
What is the function name where the error occurred?
+
On which line number in this function did the error occur?
+
What is the type of error?
+
What is the error message?
+
+
+
ERROR
+
+
---------------------------------------------------------------------------
+KeyError Traceback (most recent call last)
+<ipython-input-2-e4c4cbafeeb5> in <module>()
+ 1 import errors_02
+----> 2 errors_02.print_friday_message()
+
+/Users/ghopper/thesis/code/errors_02.py in print_friday_message()
+ 13
+ 14 def print_friday_message():
+---> 15 print_message("Friday")
+
+/Users/ghopper/thesis/code/errors_02.py in print_message(day)
+ 9 "sunday": "Aw, the weekend is almost over."
+ 10 }
+---> 11 print(messages[day])
+ 12
+ 13
+
+KeyError: 'Friday'
+
+
+
+
+
+
+
+
+
+
+
Three levels.
+
errors_02.py
+
print_message
+
Line 11
+
+KeyError. These errors occur when we are trying to look
+up a key that does not exist (usually in a data structure such as a
+dictionary). We can find more information about the
+KeyError and other built-in exceptions in the Python
+docs.
+
KeyError: 'Friday'
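A likely cause can be sketched as follows, assuming the dictionary keys are all lowercase as the visible "sunday" entry suggests. The "friday" entry and its message are hypothetical, since the traceback only shows part of errors_02.py.

```python
messages = {
    "friday": "Almost the weekend.",  # hypothetical entry, not shown in the traceback
    "sunday": "Aw, the weekend is almost over.",
}

def print_message(day):
    print(messages[day])

try:
    print_message("Friday")  # capitalized key does not exist in the dictionary
except KeyError as error:
    missing_key = error.args[0]

print('missing key:', missing_key)
print_message("Friday".lower())  # normalizing the case finds the entry
```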
+
+
+
+
+
+
+
+
+
+
+
Key Points
+
+
+
The scope of a variable is the part of a program that can ‘see’ that
+variable.
Provide sound justifications for basic rules of coding style.
+
Refactor one-page programs to make them more readable and justify
+the changes.
+
Use Python community coding standards (PEP-8).
+
+
+
+
+
+
+
Coding style
+
+
+
A consistent coding style helps others (including our future selves)
+read and understand code more easily. Code is read much more often than
+it is written, and as the Zen of Python
+states, “Readability counts”. Python proposed a standard style through
+one of its first Python Enhancement Proposals (PEP), PEP8.
+
Some points worth highlighting:
+
+
document your code and ensure that assumptions, internal algorithms,
+expected inputs, expected outputs, etc., are clear
+
use clear, semantically meaningful variable names
+
use spaces, not tabs, to indent lines (tabs can cause
+problems across different text editors, operating systems, and version
+control systems)
+
Follow standard Python style in your code.
+
+
+
+
+PEP8: a style
+guide for Python that discusses topics such as how to name variables,
+how to indent your code, how to structure your import
+statements, etc. Adhering to PEP8 makes it easier for other Python
+developers to read and understand your code, and to understand what
+their contributions should look like.
+
To check your code for compliance with PEP8, you can use the pycodestyle application.
+Tools like the black code
+formatter can automatically format your code to conform to PEP8 and
+pycodestyle (a Jupyter notebook formatter, nb_black, also exists).
+
Some groups and organizations follow different style guidelines
+besides PEP8. For example, the Google style
+guide on Python makes slightly different recommendations. Google
+wrote an application called yapf that can help you format your code in
+either their style or PEP8.
+
With respect to coding style, the key is consistency.
+Choose a style for your project, be it PEP8, the Google style, or
+something else, and do your best to ensure that you and anyone else you
+are collaborating with stick to it. Consistency within a project is
+often more impactful than the particular style used. A consistent style
+will make your software easier to read and understand for others and for
+your future self.
+
Use assertions to check for internal errors.
+
+
+
Assertions are a simple but powerful method for making sure that the
+context in which your code is executing is as you expect.
+
+
PYTHON
+
+
def calc_bulk_density(mass, volume):
+    '''Return dry bulk density = powder mass / powder volume.'''
+    assert volume > 0
+    return mass / volume
+
+
If the assertion is False, the Python interpreter raises
+an AssertionError runtime exception. The source code for
+the expression that failed will be displayed as part of the error
+message. To ignore assertions in your code run the interpreter with the
+‘-O’ (optimize) switch. Assertions should contain only simple checks and
+never change the state of the program. For example, an assertion should
+never contain an assignment.
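A minimal sketch of a failing assertion, assuming the interpreter is run without the -O switch. The message string after the comma and the names density and failure are additions for illustration; the lesson's version of the function has no message.

```python
def calc_bulk_density(mass, volume):
    '''Return dry bulk density = powder mass / powder volume.'''
    # The optional text after the comma becomes the AssertionError message.
    assert volume > 0, 'volume must be positive'
    return mass / volume

density = calc_bulk_density(2.5, 2.0)

try:
    calc_bulk_density(2.5, 0)
except AssertionError as error:
    failure = str(error)

print(density)
print('assertion failed:', failure)
```

With assertions disabled, the second call would instead reach the division and raise ZeroDivisionError, which is why assertions are a check for internal errors and not a substitute for input validation.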
+
Use docstrings to provide builtin help.
+
+
+
If the first thing in a function is a character string that is not
+assigned directly to a variable, Python attaches it to the function,
+accessible via the builtin help function. This string that provides
+documentation is also known as a docstring.
+
+
PYTHON
+
+
def average(values):
+    "Return average of values, or None if no values are supplied."
+
+    if len(values) == 0:
+        return None
+    return sum(values) / len(values)
+
+help(average)
+
+
+
OUTPUT
+
+
Help on function average in module __main__:
+
+average(values)
+ Return average of values, or None if no values are supplied.
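The attached string can also be inspected directly: Python stores it in the function's __doc__ attribute, which is what help reads. A brief sketch using the same function (the variable name doc_text is an illustrative choice):

```python
def average(values):
    "Return average of values, or None if no values are supplied."
    if len(values) == 0:
        return None
    return sum(values) / len(values)

doc_text = average.__doc__  # the docstring attached to the function
print(doc_text)
print(average([1, 2, 3]))
```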
+
+
+
+
+
+
+
Multiline Strings
+
+
Multiline strings are often used for documentation. These start
+with three quote characters (either single or double) and end
+with three matching characters.
+
+
PYTHON
+
+
"""This string spans
+multiple lines.
+
+Blank lines are allowed."""
+
+
+
+
+
+
+
+
+
+
What Will Be Shown?
+
+
Highlight the lines in the code below that will be available as
+online help. Are there lines that should be made available, but won’t
+be? Will any lines produce a syntax error or a runtime error?
+
+
PYTHON
+
+
"Find maximum edit distance between multiple sequences."
+# This finds the maximum distance between all sequences.
+
+def overall_max(sequences):
+    '''Determine overall maximum edit distance.'''
+
+    highest = 0
+    for left in sequences:
+        for right in sequences:
+            '''Avoid checking sequence against itself.'''
+            if left != right:
+                this = edit_distance(left, right)
+                highest = max(highest, this)
+
+    # Report.
+    return highest
+
+
+
+
+
+
+
+
+
+
Document This
+
+
Use comments to describe and help others understand potentially
+unintuitive sections or individual lines of code. They are especially
+useful to whoever may need to understand and edit your code in the
+future, including yourself.
+
Use docstrings to document the acceptable inputs and expected outputs
+of a method or class, its purpose, assumptions and intended behavior.
+Docstrings are displayed when a user invokes the builtin
+help method on your method or class.
+
Turn the comment in the following function into a docstring and check
+that help displays it properly.
+
+
PYTHON
+
+
def middle(a, b, c):
+    # Return the middle value of three.
+    # Assumes the values can actually be compared.
+    values = [a, b, c]
+    values.sort()
+    return values[1]
+
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
def middle(a, b, c):
+    '''Return the middle value of three.
+    Assumes the values can actually be compared.'''
+    values = [a, b, c]
+    values.sort()
+    return values[1]
+
+
+
+
+
+
+
+
+
+
+
Clean Up This Code
+
+
+
Read this short program and try to predict what it does.
+
Run it: how accurate was your prediction?
+
Refactor the program to make it more readable. Remember to run it
+after each change to ensure its behavior hasn’t changed.
+
Compare your rewrite with your neighbor’s. What did you do the same?
+What did you do differently, and why?
+
+
+
PYTHON
+
+
n = 10
+s = 'et cetera'
+print(s)
+i = 0
+while i < n:
+    # print('at', j)
+    new = ''
+    for j in range(len(s)):
+        left = j-1
+        right = (j+1)%len(s)
+        if s[left]==s[right]: new = new + '-'
+        else: new = new + '*'
+    s=''.join(new)
+    print(s)
+    i += 1
+
+
+
+
+
+
+
+
+
+
Here’s one solution.
+
+
PYTHON
+
+
def string_machine(input_string, iterations):
+    """
+    Takes input_string and generates a new string with -'s and *'s
+    corresponding to characters that have identical adjacent characters
+    or not, respectively. Iterates through this procedure with the resultant
+    strings for the supplied number of iterations.
+    """
+    print(input_string)
+    input_string_length = len(input_string)
+    old = input_string
+    for i in range(iterations):
+        new = ''
+        # iterate through characters in previous string
+        for j in range(input_string_length):
+            left = j-1
+            right = (j+1) % input_string_length  # ensure right index wraps around
+            if old[left] == old[right]:
+                new = new + '-'
+            else:
+                new = new + '*'
+        print(new)
+        # store new string as old
+        old = new
+
+string_machine('et cetera', 10)
Name and locate scientific Python community sites for software,
+workshops, and help.
+
+
+
+
+
+
+
Leslie Lamport once said, “Writing is nature’s way of showing you how
+sloppy your thinking is.” The same is true of programming: many things
+that seem obvious when we’re thinking about them turn out to be anything
+but when we have to explain them precisely.
+
Python supports a large and diverse community across academia and
+industry.
+
We are filling in the exercises below in order to make the lesson plan
+more concrete. Contributions (both in the form of pull requests with
+filled-in exercises, and comments on specific exercises, ordering, and
+timings) are greatly appreciated.
+
+
+
+
Process Used
+
+
Michael Pollan’s advice if he taught R or Python programming:
This lesson was developed using a slimmed-down variant of the
+“Understanding by Design” process. The main sections are:
+
Assumptions about audience, time, etc. (The current draft also
+includes some conclusions and decisions in this section - that should be
+refactored.)
+
Desired results: overall goals, summative assessments at half-day
+granularity, what learners will be able to do, what learners will
+know.
+
Learning plan: each episode has a heading that summarizes what
+will be covered, then estimates time that will be spent on teaching and
+on exercises, while the exercises are given as bullet points.
+
Stage 1: Assumptions
+
Audience
+
Graduate students in numerate disciplines from cosmology to
+archaeology
+
Who have manipulated data in spreadsheets and with interactive tools
+like SAS
+
But have not programmed beyond CPD
+(copy-paste-despair)
+
+
Constraints
+
One full day 09:00-16:30
+
06:15 class time
+
0:45 lunch
+
0:30 total for two coffee breaks
+
+
Learners use native installs on their own machines
+
May use VMs or cloud resources at instructor’s discretion
+
But must keep native local install as an option
+
+
No dependence on other Carpentry modules
+
In particular, does not require knowledge of shell or version
+control
+
+
Use the Jupyter Notebook
+
Authentic tool used by many instructors
+
There isn’t really an alternative
+
And means that even people who have seen a bit of Python before will
+probably learn something
+
+
+
Motivating Example
+
Creating 2D plots suitable for inclusion in papers
+
Appeals to almost everyone
+
Makes lesson usable by both Carpentries
+
And means that even people who have seen a bit of Python before will
+probably learn something
+
+
+
Data
+
Use the gapminder data throughout
+
But break into multiple files by continent
+
To make display of output from examples tidier (e.g., use
+Australia/New Zealand, which is only two lines)
+
And allow examples showing use of multiple data sets
+
+
+
Focus on Pandas instead of NumPy
+
Makes lesson usable by both Data Carpentry and Software
+Carpentry
+
Genuine novices are likely to want data analysis
+
And people with some prior experience:
+
will accept data analysis as an authentic task,
+
and are unlikely to have encountered Pandas, so they’ll still get
+something useful out of the lesson
+
+
+
Challenges will mostly not be “write this code from
+scratch”
+
Want lots of short exercises that can reliably be finished in
+allotted time
+
So use MCQs, fill-in-the-blanks, Parsons Problems, “tweak this
+code”, etc.
+
+
Stage 2: Desired Results
+
+
Questions
+
How do I…
+
…read tabular data?
+
…plot a single vector of values?
+
…create a time series plot?
+
…create one plot for each of several data sets?
+
…get extra data from a single data set for plotting?
+
…write programs I can read and re-use in future?
+
+
+
Skills
+
I can…
+
…write short scripts using loops and conditionals.
+
…write functions with a fixed number of parameters that return a
+single result.
+
…import libraries using aliases and refer to those libraries’
+contents.
+
…do simple data extraction and formatting using Pandas.
+
+
+
Concepts
+
I know…
+
…that a program is a piece of lab equipment that implements an
+analysis
+
Needs to be validated/calibrated before/during use
+
Makes analysis reproducible, reviewable, shareable
+
+
…that programs are written for people, not for computers
+
Meaningful variable names
+
Modularity for readability as well as re-use
+
No duplication
+
Document purpose and use
+
+
…that there is no magic: the programs they use are no different in
+principle from those they build
+
…how to assign values to variables
+
…what integers, floats, strings, NumPy arrays, and Pandas dataframes
+are
+
…how to trace the execution of a for loop
+
…how to trace the execution of if/else
+statements
+
…how to create and index lists
+
…how to create and index NumPy arrays
+
…how to create and index Pandas dataframes
+
…how to create time series plots
+
…the difference between defining and calling a function
+
…where to find documentation on standard libraries
+
…how to find out what else scientific Python offers
+
+
Stage 3: Learning Plan
+
+
Summative Assessment
+
Midpoint: create time-series plot for each file in a directory.
+
Final: extract data from Pandas dataframe and create comparative
+multi-line time series plot.
Select entire rows or entire columns from a dataframe.
+
Select a subset of both rows and columns from a dataframe in a
+single operation.
+
Select a subset of a dataframe by a single Boolean criterion.
+
+
Challenges: 15 min
+
Write an expression to find the Per Capita GDP of Serbia in
+2007.
+
What rule governs what is (or isn’t) included in numerical and named
+slices in Pandas?
+
What does each line in the following short program do?
+
What do idxmin and idxmax do?
+
Write expressions to get the GDP per capita for all countries in
+1982, for all countries after 1985, etc.
+
Given the way its borders have changed since 1900, what would you do
+if asked to create a table of GDP per capita for Poland for the
+Twentieth Century?
+
+
diff --git a/instructor/exercises.html b/instructor/exercises.html
new file mode 100644
index 000000000..fbb03387b
--- /dev/null
+++ b/instructor/exercises.html
@@ -0,0 +1,531 @@
+
+Plotting and Programming in Python: Further Exercises
+
Image 1 of 1: ‘A line of Python code, print(atom_name[0]), demonstrates that using the zero index will output just the initial letter, in this case ‘h’ for helium.’
Image 1 of 1: ‘A line chart showing time (hr) relative to position (km), using the values provided in the code block above. By default, the plotted line is blue against a white background, and the axes have been scaled automatically to fit the range of the input data.’
+
+
Figure 2
+
Image 1 of 1: ‘GDP plot for Australia’
+
+
Figure 3
+
Image 1 of 1: ‘GDP plot for Australia and New Zealand’
+
+
Figure 4
+
Image 1 of 1: ‘GDP barplot for Australia’
+
+
Figure 5
+
Image 1 of 1: ‘GDP formatted plot for Australia’
+
+
Figure 6
+
Image 1 of 1: ‘GDP formatted plot for Australia and New Zealand’
+
+
Figure 7
+
Image 1 of 1: ‘GDP correlation using plt.scatter’
+
+
Figure 8
+
Image 1 of 1: ‘GDP correlation using data.T.plot.scatter’
+
+
+
+
diff --git a/instructor/index.html b/instructor/index.html
new file mode 100644
index 000000000..c3cd355d5
--- /dev/null
+++ b/instructor/index.html
@@ -0,0 +1,733 @@
+
+Plotting and Programming in Python: Summary and Schedule
Summary and Schedule
+
+
+
This lesson is an introduction to programming in Python 3 for people
+with little or no previous programming experience. It uses plotting as
+its motivating example and is designed to be used in both Data Carpentry and Software Carpentry
+workshops. This lesson references JupyterLab but
+can be taught using alternative Python 3 interpreters as well (e.g.,
+repl.it, Anaconda).
+
+
+
+
+
+
Prerequisites
+
+
Learners need to understand what files and directories are, what
+a working directory is, and how to start a Python interpreter.
+
Learners must install Python 3 before the class starts.
+ The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.
+
+
Getting the Data
+
The data we will be using is taken from the gapminder
+dataset. To obtain it, download and unzip the file python-novice-gapminder-data.zip.
+In order to follow the presented material, you should launch the
+JupyterLab server in the root directory (see Starting
+JupyterLab).
+
+
diff --git a/instructor/instructor-notes.html b/instructor/instructor-notes.html
new file mode 100644
index 000000000..ebcb90e3e
--- /dev/null
+++ b/instructor/instructor-notes.html
@@ -0,0 +1,666 @@
+
+
+
+
+
+Plotting and Programming in Python: Instructor Notes
Instructor Notes
+
+
General Notes
+
+
+
+
It’s all right not to get through the whole lesson.
+
+This lesson is designed for people who have never programmed before, but
+any given class may include people with a wide range of prior
+experience. We have therefore included enough material to fill a full
+day if need be, but expect that many offerings will only get as far as
+the introduction to Pandas.
+
+
Don’t tell people to Google things.
+
+One of the goals of this lesson is to help novices build a workable
+mental model of how programming works. Until they have that model, they
+will not know what to search for or how to recognize a helpful answer.
+Telling them to Google can also give the impression that we think their
+problem is trivial. (That said, if learners have done enough programming
+before to be past these issues, having them search for solutions online
+can help them solidify their understanding.) It’s also worth quoting Trevor
+King’s comment about online search: “If you find anything, other
+folks were confused enough to bother with a blog or Stack Overflow post,
+so it’s probably not trivial.”
+
Learners often struggle here. Many do not work with financial data
+and concepts, so they find the example concepts difficult to get their
+heads around. The biggest problem, though, is the line generating the
+wealth_score; this step needs to be talked through thoroughly:
+
+It uses implicit conversion between boolean and float values, which has
+not been covered in the course so far.
+
+The axis=1 argument needs to be explained clearly.
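When talking this step through, a tiny example may help. The sketch below is an assumption about the kind of expression involved (a Boolean mask summed across columns and divided by the number of columns); the exact wealth_score line is not shown in these notes, and the data values and labels are made up.

```python
import pandas as pd

# Made-up values purely for illustration (not the gapminder data).
data = pd.DataFrame(
    {'gdp_a': [1.0, 5.0, 9.0], 'gdp_b': [2.0, 6.0, 1.0]},
    index=['X', 'Y', 'Z'],
)

# Boolean DataFrame: True where a value is above its column's mean.
mask_higher = data > data.mean()

# axis=1 sums across the columns of each row; True counts as 1 and False as 0,
# so the result is the fraction of columns in which each row is above the mean.
wealth_score = mask_higher.sum(axis=1) / len(data.columns)
print(wealth_score)
```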
Use the pandas library to do statistics on tabular data. Load with
+import pandas as pd.
+
To read in a CSV file: pd.read_csv(), including the path
+name in the parentheses.
+
To specify that a column’s values should be used as row headings:
+pd.read_csv('path', index_col='column name'), where path
+and column name should be replaced with the relevant values.
+
+
+
To get more information about a DataFrame, use
+DataFrame.info(), replacing DataFrame with the
+variable name of your DataFrame.
+
Use DataFrame.columns to view the column names.
+
Use DataFrame.T to transpose a DataFrame.
+
Use DataFrame.describe() to get summary statistics about
+your data.
To select by entry position: DataFrame.iloc[..., ...]
+
This is inclusive of everything except the final index.
+
+
To select by entry label: DataFrame.loc[..., ...]
+
Can select multiple rows or columns by listing labels.
+
This is inclusive to both ends.
+
+
Use : to select all rows or columns.
+
+
Can also select data based on values using True and
+False. This is a Boolean mask.
+
mask = subset > 10000
+
We can then use this to select values.
+
+
To use a select-apply-combine operation we use
+data.apply(lambda x: x > x.mean()) where
+mean() can be any operation the user would like to be
+applied to x.
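The selection notes above can be sketched on a tiny dataframe. The numbers, the column names, and the threshold of 12000 are made up for illustration and are not taken from the gapminder files.

```python
import pandas as pd

# Made-up values for two countries and three years.
data = pd.DataFrame(
    {
        'gdpPercap_1952': [10039.6, 10556.6],
        'gdpPercap_1957': [10949.6, 12247.4],
        'gdpPercap_1962': [12217.2, 13175.7],
    },
    index=['Australia', 'New Zealand'],
)

# Position-based slices exclude the final index: 1 row, 2 columns.
by_position = data.iloc[0:1, 0:2]

# Label-based slices include both ends: 2 rows, 2 columns.
by_label = data.loc['Australia':'New Zealand', 'gdpPercap_1952':'gdpPercap_1957']

# A Boolean mask keeps values where it is True and gives NaN elsewhere.
mask = data > 12000
selected = data[mask]

# Apply an operation column by column: which values exceed their column's mean?
above_mean = data.apply(lambda x: x > x.mean())

print(by_position)
print(by_label)
print(selected)
print(above_mean)
```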