diff --git a/.nojekyll b/.nojekyll new file mode 100644 index 000000000..e69de29bb diff --git a/01-run-quit.html b/01-run-quit.html new file mode 100644 index 000000000..a8fbf2a15 --- /dev/null +++ b/01-run-quit.html @@ -0,0 +1,1250 @@ + +Plotting and Programming in Python: Running and Quitting +
+ Plotting and Programming in Python +
+ +
+
+ + + + + +
+
+

Running and Quitting

+

Last updated on 2024-10-18

+ + + +
+ +
+ + + +
+

Overview

+
+
+
+
+

Questions

+
  • How can I run Python programs?
  • +
+
+
+
+
+
+

Objectives

+
  • Launch the JupyterLab server.
  • +
  • Create a new Python script.
  • +
  • Create a Jupyter notebook.
  • +
  • Shut down the JupyterLab server.
  • +
  • Understand the difference between a Python script and a Jupyter +notebook.
  • +
  • Create Markdown cells in a notebook.
  • +
  • Create and run Python cells in a notebook.
  • +
+
+
+
+
+

To run Python, we are going to use Jupyter Notebooks via JupyterLab for +the remainder of this workshop. Jupyter notebooks are common in data +science and visualization and serve as a convenient common-denominator +experience for running Python code interactively where we can easily +view and share the results of our Python code.

+

There are other ways of editing, managing, and running code. Software +developers often use an integrated development environment (IDE) like PyCharm or Visual Studio Code, or text +editors like Vim or Emacs, to create and edit their Python programs. +After editing and saving your Python programs you can execute those +programs within the IDE itself or directly on the command line. In +contrast, Jupyter notebooks let us execute and view the results of our +Python code immediately within the notebook.

+

JupyterLab has several other handy features:

+
  • You can easily type, edit, and copy and paste blocks of code.
  • +
  • Tab complete allows you to easily access the names of things you are +using and learn more about them.
  • +
  • It allows you to annotate your code with links, different sized +text, bullets, etc. to make it more accessible to you and your +collaborators.
  • +
  • It allows you to display figures next to the code that produces them +to tell a complete story of the analysis.
  • +

Each notebook contains one or more cells that contain code, text, or +images.

+

Getting Started with JupyterLab

+

JupyterLab is an application server with a web user interface from Project Jupyter that enables one to work +with documents and activities such as Jupyter notebooks, text editors, +terminals, and even custom components in a flexible, integrated, and +extensible manner. JupyterLab requires a reasonably up-to-date browser +(ideally a current version of Chrome, Safari, or Firefox); Internet +Explorer versions 9 and below are not supported.

+

JupyterLab is included as part of the Anaconda Python distribution. +If you have not already installed the Anaconda Python distribution, see +the setup instructions.

+

In this lesson we will run JupyterLab locally on our own machines, so +it will not require an internet connection beyond the initial +connection to download and install Anaconda and JupyterLab.

+
  • Start the JupyterLab server on your machine
  • +
  • Use a web browser to open a special localhost URL that connects to +your JupyterLab server
  • +
  • The JupyterLab server does the work and the web browser renders the +result
  • +
  • Type code into the browser and see the results after your JupyterLab +server has finished executing your code
  • +
+
+ +
+
+

JupyterLab? What about Jupyter notebooks?

+
+

JupyterLab is the next +stage in the evolution of the Jupyter Notebook. If you have prior +experience working with Jupyter notebooks, then you will have a good +idea of what to expect from JupyterLab.

+

Experienced users of Jupyter notebooks interested in a more detailed +discussion of the similarities and differences between the JupyterLab +and Jupyter notebook user interfaces can find more information in the JupyterLab +user interface documentation.

+
+
+
+

Starting JupyterLab

+

You can start the JupyterLab server through the command line or +through an application called Anaconda Navigator. Anaconda +Navigator is included as part of the Anaconda Python distribution.

+
+

macOS - Command Line

+

To start the JupyterLab server you will need to access the command +line through the Terminal. There are two ways to open Terminal on +Mac.

+
  1. In your Applications folder, open Utilities and double-click on +Terminal
  2. +
  3. Press Command + spacebar to launch Spotlight. +Type Terminal and then double-click the search result or +hit Enter +
  4. +

After you have launched Terminal, type the command to launch the +JupyterLab server.

+
+

BASH +

+
$ jupyter lab
+
+
+
+

Windows Users - Command Line

+

To start the JupyterLab server you will need to access the Anaconda +Prompt.

+

Press Windows Logo Key and search for +Anaconda Prompt, then click the result or press Enter.

+

After you have launched the Anaconda Prompt, type the command:

+
+

BASH +

+
$ jupyter lab
+
+
+
+

Anaconda Navigator

+

To start a JupyterLab server from Anaconda Navigator you must first +start +Anaconda Navigator (click for detailed instructions on macOS, Windows, +and Linux). You can launch Anaconda Navigator via Spotlight on +macOS (Command + spacebar), via the Windows search +function (Windows Logo Key), or by opening a terminal shell and +executing the anaconda-navigator executable from the +command line.

+

After you have launched Anaconda Navigator, click the +Launch button under JupyterLab. You may need to scroll down +to find it.

+

Here is a screenshot of an Anaconda Navigator page similar to the one +that should open on either macOS or Windows.

+

+Anaconda Navigator landing page

+

And here is a screenshot of a JupyterLab landing page that should be +similar to the one that opens in your default web browser after starting +the JupyterLab server on either macOS or Windows.

+

+JupyterLab landing page

+
+

The JupyterLab Interface

+

JupyterLab has many features found in traditional integrated +development environments (IDEs) but is focused on providing flexible +building blocks for interactive, exploratory computing.

+

The JupyterLab +Interface consists of the Menu Bar, a collapsible Left Side Bar, and +the Main Work Area, which contains tabs of documents and activities.

+
+ +

The Menu Bar at the top of JupyterLab has the top-level menus that +expose various actions available in JupyterLab along with their keyboard +shortcuts (where applicable). The following menus are included by +default.

+
  • +File: Actions related to files and directories such +as New, Open, Close, Save, etc. The +File menu also includes the Shut Down action used to +shut down the JupyterLab server.
  • +
  • +Edit: Actions related to editing documents and +other activities such as Undo, Cut, Copy, +Paste, etc.
  • +
  • +View: Actions that alter the appearance of +JupyterLab.
  • +
  • +Run: Actions for running code in different +activities such as notebooks and code consoles (discussed below).
  • +
  • +Kernel: Actions for managing kernels. Kernels in +Jupyter will be explained in more detail below.
  • +
  • +Tabs: A list of the open documents and activities +in the main work area.
  • +
  • +Settings: Common JupyterLab settings can be +configured using this menu. There is also an Advanced Settings +Editor option in the dropdown menu that provides more fine-grained +control of JupyterLab settings and configuration options.
  • +
  • +Help: A list of JupyterLab and kernel help +links.
  • +
+
+ +
+
+

Kernels

+
+

The JupyterLab docs +define kernels as “separate processes started by the server that runs +your code in different programming languages and environments.” When we +open a Jupyter Notebook, that starts a kernel - a process - that is +going to run the code. In this lesson, we’ll be using the Jupyter +ipython kernel which lets us run Python 3 code interactively.

+

Using other Jupyter kernels +for other programming languages would let us write and execute code +in other programming languages in the same JupyterLab interface, like R, +Java, Julia, Ruby, JavaScript, Fortran, etc.

+
+
+
+

A screenshot of the default Menu Bar is provided below.

+

+JupyterLab Menu Bar

+
+
+ +

The left sidebar contains a number of commonly used tabs, such as a +file browser (showing the contents of the directory where the JupyterLab +server was launched), a list of running kernels and terminals, the +command palette, and a list of open tabs in the main work area. A +screenshot of the default Left Side Bar is provided below.

+

+JupyterLab Left Side Bar

+

The left sidebar can be collapsed or expanded by selecting “Show Left +Sidebar” in the View menu or by clicking on the active sidebar tab.

+
+
+

Main Work Area

+

The main work area in JupyterLab enables you to arrange documents +(notebooks, text files, etc.) and other activities (terminals, code +consoles, etc.) into panels of tabs that can be resized or subdivided. A +screenshot of the default Main Work Area is provided below.

+

If you do not see the Launcher tab, click the blue plus sign under +the “File” and “Edit” menus and it will appear.

+

+JupyterLab Main Work Area

+

Drag a tab to the center of a tab panel to move the tab to the panel. +Subdivide a tab panel by dragging a tab to the left, right, top, or +bottom of the panel. The work area has a single current activity. The +tab for the current activity is marked with a colored top border (blue +by default).

+
+

Creating a Python script

+
  • To start writing a new Python program click the Text File icon under +the Other header in the Launcher tab of the Main Work Area. +
    • You can also create a new plain text file by selecting New +-> Text File from the File menu in the Menu Bar.
    • +
  • +
  • To convert this plain text file to a Python program, select the +Save File As action from the File menu in the Menu Bar +and give your new text file a name that ends with the .py +extension. +
    • The .py extension lets everyone (including the +operating system) know that this text file is a Python program.
    • +
    • This is convention, not a requirement.
    • +
  • +
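As a minimal sketch of what such a file could contain, assume it was saved under the hypothetical name `hello.py`. The file name, the variable `greeting`, and the message are placeholders for illustration and are not part of the lesson's materials.

```python
# Contents of a hypothetical script saved as hello.py.
# A script is plain text: Python statements, one after another.
greeting = 'Hello from a Python script'
print(greeting)
```

Once saved, a script like this could be run from a terminal opened in the same directory with `python hello.py` (on some systems the command is `python3`). Unlike a notebook, it runs from top to bottom and then exits.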

Creating a Jupyter Notebook

+

To open a new notebook click the Python 3 icon under the +Notebook header in the Launcher tab in the main work area. You +can also create a new notebook by selecting New -> Notebook +from the File menu in the Menu Bar.

+

Additional notes on Jupyter notebooks.

+
  • Notebook files have the extension .ipynb to distinguish +them from plain-text Python programs.
  • +
  • Notebooks can be exported as Python scripts that can be run from the +command line.
  • +

Below is a screenshot of a Jupyter notebook running inside +JupyterLab. If you are interested in more details, then see the official +notebook documentation.

+

+Example Jupyter Notebook

+
+
+ +
+
+

How It’s Stored

+
+
  • The notebook file is stored in a format called JSON.
  • +
  • Just like a webpage, what’s saved looks different from what you see +in your browser.
  • +
  • But this format allows Jupyter to mix source code, text, and images, +all in one file.
  • +
+
+
+
+
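To make the JSON point concrete, the sketch below builds a heavily simplified, hand-written stand-in for a notebook file and serializes it with Python's standard `json` module. The cell contents are invented for illustration, and a real `.ipynb` file carries more metadata than shown here.

```python
import json

# A simplified stand-in for the contents of an .ipynb file.
# Real notebooks include more metadata; this mirrors only the overall shape.
notebook = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# My analysis"]},
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["print(6 * 7)"],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

# Serialize to the JSON text that would be written to disk.
text = json.dumps(notebook, indent=1)
print(text)

# Reading the text back recovers the same structure.
cell_types = [cell["cell_type"] for cell in json.loads(text)["cells"]]
print(cell_types)
```

The printed text is what a plain text editor would show, which is why a notebook opened outside Jupyter looks so different from its rendered form.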
+ +
+
+

Arranging Documents into Panels of Tabs

+
+

In the JupyterLab Main Work Area you can arrange documents into +panels of tabs. Here is an example from the official +documentation.

+

+Multi-panel JupyterLab

+

First, create a text file, Python console, and terminal window and +arrange them into three panels in the main work area. Next, create a +notebook, terminal window, and text file and arrange them into three +panels in the main work area. Finally, create your own combination of +panels and tabs. What combination of panels and tabs do you think will +be most useful for your workflow?

+
+
+
+
+
+ +
+
+

After creating the necessary tabs, you can drag one of the tabs to +the center of a panel to move the tab to the panel; next you can +subdivide a tab panel by dragging a tab to the left, right, top, or +bottom of the panel.

+
+
+
+
+
+
+ +
+
+

Code vs. Text

+
+

Jupyter mixes code and text in different types of blocks, called +cells. We often use the term “code” to mean “the source code of software +written in a language such as Python”. A “code cell” in a Notebook is a +cell that contains software; a “text cell” is one that contains ordinary +prose written for human beings.

+
+
+
+

The Notebook has Command and Edit modes.

+
  • If you press Esc and Return alternately, the +outer border of your code cell will change from gray to blue.
  • +
  • These are the Command (gray) and +Edit (blue) modes of your notebook.
  • +
  • Command mode allows you to edit notebook-level features, and Edit +mode changes the content of cells.
  • +
  • When in Command mode (esc/gray), +
    • The b key will make a new cell below the currently +selected cell.
    • +
    • The a key will make one above.
    • +
    • The x key will delete the current cell.
    • +
    • The z key will undo your last cell operation (which could +be a deletion, creation, etc).
    • +
  • +
  • All actions can be done using the menus, but there are lots of +keyboard shortcuts to speed things up.
  • +
+
+ +
+
+

Command Vs. Edit

+
+

In the Jupyter notebook page are you currently in Command or Edit +mode?
+Switch between the modes. Use the shortcuts to generate a new cell. Use +the shortcuts to delete a cell. Use the shortcuts to undo the last cell +operation you performed.

+
+
+
+
+
+ +
+
+

Command mode has a gray border and Edit mode has a blue border. Use +Esc and Return to switch between modes. Each of the remaining steps requires Command mode (press +Esc if your cell is blue): type b or a to create a new cell, type +x to delete a cell, and type z to undo the last cell +operation.

+
+
+
+
+
+

Use the keyboard and mouse to select and edit cells.

+
  • Pressing the Return key turns the border blue and engages +Edit mode, which allows you to type within the cell.
  • +
  • Because we want to be able to write many lines of code in a single +cell, pressing the Return key when in Edit mode (blue) moves +the cursor to the next line in the cell just like in a text editor.
  • +
  • We need some other way to tell the Notebook we want to run what’s in +the cell.
  • +
  • Pressing Shift+Return together will execute +the contents of the cell.
  • +
  • Notice that the Return and Shift keys on the +right of the keyboard are right next to each other.
  • +
+
+

The Notebook will turn Markdown into pretty-printed +documentation.

+
  • Notebooks can also render Markdown. +
    • A simple plain-text format for writing lists, links, and other +things that might go into a web page.
    • +
    • Equivalently, a subset of HTML that looks like what you’d send in an +old-fashioned email.
    • +
  • +
  • Turn the current cell into a Markdown cell by entering Command +mode (Esc/gray) and pressing the m key.
  • +
  • +In [ ]: will disappear to show it is no longer a code +cell and you will be able to write in Markdown.
  • +
  • Turn the current cell into a Code cell by entering Command mode +(Esc/gray) and pressing the y key.
  • +
+
+

Markdown does most of what HTML does.

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Showing some markdown syntax and its rendered output.
Markdown codeRendered output
*   Use asterisks
+*   to create
+*   bullet lists.
+

+

+
  • Use asterisks
  • +
  • to create
  • +
  • bullet lists.
  • +
1.   Use numbers
+1.   to create
+1.   numbered lists.
+

+

+
  1. Use numbers
  2. +
  3. to create
  4. +
  5. numbered lists.
  6. +
*  You can use indents
+  *  To create sublists
+  *  of the same type
+*  Or sublists
+  1. Of different
+  1. types
+

+

+
  • You can use indents +
    • To create sublists
    • +
    • of the same type
    • +
  • +
  • Or sublists +
    1. Of different
    2. +
    3. types
    4. +
  • +
# A Level-1 Heading
+

+

+

A Level-1 Heading

+
## A Level-2 Heading (etc.)
+

+

+

A Level-2 Heading (etc.)

+
Line breaks
+don't matter.
+
+But blank lines
+create new paragraphs.
+

+

+

Line breaks don’t matter.

+

But blank lines create new paragraphs.

+
[Links](http://software-carpentry.org)
+are created with `[...](...)`.
+Or use [named links][data-carp].
+
+[data-carp]: http://datacarpentry.org
+

+

+

Links are created with +[...](...). Or use named links.

+
+
+ +
+
+

Creating Lists in Markdown

+
+

Create a nested list in a Markdown cell in a notebook that looks like +this:

+
  1. Get funding.
  2. +
  3. Do work.
  4. +
  • Design experiment.
  • +
  • Collect data.
  • +
  • Analyze.
  • +
  1. Write up.
  2. +
  3. Publish.
  4. +
+
+
+
+
+ +
+
+

This challenge integrates both the numbered list and bullet list. +Note that the bullet list is indented 4 spaces so that it is in line with +the items of the numbered list.

+
1.  Get funding.
+2.  Do work.
+    *   Design experiment.
+    *   Collect data.
+    *   Analyze.
+3.  Write up.
+4.  Publish.
+
+
+
+
+
+
+ +
+
+

More Math

+
+

What is displayed when a Python cell in a notebook that contains +several calculations is executed? For example, what happens when this +cell is executed?

+
+

PYTHON +

+
7 * 3
+2 + 1
+
+
+
+
+
+
+ +
+
+

The notebook displays only the value of the last calculation in the cell. The result of 7 * 3 is computed but not shown.

+
+

PYTHON +

+
3
+
+
+
+
+
+
+
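If every result is wanted, one option is to print each value explicitly. This is a minimal sketch; the names `first` and `second` are illustrative and do not appear in the exercise.

```python
# Store each result, then print both so neither is silently dropped.
first = 7 * 3
second = 2 + 1
print(first)
print(second)
```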
+ +
+
+

Change an Existing Cell from Code to Markdown

+
+

What happens if you write some Python in a code cell and then you +switch it to a Markdown cell? For example, put the following in a code +cell:

+
+

PYTHON +

+
x = 6 * 7 + 12
+print(x)
+
+

And then run it with Shift+Return to be sure +that it works as a code cell. Now go back to the cell and use +Esc then m to switch the cell to Markdown and +“run” it with Shift+Return. What happened and how +might this be useful?

+
+
+
+
+
+ +
+
+

The Python code gets treated like Markdown text. The lines appear as +if they are part of one contiguous paragraph. This could be useful to +temporarily turn on and off cells in notebooks that get used for +multiple purposes.

+
+

PYTHON +

+
x = 6 * 7 + 12 print(x)
+
+
+
+
+
+
+
+ +
+
+

Equations

+
+

Standard Markdown (such as we’re using for these notes) won’t render +equations, but the Notebook will. Create a new Markdown cell and enter +the following:

+
$\sum_{i=1}^{N} 2^{-i} \approx 1$
+

(It’s probably easier to copy and paste.) What does it display? What +do you think the underscore, _, circumflex, ^, +and dollar sign, $, do?

+
+
+
+
+
+ +
+
+

The notebook shows the equation as it would be rendered from LaTeX +equation syntax. The dollar sign, $, is used to tell +Markdown that the text in between is a LaTeX equation. If you’re not +familiar with LaTeX, underscore, _, is used for subscripts +and circumflex, ^, is used for superscripts. A pair of +curly braces, { and }, is used to group text +together so that the statement i=1 becomes the subscript +and N becomes the superscript. Similarly, -i +is in curly braces to make the whole statement the superscript for +2. \sum and \approx are LaTeX +commands for “sum over” and “approximate” symbols.
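The claim in the equation can be checked numerically. The sketch below computes the partial sum for an arbitrarily chosen `N = 10` and compares it with the closed form of this geometric series, `1 - 2**-N`. The variable names are illustrative, and the loop-like `sum(...)` construct goes beyond what this episode has introduced.

```python
# Partial sum of 2**-i for i = 1..N; N = 10 is an arbitrary choice.
N = 10
partial_sum = sum(2 ** -i for i in range(1, N + 1))
print(partial_sum)

# Closed form of the same geometric series.
closed_form = 1 - 2 ** -N
print(closed_form)
```

The gap between the partial sum and 1 is `2**-N`, so it halves with every extra term, which is why the equation uses "approximately equal" rather than "equal".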

+
+
+
+
+
+

Closing JupyterLab

+
  • From the Menu Bar select the “File” menu and then choose “Shut Down” +at the bottom of the dropdown menu. You will be prompted to confirm that +you wish to shut down the JupyterLab server (don’t forget to save your +work!). Click “Shut Down” to shut down the JupyterLab server.
  • +
  • To restart the JupyterLab server you will need to re-run the +following command from a shell.
  • +
$ jupyter lab
+
+
+ +
+
+

Closing JupyterLab

+
+

Practice closing and restarting the JupyterLab server.

+
+
+
+
+
+ +
+
+

Key Points

+
+
  • Python scripts are plain text files.
  • +
  • Use the Jupyter Notebook for editing and running Python.
  • +
  • The Notebook has Command and Edit modes.
  • +
  • Use the keyboard and mouse to select and edit cells.
  • +
  • The Notebook will turn Markdown into pretty-printed +documentation.
  • +
  • Markdown does most of what HTML does.
  • +
+
+
+
+
+ + +
+
+
+ +
+
+ + diff --git a/02-variables.html b/02-variables.html new file mode 100644 index 000000000..53b78721a --- /dev/null +++ b/02-variables.html @@ -0,0 +1,1080 @@ + +Plotting and Programming in Python: Variables and Assignment +
+ Plotting and Programming in Python +
+ +
+
+ + + + + +
+
+

Variables and Assignment

+

Last updated on 2024-10-18

+ + + +
+ +
+ + + +
+

Overview

+
+
+
+
+

Questions

+
  • How can I store data in programs?
  • +
+
+
+
+
+
+

Objectives

+
  • Write programs that assign scalar values to variables and perform +calculations with those values.
  • +
  • Correctly trace value changes in programs that use scalar +assignment.
  • +
+
+
+
+
+

Use variables to store values.

+
  • Variables are names for values.

  • +
  • +

    Variable names

    +
    • can only contain letters, digits, and underscore +_ (typically used to separate words in long variable +names)
    • +
    • cannot start with a digit
    • +
    • are case sensitive (age, Age and AGE are three +different variables)
    • +
  • +
  • The name should also be meaningful so you or another programmer +know what it is

  • +
  • Variable names that start with underscores like +__alistairs_real_age have a special meaning so we won’t do +that until we understand the convention.

  • +
  • In Python the = symbol assigns the value on the +right to the name on the left.

  • +
  • The variable is created when a value is assigned to it.

  • +
  • +

    Here, Python assigns an age to a variable age and a +name in quotes to a variable first_name.

    +
    +

    PYTHON +

    +
    age = 42
    +first_name = 'Ahmed'
    +
    +
  • +
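The naming rules above can be explored with the string method `str.isidentifier()`, which reports whether a piece of text could serve as a name. This is a rough sketch: the candidate names are made up, and the method is slightly more permissive than the rules listed here (it also accepts non-English letters and reserved words such as `for`).

```python
# Check a few made-up candidate names against Python's naming rules.
plain_ok = 'age'.isidentifier()             # letters only
underscore_ok = 'first_name'.isidentifier() # underscore separates words
digit_first = '2nd_name'.isidentifier()     # starts with a digit
has_hyphen = 'first-name'.isidentifier()    # hyphen is not allowed
print(plain_ok, underscore_ok, digit_first, has_hyphen)

# Names are case sensitive: these are three separate variables.
age = 42
Age = 43
AGE = 44
print(age, Age, AGE)
```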

Use print to display values.

+
  • Python has a built-in function called print that prints +things as text.
  • +
  • Call the function (i.e., tell Python to run it) by using its +name.
  • +
  • Provide values to the function (i.e., the things to print) in +parentheses.
  • +
  • To add a string to the printout, wrap the string in single or double +quotes.
  • +
  • The values passed to the function are called +arguments +
  • +
+

PYTHON +

+
print(first_name, 'is', age, 'years old')
+
+
+

OUTPUT +

+
Ahmed is 42 years old
+
+
  • +print automatically puts a single space between items +to separate them.
  • +
  • And wraps around to a new line at the end.
  • +

Variables must be created before they are used.

+
  • If a variable doesn’t exist yet, or if the name has been +misspelled, Python reports an error. (Unlike some languages, which +“guess” a default value.)
  • +
+

PYTHON +

+
print(last_name)
+
+
+

ERROR +

+
---------------------------------------------------------------------------
+NameError                                 Traceback (most recent call last)
+<ipython-input-1-c1fbb4e96102> in <module>()
+----> 1 print(last_name)
+
+NameError: name 'last_name' is not defined
+
+
  • The last line of an error message is usually the most +informative.
  • +
  • We will look at error messages in detail later.
  • +
+
+ +
+
+

Variables Persist Between Cells

+
+

Be aware that it is the order of execution of cells that is +important in a Jupyter notebook, not the order in which they appear. +Python will remember all the code that was run previously, +including any variables you have defined, irrespective of the order in +the notebook. Therefore if you define variables lower down the notebook +and then (re)run cells further up, those defined further down will still +be present. As an example, create two cells with the following content, +in this order:

+
+

PYTHON +

+
print(myval)
+
+
+

PYTHON +

+
myval = 1
+
+

If you execute this in order, the first cell will give an error. +However, if you run the first cell after the second cell it +will print out 1. To prevent confusion, it can be helpful +to use the Kernel -> Restart & Run All +option which clears the interpreter and runs everything from a clean +slate going top to bottom.

+
+
+
+

Variables can be used in calculations.

+
  • We can use variables in calculations just as if they were values. +
    • Remember, we assigned the value 42 to age +a few lines ago.
    • +
  • +
+

PYTHON +

+
age = age + 3
+print('Age in three years:', age)
+
+
+

OUTPUT +

+
Age in three years: 45
+
+

Use an index to get a single character from a string.

+
  • The characters (individual letters, numbers, and so on) in a string +are ordered. For example, the string 'AB' is not the same +as 'BA'. Because of this ordering, we can treat the string +as a list of characters.
  • +
  • Each position in the string (first, second, etc.) is given a number. +This number is called an index or sometimes a +subscript.
  • +
  • Indices are numbered from 0.
  • +
  • Use the position’s index in square brackets to get the character at +that position.
  • +
A line of Python code, print(atom_name[0]), demonstrates that using the zero index will output just the initial letter, in this case ‘h’ for helium.
+
+

PYTHON +

+
atom_name = 'helium'
+print(atom_name[0])
+
+
+

OUTPUT +

+
h
+
+
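Because counting starts at 0, the last character of a six-character string sits at index 5. The sketch below also uses a negative index, which counts from the end of the string. Negative indices are not introduced in the text above but do appear in a later exercise; the variable names are illustrative.

```python
atom_name = 'helium'

first = atom_name[0]       # index 0 is the first character
last = atom_name[5]        # six characters, so the last index is 5
also_last = atom_name[-1]  # -1 counts back from the end
print(first, last, also_last)
```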

Use a slice to get a substring.

+
  • A part of a string is called a substring. A +substring can be as short as a single character.
  • +
  • An item in a list is called an element. Whenever we treat a string +as if it were a list, the string’s elements are its individual +characters.
  • +
  • A slice is a part of a string (or, more generally, a part of any +list-like thing).
  • +
  • We take a slice with the notation [start:stop], where +start is the integer index of the first element we want and +stop is the integer index of the element just +after the last element we want.
  • +
  • The difference between stop and start is +the slice’s length.
  • +
  • Taking a slice does not change the contents of the original string. +Instead, taking a slice returns a copy of part of the original +string.
  • +
+

PYTHON +

+
atom_name = 'sodium'
+print(atom_name[0:3])
+
+
+

OUTPUT +

+
sod
+
+
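Two of the points above, that `stop - start` gives the slice's length and that slicing leaves the original untouched, can be checked directly. In this sketch the names `start`, `stop`, and `piece` are illustrative.

```python
atom_name = 'sodium'
start = 0
stop = 3

# The slice holds the characters at indices 0, 1 and 2.
piece = atom_name[start:stop]
print(piece)

# The number of characters in the slice is stop - start.
print(stop - start)

# The original string is unchanged.
print(atom_name)
```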

Use the built-in function len to find the length of a +string.

+
+

PYTHON +

+
print(len('helium'))
+
+
+

OUTPUT +

+
6
+
+
  • Nested functions are evaluated from the inside out, like in +mathematics.
  • +
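The inside-out order can be made visible by splitting the nested call into two steps. This sketch assumes a helper variable named `length`, which is not part of the original example.

```python
# Step 1: the inner call runs first and produces a number.
length = len('helium')

# Step 2: the outer call receives that number and displays it.
print(length)
```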

Python is case-sensitive.

+
  • Python thinks that upper- and lower-case letters are different, so +Name and name are different variables.
  • +
  • There are conventions for using upper-case letters at the start of +variable names so we will use lower-case letters for now.
  • +

Use meaningful variable names.

+
  • Python doesn’t care what you call variables as long as they obey the +rules (alphanumeric characters and the underscore).
  • +
+

PYTHON +

+
flabadab = 42
+ewr_422_yY = 'Ahmed'
+print(ewr_422_yY, 'is', flabadab, 'years old')
+
+
  • Use meaningful variable names to help other people understand what +the program does.
  • +
  • The most important “other person” is your future self.
  • +
+
+ +
+
+

Swapping Values

+
+

Fill the table showing the values of the variables in this program +after each statement is executed.

+
+

PYTHON +

+
# Command  # Value of x   # Value of y   # Value of swap #
+x = 1.0    #              #              #               #
+y = 3.0    #              #              #               #
+swap = x   #              #              #               #
+x = y      #              #              #               #
+y = swap   #              #              #               #
+
+
+
+
+
+
+ +
+
+
+

OUTPUT +

+
# Command  # Value of x   # Value of y   # Value of swap #
+x = 1.0    # 1.0          # not defined  # not defined   #
+y = 3.0    # 1.0          # 3.0          # not defined   #
+swap = x   # 1.0          # 3.0          # 1.0           #
+x = y      # 3.0          # 3.0          # 1.0           #
+y = swap   # 3.0          # 1.0          # 1.0           #
+
+

These three lines exchange the values in x and +y using the swap variable for temporary +storage. This is a fairly common programming idiom.
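Python also allows the exchange to be written in a single statement, without a temporary variable. The sketch below shows this alternative; it is not the method used in the exercise, and the three-line version remains the form that carries over to most other languages.

```python
x = 1.0
y = 3.0

# The right-hand side is evaluated first, then both names are assigned.
x, y = y, x
print(x, y)
```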

+
+
+
+
+
+
+ +
+
+

Predicting Values

+
+

What is the final value of position in the program +below? (Try to predict the value without running the program, then check +your prediction.)

+
+

PYTHON +

+
initial = 'left'
+position = initial
+initial = 'right'
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
print(position)
+
+
+

OUTPUT +

+
left
+
+

The initial variable is assigned the value +'left'. In the second line, the position +variable also receives the string value 'left'. In the third +line, the initial variable is given the value +'right', but the position variable retains its +string value of 'left'.

+
+
+
+
+
+
+ +
+
+

Challenge

+
+

If you assign a = 123, what happens if you try to get +the second digit of a via a[1]?

+
+
+
+
+
+ +
+
+

Numbers are not strings or sequences and Python will raise an error +if you try to perform an index operation on a number. In the next lesson on types and type +conversion we will learn more about types and how to convert between +different types. If you want the Nth digit of a number you can convert +it into a string using the str built-in function and then +perform an index operation on that string.

+
+

PYTHON +

+
a = 123
+print(a[1])
+
+
+

ERROR +

+
TypeError: 'int' object is not subscriptable
+
+
+

PYTHON +

+
a = str(123)
+print(a[1])
+
+
+

OUTPUT +

+
2
+
+
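Note that indexing the string gives back a one-character string, not a number. If the digit is needed for arithmetic, it can be converted back with the built-in `int` function, as in this sketch (the name `second_digit` is illustrative).

```python
a = 123

# str(a) is '123'; index 1 picks the character '2'; int turns it into 2.
second_digit = int(str(a)[1])
print(second_digit + 1)
```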
+
+
+
+
+
+ +
+
+

Choosing a Name

+
+

Which is a better variable name, m, min, or +minutes? Why? Hint: think about which code you would rather +inherit from someone who is leaving the lab:

+
  1. ts = m * 60 + s
  2. +
  3. tot_sec = min * 60 + sec
  4. +
  5. total_seconds = minutes * 60 + seconds
  6. +
+
+
+
+
+ +
+
+

minutes is better because min might mean +something like “minimum” (and actually is an existing built-in function +in Python that we will cover later).

+
+
+
+
+
+
+ +
+
+

Slicing practice

+
+

What does the following program print?

+
+

PYTHON +

+
atom_name = 'carbon'
+print('atom_name[1:3] is:', atom_name[1:3])
+
+
+
+
+
+
+ +
+
+
+

OUTPUT +

+
atom_name[1:3] is: ar
+
+
+
+
+
+
+
+ +
+
+

Slicing concepts

+
+

Given the following string:

+
+

PYTHON +

+
species_name = "Acacia buxifolia"
+
+

What would these expressions return?

+
  1. species_name[2:8]
  2. +
  3. +species_name[11:] (without a value after the +colon)
  4. +
  5. +species_name[:4] (without a value before the +colon)
  6. +
  7. +species_name[:] (just a colon)
  8. +
  9. species_name[11:-3]
  10. +
  11. species_name[-5:-3]
  12. +
  13. What happens when you choose a stop value which is out +of range? (i.e., try species_name[0:20] or +species_name[:103])
  14. +
+
+
+
+
+ +
+
+
  1. +species_name[2:8] returns the substring +'acia b' +
  2. +
  3. +species_name[11:] returns the substring +'folia', from position 11 until the end
  4. +
  5. +species_name[:4] returns the substring +'Acac', from the start up to but not including position +4
  6. +
  7. +species_name[:] returns the entire string +'Acacia buxifolia' +
  8. +
  9. +species_name[11:-3] returns the substring +'fo', from position 11 up to but not including the third-last +position
  10. +
  11. +species_name[-5:-3] also returns the substring +'fo', from the fifth-last position up to but not including the third-last
  12. +
  13. If a part of the slice is out of range, the operation does not fail. +species_name[0:20] gives the same result as +species_name[0:], and species_name[:103] gives +the same result as species_name[:] +
  14. +
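The tolerance for out-of-range values applies to slices only. The sketch below contrasts a slice with a single index at the same out-of-range position. It uses `try`/`except`, which this lesson has not introduced, purely so the example can run to the end; the variable names are illustrative.

```python
species_name = "Acacia buxifolia"

# A slice quietly clips an out-of-range stop to the end of the string.
clipped = species_name[0:20]
print(clipped == species_name[0:])

# A slice that starts past the end gives an empty string, not an error.
beyond = species_name[20:30]
print(repr(beyond))

# Indexing a single out-of-range position does raise an error.
try:
    species_name[20]
    index_failed = False
except IndexError:
    index_failed = True
print(index_failed)
```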
+
+
+
+
+
+ +
+
+

Key Points

+
+
  • Use variables to store values.
  • +
  • Use print to display values.
  • +
  • Variables persist between cells.
  • +
  • Variables must be created before they are used.
  • +
  • Variables can be used in calculations.
  • +
  • Use an index to get a single character from a string.
  • +
  • Use a slice to get a substring.
  • +
  • Use the built-in function len to find the length of a +string.
  • +
  • Python is case-sensitive.
  • +
  • Use meaningful variable names.
  • +
+
+
+
+
+ + +
+
+
+ +
+
+ + diff --git a/03-types-conversion.html b/03-types-conversion.html new file mode 100644 index 000000000..b90832c12 --- /dev/null +++ b/03-types-conversion.html @@ -0,0 +1,1158 @@ + +Plotting and Programming in Python: Data Types and Type Conversion +
+ +
+
+ + + + + +
+
+

Data Types and Type Conversion

+

Last updated on 2024-10-18

+ + + +
+ +
+ + + +
+

Overview

+
+
+
+
+

Questions

+
  • What kinds of data do programs store?
  • +
  • How can I convert one type to another?
  • +
+
+
+
+
+
+

Objectives

+
  • Explain key differences between integers and floating point +numbers.
  • +
  • Explain key differences between numbers and character strings.
  • +
  • Use built-in functions to convert between integers, floating point +numbers, and strings.
  • +
+
+
+
+
+

Every value has a type.

+
  • Every value in a program has a specific type.
  • +
  • Integer (int): represents positive or negative whole +numbers like 3 or -512.
  • +
  • Floating point number (float): represents real numbers +like 3.14159 or -2.5.
  • +
  • Character string (usually called “string”, str): text. +
    • Written in either single quotes or double quotes (as long as they +match).
    • +
    • The quote marks aren’t printed when the string is displayed.
    • +
  • +

Use the built-in function type to find the type of a +value.

+
  • Use the built-in function type to find out what type a +value has.
  • +
  • Works on variables as well. +
    • But remember: the value has the type — the +variable is just a label.
    • +
  • +
+

PYTHON +

+
print(type(52))
+
+
+

OUTPUT +

+
<class 'int'>
+
+
+

PYTHON +

+
fitness = 'average'
+print(type(fitness))
+
+
+

OUTPUT +

+
<class 'str'>
+
+

Types control what operations (or methods) can be performed on a +given value.

+
  • A value’s type determines what the program can do to it.
  • +
+

PYTHON +

+
print(5 - 3)
+
+
+

OUTPUT +

+
2
+
+
+

PYTHON +

+
print('hello' - 'h')
+
+
+

ERROR +

+
---------------------------------------------------------------------------
+TypeError                                 Traceback (most recent call last)
+<ipython-input-2-67f5626a1e07> in <module>()
+----> 1 print('hello' - 'h')
+
+TypeError: unsupported operand type(s) for -: 'str' and 'str'
+
+

You can use the “+” and “*” operators on strings.

+
  • “Adding” character strings concatenates them.
  • +
+

PYTHON +

+
full_name = 'Ahmed' + ' ' + 'Walsh'
+print(full_name)
+
+
+

OUTPUT +

+
Ahmed Walsh
+
+
  • Multiplying a character string by an integer N creates a +new string that consists of that character string repeated N +times. +
    • Since multiplication is repeated addition.
    • +
  • +
+

PYTHON +

+
separator = '=' * 10
+print(separator)
+
+
+

OUTPUT +

+
==========
+
+

Strings have a length (but numbers don’t).

+
  • The built-in function len counts the number of +characters in a string.
  • +
+

PYTHON +

+
print(len(full_name))
+
+
+

OUTPUT +

+
11
+
+
  • But numbers don’t have a length (not even zero).
  • +
+

PYTHON +

+
print(len(52))
+
+
+

ERROR +

+
---------------------------------------------------------------------------
+TypeError                                 Traceback (most recent call last)
+<ipython-input-3-f769e8e8097d> in <module>()
+----> 1 print(len(52))
+
+TypeError: object of type 'int' has no len()
+
+
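If what you actually want is the number of digits in a number, one option is to convert it to a string first and then take the length. This is only a small sketch; the names number and digits are chosen here for illustration.

```python
number = 52
digits = str(number)   # convert the integer 52 to the string '52'
print(len(digits))     # counts the characters in '52'
```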

Must convert numbers to strings or vice versa when operating on +them.

+
  • Cannot add numbers and strings.
  • +
+

PYTHON +

+
print(1 + '2')
+
+
+

ERROR +

+
---------------------------------------------------------------------------
+TypeError                                 Traceback (most recent call last)
+<ipython-input-4-fe4f54a023c6> in <module>()
+----> 1 print(1 + '2')
+
+TypeError: unsupported operand type(s) for +: 'int' and 'str'
+
+
  • Not allowed because it’s ambiguous: should 1 + '2' be +3 or '12'?
  • +
  • Some types can be converted to other types by using the type name as +a function.
  • +
+

PYTHON +

+
print(1 + int('2'))
+print(str(1) + '2')
+
+
+

OUTPUT +

+
3
+12
+
+

Can mix integers and floats freely in operations.

+
  • Integers and floating-point numbers can be mixed in arithmetic. +
    • Python 3 automatically converts integers to floats as needed.
    • +
  • +
+

PYTHON +

+
print('half is', 1 / 2.0)
+print('three squared is', 3.0 ** 2)
+
+
+

OUTPUT +

+
half is 0.5
+three squared is 9.0
+
+

Variables only change value when something is assigned to them.

+
  • If we make one cell in a spreadsheet depend on another, and update +the latter, the former updates automatically.
  • +
  • This does not happen in programming languages.
  • +
+

PYTHON +

+
variable_one = 1
+variable_two = 5 * variable_one
+variable_one = 2
+print('first is', variable_one, 'and second is', variable_two)
+
+
+

OUTPUT +

+
first is 2 and second is 5
+
+
  • The computer reads the value of variable_one when doing +the multiplication, creates a new value, and assigns it to +variable_two.
  • +
  • Afterwards, the value of variable_two is set to the new +value and not dependent on variable_one so its +value does not automatically change when variable_one +changes.
  • +
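To bring variable_two up to date after variable_one changes, the calculation has to be run again. A minimal sketch that extends the example above:

```python
variable_one = 1
variable_two = 5 * variable_one   # variable_two is 5
variable_one = 2                  # variable_two is still 5
variable_two = 5 * variable_one   # recompute explicitly: now 10
print('first is', variable_one, 'and second is', variable_two)
```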
+
+ +
+
+

Fractions

+
+

What type of value is 3.4? How can you find out?

+
+
+
+
+
+ +
+
+

It is a floating-point number (often abbreviated “float”). It is +possible to find out by using the built-in function +type().

+
+

PYTHON +

+
print(type(3.4))
+
+
+

OUTPUT +

+
<class 'float'>
+
+
+
+
+
+
+
+ +
+
+

Automatic Type Conversion

+
+

What type of value is 3.25 + 4?

+
+
+
+
+
+ +
+
+

It is a float: integers are automatically converted to floats as +necessary.

+
+

PYTHON +

+
result = 3.25 + 4
+print(result, 'is', type(result))
+
+
+

OUTPUT +

+
7.25 is <class 'float'>
+
+
+
+
+
+
+
+ +
+
+

Choose a Type

+
+

What type of value (integer, floating point number, or character +string) would you use to represent each of the following? Try to come up +with more than one good answer for each problem. For example, in # 1, +when would counting days with a floating point variable make more sense +than using an integer?

+
  1. Number of days since the start of the year.
  2. Time elapsed from the start of the year until now in days.
  3. Serial number of a piece of lab equipment.
  4. A lab specimen’s age.
  5. Current population of a city.
  6. Average population of a city over time.
+
+
+
+
+ +
+
+

The answers to the questions are:

+
  1. Integer, since the number of days is a whole number between 1 and 365 (or 366 in a leap year).
  2. Floating point, since fractional days are required.
  3. Character string if the serial number contains letters and numbers; otherwise integer if it consists only of numerals.
  4. This will vary! How do you define a specimen’s age? Whole days since collection (integer)? Date and time (string)?
  5. Choose floating point to represent population as large aggregates (e.g., millions), or integer to represent population in units of individuals.
  6. Floating point number, since an average is likely to have a fractional part.
+
+
+
+
+
+ +
+
+

Division Types

+
+

In Python 3, the // operator performs integer +(whole-number) floor division, the / operator performs +floating-point division, and the % (or modulo) +operator calculates and returns the remainder from integer division:

+
+

PYTHON +

+
print('5 // 3:', 5 // 3)
+print('5 / 3:', 5 / 3)
+print('5 % 3:', 5 % 3)
+
+
+

OUTPUT +

+
5 // 3: 1
+5 / 3: 1.6666666666666667
+5 % 3: 2
+
+

If num_subjects is the number of subjects taking part in +a study, and num_per_survey is the number that can take +part in a single survey, write an expression that calculates the number +of surveys needed to reach everyone once.

+
+
+
+
+
+ +
+
+

We want the minimum number of surveys that reaches everyone once, which is num_subjects / num_per_survey rounded up. We can get this by subtracting 1 from the number of subjects, performing a floor division with //, and then adding 1. Subtracting 1 first handles the case where num_subjects is evenly divisible by num_per_survey: without it, floor division plus 1 would give one survey too many.

+
+

PYTHON +

+
num_subjects = 600
+num_per_survey = 42
+num_surveys = (num_subjects - 1) // num_per_survey + 1
+
+print(num_subjects, 'subjects,', num_per_survey, 'per survey:', num_surveys)
+
+
+

OUTPUT +

+
600 subjects, 42 per survey: 15
+
+
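As a cross-check, the same count can be obtained by rounding up the floating-point quotient with math.ceil from the standard library. The sketch below compares the two approaches and shows why the subtraction matters when the division is exact; the names floor_based and ceil_based are introduced here for illustration.

```python
import math

num_subjects = 600
num_per_survey = 42

floor_based = (num_subjects - 1) // num_per_survey + 1
ceil_based = math.ceil(num_subjects / num_per_survey)
print(floor_based, ceil_based)

# Evenly divisible case: 84 subjects fit exactly into 2 surveys of 42.
with_subtraction = (84 - 1) // 42 + 1   # 2, as required
without_subtraction = 84 // 42 + 1      # 3, one survey too many
print(with_subtraction, without_subtraction)
```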
+
+
+
+
+
+ +
+
+

Strings to Numbers

+
+

Where reasonable, float() will convert a string to a +floating point number, and int() will convert a floating +point number to an integer:

+
+

PYTHON +

+
print("string to float:", float("3.4"))
+print("float to int:", int(3.4))
+
+
+

OUTPUT +

+
string to float: 3.4
+float to int: 3
+
+

If the conversion doesn’t make sense, however, Python reports an error.

+
+

PYTHON +

+
print("string to float:", float("Hello world!"))
+
+
+

ERROR +

+
---------------------------------------------------------------------------
+ValueError                                Traceback (most recent call last)
+<ipython-input-5-df3b790bf0a2> in <module>
+----> 1 print("string to float:", float("Hello world!"))
+
+ValueError: could not convert string to float: 'Hello world!'
+
+

Given this information, what do you expect the following program to +do?

+

What does it actually do?

+

Why do you think it does that?

+
+

PYTHON +

+
print("fractional string to int:", int("3.4"))
+
+
+
+
+
+
+ +
+
+

What do you expect this program to do? It would not be unreasonable to expect the Python 3 int function to convert the string “3.4” to 3.4 and then, with an additional type conversion, to 3. After all, Python 3 performs a lot of other magic - isn’t that part of its charm?

+
+

PYTHON +

+
int("3.4")
+
+
+

ERROR +

+
---------------------------------------------------------------------------
+ValueError                                Traceback (most recent call last)
+<ipython-input-2-ec6729dfccdc> in <module>
+----> 1 int("3.4")
+ValueError: invalid literal for int() with base 10: '3.4'
+
+

However, Python 3 raises an error. Why? Possibly for consistency: int only accepts strings that represent whole numbers. If you want Python to perform two consecutive type conversions, you must write both of them explicitly in code.

+
+

PYTHON +

+
int(float("3.4"))
+
+
+

OUTPUT +

+
3
+
+
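Note that int does not round: it drops the fractional part. The short sketch below contrasts it with the built-in round for a few illustrative values chosen here.

```python
print(int(float("3.4")))     # 3
print(int(float("3.9")))     # 3: int() drops the fractional part
print(int(float("-3.9")))    # -3: truncation is toward zero
print(round(float("3.9")))   # 4: round() goes to the nearest integer
```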
+
+
+
+
+
+ +
+
+

Arithmetic with Different Types

+
+

Which of the following will return the floating point number +2.0? Note: there may be more than one right answer.

+
+

PYTHON +

+
first = 1.0
+second = "1"
+third = "1.1"
+
+
  1. first + float(second)
  2. float(second) + float(third)
  3. first + int(third)
  4. first + int(float(third))
  5. int(first) + int(float(third))
  6. 2.0 * second
+
+
+
+
+ +
+
+

Answer: 1 and 4
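One way to check the answer is to evaluate each expression and look at both the value and its type. The sketch below does that; the names sum_with_float, sum_of_floats, sum_with_truncated and sum_of_ints are introduced here for illustration, and try/except is used only so that the two failing expressions (first + int(third) and 2.0 * second) do not stop the program.

```python
first = 1.0
second = "1"
third = "1.1"

sum_with_float = first + float(second)           # 1.0 + 1.0
sum_of_floats = float(second) + float(third)     # 1.0 + 1.1
sum_with_truncated = first + int(float(third))   # 1.0 + 1
sum_of_ints = int(first) + int(float(third))     # 1 + 1, an integer

print(sum_with_float, type(sum_with_float))
print(sum_of_floats, type(sum_of_floats))
print(sum_with_truncated, type(sum_with_truncated))
print(sum_of_ints, type(sum_of_ints))

# int() cannot parse a string with a decimal point.
try:
    first + int(third)
except ValueError as error:
    print('ValueError:', error)

# A string can only be repeated an integer number of times.
try:
    2.0 * second
except TypeError as error:
    print('TypeError:', error)
```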

+
+
+
+
+
+
+ +
+
+

Complex Numbers

+
+

Python provides complex numbers, which are written as +1.0+2.0j. If val is a complex number, its real +and imaginary parts can be accessed using dot notation as +val.real and val.imag.

+
+

PYTHON +

+
a_complex_number = 6 + 2j
+print(a_complex_number.real)
+print(a_complex_number.imag)
+
+
+

OUTPUT +

+
6.0
+2.0
+
+
  1. Why do you think Python uses j instead of i for the imaginary part?
  2. What do you expect 1 + 2j + 3 to produce?
  3. What do you expect 4j to be? What about 4 j or 4 + j?
+
+
+
+
+ +
+
+
  1. Standard mathematics treatments typically use i to denote an imaginary number. Python’s use of j follows an early convention from electrical engineering, where i already denotes electric current; changing it now would be technically expensive. Stack Overflow provides additional explanation and discussion.
  2. (4+2j)
  3. 4j is the imaginary number 4j, and 4 j gives Syntax Error: invalid syntax. In 4 + j, j is considered a variable, so the result depends on whether j is defined and, if so, on its assigned value.
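The valid expressions from these answers can be tried directly. In the sketch below, j is deliberately defined as an ordinary variable (a hypothetical value of 10) to show that 4 + j then has nothing to do with complex numbers.

```python
total = 1 + 2j + 3
print(total)        # (4+2j)
print(4j)           # 4j

j = 10              # hypothetical variable that happens to be named j
print(4 + j)        # 14: ordinary integer addition
```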
+
+
+
+
+
+ +
+
+

Key Points

+
+
  • Every value has a type.
  • +
  • Use the built-in function type to find the type of a +value.
  • +
  • Types control what operations can be done on values.
  • +
  • Strings can be added and multiplied.
  • +
  • Strings have a length (but numbers don’t).
  • +
  • Must convert numbers to strings or vice versa when operating on +them.
  • +
  • Can mix integers and floats freely in operations.
  • +
  • Variables only change value when something is assigned to them.
  • +
+
+
+
+
+ + +
+
+
+ +
+
+ + diff --git a/04-built-in.html b/04-built-in.html new file mode 100644 index 000000000..48fcc3e09 --- /dev/null +++ b/04-built-in.html @@ -0,0 +1,1062 @@ + +Plotting and Programming in Python: Built-in Functions and Help +
+ +
+
+ + + + + +
+
+

Built-in Functions and Help

+

Last updated on 2024-10-18

+ + + +
+ +
+ + + +
+

Overview

+
+
+
+
+

Questions

+
  • How can I use built-in functions?
  • +
  • How can I find out what they do?
  • +
  • What kind of errors can occur in programs?
  • +
+
+
+
+
+
+

Objectives

+
  • Explain the purpose of functions.
  • +
  • Correctly call built-in Python functions.
  • +
  • Correctly nest calls to built-in functions.
  • +
  • Use help to display documentation for built-in functions.
  • +
  • Correctly describe situations in which SyntaxError and NameError +occur.
  • +
+
+
+
+
+

Use comments to add documentation to programs.

+
+

PYTHON +

+
# This sentence isn't executed by Python.
+adjustment = 0.5   # Neither is this - anything after '#' is ignored.
+
+

A function may take zero or more arguments.

+
  • We have seen some functions already — now let’s take a closer +look.
  • +
  • An argument is a value passed into a function.
  • +
  • +len takes exactly one.
  • +
  • +int, str, and float create a +new value from an existing one.
  • +
  • +print takes zero or more.
  • +
  • +print with no arguments prints a blank line. +
    • Must always use parentheses, even if they’re empty, so that Python +knows a function is being called.
    • +
  • +
+

PYTHON +

+
print('before')
+print()
+print('after')
+
+
+

OUTPUT +

+
before
+
+after
+
+

Every function returns something.

+
  • Every function call produces some result.
  • +
  • If the function doesn’t have a useful result to return, it usually +returns the special value None. None is a +Python object that stands in anytime there is no value.
  • +
+

PYTHON +

+
result = print('example')
+print('result of print is', result)
+
+
+

OUTPUT +

+
example
+result of print is None
+
+

Commonly-used built-in functions include max, +min, and round.

+
  • Use max to find the largest value of one or more +values.
  • +
  • Use min to find the smallest.
  • +
  • Both work on character strings as well as numbers. +
    • “Larger” and “smaller” use (0-9, A-Z, a-z) to compare letters.
    • +
  • +
+

PYTHON +

+
print(max(1, 2, 3))
+print(min('a', 'A', '0'))
+
+
+

OUTPUT +

+
3
+0
+
+
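The ordering (0-9, A-Z, a-z) comes from the numeric code assigned to each character, which the built-in function ord reports. A small sketch, using ord only to illustrate why '0' is the smallest of the three:

```python
# Each character has a numeric code; digits come before uppercase
# letters, which come before lowercase letters.
print(ord('0'), ord('A'), ord('a'))   # 48 65 97
print(min('a', 'A', '0'))             # 0
print(max('a', 'A', '0'))             # a
```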

Functions may only work for certain (combinations of) +arguments.

+
  • +max and min must be given at least one +argument. +
    • “Largest of the empty set” is a meaningless question.
    • +
  • +
  • And they must be given things that can meaningfully be +compared.
  • +
+

PYTHON +

+
print(max(1, 'a'))
+
+
+

ERROR +

+
TypeError                                 Traceback (most recent call last)
+<ipython-input-52-3f049acf3762> in <module>
+----> 1 print(max(1, 'a'))
+
+TypeError: '>' not supported between instances of 'str' and 'int'
+
+

Functions may have default values for some arguments.

+
  • +round will round off a floating-point number.
  • +
  • By default, rounds to zero decimal places.
  • +
+

PYTHON +

+
round(3.712)
+
+
+

OUTPUT +

+
4
+
+
  • We can specify the number of decimal places we want.
  • +
+

PYTHON +

+
round(3.712, 1)
+
+
+

OUTPUT +

+
3.7
+
+
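A few more calls show how the second argument behaves, including the negative values that the documentation for round allows. The last line is a caveat worth knowing: in Python 3, values exactly halfway between two integers are rounded to the nearest even integer. The input values are chosen here for illustration.

```python
print(round(3.712, 2))         # 3.71
print(round(1234.5678, -2))    # 1200.0: negative ndigits rounds to hundreds
print(round(2.5), round(3.5))  # 2 4: halfway cases go to the even neighbour
```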

Functions attached to objects are called methods

+
  • Functions take another form that will be common in the pandas +episodes.
  • +
  • Methods have parentheses like functions, but come after the +variable.
  • +
  • Some methods are used for internal Python operations, and are marked with double underscores.
  • +
+

PYTHON +

+
my_string = 'Hello world!'  # creation of a string object 
+
+print(len(my_string))       # the len function takes a string as an argument and returns the length of the string
+
+print(my_string.swapcase()) # calling the swapcase method on the my_string object
+
+print(my_string.__len__())  # calling the internal __len__ method on the my_string object, used by len(my_string)
+
+
+

OUTPUT +

+
12
+hELLO WORLD!
+12
+
+
  • You might even see them chained together. They operate left to +right.
  • +
+

PYTHON +

+
print(my_string.isupper())          # Not all the letters are uppercase
+print(my_string.upper())            # This capitalizes all the letters
+
+print(my_string.upper().isupper())  # Now all the letters are uppercase
+
+
+

OUTPUT +

+
False
HELLO WORLD!
True
+
+
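Chaining is just a shorthand for storing each intermediate result and calling the next method on it. The sketch below shows both forms with the string methods strip and lower; raw_label is a hypothetical input with stray spaces.

```python
raw_label = '  Hello world!  '

# Chained: strip() runs first, then lower() runs on its result.
cleaned = raw_label.strip().lower()
print(cleaned)

# The same thing in two separate steps.
stripped = raw_label.strip()
cleaned_in_steps = stripped.lower()
print(cleaned_in_steps)
```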

Use the built-in function help to get help for a +function.

+
  • Every built-in function has online documentation.
  • +
+

PYTHON +

+
help(round)
+
+
+

OUTPUT +

+
Help on built-in function round in module builtins:
+
+round(number, ndigits=None)
+    Round a number to a given precision in decimal digits.
+
+    The return value is an integer if ndigits is omitted or None.  Otherwise
+    the return value has the same type as the number.  ndigits may be negative.
+
+

The Jupyter Notebook has two ways to get help.

+
  • Option 1: Place the cursor near where the function is invoked in a +cell (i.e., the function name or its parameters), +
    • Hold down Shift, and press Tab.
    • +
    • Do this several times to expand the information returned.
    • +
  • +
  • Option 2: Type the function name in a cell with a question mark +after it. Then run the cell.
  • +

Python reports a syntax error when it can’t understand the source of +a program.

+
  • Won’t even try to run the program if it can’t be parsed.
  • +
+

PYTHON +

+
# Forgot to close the quote marks around the string.
+name = 'Feng
+
+
+

ERROR +

+
  File "<ipython-input-56-f42768451d55>", line 2
+    name = 'Feng
+                ^
+SyntaxError: EOL while scanning string literal
+
+
+

PYTHON +

+
# An extra '=' in the assignment.
+age = = 52
+
+
+

ERROR +

+
  File "<ipython-input-57-ccc3df3cf902>", line 2
+    age = = 52
+          ^
+SyntaxError: invalid syntax
+
+
  • Look more closely at the error message:
  • +
+

PYTHON +

+
print("hello world"
+
+
+

ERROR +

+
  File "<ipython-input-6-d1cc229bf815>", line 1
+    print ("hello world"
+                        ^
+SyntaxError: unexpected EOF while parsing
+
+
  • The message indicates a problem on the first line of the input (“line 1”).
    • In this case the “ipython-input” section of the file name tells us +that we are working with input into IPython, the Python interpreter used +by the Jupyter Notebook.
    • +
  • +
  • The -6- part of the filename indicates that the error +occurred in cell 6 of our Notebook.
  • +
  • Next is the problematic line of code, indicating the problem with a +^ pointer.
  • +

Python reports a runtime error when something goes wrong while a +program is executing.

+
+

PYTHON +

+
age = 53
+remaining = 100 - aege # mis-spelled 'age'
+
+
+

ERROR +

+
NameError                                 Traceback (most recent call last)
+<ipython-input-59-1214fb6c55fc> in <module>
+      1 age = 53
+----> 2 remaining = 100 - aege # mis-spelled 'age'
+
+NameError: name 'aege' is not defined
+
+
  • Fix syntax errors by reading the source and runtime errors by +tracing execution.
  • +
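The difference in timing between the two kinds of error can be seen with a small experiment. The sketch below uses the built-in functions compile and exec on two short program texts (both made up for this illustration) and try/except to keep going after each error. With the syntax error, nothing in the snippet runs; with the runtime error, the first line runs before the problem is reached.

```python
# A snippet with a syntax error: Python rejects it before running any line.
broken_source = "print('this line never runs')\nage = = 52\n"
syntax_error_found = False
try:
    compile(broken_source, '<example>', 'exec')
except SyntaxError:
    syntax_error_found = True
    print('SyntaxError: nothing in the snippet was run')

# A snippet with a runtime error: the first line runs, the second fails.
misspelled_source = "print('this line runs')\nremaining = 100 - aege\n"
name_error_found = False
try:
    exec(misspelled_source)
except NameError as error:
    name_error_found = True
    print('NameError:', error)
```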
+
+ +
+
+

What Happens When

+
+
  1. Explain in simple terms the order of operations in the following +program: when does the addition happen, when does the subtraction +happen, when is each function called, etc.
  2. +
  3. What is the final value of radiance?
  4. +
+

PYTHON +

+
radiance = 1.0
+radiance = max(2.1, 2.0 + min(radiance, 1.1 * radiance - 0.5))
+
+
+
+
+
+
+ +
+
+
  1. Order of operations:
     1. 1.1 * radiance = 1.1
     2. 1.1 - 0.5 = 0.6
     3. min(radiance, 0.6) = 0.6
     4. 2.0 + 0.6 = 2.6
     5. max(2.1, 2.6) = 2.6
  2. At the end, radiance = 2.6
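The same order of operations can be made visible by giving each intermediate result its own name. The names scaled, shifted, smaller and total are introduced here for illustration. If you print the intermediate values, some may show tiny floating-point rounding (for example a value very slightly above 0.6), which does not change the reasoning.

```python
radiance = 1.0

scaled = 1.1 * radiance            # multiplication happens first
shifted = scaled - 0.5             # then the subtraction
smaller = min(radiance, shifted)   # min is called once its arguments are known
total = 2.0 + smaller              # then the addition
radiance = max(2.1, total)         # max is called last

print(radiance)
```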
+
+
+
+
+
+ +
+
+

Spot the Difference

+
+
  1. Predict what each of the print statements in the +program below will print.
  2. +
  3. Does max(len(rich), poor) run or produce an error +message? If it runs, does its result make any sense?
  4. +
+

PYTHON +

+
easy_string = "abc"
+print(max(easy_string))
+rich = "gold"
+poor = "tin"
+print(max(rich, poor))
+print(max(len(rich), len(poor)))
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
print(max(easy_string))
+
+
+

OUTPUT +

+
c
+
+
+

PYTHON +

+
print(max(rich, poor))
+
+
+

OUTPUT +

+
tin
+
+
+

PYTHON +

+
print(max(len(rich), len(poor)))
+
+
+

OUTPUT +

+
4
+
+

max(len(rich), poor) raises a TypeError. The call becomes max(4, 'tin') and, as we discussed earlier, a string and an integer cannot meaningfully be compared.

+
+

ERROR +

+
TypeError                                 Traceback (most recent call last)
+<ipython-input-65-bc82ad05177a> in <module>
+----> 1 max(len(rich), poor)
+
+TypeError: '>' not supported between instances of 'str' and 'int'
+
+
+
+
+
+
+
+ +
+
+

Why Not?

+
+

Why is it that max and min do not return +None when they are called with no arguments?

+
+
+
+
+
+ +
+
+

max and min raise a TypeError in this case because they were not given the required number of arguments. If they just returned None, the error would be much harder to trace, as the None would likely be stored in a variable and used later in the program, only to cause a runtime error there.
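The error can be observed safely by wrapping the call in try/except, as in this small sketch; the flag error_raised is introduced here for illustration.

```python
error_raised = False
try:
    largest = max()    # no arguments: there is nothing to compare
except TypeError as error:
    error_raised = True
    print('TypeError:', error)
```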

+
+
+
+
+
+
+ +
+
+

Last Character of a String

+
+

If Python starts counting from zero, and len returns the +number of characters in a string, what index expression will get the +last character in the string name? (Note: we will see a +simpler way to do this in a later episode.)

+
+
+
+
+
+ +
+
+

name[len(name) - 1]
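A quick check with a concrete value (the string 'helium' is chosen here just as an example):

```python
name = 'helium'
last_index = len(name) - 1   # 6 characters, so the last index is 5
print(name[last_index])      # m
```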

+
+
+
+
+
+
+ +
+
+

Explore the Python docs!

+
+

The official Python +documentation is arguably the most complete source of information +about the language. It is available in different languages and contains +a lot of useful resources. The Built-in +Functions page contains a catalogue of all of these functions, +including the ones that we’ve covered in this lesson. Some of these are +more advanced and unnecessary at the moment, but others are very simple +and useful.

+
+
+
+
+
+ +
+
+

Key Points

+
+
  • Use comments to add documentation to programs.
  • +
  • A function may take zero or more arguments.
  • +
  • Commonly-used built-in functions include max, +min, and round.
  • +
  • Functions may only work for certain (combinations of) +arguments.
  • +
  • Functions may have default values for some arguments.
  • +
  • Use the built-in function help to get help for a +function.
  • +
  • The Jupyter Notebook has two ways to get help.
  • +
  • Every function returns something.
  • +
  • Python reports a syntax error when it can’t understand the source of +a program.
  • +
  • Python reports a runtime error when something goes wrong while a +program is executing.
  • +
  • Fix syntax errors by reading the source code, and runtime errors by +tracing the program’s execution.
  • +
+
+
+
+
+ + +
+
+
+ +
+
+ + diff --git a/05-coffee.html b/05-coffee.html new file mode 100644 index 000000000..8cce9520e --- /dev/null +++ b/05-coffee.html @@ -0,0 +1,544 @@ + +Plotting and Programming in Python: Morning Coffee +
+ +
+
+ + + + + +
+
+

Morning Coffee

+

Last updated on 2024-10-18

+ + + +
+ +
+ + +

Reflection exercise

+

Over coffee, reflect on and discuss the following:

+
  • What are the different kinds of errors Python will report?
  • +
  • Did the code always produce the results you expected? If not, +why?
  • +
  • Is there something we can do to prevent errors when we write +code?
  • +
+
+ + +
+
+
+ +
+
+ + diff --git a/06-libraries.html b/06-libraries.html new file mode 100644 index 000000000..60d284141 --- /dev/null +++ b/06-libraries.html @@ -0,0 +1,1110 @@ + +Plotting and Programming in Python: Libraries +
+ +
+
+ + + + + +
+
+

Libraries

+

Last updated on 2024-10-18

+ + + +
+ +
+ + + +
+

Overview

+
+
+
+
+

Questions

+
  • How can I use software that other people have written?
  • +
  • How can I find out what that software does?
  • +
+
+
+
+
+
+

Objectives

+
  • Explain what software libraries are and why programmers create and +use them.
  • +
  • Write programs that import and use modules from Python’s standard +library.
  • +
  • Find and read documentation for the standard library interactively +(in the interpreter) and online.
  • +
+
+
+
+
+

Most of the power of a programming language is in its +libraries.

+
  • A library is a collection of files (called +modules) that contains functions for use by other programs. +
    • May also contain data values (e.g., numerical constants) and other +things.
    • +
    • Library’s contents are supposed to be related, but there’s no way to +enforce that.
    • +
  • +
  • The Python standard +library is an extensive suite of modules that comes with Python +itself.
  • +
  • Many additional libraries are available from PyPI (the Python Package +Index).
  • +
  • We will see later how to write new libraries.
  • +
+
+ +
+
+

Libraries and modules

+
+

A library is a collection of modules, but the terms are often used +interchangeably, especially since many libraries only consist of a +single module, so don’t worry if you mix them.

+
+
+
+

A program must import a library module before using it.

+
  • Use import to load a library module into a program’s +memory.
  • +
  • Then refer to things from the module as +module_name.thing_name. +
    • Python uses . to mean “part of”.
    • +
  • +
  • Using math, one of the modules in the standard +library:
  • +
+

PYTHON +

+
import math
+
+print('pi is', math.pi)
+print('cos(pi) is', math.cos(math.pi))
+
+
+

OUTPUT +

+
pi is 3.141592653589793
+cos(pi) is -1.0
+
+
  • Have to refer to each item with the module’s name. +
    • +math.cos(pi) won’t work: the reference to +pi doesn’t somehow “inherit” the function’s reference to +math.
    • +
  • +
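The failure can be seen by trying it. In this sketch, try/except is used only so that the program continues after the error, and the flag name_error_raised is introduced for illustration.

```python
import math

name_error_raised = False
try:
    math.cos(pi)               # pi on its own is not defined
except NameError as error:
    name_error_raised = True
    print('NameError:', error)

print(math.cos(math.pi))       # -1.0: both names carry the module prefix
```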

Use help to learn about the contents of a library +module.

+
  • Works just like help for a function.
  • +
+

PYTHON +

+
help(math)
+
+
+

OUTPUT +

+
Help on module math:
+
+NAME
+    math
+
+MODULE REFERENCE
+    http://docs.python.org/3/library/math
+
+    The following documentation is automatically generated from the Python
+    source files.  It may be incomplete, incorrect or include features that
+    are considered implementation detail and may vary between Python
+    implementations.  When in doubt, consult the module reference at the
+    location listed above.
+
+DESCRIPTION
+    This module is always available.  It provides access to the
+    mathematical functions defined by the C standard.
+
+FUNCTIONS
+    acos(x, /)
+        Return the arc cosine (measured in radians) of x.
+⋮ ⋮ ⋮
+
+

Import specific items from a library module to shorten +programs.

+
  • Use from ... import ... to load only specific items +from a library module.
  • +
  • Then refer to them directly without the library name as a prefix.
  • +
+

PYTHON +

+
from math import cos, pi
+
+print('cos(pi) is', cos(pi))
+
+
+

OUTPUT +

+
cos(pi) is -1.0
+
+

Create an alias for a library module when importing it to shorten +programs.

+
  • Use import ... as ... to give a library a short +alias while importing it.
  • +
  • Then refer to items in the library using that shortened name.
  • +
+

PYTHON +

+
import math as m
+
+print('cos(pi) is', m.cos(m.pi))
+
+
+

OUTPUT +

+
cos(pi) is -1.0
+
+
  • Commonly used for libraries that are frequently used or have long +names. +
    • E.g., the matplotlib plotting library is often aliased +as mpl.
    • +
  • +
  • But can make programs harder to understand, since readers must learn +your program’s aliases.
  • +
+
+ +
+
+

Exploring the Math Module

+
+
  1. What function from the math module can you use to +calculate a square root without using sqrt?
  2. +
  3. Since the library contains this function, why does sqrt +exist?
  4. +
+
+
+
+
+ +
+
+
  1. Using help(math) we see that we’ve got +pow(x,y) in addition to sqrt(x), so we could +use pow(x, 0.5) to find a square root.

  2. +
  3. The sqrt(x) function is arguably more readable than +pow(x, 0.5) when implementing equations. Readability is a +cornerstone of good programming, so it makes sense to provide a special +function for this specific common case.

  4. +

Also, the design of Python’s math library has its origin +in the C standard, which includes both sqrt(x) and +pow(x,y), so a little bit of the history of programming is +showing in Python’s function names.
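A short comparison of the two functions, using 16 as an example value; the ** operator is included as a third way to write the same calculation.

```python
import math

print(math.sqrt(16))       # 4.0
print(math.pow(16, 0.5))   # 4.0
print(16 ** 0.5)           # 4.0
```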

+
+
+
+
+
+
+ +
+
+

Locating the Right Module

+
+

You want to select a random character from a string:

+
+

PYTHON +

+
bases = 'ACTTGCTTGAC'
+
+
  1. Which standard +library module could help you?
  2. +
  3. Which function would you select from that module? Are there +alternatives?
  4. +
  5. Try to write a program that uses the function.
  6. +
+
+
+
+
+ +
+
+

The random +module seems like it could help.

+

The string has 11 characters, each having a positional index from 0 +to 10. You could use the random.randrange +or random.randint +functions to get a random integer between 0 and 10, and then select the +bases character at that index:

+
+

PYTHON +

+
from random import randrange
+
+random_index = randrange(len(bases))
+print(bases[random_index])
+
+

or more compactly:

+
+

PYTHON +

+
from random import randrange
+
+print(bases[randrange(len(bases))])
+
+

Perhaps you found the random.sample +function? It allows for slightly less typing but might be a bit harder +to understand just by reading:

+
+

PYTHON +

+
from random import sample
+
+print(sample(bases, 1)[0])
+
+

Note that this function returns a list of values. We will learn about +lists in episode 11.

+

The simplest and shortest solution is the random.choice +function that does exactly what we want:

+
+

PYTHON +

+
from random import choice
+
+print(choice(bases))
+
+
+
+
+
+
+
+ +
+
+

Jigsaw Puzzle (Parson’s Problem) Programming Example

+
+

Rearrange the following statements so that a random DNA base and its index in the string are printed. Not all statements may be needed. Feel free to use/add intermediate variables.

+
+

PYTHON +

+
bases="ACTTGCTTGAC"
+import math
+import random
+___ = random.randrange(n_bases)
+___ = len(bases)
+print("random base ", bases[___], "base index", ___)
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
import math 
+import random
+bases = "ACTTGCTTGAC" 
+n_bases = len(bases)
+idx = random.randrange(n_bases)
+print("random base", bases[idx], "base index", idx)
+
+
+
+
+
+
+
+ +
+
+

When Is Help Available?

+
+

When a colleague of yours types help(math), Python +reports an error:

+
+

ERROR +

+
NameError: name 'math' is not defined
+
+

What has your colleague forgotten to do?

+
+
+
+
+
+ +
+
+

Importing the math module (import math)

+
+
+
+
+
+
+ +
+
+

Importing With Aliases

+
+
  1. Fill in the blanks so that the program below prints +90.0.
  2. +
  3. Rewrite the program so that it uses import +without as.
  4. +
  5. Which form do you find easier to read?
  6. +
+

PYTHON +

+
import math as m
+angle = ____.degrees(____.pi / 2)
+print(____)
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
import math as m
+angle = m.degrees(m.pi / 2)
+print(angle)
+
+

can be written as

+
+

PYTHON +

+
import math
+angle = math.degrees(math.pi / 2)
+print(angle)
+
+

Since you just wrote the code and are familiar with it, you might +actually find the first version easier to read. But when trying to read +a huge piece of code written by someone else, or when getting back to +your own huge piece of code after several months, non-abbreviated names +are often easier, except where there are clear abbreviation +conventions.

+
+
+
+
+
+
+ +
+
+

There Are Many Ways To Import Libraries!

+
+

Match the following print statements with the appropriate library +calls.

+

Print commands:

+
  1. print("sin(pi/2) =", sin(pi/2))
  2. print("sin(pi/2) =", m.sin(m.pi/2))
  3. print("sin(pi/2) =", math.sin(math.pi/2))

Library calls:

+
  1. from math import sin, pi
  2. import math
  3. import math as m
  4. from math import *
+
+
+
+
+ +
+
+
  1. Library calls 1 and 4. To refer to sin and pi directly, without the library name as a prefix, you need the from ... import ... statement. Library call 1 imports only the two names sin and pi, whereas library call 4 imports every public name in the math module.
  2. Library call 3. Here sin and pi are referred to with the shortened library name m instead of math. Library call 3 does exactly that using the import ... as ... syntax: it creates an alias for math in the form of the shortened name m.
  3. Library call 2. Here sin and pi are referred to with the regular library name math, so the regular import ... call suffices.

Note: although library call 4 works, importing all +names from a module using a wildcard import is not recommended as it makes it +unclear which names from the module are used in the code. In general it +is best to make your imports as specific as possible and to only import +what your code uses. In library call 1, the import +statement explicitly tells us that the sin function is +imported from the math module, but library call 4 does not +convey this information.

+
+
+
+
+
+
+ +
+
+

Importing Specific Items

+
+
  1. Fill in the blanks so that the program below prints 90.0.
  2. Do you find this version easier to read than preceding ones?
  3. Why wouldn’t programmers always use this form of import?
+

PYTHON +

+
____ math import ____, ____
+angle = degrees(pi / 2)
+print(angle)
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
from math import degrees, pi
+angle = degrees(pi / 2)
+print(angle)
+
+

Most likely you find this version easier to read since it’s less +dense. The main reason not to use this form of import is to avoid name +clashes. For instance, you wouldn’t import degrees this way +if you also wanted to use the name degrees for a variable +or function of your own. Or if you were to also import a function named +degrees from another library.
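To make the name-clash risk concrete, here is a minimal sketch. The second degrees function is a hypothetical helper invented for this illustration (it is not part of any library); defining it after the import silently replaces the imported name.

```python
from math import degrees, pi

angle = degrees(pi / 2)      # uses math.degrees, gives 90.0


# Hypothetical helper with the same name: it replaces the imported function.
def degrees(celsius):
    """Convert a temperature from Celsius to Fahrenheit."""
    return celsius * 9 / 5 + 32


clash = degrees(pi / 2)      # now calls our helper, not math.degrees
print(angle, clash)
```

With a plain import math, the two functions could coexist as math.degrees and degrees.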

+
+
+
+
+
+
+ +
+
+

Reading Error Messages

+
+
  1. Read the code below and try to identify what the errors are without running it.
  2. Run the code, and read the error message. What type of error is it?
+

PYTHON +

+
from math import log
+log(0)
+
+
+
+
+
+
+ +
+
+
+

OUTPUT +

+
---------------------------------------------------------------------------
+ValueError                                Traceback (most recent call last)
+<ipython-input-1-d72e1d780bab> in <module>
+      1 from math import log
+----> 2 log(0)
+
+ValueError: math domain error
+
+
  1. The logarithm of x is only defined for x > 0, so 0 is outside the domain of the function.
  2. You get an error of type ValueError, indicating that the function received an inappropriate argument value. The additional message “math domain error” makes it clearer what the problem is.
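If a program should keep running when it meets such a value, the ValueError can be caught. The sketch below is one possible approach rather than part of the lesson; safe_log is a hypothetical helper name, and try/except has not been introduced at this point in the workshop.

```python
from math import log


def safe_log(x):
    """Return log(x), or None when x is outside the function's domain."""
    try:
        return log(x)
    except ValueError as err:
        # math.log raises ValueError for x <= 0.
        print("cannot take the log of", x, "->", err)
        return None


print(safe_log(1))
print(safe_log(0))
```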
+
+
+
+
+
+ +
+
+

Key Points

+
+
  • Most of the power of a programming language is in its +libraries.
  • +
  • A program must import a library module in order to use it.
  • +
  • Use help to learn about the contents of a library +module.
  • +
  • Import specific items from a library to shorten programs.
  • +
  • Create an alias for a library when importing it to shorten +programs.
  • +
+
+
+
+
+ + +
+
+
+ +
+
+ + diff --git a/07-reading-tabular.html b/07-reading-tabular.html new file mode 100644 index 000000000..59a2f4048 --- /dev/null +++ b/07-reading-tabular.html @@ -0,0 +1,1081 @@ + +Plotting and Programming in Python: Reading Tabular Data into DataFrames +
+ Plotting and Programming in Python +
+ +
+
+ + + + + +
+
+

Reading Tabular Data into DataFrames

+

Last updated on 2024-10-18

+ + + +
+ +
+ + + +
+

Overview

+
+
+
+
+

Questions

+
  • How can I read tabular data?
  • +
+
+
+
+
+
+

Objectives

+
  • Import the Pandas library.
  • +
  • Use Pandas to load a simple CSV data set.
  • +
  • Get some basic information about a Pandas DataFrame.
  • +
+
+
+
+
+

Use the Pandas library to do statistics on tabular data.

+
  • +Pandas is a widely-used +Python library for statistics, particularly on tabular data.
  • +
  • Borrows many features from R’s dataframes. +
    • A 2-dimensional table whose columns have names and potentially have +different data types.
    • +
  • +
  • Load Pandas with import pandas as pd. The alias +pd is commonly used to refer to the Pandas library in +code.
  • +
  • Read a Comma Separated Values (CSV) data file with +pd.read_csv. +
    • Argument is the name of the file to be read.
    • +
    • Returns a dataframe that you can assign to a variable
    • +
  • +
+

PYTHON +

+
import pandas as pd
+
+data_oceania = pd.read_csv('data/gapminder_gdp_oceania.csv')
+print(data_oceania)
+
+
+

OUTPUT +

+
       country  gdpPercap_1952  gdpPercap_1957  gdpPercap_1962  \
+0    Australia     10039.59564     10949.64959     12217.22686
+1  New Zealand     10556.57566     12247.39532     13175.67800
+
+   gdpPercap_1967  gdpPercap_1972  gdpPercap_1977  gdpPercap_1982  \
+0     14526.12465     16788.62948     18334.19751     19477.00928
+1     14463.91893     16046.03728     16233.71770     17632.41040
+
+   gdpPercap_1987  gdpPercap_1992  gdpPercap_1997  gdpPercap_2002  \
+0     21888.88903     23424.76683     26997.93657     30687.75473
+1     19007.19129     18363.32494     21050.41377     23189.80135
+
+   gdpPercap_2007
+0     34435.36744
+1     25185.00911
+
+
  • The columns in a dataframe are the observed variables, and the rows +are the observations.
  • +
  • Pandas uses backslash \ to show wrapped lines when +output is too wide to fit the screen.
  • +
  • Using descriptive dataframe names helps us distinguish between +multiple dataframes so we won’t accidentally overwrite a dataframe or +read from the wrong one.
  • +
+
+ +
+
+

File Not Found

+
+

Our lessons store their data files in a data +sub-directory, which is why the path to the file is +data/gapminder_gdp_oceania.csv. If you forget to include +data/, or if you include it but your copy of the file is +somewhere else, you will get a runtime +error that ends with a line like this:

+
+

ERROR +

+
FileNotFoundError: [Errno 2] No such file or directory: 'data/gapminder_gdp_oceania.csv'
+
+
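One way to diagnose this error is to check where Python is looking before trying to read the file. This sketch uses only the standard library's pathlib module; the variable name data_file is illustrative, and the printed result depends on where your notebook is running.

```python
from pathlib import Path

# Build the same relative path that read_csv would be given.
data_file = Path('data') / 'gapminder_gdp_oceania.csv'

if data_file.exists():
    print('found', data_file)
else:
    # Relative paths are resolved against the current working directory.
    print('missing', data_file, '- current directory is', Path.cwd())
```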
+
+
+

Use index_col to specify that a column’s values should +be used as row headings.

+
  • Row headings are numbers (0 and 1 in this case).
  • +
  • Really want to index by country.
  • +
  • Pass the name of the column to read_csv as its +index_col parameter to do this.
  • +
  • Naming the dataframe data_oceania_country tells us +which region the data includes (oceania) and how it is +indexed (country).
  • +
+

PYTHON +

+
data_oceania_country = pd.read_csv('data/gapminder_gdp_oceania.csv', index_col='country')
+print(data_oceania_country)
+
+
+

OUTPUT +

+
             gdpPercap_1952  gdpPercap_1957  gdpPercap_1962  gdpPercap_1967  \
+country
+Australia       10039.59564     10949.64959     12217.22686     14526.12465
+New Zealand     10556.57566     12247.39532     13175.67800     14463.91893
+
+             gdpPercap_1972  gdpPercap_1977  gdpPercap_1982  gdpPercap_1987  \
+country
+Australia       16788.62948     18334.19751     19477.00928     21888.88903
+New Zealand     16046.03728     16233.71770     17632.41040     19007.19129
+
+             gdpPercap_1992  gdpPercap_1997  gdpPercap_2002  gdpPercap_2007
+country
+Australia       23424.76683     26997.93657     30687.75473     34435.36744
+New Zealand     18363.32494     21050.41377     23189.80135     25185.00911
+
+

Use the DataFrame.info() method to find out more about +a dataframe.

+
+

PYTHON +

+
data_oceania_country.info()
+
+
+

OUTPUT +

+
<class 'pandas.core.frame.DataFrame'>
+Index: 2 entries, Australia to New Zealand
+Data columns (total 12 columns):
+gdpPercap_1952    2 non-null float64
+gdpPercap_1957    2 non-null float64
+gdpPercap_1962    2 non-null float64
+gdpPercap_1967    2 non-null float64
+gdpPercap_1972    2 non-null float64
+gdpPercap_1977    2 non-null float64
+gdpPercap_1982    2 non-null float64
+gdpPercap_1987    2 non-null float64
+gdpPercap_1992    2 non-null float64
+gdpPercap_1997    2 non-null float64
+gdpPercap_2002    2 non-null float64
+gdpPercap_2007    2 non-null float64
+dtypes: float64(12)
+memory usage: 208.0+ bytes
+
+
  • This is a DataFrame +
  • +
  • Two rows named 'Australia' and +'New Zealand' +
  • +
  • Twelve columns, each of which has two actual 64-bit floating point +values. +
    • We will talk later about null values, which are used to represent +missing observations.
    • +
  • +
  • Uses 208 bytes of memory.
  • +

The DataFrame.columns variable stores information about +the dataframe’s columns.

+
  • Note that this is data, not a method. (It doesn’t have +parentheses.) +
    • Like math.pi.
    • +
    • So do not use () to try to call it.
    • +
  • +
  • Called a member variable, or just member.
  • +
+

PYTHON +

+
print(data_oceania_country.columns)
+
+
+

OUTPUT +

+
Index(['gdpPercap_1952', 'gdpPercap_1957', 'gdpPercap_1962', 'gdpPercap_1967',
+       'gdpPercap_1972', 'gdpPercap_1977', 'gdpPercap_1982', 'gdpPercap_1987',
+       'gdpPercap_1992', 'gdpPercap_1997', 'gdpPercap_2002', 'gdpPercap_2007'],
+      dtype='object')
+
+
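To see what happens when columns is called by mistake, the sketch below builds a small stand-in dataframe inline (so no CSV file is needed) using a few of the Oceania values shown earlier. The name small is an assumption made for this example.

```python
import pandas as pd

# Stand-in for data_oceania_country, built inline from values shown above.
small = pd.DataFrame(
    {'gdpPercap_1952': [10039.59564, 10556.57566],
     'gdpPercap_1957': [10949.64959, 12247.39532]},
    index=pd.Index(['Australia', 'New Zealand'], name='country'),
)

print(small.columns)          # member variable: no parentheses

try:
    small.columns()           # calling it like a method fails
except TypeError as err:
    print('TypeError:', err)
```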

Use DataFrame.T to transpose a dataframe.

+
  • Sometimes want to treat columns as rows and vice versa.
  • +
  • Transpose (written .T) swaps rows and columns; it gives a different view of the table and leaves the original dataframe unchanged.
  • +
  • Like columns, it is a member variable.
  • +
+

PYTHON +

+
print(data_oceania_country.T)
+
+
+

OUTPUT +

+
country           Australia  New Zealand
+gdpPercap_1952  10039.59564  10556.57566
+gdpPercap_1957  10949.64959  12247.39532
+gdpPercap_1962  12217.22686  13175.67800
+gdpPercap_1967  14526.12465  14463.91893
+gdpPercap_1972  16788.62948  16046.03728
+gdpPercap_1977  18334.19751  16233.71770
+gdpPercap_1982  19477.00928  17632.41040
+gdpPercap_1987  21888.88903  19007.19129
+gdpPercap_1992  23424.76683  18363.32494
+gdpPercap_1997  26997.93657  21050.41377
+gdpPercap_2002  30687.75473  23189.80135
+gdpPercap_2007  34435.36744  25185.00911
+
+
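A small sketch can confirm what transposing does to the shape and to individual values. The dataframe small is a stand-in built inline from a few of the Oceania values above, not the full dataset.

```python
import pandas as pd

small = pd.DataFrame(
    {'gdpPercap_1952': [10039.59564, 10556.57566],
     'gdpPercap_1957': [10949.64959, 12247.39532],
     'gdpPercap_1962': [12217.22686, 13175.67800]},
    index=pd.Index(['Australia', 'New Zealand'], name='country'),
)

flipped = small.T
print(small.shape, flipped.shape)     # rows and columns swap places

# The same value is reached with row and column labels exchanged.
print(small.loc['Australia', 'gdpPercap_1952'])
print(flipped.loc['gdpPercap_1952', 'Australia'])

# Transposing twice gives back the original layout.
print(flipped.T.equals(small))
```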

Use DataFrame.describe() to get summary statistics +about data.

+

DataFrame.describe() gets the summary statistics of only +the columns that have numerical data. All other columns are ignored, +unless you use the argument include='all'.

+
+

PYTHON +

+
print(data_oceania_country.describe())
+
+
+

OUTPUT +

+
       gdpPercap_1952  gdpPercap_1957  gdpPercap_1962  gdpPercap_1967  \
+count        2.000000        2.000000        2.000000        2.000000
+mean     10298.085650    11598.522455    12696.452430    14495.021790
+std        365.560078      917.644806      677.727301       43.986086
+min      10039.595640    10949.649590    12217.226860    14463.918930
+25%      10168.840645    11274.086022    12456.839645    14479.470360
+50%      10298.085650    11598.522455    12696.452430    14495.021790
+75%      10427.330655    11922.958888    12936.065215    14510.573220
+max      10556.575660    12247.395320    13175.678000    14526.124650
+
+       gdpPercap_1972  gdpPercap_1977  gdpPercap_1982  gdpPercap_1987  \
+count         2.00000        2.000000        2.000000        2.000000
+mean      16417.33338    17283.957605    18554.709840    20448.040160
+std         525.09198     1485.263517     1304.328377     2037.668013
+min       16046.03728    16233.717700    17632.410400    19007.191290
+25%       16231.68533    16758.837652    18093.560120    19727.615725
+50%       16417.33338    17283.957605    18554.709840    20448.040160
+75%       16602.98143    17809.077557    19015.859560    21168.464595
+max       16788.62948    18334.197510    19477.009280    21888.889030
+
+       gdpPercap_1992  gdpPercap_1997  gdpPercap_2002  gdpPercap_2007
+count        2.000000        2.000000        2.000000        2.000000
+mean     20894.045885    24024.175170    26938.778040    29810.188275
+std       3578.979883     4205.533703     5301.853680     6540.991104
+min      18363.324940    21050.413770    23189.801350    25185.009110
+25%      19628.685413    22537.294470    25064.289695    27497.598692
+50%      20894.045885    24024.175170    26938.778040    29810.188275
+75%      22159.406358    25511.055870    28813.266385    32122.777857
+max      23424.766830    26997.936570    30687.754730    34435.367440
+
+
  • Not particularly useful with just two records, but very helpful when +there are thousands.
  • +
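The effect of include='all' is easiest to see on a dataframe that mixes text and numbers. The sketch below uses a hypothetical two-row frame; the continent column is added here purely for illustration and is not in the Oceania file.

```python
import pandas as pd

# Hypothetical frame with one text column and one numeric column.
mixed = pd.DataFrame(
    {'continent': ['Oceania', 'Oceania'],
     'gdpPercap_2007': [34435.36744, 25185.00911]},
    index=pd.Index(['Australia', 'New Zealand'], name='country'),
)

numeric_only = mixed.describe()               # text column is skipped
everything = mixed.describe(include='all')    # text column is summarised too

print(numeric_only.columns.tolist())
print(everything.columns.tolist())
```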
+
+ +
+
+

Reading Other Data

+
+

Read the data in gapminder_gdp_americas.csv (which +should be in the same directory as +gapminder_gdp_oceania.csv) into a variable called +data_americas and display its summary statistics.

+
+
+
+
+
+ +
+
+

To read in a CSV, we use pd.read_csv and pass the +filename 'data/gapminder_gdp_americas.csv' to it. We also +once again pass the column name 'country' to the parameter +index_col in order to index by country. The summary +statistics can be displayed with the DataFrame.describe() +method.

+
+

PYTHON +

+
data_americas = pd.read_csv('data/gapminder_gdp_americas.csv', index_col='country')
+data_americas.describe()
+
+
+
+
+
+
+
+ +
+
+

Inspecting Data

+
+

After reading the data for the Americas, use +help(data_americas.head) and +help(data_americas.tail) to find out what +DataFrame.head and DataFrame.tail do.

+
  1. What method call will display the first three rows of this data?
  2. What method call will display the last three columns of this data? (Hint: you may need to change your view of the data.)
+
+
+
+
+ +
+
+
  1. We can check out the first five rows of data_americas by executing data_americas.head(), which lets us view the beginning of the DataFrame. We can specify the number of rows we wish to see by specifying the parameter n in our call to data_americas.head(). To view the first three rows, execute:
+

PYTHON +

+
data_americas.head(n=3)
+
+
+

OUTPUT +

+
          continent  gdpPercap_1952  gdpPercap_1957  gdpPercap_1962  \
+country
+Argentina  Americas     5911.315053     6856.856212     7133.166023
+Bolivia    Americas     2677.326347     2127.686326     2180.972546
+Brazil     Americas     2108.944355     2487.365989     3336.585802
+
+          gdpPercap_1967  gdpPercap_1972  gdpPercap_1977  gdpPercap_1982  \
+country
+Argentina     8052.953021     9443.038526    10079.026740     8997.897412
+Bolivia       2586.886053     2980.331339     3548.097832     3156.510452
+Brazil        3429.864357     4985.711467     6660.118654     7030.835878
+
+           gdpPercap_1987  gdpPercap_1992  gdpPercap_1997  gdpPercap_2002  \
+country
+Argentina     9139.671389     9308.418710    10967.281950     8797.640716
+Bolivia       2753.691490     2961.699694     3326.143191     3413.262690
+Brazil        7807.095818     6950.283021     7957.980824     8131.212843
+
+           gdpPercap_2007
+country
+Argentina    12779.379640
+Bolivia       3822.137084
+Brazil        9065.800825
+
+
  2. To check out the last three rows of data_americas, we would use the command data_americas.tail(n=3), analogous to head() used above. However, here we want to look at the last three columns, so we need to change our view and then use tail(). To do so, we create a new DataFrame in which rows and columns are switched:
+

PYTHON +

+
americas_flipped = data_americas.T
+
+

We can then view the last three columns of data_americas by viewing the last three rows of americas_flipped:

+
+

PYTHON +

+
americas_flipped.tail(n=3)
+
+
+

OUTPUT +

+
country        Argentina  Bolivia   Brazil   Canada    Chile Colombia  \
+gdpPercap_1997   10967.3  3326.14  7957.98  28954.9  10118.1  6117.36
+gdpPercap_2002   8797.64  3413.26  8131.21    33329  10778.8  5755.26
+gdpPercap_2007   12779.4  3822.14   9065.8  36319.2  13171.6  7006.58
+
+country        Costa Rica     Cuba Dominican Republic  Ecuador    ...     \
+gdpPercap_1997    6677.05  5431.99             3614.1  7429.46    ...
+gdpPercap_2002    7723.45  6340.65            4563.81  5773.04    ...
+gdpPercap_2007    9645.06   8948.1            6025.37  6873.26    ...
+
+country          Mexico Nicaragua   Panama Paraguay     Peru Puerto Rico  \
+gdpPercap_1997   9767.3   2253.02  7113.69   4247.4  5838.35     16999.4
+gdpPercap_2002  10742.4   2474.55  7356.03  3783.67  5909.02     18855.6
+gdpPercap_2007  11977.6   2749.32  9809.19  4172.84  7408.91     19328.7
+
+country        Trinidad and Tobago United States  Uruguay Venezuela
+gdpPercap_1997             8792.57       35767.4  9230.24   10165.5
+gdpPercap_2002             11460.6       39097.1     7727   8605.05
+gdpPercap_2007             18008.5       42951.7  10611.5   11415.8
+
+

This shows the data that we want, but we may prefer to display three +columns instead of three rows, so we can flip it back:

+
+

PYTHON +

+
americas_flipped.tail(n=3).T    
+
+

Note: we could have done the above in a single line +of code by ‘chaining’ the commands:

+
+

PYTHON +

+
data_americas.T.tail(n=3).T
+
+
+
+
+
+
+
+ +
+
+

Reading Files in Other Directories

+
+

The data for your current project is stored in a file called +microbes.csv, which is located in a folder called +field_data. You are doing analysis in a notebook called +analysis.ipynb in a sibling folder called +thesis:

+
+

OUTPUT +

+
your_home_directory
++-- field_data/
+|   +-- microbes.csv
++-- thesis/
+    +-- analysis.ipynb
+
+

What value(s) should you pass to read_csv to read +microbes.csv in analysis.ipynb?

+
+
+
+
+
+ +
+
+

We need to specify the path to the file of interest in the call to pd.read_csv. We first need to ‘jump’ out of the folder thesis using ‘../’ and then into the folder field_data using ‘field_data/’. Then we can specify the filename microbes.csv. The result is as follows:

+
+

PYTHON +

+
data_microbes = pd.read_csv('../field_data/microbes.csv')
+
+
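To check how a relative path like this is interpreted, the following sketch resolves it by hand with the standard library. It assumes the notebook's working directory is the thesis folder from the listing above and treats /your_home_directory as a placeholder; posixpath is used so the result looks the same on every operating system.

```python
import posixpath

# Placeholder location of the notebook, taken from the directory listing.
notebook_dir = '/your_home_directory/thesis'
relative_path = '../field_data/microbes.csv'

# Join the two and collapse the '..' step to see where the path points.
resolved = posixpath.normpath(posixpath.join(notebook_dir, relative_path))
print(resolved)
```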
+
+
+
+
+
+ +
+
+

Writing Data

+
+

As well as the read_csv function for reading data from a +file, Pandas provides a to_csv function to write dataframes +to files. Applying what you’ve learned about reading from files, write +one of your dataframes to a file called processed.csv. You +can use help to get information on how to use +to_csv.

+
+
+
+
+
+ +
+
+

In order to write the DataFrame data_americas to a file +called processed.csv, execute the following command:

+
+

PYTHON +

+
data_americas.to_csv('processed.csv')
+
+

For help on read_csv or to_csv, you could +execute, for example:

+
+

PYTHON +

+
help(data_americas.to_csv)
+help(pd.read_csv)
+
+

Note that help(to_csv) or help(pd.to_csv) throws an error! This is because to_csv is not a global Pandas function but a member function of DataFrames. This means you can only call it on an instance of a DataFrame, e.g., data_americas.to_csv or data_oceania.to_csv.
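The difference between a module-level function and a DataFrame method can be checked directly. The sketch below uses a small stand-in dataframe and writes to an in-memory text buffer instead of a real file, so it leaves nothing on disk; the names small, buffer, and round_trip are assumptions for this example.

```python
import io

import pandas as pd

small = pd.DataFrame(
    {'gdpPercap_1952': [10039.59564, 10556.57566]},
    index=pd.Index(['Australia', 'New Zealand'], name='country'),
)

print(hasattr(pd, 'read_csv'))    # read_csv lives on the pandas module
print(hasattr(pd, 'to_csv'))      # to_csv does not
print(hasattr(small, 'to_csv'))   # it is a method of each DataFrame

# Write to an in-memory buffer, then read the text back in.
buffer = io.StringIO()
small.to_csv(buffer)
round_trip = pd.read_csv(io.StringIO(buffer.getvalue()), index_col='country')
print(round_trip)
```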

+
+
+
+
+
+
+ +
+
+

Key Points

+
+
  • Use the Pandas library to get basic statistics out of tabular +data.
  • +
  • Use index_col to specify that a column’s values should +be used as row headings.
  • +
  • Use DataFrame.info to find out more about a +dataframe.
  • +
  • The DataFrame.columns variable stores information about +the dataframe’s columns.
  • +
  • Use DataFrame.T to transpose a dataframe.
  • +
  • Use DataFrame.describe to get summary statistics about +data.
  • +
+
+
+
+
+ + +
+
+
+ +
+
+ + diff --git a/08-data-frames.html b/08-data-frames.html new file mode 100644 index 000000000..8920bff80 --- /dev/null +++ b/08-data-frames.html @@ -0,0 +1,1421 @@ + +Plotting and Programming in Python: Pandas DataFrames +
+ Plotting and Programming in Python +
+ +
+
+ + + + + +
+
+

Pandas DataFrames

+

Last updated on 2024-10-18

+ + + +
+ +
+ + + +
+

Overview

+
+
+
+
+

Questions

+
  • How can I do statistical analysis of tabular data?
  • +
+
+
+
+
+
+

Objectives

+
  • Select individual values from a Pandas dataframe.
  • +
  • Select entire rows or entire columns from a dataframe.
  • +
  • Select a subset of both rows and columns from a dataframe in a +single operation.
  • +
  • Select a subset of a dataframe by a single Boolean criterion.
  • +
+
+
+
+
+

Note about Pandas DataFrames/Series

+

A DataFrame is a collection of Series: the DataFrame is the way Pandas represents a table, and a Series is the data structure Pandas uses to represent a column.

+

Pandas is built on top of the NumPy library, which in practice means that most of the methods defined for NumPy arrays also apply to Pandas Series and DataFrames.

+

What makes Pandas so attractive is its powerful interface for accessing individual records of the table, its proper handling of missing values, and its relational-database operations between DataFrames.

+

Selecting values

+

To access a value at the position [i,j] of a DataFrame, we have two options, depending on what i and j mean. Remember that a DataFrame provides an index as a way to identify the rows of the table; a row, then, has a position inside the table as well as a label, which uniquely identifies its entry in the DataFrame.

+

Use DataFrame.iloc[..., ...] to select values by their +(entry) position

+
  • Can specify location by numerical index analogously to 2D version of +character selection in strings.
  • +
+

PYTHON +

+
import pandas as pd
+data = pd.read_csv('data/gapminder_gdp_europe.csv', index_col='country')
+print(data.iloc[0, 0])
+
+
+

OUTPUT +

+
1601.056136
+
+

Use DataFrame.loc[..., ...] to select values by their +(entry) label.

+
  • Can specify location by row and/or column name.
  • +
+

PYTHON +

+
print(data.loc["Albania", "gdpPercap_1952"])
+
+
+

OUTPUT +

+
1601.056136
+
+

Use : on its own to mean all columns or all rows.

+
  • Just like Python’s usual slicing notation.
  • +
+

PYTHON +

+
print(data.loc["Albania", :])
+
+
+

OUTPUT +

+
gdpPercap_1952    1601.056136
+gdpPercap_1957    1942.284244
+gdpPercap_1962    2312.888958
+gdpPercap_1967    2760.196931
+gdpPercap_1972    3313.422188
+gdpPercap_1977    3533.003910
+gdpPercap_1982    3630.880722
+gdpPercap_1987    3738.932735
+gdpPercap_1992    2497.437901
+gdpPercap_1997    3193.054604
+gdpPercap_2002    4604.211737
+gdpPercap_2007    5937.029526
+Name: Albania, dtype: float64
+
+
  • Would get the same result printing data.loc["Albania"] +(without a second index).
  • +
+

PYTHON +

+
print(data.loc[:, "gdpPercap_1952"])
+
+
+

OUTPUT +

+
country
+Albania                    1601.056136
+Austria                    6137.076492
+Belgium                    8343.105127
+⋮ ⋮ ⋮
+Switzerland               14734.232750
+Turkey                     1969.100980
+United Kingdom             9979.508487
+Name: gdpPercap_1952, dtype: float64
+
+
  • Would get the same result printing +data["gdpPercap_1952"] +
  • +
  • Also get the same result printing data.gdpPercap_1952 +(not recommended, because easily confused with . notation +for methods)
  • +

Select multiple columns or rows using DataFrame.loc and +a named slice.

+
+

PYTHON +

+
print(data.loc['Italy':'Poland', 'gdpPercap_1962':'gdpPercap_1972'])
+
+
+

OUTPUT +

+
             gdpPercap_1962  gdpPercap_1967  gdpPercap_1972
+country
+Italy           8243.582340    10022.401310    12269.273780
+Montenegro      4649.593785     5907.850937     7778.414017
+Netherlands    12790.849560    15363.251360    18794.745670
+Norway         13450.401510    16361.876470    18965.055510
+Poland          5338.752143     6557.152776     8006.506993
+
+

In the above code, we discover that slicing using +loc is inclusive at both ends, which differs from +slicing using iloc, where slicing +indicates everything up to but not including the final index.
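The difference in end-point handling can be seen on a tiny frame. This sketch builds a three-row stand-in from values shown above rather than loading the full Europe file; the names small, by_label, and by_position are illustrative.

```python
import pandas as pd

small = pd.DataFrame(
    {'gdpPercap_1962': [8243.582340, 4649.593785, 12790.849560]},
    index=pd.Index(['Italy', 'Montenegro', 'Netherlands'], name='country'),
)

by_label = small.loc['Italy':'Montenegro']   # label slice: includes Montenegro
by_position = small.iloc[0:1]                # position slice: stops before 1

print(by_label.index.tolist())
print(by_position.index.tolist())
```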

+

Result of slicing can be used in further operations.

+
  • Usually don’t just print a slice.
  • +
  • All the statistical operators that work on entire dataframes work +the same way on slices.
  • +
  • E.g., calculate max of a slice.
  • +
+

PYTHON +

+
print(data.loc['Italy':'Poland', 'gdpPercap_1962':'gdpPercap_1972'].max())
+
+
+

OUTPUT +

+
gdpPercap_1962    13450.40151
+gdpPercap_1967    16361.87647
+gdpPercap_1972    18965.05551
+dtype: float64
+
+
+

PYTHON +

+
print(data.loc['Italy':'Poland', 'gdpPercap_1962':'gdpPercap_1972'].min())
+
+
+

OUTPUT +

+
gdpPercap_1962    4649.593785
+gdpPercap_1967    5907.850937
+gdpPercap_1972    7778.414017
+dtype: float64
+
+

Use comparisons to select data based on value.

+
  • Comparison is applied element by element.
  • +
  • Returns a similarly-shaped dataframe of True and +False.
  • +
+

PYTHON +

+
# Use a subset of data to keep output readable.
+subset = data.loc['Italy':'Poland', 'gdpPercap_1962':'gdpPercap_1972']
+print('Subset of data:\n', subset)
+
+# Which values were greater than 10000 ?
+print('\nWhere are values large?\n', subset > 10000)
+
+
+

OUTPUT +

+
Subset of data:
+             gdpPercap_1962  gdpPercap_1967  gdpPercap_1972
+country
+Italy           8243.582340    10022.401310    12269.273780
+Montenegro      4649.593785     5907.850937     7778.414017
+Netherlands    12790.849560    15363.251360    18794.745670
+Norway         13450.401510    16361.876470    18965.055510
+Poland          5338.752143     6557.152776     8006.506993
+
+Where are values large?
+            gdpPercap_1962 gdpPercap_1967 gdpPercap_1972
+country
+Italy                False           True           True
+Montenegro           False          False          False
+Netherlands           True           True           True
+Norway                True           True           True
+Poland               False          False          False
+
+

Select values or NaN using a Boolean mask.

+
  • A frame full of Booleans is sometimes called a mask because +of how it can be used.
  • +
+

PYTHON +

+
mask = subset > 10000
+print(subset[mask])
+
+
+

OUTPUT +

+
             gdpPercap_1962  gdpPercap_1967  gdpPercap_1972
+country
+Italy                   NaN     10022.40131     12269.27378
+Montenegro              NaN             NaN             NaN
+Netherlands     12790.84956     15363.25136     18794.74567
+Norway          13450.40151     16361.87647     18965.05551
+Poland                  NaN             NaN             NaN
+
+
  • Get the value where the mask is true, and NaN (Not a Number) where +it is false.
  • +
  • Useful because NaNs are ignored by operations like max, min, +average, etc.
  • +
+

PYTHON +

+
print(subset[subset > 10000].describe())
+
+
+

OUTPUT +

+
       gdpPercap_1962  gdpPercap_1967  gdpPercap_1972
+count        2.000000        3.000000        3.000000
+mean     13120.625535    13915.843047    16676.358320
+std        466.373656     3408.589070     3817.597015
+min      12790.849560    10022.401310    12269.273780
+25%      12955.737547    12692.826335    15532.009725
+50%      13120.625535    15363.251360    18794.745670
+75%      13285.513523    15862.563915    18879.900590
+max      13450.401510    16361.876470    18965.055510
+
+
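The count of 2 and the mean for gdpPercap_1962 in the table above can be reproduced on that single column. This sketch rebuilds the column inline; passing skipna=False is shown only to contrast with the default behaviour of skipping NaNs.

```python
import pandas as pd

subset = pd.DataFrame(
    {'gdpPercap_1962': [8243.582340, 4649.593785, 12790.849560,
                        13450.401510, 5338.752143]},
    index=pd.Index(['Italy', 'Montenegro', 'Netherlands', 'Norway', 'Poland'],
                   name='country'),
)

masked = subset[subset > 10000]          # NaN wherever the mask is False
column = masked['gdpPercap_1962']

n_kept = column.count()                  # counts only non-NaN values
mean_default = column.mean()             # NaNs are skipped by default
mean_strict = column.mean(skipna=False)  # NaN as soon as any value is missing

print(n_kept, mean_default, mean_strict)
```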

Group By: split-apply-combine

+

Pandas' vectorized methods and grouping operations give users a great deal of flexibility in analysing their data.

+

For instance, let’s say we want to have a clearer view on how the +European countries split themselves according to their GDP.

+
  1. We can get a first impression by splitting the countries into two groups for each year surveyed: those with a GDP higher than the European average and those with a lower GDP.
  2. We then estimate a wealth score based on the historical values (from 1952 to 2007), counting the fraction of surveyed years in which a country was in the higher-GDP group.
+

PYTHON +

+
mask_higher = data > data.mean()
+wealth_score = mask_higher.aggregate('sum', axis=1) / len(data.columns)
+print(wealth_score)
+
+
+

OUTPUT +

+
country
+Albania                   0.000000
+Austria                   1.000000
+Belgium                   1.000000
+Bosnia and Herzegovina    0.000000
+Bulgaria                  0.000000
+Croatia                   0.000000
+Czech Republic            0.500000
+Denmark                   1.000000
+Finland                   1.000000
+France                    1.000000
+Germany                   1.000000
+Greece                    0.333333
+Hungary                   0.000000
+Iceland                   1.000000
+Ireland                   0.333333
+Italy                     0.500000
+Montenegro                0.000000
+Netherlands               1.000000
+Norway                    1.000000
+Poland                    0.000000
+Portugal                  0.000000
+Romania                   0.000000
+Serbia                    0.000000
+Slovak Republic           0.000000
+Slovenia                  0.333333
+Spain                     0.333333
+Sweden                    1.000000
+Switzerland               1.000000
+Turkey                    0.000000
+United Kingdom            1.000000
+dtype: float64
+
+

Finally, for each group in the wealth_score table, we +sum their (financial) contribution across the years surveyed using +chained methods:

+
+

PYTHON +

+
print(data.groupby(wealth_score).sum())
+
+
+

OUTPUT +

+
          gdpPercap_1952  gdpPercap_1957  gdpPercap_1962  gdpPercap_1967  \
+0.000000    36916.854200    46110.918793    56850.065437    71324.848786
+0.333333    16790.046878    20942.456800    25744.935321    33567.667670
+0.500000    11807.544405    14505.000150    18380.449470    21421.846200
+1.000000   104317.277560   127332.008735   149989.154201   178000.350040
+
+          gdpPercap_1972  gdpPercap_1977  gdpPercap_1982  gdpPercap_1987  \
+0.000000    88569.346898   104459.358438   113553.768507   119649.599409
+0.333333    45277.839976    53860.456750    59679.634020    64436.912960
+0.500000    25377.727380    29056.145370    31914.712050    35517.678220
+1.000000   215162.343140   241143.412730   263388.781960   296825.131210
+
+          gdpPercap_1992  gdpPercap_1997  gdpPercap_2002  gdpPercap_2007
+0.000000    92380.047256   103772.937598   118590.929863   149577.357928
+0.333333    67918.093220    80876.051580   102086.795210   122803.729520
+0.500000    36310.666080    40723.538700    45564.308390    51403.028210
+1.000000   315238.235970   346930.926170   385109.939210   427850.333420
+
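The same split-apply-combine steps are easier to follow on a toy table. The values, row labels, and column names below are made up for illustration and are not Gapminder data; only the sequence of operations mirrors the code above.

```python
import pandas as pd

# Made-up numbers: four "countries" (A-D) and two "years" (y1, y2).
toy = pd.DataFrame(
    {'y1': [1.0, 5.0, 6.0, 2.0],
     'y2': [2.0, 6.0, 1.0, 1.0]},
    index=['A', 'B', 'C', 'D'],
)

above_mean = toy > toy.mean()                      # compare with column means
score = above_mean.sum(axis=1) / len(toy.columns)  # fraction of years above
totals = toy.groupby(score).sum()                  # rows sharing a score are summed

print(score)
print(totals)
```

Here A and D share a score of 0, so their rows are added together, while B and C each form a group of their own.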
+
+
+ +
+
+

Selection of Individual Values

+
+

Assume Pandas has been imported into your notebook and the Gapminder +GDP data for Europe has been loaded:

+
+

PYTHON +

+
import pandas as pd
+
+data_europe = pd.read_csv('data/gapminder_gdp_europe.csv', index_col='country')
+
+

Write an expression to find the Per Capita GDP of Serbia in 2007.

+
+
+
+
+
+ +
+
+

The selection can be done by using the labels for both the row +(“Serbia”) and the column (“gdpPercap_2007”):

+
+

PYTHON +

+
print(data_europe.loc['Serbia', 'gdpPercap_2007'])
+
+

The output is

+
+

OUTPUT +

+
9786.534714
+
+
+
+
+
+
+
+ +
+
+

Extent of Slicing

+
+
  1. Do the two statements below produce the same output?
  2. Based on this, what rule governs what is included (or not) in numerical slices and named slices in Pandas?
+

PYTHON +

+
print(data_europe.iloc[0:2, 0:2])
+print(data_europe.loc['Albania':'Belgium', 'gdpPercap_1952':'gdpPercap_1962'])
+
+
+
+
+
+
+ +
+
+

No, they do not produce the same output! The output of the first +statement is:

+
+

OUTPUT +

+
        gdpPercap_1952  gdpPercap_1957
+country
+Albania     1601.056136     1942.284244
+Austria     6137.076492     8842.598030
+
+

The second statement gives:

+
+

OUTPUT +

+
        gdpPercap_1952  gdpPercap_1957  gdpPercap_1962
+country
+Albania     1601.056136     1942.284244     2312.888958
+Austria     6137.076492     8842.598030    10750.721110
+Belgium     8343.105127     9714.960623    10991.206760
+
+

Clearly, the second statement produces an additional column and an +additional row compared to the first statement.
+What conclusion can we draw? We see that a numerical slice, 0:2, +omits the final index (i.e. index 2) in the range provided, +while a named slice, ‘gdpPercap_1952’:‘gdpPercap_1962’, +includes the final element.
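
The same rule can be checked on a small made-up table. This is only a sketch: the row labels, column labels, and values below are invented for illustration and are not Gapminder data.

```python
import pandas as pd

# Hypothetical toy data, not taken from the Gapminder files.
toy = pd.DataFrame(
    {"y1952": [1.0, 2.0, 3.0], "y1957": [4.0, 5.0, 6.0], "y1962": [7.0, 8.0, 9.0]},
    index=["A", "B", "C"],
)

by_position = toy.iloc[0:2, 0:2]              # stop position 2 is excluded
by_label = toy.loc["A":"C", "y1952":"y1962"]  # stop labels "C" and "y1962" are included

print(by_position.shape)  # (2, 2)
print(by_label.shape)     # (3, 3)
```

Positional slices behave like ordinary Python list slices, while label slices include both endpoints.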

+
+
+
+
+
+
+ +
+
+

Reconstructing Data

+
+

Explain what each line in the following short program does: what is +in first, second, etc.?

+
+

PYTHON +

+
first = pd.read_csv('data/gapminder_all.csv', index_col='country')
+second = first[first['continent'] == 'Americas']
+third = second.drop('Puerto Rico')
+fourth = third.drop('continent', axis = 1)
+fourth.to_csv('result.csv')
+
+
+
+
+
+
+ +
+
+

Let’s go through this piece of code line by line.

+
+

PYTHON +

+
first = pd.read_csv('data/gapminder_all.csv', index_col='country')
+
+

This line loads the dataset containing the GDP data from all +countries into a dataframe called first. The +index_col='country' parameter selects which column to use +as the row labels in the dataframe.

+
+

PYTHON +

+
second = first[first['continent'] == 'Americas']
+
+

This line makes a selection: only those rows of first for which the ‘continent’ column matches ‘Americas’ are extracted. Notice how the Boolean expression inside the brackets, first['continent'] == 'Americas', is used to select only those rows where the expression is true. Try printing this expression! Can you also print its individual True/False elements? (Hint: first assign the expression to a variable.)
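
One possible way to explore the Boolean expression is sketched below. The small table is a hypothetical stand-in for gapminder_all.csv, and the names toy, mask, and selected are made up for this example.

```python
import pandas as pd

# Hypothetical toy data standing in for gapminder_all.csv.
toy = pd.DataFrame(
    {"continent": ["Americas", "Europe", "Americas"], "pop": [10, 20, 30]},
    index=["X", "Y", "Z"],
)

mask = toy["continent"] == "Americas"   # a Series of True/False, one entry per row
print(mask)

for country, is_match in mask.items():  # look at the individual elements
    print(country, is_match)

selected = toy[mask]                    # keeps only the rows where mask is True
print(list(selected.index))             # ['X', 'Z']
```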

+
+

PYTHON +

+
third = second.drop('Puerto Rico')
+
+

As the syntax suggests, this line drops the row from +second where the label is ‘Puerto Rico’. The resulting +dataframe third has one row less than the original +dataframe second.

+
+

PYTHON +

+
fourth = third.drop('continent', axis = 1)
+
+

Again we apply the drop function, but in this case we are dropping not a row but a whole column. To accomplish this, we also need to specify the axis parameter: axis = 1 tells drop to look for the label ‘continent’ among the columns (axis 1) rather than among the rows (axis 0, the default).
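
A minimal sketch of how the axis argument changes where drop looks for a label, using a made-up two-row table (the labels are invented for illustration):

```python
import pandas as pd

# Hypothetical toy data.
toy = pd.DataFrame(
    {"continent": ["Americas", "Americas"], "gdp": [1.0, 2.0]},
    index=["X", "Y"],
)

no_row = toy.drop("X")                  # axis=0 is the default: drop a row label
no_col = toy.drop("continent", axis=1)  # axis=1: look for the label among the columns
same = toy.drop(columns="continent")    # keyword form with the same effect

print(list(no_row.index))    # ['Y']
print(list(no_col.columns))  # ['gdp']
```

None of these calls modifies toy itself; each returns a new dataframe.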

+
+

PYTHON +

+
fourth.to_csv('result.csv')
+
+

The final step is to write the data that we have been working on to a CSV file. Pandas makes this easy with the to_csv() method. The only argument we need to give it is the filename. Note that the file will be written in the directory from which you started the Jupyter or Python session.

+
+
+
+
+
+
+ +
+
+

Selecting Indices

+
+

Explain in simple terms what idxmin and +idxmax do in the short program below. When would you use +these methods?

+
+

PYTHON +

+
data = pd.read_csv('data/gapminder_gdp_europe.csv', index_col='country')
+print(data.idxmin())
+print(data.idxmax())
+
+
+
+
+
+
+ +
+
+

For each column in data, idxmin will return the index value (row label) corresponding to that column’s minimum; idxmax does the same for each column’s maximum.

+

You can use these functions whenever you want to get the row index of +the minimum/maximum value and not the actual minimum/maximum value.
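
As a small illustration, assuming a made-up table with row labels A, B, and C, the following sketch contrasts the labels returned by idxmin and idxmax with the values returned by min:

```python
import pandas as pd

# Hypothetical toy data.
toy = pd.DataFrame(
    {"y1952": [3.0, 1.0, 2.0], "y2007": [5.0, 9.0, 4.0]},
    index=["A", "B", "C"],
)

lowest = toy.idxmin()    # row label of each column's minimum
highest = toy.idxmax()   # row label of each column's maximum

print(lowest.tolist())     # ['B', 'C']
print(highest.tolist())    # ['A', 'B']
print(toy.min().tolist())  # [1.0, 4.0]
```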

+
+
+
+
+
+
+ +
+
+

Practice with Selection

+
+

Assume Pandas has been imported and the Gapminder GDP data for Europe +has been loaded. Write an expression to select each of the +following:

+
  1. GDP per capita for all countries in 1982.
  2. +
  3. GDP per capita for Denmark for all years.
  4. +
  5. GDP per capita for all countries for years after 1985.
  6. +
  7. GDP per capita for each country in 2007 as a multiple of GDP per +capita for that country in 1952.
  8. +
+
+
+
+
+ +
+
+

1:

+
+

PYTHON +

+
data['gdpPercap_1982']
+
+

2:

+
+

PYTHON +

+
data.loc['Denmark',:]
+
+

3:

+
+

PYTHON +

+
data.loc[:,'gdpPercap_1985':]
+
+

No column named gdpPercap_1985 actually exists, but Pandas does not give you an error. Because the column labels are in sorted order, a label slice can begin at the position where the missing label would fall, so the result starts at gdpPercap_1987. This is useful if new columns are added to the CSV file later.
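
A minimal sketch of slicing from a label that is not present, assuming the column labels are sorted (as the gdpPercap_ columns are). The one-row table is invented for illustration.

```python
import pandas as pd

# Hypothetical toy data with sorted column labels.
toy = pd.DataFrame(
    [[1, 2, 3, 4]],
    columns=["gdpPercap_1977", "gdpPercap_1982", "gdpPercap_1987", "gdpPercap_1992"],
    index=["A"],
)

print(toy.columns.is_monotonic_increasing)  # True: the labels are sorted

# "gdpPercap_1985" is not a column, but it falls between 1982 and 1987 in sort order.
after_1985 = toy.loc[:, "gdpPercap_1985":]
print(list(after_1985.columns))  # ['gdpPercap_1987', 'gdpPercap_1992']
```

If the labels were not sorted, slicing from a missing label would raise a KeyError instead.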

+

4:

+
+

PYTHON +

+
data['gdpPercap_2007']/data['gdpPercap_1952']
+
+
+
+
+
+
+
+ +
+
+

Many Ways of Access

+
+

There are at least two ways of accessing a value or slice of a +DataFrame: by name or index. However, there are many others. For +example, a single column or row can be accessed either as a +DataFrame or a Series object.

+

Suggest different ways of doing the following operations on a +DataFrame:

+
  1. Access a single column
  2. +
  3. Access a single row
  4. +
  5. Access an individual DataFrame element
  6. +
  7. Access several columns
  8. +
  9. Access several rows
  10. +
  11. Access a subset of specific rows and columns
  12. +
  13. Access a subset of row and column ranges
  14. +
+
+
+
+
+ +
+
+

1. Access a single column:

+
+

PYTHON +

+
# by name
+data["col_name"]   # as a Series
+data[["col_name"]] # as a DataFrame
+
+# by name using .loc
+data.T.loc["col_name"]  # as a Series
+data.T.loc[["col_name"]].T  # as a DataFrame
+
+# Dot notation (Series)
+data.col_name
+
+# by index (iloc)
+data.iloc[:, col_index]   # as a Series
+data.iloc[:, [col_index]] # as a DataFrame
+
+# using a mask
+data.T[data.T.index == "col_name"].T
+
+

2. Access a single row:

+
+

PYTHON +

+
# by name using .loc
+data.loc["row_name"] # as a Series
+data.loc[["row_name"]] # as a DataFrame
+
+# by name
+data.T["row_name"] # as a Series
+data.T[["row_name"]].T # as a DataFrame
+
+# by index
+data.iloc[row_index]   # as a Series
+data.iloc[[row_index]]   # as a DataFrame
+
+# using mask
+data[data.index == "row_name"]
+
+

3. Access an individual DataFrame element:

+
+

PYTHON +

+
# by column/row names
+data["col_name"]["row_name"]         # as a value
+
+data[["col_name"]].loc["row_name"]  # as a Series
+data[["col_name"]].loc[["row_name"]]  # as a DataFrame
+
+data.loc["row_name"]["col_name"]  # as a value
+data.loc[["row_name"]]["col_name"]  # as a Series
+data.loc[["row_name"]][["col_name"]]  # as a DataFrame
+
+data.loc["row_name", "col_name"]  # as a value
+data.loc[["row_name"], "col_name"]  # as a Series. Preserves index. Column name is moved to `.name`.
+data.loc["row_name", ["col_name"]]  # as a Series. Index is moved to `.name`. Sets index to column name.
+data.loc[["row_name"], ["col_name"]]  # as a DataFrame (preserves original index and column name)
+
+# by column/row names: Dot notation
+data.col_name.row_name
+
+# by column/row indices
+data.iloc[row_index, col_index] # as a value
+data.iloc[[row_index], col_index] # as a Series. Preserves index. Column name is moved to `.name`
+data.iloc[row_index, [col_index]] # as a Series. Index is moved to `.name`. Sets index to column name.
+data.iloc[[row_index], [col_index]] # as a DataFrame (preserves original index and column name)
+
+# column name + row index
+data["col_name"][row_index]
+data.col_name[row_index]
+data["col_name"].iloc[row_index]
+
+# column index + row name
+data.iloc[:, [col_index]].loc["row_name"]  # as a Series
+data.iloc[:, [col_index]].loc[["row_name"]]  # as a DataFrame
+
+# using masks
+data[data.index == "row_name"].T[data.T.index == "col_name"].T
+
+

4. Access several columns:

+
+

PYTHON +

+
# by name
+data[["col1", "col2", "col3"]]
+data.loc[:, ["col1", "col2", "col3"]]
+
+# by index
+data.iloc[:, [col1_index, col2_index, col3_index]]
+
+

5. Access several rows

+
+

PYTHON +

+
# by name
+data.loc[["row1", "row2", "row3"]]
+
+# by index
+data.iloc[[row1_index, row2_index, row3_index]]
+
+

6. Access a subset of specific rows and columns

+
+

PYTHON +

+
# by names
+data.loc[["row1", "row2", "row3"], ["col1", "col2", "col3"]]
+
+# by indices
+data.iloc[[row1_index, row2_index, row3_index], [col1_index, col2_index, col3_index]]
+
+# column names + row indices
+data[["col1", "col2", "col3"]].iloc[[row1_index, row2_index, row3_index]]
+
+# column indices + row names
+data.iloc[:, [col1_index, col2_index, col3_index]].loc[["row1", "row2", "row3"]]
+
+

7. Access a subset of row and column ranges

+
+

PYTHON +

+
# by name
+data.loc["row1":"row2", "col1":"col2"]
+
+# by index
+data.iloc[row1_index:row2_index, col1_index:col2_index]
+
+# column names + row indices
+data.loc[:, "col1_name":"col2_name"].iloc[row1_index:row2_index]
+
+# column indices + row names
+data.iloc[:, col1_index:col2_index].loc["row1":"row2"]
+
+
+
+
+
+
+
+ +
+
+

Exploring available methods using the +dir() function

+
+

Python includes a dir() function that can be used to +display all of the available methods (functions) that are built into a +data object. In Episode 4, we used some methods with a string. But we +can see many more are available by using dir():

+
+

PYTHON +

+
my_string = 'Hello world!'   # creation of a string object 
+dir(my_string)
+
+

This command returns:

+
+

OUTPUT +

+
['__add__',
+...
+'__subclasshook__',
+'capitalize',
+'casefold',
+'center',
+...
+'upper',
+'zfill']
+
+

You can use help() or Shift+Tab to +get more information about what these methods do.

+

Assume Pandas has been imported and the Gapminder GDP data for Europe +has been loaded as data. Then, use dir() to +find the function that prints out the median per-capita GDP across all +European countries for each year that information is available.

+
+
+
+
+
+ +
+
+

Among many choices, dir() lists the +median() function as a possibility. Thus,

+
+

PYTHON +

+
data.median()
+
+
+
+
+
+
+
+ +
+
+

Interpretation

+
+

Poland’s borders have been stable since 1945, but changed several +times in the years before then. How would you handle this if you were +creating a table of GDP per capita for Poland for the entire twentieth +century?

+
+
+
+
+
+ +
+
+

Key Points

+
+
  • Use DataFrame.iloc[..., ...] to select values by +integer location.
  • +
  • Use : on its own to mean all columns or all rows.
  • +
  • Select multiple columns or rows using DataFrame.loc and +a named slice.
  • +
  • Result of slicing can be used in further operations.
  • +
  • Use comparisons to select data based on value.
  • +
  • Select values or NaN using a Boolean mask.
  • +
+
+
+
+
+ + +
+
+
+ +
+
+ + diff --git a/09-plotting.html b/09-plotting.html new file mode 100644 index 000000000..994d1728b --- /dev/null +++ b/09-plotting.html @@ -0,0 +1,985 @@ + +Plotting and Programming in Python: Plotting +
+ Plotting and Programming in Python +
+ +
+
+ + + + + +
+
+

Plotting

+

Last updated on 2024-10-18

+ + + +
+ +
+ + + +
+

Overview

+
+
+
+
+

Questions

+
  • How can I plot my data?
  • +
  • How can I save my plot for publishing?
  • +
+
+
+
+
+
+

Objectives

+
  • Create a time series plot showing a single data set.
  • +
  • Create a scatter plot showing relationship between two data +sets.
  • +
+
+
+
+
+

+matplotlib is the +most widely used scientific plotting library in Python.

+
  • Commonly use a sub-library called matplotlib.pyplot.
  • +
  • The Jupyter Notebook will render plots inline by default.
  • +
+

PYTHON +

+
import matplotlib.pyplot as plt
+
+
  • Simple plots are then (fairly) simple to create.
  • +
+

PYTHON +

+
time = [0, 1, 2, 3]
+position = [0, 100, 200, 300]
+
+plt.plot(time, position)
+plt.xlabel('Time (hr)')
+plt.ylabel('Position (km)')
+
+
A line chart showing time (hr) relative to position (km), using the values provided in the code block above. By default, the plotted line is blue against a white background, and the axes have been scaled automatically to fit the range of the input data.
+
+ +
+
+

Display All Open Figures

+
+

In our Jupyter Notebook example, running the cell should generate the +figure directly below the code. The figure is also included in the +Notebook document for future viewing. However, other Python environments +like an interactive Python session started from a terminal or a Python +script executed via the command line require an additional command to +display the figure.

+

Instruct matplotlib to show a figure:

+
+

PYTHON +

+
plt.show()
+
+

This command can also be used within a Notebook - for instance, to +display multiple figures if several are created by a single cell.

+
+
+
+

Plot data directly from a Pandas dataframe.

+
  • We can also plot Pandas +dataframes.
  • +
  • Before plotting, we convert the column headings from a string to integer data type, since they represent numerical values, using str.replace() to remove the gdpPercap_ prefix and then astype(int) to convert the series of string values (['1952', '1957', ..., '2007']) to a series of integers: [1952, 1957, ..., 2007].
  • +
+

PYTHON +

+
import pandas as pd
+
+data = pd.read_csv('data/gapminder_gdp_oceania.csv', index_col='country')
+
+# Extract year from last 4 characters of each column name
+# The current column names are structured as 'gdpPercap_(year)', 
+# so we want to keep the (year) part only for clarity when plotting GDP vs. years
+# To do this we use replace(), which removes from the string the characters stated in the argument
+# This method works on strings, so we use replace() from Pandas Series.str vectorized string functions
+
+years = data.columns.str.replace('gdpPercap_', '')
+
+# Convert year values to integers, saving results back to dataframe
+
+data.columns = years.astype(int)
+
+data.loc['Australia'].plot()
+
+
GDP plot for Australia

Select and transform data, then plot it.

+
  • By default, DataFrame.plot +plots with the rows as the X axis.
  • +
  • We can transpose the data in order to plot multiple series.
  • +
+

PYTHON +

+
data.T.plot()
+plt.ylabel('GDP per capita')
+
+
GDP plot for Australia and New Zealand

Many styles of plot are available.

+
  • For example, do a bar plot using a fancier style.
  • +
+

PYTHON +

+
plt.style.use('ggplot')
+data.T.plot(kind='bar')
+plt.ylabel('GDP per capita')
+
+
GDP barplot for Australia

Data can also be plotted by calling the matplotlib +plot function directly.

+
  • The command is plt.plot(x, y) +
  • +
  • The color and format of markers can also be specified as an +additional optional argument e.g., b- is a blue line, +g-- is a green dashed line.
  • +

Get Australia data from dataframe

+
+

PYTHON +

+
years = data.columns
+gdp_australia = data.loc['Australia']
+
+plt.plot(years, gdp_australia, 'g--')
+
+
GDP formatted plot for Australia

Can plot many sets of data together.

+
+

PYTHON +

+
# Select two countries' worth of data.
+gdp_australia = data.loc['Australia']
+gdp_nz = data.loc['New Zealand']
+
+# Plot with differently-colored markers.
+plt.plot(years, gdp_australia, 'b-', label='Australia')
+plt.plot(years, gdp_nz, 'g-', label='New Zealand')
+
+# Create legend.
+plt.legend(loc='upper left')
+plt.xlabel('Year')
+plt.ylabel('GDP per capita ($)')
+
+
+
+ +
+
+

Adding a Legend

+
+

Often when plotting multiple datasets on the same figure it is +desirable to have a legend describing the data.

+

This can be done in matplotlib in two stages:

+
  • Provide a label for each dataset in the figure:
  • +
+

PYTHON +

+
plt.plot(years, gdp_australia, label='Australia')
+plt.plot(years, gdp_nz, label='New Zealand')
+
+
  • Instruct matplotlib to create the legend.
  • +
+

PYTHON +

+
plt.legend()
+
+

By default matplotlib will attempt to place the legend in a suitable position. If you would rather specify a position, this can be done with the loc= argument, e.g. to place the legend in the upper left corner of the plot, specify loc='upper left'.

+
+
+
+
GDP formatted plot for Australia and New Zealand
  • Plot a scatter plot correlating the GDP of Australia and New +Zealand
  • +
  • Use either plt.scatter or +DataFrame.plot.scatter +
  • +
+

PYTHON +

+
plt.scatter(gdp_australia, gdp_nz)
+
+
GDP correlation using plt.scatter
+

PYTHON +

+
data.T.plot.scatter(x = 'Australia', y = 'New Zealand')
+
+
GDP correlation using data.T.plot.scatter
+
+ +
+
+

Minima and Maxima

+
+

Fill in the blanks below to plot the minimum GDP per capita over time +for all the countries in Europe. Modify it again to plot the maximum GDP +per capita over time for Europe.

+
+

PYTHON +

+
data_europe = pd.read_csv('data/gapminder_gdp_europe.csv', index_col='country')
+data_europe.____.plot(label='min')
+data_europe.____
+plt.legend(loc='best')
+plt.xticks(rotation=90)
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
data_europe = pd.read_csv('data/gapminder_gdp_europe.csv', index_col='country')
+data_europe.min().plot(label='min')
+data_europe.max().plot(label='max')
+plt.legend(loc='best')
+plt.xticks(rotation=90)
+
+
Minima Maxima Solution
+
+
+
+
+
+ +
+
+

Correlations

+
+

Modify the example in the notes to create a scatter plot showing the +relationship between the minimum and maximum GDP per capita among the +countries in Asia for each year in the data set. What relationship do +you see (if any)?

+
+
+
+
+
+ +
+
+
+

PYTHON +

+
data_asia = pd.read_csv('data/gapminder_gdp_asia.csv', index_col='country')
+data_asia.describe().T.plot(kind='scatter', x='min', y='max')
+
+
Correlations Solution 1

No particular correlations can be seen between the minimum and maximum GDP values year on year. It seems the fortunes of Asian countries do not rise and fall together.

+
+
+
+
+
+
+ +
+
+

Correlations (continued) +

+
+

You might note that the variability in the maximum is much higher +than that of the minimum. Take a look at the maximum and the max +indexes:

+
+

PYTHON +

+
data_asia = pd.read_csv('data/gapminder_gdp_asia.csv', index_col='country')
+data_asia.max().plot()
+print(data_asia.idxmax())
+print(data_asia.idxmin())
+
+
+
+
+
+
+ +
+
+
Correlations Solution 2

It seems the variability in this value is due to a sharp drop after 1972. Some geopolitics at play perhaps? Given the dominance of oil producing countries, maybe the Brent crude index would make an interesting comparison? Whilst Myanmar consistently has the lowest GDP, the highest GDP nation has varied more notably.

+
+
+
+
+
+
+ +
+
+

More Correlations

+
+

This short program creates a plot showing the correlation between GDP +and life expectancy for 2007, normalizing marker size by population:

+
+

PYTHON +

+
data_all = pd.read_csv('data/gapminder_all.csv', index_col='country')
+data_all.plot(kind='scatter', x='gdpPercap_2007', y='lifeExp_2007',
+              s=data_all['pop_2007']/1e6)
+
+

Using online help and other resources, explain what each argument to +plot does.

+
+
+
+
+
+ +
+
+
More Correlations Solution

A good place to look is the documentation for the plot function - +help(data_all.plot).

+

kind - As seen already this determines the kind of plot to be +drawn.

+

x and y - A column name or index that determines what data will be +placed on the x and y axes of the plot

+

s - Details for this can be found in the documentation of +plt.scatter. A single number or one value for each data point. +Determines the size of the plotted points.

+
+
+
+
+
+
+ +
+
+

Saving your plot to a file

+
+

If you are satisfied with the plot you see you may want to save it to +a file, perhaps to include it in a publication. There is a function in +the matplotlib.pyplot module that accomplishes this: savefig. +Calling this function, e.g. with

+
+

PYTHON +

+
plt.savefig('my_figure.png')
+
+

will save the current figure to the file my_figure.png. +The file format will automatically be deduced from the file name +extension (other formats are pdf, ps, eps and svg).

+

Note that functions in plt refer to a global figure +variable and after a figure has been displayed to the screen (e.g. with +plt.show) matplotlib will make this variable refer to a new +empty figure. Therefore, make sure you call plt.savefig +before the plot is displayed to the screen, otherwise you may find a +file with an empty plot.

+

When using dataframes, data is often generated and plotted to screen in one line. In addition to using plt.savefig, we can save a reference to the current figure in a local variable (with plt.gcf) and call the savefig method on that variable to save the figure to file.

+
+

PYTHON +

+
data.plot(kind='bar')
+fig = plt.gcf() # get current figure
+fig.savefig('my_figure.png')
+
+
+
+
+
+
+ +
+
+

Making your plots accessible

+
+

Whenever you are generating plots to go into a paper or a +presentation, there are a few things you can do to make sure that +everyone can understand your plots.

+
  • Always make sure your text is large enough to read. Use the +fontsize parameter in xlabel, +ylabel, title, and legend, and tick_params +with labelsize to increase the text size of the numbers +on your axes.
  • +
  • Similarly, you should make your graph elements easy to see. Use +s to increase the size of your scatterplot markers and +linewidth to increase the sizes of your plot lines.
  • +
  • Using color (and nothing else) to distinguish between different plot +elements will make your plots unreadable to anyone who is colorblind, or +who happens to have a black-and-white office printer. For lines, the +linestyle parameter lets you use different types of lines. +For scatterplots, marker lets you change the shape of your +points. If you’re unsure about your colors, you can use Coblis +or Color Oracle to simulate what +your plots would look like to those with colorblindness.
  • +
+
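
The suggestions above could be combined roughly as follows. This is only a sketch: the data values, labels, and font sizes are arbitrary choices made for illustration.

```python
import matplotlib.pyplot as plt

# Hypothetical data for two series.
years = [1952, 1957, 1962]
series_a = [1, 2, 3]
series_b = [2, 3, 5]

# Different line styles and marker shapes, so color is not the only distinction.
plt.plot(years, series_a, linestyle='-', marker='o', linewidth=3, label='Series A')
plt.plot(years, series_b, linestyle='--', marker='s', linewidth=3, label='Series B')

# Larger text for axis labels, tick labels, and the legend.
plt.xlabel('Year', fontsize=14)
plt.ylabel('Value', fontsize=14)
plt.tick_params(labelsize=12)
plt.legend(fontsize=12)
```

Printing the figure in black and white is a quick way to check that the two series can still be told apart.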
+
+
+
+ +
+
+

Key Points

+
+
  • +matplotlib is the +most widely used scientific plotting library in Python.
  • +
  • Plot data directly from a Pandas dataframe.
  • +
  • Select and transform data, then plot it.
  • +
  • Many styles of plot are available: see the Python Graph +Gallery for more options.
  • +
  • Can plot many sets of data together.
  • +
+
+
+
+
+ + +
+
+
+ +
+
+ + diff --git a/10-lunch.html b/10-lunch.html new file mode 100644 index 000000000..7349440a6 --- /dev/null +++ b/10-lunch.html @@ -0,0 +1,538 @@ + +Plotting and Programming in Python: Lunch +
+ Plotting and Programming in Python +
+ +
+
+ + + + + +
+
+

Lunch

+

Last updated on 2024-10-18

+ + + +
+ +
+ + + +

Over lunch, reflect on and discuss the following:

+
  • What sort of packages might you use in Python and why would you use +them?
  • +
  • How would data need to be formatted to be used in Pandas data +frames? Would the data you have meet these requirements?
  • +
  • What limitations or problems might you run into when thinking about +how to apply what we’ve learned to your own projects or data?
  • +
+
+ + +
+
+
+ +
+
+ + diff --git a/11-lists.html b/11-lists.html new file mode 100644 index 000000000..7b141b8c3 --- /dev/null +++ b/11-lists.html @@ -0,0 +1,1152 @@ + +Plotting and Programming in Python: Lists +
+ Plotting and Programming in Python +
+ +
+
+ + + + + +
+
+

Lists

+

Last updated on 2024-10-18

+ + + +
+ +
+ + + +
+

Overview

+
+
+
+
+

Questions

+
  • How can I store multiple values?
  • +
+
+
+
+
+
+

Objectives

+
  • Explain why programs need collections of values.
  • +
  • Write programs that create flat lists, index them, slice them, and +modify them through assignment and method calls.
  • +
+
+
+
+
+

A list stores many values in a single structure.

+
  • Doing calculations with a hundred variables called +pressure_001, pressure_002, etc., would be at +least as slow as doing them by hand.
  • +
  • Use a list to store many values together. +
    • Contained within square brackets [...].
    • +
    • Values separated by commas ,.
    • +
  • +
  • Use len to find out how many values are in a list.
  • +
+

PYTHON +

+
pressures = [0.273, 0.275, 0.277, 0.275, 0.276]
+print('pressures:', pressures)
+print('length:', len(pressures))
+
+
+

OUTPUT +

+
pressures: [0.273, 0.275, 0.277, 0.275, 0.276]
+length: 5
+
+

Use an item’s index to fetch it from a list.

+
  • Just like strings.
  • +
+

PYTHON +

+
print('zeroth item of pressures:', pressures[0])
+print('fourth item of pressures:', pressures[4])
+
+
+

OUTPUT +

+
zeroth item of pressures: 0.273
+fourth item of pressures: 0.276
+
+

Lists’ values can be replaced by assigning to them.

+
  • Use an index expression on the left of assignment to replace a +value.
  • +
+

PYTHON +

+
pressures[0] = 0.265
+print('pressures is now:', pressures)
+
+
+

OUTPUT +

+
pressures is now: [0.265, 0.275, 0.277, 0.275, 0.276]
+
+

Appending items to a list lengthens it.

+
  • Use list_name.append to add items to the end of a +list.
  • +
+

PYTHON +

+
primes = [2, 3, 5]
+print('primes is initially:', primes)
+primes.append(7)
+print('primes has become:', primes)
+
+
+

OUTPUT +

+
primes is initially: [2, 3, 5]
+primes has become: [2, 3, 5, 7]
+
+
  • +append is a method of lists. +
    • Like a function, but tied to a particular object.
    • +
  • +
  • Use object_name.method_name to call methods. +
    • Deliberately resembles the way we refer to things in a library.
    • +
  • +
  • We will meet other methods of lists as we go along. +
    • Use help(list) for a preview.
    • +
  • +
  • +extend is similar to append, but it allows +you to combine two lists. For example:
  • +
+

PYTHON +

+
teen_primes = [11, 13, 17, 19]
+middle_aged_primes = [37, 41, 43, 47]
+print('primes is currently:', primes)
+primes.extend(teen_primes)
+print('primes has now become:', primes)
+primes.append(middle_aged_primes)
+print('primes has finally become:', primes)
+
+
+

OUTPUT +

+
primes is currently: [2, 3, 5, 7]
+primes has now become: [2, 3, 5, 7, 11, 13, 17, 19]
+primes has finally become: [2, 3, 5, 7, 11, 13, 17, 19, [37, 41, 43, 47]]
+
+

Note that while extend maintains the “flat” structure of +the list, appending a list to a list means the last element in +primes will itself be a list, not an integer. Lists can +contain values of any type; therefore, lists of lists are possible.

+

Use del to remove items from a list entirely.

+
  • We use del list_name[index] to remove an element from a +list (in the example, 9 is not a prime number) and thus shorten it.
  • +
  • +del is not a function or a method, but a statement in +the language.
  • +
+

PYTHON +

+
primes = [2, 3, 5, 7, 9]
+print('primes before removing last item:', primes)
+del primes[4]
+print('primes after removing last item:', primes)
+
+
+

OUTPUT +

+
primes before removing last item: [2, 3, 5, 7, 9]
+primes after removing last item: [2, 3, 5, 7]
+
+

The empty list contains no values.

+
  • Use [] on its own to represent a list that doesn’t +contain any values. +
    • “The zero of lists.”
    • +
  • +
  • Helpful as a starting point for collecting values (which we will see +in the next episode).
  • +

Lists may contain values of different types.

+
  • A single list may contain numbers, strings, and anything else.
  • +
+

PYTHON +

+
goals = [1, 'Create lists.', 2, 'Extract items from lists.', 3, 'Modify lists.']
+
+

Character strings can be indexed like lists.

+
  • Get single characters from a character string using indexes in +square brackets.
  • +
+

PYTHON +

+
element = 'carbon'
+print('zeroth character:', element[0])
+print('third character:', element[3])
+
+
+

OUTPUT +

+
zeroth character: c
+third character: b
+
+

Character strings are immutable.

+
  • Cannot change the characters in a string after it has been created. +
    • +Immutable: can’t be changed after creation.
    • +
    • In contrast, lists are mutable: they can be modified in +place.
    • +
  • +
  • Python considers the string to be a single value with parts, not a +collection of values.
  • +
+

PYTHON +

+
element[0] = 'C'
+
+
+

ERROR +

+
TypeError: 'str' object does not support item assignment
+
+
  • Lists and character strings are both collections.
  • +
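
Since a string cannot be modified in place, one common workaround is to build a new string instead. The sketch below shows two possible approaches; the variable names capitalized and letters are made up for this example.

```python
element = 'carbon'

# Build a new string from a replacement character and a slice of the old one.
capitalized = 'C' + element[1:]
print(element, capitalized)   # carbon Carbon

# Or convert to a list (mutable), change it, and join it back into a string.
letters = list(element)
letters[0] = 'C'
print(''.join(letters))       # Carbon
```

In both cases the original string element is left unchanged.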

Indexing beyond the end of the collection is an error.

+
  • Python reports an IndexError if we attempt to access a +value that doesn’t exist. +
    • This is a kind of runtime error.
    • +
    • Cannot be detected as the code is parsed because the index might be +calculated based on data.
    • +
  • +
+

PYTHON +

+
print('99th element of element is:', element[99])
+
+
+

ERROR +

+
IndexError: string index out of range
+
+
+
+ +
+
+

Fill in the Blanks

+
+

Fill in the blanks so that the program below produces the output +shown.

+
+

PYTHON +

+
values = ____
+values.____(1)
+values.____(3)
+values.____(5)
+print('first time:', values)
+values = values[____]
+print('second time:', values)
+
+
+

OUTPUT +

+
first time: [1, 3, 5]
+second time: [3, 5]
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
values = []
+values.append(1)
+values.append(3)
+values.append(5)
+print('first time:', values)
+values = values[1:]
+print('second time:', values)
+
+
+
+
+
+
+
+ +
+
+

How Large is a Slice?

+
+

If start and stop are both non-negative +integers, how long is the list values[start:stop]?

+
+
+
+
+
+ +
+
+

The list values[start:stop] has up to +stop - start elements. For example, +values[1:4] has the 3 elements values[1], +values[2], and values[3]. Why ‘up to’? As we +saw in episode 2, if stop +is greater than the total length of the list values, we +will still get a list back but it will be shorter than expected.
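
The "up to" rule can be written out as a small calculation. The function slice_length below is a hypothetical helper written for this illustration, and it only covers non-negative start and stop.

```python
values = [10, 20, 30, 40, 50]

def slice_length(sequence, start, stop):
    """Length of sequence[start:stop] for non-negative start and stop."""
    # Both bounds are clipped to the length of the sequence first.
    clipped_start = min(start, len(sequence))
    clipped_stop = min(stop, len(sequence))
    return max(0, clipped_stop - clipped_start)

print(len(values[1:4]), slice_length(values, 1, 4))    # 3 3
print(len(values[3:99]), slice_length(values, 3, 99))  # 2 2
print(len(values[4:2]), slice_length(values, 4, 2))    # 0 0
```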

+
+
+
+
+
+
+ +
+
+

From Strings to Lists and Back

+
+

Given this:

+
+

PYTHON +

+
print('string to list:', list('tin'))
+print('list to string:', ''.join(['g', 'o', 'l', 'd']))
+
+
+

OUTPUT +

+
string to list: ['t', 'i', 'n']
+list to string: gold
+
+
  1. What does list('some string') do?
  2. +
  3. What does '-'.join(['x', 'y', 'z']) generate?
  4. +
+
+
+
+
+ +
+
+
  1. +list('some string') +converts a string into a list containing all of its characters.
  2. +
  3. +join returns a string that is the concatenation of the string elements in the list, with the separator inserted between each pair of adjacent elements. This results in x-y-z. The separator is the string on which the join method is called.
  4. +
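
A few more calls, sketched here for illustration, show how the string that join is called on acts as the separator:

```python
letters = list('tin')            # ['t', 'i', 'n']

dashed = '-'.join(letters)
print(dashed)                    # t-i-n

listed = ', '.join(['x', 'y', 'z'])
print(listed)                    # x, y, z

# join only accepts strings, so other values must be converted first.
numbers = '-'.join(str(n) for n in [1, 2, 3])
print(numbers)                   # 1-2-3
```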
+
+
+
+
+
+ +
+
+

Working With the End

+
+

What does the following program print?

+
+

PYTHON +

+
element = 'helium'
+print(element[-1])
+
+
  1. How does Python interpret a negative index?
  2. +
  3. If a list or string has N elements, what is the most negative index +that can safely be used with it, and what location does that index +represent?
  4. +
  5. If values is a list, what does +del values[-1] do?
  6. +
  7. How can you display all elements but the last one without changing +values? (Hint: you will need to combine slicing and +negative indexing.)
  8. +
+
+
+
+
+ +
+
+

The program prints m.

+
  1. Python interprets a negative index as starting from the end (as +opposed to starting from the beginning). The last element is +-1.
  2. +
  3. The most negative index that can safely be used with a list of N elements is -N, which represents the first element.
  4. +
  5. +del values[-1] removes the last element from the +list.
  6. +
  7. values[:-1]
  8. +
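
These answers can be checked with a short list. The list contents below are arbitrary and chosen only for illustration.

```python
values = ['a', 'b', 'c', 'd']
n = len(values)

print(values[-1])     # d: the last element
print(values[-n])     # a: the most negative safe index gives the first element
print(values[:-1])    # ['a', 'b', 'c']: everything but the last, values unchanged

del values[-1]        # removes the last element in place
print(values)         # ['a', 'b', 'c']
```

Using an index below -n, such as values[-n - 1], raises an IndexError.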
+
+
+
+
+
+ +
+
+

Stepping Through a List

+
+

What does the following program print?

+
+

PYTHON +

+
element = 'fluorine'
+print(element[::2])
+print(element[::-1])
+
+
  1. If we write a slice as low:high:stride, what does +stride do?
  2. +
  3. What expression would select all of the even-numbered items from a +collection?
  4. +
+
+
+
+
+ +
+
+

The program prints

+
+

OUTPUT +

+
furn
+eniroulf
+
+
  1. stride is the step size of the slice.
  2. The slice 1::2 selects all even-numbered items from a collection: it starts with element 1 (which is the second element, since indexing starts at 0), goes on until the end (since no end is given), and uses a step size of 2 (i.e., selects every second element).
+
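As a quick check of the slices discussed above, this sketch applies each stride to the same string from the exercise. The variable names are only for illustration.

```python
element = 'fluorine'

every_other = element[::2]      # start at index 0, step 2
second_fourth = element[1::2]   # start at index 1 (the second item), step 2
backwards = element[::-1]       # a negative stride walks from the end

print(every_other)
print(second_fourth)
print(backwards)
```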
+
+
+
+
+ +
+
+

Slice Bounds

+
+

What does the following program print?

+
+

PYTHON +

+
element = 'lithium'
+print(element[0:20])
+print(element[-1:3])
+
+
+
+
+
+
+ +
+
+
+

OUTPUT +

+
lithium
+
+

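The first statement prints the whole string, since a slice that extends beyond the end of the string is cut off at the end. The second statement prints an empty string, because the slice starts at the last character (index -1, which is position 6) and stops at position 3, so with the default step there are no characters in between.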

+
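A small sketch contrasting slicing with indexing on the same string. Slices are tolerant of bounds that run past the end, while a single index past the end is an error. The variable names are only for illustration.

```python
element = 'lithium'

clamped = element[0:20]    # the stop is cut off at the end of the string
empty = element[-1:3]      # start (position 6) is already past stop (3)
last_char = element[-1:]   # from the last character to the end

print(clamped)
print(repr(empty))
print(last_char)

# Indexing, unlike slicing, does not tolerate an out-of-range position.
try:
    element[20]
except IndexError as error:
    print('IndexError:', error)
```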
+
+
+
+
+
+ +
+
+

Sort and Sorted

+
+

What do these two programs print? In simple terms, explain the +difference between sorted(letters) and +letters.sort().

+
+

PYTHON +

+
# Program A
+letters = list('gold')
+result = sorted(letters)
+print('letters is', letters, 'and result is', result)
+
+
+

PYTHON +

+
# Program B
+letters = list('gold')
+result = letters.sort()
+print('letters is', letters, 'and result is', result)
+
+
+
+
+
+
+ +
+
+

Program A prints

+
+

OUTPUT +

+
letters is ['g', 'o', 'l', 'd'] and result is ['d', 'g', 'l', 'o']
+
+

Program B prints

+
+

OUTPUT +

+
letters is ['d', 'g', 'l', 'o'] and result is None
+
+

sorted(letters) returns a sorted copy of the list +letters (the original list letters remains +unchanged), while letters.sort() sorts the list +letters in-place and does not return anything.

+
+
+
+
+
+
+ +
+
+

Copying (or Not)

+
+

What do these two programs print? In simple terms, explain the +difference between new = old and +new = old[:].

+
+

PYTHON +

+
# Program A
+old = list('gold')
+new = old      # simple assignment
+new[0] = 'D'
+print('new is', new, 'and old is', old)
+
+
+

PYTHON +

+
# Program B
+old = list('gold')
+new = old[:]   # assigning a slice
+new[0] = 'D'
+print('new is', new, 'and old is', old)
+
+
+
+
+
+
+ +
+
+

Program A prints

+
+

OUTPUT +

+
new is ['D', 'o', 'l', 'd'] and old is ['D', 'o', 'l', 'd']
+
+

Program B prints

+
+

OUTPUT +

+
new is ['D', 'o', 'l', 'd'] and old is ['g', 'o', 'l', 'd']
+
+

new = old makes new a reference to the list old; new and old refer to the same object.

+

new = old[:], however, creates a new list object new containing all elements from the list old; new and old are different objects.

+
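One way to see the difference directly is the is operator, which tests whether two names refer to the same object. This is a sketch; the names alias and copy are chosen here for illustration.

```python
old = list('gold')
alias = old      # simple assignment: a second name for the same list
copy = old[:]    # slicing: a new list with the same elements

print(alias is old)   # same object
print(copy is old)    # different object
print(copy == old)    # but equal contents

alias.append('!')     # changing the list through one name...
print(old)            # ...is visible through the other
print(copy)           # the copy is unaffected
```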
+
+
+
+
+
+ +
+
+

Key Points

+
+
  • A list stores many values in a single structure.
  • +
  • Use an item’s index to fetch it from a list.
  • +
  • Lists’ values can be replaced by assigning to them.
  • +
  • Appending items to a list lengthens it.
  • +
  • Use del to remove items from a list entirely.
  • +
  • The empty list contains no values.
  • +
  • Lists may contain values of different types.
  • +
  • Character strings can be indexed like lists.
  • +
  • Character strings are immutable.
  • +
  • Indexing beyond the end of the collection is an error.
  • +
+
+
+
+
+ + +
+
+
+ +
+
+ + diff --git a/12-for-loops.html b/12-for-loops.html new file mode 100644 index 000000000..090ca4358 --- /dev/null +++ b/12-for-loops.html @@ -0,0 +1,1178 @@ + +Plotting and Programming in Python: For Loops +
+ +
+
+ + + + + +
+
+

For Loops

+

Last updated on 2024-10-18

+ + + +
+ +
+ + + +
+

Overview

+
+
+
+
+

Questions

+
  • How can I make a program do many things?
  • +
+
+
+
+
+
+

Objectives

+
  • Explain what for loops are normally used for.
  • +
  • Trace the execution of a simple (unnested) loop and correctly state +the values of variables in each iteration.
  • +
  • Write for loops that use the Accumulator pattern to aggregate +values.
  • +
+
+
+
+
+

A for loop executes commands once for each value in a +collection.

+
  • Doing calculations on the values in a list one by one is as painful +as working with pressure_001, pressure_002, +etc.
  • +
  • A for loop tells Python to execute some statements once for +each value in a list, a character string, or some other collection.
  • +
  • “for each thing in this group, do these operations”
  • +
+

PYTHON +

+
for number in [2, 3, 5]:
+    print(number)
+
+
  • This for loop is equivalent to:
  • +
+

PYTHON +

+
print(2)
+print(3)
+print(5)
+
+
  • And the for loop’s output is:
  • +
+

OUTPUT +

+
2
+3
+5
+
+

A for loop is made up of a collection, a loop variable, +and a body.

+
+

PYTHON +

+
for number in [2, 3, 5]:
+    print(number)
+
+
  • The collection, [2, 3, 5], is what the loop is being +run on.
  • +
  • The body, print(number), specifies what to do for each +value in the collection.
  • +
  • The loop variable, number, is what changes for each +iteration of the loop. +
    • The “current thing”.
    • +
  • +

The first line of the for loop must end with a colon, +and the body must be indented.

+
  • The colon at the end of the first line signals the start of a +block of statements.
  • +
  • Python uses indentation rather than {} or +begin/end to show nesting. +
    • Any consistent indentation is legal, but almost everyone uses four +spaces.
    • +
  • +
+

PYTHON +

+
for number in [2, 3, 5]:
+print(number)
+
+
+

ERROR +

+
IndentationError: expected an indented block
+
+
  • Indentation is always meaningful in Python.
  • +
+

PYTHON +

+
firstName = "Jon"
+  lastName = "Smith"
+
+
+

ERROR +

+
  File "<ipython-input-7-f65f2962bf9c>", line 2
+    lastName = "Smith"
+    ^
+IndentationError: unexpected indent
+
+
  • This error can be fixed by removing the extra spaces at the +beginning of the second line.
  • +

Loop variables can be called anything.

+
  • As with all variables, loop variables are: +
    • Created on demand.
    • +
    • Meaningless: their names can be anything at all.
    • +
  • +
+

PYTHON +

+
for kitten in [2, 3, 5]:
+    print(kitten)
+
+

The body of a loop can contain many statements.

+
  • But no loop should be more than a few lines long.
  • +
  • Hard for human beings to keep larger chunks of code in mind.
  • +
+

PYTHON +

+
primes = [2, 3, 5]
+for p in primes:
+    squared = p ** 2
+    cubed = p ** 3
+    print(p, squared, cubed)
+
+
+

OUTPUT +

+
2 4 8
+3 9 27
+5 25 125
+
+

Use range to iterate over a sequence of numbers.

+
  • The built-in function range +produces a sequence of numbers. +
    • +Not a list: the numbers are produced on demand to make +looping over large ranges more efficient.
    • +
  • +
  • +range(N) is the numbers 0..N-1 +
    • Exactly the legal indices of a list or character string of length +N
    • +
  • +
+

PYTHON +

+
print('a range is not a list: range(0, 3)')
+for number in range(0, 3):
+    print(number)
+
+
+

OUTPUT +

+
a range is not a list: range(0, 3)
+0
+1
+2
+
+
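Because range(N) produces exactly the legal indices of a collection of length N, it can be combined with len to visit each position. A minimal sketch, using the string 'tin' as an example value:

```python
element = 'tin'

# list() forces the range to produce all of its numbers at once.
indices = list(range(len(element)))
print(indices)

# Each number from range(len(element)) is a valid index into element.
for i in range(len(element)):
    print(i, element[i])
```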

The Accumulator pattern turns many values into one.

+
  • A common pattern in programs is to: +
    1. Initialize an accumulator variable to zero, the empty string, or the empty list.
    2. Update the variable with values from a collection.
  • +
+

PYTHON +

+
# Sum the first 10 integers.
+total = 0
+for number in range(10):
+    total = total + (number + 1)
+print(total)
+
+
+

OUTPUT +

+
55
+
+
  • Read total = total + (number + 1) as: +
    • Add 1 to the current value of the loop variable +number.
    • +
    • Add that to the current value of the accumulator variable +total.
    • +
    • Assign that to total, replacing the current value.
    • +
  • +
  • We have to add number + 1 because range +produces 0..9, not 1..10.
  • +
+
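An alternative sketch of the same sum: range also accepts a start and a stop, so range(1, 11) produces 1..10 directly and the + 1 adjustment is not needed.

```python
# Sum the first 10 integers, letting range start at 1.
total = 0
for number in range(1, 11):   # 1, 2, ..., 10 (the stop value is excluded)
    total = total + number
print(total)
```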
+ +
+
+

Classifying Errors

+
+

Is an indentation error a syntax error or a runtime error?

+
+
+
+
+
+ +
+
+

An IndentationError is a syntax error. Programs with syntax errors cannot be started. A program with a runtime error will start, but an error will be raised when certain conditions occur.

+
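The distinction can be sketched with the built-in compile and exec functions, which are not part of this lesson and are used here only to hold two small programs as strings. The program texts and variable names are made up for illustration.

```python
# Program text with a missing indent, and one that fails part-way through.
bad_indentation = "for n in [1, 2]:\nprint(n)\n"
runtime_failure = "steps = ['started']\nresult = 1 / 0\nsteps.append('finished')\n"

# An IndentationError is a kind of SyntaxError.
print(issubclass(IndentationError, SyntaxError))

# The badly indented program is rejected before any of it runs.
try:
    compile(bad_indentation, "<bad_indentation>", "exec")
    outcome_a = "compiled"
except IndentationError:
    outcome_a = "rejected before running"

# The second program starts, runs its first line, then fails.
namespace = {}
try:
    exec(runtime_failure, namespace)
    outcome_b = "finished"
except ZeroDivisionError:
    outcome_b = "failed while running"

print(outcome_a)
print(outcome_b, namespace["steps"])
```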
+
+
+
+
+
+ +
+
+

Tracing Execution

+
+

Create a table showing the numbers of the lines that are executed +when this program runs, and the values of the variables after each line +is executed.

+
+

PYTHON +

+
total = 0
+for char in "tin":
+    total = total + 1
+
+
+
+
+
+
+ +
+
+ + + + + + + + + + + + + + + + +
Line no | Variables
1 | total = 0
2 | total = 0 char = ‘t’
3 | total = 1 char = ‘t’
2 | total = 1 char = ‘i’
3 | total = 2 char = ‘i’
2 | total = 2 char = ‘n’
3 | total = 3 char = ‘n’
+
+
+
+
+
+ +
+
+

Reversing a String

+
+

Fill in the blanks in the program below so that it prints “nit” (the +reverse of the original character string “tin”).

+
+

PYTHON +

+
original = "tin"
+result = ____
+for char in original:
+    result = ____
+print(result)
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
original = "tin"
+result = ""
+for char in original:
+    result = char + result
+print(result)
+
+
+
+
+
+
+
+ +
+
+

Practice Accumulating

+
+

Fill in the blanks in each of the programs below to produce the +indicated result.

+
+

PYTHON +

+
# Total length of the strings in the list: ["red", "green", "blue"] => 12
+total = 0
+for word in ["red", "green", "blue"]:
+    ____ = ____ + len(word)
+print(total)
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
total = 0
+for word in ["red", "green", "blue"]:
+    total = total + len(word)
+print(total)
+
+
+
+
+
+
+
+ +
+
+

Practice Accumulating +(continued) +

+
+
+

PYTHON +

+
# List of word lengths: ["red", "green", "blue"] => [3, 5, 4]
+lengths = ____
+for word in ["red", "green", "blue"]:
+    lengths.____(____)
+print(lengths)
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
lengths = []
+for word in ["red", "green", "blue"]:
+    lengths.append(len(word))
+print(lengths)
+
+
+
+
+
+
+
+ +
+
+

Practice Accumulating +(continued) +

+
+
+

PYTHON +

+
# Concatenate all words: ["red", "green", "blue"] => "redgreenblue"
+words = ["red", "green", "blue"]
+result = ____
+for ____ in ____:
+    ____
+print(result)
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
words = ["red", "green", "blue"]
+result = ""
+for word in words:
+    result = result + word
+print(result)
+
+
+
+
+
+
+
+ +
+
+

Practice Accumulating +(continued) +

+
+

Create an acronym: Starting from the list +["red", "green", "blue"], create the acronym +"RGB" using a for loop.

+

Hint: You may need to use a string method to +properly format the acronym.

+
+
+
+
+
+ +
+
+
+

PYTHON +

+
acronym = ""
+for word in ["red", "green", "blue"]:
+    acronym = acronym + word[0].upper()
+print(acronym)
+
+
+
+
+
+
+
+ +
+
+

Cumulative Sum

+
+

Reorder and properly indent the lines of code below so that they +print a list with the cumulative sum of data. The result should be +[1, 3, 5, 10].

+
+

PYTHON +

+
cumulative.append(total)
+for number in data:
+cumulative = []
+total = total + number
+total = 0
+print(cumulative)
+data = [1,2,2,5]
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
total = 0
+data = [1,2,2,5]
+cumulative = []
+for number in data:
+    total = total + number
+    cumulative.append(total)
+print(cumulative)
+
+
+
+
+
+
+
+ +
+
+

Identifying Variable Name Errors

+
+
  1. Read the code below and try to identify what the errors are without running it.
  2. Run the code and read the error message. What type of NameError do you think this is? Is it a string with no quotes, a misspelled variable, or a variable that should have been defined but was not?
  3. Fix the error.
  4. Repeat steps 2 and 3 until you have fixed all the errors.
+

PYTHON +

+
for number in range(10):
+    # use a if the number is a multiple of 3, otherwise use b
+    if (Number % 3) == 0:
+        message = message + a
+    else:
+        message = message + "b"
+print(message)
+
+
+
+
+
+
+ +
+
+
  • Python variable names are case sensitive: number and +Number refer to different variables.
  • +
  • The variable message needs to be initialized as an +empty string.
  • +
  • We want to add the string "a" to message, +not the undefined variable a.
  • +
+

PYTHON +

+
message = ""
+for number in range(10):
+    # use a if the number is a multiple of 3, otherwise use b
+    if (number % 3) == 0:
+        message = message + "a"
+    else:
+        message = message + "b"
+print(message)
+
+
+
+
+
+
+
+ +
+
+

Identifying Item Errors

+
+
  1. Read the code below and try to identify what the errors are without running it.
  2. Run the code, and read the error message. What type of error is it?
  3. Fix the error.
+

PYTHON +

+
seasons = ['Spring', 'Summer', 'Fall', 'Winter']
+print('My favorite season is ', seasons[4])
+
+
+
+
+
+
+ +
+
+

This list has 4 elements and the index to access the last element in +the list is 3.

+
+

PYTHON +

+
seasons = ['Spring', 'Summer', 'Fall', 'Winter']
+print('My favorite season is ', seasons[3])
+
+
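A sketch of two ways to reach the last item without hard-coding its position, along with the error that the original index produces. The names last_index and favorite are chosen here for illustration.

```python
seasons = ['Spring', 'Summer', 'Fall', 'Winter']

last_index = len(seasons) - 1   # legal indices run from 0 to len - 1

# Using len(seasons) itself as an index is one past the end.
try:
    favorite = seasons[len(seasons)]
except IndexError as error:
    favorite = None
    print('IndexError:', error)

# Both of these select the last item.
print(seasons[last_index], seasons[-1])
```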
+
+
+
+
+
+ +
+
+

Key Points

+
+
  • A for loop executes commands once for each value in a +collection.
  • +
  • A for loop is made up of a collection, a loop variable, +and a body.
  • +
  • The first line of the for loop must end with a colon, +and the body must be indented.
  • +
  • Indentation is always meaningful in Python.
  • +
  • Loop variables can be called anything (but it is strongly advised to give the loop variable a meaningful name).
  • +
  • The body of a loop can contain many statements.
  • +
  • Use range to iterate over a sequence of numbers.
  • +
  • The Accumulator pattern turns many values into one.
  • +
+
+
+
+
+ + +
+
+
+ +
+
+ + diff --git a/13-conditionals.html b/13-conditionals.html new file mode 100644 index 000000000..27f62ab52 --- /dev/null +++ b/13-conditionals.html @@ -0,0 +1,1093 @@ + +Plotting and Programming in Python: Conditionals +
+ +
+
+ + + + + +
+
+

Conditionals

+

Last updated on 2024-10-18

+ + + +
+ +
+ + + +
+

Overview

+
+
+
+
+

Questions

+
  • How can programs do different things for different data?
  • +
+
+
+
+
+
+

Objectives

+
  • Correctly write programs that use if and else statements and simple +Boolean expressions (without logical operators).
  • +
  • Trace the execution of unnested conditionals and conditionals inside +loops.
  • +
+
+
+
+
+

Use if statements to control whether or not a block of +code is executed.

+
  • An if statement (more properly called a +conditional statement) controls whether some block of code is +executed or not.
  • +
  • Structure is similar to a for statement: +
    • First line opens with if and ends with a colon
    • +
    • Body containing one or more statements is indented (usually by 4 +spaces)
    • +
  • +
+

PYTHON +

+
mass = 3.54
+if mass > 3.0:
+    print(mass, 'is large')
+
+mass = 2.07
+if mass > 3.0:
+    print(mass, 'is large')
+
+
+

OUTPUT +

+
3.54 is large
+
+

Conditionals are often used inside loops.

+
  • Not much point using a conditional when we know the value (as +above).
  • +
  • But useful when we have a collection to process.
  • +
+

PYTHON +

+
masses = [3.54, 2.07, 9.22, 1.86, 1.71]
+for m in masses:
+    if m > 3.0:
+        print(m, 'is large')
+
+
+

OUTPUT +

+
3.54 is large
+9.22 is large
+
+

Use else to execute a block of code when an +if condition is not true.

+
  • +else can be used following an if.
  • +
  • Allows us to specify an alternative to execute when the +if branch isn’t taken.
  • +
+

PYTHON +

+
masses = [3.54, 2.07, 9.22, 1.86, 1.71]
+for m in masses:
+    if m > 3.0:
+        print(m, 'is large')
+    else:
+        print(m, 'is small')
+
+
+

OUTPUT +

+
3.54 is large
+2.07 is small
+9.22 is large
+1.86 is small
+1.71 is small
+
+

Use elif to specify additional tests.

+
  • May want to provide several alternative choices, each with its own +test.
  • +
  • Use elif (short for “else if”) and a condition to +specify these.
  • +
  • Always associated with an if.
  • +
  • Must come before the else (which is the “catch +all”).
  • +
+

PYTHON +

+
masses = [3.54, 2.07, 9.22, 1.86, 1.71]
+for m in masses:
+    if m > 9.0:
+        print(m, 'is HUGE')
+    elif m > 3.0:
+        print(m, 'is large')
+    else:
+        print(m, 'is small')
+
+
+

OUTPUT +

+
3.54 is large
+2.07 is small
+9.22 is HUGE
+1.86 is small
+1.71 is small
+
+

Conditions are tested once, in order.

+
  • Python steps through the branches of the conditional in order, +testing each in turn.
  • +
  • So ordering matters.
  • +
+

PYTHON +

+
grade = 85
+if grade >= 90:
+    print('grade is A')
+elif grade >= 80:
+    print('grade is B')
+elif grade >= 70:
+    print('grade is C')
+
+
+

OUTPUT +

+
grade is B
+
+
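To see why ordering matters, here is a sketch of the same tests written in the wrong order. The variable letter is introduced here for illustration. Because the first true condition wins, the later, stricter tests are never reached.

```python
grade = 85
if grade >= 70:       # true for 85, so this branch is taken...
    letter = 'C'
elif grade >= 80:     # ...and this test is never tried
    letter = 'B'
elif grade >= 90:
    letter = 'A'
print('grade is', letter)
```

The program reports a C for a grade of 85, which is why the strictest test has to come first.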
  • Does not automatically go back and re-evaluate if values +change.
  • +
+

PYTHON +

+
velocity = 10.0
+if velocity > 20.0:
+    print('moving too fast')
+else:
+    print('adjusting velocity')
+    velocity = 50.0
+
+
+

OUTPUT +

+
adjusting velocity
+
+
  • Often use conditionals in a loop to “evolve” the values of +variables.
  • +
+

PYTHON +

+
velocity = 10.0
+for i in range(5): # execute the loop 5 times
+    print(i, ':', velocity)
+    if velocity > 20.0:
+        print('moving too fast')
+        velocity = velocity - 5.0
+    else:
+        print('moving too slow')
+        velocity = velocity + 10.0
+print('final velocity:', velocity)
+
+
+

OUTPUT +

+
0 : 10.0
+moving too slow
+1 : 20.0
+moving too slow
+2 : 30.0
+moving too fast
+3 : 25.0
+moving too fast
+4 : 20.0
+moving too slow
+final velocity: 30.0
+
+

Create a table showing variables’ values to trace a program’s +execution.

+
+ + + + + + + + + + + + + + + + + + + + + +
i        | 0    | .    | 1 | .    | 2 | .    | 3 | .    | 4 | .
velocity | 10.0 | 20.0 | . | 30.0 | . | 25.0 | . | 20.0 | . | 30.0
  • The program must have a print statement +outside the body of the loop to show the final value of +velocity, since its value is updated by the last iteration +of the loop.
  • +
+
+ +
+
+

Compound Relations Using and, +or, and Parentheses

+
+

Often, you want some combination of things to be true. You can +combine relations within a conditional using and and +or. Continuing the example above, suppose you have

+
+

PYTHON +

+
mass     = [ 3.54,  2.07,  9.22,  1.86,  1.71]
+velocity = [10.00, 20.00, 30.00, 25.00, 20.00]
+
+i = 0
+for i in range(5):
+    if mass[i] > 5 and velocity[i] > 20:
+        print("Fast heavy object.  Duck!")
+    elif mass[i] > 2 and mass[i] <= 5 and velocity[i] <= 20:
+        print("Normal traffic")
+    elif mass[i] <= 2 and velocity[i] <= 20:
+        print("Slow light object.  Ignore it")
+    else:
+        print("Whoa!  Something is up with the data.  Check it")
+
+

Just like with arithmetic, you can and should use parentheses +whenever there is possible ambiguity. A good general rule is to +always use parentheses when mixing and and +or in the same condition. That is, instead of:

+
+

PYTHON +

+
if mass[i] <= 2 or mass[i] >= 5 and velocity[i] > 20:
+
+

write one of these:

+
+

PYTHON +

+
if (mass[i] <= 2 or mass[i] >= 5) and velocity[i] > 20:
+if mass[i] <= 2 or (mass[i] >= 5 and velocity[i] > 20):
+
+

so it is perfectly clear to a reader (and to Python) what you really +mean.

+
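The two parenthesized versions are not interchangeable. In Python, and binds more tightly than or, so the unparenthesized condition behaves like the second version. A sketch with single example values (a mass of 1.5 and a velocity of 10.0, chosen here so that the two groupings disagree):

```python
mass = 1.5
velocity = 10.0

unparenthesized = mass <= 2 or mass >= 5 and velocity > 20
or_grouped_first = (mass <= 2 or mass >= 5) and velocity > 20
and_grouped_first = mass <= 2 or (mass >= 5 and velocity > 20)

print(unparenthesized)     # same as and_grouped_first
print(or_grouped_first)
print(and_grouped_first)
```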
+
+
+
+
+ +
+
+

Tracing Execution

+
+

What does this program print?

+
+

PYTHON +

+
pressure = 71.9
+if pressure > 50.0:
+    pressure = 25.0
+elif pressure <= 50.0:
+    pressure = 0.0
+print(pressure)
+
+
+
+
+
+
+ +
+
+
+

OUTPUT +

+
25.0
+
+
+
+
+
+
+
+ +
+
+

Trimming Values

+
+

Fill in the blanks so that this program creates a new list containing zeroes where the original list’s values were negative and ones where the original list’s values were zero or positive.

+
+

PYTHON +

+
original = [-1.5, 0.2, 0.4, 0.0, -1.3, 0.4]
+result = ____
+for value in original:
+    if ____:
+        result.append(0)
+    else:
+        ____
+print(result)
+
+
+

OUTPUT +

+
[0, 1, 1, 1, 0, 1]
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
original = [-1.5, 0.2, 0.4, 0.0, -1.3, 0.4]
+result = []
+for value in original:
+    if value < 0.0:
+        result.append(0)
+    else:
+        result.append(1)
+print(result)
+
+
+
+
+
+
+
+ +
+
+

Processing Small Files

+
+

Modify this program so that it only processes files with fewer than +50 records.

+
+

PYTHON +

+
import glob
+import pandas as pd
+for filename in glob.glob('data/*.csv'):
+    contents = pd.read_csv(filename)
+    ____:
+        print(filename, len(contents))
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
import glob
+import pandas as pd
+for filename in glob.glob('data/*.csv'):
+    contents = pd.read_csv(filename)
+    if len(contents) < 50:
+        print(filename, len(contents))
+
+
+
+
+
+
+
+ +
+
+

Initializing

+
+

Modify this program so that it finds the largest and smallest values +in the list no matter what the range of values originally is.

+
+

PYTHON +

+
values = [...some test data...]
+smallest, largest = None, None
+for v in values:
+    if ____:
+        smallest, largest = v, v
+    ____:
+        smallest = min(____, v)
+        largest = max(____, v)
+print(smallest, largest)
+
+

What are the advantages and disadvantages of using this method to +find the range of the data?

+
+
+
+
+
+ +
+
+
+

PYTHON +

+
values = [-2,1,65,78,-54,-24,100]
+smallest, largest = None, None
+for v in values:
+    if smallest is None and largest is None:
+        smallest, largest = v, v
+    else:
+        smallest = min(smallest, v)
+        largest = max(largest, v)
+print(smallest, largest)
+
+

If you wrote == None instead of is None, that works too, but Python programmers conventionally write is None, because None is a single unique object and an identity test is the idiomatic way to check for it.

+

It can be argued that an advantage of using this method would be to +make the code more readable. However, a disadvantage is that this code +is not efficient because within each iteration of the for +loop statement, there are two more loops that run over two numbers each +(the min and max functions). It would be more +efficient to iterate over each number just once:

+
+

PYTHON +

+
values = [-2,1,65,78,-54,-24,100]
+smallest, largest = None, None
+for v in values:
+    if smallest is None or v < smallest:
+        smallest = v
+    if largest is None or v > largest:
+        largest = v
+print(smallest, largest)
+
+

Now we have one loop, but four comparison tests. There are two ways +we could improve it further: either use fewer comparisons in each +iteration, or use two loops that each contain only one comparison test. +The simplest solution is often the best:

+
+

PYTHON +

+
values = [-2,1,65,78,-54,-24,100]
+smallest = min(values)
+largest = max(values)
+print(smallest, largest)
+
+
+
+
+
+
+
+ +
+
+

Key Points

+
+
  • Use if statements to control whether or not a block of +code is executed.
  • +
  • Conditionals are often used inside loops.
  • +
  • Use else to execute a block of code when an +if condition is not true.
  • +
  • Use elif to specify additional tests.
  • +
  • Conditions are tested once, in order.
  • +
  • Create a table showing variables’ values to trace a program’s +execution.
  • +
+
+
+
+
+ + +
+
+
+ +
+
+ + diff --git a/14-looping-data-sets.html b/14-looping-data-sets.html new file mode 100644 index 000000000..53cb68901 --- /dev/null +++ b/14-looping-data-sets.html @@ -0,0 +1,857 @@ + +Plotting and Programming in Python: Looping Over Data Sets +
+ +
+
+ + + + + +
+
+

Looping Over Data Sets

+

Last updated on 2024-10-18

+ + + +
+ +
+ + + +
+

Overview

+
+
+
+
+

Questions

+
  • How can I process many data sets with a single command?
  • +
+
+
+
+
+
+

Objectives

+
  • Be able to read and write globbing expressions that match sets of +files.
  • +
  • Use glob to create lists of files.
  • +
  • Write for loops to perform operations on files given their names in +a list.
  • +
+
+
+
+
+

Use a for loop to process files given a list of their +names.

+
  • A filename is a character string.
  • +
  • And lists can contain character strings.
  • +
+

PYTHON +

+
import pandas as pd
+for filename in ['data/gapminder_gdp_africa.csv', 'data/gapminder_gdp_asia.csv']:
+    data = pd.read_csv(filename, index_col='country')
+    print(filename, data.min())
+
+
+

OUTPUT +

+
data/gapminder_gdp_africa.csv gdpPercap_1952    298.846212
+gdpPercap_1957    335.997115
+gdpPercap_1962    355.203227
+gdpPercap_1967    412.977514
+⋮ ⋮ ⋮
+gdpPercap_1997    312.188423
+gdpPercap_2002    241.165877
+gdpPercap_2007    277.551859
+dtype: float64
+data/gapminder_gdp_asia.csv gdpPercap_1952    331
+gdpPercap_1957    350
+gdpPercap_1962    388
+gdpPercap_1967    349
+⋮ ⋮ ⋮
+gdpPercap_1997    415
+gdpPercap_2002    611
+gdpPercap_2007    944
+dtype: float64
+
+

Use glob.glob +to find sets of files whose names match a pattern.

+
  • In Unix, the term “globbing” means “matching a set of files with a +pattern”.
  • +
  • The most common patterns are: +
    • +* meaning “match zero or more characters”
    • +
    • +? meaning “match exactly one character”
    • +
  • +
  • Python’s standard library contains the glob +module to provide pattern matching functionality
  • +
  • The glob +module contains a function also called glob to match file +patterns
  • +
  • E.g., glob.glob('*.txt') matches all files in the +current directory whose names end with .txt.
  • +
  • Result is a (possibly empty) list of character strings.
  • +
+
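The difference between * and ? can be tried without any files on disk. This sketch uses fnmatchcase from the standard library's fnmatch module, which checks a single name against a wildcard pattern. The file names are hypothetical and are not part of the lesson's data.

```python
from fnmatch import fnmatchcase

# Hypothetical file names, used only to illustrate the two wildcards.
names = ['run1.csv', 'run2.csv', 'run10.csv', 'notes.txt']

star_matches = []
question_matches = []
for name in names:
    if fnmatchcase(name, 'run*.csv'):   # * matches zero or more characters
        star_matches.append(name)
    if fnmatchcase(name, 'run?.csv'):   # ? matches exactly one character
        question_matches.append(name)

print(star_matches)
print(question_matches)
```

Note that 'run10.csv' matches the * pattern but not the ? pattern, because two characters sit between 'run' and '.csv'.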

PYTHON +

+
import glob
+print('all csv files in data directory:', glob.glob('data/*.csv'))
+
+
+

OUTPUT +

+
all csv files in data directory: ['data/gapminder_all.csv', 'data/gapminder_gdp_africa.csv', \
+'data/gapminder_gdp_americas.csv', 'data/gapminder_gdp_asia.csv', 'data/gapminder_gdp_europe.csv', \
+'data/gapminder_gdp_oceania.csv']
+
+
+

PYTHON +

+
print('all PDB files:', glob.glob('*.pdb'))
+
+
+

OUTPUT +

+
all PDB files: []
+
+

Use glob and for to process batches of +files.

+
  • Helps a lot if the files are named and stored systematically and +consistently so that simple patterns will find the right data.
  • +
+

PYTHON +

+
for filename in glob.glob('data/gapminder_*.csv'):
+    data = pd.read_csv(filename)
+    print(filename, data['gdpPercap_1952'].min())
+
+
+

OUTPUT +

+
data/gapminder_all.csv 298.8462121
+data/gapminder_gdp_africa.csv 298.8462121
+data/gapminder_gdp_americas.csv 1397.717137
+data/gapminder_gdp_asia.csv 331.0
+data/gapminder_gdp_europe.csv 973.5331948
+data/gapminder_gdp_oceania.csv 10039.59564
+
+
  • This includes all data, as well as per-region data.
  • +
  • Use a more specific pattern in the exercises to exclude the whole +data set.
  • +
  • But note that the minimum of the entire data set is also the minimum +of one of the data sets, which is a nice check on correctness.
  • +
+
+ +
+
+

Determining Matches

+
+

Which of these files is not matched by the expression +glob.glob('data/*as*.csv')?

+
  1. data/gapminder_gdp_africa.csv
  2. data/gapminder_gdp_americas.csv
  3. data/gapminder_gdp_asia.csv
+
+
+
+
+ +
+
+

1 is not matched by the glob.

+
+
+
+
+
+
+ +
+
+

Minimum File Size

+
+

Modify this program so that it prints the number of records in the +file that has the fewest records.

+
+

PYTHON +

+
import glob
+import pandas as pd
+fewest = ____
+for filename in glob.glob('data/*.csv'):
+    dataframe = pd.____(filename)
+    fewest = min(____, dataframe.shape[0])
+print('smallest file has', fewest, 'records')
+
+

Note that the DataFrame.shape attribute is a tuple containing the number of rows and columns of the data frame.

+
+
+
+
+
+ +
+
+
+

PYTHON +

+
import glob
+import pandas as pd
+fewest = float('Inf')
+for filename in glob.glob('data/*.csv'):
+    dataframe = pd.read_csv(filename)
+    fewest = min(fewest, dataframe.shape[0])
+print('smallest file has', fewest, 'records')
+
+

You might have chosen to initialize the fewest variable +with a number greater than the numbers you’re dealing with, but that +could lead to trouble if you reuse the code with bigger numbers. Python +lets you use positive infinity, which will work no matter how big your +numbers are. What other special strings does the float +function recognize?

+
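A sketch of why positive infinity is a safe starting value: every real count is smaller than it, so the first comparison always replaces it. The record counts are made-up numbers for illustration, and the last lines show two other special strings that float accepts.

```python
import math

# Hypothetical record counts standing in for dataframe.shape[0] values.
record_counts = [142, 52, 25]

fewest = float('inf')          # larger than any number we could compare with
for count in record_counts:
    fewest = min(fewest, count)
print(fewest)

negative_infinity = float('-inf')   # smaller than any number
not_a_number = float('nan')         # "not a number"; never equal to anything
print(negative_infinity < -1e308)
print(math.isnan(not_a_number))
```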
+
+
+
+
+
+ +
+
+

Comparing Data

+
+

Write a program that reads in the regional data sets and plots the +average GDP per capita for each region over time in a single chart. +Pandas will raise an error if it encounters non-numeric columns in a +dataframe computation so you may need to either filter out those columns +or tell pandas to ignore them.

+
+
+
+
+
+ +
+
+

This solution builds a useful legend by using the string +split method to extract the region from +the path ‘data/gapminder_gdp_a_specific_region.csv’.

+
+

PYTHON +

+
import glob
+import pandas as pd
+import matplotlib.pyplot as plt
+fig, ax = plt.subplots(1,1)
+for filename in glob.glob('data/gapminder_gdp*.csv'):
+    dataframe = pd.read_csv(filename)
+    # extract <region> from the filename, expected to be in the format 'data/gapminder_gdp_<region>.csv'.
+    # we will split the string using the split method and `_` as our separator,
+    # retrieve the last string in the list that split returns (`<region>.csv`), 
+    # and then remove the `.csv` extension from that string.
+    # NOTE: the pathlib module covered in the next callout also offers
+    # convenient abstractions for working with filesystem paths and could solve this as well:
+    # from pathlib import Path
+    # region = Path(filename).stem.split('_')[-1]
+    region = filename.split('_')[-1][:-4] 
+    # pandas raises errors when it encounters non-numeric columns in a dataframe computation
+    # but we can tell pandas to ignore them with the `numeric_only` parameter
+    dataframe.mean(numeric_only=True).plot(ax=ax, label=region)
+    # NOTE: another way of doing this selects just the columns with gdp in their name using the filter method
+    # dataframe.filter(like="gdp").mean().plot(ax=ax, label=region)
+
+plt.legend()
+plt.show()
+
+
+
+
+
+
+
+ +
+
+

Dealing with File Paths

+
+

The pathlib +module provides useful abstractions for file and path manipulation +like returning the name of a file without the file extension. This is +very useful when looping over files and directories. In the example +below, we create a Path object and inspect its +attributes.

+
+

PYTHON +

+
from pathlib import Path
+
+p = Path("data/gapminder_gdp_africa.csv")
+print(p.parent)
+print(p.stem)
+print(p.suffix)
+
+
+

OUTPUT +

+
data
+gapminder_gdp_africa
+.csv
+
+

Hint: Check all available attributes and methods on +the Path object with the dir() function.

+
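As a sketch of how these attributes help when looping over files, the region can be pulled out of each file name. The loop below works on the path strings alone and assumes the 'data/gapminder_gdp_<region>.csv' naming used in this lesson; the list regions is introduced here for illustration.

```python
from pathlib import Path

filenames = ['data/gapminder_gdp_africa.csv', 'data/gapminder_gdp_asia.csv']

regions = []
for filename in filenames:
    path = Path(filename)
    # stem drops the directory and the extension: 'gapminder_gdp_africa'
    region = path.stem.split('_')[-1]
    regions.append(region)
    print(path.name, '->', region)

print(regions)
```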
+
+
+
+
+ +
+
+

Key Points

+
+
  • Use a for loop to process files given a list of their +names.
  • +
  • Use glob.glob to find sets of files whose names match a +pattern.
  • +
  • Use glob and for to process batches of +files.
  • +
+
+
+
+
+ + +
+
+
+ +
+
+ + diff --git a/15-coffee.html b/15-coffee.html new file mode 100644 index 000000000..858501d91 --- /dev/null +++ b/15-coffee.html @@ -0,0 +1,549 @@ + +Plotting and Programming in Python: Afternoon Coffee +
+ +
+
+ + + + + +
+
+

Afternoon Coffee

+

Last updated on 2024-10-18

+ + + +
+ +
+ + +

Reflection exercise

+

Over break, reflect on and discuss the following:

+
  • A common refrain in software engineering is “Don’t Repeat Yourself”. How do the techniques we’ve learned in the last lessons help us avoid repeating ourselves? Note that in practice there is some nuance to this: it should be balanced against doing the simplest thing that could possibly work.
  • +
  • What are the pros / cons of making a variable global or local to a +function?
  • +
  • When would you consider turning a block of code into a function +definition?
  • +
+
+ + +
+
+
+ +
+
+ + diff --git a/16-writing-functions.html b/16-writing-functions.html new file mode 100644 index 000000000..8a6cb3aa8 --- /dev/null +++ b/16-writing-functions.html @@ -0,0 +1,1380 @@ + +Plotting and Programming in Python: Writing Functions +
+ +
+
+ + + + + +
+
+

Writing Functions

+

Last updated on 2024-10-18

+ + + +
+ +
+ + + +
+

Overview

+
+
+
+
+

Questions

+
  • How can I create my own functions?
  • +
+
+
+
+
+
+

Objectives

+
  • Explain and identify the difference between function definition and +function call.
  • +
  • Write a function that takes a small, fixed number of arguments and +produces a single result.
  • +
+
+
+
+
+

Break programs down into functions to make them easier to +understand.

+
  • Human beings can only keep a few items in working memory at a +time.
  • +
  • Understand larger/more complicated ideas by understanding and +combining pieces. +
    • Components in a machine.
    • +
    • Lemmas when proving theorems.
    • +
  • +
  • Functions serve the same purpose in programs. +
    • +Encapsulate complexity so that we can treat it as a single +“thing”.
    • +
  • +
  • Also enables re-use. +
    • Write one time, use many times.
    • +
  • +

Define a function using def with a name, parameters, +and a block of code.

+
  • Begin the definition of a new function with def.
  • +
  • Followed by the name of the function. +
    • Must obey the same rules as variable names.
    • +
  • +
  • Then parameters in parentheses. +
    • Empty parentheses if the function doesn’t take any inputs.
    • +
    • We will discuss this in detail in a moment.
    • +
  • +
  • Then a colon.
  • +
  • Then an indented block of code.
  • +
+

PYTHON +

+
def print_greeting():
+    print('Hello!')
+    print('The weather is nice today.')
+    print('Right?')
+
+

Defining a function does not run it.

+
  • Defining a function does not run it. +
    • Like assigning a value to a variable.
    • +
  • +
  • Must call the function to execute the code it contains.
  • +
+

PYTHON +

+
print_greeting()
+
+
+

OUTPUT +

+
Hello!
+The weather is nice today.
+Right?
+
+
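The difference between defining and calling can be checked directly. The sketch below is only an illustration: `calls` and `record_greeting` are made-up names, and the function appends to a list instead of printing so that the effect of each call can be counted.

```python
# Hypothetical names for illustration: `calls` and `record_greeting`.
calls = []

def record_greeting():
    # The body runs only when the function is called.
    calls.append('Hello!')

# The definition alone has not recorded anything.
count_after_definition = len(calls)

# Each call executes the body once.
record_greeting()
record_greeting()
count_after_two_calls = len(calls)

print(count_after_definition, count_after_two_calls)
```

The first count is taken after the `def` but before any call, so it reflects the definition alone.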

Arguments in a function call are matched to its defined +parameters.

+
  • Functions are most useful when they can operate on different +data.
  • +
  • Specify parameters when defining a function. +
    • These become variables when the function is executed.
    • +
    • Are assigned the arguments in the call (i.e., the values passed to +the function).
    • +
    • If you don’t name the arguments when using them in the call, the +arguments will be matched to parameters in the order the parameters are +defined in the function.
    • +
  • +
+

PYTHON +

+
def print_date(year, month, day):
+    joined = str(year) + '/' + str(month) + '/' + str(day)
+    print(joined)
+
+print_date(1871, 3, 19)
+
+
+

OUTPUT +

+
1871/3/19
+
+

Or, we can name the arguments when we call the function. This allows us to specify them in any order and adds clarity to the call site; otherwise, someone reading the code might forget whether the second argument is the month or the day.

+
+

PYTHON +

+
print_date(month=3, day=19, year=1871)
+
+
+

OUTPUT +

+
1871/3/19
+
+
  • Via Twitter: +() contains the ingredients for the function while the body +contains the recipe.
  • +
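To compare the calling styles side by side, here is a small sketch. It assumes a hypothetical variant, `format_date`, that returns the joined string instead of printing it, so the results can be stored and compared.

```python
def format_date(year, month, day):
    # Same joining logic as print_date, but the string is returned.
    return str(year) + '/' + str(month) + '/' + str(day)

# Positional: arguments are matched to parameters in definition order.
positional = format_date(1871, 3, 19)

# Named: the order at the call site no longer matters.
named = format_date(day=19, year=1871, month=3)

# Mixed: positional arguments must come first, named ones after.
mixed = format_date(1871, day=19, month=3)

# A positional mistake goes unnoticed: month and day are swapped.
swapped = format_date(1871, 19, 3)

print(positional, named, mixed, swapped)
```

The last call runs without any error, which is exactly why naming arguments helps when several parameters have the same type.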

Functions may return a result to their caller using +return.

+
  • Use return ... to give a value back to the caller.
  • +
  • May occur anywhere in the function.
  • +
  • But functions are easier to understand if return +occurs: +
    • At the start to handle special cases.
    • +
    • At the very end, with a final result.
    • +
  • +
+

PYTHON +

+
def average(values):
+    if len(values) == 0:
+        return None
+    return sum(values) / len(values)
+
+
+

PYTHON +

+
a = average([1, 3, 4])
+print('average of actual values:', a)
+
+
+

OUTPUT +

+
average of actual values: 2.6666666666666665
+
+
+

PYTHON +

+
print('average of empty list:', average([]))
+
+
+

OUTPUT +

+
average of empty list: None
+
+
+

PYTHON +

+
result = print_date(1871, 3, 19)
+print('result of call is:', result)
+
+
+

OUTPUT +

+
1871/3/19
+result of call is: None
+
+
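The two recommended positions for `return`, and the implicit `None`, can be seen together in a short sketch. The names `safe_ratio` and `no_return` are invented for this illustration.

```python
def safe_ratio(numerator, denominator):
    # Special case handled at the start with an early return.
    if denominator == 0:
        return None
    # Final result returned at the very end.
    return numerator / denominator

def no_return():
    # No return statement: Python returns None automatically.
    pass

print(safe_ratio(1, 4))
print(safe_ratio(1, 0))
print(no_return() is None)
```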
+
+ +
+
+

Identifying Syntax Errors

+
+
  1. Read the code below and try to identify what the errors are +without running it.
  2. +
  3. Run the code and read the error message. Is it a +SyntaxError or an IndentationError?
  4. +
  5. Fix the error.
  6. +
  7. Repeat steps 2 and 3 until you have fixed all the errors.
  8. +
+

PYTHON +

+
def another_function
+  print("Syntax errors are annoying.")
+   print("But at least python tells us about them!")
+  print("So they are usually not too hard to fix.")
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
def another_function():
+  print("Syntax errors are annoying.")
+  print("But at least Python tells us about them!")
+  print("So they are usually not too hard to fix.")
+
+
+
+
+
+
+
+ +
+
+

Definition and Use

+
+

What does the following program print?

+
+

PYTHON +

+
def report(pressure):
+    print('pressure is', pressure)
+
+print('calling', report, 22.5)
+
+
+
+
+
+
+ +
+
+
+

OUTPUT +

+
calling <function report at 0x7fd128ff1bf8> 22.5
+
+

A function call always needs parentheses; without them, the name refers to the function object itself, which print displays along with its memory address. So, if we wanted to call the function named report and give it the value 22.5 to report on, we could write the call as follows:

+
+

PYTHON +

+
print("calling")
+report(22.5)
+
+
+

OUTPUT +

+
calling
+pressure is 22.5
+
+
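A sketch of the same distinction, using only built-in features: without parentheses the name is an ordinary value that can be assigned or inspected, and the body runs only when parentheses are added. The variable names `alias`, `is_function`, `same_object`, and `name` are illustrative.

```python
def report(pressure):
    print('pressure is', pressure)

# Without parentheses, `report` is a name bound to a function object.
alias = report
is_function = callable(report)
same_object = alias is report
name = report.__name__

# With parentheses the function runs. It has no return statement,
# so the call evaluates to None.
result = alias(22.5)
```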
+
+
+
+
+
+ +
+
+

Order of Operations

+
+
  1. What’s wrong in this example?
  2. +
+

PYTHON +

+
result = print_time(11, 37, 59)
+
+def print_time(hour, minute, second):
+   time_string = str(hour) + ':' + str(minute) + ':' + str(second)
+   print(time_string)
+
+
  1. After fixing the problem above, explain why running this example +code:
  2. +
+

PYTHON +

+
result = print_time(11, 37, 59)
+print('result of call is:', result)
+
+

gives this output:

+
+

OUTPUT +

+
11:37:59
+result of call is: None
+
+
  1. Why is the result of the call None?
  2. +
+
+
+
+
+ +
+
+
  1. The problem with the example is that the function +print_time() is defined after the call to the +function is made. Python doesn’t know how to resolve the name +print_time since it hasn’t been defined yet and will raise +a NameError e.g., +NameError: name 'print_time' is not defined

  2. +
  3. The first line of output, 11:37:59, is printed by the first line of code, result = print_time(11, 37, 59), which also binds the value returned by print_time to the variable result. The second line of output comes from the print call on the second line of code, which prints the contents of result.

  4. +
  5. print_time() does not explicitly return +a value, so it automatically returns None.

  6. +
+
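The ordering problem can be reproduced without stopping the program by catching the error. This is a sketch with a hypothetical function, `show_time`, that returns its string rather than printing it.

```python
try:
    # At this point the name show_time has not been defined yet.
    early = show_time(11, 37, 59)
except NameError as error:
    early = 'NameError: ' + str(error)

def show_time(hour, minute, second):
    return str(hour) + ':' + str(minute) + ':' + str(second)

# After the definition has been executed, the same call works.
late = show_time(11, 37, 59)

print(early)
print(late)
```

Python executes a file from top to bottom, so a `def` has to run before the first call that uses it.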
+
+
+
+
+ +
+
+

Encapsulation

+
+

Fill in the blanks to create a function that takes a single filename +as an argument, loads the data in the file named by the argument, and +returns the minimum value in that data.

+
+

PYTHON +

+
import pandas as pd
+
+def min_in_data(____):
+    data = ____
+    return ____
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
import pandas as pd
+
+def min_in_data(filename):
+    data = pd.read_csv(filename)
+    return data.min()
+
+
+
+
+
+
+
+ +
+
+

Find the First

+
+

Fill in the blanks to create a function that takes a list of numbers +as an argument and returns the first negative value in the list. What +does your function do if the list is empty? What if the list has no +negative numbers?

+
+

PYTHON +

+
def first_negative(values):
+    for v in ____:
+        if ____:
+            return ____
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
def first_negative(values):
+    for v in values:
+        if v < 0:
+            return v
+
+

If an empty list or a list with no negative values is passed to this function, it returns None:

+
+

PYTHON +

+
my_list = []
+print(first_negative(my_list))
+
+
+

OUTPUT +

+
None
+
+
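The remaining cases can be checked the same way. The sketch below repeats the solution and adds a comment marking where the implicit `None` comes from; the input lists are arbitrary examples.

```python
def first_negative(values):
    for v in values:
        if v < 0:
            return v
    # If the loop finishes without returning, the function ends here
    # and Python returns None implicitly.

print(first_negative([3, -1, -7]))   # stops at the first negative value
print(first_negative([]))            # the loop body never runs
print(first_negative([0, 2, 5]))     # zero is not negative
```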
+
+
+
+
+
+ +
+
+

Calling by Name

+
+

Earlier we saw this function:

+
+

PYTHON +

+
def print_date(year, month, day):
+    joined = str(year) + '/' + str(month) + '/' + str(day)
+    print(joined)
+
+

We saw that we can call the function using named arguments, +like this:

+
+

PYTHON +

+
print_date(day=1, month=2, year=2003)
+
+
  1. What does print_date(day=1, month=2, year=2003) +print?
  2. +
  3. When have you seen a function call like this before?
  4. +
  5. When and why is it useful to call functions this way?
  6. +
+
+
+
+
+ +
+
+
  1. 2003/2/1
  2. +
  3. We saw examples of using named arguments when working with +the pandas library. For example, when reading in a dataset using +data = pd.read_csv('data/gapminder_gdp_europe.csv', index_col='country'), +the last argument index_col is a named argument.
  4. +
  5. Using named arguments can make code more readable since one can see +from the function call what name the different arguments have inside the +function. It can also reduce the chances of passing arguments in the +wrong order, since by using named arguments the order doesn’t +matter.
  6. +
+
+
+
+
+
+ +
+
+

Encapsulation of an If/Print Block

+
+

The code below will run on a label-printer for chicken eggs. A +digital scale will report a chicken egg mass (in grams) to the computer +and then the computer will print a label.

+
+

PYTHON +

+
import random
+for i in range(10):
+
+    # simulating the mass of a chicken egg
+    # the (random) mass will be 70 +/- 20 grams
+    mass = 70 + 20.0 * (2.0 * random.random() - 1.0)
+
+    print(mass)
+
+    # egg sizing machinery prints a label
+    if mass >= 85:
+        print("jumbo")
+    elif mass >= 70:
+        print("large")
+    elif mass < 70 and mass >= 55:
+        print("medium")
+    else:
+        print("small")
+
+

The if-block that classifies the eggs might be useful in other +situations, so to avoid repeating it, we could fold it into a function, +get_egg_label(). Revising the program to use the function +would give us this:

+
+

PYTHON +

+
# revised version
+import random
+for i in range(10):
+
+    # simulating the mass of a chicken egg
+    # the (random) mass will be 70 +/- 20 grams
+    mass = 70 + 20.0 * (2.0 * random.random() - 1.0)
+
+    print(mass, get_egg_label(mass))
+
+
  1. Create a function definition for get_egg_label() that +will work with the revised program above. Note that the +get_egg_label() function’s return value will be important. +Sample output from the above program would be +71.23 large.
  2. +
  3. A dirty egg might have a mass of more than 90 grams, and a spoiled +or broken egg will probably have a mass that’s less than 50 grams. +Modify your get_egg_label() function to account for these +error conditions. Sample output could be +25 too light, probably spoiled.
  4. +
+
+
+
+
+ +
+
+
+

PYTHON +

+
def get_egg_label(mass):
+    # egg sizing machinery prints a label
+    egg_label = "Unlabelled"
+    if mass >= 90:
+        egg_label = "warning: egg might be dirty"
+    elif mass >= 85:
+        egg_label = "jumbo"
+    elif mass >= 70:
+        egg_label = "large"
+    elif mass < 70 and mass >= 55:
+        egg_label = "medium"
+    elif mass < 50:
+        egg_label = "too light, probably spoiled"
+    else:
+        egg_label = "small"
+    return egg_label
+
+
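One possible alternative, offered as a sketch rather than as the lesson's solution: because each `elif` is only reached when the earlier tests have failed, the thresholds can be listed from largest to smallest and each branch can return its label directly.

```python
def get_egg_label(mass):
    # Thresholds are checked from largest to smallest, so each branch
    # only needs a lower bound.
    if mass >= 90:
        return "warning: egg might be dirty"
    elif mass >= 85:
        return "jumbo"
    elif mass >= 70:
        return "large"
    elif mass >= 55:
        return "medium"
    elif mass >= 50:
        return "small"
    else:
        return "too light, probably spoiled"

for mass in (95, 85, 71.23, 55, 52, 25):
    print(mass, get_egg_label(mass))
```

Testing the values that sit exactly on a boundary (85, 55) is a quick way to confirm that `>=` and `<` were chosen as intended.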
+
+
+
+
+
+ +
+
+

Encapsulating Data Analysis

+
+

Assume that the following code has been executed:

+
+

PYTHON +

+
import pandas as pd
+
+data_asia = pd.read_csv('data/gapminder_gdp_asia.csv', index_col=0)
+japan = data_asia.loc['Japan']
+
+
  1. Complete the statements below to obtain the average GDP for Japan +across the years reported for the 1980s.
  2. +
+

PYTHON +

+
year = 1983
+gdp_decade = 'gdpPercap_' + str(year // ____)
+avg = (japan.loc[gdp_decade + ___] + japan.loc[gdp_decade + ___]) / 2
+
+
  1. Abstract the code above into a single function.
  2. +
+

PYTHON +

+
def avg_gdp_in_decade(country, continent, year):
+    data_countries = pd.read_csv('data/gapminder_gdp_'+___+'.csv',delimiter=',',index_col=0)
+    ____
+    ____
+    ____
+    return avg
+
+
  1. How would you generalize this function if you did not know +beforehand which specific years occurred as columns in the data? For +instance, what if we also had data from years ending in 1 and 9 for each +decade? (Hint: use the columns to filter out the ones that correspond to +the decade, instead of enumerating them in the code.)
  2. +
+
+
+
+
+ +
+
+
  1. The average GDP for Japan across the years reported for the 1980s is +computed with:
  2. +
+

PYTHON +

+
year = 1983
+gdp_decade = 'gdpPercap_' + str(year // 10)
+avg = (japan.loc[gdp_decade + '2'] + japan.loc[gdp_decade + '7']) / 2
+
+
  1. That code as a function is:
  2. +
+

PYTHON +

+
def avg_gdp_in_decade(country, continent, year):
+    data_countries = pd.read_csv('data/gapminder_gdp_' + continent + '.csv', index_col=0)
+    c = data_countries.loc[country]
+    gdp_decade = 'gdpPercap_' + str(year // 10)
+    avg = (c.loc[gdp_decade + '2'] + c.loc[gdp_decade + '7'])/2
+    return avg
+
+
  1. To obtain the average for the relevant years, we need to loop over +them:
  2. +
+

PYTHON +

+
def avg_gdp_in_decade(country, continent, year):
+    data_countries = pd.read_csv('data/gapminder_gdp_' + continent + '.csv', index_col=0)
+    c = data_countries.loc[country]
+    gdp_decade = 'gdpPercap_' + str(year // 10)
+    total = 0.0
+    num_years = 0
+    for yr_header in c.index: # c's index contains reported years
+        if yr_header.startswith(gdp_decade):
+            total = total + c.loc[yr_header]
+            num_years = num_years + 1
+    return total/num_years
+
+

The function can now be called by:

+
+

PYTHON +

+
avg_gdp_in_decade('Japan','asia',1983)
+
+
+

OUTPUT +

+
20880.023800000003
+
+
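The filtering idea from part 3 can be tried without the data file. In this sketch a plain dictionary with made-up numbers stands in for one country's row (a pandas Series offers `items()` as well), and `avg_in_decade` is a hypothetical helper. Unlike the solution above, it returns `None` when no column matches, which avoids dividing by zero.

```python
def avg_in_decade(row, year, prefix='gdpPercap_'):
    # For year 1983 this gives 'gdpPercap_198', which matches every
    # column header from the 1980s.
    decade_prefix = prefix + str(year // 10)
    matching = [value for header, value in row.items()
                if header.startswith(decade_prefix)]
    if len(matching) == 0:
        return None  # no columns reported for that decade
    return sum(matching) / len(matching)

# Made-up numbers for illustration, not Gapminder data.
row = {
    'gdpPercap_1977': 10.0,
    'gdpPercap_1981': 20.0,
    'gdpPercap_1982': 30.0,
    'gdpPercap_1987': 40.0,
    'gdpPercap_1992': 50.0,
}

print(avg_in_decade(row, 1983))
print(avg_in_decade(row, 2005))
```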
+
+
+
+
+
+ +
+
+

Simulating a dynamical system

+
+

In mathematics, a dynamical +system is a system in which a function describes the time dependence +of a point in a geometrical space. A canonical example of a dynamical +system is the logistic map, a +growth model that computes a new population density (between 0 and 1) +based on the current density. In the model, time takes discrete values +0, 1, 2, …

+
  1. Define a function called logistic_map that takes two +inputs: x, representing the current population (at time +t), and a parameter r = 1. This function +should return a value representing the state of the system (population) +at time t + 1, using the mapping function:
  2. +

f(t+1) = r * f(t) * [1 - f(t)]

+
  1. Using a for or while loop, iterate the +logistic_map function defined in part 1, starting from an +initial population of 0.5, for a period of time +t_final = 10. Store the intermediate results in a list so +that after the loop terminates you have accumulated a sequence of values +representing the state of the logistic map at times +t = [0,1,...,t_final] (11 values in total). Print this list +to see the evolution of the population.

  2. +
  3. Encapsulate the logic of your loop into a function called +iterate that takes the initial population as its first +input, the parameter t_final as its second input and the +parameter r as its third input. The function should return +the list of values representing the state of the logistic map at times +t = [0,1,...,t_final]. Run this function for periods +t_final = 100 and 1000 and print some of the +values. Is the population trending toward a steady state?

  4. +
+
+
+
+
+ +
+
+
  1. +

    PYTHON +

    +
    def logistic_map(x, r):
    +    return r * x * (1 - x)
    +
  2. +
  3. +

    PYTHON +

    +
    initial_population = 0.5
    +t_final = 10
    +r = 1.0
    +population = [initial_population]
    +
    +for t in range(t_final):
    +    population.append( logistic_map(population[t], r) )
    +
  4. +
  5. +
    +

    PYTHON +

    +
    def iterate(initial_population, t_final, r):
    +    population = [initial_population]
    +    for t in range(t_final):
    +        population.append( logistic_map(population[t], r) )
    +    return population
    +
    +for period in (10, 100, 1000):
    +    population = iterate(0.5, period, 1)
    +    print(population[-1])
    +
    +
    +

    OUTPUT +

    +
    0.06945089389714401
    +0.009395779870614648
    +0.0009913908614406382
    +
    +The population seems to be approaching zero.
  6. +
+
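Because `r` is a parameter of `iterate`, the same functions can be reused to explore other growth rates. The sketch below is an illustration with arbitrarily chosen values (`r = 2.0`, initial population `0.1`). Solving x = r * x * (1 - x) gives the fixed points x = 0 and x = 1 - 1/r, so for `r = 2.0` the population is expected to settle near 0.5 instead of decaying to zero.

```python
def logistic_map(x, r):
    return r * x * (1 - x)

def iterate(initial_population, t_final, r):
    population = [initial_population]
    for t in range(t_final):
        population.append(logistic_map(population[t], r))
    return population

# Arbitrary illustrative choices: start at 0.1, run 100 steps.
final_r1 = iterate(0.1, 100, 1.0)[-1]   # r = 1: slow decay toward 0
final_r2 = iterate(0.1, 100, 2.0)[-1]   # r = 2: settles at 1 - 1/r

print(final_r1)
print(final_r2)
```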
+
+
+
+
+ +
+
+

Using Functions With Conditionals in Pandas

+
+

Functions will often contain conditionals. Here is a short example +that will indicate which quartile the argument is in based on hand-coded +values for the quartile cut points.

+
+

PYTHON +

+
def calculate_life_quartile(exp):
+    if exp < 58.41:
+        # This observation is in the first quartile
+        return 1
+    elif exp >= 58.41 and exp < 67.05:
+        # This observation is in the second quartile
+        return 2
+    elif exp >= 67.05 and exp < 71.70:
+        # This observation is in the third quartile
+        return 3
+    elif exp >= 71.70:
+        # This observation is in the fourth quartile
+        return 4
+    else:
+        # This observation has bad data
+        return None
+
+calculate_life_quartile(62.5)
+
+
+

OUTPUT +

+
2
+
+

That function would typically be used within a for loop, +but Pandas has a different, more efficient way of doing the same thing, +and that is by applying a function to a dataframe or a portion +of a dataframe. Here is an example, using the definition above.

+
+

PYTHON +

+
data = pd.read_csv('data/gapminder_all.csv')
+data['life_qrtl'] = data['lifeExp_1952'].apply(calculate_life_quartile)
+
+

There is a lot in that second line, so let’s take it piece by piece. On the right side of the = we start with data['lifeExp_1952'], which is the column labeled lifeExp_1952 in the dataframe called data. We then use apply() to do what it says: apply calculate_life_quartile to the value of this column for every row in the dataframe.

+
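Conceptually, applying a function to a column means calling it once per value and collecting the results. The sketch below imitates that with a plain list and a list comprehension, so it runs without pandas or the data file; the values are made up, and `life_quartile` is a hypothetical rewrite that relies on the order of the `elif` tests instead of repeating the lower bounds. It also shows when the final `else` is reached: for a missing value such as NaN, every comparison is False.

```python
def life_quartile(exp):
    # Each elif is reached only if the earlier tests failed,
    # so the lower bounds do not need to be repeated.
    if exp < 58.41:
        return 1
    elif exp < 67.05:
        return 2
    elif exp < 71.70:
        return 3
    elif exp >= 71.70:
        return 4
    else:
        # Reached for NaN, where every comparison above is False.
        return None

# Made-up values standing in for one column of the dataframe.
column = [36.3, 62.5, 69.1, 72.0, float('nan')]

# One call per value, results collected in order.
quartiles = [life_quartile(value) for value in column]
print(quartiles)
```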
+
+
+
+
+ +
+
+

Key Points

+
+
  • Break programs down into functions to make them easier to +understand.
  • +
  • Define a function using def with a name, parameters, +and a block of code.
  • +
  • Defining a function does not run it.
  • +
  • Arguments in a function call are matched to its defined +parameters.
  • +
  • Functions may return a result to their caller using +return.
  • +
+
+
+
+
+ + +
+
+
+ +
+
+ + diff --git a/17-scope.html b/17-scope.html new file mode 100644 index 000000000..3a0049970 --- /dev/null +++ b/17-scope.html @@ -0,0 +1,712 @@ + +Plotting and Programming in Python: Variable Scope +
+ Plotting and Programming in Python +
+ +
+
+ + + + + +
+
+

Variable Scope

+

Last updated on 2024-10-18

+ + + +
+ +
+ + + +
+

Overview

+
+
+
+
+

Questions

+
  • How do function calls actually work?
  • +
  • How can I determine where errors occurred?
  • +
+
+
+
+
+
+

Objectives

+
  • Identify local and global variables.
  • +
  • Identify parameters as local variables.
  • +
  • Read a traceback and determine the file, function, and line number +on which the error occurred, the type of error, and the error +message.
  • +
+
+
+
+
+

The scope of a variable is the part of a program that can ‘see’ that +variable.

+
  • There are only so many sensible names for variables.
  • +
  • People using functions shouldn’t have to worry about what variable +names the author of the function used.
  • +
  • People writing functions shouldn’t have to worry about what variable +names the function’s caller uses.
  • +
  • The part of a program in which a variable is visible is called its +scope.
  • +
+

PYTHON +

+
pressure = 103.9
+
+def adjust(t):
+    temperature = t * 1.43 / pressure
+    return temperature
+
+
  • +pressure is a global variable. +
    • Defined outside any particular function.
    • +
    • Visible everywhere.
    • +
  • +
  • +t and temperature are local +variables in adjust. +
    • Defined in the function.
    • +
    • Not visible in the main program.
    • +
    • Remember: a function parameter is a variable that is automatically +assigned a value when the function is called.
    • +
  • +
+

PYTHON +

+
print('adjusted:', adjust(0.9))
+print('temperature after call:', temperature)
+
+
+

OUTPUT +

+
adjusted: 0.01238691049085659
+
+
+

ERROR +

+
Traceback (most recent call last):
+  File "/Users/swcarpentry/foo.py", line 8, in <module>
+    print('temperature after call:', temperature)
+NameError: name 'temperature' is not defined
+
+
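The same experiment can be run without halting the program by catching the `NameError`. This is a sketch; `adjusted` and `visible_outside` are names introduced only for the illustration.

```python
pressure = 103.9  # global: defined outside any function

def adjust(t):
    # temperature is local to adjust; pressure is read from the global scope.
    temperature = t * 1.43 / pressure
    return temperature

adjusted = adjust(0.9)

try:
    temperature  # the local name does not exist out here
    visible_outside = True
except NameError:
    visible_outside = False

print(adjusted, visible_outside)
```

The value computed inside the function is still available, but only through the return value, not through the local name.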
+
+ +
+
+

Local and Global Variable Use

+
+

Trace the values of all variables in this program as it is executed. +(Use ‘—’ as the value of variables before and after they exist.)

+
+

PYTHON +

+
limit = 100
+
+def clip(value):
+    return min(max(0.0, value), limit)
+
+value = -22.5
+print(clip(value))
+
+
+
+
+
+
+ +
+
+

Reading Error Messages

+
+

Read the traceback below, and identify the following:

+
  1. How many levels does the traceback have?
  2. +
  3. What is the file name where the error occurred?
  4. +
  5. What is the function name where the error occurred?
  6. +
  7. On which line number in this function did the error occur?
  8. +
  9. What is the type of error?
  10. +
  11. What is the error message?
  12. +
+

ERROR +

+
---------------------------------------------------------------------------
+KeyError                                  Traceback (most recent call last)
+<ipython-input-2-e4c4cbafeeb5> in <module>()
+      1 import errors_02
+----> 2 errors_02.print_friday_message()
+
+/Users/ghopper/thesis/code/errors_02.py in print_friday_message()
+     13
+     14 def print_friday_message():
+---> 15     print_message("Friday")
+
+/Users/ghopper/thesis/code/errors_02.py in print_message(day)
+      9         "sunday": "Aw, the weekend is almost over."
+     10     }
+---> 11     print(messages[day])
+     12
+     13
+
+KeyError: 'Friday'
+
+
+
+
+
+
+ +
+
+
  1. Three levels.
  2. +
  3. errors_02.py
  4. +
  5. print_message
  6. +
  7. Line 11
  8. +
  9. +KeyError. These errors occur when we are trying to look +up a key that does not exist (usually in a data structure such as a +dictionary). We can find more information about the +KeyError and other built-in exceptions in the Python +docs.
  10. +
  11. KeyError: 'Friday'
  12. +
+
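The same pieces of information can be pulled out of a traceback programmatically with the standard `traceback` module. The sketch below rebuilds a small stand-in for `errors_02.py`; only the `"sunday"` entry appears in the traceback above, so the other dictionary entry is invented for the example.

```python
import traceback

def print_message(day):
    messages = {
        "monday": "Hello, world!",  # invented entry for illustration
        "sunday": "Aw, the weekend is almost over.",
    }
    print(messages[day])

def print_friday_message():
    print_message("Friday")

try:
    print_friday_message()
except KeyError as error:
    # One entry per level of the traceback, outermost call first.
    frames = traceback.extract_tb(error.__traceback__)
    function_names = [frame.name for frame in frames]
    error_type = type(error).__name__
    error_message = str(error)

print(len(frames), 'levels')
print('innermost function:', function_names[-1])
print(error_type + ':', error_message)
```

The last entry in `frames` is the function in which the error was actually raised, which matches reading the printed traceback from the bottom up.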
+
+
+
+
+ +
+
+

Key Points

+
+
  • The scope of a variable is the part of a program that can ‘see’ that +variable.
  • +
+
+
+
+
+ + +
+
+
+ +
+
+ + diff --git a/18-style.html b/18-style.html new file mode 100644 index 000000000..b8a6a15fd --- /dev/null +++ b/18-style.html @@ -0,0 +1,855 @@ + +Plotting and Programming in Python: Programming Style +
+ Plotting and Programming in Python +
+ +
+
+ + + + + +
+
+

Programming Style

+

Last updated on 2024-10-18

+ + + +
+ +
+ + + +
+

Overview

+
+
+
+
+

Questions

+
  • How can I make my programs more readable?
  • +
  • How do most programmers format their code?
  • +
  • How can programs check their own operation?
  • +
+
+
+
+
+
+

Objectives

+
  • Provide sound justifications for basic rules of coding style.
  • +
  • Refactor one-page programs to make them more readable and justify +the changes.
  • +
  • Use Python community coding standards (PEP-8).
  • +
+
+
+
+
+

Coding style

+

A consistent coding style helps others (including our future selves) +read and understand code more easily. Code is read much more often than +it is written, and as the Zen of Python +states, “Readability counts”. Python proposed a standard style through +one of its first Python Enhancement Proposals (PEP), PEP8.

+

Some points worth highlighting:

+
  • document your code and ensure that assumptions, internal algorithms, +expected inputs, expected outputs, etc., are clear
  • +
  • use clear, semantically meaningful variable names
  • +
  • use spaces, not tabs, to indent lines (tabs can cause problems across different text editors, operating systems, and version control systems)
  • +

Follow standard Python style in your code.

+
  • +PEP8: a style +guide for Python that discusses topics such as how to name variables, +how to indent your code, how to structure your import +statements, etc. Adhering to PEP8 makes it easier for other Python +developers to read and understand your code, and to understand what +their contributions should look like.
  • +
  • To check your code for compliance with PEP8, you can use the pycodestyle application. Tools like the black code formatter can automatically format your code to conform to PEP8 and pycodestyle (a Jupyter notebook formatter, nb_black, also exists).
  • +
  • Some groups and organizations follow different style guidelines besides PEP8. For example, the Google style guide on Python makes slightly different recommendations. Google wrote an application called yapf that can help you format your code in either their style or PEP8.
  • +
  • With respect to coding style, the key is consistency. Choose a style for your project, be it PEP8, the Google style, or something else, and do your best to ensure that you and anyone else you are collaborating with stick to it. Consistency within a project is often more impactful than the particular style used. A consistent style will make your software easier to read and understand for others and for your future self.
  • +
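As a small illustration of these points, the sketch below shows the same function twice: once with terse names and cramped spacing, and once following PEP8 conventions with a docstring. Both functions are invented for this comparison and behave identically.

```python
# Harder to read: terse names, no spacing around operators, no documentation.
def f(l,n):
    return [x for x in l if x>n]

# Same behavior in PEP8 style: descriptive names, spaces after commas
# and around operators, and a docstring.
def values_above(values, threshold):
    """Return the items of values that are greater than threshold."""
    return [value for value in values if value > threshold]

print(f([1, 5, 9], 4))
print(values_above([1, 5, 9], 4))
```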

Use assertions to check for internal errors.

+

Assertions are a simple but powerful method for making sure that the +context in which your code is executing is as you expect.

+
+

PYTHON +

+
def calc_bulk_density(mass, volume):
+    '''Return dry bulk density = powder mass / powder volume.'''
+    assert volume > 0
+    return mass / volume
+
+

If the assertion is False, the Python interpreter raises +an AssertionError runtime exception. The source code for +the expression that failed will be displayed as part of the error +message. To ignore assertions in your code run the interpreter with the +‘-O’ (optimize) switch. Assertions should contain only simple checks and +never change the state of the program. For example, an assertion should +never contain an assignment.

+
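To see a failing assertion in action, the sketch below calls the function with an invalid volume and catches the resulting error. The optional message after the comma in the `assert` statement is an addition for this example; the input values are arbitrary.

```python
def calc_bulk_density(mass, volume):
    '''Return dry bulk density = powder mass / powder volume.'''
    # The text after the comma becomes the AssertionError message.
    assert volume > 0, 'volume must be positive'
    return mass / volume

density = calc_bulk_density(2.6, 2.0)

try:
    calc_bulk_density(2.6, 0)
    outcome = 'no error'
except AssertionError as error:
    outcome = 'AssertionError: ' + str(error)

print(density)
print(outcome)
```

Without the assertion, the second call would fail later with a less informative `ZeroDivisionError`.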

Use docstrings to provide builtin help.

+

If the first thing in a function is a character string that is not +assigned directly to a variable, Python attaches it to the function, +accessible via the builtin help function. This string that provides +documentation is also known as a docstring.

+
+

PYTHON +

+
def average(values):
+    "Return average of values, or None if no values are supplied."
+
+    if len(values) == 0:
+        return None
+    return sum(values) / len(values)
+
+help(average)
+
+
+

OUTPUT +

+
Help on function average in module __main__:
+
+average(values)
+    Return average of values, or None if no values are supplied.
+
+
+
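The attached string is also available directly as the function's `__doc__` attribute, which is where `help` finds it. The sketch below contrasts a docstring with a comment; `undocumented` is a made-up function for the comparison.

```python
def average(values):
    "Return average of values, or None if no values are supplied."
    if len(values) == 0:
        return None
    return sum(values) / len(values)

def undocumented(values):
    # A comment is not a docstring, so nothing is attached.
    return values

doc = average.__doc__
missing = undocumented.__doc__

print(doc)
print(missing)
```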
+ +
+
+

Multiline Strings

+
+

Multiline strings are often used for documentation. These start with three quote characters (either single or double) and end with three matching characters.

+
+

PYTHON +

+
"""This string spans
+multiple lines.
+
+Blank lines are allowed."""
+
+
+
+
+
+
+ +
+
+

What Will Be Shown?

+
+

Highlight the lines in the code below that will be available as +online help. Are there lines that should be made available, but won’t +be? Will any lines produce a syntax error or a runtime error?

+
+

PYTHON +

+
"Find maximum edit distance between multiple sequences."
+# This finds the maximum distance between all sequences.
+
+def overall_max(sequences):
+    '''Determine overall maximum edit distance.'''
+
+    highest = 0
+    for left in sequences:
+        for right in sequences:
+            '''Avoid checking sequence against itself.'''
+            if left != right:
+                this = edit_distance(left, right)
+                highest = max(highest, this)
+
+    # Report.
+    return highest
+
+
+
+
+
+
+ +
+
+

Document This

+
+

Use comments to describe and help others understand potentially +unintuitive sections or individual lines of code. They are especially +useful to whoever may need to understand and edit your code in the +future, including yourself.

+

Use docstrings to document the acceptable inputs and expected outputs +of a method or class, its purpose, assumptions and intended behavior. +Docstrings are displayed when a user invokes the builtin +help method on your method or class.

+

Turn the comment in the following function into a docstring and check +that help displays it properly.

+
+

PYTHON +

+
def middle(a, b, c):
+    # Return the middle value of three.
+    # Assumes the values can actually be compared.
+    values = [a, b, c]
+    values.sort()
+    return values[1]
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
def middle(a, b, c):
+    '''Return the middle value of three.
+    Assumes the values can actually be compared.'''
+    values = [a, b, c]
+    values.sort()
+    return values[1]
+
+
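Besides calling `help`, the docstring can be checked in code. The sketch below uses `inspect.getdoc` from the standard library, which returns the docstring with its indentation cleaned up. The layout chosen here (a summary line, a blank line, then details) is one common convention, not the only valid one.

```python
import inspect

def middle(a, b, c):
    '''Return the middle value of three.

    Assumes the values can actually be compared.
    '''
    values = [a, b, c]
    values.sort()
    return values[1]

# getdoc strips the indentation that the docstring inherits from the code.
lines = inspect.getdoc(middle).splitlines()

print(lines[0])
print(lines[-1])
print(middle(3, 1, 2))
```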
+
+
+
+
+
+ +
+
+

Clean Up This Code

+
+
  1. Read this short program and try to predict what it does.
  2. +
  3. Run it: how accurate was your prediction?
  4. +
  5. Refactor the program to make it more readable. Remember to run it +after each change to ensure its behavior hasn’t changed.
  6. +
  7. Compare your rewrite with your neighbor’s. What did you do the same? +What did you do differently, and why?
  8. +
+

PYTHON +

+
n = 10
+s = 'et cetera'
+print(s)
+i = 0
+while i < n:
+    # print('at', j)
+    new = ''
+    for j in range(len(s)):
+        left = j-1
+        right = (j+1)%len(s)
+        if s[left]==s[right]: new = new + '-'
+        else: new = new + '*'
+    s=''.join(new)
+    print(s)
+    i += 1
+
+
+
+
+
+
+ +
+
+

Here’s one solution.

+
+

PYTHON +

+
def string_machine(input_string, iterations):
+    """
+    Takes input_string and generates a new string with -'s and *'s
+    corresponding to characters that have identical adjacent characters
+    or not, respectively.  Iterates through this procedure with the resultant
+    strings for the supplied number of iterations.
+    """
+    print(input_string)
+    input_string_length = len(input_string)
+    old = input_string
+    for i in range(iterations):
+        new = ''
+        # iterate through characters in previous string
+        for j in range(input_string_length):
+            left = j-1
+            right = (j+1) % input_string_length  # ensure right index wraps around
+            if old[left] == old[right]:
+                new = new + '-'
+            else:
+                new = new + '*'
+        print(new)
+        # store new string as old
+        old = new
+
+string_machine('et cetera', 10)
+
+
+

OUTPUT +

+
et cetera
+*****-***
+----*-*--
+---*---*-
+--*-*-*-*
+**-------
+***-----*
+--**---**
+*****-***
+----*-*--
+---*---*-
+
+
+
+
+
+
+
+ +
+
+

Key Points

+
+
  • Follow standard Python style in your code.
  • +
  • Use docstrings to provide builtin help.
  • +
+
+
+
+
+ + +
+
+
+ +
+
+ + diff --git a/19-wrap.html b/19-wrap.html new file mode 100644 index 000000000..783f900a0 --- /dev/null +++ b/19-wrap.html @@ -0,0 +1,596 @@ + +Plotting and Programming in Python: Wrap-Up +
+ Plotting and Programming in Python +
+ +
+
+ + + + + +
+
+

Wrap-Up

+

Last updated on 2024-10-18

+ + + +
+ +
+ + + +
+

Overview

+
+
+
+
+

Questions

+
  • What have we learned?
  • +
  • What else is out there and where do I find it?
  • +
+
+
+
+
+
+

Objectives

+
  • Name and locate scientific Python community sites for software, +workshops, and help.
  • +
+
+
+
+
+

Leslie Lamport once said, “Writing is nature’s way of showing you how +sloppy your thinking is.” The same is true of programming: many things +that seem obvious when we’re thinking about them turn out to be anything +but when we have to explain them precisely.

+

Python supports a large and diverse community across academia and +industry.

+
+
+ +
+
+

Key Points

+
+
  • Python supports a large and diverse community across academia and +industry.
  • +
+
+
+
+
+ + +
+
+
+ +
+
+ + diff --git a/20-feedback.html b/20-feedback.html new file mode 100644 index 000000000..7658ac3b0 --- /dev/null +++ b/20-feedback.html @@ -0,0 +1,567 @@ + +Plotting and Programming in Python: Feedback +
+ Plotting and Programming in Python +
+ +
+
+ + + + + +
+
+

Feedback

+

Last updated on 2024-10-18

+ + + +
+ +
+ + + +
+

Overview

+
+
+
+
+

Questions

+
  • How did the class go?
  • +
+
+
+
+
+
+

Objectives

+
  • Gather feedback on the class
  • +
+
+
+
+
+

Gather feedback from participants.

+
+
+ +
+
+

Key Points

+
+
  • We are constantly seeking to improve this course.
  • +
+
+
+ + + +
+
+ + +
+
+
+ +
+
+ + diff --git a/404.html b/404.html new file mode 100644 index 000000000..e668c9f57 --- /dev/null +++ b/404.html @@ -0,0 +1,546 @@ + +Plotting and Programming in Python: Page not found +
+ Plotting and Programming in Python +
+ +
+
+ + + + + +
+
+

Page not found

+ +

Our apologies!

+

We cannot seem to find the page you are looking for. Here are some +tips that may help:

+
  1. try going back to the previous +page or
  2. +
  3. navigate to any other page using the navigation bar on the +left.
  4. +
  5. if the URL ends with /index.html, try removing +that.
  6. +
  7. head over to the home page of this +lesson +
  8. +

If you came here from a link in this lesson, please contact the +lesson maintainers using the links at the foot of this page.

+
+
+ + +
+
+
+ +
+
+ + diff --git a/CODE_OF_CONDUCT.html b/CODE_OF_CONDUCT.html new file mode 100644 index 000000000..397c32ae4 --- /dev/null +++ b/CODE_OF_CONDUCT.html @@ -0,0 +1,536 @@ + +Plotting and Programming in Python: Contributor Code of Conduct +
+ Plotting and Programming in Python +
+ +
+
+ + + + + +
+
+

Contributor Code of Conduct

+

Last updated on 2024-10-18

+ + + +
+ +
+ + + +

As contributors and maintainers of this project, we pledge to follow +the The +Carpentries Code of Conduct.

+

Instances of abusive, harassing, or otherwise unacceptable behavior +may be reported by following our reporting +guidelines.

+ + + +
+
+ + +
+
+
+ +
+
+ + diff --git a/LICENSE.html b/LICENSE.html new file mode 100644 index 000000000..5108006e9 --- /dev/null +++ b/LICENSE.html @@ -0,0 +1,584 @@ + +Plotting and Programming in Python: Licenses +
+ Plotting and Programming in Python +
+ +
+
+ + + + + +
+
+

Licenses

+

Last updated on 2024-10-18

+ + + +
+ +
+ + + +

Instructional Material

+

All Carpentries (Software Carpentry, Data Carpentry, and Library +Carpentry) instructional material is made available under the Creative Commons +Attribution license. The following is a human-readable summary of +(and not a substitute for) the full legal +text of the CC BY 4.0 license.

+

You are free:

+
  • to Share—copy and redistribute the material in any +medium or format
  • +
  • to Adapt—remix, transform, and build upon the +material
  • +

for any purpose, even commercially.

+

The licensor cannot revoke these freedoms as long as you follow the +license terms.

+

Under the following terms:

+
  • Attribution—You must give appropriate credit +(mentioning that your work is derived from work that is Copyright (c) +The Carpentries and, where practical, linking to https://carpentries.org/), provide a link to the +license, and indicate if changes were made. You may do so in any +reasonable manner, but not in any way that suggests the licensor +endorses you or your use.

  • +
  • No additional restrictions—You may not apply +legal terms or technological measures that legally restrict others from +doing anything the license permits. With the understanding +that:

  • +

Notices:

+
  • You do not have to comply with the license for elements of the +material in the public domain or where your use is permitted by an +applicable exception or limitation.
  • +
  • No warranties are given. The license may not give you all of the +permissions necessary for your intended use. For example, other rights +such as publicity, privacy, or moral rights may limit how you use the +material.
  • +

Software

+

Except where otherwise noted, the example programs and other software +provided by The Carpentries are made available under the OSI-approved MIT +license.

+

Permission is hereby granted, free of charge, to any person obtaining +a copy of this software and associated documentation files (the +“Software”), to deal in the Software without restriction, including +without limitation the rights to use, copy, modify, merge, publish, +distribute, sublicense, and/or sell copies of the Software, and to +permit persons to whom the Software is furnished to do so, subject to +the following conditions:

+

The above copyright notice and this permission notice shall be +included in all copies or substantial portions of the Software.

+

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, +EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF +MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. +IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY +CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, +TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE +SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

+

Trademark

+

“The Carpentries”, “Software Carpentry”, “Data Carpentry”, and +“Library Carpentry” and their respective logos are registered trademarks +of Community Initiatives.

+
+
+ + +
+
+
+ +
+
+ + diff --git a/aio.html b/aio.html new file mode 100644 index 000000000..863419037 --- /dev/null +++ b/aio.html @@ -0,0 +1,9917 @@ + + + + + +Plotting and Programming in Python: All in One View + + + + + + + + + + + + +
+ Plotting and Programming in Python +
+ +
+
+ + + + + + +
+
+ + +

Content from Running and Quitting

+
+

Last updated on 2024-10-18

+
+ +
+
+

Overview

+
+
+
+
+

Questions

+
    +
  • How can I run Python programs?
  • +
+
+
+
+
+
+
+

Objectives

+
    +
  • Launch the JupyterLab server.
  • +
  • Create a new Python script.
  • +
  • Create a Jupyter notebook.
  • +
  • Shut down the JupyterLab server.
  • +
  • Understand the difference between a Python script and a Jupyter +notebook.
  • +
  • Create Markdown cells in a notebook.
  • +
  • Create and run Python cells in a notebook.
  • +
+
+
+
+
+
+

To run Python, we are going to use Jupyter Notebooks via JupyterLab for +the remainder of this workshop. Jupyter notebooks are common in data +science and visualization and serve as a convenient common-denominator +experience for running Python code interactively where we can easily +view and share the results of our Python code.

+

There are other ways of editing, managing, and running code. Software +developers often use an integrated development environment (IDE) like PyCharm or Visual Studio Code, or text +editors like Vim or Emacs, to create and edit their Python programs. +After editing and saving your Python programs you can execute those +programs within the IDE itself or directly on the command line. In +contrast, Jupyter notebooks let us execute and view the results of our +Python code immediately within the notebook.

+

JupyterLab has several other handy features:

+
    +
  • You can easily type, edit, and copy and paste blocks of code.
  • +
  • Tab complete allows you to easily access the names of things you are +using and learn more about them.
  • +
  • It allows you to annotate your code with links, different sized +text, bullets, etc. to make it more accessible to you and your +collaborators.
  • +
  • It allows you to display figures next to the code that produces them +to tell a complete story of the analysis.
  • +
+

Each notebook contains one or more cells that contain code, text, or +images.

+

Getting Started with JupyterLab +

+
+

JupyterLab is an application server with a web user interface from Project Jupyter that enables one to work +with documents and activities such as Jupyter notebooks, text editors, +terminals, and even custom components in a flexible, integrated, and +extensible manner. JupyterLab requires a reasonably up-to-date browser +(ideally a current version of Chrome, Safari, or Firefox); Internet +Explorer versions 9 and below are not supported.

+

JupyterLab is included as part of the Anaconda Python distribution. +If you have not already installed the Anaconda Python distribution, see +the setup instructions for installation +instructions.

+

In this lesson we will run JupyterLab locally on our own machines so +it will not require an internet connection besides the initial +connection to download and install Anaconda and JupyterLab.

+
    +
  • Start the JupyterLab server on your machine
  • +
  • Use a web browser to open a special localhost URL that connects to +your JupyterLab server
  • +
  • The JupyterLab server does the work and the web browser renders the +result
  • +
  • Type code into the browser and see the results after your JupyterLab +server has finished executing your code
  • +
+
+
+ +
+
+

JupyterLab? What about Jupyter notebooks?

+
+

JupyterLab is the next +stage in the evolution of the Jupyter Notebook. If you have prior +experience working with Jupyter notebooks, then you will have a good +idea of what to expect from JupyterLab.

+

Experienced users of Jupyter notebooks interested in a more detailed +discussion of the similarities and differences between the JupyterLab +and Jupyter notebook user interfaces can find more information in the JupyterLab +user interface documentation.

+
+
+
+

Starting JupyterLab +

+
+

You can start the JupyterLab server through the command line or +through an application called Anaconda Navigator. Anaconda +Navigator is included as part of the Anaconda Python distribution.

+
+

macOS - Command Line +

+

To start the JupyterLab server you will need to access the command +line through the Terminal. There are two ways to open Terminal on +Mac.

+
    +
  1. In your Applications folder, open Utilities and double-click on +Terminal
  2. +
  3. Press Command + spacebar to launch Spotlight. +Type Terminal and then double-click the search result or +hit Enter +
  4. +
+

After you have launched Terminal, type the command to launch the +JupyterLab server.

+
+

BASH +

+
$ jupyter lab
+
+
+
+

Windows Users - Command Line +

+

To start the JupyterLab server you will need to access the Anaconda +Prompt.

+

Press Windows Logo Key and search for +Anaconda Prompt, click the result or press enter.

+

After you have launched the Anaconda Prompt, type the command:

+
+

BASH +

+
$ jupyter lab
+
+
+
+

Anaconda Navigator +

+

To start a JupyterLab server from Anaconda Navigator you must first +start +Anaconda Navigator (click for detailed instructions on macOS, Windows, +and Linux). You can search for Anaconda Navigator via Spotlight on +macOS (Command + spacebar), the Windows search +function (Windows Logo Key) or opening a terminal shell and +executing the anaconda-navigator executable from the +command line.

+

After you have launched Anaconda Navigator, click the +Launch button under JupyterLab. You may need to scroll down +to find it.

+

Here is a screenshot of an Anaconda Navigator page similar to the one +that should open on either macOS or Windows.

+

+Anaconda Navigator landing page

+

And here is a screenshot of a JupyterLab landing page that should be +similar to the one that opens in your default web browser after starting +the JupyterLab server on either macOS or Windows.

+

+JupyterLab landing page

+
+

The JupyterLab Interface +

+
+

JupyterLab has many features found in traditional integrated +development environments (IDEs) but is focused on providing flexible +building blocks for interactive, exploratory computing.

+

The JupyterLab +Interface consists of the Menu Bar, a collapsible Left Side Bar, and +the Main Work Area which contains tabs of documents and activities.

+
+ +

The Menu Bar at the top of JupyterLab has the top-level menus that +expose various actions available in JupyterLab along with their keyboard +shortcuts (where applicable). The following menus are included by +default.

+
    +
  • +File: Actions related to files and directories such +as New, Open, Close, Save, etc. The +File menu also includes the Shut Down action used to +shut down the JupyterLab server.
  • +
  • +Edit: Actions related to editing documents and +other activities such as Undo, Cut, Copy, +Paste, etc.
  • +
  • +View: Actions that alter the appearance of +JupyterLab.
  • +
  • +Run: Actions for running code in different +activities such as notebooks and code consoles (discussed below).
  • +
  • +Kernel: Actions for managing kernels. Kernels in +Jupyter will be explained in more detail below.
  • +
  • +Tabs: A list of the open documents and activities +in the main work area.
  • +
  • +Settings: Common JupyterLab settings can be +configured using this menu. There is also an Advanced Settings +Editor option in the dropdown menu that provides more fine-grained +control of JupyterLab settings and configuration options.
  • +
  • +Help: A list of JupyterLab and kernel help +links.
  • +
+
+
+ +
+
+

Kernels

+
+

The JupyterLab docs +define kernels as “separate processes started by the server that runs +your code in different programming languages and environments.” When we +open a Jupyter Notebook, that starts a kernel - a process - that is +going to run the code. In this lesson, we’ll be using the Jupyter +ipython kernel which lets us run Python 3 code interactively.

+

Using other Jupyter kernels +for other programming languages would let us write and execute code +in other programming languages in the same JupyterLab interface, like R, +Java, Julia, Ruby, JavaScript, Fortran, etc.

+
+
+
+

A screenshot of the default Menu Bar is provided below.

+

+JupyterLab Menu Bar

+
+
+ +

The left sidebar contains a number of commonly used tabs, such as a +file browser (showing the contents of the directory where the JupyterLab +server was launched), a list of running kernels and terminals, the +command palette, and a list of open tabs in the main work area. A +screenshot of the default Left Side Bar is provided below.

+

+JupyterLab Left Side Bar

+

The left sidebar can be collapsed or expanded by selecting “Show Left +Sidebar” in the View menu or by clicking on the active sidebar tab.

+
+
+

Main Work Area +

+

The main work area in JupyterLab enables you to arrange documents +(notebooks, text files, etc.) and other activities (terminals, code +consoles, etc.) into panels of tabs that can be resized or subdivided. A +screenshot of the default Main Work Area is provided below.

+

If you do not see the Launcher tab, click the blue plus sign under +the “File” and “Edit” menus and it will appear.

+

+JupyterLab Main Work Area

+

Drag a tab to the center of a tab panel to move the tab to the panel. +Subdivide a tab panel by dragging a tab to the left, right, top, or +bottom of the panel. The work area has a single current activity. The +tab for the current activity is marked with a colored top border (blue +by default).

+
+

Creating a Python script +

+
+
    +
  • To start writing a new Python program click the Text File icon under +the Other header in the Launcher tab of the Main Work Area. +
      +
    • You can also create a new plain text file by selecting the New +-> Text File from the File menu in the Menu Bar.
    • +
    +
  • +
  • To convert this plain text file to a Python program, select the +Save File As action from the File menu in the Menu Bar +and give your new text file a name that ends with the .py +extension. +
      +
    • The .py extension lets everyone (including the +operating system) know that this text file is a Python program.
    • +
    • This is convention, not a requirement.
    • +
    +
  • +

Creating a Jupyter Notebook +

+
+

To open a new notebook click the Python 3 icon under the +Notebook header in the Launcher tab in the main work area. You +can also create a new notebook by selecting New -> Notebook +from the File menu in the Menu Bar.

+

Additional notes on Jupyter notebooks.

+
    +
  • Notebook files have the extension .ipynb to distinguish +them from plain-text Python programs.
  • +
  • Notebooks can be exported as Python scripts that can be run from the +command line.
  • +
+

Below is a screenshot of a Jupyter notebook running inside +JupyterLab. If you are interested in more details, then see the official +notebook documentation.

+

+Example Jupyter Notebook

+
+
+ +
+
+

How It’s Stored

+
+
    +
  • The notebook file is stored in a format called JSON.
  • +
  • Just like a webpage, what’s saved looks different from what you see +in your browser.
  • +
  • But this format allows Jupyter to mix source code, text, and images, +all in one file.
  • +
+
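To make the JSON point concrete, here is a minimal sketch of what a notebook file might contain. The dictionary below is a hand-written, simplified stand-in (the heading text and the `print('hello')` cell are made up for illustration); real `.ipynb` files carry more metadata than this.

```python
import json

# A simplified, hand-written stand-in for the contents of an .ipynb file.
# Real notebooks hold more metadata; this only shows the general shape.
notebook = {
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# My analysis"],
        },
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["print('hello')"],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}

# This is roughly the text that would be written to disk.
text = json.dumps(notebook, indent=1)
print(text)

# Reading the text back recovers the same structure, cell by cell.
loaded = json.loads(text)
cell_types = [cell["cell_type"] for cell in loaded["cells"]]
print(cell_types)
```

Because code cells and text cells are just entries in one list, a single file can hold both, which is why the saved text looks so different from the rendered notebook.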
+
+
+
+
+ +
+
+

Arranging Documents into Panels of Tabs

+
+

In the JupyterLab Main Work Area you can arrange documents into +panels of tabs. Here is an example from the official +documentation.

+

+Multi-panel JupyterLab

+

First, create a text file, Python console, and terminal window and +arrange them into three panels in the main work area. Next, create a +notebook, terminal window, and text file and arrange them into three +panels in the main work area. Finally, create your own combination of +panels and tabs. What combination of panels and tabs do you think will +be most useful for your workflow?

+
+
+
+
+
+ +
+
+

After creating the necessary tabs, you can drag one of the tabs to +the center of a panel to move the tab to the panel; next you can +subdivide a tab panel by dragging a tab to the left, right, top, or +bottom of the panel.

+
+
+
+
+
+
+ +
+
+

Code vs. Text

+
+

Jupyter mixes code and text in different types of blocks, called +cells. We often use the term “code” to mean “the source code of software +written in a language such as Python”. A “code cell” in a Notebook is a +cell that contains software; a “text cell” is one that contains ordinary +prose written for human beings.

+
+
+
+

The Notebook has Command and Edit modes. +

+
+
    +
  • If you press Esc and Return alternately, the +outer border of your code cell will change from gray to blue.
  • +
  • These are the Command (gray) and +Edit (blue) modes of your notebook.
  • +
  • Command mode allows you to edit notebook-level features, and Edit +mode changes the content of cells.
  • +
  • When in Command mode (esc/gray), +
      +
    • The b key will make a new cell below the currently +selected cell.
    • +
    • The a key will make one above.
    • +
    • The x key will delete the current cell.
    • +
    • The z key will undo your last cell operation (which could +be a deletion, creation, etc).
    • +
    +
  • +
  • All actions can be done using the menus, but there are lots of +keyboard shortcuts to speed things up.
  • +
+
+
+ +
+
+

Command Vs. Edit

+
+

In the Jupyter notebook page are you currently in Command or Edit +mode?
+Switch between the modes. Use the shortcuts to generate a new cell. Use +the shortcuts to delete a cell. Use the shortcuts to undo the last cell +operation you performed.

+
+
+
+
+
+ +
+
+

Command mode has a gray border and Edit mode has a blue border. Use +Esc and Return to switch between modes. To create a new cell, make sure +you are in Command mode (press Esc if your cell is blue) and type +b or a. To delete a cell, stay in Command mode and type x. To undo the +last cell operation, stay in Command mode and type z.

+
+
+
+
+
+

Use the keyboard and mouse to select and edit cells. +

+
    +
  • Pressing the Return key turns the border blue and engages +Edit mode, which allows you to type within the cell.
  • +
  • Because we want to be able to write many lines of code in a single +cell, pressing the Return key when in Edit mode (blue) moves +the cursor to the next line in the cell just like in a text editor.
  • +
  • We need some other way to tell the Notebook we want to run what’s in +the cell.
  • +
  • Pressing Shift+Return together will execute +the contents of the cell.
  • +
  • Notice that the Return and Shift keys on the +right of the keyboard are right next to each other.
  • +
+
+
+

The Notebook will turn Markdown into pretty-printed +documentation. +

+
    +
  • Notebooks can also render Markdown. +
      +
    • A simple plain-text format for writing lists, links, and other +things that might go into a web page.
    • +
    • Equivalently, a subset of HTML that looks like what you’d send in an +old-fashioned email.
    • +
    +
  • +
  • Turn the current cell into a Markdown cell by entering the Command +mode (Esc/gray) and press the M key.
  • +
  • +In [ ]: will disappear to show it is no longer a code +cell and you will be able to write in Markdown.
  • +
  • Turn the current cell into a Code cell by entering the Command mode +(Esc/gray) and press the y key.
  • +
+
+
+

Markdown does most of what HTML does. +

+ + ++++ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Showing some markdown syntax and its rendered output.
Markdown codeRendered output
*   Use asterisks
+*   to create
+*   bullet lists.
+

+

+
    +
  • Use asterisks
  • +
  • to create
  • +
  • bullet lists.
  • +
+
1.   Use numbers
+1.   to create
+1.   numbered lists.
+

+

+
    +
  1. Use numbers
  2. +
  3. to create
  4. +
  5. numbered lists.
  6. +
+
*  You can use indents
+  *  To create sublists
+  *  of the same type
+*  Or sublists
+  1. Of different
+  1. types
+

+

+
    +
  • You can use indents +
      +
    • To create sublists
    • +
    • of the same type
    • +
    +
  • +
  • Or sublists +
      +
    1. Of different
    2. +
    3. types
    4. +
    +
  • +
+
# A Level-1 Heading
+

+

+

A Level-1 Heading

+
## A Level-2 Heading (etc.)
+

+

+

A Level-2 Heading (etc.)

+
Line breaks
+don't matter.
+
+But blank lines
+create new paragraphs.
+

+

+

Line breaks don’t matter.

+

But blank lines create new paragraphs.

+
[Links](http://software-carpentry.org)
+are created with `[...](...)`.
+Or use [named links][data-carp].
+
+[data-carp]: http://datacarpentry.org
+

+

+

Links are created with +[...](...). Or use named links.

+
+
+
+ +
+
+

Creating Lists in Markdown

+
+

Create a nested list in a Markdown cell in a notebook that looks like +this:

+
    +
  1. Get funding.
  2. +
  3. Do work.
  4. +
+
    +
  • Design experiment.
  • +
  • Collect data.
  • +
  • Analyze.
  • +
+
    +
  1. Write up.
  2. +
  3. Publish.
  4. +
+
+
+
+
+
+ +
+
+

This challenge integrates both the numbered list and bullet list. +Note that the bullet list is indented 4 spaces so that it is in line with +the items of the numbered list.

+
1.  Get funding.
+2.  Do work.
+    *   Design experiment.
+    *   Collect data.
+    *   Analyze.
+3.  Write up.
+4.  Publish.
+
+
+
+
+
+
+ +
+
+

More Math

+
+

What is displayed when a Python cell in a notebook that contains +several calculations is executed? For example, what happens when this +cell is executed?

+
+

PYTHON +

+
7 * 3
+2 + 1
+
+
+
+
+
+
+ +
+
+

The notebook displays only the result of the last calculation in the cell; the result of 7 * 3 is computed but not shown.

+
+

PYTHON +

+
3
+
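If you want to see every result, one option is to store each calculation in a variable and print it explicitly. This is a small sketch; the names `first` and `second` are made up for illustration.

```python
# Store each result so that neither calculation is lost.
first = 7 * 3
second = 2 + 1

# print displays a value no matter where the line sits in the cell.
print(first)
print(second)
```

With explicit `print` calls, both values appear in the cell output instead of only the last one.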
+
+
+
+
+
+
+ +
+
+

Change an Existing Cell from Code to Markdown

+
+

What happens if you write some Python in a code cell and then you +switch it to a Markdown cell? For example, put the following in a code +cell:

+
+

PYTHON +

+
x = 6 * 7 + 12
+print(x)
+
+

And then run it with Shift+Return to be sure +that it works as a code cell. Now go back to the cell and use +Esc then m to switch the cell to Markdown and +“run” it with Shift+Return. What happened and how +might this be useful?

+
+
+
+
+
+ +
+
+

The Python code gets treated like Markdown text. The lines appear as +if they are part of one contiguous paragraph. This could be useful to +temporarily turn on and off cells in notebooks that get used for +multiple purposes.

+
+

PYTHON +

+
x = 6 * 7 + 12 print(x)
+
+
+
+
+
+
+
+ +
+
+

Equations

+
+

Standard Markdown (such as we’re using for these notes) won’t render +equations, but the Notebook will. Create a new Markdown cell and enter +the following:

+
$\sum_{i=1}^{N} 2^{-i} \approx 1$
+

(It’s probably easier to copy and paste.) What does it display? What +do you think the underscore, _, circumflex, ^, +and dollar sign, $, do?

+
+
+
+
+
+ +
+
+

The notebook shows the equation as it would be rendered from LaTeX +equation syntax. The dollar sign, $, is used to tell +Markdown that the text in between is a LaTeX equation. If you’re not +familiar with LaTeX, underscore, _, is used for subscripts +and circumflex, ^, is used for superscripts. A pair of +curly braces, { and }, is used to group text +together so that the statement i=1 becomes the subscript +and N becomes the superscript. Similarly, -i +is in curly braces to make the whole statement the superscript for +2. \sum and \approx are LaTeX +commands for “sum over” and “approximate” symbols.
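As a quick numerical check of what the equation claims, the sum can be evaluated in a code cell. This is only a sketch: the choice `N = 10` is arbitrary, and it uses `sum` and `range`, which are covered later in the lesson.

```python
# Add up 2**-1 + 2**-2 + ... + 2**-N for a chosen N.
N = 10
total = sum(2 ** -i for i in range(1, N + 1))

# The total gets closer to 1 as N grows, but never reaches it.
print(total)
print(1 - total)
```

Trying a larger `N` shows the gap between the total and 1 shrinking, which is what the "approximately equal" symbol in the equation expresses.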

+
+
+
+
+
+

Closing JupyterLab +

+
+
    +
  • From the Menu Bar select the “File” menu and then choose “Shut Down” +at the bottom of the dropdown menu. You will be prompted to confirm that +you wish to shut down the JupyterLab server (don’t forget to save your +work!). Click “Shut Down” to shut down the JupyterLab server.
  • +
  • To restart the JupyterLab server you will need to re-run the +following command from a shell.
  • +
+
$ jupyter lab
+
+
+ +
+
+

Closing JupyterLab

+
+

Practice closing and restarting the JupyterLab server.

+
+
+
+
+
+ +
+
+

Key Points

+
+
    +
  • Python scripts are plain text files.
  • +
  • Use the Jupyter Notebook for editing and running Python.
  • +
  • The Notebook has Command and Edit modes.
  • +
  • Use the keyboard and mouse to select and edit cells.
  • +
  • The Notebook will turn Markdown into pretty-printed +documentation.
  • +
  • Markdown does most of what HTML does.
  • +
+
+
+
+

Content from Variables and Assignment

+
+

Last updated on 2024-10-18

+
+ +
+
+

Overview

+
+
+
+
+

Questions

+
    +
  • How can I store data in programs?
  • +
+
+
+
+
+
+
+

Objectives

+
    +
  • Write programs that assign scalar values to variables and perform +calculations with those values.
  • +
  • Correctly trace value changes in programs that use scalar +assignment.
  • +
+
+
+
+
+
+

Use variables to store values. +

+
+
    +
  • Variables are names for values.

  • +
  • +

    Variable names

    +
      +
    • can only contain letters, digits, and underscore +_ (typically used to separate words in long variable +names)
    • +
    • cannot start with a digit
    • +
    • are case sensitive (age, Age and AGE are three +different variables)
    • +
    +
  • +
  • The name should also be meaningful so that you or another programmer +knows what it is.

  • +
  • Variable names that start with underscores like +__alistairs_real_age have a special meaning so we won’t do +that until we understand the convention.

  • +
  • In Python the = symbol assigns the value on the +right to the name on the left.

  • +
  • The variable is created when a value is assigned to it.

  • +
  • +

    Here, Python assigns an age to a variable age and a +name in quotes to a variable first_name.

    +
    +

    PYTHON +

    +
    age = 42
    +first_name = 'Ahmed'
    +
    +
  • +

Use print to display values. +

+
+
    +
  • Python has a built-in function called print that prints +things as text.
  • +
  • Call the function (i.e., tell Python to run it) by using its +name.
  • +
  • Provide values to the function (i.e., the things to print) in +parentheses.
  • +
  • To add a string to the printout, wrap the string in single or double +quotes.
  • +
  • The values passed to the function are called +arguments +
  • +
+
+

PYTHON +

+
print(first_name, 'is', age, 'years old')
+
+
+

OUTPUT +

+
Ahmed is 42 years old
+
+
    +
  • +print automatically puts a single space between items +to separate them.
  • +
  • And wraps around to a new line at the end.
  • +
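The space between items and the new line at the end are defaults that can be changed. The sketch below uses the `sep` and `end` keyword arguments of `print`, which this lesson does not otherwise cover, so treat it as an optional aside.

```python
first_name = 'Ahmed'
age = 42

# Default behaviour: one space between items, new line at the end.
print(first_name, 'is', age, 'years old')

# sep replaces the space that print places between items.
print(first_name, age, sep=', ')

# end replaces the final new line, so the next print continues the same line.
print(first_name, end=' ')
print('is', age)
```

The last two calls together produce a single line of output, because the first of them ends with a space rather than a new line.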

Variables must be created before they are used. +

+
+
    +
  • If a variable doesn’t exist yet, or if the name has been +misspelled, Python reports an error. (Unlike some languages, which +“guess” a default value.)
  • +
+
+

PYTHON +

+
print(last_name)
+
+
+

ERROR +

+
---------------------------------------------------------------------------
+NameError                                 Traceback (most recent call last)
+<ipython-input-1-c1fbb4e96102> in <module>()
+----> 1 print(last_name)
+
+NameError: name 'last_name' is not defined
+
+
    +
  • The last line of an error message is usually the most +informative.
  • +
  • We will look at error messages in detail later.
  • +
+
+
+ +
+
+

Variables Persist Between Cells

+
+

Be aware that it is the order of execution of cells that is +important in a Jupyter notebook, not the order in which they appear. +Python will remember all the code that was run previously, +including any variables you have defined, irrespective of the order in +the notebook. Therefore if you define variables lower down the notebook +and then (re)run cells further up, those defined further down will still +be present. As an example, create two cells with the following content, +in this order:

+
+

PYTHON +

+
print(myval)
+
+
+

PYTHON +

+
myval = 1
+
+

If you execute this in order, the first cell will give an error. +However, if you run the first cell after the second cell it +will print out 1. To prevent confusion, it can be helpful +to use the Kernel -> Restart & Run All +option which clears the interpreter and runs everything from a clean +slate going top to bottom.

+
+
+
+

Variables can be used in calculations. +

+
+
    +
  • We can use variables in calculations just as if they were values. +
      +
    • Remember, we assigned the value 42 to age +a few lines ago.
    • +
    +
  • +
+
+

PYTHON +

+
age = age + 3
+print('Age in three years:', age)
+
+
+

OUTPUT +

+
Age in three years: 45
+
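It can help to trace the value before and after the assignment. In this sketch, Python first evaluates the right-hand side using the current value of `age`, and only then stores the result back under the same name.

```python
age = 42
print('Age now:', age)

# The right-hand side (42 + 3) is worked out first,
# then the result replaces the old value of age.
age = age + 3
print('Age in three years:', age)
```

After the second assignment the old value is gone; `age` now refers to 45.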
+

Use an index to get a single character from a string. +

+
+
    +
  • The characters (individual letters, numbers, and so on) in a string +are ordered. For example, the string 'AB' is not the same +as 'BA'. Because of this ordering, we can treat the string +as a list of characters.
  • +
  • Each position in the string (first, second, etc.) is given a number. +This number is called an index or sometimes a +subscript.
  • +
  • Indices are numbered from 0.
  • +
  • Use the position’s index in square brackets to get the character at +that position.
  • +
+
A line of Python code, print(atom_name[0]), +demonstrates that using the zero index will output just the initial +letter, in this case ‘h’ for helium.
+
+

PYTHON +

+
atom_name = 'helium'
+print(atom_name[0])
+
+
+

OUTPUT +

+
h
+
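The same idea works for any position in the string. This short sketch picks out a few more characters; the variable names `first`, `third`, and `last` are made up for illustration.

```python
atom_name = 'helium'

# Positions are counted from 0, so index 2 is the third character.
first = atom_name[0]
third = atom_name[2]

# 'helium' has six characters, so the last one sits at index 5.
last = atom_name[5]

print(first, third, last)
```

Note that the index of the last character is one less than the number of characters, because counting starts at 0.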
+

Use a slice to get a substring. +

+
+
    +
  • A part of a string is called a substring. A +substring can be as short as a single character.
  • +
  • An item in a list is called an element. Whenever we treat a string +as if it were a list, the string’s elements are its individual +characters.
  • +
  • A slice is a part of a string (or, more generally, a part of any +list-like thing).
  • +
  • We take a slice with the notation [start:stop], where +start is the integer index of the first element we want and +stop is the integer index of the element just +after the last element we want.
  • +
  • The difference between stop and start is +the slice’s length.
  • +
  • Taking a slice does not change the contents of the original string. +Instead, taking a slice returns a copy of part of the original +string.
  • +
+
+

PYTHON +

+
atom_name = 'sodium'
+print(atom_name[0:3])
+
+
+

OUTPUT +

+
sod
+
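The points above can be checked directly. In this sketch the names `start`, `stop`, `piece`, and `middle` are made up for illustration.

```python
atom_name = 'sodium'
start = 0
stop = 3

# The slice includes index 0, 1, and 2, but not index 3.
piece = atom_name[start:stop]
print(piece)

# stop - start gives the number of characters in the slice.
print(stop - start)

# A slice does not have to begin at the start of the string.
middle = atom_name[2:5]
print(middle)

# The original string is unchanged by slicing.
print(atom_name)
```

Because slicing returns a new string, `atom_name` still holds the full word after both slices are taken.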
+

Use the built-in function len to find the length of a +string. +

+
+
+

PYTHON +

+
print(len('helium'))
+
+
+

OUTPUT +

+
6
+
+
    +
  • Nested functions are evaluated from the inside out, like in +mathematics.
  • +
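To see the inside-out order, the nested call can be split into separate steps. This is a small sketch; the names `element` and `length` are made up for illustration.

```python
element = 'helium'

# Step by step: first find the length, then print it.
length = len(element)
print(length)

# Nested: len(...) is evaluated first and its result is handed to print.
print(len(element))

# The inner expression can itself be a slice.
print(len(element[0:3]))
```

Both forms print the same value; the nested form simply skips the intermediate variable.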

Python is case-sensitive. +

+
+
    +
  • Python thinks that upper- and lower-case letters are different, so +Name and name are different variables.
  • +
  • There are conventions for using upper-case letters at the start of +variable names so we will use lower-case letters for now.
  • +
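A short sketch makes the distinction visible. The second value, `'Fatima'`, is made up for illustration.

```python
# These are two separate variables, not two spellings of one variable.
name = 'Ahmed'
Name = 'Fatima'

print(name)
print(Name)
```

Each name keeps its own value, so mistyping the case of a variable either picks up a different variable or causes a `NameError` if no such variable exists.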

Use meaningful variable names. +

+
+
    +
  • Python doesn’t care what you call variables as long as they obey the +rules (alphanumeric characters and the underscore).
  • +
+
+

PYTHON +

+
flabadab = 42
+ewr_422_yY = 'Ahmed'
+print(ewr_422_yY, 'is', flabadab, 'years old')
+
+
    +
  • Use meaningful variable names to help other people understand what +the program does.
  • +
  • The most important “other person” is your future self.
  • +
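For comparison, here is one possible rewrite of the example above with descriptive names. The names `age` and `first_name` are only suggestions; any names that describe the values would do:

```python
age = 42
first_name = 'Ahmed'
print(first_name, 'is', age, 'years old')
```

The program prints the same text as before, but a reader no longer has to guess what each variable holds.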
+
+
+ +
+
+

Swapping Values

+
+

Fill the table showing the values of the variables in this program +after each statement is executed.

+
+

PYTHON +

+
# Command  # Value of x   # Value of y   # Value of swap #
+x = 1.0    #              #              #               #
+y = 3.0    #              #              #               #
+swap = x   #              #              #               #
+x = y      #              #              #               #
+y = swap   #              #              #               #
+
+
+
+
+
+
+ +
+
+
+

OUTPUT +

+
# Command  # Value of x   # Value of y   # Value of swap #
+x = 1.0    # 1.0          # not defined  # not defined   #
+y = 3.0    # 1.0          # 3.0          # not defined   #
+swap = x   # 1.0          # 3.0          # 1.0           #
+x = y      # 3.0          # 3.0          # 1.0           #
+y = swap   # 3.0          # 1.0          # 1.0           #
+
+

These three lines exchange the values in x and +y using the swap variable for temporary +storage. This is a fairly common programming idiom.
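The swap can be run as a short program to confirm the table. The second half is a sketch of an alternative that this lesson does not cover, tuple assignment, which swaps two values in one statement; the names `a` and `b` are hypothetical:

```python
x = 1.0
y = 3.0
swap = x      # keep the old value of x
x = y
y = swap
print(x, y)   # 3.0 1.0

# Alternative not used in this lesson: tuple assignment swaps in one step.
a = 1.0
b = 3.0
a, b = b, a
print(a, b)   # 3.0 1.0
```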

+
+
+
+
+
+
+ +
+
+

Predicting Values

+
+

What is the final value of position in the program +below? (Try to predict the value without running the program, then check +your prediction.)

+
+

PYTHON +

+
initial = 'left'
+position = initial
+initial = 'right'
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
print(position)
+
+
+

OUTPUT +

+
left
+
+

The initial variable is assigned the value +'left'. In the second line, the position +variable also receives the string value 'left'. In the third +line, the initial variable is given the value +'right', but the position variable retains its +string value of 'left'.

+
+
+
+
+
+
+ +
+
+

Challenge

+
+

If you assign a = 123, what happens if you try to get +the second digit of a via a[1]?

+
+
+
+
+
+ +
+
+

Numbers are not strings or sequences and Python will raise an error +if you try to perform an index operation on a number. In the next lesson on types and type +conversion we will learn more about types and how to convert between +different types. If you want the Nth digit of a number you can convert +it into a string using the str built-in function and then +perform an index operation on that string.

+
+

PYTHON +

+
a = 123
+print(a[1])
+
+
+

ERROR +

+
TypeError: 'int' object is not subscriptable
+
+
+

PYTHON +

+
a = str(123)
+print(a[1])
+
+
+

OUTPUT +

+
2
+
+
+
+
+
+
+
+ +
+
+

Choosing a Name

+
+

Which is a better variable name, m, min, or +minutes? Why? Hint: think about which code you would rather +inherit from someone who is leaving the lab:

+
    +
  1. ts = m * 60 + s
  2. +
  3. tot_sec = min * 60 + sec
  4. +
  5. total_seconds = minutes * 60 + seconds
  6. +
+
+
+
+
+
+ +
+
+

minutes is better because min might mean +something like “minimum” (and actually is an existing built-in function +in Python that we will cover later).

+
+
+
+
+
+
+ +
+
+

Slicing practice

+
+

What does the following program print?

+
+

PYTHON +

+
atom_name = 'carbon'
+print('atom_name[1:3] is:', atom_name[1:3])
+
+
+
+
+
+
+ +
+
+
+

OUTPUT +

+
atom_name[1:3] is: ar
+
+
+
+
+
+
+
+ +
+
+

Slicing concepts

+
+

Given the following string:

+
+

PYTHON +

+
species_name = "Acacia buxifolia"
+
+

What would these expressions return?

+
    +
  1. species_name[2:8]
  2. +
  3. +species_name[11:] (without a value after the +colon)
  4. +
  5. +species_name[:4] (without a value before the +colon)
  6. +
  7. +species_name[:] (just a colon)
  8. +
  9. species_name[11:-3]
  10. +
  11. species_name[-5:-3]
  12. +
  13. What happens when you choose a stop value which is out +of range? (i.e., try species_name[0:20] or +species_name[:103])
  14. +
+
+
+
+
+
+ +
+
+
    +
  1. +species_name[2:8] returns the substring +'acia b' +
  2. +
  3. +species_name[11:] returns the substring +'folia', from position 11 until the end
  4. +
  5. +species_name[:4] returns the substring +'Acac', from the start up to but not including position +4
  6. +
  7. +species_name[:] returns the entire string +'Acacia buxifolia' +
  8. +
  9. +species_name[11:-3] returns the substring +'fo', from the 11th position to the third last +position
  10. +
  11. +species_name[-5:-3] also returns the substring +'fo', from the fifth last position to the third last
  12. +
  13. If a part of the slice is out of range, the operation does not fail. +species_name[0:20] gives the same result as +species_name[0:], and species_name[:103] gives +the same result as species_name[:] +
  14. +
+
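The answers above can be checked by running the expressions directly. This is only a verification sketch; it adds nothing beyond the slices already listed:

```python
species_name = "Acacia buxifolia"

print(species_name[2:8])     # acia b
print(species_name[11:])     # folia
print(species_name[:4])      # Acac
print(species_name[:])       # Acacia buxifolia
print(species_name[11:-3])   # fo
print(species_name[-5:-3])   # fo
print(species_name[0:20])    # out-of-range stop: same as species_name[0:]
```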
+
+
+
+
+
+ +
+
+

Key Points

+
+
    +
  • Use variables to store values.
  • +
  • Use print to display values.
  • +
  • Variables persist between cells.
  • +
  • Variables must be created before they are used.
  • +
  • Variables can be used in calculations.
  • +
  • Use an index to get a single character from a string.
  • +
  • Use a slice to get a substring.
  • +
  • Use the built-in function len to find the length of a +string.
  • +
  • Python is case-sensitive.
  • +
  • Use meaningful variable names.
  • +
+
+
+
+

Content from Data Types and Type Conversion

+
+

Last updated on 2024-10-18

+
+ +
+
+

Overview

+
+
+
+
+

Questions

+
    +
  • What kinds of data do programs store?
  • +
  • How can I convert one type to another?
  • +
+
+
+
+
+
+
+

Objectives

+
    +
  • Explain key differences between integers and floating point +numbers.
  • +
  • Explain key differences between numbers and character strings.
  • +
  • Use built-in functions to convert between integers, floating point +numbers, and strings.
  • +
+
+
+
+
+
+

Every value has a type. +

+
+
    +
  • Every value in a program has a specific type.
  • +
  • Integer (int): represents positive or negative whole +numbers like 3 or -512.
  • +
  • Floating point number (float): represents real numbers +like 3.14159 or -2.5.
  • +
  • Character string (usually called “string”, str): text. +
      +
    • Written in either single quotes or double quotes (as long as they +match).
    • +
    • The quote marks aren’t printed when the string is displayed.
    • +
    +
  • +

Use the built-in function type to find the type of a +value. +

+
+
    +
  • Use the built-in function type to find out what type a +value has.
  • +
  • Works on variables as well. +
      +
    • But remember: the value has the type — the +variable is just a label.
    • +
    +
  • +
+
+

PYTHON +

+
print(type(52))
+
+
+

OUTPUT +

+
<class 'int'>
+
+
+

PYTHON +

+
fitness = 'average'
+print(type(fitness))
+
+
+

OUTPUT +

+
<class 'str'>
+
+
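To illustrate that the type belongs to the value rather than to the variable, the sketch below assigns two different kinds of value to the same name. The variable name `measurement` is hypothetical:

```python
measurement = 52
print(type(measurement))    # the label currently points at an int

measurement = 'fifty-two'
print(type(measurement))    # the same label now points at a str
```

The variable never had a type of its own; `type` reports on whatever value the name refers to at that moment.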

Types control what operations (or methods) can be performed on a +given value. +

+
+
    +
  • A value’s type determines what the program can do to it.
  • +
+
+

PYTHON +

+
print(5 - 3)
+
+
+

OUTPUT +

+
2
+
+
+

PYTHON +

+
print('hello' - 'h')
+
+
+

ERROR +

+
---------------------------------------------------------------------------
+TypeError                                 Traceback (most recent call last)
+<ipython-input-2-67f5626a1e07> in <module>()
+----> 1 print('hello' - 'h')
+
+TypeError: unsupported operand type(s) for -: 'str' and 'str'
+
+

You can use the “+” and “*” operators on strings. +

+
+
    +
  • “Adding” character strings concatenates them.
  • +
+
+

PYTHON +

+
full_name = 'Ahmed' + ' ' + 'Walsh'
+print(full_name)
+
+
+

OUTPUT +

+
Ahmed Walsh
+
+
    +
  • Multiplying a character string by an integer N creates a +new string that consists of that character string repeated N +times. +
      +
    • Since multiplication is repeated addition.
    • +
    +
  • +
+
+

PYTHON +

+
separator = '=' * 10
+print(separator)
+
+
+

OUTPUT +

+
==========
+
+

Strings have a length (but numbers don’t). +

+
+
    +
  • The built-in function len counts the number of +characters in a string.
  • +
+
+

PYTHON +

+
print(len(full_name))
+
+
+

OUTPUT +

+
11
+
+
    +
  • But numbers don’t have a length (not even zero).
  • +
+
+

PYTHON +

+
print(len(52))
+
+
+

ERROR +

+
---------------------------------------------------------------------------
+TypeError                                 Traceback (most recent call last)
+<ipython-input-3-f769e8e8097d> in <module>()
+----> 1 print(len(52))
+
+TypeError: object of type 'int' has no len()
+
+

Must convert numbers to strings or vice versa when operating on +them. +

+
+
    +
  • Cannot add numbers and strings.
  • +
+
+

PYTHON +

+
print(1 + '2')
+
+
+

ERROR +

+
---------------------------------------------------------------------------
+TypeError                                 Traceback (most recent call last)
+<ipython-input-4-fe4f54a023c6> in <module>()
+----> 1 print(1 + '2')
+
+TypeError: unsupported operand type(s) for +: 'int' and 'str'
+
+
    +
  • Not allowed because it’s ambiguous: should 1 + '2' be +3 or '12'?
  • +
  • Some types can be converted to other types by using the type name as +a function.
  • +
+
+

PYTHON +

+
print(1 + int('2'))
+print(str(1) + '2')
+
+
+

OUTPUT +

+
3
+12
+
+

Can mix integers and floats freely in operations. +

+
+
    +
  • Integers and floating-point numbers can be mixed in arithmetic. +
      +
    • Python 3 automatically converts integers to floats as needed.
    • +
    +
  • +
+
+

PYTHON +

+
print('half is', 1 / 2.0)
+print('three squared is', 3.0 ** 2)
+
+
+

OUTPUT +

+
half is 0.5
+three squared is 9.0
+
+

Variables only change value when something is assigned to them. +

+
+
    +
  • If we make one cell in a spreadsheet depend on another, and update +the latter, the former updates automatically.
  • +
  • This does not happen in programming languages.
  • +
+
+

PYTHON +

+
variable_one = 1
+variable_two = 5 * variable_one
+variable_one = 2
+print('first is', variable_one, 'and second is', variable_two)
+
+
+

OUTPUT +

+
first is 2 and second is 5
+
+
    +
  • The computer reads the value of variable_one when doing +the multiplication, creates a new value, and assigns it to +variable_two.
  • +
  • Afterwards, the value of variable_two is set to the new +value and not dependent on variable_one so its +value does not automatically change when variable_one +changes.
  • +
+
+
+ +
+
+

Fractions

+
+

What type of value is 3.4? How can you find out?

+
+
+
+
+
+ +
+
+

It is a floating-point number (often abbreviated “float”). It is +possible to find out by using the built-in function +type().

+
+

PYTHON +

+
print(type(3.4))
+
+
+

OUTPUT +

+
<class 'float'>
+
+
+
+
+
+
+
+ +
+
+

Automatic Type Conversion

+
+

What type of value is 3.25 + 4?

+
+
+
+
+
+ +
+
+

It is a float: integers are automatically converted to floats as +necessary.

+
+

PYTHON +

+
result = 3.25 + 4
+print(result, 'is', type(result))
+
+
+

OUTPUT +

+
7.25 is <class 'float'>
+
+
+
+
+
+
+
+ +
+
+

Choose a Type

+
+

What type of value (integer, floating point number, or character +string) would you use to represent each of the following? Try to come up +with more than one good answer for each problem. For example, in # 1, +when would counting days with a floating point variable make more sense +than using an integer?

+
    +
  1. Number of days since the start of the year.
  2. +
  3. Time elapsed from the start of the year until now in days.
  4. +
  5. Serial number of a piece of lab equipment.
  6. +
  7. A lab specimen’s age
  8. +
  9. Current population of a city.
  10. +
  11. Average population of a city over time.
  12. +
+
+
+
+
+
+ +
+
+

The answers to the questions are:

+
    +
  1. Integer, since the number of days would lie between 1 and 365.
  2. +
  3. Floating point, since fractional days are required
  4. +
  5. Character string if serial number contains letters and numbers, +otherwise integer if the serial number consists only of numerals
  6. +
  7. This will vary! How do you define a specimen’s age? Whole days since +collection (integer)? Date and time (string)?
  8. +
  9. Choose floating point to represent population as large aggregates +(e.g., millions), or integer to represent population in units of +individuals.
  10. +
  11. Floating point number, since an average is likely to have a +fractional part.
  12. +
+
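A few of these choices written as assignments may make the distinction concrete. All names and values below are made up for illustration:

```python
days_since_new_year = 45         # whole days: int
time_elapsed_days = 45.5         # fractional days: float
serial_number = 'XR-2041'        # letters and digits: str
population_millions = 1.3        # large aggregate: float
population_individuals = 1300000 # count of individuals: int

print(type(days_since_new_year), type(time_elapsed_days), type(serial_number))
```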
+
+
+
+
+
+ +
+
+

Division Types

+
+

In Python 3, the // operator performs integer +(whole-number) floor division, the / operator performs +floating-point division, and the % (or modulo) +operator calculates and returns the remainder from integer division:

+
+

PYTHON +

+
print('5 // 3:', 5 // 3)
+print('5 / 3:', 5 / 3)
+print('5 % 3:', 5 % 3)
+
+
+

OUTPUT +

+
5 // 3: 1
+5 / 3: 1.6666666666666667
+5 % 3: 2
+
+

If num_subjects is the number of subjects taking part in +a study, and num_per_survey is the number that can take +part in a single survey, write an expression that calculates the number +of surveys needed to reach everyone once.

+
+
+
+
+
+ +
+
+

We want the minimum number of surveys that reaches everyone once, +which is the rounded up value of +num_subjects/ num_per_survey. This is equivalent to +performing a floor division with // and adding 1. Before +the division we need to subtract 1 from the number of subjects to deal +with the case where num_subjects is evenly divisible by +num_per_survey.

+
+

PYTHON +

+
num_subjects = 600
+num_per_survey = 42
+num_surveys = (num_subjects - 1) // num_per_survey + 1
+
+print(num_subjects, 'subjects,', num_per_survey, 'per survey:', num_surveys)
+
+
+

OUTPUT +

+
600 subjects, 42 per survey: 15
+
+
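To see why the subtraction matters, consider a case where the subjects divide evenly among surveys. In this sketch the value 84 and the names `naive` and `correct` are chosen only for illustration:

```python
num_per_survey = 42
num_subjects = 84                                    # exactly two full surveys

naive = num_subjects // num_per_survey + 1           # counts one survey too many
correct = (num_subjects - 1) // num_per_survey + 1   # handles the even case

print('naive:', naive, 'correct:', correct)          # naive: 3 correct: 2
```

Without subtracting 1 first, the expression would schedule an empty extra survey whenever `num_subjects` is an exact multiple of `num_per_survey`.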
+
+
+
+
+
+ +
+
+

Strings to Numbers

+
+

Where reasonable, float() will convert a string to a +floating point number, and int() will convert a floating +point number to an integer:

+
+

PYTHON +

+
print("string to float:", float("3.4"))
+print("float to int:", int(3.4))
+
+
+

OUTPUT +

+
string to float: 3.4
+float to int: 3
+
+

If the conversion doesn’t make sense, however, Python will +raise an error.

+
+

PYTHON +

+
print("string to float:", float("Hello world!"))
+
+
+

ERROR +

+
---------------------------------------------------------------------------
+ValueError                                Traceback (most recent call last)
+<ipython-input-5-df3b790bf0a2> in <module>
+----> 1 print("string to float:", float("Hello world!"))
+
+ValueError: could not convert string to float: 'Hello world!'
+
+

Given this information, what do you expect the following program to +do?

+

What does it actually do?

+

Why do you think it does that?

+
+

PYTHON +

+
print("fractional string to int:", int("3.4"))
+
+
+
+
+
+
+ +
+
+

What do you expect this program to do? It would not be so +unreasonable to expect the Python 3 int function to convert +the string “3.4” to the float 3.4 and then, with an additional type conversion, to the integer 3. After +all, Python 3 performs a lot of other magic - isn’t that part of its +charm?

+
+

PYTHON +

+
int("3.4")
+
+
+

ERROR +

+
---------------------------------------------------------------------------
+ValueError                                Traceback (most recent call last)
+<ipython-input-2-ec6729dfccdc> in <module>
+----> 1 int("3.4")
+ValueError: invalid literal for int() with base 10: '3.4'
+
+

However, Python 3 raises an error. Why? To be consistent, possibly: +int only accepts strings that represent whole numbers. If you want Python to perform two consecutive type conversions, you must write +both explicitly in code.

+
+

PYTHON +

+
int(float("3.4"))
+
+
+

OUTPUT +

+
3
+
+
+
+
+
+
+
+ +
+
+

Arithmetic with Different Types

+
+

Which of the following will return the floating point number +2.0? Note: there may be more than one right answer.

+
+

PYTHON +

+
first = 1.0
+second = "1"
+third = "1.1"
+
+
    +
  1. first + float(second)
  2. +
  3. float(second) + float(third)
  4. +
  5. first + int(third)
  6. +
  7. first + int(float(third))
  8. +
  9. int(first) + int(float(third))
  10. +
  11. 2.0 * second
  12. +
+
+
+
+
+
+ +
+
+

Answer: 1 and 4
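The options that run without an error can be checked directly. In this sketch the `option_N` names are introduced only to label the results; options 3 and 6 are left out because `int("1.1")` raises a ValueError and `2.0 * "1"` raises a TypeError:

```python
first = 1.0
second = "1"
third = "1.1"

option_1 = first + float(second)            # float + float
option_2 = float(second) + float(third)     # a float, but not 2.0
option_4 = first + int(float(third))        # float + int gives a float
option_5 = int(first) + int(float(third))   # int + int gives an int

print(option_1, type(option_1))
print(option_2, type(option_2))
print(option_4, type(option_4))
print(option_5, type(option_5))
```

Option 5 has the right numeric value but the wrong type: it is the integer `2`, not the floating point number `2.0`.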

+
+
+
+
+
+
+ +
+
+

Complex Numbers

+
+

Python provides complex numbers, which are written as +1.0+2.0j. If val is a complex number, its real +and imaginary parts can be accessed using dot notation as +val.real and val.imag.

+
+

PYTHON +

+
a_complex_number = 6 + 2j
+print(a_complex_number.real)
+print(a_complex_number.imag)
+
+
+

OUTPUT +

+
6.0
+2.0
+
+
    +
  1. Why do you think Python uses j instead of +i for the imaginary part?
  2. +
  3. What do you expect 1 + 2j + 3 to produce?
  4. +
  5. What do you expect 4j to be? What about +4 j or 4 + j?
  6. +
+
+
+
+
+
+ +
+
+
    +
  1. Standard mathematical treatments typically use i to +denote the imaginary unit. However, Python's j reportedly follows an early +convention from electrical engineering (where i often denotes current) that would now be +technically expensive to change. Stack +Overflow provides additional explanation and discussion. +
  2. +
  3. (4+2j)
  4. +
  5. +4j, and SyntaxError: invalid syntax for 4 j. In +the last case, 4 + j, j is considered a variable and the +result depends on whether j is defined and, if so, on its +assigned value.
  6. +
+
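The cases that are valid Python can be tried out as follows. Here `j` is deliberately defined as an ordinary variable (a hypothetical value of 10) to show that `4 + j` has nothing to do with complex numbers; `4 j` is omitted because it is a syntax error and would stop the whole program from running:

```python
print(1 + 2j + 3)   # the real parts are added: (4+2j)
print(4j)           # a purely imaginary number: 4j

j = 10              # an ordinary variable that happens to be called j
print(4 + j)        # plain addition with that variable: 14
```

If `j` had not been defined first, the last line would raise a NameError instead.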
+
+
+
+
+
+ +
+
+

Key Points

+
+
    +
  • Every value has a type.
  • +
  • Use the built-in function type to find the type of a +value.
  • +
  • Types control what operations can be done on values.
  • +
  • Strings can be added and multiplied.
  • +
  • Strings have a length (but numbers don’t).
  • +
  • Must convert numbers to strings or vice versa when operating on +them.
  • +
  • Can mix integers and floats freely in operations.
  • +
  • Variables only change value when something is assigned to them.
  • +
+
+
+
+

Content from Built-in Functions and Help

+
+

Last updated on 2024-10-18

+
+ +
+
+

Overview

+
+
+
+
+

Questions

+
    +
  • How can I use built-in functions?
  • +
  • How can I find out what they do?
  • +
  • What kind of errors can occur in programs?
  • +
+
+
+
+
+
+
+

Objectives

+
    +
  • Explain the purpose of functions.
  • +
  • Correctly call built-in Python functions.
  • +
  • Correctly nest calls to built-in functions.
  • +
  • Use help to display documentation for built-in functions.
  • +
  • Correctly describe situations in which SyntaxError and NameError +occur.
  • +
+
+
+
+
+
+

Use comments to add documentation to programs. +

+
+
+

PYTHON +

+
# This sentence isn't executed by Python.
+adjustment = 0.5   # Neither is this - anything after '#' is ignored.
+
+

A function may take zero or more arguments. +

+
+
    +
  • We have seen some functions already — now let’s take a closer +look.
  • +
  • An argument is a value passed into a function.
  • +
  • +len takes exactly one.
  • +
  • +int, str, and float create a +new value from an existing one.
  • +
  • +print takes zero or more.
  • +
  • +print with no arguments prints a blank line. +
      +
    • Must always use parentheses, even if they’re empty, so that Python +knows a function is being called.
    • +
    +
  • +
+
+

PYTHON +

+
print('before')
+print()
+print('after')
+
+
+

OUTPUT +

+
before
+
+after
+
+

Every function returns something. +

+
+
    +
  • Every function call produces some result.
  • +
  • If the function doesn’t have a useful result to return, it usually +returns the special value None. None is a +Python object that stands in anytime there is no value.
  • +
+
+

PYTHON +

+
result = print('example')
+print('result of print is', result)
+
+
+

OUTPUT +

+
example
+result of print is None
+
+

Commonly-used built-in functions include max, +min, and round. +

+
+
    +
  • Use max to find the largest value of one or more +values.
  • +
  • Use min to find the smallest.
  • +
  • Both work on character strings as well as numbers. +
      +
    • “Larger” and “smaller” use (0-9, A-Z, a-z) to compare letters.
    • +
    +
  • +
+
+

PYTHON +

+
print(max(1, 2, 3))
+print(min('a', 'A', '0'))
+
+
+

OUTPUT +

+
3
+0
+
+

Functions may only work for certain (combinations of) +arguments. +

+
+
    +
  • +max and min must be given at least one +argument. +
      +
    • “Largest of the empty set” is a meaningless question.
    • +
    +
  • +
  • And they must be given things that can meaningfully be +compared.
  • +
+
+

PYTHON +

+
print(max(1, 'a'))
+
+
+

ERROR +

+
TypeError                                 Traceback (most recent call last)
+<ipython-input-52-3f049acf3762> in <module>
+----> 1 print(max(1, 'a'))
+
+TypeError: '>' not supported between instances of 'str' and 'int'
+
+

Functions may have default values for some arguments. +

+
+
    +
  • +round will round off a floating-point number.
  • +
  • By default, rounds to zero decimal places.
  • +
+
+

PYTHON +

+
round(3.712)
+
+
+

OUTPUT +

+
4
+
+
    +
  • We can specify the number of decimal places we want.
  • +
+
+

PYTHON +

+
round(3.712, 1)
+
+
+

OUTPUT +

+
3.7
+
+

Functions attached to objects are called methods +

+
+
    +
  • Functions take another form that will be common in the pandas +episodes.
  • +
  • Methods have parentheses like functions, but come after the +variable.
  • +
  • Some methods are used for internal Python operations, and are marked +with double underscores.
  • +
+
+

PYTHON +

+
my_string = 'Hello world!'  # creation of a string object 
+
+print(len(my_string))       # the len function takes a string as an argument and returns the length of the string
+
+print(my_string.swapcase()) # calling the swapcase method on the my_string object
+
+print(my_string.__len__())  # calling the internal __len__ method on the my_string object, used by len(my_string)
+
+
+

OUTPUT +

+
12
+hELLO WORLD!
+12
+
+
    +
  • You might even see them chained together. They operate left to +right.
  • +
+
+

PYTHON +

+
print(my_string.isupper())          # Not all the letters are uppercase
+print(my_string.upper())            # This capitalizes all the letters
+
+print(my_string.upper().isupper())  # Now all the letters are uppercase
+
+
+

OUTPUT +

+
False
+HELLO WORLD!
+True
+
+

Use the built-in function help to get help for a +function. +

+
+
    +
  • Every built-in function has online documentation.
  • +
+
+

PYTHON +

+
help(round)
+
+
+

OUTPUT +

+
Help on built-in function round in module builtins:
+
+round(number, ndigits=None)
+    Round a number to a given precision in decimal digits.
+
+    The return value is an integer if ndigits is omitted or None.  Otherwise
+    the return value has the same type as the number.  ndigits may be negative.
+
+
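The help text above mentions two details worth trying: the return type depends on whether `ndigits` is given, and `ndigits` may be negative. A short sketch, with variable names chosen only for illustration:

```python
no_digits = round(3.712)         # ndigits omitted: result is an int
two_digits = round(3.712, 2)     # ndigits given: result keeps the float type
hundreds = round(1234.5, -2)     # negative ndigits rounds to the left of the point

print(no_digits, type(no_digits))     # 4 <class 'int'>
print(two_digits, type(two_digits))   # 3.71 <class 'float'>
print(hundreds, type(hundreds))       # 1200.0 <class 'float'>
```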

The Jupyter Notebook has two ways to get help. +

+
+
    +
  • Option 1: Place the cursor near where the function is invoked in a +cell (i.e., the function name or its parameters), +
      +
    • Hold down Shift, and press Tab.
    • +
    • Do this several times to expand the information returned.
    • +
    +
  • +
  • Option 2: Type the function name in a cell with a question mark +after it. Then run the cell.
  • +

Python reports a syntax error when it can’t understand the source of +a program. +

+
+
    +
  • Won’t even try to run the program if it can’t be parsed.
  • +
+
+

PYTHON +

+
# Forgot to close the quote marks around the string.
+name = 'Feng
+
+
+

ERROR +

+
  File "<ipython-input-56-f42768451d55>", line 2
+    name = 'Feng
+                ^
+SyntaxError: EOL while scanning string literal
+
+
+

PYTHON +

+
# An extra '=' in the assignment.
+age = = 52
+
+
+

ERROR +

+
  File "<ipython-input-57-ccc3df3cf902>", line 2
+    age = = 52
+          ^
+SyntaxError: invalid syntax
+
+
    +
  • Look more closely at the error message:
  • +
+
+

PYTHON +

+
print("hello world"
+
+
+

ERROR +

+
  File "<ipython-input-6-d1cc229bf815>", line 1
+    print("hello world"
+                       ^
+SyntaxError: unexpected EOF while parsing
+
+
    +
  • The message indicates a problem on the first line of the input (“line +1”). +
      +
    • In this case the “ipython-input” section of the file name tells us +that we are working with input into IPython, the Python interpreter used +by the Jupyter Notebook.
    • +
    +
  • +
  • The -6- part of the filename indicates that the error +occurred in cell 6 of our Notebook.
  • +
  • Next is the problematic line of code, indicating the problem with a +^ pointer.
  • +

Python reports a runtime error when something goes wrong while a +program is executing. +

+
+
+

PYTHON +

+
age = 53
+remaining = 100 - aege # mis-spelled 'age'
+
+
+

ERROR +

+
NameError                                 Traceback (most recent call last)
+<ipython-input-59-1214fb6c55fc> in <module>
+      1 age = 53
+----> 2 remaining = 100 - aege # mis-spelled 'age'
+
+NameError: name 'aege' is not defined
+
+
    +
  • Fix syntax errors by reading the source and runtime errors by +tracing execution.
  • +
+
+
+ +
+
+

What Happens When

+
+
    +
  1. Explain in simple terms the order of operations in the following +program: when does the addition happen, when does the subtraction +happen, when is each function called, etc.
  2. +
  3. What is the final value of radiance?
  4. +
+
+

PYTHON +

+
radiance = 1.0
+radiance = max(2.1, 2.0 + min(radiance, 1.1 * radiance - 0.5))
+
+
+
+
+
+
+ +
+
+
    +
  1. Order of operations:
  2. +
  3. 1.1 * radiance = 1.1
  4. +
  5. 1.1 - 0.5 = 0.6
  6. +
  7. min(radiance, 0.6) = 0.6
  8. +
  9. 2.0 + 0.6 = 2.6
  10. +
  11. max(2.1, 2.6) = 2.6
  12. +
  13. At the end, radiance = 2.6 +
  14. +
+
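The same order of operations can be made visible by splitting the one-line expression into separate steps. The intermediate names (`product`, `difference`, `smaller`, `total`) are introduced here for illustration, and the printed intermediate values may show tiny floating point rounding differences:

```python
radiance = 1.0

product = 1.1 * radiance               # multiplication happens first
difference = product - 0.5             # then the subtraction
smaller = min(radiance, difference)    # the inner function call
total = 2.0 + smaller                  # then the addition
radiance = max(2.1, total)             # the outer function call runs last

print(product, difference, smaller, total)
print('final radiance:', radiance)
```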
+
+
+
+
+
+ +
+
+

Spot the Difference

+
+
    +
  1. Predict what each of the print statements in the +program below will print.
  2. +
  3. Does max(len(rich), poor) run or produce an error +message? If it runs, does its result make any sense?
  4. +
+
+

PYTHON +

+
easy_string = "abc"
+print(max(easy_string))
+rich = "gold"
+poor = "tin"
+print(max(rich, poor))
+print(max(len(rich), len(poor)))
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
print(max(easy_string))
+
+
+

OUTPUT +

+
c
+
+
+

PYTHON +

+
print(max(rich, poor))
+
+
+

OUTPUT +

+
tin
+
+
+

PYTHON +

+
print(max(len(rich), len(poor)))
+
+
+

OUTPUT +

+
4
+
+

max(len(rich), poor) throws a TypeError. This turns into +max(4, 'tin') and as we discussed earlier a string and +integer cannot meaningfully be compared.

+
+

ERROR +

+
TypeError                                 Traceback (most recent call last)
+<ipython-input-65-bc82ad05177a> in <module>
+----> 1 max(len(rich), poor)
+
+TypeError: '>' not supported between instances of 'str' and 'int'
+
+
+
+
+
+
+
+ +
+
+

Why Not?

+
+

Why is it that max and min do not return +None when they are called with no arguments?

+
+
+
+
+
+ +
+
+

max and min raise a TypeError in this case +because the required number of arguments was not supplied. If they just +returned None, the problem would be much harder to trace, as +the None would likely be stored in a variable and used later in the program, +only to cause a runtime error far from the original mistake.

+
+
+
+
+
+
+ +
+
+

Last Character of a String

+
+

If Python starts counting from zero, and len returns the +number of characters in a string, what index expression will get the +last character in the string name? (Note: we will see a +simpler way to do this in a later episode.)

+
+
+
+
+
+ +
+
+

name[len(name) - 1]
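A short check of this expression, using `'helium'` as an example value for `name`:

```python
name = 'helium'
last = name[len(name) - 1]   # len(name) is 6, so this is name[5]
print(last)                  # m
```

Using `name[len(name)]` instead would raise an IndexError, because valid indices stop one short of the length.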

+
+
+
+
+
+
+ +
+
+

Explore the Python docs!

+
+

The official Python +documentation is arguably the most complete source of information +about the language. It is available in different languages and contains +a lot of useful resources. The Built-in +Functions page contains a catalogue of all of these functions, +including the ones that we’ve covered in this lesson. Some of these are +more advanced and unnecessary at the moment, but others are very simple +and useful.

+
+
+
+
+
+ +
+
+

Key Points

+
+
    +
  • Use comments to add documentation to programs.
  • +
  • A function may take zero or more arguments.
  • +
  • Commonly-used built-in functions include max, +min, and round.
  • +
  • Functions may only work for certain (combinations of) +arguments.
  • +
  • Functions may have default values for some arguments.
  • +
  • Use the built-in function help to get help for a +function.
  • +
  • The Jupyter Notebook has two ways to get help.
  • +
  • Every function returns something.
  • +
  • Python reports a syntax error when it can’t understand the source of +a program.
  • +
  • Python reports a runtime error when something goes wrong while a +program is executing.
  • +
  • Fix syntax errors by reading the source code, and runtime errors by +tracing the program’s execution.
  • +
+
+
+
+

Content from Morning Coffee

+
+

Last updated on 2024-10-18

+
+ +
+

Reflection exercise +

+
+

Over coffee, reflect on and discuss the following:

+
    +
  • What are the different kinds of errors Python will report?
  • +
  • Did the code always produce the results you expected? If not, +why?
  • +
  • Is there something we can do to prevent errors when we write +code?
  • +

Content from Libraries

+
+

Last updated on 2024-10-18

+
+ +
+
+

Overview

+
+
+
+
+

Questions

+
    +
  • How can I use software that other people have written?
  • +
  • How can I find out what that software does?
  • +
+
+
+
+
+
+
+

Objectives

+
    +
  • Explain what software libraries are and why programmers create and +use them.
  • +
  • Write programs that import and use modules from Python’s standard +library.
  • +
  • Find and read documentation for the standard library interactively +(in the interpreter) and online.
  • +
+
+
+
+
+
+

Most of the power of a programming language is in its +libraries. +

+
+
    +
  • A library is a collection of files (called +modules) that contains functions for use by other programs. +
      +
    • May also contain data values (e.g., numerical constants) and other +things.
    • +
    • Library’s contents are supposed to be related, but there’s no way to +enforce that.
    • +
    +
  • +
  • The Python standard +library is an extensive suite of modules that comes with Python +itself.
  • +
  • Many additional libraries are available from PyPI (the Python Package +Index).
  • +
  • We will see later how to write new libraries.
  • +
+
+
+ +
+
+

Libraries and modules

+
+

A library is a collection of modules, but the terms are often used +interchangeably, especially since many libraries only consist of a +single module, so don’t worry if you mix them.

+
+
+
+

A program must import a library module before using it. +

+
+
    +
  • Use import to load a library module into a program’s +memory.
  • +
  • Then refer to things from the module as +module_name.thing_name. +
      +
    • Python uses . to mean “part of”.
    • +
    +
  • +
  • Using math, one of the modules in the standard +library:
  • +
+
+

PYTHON +

+
import math
+
+print('pi is', math.pi)
+print('cos(pi) is', math.cos(math.pi))
+
+
+

OUTPUT +

+
pi is 3.141592653589793
+cos(pi) is -1.0
+
+
    +
  • Have to refer to each item with the module’s name. +
      +
    • +math.cos(pi) won’t work: the reference to +pi doesn’t somehow “inherit” the function’s reference to +math.
    • +
    +
  • +

Use help to learn about the contents of a library +module. +

+
+
    +
  • Works just like help for a function.
  • +
+
+

PYTHON +

+
help(math)
+
+
+

OUTPUT +

+
Help on module math:
+
+NAME
+    math
+
+MODULE REFERENCE
+    http://docs.python.org/3/library/math
+
+    The following documentation is automatically generated from the Python
+    source files.  It may be incomplete, incorrect or include features that
+    are considered implementation detail and may vary between Python
+    implementations.  When in doubt, consult the module reference at the
+    location listed above.
+
+DESCRIPTION
+    This module is always available.  It provides access to the
+    mathematical functions defined by the C standard.
+
+FUNCTIONS
+    acos(x, /)
+        Return the arc cosine (measured in radians) of x.
+⋮ ⋮ ⋮
+
+

Import specific items from a library module to shorten +programs. +

+
+
    +
  • Use from ... import ... to load only specific items +from a library module.
  • +
  • Then refer to them directly without library name as prefix.
  • +
+
+

PYTHON +

+
from math import cos, pi
+
+print('cos(pi) is', cos(pi))
+
+
+

OUTPUT +

+
cos(pi) is -1.0
+
+

Create an alias for a library module when importing it to shorten +programs. +

+
+
    +
  • Use import ... as ... to give a library a short +alias while importing it.
  • +
  • Then refer to items in the library using that shortened name.
  • +
+
+

PYTHON +

+
import math as m
+
+print('cos(pi) is', m.cos(m.pi))
+
+
+

OUTPUT +

+
cos(pi) is -1.0
+
+
    +
  • Commonly used for libraries that are frequently used or have long +names. +
      +
    • E.g., the matplotlib plotting library is often aliased +as mpl.
    • +
    +
  • +
  • But can make programs harder to understand, since readers must learn +your program’s aliases.
  • +
+
+
+ +
+
+

Exploring the Math Module

+
+
    +
  1. What function from the math module can you use to +calculate a square root without using sqrt?
  2. +
  3. Since the library contains this function, why does sqrt +exist?
  4. +
+
+
+
+
+
+ +
+
+
    +
  1. Using help(math) we see that we’ve got +pow(x,y) in addition to sqrt(x), so we could +use pow(x, 0.5) to find a square root.

  2. +
  3. The sqrt(x) function is arguably more readable than +pow(x, 0.5) when implementing equations. Readability is a +cornerstone of good programming, so it makes sense to provide a special +function for this specific common case.

  4. +
+

Also, the design of Python’s math library has its origin +in the C standard, which includes both sqrt(x) and +pow(x,y), so a little bit of the history of programming is +showing in Python’s function names.
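The two approaches can be compared directly. This is a small sketch using an arbitrary example value (16.0); it only checks that the two functions agree for that input.

```python
import math

x = 16.0  # arbitrary example value

root_with_sqrt = math.sqrt(x)      # the dedicated square-root function
root_with_pow = math.pow(x, 0.5)   # raising to the power one half

print('sqrt gives', root_with_sqrt)
print('pow gives', root_with_pow)
print('results agree:', math.isclose(root_with_sqrt, root_with_pow))
```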

+
+
+
+
+
+
+ +
+
+

Locating the Right Module

+
+

You want to select a random character from a string:

+
+

PYTHON +

+
bases = 'ACTTGCTTGAC'
+
+
    +
  1. Which standard +library module could help you?
  2. +
  3. Which function would you select from that module? Are there +alternatives?
  4. +
  5. Try to write a program that uses the function.
  6. +
+
+
+
+
+
+ +
+
+

The random +module seems like it could help.

+

The string has 11 characters, each having a positional index from 0 +to 10. You could use the random.randrange +or random.randint +functions to get a random integer between 0 and 10, and then select the +bases character at that index:

+
+

PYTHON +

+
from random import randrange
+
+random_index = randrange(len(bases))
+print(bases[random_index])
+
+

or more compactly:

+
+

PYTHON +

+
from random import randrange
+
+print(bases[randrange(len(bases))])
+
+

Perhaps you found the random.sample +function? It allows for slightly less typing but might be a bit harder +to understand just by reading:

+
+

PYTHON +

+
from random import sample
+
+print(sample(bases, 1)[0])
+
+

Note that this function returns a list of values. We will learn about +lists in episode 11.

+

The simplest and shortest solution is the random.choice +function that does exactly what we want:

+
+

PYTHON +

+
from random import choice
+
+print(choice(bases))
+
+
+
+
+
+
+
+ +
+
+

Jigsaw Puzzle (Parson’s Problem) Programming Example

+
+

Rearrange the following statements so that the program prints a random DNA base together with its index in the string. Not all of the statements may be needed. Feel free to add intermediate variables.

+
+

PYTHON +

+
bases="ACTTGCTTGAC"
+import math
+import random
+___ = random.randrange(n_bases)
+___ = len(bases)
+print("random base ", bases[___], "base index", ___)
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
import random

# import math is not needed for this task, so it is left out.
bases = "ACTTGCTTGAC"
n_bases = len(bases)
# Pick a random position between 0 and n_bases - 1.
idx = random.randrange(n_bases)
print("random base", bases[idx], "base index", idx)
+
+
+
+
+
+
+
+ +
+
+

When Is Help Available?

+
+

When a colleague of yours types help(math), Python +reports an error:

+
+

ERROR +

+
NameError: name 'math' is not defined
+
+

What has your colleague forgotten to do?

+
+
+
+
+
+ +
+
+

Importing the math module (import math)

+
+
+
+
+
+
+ +
+
+

Importing With Aliases

+
+
    +
  1. Fill in the blanks so that the program below prints +90.0.
  2. +
  3. Rewrite the program so that it uses import +without as.
  4. +
  5. Which form do you find easier to read?
  6. +
+
+

PYTHON +

+
import math as m
+angle = ____.degrees(____.pi / 2)
+print(____)
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
import math as m
+angle = m.degrees(m.pi / 2)
+print(angle)
+
+

can be written as

+
+

PYTHON +

+
import math
+angle = math.degrees(math.pi / 2)
+print(angle)
+
+

Since you just wrote the code and are familiar with it, you might +actually find the first version easier to read. But when trying to read +a huge piece of code written by someone else, or when getting back to +your own huge piece of code after several months, non-abbreviated names +are often easier, except where there are clear abbreviation +conventions.

+
+
+
+
+
+
+ +
+
+

There Are Many Ways To Import Libraries!

+
+

Match the following print statements with the appropriate library +calls.

+

Print commands:

+
    +
  1. print("sin(pi/2) =", sin(pi/2))
  2. +
  3. print("sin(pi/2) =", m.sin(m.pi/2))
  4. +
  5. print("sin(pi/2) =", math.sin(math.pi/2))
  6. +
+

Library calls:

+
    +
  1. from math import sin, pi
  2. +
  3. import math
  4. +
  5. import math as m
  6. +
  7. from math import *
  8. +
+
+
+
+
+
+ +
+
+
    +
  1. Library calls 1 and 4. In order to directly refer to sin and pi without the library name as prefix, you need to use the from ... import ... statement. Whereas library call 1 specifically imports the two names sin and pi (a function and a constant), library call 4 imports all public names in the math module.
  2. +
  3. Library call 3. Here sin and pi are +referred to with a shortened library name m instead of +math. Library call 3 does exactly that using the +import ... as ... syntax - it creates an alias for +math in the form of the shortened name m.
  4. +
  5. Library call 2. Here sin and pi are +referred to with the regular library name math, so the +regular import ... call suffices.
  6. +
+

Note: although library call 4 works, importing all +names from a module using a wildcard import is not recommended as it makes it +unclear which names from the module are used in the code. In general it +is best to make your imports as specific as possible and to only import +what your code uses. In library call 1, the import +statement explicitly tells us that the sin function is +imported from the math module, but library call 4 does not +convey this information.

+
+
+
+
+
+
+ +
+
+

Importing Specific Items

+
+
    +
  1. Fill in the blanks so that the program below prints +90.0.
  2. +
  3. Do you find this version easier to read than preceding ones?
  4. +
  5. Why wouldn’t programmers always use this form of +import?
  6. +
+
+

PYTHON +

+
____ math import ____, ____
+angle = degrees(pi / 2)
+print(angle)
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
from math import degrees, pi
+angle = degrees(pi / 2)
+print(angle)
+
+

Most likely you find this version easier to read since it’s less dense. The main reason not to use this form of import is to avoid name clashes. For instance, you wouldn’t import degrees this way if you also wanted to use the name degrees for a variable or function of your own, or if you were to also import a function named degrees from another library.
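A name clash of this kind can be sketched as follows. The variable degrees holding the number 45 is a made-up example; the try/except is only used so that the error can be shown without stopping the program.

```python
from math import degrees, pi

right_angle = degrees(pi / 2)   # the imported function works as expected
print(right_angle)

degrees = 45                    # reusing the name hides the imported function

try:
    degrees(pi / 2)             # degrees is now an integer, not a function
except TypeError as error:
    message = str(error)
    print('TypeError:', message)
```

With import math and math.degrees, the variable and the function would have kept separate names.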

+
+
+
+
+
+
+ +
+
+

Reading Error Messages

+
+
    +
  1. Read the code below and try to identify what the errors are without +running it.
  2. +
  3. Run the code, and read the error message. What type of error is +it?
  4. +
+
+

PYTHON +

+
from math import log
+log(0)
+
+
+
+
+
+
+ +
+
+
+

OUTPUT +

+
---------------------------------------------------------------------------
+ValueError                                Traceback (most recent call last)
+<ipython-input-1-d72e1d780bab> in <module>
+      1 from math import log
+----> 2 log(0)
+
+ValueError: math domain error
+
+
    +
  1. The logarithm of x is only defined for +x > 0, so 0 is outside the domain of the function.
  2. +
  3. You get an error of type ValueError, indicating that +the function received an inappropriate argument value. The additional +message “math domain error” makes it clearer what the problem is.
  4. +
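If a program should carry on instead of stopping, an error like this can be caught. The sketch below uses try/except, which this episode has not introduced, so treat it as a preview rather than something you need yet; setting result to None is just one possible choice.

```python
from math import log

try:
    result = log(0)          # 0 is outside the domain of log
except ValueError as error:
    result = None            # one possible fallback; an assumption for this sketch
    print('Could not take the logarithm:', error)

print('result is', result)
```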
+
+
+
+
+
+
+ +
+
+

Key Points

+
+
    +
  • Most of the power of a programming language is in its +libraries.
  • +
  • A program must import a library module in order to use it.
  • +
  • Use help to learn about the contents of a library +module.
  • +
  • Import specific items from a library to shorten programs.
  • +
  • Create an alias for a library when importing it to shorten +programs.
  • +
+
+
+
+

Content from Reading Tabular Data into DataFrames

+
+

Last updated on 2024-10-18

+
+ +
+
+

Overview

+
+
+
+
+

Questions

+
    +
  • How can I read tabular data?
  • +
+
+
+
+
+
+
+

Objectives

+
    +
  • Import the Pandas library.
  • +
  • Use Pandas to load a simple CSV data set.
  • +
  • Get some basic information about a Pandas DataFrame.
  • +
+
+
+
+
+
+

Use the Pandas library to do statistics on tabular data. +

+
+
    +
  • +Pandas is a widely-used +Python library for statistics, particularly on tabular data.
  • +
  • Borrows many features from R’s dataframes. +
      +
    • A 2-dimensional table whose columns have names and potentially have +different data types.
    • +
    +
  • +
  • Load Pandas with import pandas as pd. The alias +pd is commonly used to refer to the Pandas library in +code.
  • +
  • Read a Comma Separated Values (CSV) data file with +pd.read_csv. +
      +
    • Argument is the name of the file to be read.
    • +
    • Returns a dataframe that you can assign to a variable.
    • +
    +
  • +
+
+

PYTHON +

+
import pandas as pd
+
+data_oceania = pd.read_csv('data/gapminder_gdp_oceania.csv')
+print(data_oceania)
+
+
+

OUTPUT +

+
       country  gdpPercap_1952  gdpPercap_1957  gdpPercap_1962  \
+0    Australia     10039.59564     10949.64959     12217.22686
+1  New Zealand     10556.57566     12247.39532     13175.67800
+
+   gdpPercap_1967  gdpPercap_1972  gdpPercap_1977  gdpPercap_1982  \
+0     14526.12465     16788.62948     18334.19751     19477.00928
+1     14463.91893     16046.03728     16233.71770     17632.41040
+
+   gdpPercap_1987  gdpPercap_1992  gdpPercap_1997  gdpPercap_2002  \
+0     21888.88903     23424.76683     26997.93657     30687.75473
+1     19007.19129     18363.32494     21050.41377     23189.80135
+
+   gdpPercap_2007
+0     34435.36744
+1     25185.00911
+
+
    +
  • The columns in a dataframe are the observed variables, and the rows +are the observations.
  • +
  • Pandas uses backslash \ to show wrapped lines when +output is too wide to fit the screen.
  • +
  • Using descriptive dataframe names helps us distinguish between +multiple dataframes so we won’t accidentally overwrite a dataframe or +read from the wrong one.
  • +
+
+
+ +
+
+

File Not Found

+
+

Our lessons store their data files in a data +sub-directory, which is why the path to the file is +data/gapminder_gdp_oceania.csv. If you forget to include +data/, or if you include it but your copy of the file is +somewhere else, you will get a runtime +error that ends with a line like this:

+
+

ERROR +

+
FileNotFoundError: [Errno 2] No such file or directory: 'data/gapminder_gdp_oceania.csv'
+
+
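One way to investigate this error is to check where Python is looking before calling read_csv. The sketch below uses the standard library's pathlib module, which the lesson itself does not use; the path is the one from the example above.

```python
from pathlib import Path

data_file = Path('data/gapminder_gdp_oceania.csv')

# Relative paths are resolved against the current working directory.
print('Current working directory:', Path.cwd())

if data_file.exists():
    print('Found', data_file)
else:
    print('Cannot find', data_file, '(check the path and the working directory)')
```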
+
+
+

Use index_col to specify that a column’s values should +be used as row headings. +

+
+
    +
  • Row headings are numbers (0 and 1 in this case).
  • +
  • Really want to index by country.
  • +
  • Pass the name of the column to read_csv as its +index_col parameter to do this.
  • +
  • Naming the dataframe data_oceania_country tells us +which region the data includes (oceania) and how it is +indexed (country).
  • +
+
+

PYTHON +

+
data_oceania_country = pd.read_csv('data/gapminder_gdp_oceania.csv', index_col='country')
+print(data_oceania_country)
+
+
+

OUTPUT +

+
             gdpPercap_1952  gdpPercap_1957  gdpPercap_1962  gdpPercap_1967  \
+country
+Australia       10039.59564     10949.64959     12217.22686     14526.12465
+New Zealand     10556.57566     12247.39532     13175.67800     14463.91893
+
+             gdpPercap_1972  gdpPercap_1977  gdpPercap_1982  gdpPercap_1987  \
+country
+Australia       16788.62948     18334.19751     19477.00928     21888.88903
+New Zealand     16046.03728     16233.71770     17632.41040     19007.19129
+
+             gdpPercap_1992  gdpPercap_1997  gdpPercap_2002  gdpPercap_2007
+country
+Australia       23424.76683     26997.93657     30687.75473     34435.36744
+New Zealand     18363.32494     21050.41377     23189.80135     25185.00911
+
+

Use the DataFrame.info() method to find out more about +a dataframe. +

+
+
+

PYTHON +

+
data_oceania_country.info()
+
+
+

OUTPUT +

+
<class 'pandas.core.frame.DataFrame'>
+Index: 2 entries, Australia to New Zealand
+Data columns (total 12 columns):
+gdpPercap_1952    2 non-null float64
+gdpPercap_1957    2 non-null float64
+gdpPercap_1962    2 non-null float64
+gdpPercap_1967    2 non-null float64
+gdpPercap_1972    2 non-null float64
+gdpPercap_1977    2 non-null float64
+gdpPercap_1982    2 non-null float64
+gdpPercap_1987    2 non-null float64
+gdpPercap_1992    2 non-null float64
+gdpPercap_1997    2 non-null float64
+gdpPercap_2002    2 non-null float64
+gdpPercap_2007    2 non-null float64
+dtypes: float64(12)
+memory usage: 208.0+ bytes
+
+
    +
  • This is a DataFrame +
  • +
  • Two rows named 'Australia' and +'New Zealand' +
  • +
  • Twelve columns, each of which holds two non-null 64-bit floating point values.
      +
    • We will talk later about null values, which are used to represent +missing observations.
    • +
    +
  • +
  • Uses 208 bytes of memory.
  • +

The DataFrame.columns variable stores information about +the dataframe’s columns. +

+
+
    +
  • Note that this is data, not a method. (It doesn’t have +parentheses.) +
      +
    • Like math.pi.
    • +
    • So do not use () to try to call it.
    • +
    +
  • +
  • Called a member variable, or just member.
  • +
+
+

PYTHON +

+
print(data_oceania_country.columns)
+
+
+

OUTPUT +

+
Index(['gdpPercap_1952', 'gdpPercap_1957', 'gdpPercap_1962', 'gdpPercap_1967',
+       'gdpPercap_1972', 'gdpPercap_1977', 'gdpPercap_1982', 'gdpPercap_1987',
+       'gdpPercap_1992', 'gdpPercap_1997', 'gdpPercap_2002', 'gdpPercap_2007'],
+      dtype='object')
+
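The difference between a member variable and a method can be seen by trying both forms. This sketch uses a tiny hand-made dataframe (rounded values, typed in by hand) so that it runs without the lesson's data files.

```python
import pandas as pd

# A tiny hand-made dataframe standing in for the lesson's data.
example = pd.DataFrame(
    {'gdpPercap_1952': [10039.6, 10556.6], 'gdpPercap_1957': [10949.6, 12247.4]},
    index=['Australia', 'New Zealand'],
)

column_names = list(example.columns)   # member variable: no parentheses
print(column_names)

call_failed = False
try:
    example.columns()                  # calling it like a method fails
except TypeError as error:
    call_failed = True
    print('TypeError:', error)
```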
+

Use DataFrame.T to transpose a dataframe. +

+
+
    +
  • Sometimes want to treat columns as rows and vice versa.
  • +
  • Transpose (written .T) doesn’t copy the data, just +changes the program’s view of it.
  • +
  • Like columns, it is a member variable.
  • +
+
+

PYTHON +

+
print(data_oceania_country.T)
+
+
+

OUTPUT +

+
country           Australia  New Zealand
+gdpPercap_1952  10039.59564  10556.57566
+gdpPercap_1957  10949.64959  12247.39532
+gdpPercap_1962  12217.22686  13175.67800
+gdpPercap_1967  14526.12465  14463.91893
+gdpPercap_1972  16788.62948  16046.03728
+gdpPercap_1977  18334.19751  16233.71770
+gdpPercap_1982  19477.00928  17632.41040
+gdpPercap_1987  21888.88903  19007.19129
+gdpPercap_1992  23424.76683  18363.32494
+gdpPercap_1997  26997.93657  21050.41377
+gdpPercap_2002  30687.75473  23189.80135
+gdpPercap_2007  34435.36744  25185.00911
+
+

Use DataFrame.describe() to get summary statistics +about data. +

+
+

DataFrame.describe() gets the summary statistics of only +the columns that have numerical data. All other columns are ignored, +unless you use the argument include='all'.

+
+

PYTHON +

+
print(data_oceania_country.describe())
+
+
+

OUTPUT +

+
       gdpPercap_1952  gdpPercap_1957  gdpPercap_1962  gdpPercap_1967  \
+count        2.000000        2.000000        2.000000        2.000000
+mean     10298.085650    11598.522455    12696.452430    14495.021790
+std        365.560078      917.644806      677.727301       43.986086
+min      10039.595640    10949.649590    12217.226860    14463.918930
+25%      10168.840645    11274.086022    12456.839645    14479.470360
+50%      10298.085650    11598.522455    12696.452430    14495.021790
+75%      10427.330655    11922.958888    12936.065215    14510.573220
+max      10556.575660    12247.395320    13175.678000    14526.124650
+
+       gdpPercap_1972  gdpPercap_1977  gdpPercap_1982  gdpPercap_1987  \
+count         2.00000        2.000000        2.000000        2.000000
+mean      16417.33338    17283.957605    18554.709840    20448.040160
+std         525.09198     1485.263517     1304.328377     2037.668013
+min       16046.03728    16233.717700    17632.410400    19007.191290
+25%       16231.68533    16758.837652    18093.560120    19727.615725
+50%       16417.33338    17283.957605    18554.709840    20448.040160
+75%       16602.98143    17809.077557    19015.859560    21168.464595
+max       16788.62948    18334.197510    19477.009280    21888.889030
+
+       gdpPercap_1992  gdpPercap_1997  gdpPercap_2002  gdpPercap_2007
+count        2.000000        2.000000        2.000000        2.000000
+mean     20894.045885    24024.175170    26938.778040    29810.188275
+std       3578.979883     4205.533703     5301.853680     6540.991104
+min      18363.324940    21050.413770    23189.801350    25185.009110
+25%      19628.685413    22537.294470    25064.289695    27497.598692
+50%      20894.045885    24024.175170    26938.778040    29810.188275
+75%      22159.406358    25511.055870    28813.266385    32122.777857
+max      23424.766830    26997.936570    30687.754730    34435.367440
+
+
    +
  • Not particularly useful with just two records, but very helpful when +there are thousands.
  • +
+
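The effect of include='all' is easiest to see on a dataframe that mixes text and numbers. The sketch below builds a small one by hand (the values are rounded and only illustrative) and compares which columns each call summarises.

```python
import pandas as pd

# Hand-made example with one text column and one numeric column.
example = pd.DataFrame(
    {
        'continent': ['Oceania', 'Oceania', 'Americas'],
        'gdpPercap_2007': [34435.4, 25185.0, 12779.4],
    },
    index=['Australia', 'New Zealand', 'Argentina'],
)

numeric_only = example.describe()               # text column is ignored
everything = example.describe(include='all')    # text column is included

print(list(numeric_only.columns))
print(list(everything.columns))
```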
+
+ +
+
+

Reading Other Data

+
+

Read the data in gapminder_gdp_americas.csv (which +should be in the same directory as +gapminder_gdp_oceania.csv) into a variable called +data_americas and display its summary statistics.

+
+
+
+
+
+ +
+
+

To read in a CSV, we use pd.read_csv and pass the +filename 'data/gapminder_gdp_americas.csv' to it. We also +once again pass the column name 'country' to the parameter +index_col in order to index by country. The summary +statistics can be displayed with the DataFrame.describe() +method.

+
+

PYTHON +

+
data_americas = pd.read_csv('data/gapminder_gdp_americas.csv', index_col='country')
+data_americas.describe()
+
+
+
+
+
+
+
+ +
+
+

Inspecting Data

+
+

After reading the data for the Americas, use +help(data_americas.head) and +help(data_americas.tail) to find out what +DataFrame.head and DataFrame.tail do.

+
    +
  1. What method call will display the first three rows of this +data?
  2. +
  3. What method call will display the last three columns of this data? +(Hint: you may need to change your view of the data.)
  4. +
+
+
+
+
+
+ +
+
+
    +
  1. By default, data_americas.head() shows the first five rows of the DataFrame. We can choose a different number of rows by passing the parameter n in our call to data_americas.head(). To view the first three rows, execute:
  2. +
+
+

PYTHON +

+
data_americas.head(n=3)
+
+
+

OUTPUT +

+
          continent  gdpPercap_1952  gdpPercap_1957  gdpPercap_1962  \
+country
+Argentina  Americas     5911.315053     6856.856212     7133.166023
+Bolivia    Americas     2677.326347     2127.686326     2180.972546
+Brazil     Americas     2108.944355     2487.365989     3336.585802
+
+          gdpPercap_1967  gdpPercap_1972  gdpPercap_1977  gdpPercap_1982  \
+country
+Argentina     8052.953021     9443.038526    10079.026740     8997.897412
+Bolivia       2586.886053     2980.331339     3548.097832     3156.510452
+Brazil        3429.864357     4985.711467     6660.118654     7030.835878
+
+           gdpPercap_1987  gdpPercap_1992  gdpPercap_1997  gdpPercap_2002  \
+country
+Argentina     9139.671389     9308.418710    10967.281950     8797.640716
+Bolivia       2753.691490     2961.699694     3326.143191     3413.262690
+Brazil        7807.095818     6950.283021     7957.980824     8131.212843
+
+           gdpPercap_2007
+country
+Argentina    12779.379640
+Bolivia       3822.137084
+Brazil        9065.800825
+
+
    +
  2. To check out the last three rows of data_americas, we would use the command data_americas.tail(n=3), analogous to head() used above. However, here we want to look at the last three columns, so we need to change our view and then use tail(). To do so, we create a new DataFrame in which rows and columns are switched:
+
+

PYTHON +

+
americas_flipped = data_americas.T
+
+

We can then view the last three columns of data_americas by viewing the last three rows of americas_flipped:

+
+

PYTHON +

+
americas_flipped.tail(n=3)
+
+
+

OUTPUT +

+
country        Argentina  Bolivia   Brazil   Canada    Chile Colombia  \
+gdpPercap_1997   10967.3  3326.14  7957.98  28954.9  10118.1  6117.36
+gdpPercap_2002   8797.64  3413.26  8131.21    33329  10778.8  5755.26
+gdpPercap_2007   12779.4  3822.14   9065.8  36319.2  13171.6  7006.58
+
+country        Costa Rica     Cuba Dominican Republic  Ecuador    ...     \
+gdpPercap_1997    6677.05  5431.99             3614.1  7429.46    ...
+gdpPercap_2002    7723.45  6340.65            4563.81  5773.04    ...
+gdpPercap_2007    9645.06   8948.1            6025.37  6873.26    ...
+
+country          Mexico Nicaragua   Panama Paraguay     Peru Puerto Rico  \
+gdpPercap_1997   9767.3   2253.02  7113.69   4247.4  5838.35     16999.4
+gdpPercap_2002  10742.4   2474.55  7356.03  3783.67  5909.02     18855.6
+gdpPercap_2007  11977.6   2749.32  9809.19  4172.84  7408.91     19328.7
+
+country        Trinidad and Tobago United States  Uruguay Venezuela
+gdpPercap_1997             8792.57       35767.4  9230.24   10165.5
+gdpPercap_2002             11460.6       39097.1     7727   8605.05
+gdpPercap_2007             18008.5       42951.7  10611.5   11415.8
+
+

This shows the data that we want, but we may prefer to display three +columns instead of three rows, so we can flip it back:

+
+

PYTHON +

+
americas_flipped.tail(n=3).T    
+
+

Note: we could have done the above in a single line +of code by ‘chaining’ the commands:

+
+

PYTHON +

+
data_americas.T.tail(n=3).T
+
+
+
+
+
+
+
+ +
+
+

Reading Files in Other Directories

+
+

The data for your current project is stored in a file called +microbes.csv, which is located in a folder called +field_data. You are doing analysis in a notebook called +analysis.ipynb in a sibling folder called +thesis:

+
+

OUTPUT +

+
your_home_directory
++-- field_data/
+|   +-- microbes.csv
++-- thesis/
+    +-- analysis.ipynb
+
+

What value(s) should you pass to read_csv to read +microbes.csv in analysis.ipynb?

+
+
+
+
+
+ +
+
+

We need to specify the path to the file of interest in the call to pd.read_csv. We first need to ‘jump’ out of the folder thesis using ‘../’ and then into the folder field_data using ‘field_data/’. Then we can specify the filename ‘microbes.csv’. The result is as follows:

+
+

PYTHON +

+
data_microbes = pd.read_csv('../field_data/microbes.csv')
+
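To see how the ‘../’ step is interpreted, the relative path can be combined with the notebook's folder by hand. This is only a sketch: it uses the standard library's posixpath module with forward slashes, and the folder names are the ones from the directory listing above.

```python
import posixpath

notebook_folder = 'your_home_directory/thesis'   # where analysis.ipynb lives
relative_path = '../field_data/microbes.csv'     # what we pass to read_csv

# Join the two parts, then simplify away the '..' step.
resolved = posixpath.normpath(posixpath.join(notebook_folder, relative_path))
print(resolved)
```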
+
+
+
+
+
+
+ +
+
+

Writing Data

+
+

As well as the read_csv function for reading data from a +file, Pandas provides a to_csv function to write dataframes +to files. Applying what you’ve learned about reading from files, write +one of your dataframes to a file called processed.csv. You +can use help to get information on how to use +to_csv.

+
+
+
+
+
+ +
+
+

In order to write the DataFrame data_americas to a file +called processed.csv, execute the following command:

+
+

PYTHON +

+
data_americas.to_csv('processed.csv')
+
+

For help on read_csv or to_csv, you could +execute, for example:

+
+

PYTHON +

+
help(data_americas.to_csv)
+help(pd.read_csv)
+
+

Note that help(to_csv) or help(pd.to_csv) throws an error! This is because to_csv is not a global Pandas function but a member function (method) of DataFrames. This means you can only call it on an instance of a DataFrame, e.g., data_americas.to_csv or data_oceania.to_csv.
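The sketch below checks where to_csv lives and then writes and re-reads a small dataframe. To stay self-contained it writes to an in-memory text buffer (io.StringIO) instead of processed.csv, and the dataframe is hand-made with rounded values; both are assumptions of this example rather than part of the exercise.

```python
import io

import pandas as pd

example = pd.DataFrame(
    {'gdpPercap_2007': [34435.4, 25185.0]},
    index=pd.Index(['Australia', 'New Zealand'], name='country'),
)

print(hasattr(pd, 'to_csv'))        # is there a top-level pd.to_csv?
print(hasattr(example, 'to_csv'))   # is there a to_csv method on the dataframe?

buffer = io.StringIO()              # in-memory stand-in for processed.csv
example.to_csv(buffer)              # write the dataframe as CSV text
buffer.seek(0)                      # go back to the start before reading
round_trip = pd.read_csv(buffer, index_col='country')
print(round_trip)
```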

+
+
+
+
+
+
+ +
+
+

Key Points

+
+
    +
  • Use the Pandas library to get basic statistics out of tabular +data.
  • +
  • Use index_col to specify that a column’s values should +be used as row headings.
  • +
  • Use DataFrame.info to find out more about a +dataframe.
  • +
  • The DataFrame.columns variable stores information about +the dataframe’s columns.
  • +
  • Use DataFrame.T to transpose a dataframe.
  • +
  • Use DataFrame.describe to get summary statistics about +data.
  • +
+
+
+
+

Content from Pandas DataFrames

+
+

Last updated on 2024-10-18

+
+ +
+
+

Overview

+
+
+
+
+

Questions

+
    +
  • How can I do statistical analysis of tabular data?
  • +
+
+
+
+
+
+
+

Objectives

+
    +
  • Select individual values from a Pandas dataframe.
  • +
  • Select entire rows or entire columns from a dataframe.
  • +
  • Select a subset of both rows and columns from a dataframe in a +single operation.
  • +
  • Select a subset of a dataframe by a single Boolean criterion.
  • +
+
+
+
+
+
+

Note about Pandas DataFrames/Series +

+
+

A DataFrame is a collection of Series. The DataFrame is the way Pandas represents a table, and a Series is the data structure Pandas uses to represent a column.

+

Pandas is built on top of the NumPy library, which in practice means that most of the methods defined for NumPy arrays apply to Pandas Series/DataFrames.

+

What makes Pandas so attractive is its powerful interface for accessing individual records of the table, its proper handling of missing values, and its relational-database operations between DataFrames.

+

Selecting values +

+
+

To access a value at the position [i,j] of a DataFrame, we have two options, depending on whether i and j refer to positions or to labels. Remember that a DataFrame provides an index as a way to identify the rows of the table; a row, then, has a position inside the table as well as a label, which uniquely identifies its entry in the DataFrame.

+

Use DataFrame.iloc[..., ...] to select values by their +(entry) position +

+
+
    +
  • Can specify location by numerical index analogously to 2D version of +character selection in strings.
  • +
+
+

PYTHON +

+
import pandas as pd
+data = pd.read_csv('data/gapminder_gdp_europe.csv', index_col='country')
+print(data.iloc[0, 0])
+
+
+

OUTPUT +

+
1601.056136
+
+

Use DataFrame.loc[..., ...] to select values by their +(entry) label. +

+
+
    +
  • Can specify location by row and/or column name.
  • +
+
+

PYTHON +

+
print(data.loc["Albania", "gdpPercap_1952"])
+
+
+

OUTPUT +

+
1601.056136
+
+

Use : on its own to mean all columns or all rows. +

+
+
    +
  • Just like Python’s usual slicing notation.
  • +
+
+

PYTHON +

+
print(data.loc["Albania", :])
+
+
+

OUTPUT +

+
gdpPercap_1952    1601.056136
+gdpPercap_1957    1942.284244
+gdpPercap_1962    2312.888958
+gdpPercap_1967    2760.196931
+gdpPercap_1972    3313.422188
+gdpPercap_1977    3533.003910
+gdpPercap_1982    3630.880722
+gdpPercap_1987    3738.932735
+gdpPercap_1992    2497.437901
+gdpPercap_1997    3193.054604
+gdpPercap_2002    4604.211737
+gdpPercap_2007    5937.029526
+Name: Albania, dtype: float64
+
+
    +
  • Would get the same result printing data.loc["Albania"] +(without a second index).
  • +
+
+

PYTHON +

+
print(data.loc[:, "gdpPercap_1952"])
+
+
+

OUTPUT +

+
country
+Albania                    1601.056136
+Austria                    6137.076492
+Belgium                    8343.105127
+⋮ ⋮ ⋮
+Switzerland               14734.232750
+Turkey                     1969.100980
+United Kingdom             9979.508487
+Name: gdpPercap_1952, dtype: float64
+
+
    +
  • Would get the same result printing +data["gdpPercap_1952"] +
  • +
  • Also get the same result printing data.gdpPercap_1952 +(not recommended, because easily confused with . notation +for methods)
  • +

Select multiple columns or rows using DataFrame.loc and +a named slice. +

+
+
+

PYTHON +

+
print(data.loc['Italy':'Poland', 'gdpPercap_1962':'gdpPercap_1972'])
+
+
+

OUTPUT +

+
             gdpPercap_1962  gdpPercap_1967  gdpPercap_1972
+country
+Italy           8243.582340    10022.401310    12269.273780
+Montenegro      4649.593785     5907.850937     7778.414017
+Netherlands    12790.849560    15363.251360    18794.745670
+Norway         13450.401510    16361.876470    18965.055510
+Poland          5338.752143     6557.152776     8006.506993
+
+

In the above code, we discover that slicing using +loc is inclusive at both ends, which differs from +slicing using iloc, where slicing +indicates everything up to but not including the final index.
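The difference between the two kinds of slice shows up clearly on a very small table. The dataframe below is made up for illustration, with labels a to d at positions 0 to 3.

```python
import pandas as pd

example = pd.DataFrame(
    {'value': [10, 20, 30, 40]},
    index=['a', 'b', 'c', 'd'],
)

by_label = example.loc['b':'c']      # label slice: includes both 'b' and 'c'
by_position = example.iloc[1:2]      # position slice: starts at 1, stops before 2

print(list(by_label.index))
print(list(by_position.index))
```

Here 'b' sits at position 1 and 'c' at position 2, so the label slice returns two rows while the position slice returns only one.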

+

Result of slicing can be used in further operations. +

+
+
    +
  • Usually don’t just print a slice.
  • +
  • All the statistical operators that work on entire dataframes work +the same way on slices.
  • +
  • E.g., calculate max of a slice.
  • +
+
+

PYTHON +

+
print(data.loc['Italy':'Poland', 'gdpPercap_1962':'gdpPercap_1972'].max())
+
+
+

OUTPUT +

+
gdpPercap_1962    13450.40151
+gdpPercap_1967    16361.87647
+gdpPercap_1972    18965.05551
+dtype: float64
+
+
+

PYTHON +

+
print(data.loc['Italy':'Poland', 'gdpPercap_1962':'gdpPercap_1972'].min())
+
+
+

OUTPUT +

+
gdpPercap_1962    4649.593785
+gdpPercap_1967    5907.850937
+gdpPercap_1972    7778.414017
+dtype: float64
+
+

Use comparisons to select data based on value. +

+
+
    +
  • Comparison is applied element by element.
  • +
  • Returns a similarly-shaped dataframe of True and +False.
  • +
+
+

PYTHON +

+
# Use a subset of data to keep output readable.
+subset = data.loc['Italy':'Poland', 'gdpPercap_1962':'gdpPercap_1972']
+print('Subset of data:\n', subset)
+
+# Which values were greater than 10000 ?
+print('\nWhere are values large?\n', subset > 10000)
+
+
+

OUTPUT +

+
Subset of data:
+             gdpPercap_1962  gdpPercap_1967  gdpPercap_1972
+country
+Italy           8243.582340    10022.401310    12269.273780
+Montenegro      4649.593785     5907.850937     7778.414017
+Netherlands    12790.849560    15363.251360    18794.745670
+Norway         13450.401510    16361.876470    18965.055510
+Poland          5338.752143     6557.152776     8006.506993
+
+Where are values large?
+            gdpPercap_1962 gdpPercap_1967 gdpPercap_1972
+country
+Italy                False           True           True
+Montenegro           False          False          False
+Netherlands           True           True           True
+Norway                True           True           True
+Poland               False          False          False
+
+

Select values or NaN using a Boolean mask. +

+
+
    +
  • A frame full of Booleans is sometimes called a mask because +of how it can be used.
  • +
+
+

PYTHON +

+
mask = subset > 10000
+print(subset[mask])
+
+
+

OUTPUT +

+
             gdpPercap_1962  gdpPercap_1967  gdpPercap_1972
+country
+Italy                   NaN     10022.40131     12269.27378
+Montenegro              NaN             NaN             NaN
+Netherlands     12790.84956     15363.25136     18794.74567
+Norway          13450.40151     16361.87647     18965.05551
+Poland                  NaN             NaN             NaN
+
+
    +
  • Get the value where the mask is true, and NaN (Not a Number) where +it is false.
  • +
  • Useful because NaNs are ignored by operations like max, min, +average, etc.
  • +
+
+

PYTHON +

+
print(subset[subset > 10000].describe())
+
+
+

OUTPUT +

+
       gdpPercap_1962  gdpPercap_1967  gdpPercap_1972
+count        2.000000        3.000000        3.000000
+mean     13120.625535    13915.843047    16676.358320
+std        466.373656     3408.589070     3817.597015
+min      12790.849560    10022.401310    12269.273780
+25%      12955.737547    12692.826335    15532.009725
+50%      13120.625535    15363.251360    18794.745670
+75%      13285.513523    15862.563915    18879.900590
+max      13450.401510    16361.876470    18965.055510
+
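The way NaNs are skipped can be checked by hand on a single column. This sketch uses three values rounded from the subset above (Italy, Netherlands and Norway in 1962), typed in directly so that it runs without the data file.

```python
import pandas as pd

small = pd.DataFrame(
    {'gdpPercap_1962': [8243.58, 12790.85, 13450.40]},
    index=['Italy', 'Netherlands', 'Norway'],
)

masked = small[small > 10000]        # Italy becomes NaN, the others are kept
column = masked['gdpPercap_1962']

print(masked)
print('values kept:', column.count())          # NaN is not counted
print('mean of kept values:', column.mean())   # NaN is left out of the mean
print('largest kept value:', column.max())
```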
+

Group By: split-apply-combine +

+
+

Pandas’ vectorized methods and grouping operations give users a lot of flexibility when analysing their data.

+

For instance, suppose we want a clearer view of how the +European countries are distributed according to their GDP.

+
    +
  1. We can get a first impression by splitting the countries into two groups for +each year surveyed: those with a GDP higher than the +European average and those with a lower GDP.
  2. +
  3. We then estimate a wealth score based on the historical +(from 1952 to 2007) values, computed as the fraction of surveyed years in which a country +was in the group with higher +GDP.
  4. +
+
+

PYTHON +

+
mask_higher = data > data.mean()
+wealth_score = mask_higher.aggregate('sum', axis=1) / len(data.columns)
+print(wealth_score)
+
+
+

OUTPUT +

+
country
+Albania                   0.000000
+Austria                   1.000000
+Belgium                   1.000000
+Bosnia and Herzegovina    0.000000
+Bulgaria                  0.000000
+Croatia                   0.000000
+Czech Republic            0.500000
+Denmark                   1.000000
+Finland                   1.000000
+France                    1.000000
+Germany                   1.000000
+Greece                    0.333333
+Hungary                   0.000000
+Iceland                   1.000000
+Ireland                   0.333333
+Italy                     0.500000
+Montenegro                0.000000
+Netherlands               1.000000
+Norway                    1.000000
+Poland                    0.000000
+Portugal                  0.000000
+Romania                   0.000000
+Serbia                    0.000000
+Slovak Republic           0.000000
+Slovenia                  0.333333
+Spain                     0.333333
+Sweden                    1.000000
+Switzerland               1.000000
+Turkey                    0.000000
+United Kingdom            1.000000
+dtype: float64
+
+

Finally, for each group in the wealth_score table, we +sum the GDP per capita of its member countries for each year surveyed by +chaining the groupby and sum methods:

+
+

PYTHON +

+
print(data.groupby(wealth_score).sum())
+
+
+

OUTPUT +

+
          gdpPercap_1952  gdpPercap_1957  gdpPercap_1962  gdpPercap_1967  \
+0.000000    36916.854200    46110.918793    56850.065437    71324.848786
+0.333333    16790.046878    20942.456800    25744.935321    33567.667670
+0.500000    11807.544405    14505.000150    18380.449470    21421.846200
+1.000000   104317.277560   127332.008735   149989.154201   178000.350040
+
+          gdpPercap_1972  gdpPercap_1977  gdpPercap_1982  gdpPercap_1987  \
+0.000000    88569.346898   104459.358438   113553.768507   119649.599409
+0.333333    45277.839976    53860.456750    59679.634020    64436.912960
+0.500000    25377.727380    29056.145370    31914.712050    35517.678220
+1.000000   215162.343140   241143.412730   263388.781960   296825.131210
+
+          gdpPercap_1992  gdpPercap_1997  gdpPercap_2002  gdpPercap_2007
+0.000000    92380.047256   103772.937598   118590.929863   149577.357928
+0.333333    67918.093220    80876.051580   102086.795210   122803.729520
+0.500000    36310.666080    40723.538700    45564.308390    51403.028210
+1.000000   315238.235970   346930.926170   385109.939210   427850.333420
+
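The same split-apply-combine steps can be traced by hand on a table small enough to check. This is a sketch on made-up values; the names `toy`, `score`, and `totals` are illustrative assumptions.

```python
import pandas as pd

# Hypothetical toy data: four "countries", two "years".
toy = pd.DataFrame(
    {'gdp_1952': [1.0, 2.0, 3.0, 6.0],
     'gdp_1957': [10.0, 40.0, 20.0, 30.0]},
    index=['A', 'B', 'C', 'D'],
)

# Split: fraction of years in which each row is above the column mean.
score = (toy > toy.mean()).sum(axis=1) / len(toy.columns)
print(score)

# Apply and combine: rows sharing a score are summed together.
totals = toy.groupby(score).sum()
print(totals)
```

Here rows A and C share the score 0.0, so their values are added together, while B (0.5) and D (1.0) each form a group of one.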
+
+
+ +
+
+

Selection of Individual Values

+
+

Assume Pandas has been imported into your notebook and the Gapminder +GDP data for Europe has been loaded:

+
+

PYTHON +

+
import pandas as pd
+
+data_europe = pd.read_csv('data/gapminder_gdp_europe.csv', index_col='country')
+
+

Write an expression to find the Per Capita GDP of Serbia in 2007.

+
+
+
+
+
+ +
+
+

The selection can be done by using the labels for both the row +(“Serbia”) and the column (“gdpPercap_2007”):

+
+

PYTHON +

+
print(data_europe.loc['Serbia', 'gdpPercap_2007'])
+
+

The output is

+
+

OUTPUT +

+
9786.534714
+
+
+
+
+
+
+
+ +
+
+

Extent of Slicing

+
+
    +
  1. Do the two statements below produce the same output?
  2. +
  3. Based on this, what rule governs what is included (or not) in +numerical slices and named slices in Pandas?
  4. +
+
+

PYTHON +

+
print(data_europe.iloc[0:2, 0:2])
+print(data_europe.loc['Albania':'Belgium', 'gdpPercap_1952':'gdpPercap_1962'])
+
+
+
+
+
+
+ +
+
+

No, they do not produce the same output! The output of the first +statement is:

+
+

OUTPUT +

+
        gdpPercap_1952  gdpPercap_1957
+country
+Albania     1601.056136     1942.284244
+Austria     6137.076492     8842.598030
+
+

The second statement gives:

+
+

OUTPUT +

+
        gdpPercap_1952  gdpPercap_1957  gdpPercap_1962
+country
+Albania     1601.056136     1942.284244     2312.888958
+Austria     6137.076492     8842.598030    10750.721110
+Belgium     8343.105127     9714.960623    10991.206760
+
+

Clearly, the second statement produces an additional column and an +additional row compared to the first statement.
+What conclusion can we draw? We see that a numerical slice, 0:2, +omits the final index (i.e. index 2) in the range provided, +while a named slice, ‘gdpPercap_1952’:‘gdpPercap_1962’, +includes the final element.
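The rule can be confirmed on a tiny frame where the expected shapes are easy to count. The frame `toy` and its labels are made up for illustration.

```python
import pandas as pd

# Hypothetical 3x3 frame with simple labels.
toy = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    index=['a', 'b', 'c'],
    columns=['x', 'y', 'z'],
)

# Positional slice: stops before position 2 on both axes.
by_position = toy.iloc[0:2, 0:2]
# Label slice: includes the end labels 'c' and 'z'.
by_label = toy.loc['a':'c', 'x':'z']

print(by_position.shape)
print(by_label.shape)
```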

+
+
+
+
+
+
+ +
+
+

Reconstructing Data

+
+

Explain what each line in the following short program does: what is +in first, second, etc.?

+
+

PYTHON +

+
first = pd.read_csv('data/gapminder_all.csv', index_col='country')
+second = first[first['continent'] == 'Americas']
+third = second.drop('Puerto Rico')
+fourth = third.drop('continent', axis = 1)
+fourth.to_csv('result.csv')
+
+
+
+
+
+
+ +
+
+

Let’s go through this piece of code line by line.

+
+

PYTHON +

+
first = pd.read_csv('data/gapminder_all.csv', index_col='country')
+
+

This line loads the dataset containing the GDP data from all +countries into a dataframe called first. The +index_col='country' parameter selects which column to use +as the row labels in the dataframe.

+
+

PYTHON +

+
second = first[first['continent'] == 'Americas']
+
+

This line makes a selection: only those rows of first +for which the ‘continent’ column matches ‘Americas’ are extracted. +Notice how the Boolean expression inside the brackets, +first['continent'] == 'Americas', is used to select only +those rows where the expression is true. Try printing this expression! +Can you print also its individual True/False elements? (hint: first +assign the expression to a variable)

+
+

PYTHON +

+
third = second.drop('Puerto Rico')
+
+

As the syntax suggests, this line drops the row from +second where the label is ‘Puerto Rico’. The resulting +dataframe third has one row less than the original +dataframe second.

+
+

PYTHON +

+
fourth = third.drop('continent', axis = 1)
+
+

Again we apply the drop function, but in this case we are dropping +not a row but a whole column. To accomplish this, we need to specify +also the axis parameter: axis = 1 tells Pandas to look for the +label among the columns rather than among the rows (axis 0, the default).
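A short sketch of how `drop` treats rows and columns, using a made-up two-row frame (`toy` and its values are assumptions for illustration). The `columns=` keyword is an alternative spelling that some readers find clearer than `axis=1`.

```python
import pandas as pd

# Hypothetical toy data.
toy = pd.DataFrame(
    {'continent': ['Americas', 'Americas'], 'gdp': [1.0, 2.0]},
    index=['Canada', 'Chile'],
)

# Default axis=0: the label is looked up among the row labels.
no_row = toy.drop('Chile')
# axis=1: the label is looked up among the column labels.
no_col = toy.drop('continent', axis=1)
# Equivalent to the line above, naming the axis explicitly.
same_col = toy.drop(columns='continent')

print(no_row)
print(no_col)
```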

+
+

PYTHON +

+
fourth.to_csv('result.csv')
+
+

The final step is to write the data that we have been working on to a +csv file. Pandas makes this easy with the to_csv() +function. The only required argument to the function is the filename. +Note that the file will be written in the directory from which you +started the Jupyter or Python session.

+
+
+
+
+
+
+ +
+
+

Selecting Indices

+
+

Explain in simple terms what idxmin and +idxmax do in the short program below. When would you use +these methods?

+
+

PYTHON +

+
data = pd.read_csv('data/gapminder_gdp_europe.csv', index_col='country')
+print(data.idxmin())
+print(data.idxmax())
+
+
+
+
+
+
+ +
+
+

For each column in data, idxmin returns +the index label of the row that holds that column’s minimum; +idxmax does the same for each column’s +maximum value.

+

You can use these methods whenever you want the row label of +the minimum/maximum value rather than the minimum/maximum value itself.
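To see the difference between a minimum and the label of a minimum, here is a sketch on a made-up frame (`toy` is an illustrative assumption):

```python
import pandas as pd

# Hypothetical toy data.
toy = pd.DataFrame(
    {'gdp_1952': [3.0, 1.0, 2.0],
     'gdp_1957': [5.0, 6.0, 4.0]},
    index=['A', 'B', 'C'],
)

print(toy.min())      # the smallest value in each column
print(toy.idxmin())   # the row label where that value occurs
print(toy.idxmax())   # the row label of each column's largest value
```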

+
+
+
+
+
+
+ +
+
+

Practice with Selection

+
+

Assume Pandas has been imported and the Gapminder GDP data for Europe +has been loaded. Write an expression to select each of the +following:

+
    +
  1. GDP per capita for all countries in 1982.
  2. +
  3. GDP per capita for Denmark for all years.
  4. +
  5. GDP per capita for all countries for years after 1985.
  6. +
  7. GDP per capita for each country in 2007 as a multiple of GDP per +capita for that country in 1952.
  8. +
+
+
+
+
+
+ +
+
+

1:

+
+

PYTHON +

+
data['gdpPercap_1982']
+
+

2:

+
+

PYTHON +

+
data.loc['Denmark',:]
+
+

3:

+
+

PYTHON +

+
data.loc[:,'gdpPercap_1985':]
+
+

No column named gdpPercap_1985 actually exists, but Pandas does not +give you an error: because the column labels are in sorted order, the slice +simply starts at the first label that sorts after +gdpPercap_1985. This is useful if new +columns are added to the CSV file later.
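A minimal sketch of slicing from a label that is not present, assuming (as in the Gapminder files) that the column labels are in sorted order. The one-row frame `toy` is made up for illustration.

```python
import pandas as pd

# Hypothetical frame whose column labels are sorted.
toy = pd.DataFrame(
    [[1.0, 2.0, 3.0]],
    index=['Denmark'],
    columns=['gdpPercap_1982', 'gdpPercap_1987', 'gdpPercap_1992'],
)

# 'gdpPercap_1985' is not a column, but the labels are sorted,
# so the slice starts at the first label that sorts after it.
after_1985 = toy.loc[:, 'gdpPercap_1985':]
print(list(after_1985.columns))
```

If the column labels were not sorted, the same slice would raise a `KeyError`, so this shortcut depends on the ordering of the labels.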

+

4:

+
+

PYTHON +

+
data['gdpPercap_2007']/data['gdpPercap_1952']
+
+
+
+
+
+
+
+ +
+
+

Many Ways of Access

+
+

There are at least two ways of accessing a value or slice of a +DataFrame: by name or index. However, there are many others. For +example, a single column or row can be accessed either as a +DataFrame or a Series object.

+

Suggest different ways of doing the following operations on a +DataFrame:

+
    +
  1. Access a single column
  2. +
  3. Access a single row
  4. +
  5. Access an individual DataFrame element
  6. +
  7. Access several columns
  8. +
  9. Access several rows
  10. +
  11. Access a subset of specific rows and columns
  12. +
  13. Access a subset of row and column ranges
  14. +
+
+
+
+
+
+ +
+
+

1. Access a single column:

+
+

PYTHON +

+
# by name
+data["col_name"]   # as a Series
+data[["col_name"]] # as a DataFrame
+
+# by name using .loc
+data.T.loc["col_name"]  # as a Series
+data.T.loc[["col_name"]].T  # as a DataFrame
+
+# Dot notation (Series)
+data.col_name
+
+# by index (iloc)
+data.iloc[:, col_index]   # as a Series
+data.iloc[:, [col_index]] # as a DataFrame
+
+# using a mask
+data.T[data.T.index == "col_name"].T
+
+

2. Access a single row:

+
+

PYTHON +

+
# by name using .loc
+data.loc["row_name"] # as a Series
+data.loc[["row_name"]] # as a DataFrame
+
+# by name
+data.T["row_name"] # as a Series
+data.T[["row_name"]].T # as a DataFrame
+
+# by index
+data.iloc[row_index]   # as a Series
+data.iloc[[row_index]]   # as a DataFrame
+
+# using mask
+data[data.index == "row_name"]
+
+

3. Access an individual DataFrame element:

+
+

PYTHON +

+
# by column/row names
+data["col_name"]["row_name"]         # as a value
+
+data[["col_name"]].loc["row_name"]  # as a Series
+data[["col_name"]].loc[["row_name"]]  # as a DataFrame
+
+data.loc["row_name"]["col_name"]  # as a value
+data.loc[["row_name"]]["col_name"]  # as a Series
+data.loc[["row_name"]][["col_name"]]  # as a DataFrame
+
+data.loc["row_name", "col_name"]  # as a value
+data.loc[["row_name"], "col_name"]  # as a Series. Preserves index. Column name is moved to `.name`.
+data.loc["row_name", ["col_name"]]  # as a Series. Row label is moved to `.name`. Index is set to the column name.
+data.loc[["row_name"], ["col_name"]]  # as a DataFrame (preserves original index and column name)
+
+# by column/row names: Dot notation
+data.col_name.row_name
+
+# by column/row indices
+data.iloc[row_index, col_index] # as a value
+data.iloc[[row_index], col_index] # as a Series. Preserves index. Column name is moved to `.name`
+data.iloc[row_index, [col_index]] # as a Series. Row label is moved to `.name`. Index is set to the column name.
+data.iloc[[row_index], [col_index]] # as a DataFrame (preserves original index and column name)
+
+# column name + row index
+data["col_name"][row_index]
+data.col_name[row_index]
+data["col_name"].iloc[row_index]
+
+# column index + row name
+data.iloc[:, [col_index]].loc["row_name"]  # as a Series
+data.iloc[:, [col_index]].loc[["row_name"]]  # as a DataFrame
+
+# using masks
+data[data.index == "row_name"].T[data.T.index == "col_name"].T
+
+

4. Access several columns:

+
+

PYTHON +

+
# by name
+data[["col1", "col2", "col3"]]
+data.loc[:, ["col1", "col2", "col3"]]
+
+# by index
+data.iloc[:, [col1_index, col2_index, col3_index]]
+
+

5. Access several rows

+
+

PYTHON +

+
# by name
+data.loc[["row1", "row2", "row3"]]
+
+# by index
+data.iloc[[row1_index, row2_index, row3_index]]
+
+

6. Access a subset of specific rows and columns

+
+

PYTHON +

+
# by names
+data.loc[["row1", "row2", "row3"], ["col1", "col2", "col3"]]
+
+# by indices
+data.iloc[[row1_index, row2_index, row3_index], [col1_index, col2_index, col3_index]]
+
+# column names + row indices
+data[["col1", "col2", "col3"]].iloc[[row1_index, row2_index, row3_index]]
+
+# column indices + row names
+data.iloc[:, [col1_index, col2_index, col3_index]].loc[["row1", "row2", "row3"]]
+
+

7. Access a subset of row and column ranges

+
+

PYTHON +

+
# by name
+data.loc["row1":"row2", "col1":"col2"]
+
+# by index
+data.iloc[row1_index:row2_index, col1_index:col2_index]
+
+# column names + row indices
+data.loc[:, "col1_name":"col2_name"].iloc[row1_index:row2_index]
+
+# column indices + row names
+data.iloc[:, col1_index:col2_index].loc["row1":"row2"]
+
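The "as a Series", "as a DataFrame", and "as a value" comments in these solutions can be verified by inspecting the type and shape of each result. The sketch below uses a made-up frame; `toy` and its labels are illustrative assumptions.

```python
import pandas as pd

# Hypothetical toy data.
toy = pd.DataFrame(
    {'col_name': [1.0, 2.0], 'other': [3.0, 4.0]},
    index=['row_name', 'row_2'],
)

as_series = toy['col_name']               # single brackets: one-dimensional
as_frame = toy[['col_name']]              # list of labels: stays two-dimensional
as_value = toy.loc['row_name', 'col_name']  # two scalar labels: a single value

print(type(as_series).__name__, as_series.shape)
print(type(as_frame).__name__, as_frame.shape)
print(as_value)
```

As a rule of thumb, passing a list of labels keeps that dimension, while passing a single label drops it.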
+
+
+
+
+
+
+ +
+
+

Exploring available methods using the +dir() function

+
+

Python includes a dir() function that can be used to +display all of the available methods (functions) that are built into a +data object. In Episode 4, we used some methods with a string. But we +can see many more are available by using dir():

+
+

PYTHON +

+
my_string = 'Hello world!'   # creation of a string object 
+dir(my_string)
+
+

This command returns:

+
+

PYTHON +

+
['__add__',
+...
+'__subclasshook__',
+'capitalize',
+'casefold',
+'center',
+...
+'upper',
+'zfill']
+
+

You can use help() or Shift+Tab to +get more information about what these methods do.
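The output of `dir()` is an ordinary list of strings, so it can be filtered. As a sketch, the snippet below keeps only the names that do not start with an underscore; the variable name `public_names` is an illustrative choice.

```python
my_string = 'Hello world!'

# Keep only the names that do not start with an underscore.
public_names = [name for name in dir(my_string) if not name.startswith('_')]

print(public_names)
print('upper' in public_names)

# The docstring gives a short description of what a method does.
print(str.upper.__doc__)
```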

+

Assume Pandas has been imported and the Gapminder GDP data for Europe +has been loaded as data. Then, use dir() to +find the function that prints out the median per-capita GDP across all +European countries for each year that information is available.

+
+
+
+
+
+ +
+
+

Among many choices, dir() lists the +median() function as a possibility. Thus,

+
+

PYTHON +

+
data.median()
+
+
+
+
+
+
+
+ +
+
+

Interpretation

+
+

Poland’s borders have been stable since 1945, but changed several +times in the years before then. How would you handle this if you were +creating a table of GDP per capita for Poland for the entire twentieth +century?

+
+
+
+
+
+ +
+
+

Key Points

+
+
    +
  • Use DataFrame.iloc[..., ...] to select values by +integer location.
  • +
  • Use : on its own to mean all columns or all rows.
  • +
  • Select multiple columns or rows using DataFrame.loc and +a named slice.
  • +
  • Result of slicing can be used in further operations.
  • +
  • Use comparisons to select data based on value.
  • +
  • Select values or NaN using a Boolean mask.
  • +
+
+
+
+

Content from Plotting

+
+

Last updated on 2024-10-18

+
+ +
+
+

Overview

+
+
+
+
+

Questions

+
    +
  • How can I plot my data?
  • +
  • How can I save my plot for publishing?
  • +
+
+
+
+
+
+
+

Objectives

+
    +
  • Create a time series plot showing a single data set.
  • +
  • Create a scatter plot showing relationship between two data +sets.
  • +
+
+
+
+
+
+

+matplotlib is the +most widely used scientific plotting library in Python. +

+
+
    +
  • Commonly use a sub-library called matplotlib.pyplot.
  • +
  • The Jupyter Notebook will render plots inline by default.
  • +
+
+

PYTHON +

+
import matplotlib.pyplot as plt
+
+
    +
  • Simple plots are then (fairly) simple to create.
  • +
+
+

PYTHON +

+
time = [0, 1, 2, 3]
+position = [0, 100, 200, 300]
+
+plt.plot(time, position)
+plt.xlabel('Time (hr)')
+plt.ylabel('Position (km)')
+
+
A line chart showing time (hr) relative to position (km), using the values provided in the code block above. By default, the plotted line is blue against a white background, and the axes have been scaled automatically to fit the range of the input data.
+
+ +
+
+

Display All Open Figures

+
+

In our Jupyter Notebook example, running the cell should generate the +figure directly below the code. The figure is also included in the +Notebook document for future viewing. However, other Python environments +like an interactive Python session started from a terminal or a Python +script executed via the command line require an additional command to +display the figure.

+

Instruct matplotlib to show a figure:

+
+

PYTHON +

+
plt.show()
+
+

This command can also be used within a Notebook - for instance, to +display multiple figures if several are created by a single cell.

+
+
+
+

Plot data directly from a Pandas dataframe. +

+
+
    +
  • We can also plot Pandas +dataframes.
  • +
  • Before plotting, we convert the column headings from a +string to integer data type, since they +represent numerical values, using str.replace() +to remove the gdpPercap_ prefix and then astype(int) +to convert the series of string values +(['1952', '1957', ..., '2007']) to a series of integers: +[1952, 1957, ..., 2007].
  • +
+
+

PYTHON +

+
import pandas as pd
+
+data = pd.read_csv('data/gapminder_gdp_oceania.csv', index_col='country')
+
+# Extract the year from each column name.
+# The current column names are structured as 'gdpPercap_(year)', 
+# so we want to keep the (year) part only for clarity when plotting GDP vs. years
+# To do this we use replace(), which substitutes an empty string for the prefix given in the argument
+# Column names are strings, so we use replace() from the Pandas .str vectorized string functions
+
+years = data.columns.str.replace('gdpPercap_', '')
+
+# Convert year values to integers, saving results back to dataframe
+
+data.columns = years.astype(int)
+
+data.loc['Australia'].plot()
+
+
GDP plot for Australia

Select and transform data, then plot it. +

+
+
    +
  • By default, DataFrame.plot +plots with the rows as the X axis.
  • +
  • We can transpose the data in order to plot multiple series.
  • +
+
+

PYTHON +

+
data.T.plot()
+plt.ylabel('GDP per capita')
+
+
GDP plot for Australia and New Zealand

Many styles of plot are available. +

+
+
    +
  • For example, do a bar plot using a fancier style.
  • +
+
+

PYTHON +

+
plt.style.use('ggplot')
+data.T.plot(kind='bar')
+plt.ylabel('GDP per capita')
+
+
GDP barplot for Australia

Data can also be plotted by calling the matplotlib +plot function directly. +

+
+
    +
  • The command is plt.plot(x, y) +
  • +
  • The color and format of markers can also be specified as an +additional optional argument e.g., b- is a blue line, +g-- is a green dashed line.
  • +

Get Australia data from dataframe +

+
+
+

PYTHON +

+
years = data.columns
+gdp_australia = data.loc['Australia']
+
+plt.plot(years, gdp_australia, 'g--')
+
+
GDP formatted plot for Australia

Can plot many sets of data together. +

+
+
+

PYTHON +

+
# Select two countries' worth of data.
+gdp_australia = data.loc['Australia']
+gdp_nz = data.loc['New Zealand']
+
+# Plot with differently-colored lines.
+plt.plot(years, gdp_australia, 'b-', label='Australia')
+plt.plot(years, gdp_nz, 'g-', label='New Zealand')
+
+# Create legend.
+plt.legend(loc='upper left')
+plt.xlabel('Year')
+plt.ylabel('GDP per capita ($)')
+
+
+
+ +
+
+

Adding a Legend

+
+

Often when plotting multiple datasets on the same figure it is +desirable to have a legend describing the data.

+

This can be done in matplotlib in two stages:

+
    +
  • Provide a label for each dataset in the figure:
  • +
+
+

PYTHON +

+
plt.plot(years, gdp_australia, label='Australia')
+plt.plot(years, gdp_nz, label='New Zealand')
+
+
    +
  • Instruct matplotlib to create the legend.
  • +
+
+

PYTHON +

+
plt.legend()
+
+

By default, matplotlib will attempt to place the legend in a suitable +position. If you would rather specify a position, this can be done with +the loc= argument, e.g., to place the legend in the upper +left corner of the plot, specify loc='upper left'.

+
+
+
+
GDP formatted plot for Australia and New Zealand
    +
  • Plot a scatter plot correlating the GDP of Australia and New +Zealand
  • +
  • Use either plt.scatter or +DataFrame.plot.scatter +
  • +
+
+

PYTHON +

+
plt.scatter(gdp_australia, gdp_nz)
+
+
GDP correlation using plt.scatter
+

PYTHON +

+
data.T.plot.scatter(x = 'Australia', y = 'New Zealand')
+
+
GDP correlation using data.T.plot.scatter
+
+ +
+
+

Minima and Maxima

+
+

Fill in the blanks below to plot the minimum GDP per capita over time +for all the countries in Europe. Modify it again to plot the maximum GDP +per capita over time for Europe.

+
+

PYTHON +

+
data_europe = pd.read_csv('data/gapminder_gdp_europe.csv', index_col='country')
+data_europe.____.plot(label='min')
+data_europe.____
+plt.legend(loc='best')
+plt.xticks(rotation=90)
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
data_europe = pd.read_csv('data/gapminder_gdp_europe.csv', index_col='country')
+data_europe.min().plot(label='min')
+data_europe.max().plot(label='max')
+plt.legend(loc='best')
+plt.xticks(rotation=90)
+
+
Minima Maxima Solution
+
+
+
+
+
+
+ +
+
+

Correlations

+
+

Modify the example in the notes to create a scatter plot showing the +relationship between the minimum and maximum GDP per capita among the +countries in Asia for each year in the data set. What relationship do +you see (if any)?

+
+
+
+
+
+ +
+
+
+

PYTHON +

+
data_asia = pd.read_csv('data/gapminder_gdp_asia.csv', index_col='country')
+data_asia.describe().T.plot(kind='scatter', x='min', y='max')
+
+
Correlations Solution 1

No particular correlations can be seen between the minimum and +maximum GDP values year on year. It seems the fortunes of Asian +countries do not rise and fall together.

+
+
+
+
+
+
+ +
+
+

Correlations (continued) +

+
+

You might note that the variability in the maximum is much higher +than that of the minimum. Take a look at the maximum and the max +indexes:

+
+

PYTHON +

+
data_asia = pd.read_csv('data/gapminder_gdp_asia.csv', index_col='country')
+data_asia.max().plot()
+print(data_asia.idxmax())
+print(data_asia.idxmin())
+
+
+
+
+
+
+ +
+
+
Correlations Solution 2

The variability in this value seems to be due to a sharp drop after +1972. Some geopolitics at play perhaps? Given the dominance of oil-producing +countries, maybe the Brent crude index would make an +interesting comparison? Whilst Myanmar consistently has the lowest GDP, +the highest GDP nation has varied more notably.

+
+
+
+
+
+
+ +
+
+

More Correlations

+
+

This short program creates a plot showing the correlation between GDP +and life expectancy for 2007, normalizing marker size by population:

+
+

PYTHON +

+
data_all = pd.read_csv('data/gapminder_all.csv', index_col='country')
+data_all.plot(kind='scatter', x='gdpPercap_2007', y='lifeExp_2007',
+              s=data_all['pop_2007']/1e6)
+
+

Using online help and other resources, explain what each argument to +plot does.

+
+
+
+
+
+ +
+
+
More Correlations Solution

A good place to look is the documentation for the plot function - +help(data_all.plot).

+

kind - As seen already this determines the kind of plot to be +drawn.

+

x and y - A column name or index that determines what data will be +placed on the x and y axes of the plot

+

s - Details for this can be found in the documentation of +plt.scatter. A single number or one value for each data point. +Determines the size of the plotted points.

+
+
+
+
+
+
+ +
+
+

Saving your plot to a file

+
+

If you are satisfied with the plot you see you may want to save it to +a file, perhaps to include it in a publication. There is a function in +the matplotlib.pyplot module that accomplishes this: savefig. +Calling this function, e.g. with

+
+

PYTHON +

+
plt.savefig('my_figure.png')
+
+

will save the current figure to the file my_figure.png. +The file format will automatically be deduced from the file name +extension (other formats are pdf, ps, eps and svg).

+

Note that functions in plt refer to a global figure +variable and after a figure has been displayed to the screen (e.g. with +plt.show) matplotlib will make this variable refer to a new +empty figure. Therefore, make sure you call plt.savefig +before the plot is displayed to the screen, otherwise you may find a +file with an empty plot.

+

When using dataframes, data is often generated and plotted to screen +in one line. In addition to using plt.savefig, we can save +a reference to the current figure in a local variable (with +plt.gcf) and call the savefig method +on that variable to save the figure to file.

+
+

PYTHON +

+
data.plot(kind='bar')
+fig = plt.gcf() # get current figure
+fig.savefig('my_figure.png')
+
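Another way to avoid depending on the global "current figure" is to create the figure explicitly and keep the reference from the start. This is a sketch, not the lesson's required approach: `plt.subplots` returns a figure and an axes object, and the temporary output directory is an assumption made here so that no existing file is overwritten.

```python
import os
import tempfile

import matplotlib.pyplot as plt

# Create the figure explicitly and keep our own reference to it.
fig, ax = plt.subplots()
ax.plot([0, 1, 2, 3], [0, 100, 200, 300])
ax.set_xlabel('Time (hr)')
ax.set_ylabel('Position (km)')

# Hypothetical output location: a fresh temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), 'my_figure.png')

# Saving through the figure object does not depend on which
# figure matplotlib currently regards as the active one.
fig.savefig(out_path)
print(os.path.exists(out_path))
```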
+
+
+
+
+
+ +
+
+

Making your plots accessible

+
+

Whenever you are generating plots to go into a paper or a +presentation, there are a few things you can do to make sure that +everyone can understand your plots.

+
    +
  • Always make sure your text is large enough to read. Use the +fontsize parameter in xlabel, +ylabel, title, and legend, and tick_params +with labelsize to increase the text size of the numbers +on your axes.
  • +
  • Similarly, you should make your graph elements easy to see. Use +s to increase the size of your scatterplot markers and +linewidth to increase the sizes of your plot lines.
  • +
  • Using color (and nothing else) to distinguish between different plot +elements will make your plots unreadable to anyone who is colorblind, or +who happens to have a black-and-white office printer. For lines, the +linestyle parameter lets you use different types of lines. +For scatterplots, marker lets you change the shape of your +points. If you’re unsure about your colors, you can use Coblis +or Color Oracle to simulate what +your plots would look like to those with colorblindness.
  • +
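The suggestions above can be combined in one plot. The sketch below uses made-up numbers (`series_a` and `series_b` are illustrative assumptions) and distinguishes the two lines by line style and marker shape as well as by color, with enlarged text throughout.

```python
import matplotlib.pyplot as plt

# Made-up values for illustration only.
years = [1952, 1957, 1962]
series_a = [10.0, 12.0, 15.0]
series_b = [9.0, 11.0, 13.0]

plt.figure()
# Different line styles and markers keep the lines distinguishable without color.
plt.plot(years, series_a, linestyle='-', marker='o', linewidth=3, label='Series A')
plt.plot(years, series_b, linestyle='--', marker='s', linewidth=3, label='Series B')

# Larger text for axis labels, title, tick labels, and legend.
plt.xlabel('Year', fontsize=14)
plt.ylabel('GDP per capita', fontsize=14)
plt.title('Two series with distinct line styles', fontsize=16)
plt.tick_params(labelsize=12)
plt.legend(fontsize=12)
```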
+
+
+
+
+
+ +
+
+

Key Points

+
+
    +
  • +matplotlib is the +most widely used scientific plotting library in Python.
  • +
  • Plot data directly from a Pandas dataframe.
  • +
  • Select and transform data, then plot it.
  • +
  • Many styles of plot are available: see the Python Graph +Gallery for more options.
  • +
  • Can plot many sets of data together.
  • +
+
+
+
+

Content from Lunch

+
+

Last updated on 2024-10-18

+
+ +
+

Over lunch, reflect on and discuss the following:

+
    +
  • What sort of packages might you use in Python and why would you use +them?
  • +
  • How would data need to be formatted to be used in Pandas data +frames? Would the data you have meet these requirements?
  • +
  • What limitations or problems might you run into when thinking about +how to apply what we’ve learned to your own projects or data?
  • +

Content from Lists

+
+

Last updated on 2024-10-18

+
+ +
+
+

Overview

+
+
+
+
+

Questions

+
    +
  • How can I store multiple values?
  • +
+
+
+
+
+
+
+

Objectives

+
    +
  • Explain why programs need collections of values.
  • +
  • Write programs that create flat lists, index them, slice them, and +modify them through assignment and method calls.
  • +
+
+
+
+
+
+

A list stores many values in a single structure. +

+
+
    +
  • Doing calculations with a hundred variables called +pressure_001, pressure_002, etc., would be at +least as slow as doing them by hand.
  • +
  • Use a list to store many values together. +
      +
    • Contained within square brackets [...].
    • +
    • Values separated by commas ,.
    • +
    +
  • +
  • Use len to find out how many values are in a list.
  • +
+
+

PYTHON +

+
pressures = [0.273, 0.275, 0.277, 0.275, 0.276]
+print('pressures:', pressures)
+print('length:', len(pressures))
+
+
+

OUTPUT +

+
pressures: [0.273, 0.275, 0.277, 0.275, 0.276]
+length: 5
+
+

Use an item’s index to fetch it from a list. +

+
+
    +
  • Just like strings.
  • +
+
+

PYTHON +

+
print('zeroth item of pressures:', pressures[0])
+print('fourth item of pressures:', pressures[4])
+
+
+

OUTPUT +

+
zeroth item of pressures: 0.273
+fourth item of pressures: 0.276
+
+

Lists’ values can be replaced by assigning to them. +

+
+
    +
  • Use an index expression on the left of assignment to replace a +value.
  • +
+
+

PYTHON +

+
pressures[0] = 0.265
+print('pressures is now:', pressures)
+
+
+

OUTPUT +

+
pressures is now: [0.265, 0.275, 0.277, 0.275, 0.276]
+
+

Appending items to a list lengthens it. +

+
+
    +
  • Use list_name.append to add items to the end of a +list.
  • +
+
+

PYTHON +

+
primes = [2, 3, 5]
+print('primes is initially:', primes)
+primes.append(7)
+print('primes has become:', primes)
+
+
+

OUTPUT +

+
primes is initially: [2, 3, 5]
+primes has become: [2, 3, 5, 7]
+
+
    +
  • +append is a method of lists. +
      +
    • Like a function, but tied to a particular object.
    • +
    +
  • +
  • Use object_name.method_name to call methods. +
      +
    • Deliberately resembles the way we refer to things in a library.
    • +
    +
  • +
  • We will meet other methods of lists as we go along. +
      +
    • Use help(list) for a preview.
    • +
    +
  • +
  • +extend is similar to append, but it allows +you to combine two lists. For example:
  • +
+
+

PYTHON +

+
teen_primes = [11, 13, 17, 19]
+middle_aged_primes = [37, 41, 43, 47]
+print('primes is currently:', primes)
+primes.extend(teen_primes)
+print('primes has now become:', primes)
+primes.append(middle_aged_primes)
+print('primes has finally become:', primes)
+
+
+

OUTPUT +

+
primes is currently: [2, 3, 5, 7]
+primes has now become: [2, 3, 5, 7, 11, 13, 17, 19]
+primes has finally become: [2, 3, 5, 7, 11, 13, 17, 19, [37, 41, 43, 47]]
+
+

Note that while extend maintains the “flat” structure of +the list, appending a list to a list means the last element in +primes will itself be a list, not an integer. Lists can +contain values of any type; therefore, lists of lists are possible.
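A short sketch of the difference, using smaller lists than the example above so the result is easy to count:

```python
primes = [2, 3, 5, 7]

primes.extend([11, 13])   # adds two separate integers
primes.append([37, 41])   # adds one item, which is itself a list

print(primes)
print(len(primes))        # six integers plus one nested list
print(primes[6])          # the nested list
print(primes[6][0])       # index twice to reach a value inside it
```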

+

Use del to remove items from a list entirely. +

+
+
    +
  • We use del list_name[index] to remove an element from a +list (in the example, 9 is not a prime number) and thus shorten it.
  • +
  • +del is not a function or a method, but a statement in +the language.
  • +
+
+

PYTHON +

+
primes = [2, 3, 5, 7, 9]
+print('primes before removing last item:', primes)
+del primes[4]
+print('primes after removing last item:', primes)
+
+
+

OUTPUT +

+
primes before removing last item: [2, 3, 5, 7, 9]
+primes after removing last item: [2, 3, 5, 7]
+
+

The empty list contains no values. +

+
+
    +
  • Use [] on its own to represent a list that doesn’t +contain any values. +
      +
    • “The zero of lists.”
    • +
    +
  • +
  • Helpful as a starting point for collecting values (which we will see +in the next episode).
  • +

Lists may contain values of different types. +

+
+
    +
  • A single list may contain numbers, strings, and anything else.
  • +
+
+

PYTHON +

+
goals = [1, 'Create lists.', 2, 'Extract items from lists.', 3, 'Modify lists.']
+
+

Character strings can be indexed like lists. +

+
+
    +
  • Get single characters from a character string using indexes in +square brackets.
  • +
+
+

PYTHON +

+
element = 'carbon'
+print('zeroth character:', element[0])
+print('third character:', element[3])
+
+
+

OUTPUT +

+
zeroth character: c
+third character: b
+
+

Character strings are immutable. +

+
+
    +
  • Cannot change the characters in a string after it has been created. +
      +
    • +Immutable: can’t be changed after creation.
    • +
    • In contrast, lists are mutable: they can be modified in +place.
    • +
    +
  • +
  • Python considers the string to be a single value with parts, not a +collection of values.
  • +
+
+

PYTHON +

+
element[0] = 'C'
+
+
+

ERROR +

+
TypeError: 'str' object does not support item assignment
+
+
    +
  • Lists and character strings are both collections.
  • +
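Since a string cannot be changed in place, the usual workaround is to build a new string. The sketch below shows two possible ways; the variable names `capitalized`, `letters`, and `rebuilt` are illustrative choices.

```python
element = 'carbon'

# Option 1: build a new string from a new first letter and a slice.
capitalized = 'C' + element[1:]
print(capitalized)

# Option 2: convert to a list (mutable), modify it, then join it back.
letters = list(element)
letters[0] = 'C'
rebuilt = ''.join(letters)
print(rebuilt)

# The original string is unchanged in both cases.
print(element)
```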

Indexing beyond the end of the collection is an error. +

+
+
    +
  • Python reports an IndexError if we attempt to access a +value that doesn’t exist. +
      +
    • This is a kind of runtime error.
    • +
    • Cannot be detected as the code is parsed because the index might be +calculated based on data.
    • +
    +
  • +
+
+

PYTHON +

+
print('99th element of element is:', element[99])
+
+
+

ERROR +

+
IndexError: string index out of range
+
+
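A possible way to avoid this error is to compare an index against len before using it. The sketch below assumes the index arrives in a variable called requested (a hypothetical name) and uses an if statement, which is covered in a later episode.

```python
element = 'carbon'

# The largest legal index is always one less than the length.
last_valid_index = len(element) - 1
print('last valid index:', last_valid_index)
print('last character:', element[last_valid_index])

requested = 99  # hypothetical index, e.g. calculated from data
if requested < len(element):
    print(element[requested])
else:
    print('index', requested, 'is out of range for', element)
```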
+
+ +
+
+

Fill in the Blanks

+
+

Fill in the blanks so that the program below produces the output +shown.

+
+

PYTHON +

+
values = ____
+values.____(1)
+values.____(3)
+values.____(5)
+print('first time:', values)
+values = values[____]
+print('second time:', values)
+
+
+

OUTPUT +

+
first time: [1, 3, 5]
+second time: [3, 5]
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
values = []
+values.append(1)
+values.append(3)
+values.append(5)
+print('first time:', values)
+values = values[1:]
+print('second time:', values)
+
+
+
+
+
+
+
+ +
+
+

How Large is a Slice?

+
+

If start and stop are both non-negative +integers, how long is the list values[start:stop]?

+
+
+
+
+
+ +
+
+

The list values[start:stop] has up to +stop - start elements. For example, +values[1:4] has the 3 elements values[1], +values[2], and values[3]. Why ‘up to’? As we +saw in episode 2, if stop +is greater than the total length of the list values, we +will still get a list back but it will be shorter than expected.
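The "up to" in the answer above can be checked with a short sketch. The list contents are made up for illustration.

```python
values = [10, 20, 30, 40, 50]

inside = values[1:4]     # stop is within the list: 4 - 1 = 3 elements
past_end = values[3:10]  # stop is beyond the end: fewer than 10 - 3 elements

print(len(inside), inside)
print(len(past_end), past_end)
```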

+
+
+
+
+
+
+ +
+
+

From Strings to Lists and Back

+
+

Given this:

+
+

PYTHON +

+
print('string to list:', list('tin'))
+print('list to string:', ''.join(['g', 'o', 'l', 'd']))
+
+
+

OUTPUT +

+
string to list: ['t', 'i', 'n']
+list to string: gold
+
+
    +
  1. What does list('some string') do?
  2. +
  3. What does '-'.join(['x', 'y', 'z']) generate?
  4. +
+
+
+
+
+
+ +
+
+
    +
  1. +list('some string') +converts a string into a list containing all of its characters.
  2. +
  3. join returns a string that concatenates the string elements of the list, placing the separator between adjacent elements. The separator is the string on which the method is called, so '-'.join(['x', 'y', 'z']) produces x-y-z.
  4. +
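A short sketch of the round trip between strings and lists, under the assumption that every list element is itself a string (join raises a TypeError otherwise):

```python
# String -> list of characters -> string again.
characters = list('tin')
rebuilt = ''.join(characters)
print(characters, '->', rebuilt)

# The string that join is called on is used as the separator.
dashed = '-'.join(['x', 'y', 'z'])
print(dashed)
```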
+
+
+
+
+
+
+ +
+
+

Working With the End

+
+

What does the following program print?

+
+

PYTHON +

+
element = 'helium'
+print(element[-1])
+
+
    +
  1. How does Python interpret a negative index?
  2. +
  3. If a list or string has N elements, what is the most negative index +that can safely be used with it, and what location does that index +represent?
  4. +
  5. If values is a list, what does +del values[-1] do?
  6. +
  7. How can you display all elements but the last one without changing +values? (Hint: you will need to combine slicing and +negative indexing.)
  8. +
+
+
+
+
+
+ +
+
+

The program prints m.

+
    +
  1. Python interprets a negative index as starting from the end (as +opposed to starting from the beginning). The last element is +-1.
  2. +
  3. The most negative index that can safely be used with a list or string of N elements is -N, which refers to the first element.
  4. +
  5. +del values[-1] removes the last element from the +list.
  6. +
  7. values[:-1]
  8. +
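These answers can be verified with a short sketch. The list values below is made up for the example.

```python
element = 'helium'
n = len(element)

# -1 and n - 1 refer to the same (last) character.
print(element[-1], element[n - 1])
# -n and 0 refer to the same (first) character.
print(element[-n], element[0])

values = [1, 3, 5, 7]
all_but_last = values[:-1]  # a new list; values itself is unchanged
print(all_but_last, values)
```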
+
+
+
+
+
+
+ +
+
+

Stepping Through a List

+
+

What does the following program print?

+
+

PYTHON +

+
element = 'fluorine'
+print(element[::2])
+print(element[::-1])
+
+
    +
  1. If we write a slice as low:high:stride, what does +stride do?
  2. +
  3. What expression would select all of the even-numbered items from a +collection?
  4. +
+
+
+
+
+
+ +
+
+

The program prints

+
+

OUTPUT +

+
furn
+eniroulf
+
+
    +
  1. +stride is the step size of the slice.
  2. +
  3. The slice 1::2 selects all even-numbered items from a +collection: it starts with element 1 (which is the second +element, since indexing starts at 0), goes on until the end +(since no end is given), and uses a step size of +2 (i.e., selects every second element).
  4. +
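The effect of the starting position on a stride of 2 can be compared side by side. This is a sketch; the variable names are chosen for illustration.

```python
element = 'fluorine'

from_index_0 = element[::2]   # indices 0, 2, 4, 6
from_index_1 = element[1::2]  # indices 1, 3, 5, 7
backwards = element[::-1]     # a negative stride walks from the end

print(from_index_0)
print(from_index_1)
print(backwards)
```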
+
+
+
+
+
+
+ +
+
+

Slice Bounds

+
+

What does the following program print?

+
+

PYTHON +

+
element = 'lithium'
+print(element[0:20])
+print(element[-1:3])
+
+
+
+
+
+
+ +
+
+
+

OUTPUT +

+
lithium
+
+

The first statement prints the whole string, since a slice that extends beyond the end of the string simply stops at the end. The second statement prints an empty string: the index -1 refers to the last character (index 6), which lies after the stop index 3, so the slice selects no characters.

+
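A small sketch of how the second slice is resolved: a negative start is counted from the end, so element[-1:3] starts at the last character and has nothing left to select before the stop. Swapping the two bounds gives a non-empty slice.

```python
element = 'lithium'

start = len(element) - 1      # the position that index -1 refers to
empty = element[-1:3]         # start (6) is already past stop (3)
same_empty = element[start:3]
non_empty = element[3:-1]     # from index 3 up to, not including, the last

print(repr(empty), repr(same_empty), repr(non_empty))
```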
+
+
+
+
+
+ +
+
+

Sort and Sorted

+
+

What do these two programs print? In simple terms, explain the +difference between sorted(letters) and +letters.sort().

+
+

PYTHON +

+
# Program A
+letters = list('gold')
+result = sorted(letters)
+print('letters is', letters, 'and result is', result)
+
+
+

PYTHON +

+
# Program B
+letters = list('gold')
+result = letters.sort()
+print('letters is', letters, 'and result is', result)
+
+
+
+
+
+
+ +
+
+

Program A prints

+
+

OUTPUT +

+
letters is ['g', 'o', 'l', 'd'] and result is ['d', 'g', 'l', 'o']
+
+

Program B prints

+
+

OUTPUT +

+
letters is ['d', 'g', 'l', 'o'] and result is None
+
+

sorted(letters) returns a sorted copy of the list letters (the original list letters remains unchanged), while letters.sort() sorts the list letters in place and returns None, which is why result is None in Program B.

+
+
+
+
+
+
+ +
+
+

Copying (or Not)

+
+

What do these two programs print? In simple terms, explain the +difference between new = old and +new = old[:].

+
+

PYTHON +

+
# Program A
+old = list('gold')
+new = old      # simple assignment
+new[0] = 'D'
+print('new is', new, 'and old is', old)
+
+
+

PYTHON +

+
# Program B
+old = list('gold')
+new = old[:]   # assigning a slice
+new[0] = 'D'
+print('new is', new, 'and old is', old)
+
+
+
+
+
+
+ +
+
+

Program A prints

+
+

OUTPUT +

+
new is ['D', 'o', 'l', 'd'] and old is ['D', 'o', 'l', 'd']
+
+

Program B prints

+
+

OUTPUT +

+
new is ['D', 'o', 'l', 'd'] and old is ['g', 'o', 'l', 'd']
+
+

new = old makes new a reference to the list +old; new and old point towards +the same object.

+

new = old[:], however, creates a new list object new containing all elements from the list old; new and old are different objects.

+
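The difference can also be checked directly with the is operator, which tests whether two names refer to the same object. This is a sketch; the names alias and copy are assumptions for the example.

```python
old = list('gold')
alias = old     # simple assignment: a second name for the same list
copy = old[:]   # slice: a new list with the same elements

print('alias is old:', alias is old)
print('copy is old:', copy is old)
print('copy == old:', copy == old)

alias[0] = 'D'  # changes the one list that both old and alias name
print(old, copy)
```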
+
+
+
+
+
+ +
+
+

Key Points

+
+
    +
  • A list stores many values in a single structure.
  • +
  • Use an item’s index to fetch it from a list.
  • +
  • Lists’ values can be replaced by assigning to them.
  • +
  • Appending items to a list lengthens it.
  • +
  • Use del to remove items from a list entirely.
  • +
  • The empty list contains no values.
  • +
  • Lists may contain values of different types.
  • +
  • Character strings can be indexed like lists.
  • +
  • Character strings are immutable.
  • +
  • Indexing beyond the end of the collection is an error.
  • +
+
+
+
+

Content from For Loops

+
+

Last updated on 2024-10-18

+
+ +
+
+

Overview

+
+
+
+
+

Questions

+
    +
  • How can I make a program do many things?
  • +
+
+
+
+
+
+
+

Objectives

+
    +
  • Explain what for loops are normally used for.
  • +
  • Trace the execution of a simple (unnested) loop and correctly state +the values of variables in each iteration.
  • +
  • Write for loops that use the Accumulator pattern to aggregate +values.
  • +
+
+
+
+
+
+

A for loop executes commands once for each value in a +collection. +

+
+
    +
  • Doing calculations on the values in a list one by one is as painful +as working with pressure_001, pressure_002, +etc.
  • +
  • A for loop tells Python to execute some statements once for +each value in a list, a character string, or some other collection.
  • +
  • “for each thing in this group, do these operations”
  • +
+
+

PYTHON +

+
for number in [2, 3, 5]:
+    print(number)
+
+
    +
  • This for loop is equivalent to:
  • +
+
+

PYTHON +

+
print(2)
+print(3)
+print(5)
+
+
    +
  • And the for loop’s output is:
  • +
+
+

OUTPUT +

+
2
+3
+5
+
+

A for loop is made up of a collection, a loop variable, +and a body. +

+
+
+

PYTHON +

+
for number in [2, 3, 5]:
+    print(number)
+
+
    +
  • The collection, [2, 3, 5], is what the loop is being +run on.
  • +
  • The body, print(number), specifies what to do for each +value in the collection.
  • +
  • The loop variable, number, is what changes for each +iteration of the loop. +
      +
    • The “current thing”.
    • +
    +
  • +

The first line of the for loop must end with a colon, +and the body must be indented. +

+
+
    +
  • The colon at the end of the first line signals the start of a +block of statements.
  • +
  • Python uses indentation rather than {} or +begin/end to show nesting. +
      +
    • Any consistent indentation is legal, but almost everyone uses four +spaces.
    • +
    +
  • +
+
+

PYTHON +

+
for number in [2, 3, 5]:
+print(number)
+
+
+

ERROR +

+
IndentationError: expected an indented block
+
+
    +
  • Indentation is always meaningful in Python.
  • +
+
+

PYTHON +

+
firstName = "Jon"
+  lastName = "Smith"
+
+
+

ERROR +

+
  File "<ipython-input-7-f65f2962bf9c>", line 2
+    lastName = "Smith"
+    ^
+IndentationError: unexpected indent
+
+
    +
  • This error can be fixed by removing the extra spaces at the +beginning of the second line.
  • +

Loop variables can be called anything. +

+
+
    +
  • As with all variables, loop variables are: +
      +
    • Created on demand.
    • +
    • Meaningless: their names can be anything at all.
    • +
    +
  • +
+
+

PYTHON +

+
for kitten in [2, 3, 5]:
+    print(kitten)
+
+

The body of a loop can contain many statements. +

+
+
    +
  • But no loop should be more than a few lines long.
  • +
  • Hard for human beings to keep larger chunks of code in mind.
  • +
+
+

PYTHON +

+
primes = [2, 3, 5]
+for p in primes:
+    squared = p ** 2
+    cubed = p ** 3
+    print(p, squared, cubed)
+
+
+

OUTPUT +

+
2 4 8
+3 9 27
+5 25 125
+
+

Use range to iterate over a sequence of numbers. +

+
+
    +
  • The built-in function range +produces a sequence of numbers. +
      +
    • +Not a list: the numbers are produced on demand to make +looping over large ranges more efficient.
    • +
    +
  • +
  • +range(N) is the numbers 0..N-1 +
      +
    • Exactly the legal indices of a list or character string of length +N
    • +
    +
  • +
+
+

PYTHON +

+
print('a range is not a list: range(0, 3)')
+for number in range(0, 3):
+    print(number)
+
+
+

OUTPUT +

+
a range is not a list: range(0, 3)
+0
+1
+2
+
+
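Since range(N) produces exactly the legal indices of a collection of length N, it can be combined with len to visit each position. The sketch below also converts a range to a list to show the numbers it produces.

```python
element = 'tin'

# range(len(element)) yields 0, 1, 2: the legal indices of 'tin'.
for index in range(len(element)):
    print(index, element[index])

# Converting to a list forces all the numbers to be produced at once.
as_list = list(range(3))
print(as_list)
```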

The Accumulator pattern turns many values into one. +

+
+
    +
  • A common pattern in programs is to: +
      +
    1. Initialize an accumulator variable to zero, the empty +string, or the empty list.
    2. +
    3. Update the variable with values from a collection.
    4. +
    +
  • +
+
+

PYTHON +

+
# Sum the first 10 integers.
+total = 0
+for number in range(10):
+    total = total + (number + 1)
+print(total)
+
+
+

OUTPUT +

+
55
+
+
    +
  • Read total = total + (number + 1) as: +
      +
    • Add 1 to the current value of the loop variable +number.
    • +
    • Add that to the current value of the accumulator variable +total.
    • +
    • Assign that to total, replacing the current value.
    • +
    +
  • +
  • We have to add number + 1 because range +produces 0..9, not 1..10.
  • +
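An alternative sketch of the same sum gives range an explicit start and stop, so that no adjustment is needed inside the loop. range(1, 11) produces 1..10 because the stop value is excluded.

```python
# Sum the first 10 integers without adjusting the loop variable.
total = 0
for number in range(1, 11):
    total = total + number
print(total)
```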
+
+
+ +
+
+

Classifying Errors

+
+

Is an indentation error a syntax error or a runtime error?

+
+
+
+
+
+ +
+
+

An IndentationError is a syntax error. Programs with syntax errors +cannot be started. A program with a runtime error will start but an +error will be thrown under certain conditions.

+
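The distinction can be illustrated with a sketch that uses the built-in compile function to check a piece of source text without running it. The try/except statement used here is not covered in this lesson and appears only to keep the example from stopping at the first error.

```python
# The body of the loop is not indented, so this text is not valid Python.
source = "for number in [2, 3, 5]:\nprint(number)\n"

try:
    compile(source, '<example>', 'exec')  # only parses, never runs the code
    outcome = 'compiled'
except SyntaxError as error:
    outcome = type(error).__name__
print('parsing the source gave:', outcome)

# IndentationError is a specific kind of SyntaxError.
print(issubclass(IndentationError, SyntaxError))
```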
+
+
+
+
+
+ +
+
+

Tracing Execution

+
+

Create a table showing the numbers of the lines that are executed +when this program runs, and the values of the variables after each line +is executed.

+
+

PYTHON +

+
total = 0
+for char in "tin":
+    total = total + 1
+
+
+
+
+
+
+ +
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Line no | Variables
1       | total = 0
2       | total = 0, char = 't'
3       | total = 1, char = 't'
2       | total = 1, char = 'i'
3       | total = 2, char = 'i'
2       | total = 2, char = 'n'
3       | total = 3, char = 'n'
+
+
+
+
+
+
+ +
+
+

Reversing a String

+
+

Fill in the blanks in the program below so that it prints “nit” (the +reverse of the original character string “tin”).

+
+

PYTHON +

+
original = "tin"
+result = ____
+for char in original:
+    result = ____
+print(result)
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
original = "tin"
+result = ""
+for char in original:
+    result = char + result
+print(result)
+
+
+
+
+
+
+
+ +
+
+

Practice Accumulating

+
+

Fill in the blanks in each of the programs below to produce the +indicated result.

+
+

PYTHON +

+
# Total length of the strings in the list: ["red", "green", "blue"] => 12
+total = 0
+for word in ["red", "green", "blue"]:
+    ____ = ____ + len(word)
+print(total)
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
total = 0
+for word in ["red", "green", "blue"]:
+    total = total + len(word)
+print(total)
+
+
+
+
+
+
+
+ +
+
+

Practice Accumulating +(continued) +

+
+
+

PYTHON +

+
# List of word lengths: ["red", "green", "blue"] => [3, 5, 4]
+lengths = ____
+for word in ["red", "green", "blue"]:
+    lengths.____(____)
+print(lengths)
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
lengths = []
+for word in ["red", "green", "blue"]:
+    lengths.append(len(word))
+print(lengths)
+
+
+
+
+
+
+
+ +
+
+

Practice Accumulating +(continued) +

+
+
+

PYTHON +

+
# Concatenate all words: ["red", "green", "blue"] => "redgreenblue"
+words = ["red", "green", "blue"]
+result = ____
+for ____ in ____:
+    ____
+print(result)
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
words = ["red", "green", "blue"]
+result = ""
+for word in words:
+    result = result + word
+print(result)
+
+
+
+
+
+
+
+ +
+
+

Practice Accumulating +(continued) +

+
+

Create an acronym: Starting from the list +["red", "green", "blue"], create the acronym +"RGB" using a for loop.

+

Hint: You may need to use a string method to +properly format the acronym.

+
+
+
+
+
+ +
+
+
+

PYTHON +

+
acronym = ""
+for word in ["red", "green", "blue"]:
+    acronym = acronym + word[0].upper()
+print(acronym)
+
+
+
+
+
+
+
+ +
+
+

Cumulative Sum

+
+

Reorder and properly indent the lines of code below so that they +print a list with the cumulative sum of data. The result should be +[1, 3, 5, 10].

+
+

PYTHON +

+
cumulative.append(total)
+for number in data:
+cumulative = []
+total = total + number
+total = 0
+print(cumulative)
+data = [1,2,2,5]
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
total = 0
+data = [1,2,2,5]
+cumulative = []
+for number in data:
+    total = total + number
+    cumulative.append(total)
+print(cumulative)
+
+
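For comparison, Python's standard library offers itertools.accumulate, which produces the same running totals. This is shown only as a cross-check on the loop above, not as the solution the exercise is asking for.

```python
from itertools import accumulate

data = [1, 2, 2, 5]
# accumulate yields the running sum after each item.
cumulative = list(accumulate(data))
print(cumulative)
```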
+
+
+
+
+
+ +
+
+

Identifying Variable Name Errors

+
+
    +
  1. Read the code below and try to identify what the errors are without running it.
  2. Run the code and read the error message. What type of NameError do you think this is? Is it a string with no quotes, a misspelled variable, or a variable that should have been defined but was not?
  3. Fix the error.
  4. Repeat steps 2 and 3 until you have fixed all the errors.
+
+

PYTHON +

+
for number in range(10):
+    # use a if the number is a multiple of 3, otherwise use b
+    if (Number % 3) == 0:
+        message = message + a
+    else:
+        message = message + "b"
+print(message)
+
+
+
+
+
+
+ +
+
+
    +
  • Python variable names are case sensitive: number and +Number refer to different variables.
  • +
  • The variable message needs to be initialized as an +empty string.
  • +
  • We want to add the string "a" to message, +not the undefined variable a.
  • +
+
+

PYTHON +

+
message = ""
+for number in range(10):
+    # use a if the number is a multiple of 3, otherwise use b
+    if (number % 3) == 0:
+        message = message + "a"
+    else:
+        message = message + "b"
+print(message)
+
+
+
+
+
+
+
+ +
+
+

Identifying Item Errors

+
+
    +
  1. Read the code below and try to identify what the errors are +without running it.
  2. +
  3. Run the code, and read the error message. What type of error is +it?
  4. +
  5. Fix the error.
  6. +
+
+

PYTHON +

+
seasons = ['Spring', 'Summer', 'Fall', 'Winter']
+print('My favorite season is ', seasons[4])
+
+
+
+
+
+
+ +
+
+

This list has 4 elements and the index to access the last element in +the list is 3.

+
+

PYTHON +

+
seasons = ['Spring', 'Summer', 'Fall', 'Winter']
+print('My favorite season is ', seasons[3])
+
+
+
+
+
+
+
+ +
+
+

Key Points

+
+
    +
  • A for loop executes commands once for each value in a +collection.
  • +
  • A for loop is made up of a collection, a loop variable, +and a body.
  • +
  • The first line of the for loop must end with a colon, +and the body must be indented.
  • +
  • Indentation is always meaningful in Python.
  • +
  • Loop variables can be called anything (but it is strongly advised to +have a meaningful name to the looping variable).
  • +
  • The body of a loop can contain many statements.
  • +
  • Use range to iterate over a sequence of numbers.
  • +
  • The Accumulator pattern turns many values into one.
  • +
+
+
+
+

Content from Conditionals

+
+

Last updated on 2024-10-18

+
+ +
+
+

Overview

+
+
+
+
+

Questions

+
    +
  • How can programs do different things for different data?
  • +
+
+
+
+
+
+
+

Objectives

+
    +
  • Correctly write programs that use if and else statements and simple +Boolean expressions (without logical operators).
  • +
  • Trace the execution of unnested conditionals and conditionals inside +loops.
  • +
+
+
+
+
+
+

Use if statements to control whether or not a block of +code is executed. +

+
+
    +
  • An if statement (more properly called a +conditional statement) controls whether some block of code is +executed or not.
  • +
  • Structure is similar to a for statement: +
      +
    • First line opens with if and ends with a colon
    • +
    • Body containing one or more statements is indented (usually by 4 +spaces)
    • +
    +
  • +
+
+

PYTHON +

+
mass = 3.54
+if mass > 3.0:
+    print(mass, 'is large')
+
+mass = 2.07
+if mass > 3.0:
+    print(mass, 'is large')
+
+
+

OUTPUT +

+
3.54 is large
+
+

Conditionals are often used inside loops. +

+
+
    +
  • Not much point using a conditional when we know the value (as +above).
  • +
  • But useful when we have a collection to process.
  • +
+
+

PYTHON +

+
masses = [3.54, 2.07, 9.22, 1.86, 1.71]
+for m in masses:
+    if m > 3.0:
+        print(m, 'is large')
+
+
+

OUTPUT +

+
3.54 is large
+9.22 is large
+
+

Use else to execute a block of code when an +if condition is not true. +

+
+
    +
  • +else can be used following an if.
  • +
  • Allows us to specify an alternative to execute when the +if branch isn’t taken.
  • +
+
+

PYTHON +

+
masses = [3.54, 2.07, 9.22, 1.86, 1.71]
+for m in masses:
+    if m > 3.0:
+        print(m, 'is large')
+    else:
+        print(m, 'is small')
+
+
+

OUTPUT +

+
3.54 is large
+2.07 is small
+9.22 is large
+1.86 is small
+1.71 is small
+
+

Use elif to specify additional tests. +

+
+
    +
  • May want to provide several alternative choices, each with its own +test.
  • +
  • Use elif (short for “else if”) and a condition to +specify these.
  • +
  • Always associated with an if.
  • +
  • Must come before the else (which is the “catch +all”).
  • +
+
+

PYTHON +

+
masses = [3.54, 2.07, 9.22, 1.86, 1.71]
+for m in masses:
+    if m > 9.0:
+        print(m, 'is HUGE')
+    elif m > 3.0:
+        print(m, 'is large')
+    else:
+        print(m, 'is small')
+
+
+

OUTPUT +

+
3.54 is large
+2.07 is small
+9.22 is HUGE
+1.86 is small
+1.71 is small
+
+

Conditions are tested once, in order. +

+
+
    +
  • Python steps through the branches of the conditional in order, +testing each in turn.
  • +
  • So ordering matters.
  • +
+
+

PYTHON +

+
grade = 85
+if grade >= 90:
+    print('grade is A')
+elif grade >= 80:
+    print('grade is B')
+elif grade >= 70:
+    print('grade is C')
+
+
+

OUTPUT +

+
grade is B
+
+
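To see why ordering matters, the sketch below runs the same tests in two different orders. The variable names wrong_order and right_order are assumptions for the example.

```python
grade = 85

# Least demanding test first: it matches, so later tests are never tried.
if grade >= 70:
    wrong_order = 'C'
elif grade >= 80:
    wrong_order = 'B'
elif grade >= 90:
    wrong_order = 'A'

# Most demanding test first: each branch is reached only if earlier ones fail.
if grade >= 90:
    right_order = 'A'
elif grade >= 80:
    right_order = 'B'
elif grade >= 70:
    right_order = 'C'

print('least demanding first:', wrong_order)
print('most demanding first:', right_order)
```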
    +
  • Does not automatically go back and re-evaluate if values +change.
  • +
+
+

PYTHON +

+
velocity = 10.0
+if velocity > 20.0:
+    print('moving too fast')
+else:
+    print('adjusting velocity')
+    velocity = 50.0
+
+
+

OUTPUT +

+
adjusting velocity
+
+
    +
  • Often use conditionals in a loop to “evolve” the values of +variables.
  • +
+
+

PYTHON +

+
velocity = 10.0
+for i in range(5): # execute the loop 5 times
+    print(i, ':', velocity)
+    if velocity > 20.0:
+        print('moving too fast')
+        velocity = velocity - 5.0
+    else:
+        print('moving too slow')
+        velocity = velocity + 10.0
+print('final velocity:', velocity)
+
+
+

OUTPUT +

+
0 : 10.0
+moving too slow
+1 : 20.0
+moving too slow
+2 : 30.0
+moving too fast
+3 : 25.0
+moving too fast
+4 : 20.0
+moving too slow
+final velocity: 30.0
+
+

Create a table showing variables’ values to trace a program’s +execution. +

+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + +
i        | 0    | .    | 1 | .    | 2 | .    | 3 | .    | 4 | .
velocity | 10.0 | 20.0 | . | 30.0 | . | 25.0 | . | 20.0 | . | 30.0
+
    +
  • The program must have a print statement +outside the body of the loop to show the final value of +velocity, since its value is updated by the last iteration +of the loop.
  • +
+
+
+ +
+
+

Compound Relations Using and, +or, and Parentheses

+
+

Often, you want some combination of things to be true. You can +combine relations within a conditional using and and +or. Continuing the example above, suppose you have

+
+

PYTHON +

+
mass     = [ 3.54,  2.07,  9.22,  1.86,  1.71]
+velocity = [10.00, 20.00, 30.00, 25.00, 20.00]
+
+for i in range(5):
+    if mass[i] > 5 and velocity[i] > 20:
+        print("Fast heavy object.  Duck!")
+    elif mass[i] > 2 and mass[i] <= 5 and velocity[i] <= 20:
+        print("Normal traffic")
+    elif mass[i] <= 2 and velocity[i] <= 20:
+        print("Slow light object.  Ignore it")
+    else:
+        print("Whoa!  Something is up with the data.  Check it")
+
+

Just like with arithmetic, you can and should use parentheses +whenever there is possible ambiguity. A good general rule is to +always use parentheses when mixing and and +or in the same condition. That is, instead of:

+
+

PYTHON +

+
if mass[i] <= 2 or mass[i] >= 5 and velocity[i] > 20:
+
+

write one of these:

+
+

PYTHON +

+
if (mass[i] <= 2 or mass[i] >= 5) and velocity[i] > 20:
+if mass[i] <= 2 or (mass[i] >= 5 and velocity[i] > 20):
+
+

so it is perfectly clear to a reader (and to Python) what you really +mean.

+
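A short sketch of why the parentheses matter: in Python, and binds more tightly than or, so the unparenthesized condition behaves like the second of the two parenthesized forms. The single mass and velocity values below are made up for illustration.

```python
mass = 1.5
velocity = 10.0

unparenthesized = mass <= 2 or mass >= 5 and velocity > 20
or_grouped = (mass <= 2 or mass >= 5) and velocity > 20
and_grouped = mass <= 2 or (mass >= 5 and velocity > 20)

print('no parentheses:', unparenthesized)
print('or grouped first:', or_grouped)
print('and grouped first:', and_grouped)
```

With these values the two groupings disagree, which is exactly the ambiguity a reader would otherwise have to resolve from memory.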
+
+
+
+
+ +
+
+

Tracing Execution

+
+

What does this program print?

+
+

PYTHON +

+
pressure = 71.9
+if pressure > 50.0:
+    pressure = 25.0
+elif pressure <= 50.0:
+    pressure = 0.0
+print(pressure)
+
+
+
+
+
+
+ +
+
+
+

OUTPUT +

+
25.0
+
+
+
+
+
+
+
+ +
+
+

Trimming Values

+
+

Fill in the blanks so that this program creates a new list containing zeroes where the original list’s values were negative and ones where the original list’s values were zero or positive.

+
+

PYTHON +

+
original = [-1.5, 0.2, 0.4, 0.0, -1.3, 0.4]
+result = ____
+for value in original:
+    if ____:
+        result.append(0)
+    else:
+        ____
+print(result)
+
+
+

OUTPUT +

+
[0, 1, 1, 1, 0, 1]
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
original = [-1.5, 0.2, 0.4, 0.0, -1.3, 0.4]
+result = []
+for value in original:
+    if value < 0.0:
+        result.append(0)
+    else:
+        result.append(1)
+print(result)
+
+
+
+
+
+
+
+ +
+
+

Processing Small Files

+
+

Modify this program so that it only processes files with fewer than +50 records.

+
+

PYTHON +

+
import glob
+import pandas as pd
+for filename in glob.glob('data/*.csv'):
+    contents = pd.read_csv(filename)
+    ____:
+        print(filename, len(contents))
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
import glob
+import pandas as pd
+for filename in glob.glob('data/*.csv'):
+    contents = pd.read_csv(filename)
+    if len(contents) < 50:
+        print(filename, len(contents))
+
+
+
+
+
+
+
+ +
+
+

Initializing

+
+

Modify this program so that it finds the largest and smallest values +in the list no matter what the range of values originally is.

+
+

PYTHON +

+
values = [...some test data...]
+smallest, largest = None, None
+for v in values:
+    if ____:
+        smallest, largest = v, v
+    ____:
+        smallest = min(____, v)
+        largest = max(____, v)
+print(smallest, largest)
+
+

What are the advantages and disadvantages of using this method to +find the range of the data?

+
+
+
+
+
+ +
+
+
+

PYTHON +

+
values = [-2,1,65,78,-54,-24,100]
+smallest, largest = None, None
+for v in values:
+    if smallest is None and largest is None:
+        smallest, largest = v, v
+    else:
+        smallest = min(smallest, v)
+        largest = max(largest, v)
+print(smallest, largest)
+
+

If you wrote == None instead of is None, that works too, but the conventional Python idiom (recommended by the PEP 8 style guide) is is None, because None is a single, unique object and is tests for that identity directly.

+

It can be argued that an advantage of using this method would be to +make the code more readable. However, a disadvantage is that this code +is not efficient because within each iteration of the for +loop statement, there are two more loops that run over two numbers each +(the min and max functions). It would be more +efficient to iterate over each number just once:

+
+

PYTHON +

+
values = [-2,1,65,78,-54,-24,100]
+smallest, largest = None, None
+for v in values:
+    if smallest is None or v < smallest:
+        smallest = v
+    if largest is None or v > largest:
+        largest = v
+print(smallest, largest)
+
+

Now we have one loop, but four comparison tests. There are two ways +we could improve it further: either use fewer comparisons in each +iteration, or use two loops that each contain only one comparison test. +The simplest solution is often the best:

+
+

PYTHON +

+
values = [-2,1,65,78,-54,-24,100]
+smallest = min(values)
+largest = max(values)
+print(smallest, largest)
+
+
+
+
+
+
+
+ +
+
+

Key Points

+
+
    +
  • Use if statements to control whether or not a block of +code is executed.
  • +
  • Conditionals are often used inside loops.
  • +
  • Use else to execute a block of code when an +if condition is not true.
  • +
  • Use elif to specify additional tests.
  • +
  • Conditions are tested once, in order.
  • +
  • Create a table showing variables’ values to trace a program’s +execution.
  • +
+
+
+
+

Content from Looping Over Data Sets

+
+

Last updated on 2024-10-18

+
+ +
+
+

Overview

+
+
+
+
+

Questions

+
    +
  • How can I process many data sets with a single command?
  • +
+
+
+
+
+
+
+

Objectives

+
    +
  • Be able to read and write globbing expressions that match sets of +files.
  • +
  • Use glob to create lists of files.
  • +
  • Write for loops to perform operations on files given their names in +a list.
  • +
+
+
+
+
+
+

Use a for loop to process files given a list of their +names. +

+
+
    +
  • A filename is a character string.
  • +
  • And lists can contain character strings.
  • +
+
+

PYTHON +

+
import pandas as pd
+for filename in ['data/gapminder_gdp_africa.csv', 'data/gapminder_gdp_asia.csv']:
+    data = pd.read_csv(filename, index_col='country')
+    print(filename, data.min())
+
+
+

OUTPUT +

+
data/gapminder_gdp_africa.csv gdpPercap_1952    298.846212
+gdpPercap_1957    335.997115
+gdpPercap_1962    355.203227
+gdpPercap_1967    412.977514
+⋮ ⋮ ⋮
+gdpPercap_1997    312.188423
+gdpPercap_2002    241.165877
+gdpPercap_2007    277.551859
+dtype: float64
+data/gapminder_gdp_asia.csv gdpPercap_1952    331
+gdpPercap_1957    350
+gdpPercap_1962    388
+gdpPercap_1967    349
+⋮ ⋮ ⋮
+gdpPercap_1997    415
+gdpPercap_2002    611
+gdpPercap_2007    944
+dtype: float64
+
+

Use glob.glob +to find sets of files whose names match a pattern. +

+
+
    +
  • In Unix, the term “globbing” means “matching a set of files with a +pattern”.
  • +
  • The most common patterns are: +
      +
    • +* meaning “match zero or more characters”
    • +
    • +? meaning “match exactly one character”
    • +
    +
  • +
  • Python’s standard library contains the glob +module to provide pattern matching functionality
  • +
  • The glob +module contains a function also called glob to match file +patterns
  • +
  • E.g., glob.glob('*.txt') matches all files in the +current directory whose names end with .txt.
  • +
  • Result is a (possibly empty) list of character strings.
  • +
+
+

PYTHON +

+
import glob
+print('all csv files in data directory:', glob.glob('data/*.csv'))
+
+
+

OUTPUT +

+
all csv files in data directory: ['data/gapminder_all.csv', 'data/gapminder_gdp_africa.csv', \
+'data/gapminder_gdp_americas.csv', 'data/gapminder_gdp_asia.csv', 'data/gapminder_gdp_europe.csv', \
+'data/gapminder_gdp_oceania.csv']
+
+
+

PYTHON +

+
print('all PDB files:', glob.glob('*.pdb'))
+
+
+

OUTPUT +

+
all PDB files: []
+
+
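The difference between * and ? can be tried out without the lesson's data. The sketch below creates a few empty files in a temporary directory; the file names are hypothetical. The results are sorted because glob.glob does not promise any particular order.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Create some empty, hypothetical files to match against.
    for name in ['run_1.csv', 'run_2.csv', 'run_10.csv', 'notes.txt']:
        open(os.path.join(folder, name), 'w').close()

    star_paths = sorted(glob.glob(os.path.join(folder, 'run_*.csv')))
    question_paths = sorted(glob.glob(os.path.join(folder, 'run_?.csv')))

    # Keep only the file names so the output does not depend on the folder.
    star_names = []
    for path in star_paths:
        star_names.append(os.path.basename(path))
    question_names = []
    for path in question_paths:
        question_names.append(os.path.basename(path))

print('run_*.csv matches:', star_names)
print('run_?.csv matches:', question_names)
```

Here run_?.csv skips run_10.csv because ? stands for exactly one character.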

Use glob and for to process batches of +files. +

+
+
    +
  • Helps a lot if the files are named and stored systematically and +consistently so that simple patterns will find the right data.
  • +
+
+

PYTHON +

+
for filename in glob.glob('data/gapminder_*.csv'):
+    data = pd.read_csv(filename)
+    print(filename, data['gdpPercap_1952'].min())
+
+
+

OUTPUT +

+
data/gapminder_all.csv 298.8462121
+data/gapminder_gdp_africa.csv 298.8462121
+data/gapminder_gdp_americas.csv 1397.717137
+data/gapminder_gdp_asia.csv 331.0
+data/gapminder_gdp_europe.csv 973.5331948
+data/gapminder_gdp_oceania.csv 10039.59564
+
+
    +
  • This includes all data, as well as per-region data.
  • +
  • Use a more specific pattern in the exercises to exclude the whole +data set.
  • +
  • But note that the minimum of the entire data set is also the minimum +of one of the data sets, which is a nice check on correctness.
  • +
+
+
+ +
+
+

Determining Matches

+
+

Which of these files is not matched by the expression +glob.glob('data/*as*.csv')?

+
    +
  1. data/gapminder_gdp_africa.csv
  2. +
  3. data/gapminder_gdp_americas.csv
  4. +
  5. data/gapminder_gdp_asia.csv
  6. +
+
+
+
+
+
+ +
+
+

1 is not matched by the glob.
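This answer can be checked without any files on disk by using fnmatchcase from the standard library's fnmatch module, which applies the same wildcard rules to plain strings. This is a sketch: unlike glob, fnmatch lets * match a / as well, which makes no difference for this particular pattern.

```python
from fnmatch import fnmatchcase

pattern = 'data/*as*.csv'
names = [
    'data/gapminder_gdp_africa.csv',
    'data/gapminder_gdp_americas.csv',
    'data/gapminder_gdp_asia.csv',
]

matched = []
for name in names:
    # The name must contain "as" somewhere between "data/" and ".csv".
    if fnmatchcase(name, pattern):
        matched.append(name)
print(matched)
```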

+
+
+
+
+
+
+ +
+
+

Minimum File Size

+
+

Modify this program so that it prints the number of records in the +file that has the fewest records.

+
+

PYTHON +

+
import glob
+import pandas as pd
+fewest = ____
+for filename in glob.glob('data/*.csv'):
+    dataframe = pd.____(filename)
+    fewest = min(____, dataframe.shape[0])
+print('smallest file has', fewest, 'records')
+
+

Note that the DataFrame.shape attribute is a tuple holding the number of rows and columns of the data frame, so it is accessed without parentheses.

+
+
+
+
+
+ +
+
+
+

PYTHON +

+
import glob
+import pandas as pd
+fewest = float('Inf')
+for filename in glob.glob('data/*.csv'):
+    dataframe = pd.read_csv(filename)
+    fewest = min(fewest, dataframe.shape[0])
+print('smallest file has', fewest, 'records')
+
+

You might have chosen to initialize the fewest variable +with a number greater than the numbers you’re dealing with, but that +could lead to trouble if you reuse the code with bigger numbers. Python +lets you use positive infinity, which will work no matter how big your +numbers are. What other special strings does the float +function recognize?
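The behaviour of positive infinity as a starting value can be sketched without reading any files. The record counts below are made up for illustration.

```python
import math

fewest = float('inf')
for count in [120, 36, 58]:  # hypothetical record counts
    # Any real count is smaller than infinity, so the first one always wins.
    fewest = min(fewest, count)
print('fewest records:', fewest)

# If the loop never runs (no files matched), the starting value is left over.
untouched = float('inf')
print('still infinite:', math.isinf(untouched))
```

One consequence of this design is that an empty set of files leaves fewest as inf, which is worth checking for before reporting a result.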

+
+
+
+
+
+
+ +
+
+

Comparing Data

+
+

Write a program that reads in the regional data sets and plots the +average GDP per capita for each region over time in a single chart. +Pandas will raise an error if it encounters non-numeric columns in a +dataframe computation so you may need to either filter out those columns +or tell pandas to ignore them.

+
+
+
+
+
+ +
+
+

This solution builds a useful legend by using the string +split method to extract the region from +the path ‘data/gapminder_gdp_a_specific_region.csv’.

+
+

PYTHON +

+
import glob
+import pandas as pd
+import matplotlib.pyplot as plt
+fig, ax = plt.subplots(1,1)
+for filename in glob.glob('data/gapminder_gdp*.csv'):
+    dataframe = pd.read_csv(filename)
+    # extract <region> from the filename, expected to be in the format 'data/gapminder_gdp_<region>.csv'.
+    # we will split the string using the split method and `_` as our separator,
+    # retrieve the last string in the list that split returns (`<region>.csv`), 
+    # and then remove the `.csv` extension from that string.
+    # NOTE: the pathlib module covered in the next callout also offers
+    # convenient abstractions for working with filesystem paths and could solve this as well:
+    # from pathlib import Path
+    # region = Path(filename).stem.split('_')[-1]
+    region = filename.split('_')[-1][:-4] 
+    # pandas raises errors when it encounters non-numeric columns in a dataframe computation
+    # but we can tell pandas to ignore them with the `numeric_only` parameter
+    dataframe.mean(numeric_only=True).plot(ax=ax, label=region)
+    # NOTE: another way of doing this selects just the columns with gdp in their name using the filter method
+    # dataframe.filter(like="gdp").mean().plot(ax=ax, label=region)
+
+plt.legend()
+plt.show()
+
+
+
+
+
+
+
+ +
+
+

Dealing with File Paths

+
+

The pathlib +module provides useful abstractions for file and path manipulation +like returning the name of a file without the file extension. This is +very useful when looping over files and directories. In the example +below, we create a Path object and inspect its +attributes.

+
+

PYTHON +

+
from pathlib import Path
+
+p = Path("data/gapminder_gdp_africa.csv")
+print(p.parent)
+print(p.stem)
+print(p.suffix)
+
+
+

OUTPUT +

+
data
+gapminder_gdp_africa
+.csv
+
+

Hint: Check all available attributes and methods on +the Path object with the dir() function.
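The same attributes can be used inside a loop. The sketch below is one possible way to pull the region out of each file name; the list of file names is written by hand here, so nothing is read from disk and the names are only assumed to follow the `gapminder_gdp_<region>.csv` pattern.

```python
from pathlib import Path

# Hand-written file names for illustration; no files are opened.
filenames = [
    'data/gapminder_gdp_africa.csv',
    'data/gapminder_gdp_asia.csv',
    'data/gapminder_gdp_europe.csv',
]

regions = []
for filename in filenames:
    path = Path(filename)
    # stem drops the directory and the extension, e.g. 'gapminder_gdp_africa'
    region = path.stem.split('_')[-1]
    regions.append(region)
    print(path.name, '->', region)
```

In a real analysis the list could come from `glob.glob('data/gapminder_gdp_*.csv')` instead.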

+
+
+
+
+
+ +
+
+

Key Points

+
+
    +
  • Use a for loop to process files given a list of their +names.
  • +
  • Use glob.glob to find sets of files whose names match a +pattern.
  • +
  • Use glob and for to process batches of +files.
  • +
+
+
+
+

Content from Afternoon Coffee

+
+

Last updated on 2024-10-18

+
+ +
+

Reflection exercise +

+
+

Over break, reflect on and discuss the following:

+
    +
  • A common refrain in software engineering is “Don’t Repeat Yourself”. +How do the techniques we’ve learned in the last lessons help us avoid +repeating ourselves? Note that in practice there is some nuance to +this rule, and it should be balanced with doing the simplest thing that could +possibly work. +
  • +
  • What are the pros / cons of making a variable global or local to a +function?
  • +
  • When would you consider turning a block of code into a function +definition?
  • +

Content from Writing Functions

+
+

Last updated on 2024-10-18

+
+ +
+
+

Overview

+
+
+
+
+

Questions

+
    +
  • How can I create my own functions?
  • +
+
+
+
+
+
+
+

Objectives

+
    +
  • Explain and identify the difference between function definition and +function call.
  • +
  • Write a function that takes a small, fixed number of arguments and +produces a single result.
  • +
+
+
+
+
+
+

Break programs down into functions to make them easier to +understand. +

+
+
    +
  • Human beings can only keep a few items in working memory at a +time.
  • +
  • Understand larger/more complicated ideas by understanding and +combining pieces. +
      +
    • Components in a machine.
    • +
    • Lemmas when proving theorems.
    • +
    +
  • +
  • Functions serve the same purpose in programs. +
      +
    • +Encapsulate complexity so that we can treat it as a single +“thing”.
    • +
    +
  • +
  • Also enables re-use. +
      +
    • Write one time, use many times.
    • +
    +
  • +

Define a function using def with a name, parameters, +and a block of code. +

+
+
    +
  • Begin the definition of a new function with def.
  • +
  • Followed by the name of the function. +
      +
    • Must obey the same rules as variable names.
    • +
    +
  • +
  • Then parameters in parentheses. +
      +
    • Empty parentheses if the function doesn’t take any inputs.
    • +
    • We will discuss this in detail in a moment.
    • +
    +
  • +
  • Then a colon.
  • +
  • Then an indented block of code.
  • +
+
+

PYTHON +

+
def print_greeting():
+    print('Hello!')
+    print('The weather is nice today.')
+    print('Right?')
+
+

Defining a function does not run it. +

+
+
    +
  • Defining a function does not run it. +
      +
    • Like assigning a value to a variable.
    • +
    +
  • +
  • Must call the function to execute the code it contains.
  • +
+
+

PYTHON +

+
print_greeting()
+
+
+

OUTPUT +

+
Hello!
+The weather is nice today.
+Right?
+
+

Arguments in a function call are matched to its defined +parameters. +

+
+
    +
  • Functions are most useful when they can operate on different +data.
  • +
  • Specify parameters when defining a function. +
      +
    • These become variables when the function is executed.
    • +
    • Are assigned the arguments in the call (i.e., the values passed to +the function).
    • +
    • If you don’t name the arguments when using them in the call, the +arguments will be matched to parameters in the order the parameters are +defined in the function.
    • +
    +
  • +
+
+

PYTHON +

+
def print_date(year, month, day):
+    joined = str(year) + '/' + str(month) + '/' + str(day)
+    print(joined)
+
+print_date(1871, 3, 19)
+
+
+

OUTPUT +

+
1871/3/19
+
+

Or, we can name the arguments when we call the function, which allows +us to specify them in any order and adds clarity to the call site; +otherwise, someone reading the code might forget whether the second +argument is the month or the day.

+
+

PYTHON +

+
print_date(month=3, day=19, year=1871)
+
+
+

OUTPUT +

+
1871/3/19
+
+
    +
  • Via Twitter: +() contains the ingredients for the function while the body +contains the recipe.
  • +

Functions may return a result to their caller using +return. +

+
+
    +
  • Use return ... to give a value back to the caller.
  • +
  • May occur anywhere in the function.
  • +
  • But functions are easier to understand if return +occurs: +
      +
    • At the start to handle special cases.
    • +
    • At the very end, with a final result.
    • +
    +
  • +
+
+

PYTHON +

+
def average(values):
+    if len(values) == 0:
+        return None
+    return sum(values) / len(values)
+
+
+

PYTHON +

+
a = average([1, 3, 4])
+print('average of actual values:', a)
+
+
+

OUTPUT +

+
average of actual values: 2.6666666666666665
+
+
+

PYTHON +

+
print('average of empty list:', average([]))
+
+
+

OUTPUT +

+
average of empty list: None
+
+ +
+

PYTHON +

+
result = print_date(1871, 3, 19)
+print('result of call is:', result)
+
+
+

OUTPUT +

+
1871/3/19
+result of call is: None
+
+
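The call above prints the date, but `result` is `None` because `print_date` has no `return` statement. As a contrast, here is a sketch of a variant that returns the string instead of printing it; the name `format_date` is made up for this example and is not part of the lesson.

```python
def format_date(year, month, day):
    """Return the date as a 'year/month/day' string instead of printing it."""
    return str(year) + '/' + str(month) + '/' + str(day)

# The caller now receives the string and decides what to do with it.
result = format_date(1871, 3, 19)
print('result of call is:', result)
```

Returning a value rather than printing it usually makes a function easier to reuse, since the caller can print the result, store it, or pass it to another function.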
+
+ +
+
+

Identifying Syntax Errors

+
+
    +
  1. Read the code below and try to identify what the errors are +without running it.
  2. Run the code and read the error message. Is it a +SyntaxError or an IndentationError?
  3. Fix the error.
  4. Repeat steps 2 and 3 until you have fixed all the errors.
+
+

PYTHON +

+
def another_function
+  print("Syntax errors are annoying.")
+   print("But at least python tells us about them!")
+  print("So they are usually not too hard to fix.")
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
def another_function():
+  print("Syntax errors are annoying.")
+  print("But at least Python tells us about them!")
+  print("So they are usually not too hard to fix.")
+
+
+
+
+
+
+
+ +
+
+

Definition and Use

+
+

What does the following program print?

+
+

PYTHON +

+
def report(pressure):
+    print('pressure is', pressure)
+
+print('calling', report, 22.5)
+
+
+
+
+
+
+ +
+
+
+

OUTPUT +

+
calling <function report at 0x7fd128ff1bf8> 22.5
+
+

A function call always needs parentheses; otherwise you get the +function object itself, which print displays along with its memory address. So, if we wanted to call the function +named report and give it the value 22.5 to report on, we could write our +function call as follows:

+
+

PYTHON +

+
print("calling")
+report(22.5)
+
+
+

OUTPUT +

+
calling
+pressure is 22.5
+
+
+
+
+
+
+
+ +
+
+

Order of Operations

+
+
    +
  1. What’s wrong in this example?
  2. +
+
+

PYTHON +

+
result = print_time(11, 37, 59)
+
+def print_time(hour, minute, second):
+   time_string = str(hour) + ':' + str(minute) + ':' + str(second)
+   print(time_string)
+
+
    +
  2. After fixing the problem above, explain why running this example +code:
+
+

PYTHON +

+
result = print_time(11, 37, 59)
+print('result of call is:', result)
+
+

gives this output:

+
+

OUTPUT +

+
11:37:59
+result of call is: None
+
+
    +
  3. Why is the result of the call None?
+
+
+
+
+
+ +
+
+
    +
  1. The problem with the example is that the function +print_time() is defined after the call to the +function is made. Python doesn’t know how to resolve the name +print_time since it hasn’t been defined yet and will raise +a NameError, e.g., +NameError: name 'print_time' is not defined

  2. The first line of output, 11:37:59, is printed by the +first line of code, result = print_time(11, 37, 59), which also +binds the value returned by print_time to the +variable result. The second line of output comes from the print +call on the second line of code, which prints the contents of the result variable.

  3. print_time() does not explicitly return +a value, so it automatically returns None.
+
+
+
+
+
+
+ +
+
+

Encapsulation

+
+

Fill in the blanks to create a function that takes a single filename +as an argument, loads the data in the file named by the argument, and +returns the minimum value in that data.

+
+

PYTHON +

+
import pandas as pd
+
+def min_in_data(____):
+    data = ____
+    return ____
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
import pandas as pd
+
+def min_in_data(filename):
+    data = pd.read_csv(filename)
+    return data.min()
+
+
+
+
+
+
+
+ +
+
+

Find the First

+
+

Fill in the blanks to create a function that takes a list of numbers +as an argument and returns the first negative value in the list. What +does your function do if the list is empty? What if the list has no +negative numbers?

+
+

PYTHON +

+
def first_negative(values):
+    for v in ____:
+        if ____:
+            return ____
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
def first_negative(values):
+    for v in values:
+        if v < 0:
+            return v
+
+

If an empty list or a list with no negative values is passed to this +function, it returns None:

+
+

PYTHON +

+
my_list = []
+print(first_negative(my_list))
+
+
+

OUTPUT +

+
None
+
+
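The sketch below repeats the solution with one optional addition, an explicit `return None` at the end, and calls it on three hand-picked lists. The extra line does not change the behavior; it only makes the "nothing found" case visible to a reader.

```python
def first_negative(values):
    for v in values:
        if v < 0:
            return v
    # Reaching this point means the loop found no negative value.
    # Python would return None here anyway; writing it out is optional.
    return None

print(first_negative([3, -1, -7]))   # a list with negatives
print(first_negative([0, 2, 5]))     # no negatives (zero is not negative)
print(first_negative([]))            # empty list
```

Because `None` is returned in both the empty and the no-negatives case, a caller that needs to tell them apart has to check the length of the list separately.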
+
+
+
+
+
+ +
+
+

Calling by Name

+
+

Earlier we saw this function:

+
+

PYTHON +

+
def print_date(year, month, day):
+    joined = str(year) + '/' + str(month) + '/' + str(day)
+    print(joined)
+
+

We saw that we can call the function using named arguments, +like this:

+
+

PYTHON +

+
print_date(day=1, month=2, year=2003)
+
+
    +
  1. What does print_date(day=1, month=2, year=2003) +print?
  2. When have you seen a function call like this before?
  3. When and why is it useful to call functions this way?
+
+
+
+
+
+ +
+
+
    +
  1. 2003/2/1
  2. We saw examples of using named arguments when working with +the pandas library. For example, when reading in a dataset using +data = pd.read_csv('data/gapminder_gdp_europe.csv', index_col='country'), +the last argument index_col is a named argument.
  3. Using named arguments can make code more readable since one can see +from the function call what name the different arguments have inside the +function. It can also reduce the chances of passing arguments in the +wrong order, since by using named arguments the order doesn’t +matter.
+
+
+
+
+
+
+ +
+
+

Encapsulation of an If/Print Block

+
+

The code below will run on a label-printer for chicken eggs. A +digital scale will report a chicken egg mass (in grams) to the computer +and then the computer will print a label.

+
+

PYTHON +

+
import random
+for i in range(10):
+
+    # simulating the mass of a chicken egg
+    # the (random) mass will be 70 +/- 20 grams
+    mass = 70 + 20.0 * (2.0 * random.random() - 1.0)
+
+    print(mass)
+
+    # egg sizing machinery prints a label
+    if mass >= 85:
+        print("jumbo")
+    elif mass >= 70:
+        print("large")
+    elif mass < 70 and mass >= 55:
+        print("medium")
+    else:
+        print("small")
+
+

The if-block that classifies the eggs might be useful in other +situations, so to avoid repeating it, we could fold it into a function, +get_egg_label(). Revising the program to use the function +would give us this:

+
+

PYTHON +

+
# revised version
+import random
+for i in range(10):
+
+    # simulating the mass of a chicken egg
+    # the (random) mass will be 70 +/- 20 grams
+    mass = 70 + 20.0 * (2.0 * random.random() - 1.0)
+
+    print(mass, get_egg_label(mass))
+
+
    +
  1. Create a function definition for get_egg_label() that +will work with the revised program above. Note that the +get_egg_label() function’s return value will be important. +Sample output from the above program would be +71.23 large.
  2. A dirty egg might have a mass of more than 90 grams, and a spoiled +or broken egg will probably have a mass that’s less than 50 grams. +Modify your get_egg_label() function to account for these +error conditions. Sample output could be +25 too light, probably spoiled.
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
def get_egg_label(mass):
+    # egg sizing machinery prints a label
+    egg_label = "Unlabelled"
+    if mass >= 90:
+        egg_label = "warning: egg might be dirty"
+    elif mass >= 85:
+        egg_label = "jumbo"
+    elif mass >= 70:
+        egg_label = "large"
+    elif mass < 70 and mass >= 55:
+        egg_label = "medium"
+    elif mass < 50:
+        egg_label = "too light, probably spoiled"
+    else:
+        egg_label = "small"
+    return egg_label
+
+
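One way to gain confidence in a function like this is to call it with hand-picked masses on either side of each cut-off. The sketch below repeats the solution's function unchanged and then loops over a list of sample masses chosen for illustration.

```python
def get_egg_label(mass):
    # same logic as the solution above
    egg_label = "Unlabelled"
    if mass >= 90:
        egg_label = "warning: egg might be dirty"
    elif mass >= 85:
        egg_label = "jumbo"
    elif mass >= 70:
        egg_label = "large"
    elif mass < 70 and mass >= 55:
        egg_label = "medium"
    elif mass < 50:
        egg_label = "too light, probably spoiled"
    else:
        egg_label = "small"
    return egg_label

# sample masses picked to land in each branch
for mass in [95, 87, 72, 60, 52, 40]:
    print(mass, get_egg_label(mass))
```

Note that a mass of exactly 90 grams is labelled as possibly dirty, because the first test uses `>=`; whether that matches "more than 90 grams" is a design decision for the author of the function.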
+
+
+
+
+
+ +
+
+

Encapsulating Data Analysis

+
+

Assume that the following code has been executed:

+
+

PYTHON +

+
import pandas as pd
+
+data_asia = pd.read_csv('data/gapminder_gdp_asia.csv', index_col=0)
+japan = data_asia.loc['Japan']
+
+
    +
  1. Complete the statements below to obtain the average GDP for Japan +across the years reported for the 1980s.
  2. +
+
+

PYTHON +

+
year = 1983
+gdp_decade = 'gdpPercap_' + str(year // ____)
+avg = (japan.loc[gdp_decade + ___] + japan.loc[gdp_decade + ___]) / 2
+
+
    +
  2. Abstract the code above into a single function.
+
+

PYTHON +

+
def avg_gdp_in_decade(country, continent, year):
+    data_countries = pd.read_csv('data/gapminder_gdp_'+___+'.csv',delimiter=',',index_col=0)
+    ____
+    ____
+    ____
+    return avg
+
+
    +
  3. How would you generalize this function if you did not know +beforehand which specific years occurred as columns in the data? For +instance, what if we also had data from years ending in 1 and 9 for each +decade? (Hint: use the columns to filter out the ones that correspond to +the decade, instead of enumerating them in the code.)
+
+
+
+
+
+ +
+
+
    +
  1. The average GDP for Japan across the years reported for the 1980s is +computed with:
  2. +
+
+

PYTHON +

+
year = 1983
+gdp_decade = 'gdpPercap_' + str(year // 10)
+avg = (japan.loc[gdp_decade + '2'] + japan.loc[gdp_decade + '7']) / 2
+
+
    +
  2. That code as a function is:
+
+

PYTHON +

+
def avg_gdp_in_decade(country, continent, year):
+    data_countries = pd.read_csv('data/gapminder_gdp_' + continent + '.csv', index_col=0)
+    c = data_countries.loc[country]
+    gdp_decade = 'gdpPercap_' + str(year // 10)
+    avg = (c.loc[gdp_decade + '2'] + c.loc[gdp_decade + '7'])/2
+    return avg
+
+
    +
  3. To obtain the average for the relevant years, we need to loop over +them:
+
+

PYTHON +

+
def avg_gdp_in_decade(country, continent, year):
+    data_countries = pd.read_csv('data/gapminder_gdp_' + continent + '.csv', index_col=0)
+    c = data_countries.loc[country]
+    gdp_decade = 'gdpPercap_' + str(year // 10)
+    total = 0.0
+    num_years = 0
+    for yr_header in c.index: # c's index contains reported years
+        if yr_header.startswith(gdp_decade):
+            total = total + c.loc[yr_header]
+            num_years = num_years + 1
+    return total/num_years
+
+

The function can now be called by:

+
+

PYTHON +

+
avg_gdp_in_decade('Japan','asia',1983)
+
+
+

OUTPUT +

+
20880.023800000003
+
+
+
+
+
+
+
+ +
+
+

Simulating a dynamical system

+
+

In mathematics, a dynamical +system is a system in which a function describes the time dependence +of a point in a geometrical space. A canonical example of a dynamical +system is the logistic map, a +growth model that computes a new population density (between 0 and 1) +based on the current density. In the model, time takes discrete values +0, 1, 2, …

+
    +
  1. Define a function called logistic_map that takes two +inputs: x, representing the current population (at time +t), and a parameter r = 1. This function +should return a value representing the state of the system (population) +at time t + 1, using the mapping function:
  2. +
+

f(t+1) = r * f(t) * [1 - f(t)]

+
    +
  2. Using a for or while loop, iterate the +logistic_map function defined in part 1, starting from an +initial population of 0.5, for a period of time +t_final = 10. Store the intermediate results in a list so +that after the loop terminates you have accumulated a sequence of values +representing the state of the logistic map at times +t = [0,1,...,t_final] (11 values in total). Print this list +to see the evolution of the population.

  3. Encapsulate the logic of your loop into a function called +iterate that takes the initial population as its first +input, the parameter t_final as its second input and the +parameter r as its third input. The function should return +the list of values representing the state of the logistic map at times +t = [0,1,...,t_final]. Run this function for periods +t_final = 100 and 1000 and print some of the +values. Is the population trending toward a steady state?
+
+
+
+
+
+ +
+
+
    +
  1. +

    PYTHON +

    +
    def logistic_map(x, r):
    +    return r * x * (1 - x)
    +
  2. +
  3. +

    PYTHON +

    +
    initial_population = 0.5
    +t_final = 10
    +r = 1.0
    +population = [initial_population]
    +
    +for t in range(t_final):
    +    population.append( logistic_map(population[t], r) )
    +
  4. +
  5. +
    +

    PYTHON +

    +
    def iterate(initial_population, t_final, r):
    +    population = [initial_population]
    +    for t in range(t_final):
    +        population.append( logistic_map(population[t], r) )
    +    return population
    +
    +for period in (10, 100, 1000):
    +    population = iterate(0.5, period, 1)
    +    print(population[-1])
    +
    +
    +

    OUTPUT +

    +
    0.06945089389714401
    +0.009395779870614648
    +0.0009913908614406382
    +
    +The population seems to be approaching zero.
  6. +
+
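Because `r` is a parameter of `iterate`, the same functions can be reused to explore other growth rates. The sketch below is an optional extension, not part of the exercise: the values `r = 2.0` and a starting population of `0.2` are arbitrary choices, and the predicted steady state comes from solving x = r * x * (1 - x) for a non-zero x, which gives x = 1 - 1/r.

```python
def logistic_map(x, r):
    return r * x * (1 - x)

def iterate(initial_population, t_final, r):
    population = [initial_population]
    for t in range(t_final):
        population.append(logistic_map(population[t], r))
    return population

# r and the starting population are arbitrary choices for illustration
r = 2.0
population = iterate(0.2, 50, r)

# non-zero solution of x = r * x * (1 - x)
steady_state = 1 - 1 / r

print('last value:', population[-1])
print('predicted steady state:', steady_state)
```

For `r = 1` the same formula gives a steady state of 0, which is consistent with the population approaching zero in the solution above.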
+
+
+
+
+
+ +
+
+

Using Functions With Conditionals in Pandas

+
+

Functions will often contain conditionals. Here is a short example +that will indicate which quartile the argument is in based on hand-coded +values for the quartile cut points.

+
+

PYTHON +

+
def calculate_life_quartile(exp):
+    if exp < 58.41:
+        # This observation is in the first quartile
+        return 1
+    elif exp >= 58.41 and exp < 67.05:
+        # This observation is in the second quartile
+        return 2
+    elif exp >= 67.05 and exp < 71.70:
+        # This observation is in the third quartile
+        return 3
+    elif exp >= 71.70:
+        # This observation is in the fourth quartile
+        return 4
+    else:
+        # This observation has bad data
+        return None
+
+calculate_life_quartile(62.5)
+
+
+

OUTPUT +

+
2
+
+

That function would typically be used within a for loop, +but Pandas has a different, more efficient way of doing the same thing, +and that is by applying a function to a dataframe or a portion +of a dataframe. Here is an example, using the definition above.

+
+

PYTHON +

+
data = pd.read_csv('data/gapminder_all.csv')
+data['life_qrtl'] = data['lifeExp_1952'].apply(calculate_life_quartile)
+
+

There is a lot in that second line, so let’s take it piece by piece. +On the right side of the = we start with +data['lifeExp_1952'], which is the column in the dataframe +called data labeled lifeExp_1952. We use the +apply() method to do what its name says: apply the +calculate_life_quartile function to the value of this column for +every row in the dataframe.
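Conceptually, `apply()` does the work of the loop in the sketch below: it calls the function once per value and collects the results. A plain list of made-up life expectancies stands in for the dataframe column so that the sketch runs without pandas or the data file, and the conditions are written in a shortened but equivalent form.

```python
def calculate_life_quartile(exp):
    # same cut points as above, with the redundant lower bounds dropped
    if exp < 58.41:
        return 1
    elif exp < 67.05:
        return 2
    elif exp < 71.70:
        return 3
    elif exp >= 71.70:
        return 4
    else:
        # reached only by values that fail every comparison, such as NaN
        return None

# made-up values standing in for one column of the dataframe
life_expectancies = [43.1, 62.5, 68.9, 75.4]

quartiles = []
for value in life_expectancies:
    quartiles.append(calculate_life_quartile(value))
print(quartiles)
```

With pandas, the loop and the list bookkeeping are replaced by the single call to `apply()`, and the result comes back as a new column rather than a list.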

+
+
+
+
+
+ +
+
+

Key Points

+
+
    +
  • Break programs down into functions to make them easier to +understand.
  • +
  • Define a function using def with a name, parameters, +and a block of code.
  • +
  • Defining a function does not run it.
  • +
  • Arguments in a function call are matched to its defined +parameters.
  • +
  • Functions may return a result to their caller using +return.
  • +
+
+
+
+

Content from Variable Scope

+
+

Last updated on 2024-10-18

+
+ +
+
+

Overview

+
+
+
+
+

Questions

+
    +
  • How do function calls actually work?
  • +
  • How can I determine where errors occurred?
  • +
+
+
+
+
+
+
+

Objectives

+
    +
  • Identify local and global variables.
  • +
  • Identify parameters as local variables.
  • +
  • Read a traceback and determine the file, function, and line number +on which the error occurred, the type of error, and the error +message.
  • +
+
+
+
+
+
+

The scope of a variable is the part of a program that can ‘see’ that +variable. +

+
+
    +
  • There are only so many sensible names for variables.
  • +
  • People using functions shouldn’t have to worry about what variable +names the author of the function used.
  • +
  • People writing functions shouldn’t have to worry about what variable +names the function’s caller uses.
  • +
  • The part of a program in which a variable is visible is called its +scope.
  • +
+
+

PYTHON +

+
pressure = 103.9
+
+def adjust(t):
+    temperature = t * 1.43 / pressure
+    return temperature
+
+
    +
  • +pressure is a global variable. +
      +
    • Defined outside any particular function.
    • +
    • Visible everywhere.
    • +
    +
  • +
  • +t and temperature are local +variables in adjust. +
      +
    • Defined in the function.
    • +
    • Not visible in the main program.
    • +
    • Remember: a function parameter is a variable that is automatically +assigned a value when the function is called.
    • +
    +
  • +
+
+

PYTHON +

+
print('adjusted:', adjust(0.9))
+print('temperature after call:', temperature)
+
+
+

OUTPUT +

+
adjusted: 0.01238691049085659
+
+
+

ERROR +

+
Traceback (most recent call last):
+  File "/Users/swcarpentry/foo.py", line 8, in <module>
+    print('temperature after call:', temperature)
+NameError: name 'temperature' is not defined
+
+
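A related case is a function that assigns to a name that also exists globally. In the sketch below, a variation on `adjust` written for illustration, the assignment inside the function creates a new local variable and leaves the global one untouched. The `try`/`except` block catches the `NameError` so the script can continue instead of stopping.

```python
pressure = 103.9          # global: defined outside any function

def adjust(t):
    pressure = 50.0       # local: a new variable that hides the global one
    temperature = t * 1.43 / pressure
    return temperature

adjusted = adjust(0.9)
print('adjusted:', adjusted)
print('global pressure is still', pressure)

# temperature only existed while adjust was running
try:
    print(temperature)
except NameError as error:
    print('NameError:', error)
```

Reusing a global name inside a function is legal, but it can confuse readers, which is one reason to keep variable names distinct.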
+
+ +
+
+

Local and Global Variable Use

+
+

Trace the values of all variables in this program as it is executed. +(Use ‘—’ as the value of variables before and after they exist.)

+
+

PYTHON +

+
limit = 100
+
+def clip(value):
+    return min(max(0.0, value), limit)
+
+value = -22.5
+print(clip(value))
+
+
+
+
+
+
+ +
+
+

Reading Error Messages

+
+

Read the traceback below, and identify the following:

+
    +
  1. How many levels does the traceback have?
  2. What is the file name where the error occurred?
  3. What is the function name where the error occurred?
  4. On which line number in this function did the error occur?
  5. What is the type of error?
  6. What is the error message?
+
+

ERROR +

+
---------------------------------------------------------------------------
+KeyError                                  Traceback (most recent call last)
+<ipython-input-2-e4c4cbafeeb5> in <module>()
+      1 import errors_02
+----> 2 errors_02.print_friday_message()
+
+/Users/ghopper/thesis/code/errors_02.py in print_friday_message()
+     13
+     14 def print_friday_message():
+---> 15     print_message("Friday")
+
+/Users/ghopper/thesis/code/errors_02.py in print_message(day)
+      9         "sunday": "Aw, the weekend is almost over."
+     10     }
+---> 11     print(messages[day])
+     12
+     13
+
+KeyError: 'Friday'
+
+
+
+
+
+
+ +
+
+
    +
  1. Three levels.
  2. errors_02.py
  3. print_message
  4. Line 11
  5. KeyError. These errors occur when we are trying to look +up a key that does not exist (usually in a data structure such as a +dictionary). We can find more information about the +KeyError and other built-in exceptions in the Python +docs.
  6. KeyError: 'Friday'
+
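The same pieces of information can also be pulled out of a traceback in code, using the standard library's `traceback` module. The sketch below uses a cut-down stand-in for `errors_02.py`: the function names match the traceback above, but the dictionary contents are placeholders, since the original file is not shown in full.

```python
import traceback

def print_message(day):
    # placeholder messages; only the missing "Friday" key matters here
    messages = {
        "monday": "placeholder message",
        "sunday": "Aw, the weekend is almost over.",
    }
    print(messages[day])

def print_friday_message():
    print_message("Friday")

try:
    print_friday_message()
except KeyError as error:
    # one entry per level of the traceback, outermost call first
    frames = traceback.extract_tb(error.__traceback__)
    function_names = []
    for frame in frames:
        function_names.append(frame.name)
        print(frame.name, 'at line', frame.lineno)
    print('error type:', type(error).__name__)
    print('error message:', error)
```

The last entry in `frames` is the function in which the error was actually raised, which matches reading the printed traceback from the bottom up.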
+
+
+
+
+
+ +
+
+

Key Points

+
+
    +
  • The scope of a variable is the part of a program that can ‘see’ that +variable.
  • +
+
+
+
+

Content from Programming Style

+
+

Last updated on 2024-10-18

+
+ +
+
+

Overview

+
+
+
+
+

Questions

+
    +
  • How can I make my programs more readable?
  • +
  • How do most programmers format their code?
  • +
  • How can programs check their own operation?
  • +
+
+
+
+
+
+
+

Objectives

+
    +
  • Provide sound justifications for basic rules of coding style.
  • +
  • Refactor one-page programs to make them more readable and justify +the changes.
  • +
  • Use Python community coding standards (PEP-8).
  • +
+
+
+
+
+
+

Coding style +

+
+

A consistent coding style helps others (including our future selves) +read and understand code more easily. Code is read much more often than +it is written, and as the Zen of Python +states, “Readability counts”. Python proposed a standard style through +one of its first Python Enhancement Proposals (PEP), PEP8.

+

Some points worth highlighting:

+
    +
  • document your code and ensure that assumptions, internal algorithms, +expected inputs, expected outputs, etc., are clear
  • +
  • use clear, semantically meaningful variable names
  • +
  • use spaces, not tabs, to indent lines (tabs can cause +problems across different text editors, operating systems, and version +control systems)
  • +

Follow standard Python style in your code. +

+
+
    +
  • +PEP8: a style +guide for Python that discusses topics such as how to name variables, +how to indent your code, how to structure your import +statements, etc. Adhering to PEP8 makes it easier for other Python +developers to read and understand your code, and to understand what +their contributions should look like.
  • +
  • To check your code for compliance with PEP8, you can use the pycodestyle application. +Tools like the black code +formatter can automatically format your code to conform to PEP8 and +pycodestyle (a Jupyter notebook formatter, nb_black, also exists).
  • +
  • Some groups and organizations follow different style guidelines +besides PEP8. For example, the Google style +guide on Python makes slightly different recommendations. Google +wrote an application that can help you format your code in either their +style or PEP8 called yapf.
  • +
  • With respect to coding style, the key is consistency. +Choose a style for your project, be it PEP8, the Google style, or +something else, and do your best to ensure that you and anyone else you +are collaborating with stick to it. Consistency within a project is +often more impactful than the particular style used. A consistent style +will make your software easier to read and understand for others and for +your future self.
  • +

Use assertions to check for internal errors. +

+
+

Assertions are a simple but powerful method for making sure that the +context in which your code is executing is as you expect.

+
+

PYTHON +

+
def calc_bulk_density(mass, volume):
+    '''Return dry bulk density = powder mass / powder volume.'''
+    assert volume > 0
+    return mass / volume
+
+

If the assertion is False, the Python interpreter raises +an AssertionError runtime exception. The source code for +the expression that failed will be displayed as part of the error +message. To ignore assertions in your code run the interpreter with the +‘-O’ (optimize) switch. Assertions should contain only simple checks and +never change the state of the program. For example, an assertion should +never contain an assignment.
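To see what a failed assertion looks like, the sketch below adds an optional message to the `assert` statement and then calls the function with a volume of zero. The message text is an addition for illustration, and the `try`/`except` is there only so the example can print the error instead of stopping.

```python
def calc_bulk_density(mass, volume):
    '''Return dry bulk density = powder mass / powder volume.'''
    # the text after the comma becomes the AssertionError message
    assert volume > 0, 'volume must be positive'
    return mass / volume

print(calc_bulk_density(2.5, 2.0))

try:
    calc_bulk_density(2.5, 0)
except AssertionError as error:
    caught_message = str(error)
    print('AssertionError:', caught_message)
```

If the interpreter is run with `-O`, the `assert` line is skipped and the second call would fail with a `ZeroDivisionError` instead, which is why assertions should not be the only guard against bad user input.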

+

Use docstrings to provide builtin help. +

+
+

If the first thing in a function is a character string that is not +assigned directly to a variable, Python attaches it to the function, +accessible via the builtin help function. This string that provides +documentation is also known as a docstring.

+
+

PYTHON +

+
def average(values):
+    "Return average of values, or None if no values are supplied."
+
+    if len(values) == 0:
+        return None
+    return sum(values) / len(values)
+
+help(average)
+
+
+

OUTPUT +

+
Help on function average in module __main__:
+
+average(values)
+    Return average of values, or None if no values are supplied.
+
+
+
+ +
+
+

Multiline Strings

+
+

Multiline strings are often used for documentation. These start +with three quote characters (either single or double) and end +with three matching characters.

+
+

PYTHON +

+
"""This string spans
+multiple lines.
+
+Blank lines are allowed."""
+
+
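As a sketch of how a multiline string works as a docstring, the example below rewrites the `average` function from earlier with a longer description. It uses `inspect.getdoc` from the standard library, which returns the docstring with the indentation cleaned up, so the text can be checked without calling `help`.

```python
import inspect

def average(values):
    """Return average of values.

    Returns None if no values are supplied.
    """
    if len(values) == 0:
        return None
    return sum(values) / len(values)

# getdoc returns the docstring with leading indentation removed
doc = inspect.getdoc(average)
print(doc)
```

A common convention is to put a one-line summary first, then a blank line, then any further detail, as done here.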
+
+
+
+
+ +
+
+

What Will Be Shown?

+
+

Highlight the lines in the code below that will be available as +online help. Are there lines that should be made available, but won’t +be? Will any lines produce a syntax error or a runtime error?

+
+

PYTHON +

+
"Find maximum edit distance between multiple sequences."
+# This finds the maximum distance between all sequences.
+
+def overall_max(sequences):
+    '''Determine overall maximum edit distance.'''
+
+    highest = 0
+    for left in sequences:
+        for right in sequences:
+            '''Avoid checking sequence against itself.'''
+            if left != right:
+                this = edit_distance(left, right)
+                highest = max(highest, this)
+
+    # Report.
+    return highest
+
+
+
+
+
+
+ +
+
+

Document This

+
+

Use comments to describe and help others understand potentially +unintuitive sections or individual lines of code. They are especially +useful to whoever may need to understand and edit your code in the +future, including yourself.

+

Use docstrings to document the acceptable inputs and expected outputs +of a function or class, its purpose, assumptions and intended behavior. +Docstrings are displayed when a user invokes the builtin +help function on your function or class.

+

Turn the comment in the following function into a docstring and check +that help displays it properly.

+
+

PYTHON +

+
def middle(a, b, c):
+    # Return the middle value of three.
+    # Assumes the values can actually be compared.
+    values = [a, b, c]
+    values.sort()
+    return values[1]
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
def middle(a, b, c):
+    '''Return the middle value of three.
+    Assumes the values can actually be compared.'''
+    values = [a, b, c]
+    values.sort()
+    return values[1]
+
+
+
+
+
+
+
+ +
+
+

Clean Up This Code

+
+
    +
  1. Read this short program and try to predict what it does.
  2. Run it: how accurate was your prediction?
  3. Refactor the program to make it more readable. Remember to run it +after each change to ensure its behavior hasn’t changed.
  4. Compare your rewrite with your neighbor’s. What did you do the same? +What did you do differently, and why?
+
+

PYTHON +

+
n = 10
+s = 'et cetera'
+print(s)
+i = 0
+while i < n:
+    # print('at', j)
+    new = ''
+    for j in range(len(s)):
+        left = j-1
+        right = (j+1)%len(s)
+        if s[left]==s[right]: new = new + '-'
+        else: new = new + '*'
+    s=''.join(new)
+    print(s)
+    i += 1
+
+
+
+
+
+
+ +
+
+

Here’s one solution.

+
+

PYTHON +

+
def string_machine(input_string, iterations):
+    """
+    Takes input_string and generates a new string with -'s and *'s
+    corresponding to characters that have identical adjacent characters
+    or not, respectively.  Iterates through this procedure with the resultant
+    strings for the supplied number of iterations.
+    """
+    print(input_string)
+    input_string_length = len(input_string)
+    old = input_string
+    for i in range(iterations):
+        new = ''
+        # iterate through characters in previous string
+        for j in range(input_string_length):
+            left = j-1
+            right = (j+1) % input_string_length  # ensure right index wraps around
+            if old[left] == old[right]:
+                new = new + '-'
+            else:
+                new = new + '*'
+        print(new)
+        # store new string as old
+        old = new
+
+string_machine('et cetera', 10)
+
+
+

OUTPUT +

+
et cetera
+*****-***
+----*-*--
+---*---*-
+--*-*-*-*
+**-------
+***-----*
+--**---**
+*****-***
+----*-*--
+---*---*-
+
+
+
+
+
+
+
+ +
+
+
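The exercise asks that behavior stay unchanged after each edit. One way to make that check mechanical is sketched below, assuming the refactored function returns its lines instead of printing them. The names `next_generation` and `string_machine_lines` are illustrative and not part of the lesson's solution.

```python
def next_generation(old):
    """Return the string produced by applying the rule once to old."""
    length = len(old)
    new = ''
    for j in range(length):
        left = j - 1                # index -1 wraps to the last character
        right = (j + 1) % length    # wrap past the end back to index 0
        if old[left] == old[right]:
            new = new + '-'
        else:
            new = new + '*'
    return new


def string_machine_lines(input_string, iterations):
    """Return the input string followed by one line per iteration."""
    lines = [input_string]
    for _ in range(iterations):
        lines.append(next_generation(lines[-1]))
    return lines


# Compare against the first lines of the known output after every change.
known_start = ['et cetera', '*****-***', '----*-*--', '---*---*-']
lines = string_machine_lines('et cetera', 10)
assert lines[:4] == known_start

for line in lines:
    print(line)
```

Returning the lines keeps the computation separate from the printing, so the comparison can run without capturing printed output.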

Key Points

+
+
    +
  • Follow standard Python style in your code.
  • +
  • Use docstrings to provide built-in help.
  • +
+
+
+
+

Content from Wrap-Up

+
+

Last updated on 2024-10-18

+
+ +
+
+

Overview

+
+
+
+
+

Questions

+
    +
  • What have we learned?
  • +
  • What else is out there and where do I find it?
  • +
+
+
+
+
+
+
+

Objectives

+
    +
  • Name and locate scientific Python community sites for software, +workshops, and help.
  • +
+
+
+
+
+
+

Leslie Lamport once said, “Writing is nature’s way of showing you how +sloppy your thinking is.” The same is true of programming: many things +that seem obvious when we’re thinking about them turn out to be anything +but when we have to explain them precisely.

+

Python supports a large and diverse community across academia and +industry. +

+
+ +
+
+ +
+
+

Key Points

+
+
    +
  • Python supports a large and diverse community across academia and +industry.
  • +
+
+
+
+

Content from Feedback

+
+

Last updated on 2024-10-18

+
+ +
+
+

Overview

+
+
+
+
+

Questions

+
    +
  • How did the class go?
  • +
+
+
+
+
+
+
+

Objectives

+
    +
  • Gather feedback on the class
  • +
+
+
+
+
+
+

Gather feedback from participants.

+
+
+ +
+
+

Key Points

+
+
    +
  • We are constantly seeking to improve this course.
  • +
+
+
+
+
+
+
+ + +
+ + +
+
+ +
+
+ + + + diff --git a/android-chrome-192x192.png b/android-chrome-192x192.png new file mode 100644 index 000000000..ed3c210ab Binary files /dev/null and b/android-chrome-192x192.png differ diff --git a/android-chrome-512x512.png b/android-chrome-512x512.png new file mode 100644 index 000000000..c88d96c1c Binary files /dev/null and b/android-chrome-512x512.png differ diff --git a/apple-touch-icon.png b/apple-touch-icon.png new file mode 100644 index 000000000..8044feefd Binary files /dev/null and b/apple-touch-icon.png differ diff --git a/assets/fonts/Mulish-Bold.ttf b/assets/fonts/Mulish-Bold.ttf new file mode 100644 index 000000000..1f522d476 Binary files /dev/null and b/assets/fonts/Mulish-Bold.ttf differ diff --git a/assets/fonts/Mulish-Bold.woff b/assets/fonts/Mulish-Bold.woff new file mode 100644 index 000000000..711448ea9 Binary files /dev/null and b/assets/fonts/Mulish-Bold.woff differ diff --git a/assets/fonts/Mulish-ExtraBold.ttf b/assets/fonts/Mulish-ExtraBold.ttf new file mode 100644 index 000000000..62850fff3 Binary files /dev/null and b/assets/fonts/Mulish-ExtraBold.ttf differ diff --git a/assets/fonts/mulish-v5-latin-regular.eot b/assets/fonts/mulish-v5-latin-regular.eot new file mode 100644 index 000000000..423bcb17a Binary files /dev/null and b/assets/fonts/mulish-v5-latin-regular.eot differ diff --git a/assets/fonts/mulish-v5-latin-regular.svg b/assets/fonts/mulish-v5-latin-regular.svg new file mode 100644 index 000000000..70341f98b --- /dev/null +++ b/assets/fonts/mulish-v5-latin-regular.svg @@ -0,0 +1,305 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git 
a/assets/fonts/mulish-v5-latin-regular.ttf b/assets/fonts/mulish-v5-latin-regular.ttf new file mode 100644 index 000000000..541bb406e Binary files /dev/null and b/assets/fonts/mulish-v5-latin-regular.ttf differ diff --git a/assets/fonts/mulish-v5-latin-regular.woff b/assets/fonts/mulish-v5-latin-regular.woff new file mode 100644 index 000000000..700ec13f5 Binary files /dev/null and b/assets/fonts/mulish-v5-latin-regular.woff differ diff --git a/assets/fonts/mulish-v5-latin-regular.woff2 b/assets/fonts/mulish-v5-latin-regular.woff2 new file mode 100644 index 000000000..b244298bf Binary files /dev/null and b/assets/fonts/mulish-v5-latin-regular.woff2 differ diff --git a/assets/fonts/mulish-variablefont_wght.woff b/assets/fonts/mulish-variablefont_wght.woff new file mode 100644 index 000000000..fc425383a Binary files /dev/null and b/assets/fonts/mulish-variablefont_wght.woff differ diff --git a/assets/fonts/mulish-variablefont_wght.woff2 b/assets/fonts/mulish-variablefont_wght.woff2 new file mode 100644 index 000000000..8a233c6f9 Binary files /dev/null and b/assets/fonts/mulish-variablefont_wght.woff2 differ diff --git a/assets/images/carpentries-logo-sm.svg b/assets/images/carpentries-logo-sm.svg new file mode 100644 index 000000000..da70d40ee --- /dev/null +++ b/assets/images/carpentries-logo-sm.svg @@ -0,0 +1,7 @@ + + + + + + + \ No newline at end of file diff --git a/assets/images/carpentries-logo.svg b/assets/images/carpentries-logo.svg new file mode 100644 index 000000000..6cbe66500 --- /dev/null +++ b/assets/images/carpentries-logo.svg @@ -0,0 +1,19 @@ + + + + + + + + + + + + + + + + + + + diff --git a/assets/images/data-logo-sm.svg b/assets/images/data-logo-sm.svg new file mode 100644 index 000000000..6d4019ed5 --- /dev/null +++ b/assets/images/data-logo-sm.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/assets/images/data-logo.svg b/assets/images/data-logo.svg new file mode 100644 index 000000000..c5949528e --- /dev/null +++ 
b/assets/images/data-logo.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/assets/images/dropdown-arrow.svg b/assets/images/dropdown-arrow.svg new file mode 100644 index 000000000..a12b04b34 --- /dev/null +++ b/assets/images/dropdown-arrow.svg @@ -0,0 +1,12 @@ + + + + +
+ Plotting and Programming in Python +
+ +
+
+ + + + + +
+
+

Lesson Design

+

Last updated on 2024-10-18

+ + + +
+ +
+ + + +
+
+ +
+
+

Help Wanted

+
+

We are filling in the exercises below in order to make the lesson plan +more concrete. Contributions (both in the form of pull requests with +filled-in exercises, and comments on specific exercises, ordering, and +timings) are greatly appreciated.

+
+
+
+

Process Used

+
+

Michael Pollan’s advice if he taught R or Python programming:

+
  1. Write code.
  2. Not too much.
  3. Mostly plots.

Michael +Koontz

+
+

This lesson was developed using a slimmed-down variant of the +“Understanding by Design” process. The main sections are:

+
  1. Assumptions about audience, time, etc. (The current draft also +includes some conclusions and decisions in this section - that should be +refactored.)

  2. Desired results: overall goals, summative assessments at half-day +granularity, what learners will be able to do, what learners will +know.

  3. Learning plan: each episode has a heading that summarizes what +will be covered, then estimates time that will be spent on teaching and +on exercises, while the exercises are given as bullet points.

Stage 1: Assumptions

+
  • Audience +
    • Graduate students in numerate disciplines from cosmology to +archaeology
    • +
    • Who have manipulated data in spreadsheets and with interactive tools +like SAS
    • +
    • But have not programmed beyond CPD +(copy-paste-despair)
    • +
  • +
  • Constraints +
    • One full day 09:00-16:30 +
      • 06:15 class time
      • +
      • 0:45 lunch
      • +
      • 0:30 total for two coffee breaks
      • +
    • +
    • Learners use native installs on their own machines +
      • May use VMs or cloud resources at instructor’s discretion
      • +
      • But must keep native local install as an option
      • +
    • +
    • No dependence on other Carpentry modules +
      • In particular, does not require knowledge of shell or version +control
      • +
    • +
    • Use the Jupyter Notebook +
      • Authentic tool used by many instructors
      • +
      • There isn’t really an alternative
      • +
      • And means that even people who have seen a bit of Python before will +probably learn something
      • +
    • +
  • +
  • Motivating Example +
    • Creating 2D plots suitable for inclusion in papers
    • +
    • Appeals to almost everyone
    • +
    • Makes lesson usable by both Carpentries +
      • And means that even people who have seen a bit of Python before will +probably learn something
      • +
    • +
  • +
  • Data +
    • Use the gapminder data throughout
    • +
    • But break into multiple files by continent +
      • To make display of output from examples tidier (e.g., use +Australia/New Zealand, which is only two lines)
      • +
      • And allow examples showing use of multiple data sets
      • +
    • +
  • +
  • Focus on Pandas instead of NumPy +
    • Makes lesson usable by both Data Carpentry and Software +Carpentry
    • +
    • Genuine novices are likely to want data analysis
    • +
    • And people with some prior experience: +
      • will accept data analysis as an authentic task,
      • +
      • and are unlikely to have encountered Pandas, so they’ll still get +something useful out of the lesson
      • +
    • +
  • +
  • Challenges will mostly not be “write this code from +scratch” +
    • Want lots of short exercises that can reliably be finished in +allotted time
    • +
    • So use MCQs, fill-in-the-blanks, Parsons Problems, “tweak this +code”, etc.
    • +
  • +

Stage 2: Desired Results

+
+

Questions

+

How do I…

+
  • …read tabular data?
  • +
  • …plot a single vector of values?
  • +
  • …create a time series plot?
  • +
  • …create one plot for each of several data sets?
  • +
  • …get extra data from a single data set for plotting?
  • +
  • …write programs I can read and re-use in future?
  • +
+
+

Skills

+

I can…

+
  • …write short scripts using loops and conditionals.
  • +
  • …write functions with a fixed number of parameters that return a +single result.
  • +
  • …import libraries using aliases and refer to those libraries’ +contents.
  • +
  • …do simple data extraction and formatting using Pandas.
  • +
+
+

Concepts

+

I know…

+
  • …that a program is a piece of lab equipment that implements an +analysis +
    • Needs to be validated/calibrated before/during use
    • +
    • Makes analysis reproducible, reviewable, shareable
    • +
  • +
  • …that programs are written for people, not for computers +
    • Meaningful variable names
    • +
    • Modularity for readability as well as re-use
    • +
    • No duplication
    • +
    • Document purpose and use
    • +
  • +
  • …that there is no magic: the programs I use are no different in +principle from those I build
  • +
  • …how to assign values to variables
  • +
  • …what integers, floats, strings, NumPy arrays, and Pandas dataframes +are
  • +
  • …how to trace the execution of a for loop
  • +
  • …how to trace the execution of if/else +statements
  • +
  • …how to create and index lists
  • +
  • …how to create and index NumPy arrays
  • +
  • …how to create and index Pandas dataframes
  • +
  • …how to create time series plots
  • +
  • …the difference between defining and calling a function
  • +
  • …where to find documentation on standard libraries
  • +
  • …how to find out what else scientific Python offers
  • +
+

Stage 3: Learning Plan

+
+

Summative Assessment

+
  • Midpoint: create time-series plot for each file in a directory.
  • +
  • Final: extract data from Pandas dataframe and create comparative +multi-line time series plot.
  • +
+
+

+Running and Quitting Interactively +(09:00)

+
  • Teaching: 15 min (because setup issues) +
    • Launch the Jupyter Notebook, create new notebooks, and exit the +Notebook.
    • +
    • Create Markdown cells in a notebook.
    • +
    • Create and run Python cells in a notebook.
    • +
  • +
  • Challenges: 0 min (accounted for in teaching time - no separate +exercise) +
    • Creating lists in Markdown
    • +
    • What is displayed when several expressions are put in a single +cell?
    • +
    • Change an existing cell from code to Markdown
    • +
    • Rendering LaTeX-style equations
    • +
  • +
+
+

+Variables and Assignment (09:15)

+
  • Teaching: 10 min +
    • Write programs that assign scalar values to variables and perform +calculations with those values.
    • +
    • Correctly trace value changes in programs that use scalar +assignment.
    • +
  • +
  • Challenges: 10 min +
    • Trace execution of code swapping two values using an intermediate +variable.
    • +
    • Predict final values of variables after several assignments.
    • +
    • What happens if you try to index a number?
    • +
    • Which is a better variable name, m, min, +or minutes?
    • +
    • What do the following slice expressions produce?
    • +
  • +
+
+
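The plan does not give the exercise text, so the following is only a sketch of what the swap and indexing challenges might look like. The variable names and values are assumptions chosen for illustration.

```python
# Swap two values using an intermediate variable.
left = 'L'
right = 'R'

temp = left      # temp now holds 'L'
left = right     # left now holds 'R'
right = temp     # right now holds 'L'

print(left, right)

# Indexing works on strings, but a number has no positions to index.
a_number = 123
try:
    a_number[0]
except TypeError as error:
    print('cannot index a number:', error)
```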

+Data Types and Type +Conversion (09:35)

+
  • Teaching: 10 min +
    • Explain key differences between integers and floating point +numbers.
    • +
    • Explain key differences between numbers and character strings.
    • +
    • Use built-in functions to convert between integers, floating point +numbers, and strings.
    • +
  • +
  • Challenges: 10 min +
    • What type of value is 3.4?
    • +
    • What type of value is 3.25 + 4?
    • +
    • What type of value would you use to represent: +
      • Number of days since the start of the year.
      • +
      • Time elapsed since the start of the year.
      • +
      • Etc.
      • +
    • +
    • How can you use // (integer division) and +% (modulo)?
    • +
    • What does int("3.4") do?
    • +
    • Given these float, int, and string values, which expressions will +print a particular result?
    • +
    • What do you expect 1+2j + 3 to produce?
    • +
  • +
+
+
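Several of these challenges can be answered by running one-line experiments. The sketch below is illustrative, and the value of `seconds` is an assumption.

```python
print(type(3.4))         # <class 'float'>
print(type(3.25 + 4))    # <class 'float'>: int + float gives float

# Integer division and modulo split a total into whole units and a remainder.
seconds = 200
print(seconds // 60, seconds % 60)    # 3 20

# int() cannot parse a string that contains a decimal point.
try:
    int("3.4")
except ValueError as error:
    print('ValueError:', error)

# Converting via float first works; int() then truncates toward zero.
print(int(float("3.4")))    # 3

# Adding a real number to a complex number gives a complex number.
print(1+2j + 3)             # (4+2j)
```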

+Built-in Functions and Help +(09:55)

+
  • Teaching: 15 min +
    • Explain the purpose of functions.
    • +
    • Correctly call built-in Python functions.
    • +
    • Correctly nest calls to built-in functions.
    • +
    • Use help to display documentation for built-in functions.
    • +
    • Correctly describe situations in which SyntaxError and NameError +occur.
    • +
  • +
  • Challenges: 10 min +
    • Explain the order of operations in the following complex +expression.
    • +
    • What will each nested combination of min and +max calls produce?
    • +
    • Why don’t max and min return +None when given no arguments?
    • +
    • Given what we have seen so far, what index expression will get the +last character in a string?
    • +
  • +
+
+
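As a hedged illustration of the challenges above (the string `'helium'` and the numbers are assumptions, not the lesson's own values):

```python
# Nested calls are evaluated from the inside out.
print(max(1, min(5, 3)))    # min(5, 3) is 3, then max(1, 3) is 3

# With no arguments there is nothing to compare, so Python raises an
# error instead of silently returning None.
try:
    max()
except TypeError as error:
    print('TypeError:', error)

# Using only len() and indexing, the last character sits at len - 1.
name = 'helium'
print(name[len(name) - 1])    # m
```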

+Coffee (10:20): 15 min

+
+
+

+Libraries (10:35)

+
  • Teaching: 10 min +
    • Explain what software libraries are and why programmers create and +use them.
    • +
    • Write programs that import and use libraries from Python’s standard +library.
    • +
    • Find and read documentation for standard libraries interactively (in +the interpreter) and online.
    • +
  • +
  • Challenges: 10 min +
    • Which function from the standard math library could you use to +calculate a square root?
    • +
    • What library would you use to select a random value from data?
    • +
    • If help(math) produces an error, what have you +forgotten to do?
    • +
    • Fill in the blanks in code below so that the import statement and +program run.
    • +
  • +
+
+
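A minimal sketch of the first two challenges, using only the standard library. The list `values` is an assumption for illustration.

```python
import math
import random

# math.sqrt always returns a float.
print(math.sqrt(16))    # 4.0

# random.choice picks one item from a non-empty sequence.
values = [2, 3, 5, 7]
pick = random.choice(values)
print(pick)
```

The printed `pick` differs from run to run, so no expected output is shown for it.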

+Reading Tabular Data +(10:55)

+
  • Teaching: 10 min +
    • Import the Pandas library.
    • +
    • Use Pandas to load a simple CSV data set.
    • +
    • Get some basic information about a Pandas DataFrame.
    • +
  • +
  • Challenges: 10 min +
    • Read the data for the Americas and display its summary +statistics.
    • +
    • What do .head and .tail do?
    • +
    • What string(s) should you pass to read_csv to read +files from other directories?
    • +
    • How can you write CSV data?
    • +
  • +
+
+

+DataFrames (11:15)

+
  • Teaching: 15 min +
    • Select individual values from a Pandas dataframe.
    • +
    • Select entire rows or entire columns from a dataframe.
    • +
    • Select a subset of both rows and columns from a dataframe in a +single operation.
    • +
    • Select a subset of a dataframe by a single Boolean criterion.
    • +
  • +
  • Challenges: 15 min +
    • Write an expression to find the Per Capita GDP of Serbia in +2007.
    • +
    • What rule governs what is (or isn’t) included in numerical and named +slices in Pandas?
    • +
    • What does each line in the following short program do?
    • +
    • What do idxmin and idxmax do?
    • +
    • Write expressions to get the GDP per capita for all countries in +1982, for all countries after 1985, etc.
    • +
    • Given the way its borders have changed since 1900, what would you do +if asked to create a table of GDP per capita for Poland for the +Twentieth Century?
    • +
  • +
+
+

+Plotting (11:45)

+
  • Teaching: 15 min +
    • Create a time series plot showing a single data set.
    • +
    • Create a scatter plot showing relationship between two data +sets.
    • +
  • +
  • Challenges: 15 min +
    • Fill in the blanks to plot the minimum GDP per capita over time for +European countries.
    • +
    • Modify the example to create a scatter plot of GDP per capita in +Asian countries.
    • +
    • Explain what each argument to plot does in the +following example.
    • +
  • +
+
+

+Lunch (12:15): 45 min

+
+
+

+Lists (13:00)

+
  • Teaching: 10 min +
    • Explain why programs need collections of values.
    • +
    • Write programs that create flat lists, index them, slice them, and +modify them through assignment and method calls.
    • +
  • +
  • Challenges: 10 min +
    • Fill in the blanks so that the program produces the output +shown.
    • +
    • How large are the following slices?
    • +
    • What do negative index expressions print?
    • +
    • What does a “stride” in a slice do?
    • +
    • How do slices treat out-of-range bounds?
    • +
    • What are the differences between sorting these two ways?
    • +
    • What is the difference between new = old and +new = old[:]?
    • +
  • +
+
+
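The list challenges above can be explored with a few short experiments. This is a sketch with assumed example values, not the lesson's exercise text.

```python
# new = old makes a second name for the same list;
# new = old[:] makes a separate copy.
old = [1, 2, 3]
alias = old
shallow_copy = old[:]
old[0] = 99
print(alias)           # [99, 2, 3]
print(shallow_copy)    # [1, 2, 3]

letters = list('abcdefgh')
print(letters[::3])      # stride of 3: ['a', 'd', 'g']
print(letters[5:100])    # out-of-range bound is clipped: ['f', 'g', 'h']
print(letters[-1])       # negative index counts from the end: h

# sorted() returns a new list; list.sort() sorts in place and returns None.
values = [3, 1, 2]
print(sorted(values), values)    # [1, 2, 3] [3, 1, 2]
print(values.sort(), values)     # None [1, 2, 3]
```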

+Loops (13:20)

+
  • Teaching: 10 min +
    • Explain what for loops are normally used for.
    • +
    • Trace the execution of a simple (unnested) loop and correctly state +the values of variables in each iteration.
    • +
    • Write for loops that use the Accumulator pattern to aggregate +values.
    • +
  • +
  • Challenges: 15 min +
    • Is an indentation error a syntax error or a runtime error?
    • +
    • Trace which lines of this program are executed in what order.
    • +
    • Fill in the blanks in this program so that it reverses a +string.
    • +
    • Fill in the blanks in this series of examples to get practice +accumulating values.
    • +
    • Reorder and indent these lines to calculate the cumulative sum of +the list values.
    • +
  • +
+
+
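The Accumulator pattern behind several of these challenges can be sketched as follows. The input values are assumptions for illustration.

```python
# Reverse a string by adding each character to the front of the result.
original = 'tin'
result = ''
for char in original:
    result = char + result
print(result)    # nit

# Cumulative sum: keep a running total and record it at every step.
data = [1, 2, 2, 5]
cumulative = []
total = 0
for number in data:
    total += number
    cumulative.append(total)
print(cumulative)    # [1, 3, 5, 10]
```

Both loops follow the same shape: initialize an accumulator before the loop, then update it once per iteration.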

+Looping Over Data Sets +(13:45)

+
  • Teaching: 5 min +
    • Be able to read and write globbing expressions that match sets of +files.
    • +
    • Use glob to create lists of files.
    • +
    • Write for loops to perform operations on files given their names in +a list.
    • +
  • +
  • Challenges: 10 min +
    • Which filenames are not matched by this glob +expression?
    • +
    • Modify this program so that it prints the number of records in the +shortest file.
    • +
    • Write a program that reads and plots all of the regional data +sets.
    • +
  • +
+
+
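A self-contained sketch of globbing and looping over files. It creates its own temporary files so that it runs anywhere; the file names are hypothetical stand-ins for the lesson's regional data sets.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Create three small files; only two match the pattern below.
    for name in ['gdp_asia.csv', 'gdp_europe.csv', 'notes.txt']:
        with open(os.path.join(folder, name), 'w') as handle:
            handle.write('country,gdp\n')

    # '*' matches zero or more characters within a file name.
    matches = sorted(glob.glob(os.path.join(folder, 'gdp_*.csv')))

    for path in matches:
        with open(path) as handle:
            records = len(handle.readlines())
        print(os.path.basename(path), records)
```

`glob.glob` returns names in arbitrary order, so sorting the result keeps the loop's output reproducible.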

+Writing Functions (14:00)

+
  • Teaching: 10 min +
    • Explain and identify the difference between function definition and +function call.
    • +
    • Write a function that takes a small, fixed number of arguments and +produces a single result.
    • +
  • +
  • Challenges: 15 min +
    • This code defines and calls a function - what does it print when +run?
    • +
    • Explain why this short program prints things in the order it +does.
    • +
    • Fill in the blanks to create a function that finds the minimum value +in a data file.
    • +
    • Fill in the blanks to create a function that finds the first +negative value in a list. What does your function do if the list is +empty?
    • +
    • Why is it sometimes useful to pass arguments by naming the +corresponding parameters?
    • +
    • Fill in the blanks and turn this short piece of code into a +function.
    • +
  • +
+
+
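One possible answer to the "first negative value" challenge, offered as a sketch. The name `first_negative` and the choice to return `None` for an empty list are assumptions.

```python
def first_negative(values):
    """Return the first negative value in values, or None if there is none."""
    for value in values:
        if value < 0:
            return value
    # Reached when the list is empty or holds no negative values.
    return None


print(first_negative([3, -1, -7]))      # -1
print(first_negative([]))               # None
# Naming the parameter makes the call self-describing.
print(first_negative(values=[2, 4]))    # None
```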

+Variable Scope (14:25)

+
  • Teaching: 10 min +
    • Identify local and global variables.
    • +
    • Identify parameters as local variables.
    • +
    • Read a traceback and determine the file, function, and line number +on which the error occurred.
    • +
  • +
  • Challenges: 10 min +
    • Trace the changes to the values in this program, being careful to +distinguish local from global values.
    • +
  • +
+
+
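A short sketch of local versus global names. The names `pressure`, `adjust`, and `temperature` are illustrative.

```python
pressure = 103.9    # global: visible everywhere after this line


def adjust(t):
    # t (a parameter) and temperature are local to this call.
    temperature = t * 1.43 / pressure
    return temperature


print(adjust(0.9))

# The local name no longer exists once the call has finished.
try:
    print(temperature)
except NameError as error:
    print('NameError:', error)
```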

+Coffee (14:45): 15 min

+
+
+

+Conditionals (15:00)

+
  • Teaching: 10 min +
    • Correctly write programs that use if and else statements and simple +Boolean expressions (without logical operators).
    • +
    • Trace the execution of unnested conditionals and conditionals inside +loops.
    • +
  • +
  • Challenges: 15 min +
    • Trace the execution of this conditional statement.
    • +
    • Fill in the blanks so that this function replaces negative values +with zeroes.
    • +
    • Modify this program so that it only processes files with fewer than +50 records.
    • +
    • Modify this program so that it always finds the largest and smallest +values in a list no matter what the list’s values are.
    • +
  • +
+
+
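Two of these challenges can be sketched with simple conditionals only, without logical operators. The function names are assumptions, and `smallest_and_largest` assumes a non-empty list.

```python
def clip_negatives(values):
    """Return a copy of values with negative entries replaced by zero."""
    result = []
    for value in values:
        if value < 0:
            result.append(0)
        else:
            result.append(value)
    return result


def smallest_and_largest(values):
    """Return (smallest, largest) for a non-empty list."""
    # Start from the first element rather than from 0, so the result is
    # correct even when every value is negative.
    smallest = values[0]
    largest = values[0]
    for value in values:
        if value < smallest:
            smallest = value
        if value > largest:
            largest = value
    return smallest, largest


print(clip_negatives([-2, 0, 3]))          # [0, 0, 3]
print(smallest_and_largest([-5, -2, -9]))  # (-9, -2)
```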

+Programming Style (15:25)

+
  • Teaching: 15 min +
    • How can I make my programs more readable?
    • +
    • How do most programmers format their code?
    • +
    • How can programs check their own operation?
    • +
  • +
  • Challenges: 15 min +
    • Which lines in this code will be available as online help?
    • +
    • Turn the comments in this program into docstrings.
    • +
    • Rewrite this short program to be more readable.
    • +
  • +
+
+
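One common way for a program to check its own operation is an `assert` statement that states what must be true at that point. The sketch below is illustrative, and `kelvin_to_celsius` is a hypothetical function.

```python
def kelvin_to_celsius(kelvin):
    """Convert a temperature from kelvin to degrees Celsius."""
    # Precondition: stop immediately if the input is physically impossible.
    assert kelvin >= 0, 'temperature in kelvin cannot be negative'
    return kelvin - 273.15


print(kelvin_to_celsius(300.0))

try:
    kelvin_to_celsius(-1)
except AssertionError as error:
    print('AssertionError:', error)
```

A failed assertion halts the program at the point where the assumption broke, which is usually closer to the cause than a wrong result discovered later.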

+Wrap-Up (15:55)

+
  • Teaching: 20 min +
    • Name and locate scientific Python community sites for software, +workshops, and help.
    • +
  • +
  • Challenges: 0 min +
    • None.
    • +
  • +
+
+

+Feedback (16:15)

+
  • Teaching: 0 min
  • +
  • Challenges: 15 min +
    • Collect feedback
    • +
  • +
+
+

Finish (16:30)

+
+
+
+ + +
+
+ + + diff --git a/discuss.html b/discuss.html new file mode 100644 index 000000000..ff8154354 --- /dev/null +++ b/discuss.html @@ -0,0 +1,529 @@ + +Plotting and Programming in Python: Discussion +
+ Plotting and Programming in Python +
+ +
+
+ + + + + +
+
+

Discussion

+

Last updated on 2024-10-18

+ + + +
+ +
+ + +

FIXME: general discussion and further reading for learners.

+ + +
+
+ + +
+
+ + + diff --git a/docsearch.css b/docsearch.css new file mode 100644 index 000000000..e5f1fe1df --- /dev/null +++ b/docsearch.css @@ -0,0 +1,148 @@ +/* Docsearch -------------------------------------------------------------- */ +/* + Source: https://github.com/algolia/docsearch/ + License: MIT +*/ + +.algolia-autocomplete { + display: block; + -webkit-box-flex: 1; + -ms-flex: 1; + flex: 1 +} + +.algolia-autocomplete .ds-dropdown-menu { + width: 100%; + min-width: none; + max-width: none; + padding: .75rem 0; + background-color: #fff; + background-clip: padding-box; + border: 1px solid rgba(0, 0, 0, .1); + box-shadow: 0 .5rem 1rem rgba(0, 0, 0, .175); +} + +@media (min-width:768px) { + .algolia-autocomplete .ds-dropdown-menu { + width: 175% + } +} + +.algolia-autocomplete .ds-dropdown-menu::before { + display: none +} + +.algolia-autocomplete .ds-dropdown-menu [class^=ds-dataset-] { + padding: 0; + background-color: rgb(255,255,255); + border: 0; + max-height: 80vh; +} + +.algolia-autocomplete .ds-dropdown-menu .ds-suggestions { + margin-top: 0 +} + +.algolia-autocomplete .algolia-docsearch-suggestion { + padding: 0; + overflow: visible +} + +.algolia-autocomplete .algolia-docsearch-suggestion--category-header { + padding: .125rem 1rem; + margin-top: 0; + font-size: 1.3em; + font-weight: 500; + color: #00008B; + border-bottom: 0 +} + +.algolia-autocomplete .algolia-docsearch-suggestion--wrapper { + float: none; + padding-top: 0 +} + +.algolia-autocomplete .algolia-docsearch-suggestion--subcategory-column { + float: none; + width: auto; + padding: 0; + text-align: left +} + +.algolia-autocomplete .algolia-docsearch-suggestion--content { + float: none; + width: auto; + padding: 0 +} + +.algolia-autocomplete .algolia-docsearch-suggestion--content::before { + display: none +} + +.algolia-autocomplete .ds-suggestion:not(:first-child) .algolia-docsearch-suggestion--category-header { + padding-top: .75rem; + margin-top: .75rem; + border-top: 1px solid rgba(0, 0, 0, .1) +} 
+ +.algolia-autocomplete .ds-suggestion .algolia-docsearch-suggestion--subcategory-column { + display: block; + padding: .1rem 1rem; + margin-bottom: 0.1; + font-size: 1.0em; + font-weight: 400 + /* display: none */ +} + +.algolia-autocomplete .algolia-docsearch-suggestion--title { + display: block; + padding: .25rem 1rem; + margin-bottom: 0; + font-size: 0.9em; + font-weight: 400 +} + +.algolia-autocomplete .algolia-docsearch-suggestion--text { + padding: 0 1rem .5rem; + margin-top: -.25rem; + font-size: 0.8em; + font-weight: 400; + line-height: 1.25 +} + +.algolia-autocomplete .algolia-docsearch-footer { + width: 110px; + height: 20px; + z-index: 3; + margin-top: 10.66667px; + float: right; + font-size: 0; + line-height: 0; +} + +.algolia-autocomplete .algolia-docsearch-footer--logo { + background-image: url("data:image/svg+xml;utf8,"); + background-repeat: no-repeat; + background-position: 50%; + background-size: 100%; + overflow: hidden; + text-indent: -9000px; + width: 100%; + height: 100%; + display: block; + transform: translate(-8px); +} + +.algolia-autocomplete .algolia-docsearch-suggestion--highlight { + color: #FF8C00; + background: rgba(232, 189, 54, 0.1) +} + + +.algolia-autocomplete .algolia-docsearch-suggestion--text .algolia-docsearch-suggestion--highlight { + box-shadow: inset 0 -2px 0 0 rgba(105, 105, 105, .5) +} + +.algolia-autocomplete .ds-suggestion.ds-cursor .algolia-docsearch-suggestion--content { + background-color: rgba(192, 192, 192, .15) +} diff --git a/docsearch.js b/docsearch.js new file mode 100644 index 000000000..b35504cd3 --- /dev/null +++ b/docsearch.js @@ -0,0 +1,85 @@ +$(function() { + + // register a handler to move the focus to the search bar + // upon pressing shift + "/" (i.e. 
"?") + $(document).on('keydown', function(e) { + if (e.shiftKey && e.keyCode == 191) { + e.preventDefault(); + $("#search-input").focus(); + } + }); + + $(document).ready(function() { + // do keyword highlighting + /* modified from https://jsfiddle.net/julmot/bL6bb5oo/ */ + var mark = function() { + + var referrer = document.URL ; + var paramKey = "q" ; + + if (referrer.indexOf("?") !== -1) { + var qs = referrer.substr(referrer.indexOf('?') + 1); + var qs_noanchor = qs.split('#')[0]; + var qsa = qs_noanchor.split('&'); + var keyword = ""; + + for (var i = 0; i < qsa.length; i++) { + var currentParam = qsa[i].split('='); + + if (currentParam.length !== 2) { + continue; + } + + if (currentParam[0] == paramKey) { + keyword = decodeURIComponent(currentParam[1].replace(/\+/g, "%20")); + } + } + + if (keyword !== "") { + $(".contents").unmark({ + done: function() { + $(".contents").mark(keyword); + } + }); + } + } + }; + + mark(); + }); +}); + +/* Search term highlighting ------------------------------*/ + +function matchedWords(hit) { + var words = []; + + var hierarchy = hit._highlightResult.hierarchy; + // loop to fetch from lvl0, lvl1, etc. + for (var idx in hierarchy) { + words = words.concat(hierarchy[idx].matchedWords); + } + + var content = hit._highlightResult.content; + if (content) { + words = words.concat(content.matchedWords); + } + + // return unique words + var words_uniq = [...new Set(words)]; + return words_uniq; +} + +function updateHitURL(hit) { + + var words = matchedWords(hit); + var url = ""; + + if (hit.anchor) { + url = hit.url_without_anchor + '?q=' + escape(words.join(" ")) + '#' + hit.anchor; + } else { + url = hit.url + '?q=' + escape(words.join(" ")); + } + + return url; +} diff --git a/exercises.html b/exercises.html new file mode 100644 index 000000000..e404074ea --- /dev/null +++ b/exercises.html @@ -0,0 +1,529 @@ + +Plotting and Programming in Python: Further Exercises +
+ Plotting and Programming in Python +
+ +
+
+ + + + + +
+
+

Further Exercises

+

Last updated on 2024-10-18

+ + + +
+ +
+ + +

FIXME: exercises that don’t fit into the regular schedule.

+ + +
+
+ + +
+
+ + + diff --git a/favicon-16x16.png b/favicon-16x16.png new file mode 100644 index 000000000..d44f8acb4 Binary files /dev/null and b/favicon-16x16.png differ diff --git a/favicon-32x32.png b/favicon-32x32.png new file mode 100644 index 000000000..63441d4c3 Binary files /dev/null and b/favicon-32x32.png differ diff --git a/favicons/cp/apple-touch-icon-114x114.png b/favicons/cp/apple-touch-icon-114x114.png new file mode 100644 index 000000000..a60b75810 Binary files /dev/null and b/favicons/cp/apple-touch-icon-114x114.png differ diff --git a/favicons/cp/apple-touch-icon-120x120.png b/favicons/cp/apple-touch-icon-120x120.png new file mode 100644 index 000000000..8f20a8f12 Binary files /dev/null and b/favicons/cp/apple-touch-icon-120x120.png differ diff --git a/favicons/cp/apple-touch-icon-144x144.png b/favicons/cp/apple-touch-icon-144x144.png new file mode 100644 index 000000000..4be151b14 Binary files /dev/null and b/favicons/cp/apple-touch-icon-144x144.png differ diff --git a/favicons/cp/apple-touch-icon-152x152.png b/favicons/cp/apple-touch-icon-152x152.png new file mode 100644 index 000000000..7d1d94395 Binary files /dev/null and b/favicons/cp/apple-touch-icon-152x152.png differ diff --git a/favicons/cp/apple-touch-icon-57x57.png b/favicons/cp/apple-touch-icon-57x57.png new file mode 100644 index 000000000..92309cef2 Binary files /dev/null and b/favicons/cp/apple-touch-icon-57x57.png differ diff --git a/favicons/cp/apple-touch-icon-60x60.png b/favicons/cp/apple-touch-icon-60x60.png new file mode 100644 index 000000000..de8148e58 Binary files /dev/null and b/favicons/cp/apple-touch-icon-60x60.png differ diff --git a/favicons/cp/apple-touch-icon-72x72.png b/favicons/cp/apple-touch-icon-72x72.png new file mode 100644 index 000000000..81d7e3d83 Binary files /dev/null and b/favicons/cp/apple-touch-icon-72x72.png differ diff --git a/favicons/cp/apple-touch-icon-76x76.png b/favicons/cp/apple-touch-icon-76x76.png new file mode 100644 index 000000000..15bca5c77 Binary 
files /dev/null and b/favicons/cp/apple-touch-icon-76x76.png differ diff --git a/favicons/cp/favicon-128.png b/favicons/cp/favicon-128.png new file mode 100644 index 000000000..e612cdc15 Binary files /dev/null and b/favicons/cp/favicon-128.png differ diff --git a/favicons/cp/favicon-16x16.png b/favicons/cp/favicon-16x16.png new file mode 100644 index 000000000..65b331112 Binary files /dev/null and b/favicons/cp/favicon-16x16.png differ diff --git a/favicons/cp/favicon-196x196.png b/favicons/cp/favicon-196x196.png new file mode 100644 index 000000000..0da938b27 Binary files /dev/null and b/favicons/cp/favicon-196x196.png differ diff --git a/favicons/cp/favicon-32x32.png b/favicons/cp/favicon-32x32.png new file mode 100644 index 000000000..0c1442e39 Binary files /dev/null and b/favicons/cp/favicon-32x32.png differ diff --git a/favicons/cp/favicon-96x96.png b/favicons/cp/favicon-96x96.png new file mode 100644 index 000000000..bed74ec8d Binary files /dev/null and b/favicons/cp/favicon-96x96.png differ diff --git a/favicons/cp/favicon.ico b/favicons/cp/favicon.ico new file mode 100644 index 000000000..4f2f2f11f Binary files /dev/null and b/favicons/cp/favicon.ico differ diff --git a/favicons/cp/mstile-144x144.png b/favicons/cp/mstile-144x144.png new file mode 100644 index 000000000..4be151b14 Binary files /dev/null and b/favicons/cp/mstile-144x144.png differ diff --git a/favicons/cp/mstile-150x150.png b/favicons/cp/mstile-150x150.png new file mode 100644 index 000000000..bf7ad5e79 Binary files /dev/null and b/favicons/cp/mstile-150x150.png differ diff --git a/favicons/cp/mstile-310x150.png b/favicons/cp/mstile-310x150.png new file mode 100644 index 000000000..6ac804843 Binary files /dev/null and b/favicons/cp/mstile-310x150.png differ diff --git a/favicons/cp/mstile-310x310.png b/favicons/cp/mstile-310x310.png new file mode 100644 index 000000000..b77814750 Binary files /dev/null and b/favicons/cp/mstile-310x310.png differ diff --git a/favicons/cp/mstile-70x70.png 
b/favicons/cp/mstile-70x70.png new file mode 100644 index 000000000..e612cdc15 Binary files /dev/null and b/favicons/cp/mstile-70x70.png differ diff --git a/favicons/dc/apple-touch-icon-114x114.png b/favicons/dc/apple-touch-icon-114x114.png new file mode 100644 index 000000000..edafbda13 Binary files /dev/null and b/favicons/dc/apple-touch-icon-114x114.png differ diff --git a/favicons/dc/apple-touch-icon-120x120.png b/favicons/dc/apple-touch-icon-120x120.png new file mode 100644 index 000000000..ee145ec5c Binary files /dev/null and b/favicons/dc/apple-touch-icon-120x120.png differ diff --git a/favicons/dc/apple-touch-icon-144x144.png b/favicons/dc/apple-touch-icon-144x144.png new file mode 100644 index 000000000..bf5070144 Binary files /dev/null and b/favicons/dc/apple-touch-icon-144x144.png differ diff --git a/favicons/dc/apple-touch-icon-152x152.png b/favicons/dc/apple-touch-icon-152x152.png new file mode 100644 index 000000000..bd596c816 Binary files /dev/null and b/favicons/dc/apple-touch-icon-152x152.png differ diff --git a/favicons/dc/apple-touch-icon-57x57.png b/favicons/dc/apple-touch-icon-57x57.png new file mode 100644 index 000000000..61c152735 Binary files /dev/null and b/favicons/dc/apple-touch-icon-57x57.png differ diff --git a/favicons/dc/apple-touch-icon-60x60.png b/favicons/dc/apple-touch-icon-60x60.png new file mode 100644 index 000000000..9daad3633 Binary files /dev/null and b/favicons/dc/apple-touch-icon-60x60.png differ diff --git a/favicons/dc/apple-touch-icon-72x72.png b/favicons/dc/apple-touch-icon-72x72.png new file mode 100644 index 000000000..2069520fc Binary files /dev/null and b/favicons/dc/apple-touch-icon-72x72.png differ diff --git a/favicons/dc/apple-touch-icon-76x76.png b/favicons/dc/apple-touch-icon-76x76.png new file mode 100644 index 000000000..3db01ca7d Binary files /dev/null and b/favicons/dc/apple-touch-icon-76x76.png differ diff --git a/favicons/dc/favicon-128.png b/favicons/dc/favicon-128.png new file mode 100644 index 
000000000..9e3de2a49 Binary files /dev/null and b/favicons/dc/favicon-128.png differ diff --git a/favicons/dc/favicon-16x16.png b/favicons/dc/favicon-16x16.png new file mode 100644 index 000000000..4c9f9b8c5 Binary files /dev/null and b/favicons/dc/favicon-16x16.png differ diff --git a/favicons/dc/favicon-196x196.png b/favicons/dc/favicon-196x196.png new file mode 100644 index 000000000..588afc213 Binary files /dev/null and b/favicons/dc/favicon-196x196.png differ diff --git a/favicons/dc/favicon-32x32.png b/favicons/dc/favicon-32x32.png new file mode 100644 index 000000000..9c2ecbfbe Binary files /dev/null and b/favicons/dc/favicon-32x32.png differ diff --git a/favicons/dc/favicon-96x96.png b/favicons/dc/favicon-96x96.png new file mode 100644 index 000000000..ff13fc06e Binary files /dev/null and b/favicons/dc/favicon-96x96.png differ diff --git a/favicons/dc/favicon.ico b/favicons/dc/favicon.ico new file mode 100644 index 000000000..e4715f329 Binary files /dev/null and b/favicons/dc/favicon.ico differ diff --git a/favicons/dc/mstile-144x144.png b/favicons/dc/mstile-144x144.png new file mode 100644 index 000000000..bf5070144 Binary files /dev/null and b/favicons/dc/mstile-144x144.png differ diff --git a/favicons/dc/mstile-150x150.png b/favicons/dc/mstile-150x150.png new file mode 100644 index 000000000..c5844cca3 Binary files /dev/null and b/favicons/dc/mstile-150x150.png differ diff --git a/favicons/dc/mstile-310x150.png b/favicons/dc/mstile-310x150.png new file mode 100644 index 000000000..786813af8 Binary files /dev/null and b/favicons/dc/mstile-310x150.png differ diff --git a/favicons/dc/mstile-310x310.png b/favicons/dc/mstile-310x310.png new file mode 100644 index 000000000..9580653c6 Binary files /dev/null and b/favicons/dc/mstile-310x310.png differ diff --git a/favicons/dc/mstile-70x70.png b/favicons/dc/mstile-70x70.png new file mode 100644 index 000000000..9e3de2a49 Binary files /dev/null and b/favicons/dc/mstile-70x70.png differ diff --git 
a/favicons/lc/apple-touch-icon-114x114.png b/favicons/lc/apple-touch-icon-114x114.png new file mode 100644 index 000000000..6c83127ca Binary files /dev/null and b/favicons/lc/apple-touch-icon-114x114.png differ diff --git a/favicons/lc/apple-touch-icon-120x120.png b/favicons/lc/apple-touch-icon-120x120.png new file mode 100644 index 000000000..8334648f1 Binary files /dev/null and b/favicons/lc/apple-touch-icon-120x120.png differ diff --git a/favicons/lc/apple-touch-icon-144x144.png b/favicons/lc/apple-touch-icon-144x144.png new file mode 100644 index 000000000..5f32151ed Binary files /dev/null and b/favicons/lc/apple-touch-icon-144x144.png differ diff --git a/favicons/lc/apple-touch-icon-152x152.png b/favicons/lc/apple-touch-icon-152x152.png new file mode 100644 index 000000000..4e5c177ce Binary files /dev/null and b/favicons/lc/apple-touch-icon-152x152.png differ diff --git a/favicons/lc/apple-touch-icon-57x57.png b/favicons/lc/apple-touch-icon-57x57.png new file mode 100644 index 000000000..61f9c9c74 Binary files /dev/null and b/favicons/lc/apple-touch-icon-57x57.png differ diff --git a/favicons/lc/apple-touch-icon-60x60.png b/favicons/lc/apple-touch-icon-60x60.png new file mode 100644 index 000000000..ccb5ada1c Binary files /dev/null and b/favicons/lc/apple-touch-icon-60x60.png differ diff --git a/favicons/lc/apple-touch-icon-72x72.png b/favicons/lc/apple-touch-icon-72x72.png new file mode 100644 index 000000000..517d459af Binary files /dev/null and b/favicons/lc/apple-touch-icon-72x72.png differ diff --git a/favicons/lc/apple-touch-icon-76x76.png b/favicons/lc/apple-touch-icon-76x76.png new file mode 100644 index 000000000..17454b311 Binary files /dev/null and b/favicons/lc/apple-touch-icon-76x76.png differ diff --git a/favicons/lc/favicon-128.png b/favicons/lc/favicon-128.png new file mode 100644 index 000000000..9d781c901 Binary files /dev/null and b/favicons/lc/favicon-128.png differ diff --git a/favicons/lc/favicon-16x16.png b/favicons/lc/favicon-16x16.png 
new file mode 100644 index 000000000..3c20abcc0 Binary files /dev/null and b/favicons/lc/favicon-16x16.png differ diff --git a/favicons/lc/favicon-196x196.png b/favicons/lc/favicon-196x196.png new file mode 100644 index 000000000..46baaf8f9 Binary files /dev/null and b/favicons/lc/favicon-196x196.png differ diff --git a/favicons/lc/favicon-32x32.png b/favicons/lc/favicon-32x32.png new file mode 100644 index 000000000..ed6701ea1 Binary files /dev/null and b/favicons/lc/favicon-32x32.png differ diff --git a/favicons/lc/favicon-96x96.png b/favicons/lc/favicon-96x96.png new file mode 100644 index 000000000..bc468c73a Binary files /dev/null and b/favicons/lc/favicon-96x96.png differ diff --git a/favicons/lc/favicon.ico b/favicons/lc/favicon.ico new file mode 100644 index 000000000..5c14e8091 Binary files /dev/null and b/favicons/lc/favicon.ico differ diff --git a/favicons/lc/mstile-144x144.png b/favicons/lc/mstile-144x144.png new file mode 100644 index 000000000..5f32151ed Binary files /dev/null and b/favicons/lc/mstile-144x144.png differ diff --git a/favicons/lc/mstile-150x150.png b/favicons/lc/mstile-150x150.png new file mode 100644 index 000000000..924953a84 Binary files /dev/null and b/favicons/lc/mstile-150x150.png differ diff --git a/favicons/lc/mstile-310x150.png b/favicons/lc/mstile-310x150.png new file mode 100644 index 000000000..e4dcda444 Binary files /dev/null and b/favicons/lc/mstile-310x150.png differ diff --git a/favicons/lc/mstile-310x310.png b/favicons/lc/mstile-310x310.png new file mode 100644 index 000000000..a12c87632 Binary files /dev/null and b/favicons/lc/mstile-310x310.png differ diff --git a/favicons/lc/mstile-70x70.png b/favicons/lc/mstile-70x70.png new file mode 100644 index 000000000..9d781c901 Binary files /dev/null and b/favicons/lc/mstile-70x70.png differ diff --git a/favicons/swc/apple-touch-icon-114x114.png b/favicons/swc/apple-touch-icon-114x114.png new file mode 100644 index 000000000..e5125f8c4 Binary files /dev/null and 
b/favicons/swc/apple-touch-icon-114x114.png differ diff --git a/favicons/swc/apple-touch-icon-120x120.png b/favicons/swc/apple-touch-icon-120x120.png new file mode 100644 index 000000000..0f97a0aec Binary files /dev/null and b/favicons/swc/apple-touch-icon-120x120.png differ diff --git a/favicons/swc/apple-touch-icon-144x144.png b/favicons/swc/apple-touch-icon-144x144.png new file mode 100644 index 000000000..7441446cc Binary files /dev/null and b/favicons/swc/apple-touch-icon-144x144.png differ diff --git a/favicons/swc/apple-touch-icon-152x152.png b/favicons/swc/apple-touch-icon-152x152.png new file mode 100644 index 000000000..45cc338e5 Binary files /dev/null and b/favicons/swc/apple-touch-icon-152x152.png differ diff --git a/favicons/swc/apple-touch-icon-57x57.png b/favicons/swc/apple-touch-icon-57x57.png new file mode 100644 index 000000000..e180a4a32 Binary files /dev/null and b/favicons/swc/apple-touch-icon-57x57.png differ diff --git a/favicons/swc/apple-touch-icon-60x60.png b/favicons/swc/apple-touch-icon-60x60.png new file mode 100644 index 000000000..c96fd6ce7 Binary files /dev/null and b/favicons/swc/apple-touch-icon-60x60.png differ diff --git a/favicons/swc/apple-touch-icon-72x72.png b/favicons/swc/apple-touch-icon-72x72.png new file mode 100644 index 000000000..aae014aa7 Binary files /dev/null and b/favicons/swc/apple-touch-icon-72x72.png differ diff --git a/favicons/swc/apple-touch-icon-76x76.png b/favicons/swc/apple-touch-icon-76x76.png new file mode 100644 index 000000000..2167f94a7 Binary files /dev/null and b/favicons/swc/apple-touch-icon-76x76.png differ diff --git a/favicons/swc/favicon-128.png b/favicons/swc/favicon-128.png new file mode 100644 index 000000000..f61df620c Binary files /dev/null and b/favicons/swc/favicon-128.png differ diff --git a/favicons/swc/favicon-16x16.png b/favicons/swc/favicon-16x16.png new file mode 100644 index 000000000..2d20a4061 Binary files /dev/null and b/favicons/swc/favicon-16x16.png differ diff --git 
a/favicons/swc/favicon-196x196.png b/favicons/swc/favicon-196x196.png new file mode 100644 index 000000000..2a20d3a6f Binary files /dev/null and b/favicons/swc/favicon-196x196.png differ diff --git a/favicons/swc/favicon-32x32.png b/favicons/swc/favicon-32x32.png new file mode 100644 index 000000000..f622b73a1 Binary files /dev/null and b/favicons/swc/favicon-32x32.png differ diff --git a/favicons/swc/favicon-96x96.png b/favicons/swc/favicon-96x96.png new file mode 100644 index 000000000..5e57f66a5 Binary files /dev/null and b/favicons/swc/favicon-96x96.png differ diff --git a/favicons/swc/favicon.ico b/favicons/swc/favicon.ico new file mode 100644 index 000000000..f771790f2 Binary files /dev/null and b/favicons/swc/favicon.ico differ diff --git a/favicons/swc/mstile-144x144.png b/favicons/swc/mstile-144x144.png new file mode 100644 index 000000000..7441446cc Binary files /dev/null and b/favicons/swc/mstile-144x144.png differ diff --git a/favicons/swc/mstile-150x150.png b/favicons/swc/mstile-150x150.png new file mode 100644 index 000000000..d1594bcb8 Binary files /dev/null and b/favicons/swc/mstile-150x150.png differ diff --git a/favicons/swc/mstile-310x150.png b/favicons/swc/mstile-310x150.png new file mode 100644 index 000000000..f7d58b2b9 Binary files /dev/null and b/favicons/swc/mstile-310x150.png differ diff --git a/favicons/swc/mstile-310x310.png b/favicons/swc/mstile-310x310.png new file mode 100644 index 000000000..b632b421c Binary files /dev/null and b/favicons/swc/mstile-310x310.png differ diff --git a/favicons/swc/mstile-70x70.png b/favicons/swc/mstile-70x70.png new file mode 100644 index 000000000..f61df620c Binary files /dev/null and b/favicons/swc/mstile-70x70.png differ diff --git a/fig/0_anaconda_navigator_landing_page.png b/fig/0_anaconda_navigator_landing_page.png new file mode 100644 index 000000000..a5953fa5e Binary files /dev/null and b/fig/0_anaconda_navigator_landing_page.png differ diff --git a/fig/0_jupyterlab_landing_page.png 
b/fig/0_jupyterlab_landing_page.png new file mode 100644 index 000000000..6eb836b7c Binary files /dev/null and b/fig/0_jupyterlab_landing_page.png differ diff --git a/fig/0_jupyterlab_left_side_bar.png b/fig/0_jupyterlab_left_side_bar.png new file mode 100644 index 000000000..2469d4078 Binary files /dev/null and b/fig/0_jupyterlab_left_side_bar.png differ diff --git a/fig/0_jupyterlab_main_work_area.png b/fig/0_jupyterlab_main_work_area.png new file mode 100644 index 000000000..1dc30618d Binary files /dev/null and b/fig/0_jupyterlab_main_work_area.png differ diff --git a/fig/0_jupyterlab_menu_bar.png b/fig/0_jupyterlab_menu_bar.png new file mode 100644 index 000000000..dbbc585ca Binary files /dev/null and b/fig/0_jupyterlab_menu_bar.png differ diff --git a/fig/0_jupyterlab_notebook_screenshot.png b/fig/0_jupyterlab_notebook_screenshot.png new file mode 100644 index 000000000..928982480 Binary files /dev/null and b/fig/0_jupyterlab_notebook_screenshot.png differ diff --git a/fig/0_multipanel_jupyterlab_screenshot.png b/fig/0_multipanel_jupyterlab_screenshot.png new file mode 100644 index 000000000..adbca585b Binary files /dev/null and b/fig/0_multipanel_jupyterlab_screenshot.png differ diff --git a/fig/2_indexing.svg b/fig/2_indexing.svg new file mode 100644 index 000000000..0493e5941 --- /dev/null +++ b/fig/2_indexing.svg @@ -0,0 +1,24 @@ + + + + print(atom_name[0]) + + + h + e + l + i + u + m + + + + 0 + 1 + 2 + 3 + 4 + 5 + + + diff --git a/fig/9_correlations_solution1.svg b/fig/9_correlations_solution1.svg new file mode 100644 index 000000000..7b5c06a0c --- /dev/null +++ b/fig/9_correlations_solution1.svg @@ -0,0 +1,67 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + min + max + 300 + 400 + 500 + 600 + 700 + 800 + 900 + + + + + 40000 + 60000 + 80000 + 100000 + + + + + + diff --git a/fig/9_correlations_solution2.png b/fig/9_correlations_solution2.png new file mode 100644 index 000000000..d54af1a1f Binary files /dev/null 
and b/fig/9_correlations_solution2.png differ diff --git a/fig/9_gdp_australia.svg b/fig/9_gdp_australia.svg new file mode 100644 index 000000000..7f0504e5c --- /dev/null +++ b/fig/9_gdp_australia.svg @@ -0,0 +1,46 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + 1950 + 1960 + 1970 + 1980 + 1990 + 2000 + + + 10000 + 15000 + 20000 + 25000 + 30000 + 35000 + + + + + + + diff --git a/fig/9_gdp_australia_formatted.svg b/fig/9_gdp_australia_formatted.svg new file mode 100644 index 000000000..cb1fe6cc7 --- /dev/null +++ b/fig/9_gdp_australia_formatted.svg @@ -0,0 +1,58 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 1950 + 1960 + 1970 + 1980 + 1990 + 2000 + + + 10000 + 15000 + 20000 + 25000 + 30000 + 35000 + + + diff --git a/fig/9_gdp_australia_nz.svg b/fig/9_gdp_australia_nz.svg new file mode 100644 index 000000000..d8ae1142d --- /dev/null +++ b/fig/9_gdp_australia_nz.svg @@ -0,0 +1,54 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + country + Australia + New Zealand + + 1950 + 1960 + 1970 + 1980 + 1990 + 2000 + + + 10000 + 15000 + 20000 + 25000 + 30000 + 35000 + + GDP per capita + + diff --git a/fig/9_gdp_australia_nz_formatted.svg b/fig/9_gdp_australia_nz_formatted.svg new file mode 100644 index 000000000..d300207c8 --- /dev/null +++ b/fig/9_gdp_australia_nz_formatted.svg @@ -0,0 +1,68 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 1950 + 1960 + 1970 + 1980 + 1990 + 2000 + + Year + + 10000 + 15000 + 20000 + 25000 + 30000 + 35000 + + GDP per capita ($) + + Australia + New Zealand + diff --git a/fig/9_gdp_bar.svg b/fig/9_gdp_bar.svg new file mode 100644 index 000000000..55b907b33 --- /dev/null +++ b/fig/9_gdp_bar.svg @@ -0,0 +1,119 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 1952 + 1957 + 1962 + 1967 + 1972 + 1977 + 1982 + 1987 + 1992 + 1997 + 2002 + 2007 + + 0 + 5000 + 10000 + 15000 + 20000 + 25000 + 30000 + 35000 + + + GDP 
per capita + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + country + Australia + New Zealand + + diff --git a/fig/9_gdp_correlation_data.svg b/fig/9_gdp_correlation_data.svg new file mode 100644 index 000000000..5d1cb7448 --- /dev/null +++ b/fig/9_gdp_correlation_data.svg @@ -0,0 +1,80 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 10000 + 15000 + 20000 + 25000 + 30000 + 35000 + + Australia + + 10000 + 12000 + 14000 + 16000 + 18000 + 20000 + 22000 + 24000 + + New Zealand + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/fig/9_gdp_correlation_plt.svg b/fig/9_gdp_correlation_plt.svg new file mode 100644 index 000000000..23cd95a24 --- /dev/null +++ b/fig/9_gdp_correlation_plt.svg @@ -0,0 +1,80 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 10000 + 15000 + 20000 + 25000 + 30000 + 35000 + + + 10000 + 12000 + 14000 + 16000 + 18000 + 20000 + 22000 + 24000 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/fig/9_minima_maxima_solution.png b/fig/9_minima_maxima_solution.png new file mode 100644 index 000000000..b7146fb36 Binary files /dev/null and b/fig/9_minima_maxima_solution.png differ diff --git a/fig/9_more_correlations_solution.svg b/fig/9_more_correlations_solution.svg new file mode 100644 index 000000000..1337d2e90 --- /dev/null +++ b/fig/9_more_correlations_solution.svg @@ -0,0 +1,187 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 0 + 10000 + 20000 + 30000 + 40000 + 50000 + gdpPercap_2007 + + + 40 + 50 + 60 + 70 + 80 + + lifeExp_2007 + + + + + diff --git a/fig/9_simple_position_time_plot.svg 
b/fig/9_simple_position_time_plot.svg new file mode 100644 index 000000000..394d5f3c8 --- /dev/null +++ b/fig/9_simple_position_time_plot.svg @@ -0,0 +1,52 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + 0.0 + 0.5 + 1.0 + 1.5 + 2.0 + 2.5 + 3.0 + Time (hr) + + + 0 + 50 + 100 + 150 + 200 + 250 + 300 + + Position (km) + + + + + + diff --git a/files/python-novice-gapminder-data.zip b/files/python-novice-gapminder-data.zip new file mode 100644 index 000000000..5988d3c5e Binary files /dev/null and b/files/python-novice-gapminder-data.zip differ diff --git a/images.html b/images.html new file mode 100644 index 000000000..3867548a2 --- /dev/null +++ b/images.html @@ -0,0 +1,702 @@ + + + + + +Plotting and Programming in Python: All Images + + + + + + + + + + + + +
+ Plotting and Programming in Python +
+ +
+
+ + + + + + +
+
+ + +

Running and Quitting

+
+

Figure 1

+ +

+Anaconda Navigator landing page

+
+

Figure 2

+ +

+JupyterLab landing page

+
+

Figure 3

+ +

+JupyterLab Menu Bar

+
+

Figure 4

+ +

+JupyterLab Left Side Bar

+
+

Figure 5

+ +

+JupyterLab Main Work Area

+
+

Figure 6

+ +

+Example Jupyter Notebook

+
+

Figure 7

+ +

+Multi-panel JupyterLab

+

Variables and Assignment

+
+

Figure 1

+ +
A line of Python code, print(atom_name[0]), demonstrates that using the zero index will output just the initial letter, in this case ‘h’ for helium.
+

Data Types and Type Conversion

+

Built-in Functions and Help

+

Morning Coffee

+

Libraries

+

Reading Tabular Data into DataFrames

+

Pandas DataFrames

+

Plotting

+
+

Figure 1

+ +
A line chart showing time (hr) relative to position (km), using the values provided in the code block above. By default, the plotted line is blue against a white background, and the axes have been scaled automatically to fit the range of the input data.

+

Figure 2

+ +
GDP plot for Australia

+

Figure 3

+ +
GDP plot for Australia and New Zealand

+

Figure 4

+ +
GDP barplot for Australia

+

Figure 5

+ +
GDP formatted plot for Australia

+

Figure 6

+ +
GDP formatted plot for Australia and New Zealand

+

Figure 7

+ +
GDP correlation using plt.scatter

+

Figure 8

+ +
GDP correlation using data.T.plot.scatter

+

Figure 9

+ +
Minima Maxima Solution

+

Figure 10

+ +
Correlations Solution 1

+

Figure 11

+ +
Correlations Solution 2

+

Figure 12

+ +
More Correlations Solution

Lunch

+

Lists

+

For Loops

+

Conditionals

+

Looping Over Data Sets

+

Afternoon Coffee

+

Writing Functions

+

Variable Scope

+

Programming Style

+

Wrap-Up

+

Feedback

+
+
+
+
+ + +
+ + +
+ + + + + diff --git a/index.html b/index.html new file mode 100644 index 000000000..ca8694213 --- /dev/null +++ b/index.html @@ -0,0 +1,545 @@ + +Plotting and Programming in Python: Summary and Setup +
+ Plotting and Programming in Python +
+ +
+
+ + + + + +
+

Summary and Setup

+ + +

This lesson is an introduction to programming in Python 3 for people +with little or no previous programming experience. It uses plotting as +its motivating example and is designed to be used in both Data Carpentry and Software Carpentry +workshops. This lesson references JupyterLab but +can be taught using alternative Python 3 interpreters as well (e.g., +repl.it, Anaconda).

+
+
+ +
+
+

Prerequisites

+
+
  1. Learners need to understand what files and directories are, what +a working directory is, and how to start a Python interpreter.

+
  2. Learners must install Python 3 before the class starts.

+
  3. Learners must get the gapminder data before class starts: please +download and unzip the file python-novice-gapminder-data.zip.

+

Please see the setup instructions for more +details.

+
+
+
+ + +

Getting the Data

+

The data we will be using is taken from the gapminder +dataset. To obtain it, download and unzip the file python-novice-gapminder-data.zip. +In order to follow the presented material, you should launch the +JupyterLab server in the root directory (see Starting +JupyterLab).

+

Installing Python Using Anaconda

+

Please refer to the Python +section of the workshop website for installation instructions.

+
+ + +
+
+ + + diff --git a/instructor-notes.html b/instructor-notes.html new file mode 100644 index 000000000..cafbcf484 --- /dev/null +++ b/instructor-notes.html @@ -0,0 +1,630 @@ + + + + + +Plotting and Programming in Python: Instructor Notes + + + + + + + + + + + + +
+ Plotting and Programming in Python +
+ +
+
+ + + + + + +
+
+

Instructor Notes

+ +

General Notes +

+
+
+
It’s all right not to get through the whole lesson.
+
+This lesson is designed for people who have never programmed before, but +any given class may include people with a wide range of prior +experience. We have therefore included enough material to fill a full +day if need be, but expect that many offerings will only get as far as +the introduction to Pandas. +
+
Don’t tell people to Google things.
+
+One of the goals of this lesson is to help novices build a workable +mental model of how programming works. Until they have that model, they +will not know what to search for or how to recognize a helpful answer. +Telling them to Google can also give the impression that we think their +problem is trivial. (That said, if learners have done enough programming +before to be past these issues, having them search for solutions online +can help them solidify their understanding.) It’s also worth quoting Trevor +King’s comment about online search: “If you find anything, other +folks were confused enough to bother with a blog or Stack Overflow post, +so it’s probably not trivial.” +
+
+
+
+
+ + +
+ + +
+ + + + + diff --git a/instructor/01-run-quit.html b/instructor/01-run-quit.html new file mode 100644 index 000000000..8f6871932 --- /dev/null +++ b/instructor/01-run-quit.html @@ -0,0 +1,1252 @@ + +Plotting and Programming in Python: Running and Quitting +
+ Plotting and Programming in Python +
+ +
+
+ + + + + +
+
+

Running and Quitting

+

 Last updated on 2024-10-18

+ + + +

Estimated time: 15 minutes

+ +
+ +
+ + + +
+

Overview

+
+
+
+
+

Questions

+
  • How can I run Python programs?
  • +
+
+
+
+
+
+

Objectives

+
  • Launch the JupyterLab server.
  • +
  • Create a new Python script.
  • +
  • Create a Jupyter notebook.
  • +
  • Shut down the JupyterLab server.
  • +
  • Understand the difference between a Python script and a Jupyter +notebook.
  • +
  • Create Markdown cells in a notebook.
  • +
  • Create and run Python cells in a notebook.
  • +
+
+
+
+
+

To run Python, we are going to use Jupyter Notebooks via JupyterLab for +the remainder of this workshop. Jupyter notebooks are common in data +science and visualization and serve as a convenient common-denominator +experience for running Python code interactively where we can easily +view and share the results of our Python code.

+

There are other ways of editing, managing, and running code. Software +developers often use an integrated development environment (IDE) like PyCharm or Visual Studio Code, or text +editors like Vim or Emacs, to create and edit their Python programs. +After editing and saving your Python programs you can execute those +programs within the IDE itself or directly on the command line. In +contrast, Jupyter notebooks let us execute and view the results of our +Python code immediately within the notebook.

+

JupyterLab has several other handy features:

+
  • You can easily type, edit, and copy and paste blocks of code.
  • +
  • Tab complete allows you to easily access the names of things you are +using and learn more about them.
  • +
  • It allows you to annotate your code with links, different sized +text, bullets, etc. to make it more accessible to you and your +collaborators.
  • +
  • It allows you to display figures next to the code that produces them +to tell a complete story of the analysis.
  • +

Each notebook contains one or more cells that contain code, text, or +images.
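
To make the idea of cells concrete, here is a minimal sketch of how a notebook file is laid out on disk. It assumes the common nbformat 4 JSON layout, in which a notebook is a JSON document with a `cells` list and each cell records its `cell_type`. The notebook contents and the helper name `cell_types` are hypothetical and exist only for illustration.

```python
import json

# A hypothetical, minimal notebook in the nbformat 4 JSON layout (an assumption;
# real notebooks written by JupyterLab carry more metadata than this).
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        # A Markdown cell holds formatted text.
        {"cell_type": "markdown", "metadata": {}, "source": ["# My analysis"]},
        # A code cell holds source code plus any outputs it produced.
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": ["print(1 + 1)"],
        },
    ],
}


def cell_types(nb):
    """Return the type of each cell in the notebook, in order."""
    return [cell["cell_type"] for cell in nb["cells"]]


# Round-trip through JSON text, which is what an .ipynb file contains.
text = json.dumps(notebook, indent=1)
print(cell_types(json.loads(text)))
```

In practice you rarely edit this JSON by hand; JupyterLab reads and writes it for you when you open and save a notebook.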

+

Getting Started with JupyterLab

+

JupyterLab is an application server with a web user interface from Project Jupyter that enables one to work +with documents and activities such as Jupyter notebooks, text editors, +terminals, and even custom components in a flexible, integrated, and +extensible manner. JupyterLab requires a reasonably up-to-date browser +(ideally a current version of Chrome, Safari, or Firefox); Internet +Explorer versions 9 and below are not supported.

+

JupyterLab is included as part of the Anaconda Python distribution. +If you have not already installed the Anaconda Python distribution, see +the setup instructions for installation +instructions.

+

 In this lesson we will run JupyterLab locally on our own machines so +it will not require an internet connection besides the initial +connection to download and install Anaconda and JupyterLab.

+
  • Start the JupyterLab server on your machine
  • +
  • Use a web browser to open a special localhost URL that connects to +your JupyterLab server
  • +
  • The JupyterLab server does the work and the web browser renders the +result
  • +
  • Type code into the browser and see the results after your JupyterLab +server has finished executing your code
  • +
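
The steps above describe a client-server arrangement: the browser connects to a server running on your own machine. As a rough sketch, the following checks whether anything is accepting connections on a localhost port. The port number 8888 is an assumption (it is a common Jupyter default, but the actual URL is printed in the terminal when the server starts), and `server_is_listening` is a hypothetical helper, not part of JupyterLab.

```python
import socket


def server_is_listening(host="localhost", port=8888, timeout=0.5):
    """Return True if something accepts TCP connections at host:port.

    The default port of 8888 is an assumption; use the port shown in the
    URL that your JupyterLab server prints on startup.
    """
    try:
        # Opening and immediately closing a connection is enough to tell
        # whether a server is listening there.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


print(server_is_listening())
```

This only reports that some program is listening on the port; it does not confirm that the program is JupyterLab.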
+
+ +
+
+

JupyterLab? What about Jupyter notebooks?

+
+

JupyterLab is the next +stage in the evolution of the Jupyter Notebook. If you have prior +experience working with Jupyter notebooks, then you will have a good +idea of what to expect from JupyterLab.

+

Experienced users of Jupyter notebooks interested in a more detailed +discussion of the similarities and differences between the JupyterLab +and Jupyter notebook user interfaces can find more information in the JupyterLab +user interface documentation.

+
+
+
+

Starting JupyterLab

+

You can start the JupyterLab server through the command line or +through an application called Anaconda Navigator. Anaconda +Navigator is included as part of the Anaconda Python distribution.

+
+

macOS - Command Line

+

To start the JupyterLab server you will need to access the command +line through the Terminal. There are two ways to open Terminal on +Mac.

+
  1. In your Applications folder, open Utilities and double-click on +Terminal
+
  2. Press Command + spacebar to launch Spotlight. +Type Terminal and then double-click the search result or +hit Enter +
+

After you have launched Terminal, type the command to launch the +JupyterLab server.

+
+

BASH +

+
$ jupyter lab
+
+
+
+

Windows Users - Command Line

+

To start the JupyterLab server you will need to access the Anaconda +Prompt.

+

 Press the Windows Logo Key and search for +Anaconda Prompt, then click the result or press Enter.

+

After you have launched the Anaconda Prompt, type the command:

+
+

BASH +

+
$ jupyter lab
+
+
+
+

Anaconda Navigator

+

 To start a JupyterLab server from Anaconda Navigator you must first +start +Anaconda Navigator (click for detailed instructions on macOS, Windows, +and Linux). You can find Anaconda Navigator via Spotlight on +macOS (Command + spacebar), via the Windows search +function (Windows Logo Key), or by opening a terminal shell and +executing the anaconda-navigator executable from the +command line.

+

After you have launched Anaconda Navigator, click the +Launch button under JupyterLab. You may need to scroll down +to find it.

+

Here is a screenshot of an Anaconda Navigator page similar to the one +that should open on either macOS or Windows.

+

+Anaconda Navigator landing page

+

And here is a screenshot of a JupyterLab landing page that should be +similar to the one that opens in your default web browser after starting +the JupyterLab server on either macOS or Windows.

+

+JupyterLab landing page

+
+

The JupyterLab Interface

+

JupyterLab has many features found in traditional integrated +development environments (IDEs) but is focused on providing flexible +building blocks for interactive, exploratory computing.

+

 The JupyterLab +Interface consists of the Menu Bar, a collapsible Left Side Bar, and +the Main Work Area which contains tabs of documents and activities.

+
+ +

The Menu Bar at the top of JupyterLab has the top-level menus that +expose various actions available in JupyterLab along with their keyboard +shortcuts (where applicable). The following menus are included by +default.

+
  • +File: Actions related to files and directories such +as New, Open, Close, Save, etc. The +File menu also includes the Shut Down action used to +shut down the JupyterLab server.
  • +
  • +Edit: Actions related to editing documents and +other activities such as Undo, Cut, Copy, +Paste, etc.
  • +
  • +View: Actions that alter the appearance of +JupyterLab.
  • +
  • +Run: Actions for running code in different +activities such as notebooks and code consoles (discussed below).
  • +
  • +Kernel: Actions for managing kernels. Kernels in +Jupyter will be explained in more detail below.
  • +
  • +Tabs: A list of the open documents and activities +in the main work area.
  • +
  • +Settings: Common JupyterLab settings can be +configured using this menu. There is also an Advanced Settings +Editor option in the dropdown menu that provides more fine-grained +control of JupyterLab settings and configuration options.
  • +
  • +Help: A list of JupyterLab and kernel help +links.
  • +
+
+ +
+
+

Kernels

+
+

 The JupyterLab docs +define kernels as “separate processes started by the server that runs +your code in different programming languages and environments.” When we +open a Jupyter Notebook, that starts a kernel - a process - that is +going to run the code. In this lesson, we’ll be using the Jupyter +IPython kernel, which lets us run Python 3 code interactively.

+

Using other Jupyter kernels +for other programming languages would let us write and execute code +in other programming languages in the same JupyterLab interface, like R, +Java, Julia, Ruby, JavaScript, Fortran, etc.
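
One way to see which Python interpreter a kernel is running is to ask Python itself from a notebook cell. The sketch below uses only the standard library; `describe_interpreter` is a hypothetical helper name chosen for this example.

```python
import platform
import sys


def describe_interpreter():
    """Summarize the Python interpreter that is executing this code."""
    return {
        # Path to the Python program behind the current kernel or script.
        "executable": sys.executable,
        # Full version string, such as "3.x.y".
        "version": platform.python_version(),
        # True when the major version is 3, as this lesson expects.
        "is_python3": sys.version_info.major == 3,
    }


info = describe_interpreter()
print(info["version"])
print(info["is_python3"])
```

Run in a notebook cell, this describes the kernel's interpreter; run as a script, it describes whichever Python launched the script, which may differ.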

+
+
+
+

A screenshot of the default Menu Bar is provided below.

+

+JupyterLab Menu Bar

+
+
+ +

The left sidebar contains a number of commonly used tabs, such as a +file browser (showing the contents of the directory where the JupyterLab +server was launched), a list of running kernels and terminals, the +command palette, and a list of open tabs in the main work area. A +screenshot of the default Left Side Bar is provided below.

+

+JupyterLab Left Side Bar

+

The left sidebar can be collapsed or expanded by selecting “Show Left +Sidebar” in the View menu or by clicking on the active sidebar tab.

+
+
+

Main Work Area

+

The main work area in JupyterLab enables you to arrange documents +(notebooks, text files, etc.) and other activities (terminals, code +consoles, etc.) into panels of tabs that can be resized or subdivided. A +screenshot of the default Main Work Area is provided below.

+

If you do not see the Launcher tab, click the blue plus sign under +the “File” and “Edit” menus and it will appear.

+

+JupyterLab Main Work Area

+

Drag a tab to the center of a tab panel to move the tab to the panel. +Subdivide a tab panel by dragging a tab to the left, right, top, or +bottom of the panel. The work area has a single current activity. The +tab for the current activity is marked with a colored top border (blue +by default).

+
+

Creating a Python script

+
  • To start writing a new Python program click the Text File icon under +the Other header in the Launcher tab of the Main Work Area. +
    • You can also create a new plain text file by selecting New -> Text File from the File menu in the Menu Bar.
    • +
  • +
  • To convert this plain text file to a Python program, select the +Save File As action from the File menu in the Menu Bar +and give your new text file a name that ends with the .py +extension. +
    • The .py extension lets everyone (including the +operating system) know that this text file is a Python program.
    • +
    • This is convention, not a requirement.
    • +
  • +

Creating a Jupyter Notebook

+

To open a new notebook click the Python 3 icon under the +Notebook header in the Launcher tab in the main work area. You +can also create a new notebook by selecting New -> Notebook +from the File menu in the Menu Bar.

+

Additional notes on Jupyter notebooks.

+
  • Notebook files have the extension .ipynb to distinguish +them from plain-text Python programs.
  • +
  • Notebooks can be exported as Python scripts that can be run from the +command line.
  • +

Below is a screenshot of a Jupyter notebook running inside +JupyterLab. If you are interested in more details, then see the official +notebook documentation.

+

+Example Jupyter Notebook

+
+
+ +
+
+

How It’s Stored

+
+
  • The notebook file is stored in a format called JSON.
  • +
  • Just like a webpage, what’s saved looks different from what you see +in your browser.
  • +
  • But this format allows Jupyter to mix source code, text, and images, +all in one file.
  • +
+
+
+
+
+ +
+
+

Arranging Documents into Panels of Tabs

+
+

In the JupyterLab Main Work Area you can arrange documents into +panels of tabs. Here is an example from the official +documentation.

+

+Multi-panel JupyterLab

+

First, create a text file, Python console, and terminal window and +arrange them into three panels in the main work area. Next, create a +notebook, terminal window, and text file and arrange them into three +panels in the main work area. Finally, create your own combination of +panels and tabs. What combination of panels and tabs do you think will +be most useful for your workflow?

+
+
+
+
+
+ +
+
+

After creating the necessary tabs, you can drag one of the tabs to +the center of a panel to move the tab to the panel; next you can +subdivide a tab panel by dragging a tab to the left, right, top, or +bottom of the panel.

+
+
+
+
+
+
+ +
+
+

Code vs. Text

+
+

Jupyter mixes code and text in different types of blocks, called +cells. We often use the term “code” to mean “the source code of software +written in a language such as Python”. A “code cell” in a Notebook is a +cell that contains software; a “text cell” is one that contains ordinary +prose written for human beings.

+
+
+
+

The Notebook has Command and Edit modes.

+
  • If you press Esc and Return alternately, the +outer border of your code cell will change from gray to blue.
  • +
  • These are the Command (gray) and +Edit (blue) modes of your notebook.
  • +
  • Command mode allows you to edit notebook-level features, and Edit +mode changes the content of cells.
  • +
  • When in Command mode (esc/gray), +
    • The b key will make a new cell below the currently +selected cell.
    • +
    • The a key will make one above.
    • +
    • The x key will delete the current cell.
    • +
    • The z key will undo your last cell operation (which could +be a deletion, creation, etc).
    • +
  • +
  • All actions can be done using the menus, but there are lots of +keyboard shortcuts to speed things up.
  • +
+
+ +
+
+

Command Vs. Edit

+
+

In the Jupyter notebook page are you currently in Command or Edit +mode?
+Switch between the modes. Use the shortcuts to generate a new cell. Use +the shortcuts to delete a cell. Use the shortcuts to undo the last cell +operation you performed.

+
+
+
+
+
+ +
+
+

Command mode has a gray border and Edit mode has a blue border. Use Esc and Return to switch between modes. To create a new cell, make sure you are in Command mode (press Esc if your cell is blue), then type b or a. To delete a cell, make sure you are in Command mode, then type x. To undo the last cell operation, make sure you are in Command mode, then type z.

+
+
+
+
+
+

Use the keyboard and mouse to select and edit cells.

+
  • Pressing the Return key turns the border blue and engages +Edit mode, which allows you to type within the cell.
  • +
  • Because we want to be able to write many lines of code in a single +cell, pressing the Return key when in Edit mode (blue) moves +the cursor to the next line in the cell just like in a text editor.
  • +
  • We need some other way to tell the Notebook we want to run what’s in +the cell.
  • +
  • Pressing Shift+Return together will execute +the contents of the cell.
  • +
  • Notice that the Return and Shift keys on the +right of the keyboard are right next to each other.
  • +
+
+

The Notebook will turn Markdown into pretty-printed +documentation.

+
  • Notebooks can also render Markdown. +
    • A simple plain-text format for writing lists, links, and other +things that might go into a web page.
    • +
    • Equivalently, a subset of HTML that looks like what you’d send in an +old-fashioned email.
    • +
  • +
  • Turn the current cell into a Markdown cell by entering Command mode (Esc/gray) and pressing the m key.
  • +
  • +In [ ]: will disappear to show it is no longer a code +cell and you will be able to write in Markdown.
  • +
  • Turn the current cell into a Code cell by entering Command mode (Esc/gray) and pressing the y key.
  • +
+
+

Markdown does most of what HTML does.

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Showing some markdown syntax and its rendered output.
Markdown codeRendered output
*   Use asterisks
+*   to create
+*   bullet lists.
+

+

+
  • Use asterisks
  • +
  • to create
  • +
  • bullet lists.
  • +
1.   Use numbers
+1.   to create
+1.   numbered lists.
+

+

+
  1. Use numbers
  2. +
  3. to create
  4. +
  5. numbered lists.
  6. +
*  You can use indents
+  *  To create sublists
+  *  of the same type
+*  Or sublists
+  1. Of different
+  1. types
+

+

+
  • You can use indents +
    • To create sublists
    • +
    • of the same type
    • +
  • +
  • Or sublists +
    1. Of different
    2. +
    3. types
    4. +
  • +
# A Level-1 Heading
+

+

+

A Level-1 Heading

+
## A Level-2 Heading (etc.)
+

+

+

A Level-2 Heading (etc.)

+
Line breaks
+don't matter.
+
+But blank lines
+create new paragraphs.
+

+

+

Line breaks don’t matter.

+

But blank lines create new paragraphs.

+
[Links](http://software-carpentry.org)
+are created with `[...](...)`.
+Or use [named links][data-carp].
+
+[data-carp]: http://datacarpentry.org
+

+

+

Links are created with +[...](...). Or use named links.

+
+
+ +
+
+

Creating Lists in Markdown

+
+

Create a nested list in a Markdown cell in a notebook that looks like +this:

+
  1. Get funding.
  2. +
  3. Do work.
  4. +
  • Design experiment.
  • +
  • Collect data.
  • +
  • Analyze.
  • +
  1. Write up.
  2. +
  3. Publish.
  4. +
+
+
+
+
+ +
+
+

This challenge integrates both the numbered list and bullet list. Note that the bullet list is indented 4 spaces so that it is in line with the text of the numbered list items.

+
1.  Get funding.
+2.  Do work.
+    *   Design experiment.
+    *   Collect data.
+    *   Analyze.
+3.  Write up.
+4.  Publish.
+
+
+
+
+
+
+ +
+
+

More Math

+
+

What is displayed when a Python cell in a notebook that contains +several calculations is executed? For example, what happens when this +cell is executed?

+
+

PYTHON +

+
7 * 3
+2 + 1
+
+
+
+
+
+
+ +
+
+

The notebook displays only the result of the last calculation in the cell.

+
+

PYTHON +

+
3
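If we want to see every result rather than only the last one, one option is to wrap each calculation in print. This is a minimal sketch; the variable names first and second are illustrative and not part of the lesson.

```python
# Store each result so that both can be displayed, not just the last one.
first = 7 * 3
second = 2 + 1

print(first)   # shows 21
print(second)  # shows 3
```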
+
+
+
+
+
+
+
+ +
+
+

Change an Existing Cell from Code to Markdown

+
+

What happens if you write some Python in a code cell and then you +switch it to a Markdown cell? For example, put the following in a code +cell:

+
+

PYTHON +

+
x = 6 * 7 + 12
+print(x)
+
+

And then run it with Shift+Return to be sure +that it works as a code cell. Now go back to the cell and use +Esc then m to switch the cell to Markdown and +“run” it with Shift+Return. What happened and how +might this be useful?

+
+
+
+
+
+ +
+
+

The Python code gets treated like Markdown text. The lines appear as +if they are part of one contiguous paragraph. This could be useful to +temporarily turn on and off cells in notebooks that get used for +multiple purposes.

+
+

PYTHON +

+
x = 6 * 7 + 12 print(x)
+
+
+
+
+
+
+
+ +
+
+

Equations

+
+

Standard Markdown (such as we’re using for these notes) won’t render +equations, but the Notebook will. Create a new Markdown cell and enter +the following:

+
$\sum_{i=1}^{N} 2^{-i} \approx 1$
+

(It’s probably easier to copy and paste.) What does it display? What +do you think the underscore, _, circumflex, ^, +and dollar sign, $, do?

+
+
+
+
+
+ +
+
+

The notebook shows the equation as it would be rendered from LaTeX equation syntax. The dollar sign, $, is used to tell Markdown that the text in between is a LaTeX equation. If you’re not familiar with LaTeX, underscore, _, is used for subscripts and circumflex, ^, is used for superscripts. A pair of curly braces, { and }, is used to group text together so that the statement i=1 becomes the subscript and N becomes the superscript. Similarly, -i is in curly braces to make the whole statement the superscript for 2. \sum and \approx are LaTeX commands for the “sum over” and “approximately equal to” symbols.

+
+
+
+
+
+

Closing JupyterLab

+
  • From the Menu Bar select the “File” menu and then choose “Shut Down” at the bottom of the dropdown menu. You will be prompted to confirm that you wish to shut down the JupyterLab server (don’t forget to save your work!). Click “Shut Down” to shut down the JupyterLab server.
  • +
  • To restart the JupyterLab server you will need to re-run the +following command from a shell.
  • +
$ jupyter lab
+
+
+ +
+
+

Closing JupyterLab

+
+

Practice closing and restarting the JupyterLab server.

+
+
+
+
+
+ +
+
+

Key Points

+
+
  • Python scripts are plain text files.
  • +
  • Use the Jupyter Notebook for editing and running Python.
  • +
  • The Notebook has Command and Edit modes.
  • +
  • Use the keyboard and mouse to select and edit cells.
  • +
  • The Notebook will turn Markdown into pretty-printed +documentation.
  • +
  • Markdown does most of what HTML does.
  • +
+
+
+
+
+ + +
+
+ + + diff --git a/instructor/02-variables.html b/instructor/02-variables.html new file mode 100644 index 000000000..4df30a26d --- /dev/null +++ b/instructor/02-variables.html @@ -0,0 +1,1082 @@ + +Plotting and Programming in Python: Variables and Assignment +
+ Plotting and Programming in Python +
+ +
+
+ + + + + +
+
+

Variables and Assignment

+

Last updated on 2024-10-18

+ + + +

Estimated time: 20 minutes

+ +
+ +
+ + + +
+

Overview

+
+
+
+
+

Questions

+
  • How can I store data in programs?
  • +
+
+
+
+
+
+

Objectives

+
  • Write programs that assign scalar values to variables and perform +calculations with those values.
  • +
  • Correctly trace value changes in programs that use scalar +assignment.
  • +
+
+
+
+
+

Use variables to store values.

+
  • Variables are names for values.

  • +
  • +

    Variable names

    +
    • can only contain letters, digits, and underscore +_ (typically used to separate words in long variable +names)
    • +
    • cannot start with a digit
    • +
    • are case sensitive (age, Age and AGE are three +different variables)
    • +
  • +
  • The name should also be meaningful so you or another programmer +know what it is

  • +
  • Variable names that start with underscores like +__alistairs_real_age have a special meaning so we won’t do +that until we understand the convention.

  • +
  • In Python the = symbol assigns the value on the +right to the name on the left.

  • +
  • The variable is created when a value is assigned to it.

  • +
  • +

    Here, Python assigns an age to a variable age and a +name in quotes to a variable first_name.

    +
    +

    PYTHON +

    +
    age = 42
    +first_name = 'Ahmed'
    +
    +
  • +
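The naming rules above can be checked with the string method isidentifier, which reports whether a piece of text follows Python's rules for names. This is only a sketch: the candidate names are made up for illustration, and isidentifier also accepts reserved words such as 'for', which cannot actually be used as variable names.

```python
# Each candidate name below is hypothetical and chosen only for illustration.
print('first_name'.isidentifier())   # letters and an underscore: valid
print('2nd_place'.isidentifier())    # starts with a digit: not valid
print('total-cost'.isidentifier())   # contains a hyphen: not valid
```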

Use print to display values.

+
  • Python has a built-in function called print that prints +things as text.
  • +
  • Call the function (i.e., tell Python to run it) by using its +name.
  • +
  • Provide values to the function (i.e., the things to print) in +parentheses.
  • +
  • To add a string to the printout, wrap the string in single or double +quotes.
  • +
  • The values passed to the function are called +arguments +
  • +
+

PYTHON +

+
print(first_name, 'is', age, 'years old')
+
+
+

OUTPUT +

+
Ahmed is 42 years old
+
+
  • +print automatically puts a single space between items +to separate them.
  • +
  • And wraps around to a new line at the end.
  • +
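The default space and newline can be changed with the sep and end keyword arguments of print. The lesson does not cover these, so treat the following as an optional sketch that reuses the first_name and age variables from above.

```python
first_name = 'Ahmed'
age = 42

# Default behaviour: items separated by spaces, followed by a new line.
print(first_name, 'is', age, 'years old')

# sep replaces the space that print puts between items.
print(first_name, 'is', age, 'years old', sep='-')

# end replaces the new line, so the next print continues on the same line.
print(first_name, end=' ')
print(age)
```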

Variables must be created before they are used.

+
  • If a variable doesn’t exist yet, or if the name has been misspelled, Python reports an error. (Unlike some languages, which “guess” a default value.)
  • +
+

PYTHON +

+
print(last_name)
+
+
+

ERROR +

+
---------------------------------------------------------------------------
+NameError                                 Traceback (most recent call last)
+<ipython-input-1-c1fbb4e96102> in <module>()
+----> 1 print(last_name)
+
+NameError: name 'last_name' is not defined
+
+
  • The last line of an error message is usually the most +informative.
  • +
  • We will look at error messages in detail later.
  • +
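To look at just the final line of such an error without the full traceback, we can catch the error and print its message. The try and except statements are not covered in this episode, so this is an assumption-laden sketch rather than something the lesson relies on.

```python
# last_name has deliberately not been defined anywhere above.
try:
    print(last_name)
except NameError as error:
    # str(error) holds the text shown on the last line of the traceback.
    message = str(error)
    print('Python reported:', message)
```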
+
+ +
+
+

Variables Persist Between Cells

+
+

Be aware that it is the order of execution of cells that is +important in a Jupyter notebook, not the order in which they appear. +Python will remember all the code that was run previously, +including any variables you have defined, irrespective of the order in +the notebook. Therefore if you define variables lower down the notebook +and then (re)run cells further up, those defined further down will still +be present. As an example, create two cells with the following content, +in this order:

+
+

PYTHON +

+
print(myval)
+
+
+

PYTHON +

+
myval = 1
+
+

If you execute this in order, the first cell will give an error. +However, if you run the first cell after the second cell it +will print out 1. To prevent confusion, it can be helpful +to use the Kernel -> Restart & Run All +option which clears the interpreter and runs everything from a clean +slate going top to bottom.

+
+
+
+

Variables can be used in calculations.

+
  • We can use variables in calculations just as if they were values. +
    • Remember, we assigned the value 42 to age +a few lines ago.
    • +
  • +
+

PYTHON +

+
age = age + 3
+print('Age in three years:', age)
+
+
+

OUTPUT +

+
Age in three years: 45
+
+

Use an index to get a single character from a string.

+
  • The characters (individual letters, numbers, and so on) in a string +are ordered. For example, the string 'AB' is not the same +as 'BA'. Because of this ordering, we can treat the string +as a list of characters.
  • +
  • Each position in the string (first, second, etc.) is given a number. +This number is called an index or sometimes a +subscript.
  • +
  • Indices are numbered from 0.
  • +
  • Use the position’s index in square brackets to get the character at +that position.
  • +
A line of Python code, print(atom_name[0]), demonstrates that using the zero index will output just the initial letter, in this case ‘h’ for helium.
+
+

PYTHON +

+
atom_name = 'helium'
+print(atom_name[0])
+
+
+

OUTPUT +

+
h
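Because indices start at 0, the last character sits at index len(atom_name) - 1. The sketch below also uses a negative index, which counts from the end of the string; negative indices only appear later in the exercises, so treat that line as a preview.

```python
atom_name = 'helium'

first = atom_name[0]                     # index 0 is the first character
last = atom_name[len(atom_name) - 1]     # the final index is length minus one
also_last = atom_name[-1]                # -1 counts back from the end

print(first, last, also_last)
```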
+
+

Use a slice to get a substring.

+
  • A part of a string is called a substring. A +substring can be as short as a single character.
  • +
  • An item in a list is called an element. Whenever we treat a string +as if it were a list, the string’s elements are its individual +characters.
  • +
  • A slice is a part of a string (or, more generally, a part of any +list-like thing).
  • +
  • We take a slice with the notation [start:stop], where +start is the integer index of the first element we want and +stop is the integer index of the element just +after the last element we want.
  • +
  • The difference between stop and start is +the slice’s length.
  • +
  • Taking a slice does not change the contents of the original string. +Instead, taking a slice returns a copy of part of the original +string.
  • +
+

PYTHON +

+
atom_name = 'sodium'
+print(atom_name[0:3])
+
+
+

OUTPUT +

+
sod
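Two of the points above can be checked directly: the length of a slice equals stop minus start, and slicing leaves the original string unchanged. The names start, stop, and piece are introduced here only for illustration.

```python
atom_name = 'sodium'
start = 0
stop = 3

piece = atom_name[start:stop]

print(piece)                        # the slice itself
print(len(piece) == stop - start)   # the slice length matches stop - start
print(atom_name)                    # the original string is unchanged
```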
+
+

Use the built-in function len to find the length of a +string.

+
+

PYTHON +

+
print(len('helium'))
+
+
+

OUTPUT +

+
6
+
+
  • Nested functions are evaluated from the inside out, like in +mathematics.
  • +
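Evaluating from the inside out means that print(len('helium')) behaves as if each step were written on its own line. A short sketch, with the variable name length chosen only for illustration:

```python
# Nested form: len runs first, then print displays its result.
print(len('helium'))

# The same work written one step at a time.
length = len('helium')   # inner function call
print(length)            # outer function call
```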

Python is case-sensitive.

+
  • Python thinks that upper- and lower-case letters are different, so +Name and name are different variables.
  • +
  • There are conventions for using upper-case letters at the start of +variable names so we will use lower-case letters for now.
  • +

Use meaningful variable names.

+
  • Python doesn’t care what you call variables as long as they obey the +rules (alphanumeric characters and the underscore).
  • +
+

PYTHON +

+
flabadab = 42
+ewr_422_yY = 'Ahmed'
+print(ewr_422_yY, 'is', flabadab, 'years old')
+
+
  • Use meaningful variable names to help other people understand what +the program does.
  • +
  • The most important “other person” is your future self.
  • +
+
+ +
+
+

Swapping Values

+
+

Fill the table showing the values of the variables in this program +after each statement is executed.

+
+

PYTHON +

+
# Command  # Value of x   # Value of y   # Value of swap #
+x = 1.0    #              #              #               #
+y = 3.0    #              #              #               #
+swap = x   #              #              #               #
+x = y      #              #              #               #
+y = swap   #              #              #               #
+
+
+
+
+
+
+ +
+
+
+

OUTPUT +

+
# Command  # Value of x   # Value of y   # Value of swap #
+x = 1.0    # 1.0          # not defined  # not defined   #
+y = 3.0    # 1.0          # 3.0          # not defined   #
+swap = x   # 1.0          # 3.0          # 1.0           #
+x = y      # 3.0          # 3.0          # 1.0           #
+y = swap   # 3.0          # 1.0          # 1.0           #
+
+

These three lines exchange the values in x and +y using the swap variable for temporary +storage. This is a fairly common programming idiom.
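Python also allows the same exchange in a single statement, without a temporary variable. This alternative is not used in the lesson; it is shown here only as a sketch for comparison with the three-line version.

```python
x = 1.0
y = 3.0

# The right-hand side is evaluated first, then both names are reassigned.
x, y = y, x

print('x is', x, 'and y is', y)
```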

+
+
+
+
+
+
+ +
+
+

Predicting Values

+
+

What is the final value of position in the program +below? (Try to predict the value without running the program, then check +your prediction.)

+
+

PYTHON +

+
initial = 'left'
+position = initial
+initial = 'right'
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
print(position)
+
+
+

OUTPUT +

+
left
+
+

The initial variable is assigned the value 'left'. In the second line, the position variable also receives the string value 'left'. In the third line, the initial variable is given the value 'right', but the position variable retains its string value of 'left'.

+
+
+
+
+
+
+ +
+
+

Challenge

+
+

If you assign a = 123, what happens if you try to get +the second digit of a via a[1]?

+
+
+
+
+
+ +
+
+

Numbers are not strings or sequences and Python will raise an error +if you try to perform an index operation on a number. In the next lesson on types and type +conversion we will learn more about types and how to convert between +different types. If you want the Nth digit of a number you can convert +it into a string using the str built-in function and then +perform an index operation on that string.

+
+

PYTHON +

+
a = 123
+print(a[1])
+
+
+

ERROR +

+
TypeError: 'int' object is not subscriptable
+
+
+

PYTHON +

+
a = str(123)
+print(a[1])
+
+
+

OUTPUT +

+
2
+
+
+
+
+
+
+
+ +
+
+

Choosing a Name

+
+

Which is a better variable name, m, min, or +minutes? Why? Hint: think about which code you would rather +inherit from someone who is leaving the lab:

+
  1. ts = m * 60 + s
  2. +
  3. tot_sec = min * 60 + sec
  4. +
  5. total_seconds = minutes * 60 + seconds
  6. +
+
+
+
+
+ +
+
+

minutes is better because min might mean +something like “minimum” (and actually is an existing built-in function +in Python that we will cover later).

+
+
+
+
+
+
+ +
+
+

Slicing practice

+
+

What does the following program print?

+
+

PYTHON +

+
atom_name = 'carbon'
+print('atom_name[1:3] is:', atom_name[1:3])
+
+
+
+
+
+
+ +
+
+
+

OUTPUT +

+
atom_name[1:3] is: ar
+
+
+
+
+
+
+
+ +
+
+

Slicing concepts

+
+

Given the following string:

+
+

PYTHON +

+
species_name = "Acacia buxifolia"
+
+

What would these expressions return?

+
  1. species_name[2:8]
  2. +
  3. +species_name[11:] (without a value after the +colon)
  4. +
  5. +species_name[:4] (without a value before the +colon)
  6. +
  7. +species_name[:] (just a colon)
  8. +
  9. species_name[11:-3]
  10. +
  11. species_name[-5:-3]
  12. +
  13. What happens when you choose a stop value which is out +of range? (i.e., try species_name[0:20] or +species_name[:103])
  14. +
+
+
+
+
+ +
+
+
  1. +species_name[2:8] returns the substring +'acia b' +
  2. +
  3. +species_name[11:] returns the substring +'folia', from position 11 until the end
  4. +
  5. +species_name[:4] returns the substring +'Acac', from the start up to but not including position +4
  6. +
  7. +species_name[:] returns the entire string +'Acacia buxifolia' +
  8. +
  9. +species_name[11:-3] returns the substring +'fo', from the 11th position to the third last +position
  10. +
  11. +species_name[-5:-3] also returns the substring +'fo', from the fifth last position to the third last
  12. +
  13. If a part of the slice is out of range, the operation does not fail. +species_name[0:20] gives the same result as +species_name[0:], and species_name[:103] gives +the same result as species_name[:] +
  14. +
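The answers above can be confirmed by running the expressions in a single cell. This is simply a checking sketch built from the expressions in the exercise.

```python
species_name = "Acacia buxifolia"

print(species_name[2:8])      # acia b
print(species_name[11:])      # folia
print(species_name[:4])       # Acac
print(species_name[:])        # Acacia buxifolia
print(species_name[11:-3])    # fo
print(species_name[-5:-3])    # fo

# An out-of-range stop value is treated as "up to the end of the string".
print(species_name[0:20] == species_name[0:])
```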
+
+
+
+
+
+ +
+
+

Key Points

+
+
  • Use variables to store values.
  • +
  • Use print to display values.
  • +
  • Variables persist between cells.
  • +
  • Variables must be created before they are used.
  • +
  • Variables can be used in calculations.
  • +
  • Use an index to get a single character from a string.
  • +
  • Use a slice to get a substring.
  • +
  • Use the built-in function len to find the length of a +string.
  • +
  • Python is case-sensitive.
  • +
  • Use meaningful variable names.
  • +
+
+
+
+
+ + +
+
+ + + diff --git a/instructor/03-types-conversion.html b/instructor/03-types-conversion.html new file mode 100644 index 000000000..f0327de72 --- /dev/null +++ b/instructor/03-types-conversion.html @@ -0,0 +1,1160 @@ + +Plotting and Programming in Python: Data Types and Type Conversion +
+ Plotting and Programming in Python +
+ +
+
+ + + + + +
+
+

Data Types and Type Conversion

+

Last updated on 2024-10-18

+ + + +

Estimated time: 20 minutes

+ +
+ +
+ + + +
+

Overview

+
+
+
+
+

Questions

+
  • What kinds of data do programs store?
  • +
  • How can I convert one type to another?
  • +
+
+
+
+
+
+

Objectives

+
  • Explain key differences between integers and floating point +numbers.
  • +
  • Explain key differences between numbers and character strings.
  • +
  • Use built-in functions to convert between integers, floating point +numbers, and strings.
  • +
+
+
+
+
+

Every value has a type.

+
  • Every value in a program has a specific type.
  • +
  • Integer (int): represents positive or negative whole +numbers like 3 or -512.
  • +
  • Floating point number (float): represents real numbers +like 3.14159 or -2.5.
  • +
  • Character string (usually called “string”, str): text. +
    • Written in either single quotes or double quotes (as long as they +match).
    • +
    • The quote marks aren’t printed when the string is displayed.
    • +
  • +

Use the built-in function type to find the type of a +value.

+
  • Use the built-in function type to find out what type a +value has.
  • +
  • Works on variables as well. +
    • But remember: the value has the type — the +variable is just a label.
    • +
  • +
+

PYTHON +

+
print(type(52))
+
+
+

OUTPUT +

+
<class 'int'>
+
+
+

PYTHON +

+
fitness = 'average'
+print(type(fitness))
+
+
+

OUTPUT +

+
<class 'str'>
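Because the type belongs to the value rather than to the variable, the same variable can report a different type after it is reassigned. In this sketch the names label, first_type, and second_type are hypothetical and used only for illustration.

```python
label = 52
first_type = type(label)     # the value 52 is an integer
print(first_type)

label = 'average'
second_type = type(label)    # the same name now labels a string
print(second_type)
```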
+
+

Types control what operations (or methods) can be performed on a +given value.

+
  • A value’s type determines what the program can do to it.
  • +
+

PYTHON +

+
print(5 - 3)
+
+
+

OUTPUT +

+
2
+
+
+

PYTHON +

+
print('hello' - 'h')
+
+
+

ERROR +

+
---------------------------------------------------------------------------
+TypeError                                 Traceback (most recent call last)
+<ipython-input-2-67f5626a1e07> in <module>()
+----> 1 print('hello' - 'h')
+
+TypeError: unsupported operand type(s) for -: 'str' and 'str'
+
+

You can use the “+” and “*” operators on strings.

+
  • “Adding” character strings concatenates them.
  • +
+

PYTHON +

+
full_name = 'Ahmed' + ' ' + 'Walsh'
+print(full_name)
+
+
+

OUTPUT +

+
Ahmed Walsh
+
+
  • Multiplying a character string by an integer N creates a +new string that consists of that character string repeated N +times. +
    • Since multiplication is repeated addition.
    • +
  • +
+

PYTHON +

+
separator = '=' * 10
+print(separator)
+
+
+

OUTPUT +

+
==========
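Concatenation and repetition combine naturally with len, for example to underline a heading with a line of matching width. The title text and variable names below are made up for illustration.

```python
title = 'Results'

# Repeat the dash once for every character in the title.
underline = '-' * len(title)

print(title)
print(underline)
```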
+
+

Strings have a length (but numbers don’t).

+
  • The built-in function len counts the number of +characters in a string.
  • +
+

PYTHON +

+
print(len(full_name))
+
+
+

OUTPUT +

+
11
+
+
  • But numbers don’t have a length (not even zero).
  • +
+

PYTHON +

+
print(len(52))
+
+
+

ERROR +

+
---------------------------------------------------------------------------
+TypeError                                 Traceback (most recent call last)
+<ipython-input-3-f769e8e8097d> in <module>()
+----> 1 print(len(52))
+
+TypeError: object of type 'int' has no len()
+
+

Must convert numbers to strings or vice versa when operating on +them.

+
  • Cannot add numbers and strings.
  • +
+

PYTHON +

+
print(1 + '2')
+
+
+

ERROR +

+
---------------------------------------------------------------------------
+TypeError                                 Traceback (most recent call last)
+<ipython-input-4-fe4f54a023c6> in <module>()
+----> 1 print(1 + '2')
+
+TypeError: unsupported operand type(s) for +: 'int' and 'str'
+
+
  • Not allowed because it’s ambiguous: should 1 + '2' be +3 or '12'?
  • +
  • Some types can be converted to other types by using the type name as +a function.
  • +
+

PYTHON +

+
print(1 + int('2'))
+print(str(1) + '2')
+
+
+

OUTPUT +

+
3
+12
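A common use of these conversions is building a message from a number, or doing arithmetic with a number that arrived as text. The following sketch uses hypothetical variable names and values.

```python
sample_count = 3

# Convert the number to a string before concatenating it with text.
message = 'Samples collected: ' + str(sample_count)
print(message)

# Convert the text '4' to an integer before adding it to a number.
total = sample_count + int('4')
print(total)
```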
+
+

Can mix integers and floats freely in operations.

+
  • Integers and floating-point numbers can be mixed in arithmetic. +
    • Python 3 automatically converts integers to floats as needed.
    • +
  • +
+

PYTHON +

+
print('half is', 1 / 2.0)
+print('three squared is', 3.0 ** 2)
+
+
+

OUTPUT +

+
half is 0.5
+three squared is 9.0
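The type function shows when this automatic conversion happens. In the sketch below (variable names chosen for illustration), adding two integers gives an integer, while mixing in a float or using the / operator gives a float.

```python
whole = 7 + 2       # both integers, so the result is an integer
mixed = 7 + 2.0     # the integer 7 is converted to a float first
quotient = 6 / 2    # the / operator produces a float in Python 3

print(whole, type(whole))
print(mixed, type(mixed))
print(quotient, type(quotient))
```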
+
+

Variables only change value when something is assigned to them.

+
  • If we make one cell in a spreadsheet depend on another, and update +the latter, the former updates automatically.
  • +
  • This does not happen in programming languages.
  • +
+

PYTHON +

+
variable_one = 1
+variable_two = 5 * variable_one
+variable_one = 2
+print('first is', variable_one, 'and second is', variable_two)
+
+
+

OUTPUT +

+
first is 2 and second is 5
+
+
  • The computer reads the value of variable_one when doing +the multiplication, creates a new value, and assigns it to +variable_two.
  • +
  • Afterwards, variable_two holds that new value and no longer depends on variable_one, so its value does not automatically change when variable_one changes.
  • +
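To bring variable_two up to date after variable_one changes, the assignment has to be run again. A minimal sketch continuing the example above:

```python
variable_one = 1
variable_two = 5 * variable_one   # uses the value 1
variable_one = 2                  # variable_two is still 5 here

# Repeating the assignment recomputes the result with the new value.
variable_two = 5 * variable_one
print('first is', variable_one, 'and second is', variable_two)
```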
+
+ +
+
+

Fractions

+
+

What type of value is 3.4? How can you find out?

+
+
+
+
+
+ +
+
+

It is a floating-point number (often abbreviated “float”). It is +possible to find out by using the built-in function +type().

+
+

PYTHON +

+
print(type(3.4))
+
+
+

OUTPUT +

+
<class 'float'>
+
+
+
+
+
+
+
+ +
+
+

Automatic Type Conversion

+
+

What type of value is 3.25 + 4?

+
+
+
+
+
+ +
+
+

It is a float: integers are automatically converted to floats as +necessary.

+
+

PYTHON +

+
result = 3.25 + 4
+print(result, 'is', type(result))
+
+
+

OUTPUT +

+
7.25 is <class 'float'>
+
+
+
+
+
+
+
+ +
+
+

Choose a Type

+
+

What type of value (integer, floating point number, or character string) would you use to represent each of the following? Try to come up with more than one good answer for each problem. For example, in #1, when would counting days with a floating point variable make more sense than using an integer?

+
  1. Number of days since the start of the year.
  2. +
  3. Time elapsed from the start of the year until now in days.
  4. +
  5. Serial number of a piece of lab equipment.
  6. +
  7. A lab specimen’s age
  8. +
  9. Current population of a city.
  10. +
  11. Average population of a city over time.
  12. +
+
+
+
+
+ +
+
+

The answers to the questions are:

+
  1. Integer, since the number of days would lie between 1 and 365 (or 366 in a leap year).
  2. +
  3. Floating point, since fractional days are required
  4. +
  5. Character string if serial number contains letters and numbers, +otherwise integer if the serial number consists only of numerals
  6. +
  7. This will vary! How do you define a specimen’s age? Whole days since collection (integer)? Date and time (string)?
  8. +
  9. Choose floating point to represent population as large aggregates (e.g., millions), or integer to represent population in units of individuals.
  10. +
  11. Floating point number, since an average is likely to have a +fractional part.
  12. +
+
+
+
+
+
+ +
+
+

Division Types

+
+

In Python 3, the // operator performs integer +(whole-number) floor division, the / operator performs +floating-point division, and the % (or modulo) +operator calculates and returns the remainder from integer division:

+
+

PYTHON +

+
print('5 // 3:', 5 // 3)
+print('5 / 3:', 5 / 3)
+print('5 % 3:', 5 % 3)
+
+
+

OUTPUT +

+
5 // 3: 1
+5 / 3: 1.6666666666666667
+5 % 3: 2
+
+

If num_subjects is the number of subjects taking part in +a study, and num_per_survey is the number that can take +part in a single survey, write an expression that calculates the number +of surveys needed to reach everyone once.

+
+
+
+
+
+ +
+
+

We want the minimum number of surveys that reaches everyone once, which is the rounded-up value of num_subjects / num_per_survey. We can get this by subtracting 1 from the number of subjects, performing a floor division with //, and then adding 1. Subtracting 1 first handles the case where num_subjects is evenly divisible by num_per_survey.

+
+

PYTHON +

+
num_subjects = 600
+num_per_survey = 42
+num_surveys = (num_subjects - 1) // num_per_survey + 1
+
+print(num_subjects, 'subjects,', num_per_survey, 'per survey:', num_surveys)
+
+
+

OUTPUT +

+
600 subjects, 42 per survey: 15
+
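As a cross-check, the round-up division above can be compared with two other common formulations. This is only an illustrative sketch: the helper name `surveys_needed` is made up for this example, and `math.ceil` and the negated floor division are alternatives to the lesson's formula, not part of it.

```python
import math

def surveys_needed(num_subjects, num_per_survey):
    # The lesson's formula: subtract 1, floor-divide, then add 1.
    return (num_subjects - 1) // num_per_survey + 1

for num_subjects in (600, 84, 1):
    by_formula = surveys_needed(num_subjects, 42)
    by_ceil = math.ceil(num_subjects / 42)   # round the float quotient up
    by_negation = -(-num_subjects // 42)     # floor division rounds toward minus infinity
    print(num_subjects, by_formula, by_ceil, by_negation)
```

All three columns should agree for each row, including the evenly divisible case of 84 subjects.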
+
+
+
+
+
+
+ +
+
+

Strings to Numbers

+
+

Where reasonable, float() will convert a string to a +floating point number, and int() will convert a floating +point number to an integer:

+
+

PYTHON +

+
print("string to float:", float("3.4"))
+print("float to int:", int(3.4))
+
+
+

OUTPUT +

+
string to float: 3.4
+float to int: 3
+
+

If the conversion doesn’t make sense, however, Python will raise an error.

+
+

PYTHON +

+
print("string to float:", float("Hello world!"))
+
+
+

ERROR +

+
---------------------------------------------------------------------------
+ValueError                                Traceback (most recent call last)
+<ipython-input-5-df3b790bf0a2> in <module>
+----> 1 print("string to float:", float("Hello world!"))
+
+ValueError: could not convert string to float: 'Hello world!'
+
+

Given this information, what do you expect the following program to +do?

+

What does it actually do?

+

Why do you think it does that?

+
+

PYTHON +

+
print("fractional string to int:", int("3.4"))
+
+
+
+
+
+
+ +
+
+

What do you expect this program to do? It would not be so unreasonable to expect the Python 3 int command to convert the string “3.4” to 3.4 and then, with an additional type conversion, to 3. After all, Python 3 performs a lot of other magic - isn’t that part of its charm?

+
+

PYTHON +

+
int("3.4")
+
+
+

OUTPUT +

+
---------------------------------------------------------------------------
+ValueError                                Traceback (most recent call last)
+<ipython-input-2-ec6729dfccdc> in <module>
+----> 1 int("3.4")
+ValueError: invalid literal for int() with base 10: '3.4'
+
+

However, Python 3 throws an error. Why? To be consistent, possibly. If you want Python to perform two consecutive type conversions, you must write both conversions explicitly in code.

+
+

PYTHON +

+
int(float("3.4"))
+
+
+

OUTPUT +

+
3
+
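The two-step conversion can be wrapped in a small helper. The sketch below is an assumption-laden illustration, not part of the lesson: the name `to_whole_number` is hypothetical, and it uses `try`/`except`, which this lesson has not introduced yet.

```python
def to_whole_number(text):
    # Hypothetical helper: try int() first, then fall back to float().
    try:
        return int(text)
    except ValueError:
        # int("3.4") fails, so convert to float first and then truncate.
        return int(float(text))

print(to_whole_number("3"))
print(to_whole_number("3.4"))
print(to_whole_number("-3.9"))  # int() truncates toward zero, it does not round
```

Note that a string such as "Hello world!" would still raise a ValueError, because `float()` cannot convert it either.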
+
+
+
+
+
+
+ +
+
+

Arithmetic with Different Types

+
+

Which of the following will return the floating point number +2.0? Note: there may be more than one right answer.

+
+

PYTHON +

+
first = 1.0
+second = "1"
+third = "1.1"
+
+
  1. first + float(second)
  2. float(second) + float(third)
  3. first + int(third)
  4. first + int(float(third))
  5. int(first) + int(float(third))
  6. 2.0 * second
+
+
+
+
+ +
+
+

Answer: 1 and 4
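One way to check the answer is to evaluate each expression and look at both the value and its type. The variable names `sum_a` to `sum_d` below are invented for this sketch, and the `try`/`except` blocks are only there so the failing expressions do not stop the program.

```python
first = 1.0
second = "1"
third = "1.1"

# first + float(second) and first + int(float(third)) both give the float 2.0.
sum_a = first + float(second)
sum_b = first + int(float(third))
print(sum_a, type(sum_a))
print(sum_b, type(sum_b))

# float(second) + float(third) gives 2.1, and the all-int version gives 2, not 2.0.
sum_c = float(second) + float(third)
sum_d = int(first) + int(float(third))
print(sum_c, type(sum_c))
print(sum_d, type(sum_d))

# The remaining two expressions fail before producing any value.
try:
    first + int(third)
except ValueError as error:
    print("ValueError:", error)

try:
    2.0 * second
except TypeError as error:
    print("TypeError:", error)
```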

+
+
+
+
+
+
+ +
+
+

Complex Numbers

+
+

Python provides complex numbers, which are written as +1.0+2.0j. If val is a complex number, its real +and imaginary parts can be accessed using dot notation as +val.real and val.imag.

+
+

PYTHON +

+
a_complex_number = 6 + 2j
+print(a_complex_number.real)
+print(a_complex_number.imag)
+
+
+

OUTPUT +

+
6.0
+2.0
+
+
  1. Why do you think Python uses j instead of +i for the imaginary part?
  2. +
  3. What do you expect 1 + 2j + 3 to produce?
  4. +
  5. What do you expect 4j to be? What about +4 j or 4 + j?
  6. +
+
+
+
+
+ +
+
+
  1. Standard mathematics treatments typically use i to denote the imaginary unit. However, Python reportedly follows an early convention from electrical engineering (where i commonly denotes current), and changing it now would be technically expensive. Stack Overflow provides additional explanation and discussion.
  2. +
  3. (4+2j)
  4. +
  5. 4j is the imaginary number 4j, while 4 j produces SyntaxError: invalid syntax. In 4 + j, j is treated as a variable, so the result depends on whether j is defined and, if so, on its assigned value.
  6. +
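The answers above can be checked with a short sketch. The value assigned to `j` is arbitrary and chosen only to show that a bare `j` is an ordinary variable name.

```python
total = 1 + 2j + 3
print(total)           # the real parts are added together
print(type(4j))        # a digit directly followed by j is a complex literal

j = 10                 # arbitrary value: a bare j is just a variable
j_sum = 4 + j
print(j_sum)           # ordinary integer addition
print(4 + 1j)          # write 1j to get the imaginary unit itself
```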
+
+
+
+
+
+ +
+
+

Key Points

+
+
  • Every value has a type.
  • +
  • Use the built-in function type to find the type of a +value.
  • +
  • Types control what operations can be done on values.
  • +
  • Strings can be added and multiplied.
  • +
  • Strings have a length (but numbers don’t).
  • +
  • Must convert numbers to strings or vice versa when operating on +them.
  • +
  • Can mix integers and floats freely in operations.
  • +
  • Variables only change value when something is assigned to them.
  • +
+
+
+
+
+ + +
+
+ + + diff --git a/instructor/04-built-in.html b/instructor/04-built-in.html new file mode 100644 index 000000000..74c0dea2c --- /dev/null +++ b/instructor/04-built-in.html @@ -0,0 +1,1064 @@ + +Plotting and Programming in Python: Built-in Functions and Help +
+ Plotting and Programming in Python +
+ +
+
+ + + + + +
+
+

Built-in Functions and Help

+

Last updated on 2024-10-18

+ + + +

Estimated time: 25 minutes

+ +
+ +
+ + + +
+

Overview

+
+
+
+
+

Questions

+
  • How can I use built-in functions?
  • +
  • How can I find out what they do?
  • +
  • What kind of errors can occur in programs?
  • +
+
+
+
+
+
+

Objectives

+
  • Explain the purpose of functions.
  • +
  • Correctly call built-in Python functions.
  • +
  • Correctly nest calls to built-in functions.
  • +
  • Use help to display documentation for built-in functions.
  • +
  • Correctly describe situations in which SyntaxError and NameError +occur.
  • +
+
+
+
+
+

Use comments to add documentation to programs.

+
+

PYTHON +

+
# This sentence isn't executed by Python.
+adjustment = 0.5   # Neither is this - anything after '#' is ignored.
+
+

A function may take zero or more arguments.

+
  • We have seen some functions already — now let’s take a closer +look.
  • +
  • An argument is a value passed into a function.
  • +
  • +len takes exactly one.
  • +
  • +int, str, and float create a +new value from an existing one.
  • +
  • +print takes zero or more.
  • +
  • +print with no arguments prints a blank line. +
    • Must always use parentheses, even if they’re empty, so that Python +knows a function is being called.
    • +
  • +
+

PYTHON +

+
print('before')
+print()
+print('after')
+
+
+

OUTPUT +

+
before
+
+after
+
+

Every function returns something.

+
  • Every function call produces some result.
  • +
  • If the function doesn’t have a useful result to return, it usually +returns the special value None. None is a +Python object that stands in anytime there is no value.
  • +
+

PYTHON +

+
result = print('example')
+print('result of print is', result)
+
+
+

OUTPUT +

+
example
+result of print is None
+
+

Commonly-used built-in functions include max, +min, and round.

+
  • Use max to find the largest value of one or more +values.
  • +
  • Use min to find the smallest.
  • +
  • Both work on character strings as well as numbers. +
    • “Larger” and “smaller” use (0-9, A-Z, a-z) to compare letters.
    • +
  • +
+

PYTHON +

+
print(max(1, 2, 3))
+print(min('a', 'A', '0'))
+
+
+

OUTPUT +

+
3
+0
+
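The (0-9, A-Z, a-z) ordering comes from the numeric code of each character, which the built-in function `ord` reports. The following sketch prints a few of those codes; the variable names are chosen for illustration.

```python
# ord() returns the code point Python uses when comparing single characters.
for character in ('0', '9', 'A', 'Z', 'a', 'z'):
    print(character, ord(character))

smallest = min('a', 'A', '0')
largest = max('a', 'A', '0')
print(smallest, largest)
```

Digits have the lowest codes and lowercase letters the highest, which is why `min` picks '0' and `max` picks 'a' here.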
+

Functions may only work for certain (combinations of) +arguments.

+
  • +max and min must be given at least one +argument. +
    • “Largest of the empty set” is a meaningless question.
    • +
  • +
  • And they must be given things that can meaningfully be +compared.
  • +
+

PYTHON +

+
print(max(1, 'a'))
+
+
+

ERROR +

+
TypeError                                 Traceback (most recent call last)
+<ipython-input-52-3f049acf3762> in <module>
+----> 1 print(max(1, 'a'))
+
+TypeError: '>' not supported between instances of 'str' and 'int'
+
+

Functions may have default values for some arguments.

+
  • +round will round off a floating-point number.
  • +
  • By default, rounds to zero decimal places.
  • +
+

PYTHON +

+
round(3.712)
+
+
+

OUTPUT +

+
4
+
+
  • We can specify the number of decimal places we want.
  • +
+

PYTHON +

+
round(3.712, 1)
+
+
+

OUTPUT +

+
3.7
+
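The second argument also changes the type of the result, and it may be negative. A brief sketch, with variable names invented for the example:

```python
whole = round(3.712)          # ndigits omitted: the result is an int
one_place = round(3.712, 1)   # ndigits=1: the result is a float
hundreds = round(1234.5, -2)  # negative ndigits rounds to the left of the decimal point

print(whole, type(whole))
print(one_place, type(one_place))
print(hundreds, type(hundreds))
```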
+

Functions attached to objects are called methods

+
  • Functions take another form that will be common in the pandas +episodes.
  • +
  • Methods have parentheses like functions, but come after the +variable.
  • +
  • Some methods are used for internal Python operations, and are marked with double underscores.
  • +
+

PYTHON +

+
my_string = 'Hello world!'  # creation of a string object 
+
+print(len(my_string))       # the len function takes a string as an argument and returns the length of the string
+
+print(my_string.swapcase()) # calling the swapcase method on the my_string object
+
+print(my_string.__len__())  # calling the internal __len__ method on the my_string object, used by len(my_string)
+
+
+

OUTPUT +

+
12
+hELLO WORLD!
+12
+
+
  • You might even see them chained together. They operate left to +right.
  • +
+

PYTHON +

+
print(my_string.isupper())          # Not all the letters are uppercase
+print(my_string.upper())            # This capitalizes all the letters
+
+print(my_string.upper().isupper())  # Now all the letters are uppercase
+
+
+

OUTPUT +

+
False
+HELLO WORLD!
+True
+
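Chaining is often used to clean up text in one line. The sketch below uses a made-up label and three string methods (`strip`, `lower`, `replace`); each call works on the result of the one to its left.

```python
raw_label = '  Sample-A  '   # hypothetical input with stray spaces

step_1 = raw_label.strip()          # remove leading and trailing spaces
step_2 = step_1.lower()             # convert to lowercase
step_3 = step_2.replace('-', '_')   # swap the hyphen for an underscore

# The same three steps chained together, evaluated left to right.
cleaned = raw_label.strip().lower().replace('-', '_')
print(step_3)
print(cleaned)
```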
+

Use the built-in function help to get help for a +function.

+
  • Every built-in function has online documentation.
  • +
+

PYTHON +

+
help(round)
+
+
+

OUTPUT +

+
Help on built-in function round in module builtins:
+
+round(number, ndigits=None)
+    Round a number to a given precision in decimal digits.
+
+    The return value is an integer if ndigits is omitted or None.  Otherwise
+    the return value has the same type as the number.  ndigits may be negative.
+
+

The Jupyter Notebook has two ways to get help.

+
  • Option 1: Place the cursor near where the function is invoked in a +cell (i.e., the function name or its parameters), +
    • Hold down Shift, and press Tab.
    • +
    • Do this several times to expand the information returned.
    • +
  • +
  • Option 2: Type the function name in a cell with a question mark +after it. Then run the cell.
  • +

Python reports a syntax error when it can’t understand the source of +a program.

+
  • Won’t even try to run the program if it can’t be parsed.
  • +
+

PYTHON +

+
# Forgot to close the quote marks around the string.
+name = 'Feng
+
+
+

ERROR +

+
  File "<ipython-input-56-f42768451d55>", line 2
+    name = 'Feng
+                ^
+SyntaxError: EOL while scanning string literal
+
+
+

PYTHON +

+
# An extra '=' in the assignment.
+age = = 52
+
+
+

ERROR +

+
  File "<ipython-input-57-ccc3df3cf902>", line 2
+    age = = 52
+          ^
+SyntaxError: invalid syntax
+
+
  • Look more closely at the error message:
  • +
+

PYTHON +

+
print("hello world"
+
+
+

ERROR +

+
  File "<ipython-input-6-d1cc229bf815>", line 1
+    print ("hello world"
+                        ^
+SyntaxError: unexpected EOF while parsing
+
+
  • The message indicates a problem on the first line of the input (“line 1”).
    • In this case the “ipython-input” section of the file name tells us +that we are working with input into IPython, the Python interpreter used +by the Jupyter Notebook.
    • +
  • +
  • The -6- part of the filename indicates that the error +occurred in cell 6 of our Notebook.
  • +
  • Next is the problematic line of code, indicating the problem with a +^ pointer.
  • +

Python reports a runtime error when something goes wrong while a +program is executing.

+
+

PYTHON +

+
age = 53
+remaining = 100 - aege # mis-spelled 'age'
+
+
+

ERROR +

+
NameError                                 Traceback (most recent call last)
+<ipython-input-59-1214fb6c55fc> in <module>
+      1 age = 53
+----> 2 remaining = 100 - aege # mis-spelled 'age'
+
+NameError: name 'aege' is not defined
+
+
  • Fix syntax errors by reading the source and runtime errors by +tracing execution.
  • +
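The difference between the two kinds of error can be seen in a single program. This is a sketch under stated assumptions: it uses the built-in `compile` function to parse a piece of source text without running it, and `try`/`except` to keep going after each error; the flag names are invented for the example.

```python
# compile() only parses the text, so a SyntaxError is reported before anything runs.
syntax_error_seen = False
try:
    compile("age = = 52", "<example>", "exec")
except SyntaxError:
    syntax_error_seen = True
print("syntax error found while parsing:", syntax_error_seen)

# A NameError only appears once the faulty line is actually executed.
name_error_seen = False
age = 53
try:
    remaining = 100 - aege  # mis-spelled 'age', as in the example above
except NameError as error:
    name_error_seen = True
    print("runtime error:", error)
```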
+
+ +
+
+

What Happens When

+
+
  1. Explain in simple terms the order of operations in the following +program: when does the addition happen, when does the subtraction +happen, when is each function called, etc.
  2. +
  3. What is the final value of radiance?
  4. +
+

PYTHON +

+
radiance = 1.0
+radiance = max(2.1, 2.0 + min(radiance, 1.1 * radiance - 0.5))
+
+
+
+
+
+
+ +
+
+
  1. Order of operations:
  2. +
  3. 1.1 * radiance = 1.1
  4. +
  5. 1.1 - 0.5 = 0.6
  6. +
  7. min(radiance, 0.6) = 0.6
  8. +
  9. 2.0 + 0.6 = 2.6
  10. +
  11. max(2.1, 2.6) = 2.6
  12. +
  13. At the end, radiance = 2.6 +
  14. +
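The same order of operations can be spelled out one step per line. The intermediate names below are invented for this sketch, and the final value is rounded only to hide floating-point noise in the last digits.

```python
radiance = 1.0
scaled = 1.1 * radiance            # multiplication happens first
shifted = scaled - 0.5             # then the subtraction
smaller = min(radiance, shifted)   # inner function call
summed = 2.0 + smaller             # addition uses the result of min
radiance = max(2.1, summed)        # outer function call, assigned last
print(round(radiance, 1))
```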
+
+
+
+
+
+ +
+
+

Spot the Difference

+
+
  1. Predict what each of the print statements in the +program below will print.
  2. +
  3. Does max(len(rich), poor) run or produce an error +message? If it runs, does its result make any sense?
  4. +
+

PYTHON +

+
easy_string = "abc"
+print(max(easy_string))
+rich = "gold"
+poor = "tin"
+print(max(rich, poor))
+print(max(len(rich), len(poor)))
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
print(max(easy_string))
+
+
+

OUTPUT +

+
c
+
+
+

PYTHON +

+
print(max(rich, poor))
+
+
+

OUTPUT +

+
tin
+
+
+

PYTHON +

+
print(max(len(rich), len(poor)))
+
+
+

OUTPUT +

+
4
+
+

max(len(rich), poor) throws a TypeError. This call becomes max(4, 'tin'), and as we discussed earlier, a string and an integer cannot meaningfully be compared.

+
+

ERROR +

+
TypeError                                 Traceback (most recent call last)
+<ipython-input-65-bc82ad05177a> in <module>
+----> 1 max(len(rich), poor)
+
+TypeError: '>' not supported between instances of 'str' and 'int'
+
+
+
+
+
+
+
+ +
+
+

Why Not?

+
+

Why is it that max and min do not return +None when they are called with no arguments?

+
+
+
+
+
+ +
+
+

max and min raise TypeErrors in this case because the correct number of arguments was not supplied. If they just returned None, the error would be much harder to trace, as the None would likely be stored in a variable and used later in the program, only to cause a runtime error there.

+
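For cases where "nothing to compare" is expected, `max` and `min` accept a `default` keyword when they are given a single collection. The sketch below uses an empty list, which this lesson has not covered yet, purely as an example of an empty collection.

```python
type_error_seen = False
try:
    max()                      # no arguments at all
except TypeError as error:
    type_error_seen = True
    print("TypeError:", error)

# With one (possibly empty) collection, a fallback value can be requested explicitly.
largest = max([], default=None)
print(largest)
```

Here the caller has to ask for None explicitly, so it cannot slip into a program unnoticed.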
+
+
+
+
+
+ +
+
+

Last Character of a String

+
+

If Python starts counting from zero, and len returns the +number of characters in a string, what index expression will get the +last character in the string name? (Note: we will see a +simpler way to do this in a later episode.)

+
+
+
+
+
+ +
+
+

name[len(name) - 1]
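A quick check of this expression, using an example value for `name`. The negative index in the second line is the shortcut that a later episode introduces; it is included here only for comparison.

```python
name = 'Feng'   # example value; any non-empty string works

last_by_length = name[len(name) - 1]   # indices run from 0 to len(name) - 1
last_by_negative_index = name[-1]      # counts from the end instead
print(last_by_length, last_by_negative_index)
```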

+
+
+
+
+
+
+ +
+
+

Explore the Python docs!

+
+

The official Python +documentation is arguably the most complete source of information +about the language. It is available in different languages and contains +a lot of useful resources. The Built-in +Functions page contains a catalogue of all of these functions, +including the ones that we’ve covered in this lesson. Some of these are +more advanced and unnecessary at the moment, but others are very simple +and useful.

+
+
+
+
+
+ +
+
+

Key Points

+
+
  • Use comments to add documentation to programs.
  • +
  • A function may take zero or more arguments.
  • +
  • Commonly-used built-in functions include max, +min, and round.
  • +
  • Functions may only work for certain (combinations of) +arguments.
  • +
  • Functions may have default values for some arguments.
  • +
  • Use the built-in function help to get help for a +function.
  • +
  • The Jupyter Notebook has two ways to get help.
  • +
  • Every function returns something.
  • +
  • Python reports a syntax error when it can’t understand the source of +a program.
  • +
  • Python reports a runtime error when something goes wrong while a +program is executing.
  • +
  • Fix syntax errors by reading the source code, and runtime errors by +tracing the program’s execution.
  • +
+
+
+
+
+ + +
+
+ + + diff --git a/instructor/05-coffee.html b/instructor/05-coffee.html new file mode 100644 index 000000000..4b4bd2611 --- /dev/null +++ b/instructor/05-coffee.html @@ -0,0 +1,546 @@ + +Plotting and Programming in Python: Morning Coffee +
+ Plotting and Programming in Python +
+ +
+
+ + + + + +
+
+

Morning Coffee

+

Last updated on 2024-10-18 | + + Edit this page

+ + + +

Estimated time: 0 minutes

+ +
+ +
+ + +

Reflection exercise

+

Over coffee, reflect on and discuss the following:

+
  • What are the different kinds of errors Python will report?
  • +
  • Did the code always produce the results you expected? If not, +why?
  • +
  • Is there something we can do to prevent errors when we write +code?
  • +
+
+ + +
+
+ + + diff --git a/instructor/06-libraries.html b/instructor/06-libraries.html new file mode 100644 index 000000000..2a7cbc7c8 --- /dev/null +++ b/instructor/06-libraries.html @@ -0,0 +1,1112 @@ + +Plotting and Programming in Python: Libraries +
+ Plotting and Programming in Python +
+ +
+
+ + + + + +
+
+

Libraries

+

Last updated on 2024-10-18 | + + Edit this page

+ + + +

Estimated time: 20 minutes

+ +
+ +
+ + + +
+

Overview

+
+
+
+
+

Questions

+
  • How can I use software that other people have written?
  • +
  • How can I find out what that software does?
  • +
+
+
+
+
+
+

Objectives

+
  • Explain what software libraries are and why programmers create and +use them.
  • +
  • Write programs that import and use modules from Python’s standard +library.
  • +
  • Find and read documentation for the standard library interactively +(in the interpreter) and online.
  • +
+
+
+
+
+

Most of the power of a programming language is in its +libraries.

+
  • A library is a collection of files (called +modules) that contains functions for use by other programs. +
    • May also contain data values (e.g., numerical constants) and other +things.
    • +
    • Library’s contents are supposed to be related, but there’s no way to +enforce that.
    • +
  • +
  • The Python standard +library is an extensive suite of modules that comes with Python +itself.
  • +
  • Many additional libraries are available from PyPI (the Python Package +Index).
  • +
  • We will see later how to write new libraries.
  • +
+
+ +
+
+

Libraries and modules

+
+

A library is a collection of modules, but the terms are often used +interchangeably, especially since many libraries only consist of a +single module, so don’t worry if you mix them.

+
+
+
+

A program must import a library module before using it.

+
  • Use import to load a library module into a program’s +memory.
  • +
  • Then refer to things from the module as +module_name.thing_name. +
    • Python uses . to mean “part of”.
    • +
  • +
  • Using math, one of the modules in the standard +library:
  • +
+

PYTHON +

+
import math
+
+print('pi is', math.pi)
+print('cos(pi) is', math.cos(math.pi))
+
+
+

OUTPUT +

+
pi is 3.141592653589793
+cos(pi) is -1.0
+
+
  • Have to refer to each item with the module’s name. +
    • +math.cos(pi) won’t work: the reference to +pi doesn’t somehow “inherit” the function’s reference to +math.
    • +
  • +
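The failure described above looks like this in practice. The sketch uses `try`/`except` so the program can continue after the error; the flag name is invented for the example.

```python
import math

name_error_seen = False
try:
    print(math.cos(pi))     # pi on its own has not been defined
except NameError as error:
    name_error_seen = True
    print("NameError:", error)

print(math.cos(math.pi))    # both names carry the module prefix
```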

Use help to learn about the contents of a library +module.

+
  • Works just like help for a function.
  • +
+

PYTHON +

+
help(math)
+
+
+

OUTPUT +

+
Help on module math:
+
+NAME
+    math
+
+MODULE REFERENCE
+    http://docs.python.org/3/library/math
+
+    The following documentation is automatically generated from the Python
+    source files.  It may be incomplete, incorrect or include features that
+    are considered implementation detail and may vary between Python
+    implementations.  When in doubt, consult the module reference at the
+    location listed above.
+
+DESCRIPTION
+    This module is always available.  It provides access to the
+    mathematical functions defined by the C standard.
+
+FUNCTIONS
+    acos(x, /)
+        Return the arc cosine (measured in radians) of x.
+⋮ ⋮ ⋮
+
+

Import specific items from a library module to shorten +programs.

+
  • Use from ... import ... to load only specific items +from a library module.
  • +
  • Then refer to them directly without the library name as a prefix.
  • +
+

PYTHON +

+
from math import cos, pi
+
+print('cos(pi) is', cos(pi))
+
+
+

OUTPUT +

+
cos(pi) is -1.0
+
+

Create an alias for a library module when importing it to shorten +programs.

+
  • Use import ... as ... to give a library a short +alias while importing it.
  • +
  • Then refer to items in the library using that shortened name.
  • +
+

PYTHON +

+
import math as m
+
+print('cos(pi) is', m.cos(m.pi))
+
+
+

OUTPUT +

+
cos(pi) is -1.0
+
+
  • Commonly used for libraries that are frequently used or have long +names. +
    • E.g., the matplotlib plotting library is often aliased +as mpl.
    • +
  • +
  • But can make programs harder to understand, since readers must learn +your program’s aliases.
  • +
+
+ +
+
+

Exploring the Math Module

+
+
  1. What function from the math module can you use to +calculate a square root without using sqrt?
  2. +
  3. Since the library contains this function, why does sqrt +exist?
  4. +
+
+
+
+
+ +
+
+
  1. Using help(math) we see that we’ve got +pow(x,y) in addition to sqrt(x), so we could +use pow(x, 0.5) to find a square root.

  2. +
  3. The sqrt(x) function is arguably more readable than +pow(x, 0.5) when implementing equations. Readability is a +cornerstone of good programming, so it makes sense to provide a special +function for this specific common case.

  4. +

Also, the design of Python’s math library has its origin +in the C standard, which includes both sqrt(x) and +pow(x,y), so a little bit of the history of programming is +showing in Python’s function names.
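The alternatives can be compared directly. The value 16.0 is an arbitrary example, and the `**` operator is included as a third option that needs no import.

```python
import math

value = 16.0                       # arbitrary example input
by_sqrt = math.sqrt(value)         # dedicated square-root function
by_pow = math.pow(value, 0.5)      # general power function with exponent 0.5
by_operator = value ** 0.5         # exponentiation operator
print(by_sqrt, by_pow, by_operator)
```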

+
+
+
+
+
+
+ +
+
+

Locating the Right Module

+
+

You want to select a random character from a string:

+
+

PYTHON +

+
bases = 'ACTTGCTTGAC'
+
+
  1. Which standard +library module could help you?
  2. +
  3. Which function would you select from that module? Are there +alternatives?
  4. +
  5. Try to write a program that uses the function.
  6. +
+
+
+
+
+ +
+
+

The random +module seems like it could help.

+

The string has 11 characters, each having a positional index from 0 +to 10. You could use the random.randrange +or random.randint +functions to get a random integer between 0 and 10, and then select the +bases character at that index:

+
+

PYTHON +

+
from random import randrange
+
+random_index = randrange(len(bases))
+print(bases[random_index])
+
+

or more compactly:

+
+

PYTHON +

+
from random import randrange
+
+print(bases[randrange(len(bases))])
+
+

Perhaps you found the random.sample +function? It allows for slightly less typing but might be a bit harder +to understand just by reading:

+
+

PYTHON +

+
from random import sample
+
+print(sample(bases, 1)[0])
+
+

Note that this function returns a list of values. We will learn about +lists in episode 11.

+

The simplest and shortest solution is the random.choice +function that does exactly what we want:

+
+

PYTHON +

+
from random import choice
+
+print(choice(bases))
+
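Because the result is random, it changes from run to run. If a repeatable result is needed (for example, while debugging), the generator can be seeded first. A minimal sketch, where the seed value 42 is arbitrary:

```python
import random

bases = 'ACTTGCTTGAC'

random.seed(42)                     # arbitrary seed, chosen only for this example
first_pick = random.choice(bases)
random.seed(42)                     # resetting the seed replays the same sequence
second_pick = random.choice(bases)

print(first_pick, second_pick, first_pick == second_pick)
```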
+
+
+
+
+
+
+ +
+
+

Jigsaw Puzzle (Parson’s Problem) Programming Example

+
+

Rearrange the following statements so that a random DNA base and its index in the string are printed. Not all statements may be needed. Feel free to use or add intermediate variables.

+
+

PYTHON +

+
bases="ACTTGCTTGAC"
+import math
+import random
+___ = random.randrange(n_bases)
+___ = len(bases)
+print("random base ", bases[___], "base index", ___)
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
import math 
+import random
+bases = "ACTTGCTTGAC" 
+n_bases = len(bases)
+idx = random.randrange(n_bases)
+print("random base", bases[idx], "base index", idx)
+
+
+
+
+
+
+
+ +
+
+

When Is Help Available?

+
+

When a colleague of yours types help(math), Python +reports an error:

+
+

ERROR +

+
NameError: name 'math' is not defined
+
+

What has your colleague forgotten to do?

+
+
+
+
+
+ +
+
+

Importing the math module (import math)

+
+
+
+
+
+
+ +
+
+

Importing With Aliases

+
+
  1. Fill in the blanks so that the program below prints +90.0.
  2. +
  3. Rewrite the program so that it uses import +without as.
  4. +
  5. Which form do you find easier to read?
  6. +
+

PYTHON +

+
import math as m
+angle = ____.degrees(____.pi / 2)
+print(____)
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
import math as m
+angle = m.degrees(m.pi / 2)
+print(angle)
+
+

can be written as

+
+

PYTHON +

+
import math
+angle = math.degrees(math.pi / 2)
+print(angle)
+
+

Since you just wrote the code and are familiar with it, you might +actually find the first version easier to read. But when trying to read +a huge piece of code written by someone else, or when getting back to +your own huge piece of code after several months, non-abbreviated names +are often easier, except where there are clear abbreviation +conventions.

+
+
+
+
+
+
+ +
+
+

There Are Many Ways To Import Libraries!

+
+

Match the following print statements with the appropriate library +calls.

+

Print commands:

+
  1. print("sin(pi/2) =", sin(pi/2))
  2. print("sin(pi/2) =", m.sin(m.pi/2))
  3. print("sin(pi/2) =", math.sin(math.pi/2))

Library calls:

+
  1. from math import sin, pi
  2. import math
  3. import math as m
  4. from math import *
+
+
+
+
+ +
+
+
  1. Library calls 1 and 4. In order to directly refer to sin and pi without the library name as prefix, you need to use the from ... import ... statement. Whereas library call 1 specifically imports the two names sin and pi, library call 4 imports all public names in the math module.
  2. Library call 3. Here sin and pi are referred to with a shortened library name m instead of math. Library call 3 does exactly that using the import ... as ... syntax: it creates an alias for math in the form of the shortened name m.
  3. Library call 2. Here sin and pi are referred to with the regular library name math, so the regular import ... call suffices.

Note: although library call 4 works, importing all +names from a module using a wildcard import is not recommended as it makes it +unclear which names from the module are used in the code. In general it +is best to make your imports as specific as possible and to only import +what your code uses. In library call 1, the import +statement explicitly tells us that the sin function is +imported from the math module, but library call 4 does not +convey this information.

+
+
+
+
+
+
+ +
+
+

Importing Specific Items

+
+
  1. Fill in the blanks so that the program below prints +90.0.
  2. +
  3. Do you find this version easier to read than preceding ones?
  4. +
  5. Why wouldn’t programmers always use this form of +import?
  6. +
+

PYTHON +

+
____ math import ____, ____
+angle = degrees(pi / 2)
+print(angle)
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
from math import degrees, pi
+angle = degrees(pi / 2)
+print(angle)
+
+

Most likely you find this version easier to read since it’s less +dense. The main reason not to use this form of import is to avoid name +clashes. For instance, you wouldn’t import degrees this way +if you also wanted to use the name degrees for a variable +or function of your own. Or if you were to also import a function named +degrees from another library.
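A name clash of the kind described above can be reproduced in a few lines. In this sketch the reassignment of `degrees` to 45 is a deliberate mistake made for illustration.

```python
import math
from math import degrees, pi

print(degrees(pi / 2))           # the imported function

degrees = 45                     # re-using the name replaces the imported function
try:
    degrees(pi / 2)
except TypeError as error:
    print("TypeError:", error)   # an int cannot be called like a function

print(math.degrees(pi / 2))      # the prefixed name is unaffected
```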

+
+
+
+
+
+
+ +
+
+

Reading Error Messages

+
+
  1. Read the code below and try to identify what the errors are without +running it.
  2. +
  3. Run the code, and read the error message. What type of error is +it?
  4. +
+

PYTHON +

+
from math import log
+log(0)
+
+
+
+
+
+
+ +
+
+
+

OUTPUT +

+
---------------------------------------------------------------------------
+ValueError                                Traceback (most recent call last)
+<ipython-input-1-d72e1d780bab> in <module>
+      1 from math import log
+----> 2 log(0)
+
+ValueError: math domain error
+
+
  1. The logarithm of x is only defined for +x > 0, so 0 is outside the domain of the function.
  2. +
  3. You get an error of type ValueError, indicating that +the function received an inappropriate argument value. The additional +message “math domain error” makes it clearer what the problem is.
  4. +
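A program that cannot rule out such inputs can catch the error and report it. The sketch below is illustrative: it uses `try`/`except`, which this lesson has not introduced yet, and the flag name is invented. The exact wording of the error message may differ between Python versions.

```python
from math import log

print(log(1))        # inside the domain of the function

domain_error_seen = False
try:
    log(0)           # outside the domain: log is only defined for x > 0
except ValueError as error:
    domain_error_seen = True
    print("ValueError:", error)
```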
+
+
+
+
+
+ +
+
+

Key Points

+
+
  • Most of the power of a programming language is in its +libraries.
  • +
  • A program must import a library module in order to use it.
  • +
  • Use help to learn about the contents of a library +module.
  • +
  • Import specific items from a library to shorten programs.
  • +
  • Create an alias for a library when importing it to shorten +programs.
  • +
+
+
+
+
+ + +
+
+ + + diff --git a/instructor/07-reading-tabular.html b/instructor/07-reading-tabular.html new file mode 100644 index 000000000..67cdaee8a --- /dev/null +++ b/instructor/07-reading-tabular.html @@ -0,0 +1,1083 @@ + +Plotting and Programming in Python: Reading Tabular Data into DataFrames +
+ Plotting and Programming in Python +
+ +
+
+ + + + + +
+
+

Reading Tabular Data into DataFrames

+

Last updated on 2024-10-18

+ + + +

Estimated time: 20 minutes

+ +
+ +
+ + + +
+

Overview

+
+
+
+
+

Questions

+
  • How can I read tabular data?
  • +
+
+
+
+
+
+

Objectives

+
  • Import the Pandas library.
  • +
  • Use Pandas to load a simple CSV data set.
  • +
  • Get some basic information about a Pandas DataFrame.
  • +
+
+
+
+
+

Use the Pandas library to do statistics on tabular data.

+
  • +Pandas is a widely-used +Python library for statistics, particularly on tabular data.
  • +
  • Borrows many features from R’s dataframes. +
    • A 2-dimensional table whose columns have names and potentially have +different data types.
    • +
  • +
  • Load Pandas with import pandas as pd. The alias +pd is commonly used to refer to the Pandas library in +code.
  • +
  • Read a Comma Separated Values (CSV) data file with +pd.read_csv. +
    • Argument is the name of the file to be read.
    • +
    • Returns a dataframe that you can assign to a variable.
    • +
  • +
+

PYTHON +

+
import pandas as pd
+
+data_oceania = pd.read_csv('data/gapminder_gdp_oceania.csv')
+print(data_oceania)
+
+
+

OUTPUT +

+
       country  gdpPercap_1952  gdpPercap_1957  gdpPercap_1962  \
+0    Australia     10039.59564     10949.64959     12217.22686
+1  New Zealand     10556.57566     12247.39532     13175.67800
+
+   gdpPercap_1967  gdpPercap_1972  gdpPercap_1977  gdpPercap_1982  \
+0     14526.12465     16788.62948     18334.19751     19477.00928
+1     14463.91893     16046.03728     16233.71770     17632.41040
+
+   gdpPercap_1987  gdpPercap_1992  gdpPercap_1997  gdpPercap_2002  \
+0     21888.88903     23424.76683     26997.93657     30687.75473
+1     19007.19129     18363.32494     21050.41377     23189.80135
+
+   gdpPercap_2007
+0     34435.36744
+1     25185.00911
+
+
  • The columns in a dataframe are the observed variables, and the rows +are the observations.
  • +
  • Pandas uses backslash \ to show wrapped lines when +output is too wide to fit the screen.
  • +
  • Using descriptive dataframe names helps us distinguish between +multiple dataframes so we won’t accidentally overwrite a dataframe or +read from the wrong one.
  • +
+
+ +
+
+

File Not Found

+
+

Our lessons store their data files in a data +sub-directory, which is why the path to the file is +data/gapminder_gdp_oceania.csv. If you forget to include +data/, or if you include it but your copy of the file is +somewhere else, you will get a runtime +error that ends with a line like this:

+
+

ERROR +

+
FileNotFoundError: [Errno 2] No such file or directory: 'data/gapminder_gdp_oceania.csv'
+
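When this error appears, it can help to check where Python is looking. The sketch below uses the standard-library `pathlib` module, which this lesson does not otherwise cover; it prints the current working directory and whether the expected file exists relative to it.

```python
from pathlib import Path

data_file = Path('data/gapminder_gdp_oceania.csv')

print('working directory:', Path.cwd())
print('looking for:', data_file)
print('file found:', data_file.exists())
```

If the last line prints False, either the notebook was started from a different directory or the data file is stored somewhere else.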
+
+
+
+

Use index_col to specify that a column’s values should +be used as row headings.

+
  • Row headings are numbers (0 and 1 in this case).
  • +
  • Really want to index by country.
  • +
  • Pass the name of the column to read_csv as its +index_col parameter to do this.
  • +
  • Naming the dataframe data_oceania_country tells us +which region the data includes (oceania) and how it is +indexed (country).
  • +
+

PYTHON +

+
data_oceania_country = pd.read_csv('data/gapminder_gdp_oceania.csv', index_col='country')
+print(data_oceania_country)
+
+
+

OUTPUT +

+
             gdpPercap_1952  gdpPercap_1957  gdpPercap_1962  gdpPercap_1967  \
+country
+Australia       10039.59564     10949.64959     12217.22686     14526.12465
+New Zealand     10556.57566     12247.39532     13175.67800     14463.91893
+
+             gdpPercap_1972  gdpPercap_1977  gdpPercap_1982  gdpPercap_1987  \
+country
+Australia       16788.62948     18334.19751     19477.00928     21888.88903
+New Zealand     16046.03728     16233.71770     17632.41040     19007.19129
+
+             gdpPercap_1992  gdpPercap_1997  gdpPercap_2002  gdpPercap_2007
+country
+Australia       23424.76683     26997.93657     30687.75473     34435.36744
+New Zealand     18363.32494     21050.41377     23189.80135     25185.00911
+
+

Use the DataFrame.info() method to find out more about +a dataframe.

+
+

PYTHON +

+
data_oceania_country.info()
+
+
+

OUTPUT +

+
<class 'pandas.core.frame.DataFrame'>
+Index: 2 entries, Australia to New Zealand
+Data columns (total 12 columns):
+gdpPercap_1952    2 non-null float64
+gdpPercap_1957    2 non-null float64
+gdpPercap_1962    2 non-null float64
+gdpPercap_1967    2 non-null float64
+gdpPercap_1972    2 non-null float64
+gdpPercap_1977    2 non-null float64
+gdpPercap_1982    2 non-null float64
+gdpPercap_1987    2 non-null float64
+gdpPercap_1992    2 non-null float64
+gdpPercap_1997    2 non-null float64
+gdpPercap_2002    2 non-null float64
+gdpPercap_2007    2 non-null float64
+dtypes: float64(12)
+memory usage: 208.0+ bytes
+
+
  • This is a DataFrame +
  • +
  • Two rows named 'Australia' and +'New Zealand' +
  • +
  • Twelve columns, each of which has two actual 64-bit floating point +values. +
    • We will talk later about null values, which are used to represent +missing observations.
    • +
  • +
  • Uses 208 bytes of memory.
  • +

The DataFrame.columns variable stores information about +the dataframe’s columns.

+
  • Note that this is data, not a method. (It doesn’t have +parentheses.) +
    • Like math.pi.
    • +
    • So do not use () to try to call it.
    • +
  • +
  • Called a member variable, or just member.
  • +
+

PYTHON +

+
print(data_oceania_country.columns)
+
+
+

OUTPUT +

+
Index(['gdpPercap_1952', 'gdpPercap_1957', 'gdpPercap_1962', 'gdpPercap_1967',
+       'gdpPercap_1972', 'gdpPercap_1977', 'gdpPercap_1982', 'gdpPercap_1987',
+       'gdpPercap_1992', 'gdpPercap_1997', 'gdpPercap_2002', 'gdpPercap_2007'],
+      dtype='object')
+
+
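The difference between a member variable and a method can be checked directly. The following sketch uses a small made-up table (the name `toy` and its values are assumptions for illustration, not lesson data):

```python
import pandas as pd

# A tiny stand-in table; the values are invented for illustration.
toy = pd.DataFrame(
    {"gdp_1952": [1.0, 2.0], "gdp_1957": [3.0, 4.0]},
    index=["A", "B"],
)

print(toy.columns)            # member variable: no parentheses
print(callable(toy.columns))  # the Index object is data, not something to call
print(callable(toy.info))     # info is a method, so it is called with ()
```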

Use DataFrame.T to transpose a dataframe.

+
  • Sometimes want to treat columns as rows and vice versa.
  • +
  • Transpose (written .T) doesn’t copy the data, just +changes the program’s view of it.
  • +
  • Like columns, it is a member variable.
  • +
+

PYTHON +

+
print(data_oceania_country.T)
+
+
+

OUTPUT +

+
country           Australia  New Zealand
+gdpPercap_1952  10039.59564  10556.57566
+gdpPercap_1957  10949.64959  12247.39532
+gdpPercap_1962  12217.22686  13175.67800
+gdpPercap_1967  14526.12465  14463.91893
+gdpPercap_1972  16788.62948  16046.03728
+gdpPercap_1977  18334.19751  16233.71770
+gdpPercap_1982  19477.00928  17632.41040
+gdpPercap_1987  21888.88903  19007.19129
+gdpPercap_1992  23424.76683  18363.32494
+gdpPercap_1997  26997.93657  21050.41377
+gdpPercap_2002  30687.75473  23189.80135
+gdpPercap_2007  34435.36744  25185.00911
+
+
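A quick way to see what transposing does is to compare shapes before and after. This is a minimal sketch on an invented table (`toy` is an assumed name):

```python
import pandas as pd

toy = pd.DataFrame(
    {"gdp_1952": [1.0, 2.0], "gdp_1957": [3.0, 4.0], "gdp_1962": [5.0, 6.0]},
    index=["A", "B"],
)
flipped = toy.T

print(toy.shape)            # (2, 3): two rows, three columns
print(flipped.shape)        # (3, 2): rows and columns swapped
print(list(flipped.index))  # the old column names are now row labels
```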

Use DataFrame.describe() to get summary statistics +about data.

+

DataFrame.describe() gets the summary statistics of only +the columns that have numerical data. All other columns are ignored, +unless you use the argument include='all'.

+
+

PYTHON +

+
print(data_oceania_country.describe())
+
+
+

OUTPUT +

+
       gdpPercap_1952  gdpPercap_1957  gdpPercap_1962  gdpPercap_1967  \
+count        2.000000        2.000000        2.000000        2.000000
+mean     10298.085650    11598.522455    12696.452430    14495.021790
+std        365.560078      917.644806      677.727301       43.986086
+min      10039.595640    10949.649590    12217.226860    14463.918930
+25%      10168.840645    11274.086022    12456.839645    14479.470360
+50%      10298.085650    11598.522455    12696.452430    14495.021790
+75%      10427.330655    11922.958888    12936.065215    14510.573220
+max      10556.575660    12247.395320    13175.678000    14526.124650
+
+       gdpPercap_1972  gdpPercap_1977  gdpPercap_1982  gdpPercap_1987  \
+count         2.00000        2.000000        2.000000        2.000000
+mean      16417.33338    17283.957605    18554.709840    20448.040160
+std         525.09198     1485.263517     1304.328377     2037.668013
+min       16046.03728    16233.717700    17632.410400    19007.191290
+25%       16231.68533    16758.837652    18093.560120    19727.615725
+50%       16417.33338    17283.957605    18554.709840    20448.040160
+75%       16602.98143    17809.077557    19015.859560    21168.464595
+max       16788.62948    18334.197510    19477.009280    21888.889030
+
+       gdpPercap_1992  gdpPercap_1997  gdpPercap_2002  gdpPercap_2007
+count        2.000000        2.000000        2.000000        2.000000
+mean     20894.045885    24024.175170    26938.778040    29810.188275
+std       3578.979883     4205.533703     5301.853680     6540.991104
+min      18363.324940    21050.413770    23189.801350    25185.009110
+25%      19628.685413    22537.294470    25064.289695    27497.598692
+50%      20894.045885    24024.175170    26938.778040    29810.188275
+75%      22159.406358    25511.055870    28813.266385    32122.777857
+max      23424.766830    26997.936570    30687.754730    34435.367440
+
+
  • Not particularly useful with just two records, but very helpful when +there are thousands.
  • +
+
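To illustrate the effect of `include='all'`, the sketch below describes a made-up table that mixes a text column with a numeric one (the table and its values are assumptions for illustration):

```python
import pandas as pd

toy = pd.DataFrame(
    {"continent": ["Oceania", "Oceania"], "gdp_2007": [34435.0, 25185.0]},
    index=["Australia", "New Zealand"],
)

numeric_only = toy.describe()             # text column is skipped
everything = toy.describe(include="all")  # text column is summarised too

print(numeric_only.columns.tolist())
print(everything.columns.tolist())
```

With `include="all"`, the result gains rows such as `unique` and `top` that only make sense for non-numeric columns.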
+ +
+
+

Reading Other Data

+
+

Read the data in gapminder_gdp_americas.csv (which +should be in the same directory as +gapminder_gdp_oceania.csv) into a variable called +data_americas and display its summary statistics.

+
+
+
+
+
+ +
+
+

To read in a CSV, we use pd.read_csv and pass the +filename 'data/gapminder_gdp_americas.csv' to it. We also +once again pass the column name 'country' to the parameter +index_col in order to index by country. The summary +statistics can be displayed with the DataFrame.describe() +method.

+
+

PYTHON +

+
data_americas = pd.read_csv('data/gapminder_gdp_americas.csv', index_col='country')
+data_americas.describe()
+
+
+
+
+
+
+
+ +
+
+

Inspecting Data

+
+

After reading the data for the Americas, use +help(data_americas.head) and +help(data_americas.tail) to find out what +DataFrame.head and DataFrame.tail do.

+
  1. What method call will display the first three rows of this +data?
  2. +
  3. What method call will display the last three columns of this data? +(Hint: you may need to change your view of the data.)
  4. +
+
+
+
+
+ +
+
+
  1. We can check out the first five rows of data_americas +by executing data_americas.head() which lets us view the +beginning of the DataFrame. We can specify the number of rows we wish to +see by specifying the parameter n in our call to +data_americas.head(). To view the first three rows, +execute:
  2. +
+

PYTHON +

+
data_americas.head(n=3)
+
+
+

OUTPUT +

+
          continent  gdpPercap_1952  gdpPercap_1957  gdpPercap_1962  \
+country
+Argentina  Americas     5911.315053     6856.856212     7133.166023
+Bolivia    Americas     2677.326347     2127.686326     2180.972546
+Brazil     Americas     2108.944355     2487.365989     3336.585802
+
+          gdpPercap_1967  gdpPercap_1972  gdpPercap_1977  gdpPercap_1982  \
+country
+Argentina     8052.953021     9443.038526    10079.026740     8997.897412
+Bolivia       2586.886053     2980.331339     3548.097832     3156.510452
+Brazil        3429.864357     4985.711467     6660.118654     7030.835878
+
+           gdpPercap_1987  gdpPercap_1992  gdpPercap_1997  gdpPercap_2002  \
+country
+Argentina     9139.671389     9308.418710    10967.281950     8797.640716
+Bolivia       2753.691490     2961.699694     3326.143191     3413.262690
+Brazil        7807.095818     6950.283021     7957.980824     8131.212843
+
+           gdpPercap_2007
+country
+Argentina    12779.379640
+Bolivia       3822.137084
+Brazil        9065.800825
+
+
  1. To check out the last three rows of data_americas, we +would use the command data_americas.tail(n=3), analogous to +head() used above. However, here we want to look at the +last three columns, so we need to change our view and then use +tail(). To do so, we create a new DataFrame in which rows +and columns are switched:
  2. +
+

PYTHON +

+
americas_flipped = data_americas.T
+
+

We can then view the last three columns of data_americas by +viewing the last three rows of americas_flipped:

+
+

PYTHON +

+
americas_flipped.tail(n=3)
+
+
+

OUTPUT +

+
country        Argentina  Bolivia   Brazil   Canada    Chile Colombia  \
+gdpPercap_1997   10967.3  3326.14  7957.98  28954.9  10118.1  6117.36
+gdpPercap_2002   8797.64  3413.26  8131.21    33329  10778.8  5755.26
+gdpPercap_2007   12779.4  3822.14   9065.8  36319.2  13171.6  7006.58
+
+country        Costa Rica     Cuba Dominican Republic  Ecuador    ...     \
+gdpPercap_1997    6677.05  5431.99             3614.1  7429.46    ...
+gdpPercap_2002    7723.45  6340.65            4563.81  5773.04    ...
+gdpPercap_2007    9645.06   8948.1            6025.37  6873.26    ...
+
+country          Mexico Nicaragua   Panama Paraguay     Peru Puerto Rico  \
+gdpPercap_1997   9767.3   2253.02  7113.69   4247.4  5838.35     16999.4
+gdpPercap_2002  10742.4   2474.55  7356.03  3783.67  5909.02     18855.6
+gdpPercap_2007  11977.6   2749.32  9809.19  4172.84  7408.91     19328.7
+
+country        Trinidad and Tobago United States  Uruguay Venezuela
+gdpPercap_1997             8792.57       35767.4  9230.24   10165.5
+gdpPercap_2002             11460.6       39097.1     7727   8605.05
+gdpPercap_2007             18008.5       42951.7  10611.5   11415.8
+
+

This shows the data that we want, but we may prefer to display three +columns instead of three rows, so we can flip it back:

+
+

PYTHON +

+
americas_flipped.tail(n=3).T    
+
+

Note: we could have done the above in a single line +of code by ‘chaining’ the commands:

+
+

PYTHON +

+
data_americas.T.tail(n=3).T
+
+
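As an aside, the same columns can be selected without transposing by using position-based selection with `iloc`, which the next episode covers. The sketch below compares both routes on an invented table (`toy` and its values are assumptions):

```python
import pandas as pd

toy = pd.DataFrame(
    {"a": [1, 2], "b": [3, 4], "c": [5, 6], "d": [7, 8]},
    index=["x", "y"],
)

by_transpose = toy.T.tail(n=3).T  # the chained approach from above
by_position = toy.iloc[:, -3:]    # all rows, last three columns

print(by_position)
print(by_transpose.columns.tolist() == by_position.columns.tolist())
```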
+
+
+
+
+
+ +
+
+

Reading Files in Other Directories

+
+

The data for your current project is stored in a file called +microbes.csv, which is located in a folder called +field_data. You are doing analysis in a notebook called +analysis.ipynb in a sibling folder called +thesis:

+
+

OUTPUT +

+
your_home_directory
++-- field_data/
+|   +-- microbes.csv
++-- thesis/
+    +-- analysis.ipynb
+
+

What value(s) should you pass to read_csv to read +microbes.csv in analysis.ipynb?

+
+
+
+
+
+ +
+
+

We need to specify the path to the file of interest in the call to +pd.read_csv. We first need to ‘jump’ out of the folder +thesis using ‘../’ and then into the folder +field_data using ‘field_data/’. Then we can specify the +filename microbes.csv. The result is as follows:

+
+

PYTHON +

+
data_microbes = pd.read_csv('../field_data/microbes.csv')
+
+
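To see how the relative path is interpreted, it can be resolved by hand against the folder that holds the notebook. In this sketch the absolute location `/home/user/thesis` is a hypothetical stand-in for your own home directory:

```python
import posixpath

notebook_dir = "/home/user/thesis"            # hypothetical location of analysis.ipynb
relative_path = "../field_data/microbes.csv"  # what we pass to read_csv

# Join the two and collapse the '..' step to get the file actually opened.
resolved = posixpath.normpath(posixpath.join(notebook_dir, relative_path))
print(resolved)  # /home/user/field_data/microbes.csv
```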
+
+
+
+
+
+ +
+
+

Writing Data

+
+

As well as the read_csv function for reading data from a +file, Pandas provides a to_csv function to write dataframes +to files. Applying what you’ve learned about reading from files, write +one of your dataframes to a file called processed.csv. You +can use help to get information on how to use +to_csv.

+
+
+
+
+
+ +
+
+

In order to write the DataFrame data_americas to a file +called processed.csv, execute the following command:

+
+

PYTHON +

+
data_americas.to_csv('processed.csv')
+
+

For help on read_csv or to_csv, you could +execute, for example:

+
+

PYTHON +

+
help(data_americas.to_csv)
+help(pd.read_csv)
+
+

Note that help(to_csv) or help(pd.to_csv) +throws an error! This is because to_csv is not +a global Pandas function, but a member function of DataFrames. This +means you can only call it on an instance of a DataFrame, e.g., +data_americas.to_csv or +data_oceania.to_csv.

+
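A round trip is a simple way to check that writing worked: write the dataframe, read it back, and compare. The sketch below uses an invented table and a temporary folder so that it does not leave files behind; the names `toy` and `restored` are assumptions for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd

toy = pd.DataFrame(
    {"gdp_1952": [1.5, 2.5], "gdp_1957": [3.5, 4.5]},
    index=pd.Index(["A", "B"], name="country"),
)

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "processed.csv"
    toy.to_csv(out_path)  # the index is written as the first column
    restored = pd.read_csv(out_path, index_col="country")

print(restored)
```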
+
+
+
+
+
+ +
+
+

Key Points

+
+
  • Use the Pandas library to get basic statistics out of tabular +data.
  • +
  • Use index_col to specify that a column’s values should +be used as row headings.
  • +
  • Use DataFrame.info to find out more about a +dataframe.
  • +
  • The DataFrame.columns variable stores information about +the dataframe’s columns.
  • +
  • Use DataFrame.T to transpose a dataframe.
  • +
  • Use DataFrame.describe to get summary statistics about +data.
  • +
+
+
+
+
+ + +
+
+ + + diff --git a/instructor/08-data-frames.html b/instructor/08-data-frames.html new file mode 100644 index 000000000..0ce0d676a --- /dev/null +++ b/instructor/08-data-frames.html @@ -0,0 +1,1441 @@ + +Plotting and Programming in Python: Pandas DataFrames +
+ Plotting and Programming in Python +
+ +
+
+ + + + + +
+
+

Pandas DataFrames

+

Last updated on 2024-10-18

+ + + +

Estimated time: 30 minutes

+ +
+ +
+ + + +
+

Overview

+
+
+
+
+

Questions

+
  • How can I do statistical analysis of tabular data?
  • +
+
+
+
+
+
+

Objectives

+
  • Select individual values from a Pandas dataframe.
  • +
  • Select entire rows or entire columns from a dataframe.
  • +
  • Select a subset of both rows and columns from a dataframe in a +single operation.
  • +
  • Select a subset of a dataframe by a single Boolean criterion.
  • +
+
+
+
+
+

Note about Pandas DataFrames/Series

+

A DataFrame +is a collection of Series; +the DataFrame is the way Pandas represents a table, and a Series is the +data structure Pandas uses to represent a column.

+

Pandas is built on top of the Numpy library, which in practice means +that most of the methods defined for Numpy Arrays apply to Pandas +Series/DataFrames.

+

What makes Pandas so attractive is the powerful interface to access +individual records of the table, proper handling of missing values, and +relational-database operations between DataFrames.

+

Selecting values

+

To access a value at the position [i,j] of a DataFrame, +we have two options, depending on what i means. +Remember that a DataFrame provides an index as a way to +identify the rows of the table; a row, then, has a position +inside the table as well as a label, which uniquely identifies +its entry in the DataFrame.

+

Use DataFrame.iloc[..., ...] to select values by their +(entry) position

+
  • Can specify location by numerical index analogously to 2D version of +character selection in strings.
  • +
+

PYTHON +

+
import pandas as pd
+data = pd.read_csv('data/gapminder_gdp_europe.csv', index_col='country')
+print(data.iloc[0, 0])
+
+
+

OUTPUT +

+
1601.056136
+
+

Use DataFrame.loc[..., ...] to select values by their +(entry) label.

+
  • Can specify location by row and/or column name.
  • +
+

PYTHON +

+
print(data.loc["Albania", "gdpPercap_1952"])
+
+
+

OUTPUT +

+
1601.056136
+
+
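The two selection styles can be put side by side on a small table. This is a sketch with invented, rounded values (`toy` is an assumed name, not the lesson data):

```python
import pandas as pd

toy = pd.DataFrame(
    {"gdp_1952": [1601.0, 6137.0], "gdp_1957": [1942.0, 8843.0]},
    index=["Albania", "Austria"],
)

by_position = toy.iloc[1, 0]                   # second row, first column
by_label = toy.loc["Austria", "gdp_1952"]      # same cell, named explicitly

print(by_position, by_label)
```

Both expressions pick out the same cell; `loc` keeps working if rows or columns are later reordered, whereas `iloc` always follows position.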

Use : on its own to mean all columns or all rows.

+
  • Just like Python’s usual slicing notation.
  • +
+

PYTHON +

+
print(data.loc["Albania", :])
+
+
+

OUTPUT +

+
gdpPercap_1952    1601.056136
+gdpPercap_1957    1942.284244
+gdpPercap_1962    2312.888958
+gdpPercap_1967    2760.196931
+gdpPercap_1972    3313.422188
+gdpPercap_1977    3533.003910
+gdpPercap_1982    3630.880722
+gdpPercap_1987    3738.932735
+gdpPercap_1992    2497.437901
+gdpPercap_1997    3193.054604
+gdpPercap_2002    4604.211737
+gdpPercap_2007    5937.029526
+Name: Albania, dtype: float64
+
+
  • Would get the same result printing data.loc["Albania"] +(without a second index).
  • +
+

PYTHON +

+
print(data.loc[:, "gdpPercap_1952"])
+
+
+

OUTPUT +

+
country
+Albania                    1601.056136
+Austria                    6137.076492
+Belgium                    8343.105127
+⋮ ⋮ ⋮
+Switzerland               14734.232750
+Turkey                     1969.100980
+United Kingdom             9979.508487
+Name: gdpPercap_1952, dtype: float64
+
+
  • Would get the same result printing +data["gdpPercap_1952"] +
  • +
  • Also get the same result printing data.gdpPercap_1952 +(not recommended, because easily confused with . notation +for methods)
  • +

Select multiple columns or rows using DataFrame.loc and +a named slice.

+
+

PYTHON +

+
print(data.loc['Italy':'Poland', 'gdpPercap_1962':'gdpPercap_1972'])
+
+
+

OUTPUT +

+
             gdpPercap_1962  gdpPercap_1967  gdpPercap_1972
+country
+Italy           8243.582340    10022.401310    12269.273780
+Montenegro      4649.593785     5907.850937     7778.414017
+Netherlands    12790.849560    15363.251360    18794.745670
+Norway         13450.401510    16361.876470    18965.055510
+Poland          5338.752143     6557.152776     8006.506993
+
+

In the above code, we discover that slicing using +loc is inclusive at both ends, which differs from +slicing using iloc, where slicing +indicates everything up to but not including the final index.

+
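The difference in end points is easy to verify on a small table. The following sketch uses an invented 3×3 dataframe (the labels `r0`…`r2` and `c0`…`c2` are assumptions for illustration):

```python
import pandas as pd

toy = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    index=["r0", "r1", "r2"],
    columns=["c0", "c1", "c2"],
)

by_position = toy.iloc[0:2, 0:2]           # positions 0 and 1 only
by_label = toy.loc["r0":"r2", "c0":"c2"]   # labels r0 through r2, inclusive

print(by_position.shape)  # (2, 2)
print(by_label.shape)     # (3, 3)
```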

Result of slicing can be used in further operations.

+
  • Usually don’t just print a slice.
  • +
  • All the statistical operators that work on entire dataframes work +the same way on slices.
  • +
  • E.g., calculate max of a slice.
  • +
+

PYTHON +

+
print(data.loc['Italy':'Poland', 'gdpPercap_1962':'gdpPercap_1972'].max())
+
+
+

OUTPUT +

+
gdpPercap_1962    13450.40151
+gdpPercap_1967    16361.87647
+gdpPercap_1972    18965.05551
+dtype: float64
+
+
+

PYTHON +

+
print(data.loc['Italy':'Poland', 'gdpPercap_1962':'gdpPercap_1972'].min())
+
+
+

OUTPUT +

+
gdpPercap_1962    4649.593785
+gdpPercap_1967    5907.850937
+gdpPercap_1972    7778.414017
+dtype: float64
+
+

Use comparisons to select data based on value.

+
  • Comparison is applied element by element.
  • +
  • Returns a similarly-shaped dataframe of True and +False.
  • +
+

PYTHON +

+
# Use a subset of data to keep output readable.
+subset = data.loc['Italy':'Poland', 'gdpPercap_1962':'gdpPercap_1972']
+print('Subset of data:\n', subset)
+
+# Which values were greater than 10000 ?
+print('\nWhere are values large?\n', subset > 10000)
+
+
+

OUTPUT +

+
Subset of data:
+             gdpPercap_1962  gdpPercap_1967  gdpPercap_1972
+country
+Italy           8243.582340    10022.401310    12269.273780
+Montenegro      4649.593785     5907.850937     7778.414017
+Netherlands    12790.849560    15363.251360    18794.745670
+Norway         13450.401510    16361.876470    18965.055510
+Poland          5338.752143     6557.152776     8006.506993
+
+Where are values large?
+            gdpPercap_1962 gdpPercap_1967 gdpPercap_1972
+country
+Italy                False           True           True
+Montenegro           False          False          False
+Netherlands           True           True           True
+Norway                True           True           True
+Poland               False          False          False
+
+

Select values or NaN using a Boolean mask.

+
  • A frame full of Booleans is sometimes called a mask because +of how it can be used.
  • +
+

PYTHON +

+
mask = subset > 10000
+print(subset[mask])
+
+
+

OUTPUT +

+
             gdpPercap_1962  gdpPercap_1967  gdpPercap_1972
+country
+Italy                   NaN     10022.40131     12269.27378
+Montenegro              NaN             NaN             NaN
+Netherlands     12790.84956     15363.25136     18794.74567
+Norway          13450.40151     16361.87647     18965.05551
+Poland                  NaN             NaN             NaN
+
+
  • Get the value where the mask is true, and NaN (Not a Number) where +it is false.
  • +
  • Useful because NaNs are ignored by operations like max, min, +average, etc.
  • +
+

PYTHON +

+
print(subset[subset > 10000].describe())
+
+
+

OUTPUT +

+
       gdpPercap_1962  gdpPercap_1967  gdpPercap_1972
+count        2.000000        3.000000        3.000000
+mean     13120.625535    13915.843047    16676.358320
+std        466.373656     3408.589070     3817.597015
+min      12790.849560    10022.401310    12269.273780
+25%      12955.737547    12692.826335    15532.009725
+50%      13120.625535    15363.251360    18794.745670
+75%      13285.513523    15862.563915    18879.900590
+max      13450.401510    16361.876470    18965.055510
+
+
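To see how masked-out values drop out of the statistics, consider this sketch on three invented numbers (the name `values` and the threshold reuse are for illustration only):

```python
import pandas as pd

values = pd.DataFrame({"gdp": [8000.0, 12000.0, 14000.0]}, index=["A", "B", "C"])
masked = values[values > 10000]   # 8000.0 becomes NaN

print(masked)
print(masked["gdp"].count())  # 2: NaN is not counted
print(masked["gdp"].mean())   # 13000.0: the mean of 12000 and 14000 only
```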

Group By: split-apply-combine

+
+
+ +
+
+

Learners often struggle here. Many do not work with financial data +and concepts, so they find the example difficult to get their +heads around. The biggest problem, though, is the line generating the +wealth_score; this step needs to be talked through thoroughly:
  • It uses implicit conversion between boolean and float values, which has not been +covered in the course so far.
  • The axis=1 argument needs to be +explained clearly.

+
+
+
+
+

Pandas vectorizing methods and grouping operations are features that +give users a lot of flexibility in analysing their data.

+

For instance, let’s say we want to have a clearer view on how the +European countries split themselves according to their GDP.

+
  1. We may have a glance by splitting the countries into two groups for +each year surveyed: those with a GDP higher than the +European average and those with a lower GDP.
  2. +
  3. We then estimate a wealth score based on the historical +(from 1952 to 2007) values, counting in how many of the surveyed years a country +fell in the group of higher +GDP.
  4. +
+

PYTHON +

+
mask_higher = data > data.mean()
+wealth_score = mask_higher.aggregate('sum', axis=1) / len(data.columns)
+print(wealth_score)
+
+
+

OUTPUT +

+
country
+Albania                   0.000000
+Austria                   1.000000
+Belgium                   1.000000
+Bosnia and Herzegovina    0.000000
+Bulgaria                  0.000000
+Croatia                   0.000000
+Czech Republic            0.500000
+Denmark                   1.000000
+Finland                   1.000000
+France                    1.000000
+Germany                   1.000000
+Greece                    0.333333
+Hungary                   0.000000
+Iceland                   1.000000
+Ireland                   0.333333
+Italy                     0.500000
+Montenegro                0.000000
+Netherlands               1.000000
+Norway                    1.000000
+Poland                    0.000000
+Portugal                  0.000000
+Romania                   0.000000
+Serbia                    0.000000
+Slovak Republic           0.000000
+Slovenia                  0.333333
+Spain                     0.333333
+Sweden                    1.000000
+Switzerland               1.000000
+Turkey                    0.000000
+United Kingdom            1.000000
+dtype: float64
+
+
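The two steps that tend to need unpacking are the sum over Booleans and the `axis=1` argument. The sketch below walks through them on an invented table with three countries and two years (all names and values are assumptions for illustration):

```python
import pandas as pd

toy = pd.DataFrame(
    {"y1": [1.0, 5.0, 9.0], "y2": [8.0, 4.0, 6.0]},
    index=["A", "B", "C"],
)

# Compare every value with the mean of its own column (5.0 and 6.0 here).
above_mean = toy > toy.mean()

# Summing Booleans counts True as 1 and False as 0.
# axis=1 adds across the columns, giving one total per row (country).
times_above = above_mean.sum(axis=1)

# Divide by the number of years to get a fraction between 0 and 1.
score = times_above / len(toy.columns)
print(score)
```

Here `A` and `C` are each above the mean in one of the two years, so they score 0.5, while `B` scores 0.0.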

Finally, for each group in the wealth_score table, we +sum their (financial) contribution across the years surveyed using +chained methods:

+
+

PYTHON +

+
print(data.groupby(wealth_score).sum())
+
+
+

OUTPUT +

+
          gdpPercap_1952  gdpPercap_1957  gdpPercap_1962  gdpPercap_1967  \
+0.000000    36916.854200    46110.918793    56850.065437    71324.848786
+0.333333    16790.046878    20942.456800    25744.935321    33567.667670
+0.500000    11807.544405    14505.000150    18380.449470    21421.846200
+1.000000   104317.277560   127332.008735   149989.154201   178000.350040
+
+          gdpPercap_1972  gdpPercap_1977  gdpPercap_1982  gdpPercap_1987  \
+0.000000    88569.346898   104459.358438   113553.768507   119649.599409
+0.333333    45277.839976    53860.456750    59679.634020    64436.912960
+0.500000    25377.727380    29056.145370    31914.712050    35517.678220
+1.000000   215162.343140   241143.412730   263388.781960   296825.131210
+
+          gdpPercap_1992  gdpPercap_1997  gdpPercap_2002  gdpPercap_2007
+0.000000    92380.047256   103772.937598   118590.929863   149577.357928
+0.333333    67918.093220    80876.051580   102086.795210   122803.729520
+0.500000    36310.666080    40723.538700    45564.308390    51403.028210
+1.000000   315238.235970   346930.926170   385109.939210   427850.333420
+
+
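Grouping by a separate Series works by matching index labels: rows whose label maps to the same value land in the same group. A minimal sketch on invented data (the names `toy` and `group_label` are assumptions):

```python
import pandas as pd

toy = pd.DataFrame({"y1": [1.0, 2.0, 3.0, 4.0]}, index=["A", "B", "C", "D"])

# One group label per row, aligned on the same index as the dataframe.
group_label = pd.Series([0.0, 1.0, 0.0, 1.0], index=["A", "B", "C", "D"])

totals = toy.groupby(group_label).sum()
print(totals)
```

Rows `A` and `C` share the label 0.0 and sum to 4.0; rows `B` and `D` share 1.0 and sum to 6.0.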
+
+ +
+
+

Selection of Individual Values

+
+

Assume Pandas has been imported into your notebook and the Gapminder +GDP data for Europe has been loaded:

+
+

PYTHON +

+
import pandas as pd
+
+data_europe = pd.read_csv('data/gapminder_gdp_europe.csv', index_col='country')
+
+

Write an expression to find the Per Capita GDP of Serbia in 2007.

+
+
+
+
+
+ +
+
+

The selection can be done by using the labels for both the row +(“Serbia”) and the column (“gdpPercap_2007”):

+
+

PYTHON +

+
print(data_europe.loc['Serbia', 'gdpPercap_2007'])
+
+

The output is

+
+

OUTPUT +

+
9786.534714
+
+
+
+
+
+
+
+ +
+
+

Extent of Slicing

+
+
  1. Do the two statements below produce the same output?
  2. +
  3. Based on this, what rule governs what is included (or not) in +numerical slices and named slices in Pandas?
  4. +
+

PYTHON +

+
print(data_europe.iloc[0:2, 0:2])
+print(data_europe.loc['Albania':'Belgium', 'gdpPercap_1952':'gdpPercap_1962'])
+
+
+
+
+
+
+ +
+
+

No, they do not produce the same output! The output of the first +statement is:

+
+

OUTPUT +

+
        gdpPercap_1952  gdpPercap_1957
+country
+Albania     1601.056136     1942.284244
+Austria     6137.076492     8842.598030
+
+

The second statement gives:

+
+

OUTPUT +

+
        gdpPercap_1952  gdpPercap_1957  gdpPercap_1962
+country
+Albania     1601.056136     1942.284244     2312.888958
+Austria     6137.076492     8842.598030    10750.721110
+Belgium     8343.105127     9714.960623    10991.206760
+
+

Clearly, the second statement produces an additional column and an +additional row compared to the first statement.
+What conclusion can we draw? We see that a numerical slice, 0:2, +omits the final index (i.e. index 2) in the range provided, +while a named slice, ‘gdpPercap_1952’:‘gdpPercap_1962’, +includes the final element.

+
+
+
+
+
+
+ +
+
+

Reconstructing Data

+
+

Explain what each line in the following short program does: what is +in first, second, etc.?

+
+

PYTHON +

+
first = pd.read_csv('data/gapminder_all.csv', index_col='country')
+second = first[first['continent'] == 'Americas']
+third = second.drop('Puerto Rico')
+fourth = third.drop('continent', axis = 1)
+fourth.to_csv('result.csv')
+
+
+
+
+
+
+ +
+
+

Let’s go through this piece of code line by line.

+
+

PYTHON +

+
first = pd.read_csv('data/gapminder_all.csv', index_col='country')
+
+

This line loads the dataset containing the GDP data from all +countries into a dataframe called first. The +index_col='country' parameter selects which column to use +as the row labels in the dataframe.

+
+

PYTHON +

+
second = first[first['continent'] == 'Americas']
+
+

This line makes a selection: only those rows of first +for which the ‘continent’ column matches ‘Americas’ are extracted. +Notice how the Boolean expression inside the brackets, +first['continent'] == 'Americas', is used to select only +those rows where the expression is true. Try printing this expression! +Can you also print its individual True/False elements? (Hint: first +assign the expression to a variable.)

+
+

PYTHON +

+
third = second.drop('Puerto Rico')
+
+

As the syntax suggests, this line drops the row from +second where the label is ‘Puerto Rico’. The resulting +dataframe third has one row less than the original +dataframe second.

+
+

PYTHON +

+
fourth = third.drop('continent', axis = 1)
+
+

Again we apply the drop function, but in this case we are dropping +not a row but a whole column. To accomplish this, we also need to specify +the axis parameter: axis = 1 tells drop to look for the label among +the columns (axis 1) rather than among the rows (axis 0, the default).

+
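The role of `axis` can be checked on a small table. This sketch uses invented data; the `columns=` keyword shown at the end is an alternative spelling that Pandas also accepts:

```python
import pandas as pd

toy = pd.DataFrame(
    {"continent": ["Americas", "Americas"], "gdp": [1.0, 2.0]},
    index=["Chile", "Peru"],
)

without_row = toy.drop("Peru")               # axis=0 is the default: a row label
without_col = toy.drop("continent", axis=1)  # axis=1: a column label
same_thing = toy.drop(columns="continent")   # equivalent, more explicit spelling

print(without_row)
print(without_col)
```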
+

PYTHON +

+
fourth.to_csv('result.csv')
+
+

The final step is to write the data that we have been working on to a +csv file. Pandas makes this easy with the to_csv() +function. The only required argument to the function is the filename. +Note that the file will be written in the directory from which you +started the Jupyter or Python session.

+
+
+
+
+
+
+ +
+
+

Selecting Indices

+
+

Explain in simple terms what idxmin and +idxmax do in the short program below. When would you use +these methods?

+
+

PYTHON +

+
data = pd.read_csv('data/gapminder_gdp_europe.csv', index_col='country')
+print(data.idxmin())
+print(data.idxmax())
+
+
+
+
+
+
+ +
+
+

For each column in data, idxmin will return +the index value corresponding to each column’s minimum; +idxmax will do the same for each column’s +maximum value.

+

You can use these functions whenever you want to get the row index of +the minimum/maximum value and not the actual minimum/maximum value.

+
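A small invented table makes the behaviour concrete (the names and numbers below are assumptions for illustration):

```python
import pandas as pd

toy = pd.DataFrame(
    {"y1": [3.0, 1.0, 2.0], "y2": [5.0, 6.0, 4.0]},
    index=["A", "B", "C"],
)

lowest = toy.idxmin()   # row label of the smallest value in each column
highest = toy.idxmax()  # row label of the largest value in each column

print(lowest)   # y1 -> B, y2 -> C
print(highest)  # y1 -> A, y2 -> B
```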
+
+
+
+
+
+ +
+
+

Practice with Selection

+
+

Assume Pandas has been imported and the Gapminder GDP data for Europe +has been loaded. Write an expression to select each of the +following:

+
  1. GDP per capita for all countries in 1982.
  2. +
  3. GDP per capita for Denmark for all years.
  4. +
  5. GDP per capita for all countries for years after 1985.
  6. +
  7. GDP per capita for each country in 2007 as a multiple of GDP per +capita for that country in 1952.
  8. +
+
+
+
+
+ +
+
+

1:

+
+

PYTHON +

+
data['gdpPercap_1982']
+
+

2:

+
+

PYTHON +

+
data.loc['Denmark',:]
+
+

3:

+
+

PYTHON +

+
data.loc[:,'gdpPercap_1985':]
+
+

No column named gdpPercap_1985 actually exists, yet Pandas does not give you an error. +Because the column labels are in sorted order, the slice starts at the position where +gdpPercap_1985 would fall, so the result begins at gdpPercap_1987. This is useful if new +columns are added to the CSV file later.

+
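The behaviour depends on the labels being in sorted order. The sketch below, on an invented one-row table, shows the slice working with sorted columns and failing with a `KeyError` once the columns are shuffled:

```python
import pandas as pd

toy = pd.DataFrame(
    [[1, 2, 3]],
    index=["Denmark"],
    columns=["gdpPercap_1982", "gdpPercap_1987", "gdpPercap_1992"],
)

# Sorted labels: the slice starts where the missing label would be inserted.
later = toy.loc[:, "gdpPercap_1985":]
print(later.columns.tolist())  # ['gdpPercap_1987', 'gdpPercap_1992']

# Unsorted labels: Pandas cannot place the missing label, so it raises.
shuffled = toy[["gdpPercap_1992", "gdpPercap_1982", "gdpPercap_1987"]]
try:
    shuffled.loc[:, "gdpPercap_1985":]
    unsorted_ok = True
except KeyError:
    unsorted_ok = False
print(unsorted_ok)
```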

4:

+
+

PYTHON +

+
data['gdpPercap_2007']/data['gdpPercap_1952']
+
+
+
+
+
+
+
+ +
+
+

Many Ways of Access

+
+

There are at least two ways of accessing a value or slice of a +DataFrame: by name or index. However, there are many others. For +example, a single column or row can be accessed either as a +DataFrame or a Series object.

+

Suggest different ways of doing the following operations on a +DataFrame:

+
  1. Access a single column
  2. +
  3. Access a single row
  4. +
  5. Access an individual DataFrame element
  6. +
  7. Access several columns
  8. +
  9. Access several rows
  10. +
  11. Access a subset of specific rows and columns
  12. +
  13. Access a subset of row and column ranges
  14. +
+
+
+
+
+ +
+
+

1. Access a single column:

+
+

PYTHON +

+
# by name
+data["col_name"]   # as a Series
+data[["col_name"]] # as a DataFrame
+
+# by name using .loc
+data.T.loc["col_name"]  # as a Series
+data.T.loc[["col_name"]].T  # as a DataFrame
+
+# Dot notation (Series)
+data.col_name
+
+# by index (iloc)
+data.iloc[:, col_index]   # as a Series
+data.iloc[:, [col_index]] # as a DataFrame
+
+# using a mask
+data.T[data.T.index == "col_name"].T
+
+

2. Access a single row:

+
+

PYTHON +

+
# by name using .loc
+data.loc["row_name"] # as a Series
+data.loc[["row_name"]] # as a DataFrame
+
+# by name
+data.T["row_name"] # as a Series
+data.T[["row_name"]].T # as a DataFrame
+
+# by index
+data.iloc[row_index]   # as a Series
+data.iloc[[row_index]]   # as a DataFrame
+
+# using mask
+data[data.index == "row_name"]
+
+

3. Access an individual DataFrame element:

+
+

PYTHON +

+
# by column/row names
+data["col_name"]["row_name"]  # as a value
+
+data[["col_name"]].loc["row_name"]  # as a Series
+data[["col_name"]].loc[["row_name"]]  # as a DataFrame
+
+data.loc["row_name"]["col_name"]  # as a value
+data.loc[["row_name"]]["col_name"]  # as a Series
+data.loc[["row_name"]][["col_name"]]  # as a DataFrame
+
+data.loc["row_name", "col_name"]  # as a value
+data.loc[["row_name"], "col_name"]  # as a Series. Preserves index. Column name is moved to `.name`.
+data.loc["row_name", ["col_name"]]  # as a Series. Index is moved to `.name`. Sets index to column name.
+data.loc[["row_name"], ["col_name"]]  # as a DataFrame (preserves original index and column name)
+
+# by column/row names: Dot notation
+data.col_name.row_name
+
+# by column/row indices
+data.iloc[row_index, col_index] # as a value
+data.iloc[[row_index], col_index] # as a Series. Preserves index. Column name is moved to `.name`
+data.iloc[row_index, [col_index]] # as a Series. Index is moved to `.name`. Sets index to column name.
+data.iloc[[row_index], [col_index]] # as a DataFrame (preserves original index and column name)
+
+# column name + row index
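+data["col_name"][row_index]  # integer treated as a position; deprecated in recent Pandas, prefer .iloc
+data.col_name[row_index]     # same caveat as the line above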
+data["col_name"].iloc[row_index]
+
+# column index + row name
+data.iloc[:, [col_index]].loc["row_name"]  # as a Series
+data.iloc[:, [col_index]].loc[["row_name"]]  # as a DataFrame
+
+# using masks
+data[data.index == "row_name"].T[data.T.index == "col_name"].T
+
+

4. Access several columns:

+
+

PYTHON +

+
# by name
+data[["col1", "col2", "col3"]]
+data.loc[:, ["col1", "col2", "col3"]]
+
+# by index
+data.iloc[:, [col1_index, col2_index, col3_index]]
+
+

5. Access several rows

+
+

PYTHON +

+
# by name
+data.loc[["row1", "row2", "row3"]]
+
+# by index
+data.iloc[[row1_index, row2_index, row3_index]]
+
+

6. Access a subset of specific rows and columns

+
+

PYTHON +

+
# by names
+data.loc[["row1", "row2", "row3"], ["col1", "col2", "col3"]]
+
+# by indices
+data.iloc[[row1_index, row2_index, row3_index], [col1_index, col2_index, col3_index]]
+
+# column names + row indices
+data[["col1", "col2", "col3"]].iloc[[row1_index, row2_index, row3_index]]
+
+# column indices + row names
+data.iloc[:, [col1_index, col2_index, col3_index]].loc[["row1", "row2", "row3"]]
+
+

7. Access a subset of row and column ranges

+
+

PYTHON +

+
# by name
+data.loc["row1":"row2", "col1":"col2"]
+
+# by index
+data.iloc[row1_index:row2_index, col1_index:col2_index]
+
+# column names + row indices
+data.loc[:, "col1_name":"col2_name"].iloc[row1_index:row2_index]
+
+# column indices + row names
+data.iloc[:, col1_index:col2_index].loc["row1":"row2"]
+
+
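As a small check of the return types and slice bounds listed above, here is a minimal sketch on a tiny DataFrame. The row names, column names, and values are made up for illustration. It also shows a detail that is easy to miss: a label slice with `.loc` includes its end label, while a position slice with `.iloc` excludes its stop position.

```python
import pandas as pd

# Hypothetical 3x3 table; names and values are for illustration only.
data = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    index=["row1", "row2", "row3"],
    columns=["col1", "col2", "col3"],
)

value = data.loc["row1", "col1"]        # a single value
series = data.loc[["row1"], "col1"]     # a Series of length 1
frame = data.loc[["row1"], ["col1"]]    # a 1x1 DataFrame

by_label = data.loc["row1":"row2", "col1":"col2"]   # end labels are included
by_position = data.iloc[0:2, 0:2]                   # stop positions are excluded

print(type(series).__name__, type(frame).__name__)
print(by_label.equals(by_position))
```

Both selections cover the same two rows and two columns, even though the label slice names `row2` and `col2` and the position slice stops at 2.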
+
+
+
+
+
+ +
+
+

Exploring available methods using the +dir() function

+
+

Python includes a dir() function that can be used to +display all of the available methods (functions) that are built into a +data object. In Episode 4, we used some methods with a string. But we +can see many more are available by using dir():

+
+

PYTHON +

+
my_string = 'Hello world!'   # creation of a string object 
+dir(my_string)
+
+

This command returns:

+
+

PYTHON +

+
['__add__',
+...
+'__subclasshook__',
+'capitalize',
+'casefold',
+'center',
+...
+'upper',
+'zfill']
+
+

You can use help() or Shift+Tab to +get more information about what these methods do.
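The full listing is long because it includes the special names that begin with underscores. One possible way to shorten it, sketched here with a list comprehension (a construct not covered in this lesson), is to keep only the names that do not start with an underscore:

```python
my_string = 'Hello world!'

# Keep only the names that do not start with an underscore.
public_names = [name for name in dir(my_string) if not name.startswith('_')]

print(len(dir(my_string)), 'names in total')
print(len(public_names), 'names without a leading underscore')
print(public_names[:3])
```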

+

Assume Pandas has been imported and the Gapminder GDP data for Europe +has been loaded as data. Then, use dir() to +find the function that prints out the median per-capita GDP across all +European countries for each year that information is available.

+
+
+
+
+
+ +
+
+

Among many choices, dir() lists the +median() function as a possibility. Thus,

+
+

PYTHON +

+
data.median()
+
+
+
+
+
+
+
+ +
+
+

Interpretation

+
+

Poland’s borders have been stable since 1945, but changed several +times in the years before then. How would you handle this if you were +creating a table of GDP per capita for Poland for the entire twentieth +century?

+
+
+
+
+
+ +
+
+

Key Points

+
+
  • Use DataFrame.iloc[..., ...] to select values by +integer location.
  • +
  • Use : on its own to mean all columns or all rows.
  • +
  • Select multiple columns or rows using DataFrame.loc and +a named slice.
  • +
  • Result of slicing can be used in further operations.
  • +
  • Use comparisons to select data based on value.
  • +
  • Select values or NaN using a Boolean mask.
  • +
+
+
+
+
+ + +
+
+ + + diff --git a/instructor/09-plotting.html b/instructor/09-plotting.html new file mode 100644 index 000000000..c4788d2f2 --- /dev/null +++ b/instructor/09-plotting.html @@ -0,0 +1,987 @@ + +Plotting and Programming in Python: Plotting +
+ Plotting and Programming in Python +
+ +
+
+ + + + + +
+
+

Plotting

+

Last updated on 2024-10-18

+ + + +

Estimated time: 30 minutes

+ +
+ +
+ + + +
+

Overview

+
+
+
+
+

Questions

+
  • How can I plot my data?
  • +
  • How can I save my plot for publishing?
  • +
+
+
+
+
+
+

Objectives

+
  • Create a time series plot showing a single data set.
  • +
  • Create a scatter plot showing relationship between two data +sets.
  • +
+
+
+
+
+

+matplotlib is the +most widely used scientific plotting library in Python.

+
  • Commonly use a sub-library called matplotlib.pyplot.
  • +
  • The Jupyter Notebook will render plots inline by default.
  • +
+

PYTHON +

+
import matplotlib.pyplot as plt
+
+
  • Simple plots are then (fairly) simple to create.
  • +
+

PYTHON +

+
time = [0, 1, 2, 3]
+position = [0, 100, 200, 300]
+
+plt.plot(time, position)
+plt.xlabel('Time (hr)')
+plt.ylabel('Position (km)')
+
+
A line chart showing time (hr) relative to position (km), using the values provided in the code block above. By default, the plotted line is blue against a white background, and the axes have been scaled automatically to fit the range of the input data.
+
+ +
+
+

Display All Open Figures

+
+

In our Jupyter Notebook example, running the cell should generate the +figure directly below the code. The figure is also included in the +Notebook document for future viewing. However, other Python environments +like an interactive Python session started from a terminal or a Python +script executed via the command line require an additional command to +display the figure.

+

Instruct matplotlib to show a figure:

+
+

PYTHON +

+
plt.show()
+
+

This command can also be used within a Notebook - for instance, to +display multiple figures if several are created by a single cell.

+
+
+
+

Plot data directly from a Pandas dataframe.

+
  • We can also plot Pandas +dataframes.
  • +
  • Before plotting, we convert the column headings from a +string to integer data type, since they +represent numerical values, using str.replace() +to remove the gdpPercap_ prefix and then astype(int) +to convert the series of string values +(['1952', '1957', ..., '2007']) to a series of integers: +[1952, 1957, ..., 2007].
  • +
+

PYTHON +

+
import pandas as pd
+
+data = pd.read_csv('data/gapminder_gdp_oceania.csv', index_col='country')
+
+# Extract the year from each column name.
+# The current column names are structured as 'gdpPercap_(year)',
+# so we want to keep the (year) part only for clarity when plotting GDP vs. years.
+# To do this we use replace(), which here substitutes the 'gdpPercap_' prefix with an empty string.
+# Column labels are strings, so we use replace() from the Pandas .str vectorized string functions.
+
+years = data.columns.str.replace('gdpPercap_', '')
+
+# Convert year values to integers, saving results back to dataframe
+
+data.columns = years.astype(int)
+
+data.loc['Australia'].plot()
+
+
GDP plot for Australia

Select and transform data, then plot it.

+
  • By default, DataFrame.plot +plots with the rows as the X axis.
  • +
  • We can transpose the data in order to plot multiple series.
  • +
+

PYTHON +

+
data.T.plot()
+plt.ylabel('GDP per capita')
+
+
GDP plot for Australia and New Zealand

Many styles of plot are available.

+
  • For example, do a bar plot using a fancier style.
  • +
+

PYTHON +

+
plt.style.use('ggplot')
+data.T.plot(kind='bar')
+plt.ylabel('GDP per capita')
+
+
GDP barplot for Australia

Data can also be plotted by calling the matplotlib +plot function directly.

+
  • The command is plt.plot(x, y) +
  • +
  • The color and format of markers can also be specified as an +additional optional argument e.g., b- is a blue line, +g-- is a green dashed line.
  • +

Get Australia data from dataframe

+
+

PYTHON +

+
years = data.columns
+gdp_australia = data.loc['Australia']
+
+plt.plot(years, gdp_australia, 'g--')
+
+
GDP formatted plot for Australia

Can plot many sets of data together.

+
+

PYTHON +

+
# Select two countries' worth of data.
+gdp_australia = data.loc['Australia']
+gdp_nz = data.loc['New Zealand']
+
+# Plot with differently-colored markers.
+plt.plot(years, gdp_australia, 'b-', label='Australia')
+plt.plot(years, gdp_nz, 'g-', label='New Zealand')
+
+# Create legend.
+plt.legend(loc='upper left')
+plt.xlabel('Year')
+plt.ylabel('GDP per capita ($)')
+
+
+
+ +
+
+

Adding a Legend

+
+

Often when plotting multiple datasets on the same figure it is +desirable to have a legend describing the data.

+

This can be done in matplotlib in two stages:

+
  • Provide a label for each dataset in the figure:
  • +
+

PYTHON +

+
plt.plot(years, gdp_australia, label='Australia')
+plt.plot(years, gdp_nz, label='New Zealand')
+
+
  • Instruct matplotlib to create the legend.
  • +
+

PYTHON +

+
plt.legend()
+
+

By default matplotlib will attempt to place the legend in a suitable +position. If you would rather specify a position this can be done with +the loc= argument, e.g., to place the legend in the upper +left corner of the plot, specify loc='upper left'.

+
+
+
+
GDP formatted plot for Australia and New Zealand
  • Plot a scatter plot correlating the GDP of Australia and New +Zealand
  • +
  • Use either plt.scatter or +DataFrame.plot.scatter +
  • +
+

PYTHON +

+
plt.scatter(gdp_australia, gdp_nz)
+
+
GDP correlation using plt.scatter
+

PYTHON +

+
data.T.plot.scatter(x = 'Australia', y = 'New Zealand')
+
+
GDP correlation using data.T.plot.scatter
+
+ +
+
+

Minima and Maxima

+
+

Fill in the blanks below to plot the minimum GDP per capita over time +for all the countries in Europe. Modify it again to plot the maximum GDP +per capita over time for Europe.

+
+

PYTHON +

+
data_europe = pd.read_csv('data/gapminder_gdp_europe.csv', index_col='country')
+data_europe.____.plot(label='min')
+data_europe.____
+plt.legend(loc='best')
+plt.xticks(rotation=90)
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
data_europe = pd.read_csv('data/gapminder_gdp_europe.csv', index_col='country')
+data_europe.min().plot(label='min')
+data_europe.max().plot(label='max')
+plt.legend(loc='best')
+plt.xticks(rotation=90)
+
+
Minima Maxima Solution
+
+
+
+
+
+ +
+
+

Correlations

+
+

Modify the example in the notes to create a scatter plot showing the +relationship between the minimum and maximum GDP per capita among the +countries in Asia for each year in the data set. What relationship do +you see (if any)?

+
+
+
+
+
+ +
+
+
+

PYTHON +

+
data_asia = pd.read_csv('data/gapminder_gdp_asia.csv', index_col='country')
+data_asia.describe().T.plot(kind='scatter', x='min', y='max')
+
+
Correlations Solution 1

No particular correlations can be seen between the minimum and +maximum GDP values year on year. It seems the fortunes of Asian +countries do not rise and fall together.

+
+
+
+
+
+
+ +
+
+

Correlations (continued) +

+
+

You might note that the variability in the maximum is much higher +than that of the minimum. Take a look at the maximum and the max +indexes:

+
+

PYTHON +

+
data_asia = pd.read_csv('data/gapminder_gdp_asia.csv', index_col='country')
+data_asia.max().plot()
+print(data_asia.idxmax())
+print(data_asia.idxmin())
+
+
+
+
+
+
+ +
+
+
Correlations Solution 2

Seems the variability in this value is due to a sharp drop after +1972. Some geopolitics at play perhaps? Given the dominance of oil +producing countries, maybe the Brent crude index would make an +interesting comparison? Whilst Myanmar consistently has the lowest GDP, +the highest GDP nation has varied more notably.

+
+
+
+
+
+
+ +
+
+

More Correlations

+
+

This short program creates a plot showing the correlation between GDP +and life expectancy for 2007, normalizing marker size by population:

+
+

PYTHON +

+
data_all = pd.read_csv('data/gapminder_all.csv', index_col='country')
+data_all.plot(kind='scatter', x='gdpPercap_2007', y='lifeExp_2007',
+              s=data_all['pop_2007']/1e6)
+
+

Using online help and other resources, explain what each argument to +plot does.

+
+
+
+
+
+ +
+
+
More Correlations Solution

A good place to look is the documentation for the plot function - +help(data_all.plot).

+

kind - As seen already this determines the kind of plot to be +drawn.

+

x and y - A column name or index that determines what data will be +placed on the x and y axes of the plot

+

s - Details for this can be found in the documentation of +plt.scatter. A single number or one value for each data point. +Determines the size of the plotted points.

+
+
+
+
+
+
+ +
+
+

Saving your plot to a file

+
+

If you are satisfied with the plot you see you may want to save it to +a file, perhaps to include it in a publication. There is a function in +the matplotlib.pyplot module that accomplishes this: savefig. +Calling this function, e.g. with

+
+

PYTHON +

+
plt.savefig('my_figure.png')
+
+

will save the current figure to the file my_figure.png. +The file format will automatically be deduced from the file name +extension (other formats are pdf, ps, eps and svg).

+

Note that functions in plt refer to a global figure +variable and after a figure has been displayed to the screen (e.g. with +plt.show) matplotlib will make this variable refer to a new +empty figure. Therefore, make sure you call plt.savefig +before the plot is displayed to the screen, otherwise you may find a +file with an empty plot.

+

When using dataframes, data is often generated and plotted to screen +in one line. In addition to using plt.savefig, we can save +a reference to the current figure in a local variable (with +plt.gcf) and call the savefig method +on that variable to save the figure to file.

+
+

PYTHON +

+
data.plot(kind='bar')
+fig = plt.gcf() # get current figure
+fig.savefig('my_figure.png')
+
+
+
+
+
+
+ +
+
+

Making your plots accessible

+
+

Whenever you are generating plots to go into a paper or a +presentation, there are a few things you can do to make sure that +everyone can understand your plots.

+
  • Always make sure your text is large enough to read. Use the +fontsize parameter in xlabel, +ylabel, title, and legend, and tick_params +with labelsize to increase the text size of the numbers +on your axes.
  • +
  • Similarly, you should make your graph elements easy to see. Use +s to increase the size of your scatterplot markers and +linewidth to increase the sizes of your plot lines.
  • +
  • Using color (and nothing else) to distinguish between different plot +elements will make your plots unreadable to anyone who is colorblind, or +who happens to have a black-and-white office printer. For lines, the +linestyle parameter lets you use different types of lines. +For scatterplots, marker lets you change the shape of your +points. If you’re unsure about your colors, you can use Coblis +or Color Oracle to simulate what +your plots would look like to those with colorblindness.
  • +
+
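The suggestions above can be combined in one plot. This is a minimal sketch with made-up values, not data from the lesson: the two lines differ in line style and marker shape as well as color, and the text sizes are raised with `fontsize` and `labelsize`. The specific sizes are arbitrary choices for illustration.

```python
import matplotlib.pyplot as plt

# Made-up values, for illustration only.
years = [1990, 2000, 2010]
series_a = [10, 20, 30]
series_b = [15, 18, 35]

# Different line styles and markers keep the lines distinguishable without color.
plt.plot(years, series_a, linestyle='-', marker='o', linewidth=2.5, label='A')
plt.plot(years, series_b, linestyle='--', marker='s', linewidth=2.5, label='B')

# Larger text for labels, title, legend, and tick numbers.
plt.xlabel('Year', fontsize=14)
plt.ylabel('Value', fontsize=14)
plt.title('Two series told apart without color', fontsize=16)
plt.legend(fontsize=12)
plt.tick_params(labelsize=12)
```

Printing the figure in grayscale is a quick way to check that the two lines can still be told apart.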
+
+
+
+ +
+
+

Key Points

+
+
  • +matplotlib is the +most widely used scientific plotting library in Python.
  • +
  • Plot data directly from a Pandas dataframe.
  • +
  • Select and transform data, then plot it.
  • +
  • Many styles of plot are available: see the Python Graph +Gallery for more options.
  • +
  • Can plot many sets of data together.
  • +
+
+
+
+
+ + +
+
+ + + diff --git a/instructor/10-lunch.html b/instructor/10-lunch.html new file mode 100644 index 000000000..5c06e01c6 --- /dev/null +++ b/instructor/10-lunch.html @@ -0,0 +1,540 @@ + +Plotting and Programming in Python: Lunch +
+ Plotting and Programming in Python +
+ +
+
+ + + + + +
+
+

Lunch

+

Last updated on 2024-10-18

+ + + +

Estimated time: 0 minutes

+ +
+ +
+ + + +

Over lunch, reflect on and discuss the following:

+
  • What sort of packages might you use in Python and why would you use +them?
  • +
  • How would data need to be formatted to be used in Pandas data +frames? Would the data you have meet these requirements?
  • +
  • What limitations or problems might you run into when thinking about +how to apply what we’ve learned to your own projects or data?
  • +
+
+ + +
+
+ + + diff --git a/instructor/11-lists.html b/instructor/11-lists.html new file mode 100644 index 000000000..7b6014aae --- /dev/null +++ b/instructor/11-lists.html @@ -0,0 +1,1154 @@ + +Plotting and Programming in Python: Lists +
+ Plotting and Programming in Python +
+ +
+
+ + + + + +
+
+

Lists

+

Last updated on 2024-10-18

+ + + +

Estimated time: 20 minutes

+ +
+ +
+ + + +
+

Overview

+
+
+
+
+

Questions

+
  • How can I store multiple values?
  • +
+
+
+
+
+
+

Objectives

+
  • Explain why programs need collections of values.
  • +
  • Write programs that create flat lists, index them, slice them, and +modify them through assignment and method calls.
  • +
+
+
+
+
+

A list stores many values in a single structure.

+
  • Doing calculations with a hundred variables called +pressure_001, pressure_002, etc., would be at +least as slow as doing them by hand.
  • +
  • Use a list to store many values together. +
    • Contained within square brackets [...].
    • +
    • Values separated by commas ,.
    • +
  • +
  • Use len to find out how many values are in a list.
  • +
+

PYTHON +

+
pressures = [0.273, 0.275, 0.277, 0.275, 0.276]
+print('pressures:', pressures)
+print('length:', len(pressures))
+
+
+

OUTPUT +

+
pressures: [0.273, 0.275, 0.277, 0.275, 0.276]
+length: 5
+
+

Use an item’s index to fetch it from a list.

+
  • Just like strings.
  • +
+

PYTHON +

+
print('zeroth item of pressures:', pressures[0])
+print('fourth item of pressures:', pressures[4])
+
+
+

OUTPUT +

+
zeroth item of pressures: 0.273
+fourth item of pressures: 0.276
+
+

Lists’ values can be replaced by assigning to them.

+
  • Use an index expression on the left of assignment to replace a +value.
  • +
+

PYTHON +

+
pressures[0] = 0.265
+print('pressures is now:', pressures)
+
+
+

OUTPUT +

+
pressures is now: [0.265, 0.275, 0.277, 0.275, 0.276]
+
+

Appending items to a list lengthens it.

+
  • Use list_name.append to add items to the end of a +list.
  • +
+

PYTHON +

+
primes = [2, 3, 5]
+print('primes is initially:', primes)
+primes.append(7)
+print('primes has become:', primes)
+
+
+

OUTPUT +

+
primes is initially: [2, 3, 5]
+primes has become: [2, 3, 5, 7]
+
+
  • +append is a method of lists. +
    • Like a function, but tied to a particular object.
    • +
  • +
  • Use object_name.method_name to call methods. +
    • Deliberately resembles the way we refer to things in a library.
    • +
  • +
  • We will meet other methods of lists as we go along. +
    • Use help(list) for a preview.
    • +
  • +
  • +extend is similar to append, but it allows +you to combine two lists. For example:
  • +
+

PYTHON +

+
teen_primes = [11, 13, 17, 19]
+middle_aged_primes = [37, 41, 43, 47]
+print('primes is currently:', primes)
+primes.extend(teen_primes)
+print('primes has now become:', primes)
+primes.append(middle_aged_primes)
+print('primes has finally become:', primes)
+
+
+

OUTPUT +

+
primes is currently: [2, 3, 5, 7]
+primes has now become: [2, 3, 5, 7, 11, 13, 17, 19]
+primes has finally become: [2, 3, 5, 7, 11, 13, 17, 19, [37, 41, 43, 47]]
+
+

Note that while extend maintains the “flat” structure of +the list, appending a list to a list means the last element in +primes will itself be a list, not an integer. Lists can +contain values of any type; therefore, lists of lists are possible.
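To reach a value inside a nested list, index twice: once to pick the inner list and once to pick an item within it. A minimal sketch, using a shorter version of the list above:

```python
primes = [2, 3, 5, 7]
primes.append([37, 41, 43, 47])   # the whole list becomes one item

nested = primes[4]                # the item at index 4 is itself a list
first_nested = primes[4][0]       # index again to reach inside it

print('length of primes:', len(primes))
print('nested item:', nested)
print('first value inside the nested item:', first_nested)
```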

+

Use del to remove items from a list entirely.

+
  • We use del list_name[index] to remove an element from a +list (in the example, 9 is not a prime number) and thus shorten it.
  • +
  • +del is not a function or a method, but a statement in +the language.
  • +
+

PYTHON +

+
primes = [2, 3, 5, 7, 9]
+print('primes before removing last item:', primes)
+del primes[4]
+print('primes after removing last item:', primes)
+
+
+

OUTPUT +

+
primes before removing last item: [2, 3, 5, 7, 9]
+primes after removing last item: [2, 3, 5, 7]
+
+

The empty list contains no values.

+
  • Use [] on its own to represent a list that doesn’t +contain any values. +
    • “The zero of lists.”
    • +
  • +
  • Helpful as a starting point for collecting values (which we will see +in the next episode).
  • +
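As a small illustration of using the empty list as a starting point, the sketch below begins with no values and adds two with `append`. The name `readings` is made up for this example.

```python
readings = []                          # start with the empty list
print('initial length:', len(readings))

readings.append(0.273)
readings.append(0.275)
print('readings:', readings, 'length:', len(readings))
```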

Lists may contain values of different types.

+
  • A single list may contain numbers, strings, and anything else.
  • +
+

PYTHON +

+
goals = [1, 'Create lists.', 2, 'Extract items from lists.', 3, 'Modify lists.']
+
+

Character strings can be indexed like lists.

+
  • Get single characters from a character string using indexes in +square brackets.
  • +
+

PYTHON +

+
element = 'carbon'
+print('zeroth character:', element[0])
+print('third character:', element[3])
+
+
+

OUTPUT +

+
zeroth character: c
+third character: b
+
+

Character strings are immutable.

+
  • Cannot change the characters in a string after it has been created. +
    • +Immutable: can’t be changed after creation.
    • +
    • In contrast, lists are mutable: they can be modified in +place.
    • +
  • +
  • Python considers the string to be a single value with parts, not a +collection of values.
  • +
+

PYTHON +

+
element[0] = 'C'
+
+
+

ERROR +

+
TypeError: 'str' object does not support item assignment
+
+
  • Lists and character strings are both collections.
  • +
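Because a string cannot be changed in place, the usual approach is to build a new string from pieces of the old one. The sketch below shows that, alongside the list version that can be modified directly. The variable names `capitalized` and `letters` are made up for this example.

```python
element = 'carbon'

capitalized = 'C' + element[1:]   # a new string; element is unchanged
letters = list(element)           # a list of the same characters
letters[0] = 'C'                  # lists can be modified in place

print(element)
print(capitalized)
print(letters)
```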

Indexing beyond the end of the collection is an error.

+
  • Python reports an IndexError if we attempt to access a +value that doesn’t exist. +
    • This is a kind of runtime error.
    • +
    • Cannot be detected as the code is parsed because the index might be +calculated based on data.
    • +
  • +
+

PYTHON +

+
print('99th element of element is:', element[99])
+
+
+

ERROR +

+
IndexError: string index out of range
+
+
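One way to avoid this error for a non-negative index is to compare it with the length of the collection before using it. This sketch uses an `if`/`else` statement, which is introduced in a later episode; the names `index` and `message` are made up for this example.

```python
element = 'carbon'
index = 99

# Valid non-negative indices run from 0 to len(element) - 1.
if index < len(element):
    message = element[index]
else:
    message = 'index ' + str(index) + ' is out of range for length ' + str(len(element))

print(message)
```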
+
+ +
+
+

Fill in the Blanks

+
+

Fill in the blanks so that the program below produces the output +shown.

+
+

PYTHON +

+
values = ____
+values.____(1)
+values.____(3)
+values.____(5)
+print('first time:', values)
+values = values[____]
+print('second time:', values)
+
+
+

OUTPUT +

+
first time: [1, 3, 5]
+second time: [3, 5]
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
values = []
+values.append(1)
+values.append(3)
+values.append(5)
+print('first time:', values)
+values = values[1:]
+print('second time:', values)
+
+
+
+
+
+
+
+ +
+
+

How Large is a Slice?

+
+

If start and stop are both non-negative +integers, how long is the list values[start:stop]?

+
+
+
+
+
+ +
+
+

The list values[start:stop] has up to +stop - start elements. For example, +values[1:4] has the 3 elements values[1], +values[2], and values[3]. Why ‘up to’? As we +saw in episode 2, if stop +is greater than the total length of the list values, we +will still get a list back but it will be shorter than expected.
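The "up to" can be checked directly. In this minimal sketch, with a made-up list of five values, the first slice has exactly `stop - start` items and the other two are cut short because they run past the end of the list.

```python
values = [10, 20, 30, 40, 50]

inside = values[1:4]     # stop is within the list: 4 - 1 = 3 items
beyond = values[3:10]    # stop is past the end, so only 2 items remain
empty = values[7:10]     # start is past the end as well, so nothing is selected

print(len(inside), len(beyond), len(empty))
```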

+
+
+
+
+
+
+ +
+
+

From Strings to Lists and Back

+
+

Given this:

+
+

PYTHON +

+
print('string to list:', list('tin'))
+print('list to string:', ''.join(['g', 'o', 'l', 'd']))
+
+
+

OUTPUT +

+
string to list: ['t', 'i', 'n']
+list to string: gold
+
+
  1. What does list('some string') do?
  2. +
  3. What does '-'.join(['x', 'y', 'z']) generate?
  4. +
+
+
+
+
+ +
+
+
  1. +list('some string') +converts a string into a list containing all of its characters.
  2. +
  3. +join +returns a string that concatenates the string +elements of the list, placing the separator between each pair of +neighboring elements. This results in x-y-z. The separator is the +string on which the join method is called ('-' in this example).
  4. +
+
+
+
+
+
+ +
+
+

Working With the End

+
+

What does the following program print?

+
+

PYTHON +

+
element = 'helium'
+print(element[-1])
+
+
  1. How does Python interpret a negative index?
  2. +
  3. If a list or string has N elements, what is the most negative index +that can safely be used with it, and what location does that index +represent?
  4. +
  5. If values is a list, what does +del values[-1] do?
  6. +
  7. How can you display all elements but the last one without changing +values? (Hint: you will need to combine slicing and +negative indexing.)
  8. +
+
+
+
+
+ +
+
+

The program prints m.

+
  1. Python interprets a negative index as starting from the end (as +opposed to starting from the beginning). The last element is +-1.
  2. +
  3. The most negative index that can safely be used with a list of N elements is +-N, which represents the first element.
  4. +
  5. +del values[-1] removes the last element from the +list.
  6. +
  7. values[:-1]
  8. +
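These answers can be checked on a short list. A minimal sketch, with made-up values:

```python
values = [2, 3, 5, 7]
n = len(values)

first_via_negative = values[-n]   # -N refers to the first element
all_but_last = values[:-1]        # a new list; values itself is unchanged

print(first_via_negative)
print(all_but_last)
print(values)

del values[-1]                    # this does change values
print(values)
```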
+
+
+
+
+
+ +
+
+

Stepping Through a List

+
+

What does the following program print?

+
+

PYTHON +

+
element = 'fluorine'
+print(element[::2])
+print(element[::-1])
+
+
  1. If we write a slice as low:high:stride, what does +stride do?
  2. +
  3. What expression would select all of the even-numbered items from a +collection?
  4. +
+
+
+
+
+ +
+
+

The program prints

+
+

OUTPUT +

+
furn
+eniroulf
+
+
  1. +stride is the step size of the slice.
  2. +
  3. The slice 1::2 selects all even-numbered items from a +collection: it starts with element 1 (which is the second +element, since indexing starts at 0), goes on until the end +(since no end is given), and uses a step size of +2 (i.e., selects every second element).
  4. +
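The effect of the start position and the stride can be compared side by side. A minimal sketch using the same string:

```python
element = 'fluorine'

from_zero = element[::2]    # positions 0, 2, 4, 6
from_one = element[1::2]    # positions 1, 3, 5, 7
backwards = element[::-1]   # a negative stride walks from the end to the start

print(from_zero)
print(from_one)
print(backwards)
```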
+
+
+
+
+
+ +
+
+

Slice Bounds

+
+

What does the following program print?

+
+

PYTHON +

+
element = 'lithium'
+print(element[0:20])
+print(element[-1:3])
+
+
+
+
+
+
+ +
+
+
+

OUTPUT +

+
lithium
+
+

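The first statement prints the whole string, since the slice goes +beyond the total length of the string. The second statement returns an +empty string, because the slice starts at the last character (index -1, +which is position 6) and stops at position 3, which lies before the +start, so nothing is selected.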
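Slices never raise an error for positions outside the string; they return whatever falls inside the requested range, which may be nothing. A minimal sketch comparing three slices of the same string:

```python
element = 'lithium'

start_after_stop = element[-1:3]   # starts at position 6, stops at 3: nothing selected
last_only = element[-1:]           # from the last character to the end
middle = element[3:-1]             # from position 3 up to, not including, the last

print(repr(start_after_stop))
print(repr(last_only))
print(repr(middle))
```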

+
+
+
+
+
+
+ +
+
+

Sort and Sorted

+
+

What do these two programs print? In simple terms, explain the +difference between sorted(letters) and +letters.sort().

+
+

PYTHON +

+
# Program A
+letters = list('gold')
+result = sorted(letters)
+print('letters is', letters, 'and result is', result)
+
+
+

PYTHON +

+
# Program B
+letters = list('gold')
+result = letters.sort()
+print('letters is', letters, 'and result is', result)
+
+
+
+
+
+
+ +
+
+

Program A prints

+
+

OUTPUT +

+
letters is ['g', 'o', 'l', 'd'] and result is ['d', 'g', 'l', 'o']
+
+

Program B prints

+
+

OUTPUT +

+
letters is ['d', 'g', 'l', 'o'] and result is None
+
+

sorted(letters) returns a sorted copy of the list +letters (the original list letters remains +unchanged), while letters.sort() sorts the list +letters in-place and does not return anything.
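A common slip that follows from this difference is assigning the result of `sort()` back to a variable, which replaces the list with `None`. A minimal sketch; the names `ordered` and `lost` are made up for this example.

```python
letters = list('gold')

ordered = sorted(letters)      # a new sorted list; letters keeps its order
print(letters, ordered)

lost = list('gold').sort()     # sort() works in place and returns None
print(lost)

letters.sort()                 # correct in-place use: no assignment needed
print(letters)
```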

+
+
+
+
+
+
+ +
+
+

Copying (or Not)

+
+

What do these two programs print? In simple terms, explain the +difference between new = old and +new = old[:].

+
+

PYTHON +

+
# Program A
+old = list('gold')
+new = old      # simple assignment
+new[0] = 'D'
+print('new is', new, 'and old is', old)
+
+
+

PYTHON +

+
# Program B
+old = list('gold')
+new = old[:]   # assigning a slice
+new[0] = 'D'
+print('new is', new, 'and old is', old)
+
+
+
+
+
+
+ +
+
+

Program A prints

+
+

OUTPUT +

+
new is ['D', 'o', 'l', 'd'] and old is ['D', 'o', 'l', 'd']
+
+

Program B prints

+
+

OUTPUT +

+
new is ['D', 'o', 'l', 'd'] and old is ['g', 'o', 'l', 'd']
+
+

new = old makes new a reference to the list +old; new and old point towards +the same object.

+

new = old[:] however creates a new list object +new containing all elements from the list old; +new and old are different objects.
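Whether two names refer to the same list can be checked with the `is` operator, which is not otherwise covered in this lesson. A minimal sketch; the names `alias` and `duplicate` are made up for this example.

```python
old = list('gold')

alias = old          # a second name for the same list
duplicate = old[:]   # a new list holding the same items

same_object = alias is old
copied_object = duplicate is old
equal_contents = duplicate == old

print(same_object, copied_object, equal_contents)
```

The slice produces a list that compares equal to the original but is a separate object, so changing one does not change the other.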

+
+
+
+
+
+
+ +
+
+

Key Points

+
+
  • A list stores many values in a single structure.
  • +
  • Use an item’s index to fetch it from a list.
  • +
  • Lists’ values can be replaced by assigning to them.
  • +
  • Appending items to a list lengthens it.
  • +
  • Use del to remove items from a list entirely.
  • +
  • The empty list contains no values.
  • +
  • Lists may contain values of different types.
  • +
  • Character strings can be indexed like lists.
  • +
  • Character strings are immutable.
  • +
  • Indexing beyond the end of the collection is an error.
  • +
+
+
+
+
+ + +
+
+ + + diff --git a/instructor/12-for-loops.html b/instructor/12-for-loops.html new file mode 100644 index 000000000..fc4379e51 --- /dev/null +++ b/instructor/12-for-loops.html @@ -0,0 +1,1180 @@ + +Plotting and Programming in Python: For Loops +
+ Plotting and Programming in Python +
+ +
+
+ + + + + +
+
+

For Loops

+

Last updated on 2024-10-18

+ + + +

Estimated time: 25 minutes

+ +
+ +
+ + + +
+

Overview

+
+
+
+
+

Questions

+
  • How can I make a program do many things?
  • +
+
+
+
+
+
+

Objectives

+
  • Explain what for loops are normally used for.
  • +
  • Trace the execution of a simple (unnested) loop and correctly state +the values of variables in each iteration.
  • +
  • Write for loops that use the Accumulator pattern to aggregate +values.
  • +
+
+
+
+
+

A for loop executes commands once for each value in a +collection.

+
  • Doing calculations on the values in a list one by one is as painful +as working with pressure_001, pressure_002, +etc.
  • +
  • A for loop tells Python to execute some statements once for +each value in a list, a character string, or some other collection.
  • +
  • “for each thing in this group, do these operations”
  • +
+

PYTHON +

+
for number in [2, 3, 5]:
+    print(number)
+
+
  • This for loop is equivalent to:
  • +
+

PYTHON +

+
print(2)
+print(3)
+print(5)
+
+
  • And the for loop’s output is:
  • +
+

OUTPUT +

+
2
+3
+5
+
+

A for loop is made up of a collection, a loop variable, +and a body.

+
+

PYTHON +

+
for number in [2, 3, 5]:
+    print(number)
+
+
  • The collection, [2, 3, 5], is what the loop is being +run on.
  • +
  • The body, print(number), specifies what to do for each +value in the collection.
  • +
  • The loop variable, number, is what changes for each +iteration of the loop. +
    • The “current thing”.
    • +
  • +
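The same three parts appear when the collection is a character string instead of a list. A minimal sketch, in which the body stores each value rather than printing it; the name `visited` is made up for this example.

```python
visited = []

for char in 'tin':           # collection: the string 'tin'; loop variable: char
    visited.append(char)     # body: runs once per character

print(visited)
print('loop variable after the loop:', char)
```

After the loop finishes, the loop variable still holds the last value it was given.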

The first line of the for loop must end with a colon, +and the body must be indented.

+
  • The colon at the end of the first line signals the start of a +block of statements.
  • +
  • Python uses indentation rather than {} or +begin/end to show nesting. +
    • Any consistent indentation is legal, but almost everyone uses four +spaces.
    • +
  • +
+

PYTHON +

+
for number in [2, 3, 5]:
+print(number)
+
+
+

ERROR +

+
IndentationError: expected an indented block
+
+
  • Indentation is always meaningful in Python.
  • +
+

PYTHON +

+
firstName = "Jon"
+  lastName = "Smith"
+
+
+

ERROR +

+
  File "<ipython-input-7-f65f2962bf9c>", line 2
+    lastName = "Smith"
+    ^
+IndentationError: unexpected indent
+
+
  • This error can be fixed by removing the extra spaces at the +beginning of the second line.
  • +

Loop variables can be called anything.

+
  • As with all variables, loop variables are: +
    • Created on demand.
    • +
    • Meaningless: their names can be anything at all.
    • +
  • +
+

PYTHON +

+
for kitten in [2, 3, 5]:
+    print(kitten)
+
+

The body of a loop can contain many statements.

+
  • But no loop should be more than a few lines long.
  • +
  • Hard for human beings to keep larger chunks of code in mind.
  • +
+

PYTHON +

+
primes = [2, 3, 5]
+for p in primes:
+    squared = p ** 2
+    cubed = p ** 3
+    print(p, squared, cubed)
+
+
+

OUTPUT +

+
2 4 8
+3 9 27
+5 25 125
+
+

Use range to iterate over a sequence of numbers.

+
  • The built-in function range +produces a sequence of numbers. +
    • +Not a list: the numbers are produced on demand to make +looping over large ranges more efficient.
    • +
  • +
  • +range(N) is the numbers 0..N-1 +
    • Exactly the legal indices of a list or character string of length +N
    • +
  • +
+

PYTHON +

+
print('a range is not a list: range(0, 3)')
+for number in range(0, 3):
+    print(number)
+
+
+

OUTPUT +

+
a range is not a list: range(0, 3)
+0
+1
+2
+
+
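Because `range(N)` produces exactly the legal indices of a collection of length N, it can be combined with `len` to visit each position of a list. A minimal sketch with made-up values; `list(range(...))` is used only to make the numbers visible.

```python
pressures = [0.273, 0.275, 0.277]

indices = list(range(len(pressures)))   # convert to a list to see the values
print(indices)

for index in range(len(pressures)):
    print(index, pressures[index])
```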

The Accumulator pattern turns many values into one.

+
  • A common pattern in programs is to: +
    1. Initialize an accumulator variable to zero, the empty +string, or the empty list.
    2. +
    3. Update the variable with values from a collection.
    4. +
  • +
+

PYTHON +

+
# Sum the first 10 integers.
+total = 0
+for number in range(10):
+    total = total + (number + 1)
+print(total)
+
+
+

OUTPUT +

+
55
+
+
  • Read total = total + (number + 1) as: +
    • Add 1 to the current value of the loop variable +number.
    • +
    • Add that to the current value of the accumulator variable +total.
    • +
    • Assign that to total, replacing the current value.
    • +
  • +
  • We have to add number + 1 because range +produces 0..9, not 1..10.
  • +
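An alternative to adding 1 inside the loop is to give `range` a start and a stop value, so that it produces 1..10 directly. A minimal sketch, with the built-in `sum` used only as a cross-check:

```python
# Sum the first 10 integers, letting range start at 1.
total = 0
for number in range(1, 11):   # starts at 1 and stops before 11
    total = total + number

print(total)
print(total == sum(range(1, 11)))
```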
+
+ +
+
+

Classifying Errors

+
+

Is an indentation error a syntax error or a runtime error?

+
+
+
+
+
+ +
+
+

An IndentationError is a syntax error. Programs with syntax errors +cannot be started. A program with a runtime error will start but an +error will be thrown under certain conditions.
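One way to see this is to ask Python to parse a program without running it. The sketch below uses the built-in `compile` function and a `try`/`except` statement, neither of which is covered in this lesson; the names `source` and `outcome` are made up for this example.

```python
# The body of the loop is not indented, so this program cannot be parsed.
source = "for number in [2, 3, 5]:\nprint(number)\n"

try:
    compile(source, '<example>', 'exec')   # parse only; nothing is executed
    outcome = 'parsed'
except SyntaxError as error:
    outcome = type(error).__name__

print(outcome)
print(issubclass(IndentationError, SyntaxError))
```

The error is reported while the code is being parsed, before any statement has run.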

+
+
+
+
+
+
+ +
+
+

Tracing Execution

+
+

Create a table showing the numbers of the lines that are executed +when this program runs, and the values of the variables after each line +is executed.

+
+

PYTHON +

+
total = 0
+for char in "tin":
+    total = total + 1
+
+
+
+
+
+
+ +
+
+ + + + + + + + + + + + + + + + +
Line no    Variables
1          total = 0
2          total = 0 char = ‘t’
3          total = 1 char = ‘t’
2          total = 1 char = ‘i’
3          total = 2 char = ‘i’
2          total = 2 char = ‘n’
3          total = 3 char = ‘n’
+
+
+
+
+
+ +
+
+

Reversing a String

+
+

Fill in the blanks in the program below so that it prints “nit” (the +reverse of the original character string “tin”).

+
+

PYTHON +

+
original = "tin"
+result = ____
+for char in original:
+    result = ____
+print(result)
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
original = "tin"
+result = ""
+for char in original:
+    result = char + result
+print(result)
+
+
+
+
+
+
+
+ +
+
+

Practice Accumulating

+
+

Fill in the blanks in each of the programs below to produce the +indicated result.

+
+

PYTHON +

+
# Total length of the strings in the list: ["red", "green", "blue"] => 12
+total = 0
+for word in ["red", "green", "blue"]:
+    ____ = ____ + len(word)
+print(total)
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
total = 0
+for word in ["red", "green", "blue"]:
+    total = total + len(word)
+print(total)
+
+
+
+
+
+
+
+ +
+
+

Practice Accumulating +(continued) +

+
+
+

PYTHON +

+
# List of word lengths: ["red", "green", "blue"] => [3, 5, 4]
+lengths = ____
+for word in ["red", "green", "blue"]:
+    lengths.____(____)
+print(lengths)
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
lengths = []
+for word in ["red", "green", "blue"]:
+    lengths.append(len(word))
+print(lengths)
+
+
+
+
+
+
+
+ +
+
+

Practice Accumulating +(continued) +

+
+
+

PYTHON +

+
# Concatenate all words: ["red", "green", "blue"] => "redgreenblue"
+words = ["red", "green", "blue"]
+result = ____
+for ____ in ____:
+    ____
+print(result)
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
words = ["red", "green", "blue"]
+result = ""
+for word in words:
+    result = result + word
+print(result)
+
+
+
+
+
+
+
+ +
+
+

Practice Accumulating +(continued) +

+
+

Create an acronym: Starting from the list +["red", "green", "blue"], create the acronym +"RGB" using a for loop.

+

Hint: You may need to use a string method to +properly format the acronym.

+
+
+
+
+
+ +
+
+
+

PYTHON +

+
acronym = ""
+for word in ["red", "green", "blue"]:
+    acronym = acronym + word[0].upper()
+print(acronym)
+
+
+
+
+
+
+
+ +
+
+

Cumulative Sum

+
+

Reorder and properly indent the lines of code below so that they +print a list with the cumulative sum of data. The result should be +[1, 3, 5, 10].

+
+

PYTHON +

+
cumulative.append(total)
+for number in data:
+cumulative = []
+total = total + number
+total = 0
+print(cumulative)
+data = [1,2,2,5]
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
total = 0
+data = [1,2,2,5]
+cumulative = []
+for number in data:
+    total = total + number
+    cumulative.append(total)
+print(cumulative)
+
+
+
+
+
+
+
+ +
+
+

Identifying Variable Name Errors

+
+
  1. Read the code below and try to identify what the errors are +without running it.
  2. +
  3. Run the code and read the error message. What type of +NameError do you think this is? Is it a string with no +quotes, a misspelled variable, or a variable that should have been +defined but was not?
  4. +
  5. Fix the error.
  6. +
  7. Repeat steps 2 and 3, until you have fixed all the errors.
  8. +
+

PYTHON +

+
for number in range(10):
+    # use a if the number is a multiple of 3, otherwise use b
+    if (Number % 3) == 0:
+        message = message + a
+    else:
+        message = message + "b"
+print(message)
+
+
+
+
+
+
+ +
+
+
  • Python variable names are case sensitive: number and +Number refer to different variables.
  • +
  • The variable message needs to be initialized as an +empty string.
  • +
  • We want to add the string "a" to message, +not the undefined variable a.
  • +
+

PYTHON +

+
message = ""
+for number in range(10):
+    # use a if the number is a multiple of 3, otherwise use b
+    if (number % 3) == 0:
+        message = message + "a"
+    else:
+        message = message + "b"
+print(message)
+
+
+
+
+
+
+
+ +
+
+

Identifying Item Errors

+
+
  1. Read the code below and try to identify what the errors are +without running it.
  2. +
  3. Run the code, and read the error message. What type of error is +it?
  4. +
  5. Fix the error.
  6. +
+

PYTHON +

+
seasons = ['Spring', 'Summer', 'Fall', 'Winter']
+print('My favorite season is ', seasons[4])
+
+
+
+
+
+
+ +
+
+

This list has 4 elements and the index to access the last element in +the list is 3.

+
+

PYTHON +

+
seasons = ['Spring', 'Summer', 'Fall', 'Winter']
+print('My favorite season is ', seasons[3])
+
+
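A short sketch of how the last index relates to the length of the list, and of what the error looks like when it is caught instead of stopping the program. The try/except form is an addition for illustration and is not part of the exercise.

```python
seasons = ['Spring', 'Summer', 'Fall', 'Winter']

# The last legal index is always one less than the length.
last_index = len(seasons) - 1
print(last_index, seasons[last_index])

# A negative index counts from the end of the list.
print(seasons[-1])

# Using the length itself as an index is always out of range.
try:
    seasons[len(seasons)]
except IndexError as err:
    print('IndexError:', err)
```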
+
+
+
+
+
+ +
+
+

Key Points

+
+
  • A for loop executes commands once for each value in a +collection.
  • +
  • A for loop is made up of a collection, a loop variable, +and a body.
  • +
  • The first line of the for loop must end with a colon, +and the body must be indented.
  • +
  • Indentation is always meaningful in Python.
  • +
  • Loop variables can be called anything (but it is strongly advised to +give the loop variable a meaningful name).
  • +
  • The body of a loop can contain many statements.
  • +
  • Use range to iterate over a sequence of numbers.
  • +
  • The Accumulator pattern turns many values into one.
  • +
+
+
+
+
+ + +
+
+ + + diff --git a/instructor/13-conditionals.html b/instructor/13-conditionals.html new file mode 100644 index 000000000..b63f78bb9 --- /dev/null +++ b/instructor/13-conditionals.html @@ -0,0 +1,1095 @@ + +Plotting and Programming in Python: Conditionals +
+ Plotting and Programming in Python +
+ +
+
+ + + + + +
+
+

Conditionals

+

Last updated on 2024-10-18

+ + + +

Estimated time: 25 minutes

+ +
+ +
+ + + +
+

Overview

+
+
+
+
+

Questions

+
  • How can programs do different things for different data?
  • +
+
+
+
+
+
+

Objectives

+
  • Correctly write programs that use if and else statements and simple +Boolean expressions (without logical operators).
  • +
  • Trace the execution of unnested conditionals and conditionals inside +loops.
  • +
+
+
+
+
+

Use if statements to control whether or not a block of +code is executed.

+
  • An if statement (more properly called a +conditional statement) controls whether some block of code is +executed or not.
  • +
  • Structure is similar to a for statement: +
    • First line opens with if and ends with a colon
    • +
    • Body containing one or more statements is indented (usually by 4 +spaces)
    • +
  • +
+

PYTHON +

+
mass = 3.54
+if mass > 3.0:
+    print(mass, 'is large')
+
+mass = 2.07
+if mass > 3.0:
+    print(mass, 'is large')
+
+
+

OUTPUT +

+
3.54 is large
+
+

Conditionals are often used inside loops.

+
  • Not much point using a conditional when we know the value (as +above).
  • +
  • But useful when we have a collection to process.
  • +
+

PYTHON +

+
masses = [3.54, 2.07, 9.22, 1.86, 1.71]
+for m in masses:
+    if m > 3.0:
+        print(m, 'is large')
+
+
+

OUTPUT +

+
3.54 is large
+9.22 is large
+
+

Use else to execute a block of code when an +if condition is not true.

+
  • +else can be used following an if.
  • +
  • Allows us to specify an alternative to execute when the +if branch isn’t taken.
  • +
+

PYTHON +

+
masses = [3.54, 2.07, 9.22, 1.86, 1.71]
+for m in masses:
+    if m > 3.0:
+        print(m, 'is large')
+    else:
+        print(m, 'is small')
+
+
+

OUTPUT +

+
3.54 is large
+2.07 is small
+9.22 is large
+1.86 is small
+1.71 is small
+
+

Use elif to specify additional tests.

+
  • May want to provide several alternative choices, each with its own +test.
  • +
  • Use elif (short for “else if”) and a condition to +specify these.
  • +
  • Always associated with an if.
  • +
  • Must come before the else (which is the “catch +all”).
  • +
+

PYTHON +

+
masses = [3.54, 2.07, 9.22, 1.86, 1.71]
+for m in masses:
+    if m > 9.0:
+        print(m, 'is HUGE')
+    elif m > 3.0:
+        print(m, 'is large')
+    else:
+        print(m, 'is small')
+
+
+

OUTPUT +

+
3.54 is large
+2.07 is small
+9.22 is HUGE
+1.86 is small
+1.71 is small
+
+

Conditions are tested once, in order.

+
  • Python steps through the branches of the conditional in order, +testing each in turn.
  • +
  • So ordering matters.
  • +
+

PYTHON +

+
grade = 85
+if grade >= 90:
+    print('grade is A')
+elif grade >= 80:
+    print('grade is B')
+elif grade >= 70:
+    print('grade is C')
+
+
+

OUTPUT +

+
grade is B
+
+
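To illustrate why ordering matters, here is a sketch of the grade example with the branches deliberately written in the wrong order. The variable label is introduced only for this illustration.

```python
grade = 95
if grade >= 70:
    label = 'C'      # reached first, so the later branches are never tested
elif grade >= 80:
    label = 'B'
elif grade >= 90:
    label = 'A'
print('grade is', label)
```

Because 95 already satisfies the first test, Python never looks at the other two branches, and the program reports a C for a grade that should be an A.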
  • Does not automatically go back and re-evaluate if values +change.
  • +
+

PYTHON +

+
velocity = 10.0
+if velocity > 20.0:
+    print('moving too fast')
+else:
+    print('adjusting velocity')
+    velocity = 50.0
+
+
+

OUTPUT +

+
adjusting velocity
+
+
  • Often use conditionals in a loop to “evolve” the values of +variables.
  • +
+

PYTHON +

+
velocity = 10.0
+for i in range(5): # execute the loop 5 times
+    print(i, ':', velocity)
+    if velocity > 20.0:
+        print('moving too fast')
+        velocity = velocity - 5.0
+    else:
+        print('moving too slow')
+        velocity = velocity + 10.0
+print('final velocity:', velocity)
+
+
+

OUTPUT +

+
0 : 10.0
+moving too slow
+1 : 20.0
+moving too slow
+2 : 30.0
+moving too fast
+3 : 25.0
+moving too fast
+4 : 20.0
+moving too slow
+final velocity: 30.0
+
+

Create a table showing variables’ values to trace a program’s +execution.

+
+ + + + + + + + + + + + + + + + + + + + + +
i           0       .       1       .       2       .       3       .       4       .
velocity    10.0    20.0    .       30.0    .       25.0    .       20.0    .       30.0
  • The program must have a print statement +outside the body of the loop to show the final value of +velocity, since its value is updated by the last iteration +of the loop.
  • +
+
+ +
+
+

Compound Relations Using and, +or, and Parentheses

+
+

Often, you want some combination of things to be true. You can +combine relations within a conditional using and and +or. Continuing the example above, suppose you have

+
+

PYTHON +

+
mass     = [ 3.54,  2.07,  9.22,  1.86,  1.71]
+velocity = [10.00, 20.00, 30.00, 25.00, 20.00]
+
+i = 0
+for i in range(5):
+    if mass[i] > 5 and velocity[i] > 20:
+        print("Fast heavy object.  Duck!")
+    elif mass[i] > 2 and mass[i] <= 5 and velocity[i] <= 20:
+        print("Normal traffic")
+    elif mass[i] <= 2 and velocity[i] <= 20:
+        print("Slow light object.  Ignore it")
+    else:
+        print("Whoa!  Something is up with the data.  Check it")
+
+

Just like with arithmetic, you can and should use parentheses +whenever there is possible ambiguity. A good general rule is to +always use parentheses when mixing and and +or in the same condition. That is, instead of:

+
+

PYTHON +

+
if mass[i] <= 2 or mass[i] >= 5 and velocity[i] > 20:
+
+

write one of these:

+
+

PYTHON +

+
if (mass[i] <= 2 or mass[i] >= 5) and velocity[i] > 20:
+if mass[i] <= 2 or (mass[i] >= 5 and velocity[i] > 20):
+
+

so it is perfectly clear to a reader (and to Python) what you really +mean.
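A quick sketch of why the parentheses matter, using single made-up values in place of mass[i] and velocity[i]. Python evaluates and before or, so the unparenthesized condition behaves like the second parenthesized form, not the first.

```python
mass = 1.5
velocity = 10.0

unparenthesized = mass <= 2 or mass >= 5 and velocity > 20
or_grouped = (mass <= 2 or mass >= 5) and velocity > 20
and_grouped = mass <= 2 or (mass >= 5 and velocity > 20)

print(unparenthesized, or_grouped, and_grouped)
```

With these values the three conditions give True, False and True, so the two parenthesized versions really do mean different things.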

+
+
+
+
+
+ +
+
+

Tracing Execution

+
+

What does this program print?

+
+

PYTHON +

+
pressure = 71.9
+if pressure > 50.0:
+    pressure = 25.0
+elif pressure <= 50.0:
+    pressure = 0.0
+print(pressure)
+
+
+
+
+
+
+ +
+
+
+

OUTPUT +

+
25.0
+
+
+
+
+
+
+
+ +
+
+

Trimming Values

+
+

Fill in the blanks so that this program creates a new list containing +zeroes where the original list’s values were negative and ones where the +original list’s values were zero or positive.

+
+

PYTHON +

+
original = [-1.5, 0.2, 0.4, 0.0, -1.3, 0.4]
+result = ____
+for value in original:
+    if ____:
+        result.append(0)
+    else:
+        ____
+print(result)
+
+
+

OUTPUT +

+
[0, 1, 1, 1, 0, 1]
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
original = [-1.5, 0.2, 0.4, 0.0, -1.3, 0.4]
+result = []
+for value in original:
+    if value < 0.0:
+        result.append(0)
+    else:
+        result.append(1)
+print(result)
+
+
+
+
+
+
+
+ +
+
+

Processing Small Files

+
+

Modify this program so that it only processes files with fewer than +50 records.

+
+

PYTHON +

+
import glob
+import pandas as pd
+for filename in glob.glob('data/*.csv'):
+    contents = pd.read_csv(filename)
+    ____:
+        print(filename, len(contents))
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
import glob
+import pandas as pd
+for filename in glob.glob('data/*.csv'):
+    contents = pd.read_csv(filename)
+    if len(contents) < 50:
+        print(filename, len(contents))
+
+
+
+
+
+
+
+ +
+
+

Initializing

+
+

Modify this program so that it finds the largest and smallest values +in the list no matter what the range of values originally is.

+
+

PYTHON +

+
values = [...some test data...]
+smallest, largest = None, None
+for v in values:
+    if ____:
+        smallest, largest = v, v
+    ____:
+        smallest = min(____, v)
+        largest = max(____, v)
+print(smallest, largest)
+
+

What are the advantages and disadvantages of using this method to +find the range of the data?

+
+
+
+
+
+ +
+
+
+

PYTHON +

+
values = [-2,1,65,78,-54,-24,100]
+smallest, largest = None, None
+for v in values:
+    if smallest is None and largest is None:
+        smallest, largest = v, v
+    else:
+        smallest = min(smallest, v)
+        largest = max(largest, v)
+print(smallest, largest)
+
+

If you wrote == None instead of is None, +that works too, but Python programmers conventionally write is None +because there is only ever one None object, so an identity test is the most direct check.

+

It can be argued that an advantage of using this method would be to +make the code more readable. However, a disadvantage is that this code +is not efficient because within each iteration of the for +loop statement, there are two more loops that run over two numbers each +(the min and max functions). It would be more +efficient to iterate over each number just once:

+
+

PYTHON +

+
values = [-2,1,65,78,-54,-24,100]
+smallest, largest = None, None
+for v in values:
+    if smallest is None or v < smallest:
+        smallest = v
+    if largest is None or v > largest:
+        largest = v
+print(smallest, largest)
+
+

Now we have one loop, but four comparison tests. There are two ways +we could improve it further: either use fewer comparisons in each +iteration, or use two loops that each contain only one comparison test. +The simplest solution is often the best:

+
+

PYTHON +

+
values = [-2,1,65,78,-54,-24,100]
+smallest = min(values)
+largest = max(values)
+print(smallest, largest)
+
+
+
+
+
+
+
+ +
+
+

Key Points

+
+
  • Use if statements to control whether or not a block of +code is executed.
  • +
  • Conditionals are often used inside loops.
  • +
  • Use else to execute a block of code when an +if condition is not true.
  • +
  • Use elif to specify additional tests.
  • +
  • Conditions are tested once, in order.
  • +
  • Create a table showing variables’ values to trace a program’s +execution.
  • +
+
+
+
+
+ + +
+
+ + + diff --git a/instructor/14-looping-data-sets.html b/instructor/14-looping-data-sets.html new file mode 100644 index 000000000..9d9a7964a --- /dev/null +++ b/instructor/14-looping-data-sets.html @@ -0,0 +1,859 @@ + +Plotting and Programming in Python: Looping Over Data Sets +
+ Plotting and Programming in Python +
+ +
+
+ + + + + +
+
+

Looping Over Data Sets

+

Last updated on 2024-10-18

+ + + +

Estimated time: 15 minutes

+ +
+ +
+ + + +
+

Overview

+
+
+
+
+

Questions

+
  • How can I process many data sets with a single command?
  • +
+
+
+
+
+
+

Objectives

+
  • Be able to read and write globbing expressions that match sets of +files.
  • +
  • Use glob to create lists of files.
  • +
  • Write for loops to perform operations on files given their names in +a list.
  • +
+
+
+
+
+

Use a for loop to process files given a list of their +names.

+
  • A filename is a character string.
  • +
  • And lists can contain character strings.
  • +
+

PYTHON +

+
import pandas as pd
+for filename in ['data/gapminder_gdp_africa.csv', 'data/gapminder_gdp_asia.csv']:
+    data = pd.read_csv(filename, index_col='country')
+    print(filename, data.min())
+
+
+

OUTPUT +

+
data/gapminder_gdp_africa.csv gdpPercap_1952    298.846212
+gdpPercap_1957    335.997115
+gdpPercap_1962    355.203227
+gdpPercap_1967    412.977514
+⋮ ⋮ ⋮
+gdpPercap_1997    312.188423
+gdpPercap_2002    241.165877
+gdpPercap_2007    277.551859
+dtype: float64
+data/gapminder_gdp_asia.csv gdpPercap_1952    331
+gdpPercap_1957    350
+gdpPercap_1962    388
+gdpPercap_1967    349
+⋮ ⋮ ⋮
+gdpPercap_1997    415
+gdpPercap_2002    611
+gdpPercap_2007    944
+dtype: float64
+
+

Use glob.glob +to find sets of files whose names match a pattern.

+
  • In Unix, the term “globbing” means “matching a set of files with a +pattern”.
  • +
  • The most common patterns are: +
    • +* meaning “match zero or more characters”
    • +
    • +? meaning “match exactly one character”
    • +
  • +
  • Python’s standard library contains the glob +module to provide pattern matching functionality
  • +
  • The glob +module contains a function also called glob to match file +patterns
  • +
  • E.g., glob.glob('*.txt') matches all files in the +current directory whose names end with .txt.
  • +
  • Result is a (possibly empty) list of character strings.
  • +
+

PYTHON +

+
import glob
+print('all csv files in data directory:', glob.glob('data/*.csv'))
+
+
+

OUTPUT +

+
all csv files in data directory: ['data/gapminder_all.csv', 'data/gapminder_gdp_africa.csv', \
+'data/gapminder_gdp_americas.csv', 'data/gapminder_gdp_asia.csv', 'data/gapminder_gdp_europe.csv', \
+'data/gapminder_gdp_oceania.csv']
+
+
+

PYTHON +

+
print('all PDB files:', glob.glob('*.pdb'))
+
+
+

OUTPUT +

+
all PDB files: []
+
+
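The two wildcards can be tried out without any files on disk. This sketch uses the standard library's fnmatch module, which applies the same style of pattern to plain strings; the filenames in the list are made up for illustration.

```python
import fnmatch

names = ['data1.csv', 'data2.csv', 'data10.csv', 'notes.txt']

# * matches zero or more characters, so every .csv name is kept.
print(fnmatch.filter(names, '*.csv'))

# ? matches exactly one character, so data10.csv is left out.
print(fnmatch.filter(names, 'data?.csv'))
```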

Use glob and for to process batches of +files.

+
  • Helps a lot if the files are named and stored systematically and +consistently so that simple patterns will find the right data.
  • +
+

PYTHON +

+
for filename in glob.glob('data/gapminder_*.csv'):
+    data = pd.read_csv(filename)
+    print(filename, data['gdpPercap_1952'].min())
+
+
+

OUTPUT +

+
data/gapminder_all.csv 298.8462121
+data/gapminder_gdp_africa.csv 298.8462121
+data/gapminder_gdp_americas.csv 1397.717137
+data/gapminder_gdp_asia.csv 331.0
+data/gapminder_gdp_europe.csv 973.5331948
+data/gapminder_gdp_oceania.csv 10039.59564
+
+
  • This includes all data, as well as per-region data.
  • +
  • Use a more specific pattern in the exercises to exclude the whole +data set.
  • +
  • But note that the minimum of the entire data set is also the minimum +of one of the data sets, which is a nice check on correctness.
  • +
+
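The order in which glob.glob returns filenames can depend on the file system, so wrapping the call in sorted is a common way to get a repeatable order. The sketch below creates a few made-up files in a temporary folder so that it can run anywhere; the filenames and the pattern are assumptions for illustration only.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Create three small, made-up files to search for.
    for name in ['survey_b.csv', 'survey_a.csv', 'readme.txt']:
        with open(os.path.join(folder, name), 'w') as handle:
            handle.write('value\n1\n')

    # Match only the survey files, in sorted order.
    pattern = os.path.join(folder, 'survey_*.csv')
    matched = [os.path.basename(path) for path in sorted(glob.glob(pattern))]

print(matched)
```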
+ +
+
+

Determining Matches

+
+

Which of these files is not matched by the expression +glob.glob('data/*as*.csv')?

+
  1. data/gapminder_gdp_africa.csv
  2. +
  3. data/gapminder_gdp_americas.csv
  4. +
  5. data/gapminder_gdp_asia.csv
  6. +
+
+
+
+
+ +
+
+

1 is not matched by the glob.

+
+
+
+
+
+
+ +
+
+

Minimum File Size

+
+

Modify this program so that it prints the number of records in the +file that has the fewest records.

+
+

PYTHON +

+
import glob
+import pandas as pd
+fewest = ____
+for filename in glob.glob('data/*.csv'):
+    dataframe = pd.____(filename)
+    fewest = min(____, dataframe.shape[0])
+print('smallest file has', fewest, 'records')
+
+

Note that the DataFrame.shape +attribute is a tuple with the number of rows and columns of the +data frame.

+
+
+
+
+
+ +
+
+
+

PYTHON +

+
import glob
+import pandas as pd
+fewest = float('Inf')
+for filename in glob.glob('data/*.csv'):
+    dataframe = pd.read_csv(filename)
+    fewest = min(fewest, dataframe.shape[0])
+print('smallest file has', fewest, 'records')
+
+

You might have chosen to initialize the fewest variable +with a number greater than the numbers you’re dealing with, but that +could lead to trouble if you reuse the code with bigger numbers. Python +lets you use positive infinity, which will work no matter how big your +numbers are. What other special strings does the float +function recognize?

+
+
+
+
+
+
+ +
+
+

Comparing Data

+
+

Write a program that reads in the regional data sets and plots the +average GDP per capita for each region over time in a single chart. +Pandas will raise an error if it encounters non-numeric columns in a +dataframe computation so you may need to either filter out those columns +or tell pandas to ignore them.

+
+
+
+
+
+ +
+
+

This solution builds a useful legend by using the string +split method to extract the region from +the path ‘data/gapminder_gdp_a_specific_region.csv’.

+
+

PYTHON +

+
import glob
+import pandas as pd
+import matplotlib.pyplot as plt
+fig, ax = plt.subplots(1,1)
+for filename in glob.glob('data/gapminder_gdp*.csv'):
+    dataframe = pd.read_csv(filename)
+    # extract <region> from the filename, expected to be in the format 'data/gapminder_gdp_<region>.csv'.
+    # we will split the string using the split method and `_` as our separator,
+    # retrieve the last string in the list that split returns (`<region>.csv`), 
+    # and then remove the `.csv` extension from that string.
+    # NOTE: the pathlib module covered in the next callout also offers
+    # convenient abstractions for working with filesystem paths and could solve this as well:
+    # from pathlib import Path
+    # region = Path(filename).stem.split('_')[-1]
+    region = filename.split('_')[-1][:-4] 
+    # pandas raises errors when it encounters non-numeric columns in a dataframe computation
+    # but we can tell pandas to ignore them with the `numeric_only` parameter
+    dataframe.mean(numeric_only=True).plot(ax=ax, label=region)
+    # NOTE: another way of doing this selects just the columns with gdp in their name using the filter method
+    # dataframe.filter(like="gdp").mean().plot(ax=ax, label=region)
+
+plt.legend()
+plt.show()
+
+
+
+
+
+
+
+ +
+
+

Dealing with File Paths

+
+

The pathlib +module provides useful abstractions for file and path manipulation +like returning the name of a file without the file extension. This is +very useful when looping over files and directories. In the example +below, we create a Path object and inspect its +attributes.

+
+

PYTHON +

+
from pathlib import Path
+
+p = Path("data/gapminder_gdp_africa.csv")
+print(p.parent)
+print(p.stem)
+print(p.suffix)
+
+
+

OUTPUT +

+
data
+gapminder_gdp_africa
+.csv
+
+

Hint: Check all available attributes and methods on +the Path object with the dir() function.

+
+
+
+
+
+ +
+
+

Key Points

+
+
  • Use a for loop to process files given a list of their +names.
  • +
  • Use glob.glob to find sets of files whose names match a +pattern.
  • +
  • Use glob and for to process batches of +files.
  • +
+
+
+
+
+ + +
+
+ + + diff --git a/instructor/15-coffee.html b/instructor/15-coffee.html new file mode 100644 index 000000000..23d840137 --- /dev/null +++ b/instructor/15-coffee.html @@ -0,0 +1,551 @@ + +Plotting and Programming in Python: Afternoon Coffee +
+ Plotting and Programming in Python +
+ +
+
+ + + + + +
+
+

Afternoon Coffee

+

Last updated on 2024-10-18

+ + + +

Estimated time: 0 minutes

+ +
+ +
+ + +

Reflection exercise

+

Over break, reflect on and discuss the following:

+
  • A common refrain in software engineering is “Don’t Repeat Yourself”. +How do the techniques we’ve learned in the last lessons help us avoid +repeating ourselves? Note that in practice there is some nuance to +this: it should be balanced against doing the simplest thing that could +possibly work. +
  • +
  • What are the pros / cons of making a variable global or local to a +function?
  • +
  • When would you consider turning a block of code into a function +definition?
  • +
+
+ + +
+
+ + + diff --git a/instructor/16-writing-functions.html b/instructor/16-writing-functions.html new file mode 100644 index 000000000..ab0c681a2 --- /dev/null +++ b/instructor/16-writing-functions.html @@ -0,0 +1,1382 @@ + +Plotting and Programming in Python: Writing Functions +
+ Plotting and Programming in Python +
+ +
+
+ + + + + +
+
+

Writing Functions

+

Last updated on 2024-10-18

+ + + +

Estimated time: 25 minutes

+ +
+ +
+ + + +
+

Overview

+
+
+
+
+

Questions

+
  • How can I create my own functions?
  • +
+
+
+
+
+
+

Objectives

+
  • Explain and identify the difference between function definition and +function call.
  • +
  • Write a function that takes a small, fixed number of arguments and +produces a single result.
  • +
+
+
+
+
+

Break programs down into functions to make them easier to +understand.

+
  • Human beings can only keep a few items in working memory at a +time.
  • +
  • Understand larger/more complicated ideas by understanding and +combining pieces. +
    • Components in a machine.
    • +
    • Lemmas when proving theorems.
    • +
  • +
  • Functions serve the same purpose in programs. +
    • +Encapsulate complexity so that we can treat it as a single +“thing”.
    • +
  • +
  • Also enables re-use. +
    • Write one time, use many times.
    • +
  • +

Define a function using def with a name, parameters, +and a block of code.

+
  • Begin the definition of a new function with def.
  • +
  • Followed by the name of the function. +
    • Must obey the same rules as variable names.
    • +
  • +
  • Then parameters in parentheses. +
    • Empty parentheses if the function doesn’t take any inputs.
    • +
    • We will discuss this in detail in a moment.
    • +
  • +
  • Then a colon.
  • +
  • Then an indented block of code.
  • +
+

PYTHON +

+
def print_greeting():
+    print('Hello!')
+    print('The weather is nice today.')
+    print('Right?')
+
+

Defining a function does not run it.

+
  • Defining a function does not run it. +
    • Like assigning a value to a variable.
    • +
  • +
  • Must call the function to execute the code it contains.
  • +
+

PYTHON +

+
print_greeting()
+
+
+

OUTPUT +

+
Hello!
+The weather is nice today.
+Right?
+
+

Arguments in a function call are matched to its defined +parameters.

+
  • Functions are most useful when they can operate on different +data.
  • +
  • Specify parameters when defining a function. +
    • These become variables when the function is executed.
    • +
    • Are assigned the arguments in the call (i.e., the values passed to +the function).
    • +
    • If you don’t name the arguments when using them in the call, the +arguments will be matched to parameters in the order the parameters are +defined in the function.
    • +
  • +
+

PYTHON +

+
def print_date(year, month, day):
+    joined = str(year) + '/' + str(month) + '/' + str(day)
+    print(joined)
+
+print_date(1871, 3, 19)
+
+
+

OUTPUT +

+
1871/3/19
+
+

Or, we can name the arguments when we call the function, which allows +us to specify them in any order and adds clarity to the call site; +otherwise, a reader of the code might forget whether the second +argument is the month or the day.

+
+

PYTHON +

+
print_date(month=3, day=19, year=1871)
+
+
+

OUTPUT +

+
1871/3/19
+
+
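Positional and named arguments can also be mixed in one call, as long as the positional ones come first. The sketch below repeats the print_date definition so that it runs on its own; the variable error_name is introduced only for illustration.

```python
def print_date(year, month, day):
    joined = str(year) + '/' + str(month) + '/' + str(day)
    print(joined)

# Positional arguments first, then named ones in any order.
print_date(1871, day=19, month=3)

# Naming a parameter that a positional argument has already filled is an error.
error_name = None
try:
    print_date(1871, 3, month=3)
except TypeError as err:
    error_name = type(err).__name__
    print(error_name, err)
```

In the second call, 3 is matched to month by position and then month is named again, so Python raises a TypeError instead of guessing which value was meant.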
  • Via Twitter: +() contains the ingredients for the function while the body +contains the recipe.
  • +

Functions may return a result to their caller using +return.

+
  • Use return ... to give a value back to the caller.
  • +
  • May occur anywhere in the function.
  • +
  • But functions are easier to understand if return +occurs: +
    • At the start to handle special cases.
    • +
    • At the very end, with a final result.
    • +
  • +
+

PYTHON +

+
def average(values):
+    if len(values) == 0:
+        return None
+    return sum(values) / len(values)
+
+
+

PYTHON +

+
a = average([1, 3, 4])
+print('average of actual values:', a)
+
+
+

OUTPUT +

+
average of actual values: 2.6666666666666665
+
+
+

PYTHON +

+
print('average of empty list:', average([]))
+
+
+

OUTPUT +

+
average of empty list: None
+
+
+

PYTHON +

+
result = print_date(1871, 3, 19)
+print('result of call is:', result)
+
+
+

OUTPUT +

+
1871/3/19
+result of call is: None
+
+
+
+ +
+
+

Identifying Syntax Errors

+
+
  1. Read the code below and try to identify what the errors are +without running it.
  2. +
  3. Run the code and read the error message. Is it a +SyntaxError or an IndentationError?
  4. +
  5. Fix the error.
  6. +
  7. Repeat steps 2 and 3 until you have fixed all the errors.
  8. +
+

PYTHON +

+
def another_function
+  print("Syntax errors are annoying.")
+   print("But at least python tells us about them!")
+  print("So they are usually not too hard to fix.")
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
def another_function():
+  print("Syntax errors are annoying.")
+  print("But at least Python tells us about them!")
+  print("So they are usually not too hard to fix.")
+
+
+
+
+
+
+
+ +
+
+

Definition and Use

+
+

What does the following program print?

+
+

PYTHON +

+
def report(pressure):
+    print('pressure is', pressure)
+
+print('calling', report, 22.5)
+
+
+
+
+
+
+ +
+
+
+

OUTPUT +

+
calling <function report at 0x7fd128ff1bf8> 22.5
+
+

A function call always needs parentheses; otherwise you get the +function object itself, whose printed form includes its memory address. So, if we wanted to call the function +named report, and give it the value 22.5 to report on, we could write our +function call as follows:

+
+

PYTHON +

+
print("calling")
+report(22.5)
+
+
+

OUTPUT +

+
calling
+pressure is 22.5
+
+
+
+
+
+
+
+ +
+
+

Order of Operations

+
+
  1. What’s wrong in this example?
  2. +
+

PYTHON +

+
result = print_time(11, 37, 59)
+
+def print_time(hour, minute, second):
+   time_string = str(hour) + ':' + str(minute) + ':' + str(second)
+   print(time_string)
+
+
  1. After fixing the problem above, explain why running this example +code:
  2. +
+

PYTHON +

+
result = print_time(11, 37, 59)
+print('result of call is:', result)
+
+

gives this output:

+
+

OUTPUT +

+
11:37:59
+result of call is: None
+
+
  1. Why is the result of the call None?
  2. +
+
+
+
+
+ +
+
+
  1. The problem with the example is that the function +print_time() is defined after the call to the +function is made. Python doesn’t know how to resolve the name +print_time since it hasn’t been defined yet and will raise +a NameError e.g., +NameError: name 'print_time' is not defined

  2. +
  3. The first line of output, 11:37:59, is printed when the +first line of code, result = print_time(11, 37, 59), calls +print_time; that line also binds the value returned by print_time to the +variable result. The second line of output comes from the print +call on the second line of code, which prints the contents of the result variable.

  4. +
  5. print_time() does not explicitly return +a value, so it automatically returns None.

  6. +
+
+
+
+
+
+ +
+
+

Encapsulation

+
+

Fill in the blanks to create a function that takes a single filename +as an argument, loads the data in the file named by the argument, and +returns the minimum value in that data.

+
+

PYTHON +

+
import pandas as pd
+
+def min_in_data(____):
+    data = ____
+    return ____
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
import pandas as pd
+
+def min_in_data(filename):
+    data = pd.read_csv(filename)
+    return data.min()
+
+
+
+
+
+
+
+ +
+
+

Find the First

+
+

Fill in the blanks to create a function that takes a list of numbers +as an argument and returns the first negative value in the list. What +does your function do if the list is empty? What if the list has no +negative numbers?

+
+

PYTHON +

+
def first_negative(values):
+    for v in ____:
+        if ____:
+            return ____
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
def first_negative(values):
+    for v in values:
+        if v < 0:
+            return v
+
+

If an empty list or a list with no negative values is passed to this +function, it returns None:

+
+

PYTHON +

+
my_list = []
+print(first_negative(my_list))
+
+
+

OUTPUT +

+
None
+
+
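The other case from the question can be checked the same way. This sketch repeats the function so that it runs on its own; the test lists are made up for illustration.

```python
def first_negative(values):
    for v in values:
        if v < 0:
            return v

# No negative values: the loop finishes without reaching return.
print(first_negative([3, 0, 7]))

# Several negative values: only the first one is returned.
print(first_negative([3, -2, 7, -5]))
```

When the loop ends without hitting return, the function falls off its end and gives back None, just as it does for the empty list.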
+
+
+
+
+
+ +
+
+

Calling by Name

+
+

Earlier we saw this function:

+
+

PYTHON +

+
def print_date(year, month, day):
+    joined = str(year) + '/' + str(month) + '/' + str(day)
+    print(joined)
+
+

We saw that we can call the function using named arguments, +like this:

+
+

PYTHON +

+
print_date(day=1, month=2, year=2003)
+
+
  1. What does print_date(day=1, month=2, year=2003) +print?
  2. +
  3. When have you seen a function call like this before?
  4. +
  5. When and why is it useful to call functions this way?
  6. +
+
+
+
+
+ +
+
+
  1. 2003/2/1
  2. +
  3. We saw examples of using named arguments when working with +the pandas library. For example, when reading in a dataset using +data = pd.read_csv('data/gapminder_gdp_europe.csv', index_col='country'), +the last argument index_col is a named argument.
  4. +
  5. Using named arguments can make code more readable since one can see +from the function call what name the different arguments have inside the +function. It can also reduce the chances of passing arguments in the +wrong order, since by using named arguments the order doesn’t +matter.
  6. +
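To make the ordering point concrete, here is a minimal sketch. format_date is a hypothetical variant of print_date that returns the string instead of printing it, so the two calling styles can be compared directly.

```python
def format_date(year, month, day):
    # Hypothetical variant of print_date: returns the joined string.
    return str(year) + '/' + str(month) + '/' + str(day)


# Named arguments: the order in the call does not matter.
by_name = format_date(day=1, month=2, year=2003)

# Positional arguments in the same order are matched to
# year, month, day in turn, which is not what was intended.
by_position = format_date(1, 2, 2003)

print(by_name)
print(by_position)
```

The positional call runs without error but produces a different date, which is the kind of mistake named arguments help avoid.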
+
+
+
+
+
+ +
+
+

Encapsulation of an If/Print Block

+
+

The code below will run on a label-printer for chicken eggs. A +digital scale will report a chicken egg mass (in grams) to the computer +and then the computer will print a label.

+
+

PYTHON +

+
import random
+for i in range(10):
+
+    # simulating the mass of a chicken egg
+    # the (random) mass will be 70 +/- 20 grams
+    mass = 70 + 20.0 * (2.0 * random.random() - 1.0)
+
+    print(mass)
+
+    # egg sizing machinery prints a label
+    if mass >= 85:
+        print("jumbo")
+    elif mass >= 70:
+        print("large")
+    elif mass < 70 and mass >= 55:
+        print("medium")
+    else:
+        print("small")
+
+

The if-block that classifies the eggs might be useful in other +situations, so to avoid repeating it, we could fold it into a function, +get_egg_label(). Revising the program to use the function +would give us this:

+
+

PYTHON +

+
# revised version
+import random
+for i in range(10):
+
+    # simulating the mass of a chicken egg
+    # the (random) mass will be 70 +/- 20 grams
+    mass = 70 + 20.0 * (2.0 * random.random() - 1.0)
+
+    print(mass, get_egg_label(mass))
+
+
  1. Create a function definition for get_egg_label() that +will work with the revised program above. Note that the +get_egg_label() function’s return value will be important. +Sample output from the above program would be +71.23 large.
  2. +
  3. A dirty egg might have a mass of more than 90 grams, and a spoiled +or broken egg will probably have a mass that’s less than 50 grams. +Modify your get_egg_label() function to account for these +error conditions. Sample output could be +25 too light, probably spoiled.
  4. +
+
+
+
+
+ +
+
+
+

PYTHON +

+
def get_egg_label(mass):
+    # egg sizing machinery prints a label
+    egg_label = "Unlabelled"
+    if mass >= 90:
+        egg_label = "warning: egg might be dirty"
+    elif mass >= 85:
+        egg_label = "jumbo"
+    elif mass >= 70:
+        egg_label = "large"
+    elif mass < 70 and mass >= 55:
+        egg_label = "medium"
+    elif mass < 50:
+        egg_label = "too light, probably spoiled"
+    else:
+        egg_label = "small"
+    return egg_label
+
+
+
+
+
+
+
+ +
+
+

Encapsulating Data Analysis

+
+

Assume that the following code has been executed:

+
+

PYTHON +

+
import pandas as pd
+
+data_asia = pd.read_csv('data/gapminder_gdp_asia.csv', index_col=0)
+japan = data_asia.loc['Japan']
+
+
  1. Complete the statements below to obtain the average GDP for Japan +across the years reported for the 1980s.
  2. +
+

PYTHON +

+
year = 1983
+gdp_decade = 'gdpPercap_' + str(year // ____)
+avg = (japan.loc[gdp_decade + ___] + japan.loc[gdp_decade + ___]) / 2
+
+
  1. Abstract the code above into a single function.
  2. +
+

PYTHON +

+
def avg_gdp_in_decade(country, continent, year):
+    data_countries = pd.read_csv('data/gapminder_gdp_'+___+'.csv',delimiter=',',index_col=0)
+    ____
+    ____
+    ____
+    return avg
+
+
  1. How would you generalize this function if you did not know +beforehand which specific years occurred as columns in the data? For +instance, what if we also had data from years ending in 1 and 9 for each +decade? (Hint: use the columns to filter out the ones that correspond to +the decade, instead of enumerating them in the code.)
  2. +
+
+
+
+
+ +
+
+
  1. The average GDP for Japan across the years reported for the 1980s is +computed with:
  2. +
+

PYTHON +

+
year = 1983
+gdp_decade = 'gdpPercap_' + str(year // 10)
+avg = (japan.loc[gdp_decade + '2'] + japan.loc[gdp_decade + '7']) / 2
+
+
  1. That code as a function is:
  2. +
+

PYTHON +

+
def avg_gdp_in_decade(country, continent, year):
+    data_countries = pd.read_csv('data/gapminder_gdp_' + continent + '.csv', index_col=0)
+    c = data_countries.loc[country]
+    gdp_decade = 'gdpPercap_' + str(year // 10)
+    avg = (c.loc[gdp_decade + '2'] + c.loc[gdp_decade + '7'])/2
+    return avg
+
+
  1. To obtain the average for the relevant years, we need to loop over +them:
  2. +
+

PYTHON +

+
def avg_gdp_in_decade(country, continent, year):
+    data_countries = pd.read_csv('data/gapminder_gdp_' + continent + '.csv', index_col=0)
+    c = data_countries.loc[country]
+    gdp_decade = 'gdpPercap_' + str(year // 10)
+    total = 0.0
+    num_years = 0
+    for yr_header in c.index: # c's index contains reported years
+        if yr_header.startswith(gdp_decade):
+            total = total + c.loc[yr_header]
+            num_years = num_years + 1
+    return total/num_years
+
+

The function can now be called by:

+
+

PYTHON +

+
avg_gdp_in_decade('Japan','asia',1983)
+
+
+

OUTPUT +

+
20880.023800000003
+
+
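The prefix trick used in these solutions can be seen in isolation with a plain list of column names. This is only a sketch: the list below is a hand-typed subset of headers in the style of the gapminder files, not something read from the data.

```python
# Hand-typed sample of column headers (an assumption for illustration).
columns = ['gdpPercap_1977', 'gdpPercap_1982',
           'gdpPercap_1987', 'gdpPercap_1992']

year = 1983
# Integer division drops the last digit: 1983 // 10 is 198.
gdp_decade = 'gdpPercap_' + str(year // 10)

# Every header that starts with the prefix belongs to the same decade.
matching = [name for name in columns if name.startswith(gdp_decade)]

print(gdp_decade)
print(matching)
```

Any year from 1980 to 1989 gives the same prefix, which is why the function accepts an arbitrary year within the decade.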
+
+
+
+
+
+ +
+
+

Simulating a dynamical system

+
+

In mathematics, a dynamical +system is a system in which a function describes the time dependence +of a point in a geometrical space. A canonical example of a dynamical +system is the logistic map, a +growth model that computes a new population density (between 0 and 1) +based on the current density. In the model, time takes discrete values +0, 1, 2, …

+
  1. Define a function called logistic_map that takes two +inputs: x, representing the current population (at time +t), and a parameter r = 1. This function +should return a value representing the state of the system (population) +at time t + 1, using the mapping function:
  2. +

f(t+1) = r * f(t) * [1 - f(t)]

+
  1. Using a for or while loop, iterate the +logistic_map function defined in part 1, starting from an +initial population of 0.5, for a period of time +t_final = 10. Store the intermediate results in a list so +that after the loop terminates you have accumulated a sequence of values +representing the state of the logistic map at times +t = [0,1,...,t_final] (11 values in total). Print this list +to see the evolution of the population.

  2. +
  3. Encapsulate the logic of your loop into a function called +iterate that takes the initial population as its first +input, the parameter t_final as its second input and the +parameter r as its third input. The function should return +the list of values representing the state of the logistic map at times +t = [0,1,...,t_final]. Run this function for periods +t_final = 100 and 1000 and print some of the +values. Is the population trending toward a steady state?

  4. +
+
+
+
+
+ +
+
+
  1. +

    PYTHON +

    +
    def logistic_map(x, r):
    +    return r * x * (1 - x)
    +
  2. +
  3. +

    PYTHON +

    +
    initial_population = 0.5
    +t_final = 10
    +r = 1.0
    +population = [initial_population]
    +
    +for t in range(t_final):
    +    population.append( logistic_map(population[t], r) )
    +
  4. +
  5. +
    +

    PYTHON +

    +
    def iterate(initial_population, t_final, r):
    +    population = [initial_population]
    +    for t in range(t_final):
    +        population.append( logistic_map(population[t], r) )
    +    return population
    +
    +for period in (10, 100, 1000):
    +    population = iterate(0.5, period, 1)
    +    print(population[-1])
    +
    +
    +

    OUTPUT +

    +
    0.06945089389714401
    +0.009395779870614648
    +0.0009913908614406382
    +
    +The population seems to be approaching zero.
  6. +
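As a further experiment, the same two functions can be run with a different growth parameter. The value r = 2.5 is not part of the exercise; it is chosen here as an assumption to show that the steady state depends on r. For this value the map has a fixed point at 1 - 1/r = 0.6, since 2.5 * 0.6 * 0.4 = 0.6.

```python
def logistic_map(x, r):
    return r * x * (1 - x)


def iterate(initial_population, t_final, r):
    population = [initial_population]
    for t in range(t_final):
        population.append(logistic_map(population[t], r))
    return population


# Compare the final state after 100 steps for two values of r.
final_states = {}
for r in (1.0, 2.5):
    final_states[r] = iterate(0.5, 100, r)[-1]
    print(r, final_states[r])
```

With r = 1 the population decays slowly toward zero, while with r = 2.5 it settles at a non-zero density.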
+
+
+
+
+
+ +
+
+

Using Functions With Conditionals in Pandas

+
+

Functions will often contain conditionals. Here is a short example +that will indicate which quartile the argument is in based on hand-coded +values for the quartile cut points.

+
+

PYTHON +

+
def calculate_life_quartile(exp):
+    if exp < 58.41:
+        # This observation is in the first quartile
+        return 1
+    elif exp >= 58.41 and exp < 67.05:
+        # This observation is in the second quartile
+        return 2
+    elif exp >= 67.05 and exp < 71.70:
+        # This observation is in the third quartile
+        return 3
+    elif exp >= 71.70:
+        # This observation is in the fourth quartile
+        return 4
+    else:
+        # This observation has bad data
+        return None
+
+calculate_life_quartile(62.5)
+
+
+

OUTPUT +

+
2
+
+

That function would typically be used within a for loop, +but Pandas has a different, more efficient way of doing the same thing, +and that is by applying a function to a dataframe or a portion +of a dataframe. Here is an example, using the definition above.

+
+

PYTHON +

+
data = pd.read_csv('data/gapminder_all.csv')
+data['life_qrtl'] = data['lifeExp_1952'].apply(calculate_life_quartile)
+
+

There is a lot in that second line, so let’s take it piece by piece. On the right side of the = we start with data['lifeExp_1952'], which is the column labeled lifeExp_1952 in the dataframe called data. We use apply() to do what it says: apply calculate_life_quartile to the value of this column for every row in the dataframe.
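The same pattern can be tried without the data file. The sketch below assumes pandas is installed and uses a tiny hand-made dataframe; the row labels and life expectancy values are invented for illustration and are not taken from the gapminder data.

```python
import pandas as pd


def calculate_life_quartile(exp):
    if exp < 58.41:
        return 1
    elif exp < 67.05:
        return 2
    elif exp < 71.70:
        return 3
    elif exp >= 71.70:
        return 4
    else:
        # Reached only for values that fail every comparison, such as NaN.
        return None


# Invented values, one per quartile, indexed by placeholder labels.
data = pd.DataFrame(
    {'lifeExp_1952': [45.0, 62.5, 69.0, 75.3]},
    index=['A', 'B', 'C', 'D'],
)

# apply() calls the function once per value in the column.
data['life_qrtl'] = data['lifeExp_1952'].apply(calculate_life_quartile)

# The equivalent explicit loop, for comparison.
looped = [calculate_life_quartile(v) for v in data['lifeExp_1952']]

print(data)
```

Both approaches give the same labels. apply() simply keeps the result aligned with the dataframe's index.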

+
+
+
+
+
+ +
+
+

Key Points

+
+
  • Break programs down into functions to make them easier to +understand.
  • +
  • Define a function using def with a name, parameters, +and a block of code.
  • +
  • Defining a function does not run it.
  • +
  • Arguments in a function call are matched to its defined +parameters.
  • +
  • Functions may return a result to their caller using +return.
  • +
+
+
+
+
+ + +
+
+ + + diff --git a/instructor/17-scope.html b/instructor/17-scope.html new file mode 100644 index 000000000..115a0fb17 --- /dev/null +++ b/instructor/17-scope.html @@ -0,0 +1,714 @@ + +Plotting and Programming in Python: Variable Scope +
+ Plotting and Programming in Python +
+ +
+
+ + + + + +
+
+

Variable Scope

+

Last updated on 2024-10-18

+ + + +

Estimated time: 20 minutes

+ +
+ +
+ + + +
+

Overview

+
+
+
+
+

Questions

+
  • How do function calls actually work?
  • +
  • How can I determine where errors occurred?
  • +
+
+
+
+
+
+

Objectives

+
  • Identify local and global variables.
  • +
  • Identify parameters as local variables.
  • +
  • Read a traceback and determine the file, function, and line number +on which the error occurred, the type of error, and the error +message.
  • +
+
+
+
+
+

The scope of a variable is the part of a program that can ‘see’ that +variable.

+
  • There are only so many sensible names for variables.
  • +
  • People using functions shouldn’t have to worry about what variable +names the author of the function used.
  • +
  • People writing functions shouldn’t have to worry about what variable +names the function’s caller uses.
  • +
  • The part of a program in which a variable is visible is called its +scope.
  • +
+

PYTHON +

+
pressure = 103.9
+
+def adjust(t):
+    temperature = t * 1.43 / pressure
+    return temperature
+
+
  • +pressure is a global variable. +
    • Defined outside any particular function.
    • +
    • Visible everywhere.
    • +
  • +
  • +t and temperature are local +variables in adjust. +
    • Defined in the function.
    • +
    • Not visible in the main program.
    • +
    • Remember: a function parameter is a variable that is automatically +assigned a value when the function is called.
    • +
  • +
+

PYTHON +

+
print('adjusted:', adjust(0.9))
+print('temperature after call:', temperature)
+
+
+

OUTPUT +

+
adjusted: 0.01238691049085659
+
+
+

ERROR +

+
Traceback (most recent call last):
+  File "/Users/swcarpentry/foo.py", line 8, in <module>
+    print('temperature after call:', temperature)
+NameError: name 'temperature' is not defined
+
+
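The same behaviour can be observed without stopping the program by catching the exception. This is a sketch for experimentation, and the try/except wrapper is an addition that is not part of the lesson's example.

```python
pressure = 103.9


def adjust(t):
    # 'temperature' and 't' exist only while adjust() is running.
    temperature = t * 1.43 / pressure
    return temperature


adjusted = adjust(0.9)
print('adjusted:', adjusted)

try:
    # The local variable is gone once the call has returned.
    print('temperature after call:', temperature)
except NameError as error:
    message = str(error)
    print('NameError:', message)
```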
+
+ +
+
+

Local and Global Variable Use

+
+

Trace the values of all variables in this program as it is executed. +(Use ‘—’ as the value of variables before and after they exist.)

+
+

PYTHON +

+
limit = 100
+
+def clip(value):
+    return min(max(0.0, value), limit)
+
+value = -22.5
+print(clip(value))
+
+
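One detail worth checking while tracing is that the parameter named value and the global named value are two separate variables. The sketch below is a variant of the program above, not the exercise itself: it reassigns the parameter inside the function to show that the global is unaffected.

```python
limit = 100


def clip(value):
    # 'value' here is the parameter, a local variable.
    # 'limit' is not defined locally, so it is looked up globally.
    value = min(max(0.0, value), limit)
    return value


value = -22.5
clipped = clip(value)

print('clipped:', clipped)
print('global value:', value)
```

Reassigning the local name changes only what the function returns. The global value still holds -22.5 after the call.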
+
+
+
+
+ +
+
+

Reading Error Messages

+
+

Read the traceback below, and identify the following:

+
  1. How many levels does the traceback have?
  2. +
  3. What is the file name where the error occurred?
  4. +
  5. What is the function name where the error occurred?
  6. +
  7. On which line number in this function did the error occur?
  8. +
  9. What is the type of error?
  10. +
  11. What is the error message?
  12. +
+

ERROR +

+
---------------------------------------------------------------------------
+KeyError                                  Traceback (most recent call last)
+<ipython-input-2-e4c4cbafeeb5> in <module>()
+      1 import errors_02
+----> 2 errors_02.print_friday_message()
+
+/Users/ghopper/thesis/code/errors_02.py in print_friday_message()
+     13
+     14 def print_friday_message():
+---> 15     print_message("Friday")
+
+/Users/ghopper/thesis/code/errors_02.py in print_message(day)
+      9         "sunday": "Aw, the weekend is almost over."
+     10     }
+---> 11     print(messages[day])
+     12
+     13
+
+KeyError: 'Friday'
+
+
+
+
+
+
+ +
+
+
  1. Three levels.
  2. +
  3. errors_02.py
  4. +
  5. print_message
  6. +
  7. Line 11
  8. +
  9. +KeyError. These errors occur when we are trying to look +up a key that does not exist (usually in a data structure such as a +dictionary). We can find more information about the +KeyError and other built-in exceptions in the Python +docs.
  10. +
  11. KeyError: 'Friday'
  12. +
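The error in the traceback can be reproduced in a few lines. The sketch below is an assumption about what errors_02.py might contain: only the "sunday" entry is visible in the traceback, so the "friday" entry and the get_message helper are hypothetical.

```python
messages = {
    "friday": "Placeholder message for Friday.",  # hypothetical entry
    "sunday": "Aw, the weekend is almost over.",
}


def print_message(day):
    print(messages[day])


try:
    # The keys are lowercase, so "Friday" is not found.
    print_message("Friday")
except KeyError as error:
    missing_key = error.args[0]
    print("KeyError:", error)


def get_message(day):
    # One possible fix: normalise the case before the lookup.
    return messages[day.lower()]


print(get_message("Friday"))
```

The exception carries the key that was not found, which is why the last line of the traceback reads KeyError: 'Friday'.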
+
+
+
+
+
+ +
+
+

Key Points

+
+
  • The scope of a variable is the part of a program that can ‘see’ that +variable.
  • +
+
+
+
+
+ + +
+
+ + + diff --git a/instructor/18-style.html b/instructor/18-style.html new file mode 100644 index 000000000..282263298 --- /dev/null +++ b/instructor/18-style.html @@ -0,0 +1,857 @@ + +Plotting and Programming in Python: Programming Style +
+ Plotting and Programming in Python +
+ +
+
+ + + + + +
+
+

Programming Style

+

Last updated on 2024-10-18

+ + + +

Estimated time: 30 minutes

+ +
+ +
+ + + +
+

Overview

+
+
+
+
+

Questions

+
  • How can I make my programs more readable?
  • +
  • How do most programmers format their code?
  • +
  • How can programs check their own operation?
  • +
+
+
+
+
+
+

Objectives

+
  • Provide sound justifications for basic rules of coding style.
  • +
  • Refactor one-page programs to make them more readable and justify +the changes.
  • +
  • Use Python community coding standards (PEP-8).
  • +
+
+
+
+
+

Coding style

+

A consistent coding style helps others (including our future selves) +read and understand code more easily. Code is read much more often than +it is written, and as the Zen of Python +states, “Readability counts”. Python proposed a standard style through +one of its first Python Enhancement Proposals (PEP), PEP8.

+

Some points worth highlighting:

+
  • document your code and ensure that assumptions, internal algorithms, +expected inputs, expected outputs, etc., are clear
  • +
  • use clear, semantically meaningful variable names
  • +
  • use spaces, not tabs, to indent lines (tabs can cause problems across different text editors, operating systems, and version control systems)
  • +

Follow standard Python style in your code.

+
  • +PEP8: a style +guide for Python that discusses topics such as how to name variables, +how to indent your code, how to structure your import +statements, etc. Adhering to PEP8 makes it easier for other Python +developers to read and understand your code, and to understand what +their contributions should look like.
  • +
  • To check your code for compliance with PEP8, you can use the pycodestyle application. Tools like the black code formatter can automatically format your code to conform to PEP8 and pycodestyle (a Jupyter notebook formatter, nb_black, also exists).
  • +
  • Some groups and organizations follow different style guidelines +besides PEP8. For example, the Google style +guide on Python makes slightly different recommendations. Google +wrote an application that can help you format your code in either their +style or PEP8 called yapf.
  • +
  • With respect to coding style, the key is consistency. Choose a style for your project, be it PEP8, the Google style, or something else, and do your best to ensure that you and anyone you are collaborating with stick to it. Consistency within a project is often more impactful than the particular style used. A consistent style will make your software easier to read and understand for others and for your future self.
  • +

Use assertions to check for internal errors.

+

Assertions are a simple but powerful method for making sure that the +context in which your code is executing is as you expect.

+
+

PYTHON +

+
def calc_bulk_density(mass, volume):
+    '''Return dry bulk density = powder mass / powder volume.'''
+    assert volume > 0
+    return mass / volume
+
+

If the assertion is False, the Python interpreter raises an AssertionError runtime exception. The source code for the expression that failed will be displayed as part of the error message. To disable assertions, run the interpreter with the ‘-O’ (optimize) switch. Assertions should contain only simple checks and never change the state of the program. For example, an assertion should never contain an assignment.
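A failing assertion can be observed safely by catching the exception. In this sketch the message after the comma and the sample masses and volumes are additions for illustration. The optional message form of assert is standard Python, but it is not used in the lesson's version of the function.

```python
def calc_bulk_density(mass, volume):
    '''Return dry bulk density = powder mass / powder volume.'''
    # The text after the comma becomes the AssertionError message.
    assert volume > 0, 'volume must be positive'
    return mass / volume


density = calc_bulk_density(2.6, 2.0)
print('density:', density)

try:
    calc_bulk_density(2.6, 0)
except AssertionError as error:
    failure = str(error)
    print('AssertionError:', failure)
```

When the interpreter is run with -O the assert line is skipped, so the second call would instead fail with a ZeroDivisionError.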

+

Use docstrings to provide builtin help.

+

If the first thing in a function is a character string that is not +assigned directly to a variable, Python attaches it to the function, +accessible via the builtin help function. This string that provides +documentation is also known as a docstring.

+
+

PYTHON +

+
def average(values):
+    "Return average of values, or None if no values are supplied."
+
+    if len(values) == 0:
+        return None
+    return sum(values) / len(values)
+
+help(average)
+
+
+

OUTPUT +

+
Help on function average in module __main__:
+
+average(values)
+    Return average of values, or None if no values are supplied.
+
+
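The attached string can also be inspected directly through the function's __doc__ attribute. In the sketch below, undocumented is a hypothetical function added to show that a comment does not count as a docstring.

```python
def average(values):
    "Return average of values, or None if no values are supplied."
    if len(values) == 0:
        return None
    return sum(values) / len(values)


def undocumented(values):
    # A comment is ignored by Python, so no docstring is attached.
    return values


print(average.__doc__)
print(undocumented.__doc__)
```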
+
+ +
+
+

Multiline Strings

+
+

Docstrings are often written as multiline strings. These start with three quote characters (either single or double) and end with three matching characters.

+
+

PYTHON +

+
"""This string spans
+multiple lines.
+
+Blank lines are allowed."""
+
+
+
+
+
+
+ +
+
+

What Will Be Shown?

+
+

Highlight the lines in the code below that will be available as +online help. Are there lines that should be made available, but won’t +be? Will any lines produce a syntax error or a runtime error?

+
+

PYTHON +

+
"Find maximum edit distance between multiple sequences."
+# This finds the maximum distance between all sequences.
+
+def overall_max(sequences):
+    '''Determine overall maximum edit distance.'''
+
+    highest = 0
+    for left in sequences:
+        for right in sequences:
+            '''Avoid checking sequence against itself.'''
+            if left != right:
+                this = edit_distance(left, right)
+                highest = max(highest, this)
+
+    # Report.
+    return highest
+
+
+
+
+
+
+ +
+
+

Document This

+
+

Use comments to describe and help others understand potentially +unintuitive sections or individual lines of code. They are especially +useful to whoever may need to understand and edit your code in the +future, including yourself.

+

Use docstrings to document the acceptable inputs and expected outputs +of a method or class, its purpose, assumptions and intended behavior. +Docstrings are displayed when a user invokes the builtin +help method on your method or class.

+

Turn the comment in the following function into a docstring and check +that help displays it properly.

+
+

PYTHON +

+
def middle(a, b, c):
+    # Return the middle value of three.
+    # Assumes the values can actually be compared.
+    values = [a, b, c]
+    values.sort()
+    return values[1]
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
def middle(a, b, c):
+    '''Return the middle value of three.
+    Assumes the values can actually be compared.'''
+    values = [a, b, c]
+    values.sort()
+    return values[1]
+
+
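To check a multiline docstring programmatically, the standard library's inspect.getdoc returns the text with the indentation cleaned up. The sketch below is a variant of the solution: the blank line after the summary and the closing quotes on their own line are stylistic choices made here, not part of the lesson's answer.

```python
import inspect


def middle(a, b, c):
    '''Return the middle value of three.

    Assumes the values can actually be compared.
    '''
    values = [a, b, c]
    values.sort()
    return values[1]


# getdoc() strips the common leading indentation and
# any leading or trailing blank lines.
doc = inspect.getdoc(middle)
print(doc)
```

The raw middle.__doc__ keeps the four spaces of indentation on the continuation lines, which is why help() and inspect.getdoc() are usually nicer for reading.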
+
+
+
+
+
+ +
+
+

Clean Up This Code

+
+
  1. Read this short program and try to predict what it does.
  2. +
  3. Run it: how accurate was your prediction?
  4. +
  5. Refactor the program to make it more readable. Remember to run it +after each change to ensure its behavior hasn’t changed.
  6. +
  7. Compare your rewrite with your neighbor’s. What did you do the same? +What did you do differently, and why?
  8. +
+

PYTHON +

+
n = 10
+s = 'et cetera'
+print(s)
+i = 0
+while i < n:
+    # print('at', j)
+    new = ''
+    for j in range(len(s)):
+        left = j-1
+        right = (j+1)%len(s)
+        if s[left]==s[right]: new = new + '-'
+        else: new = new + '*'
+    s=''.join(new)
+    print(s)
+    i += 1
+
+
+
+
+
+
+ +
+
+

Here’s one solution.

+
+

PYTHON +

+
def string_machine(input_string, iterations):
+    """
+    Takes input_string and generates a new string with -'s and *'s
+    corresponding to characters that have identical adjacent characters
+    or not, respectively.  Iterates through this procedure with the resultant
+    strings for the supplied number of iterations.
+    """
+    print(input_string)
+    input_string_length = len(input_string)
+    old = input_string
+    for i in range(iterations):
+        new = ''
+        # iterate through characters in previous string
+        for j in range(input_string_length):
+            left = j-1
+            right = (j+1) % input_string_length  # ensure right index wraps around
+            if old[left] == old[right]:
+                new = new + '-'
+            else:
+                new = new + '*'
+        print(new)
+        # store new string as old
+        old = new
+
+string_machine('et cetera', 10)
+
+
+

OUTPUT +

+
et cetera
+*****-***
+----*-*--
+---*---*-
+--*-*-*-*
+**-------
+***-----*
+--**---**
+*****-***
+----*-*--
+---*---*-
+
+
+
+
+
+
+
+ +
+
+

Key Points

+
+
  • Follow standard Python style in your code.
  • +
  • Use docstrings to provide builtin help.
  • +
+
+
+
+
+ + +
+
+ + + diff --git a/instructor/19-wrap.html b/instructor/19-wrap.html new file mode 100644 index 000000000..a71e49f7b --- /dev/null +++ b/instructor/19-wrap.html @@ -0,0 +1,598 @@ + +Plotting and Programming in Python: Wrap-Up +
+ Plotting and Programming in Python +
+ +
+
+ + + + + +
+
+

Wrap-Up

+

Last updated on 2024-10-18

+ + + +

Estimated time: 20 minutes

+ +
+ +
+ + + +
+

Overview

+
+
+
+
+

Questions

+
  • What have we learned?
  • +
  • What else is out there and where do I find it?
  • +
+
+
+
+
+
+

Objectives

+
  • Name and locate scientific Python community sites for software, +workshops, and help.
  • +
+
+
+
+
+

Leslie Lamport once said, “Writing is nature’s way of showing you how +sloppy your thinking is.” The same is true of programming: many things +that seem obvious when we’re thinking about them turn out to be anything +but when we have to explain them precisely.

+

Python supports a large and diverse community across academia and +industry.

+
+
+ +
+
+

Key Points

+
+
  • Python supports a large and diverse community across academia and +industry.
  • +
+
+
+
+
+ + +
+
+ + + diff --git a/instructor/20-feedback.html b/instructor/20-feedback.html new file mode 100644 index 000000000..906e96040 --- /dev/null +++ b/instructor/20-feedback.html @@ -0,0 +1,569 @@ + +Plotting and Programming in Python: Feedback +
+ Plotting and Programming in Python +
+ +
+
+ + + + + +
+
+

Feedback

+

Last updated on 2024-10-18

+ + + +

Estimated time: 15 minutes

+ +
+ +
+ + + +
+

Overview

+
+
+
+
+

Questions

+
  • How did the class go?
  • +
+
+
+
+
+
+

Objectives

+
  • Gather feedback on the class
  • +
+
+
+
+
+

Gather feedback from participants.

+
+
+ +
+
+

Key Points

+
+
  • We are constantly seeking to improve this course.
  • +
+
+
+ + + +
+
+ + +
+
+ + + diff --git a/instructor/404.html b/instructor/404.html new file mode 100644 index 000000000..6f995c79a --- /dev/null +++ b/instructor/404.html @@ -0,0 +1,546 @@ + +Plotting and Programming in Python: Page not found +
+ Plotting and Programming in Python +
+ +
+
+ + + + + +
+
+

Page not found

+ +

Our apologies!

+

We cannot seem to find the page you are looking for. Here are some +tips that may help:

+
  1. try going back to the previous +page or
  2. +
  3. navigate to any other page using the navigation bar on the +left.
  4. +
  5. if the URL ends with /index.html, try removing +that.
  6. +
  7. head over to the home page of this +lesson +
  8. +

If you came here from a link in this lesson, please contact the +lesson maintainers using the links at the foot of this page.

+
+
+ + +
+
+ + + diff --git a/instructor/CODE_OF_CONDUCT.html b/instructor/CODE_OF_CONDUCT.html new file mode 100644 index 000000000..b72e7d19e --- /dev/null +++ b/instructor/CODE_OF_CONDUCT.html @@ -0,0 +1,538 @@ + +Plotting and Programming in Python: Contributor Code of Conduct +
+ Plotting and Programming in Python +
+ +
+
+ + + + + +
+
+

Contributor Code of Conduct

+

Last updated on 2024-10-18

+ + + + + +
+ +
+ + + +

As contributors and maintainers of this project, we pledge to follow The Carpentries Code of Conduct.

+

Instances of abusive, harassing, or otherwise unacceptable behavior +may be reported by following our reporting +guidelines.

+ + + +
+
+ + +
+
+ + + diff --git a/instructor/LICENSE.html b/instructor/LICENSE.html new file mode 100644 index 000000000..772ccbfb9 --- /dev/null +++ b/instructor/LICENSE.html @@ -0,0 +1,586 @@ + +Plotting and Programming in Python: Licenses +
+ Plotting and Programming in Python +
+ +
+
+ + + + + +
+
+

Licenses

+

Last updated on 2024-10-18

+ + + + + +
+ +
+ + + +

Instructional Material

+

All Carpentries (Software Carpentry, Data Carpentry, and Library +Carpentry) instructional material is made available under the Creative Commons +Attribution license. The following is a human-readable summary of +(and not a substitute for) the full legal +text of the CC BY 4.0 license.

+

You are free:

+
  • to Share—copy and redistribute the material in any +medium or format
  • +
  • to Adapt—remix, transform, and build upon the +material
  • +

for any purpose, even commercially.

+

The licensor cannot revoke these freedoms as long as you follow the +license terms.

+

Under the following terms:

+
  • Attribution—You must give appropriate credit +(mentioning that your work is derived from work that is Copyright (c) +The Carpentries and, where practical, linking to https://carpentries.org/), provide a link to the +license, and indicate if changes were made. You may do so in any +reasonable manner, but not in any way that suggests the licensor +endorses you or your use.

  • +
  • No additional restrictions—You may not apply +legal terms or technological measures that legally restrict others from +doing anything the license permits. With the understanding +that:

  • +

Notices:

+
  • You do not have to comply with the license for elements of the +material in the public domain or where your use is permitted by an +applicable exception or limitation.
  • +
  • No warranties are given. The license may not give you all of the +permissions necessary for your intended use. For example, other rights +such as publicity, privacy, or moral rights may limit how you use the +material.
  • +

Software

+

Except where otherwise noted, the example programs and other software +provided by The Carpentries are made available under the OSI-approved MIT +license.

+

Permission is hereby granted, free of charge, to any person obtaining +a copy of this software and associated documentation files (the +“Software”), to deal in the Software without restriction, including +without limitation the rights to use, copy, modify, merge, publish, +distribute, sublicense, and/or sell copies of the Software, and to +permit persons to whom the Software is furnished to do so, subject to +the following conditions:

+

The above copyright notice and this permission notice shall be +included in all copies or substantial portions of the Software.

+

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, +EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF +MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. +IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY +CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, +TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE +SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

+

Trademark

+

“The Carpentries”, “Software Carpentry”, “Data Carpentry”, and +“Library Carpentry” and their respective logos are registered trademarks +of Community Initiatives.

+
+
+ + +
+
+ + + diff --git a/instructor/aio.html b/instructor/aio.html new file mode 100644 index 000000000..0971b9e9d --- /dev/null +++ b/instructor/aio.html @@ -0,0 +1,9959 @@ + + + + + +Plotting and Programming in Python: All in One View + + + + + + + + + + + + +
+ Plotting and Programming in Python +
+ +
+
+ + + + + + +
+
+ + +

Content from Running and Quitting

+
+

Last updated on 2024-10-18

+

Estimated time: 15 minutes

+
+ +
+
+

Overview

+
+
+
+
+

Questions

+
    +
  • How can I run Python programs?
  • +
+
+
+
+
+
+
+

Objectives

+
    +
  • Launch the JupyterLab server.
  • +
  • Create a new Python script.
  • +
  • Create a Jupyter notebook.
  • +
  • Shutdown the JupyterLab server.
  • +
  • Understand the difference between a Python script and a Jupyter +notebook.
  • +
  • Create Markdown cells in a notebook.
  • +
  • Create and run Python cells in a notebook.
  • +
+
+
+
+
+
+

To run Python, we are going to use Jupyter Notebooks via JupyterLab for +the remainder of this workshop. Jupyter notebooks are common in data +science and visualization and serve as a convenient common-denominator +experience for running Python code interactively where we can easily +view and share the results of our Python code.

+

There are other ways of editing, managing, and running code. Software +developers often use an integrated development environment (IDE) like PyCharm or Visual Studio Code, or text +editors like Vim or Emacs, to create and edit their Python programs. +After editing and saving your Python programs you can execute those +programs within the IDE itself or directly on the command line. In +contrast, Jupyter notebooks let us execute and view the results of our +Python code immediately within the notebook.

+

JupyterLab has several other handy features:

+
    +
  • You can easily type, edit, and copy and paste blocks of code.
  • +
  • Tab complete allows you to easily access the names of things you are +using and learn more about them.
  • +
  • It allows you to annotate your code with links, different sized +text, bullets, etc. to make it more accessible to you and your +collaborators.
  • +
  • It allows you to display figures next to the code that produces them +to tell a complete story of the analysis.
  • +
+

Each notebook contains one or more cells that contain code, text, or +images.

+

Getting Started with JupyterLab +

+
+

JupyterLab is an application server with a web user interface from Project Jupyter that enables one to work +with documents and activities such as Jupyter notebooks, text editors, +terminals, and even custom components in a flexible, integrated, and +extensible manner. JupyterLab requires a reasonably up-to-date browser +(ideally a current version of Chrome, Safari, or Firefox); Internet +Explorer versions 9 and below are not supported.

+

JupyterLab is included as part of the Anaconda Python distribution. +If you have not already installed the Anaconda Python distribution, see +the setup instructions for installation +instructions.

+

In this lesson we will run JupyterLab locally on our own machines, so it will not require an internet connection beyond the initial connection to download and install Anaconda and JupyterLab.

+
    +
  • Start the JupyterLab server on your machine
  • +
  • Use a web browser to open a special localhost URL that connects to +your JupyterLab server
  • +
  • The JupyterLab server does the work and the web browser renders the +result
  • +
  • Type code into the browser and see the results after your JupyterLab +server has finished executing your code
  • +
+
+
+ +
+
+

JupyterLab? What about Jupyter notebooks?

+
+

JupyterLab is the next +stage in the evolution of the Jupyter Notebook. If you have prior +experience working with Jupyter notebooks, then you will have a good +idea of what to expect from JupyterLab.

+

Experienced users of Jupyter notebooks interested in a more detailed +discussion of the similarities and differences between the JupyterLab +and Jupyter notebook user interfaces can find more information in the JupyterLab +user interface documentation.

+
+
+
+

Starting JupyterLab +

+
+

You can start the JupyterLab server through the command line or +through an application called Anaconda Navigator. Anaconda +Navigator is included as part of the Anaconda Python distribution.

+
+

macOS - Command Line +

+

To start the JupyterLab server you will need to access the command +line through the Terminal. There are two ways to open Terminal on +Mac.

+
    +
  1. In your Applications folder, open Utilities and double-click on Terminal.
  2. Press Command + spacebar to launch Spotlight. Type Terminal and then double-click the search result or hit Enter.
+

After you have launched Terminal, type the command to launch the +JupyterLab server.

+
+

BASH +

+
$ jupyter lab
+
+
+
+

Windows Users - Command Line +

+

To start the JupyterLab server you will need to access the Anaconda +Prompt.

+

Press the Windows Logo Key and search for Anaconda Prompt, then click the result or press Enter.

+

After you have launched the Anaconda Prompt, type the command:

+
+

BASH +

+
$ jupyter lab
+
+
+
+

Anaconda Navigator +

+

To start a JupyterLab server from Anaconda Navigator you must first start Anaconda Navigator (click for detailed instructions on macOS, Windows, and Linux). You can search for Anaconda Navigator via Spotlight on macOS (Command + spacebar) or the Windows search function (Windows Logo Key), or open a terminal shell and execute the anaconda-navigator executable from the command line.

+

After you have launched Anaconda Navigator, click the +Launch button under JupyterLab. You may need to scroll down +to find it.

+

Here is a screenshot of an Anaconda Navigator page similar to the one +that should open on either macOS or Windows.

+

+Anaconda Navigator landing page

+

And here is a screenshot of a JupyterLab landing page that should be +similar to the one that opens in your default web browser after starting +the JupyterLab server on either macOS or Windows.

+

+JupyterLab landing page

+
+

The JupyterLab Interface +

+
+

JupyterLab has many features found in traditional integrated +development environments (IDEs) but is focused on providing flexible +building blocks for interactive, exploratory computing.

+

The JupyterLab Interface consists of the Menu Bar, a collapsible Left Side Bar, and the Main Work Area, which contains tabs of documents and activities.

+
+ +

The Menu Bar at the top of JupyterLab has the top-level menus that +expose various actions available in JupyterLab along with their keyboard +shortcuts (where applicable). The following menus are included by +default.

+
    +
  • File: Actions related to files and directories such as New, Open, Close, Save, etc. The File menu also includes the Shut Down action used to shut down the JupyterLab server.
  • +
  • +Edit: Actions related to editing documents and +other activities such as Undo, Cut, Copy, +Paste, etc.
  • +
  • +View: Actions that alter the appearance of +JupyterLab.
  • +
  • +Run: Actions for running code in different +activities such as notebooks and code consoles (discussed below).
  • +
  • +Kernel: Actions for managing kernels. Kernels in +Jupyter will be explained in more detail below.
  • +
  • +Tabs: A list of the open documents and activities +in the main work area.
  • +
  • +Settings: Common JupyterLab settings can be +configured using this menu. There is also an Advanced Settings +Editor option in the dropdown menu that provides more fine-grained +control of JupyterLab settings and configuration options.
  • +
  • +Help: A list of JupyterLab and kernel help +links.
  • +
+
+
+ +
+
+

Kernels

+
+

The JupyterLab docs +define kernels as “separate processes started by the server that runs +your code in different programming languages and environments.” When we +open a Jupyter Notebook, that starts a kernel - a process - that is +going to run the code. In this lesson, we’ll be using the Jupyter +ipython kernel which lets us run Python 3 code interactively.

+

Using other Jupyter kernels +for other programming languages would let us write and execute code +in other programming languages in the same JupyterLab interface, like R, +Java, Julia, Ruby, JavaScript, Fortran, etc.
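To make the idea of a kernel as a separate Python process a little more concrete, the short sketch below asks the running interpreter to describe itself. It uses only the standard `sys` module. The values printed will differ from machine to machine, so no particular output is shown.

```python
import sys

# The version of Python that this kernel (or script) is running under.
print(sys.version)

# The path to the Python executable that is running this code.
print(sys.executable)
```

Running this cell in a notebook reports on the kernel's Python, which may not be the same installation that a terminal on the same machine would use.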

+
+
+
+

A screenshot of the default Menu Bar is provided below.

+

+JupyterLab Menu Bar

+
+
+ +

The left sidebar contains a number of commonly used tabs, such as a +file browser (showing the contents of the directory where the JupyterLab +server was launched), a list of running kernels and terminals, the +command palette, and a list of open tabs in the main work area. A +screenshot of the default Left Side Bar is provided below.

+

+JupyterLab Left Side Bar

+

The left sidebar can be collapsed or expanded by selecting “Show Left +Sidebar” in the View menu or by clicking on the active sidebar tab.

+
+
+

Main Work Area +

+

The main work area in JupyterLab enables you to arrange documents +(notebooks, text files, etc.) and other activities (terminals, code +consoles, etc.) into panels of tabs that can be resized or subdivided. A +screenshot of the default Main Work Area is provided below.

+

If you do not see the Launcher tab, click the blue plus sign under +the “File” and “Edit” menus and it will appear.

+

+JupyterLab Main Work Area

+

Drag a tab to the center of a tab panel to move the tab to the panel. +Subdivide a tab panel by dragging a tab to the left, right, top, or +bottom of the panel. The work area has a single current activity. The +tab for the current activity is marked with a colored top border (blue +by default).

+
+

Creating a Python script +

+
+
    +
  • To start writing a new Python program click the Text File icon under +the Other header in the Launcher tab of the Main Work Area. +
      +
    • You can also create a new plain text file by selecting the New +-> Text File from the File menu in the Menu Bar.
    • +
    +
  • +
  • To convert this plain text file to a Python program, select the +Save File As action from the File menu in the Menu Bar +and give your new text file a name that ends with the .py +extension. +
      +
    • The .py extension lets everyone (including the +operating system) know that this text file is a Python program.
    • +
    • This is convention, not a requirement.
    • +
    +
  • +
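As a minimal sketch of what such a script might contain, the example below assumes a hypothetical file saved as `hello.py`. The file name and the message are illustrative only and do not come from the lesson.

```python
# hello.py (hypothetical file name used for illustration)
# A Python script is plain text: statements are run from top to bottom.
message = 'Hello from a Python script'
print(message)
```

A file like this can be run from the command line with `python hello.py`, or its contents can be pasted into a notebook cell and executed there.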

Creating a Jupyter Notebook +

+
+

To open a new notebook click the Python 3 icon under the +Notebook header in the Launcher tab in the main work area. You +can also create a new notebook by selecting New -> Notebook +from the File menu in the Menu Bar.

+

Additional notes on Jupyter notebooks.

+
    +
  • Notebook files have the extension .ipynb to distinguish +them from plain-text Python programs.
  • +
  • Notebooks can be exported as Python scripts that can be run from the +command line.
  • +
+

Below is a screenshot of a Jupyter notebook running inside +JupyterLab. If you are interested in more details, then see the official +notebook documentation.

+

+Example Jupyter Notebook

+
+
+ +
+
+

How It’s Stored

+
+
    +
  • The notebook file is stored in a format called JSON.
  • +
  • Just like a webpage, what’s saved looks different from what you see +in your browser.
  • +
  • But this format allows Jupyter to mix source code, text, and images, +all in one file.
  • +
+
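The sketch below gives a rough idea of what the JSON inside a notebook file looks like. The notebook text here is hand-written and heavily simplified for illustration; a real `.ipynb` file saved by JupyterLab carries more metadata. The standard `json` module is used to read it.

```python
import json

# A hand-written, simplified notebook with one Markdown cell and one code cell.
# Real notebook files contain additional metadata that is omitted here.
notebook_text = """
{
  "cells": [
    {"cell_type": "markdown", "metadata": {}, "source": ["# My analysis"]},
    {"cell_type": "code", "metadata": {}, "execution_count": null,
     "outputs": [], "source": ["print(6 * 7)"]}
  ],
  "metadata": {},
  "nbformat": 4,
  "nbformat_minor": 4
}
"""

notebook = json.loads(notebook_text)

# Each cell records its type alongside its content.
cell_types = [cell["cell_type"] for cell in notebook["cells"]]
print(cell_types)
```

Because text, code, and outputs all live in one structured file, opening a notebook in a plain text editor shows this JSON rather than the rendered page.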
+
+
+
+
+ +
+
+

Arranging Documents into Panels of Tabs

+
+

In the JupyterLab Main Work Area you can arrange documents into +panels of tabs. Here is an example from the official +documentation.

+

+Multi-panel JupyterLab

+

First, create a text file, Python console, and terminal window and +arrange them into three panels in the main work area. Next, create a +notebook, terminal window, and text file and arrange them into three +panels in the main work area. Finally, create your own combination of +panels and tabs. What combination of panels and tabs do you think will +be most useful for your workflow?

+
+
+
+
+
+ +
+
+

After creating the necessary tabs, you can drag one of the tabs to +the center of a panel to move the tab to the panel; next you can +subdivide a tab panel by dragging a tab to the left, right, top, or +bottom of the panel.

+
+
+
+
+
+
+ +
+
+

Code vs. Text

+
+

Jupyter mixes code and text in different types of blocks, called +cells. We often use the term “code” to mean “the source code of software +written in a language such as Python”. A “code cell” in a Notebook is a +cell that contains software; a “text cell” is one that contains ordinary +prose written for human beings.

+
+
+
+

The Notebook has Command and Edit modes. +

+
+
    +
  • If you press Esc and Return alternately, the +outer border of your code cell will change from gray to blue.
  • +
  • These are the Command (gray) and +Edit (blue) modes of your notebook.
  • +
  • Command mode allows you to edit notebook-level features, and Edit +mode changes the content of cells.
  • +
  • When in Command mode (esc/gray), +
      +
    • The b key will make a new cell below the currently +selected cell.
    • +
    • The a key will make one above.
    • +
    • The x key will delete the current cell.
    • +
    • The z key will undo your last cell operation (which could +be a deletion, creation, etc).
    • +
    +
  • +
  • All actions can be done using the menus, but there are lots of +keyboard shortcuts to speed things up.
  • +
+
+
+ +
+
+

Command Vs. Edit

+
+

In the Jupyter notebook page are you currently in Command or Edit +mode?
+Switch between the modes. Use the shortcuts to generate a new cell. Use +the shortcuts to delete a cell. Use the shortcuts to undo the last cell +operation you performed.

+
+
+
+
+
+ +
+
+

Command mode has a gray border and Edit mode has a blue border. Use Esc and Return to switch between modes. To create a new cell, make sure you are in Command mode (press Esc if your cell is blue), then type b or a. To delete a cell, again in Command mode, type x. To undo the last cell operation, again in Command mode, type z.

+
+
+
+
+
+

Use the keyboard and mouse to select and edit cells. +

+
    +
  • Pressing the Return key turns the border blue and engages +Edit mode, which allows you to type within the cell.
  • +
  • Because we want to be able to write many lines of code in a single +cell, pressing the Return key when in Edit mode (blue) moves +the cursor to the next line in the cell just like in a text editor.
  • +
  • We need some other way to tell the Notebook we want to run what’s in +the cell.
  • +
  • Pressing Shift+Return together will execute +the contents of the cell.
  • +
  • Notice that the Return and Shift keys on the +right of the keyboard are right next to each other.
  • +
+
+
+

The Notebook will turn Markdown into pretty-printed +documentation. +

+
    +
  • Notebooks can also render Markdown. +
      +
    • A simple plain-text format for writing lists, links, and other +things that might go into a web page.
    • +
    • Equivalently, a subset of HTML that looks like what you’d send in an +old-fashioned email.
    • +
    +
  • +
  • Turn the current cell into a Markdown cell by entering Command mode (Esc/gray) and pressing the M key.
  • +
  • +In [ ]: will disappear to show it is no longer a code +cell and you will be able to write in Markdown.
  • +
  • Turn the current cell into a Code cell by entering Command mode (Esc/gray) and pressing the y key.
  • +
+
+
+

Markdown does most of what HTML does. +

+ + ++++ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Showing some markdown syntax and its rendered output.
Markdown codeRendered output
*   Use asterisks
+*   to create
+*   bullet lists.
+

+

+
    +
  • Use asterisks
  • +
  • to create
  • +
  • bullet lists.
  • +
+
1.   Use numbers
1.   to create
1.   numbered lists.
+

+

+
    +
  1. Use numbers
  2. to create
  3. numbered lists.
+
*  You can use indents
+  *  To create sublists
+  *  of the same type
+*  Or sublists
+  1. Of different
+  1. types
+

+

+
    +
  • You can use indents +
      +
    • To create sublists
    • +
    • of the same type
    • +
    +
  • +
  • Or sublists +
      +
    1. Of different
    2. +
    3. types
    4. +
    +
  • +
+
# A Level-1 Heading
+

+

+

A Level-1 Heading

+
## A Level-2 Heading (etc.)
+

+

+

A Level-2 Heading (etc.)

+
Line breaks
+don't matter.
+
+But blank lines
+create new paragraphs.
+

+

+

Line breaks don’t matter.

+

But blank lines create new paragraphs.

+
[Links](http://software-carpentry.org)
+are created with `[...](...)`.
+Or use [named links][data-carp].
+
+[data-carp]: http://datacarpentry.org
+

+

+

Links are created with +[...](...). Or use named links.

+
+
+
+ +
+
+

Creating Lists in Markdown

+
+

Create a nested list in a Markdown cell in a notebook that looks like +this:

+
    +
  1. Get funding.
  2. Do work.
     • Design experiment.
     • Collect data.
     • Analyze.
  3. Write up.
  4. Publish.
+
+
+
+
+
+ +
+
+

This challenge integrates both the numbered list and bullet list. Note that the bullet list is indented 4 spaces so that it is in line with the text of the numbered-list items.

+
1.  Get funding.
+2.  Do work.
+    *   Design experiment.
+    *   Collect data.
+    *   Analyze.
+3.  Write up.
+4.  Publish.
+
+
+
+
+
+
+ +
+
+

More Math

+
+

What is displayed when a Python cell in a notebook that contains +several calculations is executed? For example, what happens when this +cell is executed?

+
+

PYTHON +

+
7 * 3
+2 + 1
+
+
+
+
+
+
+ +
+
+

The notebook displays only the result of the last calculation in the cell. The result of 7 * 3 is computed but not shown.

+
+

PYTHON +

+
3
+
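If every result should be shown, one option is to pass each calculation to `print` explicitly, as in this small sketch:

```python
# print displays each value, not only the last one in the cell.
print(7 * 3)
print(2 + 1)
```

This cell shows 21 and then 3, each on its own line.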
+
+
+
+
+
+
+ +
+
+

Change an Existing Cell from Code to Markdown

+
+

What happens if you write some Python in a code cell and then you +switch it to a Markdown cell? For example, put the following in a code +cell:

+
+

PYTHON +

+
x = 6 * 7 + 12
+print(x)
+
+

And then run it with Shift+Return to be sure +that it works as a code cell. Now go back to the cell and use +Esc then m to switch the cell to Markdown and +“run” it with Shift+Return. What happened and how +might this be useful?

+
+
+
+
+
+ +
+
+

The Python code gets treated like Markdown text. The lines appear as +if they are part of one contiguous paragraph. This could be useful to +temporarily turn on and off cells in notebooks that get used for +multiple purposes.

+
+

PYTHON +

+
x = 6 * 7 + 12 print(x)
+
+
+
+
+
+
+
+ +
+
+

Equations

+
+

Standard Markdown (such as we’re using for these notes) won’t render +equations, but the Notebook will. Create a new Markdown cell and enter +the following:

+
$\sum_{i=1}^{N} 2^{-i} \approx 1$
+

(It’s probably easier to copy and paste.) What does it display? What +do you think the underscore, _, circumflex, ^, +and dollar sign, $, do?

+
+
+
+
+
+ +
+
+

The notebook shows the equation as it would be rendered from LaTeX +equation syntax. The dollar sign, $, is used to tell +Markdown that the text in between is a LaTeX equation. If you’re not +familiar with LaTeX, underscore, _, is used for subscripts +and circumflex, ^, is used for superscripts. A pair of +curly braces, { and }, is used to group text +together so that the statement i=1 becomes the subscript +and N becomes the superscript. Similarly, -i +is in curly braces to make the whole statement the superscript for +2. \sum and \approx are LaTeX +commands for “sum over” and “approximate” symbols.
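As a quick numerical check of what the equation claims, the sketch below adds up the terms in a code cell. The choice of N = 10 is arbitrary and is an assumption made for illustration. The finite sum works out to exactly 1 - 2^(-N), which is why it is only approximately 1.

```python
N = 10  # arbitrary number of terms, chosen for illustration

# Add up 2**-1 + 2**-2 + ... + 2**-N.
total = sum(2 ** -i for i in range(1, N + 1))

print(total)                 # 0.9990234375
print(total == 1 - 2 ** -N)  # True
```

Larger values of N bring the total closer to 1.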

+
+
+
+
+
+

Closing JupyterLab +

+
+
    +
  • From the Menu Bar select the “File” menu and then choose “Shut Down” at the bottom of the dropdown menu. You will be prompted to confirm that you wish to shut down the JupyterLab server (don’t forget to save your work!). Click “Shut Down” to shut down the JupyterLab server.
  • +
  • To restart the JupyterLab server you will need to re-run the +following command from a shell.
  • +
+
$ jupyter lab
+
+
+ +
+
+

Closing JupyterLab

+
+

Practice closing and restarting the JupyterLab server.

+
+
+
+
+
+ +
+
+

Key Points

+
+
    +
  • Python scripts are plain text files.
  • +
  • Use the Jupyter Notebook for editing and running Python.
  • +
  • The Notebook has Command and Edit modes.
  • +
  • Use the keyboard and mouse to select and edit cells.
  • +
  • The Notebook will turn Markdown into pretty-printed +documentation.
  • +
  • Markdown does most of what HTML does.
  • +
+
+
+
+

Content from Variables and Assignment

+
+

Last updated on 2024-10-18

+

Estimated time: 20 minutes

+
+ +
+
+

Overview

+
+
+
+
+

Questions

+
    +
  • How can I store data in programs?
  • +
+
+
+
+
+
+
+

Objectives

+
    +
  • Write programs that assign scalar values to variables and perform +calculations with those values.
  • +
  • Correctly trace value changes in programs that use scalar +assignment.
  • +
+
+
+
+
+
+

Use variables to store values. +

+
+
    +
  • Variables are names for values.

  • +
  • +

    Variable names

    +
      +
    • can only contain letters, digits, and underscore +_ (typically used to separate words in long variable +names)
    • +
    • cannot start with a digit
    • +
    • are case sensitive (age, Age and AGE are three +different variables)
    • +
    +
  • +
  • The name should also be meaningful so that you or another programmer can tell what it represents.

  • +
  • Variable names that start with underscores like +__alistairs_real_age have a special meaning so we won’t do +that until we understand the convention.

  • +
  • In Python the = symbol assigns the value on the +right to the name on the left.

  • +
  • The variable is created when a value is assigned to it.

  • +
  • +

    Here, Python assigns an age to a variable age and a +name in quotes to a variable first_name.

    +
    +

    PYTHON +

    +
    age = 42
    +first_name = 'Ahmed'
    +
    +
  • +
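The naming rules listed above can be checked with a short sketch. It relies on the string method `str.isidentifier()`, which is not part of this lesson. Note that the method only checks the syntax of a name, so it also accepts reserved words such as `class` that cannot be used as variable names. The example names are made up for illustration.

```python
# str.isidentifier() reports whether a string is a syntactically valid name.
print('first_name'.isidentifier())   # True: letters and an underscore
print('age2'.isidentifier())         # True: digits are allowed after the first character
print('2nd_age'.isidentifier())      # False: starts with a digit
print('first-name'.isidentifier())   # False: a hyphen is not allowed

# Names are case sensitive, so these are three different variables.
age = 42
Age = 43
AGE = 44
print(age, Age, AGE)
```

The last line prints 42 43 44, showing that each spelling holds its own value.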

Use print to display values. +

+
+
    +
  • Python has a built-in function called print that prints +things as text.
  • +
  • Call the function (i.e., tell Python to run it) by using its +name.
  • +
  • Provide values to the function (i.e., the things to print) in +parentheses.
  • +
  • To add a string to the printout, wrap the string in single or double +quotes.
  • +
  • The values passed to the function are called +arguments +
  • +
+
+

PYTHON +

+
print(first_name, 'is', age, 'years old')
+
+
+

OUTPUT +

+
Ahmed is 42 years old
+
+
    +
  • +print automatically puts a single space between items +to separate them.
  • +
  • And wraps around to a new line at the end.
  • +
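The space between items and the new line at the end are defaults that can be changed. The sketch below uses the `sep` and `end` keyword arguments of `print`, which this lesson does not otherwise cover.

```python
first_name = 'Ahmed'
age = 42

# Default behaviour: one space between items, new line at the end.
print(first_name, 'is', age, 'years old')

# sep replaces the separator placed between items.
print(first_name, 'is', age, 'years old', sep='-')

# end replaces the new line, so the next print continues on the same line.
print(first_name, end=' ')
print('is', age, 'years old')
```

The first and third outputs read `Ahmed is 42 years old`, and the second reads `Ahmed-is-42-years-old`.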

Variables must be created before they are used. +

+
+
    +
  • If a variable doesn’t exist yet, or if the name has been +mis-spelled, Python reports an error. (Unlike some languages, which +“guess” a default value.)
  • +
+
+

PYTHON +

+
print(last_name)
+
+
+

ERROR +

+
---------------------------------------------------------------------------
+NameError                                 Traceback (most recent call last)
+<ipython-input-1-c1fbb4e96102> in <module>()
+----> 1 print(last_name)
+
+NameError: name 'last_name' is not defined
+
+
    +
  • The last line of an error message is usually the most +informative.
  • +
  • We will look at error messages in detail later.
  • +
+
+
+ +
+
+

Variables Persist Between Cells

+
+

Be aware that it is the order of execution of cells that is +important in a Jupyter notebook, not the order in which they appear. +Python will remember all the code that was run previously, +including any variables you have defined, irrespective of the order in +the notebook. Therefore if you define variables lower down the notebook +and then (re)run cells further up, those defined further down will still +be present. As an example, create two cells with the following content, +in this order:

+
+

PYTHON +

+
print(myval)
+
+
+

PYTHON +

+
myval = 1
+
+

If you execute this in order, the first cell will give an error. +However, if you run the first cell after the second cell it +will print out 1. To prevent confusion, it can be helpful +to use the Kernel -> Restart & Run All +option which clears the interpreter and runs everything from a clean +slate going top to bottom.

+
+
+
+

Variables can be used in calculations. +

+
+
    +
  • We can use variables in calculations just as if they were values. +
      +
    • Remember, we assigned the value 42 to age +a few lines ago.
    • +
    +
  • +
+
+

PYTHON +

+
age = age + 3
+print('Age in three years:', age)
+
+
+

OUTPUT +

+
Age in three years: 45
+
+

Use an index to get a single character from a string. +

+
+
    +
  • The characters (individual letters, numbers, and so on) in a string +are ordered. For example, the string 'AB' is not the same +as 'BA'. Because of this ordering, we can treat the string +as a list of characters.
  • +
  • Each position in the string (first, second, etc.) is given a number. +This number is called an index or sometimes a +subscript.
  • +
  • Indices are numbered from 0.
  • +
  • Use the position’s index in square brackets to get the character at +that position.
  • +
+
A line of Python code, print(atom_name[0]), demonstrates that using the zero index will output just the initial letter, in this case ‘h’ for helium.
+
+

PYTHON +

+
atom_name = 'helium'
+print(atom_name[0])
+
+
+

OUTPUT +

+
h
+
+
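Because indices start at 0, the last character of a six-character string sits at index 5. The sketch below also shows negative indices, which count back from the end of the string and appear in a later exercise.

```python
atom_name = 'helium'

print(atom_name[0])    # h: the first character
print(atom_name[5])    # m: the last character, at index len(atom_name) - 1
print(atom_name[-1])   # m: negative indices count from the end
```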

Use a slice to get a substring. +

+
+
    +
  • A part of a string is called a substring. A +substring can be as short as a single character.
  • +
  • An item in a list is called an element. Whenever we treat a string +as if it were a list, the string’s elements are its individual +characters.
  • +
  • A slice is a part of a string (or, more generally, a part of any +list-like thing).
  • +
  • We take a slice with the notation [start:stop], where +start is the integer index of the first element we want and +stop is the integer index of the element just +after the last element we want.
  • +
  • The difference between stop and start is +the slice’s length.
  • +
  • Taking a slice does not change the contents of the original string. +Instead, taking a slice returns a copy of part of the original +string.
  • +
+
+

PYTHON +

+
atom_name = 'sodium'
+print(atom_name[0:3])
+
+
+

OUTPUT +

+
sod
+
+
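The points above about slice length and the original string being left unchanged can be checked directly. In this sketch the `start` and `stop` values are arbitrary, and the length rule holds as long as both lie within the string.

```python
atom_name = 'sodium'
start = 2
stop = 5

piece = atom_name[start:stop]
print(piece)                     # diu
print(len(piece), stop - start)  # 3 3
print(atom_name)                 # sodium: the original string is unchanged
```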

Use the built-in function len to find the length of a +string. +

+
+
+

PYTHON +

+
print(len('helium'))
+
+
+

OUTPUT +

+
6
+
+
    +
  • Nested functions are evaluated from the inside out, like in +mathematics.
  • +
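Evaluating from the inside out means that `len('helium')` is computed first and its result is then handed to `print`. The sketch below writes the two steps separately. The variable name `length` is just an illustrative choice.

```python
# Step 1: the inner function call produces a value.
length = len('helium')

# Step 2: the outer function call receives that value.
print(length)

# The nested form does both steps in one line and prints the same thing.
print(len('helium'))
```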

Python is case-sensitive. +

+
+
    +
  • Python thinks that upper- and lower-case letters are different, so +Name and name are different variables.
  • +
  • There are conventions for using upper-case letters at the start of +variable names so we will use lower-case letters for now.
  • +

Use meaningful variable names. +

+
+
    +
  • Python doesn’t care what you call variables as long as they obey the +rules (alphanumeric characters and the underscore).
  • +
+
+

PYTHON +

+
flabadab = 42
+ewr_422_yY = 'Ahmed'
+print(ewr_422_yY, 'is', flabadab, 'years old')
+
+
    +
  • Use meaningful variable names to help other people understand what +the program does.
  • +
  • The most important “other person” is your future self.
  • +
+
+
+ +
+
+

Swapping Values

+
+

Fill the table showing the values of the variables in this program +after each statement is executed.

+
+

PYTHON +

+
# Command  # Value of x   # Value of y   # Value of swap #
+x = 1.0    #              #              #               #
+y = 3.0    #              #              #               #
+swap = x   #              #              #               #
+x = y      #              #              #               #
+y = swap   #              #              #               #
+
+
+
+
+
+
+ +
+
+
+

OUTPUT +

+
# Command  # Value of x   # Value of y   # Value of swap #
+x = 1.0    # 1.0          # not defined  # not defined   #
+y = 3.0    # 1.0          # 3.0          # not defined   #
+swap = x   # 1.0          # 3.0          # 1.0           #
+x = y      # 3.0          # 3.0          # 1.0           #
+y = swap   # 3.0          # 1.0          # 1.0           #
+
+

These three lines exchange the values in x and +y using the swap variable for temporary +storage. This is a fairly common programming idiom.
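Python also allows the swap to be written without a temporary variable by assigning to both names at once. This is an alternative not used in the exercise and is shown here only as a sketch.

```python
x = 1.0
y = 3.0

# The right-hand side is evaluated first, then both names are assigned.
x, y = y, x

print(x, y)  # 3.0 1.0
```

The three-line version with `swap` is still worth understanding, since it shows each step of the exchange and works the same way in most programming languages.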

+
+
+
+
+
+
+ +
+
+

Predicting Values

+
+

What is the final value of position in the program +below? (Try to predict the value without running the program, then check +your prediction.)

+
+

PYTHON +

+
initial = 'left'
+position = initial
+initial = 'right'
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
print(position)
+
+
+

OUTPUT +

+
left
+
+

The initial variable is assigned the value 'left'. In the second line, the position variable also receives the string value 'left'. In the third line, the initial variable is given the value 'right', but the position variable retains its string value of 'left'.

+
+
+
+
+
+
+ +
+
+

Challenge

+
+

If you assign a = 123, what happens if you try to get +the second digit of a via a[1]?

+
+
+
+
+
+ +
+
+

Numbers are not strings or sequences and Python will raise an error +if you try to perform an index operation on a number. In the next lesson on types and type +conversion we will learn more about types and how to convert between +different types. If you want the Nth digit of a number you can convert +it into a string using the str built-in function and then +perform an index operation on that string.

+
+

PYTHON +

+
a = 123
+print(a[1])
+
+
+

ERROR +

+
TypeError: 'int' object is not subscriptable
+
+
+

PYTHON +

+
a = str(123)
+print(a[1])
+
+
+

OUTPUT +

+
2
+
+
+
+
+
+
+
+ +
+
+

Choosing a Name

+
+

Which is a better variable name, m, min, or +minutes? Why? Hint: think about which code you would rather +inherit from someone who is leaving the lab:

+
    +
  1. ts = m * 60 + s
  2. tot_sec = min * 60 + sec
  3. total_seconds = minutes * 60 + seconds
+
+
+
+
+
+ +
+
+

minutes is better because min might mean +something like “minimum” (and actually is an existing built-in function +in Python that we will cover later).

+
+
+
+
+
+
+ +
+
+

Slicing practice

+
+

What does the following program print?

+
+

PYTHON +

+
atom_name = 'carbon'
+print('atom_name[1:3] is:', atom_name[1:3])
+
+
+
+
+
+
+ +
+
+
+

OUTPUT +

+
atom_name[1:3] is: ar
+
+
+
+
+
+
+
+ +
+
+

Slicing concepts

+
+

Given the following string:

+
+

PYTHON +

+
species_name = "Acacia buxifolia"
+
+

What would these expressions return?

+
    +
  1. species_name[2:8]
  2. species_name[11:] (without a value after the colon)
  3. species_name[:4] (without a value before the colon)
  4. species_name[:] (just a colon)
  5. species_name[11:-3]
  6. species_name[-5:-3]
  7. What happens when you choose a stop value which is out of range? (i.e., try species_name[0:20] or species_name[:103])
+
+
+
+
+
+ +
+
+
    +
  1. species_name[2:8] returns the substring 'acia b'
  2. species_name[11:] returns the substring 'folia', from position 11 until the end
  3. species_name[:4] returns the substring 'Acac', from the start up to but not including position 4
  4. species_name[:] returns the entire string 'Acacia buxifolia'
  5. species_name[11:-3] returns the substring 'fo', from position 11 up to but not including the third-last position
  6. species_name[-5:-3] also returns the substring 'fo', from the fifth-last position up to but not including the third-last position
  7. If a part of the slice is out of range, the operation does not fail. species_name[0:20] gives the same result as species_name[0:], and species_name[:103] gives the same result as species_name[:]
+
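The forgiving behaviour of out-of-range slices contrasts with indexing a single character, which does fail. The sketch below shows both. It uses `try` and `except`, which this lesson has not introduced, only so that the error message can be printed without stopping the cell.

```python
species_name = "Acacia buxifolia"
print(len(species_name))                        # 16

# A stop value past the end is quietly treated as "up to the end".
print(species_name[0:20])                       # Acacia buxifolia
print(species_name[0:20] == species_name[0:])   # True

# A single index past the end raises an error instead.
try:
    species_name[20]
except IndexError as error:
    print('IndexError:', error)
```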
+
+
+
+
+
+ +
+
+

Key Points

+
+
    +
  • Use variables to store values.
  • +
  • Use print to display values.
  • +
  • Variables persist between cells.
  • +
  • Variables must be created before they are used.
  • +
  • Variables can be used in calculations.
  • +
  • Use an index to get a single character from a string.
  • +
  • Use a slice to get a substring.
  • +
  • Use the built-in function len to find the length of a +string.
  • +
  • Python is case-sensitive.
  • +
  • Use meaningful variable names.
  • +
+
+
+
+

Content from Data Types and Type Conversion

+
+

Last updated on 2024-10-18

+

Estimated time: 20 minutes

+
+ +
+
+

Overview

+
+
+
+
+

Questions

+
    +
  • What kinds of data do programs store?
  • +
  • How can I convert one type to another?
  • +
+
+
+
+
+
+
+

Objectives

+
    +
  • Explain key differences between integers and floating point +numbers.
  • +
  • Explain key differences between numbers and character strings.
  • +
  • Use built-in functions to convert between integers, floating point +numbers, and strings.
  • +
+
+
+
+
+
+

Every value has a type. +

+
+
    +
  • Every value in a program has a specific type.
  • +
  • Integer (int): represents positive or negative whole +numbers like 3 or -512.
  • +
  • Floating point number (float): represents real numbers +like 3.14159 or -2.5.
  • +
  • Character string (usually called “string”, str): text. +
      +
    • Written in either single quotes or double quotes (as long as they +match).
    • +
    • The quote marks aren’t printed when the string is displayed.
    • +
    +
  • +

Use the built-in function type to find the type of a +value. +

+
+
    +
  • Use the built-in function type to find out what type a +value has.
  • +
  • Works on variables as well. +
      +
    • But remember: the value has the type — the +variable is just a label.
    • +
    +
  • +
+
+

PYTHON +

+
print(type(52))
+
+
+

OUTPUT +

+
<class 'int'>
+
+
+

PYTHON +

+
fitness = 'average'
+print(type(fitness))
+
+
+

OUTPUT +

+
<class 'str'>
+
+
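The same check works for the other types described above. In this short sketch, note that putting quotes around a number makes it a string rather than a number.

```python
print(type(3.14159))     # <class 'float'>
print(type(-512))        # <class 'int'>
print(type('3.14159'))   # <class 'str'>: the quotes make this text
```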

Types control what operations (or methods) can be performed on a +given value. +

+
+
    +
  • A value’s type determines what the program can do to it.
  • +
+
+

PYTHON +

+
print(5 - 3)
+
+
+

OUTPUT +

+
2
+
+
+

PYTHON +

+
print('hello' - 'h')
+
+
+

ERROR +

+
---------------------------------------------------------------------------
+TypeError                                 Traceback (most recent call last)
+<ipython-input-2-67f5626a1e07> in <module>()
+----> 1 print('hello' - 'h')
+
+TypeError: unsupported operand type(s) for -: 'str' and 'str'
+
+

You can use the “+” and “*” operators on strings. +

+
+
    +
  • “Adding” character strings concatenates them.
  • +
+
+

PYTHON +

+
full_name = 'Ahmed' + ' ' + 'Walsh'
+print(full_name)
+
+
+

OUTPUT +

+
Ahmed Walsh
+
+
    +
  • Multiplying a character string by an integer N creates a +new string that consists of that character string repeated N +times. +
      +
    • Since multiplication is repeated addition.
    • +
    +
  • +
+
+

PYTHON +

+
separator = '=' * 10
+print(separator)
+
+
+

OUTPUT +

+
==========
+
+

Strings have a length (but numbers don’t). +

+
+
    +
  • The built-in function len counts the number of +characters in a string.
  • +
+
+

PYTHON +

+
print(len(full_name))
+
+
+

OUTPUT +

+
11
+
+
    +
  • But numbers don’t have a length (not even zero).
  • +
+
+

PYTHON +

+
print(len(52))
+
+
+

ERROR +

+
---------------------------------------------------------------------------
+TypeError                                 Traceback (most recent call last)
+<ipython-input-3-f769e8e8097d> in <module>()
+----> 1 print(len(52))
+
+TypeError: object of type 'int' has no len()
+
+
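As a small illustration of how these string operations combine, the sketch below builds an underlined heading using `+`, `*`, and `len`. The names `title`, `underline`, and `heading` are made up for this example and do not appear elsewhere in the lesson.

```python
# Hypothetical example: underline a title with dashes of matching length.
title = 'Results'
underline = '-' * len(title)          # repeat '-' once per character in title
heading = title + '\n' + underline    # '+' joins the pieces into one string
print(heading)
```

Because `len(title)` is an integer, it can be used as the repeat count; the reverse (`len` of a number) fails, as shown above.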

Must convert numbers to strings or vice versa when operating on +them. +

+
+
    +
  • Cannot add numbers and strings.
  • +
+
+

PYTHON +

+
print(1 + '2')
+
+
+

ERROR +

+
---------------------------------------------------------------------------
+TypeError                                 Traceback (most recent call last)
+<ipython-input-4-fe4f54a023c6> in <module>()
+----> 1 print(1 + '2')
+
+TypeError: unsupported operand type(s) for +: 'int' and 'str'
+
+
    +
  • Not allowed because it’s ambiguous: should 1 + '2' be +3 or '12'?
  • +
  • Some types can be converted to other types by using the type name as +a function.
  • +
+
+

PYTHON +

+
print(1 + int('2'))
+print(str(1) + '2')
+
+
+

OUTPUT +

+
3
+12
+
+
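The conversions above can be combined in one short sketch. The names `count_text`, `count`, `total`, and `label` are hypothetical and chosen only for illustration; the idea is to convert text to a number before doing arithmetic, and back to text before concatenating.

```python
count_text = '12'                 # a number that arrived as text (assumed input)
count = int(count_text)           # str -> int so that arithmetic works
total = count + 1                 # ordinary integer addition
label = 'sample_' + str(total)    # int -> str so that concatenation works
print(total, label)
```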

Can mix integers and floats freely in operations. +

+
+
    +
  • Integers and floating-point numbers can be mixed in arithmetic. +
      +
    • Python 3 automatically converts integers to floats as needed.
    • +
    +
  • +
+
+

PYTHON +

+
print('half is', 1 / 2.0)
+print('three squared is', 3.0 ** 2)
+
+
+

OUTPUT +

+
half is 0.5
+three squared is 9.0
+
+

Variables only change value when something is assigned to them. +

+
+
    +
  • If we make one cell in a spreadsheet depend on another, and update +the latter, the former updates automatically.
  • +
  • This does not happen in programming languages.
  • +
+
+

PYTHON +

+
variable_one = 1
+variable_two = 5 * variable_one
+variable_one = 2
+print('first is', variable_one, 'and second is', variable_two)
+
+
+

OUTPUT +

+
first is 2 and second is 5
+
+
    +
  • The computer reads the value of variable_one when doing +the multiplication, creates a new value, and assigns it to +variable_two.
  • +
  • Afterwards, variable_two holds that new value and no longer depends on variable_one, so its value does not change automatically when variable_one changes.
  • +
+
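To make the contrast with a spreadsheet concrete, here is a minimal sketch that repeats the example and then updates `variable_two` by assigning to it again. The name `stale` is introduced only for this illustration.

```python
variable_one = 1
variable_two = 5 * variable_one   # computed once, using the value 1
variable_one = 2                  # changing variable_one does not touch variable_two
stale = variable_two              # still the old result
variable_two = 5 * variable_one   # an explicit new assignment is what updates it
print('stale value:', stale, 'recomputed value:', variable_two)
```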
+
+ +
+
+

Fractions

+
+

What type of value is 3.4? How can you find out?

+
+
+
+
+
+ +
+
+

It is a floating-point number (often abbreviated “float”). It is +possible to find out by using the built-in function +type().

+
+

PYTHON +

+
print(type(3.4))
+
+
+

OUTPUT +

+
<class 'float'>
+
+
+
+
+
+
+
+ +
+
+

Automatic Type Conversion

+
+

What type of value is 3.25 + 4?

+
+
+
+
+
+ +
+
+

It is a float: integers are automatically converted to floats as +necessary.

+
+

PYTHON +

+
result = 3.25 + 4
+print(result, 'is', type(result))
+
+
+

OUTPUT +

+
7.25 is <class 'float'>
+
+
+
+
+
+
+
+ +
+
+

Choose a Type

+
+

What type of value (integer, floating point number, or character +string) would you use to represent each of the following? Try to come up +with more than one good answer for each problem. For example, in # 1, +when would counting days with a floating point variable make more sense +than using an integer?

+
    +
  1. Number of days since the start of the year.
  2. Time elapsed from the start of the year until now in days.
  3. Serial number of a piece of lab equipment.
  4. A lab specimen’s age.
  5. Current population of a city.
  6. Average population of a city over time.
+
+
+
+
+
+ +
+
+

The answers to the questions are:

+
    +
  1. Integer, since the number of days would lie between 1 and 365 (or 366 in a leap year).
  2. Floating point, since fractional days are required.
  3. Character string if the serial number contains letters and numbers; otherwise integer if the serial number consists only of numerals.
  4. This will vary! How do you define a specimen’s age? Whole days since collection (integer)? Date and time (string)?
  5. Choose floating point to represent population as large aggregates (e.g. millions), or integer to represent population in units of individuals.
  6. Floating point number, since an average is likely to have a fractional part.
+
+
+
+
+
+
+ +
+
+

Division Types

+
+

In Python 3, the // operator performs integer +(whole-number) floor division, the / operator performs +floating-point division, and the % (or modulo) +operator calculates and returns the remainder from integer division:

+
+

PYTHON +

+
print('5 // 3:', 5 // 3)
+print('5 / 3:', 5 / 3)
+print('5 % 3:', 5 % 3)
+
+
+

OUTPUT +

+
5 // 3: 1
+5 / 3: 1.6666666666666667
+5 % 3: 2
+
+

If num_subjects is the number of subjects taking part in +a study, and num_per_survey is the number that can take +part in a single survey, write an expression that calculates the number +of surveys needed to reach everyone once.

+
+
+
+
+
+ +
+
+

We want the minimum number of surveys that reaches everyone once, which is the rounded-up value of num_subjects / num_per_survey. We can compute this by performing a floor division with // and adding 1, provided that we first subtract 1 from the number of subjects; the subtraction handles the case where num_subjects is evenly divisible by num_per_survey.

+
+

PYTHON +

+
num_subjects = 600
+num_per_survey = 42
+num_surveys = (num_subjects - 1) // num_per_survey + 1
+
+print(num_subjects, 'subjects,', num_per_survey, 'per survey:', num_surveys)
+
+
+

OUTPUT +

+
600 subjects, 42 per survey: 15
+
+
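As a cross-check on the expression above, the following sketch compares it with `math.ceil` from the standard library (which the original solution does not use) for one uneven and one evenly divisible case. The subject counts 600 and 84 and the variable names are illustrative assumptions.

```python
import math

num_per_survey = 42

# Uneven case: 600 subjects do not divide evenly into groups of 42.
uneven = (600 - 1) // num_per_survey + 1
# Even case: 84 subjects fill exactly two surveys.
even = (84 - 1) // num_per_survey + 1
# Without subtracting 1 first, the even case is overcounted.
naive_even = 84 // num_per_survey + 1

print(uneven, math.ceil(600 / num_per_survey))
print(even, math.ceil(84 / num_per_survey))
print('without the subtraction:', naive_even)
```

The last line shows why the subtraction matters: without it, an evenly divisible count asks for one survey too many.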
+
+
+
+
+
+ +
+
+

Strings to Numbers

+
+

Where reasonable, float() will convert a string to a +floating point number, and int() will convert a floating +point number to an integer:

+
+

PYTHON +

+
print("string to float:", float("3.4"))
+print("float to int:", int(3.4))
+
+
+

OUTPUT +

+
string to float: 3.4
+float to int: 3
+
+

If the conversion doesn’t make sense, however, Python will report an error.

+
+

PYTHON +

+
print("string to float:", float("Hello world!"))
+
+
+

ERROR +

+
---------------------------------------------------------------------------
+ValueError                                Traceback (most recent call last)
+<ipython-input-5-df3b790bf0a2> in <module>
+----> 1 print("string to float:", float("Hello world!"))
+
+ValueError: could not convert string to float: 'Hello world!'
+
+

Given this information, what do you expect the following program to +do?

+

What does it actually do?

+

Why do you think it does that?

+
+

PYTHON +

+
print("fractional string to int:", int("3.4"))
+
+
+
+
+
+
+ +
+
+

What do you expect this program to do? It would not be so unreasonable to expect the Python 3 int command to convert the string “3.4” to 3.4 and then, with an additional type conversion, to 3. After all, Python 3 performs a lot of other magic - isn’t that part of its charm?

+
+

PYTHON +

+
int("3.4")
+
+
+

OUTPUT +

+
---------------------------------------------------------------------------
+ValueError                                Traceback (most recent call last)
+<ipython-input-2-ec6729dfccdc> in <module>
+----> 1 int("3.4")
+ValueError: invalid literal for int() with base 10: '3.4'
+
+

However, Python 3 throws an error. Why? To be consistent, possibly. If you want Python to perform two consecutive type conversions, you must write both conversions explicitly in code.

+
+

PYTHON +

+
int(float("3.4"))
+
+
+

OUTPUT +

+
3
+
+
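One detail worth checking when chaining these conversions is that `int()` drops the fractional part rather than rounding. The sketch below, using made-up input strings, contrasts it with the built-in `round`.

```python
text = '3.9'                          # hypothetical input string
truncated = int(float(text))          # int() discards the fraction
rounded = round(float(text))          # round() goes to the nearest integer
negative = int(float('-3.9'))         # truncation moves towards zero
print(truncated, rounded, negative)
```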
+
+
+
+
+
+ +
+
+

Arithmetic with Different Types

+
+

Which of the following will return the floating point number +2.0? Note: there may be more than one right answer.

+
+

PYTHON +

+
first = 1.0
+second = "1"
+third = "1.1"
+
+
    +
  1. first + float(second)
  2. float(second) + float(third)
  3. first + int(third)
  4. first + int(float(third))
  5. int(first) + int(float(third))
  6. 2.0 * second
+
+
+
+
+
+ +
+
+

Answer: 1 and 4

+
+
+
+
+
+
+ +
+
+

Complex Numbers

+
+

Python provides complex numbers, which are written as +1.0+2.0j. If val is a complex number, its real +and imaginary parts can be accessed using dot notation as +val.real and val.imag.

+
+

PYTHON +

+
a_complex_number = 6 + 2j
+print(a_complex_number.real)
+print(a_complex_number.imag)
+
+
+

OUTPUT +

+
6.0
+2.0
+
+
    +
  1. Why do you think Python uses j instead of i for the imaginary part?
  2. What do you expect 1 + 2j + 3 to produce?
  3. What do you expect 4j to be? What about 4 j or 4 + j?
+
+
+
+
+
+ +
+
+
    +
  1. Standard mathematics treatments typically use i to denote an imaginary number. However, using j is reported to be an early convention from electrical engineering that would now be technically expensive to change. Stack Overflow provides additional explanation and discussion.
  2. (4+2j)
  3. 4j is the complex number 4j, while 4 j produces SyntaxError: invalid syntax. In 4 + j, j is treated as a variable, so the result depends on whether j is defined and, if so, on its assigned value.
+
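The answers above can be checked with a short sketch. Assigning a variable called `j` is done here purely to show the difference between the literal `4j` and the expression `4 + j`; the value 10 is arbitrary.

```python
total = 1 + 2j + 3          # the real parts 1 and 3 are added together
real_part = total.real      # stored as a float
imag_part = total.imag

literal = 4j                # a complex literal: the j is attached to the number

j = 10                      # hypothetical variable named j
with_variable = 4 + j       # ordinary integer addition, no complex number involved

print(total, real_part, imag_part)
print(type(literal), with_variable)
```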
+
+
+
+
+
+ +
+
+

Key Points

+
+
    +
  • Every value has a type.
  • +
  • Use the built-in function type to find the type of a +value.
  • +
  • Types control what operations can be done on values.
  • +
  • Strings can be added and multiplied.
  • +
  • Strings have a length (but numbers don’t).
  • +
  • Must convert numbers to strings or vice versa when operating on +them.
  • +
  • Can mix integers and floats freely in operations.
  • +
  • Variables only change value when something is assigned to them.
  • +
+
+
+
+

Content from Built-in Functions and Help

+
+

Last updated on 2024-10-18

+

Estimated time: 25 minutes

+
+ +
+
+

Overview

+
+
+
+
+

Questions

+
    +
  • How can I use built-in functions?
  • +
  • How can I find out what they do?
  • +
  • What kind of errors can occur in programs?
  • +
+
+
+
+
+
+
+

Objectives

+
    +
  • Explain the purpose of functions.
  • +
  • Correctly call built-in Python functions.
  • +
  • Correctly nest calls to built-in functions.
  • +
  • Use help to display documentation for built-in functions.
  • +
  • Correctly describe situations in which SyntaxError and NameError +occur.
  • +
+
+
+
+
+
+

Use comments to add documentation to programs. +

+
+
+

PYTHON +

+
# This sentence isn't executed by Python.
+adjustment = 0.5   # Neither is this - anything after '#' is ignored.
+
+

A function may take zero or more arguments. +

+
+
    +
  • We have seen some functions already — now let’s take a closer +look.
  • +
  • An argument is a value passed into a function.
  • +
  • +len takes exactly one.
  • +
  • +int, str, and float create a +new value from an existing one.
  • +
  • +print takes zero or more.
  • +
  • +print with no arguments prints a blank line. +
      +
    • Must always use parentheses, even if they’re empty, so that Python +knows a function is being called.
    • +
    +
  • +
+
+

PYTHON +

+
print('before')
+print()
+print('after')
+
+
+

OUTPUT +

+
before
+
+after
+
+

Every function returns something. +

+
+
    +
  • Every function call produces some result.
  • +
  • If the function doesn’t have a useful result to return, it usually +returns the special value None. None is a +Python object that stands in anytime there is no value.
  • +
+
+

PYTHON +

+
result = print('example')
+print('result of print is', result)
+
+
+

OUTPUT +

+
example
+result of print is None
+
+

Commonly-used built-in functions include max, +min, and round. +

+
+
    +
  • Use max to find the largest value of one or more +values.
  • +
  • Use min to find the smallest.
  • +
  • Both work on character strings as well as numbers. +
      +
    • “Larger” and “smaller” use (0-9, A-Z, a-z) to compare letters.
    • +
    +
  • +
+
+

PYTHON +

+
print(max(1, 2, 3))
+print(min('a', 'A', '0'))
+
+
+

OUTPUT +

+
3
+0
+
+
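The ordering (0-9, A-Z, a-z) mentioned above can be confirmed directly with comparison operators. This is a minimal sketch; the variable names are chosen only for illustration.

```python
digit_before_upper = '0' < 'A'     # digits sort before uppercase letters
upper_before_lower = 'A' < 'a'     # uppercase letters sort before lowercase
largest = max('a', 'A', '0')
smallest = min('a', 'A', '0')
print(digit_before_upper, upper_before_lower, largest, smallest)
```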

Functions may only work for certain (combinations of) +arguments. +

+
+
    +
  • +max and min must be given at least one +argument. +
      +
    • “Largest of the empty set” is a meaningless question.
    • +
    +
  • +
  • And they must be given things that can meaningfully be +compared.
  • +
+
+

PYTHON +

+
print(max(1, 'a'))
+
+
+

ERROR +

+
TypeError                                 Traceback (most recent call last)
+<ipython-input-52-3f049acf3762> in <module>
+----> 1 print(max(1, 'a'))
+
+TypeError: '>' not supported between instances of 'str' and 'int'
+
+

Functions may have default values for some arguments. +

+
+
    +
  • +round will round off a floating-point number.
  • +
  • By default, rounds to zero decimal places.
  • +
+
+

PYTHON +

+
round(3.712)
+
+
+

OUTPUT +

+
4
+
+
    +
  • We can specify the number of decimal places we want.
  • +
+
+

PYTHON +

+
round(3.712, 1)
+
+
+

OUTPUT +

+
3.7
+
+
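The two calls above differ in return type as well as in precision. The sketch below stores the results so their types can be inspected, and adds a negative second argument as a further illustration; the value 1234.5 is made up for this example.

```python
whole = round(3.712)            # no second argument: result is an int
tenths = round(3.712, 1)        # one decimal place: result stays a float
hundreds = round(1234.5, -2)    # negative value rounds to the left of the decimal point
print(whole, type(whole))
print(tenths, type(tenths))
print(hundreds)
```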

Functions attached to objects are called methods +

+
+
    +
  • Functions take another form that will be common in the pandas +episodes.
  • +
  • Methods have parentheses like functions, but come after the +variable.
  • +
  • Some methods are used for internal Python operations, and are marked with double underscores.
  • +
+
+

PYTHON +

+
my_string = 'Hello world!'  # creation of a string object 
+
+print(len(my_string))       # the len function takes a string as an argument and returns the length of the string
+
+print(my_string.swapcase()) # calling the swapcase method on the my_string object
+
+print(my_string.__len__())  # calling the internal __len__ method on the my_string object, used by len(my_string)
+
+
+

OUTPUT +

+
12
+hELLO WORLD!
+12
+
+
    +
  • You might even see them chained together. They operate left to +right.
  • +
+
+

PYTHON +

+
print(my_string.isupper())          # Not all the letters are uppercase
+print(my_string.upper())            # This capitalizes all the letters
+
+print(my_string.upper().isupper())  # Now all the letters are uppercase
+
+
+

OUTPUT +

+
False
+HELLO WORLD!
+True
+
+
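To see that chained methods really do operate left to right, the following sketch performs the same work in one chained expression and in two separate steps. The string and the names `raw_label`, `step_one`, and `step_two` are hypothetical.

```python
raw_label = '  Sample A  '             # made-up text with stray spaces

# Chained: strip() runs first, then lower() runs on its result.
cleaned = raw_label.strip().lower()

# The same work written as two separate steps.
step_one = raw_label.strip()
step_two = step_one.lower()

print(cleaned)
print(cleaned == step_two)
```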

Use the built-in function help to get help for a +function. +

+
+
    +
  • Every built-in function has online documentation.
  • +
+
+

PYTHON +

+
help(round)
+
+
+

OUTPUT +

+
Help on built-in function round in module builtins:
+
+round(number, ndigits=None)
+    Round a number to a given precision in decimal digits.
+
+    The return value is an integer if ndigits is omitted or None.  Otherwise
+    the return value has the same type as the number.  ndigits may be negative.
+
+

The Jupyter Notebook has two ways to get help. +

+
+
    +
  • Option 1: Place the cursor near where the function is invoked in a +cell (i.e., the function name or its parameters), +
      +
    • Hold down Shift, and press Tab.
    • +
    • Do this several times to expand the information returned.
    • +
    +
  • +
  • Option 2: Type the function name in a cell with a question mark +after it. Then run the cell.
  • +

Python reports a syntax error when it can’t understand the source of +a program. +

+
+
    +
  • Won’t even try to run the program if it can’t be parsed.
  • +
+
+

PYTHON +

+
# Forgot to close the quote marks around the string.
+name = 'Feng
+
+
+

ERROR +

+
  File "<ipython-input-56-f42768451d55>", line 2
+    name = 'Feng
+                ^
+SyntaxError: EOL while scanning string literal
+
+
+

PYTHON +

+
# An extra '=' in the assignment.
+age = = 52
+
+
+

ERROR +

+
  File "<ipython-input-57-ccc3df3cf902>", line 2
+    age = = 52
+          ^
+SyntaxError: invalid syntax
+
+
    +
  • Look more closely at the error message:
  • +
+
+

PYTHON +

+
print("hello world"
+
+
+

ERROR +

+
  File "<ipython-input-6-d1cc229bf815>", line 1
+    print("hello world"
+                       ^
+SyntaxError: unexpected EOF while parsing
+
+
    +
  • The message indicates a problem on the first line of the input (“line 1”). +
      +
    • In this case the “ipython-input” section of the file name tells us +that we are working with input into IPython, the Python interpreter used +by the Jupyter Notebook.
    • +
    +
  • +
  • The -6- part of the filename indicates that the error +occurred in cell 6 of our Notebook.
  • +
  • Next is the problematic line of code, indicating the problem with a +^ pointer.
  • +

Python reports a runtime error when something goes wrong while a +program is executing. +

+
+
+

PYTHON +

+
age = 53
+remaining = 100 - aege # mis-spelled 'age'
+
+
+

ERROR +

+
NameError                                 Traceback (most recent call last)
+<ipython-input-59-1214fb6c55fc> in <module>
+      1 age = 53
+----> 2 remaining = 100 - aege # mis-spelled 'age'
+
+NameError: name 'aege' is not defined
+
+
    +
  • Fix syntax errors by reading the source and runtime errors by +tracing execution.
  • +
+
+
+ +
+
+

What Happens When

+
+
    +
  1. Explain in simple terms the order of operations in the following program: when does the addition happen, when does the subtraction happen, when is each function called, etc.
  2. What is the final value of radiance?
+
+

PYTHON +

+
radiance = 1.0
+radiance = max(2.1, 2.0 + min(radiance, 1.1 * radiance - 0.5))
+
+
+
+
+
+
+ +
+
+
    +
  1. Order of operations:
    1. 1.1 * radiance = 1.1
    2. 1.1 - 0.5 = 0.6
    3. min(radiance, 0.6) = 0.6
    4. 2.0 + 0.6 = 2.6
    5. max(2.1, 2.6) = 2.6
  2. At the end, radiance = 2.6
+
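The order of operations listed above can be traced by giving every intermediate result its own name. The names `scaled`, `shifted`, `smaller`, and `summed` are introduced only for this sketch. Floating-point arithmetic may print a value very slightly different from 0.6 for the subtraction step.

```python
radiance = 1.0

scaled = 1.1 * radiance            # innermost multiplication happens first
shifted = scaled - 0.5             # then the subtraction
smaller = min(radiance, shifted)   # then the inner function call
summed = 2.0 + smaller             # then the addition
radiance = max(2.1, summed)        # the outer function call happens last

print(scaled, shifted, smaller, summed, radiance)
```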
+
+
+
+
+
+ +
+
+

Spot the Difference

+
+
    +
  1. Predict what each of the print statements in the program below will print.
  2. Does max(len(rich), poor) run or produce an error message? If it runs, does its result make any sense?
+
+

PYTHON +

+
easy_string = "abc"
+print(max(easy_string))
+rich = "gold"
+poor = "tin"
+print(max(rich, poor))
+print(max(len(rich), len(poor)))
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
print(max(easy_string))
+
+
+

OUTPUT +

+
c
+
+
+

PYTHON +

+
print(max(rich, poor))
+
+
+

OUTPUT +

+
tin
+
+
+

PYTHON +

+
print(max(len(rich), len(poor)))
+
+
+

OUTPUT +

+
4
+
+

max(len(rich), poor) throws a TypeError. This turns into max(4, 'tin') and, as we discussed earlier, a string and an integer cannot meaningfully be compared.

+
+

ERROR +

+
TypeError                                 Traceback (most recent call last)
+<ipython-input-65-bc82ad05177a> in <module>
+----> 1 max(len(rich), poor)
+
+TypeError: '>' not supported between instances of 'str' and 'int'
+
+
+
+
+
+
+
+ +
+
+

Why Not?

+
+

Why is it that max and min do not return +None when they are called with no arguments?

+
+
+
+
+
+ +
+
+

max and min raise a TypeError in this case because the required arguments were not supplied. If they just returned None, the error would be much harder to trace, as the None would likely be stored in a variable and used later in the program, only to cause a runtime error there.

+
+
+
+
+
+
+ +
+
+

Last Character of a String

+
+

If Python starts counting from zero, and len returns the +number of characters in a string, what index expression will get the +last character in the string name? (Note: we will see a +simpler way to do this in a later episode.)

+
+
+
+
+
+ +
+
+

name[len(name) - 1]

+
+
+
+
+
+
+ +
+
+

Explore the Python docs!

+
+

The official Python +documentation is arguably the most complete source of information +about the language. It is available in different languages and contains +a lot of useful resources. The Built-in +Functions page contains a catalogue of all of these functions, +including the ones that we’ve covered in this lesson. Some of these are +more advanced and unnecessary at the moment, but others are very simple +and useful.

+
+
+
+
+
+ +
+
+

Key Points

+
+
    +
  • Use comments to add documentation to programs.
  • +
  • A function may take zero or more arguments.
  • +
  • Commonly-used built-in functions include max, +min, and round.
  • +
  • Functions may only work for certain (combinations of) +arguments.
  • +
  • Functions may have default values for some arguments.
  • +
  • Use the built-in function help to get help for a +function.
  • +
  • The Jupyter Notebook has two ways to get help.
  • +
  • Every function returns something.
  • +
  • Python reports a syntax error when it can’t understand the source of +a program.
  • +
  • Python reports a runtime error when something goes wrong while a +program is executing.
  • +
  • Fix syntax errors by reading the source code, and runtime errors by +tracing the program’s execution.
  • +
+
+
+
+

Content from Morning Coffee

+
+

Last updated on 2024-10-18

+

Estimated time: 0 minutes

+
+ +
+

Reflection exercise +

+
+

Over coffee, reflect on and discuss the following:

+
    +
  • What are the different kinds of errors Python will report?
  • +
  • Did the code always produce the results you expected? If not, +why?
  • +
  • Is there something we can do to prevent errors when we write +code?
  • +

Content from Libraries

+
+

Last updated on 2024-10-18

+

Estimated time: 20 minutes

+
+ +
+
+

Overview

+
+
+
+
+

Questions

+
    +
  • How can I use software that other people have written?
  • +
  • How can I find out what that software does?
  • +
+
+
+
+
+
+
+

Objectives

+
    +
  • Explain what software libraries are and why programmers create and +use them.
  • +
  • Write programs that import and use modules from Python’s standard +library.
  • +
  • Find and read documentation for the standard library interactively +(in the interpreter) and online.
  • +
+
+
+
+
+
+

Most of the power of a programming language is in its +libraries. +

+
+
    +
  • A library is a collection of files (called +modules) that contains functions for use by other programs. +
      +
    • May also contain data values (e.g., numerical constants) and other +things.
    • +
    • Library’s contents are supposed to be related, but there’s no way to +enforce that.
    • +
    +
  • +
  • The Python standard +library is an extensive suite of modules that comes with Python +itself.
  • +
  • Many additional libraries are available from PyPI (the Python Package +Index).
  • +
  • We will see later how to write new libraries.
  • +
+
+
+ +
+
+

Libraries and modules

+
+

A library is a collection of modules, but the terms are often used +interchangeably, especially since many libraries only consist of a +single module, so don’t worry if you mix them.

+
+
+
+

A program must import a library module before using it. +

+
+
    +
  • Use import to load a library module into a program’s +memory.
  • +
  • Then refer to things from the module as +module_name.thing_name. +
      +
    • Python uses . to mean “part of”.
    • +
    +
  • +
  • Using math, one of the modules in the standard +library:
  • +
+
+

PYTHON +

+
import math
+
+print('pi is', math.pi)
+print('cos(pi) is', math.cos(math.pi))
+
+
+

OUTPUT +

+
pi is 3.141592653589793
+cos(pi) is -1.0
+
+
    +
  • Have to refer to each item with the module’s name. +
      +
    • +math.cos(pi) won’t work: the reference to +pi doesn’t somehow “inherit” the function’s reference to +math.
    • +
    +
  • +

Use help to learn about the contents of a library +module. +

+
+
    +
  • Works just like help for a function.
  • +
+
+

PYTHON +

+
help(math)
+
+
+

OUTPUT +

+
Help on module math:
+
+NAME
+    math
+
+MODULE REFERENCE
+    http://docs.python.org/3/library/math
+
+    The following documentation is automatically generated from the Python
+    source files.  It may be incomplete, incorrect or include features that
+    are considered implementation detail and may vary between Python
+    implementations.  When in doubt, consult the module reference at the
+    location listed above.
+
+DESCRIPTION
+    This module is always available.  It provides access to the
+    mathematical functions defined by the C standard.
+
+FUNCTIONS
+    acos(x, /)
+        Return the arc cosine (measured in radians) of x.
+⋮ ⋮ ⋮
+
+

Import specific items from a library module to shorten +programs. +

+
+
    +
  • Use from ... import ... to load only specific items +from a library module.
  • +
  • Then refer to them directly without library name as prefix.
  • +
+
+

PYTHON +

+
from math import cos, pi
+
+print('cos(pi) is', cos(pi))
+
+
+

OUTPUT +

+
cos(pi) is -1.0
+
+

Create an alias for a library module when importing it to shorten +programs. +

+
+
    +
  • Use import ... as ... to give a library a short +alias while importing it.
  • +
  • Then refer to items in the library using that shortened name.
  • +
+
+

PYTHON +

+
import math as m
+
+print('cos(pi) is', m.cos(m.pi))
+
+
+

OUTPUT +

+
cos(pi) is -1.0
+
+
    +
  • Commonly used for libraries that are frequently used or have long +names. +
      +
    • E.g., the matplotlib plotting library is often aliased +as mpl.
    • +
    +
  • +
  • But can make programs harder to understand, since readers must learn +your program’s aliases.
  • +
+
+
+ +
+
+

Exploring the Math Module

+
+
    +
  1. What function from the math module can you use to +calculate a square root without using sqrt?
  2. +
  3. Since the library contains this function, why does sqrt +exist?
  4. +
+
+
+
+
+
+ +
+
+
    +
  1. Using help(math) we see that we’ve got +pow(x,y) in addition to sqrt(x), so we could +use pow(x, 0.5) to find a square root.

  2. +
  3. The sqrt(x) function is arguably more readable than +pow(x, 0.5) when implementing equations. Readability is a +cornerstone of good programming, so it makes sense to provide a special +function for this specific common case.

  4. +
+

Also, the design of Python’s math library has its origin +in the C standard, which includes both sqrt(x) and +pow(x,y), so a little bit of the history of programming is +showing in Python’s function names.

+
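A quick way to convince yourself that the two approaches agree is to run both on the same input. This is a minimal sketch; the value 16 and the variable names are chosen only for illustration.

```python
import math

x = 16
via_sqrt = math.sqrt(x)        # the dedicated square-root function
via_pow = math.pow(x, 0.5)     # raising to the power 0.5 gives the same result
print(via_sqrt, via_pow)
```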
+
+
+
+
+
+ +
+
+

Locating the Right Module

+
+

You want to select a random character from a string:

+
+

PYTHON +

+
bases = 'ACTTGCTTGAC'
+
+
    +
  1. Which standard +library module could help you?
  2. +
  3. Which function would you select from that module? Are there +alternatives?
  4. +
  5. Try to write a program that uses the function.
  6. +
+
+
+
+
+
+ +
+
+

The random +module seems like it could help.

+

The string has 11 characters, each having a positional index from 0 +to 10. You could use the random.randrange +or random.randint +functions to get a random integer between 0 and 10, and then select the +bases character at that index:

+
+

PYTHON +

+
from random import randrange
+
+random_index = randrange(len(bases))
+print(bases[random_index])
+
+

or more compactly:

+
+

PYTHON +

+
from random import randrange
+
+print(bases[randrange(len(bases))])
+
+

Perhaps you found the random.sample +function? It allows for slightly less typing but might be a bit harder +to understand just by reading:

+
+

PYTHON +

+
from random import sample
+
+print(sample(bases, 1)[0])
+
+

Note that this function returns a list of values. We will learn about +lists in episode 11.

+

The simplest and shortest solution is the random.choice +function that does exactly what we want:

+
+

PYTHON +

+
from random import choice
+
+print(choice(bases))
+
+
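Because the result is random, running the program twice will usually print different bases. If you need repeatable output, for example while testing, one option is to call `random.seed` first. This is a sketch; the seed value 42 is arbitrary and the variable names are hypothetical.

```python
import random

bases = 'ACTTGCTTGAC'

random.seed(42)                     # fix the starting point of the generator
first_pick = random.choice(bases)

random.seed(42)                     # the same seed gives the same sequence again
second_pick = random.choice(bases)

print(first_pick, second_pick)
```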
+
+
+
+
+
+ +
+
+

Jigsaw Puzzle (Parson’s Problem) Programming Example

+
+

Rearrange the following statements so that a random DNA base and its index in the string are printed. Not all statements may be needed. Feel free to use/add intermediate variables.

+
+

PYTHON +

+
bases="ACTTGCTTGAC"
+import math
+import random
+___ = random.randrange(n_bases)
+___ = len(bases)
+print("random base ", bases[___], "base index", ___)
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
import math 
+import random
+bases = "ACTTGCTTGAC" 
+n_bases = len(bases)
+idx = random.randrange(n_bases)
+print("random base", bases[idx], "base index", idx)
+
+
+
+
+
+
+
+ +
+
+

When Is Help Available?

+
+

When a colleague of yours types help(math), Python +reports an error:

+
+

ERROR +

+
NameError: name 'math' is not defined
+
+

What has your colleague forgotten to do?

+
+
+
+
+
+ +
+
+

Importing the math module (import math)

+
+
+
+
+
+
+ +
+
+

Importing With Aliases

+
+
    +
  1. Fill in the blanks so that the program below prints +90.0.
  2. +
  3. Rewrite the program so that it uses import +without as.
  4. +
  5. Which form do you find easier to read?
  6. +
+
+

PYTHON +

+
import math as m
+angle = ____.degrees(____.pi / 2)
+print(____)
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
import math as m
+angle = m.degrees(m.pi / 2)
+print(angle)
+
+

can be written as

+
+

PYTHON +

+
import math
+angle = math.degrees(math.pi / 2)
+print(angle)
+
+

Since you just wrote the code and are familiar with it, you might +actually find the first version easier to read. But when trying to read +a huge piece of code written by someone else, or when getting back to +your own huge piece of code after several months, non-abbreviated names +are often easier, except where there are clear abbreviation +conventions.

+
+
+
+
+
+
+ +
+
+

There Are Many Ways To Import Libraries!

+
+

Match the following print statements with the appropriate library +calls.

+

Print commands:

+
    +
  1. print("sin(pi/2) =", sin(pi/2))
  2. print("sin(pi/2) =", m.sin(m.pi/2))
  3. print("sin(pi/2) =", math.sin(math.pi/2))
+

Library calls:

+
    +
  1. from math import sin, pi
  2. import math
  3. import math as m
  4. from math import *
+
+
+
+
+
+ +
+
+
    +
  1. Library calls 1 and 4. In order to directly refer to sin and pi without the library name as prefix, you need to use the from ... import ... statement. Whereas library call 1 specifically imports the two names sin and pi, library call 4 imports all names in the math module.
  2. Library call 3. Here sin and pi are referred to with a shortened library name m instead of math. Library call 3 does exactly that using the import ... as ... syntax: it creates an alias for math in the form of the shortened name m.
  3. Library call 2. Here sin and pi are referred to with the regular library name math, so the regular import ... call suffices.
+

Note: although library call 4 works, importing all +names from a module using a wildcard import is not recommended as it makes it +unclear which names from the module are used in the code. In general it +is best to make your imports as specific as possible and to only import +what your code uses. In library call 1, the import +statement explicitly tells us that the sin function is +imported from the math module, but library call 4 does not +convey this information.

+
+
+
+
+
+
+ +
+
+

Importing Specific Items

+
+
    +
  1. Fill in the blanks so that the program below prints +90.0.
  2. +
  3. Do you find this version easier to read than preceding ones?
  4. +
  5. Why wouldn’t programmers always use this form of +import?
  6. +
+
+

PYTHON +

+
____ math import ____, ____
+angle = degrees(pi / 2)
+print(angle)
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
from math import degrees, pi
+angle = degrees(pi / 2)
+print(angle)
+
+

Most likely you find this version easier to read since it’s less dense. The main reason not to use this form of import is to avoid name clashes. For instance, you wouldn’t import degrees this way if you also wanted to use the name degrees for a variable or function of your own, or if you were also to import a function named degrees from another library.

+
+
+
+
+
+
+ +
+
+

Reading Error Messages

+
+
    +
  1. Read the code below and try to identify what the errors are without +running it.
  2. +
  3. Run the code, and read the error message. What type of error is +it?
  4. +
+
+

PYTHON +

+
from math import log
+log(0)
+
+
+
+
+
+
+ +
+
+
+

OUTPUT +

+
---------------------------------------------------------------------------
+ValueError                                Traceback (most recent call last)
+<ipython-input-1-d72e1d780bab> in <module>
+      1 from math import log
+----> 2 log(0)
+
+ValueError: math domain error
+
+
    +
  1. The logarithm of x is only defined for +x > 0, so 0 is outside the domain of the function.
  2. +
  3. You get an error of type ValueError, indicating that +the function received an inappropriate argument value. The additional +message “math domain error” makes it clearer what the problem is.
  4. +
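If a program should keep running when it meets such a value, the ValueError can be caught. The sketch below is one possible way to do that; the helper name describe_log is hypothetical, and try/except is not otherwise covered in this episode.

```python
from math import log

def describe_log(x):
    """Return log(x), or a short explanation when x is outside the domain."""
    try:
        return log(x)
    except ValueError as error:
        # math.log raises ValueError for arguments that are not positive.
        return f"cannot take log({x}): {error}"

print(describe_log(1))
print(describe_log(0))
```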
+
+
+
+
+
+
+ +
+
+

Key Points

+
+
    +
  • Most of the power of a programming language is in its +libraries.
  • +
  • A program must import a library module in order to use it.
  • +
  • Use help to learn about the contents of a library +module.
  • +
  • Import specific items from a library to shorten programs.
  • +
  • Create an alias for a library when importing it to shorten +programs.
  • +
+
+
+
+

Content from Reading Tabular Data into DataFrames

+
+

Last updated on 2024-10-18

+

Estimated time: 20 minutes

+
+ +
+
+

Overview

+
+
+
+
+

Questions

+
    +
  • How can I read tabular data?
  • +
+
+
+
+
+
+
+

Objectives

+
    +
  • Import the Pandas library.
  • +
  • Use Pandas to load a simple CSV data set.
  • +
  • Get some basic information about a Pandas DataFrame.
  • +
+
+
+
+
+
+

Use the Pandas library to do statistics on tabular data. +

+
+
    +
  • +Pandas is a widely-used +Python library for statistics, particularly on tabular data.
  • +
  • Borrows many features from R’s dataframes. +
      +
    • A 2-dimensional table whose columns have names and potentially have +different data types.
    • +
    +
  • +
  • Load Pandas with import pandas as pd. The alias +pd is commonly used to refer to the Pandas library in +code.
  • +
  • Read a Comma Separated Values (CSV) data file with +pd.read_csv. +
      +
    • Argument is the name of the file to be read.
    • +
    • Returns a dataframe that you can assign to a variable
    • +
    +
  • +
+
+

PYTHON +

+
import pandas as pd
+
+data_oceania = pd.read_csv('data/gapminder_gdp_oceania.csv')
+print(data_oceania)
+
+
+

OUTPUT +

+
       country  gdpPercap_1952  gdpPercap_1957  gdpPercap_1962  \
+0    Australia     10039.59564     10949.64959     12217.22686
+1  New Zealand     10556.57566     12247.39532     13175.67800
+
+   gdpPercap_1967  gdpPercap_1972  gdpPercap_1977  gdpPercap_1982  \
+0     14526.12465     16788.62948     18334.19751     19477.00928
+1     14463.91893     16046.03728     16233.71770     17632.41040
+
+   gdpPercap_1987  gdpPercap_1992  gdpPercap_1997  gdpPercap_2002  \
+0     21888.88903     23424.76683     26997.93657     30687.75473
+1     19007.19129     18363.32494     21050.41377     23189.80135
+
+   gdpPercap_2007
+0     34435.36744
+1     25185.00911
+
+
    +
  • The columns in a dataframe are the observed variables, and the rows +are the observations.
  • +
  • Pandas uses backslash \ to show wrapped lines when +output is too wide to fit the screen.
  • +
  • Using descriptive dataframe names helps us distinguish between +multiple dataframes so we won’t accidentally overwrite a dataframe or +read from the wrong one.
  • +
+
+
+ +
+
+

File Not Found

+
+

Our lessons store their data files in a data +sub-directory, which is why the path to the file is +data/gapminder_gdp_oceania.csv. If you forget to include +data/, or if you include it but your copy of the file is +somewhere else, you will get a runtime +error that ends with a line like this:

+
+

ERROR +

+
FileNotFoundError: [Errno 2] No such file or directory: 'data/gapminder_gdp_oceania.csv'
+
+
+
+
+
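As a small sketch of what this error looks like from inside a program, the code below tries to read a file from a path that is assumed not to exist (the directory name no_such_directory is made up) and reports the problem instead of stopping.

```python
import pandas as pd

# Hypothetical path that is assumed not to exist on your machine.
missing_path = 'no_such_directory/gapminder_gdp_oceania.csv'

try:
    data = pd.read_csv(missing_path)
except FileNotFoundError as error:
    data = None
    print('Could not read the file:', error)
```

If you see this error with the lesson data, check that data/ is a sub-directory of the directory your notebook is running in.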

Use index_col to specify that a column’s values should +be used as row headings. +

+
+
    +
  • Row headings are numbers (0 and 1 in this case).
  • +
  • Really want to index by country.
  • +
  • Pass the name of the column to read_csv as its +index_col parameter to do this.
  • +
  • Naming the dataframe data_oceania_country tells us +which region the data includes (oceania) and how it is +indexed (country).
  • +
+
+

PYTHON +

+
data_oceania_country = pd.read_csv('data/gapminder_gdp_oceania.csv', index_col='country')
+print(data_oceania_country)
+
+
+

OUTPUT +

+
             gdpPercap_1952  gdpPercap_1957  gdpPercap_1962  gdpPercap_1967  \
+country
+Australia       10039.59564     10949.64959     12217.22686     14526.12465
+New Zealand     10556.57566     12247.39532     13175.67800     14463.91893
+
+             gdpPercap_1972  gdpPercap_1977  gdpPercap_1982  gdpPercap_1987  \
+country
+Australia       16788.62948     18334.19751     19477.00928     21888.88903
+New Zealand     16046.03728     16233.71770     17632.41040     19007.19129
+
+             gdpPercap_1992  gdpPercap_1997  gdpPercap_2002  gdpPercap_2007
+country
+Australia       23424.76683     26997.93657     30687.75473     34435.36744
+New Zealand     18363.32494     21050.41377     23189.80135     25185.00911
+
+
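The effect of index_col can also be tried without the data file. The sketch below reads a two-row stand-in for the Oceania data from a string (values rounded, only two year columns kept); io.StringIO is used here only so that the example is self-contained.

```python
import io
import pandas as pd

# A two-row stand-in for the Oceania file (values rounded for brevity).
csv_text = """country,gdpPercap_1952,gdpPercap_2007
Australia,10039.6,34435.4
New Zealand,10556.6,25185.0
"""

by_position = pd.read_csv(io.StringIO(csv_text))
by_country = pd.read_csv(io.StringIO(csv_text), index_col='country')

print(list(by_position.index))    # default row headings are integers
print(list(by_country.index))     # row headings taken from the country column
print(list(by_country.columns))   # country is no longer a data column
```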

Use the DataFrame.info() method to find out more about +a dataframe. +

+
+
+

PYTHON +

+
data_oceania_country.info()
+
+
+

OUTPUT +

+
<class 'pandas.core.frame.DataFrame'>
+Index: 2 entries, Australia to New Zealand
+Data columns (total 12 columns):
+gdpPercap_1952    2 non-null float64
+gdpPercap_1957    2 non-null float64
+gdpPercap_1962    2 non-null float64
+gdpPercap_1967    2 non-null float64
+gdpPercap_1972    2 non-null float64
+gdpPercap_1977    2 non-null float64
+gdpPercap_1982    2 non-null float64
+gdpPercap_1987    2 non-null float64
+gdpPercap_1992    2 non-null float64
+gdpPercap_1997    2 non-null float64
+gdpPercap_2002    2 non-null float64
+gdpPercap_2007    2 non-null float64
+dtypes: float64(12)
+memory usage: 208.0+ bytes
+
+
    +
  • This is a DataFrame +
  • +
  • Two rows named 'Australia' and +'New Zealand' +
  • +
  • Twelve columns, each of which has two actual 64-bit floating point +values. +
      +
    • We will talk later about null values, which are used to represent +missing observations.
    • +
    +
  • +
  • Uses 208 bytes of memory.
  • +

The DataFrame.columns variable stores information about +the dataframe’s columns. +

+
+
    +
  • Note that this is data, not a method. (It doesn’t have +parentheses.) +
      +
    • Like math.pi.
    • +
    • So do not use () to try to call it.
    • +
    +
  • +
  • Called a member variable, or just member.
  • +
+
+

PYTHON +

+
print(data_oceania_country.columns)
+
+
+

OUTPUT +

+
Index(['gdpPercap_1952', 'gdpPercap_1957', 'gdpPercap_1962', 'gdpPercap_1967',
+       'gdpPercap_1972', 'gdpPercap_1977', 'gdpPercap_1982', 'gdpPercap_1987',
+       'gdpPercap_1992', 'gdpPercap_1997', 'gdpPercap_2002', 'gdpPercap_2007'],
+      dtype='object')
+
+
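To see why parentheses must not be used with columns, the following sketch builds a small dataframe by hand (rounded values, names chosen for illustration) and tries both forms.

```python
import pandas as pd

frame = pd.DataFrame(
    {'gdpPercap_1952': [10039.6, 10556.6], 'gdpPercap_2007': [34435.4, 25185.0]},
    index=['Australia', 'New Zealand'],
)

column_names = list(frame.columns)   # no parentheses: columns is data
print(column_names)

try:
    frame.columns()                  # calling it like a method fails
    call_failed = False
except TypeError as error:
    call_failed = True
    print('TypeError:', error)
```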

Use DataFrame.T to transpose a dataframe. +

+
+
    +
  • Sometimes want to treat columns as rows and vice versa.
  • +
  • Transpose (written .T) doesn’t copy the data, just +changes the program’s view of it.
  • +
  • Like columns, it is a member variable.
  • +
+
+

PYTHON +

+
print(data_oceania_country.T)
+
+
+

OUTPUT +

+
country           Australia  New Zealand
+gdpPercap_1952  10039.59564  10556.57566
+gdpPercap_1957  10949.64959  12247.39532
+gdpPercap_1962  12217.22686  13175.67800
+gdpPercap_1967  14526.12465  14463.91893
+gdpPercap_1972  16788.62948  16046.03728
+gdpPercap_1977  18334.19751  16233.71770
+gdpPercap_1982  19477.00928  17632.41040
+gdpPercap_1987  21888.88903  19007.19129
+gdpPercap_1992  23424.76683  18363.32494
+gdpPercap_1997  26997.93657  21050.41377
+gdpPercap_2002  30687.75473  23189.80135
+gdpPercap_2007  34435.36744  25185.00911
+
+
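A quick way to convince yourself of what .T does is to compare shapes and labels before and after. The sketch below uses a small hand-made dataframe with rounded values, not the lesson file.

```python
import pandas as pd

frame = pd.DataFrame(
    {
        'gdpPercap_1952': [10039.6, 10556.6],
        'gdpPercap_1957': [10949.6, 12247.4],
        'gdpPercap_1962': [12217.2, 13175.7],
    },
    index=['Australia', 'New Zealand'],
)

flipped = frame.T

print(frame.shape)             # (rows, columns) of the original
print(flipped.shape)           # the two numbers swap places
print(list(flipped.index))     # the old column names are now row labels
```

Transposing twice gives back a dataframe with the same content as the original.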

Use DataFrame.describe() to get summary statistics +about data. +

+
+

DataFrame.describe() gets the summary statistics of only +the columns that have numerical data. All other columns are ignored, +unless you use the argument include='all'.

+
+

PYTHON +

+
print(data_oceania_country.describe())
+
+
+

OUTPUT +

+
       gdpPercap_1952  gdpPercap_1957  gdpPercap_1962  gdpPercap_1967  \
+count        2.000000        2.000000        2.000000        2.000000
+mean     10298.085650    11598.522455    12696.452430    14495.021790
+std        365.560078      917.644806      677.727301       43.986086
+min      10039.595640    10949.649590    12217.226860    14463.918930
+25%      10168.840645    11274.086022    12456.839645    14479.470360
+50%      10298.085650    11598.522455    12696.452430    14495.021790
+75%      10427.330655    11922.958888    12936.065215    14510.573220
+max      10556.575660    12247.395320    13175.678000    14526.124650
+
+       gdpPercap_1972  gdpPercap_1977  gdpPercap_1982  gdpPercap_1987  \
+count         2.00000        2.000000        2.000000        2.000000
+mean      16417.33338    17283.957605    18554.709840    20448.040160
+std         525.09198     1485.263517     1304.328377     2037.668013
+min       16046.03728    16233.717700    17632.410400    19007.191290
+25%       16231.68533    16758.837652    18093.560120    19727.615725
+50%       16417.33338    17283.957605    18554.709840    20448.040160
+75%       16602.98143    17809.077557    19015.859560    21168.464595
+max       16788.62948    18334.197510    19477.009280    21888.889030
+
+       gdpPercap_1992  gdpPercap_1997  gdpPercap_2002  gdpPercap_2007
+count        2.000000        2.000000        2.000000        2.000000
+mean     20894.045885    24024.175170    26938.778040    29810.188275
+std       3578.979883     4205.533703     5301.853680     6540.991104
+min      18363.324940    21050.413770    23189.801350    25185.009110
+25%      19628.685413    22537.294470    25064.289695    27497.598692
+50%      20894.045885    24024.175170    26938.778040    29810.188275
+75%      22159.406358    25511.055870    28813.266385    32122.777857
+max      23424.766830    26997.936570    30687.754730    34435.367440
+
+
    +
  • Not particularly useful with just two records, but very helpful when +there are thousands.
  • +
+
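The difference that include='all' makes only shows up when a dataframe has non-numerical columns. The sketch below assumes a small hand-made dataframe with one text column and one numerical column (rounded values).

```python
import pandas as pd

frame = pd.DataFrame(
    {
        'continent': ['Americas', 'Americas', 'Americas'],
        'gdpPercap_2007': [12779.4, 3822.1, 9065.8],
    },
    index=['Argentina', 'Bolivia', 'Brazil'],
)

numeric_only = frame.describe()               # the text column is left out
everything = frame.describe(include='all')    # the text column gets count, unique, top, freq

print(list(numeric_only.columns))
print(list(everything.columns))
```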
+
+ +
+
+

Reading Other Data

+
+

Read the data in gapminder_gdp_americas.csv (which +should be in the same directory as +gapminder_gdp_oceania.csv) into a variable called +data_americas and display its summary statistics.

+
+
+
+
+
+ +
+
+

To read in a CSV, we use pd.read_csv and pass the +filename 'data/gapminder_gdp_americas.csv' to it. We also +once again pass the column name 'country' to the parameter +index_col in order to index by country. The summary +statistics can be displayed with the DataFrame.describe() +method.

+
+

PYTHON +

+
data_americas = pd.read_csv('data/gapminder_gdp_americas.csv', index_col='country')
+data_americas.describe()
+
+
+
+
+
+
+
+ +
+
+

Inspecting Data

+
+

After reading the data for the Americas, use +help(data_americas.head) and +help(data_americas.tail) to find out what +DataFrame.head and DataFrame.tail do.

+
    +
  1. What method call will display the first three rows of this +data?
  2. +
  3. What method call will display the last three columns of this data? +(Hint: you may need to change your view of the data.)
  4. +
+
+
+
+
+
+ +
+
+
    +
  1. We can check out the first five rows of data_americas +by executing data_americas.head() which lets us view the +beginning of the DataFrame. We can specify the number of rows we wish to +see by specifying the parameter n in our call to +data_americas.head(). To view the first three rows, +execute:
  2. +
+
+

PYTHON +

+
data_americas.head(n=3)
+
+
+

OUTPUT +

+
          continent  gdpPercap_1952  gdpPercap_1957  gdpPercap_1962  \
+country
+Argentina  Americas     5911.315053     6856.856212     7133.166023
+Bolivia    Americas     2677.326347     2127.686326     2180.972546
+Brazil     Americas     2108.944355     2487.365989     3336.585802
+
+          gdpPercap_1967  gdpPercap_1972  gdpPercap_1977  gdpPercap_1982  \
+country
+Argentina     8052.953021     9443.038526    10079.026740     8997.897412
+Bolivia       2586.886053     2980.331339     3548.097832     3156.510452
+Brazil        3429.864357     4985.711467     6660.118654     7030.835878
+
+           gdpPercap_1987  gdpPercap_1992  gdpPercap_1997  gdpPercap_2002  \
+country
+Argentina     9139.671389     9308.418710    10967.281950     8797.640716
+Bolivia       2753.691490     2961.699694     3326.143191     3413.262690
+Brazil        7807.095818     6950.283021     7957.980824     8131.212843
+
+           gdpPercap_2007
+country
+Argentina    12779.379640
+Bolivia       3822.137084
+Brazil        9065.800825
+
+
    +
  1. To check out the last three rows of data_americas, we would use the command data_americas.tail(n=3), analogous to head() used above. However, here we want to look at the last three columns, so we need to change our view and then use tail(). To do so, we create a new DataFrame in which rows and columns are switched:
  2. +
+
+

PYTHON +

+
americas_flipped = data_americas.T
+
+

We can then view the last three columns of data_americas by viewing the last three rows of americas_flipped:

+
+

PYTHON +

+
americas_flipped.tail(n=3)
+
+
+

OUTPUT +

+
country        Argentina  Bolivia   Brazil   Canada    Chile Colombia  \
+gdpPercap_1997   10967.3  3326.14  7957.98  28954.9  10118.1  6117.36
+gdpPercap_2002   8797.64  3413.26  8131.21    33329  10778.8  5755.26
+gdpPercap_2007   12779.4  3822.14   9065.8  36319.2  13171.6  7006.58
+
+country        Costa Rica     Cuba Dominican Republic  Ecuador    ...     \
+gdpPercap_1997    6677.05  5431.99             3614.1  7429.46    ...
+gdpPercap_2002    7723.45  6340.65            4563.81  5773.04    ...
+gdpPercap_2007    9645.06   8948.1            6025.37  6873.26    ...
+
+country          Mexico Nicaragua   Panama Paraguay     Peru Puerto Rico  \
+gdpPercap_1997   9767.3   2253.02  7113.69   4247.4  5838.35     16999.4
+gdpPercap_2002  10742.4   2474.55  7356.03  3783.67  5909.02     18855.6
+gdpPercap_2007  11977.6   2749.32  9809.19  4172.84  7408.91     19328.7
+
+country        Trinidad and Tobago United States  Uruguay Venezuela
+gdpPercap_1997             8792.57       35767.4  9230.24   10165.5
+gdpPercap_2002             11460.6       39097.1     7727   8605.05
+gdpPercap_2007             18008.5       42951.7  10611.5   11415.8
+
+

This shows the data that we want, but we may prefer to display three +columns instead of three rows, so we can flip it back:

+
+

PYTHON +

+
americas_flipped.tail(n=3).T    
+
+

Note: we could have done the above in a single line +of code by ‘chaining’ the commands:

+
+

PYTHON +

+
data_americas.T.tail(n=3).T
+
+
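As a check on the chained version, the sketch below compares it with selecting the last three columns by position. It uses a small hand-made dataframe with numerical columns only (rounded values) and DataFrame.iloc, which is introduced in the next episode.

```python
import pandas as pd

frame = pd.DataFrame(
    {
        'gdpPercap_1992': [9308.4, 2961.7],
        'gdpPercap_1997': [10967.3, 3326.1],
        'gdpPercap_2002': [8797.6, 3413.3],
        'gdpPercap_2007': [12779.4, 3822.1],
    },
    index=['Argentina', 'Bolivia'],
)

chained = frame.T.tail(n=3).T        # flip, take the last three rows, flip back
by_position = frame.iloc[:, -3:]     # select the last three columns directly

print(list(chained.columns))
print(list(by_position.columns))
```

Note that a dataframe that mixes text and numerical columns, such as data_americas with its continent column, may come back from the double transpose with less specific column types, so selecting by position can be the safer choice there.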
+
+
+
+
+
+ +
+
+

Reading Files in Other Directories

+
+

The data for your current project is stored in a file called +microbes.csv, which is located in a folder called +field_data. You are doing analysis in a notebook called +analysis.ipynb in a sibling folder called +thesis:

+
+

OUTPUT +

+
your_home_directory
++-- field_data/
+|   +-- microbes.csv
++-- thesis/
+    +-- analysis.ipynb
+
+

What value(s) should you pass to read_csv to read +microbes.csv in analysis.ipynb?

+
+
+
+
+
+ +
+
+

We need to specify the path to the file of interest in the call to pd.read_csv. We first need to ‘jump’ out of the folder thesis using ‘../’ and then into the folder field_data using ‘field_data/’. Then we can specify the filename microbes.csv. The result is as follows:

+
+

PYTHON +

+
data_microbes = pd.read_csv('../field_data/microbes.csv')
+
+
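To see how the relative path is resolved, the sketch below combines the notebook's directory from the exercise with the relative path using the standard library's posixpath module. No file is opened; only the path text is worked out.

```python
import posixpath

# Directory layout taken from the exercise; the notebook runs inside thesis/.
notebook_directory = 'your_home_directory/thesis'
relative_path = '../field_data/microbes.csv'

# join glues the two parts together; normpath then resolves the '..' step.
resolved = posixpath.normpath(posixpath.join(notebook_directory, relative_path))
print(resolved)
```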
+
+
+
+
+
+ +
+
+

Writing Data

+
+

As well as the read_csv function for reading data from a +file, Pandas provides a to_csv function to write dataframes +to files. Applying what you’ve learned about reading from files, write +one of your dataframes to a file called processed.csv. You +can use help to get information on how to use +to_csv.

+
+
+
+
+
+ +
+
+

In order to write the DataFrame data_americas to a file +called processed.csv, execute the following command:

+
+

PYTHON +

+
data_americas.to_csv('processed.csv')
+
+

For help on read_csv or to_csv, you could +execute, for example:

+
+

PYTHON +

+
help(data_americas.to_csv)
+help(pd.read_csv)
+
+

Note that help(to_csv) or help(pd.to_csv) throws an error! This is because to_csv is not a global Pandas function but a member function of DataFrames, so you can only call it on an instance of a DataFrame, e.g., data_americas.to_csv or data_oceania.to_csv.
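The following sketch shows the same point in code: to_csv is called on a dataframe, while read_csv is called on pd. To stay self-contained it writes to an in-memory io.StringIO buffer instead of processed.csv, and uses a small hand-made dataframe with illustrative values.

```python
import io
import pandas as pd

frame = pd.DataFrame(
    {'gdpPercap_1952': [10039.5, 10556.5], 'gdpPercap_2007': [34435.25, 25185.0]},
    index=pd.Index(['Australia', 'New Zealand'], name='country'),
)

buffer = io.StringIO()
frame.to_csv(buffer)                 # to_csv is called on the dataframe itself
buffer.seek(0)                       # rewind so the text can be read back
restored = pd.read_csv(buffer, index_col='country')

print(hasattr(pd, 'to_csv'))         # there is no top-level pd.to_csv
print(restored)
```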

+
+
+
+
+
+
+ +
+
+

Key Points

+
+
    +
  • Use the Pandas library to get basic statistics out of tabular +data.
  • +
  • Use index_col to specify that a column’s values should +be used as row headings.
  • +
  • Use DataFrame.info to find out more about a +dataframe.
  • +
  • The DataFrame.columns variable stores information about +the dataframe’s columns.
  • +
  • Use DataFrame.T to transpose a dataframe.
  • +
  • Use DataFrame.describe to get summary statistics about +data.
  • +
+
+
+
+

Content from Pandas DataFrames

+
+

Last updated on 2024-10-18

+

Estimated time: 30 minutes

+
+ +
+
+

Overview

+
+
+
+
+

Questions

+
    +
  • How can I do statistical analysis of tabular data?
  • +
+
+
+
+
+
+
+

Objectives

+
    +
  • Select individual values from a Pandas dataframe.
  • +
  • Select entire rows or entire columns from a dataframe.
  • +
  • Select a subset of both rows and columns from a dataframe in a +single operation.
  • +
  • Select a subset of a dataframe by a single Boolean criterion.
  • +
+
+
+
+
+
+

Note about Pandas DataFrames/Series +

+
+

A DataFrame is a collection of Series. The DataFrame is the way Pandas represents a table, and a Series is the data structure Pandas uses to represent a column.

+

Pandas is built on top of the NumPy library, which in practice means that most of the methods defined for NumPy arrays apply to Pandas Series/DataFrames.

+

What makes Pandas so attractive is its powerful interface for accessing individual records of the table, its proper handling of missing values, and its relational-database-style operations between DataFrames.

+

Selecting values +

+
+

To access a value at the position [i,j] of a DataFrame, we have two options, depending on what i and j mean. Remember that a DataFrame provides an index as a way to identify the rows of the table; a row, then, has a position inside the table as well as a label, which uniquely identifies its entry in the DataFrame.

+

Use DataFrame.iloc[..., ...] to select values by their +(entry) position +

+
+
    +
  • Can specify location by numerical index analogously to 2D version of +character selection in strings.
  • +
+
+

PYTHON +

+
import pandas as pd
+data = pd.read_csv('data/gapminder_gdp_europe.csv', index_col='country')
+print(data.iloc[0, 0])
+
+
+

OUTPUT +

+
1601.056136
+
+

Use DataFrame.loc[..., ...] to select values by their +(entry) label. +

+
+
    +
  • Can specify location by row and/or column name.
  • +
+
+

PYTHON +

+
print(data.loc["Albania", "gdpPercap_1952"])
+
+
+

OUTPUT +

+
1601.056136
+
+

Use : on its own to mean all columns or all rows. +

+
+
    +
  • Just like Python’s usual slicing notation.
  • +
+
+

PYTHON +

+
print(data.loc["Albania", :])
+
+
+

OUTPUT +

+
gdpPercap_1952    1601.056136
+gdpPercap_1957    1942.284244
+gdpPercap_1962    2312.888958
+gdpPercap_1967    2760.196931
+gdpPercap_1972    3313.422188
+gdpPercap_1977    3533.003910
+gdpPercap_1982    3630.880722
+gdpPercap_1987    3738.932735
+gdpPercap_1992    2497.437901
+gdpPercap_1997    3193.054604
+gdpPercap_2002    4604.211737
+gdpPercap_2007    5937.029526
+Name: Albania, dtype: float64
+
+
    +
  • Would get the same result printing data.loc["Albania"] +(without a second index).
  • +
+
+

PYTHON +

+
print(data.loc[:, "gdpPercap_1952"])
+
+
+

OUTPUT +

+
country
+Albania                    1601.056136
+Austria                    6137.076492
+Belgium                    8343.105127
+⋮ ⋮ ⋮
+Switzerland               14734.232750
+Turkey                     1969.100980
+United Kingdom             9979.508487
+Name: gdpPercap_1952, dtype: float64
+
+
    +
  • Would get the same result printing +data["gdpPercap_1952"] +
  • +
  • Also get the same result printing data.gdpPercap_1952 +(not recommended, because easily confused with . notation +for methods)
  • +
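The equivalent spellings mentioned so far can be compared directly. The sketch below assumes a small hand-made dataframe (rounded values for two countries) rather than the full European file.

```python
import pandas as pd

frame = pd.DataFrame(
    {'gdpPercap_1952': [1601.1, 6137.1], 'gdpPercap_1957': [1942.3, 8842.6]},
    index=pd.Index(['Albania', 'Austria'], name='country'),
)

# Three ways of selecting one column.
column_loc = frame.loc[:, 'gdpPercap_1952']
column_brackets = frame['gdpPercap_1952']
column_dot = frame.gdpPercap_1952          # works, but not recommended

# Two ways of selecting one row.
row_full = frame.loc['Albania', :]
row_short = frame.loc['Albania']

# Position-based and label-based selection of the same single value.
by_position = frame.iloc[0, 0]
by_label = frame.loc['Albania', 'gdpPercap_1952']
print(by_position, by_label)
```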

Select multiple columns or rows using DataFrame.loc and +a named slice. +

+
+
+

PYTHON +

+
print(data.loc['Italy':'Poland', 'gdpPercap_1962':'gdpPercap_1972'])
+
+
+

OUTPUT +

+
             gdpPercap_1962  gdpPercap_1967  gdpPercap_1972
+country
+Italy           8243.582340    10022.401310    12269.273780
+Montenegro      4649.593785     5907.850937     7778.414017
+Netherlands    12790.849560    15363.251360    18794.745670
+Norway         13450.401510    16361.876470    18965.055510
+Poland          5338.752143     6557.152776     8006.506993
+
+

In the above code, we discover that slicing using +loc is inclusive at both ends, which differs from +slicing using iloc, where slicing +indicates everything up to but not including the final index.
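The difference between the two kinds of slice is easy to miss, so here is a minimal sketch on a hand-made dataframe with four rows (rounded values). Both slices start at the first row, but they stop in different places.

```python
import pandas as pd

frame = pd.DataFrame(
    {'gdpPercap_1962': [8243.6, 4649.6, 12790.8, 13450.4]},
    index=['Italy', 'Montenegro', 'Netherlands', 'Norway'],
)

by_label = frame.loc['Italy':'Netherlands']   # named slice: end label is included
by_position = frame.iloc[0:2]                 # numerical slice: position 2 is excluded

print(list(by_label.index))
print(list(by_position.index))
```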

+

Result of slicing can be used in further operations. +

+
+
    +
  • Usually don’t just print a slice.
  • +
  • All the statistical operators that work on entire dataframes work +the same way on slices.
  • +
  • E.g., calculate max of a slice.
  • +
+
+

PYTHON +

+
print(data.loc['Italy':'Poland', 'gdpPercap_1962':'gdpPercap_1972'].max())
+
+
+

OUTPUT +

+
gdpPercap_1962    13450.40151
+gdpPercap_1967    16361.87647
+gdpPercap_1972    18965.05551
+dtype: float64
+
+
+

PYTHON +

+
print(data.loc['Italy':'Poland', 'gdpPercap_1962':'gdpPercap_1972'].min())
+
+
+

OUTPUT +

+
gdpPercap_1962    4649.593785
+gdpPercap_1967    5907.850937
+gdpPercap_1972    7778.414017
+dtype: float64
+
+

Use comparisons to select data based on value. +

+
+
    +
  • Comparison is applied element by element.
  • +
  • Returns a similarly-shaped dataframe of True and +False.
  • +
+
+

PYTHON +

+
# Use a subset of data to keep output readable.
+subset = data.loc['Italy':'Poland', 'gdpPercap_1962':'gdpPercap_1972']
+print('Subset of data:\n', subset)
+
+# Which values were greater than 10000 ?
+print('\nWhere are values large?\n', subset > 10000)
+
+
+

OUTPUT +

+
Subset of data:
+             gdpPercap_1962  gdpPercap_1967  gdpPercap_1972
+country
+Italy           8243.582340    10022.401310    12269.273780
+Montenegro      4649.593785     5907.850937     7778.414017
+Netherlands    12790.849560    15363.251360    18794.745670
+Norway         13450.401510    16361.876470    18965.055510
+Poland          5338.752143     6557.152776     8006.506993
+
+Where are values large?
+            gdpPercap_1962 gdpPercap_1967 gdpPercap_1972
+country
+Italy                False           True           True
+Montenegro           False          False          False
+Netherlands           True           True           True
+Norway                True           True           True
+Poland               False          False          False
+
+

Select values or NaN using a Boolean mask. +

+
+
    +
  • A frame full of Booleans is sometimes called a mask because +of how it can be used.
  • +
+
+

PYTHON +

+
mask = subset > 10000
+print(subset[mask])
+
+
+

OUTPUT +

+
             gdpPercap_1962  gdpPercap_1967  gdpPercap_1972
+country
+Italy                   NaN     10022.40131     12269.27378
+Montenegro              NaN             NaN             NaN
+Netherlands     12790.84956     15363.25136     18794.74567
+Norway          13450.40151     16361.87647     18965.05551
+Poland                  NaN             NaN             NaN
+
+
    +
  • Get the value where the mask is true, and NaN (Not a Number) where +it is false.
  • +
  • Useful because NaNs are ignored by operations like max, min, +average, etc.
  • +
+
+

PYTHON +

+
print(subset[subset > 10000].describe())
+
+
+

OUTPUT +

+
       gdpPercap_1962  gdpPercap_1967  gdpPercap_1972
+count        2.000000        3.000000        3.000000
+mean     13120.625535    13915.843047    16676.358320
+std        466.373656     3408.589070     3817.597015
+min      12790.849560    10022.401310    12269.273780
+25%      12955.737547    12692.826335    15532.009725
+50%      13120.625535    15363.251360    18794.745670
+75%      13285.513523    15862.563915    18879.900590
+max      13450.401510    16361.876470    18965.055510
+
+
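To see that the NaNs really are skipped, the sketch below applies a mask to a small hand-made subset (rounded values) and then counts and averages what is left in each column.

```python
import pandas as pd

subset = pd.DataFrame(
    {
        'gdpPercap_1962': [8243.6, 4649.6, 12790.8],
        'gdpPercap_1972': [12269.3, 7778.4, 18794.7],
    },
    index=['Italy', 'Montenegro', 'Netherlands'],
)

mask = subset > 10000
masked = subset[mask]          # values where the mask is True, NaN elsewhere

counts = masked.count()        # number of non-NaN values per column
means = masked.mean()          # NaNs are left out of the average

print(masked)
print(counts)
print(means)
```

The 1962 average is taken over a single value and the 1972 average over two, which is why count differs between columns in the describe() output above.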

Group By: split-apply-combine +

+
+
+
+ +
+
+

Learners often struggle here. Many do not work with financial data and concepts, so they find the example difficult to follow. The biggest problem, though, is the line that generates wealth_score; this step needs to be talked through thoroughly:
  • It uses implicit conversion between boolean and float values, which has not been covered in the course so far.
  • The axis=1 argument needs to be explained clearly.

+
+
+
+
+

Pandas’ vectorized methods and grouping operations give users a great deal of flexibility in analysing their data.

+

For instance, suppose we want a clearer view of how the European countries split according to their GDP.

+
    +
  1. We can get a first impression by splitting the countries into two groups for each year surveyed: those with a GDP higher than the European average and those with a lower GDP.
  2. +
  3. We then estimate a wealth score based on the historical values (from 1952 to 2007), counting the fraction of surveyed years in which a country was in the higher-GDP group.
  4. +
+
+

PYTHON +

+
mask_higher = data > data.mean()
+wealth_score = mask_higher.aggregate('sum', axis=1) / len(data.columns)
+print(wealth_score)
+
+
+

OUTPUT +

+
country
+Albania                   0.000000
+Austria                   1.000000
+Belgium                   1.000000
+Bosnia and Herzegovina    0.000000
+Bulgaria                  0.000000
+Croatia                   0.000000
+Czech Republic            0.500000
+Denmark                   1.000000
+Finland                   1.000000
+France                    1.000000
+Germany                   1.000000
+Greece                    0.333333
+Hungary                   0.000000
+Iceland                   1.000000
+Ireland                   0.333333
+Italy                     0.500000
+Montenegro                0.000000
+Netherlands               1.000000
+Norway                    1.000000
+Poland                    0.000000
+Portugal                  0.000000
+Romania                   0.000000
+Serbia                    0.000000
+Slovak Republic           0.000000
+Slovenia                  0.333333
+Spain                     0.333333
+Sweden                    1.000000
+Switzerland               1.000000
+Turkey                    0.000000
+United Kingdom            1.000000
+dtype: float64
+
+
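The wealth_score line combines two ideas: True counts as 1 and False as 0 when summed, and axis=1 means the sum runs across the columns of each row. The sketch below walks through this on a tiny made-up table; the country names A, B, C and all numbers are invented for illustration.

```python
import pandas as pd

data = pd.DataFrame(
    {'gdp_year1': [1.0, 4.0, 10.0], 'gdp_year2': [10.0, 50.0, 60.0]},
    index=['A', 'B', 'C'],
)

column_means = data.mean()              # one average per year: 5.0 and 40.0
mask_higher = data > column_means       # True where a country is above that year's average

print(True + True)                      # Booleans behave like 1 and 0 in arithmetic

times_above = mask_higher.sum(axis=1)   # axis=1: add up across the columns of each row
wealth_score = times_above / len(data.columns)
print(wealth_score)
```

Here mask_higher.sum(axis=1) gives the same counts as the lesson's mask_higher.aggregate('sum', axis=1).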

Finally, we group the countries by their wealth_score and, for each group, sum the GDP per capita values for each year surveyed using chained methods:

+
+

PYTHON +

+
print(data.groupby(wealth_score).sum())
+
+
+

OUTPUT +

+
          gdpPercap_1952  gdpPercap_1957  gdpPercap_1962  gdpPercap_1967  \
+0.000000    36916.854200    46110.918793    56850.065437    71324.848786
+0.333333    16790.046878    20942.456800    25744.935321    33567.667670
+0.500000    11807.544405    14505.000150    18380.449470    21421.846200
+1.000000   104317.277560   127332.008735   149989.154201   178000.350040
+
+          gdpPercap_1972  gdpPercap_1977  gdpPercap_1982  gdpPercap_1987  \
+0.000000    88569.346898   104459.358438   113553.768507   119649.599409
+0.333333    45277.839976    53860.456750    59679.634020    64436.912960
+0.500000    25377.727380    29056.145370    31914.712050    35517.678220
+1.000000   215162.343140   241143.412730   263388.781960   296825.131210
+
+          gdpPercap_1992  gdpPercap_1997  gdpPercap_2002  gdpPercap_2007
+0.000000    92380.047256   103772.937598   118590.929863   149577.357928
+0.333333    67918.093220    80876.051580   102086.795210   122803.729520
+0.500000    36310.666080    40723.538700    45564.308390    51403.028210
+1.000000   315238.235970   346930.926170   385109.939210   427850.333420
+
+
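The split-apply-combine step can be followed by hand on a tiny made-up table. In the sketch below the country names A to D and all numbers are invented; two countries share the score 0.0, so their rows are added together in the result.

```python
import pandas as pd

data = pd.DataFrame(
    {'gdp_year1': [1.0, 3.0, 4.0, 12.0], 'gdp_year2': [10.0, 20.0, 50.0, 80.0]},
    index=['A', 'B', 'C', 'D'],
)

# Fraction of years in which each country is above that year's average.
wealth_score = (data > data.mean()).sum(axis=1) / len(data.columns)

# Split the rows by score, sum each group, and combine into one table.
grouped = data.groupby(wealth_score).sum()

print(wealth_score)
print(grouped)
```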
+
+ +
+
+

Selection of Individual Values

+
+

Assume Pandas has been imported into your notebook and the Gapminder +GDP data for Europe has been loaded:

+
+

PYTHON +

+
import pandas as pd
+
+data_europe = pd.read_csv('data/gapminder_gdp_europe.csv', index_col='country')
+
+

Write an expression to find the Per Capita GDP of Serbia in 2007.

+
+
+
+
+
+ +
+
+

The selection can be done by using the labels for both the row +(“Serbia”) and the column (“gdpPercap_2007”):

+
+

PYTHON +

+
print(data_europe.loc['Serbia', 'gdpPercap_2007'])
+
+

The output is

+
+

OUTPUT +

+
9786.534714
+
+
+
+
+
+
+
+ +
+
+

Extent of Slicing

+
+
    +
  1. Do the two statements below produce the same output?
  2. +
  3. Based on this, what rule governs what is included (or not) in +numerical slices and named slices in Pandas?
  4. +
+
+

PYTHON +

+
print(data_europe.iloc[0:2, 0:2])
+print(data_europe.loc['Albania':'Belgium', 'gdpPercap_1952':'gdpPercap_1962'])
+
+
+
+
+
+
+ +
+
+

No, they do not produce the same output! The output of the first +statement is:

+
+

OUTPUT +

+
        gdpPercap_1952  gdpPercap_1957
+country
+Albania     1601.056136     1942.284244
+Austria     6137.076492     8842.598030
+
+

The second statement gives:

+
+

OUTPUT +

+
        gdpPercap_1952  gdpPercap_1957  gdpPercap_1962
+country
+Albania     1601.056136     1942.284244     2312.888958
+Austria     6137.076492     8842.598030    10750.721110
+Belgium     8343.105127     9714.960623    10991.206760
+
+

Clearly, the second statement produces an additional column and an +additional row compared to the first statement.
+What conclusion can we draw? We see that a numerical slice, 0:2, +omits the final index (i.e. index 2) in the range provided, +while a named slice, ‘gdpPercap_1952’:‘gdpPercap_1962’, +includes the final element.

+
+
+
+
+
+
+ +
+
+

Reconstructing Data

+
+

Explain what each line in the following short program does: what is +in first, second, etc.?

+
+

PYTHON +

+
first = pd.read_csv('data/gapminder_all.csv', index_col='country')
+second = first[first['continent'] == 'Americas']
+third = second.drop('Puerto Rico')
+fourth = third.drop('continent', axis = 1)
+fourth.to_csv('result.csv')
+
+
+
+
+
+
+ +
+
+

Let’s go through this piece of code line by line.

+
+

PYTHON +

+
first = pd.read_csv('data/gapminder_all.csv', index_col='country')
+
+

This line loads the dataset containing the GDP data from all +countries into a dataframe called first. The +index_col='country' parameter selects which column to use +as the row labels in the dataframe.

+
+

PYTHON +

+
second = first[first['continent'] == 'Americas']
+
+

This line makes a selection: only those rows of first +for which the ‘continent’ column matches ‘Americas’ are extracted. +Notice how the Boolean expression inside the brackets, +first['continent'] == 'Americas', is used to select only +those rows where the expression is true. Try printing this expression! +Can you print also its individual True/False elements? (hint: first +assign the expression to a variable)

+
+

PYTHON +

+
third = second.drop('Puerto Rico')
+
+

As the syntax suggests, this line drops the row from +second where the label is ‘Puerto Rico’. The resulting +dataframe third has one row less than the original +dataframe second.

+
+

PYTHON +

+
fourth = third.drop('continent', axis = 1)
+
+

Again we apply the drop function, but in this case we are dropping not a row but a whole column. To accomplish this, we also need to specify the axis parameter: axis=1 tells drop to look for the label ‘continent’ among the column labels rather than among the row labels (axis=0, the default).
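The two uses of drop can be compared side by side. The sketch below assumes a small hand-made dataframe with two rows (rounded values); it also shows the columns= keyword, which is an alternative spelling of axis=1.

```python
import pandas as pd

frame = pd.DataFrame(
    {'continent': ['Americas', 'Americas'], 'gdpPercap_2007': [12779.4, 19328.7]},
    index=['Argentina', 'Puerto Rico'],
)

without_row = frame.drop('Puerto Rico')             # default axis=0: a row label
without_column = frame.drop('continent', axis=1)    # axis=1: a column label
same_result = frame.drop(columns='continent')       # equivalent keyword form

print(list(without_row.index))
print(list(without_column.columns))
print(frame.shape)                                  # the original is left unchanged
```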

+
+

PYTHON +

+
fourth.to_csv('result.csv')
+
+

The final step is to write the data that we have been working on to a +csv file. Pandas makes this easy with the to_csv() +function. The only required argument to the function is the filename. +Note that the file will be written in the directory from which you +started the Jupyter or Python session.

+
+
+
+
+
+
+ +
+
+

Selecting Indices

+
+

Explain in simple terms what idxmin and +idxmax do in the short program below. When would you use +these methods?

+
+

PYTHON +

+
data = pd.read_csv('data/gapminder_gdp_europe.csv', index_col='country')
+print(data.idxmin())
+print(data.idxmax())
+
+
+
+
+
+
+ +
+
+

For each column in data, idxmin will return +the index value corresponding to each column’s minimum; +idxmax will do the same for each column’s +maximum value.

+

You can use these functions whenever you want to get the row index of +the minimum/maximum value and not the actual minimum/maximum value.
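As a minimal illustration, assuming a toy frame with hypothetical row labels A, B and C:

```python
import pandas as pd

toy = pd.DataFrame(
    {"gdpPercap_1952": [3.0, 1.0, 2.0], "gdpPercap_2007": [5.0, 9.0, 7.0]},
    index=["A", "B", "C"],
)

# Row label of the smallest value in each column.
lowest = toy.idxmin()

# Row label of the largest value in each column.
highest = toy.idxmax()

print(lowest)
print(highest)

# The values themselves are still available through min() and max().
print(toy.min())
```

Each result is a Series indexed by column name, so lowest['gdpPercap_1952'] answers "which row had the smallest value in 1952?".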

+
+
+
+
+
+
+ +
+
+

Practice with Selection

+
+

Assume Pandas has been imported and the Gapminder GDP data for Europe +has been loaded. Write an expression to select each of the +following:

+
    +
  1. GDP per capita for all countries in 1982.
  2. +
  3. GDP per capita for Denmark for all years.
  4. +
  5. GDP per capita for all countries for years after 1985.
  6. +
  7. GDP per capita for each country in 2007 as a multiple of GDP per +capita for that country in 1952.
  8. +
+
+
+
+
+
+ +
+
+

1:

+
+

PYTHON +

+
data['gdpPercap_1982']
+
+

2:

+
+

PYTHON +

+
data.loc['Denmark',:]
+
+

3:

+
+

PYTHON +

+
data.loc[:,'gdpPercap_1985':]
+
+

This slice does not give you an error, although no column named +gdpPercap_1985 actually exists. Because the column labels are +sorted, Pandas compares the labels as strings and starts the slice at the first +label that sorts after gdpPercap_1985 (here gdpPercap_1987). This is useful if new +columns are added to the CSV file later.
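This behaviour can be checked on a small frame. In the sketch below (toy values; the three column years are chosen so that 1985 is missing), the column labels are sorted and the slice begins at the first label that sorts after gdpPercap_1985:

```python
import pandas as pd

toy = pd.DataFrame(
    [[1.0, 2.0, 3.0]],
    index=["Denmark"],
    columns=["gdpPercap_1982", "gdpPercap_1987", "gdpPercap_1992"],
)

# Label slicing with a missing label relies on the labels being sorted.
print(toy.columns.is_monotonic_increasing)

# 'gdpPercap_1985' is not a column, yet the slice is accepted.
after_1985 = toy.loc[:, "gdpPercap_1985":]
print(after_1985)
```

If the column labels were not sorted, the same slice would be expected to raise a KeyError, so this shortcut is best used on files whose columns are in order.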

+

4:

+
+

PYTHON +

+
data['gdpPercap_2007']/data['gdpPercap_1952']
+
+
+
+
+
+
+
+ +
+
+

Many Ways of Access

+
+

There are at least two ways of accessing a value or slice of a +DataFrame: by name or index. However, there are many others. For +example, a single column or row can be accessed either as a +DataFrame or a Series object.

+

Suggest different ways of doing the following operations on a +DataFrame:

+
    +
  1. Access a single column
  2. +
  3. Access a single row
  4. +
  5. Access an individual DataFrame element
  6. +
  7. Access several columns
  8. +
  9. Access several rows
  10. +
  11. Access a subset of specific rows and columns
  12. +
  13. Access a subset of row and column ranges
  14. +
+
+
+
+
+
+ +
+
+

1. Access a single column:

+
+

PYTHON +

+
# by name
+data["col_name"]   # as a Series
+data[["col_name"]] # as a DataFrame
+
+# by name using .loc
+data.T.loc["col_name"]  # as a Series
+data.T.loc[["col_name"]].T  # as a DataFrame
+
+# Dot notation (Series)
+data.col_name
+
+# by index (iloc)
+data.iloc[:, col_index]   # as a Series
+data.iloc[:, [col_index]] # as a DataFrame
+
+# using a mask
+data.T[data.T.index == "col_name"].T
+
+

2. Access a single row:

+
+

PYTHON +

+
# by name using .loc
+data.loc["row_name"] # as a Series
+data.loc[["row_name"]] # as a DataFrame
+
+# by name
+data.T["row_name"] # as a Series
+data.T[["row_name"]].T # as a DataFrame
+
+# by index
+data.iloc[row_index]   # as a Series
+data.iloc[[row_index]]   # as a DataFrame
+
+# using mask
+data[data.index == "row_name"]
+
+

3. Access an individual DataFrame element:

+
+

PYTHON +

+
# by column/row names
+data["col_name"]["row_name"]  # as a value
+
+data[["col_name"]].loc["row_name"]  # as a Series
+data[["col_name"]].loc[["row_name"]]  # as a DataFrame
+
+data.loc["row_name"]["col_name"]  # as a value
+data.loc[["row_name"]]["col_name"]  # as a Series
+data.loc[["row_name"]][["col_name"]]  # as a DataFrame
+
+data.loc["row_name", "col_name"]  # as a value
+data.loc[["row_name"], "col_name"]  # as a Series. Preserves index. Column name is moved to `.name`.
+data.loc["row_name", ["col_name"]]  # as a Series. Row name is moved to `.name`. Sets index to column name.
+data.loc[["row_name"], ["col_name"]]  # as a DataFrame (preserves original index and column name)
+
+# by column/row names: Dot notation
+data.col_name.row_name
+
+# by column/row indices
+data.iloc[row_index, col_index] # as a value
+data.iloc[[row_index], col_index] # as a Series. Preserves index. Column name is moved to `.name`
+data.iloc[row_index, [col_index]] # as a Series. Row name is moved to `.name`. Sets index to column name.
+data.iloc[[row_index], [col_index]] # as a DataFrame (preserves original index and column name)
+
+# column name + row index
+data["col_name"][row_index]
+data.col_name[row_index]
+data["col_name"].iloc[row_index]
+
+# column index + row name
+data.iloc[:, [col_index]].loc["row_name"]  # as a Series
+data.iloc[:, [col_index]].loc[["row_name"]]  # as a DataFrame
+
+# using masks
+data[data.index == "row_name"].T[data.T.index == "col_name"].T
+
+

4. Access several columns:

+
+

PYTHON +

+
# by name
+data[["col1", "col2", "col3"]]
+data.loc[:, ["col1", "col2", "col3"]]
+
+# by index
+data.iloc[:, [col1_index, col2_index, col3_index]]
+
+

5. Access several rows

+
+

PYTHON +

+
# by name
+data.loc[["row1", "row2", "row3"]]
+
+# by index
+data.iloc[[row1_index, row2_index, row3_index]]
+
+

6. Access a subset of specific rows and columns

+
+

PYTHON +

+
# by names
+data.loc[["row1", "row2", "row3"], ["col1", "col2", "col3"]]
+
+# by indices
+data.iloc[[row1_index, row2_index, row3_index], [col1_index, col2_index, col3_index]]
+
+# column names + row indices
+data[["col1", "col2", "col3"]].iloc[[row1_index, row2_index, row3_index]]
+
+# column indices + row names
+data.iloc[:, [col1_index, col2_index, col3_index]].loc[["row1", "row2", "row3"]]
+
+

7. Access a subset of row and column ranges

+
+

PYTHON +

+
# by name
+data.loc["row1":"row2", "col1":"col2"]
+
+# by index
+data.iloc[row1_index:row2_index, col1_index:col2_index]
+
+# column names + row indices
+data.loc[:, "col1_name":"col2_name"].iloc[row1_index:row2_index]
+
+# column indices + row names
+data.iloc[:, col1_index:col2_index].loc["row1":"row2"]
+
+
+
+
+
+
+
+ +
+
+

Exploring available methods using the +dir() function

+
+

Python includes a dir() function that can be used to +display all of the available methods (functions) that are built into a +data object. In Episode 4, we used some methods with a string. But we +can see many more are available by using dir():

+
+

PYTHON +

+
my_string = 'Hello world!'   # creation of a string object 
+dir(my_string)
+
+

This command returns:

+
+

PYTHON +

+
['__add__',
+...
+'__subclasshook__',
+'capitalize',
+'casefold',
+'center',
+...
+'upper',
+'zfill']
+
+

You can use help() or Shift+Tab to +get more information about what these methods do.
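Because the full dir() listing is long, it can help to filter it. The following is one possible sketch; the variable name public_methods is our own choice:

```python
my_string = 'Hello world!'

# Names that start with an underscore are special or internal;
# keep only the ordinary, public names.
public_methods = [name for name in dir(my_string) if not name.startswith('_')]
print(public_methods)

# The docstring gives a short description of one method.
print(str.upper.__doc__)
```

The same filtering works for any object, including a dataframe.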

+

Assume Pandas has been imported and the Gapminder GDP data for Europe +has been loaded as data. Then, use dir() to +find the function that prints out the median per-capita GDP across all +European countries for each year that information is available.

+
+
+
+
+
+ +
+
+

Among many choices, dir() lists the +median() function as a possibility. Thus,

+
+

PYTHON +

+
data.median()
+
+
+
+
+
+
+
+ +
+
+

Interpretation

+
+

Poland’s borders have been stable since 1945, but changed several +times in the years before then. How would you handle this if you were +creating a table of GDP per capita for Poland for the entire twentieth +century?

+
+
+
+
+
+ +
+
+

Key Points

+
+
    +
  • Use DataFrame.iloc[..., ...] to select values by +integer location.
  • +
  • Use : on its own to mean all columns or all rows.
  • +
  • Select multiple columns or rows using DataFrame.loc and +a named slice.
  • +
  • Result of slicing can be used in further operations.
  • +
  • Use comparisons to select data based on value.
  • +
  • Select values or NaN using a Boolean mask.
  • +
+
+
+
+

Content from Plotting

+
+

Last updated on 2024-10-18

+

Estimated time: 30 minutes

+
+ +
+
+

Overview

+
+
+
+
+

Questions

+
    +
  • How can I plot my data?
  • +
  • How can I save my plot for publishing?
  • +
+
+
+
+
+
+
+

Objectives

+
    +
  • Create a time series plot showing a single data set.
  • +
  • Create a scatter plot showing relationship between two data +sets.
  • +
+
+
+
+
+
+

+matplotlib is the +most widely used scientific plotting library in Python. +

+
+
    +
  • Commonly use a sub-library called matplotlib.pyplot.
  • +
  • The Jupyter Notebook will render plots inline by default.
  • +
+
+

PYTHON +

+
import matplotlib.pyplot as plt
+
+
    +
  • Simple plots are then (fairly) simple to create.
  • +
+
+

PYTHON +

+
time = [0, 1, 2, 3]
+position = [0, 100, 200, 300]
+
+plt.plot(time, position)
+plt.xlabel('Time (hr)')
+plt.ylabel('Position (km)')
+
+
A line chart showing time (hr) relative to position (km), using the values provided in the code block above. By default, the plotted line is blue against a white background, and the axes have been scaled automatically to fit the range of the input data.
+
+ +
+
+

Display All Open Figures

+
+

In our Jupyter Notebook example, running the cell should generate the +figure directly below the code. The figure is also included in the +Notebook document for future viewing. However, other Python environments +like an interactive Python session started from a terminal or a Python +script executed via the command line require an additional command to +display the figure.

+

Instruct matplotlib to show a figure:

+
+

PYTHON +

+
plt.show()
+
+

This command can also be used within a Notebook - for instance, to +display multiple figures if several are created by a single cell.

+
+
+
+

Plot data directly from a Pandas dataframe. +

+
+
    +
  • We can also plot Pandas +dataframes.
  • +
  • Before plotting, we convert the column headings from a +string to integer data type, since they +represent numerical values, using str.replace() +to remove the gdpPercap_ prefix and then astype(int) +to convert the series of string values +(['1952', '1957', ..., '2007']) to a series of integers: +[1952, 1957, ..., 2007].
  • +
+
+

PYTHON +

+
import pandas as pd
+
+data = pd.read_csv('data/gapminder_gdp_oceania.csv', index_col='country')
+
+# Extract the year from each column name.
+# The current column names are structured as 'gdpPercap_(year)',
+# so we keep only the (year) part for clarity when plotting GDP vs. years.
+# To do this we use replace(), which here swaps the 'gdpPercap_' prefix for an empty string.
+# data.columns is a Pandas Index, so we call replace() through its vectorized .str accessor.
+
+years = data.columns.str.replace('gdpPercap_', '')
+
+# Convert year values to integers, saving results back to dataframe
+
+data.columns = years.astype(int)
+
+data.loc['Australia'].plot()
+
+
GDP plot for Australia

Select and transform data, then plot it. +

+
+
    +
  • By default, DataFrame.plot +plots with the rows as the X axis.
  • +
  • We can transpose the data in order to plot multiple series.
  • +
+
+

PYTHON +

+
data.T.plot()
+plt.ylabel('GDP per capita')
+
+
GDP plot for Australia and New Zealand

Many styles of plot are available. +

+
+
    +
  • For example, do a bar plot using a fancier style.
  • +
+
+

PYTHON +

+
plt.style.use('ggplot')
+data.T.plot(kind='bar')
+plt.ylabel('GDP per capita')
+
+
GDP barplot for Australia

Data can also be plotted by calling the matplotlib +plot function directly. +

+
+
    +
  • The command is plt.plot(x, y) +
  • +
  • The color and format of markers can also be specified as an +additional optional argument e.g., b- is a blue line, +g-- is a green dashed line.
  • +

Get Australia data from dataframe +

+
+
+

PYTHON +

+
years = data.columns
+gdp_australia = data.loc['Australia']
+
+plt.plot(years, gdp_australia, 'g--')
+
+
GDP formatted plot for Australia

Can plot many sets of data together. +

+
+
+

PYTHON +

+
# Select two countries' worth of data.
+gdp_australia = data.loc['Australia']
+gdp_nz = data.loc['New Zealand']
+
+# Plot with differently-colored markers.
+plt.plot(years, gdp_australia, 'b-', label='Australia')
+plt.plot(years, gdp_nz, 'g-', label='New Zealand')
+
+# Create legend.
+plt.legend(loc='upper left')
+plt.xlabel('Year')
+plt.ylabel('GDP per capita ($)')
+
+
+
+ +
+
+

Adding a Legend

+
+

Often when plotting multiple datasets on the same figure it is +desirable to have a legend describing the data.

+

This can be done in matplotlib in two stages:

+
    +
  • Provide a label for each dataset in the figure:
  • +
+
+

PYTHON +

+
plt.plot(years, gdp_australia, label='Australia')
+plt.plot(years, gdp_nz, label='New Zealand')
+
+
    +
  • Instruct matplotlib to create the legend.
  • +
+
+

PYTHON +

+
plt.legend()
+
+

By default matplotlib will attempt to place the legend in a suitable +position. If you would rather specify a position this can be done with +the loc= argument, e.g., to place the legend in the upper +left corner of the plot, specify loc='upper left'.

+
+
+
+
GDP formatted plot for Australia and New Zealand
    +
  • Plot a scatter plot correlating the GDP of Australia and New +Zealand
  • +
  • Use either plt.scatter or +DataFrame.plot.scatter +
  • +
+
+

PYTHON +

+
plt.scatter(gdp_australia, gdp_nz)
+
+
GDP correlation using plt.scatter
+

PYTHON +

+
data.T.plot.scatter(x = 'Australia', y = 'New Zealand')
+
+
GDP correlation using data.T.plot.scatter
+
+ +
+
+

Minima and Maxima

+
+

Fill in the blanks below to plot the minimum GDP per capita over time +for all the countries in Europe. Modify it again to plot the maximum GDP +per capita over time for Europe.

+
+

PYTHON +

+
data_europe = pd.read_csv('data/gapminder_gdp_europe.csv', index_col='country')
+data_europe.____.plot(label='min')
+data_europe.____
+plt.legend(loc='best')
+plt.xticks(rotation=90)
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
data_europe = pd.read_csv('data/gapminder_gdp_europe.csv', index_col='country')
+data_europe.min().plot(label='min')
+data_europe.max().plot(label='max')
+plt.legend(loc='best')
+plt.xticks(rotation=90)
+
+
Minima Maxima Solution
+
+
+
+
+
+
+ +
+
+

Correlations

+
+

Modify the example in the notes to create a scatter plot showing the +relationship between the minimum and maximum GDP per capita among the +countries in Asia for each year in the data set. What relationship do +you see (if any)?

+
+
+
+
+
+ +
+
+
+

PYTHON +

+
data_asia = pd.read_csv('data/gapminder_gdp_asia.csv', index_col='country')
+data_asia.describe().T.plot(kind='scatter', x='min', y='max')
+
+
Correlations Solution 1

No particular correlations can be seen between the minimum and +maximum GDP values year on year. It seems the fortunes of Asian +countries do not rise and fall together.

+
+
+
+
+
+
+ +
+
+

Correlations (continued) +

+
+

You might note that the variability in the maximum is much higher +than that of the minimum. Take a look at the maximum values and at the +countries (index labels) where the maximum and minimum occur:

+
+

PYTHON +

+
data_asia = pd.read_csv('data/gapminder_gdp_asia.csv', index_col='country')
+data_asia.max().plot()
+print(data_asia.idxmax())
+print(data_asia.idxmin())
+
+
+
+
+
+
+ +
+
+
Correlations Solution 2

It seems the variability in this value is due to a sharp drop after +1972. Some geopolitics at play perhaps? Given the dominance of oil +producing countries, maybe the Brent crude index would make an +interesting comparison? Whilst Myanmar consistently has the lowest GDP, +the highest GDP nation has varied more notably.

+
+
+
+
+
+
+ +
+
+

More Correlations

+
+

This short program creates a plot showing the correlation between GDP +and life expectancy for 2007, scaling marker size by population:

+
+

PYTHON +

+
data_all = pd.read_csv('data/gapminder_all.csv', index_col='country')
+data_all.plot(kind='scatter', x='gdpPercap_2007', y='lifeExp_2007',
+              s=data_all['pop_2007']/1e6)
+
+

Using online help and other resources, explain what each argument to +plot does.

+
+
+
+
+
+ +
+
+
More Correlations Solution

A good place to look is the documentation for the plot function - +help(data_all.plot).

+

kind - As seen already this determines the kind of plot to be +drawn.

+

x and y - A column name or index that determines what data will be +placed on the x and y axes of the plot

+

s - Details for this can be found in the documentation of +plt.scatter. A single number or one value for each data point. +Determines the size of the plotted points.

+
+
+
+
+
+
+ +
+
+

Saving your plot to a file

+
+

If you are satisfied with the plot you see you may want to save it to +a file, perhaps to include it in a publication. There is a function in +the matplotlib.pyplot module that accomplishes this: savefig. +Calling this function, e.g. with

+
+

PYTHON +

+
plt.savefig('my_figure.png')
+
+

will save the current figure to the file my_figure.png. +The file format will automatically be deduced from the file name +extension (other formats are pdf, ps, eps and svg).

+

Note that functions in plt refer to a global figure +variable and after a figure has been displayed to the screen (e.g. with +plt.show) matplotlib will make this variable refer to a new +empty figure. Therefore, make sure you call plt.savefig +before the plot is displayed to the screen, otherwise you may find a +file with an empty plot.
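A minimal sketch of the recommended order, saving first and only then showing or closing the figure. To keep it runnable anywhere, it selects the non-interactive Agg backend and writes into an in-memory buffer instead of a file; both choices are assumptions for this illustration and can be dropped in a notebook.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside Jupyter
import matplotlib.pyplot as plt

time = [0, 1, 2, 3]
position = [0, 100, 200, 300]

plt.plot(time, position)
plt.xlabel('Time (hr)')
plt.ylabel('Position (km)')

# Save while the current figure still holds the plot.
buffer = io.BytesIO()
plt.savefig(buffer, format='png')
png_bytes = buffer.getvalue()

# Only now display (plt.show()) or discard the figure.
plt.close()

print(len(png_bytes) > 0)
```

Passing a file name such as 'my_figure.png' instead of the buffer would write the same image to disk.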

+

When using dataframes, data is often generated and plotted to screen +in one line. In addition to using plt.savefig, we can save +a reference to the current figure in a local variable (with +plt.gcf) and call the savefig method +on that variable to save the figure to file.

+
+

PYTHON +

+
data.plot(kind='bar')
+fig = plt.gcf() # get current figure
+fig.savefig('my_figure.png')
+
+
+
+
+
+
+ +
+
+

Making your plots accessible

+
+

Whenever you are generating plots to go into a paper or a +presentation, there are a few things you can do to make sure that +everyone can understand your plots.

+
    +
  • Always make sure your text is large enough to read. Use the +fontsize parameter in xlabel, +ylabel, title, and legend, and tick_params +with labelsize to increase the text size of the numbers +on your axes.
  • +
  • Similarly, you should make your graph elements easy to see. Use +s to increase the size of your scatterplot markers and +linewidth to increase the sizes of your plot lines.
  • +
  • Using color (and nothing else) to distinguish between different plot +elements will make your plots unreadable to anyone who is colorblind, or +who happens to have a black-and-white office printer. For lines, the +linestyle parameter lets you use different types of lines. +For scatterplots, marker lets you change the shape of your +points. If you’re unsure about your colors, you can use Coblis +or Color Oracle to simulate what +your plots would look like to those with colorblindness.
  • +
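The suggestions above might be combined as in the following sketch. The data values, labels and sizes are made up for illustration, and the Agg backend line is only there so the example runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside Jupyter
import matplotlib.pyplot as plt

years = [1952, 1957, 1962]
series_a = [10.0, 11.0, 12.5]  # made-up values
series_b = [9.0, 10.5, 13.0]   # made-up values

# Distinguish the lines by style and marker shape, not only by color.
line_a, = plt.plot(years, series_a, linestyle='-', marker='o',
                   linewidth=3, label='Series A')
line_b, = plt.plot(years, series_b, linestyle='--', marker='s',
                   linewidth=3, label='Series B')

# Enlarge every piece of text on the plot.
plt.xlabel('Year', fontsize=14)
plt.ylabel('Value', fontsize=14)
plt.title('Two made-up series', fontsize=16)
plt.tick_params(labelsize=12)
plt.legend(fontsize=12)
```

Suitable sizes depend on where the figure will appear, so treat the numbers here as starting points rather than recommendations.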
+
+
+
+
+
+ +
+
+

Key Points

+
+
    +
  • +matplotlib is the +most widely used scientific plotting library in Python.
  • +
  • Plot data directly from a Pandas dataframe.
  • +
  • Select and transform data, then plot it.
  • +
  • Many styles of plot are available: see the Python Graph +Gallery for more options.
  • +
  • Can plot many sets of data together.
  • +
+
+
+
+

Content from Lunch

+
+

Last updated on 2024-10-18

+

Estimated time: 0 minutes

+
+ +
+

Over lunch, reflect on and discuss the following:

+
    +
  • What sort of packages might you use in Python and why would you use +them?
  • +
  • How would data need to be formatted to be used in Pandas data +frames? Would the data you have meet these requirements?
  • +
  • What limitations or problems might you run into when thinking about +how to apply what we’ve learned to your own projects or data?
  • +

Content from Lists

+
+

Last updated on 2024-10-18

+

Estimated time: 20 minutes

+
+ +
+
+

Overview

+
+
+
+
+

Questions

+
    +
  • How can I store multiple values?
  • +
+
+
+
+
+
+
+

Objectives

+
    +
  • Explain why programs need collections of values.
  • +
  • Write programs that create flat lists, index them, slice them, and +modify them through assignment and method calls.
  • +
+
+
+
+
+
+

A list stores many values in a single structure. +

+
+
    +
  • Doing calculations with a hundred variables called +pressure_001, pressure_002, etc., would be at +least as slow as doing them by hand.
  • +
  • Use a list to store many values together. +
      +
    • Contained within square brackets [...].
    • +
    • Values separated by commas ,.
    • +
    +
  • +
  • Use len to find out how many values are in a list.
  • +
+
+

PYTHON +

+
pressures = [0.273, 0.275, 0.277, 0.275, 0.276]
+print('pressures:', pressures)
+print('length:', len(pressures))
+
+
+

OUTPUT +

+
pressures: [0.273, 0.275, 0.277, 0.275, 0.276]
+length: 5
+
+

Use an item’s index to fetch it from a list. +

+
+
    +
  • Just like strings.
  • +
+
+

PYTHON +

+
print('zeroth item of pressures:', pressures[0])
+print('fourth item of pressures:', pressures[4])
+
+
+

OUTPUT +

+
zeroth item of pressures: 0.273
+fourth item of pressures: 0.276
+
+

Lists’ values can be replaced by assigning to them. +

+
+
    +
  • Use an index expression on the left of assignment to replace a +value.
  • +
+
+

PYTHON +

+
pressures[0] = 0.265
+print('pressures is now:', pressures)
+
+
+

OUTPUT +

+
pressures is now: [0.265, 0.275, 0.277, 0.275, 0.276]
+
+

Appending items to a list lengthens it. +

+
+
    +
  • Use list_name.append to add items to the end of a +list.
  • +
+
+

PYTHON +

+
primes = [2, 3, 5]
+print('primes is initially:', primes)
+primes.append(7)
+print('primes has become:', primes)
+
+
+

OUTPUT +

+
primes is initially: [2, 3, 5]
+primes has become: [2, 3, 5, 7]
+
+
    +
  • +append is a method of lists. +
      +
    • Like a function, but tied to a particular object.
    • +
    +
  • +
  • Use object_name.method_name to call methods. +
      +
    • Deliberately resembles the way we refer to things in a library.
    • +
    +
  • +
  • We will meet other methods of lists as we go along. +
      +
    • Use help(list) for a preview.
    • +
    +
  • +
  • +extend is similar to append, but it allows +you to combine two lists. For example:
  • +
+
+

PYTHON +

+
teen_primes = [11, 13, 17, 19]
+middle_aged_primes = [37, 41, 43, 47]
+print('primes is currently:', primes)
+primes.extend(teen_primes)
+print('primes has now become:', primes)
+primes.append(middle_aged_primes)
+print('primes has finally become:', primes)
+
+
+

OUTPUT +

+
primes is currently: [2, 3, 5, 7]
+primes has now become: [2, 3, 5, 7, 11, 13, 17, 19]
+primes has finally become: [2, 3, 5, 7, 11, 13, 17, 19, [37, 41, 43, 47]]
+
+

Note that while extend maintains the “flat” structure of +the list, appending a list to a list means the last element in +primes will itself be a list, not an integer. Lists can +contain values of any type; therefore, lists of lists are possible.
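The difference shows up clearly in the lengths of the resulting lists. A small sketch with made-up list names:

```python
flat = [2, 3, 5]
flat.extend([7, 11])     # adds each item of the argument separately

nested = [2, 3, 5]
nested.append([7, 11])   # adds the whole list as one item

print('flat:', flat, 'length:', len(flat))
print('nested:', nested, 'length:', len(nested))
print('last item of nested:', nested[-1])

# The + operator also combines lists, but builds a new list instead.
combined = [2, 3, 5] + [7, 11]
print('combined:', combined)
```

Here extend and + give a flat list of five integers, while append produces four items, the last of which is itself a list.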

+

Use del to remove items from a list entirely. +

+
+
    +
  • We use del list_name[index] to remove an element from a +list (in the example, 9 is not a prime number) and thus shorten it.
  • +
  • +del is not a function or a method, but a statement in +the language.
  • +
+
+

PYTHON +

+
primes = [2, 3, 5, 7, 9]
+print('primes before removing last item:', primes)
+del primes[4]
+print('primes after removing last item:', primes)
+
+
+

OUTPUT +

+
primes before removing last item: [2, 3, 5, 7, 9]
+primes after removing last item: [2, 3, 5, 7]
+
+

The empty list contains no values. +

+
+
    +
  • Use [] on its own to represent a list that doesn’t +contain any values. +
      +
    • “The zero of lists.”
    • +
    +
  • +
  • Helpful as a starting point for collecting values (which we will see +in the next episode).
  • +
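As a brief preview of that use, here is a sketch that starts from an empty list and collects values into it. It relies on a for loop, which is introduced in the next episode, and the variable names are our own:

```python
squares = []                 # start with nothing collected
print('initially:', squares, 'length:', len(squares))

for number in [1, 2, 3]:
    squares.append(number * number)   # add one value per pass

print('finally:', squares)
```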

Lists may contain values of different types. +

+
+
    +
  • A single list may contain numbers, strings, and anything else.
  • +
+
+

PYTHON +

+
goals = [1, 'Create lists.', 2, 'Extract items from lists.', 3, 'Modify lists.']
+
+

Character strings can be indexed like lists. +

+
+
    +
  • Get single characters from a character string using indexes in +square brackets.
  • +
+
+

PYTHON +

+
element = 'carbon'
+print('zeroth character:', element[0])
+print('third character:', element[3])
+
+
+

OUTPUT +

+
zeroth character: c
+third character: b
+
+

Character strings are immutable. +

+
+
    +
  • Cannot change the characters in a string after it has been created. +
      +
    • +Immutable: can’t be changed after creation.
    • +
    • In contrast, lists are mutable: they can be modified in +place.
    • +
    +
  • +
  • Python considers the string to be a single value with parts, not a +collection of values.
  • +
+
+

PYTHON +

+
element[0] = 'C'
+
+
+

ERROR +

+
TypeError: 'str' object does not support item assignment
+
+
    +
  • Lists and character strings are both collections.
  • +
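Since a string cannot be modified in place, the usual approach is to build a new string. Two possible ways are sketched below; the variable names are illustrative:

```python
element = 'carbon'

# Option 1: combine a new first character with a slice of the old string.
capitalized = 'C' + element[1:]
print(capitalized)

# Option 2: convert to a (mutable) list, change it, and join it back.
letters = list(element)
letters[0] = 'C'
rebuilt = ''.join(letters)
print(rebuilt)

# The original string is unchanged in both cases.
print(element)
```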

Indexing beyond the end of the collection is an error. +

+
+
    +
  • Python reports an IndexError if we attempt to access a +value that doesn’t exist. +
      +
    • This is a kind of runtime error.
    • +
    • Cannot be detected as the code is parsed because the index might be +calculated based on data.
    • +
    +
  • +
+
+

PYTHON +

+
print('99th element of element is:', element[99])
+
+
+

ERROR +

+
IndexError: string index out of range
+
+
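Because this is a runtime error, a program can choose to catch it. The sketch below uses try and except, which this lesson has not introduced, so treat it as an optional preview; the names position, character and message are assumptions:

```python
element = 'carbon'
position = 99

try:
    character = element[position]
except IndexError as error:
    # Reached only when the index does not exist.
    character = None
    message = str(error)
    print('could not fetch item:', message)

# Checking the length first avoids the error altogether.
print('valid index?', position < len(element))
```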
+
+ +
+
+

Fill in the Blanks

+
+

Fill in the blanks so that the program below produces the output +shown.

+
+

PYTHON +

+
values = ____
+values.____(1)
+values.____(3)
+values.____(5)
+print('first time:', values)
+values = values[____]
+print('second time:', values)
+
+
+

OUTPUT +

+
first time: [1, 3, 5]
+second time: [3, 5]
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
values = []
+values.append(1)
+values.append(3)
+values.append(5)
+print('first time:', values)
+values = values[1:]
+print('second time:', values)
+
+
+
+
+
+
+
+ +
+
+

How Large is a Slice?

+
+

If start and stop are both non-negative +integers, how long is the list values[start:stop]?

+
+
+
+
+
+ +
+
+

The list values[start:stop] has up to +stop - start elements. For example, +values[1:4] has the 3 elements values[1], +values[2], and values[3]. Why ‘up to’? As we +saw in episode 2, if stop +is greater than the total length of the list values, we +will still get a list back but it will be shorter than expected.
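A few quick checks of this rule on a made-up five-item list:

```python
values = [10, 20, 30, 40, 50]

# Both bounds inside the list: exactly stop - start items.
print(len(values[1:4]))

# stop beyond the end: the slice is cut off at the last item.
print(len(values[3:99]))

# start at or after stop: the slice is empty.
print(len(values[4:2]))
```

The three lengths are 3, 2 and 0 respectively.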

+
+
+
+
+
+
+ +
+
+

From Strings to Lists and Back

+
+

Given this:

+
+

PYTHON +

+
print('string to list:', list('tin'))
+print('list to string:', ''.join(['g', 'o', 'l', 'd']))
+
+
+

OUTPUT +

+
string to list: ['t', 'i', 'n']
+list to string: gold
+
+
    +
  1. What does list('some string') do?
  2. +
  3. What does '-'.join(['x', 'y', 'z']) generate?
  4. +
+
+
+
+
+
+ +
+
+
    +
  1. +list('some string') +converts a string into a list containing all of its characters.
  2. +
  3. +join +returns a string that is the concatenation of each string +element in the list, with the separator inserted between adjacent elements. +This results in x-y-z. The separator is the string on which the +method is called (here '-').
  4. +
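The two conversions can be tried together. This sketch also uses str.split, which is not part of the exercise but is the standard counterpart of join:

```python
# String to list of characters, and back again.
letters = list('tin')
back = ''.join(letters)
print(letters, back)

# The string before .join is placed between the items.
dashed = '-'.join(['x', 'y', 'z'])
print(dashed)

# split reverses join when given the same separator.
parts = dashed.split('-')
print(parts)
```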
+
+
+
+
+
+
+ +
+
+

Working With the End

+
+

What does the following program print?

+
+

PYTHON +

+
element = 'helium'
+print(element[-1])
+
+
    +
  1. How does Python interpret a negative index?
  2. +
  3. If a list or string has N elements, what is the most negative index +that can safely be used with it, and what location does that index +represent?
  4. +
  5. If values is a list, what does +del values[-1] do?
  6. +
  7. How can you display all elements but the last one without changing +values? (Hint: you will need to combine slicing and +negative indexing.)
  8. +
+
+
+
+
+
+ +
+
+

The program prints m.

+
    +
  1. Python interprets a negative index as starting from the end (as +opposed to starting from the beginning). The last element is +-1.
  2. +
  3. The most negative index that can safely be used with a list of N elements is +-N, which represents the first element.
  4. +
  5. +del values[-1] removes the last element from the +list.
  6. +
  7. values[:-1]
  8. +
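These answers can be verified on a small example list (the contents are arbitrary):

```python
values = ['a', 'b', 'c', 'd']
n = len(values)

print(values[-1])     # last element
print(values[-n])     # first element: the most negative safe index

# All elements but the last, without changing values.
print(values[:-1])
print(values)

# del with a negative index removes from the end.
del values[-1]
print(values)
```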
+
+
+
+
+
+
+ +
+
+

Stepping Through a List

+
+

What does the following program print?

+
+

PYTHON +

+
element = 'fluorine'
+print(element[::2])
+print(element[::-1])
+
+
    +
  1. If we write a slice as low:high:stride, what does +stride do?
  2. +
  3. What expression would select all of the even-numbered items from a +collection?
  4. +
+
+
+
+
+
+ +
+
+

The program prints

+
+

OUTPUT +

+
furn
+eniroulf
+
+
    +
  1. +stride is the step size of the slice.
  2. +
  3. The slice 1::2 selects all even-numbered items from a +collection: it starts with element 1 (which is the second +element, since indexing starts at 0), goes on until the end +(since no end is given), and uses a step size of +2 (i.e., selects every second element).
  4. +
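A few more slices of the same string show how the three parts of low:high:stride interact:

```python
element = 'fluorine'

every_other_from_start = element[::2]    # positions 0, 2, 4, 6
every_other_from_second = element[1::2]  # positions 1, 3, 5, 7
every_third_up_to_six = element[0:6:3]   # positions 0 and 3

print(every_other_from_start)
print(every_other_from_second)
print(every_third_up_to_six)
```

A negative stride walks backwards, which is why element[::-1] reverses the string.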
+
+
+
+
+
+
+ +
+
+

Slice Bounds

+
+

What does the following program print?

+
+

PYTHON +

+
element = 'lithium'
+print(element[0:20])
+print(element[-1:3])
+
+
+
+
+
+
+ +
+
+
+

OUTPUT +

+
lithium
+
+

The first statement prints the whole string, since the slice goes +beyond the total length of the string. The second statement returns an +empty string, because the slice starts at the last character (index -1, i.e. index 6) +and stops at index 3, which lies before the start, so no characters are selected.
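Swapping or dropping the bounds shows which part of each slice matters. A short sketch using the same string:

```python
element = 'lithium'

print(repr(element[0:20]))   # stop is clipped to the end of the string
print(repr(element[-1:3]))   # start (index 6) is after stop (index 3)
print(repr(element[-1:]))    # from the last character to the end
print(repr(element[3:-1]))   # from index 3 up to, not including, the last
```

repr is used so that the empty string is visible as '' in the output.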

+
+
+
+
+
+
+ +
+
+

Sort and Sorted

+
+

What do these two programs print? In simple terms, explain the +difference between sorted(letters) and +letters.sort().

+
+

PYTHON +

+
# Program A
+letters = list('gold')
+result = sorted(letters)
+print('letters is', letters, 'and result is', result)
+
+
+

PYTHON +

+
# Program B
+letters = list('gold')
+result = letters.sort()
+print('letters is', letters, 'and result is', result)
+
+
+
+
+
+
+ +
+
+

Program A prints

+
+

OUTPUT +

+
letters is ['g', 'o', 'l', 'd'] and result is ['d', 'g', 'l', 'o']
+
+

Program B prints

+
+

OUTPUT +

+
letters is ['d', 'g', 'l', 'o'] and result is None
+
+

sorted(letters) returns a sorted copy of the list +letters (the original list letters remains +unchanged), while letters.sort() sorts the list +letters in-place and does not return anything.
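Two further details, sketched here as optional extras: sorted accepts any collection (including a string), and both forms take a reverse argument.

```python
letters = list('gold')

# sorted works directly on a string and always returns a list.
from_string = sorted('gold')
print(from_string)

# reverse=True sorts in descending order; letters is untouched.
descending = sorted(letters, reverse=True)
print(descending)
print(letters)

# The in-place method changes letters and returns None.
outcome = letters.sort(reverse=True)
print(letters, outcome)
```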

+
+
+
+
+
+
+ +
+
+

Copying (or Not)

+
+

What do these two programs print? In simple terms, explain the +difference between new = old and +new = old[:].

+
+

PYTHON +

+
# Program A
+old = list('gold')
+new = old      # simple assignment
+new[0] = 'D'
+print('new is', new, 'and old is', old)
+
+
+

PYTHON +

+
# Program B
+old = list('gold')
+new = old[:]   # assigning a slice
+new[0] = 'D'
+print('new is', new, 'and old is', old)
+
+
+
+
+
+
+ +
+
+

Program A prints

+
+

OUTPUT +

+
new is ['D', 'o', 'l', 'd'] and old is ['D', 'o', 'l', 'd']
+
+

Program B prints

+
+

OUTPUT +

+
new is ['D', 'o', 'l', 'd'] and old is ['g', 'o', 'l', 'd']
+
+

new = old makes new a reference to the list +old; new and old point towards +the same object.

+

new = old[:] however creates a new list object +new containing all elements from the list old; +new and old are different objects.
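The is operator reports whether two names refer to the same object, which makes the difference easy to test. In this sketch the names alias and duplicate are our own:

```python
old = list('gold')

alias = old          # a second name for the same list
duplicate = old[:]   # a new list with the same items

print(alias is old)        # same object
print(duplicate is old)    # different object
print(duplicate == old)    # but equal contents

# list(old) and old.copy() make the same kind of copy as old[:].
another = old.copy()
another[0] = 'D'
print(old, another)
```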

+
+
+
+
+
+
+ +
+
+

Key Points

+
+
    +
  • A list stores many values in a single structure.
  • +
  • Use an item’s index to fetch it from a list.
  • +
  • Lists’ values can be replaced by assigning to them.
  • +
  • Appending items to a list lengthens it.
  • +
  • Use del to remove items from a list entirely.
  • +
  • The empty list contains no values.
  • +
  • Lists may contain values of different types.
  • +
  • Character strings can be indexed like lists.
  • +
  • Character strings are immutable.
  • +
  • Indexing beyond the end of the collection is an error.
  • +
+
+
+
+

Content from For Loops

+
+

Last updated on 2024-10-18

+

Estimated time: 25 minutes

+
+ +
+
+

Overview

+
+
+
+
+

Questions

+
    +
  • How can I make a program do many things?
  • +
+
+
+
+
+
+
+

Objectives

+
    +
  • Explain what for loops are normally used for.
  • +
  • Trace the execution of a simple (unnested) loop and correctly state +the values of variables in each iteration.
  • +
  • Write for loops that use the Accumulator pattern to aggregate +values.
  • +
+
+
+
+
+
+

A for loop executes commands once for each value in a +collection. +

+
+
    +
  • Doing calculations on the values in a list one by one is as painful +as working with pressure_001, pressure_002, +etc.
  • +
  • A for loop tells Python to execute some statements once for +each value in a list, a character string, or some other collection.
  • +
  • “for each thing in this group, do these operations”
  • +
+
+

PYTHON +

+
for number in [2, 3, 5]:
+    print(number)
+
+
    +
  • This for loop is equivalent to:
  • +
+
+

PYTHON +

+
print(2)
+print(3)
+print(5)
+
+
    +
  • And the for loop’s output is:
  • +
+
+

OUTPUT +

+
2
+3
+5
+
+
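
A for loop works the same way on a character string, visiting one character at a time. A small sketch, with variable names chosen only for illustration:

```python
word = 'tin'
seen = []                 # collects each value the loop variable takes
for char in word:
    print(char)
    seen.append(char)
print(seen)
```

The list `seen` ends up holding the characters in the order the loop visited them.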

A for loop is made up of a collection, a loop variable, +and a body. +

+
+
+

PYTHON +

+
for number in [2, 3, 5]:
+    print(number)
+
+
    +
  • The collection, [2, 3, 5], is what the loop is being +run on.
  • +
  • The body, print(number), specifies what to do for each +value in the collection.
  • +
  • The loop variable, number, is what changes for each +iteration of the loop. +
      +
    • The “current thing”.
    • +
    +
  • +

The first line of the for loop must end with a colon, +and the body must be indented. +

+
+
    +
  • The colon at the end of the first line signals the start of a +block of statements.
  • +
  • Python uses indentation rather than {} or +begin/end to show nesting. +
      +
    • Any consistent indentation is legal, but almost everyone uses four +spaces.
    • +
    +
  • +
+
+

PYTHON +

+
for number in [2, 3, 5]:
+print(number)
+
+
+

ERROR +

+
IndentationError: expected an indented block
+
+
    +
  • Indentation is always meaningful in Python.
  • +
+
+

PYTHON +

+
firstName = "Jon"
+  lastName = "Smith"
+
+
+

ERROR +

+
  File "<ipython-input-7-f65f2962bf9c>", line 2
+    lastName = "Smith"
+    ^
+IndentationError: unexpected indent
+
+
    +
  • This error can be fixed by removing the extra spaces at the +beginning of the second line.
  • +

Loop variables can be called anything. +

+
+
    +
  • As with all variables, loop variables are: +
      +
    • Created on demand.
    • +
    • Meaningless: their names can be anything at all.
    • +
    +
  • +
+
+

PYTHON +

+
for kitten in [2, 3, 5]:
+    print(kitten)
+
+

The body of a loop can contain many statements. +

+
+
    +
  • But no loop should be more than a few lines long.
  • +
  • Hard for human beings to keep larger chunks of code in mind.
  • +
+
+

PYTHON +

+
primes = [2, 3, 5]
+for p in primes:
+    squared = p ** 2
+    cubed = p ** 3
+    print(p, squared, cubed)
+
+
+

OUTPUT +

+
2 4 8
+3 9 27
+5 25 125
+
+

Use range to iterate over a sequence of numbers. +

+
+
    +
  • The built-in function range +produces a sequence of numbers. +
      +
    • +Not a list: the numbers are produced on demand to make +looping over large ranges more efficient.
    • +
    +
  • +
  • +range(N) is the numbers 0..N-1 +
      +
    • Exactly the legal indices of a list or character string of length +N
    • +
    +
  • +
+
+

PYTHON +

+
print('a range is not a list: range(0, 3)')
+for number in range(0, 3):
+    print(number)
+
+
+

OUTPUT +

+
a range is not a list: range(0, 3)
+0
+1
+2
+
+
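
To see the numbers a range will produce, it can be converted to a list. The sketch below (the names `word`, `indices`, and `letters` are illustrative) also shows that `range(len(word))` yields exactly the legal indices of `word`.

```python
word = 'tin'
indices = range(len(word))

print(indices)            # the range object itself, not its numbers
print(list(indices))      # converting to a list produces the numbers

letters = ''
for i in indices:
    letters = letters + word[i]   # every i is a legal index into word
print(letters)
```

Converting to a list is handy for inspection, but a loop can use the range directly without building the list first.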

The Accumulator pattern turns many values into one. +

+
+
    +
  • A common pattern in programs is to: +
      +
    1. Initialize an accumulator variable to zero, the empty +string, or the empty list.
    2. +
    3. Update the variable with values from a collection.
    4. +
    +
  • +
+
+

PYTHON +

+
# Sum the first 10 integers.
+total = 0
+for number in range(10):
+   total = total + (number + 1)
+print(total)
+
+
+

OUTPUT +

+
55
+
+
    +
  • Read total = total + (number + 1) as: +
      +
    • Add 1 to the current value of the loop variable +number.
    • +
    • Add that to the current value of the accumulator variable +total.
    • +
    • Assign that to total, replacing the current value.
    • +
    +
  • +
  • We have to add number + 1 because range +produces 0..9, not 1..10.
  • +
+
+
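
An alternative sketch avoids the `+ 1` by giving `range` an explicit start and stop (the stop value itself is excluded), and checks the result against the built-in `sum`:

```python
# Sum the first 10 integers, starting the range at 1.
total = 0
for number in range(1, 11):     # produces 1, 2, ..., 10
    total = total + number
print(total)
print(total == sum(range(1, 11)))
```

Both versions compute the same total; which one reads more clearly is a matter of taste.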
+ +
+
+

Classifying Errors

+
+

Is an indentation error a syntax error or a runtime error?

+
+
+
+
+
+ +
+
+

An IndentationError is a syntax error. Programs with syntax errors cannot be started. A program with a runtime error will start, but an error will be raised under certain conditions.

+
+
+
+
+
+
+ +
+
+

Tracing Execution

+
+

Create a table showing the numbers of the lines that are executed +when this program runs, and the values of the variables after each line +is executed.

+
+

PYTHON +

+
total = 0
+for char in "tin":
+    total = total + 1
+
+
+
+
+
+
+ +
+
Line no | Variables
1 | total = 0
2 | total = 0, char = 't'
3 | total = 1, char = 't'
2 | total = 1, char = 'i'
3 | total = 2, char = 'i'
2 | total = 2, char = 'n'
3 | total = 3, char = 'n'
+
+
+
+
+
+
+ +
+
+

Reversing a String

+
+

Fill in the blanks in the program below so that it prints “nit” (the +reverse of the original character string “tin”).

+
+

PYTHON +

+
original = "tin"
+result = ____
+for char in original:
+    result = ____
+print(result)
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
original = "tin"
+result = ""
+for char in original:
+    result = char + result
+print(result)
+
+
+
+
+
+
+
+ +
+
+

Practice Accumulating

+
+

Fill in the blanks in each of the programs below to produce the +indicated result.

+
+

PYTHON +

+
# Total length of the strings in the list: ["red", "green", "blue"] => 12
+total = 0
+for word in ["red", "green", "blue"]:
+    ____ = ____ + len(word)
+print(total)
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
total = 0
+for word in ["red", "green", "blue"]:
+    total = total + len(word)
+print(total)
+
+
+
+
+
+
+
+ +
+
+

Practice Accumulating +(continued) +

+
+
+

PYTHON +

+
# List of word lengths: ["red", "green", "blue"] => [3, 5, 4]
+lengths = ____
+for word in ["red", "green", "blue"]:
+    lengths.____(____)
+print(lengths)
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
lengths = []
+for word in ["red", "green", "blue"]:
+    lengths.append(len(word))
+print(lengths)
+
+
+
+
+
+
+
+ +
+
+

Practice Accumulating +(continued) +

+
+
+

PYTHON +

+
# Concatenate all words: ["red", "green", "blue"] => "redgreenblue"
+words = ["red", "green", "blue"]
+result = ____
+for ____ in ____:
+    ____
+print(result)
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
words = ["red", "green", "blue"]
+result = ""
+for word in words:
+    result = result + word
+print(result)
+
+
+
+
+
+
+
+ +
+
+

Practice Accumulating +(continued) +

+
+

Create an acronym: Starting from the list +["red", "green", "blue"], create the acronym +"RGB" using a for loop.

+

Hint: You may need to use a string method to +properly format the acronym.

+
+
+
+
+
+ +
+
+
+

PYTHON +

+
acronym = ""
+for word in ["red", "green", "blue"]:
+    acronym = acronym + word[0].upper()
+print(acronym)
+
+
+
+
+
+
+
+ +
+
+

Cumulative Sum

+
+

Reorder and properly indent the lines of code below so that they +print a list with the cumulative sum of data. The result should be +[1, 3, 5, 10].

+
+

PYTHON +

+
cumulative.append(total)
+for number in data:
+cumulative = []
+total = total + number
+total = 0
+print(cumulative)
+data = [1,2,2,5]
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
total = 0
+data = [1,2,2,5]
+cumulative = []
+for number in data:
+    total = total + number
+    cumulative.append(total)
+print(cumulative)
+
+
+
+
+
+
+
+ +
+
+

Identifying Variable Name Errors

+
+
    +
  1. Read the code below and try to identify what the errors are +without running it.
  2. +
  3. Run the code and read the error message. What type of +NameError do you think this is? Is it a string with no +quotes, a misspelled variable, or a variable that should have been +defined but was not?
  4. +
  5. Fix the error.
  6. +
  7. Repeat steps 2 and 3, until you have fixed all the errors.
  8. +
+
+

PYTHON +

+
for number in range(10):
+    # use a if the number is a multiple of 3, otherwise use b
+    if (Number % 3) == 0:
+        message = message + a
+    else:
+        message = message + "b"
+print(message)
+
+
+
+
+
+
+ +
+
+
    +
  • Python variable names are case sensitive: number and +Number refer to different variables.
  • +
  • The variable message needs to be initialized as an +empty string.
  • +
  • We want to add the string "a" to message, +not the undefined variable a.
  • +
+
+

PYTHON +

+
message = ""
+for number in range(10):
+    # use a if the number is a multiple of 3, otherwise use b
+    if (number % 3) == 0:
+        message = message + "a"
+    else:
+        message = message + "b"
+print(message)
+
+
+
+
+
+
+
+ +
+
+

Identifying Item Errors

+
+
    +
  1. Read the code below and try to identify what the errors are +without running it.
  2. +
  3. Run the code, and read the error message. What type of error is +it?
  4. +
  5. Fix the error.
  6. +
+
+

PYTHON +

+
seasons = ['Spring', 'Summer', 'Fall', 'Winter']
+print('My favorite season is ', seasons[4])
+
+
+
+
+
+
+ +
+
+

This list has 4 elements and the index to access the last element in +the list is 3.

+
+

PYTHON +

+
seasons = ['Spring', 'Summer', 'Fall', 'Winter']
+print('My favorite season is ', seasons[3])
+
+
+
+
+
+
+
+ +
+
+

Key Points

+
+
    +
  • A for loop executes commands once for each value in a +collection.
  • +
  • A for loop is made up of a collection, a loop variable, +and a body.
  • +
  • The first line of the for loop must end with a colon, +and the body must be indented.
  • +
  • Indentation is always meaningful in Python.
  • +
  • Loop variables can be called anything (but it is strongly advised to give the loop variable a meaningful name).
  • +
  • The body of a loop can contain many statements.
  • +
  • Use range to iterate over a sequence of numbers.
  • +
  • The Accumulator pattern turns many values into one.
  • +
+
+
+
+

Content from Conditionals

+
+

Last updated on 2024-10-18

+

Estimated time: 25 minutes

+
+ +
+
+

Overview

+
+
+
+
+

Questions

+
    +
  • How can programs do different things for different data?
  • +
+
+
+
+
+
+
+

Objectives

+
    +
  • Correctly write programs that use if and else statements and simple +Boolean expressions (without logical operators).
  • +
  • Trace the execution of unnested conditionals and conditionals inside +loops.
  • +
+
+
+
+
+
+

Use if statements to control whether or not a block of +code is executed. +

+
+
    +
  • An if statement (more properly called a +conditional statement) controls whether some block of code is +executed or not.
  • +
  • Structure is similar to a for statement: +
      +
    • First line opens with if and ends with a colon
    • +
    • Body containing one or more statements is indented (usually by 4 +spaces)
    • +
    +
  • +
+
+

PYTHON +

+
mass = 3.54
+if mass > 3.0:
+    print(mass, 'is large')
+
+mass = 2.07
+if mass > 3.0:
+    print(mass, 'is large')
+
+
+

OUTPUT +

+
3.54 is large
+
+

Conditionals are often used inside loops. +

+
+
    +
  • Not much point using a conditional when we know the value (as +above).
  • +
  • But useful when we have a collection to process.
  • +
+
+

PYTHON +

+
masses = [3.54, 2.07, 9.22, 1.86, 1.71]
+for m in masses:
+    if m > 3.0:
+        print(m, 'is large')
+
+
+

OUTPUT +

+
3.54 is large
+9.22 is large
+
+

Use else to execute a block of code when an +if condition is not true. +

+
+
    +
  • +else can be used following an if.
  • +
  • Allows us to specify an alternative to execute when the +if branch isn’t taken.
  • +
+
+

PYTHON +

+
masses = [3.54, 2.07, 9.22, 1.86, 1.71]
+for m in masses:
+    if m > 3.0:
+        print(m, 'is large')
+    else:
+        print(m, 'is small')
+
+
+

OUTPUT +

+
3.54 is large
+2.07 is small
+9.22 is large
+1.86 is small
+1.71 is small
+
+

Use elif to specify additional tests. +

+
+
    +
  • May want to provide several alternative choices, each with its own +test.
  • +
  • Use elif (short for “else if”) and a condition to +specify these.
  • +
  • Always associated with an if.
  • +
  • Must come before the else (which is the “catch +all”).
  • +
+
+

PYTHON +

+
masses = [3.54, 2.07, 9.22, 1.86, 1.71]
+for m in masses:
+    if m > 9.0:
+        print(m, 'is HUGE')
+    elif m > 3.0:
+        print(m, 'is large')
+    else:
+        print(m, 'is small')
+
+
+

OUTPUT +

+
3.54 is large
+2.07 is small
+9.22 is HUGE
+1.86 is small
+1.71 is small
+
+

Conditions are tested once, in order. +

+
+
    +
  • Python steps through the branches of the conditional in order, +testing each in turn.
  • +
  • So ordering matters.
  • +
+
+

PYTHON +

+
grade = 85
+if grade >= 90:
+    print('grade is A')
+elif grade >= 80:
+    print('grade is B')
+elif grade >= 70:
+    print('grade is C')
+
+
+

OUTPUT +

+
grade is B
+
+
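
To see why ordering matters, here is a deliberately misordered sketch of the same tests (the variable `letter` is introduced only for illustration). Because `grade >= 70` is tested first and is true for 85, the later branches are never taken.

```python
grade = 85
if grade >= 70:        # least demanding test first: true for every grade of 70 or more
    letter = 'C'
elif grade >= 80:      # can never run: any grade of 80 or more was caught above
    letter = 'B'
elif grade >= 90:      # can never run, for the same reason
    letter = 'A'
else:
    letter = 'below C'
print('grade is', letter)
```

Putting the most demanding test first, as in the original example, gives each branch a chance to be selected.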
    +
  • Does not automatically go back and re-evaluate if values +change.
  • +
+
+

PYTHON +

+
velocity = 10.0
+if velocity > 20.0:
+    print('moving too fast')
+else:
+    print('adjusting velocity')
+    velocity = 50.0
+
+
+

OUTPUT +

+
adjusting velocity
+
+
    +
  • Often use conditionals in a loop to “evolve” the values of +variables.
  • +
+
+

PYTHON +

+
velocity = 10.0
+for i in range(5): # execute the loop 5 times
+    print(i, ':', velocity)
+    if velocity > 20.0:
+        print('moving too fast')
+        velocity = velocity - 5.0
+    else:
+        print('moving too slow')
+        velocity = velocity + 10.0
+print('final velocity:', velocity)
+
+
+

OUTPUT +

+
0 : 10.0
+moving too slow
+1 : 20.0
+moving too slow
+2 : 30.0
+moving too fast
+3 : 25.0
+moving too fast
+4 : 20.0
+moving too slow
+final velocity: 30.0
+
+

Create a table showing variables’ values to trace a program’s +execution. +

+
i | 0 | . | 1 | . | 2 | . | 3 | . | 4 | .
velocity | 10.0 | 20.0 | . | 30.0 | . | 25.0 | . | 20.0 | . | 30.0
+
    +
  • The program must have a print statement +outside the body of the loop to show the final value of +velocity, since its value is updated by the last iteration +of the loop.
  • +
+
+
+ +
+
+

Compound Relations Using and, +or, and Parentheses

+
+

Often, you want some combination of things to be true. You can +combine relations within a conditional using and and +or. Continuing the example above, suppose you have

+
+

PYTHON +

+
mass     = [ 3.54,  2.07,  9.22,  1.86,  1.71]
+velocity = [10.00, 20.00, 30.00, 25.00, 20.00]
+
+i = 0
+for i in range(5):
+    if mass[i] > 5 and velocity[i] > 20:
+        print("Fast heavy object.  Duck!")
+    elif mass[i] > 2 and mass[i] <= 5 and velocity[i] <= 20:
+        print("Normal traffic")
+    elif mass[i] <= 2 and velocity[i] <= 20:
+        print("Slow light object.  Ignore it")
+    else:
+        print("Whoa!  Something is up with the data.  Check it")
+
+

Just like with arithmetic, you can and should use parentheses +whenever there is possible ambiguity. A good general rule is to +always use parentheses when mixing and and +or in the same condition. That is, instead of:

+
+

PYTHON +

+
if mass[i] <= 2 or mass[i] >= 5 and velocity[i] > 20:
+
+

write one of these:

+
+

PYTHON +

+
if (mass[i] <= 2 or mass[i] >= 5) and velocity[i] > 20:
+if mass[i] <= 2 or (mass[i] >= 5 and velocity[i] > 20):
+
+

so it is perfectly clear to a reader (and to Python) what you really +mean.
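
As a small sketch of why the parentheses matter (the single values of `mass` and `velocity` below are made up for illustration): Python evaluates `and` before `or`, so the unparenthesized condition behaves like the second of the two parenthesized forms, not the first.

```python
mass = 1.5
velocity = 10.0

unparenthesized = mass <= 2 or mass >= 5 and velocity > 20
and_grouped = mass <= 2 or (mass >= 5 and velocity > 20)
or_grouped = (mass <= 2 or mass >= 5) and velocity > 20

print(unparenthesized, and_grouped, or_grouped)
```

For these values the first two results agree and the third differs, which is exactly the kind of surprise the parentheses are meant to prevent.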

+
+
+
+
+
+ +
+
+

Tracing Execution

+
+

What does this program print?

+
+

PYTHON +

+
pressure = 71.9
+if pressure > 50.0:
+    pressure = 25.0
+elif pressure <= 50.0:
+    pressure = 0.0
+print(pressure)
+
+
+
+
+
+
+ +
+
+
+

OUTPUT +

+
25.0
+
+
+
+
+
+
+
+ +
+
+

Trimming Values

+
+

Fill in the blanks so that this program creates a new list containing zeroes where the original list’s values were negative and ones where the original list’s values were zero or positive.

+
+

PYTHON +

+
original = [-1.5, 0.2, 0.4, 0.0, -1.3, 0.4]
+result = ____
+for value in original:
+    if ____:
+        result.append(0)
+    else:
+        ____
+print(result)
+
+
+

OUTPUT +

+
[0, 1, 1, 1, 0, 1]
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
original = [-1.5, 0.2, 0.4, 0.0, -1.3, 0.4]
+result = []
+for value in original:
+    if value < 0.0:
+        result.append(0)
+    else:
+        result.append(1)
+print(result)
+
+
+
+
+
+
+
+ +
+
+

Processing Small Files

+
+

Modify this program so that it only processes files with fewer than +50 records.

+
+

PYTHON +

+
import glob
+import pandas as pd
+for filename in glob.glob('data/*.csv'):
+    contents = pd.read_csv(filename)
+    ____:
+        print(filename, len(contents))
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
import glob
+import pandas as pd
+for filename in glob.glob('data/*.csv'):
+    contents = pd.read_csv(filename)
+    if len(contents) < 50:
+        print(filename, len(contents))
+
+
+
+
+
+
+
+ +
+
+

Initializing

+
+

Modify this program so that it finds the largest and smallest values +in the list no matter what the range of values originally is.

+
+

PYTHON +

+
values = [...some test data...]
+smallest, largest = None, None
+for v in values:
+    if ____:
+        smallest, largest = v, v
+    ____:
+        smallest = min(____, v)
+        largest = max(____, v)
+print(smallest, largest)
+
+

What are the advantages and disadvantages of using this method to +find the range of the data?

+
+
+
+
+
+ +
+
+
+

PYTHON +

+
values = [-2,1,65,78,-54,-24,100]
+smallest, largest = None, None
+for v in values:
+    if smallest is None and largest is None:
+        smallest, largest = v, v
+    else:
+        smallest = min(smallest, v)
+        largest = max(largest, v)
+print(smallest, largest)
+
+

If you wrote == None instead of is None, that works too, but Python programmers conventionally write is None: None is a single, unique object, so testing identity with is is the most direct check.

+

It can be argued that an advantage of using this method would be to +make the code more readable. However, a disadvantage is that this code +is not efficient because within each iteration of the for +loop statement, there are two more loops that run over two numbers each +(the min and max functions). It would be more +efficient to iterate over each number just once:

+
+

PYTHON +

+
values = [-2,1,65,78,-54,-24,100]
+smallest, largest = None, None
+for v in values:
+    if smallest is None or v < smallest:
+        smallest = v
+    if largest is None or v > largest:
+        largest = v
+print(smallest, largest)
+
+

Now we have one loop, but four comparison tests. There are two ways +we could improve it further: either use fewer comparisons in each +iteration, or use two loops that each contain only one comparison test. +The simplest solution is often the best:

+
+

PYTHON +

+
values = [-2,1,65,78,-54,-24,100]
+smallest = min(values)
+largest = max(values)
+print(smallest, largest)
+
+
+
+
+
+
+
+ +
+
+

Key Points

+
+
    +
  • Use if statements to control whether or not a block of +code is executed.
  • +
  • Conditionals are often used inside loops.
  • +
  • Use else to execute a block of code when an +if condition is not true.
  • +
  • Use elif to specify additional tests.
  • +
  • Conditions are tested once, in order.
  • +
  • Create a table showing variables’ values to trace a program’s +execution.
  • +
+
+
+
+

Content from Looping Over Data Sets

+
+

Last updated on 2024-10-18

+

Estimated time: 15 minutes

+
+ +
+
+

Overview

+
+
+
+
+

Questions

+
    +
  • How can I process many data sets with a single command?
  • +
+
+
+
+
+
+
+

Objectives

+
    +
  • Be able to read and write globbing expressions that match sets of +files.
  • +
  • Use glob to create lists of files.
  • +
  • Write for loops to perform operations on files given their names in +a list.
  • +
+
+
+
+
+
+

Use a for loop to process files given a list of their +names. +

+
+
    +
  • A filename is a character string.
  • +
  • And lists can contain character strings.
  • +
+
+

PYTHON +

+
import pandas as pd
+for filename in ['data/gapminder_gdp_africa.csv', 'data/gapminder_gdp_asia.csv']:
+    data = pd.read_csv(filename, index_col='country')
+    print(filename, data.min())
+
+
+

OUTPUT +

+
data/gapminder_gdp_africa.csv gdpPercap_1952    298.846212
+gdpPercap_1957    335.997115
+gdpPercap_1962    355.203227
+gdpPercap_1967    412.977514
+⋮ ⋮ ⋮
+gdpPercap_1997    312.188423
+gdpPercap_2002    241.165877
+gdpPercap_2007    277.551859
+dtype: float64
+data/gapminder_gdp_asia.csv gdpPercap_1952    331
+gdpPercap_1957    350
+gdpPercap_1962    388
+gdpPercap_1967    349
+⋮ ⋮ ⋮
+gdpPercap_1997    415
+gdpPercap_2002    611
+gdpPercap_2007    944
+dtype: float64
+
+

Use glob.glob +to find sets of files whose names match a pattern. +

+
+
    +
  • In Unix, the term “globbing” means “matching a set of files with a +pattern”.
  • +
  • The most common patterns are: +
      +
    • +* meaning “match zero or more characters”
    • +
    • +? meaning “match exactly one character”
    • +
    +
  • +
  • Python’s standard library contains the glob +module to provide pattern matching functionality
  • +
  • The glob +module contains a function also called glob to match file +patterns
  • +
  • E.g., glob.glob('*.txt') matches all files in the +current directory whose names end with .txt.
  • +
  • Result is a (possibly empty) list of character strings.
  • +
+
+

PYTHON +

+
import glob
+print('all csv files in data directory:', glob.glob('data/*.csv'))
+
+
+

OUTPUT +

+
all csv files in data directory: ['data/gapminder_all.csv', 'data/gapminder_gdp_africa.csv', \
+'data/gapminder_gdp_americas.csv', 'data/gapminder_gdp_asia.csv', 'data/gapminder_gdp_europe.csv', \
+'data/gapminder_gdp_oceania.csv']
+
+
+

PYTHON +

+
print('all PDB files:', glob.glob('*.pdb'))
+
+
+

OUTPUT +

+
all PDB files: []
+
+
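
The two wildcards can be compared side by side without depending on the lesson's data directory. The sketch below is an illustration only: it creates three empty, made-up files in a temporary folder using the standard `tempfile` and `os` modules (neither is otherwise used in this lesson), then globs them. The results are sorted because `glob.glob` returns matches in arbitrary order.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    for name in ['a.csv', 'ab.csv', 'notes.txt']:
        open(os.path.join(folder, name), 'w').close()   # create an empty file

    star = []
    for path in sorted(glob.glob(os.path.join(folder, '*.csv'))):
        star.append(os.path.basename(path))             # keep only the file name

    question = []
    for path in sorted(glob.glob(os.path.join(folder, '?.csv'))):
        question.append(os.path.basename(path))

    pdb = glob.glob(os.path.join(folder, '*.pdb'))      # nothing matches

print('*.csv matches', star)
print('?.csv matches', question)
print('*.pdb matches', pdb)
```

Here `*.csv` matches both CSV files, `?.csv` matches only the one with a single character before the extension, and `*.pdb` gives an empty list.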

Use glob and for to process batches of +files. +

+
+
    +
  • Helps a lot if the files are named and stored systematically and +consistently so that simple patterns will find the right data.
  • +
+
+

PYTHON +

+
for filename in glob.glob('data/gapminder_*.csv'):
+    data = pd.read_csv(filename)
+    print(filename, data['gdpPercap_1952'].min())
+
+
+

OUTPUT +

+
data/gapminder_all.csv 298.8462121
+data/gapminder_gdp_africa.csv 298.8462121
+data/gapminder_gdp_americas.csv 1397.717137
+data/gapminder_gdp_asia.csv 331.0
+data/gapminder_gdp_europe.csv 973.5331948
+data/gapminder_gdp_oceania.csv 10039.59564
+
+
    +
  • This includes all data, as well as per-region data.
  • +
  • Use a more specific pattern in the exercises to exclude the whole +data set.
  • +
  • But note that the minimum of the entire data set is also the minimum +of one of the data sets, which is a nice check on correctness.
  • +
+
+
+ +
+
+

Determining Matches

+
+

Which of these files is not matched by the expression +glob.glob('data/*as*.csv')?

+
    +
  1. data/gapminder_gdp_africa.csv
  2. +
  3. data/gapminder_gdp_americas.csv
  4. +
  5. data/gapminder_gdp_asia.csv
  6. +
+
+
+
+
+
+ +
+
+

1 is not matched by the glob.

+
+
+
+
+
+
+ +
+
+

Minimum File Size

+
+

Modify this program so that it prints the number of records in the +file that has the fewest records.

+
+

PYTHON +

+
import glob
+import pandas as pd
+fewest = ____
+for filename in glob.glob('data/*.csv'):
+    dataframe = pd.____(filename)
+    fewest = min(____, dataframe.shape[0])
+print('smallest file has', fewest, 'records')
+
+

Note that the DataFrame.shape attribute (used without parentheses) is a tuple with the number of rows and columns of the data frame.

+
+
+
+
+
+ +
+
+
+

PYTHON +

+
import glob
+import pandas as pd
+fewest = float('Inf')
+for filename in glob.glob('data/*.csv'):
+    dataframe = pd.read_csv(filename)
+    fewest = min(fewest, dataframe.shape[0])
+print('smallest file has', fewest, 'records')
+
+

You might have chosen to initialize the fewest variable +with a number greater than the numbers you’re dealing with, but that +could lead to trouble if you reuse the code with bigger numbers. Python +lets you use positive infinity, which will work no matter how big your +numbers are. What other special strings does the float +function recognize?
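
A minimal sketch of the same idea without any files, using made-up record counts in place of `dataframe.shape[0]`:

```python
fewest = float('inf')            # larger than any finite number
for count in [120, 52, 1704]:    # hypothetical record counts
    fewest = min(fewest, count)
print(fewest)
```

One thing to keep in mind: if the collection is empty (for example, if no files match the pattern), the loop body never runs and `fewest` is still infinity afterwards.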

+
+
+
+
+
+
+ +
+
+

Comparing Data

+
+

Write a program that reads in the regional data sets and plots the +average GDP per capita for each region over time in a single chart. +Pandas will raise an error if it encounters non-numeric columns in a +dataframe computation so you may need to either filter out those columns +or tell pandas to ignore them.

+
+
+
+
+
+ +
+
+

This solution builds a useful legend by using the string +split method to extract the region from +the path ‘data/gapminder_gdp_a_specific_region.csv’.

+
+

PYTHON +

+
import glob
+import pandas as pd
+import matplotlib.pyplot as plt
+fig, ax = plt.subplots(1,1)
+for filename in glob.glob('data/gapminder_gdp*.csv'):
+    dataframe = pd.read_csv(filename)
+    # extract <region> from the filename, expected to be in the format 'data/gapminder_gdp_<region>.csv'.
+    # we will split the string using the split method and `_` as our separator,
+    # retrieve the last string in the list that split returns (`<region>.csv`), 
+    # and then remove the `.csv` extension from that string.
+    # NOTE: the pathlib module covered in the next callout also offers
+    # convenient abstractions for working with filesystem paths and could solve this as well:
+    # from pathlib import Path
+    # region = Path(filename).stem.split('_')[-1]
+    region = filename.split('_')[-1][:-4] 
+    # pandas raises errors when it encounters non-numeric columns in a dataframe computation
+    # but we can tell pandas to ignore them with the `numeric_only` parameter
+    dataframe.mean(numeric_only=True).plot(ax=ax, label=region)
+    # NOTE: another way of doing this selects just the columns with gdp in their name using the filter method
+    # dataframe.filter(like="gdp").mean().plot(ax=ax, label=region)
+
+plt.legend()
+plt.show()
+
+
+
+
+
+
+
+ +
+
+

Dealing with File Paths

+
+

The pathlib +module provides useful abstractions for file and path manipulation +like returning the name of a file without the file extension. This is +very useful when looping over files and directories. In the example +below, we create a Path object and inspect its +attributes.

+
+

PYTHON +

+
from pathlib import Path
+
+p = Path("data/gapminder_gdp_africa.csv")
+print(p.parent)
+print(p.stem)
+print(p.suffix)
+
+
+

OUTPUT +

+
data
+gapminder_gdp_africa
+.csv
+
+

Hint: Check all available attributes and methods on +the Path object with the dir() function.
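
As a short sketch, here are the two ways of extracting the region name mentioned in the Comparing Data solution, side by side (the variable names `by_slicing` and `by_pathlib` are illustrative):

```python
from pathlib import Path

filename = 'data/gapminder_gdp_africa.csv'

# string slicing: take 'africa.csv' and drop its last 4 characters
by_slicing = filename.split('_')[-1][:-4]

# pathlib: stem drops the extension, whatever its length
by_pathlib = Path(filename).stem.split('_')[-1]

print(by_slicing, by_pathlib)
print(Path(filename).name)
```

The slicing version assumes the extension is exactly four characters long, while the `stem` version does not.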

+
+
+
+
+
+ +
+
+

Key Points

+
+
    +
  • Use a for loop to process files given a list of their +names.
  • +
  • Use glob.glob to find sets of files whose names match a +pattern.
  • +
  • Use glob and for to process batches of +files.
  • +
+
+
+
+

Content from Afternoon Coffee

+
+

Last updated on 2024-10-18

+

Estimated time: 0 minutes

+
+ +
+

Reflection exercise +

+
+

Over break, reflect on and discuss the following:

+
    +
  • A common refrain in software engineering is “Don’t Repeat Yourself”. How do the techniques we’ve learned in the last lessons help us avoid repeating ourselves? Note that in practice there is some nuance to this, and it should be balanced with doing the simplest thing that could possibly work.
  • +
  • What are the pros / cons of making a variable global or local to a +function?
  • +
  • When would you consider turning a block of code into a function +definition?
  • +

Content from Writing Functions

+
+

Last updated on 2024-10-18

+

Estimated time: 25 minutes

+
+ +
+
+

Overview

+
+
+
+
+

Questions

+
    +
  • How can I create my own functions?
  • +
+
+
+
+
+
+
+

Objectives

+
    +
  • Explain and identify the difference between function definition and +function call.
  • +
  • Write a function that takes a small, fixed number of arguments and +produces a single result.
  • +
+
+
+
+
+
+

Break programs down into functions to make them easier to +understand. +

+
+
    +
  • Human beings can only keep a few items in working memory at a +time.
  • +
  • Understand larger/more complicated ideas by understanding and +combining pieces. +
      +
    • Components in a machine.
    • +
    • Lemmas when proving theorems.
    • +
    +
  • +
  • Functions serve the same purpose in programs. +
      +
    • +Encapsulate complexity so that we can treat it as a single +“thing”.
    • +
    +
  • +
  • Also enables re-use. +
      +
    • Write one time, use many times.
    • +
    +
  • +

Define a function using def with a name, parameters, +and a block of code. +

+
+
    +
  • Begin the definition of a new function with def.
  • +
  • Followed by the name of the function. +
      +
    • Must obey the same rules as variable names.
    • +
    +
  • +
  • Then parameters in parentheses. +
      +
    • Empty parentheses if the function doesn’t take any inputs.
    • +
    • We will discuss this in detail in a moment.
    • +
    +
  • +
  • Then a colon.
  • +
  • Then an indented block of code.
  • +
+
+

PYTHON +

+
def print_greeting():
+    print('Hello!')
+    print('The weather is nice today.')
+    print('Right?')
+
+

Defining a function does not run it. +

+
+
    +
  • Defining a function does not run it. +
      +
    • Like assigning a value to a variable.
    • +
    +
  • +
  • Must call the function to execute the code it contains.
  • +
+
+

PYTHON +

+
print_greeting()
+
+
+

OUTPUT +

+
Hello!
+The weather is nice today.
+Right?
+
+

Arguments in a function call are matched to its defined +parameters. +

+
+
    +
  • Functions are most useful when they can operate on different +data.
  • +
  • Specify parameters when defining a function. +
      +
    • These become variables when the function is executed.
    • +
    • Are assigned the arguments in the call (i.e., the values passed to +the function).
    • +
    • If you don’t name the arguments when using them in the call, the +arguments will be matched to parameters in the order the parameters are +defined in the function.
    • +
    +
  • +
+
+

PYTHON +

+
def print_date(year, month, day):
+    joined = str(year) + '/' + str(month) + '/' + str(day)
+    print(joined)
+
+print_date(1871, 3, 19)
+
+
+

OUTPUT +

+
1871/3/19
+
+

Or, we can name the arguments when we call the function. This lets us specify them in any order and makes the call easier to read; otherwise, someone reading the code might forget whether the second argument is the month or the day.

+
+

PYTHON +

+
print_date(month=3, day=19, year=1871)
+
+
+

OUTPUT +

+
1871/3/19
+
+
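
Positional and named arguments can also be mixed in one call, as long as the positional ones come first. A small sketch reusing `print_date` as defined above:

```python
def print_date(year, month, day):
    joined = str(year) + '/' + str(month) + '/' + str(day)
    print(joined)

print_date(1871, 3, 19)                  # all matched by position
print_date(1871, day=19, month=3)        # year by position, the rest by name
print_date(day=19, month=3, year=1871)   # all by name, in any order
```

All three calls print the same date.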
    +
  • Via Twitter: +() contains the ingredients for the function while the body +contains the recipe.
  • +

Functions may return a result to their caller using +return. +

+
+
    +
  • Use return ... to give a value back to the caller.
  • +
  • May occur anywhere in the function.
  • +
  • But functions are easier to understand if return +occurs: +
      +
    • At the start to handle special cases.
    • +
    • At the very end, with a final result.
    • +
    +
  • +
+
+

PYTHON +

+
def average(values):
+    if len(values) == 0:
+        return None
+    return sum(values) / len(values)
+
+
+

PYTHON +

+
a = average([1, 3, 4])
+print('average of actual values:', a)
+
+
+

OUTPUT +

+
average of actual values: 2.6666666666666665
+
+
+

PYTHON +

+
print('average of empty list:', average([]))
+
+
+

OUTPUT +

+
average of empty list: None
+
+ +

Every function returns something: a function that does not explicitly return a value returns None.
+

PYTHON +

+
result = print_date(1871, 3, 19)
+print('result of call is:', result)
+
+
+

OUTPUT +

+
1871/3/19
+result of call is: None
+
+
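
The difference between printing a value and returning it can be sketched with two hypothetical functions (`add_and_print` and `add_and_return` are names invented for this illustration):

```python
def add_and_print(a, b):
    print(a + b)        # shows the sum, but hands nothing back

def add_and_return(a, b):
    return a + b        # hands the sum back to the caller

printed = add_and_print(2, 3)
returned = add_and_return(2, 3)
print('printed:', printed)
print('returned:', returned)
```

Only the returned value can be stored in a variable and used in later calculations; the printed one is shown on screen and then gone.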
+
+ +
+
+

Identifying Syntax Errors

+
+
    +
  1. Read the code below and try to identify what the errors are +without running it.
  2. +
  3. Run the code and read the error message. Is it a +SyntaxError or an IndentationError?
  4. +
  5. Fix the error.
  6. +
  7. Repeat steps 2 and 3 until you have fixed all the errors.
  8. +
+
+

PYTHON +

+
def another_function
+  print("Syntax errors are annoying.")
+   print("But at least python tells us about them!")
+  print("So they are usually not too hard to fix.")
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
def another_function():
+  print("Syntax errors are annoying.")
+  print("But at least Python tells us about them!")
+  print("So they are usually not too hard to fix.")
+
+
+
+
+
+
+
+ +
+
+

Definition and Use

+
+

What does the following program print?

+
+

PYTHON +

+
def report(pressure):
+    print('pressure is', pressure)
+
+print('calling', report, 22.5)
+
+
+
+
+
+
+ +
+
+
+

OUTPUT +

+
calling <function report at 0x7fd128ff1bf8> 22.5
+
+

A function call always needs parentheses; without them, you get the function object itself, which print displays together with its memory address. So, if we wanted to call the function named report, and give it the value 22.5 to report on, we could write our function call as follows:

+
+

PYTHON +

+
print("calling")
+report(22.5)
+
+
+

OUTPUT +

+
calling
+pressure is 22.5
+
+
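As a small illustration of the difference between a function object and a function call, the sketch below reuses report from above. The name alias is hypothetical and introduced only for this example.

```python
def report(pressure):
    print('pressure is', pressure)

# Without parentheses, the name refers to the function object itself,
# so it can be bound to another name (alias is a hypothetical name).
alias = report
print(callable(alias))

# With parentheses, the function actually runs.
alias(22.5)
```

Binding a function to a second name does not run it; only the parentheses trigger the call.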
+
+
+
+
+
+ +
+
+

Order of Operations

+
+
    +
  1. What’s wrong in this example?
  2. +
+
+

PYTHON +

+
result = print_time(11, 37, 59)
+
+def print_time(hour, minute, second):
+   time_string = str(hour) + ':' + str(minute) + ':' + str(second)
+   print(time_string)
+
+
    +
  1. After fixing the problem above, explain why running this example +code:
  2. +
+
+

PYTHON +

+
result = print_time(11, 37, 59)
+print('result of call is:', result)
+
+

gives this output:

+
+

OUTPUT +

+
11:37:59
+result of call is: None
+
+
    +
  1. Why is the result of the call None?
  2. +
+
+
+
+
+
+ +
+
+
    +
  1. The problem with the example is that the function +print_time() is defined after the call to the +function is made. Python doesn’t know how to resolve the name +print_time since it hasn’t been defined yet and will raise +a NameError e.g., +NameError: name 'print_time' is not defined

  2. +
  3. The first line of output, 11:37:59, is printed by the first line of code, result = print_time(11, 37, 59), which also binds the value returned by print_time to the variable result. The second line of output comes from the print call on the second line of code, which displays the contents of the result variable.

  4. +
  5. print_time() does not explicitly return +a value, so it automatically returns None.

  6. +
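The rule behind the NameError is that a name must be defined by the time the call executes, not by the time the calling code is written. A minimal sketch, using the hypothetical names show_time and format_time:

```python
def show_time(hour, minute, second):
    # format_time is looked up when show_time runs, not when it is defined,
    # so it is fine that format_time appears later in the file.
    print(format_time(hour, minute, second))

def format_time(hour, minute, second):
    return str(hour) + ':' + str(minute) + ':' + str(second)

# Both functions exist by the time this call executes, so it works.
show_time(11, 37, 59)
```

Moving the final call above the definition of format_time would raise a NameError again.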
+
+
+
+
+
+
+ +
+
+

Encapsulation

+
+

Fill in the blanks to create a function that takes a single filename +as an argument, loads the data in the file named by the argument, and +returns the minimum value in that data.

+
+

PYTHON +

+
import pandas as pd
+
+def min_in_data(____):
+    data = ____
+    return ____
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
import pandas as pd
+
+def min_in_data(filename):
+    data = pd.read_csv(filename)
+    return data.min()
+
+
+
+
+
+
+
+ +
+
+

Find the First

+
+

Fill in the blanks to create a function that takes a list of numbers +as an argument and returns the first negative value in the list. What +does your function do if the list is empty? What if the list has no +negative numbers?

+
+

PYTHON +

+
def first_negative(values):
+    for v in ____:
+        if ____:
+            return ____
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
def first_negative(values):
+    for v in values:
+        if v < 0:
+            return v
+
+

If an empty list or a list with no negative values is passed to this function, it returns None:

+
+

PYTHON +

+
my_list = []
+print(first_negative(my_list))
+
+
+

OUTPUT +

+
None
+
+
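Because None is returned implicitly, callers may want to check for it before using the result. A sketch, assuming the first_negative definition above (the input list is an arbitrary example):

```python
def first_negative(values):
    for v in values:
        if v < 0:
            return v
    # Falling off the end of the function returns None implicitly.

result = first_negative([3, 0, 5])
if result is None:
    print('no negative values found')
else:
    print('first negative value:', result)
```

Using `is None` rather than a truth test matters for other functions that might legitimately return 0 or an empty value.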
+
+
+
+
+
+ +
+
+

Calling by Name

+
+

Earlier we saw this function:

+
+

PYTHON +

+
def print_date(year, month, day):
+    joined = str(year) + '/' + str(month) + '/' + str(day)
+    print(joined)
+
+

We saw that we can call the function using named arguments, +like this:

+
+

PYTHON +

+
print_date(day=1, month=2, year=2003)
+
+
    +
  1. What does print_date(day=1, month=2, year=2003) +print?
  2. +
  3. When have you seen a function call like this before?
  4. +
  5. When and why is it useful to call functions this way?
  6. +
+
+
+
+
+
+ +
+
+
    +
  1. 2003/2/1
  2. +
  3. We saw examples of using named arguments when working with +the pandas library. For example, when reading in a dataset using +data = pd.read_csv('data/gapminder_gdp_europe.csv', index_col='country'), +the last argument index_col is a named argument.
  4. +
  5. Using named arguments can make code more readable since one can see +from the function call what name the different arguments have inside the +function. It can also reduce the chances of passing arguments in the +wrong order, since by using named arguments the order doesn’t +matter.
  6. +
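Positional and named arguments can also be mixed, as long as the positional ones come first. The sketch below uses a variant of print_date that returns the string instead of printing it; format_date is a hypothetical name chosen so the result can be inspected.

```python
def format_date(year, month, day):
    return str(year) + '/' + str(month) + '/' + str(day)

# Positional argument first, then named arguments in any order.
print(format_date(2003, day=1, month=2))   # 2003/2/1

# Naming a parameter that was already filled positionally is an error.
try:
    format_date(2003, 2, 1, year=2003)
except TypeError as error:
    print('TypeError:', error)
```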
+
+
+
+
+
+
+ +
+
+

Encapsulation of an If/Print Block

+
+

The code below will run on a label-printer for chicken eggs. A +digital scale will report a chicken egg mass (in grams) to the computer +and then the computer will print a label.

+
+

PYTHON +

+
import random
+for i in range(10):
+
+    # simulating the mass of a chicken egg
+    # the (random) mass will be 70 +/- 20 grams
+    mass = 70 + 20.0 * (2.0 * random.random() - 1.0)
+
+    print(mass)
+
+    # egg sizing machinery prints a label
+    if mass >= 85:
+        print("jumbo")
+    elif mass >= 70:
+        print("large")
+    elif mass < 70 and mass >= 55:
+        print("medium")
+    else:
+        print("small")
+
+

The if-block that classifies the eggs might be useful in other +situations, so to avoid repeating it, we could fold it into a function, +get_egg_label(). Revising the program to use the function +would give us this:

+
+

PYTHON +

+
# revised version
+import random
+for i in range(10):
+
+    # simulating the mass of a chicken egg
+    # the (random) mass will be 70 +/- 20 grams
+    mass = 70 + 20.0 * (2.0 * random.random() - 1.0)
+
+    print(mass, get_egg_label(mass))
+
+
    +
  1. Create a function definition for get_egg_label() that +will work with the revised program above. Note that the +get_egg_label() function’s return value will be important. +Sample output from the above program would be +71.23 large.
  2. +
  3. A dirty egg might have a mass of more than 90 grams, and a spoiled +or broken egg will probably have a mass that’s less than 50 grams. +Modify your get_egg_label() function to account for these +error conditions. Sample output could be +25 too light, probably spoiled.
  4. +
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
def get_egg_label(mass):
+    # egg sizing machinery prints a label
+    egg_label = "Unlabelled"
+    if mass >= 90:
+        egg_label = "warning: egg might be dirty"
+    elif mass >= 85:
+        egg_label = "jumbo"
+    elif mass >= 70:
+        egg_label = "large"
+    elif mass < 70 and mass >= 55:
+        egg_label = "medium"
+    elif mass < 50:
+        egg_label = "too light, probably spoiled"
+    else:
+        egg_label = "small"
+    return egg_label
+
+
+
+
+
+
+
+ +
+
+

Encapsulating Data Analysis

+
+

Assume that the following code has been executed:

+
+

PYTHON +

+
import pandas as pd
+
+data_asia = pd.read_csv('data/gapminder_gdp_asia.csv', index_col=0)
+japan = data_asia.loc['Japan']
+
+
    +
  1. Complete the statements below to obtain the average GDP for Japan +across the years reported for the 1980s.
  2. +
+
+

PYTHON +

+
year = 1983
+gdp_decade = 'gdpPercap_' + str(year // ____)
+avg = (japan.loc[gdp_decade + ___] + japan.loc[gdp_decade + ___]) / 2
+
+
    +
  1. Abstract the code above into a single function.
  2. +
+
+

PYTHON +

+
def avg_gdp_in_decade(country, continent, year):
+    data_countries = pd.read_csv('data/gapminder_gdp_'+___+'.csv',delimiter=',',index_col=0)
+    ____
+    ____
+    ____
+    return avg
+
+
    +
  1. How would you generalize this function if you did not know +beforehand which specific years occurred as columns in the data? For +instance, what if we also had data from years ending in 1 and 9 for each +decade? (Hint: use the columns to filter out the ones that correspond to +the decade, instead of enumerating them in the code.)
  2. +
+
+
+
+
+
+ +
+
+
    +
  1. The average GDP for Japan across the years reported for the 1980s is +computed with:
  2. +
+
+

PYTHON +

+
year = 1983
+gdp_decade = 'gdpPercap_' + str(year // 10)
+avg = (japan.loc[gdp_decade + '2'] + japan.loc[gdp_decade + '7']) / 2
+
+
    +
  1. That code as a function is:
  2. +
+
+

PYTHON +

+
def avg_gdp_in_decade(country, continent, year):
+    data_countries = pd.read_csv('data/gapminder_gdp_' + continent + '.csv', index_col=0)
+    c = data_countries.loc[country]
+    gdp_decade = 'gdpPercap_' + str(year // 10)
+    avg = (c.loc[gdp_decade + '2'] + c.loc[gdp_decade + '7'])/2
+    return avg
+
+
    +
  1. To obtain the average for the relevant years, we need to loop over +them:
  2. +
+
+

PYTHON +

+
def avg_gdp_in_decade(country, continent, year):
+    data_countries = pd.read_csv('data/gapminder_gdp_' + continent + '.csv', index_col=0)
+    c = data_countries.loc[country]
+    gdp_decade = 'gdpPercap_' + str(year // 10)
+    total = 0.0
+    num_years = 0
+    for yr_header in c.index: # c's index contains reported years
+        if yr_header.startswith(gdp_decade):
+            total = total + c.loc[yr_header]
+            num_years = num_years + 1
+    return total/num_years
+
+

The function can now be called by:

+
+

PYTHON +

+
avg_gdp_in_decade('Japan','asia',1983)
+
+
+

OUTPUT +

+
20880.023800000003
+
+
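One case the loop version does not handle is a decade with no matching columns, which would divide by zero. The sketch below shows the same filtering logic on a plain dictionary standing in for the pandas Series c; the function name avg_in_decade and the numbers are invented for illustration and are not gapminder values.

```python
def avg_in_decade(gdp_by_column, year):
    """Average the values whose column name falls in the decade of year."""
    prefix = 'gdpPercap_' + str(year // 10)
    matches = [value for name, value in gdp_by_column.items()
               if name.startswith(prefix)]
    if not matches:
        return None  # no data reported for that decade
    return sum(matches) / len(matches)

# Illustrative numbers only, not real gapminder values.
example = {'gdpPercap_1977': 10.0, 'gdpPercap_1982': 20.0, 'gdpPercap_1987': 40.0}
print(avg_in_decade(example, 1983))   # 30.0
print(avg_in_decade(example, 1995))   # None
```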
+
+
+
+
+
+ +
+
+

Simulating a dynamical system

+
+

In mathematics, a dynamical +system is a system in which a function describes the time dependence +of a point in a geometrical space. A canonical example of a dynamical +system is the logistic map, a +growth model that computes a new population density (between 0 and 1) +based on the current density. In the model, time takes discrete values +0, 1, 2, …

+
    +
  1. Define a function called logistic_map that takes two +inputs: x, representing the current population (at time +t), and a parameter r = 1. This function +should return a value representing the state of the system (population) +at time t + 1, using the mapping function:
  2. +
+

f(t+1) = r * f(t) * [1 - f(t)]

+
    +
  1. Using a for or while loop, iterate the +logistic_map function defined in part 1, starting from an +initial population of 0.5, for a period of time +t_final = 10. Store the intermediate results in a list so +that after the loop terminates you have accumulated a sequence of values +representing the state of the logistic map at times +t = [0,1,...,t_final] (11 values in total). Print this list +to see the evolution of the population.

  2. +
  3. Encapsulate the logic of your loop into a function called +iterate that takes the initial population as its first +input, the parameter t_final as its second input and the +parameter r as its third input. The function should return +the list of values representing the state of the logistic map at times +t = [0,1,...,t_final]. Run this function for periods +t_final = 100 and 1000 and print some of the +values. Is the population trending toward a steady state?

  4. +
+
+
+
+
+
+ +
+
+
    +
  1. +

    PYTHON +

    +
    def logistic_map(x, r):
    +    return r * x * (1 - x)
    +
  2. +
  3. +

    PYTHON +

    +
    initial_population = 0.5
    +t_final = 10
    +r = 1.0
    +population = [initial_population]
    +
    +for t in range(t_final):
    +    population.append( logistic_map(population[t], r) )
    +
  4. +
  5. +
    +

    PYTHON +

    +
    def iterate(initial_population, t_final, r):
    +    population = [initial_population]
    +    for t in range(t_final):
    +        population.append( logistic_map(population[t], r) )
    +    return population
    +
    +for period in (10, 100, 1000):
    +    population = iterate(0.5, period, 1)
    +    print(population[-1])
    +
    +
    +

    OUTPUT +

    +
    0.06945089389714401
    +0.009395779870614648
    +0.0009913908614406382
    +
    +The population seems to be approaching zero.
  6. +
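For other values of r the long-run behaviour differs. Setting x = r * x * (1 - x) gives a non-zero fixed point at x = 1 - 1/r. As a sketch, with r = 2.0 and a starting population of 0.2 chosen purely for illustration, the population should settle near 0.5:

```python
def logistic_map(x, r):
    return r * x * (1 - x)

def iterate(initial_population, t_final, r):
    population = [initial_population]
    for t in range(t_final):
        population.append(logistic_map(population[t], r))
    return population

# r = 2.0 and the starting value 0.2 are illustrative choices.
trajectory = iterate(0.2, 10, 2.0)
print(trajectory[:4])
print(trajectory[-1])
```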
+
+
+
+
+
+
+ +
+
+

Using Functions With Conditionals in Pandas

+
+

Functions will often contain conditionals. Here is a short example +that will indicate which quartile the argument is in based on hand-coded +values for the quartile cut points.

+
+

PYTHON +

+
def calculate_life_quartile(exp):
+    if exp < 58.41:
+        # This observation is in the first quartile
+        return 1
+    elif exp >= 58.41 and exp < 67.05:
+        # This observation is in the second quartile
+        return 2
+    elif exp >= 67.05 and exp < 71.70:
+        # This observation is in the third quartile
+        return 3
+    elif exp >= 71.70:
+        # This observation is in the fourth quartile
+        return 4
+    else:
+        # This observation has bad data
+        return None
+
+calculate_life_quartile(62.5)
+
+
+

OUTPUT +

+
2
+
+

That function would typically be used within a for loop, +but Pandas has a different, more efficient way of doing the same thing, +and that is by applying a function to a dataframe or a portion +of a dataframe. Here is an example, using the definition above.

+
+

PYTHON +

+
data = pd.read_csv('data/gapminder_all.csv')
+data['life_qrtl'] = data['lifeExp_1952'].apply(calculate_life_quartile)
+
+

There is a lot in that second line, so let’s take it piece by piece. On the right side of the = we start with data['lifeExp_1952'], which is the column labeled lifeExp_1952 in the dataframe called data. We use apply() to do what it says: apply calculate_life_quartile to the value of this column for every row in the dataframe. The results are stored in a new column, life_qrtl.
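Conceptually, apply() calls the function once per value in the column and collects the results. The sketch below shows the same idea with a plain Python list standing in for the column (the life-expectancy numbers are invented for illustration), using a compact restatement of the quartile function.

```python
def calculate_life_quartile(exp):
    # Same cut points as above; each branch relies on the earlier ones failing.
    if exp < 58.41:
        return 1
    elif exp < 67.05:
        return 2
    elif exp < 71.70:
        return 3
    elif exp >= 71.70:
        return 4
    return None  # reached only for values that cannot be ordered, such as NaN

life_exp = [43.1, 62.5, 69.0, 75.3]   # illustrative values only
quartiles = [calculate_life_quartile(v) for v in life_exp]
print(quartiles)   # [1, 2, 3, 4]
```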

+
+
+
+
+
+ +
+
+

Key Points

+
+
    +
  • Break programs down into functions to make them easier to +understand.
  • +
  • Define a function using def with a name, parameters, +and a block of code.
  • +
  • Defining a function does not run it.
  • +
  • Arguments in a function call are matched to its defined +parameters.
  • +
  • Functions may return a result to their caller using +return.
  • +
+
+
+
+

Content from Variable Scope

+
+

Last updated on 2024-10-18

+

Estimated time: 20 minutes

+
+ +
+
+

Overview

+
+
+
+
+

Questions

+
    +
  • How do function calls actually work?
  • +
  • How can I determine where errors occurred?
  • +
+
+
+
+
+
+
+

Objectives

+
    +
  • Identify local and global variables.
  • +
  • Identify parameters as local variables.
  • +
  • Read a traceback and determine the file, function, and line number +on which the error occurred, the type of error, and the error +message.
  • +
+
+
+
+
+
+

The scope of a variable is the part of a program that can ‘see’ that +variable. +

+
+
    +
  • There are only so many sensible names for variables.
  • +
  • People using functions shouldn’t have to worry about what variable +names the author of the function used.
  • +
  • People writing functions shouldn’t have to worry about what variable +names the function’s caller uses.
  • +
  • The part of a program in which a variable is visible is called its +scope.
  • +
+
+

PYTHON +

+
pressure = 103.9
+
+def adjust(t):
+    temperature = t * 1.43 / pressure
+    return temperature
+
+
    +
  • +pressure is a global variable. +
      +
    • Defined outside any particular function.
    • +
    • Visible everywhere.
    • +
    +
  • +
  • +t and temperature are local +variables in adjust. +
      +
    • Defined in the function.
    • +
    • Not visible in the main program.
    • +
    • Remember: a function parameter is a variable that is automatically +assigned a value when the function is called.
    • +
    +
  • +
+
+

PYTHON +

+
print('adjusted:', adjust(0.9))
+print('temperature after call:', temperature)
+
+
+

OUTPUT +

+
adjusted: 0.01238691049085659
+
+
+

ERROR +

+
Traceback (most recent call last):
+  File "/Users/swcarpentry/foo.py", line 8, in <module>
+    print('temperature after call:', temperature)
+NameError: name 'temperature' is not defined
+
+
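A related consequence of scope: assigning to a name inside a function creates a local variable, even if a global variable with the same name exists. A sketch using a modified adjust, in which the local value 50.0 is an arbitrary choice for illustration:

```python
pressure = 103.9

def adjust(t):
    # This assignment creates a new local variable named pressure;
    # it does not change the global one.
    pressure = 50.0
    return t * 1.43 / pressure

adjusted = adjust(0.9)
print('adjusted:', adjusted)
print('global pressure is still:', pressure)
```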
+
+ +
+
+

Local and Global Variable Use

+
+

Trace the values of all variables in this program as it is executed. +(Use ‘—’ as the value of variables before and after they exist.)

+
+

PYTHON +

+
limit = 100
+
+def clip(value):
+    return min(max(0.0, value), limit)
+
+value = -22.5
+print(clip(value))
+
+
+
+
+
+
+ +
+
+

Reading Error Messages

+
+

Read the traceback below, and identify the following:

+
    +
  1. How many levels does the traceback have?
  2. +
  3. What is the file name where the error occurred?
  4. +
  5. What is the function name where the error occurred?
  6. +
  7. On which line number in this function did the error occur?
  8. +
  9. What is the type of error?
  10. +
  11. What is the error message?
  12. +
+
+

ERROR +

+
---------------------------------------------------------------------------
+KeyError                                  Traceback (most recent call last)
+<ipython-input-2-e4c4cbafeeb5> in <module>()
+      1 import errors_02
+----> 2 errors_02.print_friday_message()
+
+/Users/ghopper/thesis/code/errors_02.py in print_friday_message()
+     13
+     14 def print_friday_message():
+---> 15     print_message("Friday")
+
+/Users/ghopper/thesis/code/errors_02.py in print_message(day)
+      9         "sunday": "Aw, the weekend is almost over."
+     10     }
+---> 11     print(messages[day])
+     12
+     13
+
+KeyError: 'Friday'
+
+
+
+
+
+
+ +
+
+
    +
  1. Three levels.
  2. +
  3. errors_02.py
  4. +
  5. print_message
  6. +
  7. Line 11
  8. +
  9. +KeyError. These errors occur when we are trying to look +up a key that does not exist (usually in a data structure such as a +dictionary). We can find more information about the +KeyError and other built-in exceptions in the Python +docs.
  10. +
  11. KeyError: 'Friday'
  12. +
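The traceback shows only part of errors_02.py, so the following is a sketch under assumptions: the dictionary contents other than the "sunday" entry and the function name get_message are hypothetical. It shows one way such a lookup could be made robust, since dictionary keys are case-sensitive.

```python
# Hypothetical, shortened version of the messages dictionary.
messages = {
    "friday": "Hooray for Friday!",
    "sunday": "Aw, the weekend is almost over.",
}

def get_message(day):
    # Normalise the key before looking it up, and handle missing keys.
    try:
        return messages[day.lower()]
    except KeyError:
        return "No message for " + day

print(get_message("Friday"))
print(get_message("Someday"))
```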
+
+
+
+
+
+
+ +
+
+

Key Points

+
+
    +
  • The scope of a variable is the part of a program that can ‘see’ that +variable.
  • +
+
+
+
+

Content from Programming Style

+
+

Last updated on 2024-10-18

+

Estimated time: 30 minutes

+
+ +
+
+

Overview

+
+
+
+
+

Questions

+
    +
  • How can I make my programs more readable?
  • +
  • How do most programmers format their code?
  • +
  • How can programs check their own operation?
  • +
+
+
+
+
+
+
+

Objectives

+
    +
  • Provide sound justifications for basic rules of coding style.
  • +
  • Refactor one-page programs to make them more readable and justify +the changes.
  • +
  • Use Python community coding standards (PEP-8).
  • +
+
+
+
+
+
+

Coding style +

+
+

A consistent coding style helps others (including our future selves) +read and understand code more easily. Code is read much more often than +it is written, and as the Zen of Python +states, “Readability counts”. Python proposed a standard style through +one of its first Python Enhancement Proposals (PEP), PEP8.

+

Some points worth highlighting:

+
    +
  • document your code and ensure that assumptions, internal algorithms, +expected inputs, expected outputs, etc., are clear
  • +
  • use clear, semantically meaningful variable names
  • +
  • use spaces, not tabs, to indent lines (tabs can cause problems across different text editors, operating systems, and version control systems)
  • +

Follow standard Python style in your code. +

+
+
    +
  • +PEP8: a style +guide for Python that discusses topics such as how to name variables, +how to indent your code, how to structure your import +statements, etc. Adhering to PEP8 makes it easier for other Python +developers to read and understand your code, and to understand what +their contributions should look like.
  • +
  • To check your code for compliance with PEP8, you can use the pycodestyle application. Tools like the black code formatter can automatically format your code to conform to PEP8 and pycodestyle (a formatter for Jupyter notebooks, nb_black, also exists).
  • +
  • Some groups and organizations follow different style guidelines +besides PEP8. For example, the Google style +guide on Python makes slightly different recommendations. Google +wrote an application that can help you format your code in either their +style or PEP8 called yapf.
  • +
  • With respect to coding style, the key is consistency. Choose a style for your project, be it PEP8, the Google style, or something else, and do your best to ensure that you and anyone you are collaborating with stick to it. Consistency within a project is often more impactful than the particular style used. A consistent style will make your software easier to read and understand for others and for your future self.
  • +

Use assertions to check for internal errors. +

+
+

Assertions are a simple but powerful method for making sure that the +context in which your code is executing is as you expect.

+
+

PYTHON +

+
def calc_bulk_density(mass, volume):
+    '''Return dry bulk density = powder mass / powder volume.'''
+    assert volume > 0
+    return mass / volume
+
+

If the assertion is False, the Python interpreter raises +an AssertionError runtime exception. The source code for +the expression that failed will be displayed as part of the error +message. To ignore assertions in your code run the interpreter with the +‘-O’ (optimize) switch. Assertions should contain only simple checks and +never change the state of the program. For example, an assertion should +never contain an assignment.
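An assert statement can also carry a message after a comma, which appears in the error. A minimal sketch based on calc_bulk_density above; the message text is an illustrative choice:

```python
def calc_bulk_density(mass, volume):
    '''Return dry bulk density = powder mass / powder volume.'''
    assert volume > 0, 'volume must be positive'
    return mass / volume

# Catch the error here only so the example can show the message.
try:
    calc_bulk_density(2.5, 0)
except AssertionError as error:
    print('AssertionError:', error)
```

In real code the AssertionError is usually left uncaught so that the program stops at the point where its assumptions were violated.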

+

Use docstrings to provide builtin help. +

+
+

If the first thing in a function is a character string that is not +assigned directly to a variable, Python attaches it to the function, +accessible via the builtin help function. This string that provides +documentation is also known as a docstring.

+
+

PYTHON +

+
def average(values):
+    "Return average of values, or None if no values are supplied."
+
+    if len(values) == 0:
+        return None
+    return sum(values) / len(values)
+
+help(average)
+
+
+

OUTPUT +

+
Help on function average in module __main__:
+
+average(values)
+    Return average of values, or None if no values are supplied.
+
+
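The docstring is stored on the function object as its __doc__ attribute, which can be read directly. A short sketch reusing average from above:

```python
def average(values):
    "Return average of values, or None if no values are supplied."
    if len(values) == 0:
        return None
    return sum(values) / len(values)

# The same text that help(average) displays.
print(average.__doc__)
```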
+
+ +
+
+

Multiline Strings

+
+

Multiline strings are often used for documentation. These start with three quote characters (either single or double) and end with three matching characters.

+
+

PYTHON +

+
"""This string spans
+multiple lines.
+
+Blank lines are allowed."""
+
+
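The line breaks inside a triple-quoted string are part of the string itself. A small sketch, binding the example string to the hypothetical name text so it can be inspected:

```python
text = """This string spans
multiple lines.

Blank lines are allowed."""

# The newline characters are stored in the string.
print(len(text.splitlines()))   # 4
print(repr(text[:18]))
```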
+
+
+
+
+ +
+
+

What Will Be Shown?

+
+

Highlight the lines in the code below that will be available as +online help. Are there lines that should be made available, but won’t +be? Will any lines produce a syntax error or a runtime error?

+
+

PYTHON +

+
"Find maximum edit distance between multiple sequences."
+# This finds the maximum distance between all sequences.
+
+def overall_max(sequences):
+    '''Determine overall maximum edit distance.'''
+
+    highest = 0
+    for left in sequences:
+        for right in sequences:
+            '''Avoid checking sequence against itself.'''
+            if left != right:
+                this = edit_distance(left, right)
+                highest = max(highest, this)
+
+    # Report.
+    return highest
+
+
+
+
+
+
+ +
+
+

Document This

+
+

Use comments to describe and help others understand potentially +unintuitive sections or individual lines of code. They are especially +useful to whoever may need to understand and edit your code in the +future, including yourself.

+

Use docstrings to document the acceptable inputs and expected outputs +of a method or class, its purpose, assumptions and intended behavior. +Docstrings are displayed when a user invokes the builtin +help method on your method or class.

+

Turn the comment in the following function into a docstring and check +that help displays it properly.

+
+

PYTHON +

+
def middle(a, b, c):
+    # Return the middle value of three.
+    # Assumes the values can actually be compared.
+    values = [a, b, c]
+    values.sort()
+    return values[1]
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
def middle(a, b, c):
+    '''Return the middle value of three.
+    Assumes the values can actually be compared.'''
+    values = [a, b, c]
+    values.sort()
+    return values[1]
+
+
+
+
+
+
+
+ +
+
+

Clean Up This Code

+
+
    +
  1. Read this short program and try to predict what it does.
  2. +
  3. Run it: how accurate was your prediction?
  4. +
  5. Refactor the program to make it more readable. Remember to run it +after each change to ensure its behavior hasn’t changed.
  6. +
  7. Compare your rewrite with your neighbor’s. What did you do the same? +What did you do differently, and why?
  8. +
+
+

PYTHON +

+
n = 10
+s = 'et cetera'
+print(s)
+i = 0
+while i < n:
+    # print('at', j)
+    new = ''
+    for j in range(len(s)):
+        left = j-1
+        right = (j+1)%len(s)
+        if s[left]==s[right]: new = new + '-'
+        else: new = new + '*'
+    s=''.join(new)
+    print(s)
+    i += 1
+
+
+
+
+
+
+ +
+
+

Here’s one solution.

+
+

PYTHON +

+
def string_machine(input_string, iterations):
+    """
+    Takes input_string and generates a new string with -'s and *'s
+    corresponding to characters that have identical adjacent characters
+    or not, respectively.  Iterates through this procedure with the resultant
+    strings for the supplied number of iterations.
+    """
+    print(input_string)
+    input_string_length = len(input_string)
+    old = input_string
+    for i in range(iterations):
+        new = ''
+        # iterate through characters in previous string
+        for j in range(input_string_length):
+            left = j-1
+            right = (j+1) % input_string_length  # ensure right index wraps around
+            if old[left] == old[right]:
+                new = new + '-'
+            else:
+                new = new + '*'
+        print(new)
+        # store new string as old
+        old = new     
+
+string_machine('et cetera', 10)
+
+
+

OUTPUT +

+
et cetera
+*****-***
+----*-*--
+---*---*-
+--*-*-*-*
+**-------
+***-----*
+--**---**
+*****-***
+----*-*--
+---*---*-
+
+
+
+
+
+
+
+ +
+
+

Key Points

+
+
    +
  • Follow standard Python style in your code.
  • +
  • Use docstrings to provide builtin help.
  • +
+
+
+
+

Content from Wrap-Up

+
+

Last updated on 2024-10-18

+

Estimated time: 20 minutes

+
+ +
+
+

Overview

+
+
+
+
+

Questions

+
    +
  • What have we learned?
  • +
  • What else is out there and where do I find it?
  • +
+
+
+
+
+
+
+

Objectives

+
    +
  • Name and locate scientific Python community sites for software, +workshops, and help.
  • +
+
+
+
+
+
+

Leslie Lamport once said, “Writing is nature’s way of showing you how +sloppy your thinking is.” The same is true of programming: many things +that seem obvious when we’re thinking about them turn out to be anything +but when we have to explain them precisely.

+

Python supports a large and diverse community across academia and +industry. +

+
+ +
+
+ +
+
+

Key Points

+
+
    +
  • Python supports a large and diverse community across academia and +industry.
  • +
+
+
+
+

Content from Feedback

+
+

Last updated on 2024-10-18

+

Estimated time: 15 minutes

+
+ +
+
+

Overview

+
+
+
+
+

Questions

+
    +
  • How did the class go?
  • +
+
+
+
+
+
+
+

Objectives

+
    +
  • Gather feedback on the class
  • +
+
+
+
+
+
+

Gather feedback from participants.

+
+
+ +
+
+

Key Points

+
+
    +
  • We are constantly seeking to improve this course.
  • +
+
+
+
+
+
+
+ + +
+ + +
+ + + + + diff --git a/instructor/design.html b/instructor/design.html new file mode 100644 index 000000000..b302e473d --- /dev/null +++ b/instructor/design.html @@ -0,0 +1,1019 @@ + +Plotting and Programming in Python: Lesson Design +
+ Plotting and Programming in Python +
+ +
+
+ + + + + +
+
+

Lesson Design

+

Last updated on 2024-10-18

+ + + + + +
+ +
+ + + +
+
+ +
+
+

Help Wanted

+
+

We are filling in the exercises below in order to make the lesson plan +more concrete. Contributions (both in the form of pull requests with +filled-in exercises, and comments on specific exercises, ordering, and +timings) are greatly appreciated.

+
+
+
+

Process Used

+
+

Michael Pollan’s advice if he taught R or Python programming:

+
  1. Write code.
  2. +
  3. Not too much.
  4. +
  5. Mostly plots.
  6. +

Michael Koontz

+
+

This lesson was developed using a slimmed-down variant of the +“Understanding by Design” process. The main sections are:

+
  1. Assumptions about audience, time, etc. (The current draft also +includes some conclusions and decisions in this section - that should be +refactored.)

  2. +
  3. Desired results: overall goals, summative assessments at half-day +granularity, what learners will be able to do, what learners will +know.

  4. +
  5. Learning plan: each episode has a heading that summarizes what +will be covered, then estimates time that will be spent on teaching and +on exercises, while the exercises are given as bullet points.

  6. +

Stage 1: Assumptions

+
  • Audience +
    • Graduate students in numerate disciplines from cosmology to +archaeology
    • +
    • Who have manipulated data in spreadsheets and with interactive tools +like SAS
    • +
    • But have not programmed beyond CPD +(copy-paste-despair)
    • +
  • +
  • Constraints +
    • One full day 09:00-16:30 +
      • 6:15 class time
      • +
      • 0:45 lunch
      • +
      • 0:30 total for two coffee breaks
      • +
    • +
    • Learners use native installs on their own machines +
      • May use VMs or cloud resources at instructor’s discretion
      • +
      • But must keep native local install as an option
      • +
    • +
    • No dependence on other Carpentry modules +
      • In particular, does not require knowledge of shell or version +control
      • +
    • +
    • Use the Jupyter Notebook +
      • Authentic tool used by many instructors
      • +
      • There isn’t really an alternative
      • +
      • And means that even people who have seen a bit of Python before will +probably learn something
      • +
    • +
  • +
  • Motivating Example +
    • Creating 2D plots suitable for inclusion in papers
    • +
    • Appeals to almost everyone
    • +
    • Makes lesson usable by both Carpentries +
      • And means that even people who have seen a bit of Python before will +probably learn something
      • +
    • +
  • +
  • Data +
    • Use the gapminder data throughout
    • +
    • But break into multiple files by continent +
      • To make display of output from examples tidier (e.g., use +Australia/New Zealand, which is only two lines)
      • +
      • And allow examples showing use of multiple data sets
      • +
    • +
  • +
  • Focus on Pandas instead of NumPy +
    • Makes lesson usable by both Data Carpentry and Software +Carpentry
    • +
    • Genuine novices are likely to want data analysis
    • +
    • And people with some prior experience: +
      • will accept data analysis as an authentic task,
      • +
      • and are unlikely to have encountered Pandas, so they’ll still get +something useful out of the lesson
      • +
    • +
  • +
  • Challenges will mostly not be “write this code from +scratch” +
    • Want lots of short exercises that can reliably be finished in +allotted time
    • +
    • So use MCQs, fill-in-the-blanks, Parsons Problems, “tweak this +code”, etc.
    • +
  • +

Stage 2: Desired Results

+
+

Questions

+

How do I…

+
  • …read tabular data?
  • +
  • …plot a single vector of values?
  • +
  • …create a time series plot?
  • +
  • …create one plot for each of several data sets?
  • +
  • …get extra data from a single data set for plotting?
  • +
  • …write programs I can read and re-use in future?
  • +
+
+

Skills

+

I can…

+
  • …write short scripts using loops and conditionals.
  • +
  • …write functions with a fixed number of parameters that return a +single result.
  • +
  • …import libraries using aliases and refer to those libraries’ +contents.
  • +
  • …do simple data extraction and formatting using Pandas.
  • +
+
+

Concepts

+

I know…

+
  • …that a program is a piece of lab equipment that implements an +analysis +
    • Needs to be validated/calibrated before/during use
    • +
    • Makes analysis reproducible, reviewable, shareable
    • +
  • +
  • …that programs are written for people, not for computers +
    • Meaningful variable names
    • +
    • Modularity for readability as well as re-use
    • +
    • No duplication
    • +
    • Document purpose and use
    • +
  • +
  • …that there is no magic: the programs I use are no different in +principle from those I build
  • +
  • …how to assign values to variables
  • +
  • …what integers, floats, strings, NumPy arrays, and Pandas dataframes +are
  • +
  • …how to trace the execution of a for loop
  • +
  • …how to trace the execution of if/else +statements
  • +
  • …how to create and index lists
  • +
  • …how to create and index NumPy arrays
  • +
  • …how to create and index Pandas dataframes
  • +
  • …how to create time series plots
  • +
  • …the difference between defining and calling a function
  • +
  • …where to find documentation on standard libraries
  • +
  • …how to find out what else scientific Python offers
  • +
+

Stage 3: Learning Plan

+
+

Summative Assessment

+
  • Midpoint: create time-series plot for each file in a directory.
  • +
  • Final: extract data from Pandas dataframe and create comparative +multi-line time series plot.
  • +
+
+

+Running and Quitting Interactively +(9:00)

+
  • Teaching: 15 min (because of setup issues) +
    • Launch the Jupyter Notebook, create new notebooks, and exit the +Notebook.
    • +
    • Create Markdown cells in a notebook.
    • +
    • Create and run Python cells in a notebook.
    • +
  • +
  • Challenges: 0 min (accounted for in teaching time - no separate +exercise) +
    • Creating lists in Markdown
    • +
    • What is displayed when several expressions are put in a single +cell?
    • +
    • Change an existing cell from code to Markdown
    • +
    • Rendering LaTeX-style equations
    • +
  • +
+
+

+Variables and Assignment (9:15)

+
  • Teaching: 10 min +
    • Write programs that assign scalar values to variables and perform +calculations with those values.
    • +
    • Correctly trace value changes in programs that use scalar +assignment.
    • +
  • +
  • Challenges: 10 min +
    • Trace execution of code swapping two values using an intermediate +variable.
    • +
    • Predict final values of variables after several assignments.
    • +
    • What happens if you try to index a number?
    • +
    • Which is a better variable name, m, min, +or minutes?
    • +
    • What do the following slice expressions produce?
    • +
  • +
+
+

+Data Types and Type +Conversion (09:35)

+
  • Teaching: 10 min +
    • Explain key differences between integers and floating point +numbers.
    • +
    • Explain key differences between numbers and character strings.
    • +
    • Use built-in functions to convert between integers, floating point +numbers, and strings.
    • +
  • +
  • Challenges: 10 min +
    • What type of value is 3.4?
    • +
    • What type of value is 3.25 + 4?
    • +
    • What type of value would you use to represent: +
      • Number of days since the start of the year.
      • +
      • Time elapsed since the start of the year.
      • +
      • Etc.
      • +
    • +
    • How can you use // (integer division) and +% (modulo)?
    • +
    • What does int("3.4") do?
    • +
    • Given these float, int, and string values, which expressions will +print a particular result?
    • +
    • What do you expect 1+2j + 3 to produce?
    • +
  • +
+
+

+Built-in Functions and Help +(09:55)

+
  • Teaching: 15 min +
    • Explain the purpose of functions.
    • +
    • Correctly call built-in Python functions.
    • +
    • Correctly nest calls to built-in functions.
    • +
    • Use help to display documentation for built-in functions.
    • +
    • Correctly describe situations in which SyntaxError and NameError +occur.
    • +
  • +
  • Challenges: 10 min +
    • Explain the order of operations in the following complex +expression.
    • +
    • What will each nested combination of min and +max calls produce?
    • +
    • Why don’t max and min return +None when given no arguments?
    • +
    • Given what we have seen so far, what index expression will get the +last character in a string?
    • +
  • +
+
+

+Coffee: 15 min (10:20)

+
+
+

+Libraries (10:35)

+
  • Teaching: 10 min +
    • Explain what software libraries are and why programmers create and +use them.
    • +
    • Write programs that import and use libraries from Python’s standard +library.
    • +
    • Find and read documentation for standard libraries interactively (in +the interpreter) and online.
    • +
  • +
  • Challenges: 10 min +
    • Which function from the standard math library could you use to +calculate a square root?
    • +
    • What library would you use to select a random value from data?
    • +
    • If help(math) produces an error, what have you +forgotten to do?
    • +
    • Fill in the blanks in code below so that the import statement and +program run.
    • +
  • +
+
+

+Reading Tabular Data +(10:55)

+
  • Teaching: 10 min +
    • Import the Pandas library.
    • +
    • Use Pandas to load a simple CSV data set.
    • +
    • Get some basic information about a Pandas DataFrame.
    • +
  • +
  • Challenges: 10 min +
    • Read the data for the Americas and display its summary +statistics.
    • +
    • What do .head and .tail do?
    • +
    • What string(s) should you pass to read_csv to read +files from other directories?
    • +
    • How can you write CSV data?
    • +
  • +
+
+

+DataFrames (11:15)

+
  • Teaching: 15 min +
    • Select individual values from a Pandas dataframe.
    • +
    • Select entire rows or entire columns from a dataframe.
    • +
    • Select a subset of both rows and columns from a dataframe in a +single operation.
    • +
    • Select a subset of a dataframe by a single Boolean criterion.
    • +
  • +
  • Challenges: 15 min +
    • Write an expression to find the Per Capita GDP of Serbia in +2007.
    • +
    • What rule governs what is (or isn’t) included in numerical and named +slices in Pandas?
    • +
    • What does each line in the following short program do?
    • +
    • What do idxmin and idxmax do?
    • +
    • Write expressions to get the GDP per capita for all countries in +1982, for all countries after 1985, etc.
    • +
    • Given the way its borders have changed since 1900, what would you do +if asked to create a table of GDP per capita for Poland for the +Twentieth Century?
    • +
  • +
+
+

+Plotting (11:45)

+
  • Teaching: 15 min +
    • Create a time series plot showing a single data set.
    • +
    • Create a scatter plot showing relationship between two data +sets.
    • +
  • +
  • Exercise: 15 min +
    • Fill in the blanks to plot the minimum GDP per capita over time for +European countries.
    • +
    • Modify the example to create a scatter plot of GDP per capita in +Asian countries.
    • +
    • Explain what each argument to plot does in the +following example.
    • +
  • +
+
+

+Lunch (12:15): 45 min

+
+
+

+Lists (13:00)

+
  • Teaching: 10 min +
    • Explain why programs need collections of values.
    • +
    • Write programs that create flat lists, index them, slice them, and +modify them through assignment and method calls.
    • +
  • +
  • Challenges: 10 min +
    • Fill in the blanks so that the program produces the output +shown.
    • +
    • How large are the following slices?
    • +
    • What do negative index expressions print?
    • +
    • What does a “stride” in a slice do?
    • +
    • How do slices treat out-of-range bounds?
    • +
    • What are the differences between sorting these two ways?
    • +
    • What is the difference between new = old and +new = old[:]?
    • +
  • +
+
+

+Loops (13:20)

+
  • Teaching: 10 min +
    • Explain what for loops are normally used for.
    • +
    • Trace the execution of a simple (unnested) loop and correctly state +the values of variables in each iteration.
    • +
    • Write for loops that use the Accumulator pattern to aggregate +values.
    • +
  • +
  • Challenges: 15 min +
    • Is an indentation error a syntax error or a runtime error?
    • +
    • Trace which lines of this program are executed in what order.
    • +
    • Fill in the blanks in this program so that it reverses a +string.
    • +
    • Fill in the blanks in this series of examples to get practice +accumulating values.
    • +
    • Reorder and indent these lines to calculate the cumulative sum of +the list values.
    • +
  • +
+
+

+Looping Over Data Sets +(13:45)

+
  • Teaching: 5 min +
    • Be able to read and write globbing expressions that match sets of +files.
    • +
    • Use glob to create lists of files.
    • +
    • Write for loops to perform operations on files given their names in +a list.
    • +
  • +
  • Challenges: 10 min +
    • Which filenames are not matched by this glob +expression?
    • +
    • Modify this program so that it prints the number of records in the +shortest file.
    • +
    • Write a program that reads and plots all of the regional data +sets.
    • +
  • +
+
+

+Writing Functions (14:00)

+
  • Teaching: 10 min +
    • Explain and identify the difference between function definition and +function call.
    • +
    • Write a function that takes a small, fixed number of arguments and +produces a single result.
    • +
  • +
  • Challenges: 15 min +
    • This code defines and calls a function - what does it print when +run?
    • +
    • Explain why this short program prints things in the order it +does.
    • +
    • Fill in the blanks to create a function that finds the minimum value +in a data file.
    • +
    • Fill in the blanks to create a function that finds the first +negative value in a list. What does your function do if the list is +empty?
    • +
    • Why is it sometimes useful to pass arguments by naming the +corresponding parameters?
    • +
    • Fill in the blanks and turn this short piece of code into a +function.
    • +
  • +
+
+

+Variable Scope (14:25)

+
  • Teaching: 10 min +
    • Identify local and global variables.
    • +
    • Identify parameters as local variables.
    • +
    • Read a traceback and determine the file, function, and line number +on which the error occurred.
    • +
  • +
  • Challenges: 10 min +
    • Trace the changes to the values in this program, being careful to +distinguish local from global values.
    • +
  • +
+
+

+Coffee (14:45): 15 min

+
+
+

+Conditionals (15:00)

+
  • Teaching: 10 min +
    • Correctly write programs that use if and else statements and simple +Boolean expressions (without logical operators).
    • +
    • Trace the execution of unnested conditionals and conditionals inside +loops.
    • +
  • +
  • Challenges: 15 min +
    • Trace the execution of this conditional statement.
    • +
    • Fill in the blanks so that this function replaces negative values +with zeroes.
    • +
    • Modify this program so that it only processes files with fewer than +50 records.
    • +
    • Modify this program so that it always finds the largest and smallest +values in a list no matter what the list’s values are.
    • +
  • +
+
+

+Programming Style (15:25)

+
  • Teaching: 15 min +
    • How can I make my programs more readable?
    • +
    • How do most programmers format their code?
    • +
    • How can programs check their own operation?
    • +
  • +
  • Challenges: 15 min +
    • Which lines in this code will be available as online help?
    • +
    • Turn the comments in this program into docstrings.
    • +
    • Rewrite this short program to be more readable.
    • +
  • +
+
+

+Wrap-Up (15:55)

+
  • Teaching: 20 min +
    • Name and locate scientific Python community sites for software, +workshops, and help.
    • +
  • +
  • Challenges: 0 min +
    • None.
    • +
  • +
+
+

+Feedback (16:15)

+
  • Teaching: 0 min
  • +
  • Challenges: 15 min +
    • Collect feedback
    • +
  • +
+
+

Finish (16:30)

+
+
+
+ + +
+
+ + + diff --git a/instructor/discuss.html b/instructor/discuss.html new file mode 100644 index 000000000..b560e10f3 --- /dev/null +++ b/instructor/discuss.html @@ -0,0 +1,531 @@ + +Plotting and Programming in Python: Discussion +
+ Plotting and Programming in Python +
+ +
+
+ + + + + +
+
+

Discussion

+

Last updated on 2024-10-18

+ + + + + +
+ +
+ + +

FIXME: general discussion and further reading for learners.

+ + +
+
+ + +
+
+ + + diff --git a/instructor/exercises.html b/instructor/exercises.html new file mode 100644 index 000000000..fbb03387b --- /dev/null +++ b/instructor/exercises.html @@ -0,0 +1,531 @@ + +Plotting and Programming in Python: Further Exercises +
+ Plotting and Programming in Python +
+ +
+
+ + + + + +
+
+

Further Exercises

+

Last updated on 2024-10-18

+ + + + + +
+ +
+ + +

FIXME: exercises that don’t fit into the regular schedule.

+ + +
+
+ + +
+
+ + + diff --git a/instructor/images.html b/instructor/images.html new file mode 100644 index 000000000..b87318cb8 --- /dev/null +++ b/instructor/images.html @@ -0,0 +1,706 @@ + + + + + +Plotting and Programming in Python: All Images + + + + + + + + + + + + +
+ Plotting and Programming in Python +
+ +
+
+ + + + + + +
+
+ + +

Running and Quitting

+
+

Figure 1

+ +

+Anaconda Navigator landing page

+
+

Figure 2

+ +

+JupyterLab landing page

+
+

Figure 3

+ +

+JupyterLab Menu Bar

+
+

Figure 4

+ +

+JupyterLab Left Side Bar

+
+

Figure 5

+ +

+JupyterLab Main Work Area

+
+

Figure 6

+ +

+Example Jupyter Notebook

+
+

Figure 7

+ +

+Multi-panel JupyterLab

+

Variables and Assignment

+
+

Figure 1

+ +
A line of Python code, print(atom_name[0]), +demonstrates that using the zero index will output just the initial +letter, in this case ‘h’ for helium.
+

Data Types and Type Conversion

+

Built-in Functions and Help

+

Morning Coffee

+

Libraries

+

Reading Tabular Data into DataFrames

+

Pandas DataFrames

+

Plotting

+
+

Figure 1

+ +
A line chart showing time (hr) relative to position (km), using the values provided in the code block above. By default, the plotted line is blue against a white background, and the axes have been scaled automatically to fit the range of the input data.

+

Figure 2

+ +
GDP plot for Australia

+

Figure 3

+ +
GDP plot for Australia and New Zealand

+

Figure 4

+ +
GDP barplot for Australia

+

Figure 5

+ +
GDP formatted plot for Australia

+

Figure 6

+ +
GDP formatted plot for Australia and New Zealand

+

Figure 7

+ +
GDP correlation using plt.scatter

+

Figure 8

+ +
GDP correlation using data.T.plot.scatter

+

Figure 9

+ +
Minima Maxima Solution

+

Figure 10

+ +
Correlations Solution 1

+

Figure 11

+ +
Correlations Solution 2

+

Figure 12

+ +
More Correlations Solution

Lunch

+

Lists

+

For Loops

+

Conditionals

+

Looping Over Data Sets

+

Afternoon Coffee

+

Writing Functions

+

Variable Scope

+

Programming Style

+

Wrap-Up

+

Feedback

+
+
+
+
+ + +
+ + +
+ + + + + diff --git a/instructor/index.html b/instructor/index.html new file mode 100644 index 000000000..c3cd355d5 --- /dev/null +++ b/instructor/index.html @@ -0,0 +1,733 @@ + +Plotting and Programming in Python: Summary and Schedule +
+ Plotting and Programming in Python +
+ +
+
+ + + + + +
+

Summary and Schedule

+ + +

This lesson is an introduction to programming in Python 3 for people +with little or no previous programming experience. It uses plotting as +its motivating example and is designed to be used in both Data Carpentry and Software Carpentry +workshops. This lesson references JupyterLab but +can be taught using alternative Python 3 interpreters as well (e.g., +repl.it, Anaconda).

+
+
+ +
+
+

Prerequisites

+
+
  1. Learners need to understand what files and directories are, what +a working directory is, and how to start a Python interpreter.

  2. +
  3. Learners must install Python 3 before the class starts.

  4. +
  5. Learners must get the gapminder data before class starts: please +download and unzip the file python-novice-gapminder-data.zip.

  6. +

Please see the setup instructions for more +details.

+
+
+
+ + +
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

+ The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor. +

+

Getting the Data

+

The data we will be using is taken from the gapminder +dataset. To obtain it, download and unzip the file python-novice-gapminder-data.zip. +In order to follow the presented material, you should launch the +JupyterLab server in the root directory (see Starting +JupyterLab).

+

Installing Python Using Anaconda

+

Please refer to the Python +section of the workshop website for installation instructions.

+
+ + +
+
+ + + diff --git a/instructor/instructor-notes.html b/instructor/instructor-notes.html new file mode 100644 index 000000000..ebcb90e3e --- /dev/null +++ b/instructor/instructor-notes.html @@ -0,0 +1,666 @@ + + + + + +Plotting and Programming in Python: Instructor Notes + + + + + + + + + + + + +
+ Plotting and Programming in Python +
+ +
+
+ + + + + + +
+
+

Instructor Notes

+ +

General Notes +

+
+
+
It’s all right not to get through the whole lesson.
+
+This lesson is designed for people who have never programmed before, but +any given class may include people with a wide range of prior +experience. We have therefore included enough material to fill a full +day if need be, but expect that many offerings will only get as far as +the introduction to Pandas. +
+
Don’t tell people to Google things.
+
+One of the goals of this lesson is to help novices build a workable +mental model of how programming works. Until they have that model, they +will not know what to search for or how to recognize a helpful answer. +Telling them to Google can also give the impression that we think their +problem is trivial. (That said, if learners have done enough programming +before to be past these issues, having them search for solutions online +can help them solidify their understanding.) It’s also worth quoting Trevor +King’s comment about online search: “If you find anything, other +folks were confused enough to bother with a blog or Stack Overflow post, +so it’s probably not trivial.” +
+

Running and Quitting

+

Variables and Assignment

+

Data Types and Type Conversion

+

Built-in Functions and Help

+

Morning Coffee

+

Libraries

+

Reading Tabular Data into DataFrames

+

Pandas DataFrames

+
+

+Instructor Note +

+

Learners often struggle here: many do not work with financial data +and concepts, so they find the example difficult to get their +heads around. The biggest problem, though, is the line generating the +wealth_score; this step needs to be talked through thoroughly: * It uses +implicit conversion between boolean and float values, which has not been +covered in the course so far. * The axis=1 argument needs to be +explained clearly.

+
+

Plotting

+

Lunch

+

Lists

+

For Loops

+

Conditionals

+

Looping Over Data Sets

+

Afternoon Coffee

+

Writing Functions

+

Variable Scope

+

Programming Style

+

Wrap-Up

+

Feedback

+
+
+
+
+ + +
+ + +
+ + + + + diff --git a/instructor/key-points.html b/instructor/key-points.html new file mode 100644 index 000000000..b52b067a8 --- /dev/null +++ b/instructor/key-points.html @@ -0,0 +1,796 @@ + + + + + +Plotting and Programming in Python: Key Points + + + + + + + + + + + + +
+ Plotting and Programming in Python +
+ +
+
+ + + + + + +
+
+ + +

Running and Quitting

+
+
    +
  • Python scripts are plain text files.
  • +
  • Use the Jupyter Notebook for editing and running Python.
  • +
  • The Notebook has Command and Edit modes.
  • +
  • Use the keyboard and mouse to select and edit cells.
  • +
  • The Notebook will turn Markdown into pretty-printed +documentation.
  • +
  • Markdown does most of what HTML does.
  • +

Variables and Assignment

+
+
    +
  • Use variables to store values.
  • +
  • Use print to display values.
  • +
  • Variables persist between cells.
  • +
  • Variables must be created before they are used.
  • +
  • Variables can be used in calculations.
  • +
  • Use an index to get a single character from a string.
  • +
  • Use a slice to get a substring.
  • +
  • Use the built-in function len to find the length of a +string.
  • +
  • Python is case-sensitive.
  • +
  • Use meaningful variable names.
  • +

Data Types and Type Conversion

+
+
    +
  • Every value has a type.
  • +
  • Use the built-in function type to find the type of a +value.
  • +
  • Types control what operations can be done on values.
  • +
  • Strings can be added and multiplied.
  • +
  • Strings have a length (but numbers don’t).
  • +
  • Must convert numbers to strings or vice versa when operating on +them.
  • +
  • Can mix integers and floats freely in operations.
  • +
  • Variables only change value when something is assigned to them.
  • +

Built-in Functions and Help

+
+
    +
  • Use comments to add documentation to programs.
  • +
  • A function may take zero or more arguments.
  • +
  • Commonly-used built-in functions include max, +min, and round.
  • +
  • Functions may only work for certain (combinations of) +arguments.
  • +
  • Functions may have default values for some arguments.
  • +
  • Use the built-in function help to get help for a +function.
  • +
  • The Jupyter Notebook has two ways to get help.
  • +
  • Every function returns something.
  • +
  • Python reports a syntax error when it can’t understand the source of +a program.
  • +
  • Python reports a runtime error when something goes wrong while a +program is executing.
  • +
  • Fix syntax errors by reading the source code, and runtime errors by +tracing the program’s execution.
  • +

Morning Coffee

+

Libraries

+
+
    +
  • Most of the power of a programming language is in its +libraries.
  • +
  • A program must import a library module in order to use it.
  • +
  • Use help to learn about the contents of a library +module.
  • +
  • Import specific items from a library to shorten programs.
  • +
  • Create an alias for a library when importing it to shorten +programs.
  • +

Reading Tabular Data into DataFrames

+
+
    +
  • Use the Pandas library to get basic statistics out of tabular +data.
  • +
  • Use index_col to specify that a column’s values should +be used as row headings.
  • +
  • Use DataFrame.info to find out more about a +dataframe.
  • +
  • The DataFrame.columns variable stores information about +the dataframe’s columns.
  • +
  • Use DataFrame.T to transpose a dataframe.
  • +
  • Use DataFrame.describe to get summary statistics about +data.
  • +

Pandas DataFrames

+
+
    +
  • Use DataFrame.iloc[..., ...] to select values by +integer location.
  • +
  • Use : on its own to mean all columns or all rows.
  • +
  • Select multiple columns or rows using DataFrame.loc and +a named slice.
  • +
  • Result of slicing can be used in further operations.
  • +
  • Use comparisons to select data based on value.
  • +
  • Select values or NaN using a Boolean mask.
  • +

Plotting

+
+
    +
  • +matplotlib is the +most widely used scientific plotting library in Python.
  • +
  • Plot data directly from a Pandas dataframe.
  • +
  • Select and transform data, then plot it.
  • +
  • Many styles of plot are available: see the Python Graph +Gallery for more options.
  • +
  • Can plot many sets of data together.
  • +

Lunch

+

Lists

+
+
    +
  • A list stores many values in a single structure.
  • +
  • Use an item’s index to fetch it from a list.
  • +
  • Lists’ values can be replaced by assigning to them.
  • +
  • Appending items to a list lengthens it.
  • +
  • Use del to remove items from a list entirely.
  • +
  • The empty list contains no values.
  • +
  • Lists may contain values of different types.
  • +
  • Character strings can be indexed like lists.
  • +
  • Character strings are immutable.
  • +
  • Indexing beyond the end of the collection is an error.
  • +

For Loops

+
+
    +
  • A for loop executes commands once for each value in a +collection.
  • +
  • A for loop is made up of a collection, a loop variable, +and a body.
  • +
  • The first line of the for loop must end with a colon, +and the body must be indented.
  • +
  • Indentation is always meaningful in Python.
  • +
  • Loop variables can be called anything (but it is strongly advised to +give the loop variable a meaningful name).
  • +
  • The body of a loop can contain many statements.
  • +
  • Use range to iterate over a sequence of numbers.
  • +
  • The Accumulator pattern turns many values into one.
  • +

Conditionals

+
+
    +
  • Use if statements to control whether or not a block of +code is executed.
  • +
  • Conditionals are often used inside loops.
  • +
  • Use else to execute a block of code when an +if condition is not true.
  • +
  • Use elif to specify additional tests.
  • +
  • Conditions are tested once, in order.
  • +
  • Create a table showing variables’ values to trace a program’s +execution.
  • +

Looping Over Data Sets

+
+
    +
  • Use a for loop to process files given a list of their +names.
  • +
  • Use glob.glob to find sets of files whose names match a +pattern.
  • +
  • Use glob and for to process batches of +files.
  • +

Afternoon Coffee

+

Writing Functions

+
+
    +
  • Break programs down into functions to make them easier to +understand.
  • +
  • Define a function using def with a name, parameters, +and a block of code.
  • +
  • Defining a function does not run it.
  • +
  • Arguments in a function call are matched to its defined +parameters.
  • +
  • Functions may return a result to their caller using +return.
  • +

Variable Scope

+
+
    +
  • The scope of a variable is the part of a program that can ‘see’ that +variable.
  • +

Programming Style

+
+
    +
  • Follow standard Python style in your code.
  • +
  • Use docstrings to provide built-in help.
  • +

Wrap-Up

+
+
    +
  • Python supports a large and diverse community across academia and +industry.
  • +

Feedback

+
+
    +
  • We are constantly seeking to improve this course.
  • +
+
+
+
+ + +
+ + +
+ + + + + diff --git a/instructor/profiles.html b/instructor/profiles.html new file mode 100644 index 000000000..adb3e635a --- /dev/null +++ b/instructor/profiles.html @@ -0,0 +1,488 @@ + +Plotting and Programming in Python: Learner Profiles +
+ Plotting and Programming in Python +
+ +
+
+ + + + + +
+
+

Learner Profiles

+ +

This is a placeholder file. Please add content here.

+ +
+
+ + +
+
+ + + diff --git a/instructor/reference.html b/instructor/reference.html new file mode 100644 index 000000000..85bccb138 --- /dev/null +++ b/instructor/reference.html @@ -0,0 +1,890 @@ + +Plotting and Programming in Python: Reference +
+ Plotting and Programming in Python +
+ +
+
+ + + + + +
+
+

Reference

+

Last updated on 2024-10-18

+ + + + + +
+ +
+ + + +

Reference

+

Running and Quitting

+
  • Python files have the .py extension.
  • +
  • Can be written in a text file or a Jupyter Notebook. +
    • Jupyter notebooks have the extension .ipynb +
    • +
    • Jupyter notebooks can be opened from Anaconda or +through the command line by entering $ jupyter notebook +
      • Markdown and HTML are allowed in markdown cells for documenting +code.
      • +
    • +
  • +

Variables and Assignment

+
  • Values are assigned to variables using =. +
    • Strings are defined in quotations '...'.
    • +
    • Integers and floating point numbers are defined without +quotations.
    • +
  • +
  • Variable names can contain letters, digits, and underscores +_. +
    • Cannot start with a digit.
    • +
    • Variables that start with underscores should be avoided.
    • +
  • +
  • Use print(...) to display values as text.
  • +
  • Can use indexing on strings. +
    • Indexing starts at 0.
    • +
    • Position is given in square brackets [position] +following the variable name.
    • +
    • Take a slice using [start:stop]. This makes a copy of +part of the original string. +
      • +start is the index of the first element.
      • +
      • +stop is the index of the element after the last desired +element.
      • +
    • +
  • +
  • Use len(...) to find the length of a variable or +string.
  • +
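
The indexing and slicing rules above can be sketched in a few lines. This is only an illustration: the variable name `atom_name` and its value are example choices, not a required part of the reference.

```python
# Example value chosen for illustration; any string behaves the same way.
atom_name = 'helium'

first = atom_name[0]       # indexing starts at 0, so this is the first character
middle = atom_name[1:4]    # characters at positions 1, 2 and 3 (stop is excluded)
length = len(atom_name)    # number of characters in the string

print(first, middle, length)
```

Because `stop` names the element after the last one wanted, the length of a slice is simply `stop - start` when both are in range.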

Data Types and Type +Conversion

+
  • Each value has a type. This controls what can be done with it. +
    • +int represents an integer
    • +
    • +float represents a floating point number.
    • +
    • +str represents a string.
    • +
  • +
  • To determine a variable's type, use the built-in function +type(...), including the variable name in the +parentheses.
  • +
  • Modifying strings: +
    • Use + to concatenate strings.
    • +
    • Use * to repeat a string.
    • +
    • Numbers and strings cannot be added to one another. +
      • Convert string to integer: int(...).
      • +
      • Convert integer to string: str(...).
      • +
    • +
  • +
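
A minimal sketch of the conversion rules above, using made-up variable names (`count`, `label`) purely for illustration:

```python
count = 3
label = 'value: '

text = label + str(count)    # convert the integer to a string before concatenating
total = int('4') + count     # convert the string to an integer before adding
ruler = '=' * 5              # repeating a string with *
mixed = 3.25 + 4             # integers and floats can be mixed; the result is a float

# Adding a string and a number directly is an error.
try:
    label + count
except TypeError as error:
    print('TypeError:', error)

print(text, total, ruler, type(mixed))
```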

Built-in Functions and Help

+
  • To add a comment, place # before the text you do not +wish to be executed.
  • +
  • Commonly used built-in functions: +
    • +min() finds the smallest value.
    • +
    • +max() finds the largest value.
    • +
    • +round() rounds off a floating point number.
    • +
    • +help() displays documentation for the function in the +parentheses. +
      • Other ways to get help include holding down shift and +pressing tab in Jupyter Notebooks.
      • +
    • +
  • +
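
As a rough illustration of the built-in functions listed above (the argument values are arbitrary examples):

```python
# This line is a comment: Python ignores everything after the # character.
smallest = min(3, 1, 2)        # smallest of the given values
largest = max('a', 'A', '0')   # strings are compared character by character
rounded = round(3.712, 1)      # round to one decimal place
nested = max(1, min(5, 10))    # calls can be nested: the inner call runs first

print(smallest, largest, rounded, nested)
```

Calling `help(round)` in a notebook cell would display the documentation for `round`; it is left out here only to keep the output short.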

Libraries

+
  • Importing a library: +
    • Use import ... to load a library.
    • +
    • Refer to this library by using module_name.thing_name. +
      • +. indicates ‘part of’.
      • +
    • +
  • +
  • To import a specific item from a library: +from ... import ... +
  • +
  • To import a library using an alias: +import ... as ... +
  • +
  • Importing the math library: import math +
    • Example of referring to an item with the module’s name: +math.cos(math.pi).
    • +
  • +
  • Importing the plotting library as an alias: +import matplotlib as mpl +
  • +
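
The three import forms above can be compared side by side. This sketch uses only the standard `math` library; the alias `m` is an arbitrary choice for the example.

```python
import math              # refer to contents as math.thing_name
from math import pi      # import one specific item; use it as plain pi
import math as m         # import under a shorter alias

full_name = math.cos(math.pi)   # module name, then '.', then the item
specific = pi                   # no prefix needed after 'from ... import ...'
aliased = m.sqrt(16)            # the alias stands in for the module name

print(full_name, specific, aliased)
```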

Reading Tabular Data into +DataFrames

+
  • Use the pandas library to do statistics on tabular data. Load with +import pandas as pd. +
    • To read in a CSV file: pd.read_csv(), including the path +name in the parentheses. +
      • To specify a column’s values should be used as row headings: +pd.read_csv('path', index_col='column name'), where path +and column name should be replaced with the relevant values.
      • +
    • +
  • +
  • To get more information about a DataFrame, use +DataFrame.info, replacing DataFrame with the +variable name of your DataFrame.
  • +
  • Use DataFrame.columns to view the column names.
  • +
  • Use DataFrame.T to transpose a DataFrame.
  • +
  • Use DataFrame.describe to get summary statistics about +your data.
  • +

Pandas DataFrames

+
  • Select data using [i,j] +
    • To select by entry position: DataFrame.iloc[..., ...] +
      • This is inclusive of everything except the final index.
      • +
    • +
    • To select by entry label: DataFrame.loc[..., ...] +
      • Can select multiple rows or columns by listing labels.
      • +
      • This is inclusive of both ends.
      • +
    • +
    • Use : to select all rows or columns.
    • +
  • +
  • Can also select data based on values using True and +False. This is a Boolean mask. +
    • mask = subset > 10000
    • +
    • We can then use this to select values.
    • +
  • +
  • To use a select-apply-combine operation we use +data.apply(lambda x: x > x.mean()) where +mean() can be any operation the user would like to be +applied to x.
  • +
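The selection rules above can be sketched on a small table. This is only an illustration: the DataFrame, its row labels, and its column names are made up and do not come from the lesson data.

```python
import pandas as pd

# Hypothetical table; labels and values are invented for illustration.
data = pd.DataFrame(
    {'1952': [100, 200, 300], '1957': [110, 210, 310], '1962': [120, 220, 320]},
    index=['A', 'B', 'C'],
)

# Select by position: the end of each range is excluded.
by_position = data.iloc[0:2, 0:2]

# Select by label: both end labels are included.
by_label = data.loc['A':'B', '1952':'1957']

# A Boolean mask has True wherever the condition holds.
mask = data > 200
# Entries where the mask is False become NaN.
selected = data[mask]

print(by_position)
print(by_label)
print(selected)
```

Here the two slices pick out the same four cells, which shows how the end of an iloc range is excluded while the end of a loc range is included.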

Plotting

+
  • The most widely used plotting library is matplotlib. +
    • Usually imported using +import matplotlib.pyplot as plt.
    • +
    • To plot we use the command +plt.plot(time, position).
    • +
    • To create a legend use +plt.legend(['label1', 'label2'], loc='upper left') +
      • Can also define labels within the plot statements by using +plt.plot(time, position, label='label'). To make the legend +show up, use plt.legend() +
      • +
    • +
    • To label the x and y axes, use plt.xlabel('label') and +plt.ylabel('label').
    • +
  • +
  • Pandas DataFrames can be used to plot by using +DataFrame.plot(). Any operations that can be used on a +DataFrame can be applied while plotting. +
    • To draw a bar plot, use data.plot(kind='bar') +
    • +
  • +
+

PYTHON +

+
import matplotlib.pyplot as plt
+plt.plot(time, position, label='label')
+plt.xlabel('x axis label')
+plt.ylabel('y axis label')
+plt.legend()
+
+

Lists

+
  • Defined within [...] and separated by ,. +
    • An empty list can be created by using [].
    • +
  • +
  • Can use len(...) to determine how many values are in a +list.
  • +
  • Can index just as done in previous lessons. +
    • Indexing can be used to reassign values: +list_name[0] = newvalue.
    • +
  • +
  • To add an item to a list use list_name.append(), with +the item to append in the parentheses.
  • +
  • To combine two lists use +list_name_1.extend(list_name_2).
  • +
  • To remove an item from a list use +del list_name[index].
  • +
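As a rough sketch, the list operations above can be chained on one list. The name pressures and its values are hypothetical and chosen only for illustration.

```python
# Hypothetical starting values.
pressures = [0.27, 0.28]

pressures[0] = 0.26              # reassign a value by index
pressures.append(0.29)           # add one item to the end
pressures.extend([0.30, 0.31])   # add every item from another list
del pressures[1]                 # remove the item at index 1

print(pressures, len(pressures))
```

Note that append adds its argument as a single item, while extend adds each item of the other list separately.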

For Loops

+
  • Start a for loop with for number in [1, 2, 3]:, with +the following lines indented. +
    • +[1, 2, 3] is considered the collection.
    • +
    • +number is the loop variable.
    • +
    • The action following the collection is the body.
    • +
  • +
  • To iterate over a sequence of numbers use +range(start, end), which stops just before end. +
  • +
+

PYTHON +

+
for number in range(0,5):
+    print(number)
+
+

Conditionals

+
  • Defined similarly to a loop, using +if variable conditional value:. +
    • For example, if variable > 5:.
    • +
  • +
  • Use elif followed by a condition (for example, elif variable == 5:) for additional tests.
  • +
  • Use else: to run code when none of the preceding conditions is true.
  • +
  • Can combine more than one conditional by using and or +or.
  • +
  • Often used in combination with for loops.
  • +
  • Conditions that can be used: +
    • +== equal to.
    • +
    • +>= greater than or equal to.
    • +
    • +<= less than or equal to.
    • +
    • +> greater than.
    • +
    • +< less than.
    • +
  • +
+

PYTHON +

+
for m in [3, 6, 7, 2, 8]:
+    if m > 5:
+        print(m, 'is large')
+    elif m == 5:
+        print(m, 'is 5')
+    else:
+        print(m, 'is small')
+
+

Looping Over Data Sets

+
  • Use a for loop: for filename in [file1, file2]: +
  • +
  • To find a set of files using a pattern use glob.glob +
    • Must import first using import glob.
    • +
    • +* indicates “match zero or more characters”
    • +
    • +? indicates “match exactly one character” +
      • For example: glob.glob('*.txt') will find all files that +end with .txt in the current directory.
      • +
    • +
  • +
  • Combine these by writing a loop using: +for filename in glob.glob('*.txt'): +
  • +
+

PYTHON +

+
import glob
+import pandas as pd
+for filename in glob.glob('*.txt'):
+    data = pd.read_csv(filename)
+
+

Writing Functions

+
  • Define a function using def function_name(parameters):. +Replace parameters with the variables to use when the +function is executed.
  • +
  • Run by using function_name(parameters).
  • +
  • To return a result to the caller use return ... in the +function.
  • +
+

PYTHON +

+
def add_numbers(a, b):
+    result = a + b
+    return result
+
+add_numbers(1, 4)
+
+

Variable Scope

+
  • A local variable is defined in a function and can only be seen and +used within that function.
  • +
  • A global variable is defined outside of a function and can be seen +or used anywhere after definition.
  • +
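A minimal sketch of the difference between the two kinds of variable follows. The function name adjust and the numbers are assumptions made up for this example.

```python
# Global variable: defined outside any function.
pressure = 103.9

def adjust(t):
    # t and temperature are local: they exist only while adjust runs.
    temperature = t * 1.43 / pressure   # the global pressure is visible here
    return temperature

result = adjust(0.9)
print(result)

# Calling print(temperature) at this point would raise a NameError,
# because temperature is not defined outside adjust.
```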

Programming Style

+
  • Document your code.
  • +
  • Use clear and meaningful variable names.
  • +
  • Follow the PEP8 +style guide when setting up your code.
  • +
  • Use assertions to check for internal errors.
  • +
  • Use docstrings to provide help.
  • +
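The last two points can be combined in one short function. This is a sketch only: the function average and its error message are hypothetical and not taken from the lesson.

```python
def average(values):
    """Return the mean of a non-empty sequence of numbers."""
    # The assertion catches an internal error early, with a clear message.
    assert len(values) > 0, 'No values provided'
    return sum(values) / len(values)

mean = average([1, 2, 3])
print(mean)

# The docstring is the text that help(average) displays.
print(average.__doc__)
```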

Glossary

+
Arguments
+
+Values passed to functions. +
+
Array
+
+A container holding elements of the same type. +
+
Boolean
+
+A value that is either True or False. +
+
DataFrame
+
+The way Pandas represents a table; a collection of series. +
+
Element
+
+An item in a list or an array. For a string, these are the individual +characters. +
+
Function
+
+A block of code that can be called and re-used elsewhere. +
+
Global variable
+
+A variable defined outside of a function that can be used anywhere. +
+
Index
+
+The position of a given element. +
+
Jupyter Notebook
+
+Interactive coding environment allowing a combination of code and +markdown. +
+
Library
+
+A collection of files containing functions used by other programs. +
+
Local Variable
+
+A variable defined inside of a function that can only be used inside of +that function. +
+
Mask
+
+A boolean object used for selecting data from another object. +
+
Method
+
+An action tied to a particular object. Called by using +object.method. +
+
Modules
+
+The files within a library containing functions used by other programs. +
+
Parameters
+
+Variables used when executing a function. +
+
Series
+
+A Pandas data structure to represent a column. +
+
Substring
+
+A part of a string. +
+
Variables
+
+Names for values. +
+
+
+ + +
+
+ + + diff --git a/key-points.html b/key-points.html new file mode 100644 index 000000000..c8f71ac34 --- /dev/null +++ b/key-points.html @@ -0,0 +1,792 @@ + + + + + +Plotting and Programming in Python: Key Points + + + + + + + + + + + + +
+ Plotting and Programming in Python +
+ +
+
+ + + + + + +
+
+ + +

Running and Quitting

+
+
    +
  • Python scripts are plain text files.
  • +
  • Use the Jupyter Notebook for editing and running Python.
  • +
  • The Notebook has Command and Edit modes.
  • +
  • Use the keyboard and mouse to select and edit cells.
  • +
  • The Notebook will turn Markdown into pretty-printed +documentation.
  • +
  • Markdown does most of what HTML does.
  • +

Variables and Assignment

+
+
    +
  • Use variables to store values.
  • +
  • Use print to display values.
  • +
  • Variables persist between cells.
  • +
  • Variables must be created before they are used.
  • +
  • Variables can be used in calculations.
  • +
  • Use an index to get a single character from a string.
  • +
  • Use a slice to get a substring.
  • +
  • Use the built-in function len to find the length of a +string.
  • +
  • Python is case-sensitive.
  • +
  • Use meaningful variable names.
  • +

Data Types and Type Conversion

+
+
    +
  • Every value has a type.
  • +
  • Use the built-in function type to find the type of a +value.
  • +
  • Types control what operations can be done on values.
  • +
  • Strings can be added and multiplied.
  • +
  • Strings have a length (but numbers don’t).
  • +
  • Must convert numbers to strings or vice versa when operating on +them.
  • +
  • Can mix integers and floats freely in operations.
  • +
  • Variables only change value when something is assigned to them.
  • +

Built-in Functions and Help

+
+
    +
  • Use comments to add documentation to programs.
  • +
  • A function may take zero or more arguments.
  • +
  • Commonly-used built-in functions include max, +min, and round.
  • +
  • Functions may only work for certain (combinations of) +arguments.
  • +
  • Functions may have default values for some arguments.
  • +
  • Use the built-in function help to get help for a +function.
  • +
  • The Jupyter Notebook has two ways to get help.
  • +
  • Every function returns something.
  • +
  • Python reports a syntax error when it can’t understand the source of +a program.
  • +
  • Python reports a runtime error when something goes wrong while a +program is executing.
  • +
  • Fix syntax errors by reading the source code, and runtime errors by +tracing the program’s execution.
  • +

Morning Coffee

+

Libraries

+
+
    +
  • Most of the power of a programming language is in its +libraries.
  • +
  • A program must import a library module in order to use it.
  • +
  • Use help to learn about the contents of a library +module.
  • +
  • Import specific items from a library to shorten programs.
  • +
  • Create an alias for a library when importing it to shorten +programs.
  • +

Reading Tabular Data into DataFrames

+
+
    +
  • Use the Pandas library to get basic statistics out of tabular +data.
  • +
  • Use index_col to specify that a column’s values should +be used as row headings.
  • +
  • Use DataFrame.info to find out more about a +dataframe.
  • +
  • The DataFrame.columns variable stores information about +the dataframe’s columns.
  • +
  • Use DataFrame.T to transpose a dataframe.
  • +
  • Use DataFrame.describe to get summary statistics about +data.
  • +

Pandas DataFrames

+
+
    +
  • Use DataFrame.iloc[..., ...] to select values by +integer location.
  • +
  • Use : on its own to mean all columns or all rows.
  • +
  • Select multiple columns or rows using DataFrame.loc and +a named slice.
  • +
  • Result of slicing can be used in further operations.
  • +
  • Use comparisons to select data based on value.
  • +
  • Select values or NaN using a Boolean mask.
  • +

Plotting

+
+
    +
  • +matplotlib is the +most widely used scientific plotting library in Python.
  • +
  • Plot data directly from a Pandas dataframe.
  • +
  • Select and transform data, then plot it.
  • +
  • Many styles of plot are available: see the Python Graph +Gallery for more options.
  • +
  • Can plot many sets of data together.
  • +

Lunch

+

Lists

+
+
    +
  • A list stores many values in a single structure.
  • +
  • Use an item’s index to fetch it from a list.
  • +
  • Lists’ values can be replaced by assigning to them.
  • +
  • Appending items to a list lengthens it.
  • +
  • Use del to remove items from a list entirely.
  • +
  • The empty list contains no values.
  • +
  • Lists may contain values of different types.
  • +
  • Character strings can be indexed like lists.
  • +
  • Character strings are immutable.
  • +
  • Indexing beyond the end of the collection is an error.
  • +

For Loops

+
+
    +
  • A for loop executes commands once for each value in a +collection.
  • +
  • A for loop is made up of a collection, a loop variable, +and a body.
  • +
  • The first line of the for loop must end with a colon, +and the body must be indented.
  • +
  • Indentation is always meaningful in Python.
  • +
  • Loop variables can be called anything (but it is strongly advised to +have a meaningful name to the looping variable).
  • +
  • The body of a loop can contain many statements.
  • +
  • Use range to iterate over a sequence of numbers.
  • +
  • The Accumulator pattern turns many values into one.
  • +

Conditionals

+
+
    +
  • Use if statements to control whether or not a block of +code is executed.
  • +
  • Conditionals are often used inside loops.
  • +
  • Use else to execute a block of code when an +if condition is not true.
  • +
  • Use elif to specify additional tests.
  • +
  • Conditions are tested once, in order.
  • +
  • Create a table showing variables’ values to trace a program’s +execution.
  • +

Looping Over Data Sets

+
+
    +
  • Use a for loop to process files given a list of their +names.
  • +
  • Use glob.glob to find sets of files whose names match a +pattern.
  • +
  • Use glob and for to process batches of +files.
  • +

Afternoon Coffee

+

Writing Functions

+
+
    +
  • Break programs down into functions to make them easier to +understand.
  • +
  • Define a function using def with a name, parameters, +and a block of code.
  • +
  • Defining a function does not run it.
  • +
  • Arguments in a function call are matched to its defined +parameters.
  • +
  • Functions may return a result to their caller using +return.
  • +

Variable Scope

+
+
    +
  • The scope of a variable is the part of a program that can ‘see’ that +variable.
  • +

Programming Style

+
+
    +
  • Follow standard Python style in your code.
  • +
  • Use docstrings to provide builtin help.
  • +

Wrap-Up

+
+
    +
  • Python supports a large and diverse community across academia and +industry.
  • +

Feedback

+
+
    +
  • We are constantly seeking to improve this course.
  • +
+
+
+
+ + +
+ + +
+ + + + + diff --git a/link.svg b/link.svg new file mode 100644 index 000000000..88ad82769 --- /dev/null +++ b/link.svg @@ -0,0 +1,12 @@ + + + + + + diff --git a/md5sum.txt b/md5sum.txt new file mode 100644 index 000000000..ca97ff893 --- /dev/null +++ b/md5sum.txt @@ -0,0 +1,33 @@ +"file" "checksum" "built" "date" +"CODE_OF_CONDUCT.md" "c93c83c630db2fe2462240bf72552548" "site/built/CODE_OF_CONDUCT.md" "2024-10-18" +"LICENSE.md" "b24ebbb41b14ca25cf6b8216dda83e5f" "site/built/LICENSE.md" "2024-10-18" +"config.yaml" "4c8c3b66083d754c51eae2c277d24ca0" "site/built/config.yaml" "2024-10-18" +"index.md" "f019634aead94a6e24c7b0a414239caa" "site/built/index.md" "2024-10-18" +"links.md" "fd719a41381bb145880b9220d70edec3" "site/built/links.md" "2024-10-18" +"episodes/01-run-quit.md" "ca5737c8a9dc8c8e16f8320550a93787" "site/built/01-run-quit.md" "2024-10-18" +"episodes/02-variables.md" "9dacd8cd9968b5d0185f0c244a55eb84" "site/built/02-variables.md" "2024-10-18" +"episodes/03-types-conversion.md" "9e3a08116a2124cd8e23d1d7c5dba432" "site/built/03-types-conversion.md" "2024-10-18" +"episodes/04-built-in.md" "d3ea4aa2a49667b61cb21a62034c88c6" "site/built/04-built-in.md" "2024-10-18" +"episodes/05-coffee.md" "c7616ec40b9e611c47b2bac1e11c47d2" "site/built/05-coffee.md" "2024-10-18" +"episodes/06-libraries.md" "96899c58843e51f10eb84a8ac20ebb90" "site/built/06-libraries.md" "2024-10-18" +"episodes/07-reading-tabular.md" "b5b65e50037a583dfc5a3a879e4404b0" "site/built/07-reading-tabular.md" "2024-10-18" +"episodes/08-data-frames.md" "af0057242e5f63f0c049f58ad66f1cbb" "site/built/08-data-frames.md" "2024-10-18" +"episodes/09-plotting.md" "9d0e0b5ce187cff8cd166762432c598e" "site/built/09-plotting.md" "2024-10-18" +"episodes/10-lunch.md" "0624bfa89e628df443070e8c44271b33" "site/built/10-lunch.md" "2024-10-18" +"episodes/11-lists.md" "1257daeb542377a3b04c6bec0d0ffee1" "site/built/11-lists.md" "2024-10-18" +"episodes/12-for-loops.md" "1da6e4e57a25f8d4fd64802c2eb682c4" 
"site/built/12-for-loops.md" "2024-10-18" +"episodes/13-conditionals.md" "2739086f688f386c32ce56400c6b27e2" "site/built/13-conditionals.md" "2024-10-18" +"episodes/14-looping-data-sets.md" "fb2992c34b244b375302ffb15bd25b8d" "site/built/14-looping-data-sets.md" "2024-10-18" +"episodes/15-coffee.md" "062bae79eb17ee57f183b21658a8d813" "site/built/15-coffee.md" "2024-10-18" +"episodes/16-writing-functions.md" "a87b7fd96770bd8c24d695ef529b93ce" "site/built/16-writing-functions.md" "2024-10-18" +"episodes/17-scope.md" "8109afb18f278a482083d867ad80da6e" "site/built/17-scope.md" "2024-10-18" +"episodes/18-style.md" "67f9594a062909ef15132811d02ee6a0" "site/built/18-style.md" "2024-10-18" +"episodes/19-wrap.md" "8863b58685fecbc89a6f5058bde50307" "site/built/19-wrap.md" "2024-10-18" +"episodes/20-feedback.md" "942925c3013831350ae64f2cb75f2171" "site/built/20-feedback.md" "2024-10-18" +"instructors/design.md" "84d5da2a0671a8a719c26f6636695873" "site/built/design.md" "2024-10-18" +"instructors/instructor-notes.md" "2ea8589d855779b73fe1526c1552b330" "site/built/instructor-notes.md" "2024-10-18" +"learners/discuss.md" "012b885b35283c528857acd0fde06604" "site/built/discuss.md" "2024-10-18" +"learners/exercises.md" "8f305efe9f670305e9d23140d43ca651" "site/built/exercises.md" "2024-10-18" +"learners/reference.md" "f83e0f36168cb869210dd190ef81227b" "site/built/reference.md" "2024-10-18" +"learners/setup.md" "40258d2c8777bac1e9ee081f6c12010c" "site/built/setup.md" "2024-10-18" +"profiles/learner-profiles.md" "60b93493cf1da06dfd63255d73854461" "site/built/learner-profiles.md" "2024-10-18" diff --git a/mstile-150x150.png b/mstile-150x150.png new file mode 100644 index 000000000..8136f75e7 Binary files /dev/null and b/mstile-150x150.png differ diff --git a/pkgdown.css b/pkgdown.css new file mode 100644 index 000000000..80ea5b838 --- /dev/null +++ b/pkgdown.css @@ -0,0 +1,384 @@ +/* Sticky footer */ + +/** + * Basic idea: 
https://philipwalton.github.io/solved-by-flexbox/demos/sticky-footer/ + * Details: https://github.com/philipwalton/solved-by-flexbox/blob/master/assets/css/components/site.css + * + * .Site -> body > .container + * .Site-content -> body > .container .row + * .footer -> footer + * + * Key idea seems to be to ensure that .container and __all its parents__ + * have height set to 100% + * + */ + +html, body { + height: 100%; +} + +body { + position: relative; +} + +body > .container { + display: flex; + height: 100%; + flex-direction: column; +} + +body > .container .row { + flex: 1 0 auto; +} + +footer { + margin-top: 45px; + padding: 35px 0 36px; + border-top: 1px solid #e5e5e5; + color: #666; + display: flex; + flex-shrink: 0; +} +footer p { + margin-bottom: 0; +} +footer div { + flex: 1; +} +footer .pkgdown { + text-align: right; +} +footer p { + margin-bottom: 0; +} + +img.icon { + float: right; +} + +/* Ensure in-page images don't run outside their container */ +.contents img { + max-width: 100%; + height: auto; +} + +/* Fix bug in bootstrap (only seen in firefox) */ +summary { + display: list-item; +} + +/* Typographic tweaking ---------------------------------*/ + +.contents .page-header { + margin-top: calc(-60px + 1em); +} + +dd { + margin-left: 3em; +} + +/* Section anchors ---------------------------------*/ + +a.anchor { + display: none; + margin-left: 5px; + width: 20px; + height: 20px; + + background-image: url(./link.svg); + background-repeat: no-repeat; + background-size: 20px 20px; + background-position: center center; +} + +h1:hover .anchor, +h2:hover .anchor, +h3:hover .anchor, +h4:hover .anchor, +h5:hover .anchor, +h6:hover .anchor { + display: inline-block; +} + +/* Fixes for fixed navbar --------------------------*/ + +.contents h1, .contents h2, .contents h3, .contents h4 { + padding-top: 60px; + margin-top: -40px; +} + +/* Navbar submenu --------------------------*/ + +.dropdown-submenu { + position: relative; +} + 
+.dropdown-submenu>.dropdown-menu { + top: 0; + left: 100%; + margin-top: -6px; + margin-left: -1px; + border-radius: 0 6px 6px 6px; +} + +.dropdown-submenu:hover>.dropdown-menu { + display: block; +} + +.dropdown-submenu>a:after { + display: block; + content: " "; + float: right; + width: 0; + height: 0; + border-color: transparent; + border-style: solid; + border-width: 5px 0 5px 5px; + border-left-color: #cccccc; + margin-top: 5px; + margin-right: -10px; +} + +.dropdown-submenu:hover>a:after { + border-left-color: #ffffff; +} + +.dropdown-submenu.pull-left { + float: none; +} + +.dropdown-submenu.pull-left>.dropdown-menu { + left: -100%; + margin-left: 10px; + border-radius: 6px 0 6px 6px; +} + +/* Sidebar --------------------------*/ + +#pkgdown-sidebar { + margin-top: 30px; + position: -webkit-sticky; + position: sticky; + top: 70px; +} + +#pkgdown-sidebar h2 { + font-size: 1.5em; + margin-top: 1em; +} + +#pkgdown-sidebar h2:first-child { + margin-top: 0; +} + +#pkgdown-sidebar .list-unstyled li { + margin-bottom: 0.5em; +} + +/* bootstrap-toc tweaks ------------------------------------------------------*/ + +/* All levels of nav */ + +nav[data-toggle='toc'] .nav > li > a { + padding: 4px 20px 4px 6px; + font-size: 1.5rem; + font-weight: 400; + color: inherit; +} + +nav[data-toggle='toc'] .nav > li > a:hover, +nav[data-toggle='toc'] .nav > li > a:focus { + padding-left: 5px; + color: inherit; + border-left: 1px solid #878787; +} + +nav[data-toggle='toc'] .nav > .active > a, +nav[data-toggle='toc'] .nav > .active:hover > a, +nav[data-toggle='toc'] .nav > .active:focus > a { + padding-left: 5px; + font-size: 1.5rem; + font-weight: 400; + color: inherit; + border-left: 2px solid #878787; +} + +/* Nav: second level (shown on .active) */ + +nav[data-toggle='toc'] .nav .nav { + display: none; /* Hide by default, but at >768px, show it */ + padding-bottom: 10px; +} + +nav[data-toggle='toc'] .nav .nav > li > a { + padding-left: 16px; + font-size: 1.35rem; +} + 
+nav[data-toggle='toc'] .nav .nav > li > a:hover, +nav[data-toggle='toc'] .nav .nav > li > a:focus { + padding-left: 15px; +} + +nav[data-toggle='toc'] .nav .nav > .active > a, +nav[data-toggle='toc'] .nav .nav > .active:hover > a, +nav[data-toggle='toc'] .nav .nav > .active:focus > a { + padding-left: 15px; + font-weight: 500; + font-size: 1.35rem; +} + +/* orcid ------------------------------------------------------------------- */ + +.orcid { + font-size: 16px; + color: #A6CE39; + /* margins are required by official ORCID trademark and display guidelines */ + margin-left:4px; + margin-right:4px; + vertical-align: middle; +} + +/* Reference index & topics ----------------------------------------------- */ + +.ref-index th {font-weight: normal;} + +.ref-index td {vertical-align: top; min-width: 100px} +.ref-index .icon {width: 40px;} +.ref-index .alias {width: 40%;} +.ref-index-icons .alias {width: calc(40% - 40px);} +.ref-index .title {width: 60%;} + +.ref-arguments th {text-align: right; padding-right: 10px;} +.ref-arguments th, .ref-arguments td {vertical-align: top; min-width: 100px} +.ref-arguments .name {width: 20%;} +.ref-arguments .desc {width: 80%;} + +/* Nice scrolling for wide elements --------------------------------------- */ + +table { + display: block; + overflow: auto; +} + +/* Syntax highlighting ---------------------------------------------------- */ + +pre, code, pre code { + background-color: #f8f8f8; + color: #333; +} +pre, pre code { + white-space: pre-wrap; + word-break: break-all; + overflow-wrap: break-word; +} + +pre { + border: 1px solid #eee; +} + +pre .img, pre .r-plt { + margin: 5px 0; +} + +pre .img img, pre .r-plt img { + background-color: #fff; +} + +code a, pre a { + color: #375f84; +} + +a.sourceLine:hover { + text-decoration: none; +} + +.fl {color: #1514b5;} +.fu {color: #000000;} /* function */ +.ch,.st {color: #036a07;} /* string */ +.kw {color: #264D66;} /* keyword */ +.co {color: #888888;} /* comment */ + +.error 
{font-weight: bolder;} +.warning {font-weight: bolder;} + +/* Clipboard --------------------------*/ + +.hasCopyButton { + position: relative; +} + +.btn-copy-ex { + position: absolute; + right: 0; + top: 0; + visibility: hidden; +} + +.hasCopyButton:hover button.btn-copy-ex { + visibility: visible; +} + +/* headroom.js ------------------------ */ + +.headroom { + will-change: transform; + transition: transform 200ms linear; +} +.headroom--pinned { + transform: translateY(0%); +} +.headroom--unpinned { + transform: translateY(-100%); +} + +/* mark.js ----------------------------*/ + +mark { + background-color: rgba(255, 255, 51, 0.5); + border-bottom: 2px solid rgba(255, 153, 51, 0.3); + padding: 1px; +} + +/* vertical spacing after htmlwidgets */ +.html-widget { + margin-bottom: 10px; +} + +/* fontawesome ------------------------ */ + +.fab { + font-family: "Font Awesome 5 Brands" !important; +} + +/* don't display links in code chunks when printing */ +/* source: https://stackoverflow.com/a/10781533 */ +@media print { + code a:link:after, code a:visited:after { + content: ""; + } +} + +/* Section anchors --------------------------------- + Added in pandoc 2.11: https://github.com/jgm/pandoc-templates/commit/9904bf71 +*/ + +div.csl-bib-body { } +div.csl-entry { + clear: both; +} +.hanging-indent div.csl-entry { + margin-left:2em; + text-indent:-2em; +} +div.csl-left-margin { + min-width:2em; + float:left; +} +div.csl-right-inline { + margin-left:2em; + padding-left:1em; +} +div.csl-indent { + margin-left: 2em; +} diff --git a/pkgdown.js b/pkgdown.js new file mode 100644 index 000000000..6f0eee40b --- /dev/null +++ b/pkgdown.js @@ -0,0 +1,108 @@ +/* http://gregfranko.com/blog/jquery-best-practices/ */ +(function($) { + $(function() { + + $('.navbar-fixed-top').headroom(); + + $('body').css('padding-top', $('.navbar').height() + 10); + $(window).resize(function(){ + $('body').css('padding-top', $('.navbar').height() + 10); + }); + + 
$('[data-toggle="tooltip"]').tooltip(); + + var cur_path = paths(location.pathname); + var links = $("#navbar ul li a"); + var max_length = -1; + var pos = -1; + for (var i = 0; i < links.length; i++) { + if (links[i].getAttribute("href") === "#") + continue; + // Ignore external links + if (links[i].host !== location.host) + continue; + + var nav_path = paths(links[i].pathname); + + var length = prefix_length(nav_path, cur_path); + if (length > max_length) { + max_length = length; + pos = i; + } + } + + // Add class to parent
  • , and enclosing
  • if in dropdown + if (pos >= 0) { + var menu_anchor = $(links[pos]); + menu_anchor.parent().addClass("active"); + menu_anchor.closest("li.dropdown").addClass("active"); + } + }); + + function paths(pathname) { + var pieces = pathname.split("/"); + pieces.shift(); // always starts with / + + var end = pieces[pieces.length - 1]; + if (end === "index.html" || end === "") + pieces.pop(); + return(pieces); + } + + // Returns -1 if not found + function prefix_length(needle, haystack) { + if (needle.length > haystack.length) + return(-1); + + // Special case for length-0 haystack, since for loop won't run + if (haystack.length === 0) { + return(needle.length === 0 ? 0 : -1); + } + + for (var i = 0; i < haystack.length; i++) { + if (needle[i] != haystack[i]) + return(i); + } + + return(haystack.length); + } + + /* Clipboard --------------------------*/ + + function changeTooltipMessage(element, msg) { + var tooltipOriginalTitle=element.getAttribute('data-original-title'); + element.setAttribute('data-original-title', msg); + $(element).tooltip('show'); + element.setAttribute('data-original-title', tooltipOriginalTitle); + } + + if(ClipboardJS.isSupported()) { + $(document).ready(function() { + var copyButton = ""; + + $("div.sourceCode").addClass("hasCopyButton"); + + // Insert copy buttons: + $(copyButton).prependTo(".hasCopyButton"); + + // Initialize tooltips: + $('.btn-copy-ex').tooltip({container: 'body'}); + + // Initialize clipboard: + var clipboardBtnCopies = new ClipboardJS('[data-clipboard-copy]', { + text: function(trigger) { + return trigger.parentNode.textContent.replace(/\n#>[^\n]*/g, ""); + } + }); + + clipboardBtnCopies.on('success', function(e) { + changeTooltipMessage(e.trigger, 'Copied!'); + e.clearSelection(); + }); + + clipboardBtnCopies.on('error', function() { + changeTooltipMessage(e.trigger,'Press Ctrl+C or Command+C to copy'); + }); + }); + } +})(window.jQuery || window.$) diff --git a/pkgdown.yml b/pkgdown.yml new file mode 100644 index 
000000000..d1213be08 --- /dev/null +++ b/pkgdown.yml @@ -0,0 +1,5 @@ +pandoc: 3.1.11 +pkgdown: 2.1.1 +pkgdown_sha: ~ +articles: {} +last_built: 2024-10-18T07:45Z diff --git a/profiles.html b/profiles.html new file mode 100644 index 000000000..2c5c3dac9 --- /dev/null +++ b/profiles.html @@ -0,0 +1,488 @@ + +Plotting and Programming in Python: Learner Profiles +
    + Plotting and Programming in Python +
    + +
    +
    + + + + + +
    +
    +

    Learner Profiles

    + +

    This is a placeholder file. Please add content here.

    + +
    +
    + + +
    +
    + + + diff --git a/reference.html b/reference.html new file mode 100644 index 000000000..c6b4b5e92 --- /dev/null +++ b/reference.html @@ -0,0 +1,888 @@ + +Plotting and Programming in Python: Reference +
    + Plotting and Programming in Python +
    + +
    +
    + + + + + +
    +
    +

    Reference

    +

    Last updated on 2024-10-18

    + + + +
    + +
    + + + +

    Reference

    +

    Running and Quitting

    +
    • Python files have the .py extension.
    • +
    • Can be written in a text file or a Jupyter Notebook. +
      • Jupyter notebooks have the extension .ipynb +
      • +
      • Jupyter notebooks can be opened from Anaconda or +through the command line by entering $ jupyter notebook +
        • Markdown and HTML are allowed in markdown cells for documenting +code.
        • +
      • +
    • +

    Variables and Assignment

    +
    • Values are assigned to variables using =. +
      • Strings are defined in quotations '...'.
      • +
      • Integers and floating point numbers are defined without +quotations.
      • +
    • +
    • Variables can contain letters, digits, and underscores +_. +
      • Cannot start with a digit.
      • +
      • Variables that start with underscores should be avoided.
      • +
    • +
    • Use print(...) to display values as text.
    • +
    • Can use indexing on strings. +
      • Indexing starts at 0.
      • +
      • Position is given in square brackets [position] +following the variable name.
      • +
      • Take a slice using [start:stop]. This makes a copy of +part of the original string. +
        • +start is the index of the first element.
        • +
        • +stop is the index of the element after the last desired +element.
        • +
      • +
    • +
    • Use len(...) to find the length of a variable or +string.
    • +
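The indexing and slicing rules above can be tried on a short string. The variable name element and its value are assumptions chosen only for this sketch.

```python
# Hypothetical example value; any string behaves the same way.
element = 'helium'

first = element[0]       # indexing starts at 0
part = element[0:3]      # from index 0 up to, but not including, index 3
length = len(element)    # number of characters in the string

print(first, part, length)
```

Because stop is the index after the last desired character, the length of a slice is stop minus start.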

    Data Types and Type +Conversion

    +
    • Each value has a type. This controls what can be done with it. +
      • +int represents an integer
      • +
      • +float represents a floating point number.
      • +
      • +str represents a string.
      • +
    • +
    • To determine a variable's type, use the built-in function +type(...), including the variable name in the +parentheses.
    • +
    • Modifying strings: +
      • Use + to concatenate strings.
      • +
      • Use * to repeat a string.
      • +
      • Numbers and strings cannot be added to one another. +
        • Convert string to integer: int(...).
        • +
        • Convert integer to string: str(...).
        • +
      • +
    • +
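A small sketch of the type rules above, using made-up variable names and values:

```python
count = 3            # int
ratio = 0.5          # float
label = 'sample'     # str

print(type(count), type(ratio), type(label))

joined = label + '_' + str(count)   # convert the integer before concatenating
repeated = '-' * 5                  # repeat a string
total = int('12') + count           # convert the string before adding

print(joined, repeated, total)
```

Writing label + count without the conversion would raise a TypeError, which is why str(...) or int(...) is needed first.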

    Built-in Functions and Help

    +
    • To add a comment, place # before the text you do not +wish to be executed.
    • +
    • Commonly used built-in functions: +
      • +min() finds the smallest value.
      • +
      • +max() finds the largest value.
      • +
      • +round() rounds off a floating point number.
      • +
      • +help() displays documentation for the function in the +parentheses. +
        • Other ways to get help include holding down shift and +pressing tab in Jupyter Notebooks.
        • +
      • +
    • +
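The built-in functions listed above can be sketched as follows. The list readings and its values are hypothetical.

```python
# Hypothetical values for illustration.
readings = [2.5, 7.25, 4.0]

smallest = min(readings)       # smallest value
largest = max(readings)        # largest value
rounded = round(3.14159, 2)    # second argument: number of decimal places

print(smallest, largest, rounded)

# help(round) would display the documentation for round.
```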

    Libraries

    +
    • Importing a library: +
      • Use import ... to load a library.
      • +
      • Refer to this library by using module_name.thing_name. +
        • +. indicates ‘part of’.
        • +
      • +
    • +
    • To import a specific item from a library: +from ... import ... +
    • +
    • To import a library using an alias: +import ... as ... +
    • +
    • Importing the math library: import math +
      • Example of referring to an item with the module’s name: +math.cos(math.pi).
      • +
    • +
    • Importing the plotting library as an alias: +import matplotlib as mpl +
    • +
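As an illustration, the three import forms above can be compared using the standard math library. The alias m is an arbitrary choice made for this sketch.

```python
import math                  # plain import: prefix names with the module name
from math import cos, pi     # import specific items: no prefix needed
import math as m             # alias: a shorter prefix

a = math.cos(math.pi)
b = cos(pi)
c = m.cos(m.pi)

print(a, b, c)
```

All three lines call the same function on the same value; only the way the names are written differs.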

    Reading Tabular Data into +DataFrames

    +
    • Use the pandas library to do statistics on tabular data. Load with +import pandas as pd. +
      • To read in a CSV file: pd.read_csv(), including the path +name in the parentheses. +
        • To specify a column’s values should be used as row headings: +pd.read_csv('path', index_col='column name'), where path +and column name should be replaced with the relevant values.
        • +
      • +
    • +
    • To get more information about a DataFrame, use +DataFrame.info(), replacing DataFrame with the +variable name of your DataFrame.
    • +
    • Use DataFrame.columns to view the column names.
    • +
    • Use DataFrame.T to transpose a DataFrame.
    • +
    • Use DataFrame.describe() to get summary statistics about +your data.
    • +

    Pandas DataFrames

    +
    • Select data using [i,j] +
      • To select by entry position: DataFrame.iloc[..., ...] +
        • Slices include everything up to, but not including, the final index.
        • +
      • +
      • To select by entry label: DataFrame.loc[..., ...] +
        • Can select multiple rows or columns by listing labels.
        • +
        • Slices include both ends.
        • +
      • +
      • Use : to select all rows or columns.
      • +
    • +
    • Data can also be selected based on its values, using a table of True and False entries. This is a Boolean mask.
      • mask = subset > 10000
      • We can then use the mask to select values: subset[mask].
      • +
    • +
    • To use a select-apply-combine operation we use +data.apply(lambda x: x > x.mean()) where +mean() can be any operation the user would like to be +applied to x.
    • +
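The selection rules above can be sketched on a small table, assuming pandas is installed. The country labels, column names, and numbers below are invented for illustration.

```python
import pandas as pd

# Hypothetical table; labels and values are made up for illustration.
data = pd.DataFrame(
    {'gdp_1952': [1601.0, 10095.0, 9980.0],
     'gdp_2007': [5937.0, 49357.0, 33693.0]},
    index=['Albania', 'Norway', 'Italy'],
)

# By position: rows 0 and 1 of the first column (the stop index is excluded).
by_position = data.iloc[0:2, 0]

# By label: both end labels are included in the slice.
by_label = data.loc['Albania':'Norway', 'gdp_1952']

# ':' on its own selects all rows (or all columns).
all_rows = data.loc[:, 'gdp_2007']

# Boolean mask: True where the value exceeds 10000.
mask = data > 10000
selected = data[mask]    # entries where the mask is False become NaN

# Compare every value with the mean of its own column.
above_mean = data.apply(lambda x: x > x.mean())
```

Here data.apply() calls the lambda once per column, so x.mean() is the mean of that column.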

    Plotting

    +
    • The most widely used plotting library is matplotlib. +
      • Usually imported using +import matplotlib.pyplot as plt.
      • +
      • To plot we use the command +plt.plot(time, position).
      • +
      • To create a legend use +plt.legend(['label1', 'label2'], loc='upper left') +
        • Can also define labels within the plot statements by using +plt.plot(time, position, label='label'). To make the legend +show up, use plt.legend() +
        • +
      • +
      • To label the x and y axes, use plt.xlabel('label') and plt.ylabel('label').
      • +
    • +
    • Pandas DataFrames can be used to plot by using +DataFrame.plot(). Any operations that can be used on a +DataFrame can be applied while plotting. +
      • To draw a bar plot, use data.plot(kind='bar').
      • +
    • +
    +

    PYTHON +

    +
    import matplotlib.pyplot as plt
    +plt.plot(time, position, label='label')
    +plt.xlabel('x axis label')
    +plt.ylabel('y axis label')
    +plt.legend()
    +
    +

    Lists

    +
    • Defined within [...] and separated by ,. +
      • An empty list can be created by using [].
      • +
    • +
    • Can use len(...) to determine how many values are in a +list.
    • +
    • Items can be indexed just as in previous lessons.
      • Indexing can be used to reassign values: list_name[0] = newvalue.
      • +
    • +
    • To add an item to a list use list_name.append(), with the item to append in the parentheses.
    • +
    • To combine two lists use +list_name_1.extend(list_name_2).
    • +
    • To remove an item from a list use +del list_name[index].
    • +
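A short sketch that walks through the list operations above in order; the name values and the numbers in it are arbitrary.

```python
values = []              # an empty list
values.append(3)         # add one item to the end
values.append(5)
values.extend([7, 9])    # combine with another list: [3, 5, 7, 9]
values[0] = 1            # reassign by index: [1, 5, 7, 9]
del values[1]            # remove the item at index 1: [1, 7, 9]

count = len(values)      # number of values in the list
print(values, count)
```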

    For Loops

    +
    • Start a for loop with for number in [1, 2, 3]:, with +the following lines indented. +
      • +[1, 2, 3] is considered the collection.
      • +
      • +number is the loop variable.
      • +
      • The indented action following the for line is the body.
      • +
    • +
    • To iterate over a sequence of numbers use range(start, end), which counts from start up to, but not including, end.
    • +
    +

    PYTHON +

    +
    for number in range(0,5):
    +    print(number)
    +
    +

    Conditionals

    +
    • Defined similarly to a loop, using +if variable conditional value:. +
      • For example, if variable > 5:.
      • +
    • +
    • Use elif condition: for additional tests.
    • Use else: for when none of the preceding conditions is true.
    • +
    • Can combine more than one conditional by using and or +or.
    • +
    • Often used in combination with for loops.
    • +
    • Conditions that can be used: +
      • +== equal to.
      • +
      • +>= greater than or equal to.
      • +
      • +<= less than or equal to.
      • +
      • +> greater than.
      • +
      • +< less than.
      • +
    • +
    +

    PYTHON +

    +
    for m in [3, 6, 7, 2, 8]:
    +    if m > 5:
    +        print(m, 'is large')
    +    elif m == 5:
    +        print(m, 'is 5')
    +    else:
    +        print(m, 'is small')
    +
    +

    Looping Over Data Sets

    +
    • Use a for loop: for filename in [file1, file2]: +
    • +
    • To find a set of files using a pattern use glob.glob +
      • Must import first using import glob.
      • +
      • +* indicates “match zero or more characters”
      • +
      • +? indicates “match exactly one character” +
        • For example: glob.glob('*.txt') will find all files that end with .txt in the current directory.
        • +
      • +
    • +
    • Combine these by writing a loop using: for filename in glob.glob('*.txt'):
    • +
    +

    PYTHON +

    +
    import glob
    +import pandas as pd
    +for filename in glob.glob('*.txt'):
    +    data = pd.read_csv(filename)
    +
    +
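To see the difference between * and ? without depending on files that already exist, the following sketch creates a few empty files in a temporary folder. The file names are hypothetical, and the results are sorted because glob.glob does not promise any particular order.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Hypothetical file names, created empty just for this demonstration.
    for name in ['a1.txt', 'a22.txt', 'b1.csv']:
        open(os.path.join(folder, name), 'w').close()

    # '*' matches zero or more characters: every file ending in .txt.
    star = sorted(os.path.basename(p)
                  for p in glob.glob(os.path.join(folder, '*.txt')))

    # '?' matches exactly one character: 'a', one character, then '.txt'.
    question = sorted(os.path.basename(p)
                      for p in glob.glob(os.path.join(folder, 'a?.txt')))

print(star)
print(question)
```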

    Writing Functions

    +
    • Define a function using def function_name(parameters):. +Replace parameters with the variables to use when the +function is executed.
    • +
    • Run by using function_name(parameters).
    • +
    • To return a result to the caller use return ... in the +function.
    • +
    +

    PYTHON +

    +
    def add_numbers(a, b):
    +    result = a + b
    +    return result
    +
    +add_numbers(1, 4)
    +
    +

    Variable Scope

    +
    • A local variable is defined in a function and can only be seen and +used within that function.
    • +
    • A global variable is defined outside of a function and can be seen +or used anywhere after definition.
    • +
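A minimal sketch of the two kinds of variable; the names pressure, adjust, and new_temperature and the numbers are chosen only for illustration.

```python
pressure = 103.9    # global variable: defined outside any function


def adjust(temperature):
    # new_temperature is local: it exists only while adjust() runs.
    new_temperature = temperature * 1.43 / pressure
    return new_temperature


result = adjust(0.9)

# Using the local name outside the function raises a NameError.
try:
    new_temperature
except NameError:
    local_visible_outside = False
else:
    local_visible_outside = True

print(result, local_visible_outside)
```

The function can read the global pressure, but the caller can only reach new_temperature through the returned value.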

    Programming Style

    +
    • Document your code.
    • +
    • Use clear and meaningful variable names.
    • +
    • Follow the PEP 8 style guide when setting up your code.
    • +
    • Use assertions to check for internal errors.
    • +
    • Use docstrings to provide help.
    • +
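The last two points can be illustrated with one small function. This is a sketch only; mean_of is a hypothetical function written for this example.

```python
def mean_of(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    # The assertion stops the program early if the function is misused.
    assert len(values) > 0, 'values must not be empty'
    return sum(values) / len(values)


average = mean_of([1, 2, 3, 4])
doc = mean_of.__doc__    # the docstring is the text that help(mean_of) shows

print(average)
print(doc)
```

Calling mean_of([]) would raise an AssertionError with the message given in the assert statement.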

    Glossary

    +
    Arguments
    +
    +Values passed to functions. +
    +
    Array
    +
    +A container holding elements of the same type. +
    +
    Boolean
    +
    +A value that is either True or False.
    +
    DataFrame
    +
    +The way Pandas represents a table; a collection of series. +
    +
    Element
    +
    +An item in a list or an array. For a string, these are the individual +characters. +
    +
    Function
    +
    +A block of code that can be called and re-used elsewhere. +
    +
    Global variable
    +
    +A variable defined outside of a function that can be used anywhere. +
    +
    Index
    +
    +The position of a given element. +
    +
    Jupyter Notebook
    +
    +Interactive coding environment allowing a combination of code and +markdown. +
    +
    Library
    +
    +A collection of files containing functions used by other programs. +
    +
    Local Variable
    +
    +A variable defined inside of a function that can only be used inside of +that function. +
    +
    Mask
    +
    +A boolean object used for selecting data from another object. +
    +
    Method
    +
    +An action tied to a particular object. Called by using object.method().
    +
    Modules
    +
    +The files within a library containing functions used by other programs. +
    +
    Parameters
    +
    +Variables named in a function definition that receive the arguments when the function is executed.
    +
    Series
    +
    +A Pandas data structure to represent a column. +
    +
    Substring
    +
    +A part of a string. +
    +
    Variables
    +
    +Names for values. +
    +
    +
    + + +
    +
    + + + diff --git a/safari-pinned-tab.svg b/safari-pinned-tab.svg new file mode 100644 index 000000000..8a74e60c8 --- /dev/null +++ b/safari-pinned-tab.svg @@ -0,0 +1,68 @@ + + + + +Created by potrace 1.14, written by Peter Selinger 2001-2017 + + + + + + + + diff --git a/site.webmanifest b/site.webmanifest new file mode 100644 index 000000000..f2302ffdd --- /dev/null +++ b/site.webmanifest @@ -0,0 +1,19 @@ +{ + "name": "The Carpentries", + "short_name": "The Carpentries", + "icons": [ + { + "src": "/android-chrome-192x192.png", + "sizes": "192x192", + "type": "image/png" + }, + { + "src": "/android-chrome-512x512.png", + "sizes": "512x512", + "type": "image/png" + } + ], + "theme_color": "#ffffff", + "background_color": "#ffffff", + "display": "standalone" +} diff --git a/sitemap.xml b/sitemap.xml new file mode 100644 index 000000000..e1870c749 --- /dev/null +++ b/sitemap.xml @@ -0,0 +1,177 @@ + + + + https://swcarpentry.github.io/python-novice-gapminder/01-run-quit.html + + + https://swcarpentry.github.io/python-novice-gapminder/02-variables.html + + + https://swcarpentry.github.io/python-novice-gapminder/03-types-conversion.html + + + https://swcarpentry.github.io/python-novice-gapminder/04-built-in.html + + + https://swcarpentry.github.io/python-novice-gapminder/05-coffee.html + + + https://swcarpentry.github.io/python-novice-gapminder/06-libraries.html + + + https://swcarpentry.github.io/python-novice-gapminder/07-reading-tabular.html + + + https://swcarpentry.github.io/python-novice-gapminder/08-data-frames.html + + + https://swcarpentry.github.io/python-novice-gapminder/09-plotting.html + + + https://swcarpentry.github.io/python-novice-gapminder/10-lunch.html + + + https://swcarpentry.github.io/python-novice-gapminder/11-lists.html + + + https://swcarpentry.github.io/python-novice-gapminder/12-for-loops.html + + + https://swcarpentry.github.io/python-novice-gapminder/13-conditionals.html + + + 
https://swcarpentry.github.io/python-novice-gapminder/14-looping-data-sets.html + + + https://swcarpentry.github.io/python-novice-gapminder/15-coffee.html + + + https://swcarpentry.github.io/python-novice-gapminder/16-writing-functions.html + + + https://swcarpentry.github.io/python-novice-gapminder/17-scope.html + + + https://swcarpentry.github.io/python-novice-gapminder/18-style.html + + + https://swcarpentry.github.io/python-novice-gapminder/19-wrap.html + + + https://swcarpentry.github.io/python-novice-gapminder/20-feedback.html + + + https://swcarpentry.github.io/python-novice-gapminder/404.html + + + https://swcarpentry.github.io/python-novice-gapminder/CODE_OF_CONDUCT.html + + + https://swcarpentry.github.io/python-novice-gapminder/LICENSE.html + + + https://swcarpentry.github.io/python-novice-gapminder/design.html + + + https://swcarpentry.github.io/python-novice-gapminder/discuss.html + + + https://swcarpentry.github.io/python-novice-gapminder/exercises.html + + + https://swcarpentry.github.io/python-novice-gapminder/index.html + + + https://swcarpentry.github.io/python-novice-gapminder/instructor/01-run-quit.html + + + https://swcarpentry.github.io/python-novice-gapminder/instructor/02-variables.html + + + https://swcarpentry.github.io/python-novice-gapminder/instructor/03-types-conversion.html + + + https://swcarpentry.github.io/python-novice-gapminder/instructor/04-built-in.html + + + https://swcarpentry.github.io/python-novice-gapminder/instructor/05-coffee.html + + + https://swcarpentry.github.io/python-novice-gapminder/instructor/06-libraries.html + + + https://swcarpentry.github.io/python-novice-gapminder/instructor/07-reading-tabular.html + + + https://swcarpentry.github.io/python-novice-gapminder/instructor/08-data-frames.html + + + https://swcarpentry.github.io/python-novice-gapminder/instructor/09-plotting.html + + + https://swcarpentry.github.io/python-novice-gapminder/instructor/10-lunch.html + + + 
https://swcarpentry.github.io/python-novice-gapminder/instructor/11-lists.html + + + https://swcarpentry.github.io/python-novice-gapminder/instructor/12-for-loops.html + + + https://swcarpentry.github.io/python-novice-gapminder/instructor/13-conditionals.html + + + https://swcarpentry.github.io/python-novice-gapminder/instructor/14-looping-data-sets.html + + + https://swcarpentry.github.io/python-novice-gapminder/instructor/15-coffee.html + + + https://swcarpentry.github.io/python-novice-gapminder/instructor/16-writing-functions.html + + + https://swcarpentry.github.io/python-novice-gapminder/instructor/17-scope.html + + + https://swcarpentry.github.io/python-novice-gapminder/instructor/18-style.html + + + https://swcarpentry.github.io/python-novice-gapminder/instructor/19-wrap.html + + + https://swcarpentry.github.io/python-novice-gapminder/instructor/20-feedback.html + + + https://swcarpentry.github.io/python-novice-gapminder/instructor/404.html + + + https://swcarpentry.github.io/python-novice-gapminder/instructor/CODE_OF_CONDUCT.html + + + https://swcarpentry.github.io/python-novice-gapminder/instructor/LICENSE.html + + + https://swcarpentry.github.io/python-novice-gapminder/instructor/design.html + + + https://swcarpentry.github.io/python-novice-gapminder/instructor/discuss.html + + + https://swcarpentry.github.io/python-novice-gapminder/instructor/exercises.html + + + https://swcarpentry.github.io/python-novice-gapminder/instructor/index.html + + + https://swcarpentry.github.io/python-novice-gapminder/instructor/profiles.html + + + https://swcarpentry.github.io/python-novice-gapminder/instructor/reference.html + + + https://swcarpentry.github.io/python-novice-gapminder/profiles.html + + + https://swcarpentry.github.io/python-novice-gapminder/reference.html + +