chore: port basics + variables + functions from doctoral school

UCL · Aug 13, 2024 · 59fe404
1 parent 7f5996b

Showing 5 changed files with 527 additions and 483 deletions.
diff --git a/ch01python/00pythons.ipynb.py b/ch01python/00pythons.ipynb.py
@@ -12,224 +12,178 @@
 # ---
 
 # %% [markdown]
-# # Introduction to Python
-
-# %% [markdown]
-# ## Introduction
-
-# %% [markdown]
-# ### Why teach Python?
-
-# %% [markdown]
+# # Introduction
+#
+# ## Why teach Python?
 #
 # * In this first session, we will introduce [Python](http://www.python.org).
 # * This course is about programming for data analysis and visualisation in research.
 # * It's not mainly about Python.
 # * But we have to use some language.
 #
-
-# %% [markdown]
 # ### Why Python?
-
-# %% [markdown]
 #
-# * Python is quick to program in
-# * Python is popular in research, and has lots of libraries for science
-# * Python interfaces well with faster languages
+# * Python has a readable [syntax](https://en.wikipedia.org/wiki/Syntax_(programming_languages)) that makes it relatively quick to pick up.
+# * Python is popular in research, and has lots of libraries for science.
+# * Python interfaces well with faster languages.
 # * Python is free, so you'll never have a problem getting hold of it, wherever you go.
 #
-
-# %% [markdown]
+#
 # ### Why write programs for research?
-
-# %% [markdown]
 #
-# * Not just labour saving
-# * Scripted research can be tested and reproduced
+# * Not just labour saving.
+# * Scripted research can be tested and reproduced.
+#
+# ### Sensible input - reasonable output
 #
-
-# %% [markdown]
-# ### Sensible Input  - Reasonable Output
-
-# %% [markdown]
 # Programs are a rigorous way of describing data analysis for other researchers, as well as for computers.
 #
-# Computational research suffers from people assuming each other's data manipulation is correct. By sharing codes,
-# which are much more easy for a non-author to understand than spreadsheets, we can avoid the "SIRO" problem. The old saw "Garbage in Garbage out" is not the real problem for science:
+# Computational research suffers from people assuming each other's data manipulation is correct. By sharing _readable_, _reproducible_ and _well-tested_ code, which makes all of the data processing steps used in an analysis explicit and checks that each of those steps behaves as expected, we enable other researchers to understand and assess the validity of those analysis steps for themselves. In a research code context the problem is generally not so much _garbage in, garbage out_, but _sensible input, reasonable output_: 'black-box' analysis pipelines that, given sensible-looking data inputs, produce reasonable-appearing but incorrect analyses as outputs.
 #
-# * Sensible input
-# * Reasonable output
+# ## Many kinds of Python
 #
+# ### Python notebooks
 #
-
-# %% [markdown]
-# ## Many kinds of Python
-
-# %% [markdown]
-# ### The Jupyter Notebook
-
-# %% [markdown]
-# The easiest way to get started using Python, and one of the best for research data work, is the Jupyter Notebook.
-
-# %% [markdown]
-# In the notebook, you can easily mix code with discussion and commentary, and mix code with the results of that code;
-# including graphs and other data visualisations.
+# A particularly easy way to get started using Python, and one particularly suited to the sort of exploratory work common in a research context, is using [Jupyter](https://jupyter.org/) notebooks.
+#
+# In a notebook, you can easily mix code with discussion and commentary, and display the results outputted by code alongside the code itself, including graphs and other data visualisations. 
+#
+# For example, if we wish to plot a figure-eight curve ([lemniscate](https://en.wikipedia.org/wiki/Lemniscate_of_Gerono)), we can include the parametric equations
+# $x = \sin(2\theta) / 2, y = \cos(\theta), \theta \in [0, 2\pi)$ which mathematically define the curve, as well as the corresponding Python code to plot the curve and the output of that code, all within the same notebook:
 
 # %%
-### Make plot
-# %matplotlib inline
-import math
-
+# Plot lemniscate curve
 import numpy as np
 import matplotlib.pyplot as plt
 
-theta = np.arange(0, 4 * math.pi, 0.1)
-eight = plt.figure()
-axes = eight.add_axes([0, 0, 1, 1])
-axes.plot(0.5 * np.sin(theta), np.cos(theta / 2))
+theta = np.linspace(0, 2 * np.pi, 100)
+x = np.sin(2 * theta) / 2
+y = np.cos(theta)
+fig, ax = plt.subplots(figsize=(3, 6))
+lines = ax.plot(x, y)
 
 # %% [markdown]
-# These notes are created using Jupyter notebooks and you may want to use it during the course. However, Jupyter notebooks won't be used for most of the activities and exercises done in class. To get hold of a copy of the notebook, follow the setup instructions shown on the course website, use the installation in Desktop@UCL (available in the teaching cluster rooms or [anywhere](https://www.ucl.ac.uk/isd/services/computers/remote-access/desktopucl-anywhere)), or go clone the [repository](https://github.com/UCL/rsd-engineeringcourse) on GitHub.
-
-# %% [markdown]
-# Jupyter notebooks consist of discussion cells, referred to as "markdown cells", and "code cells", which contain Python. This document has been created using Jupyter notebook, and this very cell is a **Markdown Cell**. 
+# We will be mainly working with Jupyter notebooks in this course and will be using [Jupyter Lab](https://jupyterlab.readthedocs.io/) to view, edit and run the notebooks. To install Jupyter Lab, follow the setup instructions shown [on the course website](../index.html#what-you-need-for-the-course), or use the installation in [Desktop@UCL](https://my.desktop.ucl.ac.uk/).
+#
+# #### Notebook cells
+#
+# Jupyter notebooks consist of a sequence of _cells_. Cells can be of two main types:
+#
+#   * _Markdown cells_: Cells containing descriptive text and discussion with rich-text formatting via the [Markdown](https://en.wikipedia.org/wiki/Markdown) text markup language.
+#   * _Code cells_: Cells containing Python code, which is displayed with syntax highlighting. The results returned by the computation performed when running the cell are displayed below the cell as the cell _output_, with Jupyter having a _rich display_ system allowing embedding a range of different outputs including for example static images, videos and interactive widgets.
+#
+# The document you are currently reading is a Jupyter notebook, and this text you are reading is a Markdown cell in the notebook. Below we see an example of a code cell.
 
 # %%
 print("This cell is a code cell")
 
-# %% [markdown]
-# Code cell inputs are numbered, and show the output below.
+# %% [markdown] jp-MarkdownHeadingCollapsed=true
+# Code cell inputs are numbered, with the cell output shown immediately below the input. Here the output is the text that we instruct the cell to print to the standard output stream. Cells will also display a representation of the value outputted by the last line in the cell, if any. For example
+
+# %%
+print("This text will be displayed\n")
+"This text will also be displayed\n"
 
 # %% [markdown]
-# Markdown cells contain text which uses a simple format to achive pretty layout, 
-# for example, to obtain:
+# There is a small difference in the formatting of the output here, with the `print` function displaying the text without quotation mark delimiters and with any _escaped_ special characters (such as the `"\n"` newline character here) processed.
+#
+# #### Markdown formatting
+#
+# The Markdown language used in Markdown cells provides a simple way to add basic text formatting to the rendered output while aiming to retain the readability of the original Markdown source. For example, to achieve the following rendered output text
 #
-# **bold**, *italic*
+# **bold**, *italic*, ~~strikethrough~~, `monospace`
 #
 # * Bullet
 #
 # > Quote
 #
-# We write:
+# [Link to search](https://duckduckgo.com/)
 #
-#     **bold**, *italic*
+# We can use the following Markdown text
 #
-#     * Bullet
+# ```Markdown
+# **bold**, *italic*, ~~strikethrough~~, `monospace`
 #
-#     > Quote
+# * Bullet
 #
-# See the Markdown documentation at [This Hyperlink](http://daringfireball.net/projects/markdown/)
-
-# %% [markdown]
-# ### Typing code in the notebook
-
-# %% [markdown]
-# When working with the notebook, you can either be in a cell, typing its contents, or outside cells, moving around the notebook.
-#
-# * When in a cell, press escape to leave it. When moving around outside cells, press return to enter.
-# * Outside a cell:
-#   * Use arrow keys to move around.
-#   * Press `b` to add a new cell below the cursor.
-#   * Press `m` to turn a cell from code mode to markdown mode.
-#   * Press `shift`+`enter` to calculate the code in the block.
-#   * Press `h` to see a list of useful keys in the notebook.
-# * Inside a cell:
-#   * Press `tab` to suggest completions of variables. (Try it!)
-
-# %% [markdown]
-# *Supplementary material*: Learn more about [Jupyter notebooks](https://jupyter.org/).
-
-# %% [markdown]
-# The `%%` at the beginning of a cell is called *magics*. There's a [large list of them available](https://ipython.readthedocs.io/en/stable/interactive/magics.html) and you can [create your own](http://ipython.readthedocs.io/en/stable/config/custommagics.html).
+# > Quote
 #
-
-# %% [markdown]
-# ### Python at the command line
-
-# %% [markdown]
-# Data science experts tend to use a "command line environment" to work. You'll be able to learn this at our ["Software Carpentry" workshops](http://github-pages.arc.ucl.ac.uk/software-carpentry/), which cover other skills for computationally based research.
-
-# %% language="bash"
-# # Above line tells Python to execute this cell as *shell code*
-# # not Python, as if we were in a command line
+# [Link to search](https://duckduckgo.com/)
+# ```
 #
-# python -c "print(2 * 4)"
-
-# %% [markdown]
-# ### Python scripts
-
-# %% [markdown]
-# Once you get good at programming, you'll  want to be able to write your own full programs in Python, which work just
-# like any other program on your computer. Here are some examples:
-
-# %% language="bash"
-# echo "print(2 * 4)" > eight.py
-# python eight.py
-
-# %% [markdown]
-# We can make the script directly executable (on Linux or Mac) by inserting a [hashbang](https://en.wikipedia.org/wiki/Shebang_(Unix%29)) and [setting the permissions](http://v4.software-carpentry.org/shell/perm.html) to execute.
+# For more information [see this tutorial notebook in the official Jupyter documentation](https://nbviewer.org/github/jupyter/notebook/blob/master/docs/source/examples/Notebook/Working%20With%20Markdown%20Cells.ipynb).
 #
-# Note, the `%%writefile` cell magic will write the contents of the cell to the file `fourteen.py`.
-
-# %%
-# %%writefile fourteen.py
-# #! /usr/bin/env python
-print(2 * 7)
-
-# %% language="bash"
-# chmod u+x fourteen.py
-# ./fourteen.py
-
-# %% [markdown]
-# ### Python Modules
+# #### Editing and running cells in the notebook
+#
+# When working with the notebook, you can either be editing the content of a cell (termed _edit mode_), or outside the cells, navigating around the notebook (termed _command mode_).
+#
+# * When in _edit mode_ in a cell, press <kbd>esc</kbd> to leave it and change to _command mode_. 
+# * When navigating between cells in _command mode_, press <kbd>enter</kbd> to change into _edit mode_ in the selected cell.
+# * When in _command mode_:
+#   * The currently selected cell will be shown by a <div style='display: inline; border-right: solid 5px #1976d2; padding-right: 2px;'>blue highlight</div> to the left of the cell.
+#   * Use the arrow keys <kbd>▲</kbd> and <kbd>▼</kbd> to navigate up and down between cells.
+#   * Press <kbd>a</kbd> to add a new cell above the currently selected cell.
+#   * Press <kbd>b</kbd> to add a new cell below the currently selected cell.
+#   * Press <kbd>d</kbd><kbd>d</kbd> to delete the currently selected cell.
+#   * Press <kbd>m</kbd> to change a code cell to a Markdown cell.
+#   * Press <kbd>y</kbd> to change a Markdown cell to a code cell.
+#   * Press <kbd>shift</kbd>+<kbd>l</kbd> to toggle displaying line numbers on the currently selected cell.
+#   * Press <kbd>shift</kbd>+<kbd>enter</kbd> to run the code in a currently selected code cell and move to the next cell. 
+#   * Press <kbd>ctrl</kbd>+<kbd>enter</kbd> to run the code in a currently selected code cell and keep the current cell selected. 
+#   * Press <kbd>ctrl</kbd>+<kbd>shift</kbd>+<kbd>c</kbd> to access the command palette and search for useful actions in the notebook.
+# * When in _edit mode_:
+#   * Press <kbd>tab</kbd> to suggest completions of variable names and object attributes. (Try it!)
+#   * Press <kbd>shift</kbd>+<kbd>tab</kbd> when in the argument list of a function to display a pop-up showing documentation for the function.
+#
+# *Supplementary material*: Learn more about [Jupyter notebooks](https://jupyter-notebook.readthedocs.io/en/stable/notebook.html).
+#
+# ### Python interpreters
+#
+# An alternative to running Python code via a notebook interface is to run commands in a Python _interpreter_ (also known as an _interactive shell_ or _read-eval-print-loop (REPL)_). This is similar in concept to interacting with your operating system via a command-line interface such as the `bash` or `zsh` shells in Linux and macOS or `Command Prompt` in Windows. A Python interpreter provides a _prompt_ into which we can type Python code corresponding to commands we wish to execute; we then execute this code by hitting <kbd>enter</kbd>, with any output from the computation being displayed before returning to the prompt again.
+#
+# We will not further explore using Python via an interpreter in this course, but if you wish to learn more about such command-line interfaces we recommend you attend one of the [Software Carpentry](https://software-carpentry.org/lessons/) workshops (sessions are regularly organised by [our group](http://rits.github-pages.ucl.ac.uk/software-carpentry/)), which cover this and other skills for computationally based research.
 
 # %% [markdown]
-# A Python module is a file that contains a set of related functions or other code. The filename must have a `.py` extension.
+# ### Python libraries
+#
+# A very common requirement in research (and all other!) programming is needing to reuse code in multiple different files. While it may seem that copying-and-pasting is an adequate solution to this problem, this should generally be avoided wherever possible, with code which we wish to reuse instead _factored out_ into _libraries_ which we can _import_ into other files to access the functionality of this code.
 #
-# We can write our own Python modules that we can import and use in other scripts or even in this notebook:
+# Compared to copying and pasting code, writing and using libraries has a major advantage: if we spot a bug in the code we only need to fix it once in the underlying library, and we straight away have the fixed code available everywhere the library is used, rather than having to separately implement the fix in each file in which it is used. The same applies to, for example, adding new features to a piece of code. By creating libraries we also make it easier for other researchers to use our code.
 #
+# While it is simple to use libraries within a notebook (and we have already seen examples of this when we imported the Python libraries NumPy and Matplotlib in the figure-eight plot example above), it is non-trivial to use code from one notebook in another without copying-and-pasting. To create Python libraries we therefore generally write the code into text files with a `.py` extension, which in Python terminology are called _modules_. The code in these files can then be used in notebooks (or other modules) using the Python `import` statement. For example, the cell below creates a file `draw_eight.py` in the same directory as this notebook containing Python code defining a _function_ (we will cover how to define and call functions later in the course) which creates a figure-eight plot and returns the figure object.
 
 # %%
-# %%writefile draw_eight.py 
-# Above line tells the notebook to treat the rest of this
-# cell as content for a file on disk.
-import math
+# %%writefile draw_eight.py
+# The above line tells the notebook to write the rest of the cell content to a file draw_eight.py
 
 import numpy as np
 import matplotlib.pyplot as plt
 
 def make_figure():
-    """Plot a figure of eight."""
-
-    theta = np.arange(0, 4 * math.pi, 0.1)
-    eight = plt.figure()
-    axes = eight.add_axes([0, 0, 1, 1])
-    axes.plot(0.5 * np.sin(theta), np.cos(theta / 2))
-
-    return eight
-
+    theta = np.linspace(0, 2 * np.pi, 100)
+    fig, ax = plt.subplots(figsize=(3, 6))
+    ax.plot(np.sin(2 * theta) / 2, np.cos(theta))
+    return fig
 
 
 # %% [markdown]
-# In a real example, we could edit the file on disk
-# using a code editor such as [VS code](https://code.visualstudio.com/).
+# We can use this code in the notebook by _importing_ the `draw_eight` module and then _calling_ the `make_figure` function defined in the module.
 
 # %%
-import draw_eight # Load the library file we just wrote to disk
-
-# %%
-image = draw_eight.make_figure()
+import draw_eight  # Load the library
+fig = draw_eight.make_figure()
 
 # %% [markdown]
-# Note, we can import our `draw_eight` module in this notebook only if the file is in our current working directory (i.e. the folder this notebook is in).
-#
-# To allow us to import our module from anywhere on our computer, or to allow other people to reuse it on their own computer, we can create a [Python package](https://packaging.python.org/en/latest/).
+# In this course we will cover how to import and use functionality from libraries, how to install third-party libraries and how to write your own libraries that can be shared and used by others.
 #
-
-# %% [markdown]
-# ### Python packages
+# ### Python scripts
 #
-# A package is a collection of modules that can be installed on our computer and easily shared with others. We will learn how to create packages later on in this course.
+# While Jupyter notebooks are a great medium for learning how to use Python and for exploratory work, there are some drawbacks:
 #
-# There is a huge variety of available packages to do pretty much anything. For instance, try `import antigravity` or `import this`.
+#   * They require Jupyter Lab (or a similar application) to be installed to run the notebook.
+#   * It can be difficult to run notebooks non-interactively, for example when scheduling a job on a cluster [such as those offered by UCL Research Computing](https://www.rc.ucl.ac.uk/docs/Background/Cluster_Computing).
+#   * The flexibility of being able to run the code in cells in any order can also make it difficult to reason about how outputs were produced and can lead to non-reproducible analyses.
+#   
+# In some settings it can therefore be preferable to write Python _scripts_, that is, files (typically with a `.py` extension) which contain Python code that completely describes a computational task to perform and that can be run by passing the name of the script file to the `python` program in a command-line environment. Optionally, scripts may also allow passing in arguments from the command line to control the execution of the script. As scripts are generally run from text-based terminals, non-text outputs such as images will generally be saved to files on disk.
 #
+# Python scripts are well suited, for example, to describing computationally demanding simulations or analyses to run as long jobs on a remote server or cluster, or to tasks where the input and output is mainly at the file level, for instance batch processing a series of data files. We will not cover how to write Python scripts in this course; however, you can learn more about this topic in our [MPHY001: _Research software engineering with Python_ course](http://github-pages.ucl.ac.uk/rsd-engineeringcourse/).
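
The script workflow described in the added text above (a `.py` file run with `python`, controlled by command-line arguments, with results saved to disk) could look roughly like the sketch below. This is not part of the commit. The file name `eight_points.py` and the `--output` and `--n-points` options are hypothetical, chosen for illustration. To stay within the standard library the sketch writes the curve coordinates to a CSV file instead of saving an image.

```python
"""Hypothetical script eight_points.py: save figure-eight curve points to a CSV file."""
import argparse
import csv
import math


def lemniscate_points(n_points):
    """Return n_points (x, y) pairs with x = sin(2*theta)/2, y = cos(theta), theta in [0, 2*pi)."""
    points = []
    for i in range(n_points):
        theta = 2 * math.pi * i / n_points
        points.append((math.sin(2 * theta) / 2, math.cos(theta)))
    return points


def main(argv=None):
    # Command-line arguments control the execution of the script
    # (option names are illustrative assumptions)
    parser = argparse.ArgumentParser(
        description="Save figure-eight curve points to a CSV file."
    )
    parser.add_argument("--output", default="eight_points.csv", help="CSV file to write")
    parser.add_argument("--n-points", type=int, default=100, help="number of points")
    args = parser.parse_args(argv)

    # Results are written to a file on disk rather than displayed
    with open(args.output, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["x", "y"])
        writer.writerows(lemniscate_points(args.n_points))


if __name__ == "__main__":
    main()
```

Assuming the file is saved as `eight_points.py`, it could be run from a terminal with `python eight_points.py --output eight.csv --n-points 200`.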