alex-kling edited this page Apr 14, 2020 · 132 revisions

Welcome to the Mars Climate Modeling Center (MCMC) Analysis Pipeline. By the end of this tutorial, you will know how to download Mars Climate data from the MCMC's data portal, reduce these large climate simulations to meaningful data, and make plots for winds at the beginning of the Martian Northern Spring.

INSTALLATION:

The analysis pipeline is written entirely in pure Python, an intuitive and open-source programming language. You may identify yourself in one of the following categories:

  • A. You are familiar with the Python infrastructure and would like to install the Analysis pipeline on top of your current Python installation: Skip to 'Installing the pipeline', and note that you may have to manually add aliases to the Mars***.py executables to your search path.
  • B. You have experience with Python but not with managing packages, or are new to Python: To ensure that there is no conflict with other Python versions that may be on your system, we will install a fresh Python distribution locally (this does not require admin permission). Additionally, we will install the analysis pipeline in a self-contained virtual environment which is basically a standalone copy of your entire distribution, minus the 'core' code that is shared with the main python distribution. This will allow you to use your fresh Python installation for other projects (including installing or upgrading packages) without the risk of altering the analysis pipeline. It will also be safe to alter (or even completely delete) that virtual environment without breaking the main distribution.

Requirements

Python 3: If you are already a Python user, you can install the Ames analysis pipeline on top of your current installation. For new users, we recommend using the latest version of the Anaconda Python distribution available here, as it already ships with pre-compiled math and plotting packages (e.g. numpy, matplotlib), and pre-compiled libraries (e.g. hdf5 headers to read netcdf files).

  • In MacOS and Linux, you can install a fresh Python3 locally from a terminal with:

chmod +x Anaconda3-2020.02-MacOSX-x86_64.sh (this makes the .sh file executable)

./Anaconda3-2020.02-MacOSX-x86_64.sh (this runs the executable)

Read (ENTER) and accept (yes) the terms. Take note of the location for the installation directory. You can use the default location or change it if you would like, for example /Users/username/anaconda3 works well.

  • In Windows, we recommend installing the pipeline under a Linux-type environment using Cygwin, so we can use command line tools. Simply download the Windows version on the Anaconda website and follow the instructions from the installation GUI.

When asked about the installation location, make sure you install Python under your emulated-Linux home /home/username and not on the default location /cygdrive/c/Users/username/anaconda3. From the installation GUI, the path you want to select is something like:

C:/Program Files/cygwin64/home/username/anaconda3

Also make sure to check YES for "Add Anaconda to my PATH environment variable".


The analysis pipeline also requires the following Python packages:

  • numpy (array operations)
  • matplotlib (plotting library)
  • netCDF4 Python (handling of netcdf files)
  • requests (for downloading data from the MCMC Portal)

Optionally, you can install:

  • ghostscript, which will allow the analysis pipeline to generate multiple figures as a single pdf file. Type gs -version to see if Ghostscript is already available on your system. If it is not, you can install it from this page: https://www.ghostscript.com/download.html or decide to use png images instead.
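The same check can be done from Python. The sketch below only looks for an executable named gs on the search path, which is the name used in the command above; the helper name ghostscript_path is ours.

```python
import shutil

def ghostscript_path():
    """Return the full path of the gs executable, or None if it is not on the PATH."""
    return shutil.which("gs")

path = ghostscript_path()
if path:
    print("Ghostscript found at:", path)
else:
    print("Ghostscript not found; consider using png output instead.")
```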

Before we proceed with the installation of new modules, it is important to verify that you are using the versions of Python and pip that you intend to (pip is the Python package manager), and not old versions that may be sitting on your system (e.g. old python2 or pip executables located in /usr/local/bin/pip ...).

To make sure the paths are fully updated, we recommend closing the current terminal. Then, open a fresh terminal, type python and hit the TAB key. If multiple options are available (e.g. python, python2, python3.7, python.exe), pick the one you think may be from the Anaconda version you just installed and confirm this with the which or whereis commands, for example:

python3 --version (python3.exe --version in Windows)

which python3 (which.exe in Windows)

We are looking for a python executable that looks like it was installed with Anaconda, like /username/anaconda3/bin/python3. Then type pip and hit the TAB key to see what the options are, and use whereis pip or which pip3 to find the version of pip whose location matches that of the python executable. If which python3 or whereis python3 points to your Anaconda distribution (e.g. /Users/username/anaconda3/bin/python3), you are set. If not, proceed with the full paths to be safe, e.g. /Users/username/anaconda3/bin/python3 instead of python3, or /Users/username/anaconda3/bin/pip instead of pip.
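Another way to confirm which interpreter you are really running is to ask Python itself. The sketch below is not part of the pipeline and the helper name describe_python is ours. It reports the path of the running executable, its version, and the first pip found on the search path.

```python
import shutil
import sys

def describe_python():
    """Collect the location and version of the running interpreter."""
    return {
        "executable": sys.executable,                      # full path of this python
        "version": "%d.%d.%d" % sys.version_info[:3],      # e.g. 3.7.6
        "pip_on_path": shutil.which("pip"),                # None if no pip is on the PATH
    }

for key, value in describe_python().items():
    print("%-12s: %s" % (key, value))
```

If the executable and pip paths do not share the same anaconda3/bin directory, use the full paths as described above.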

**TODO Cygwin/Windows: **

If you have issues locating python or python.exe, you can check your current PATH with echo $PATH and verify that 'Anaconda' or 'anaconda' shows up somewhere in the search path (e.g. /Users/username/anaconda3/bin).

find / -iname 'conda'

Creation of a virtual environment:

We will create a virtual environment for the Ames analysis pipeline which shares the same Python core but branches out with its own packages. We will call it amesGCM3 to remind ourselves that it shares the same structure as the core python3 it is derived from (e.g. amesGCM3/bin and amesGCM3/lib will have the same structure as ~/anaconda3/bin and ~/anaconda3/lib). From a terminal, run:

python3 -m venv --system-site-packages amesGCM3

Now activate the virtual environment with:

source amesGCM3/bin/activate (if you are using bash )

source amesGCM3/bin/activate.csh (if you are using csh/tcsh )

For Windows users using Cygwin, the directory structure may be ../anaconda3/Scripts/activate

You may notice that your prompt changes from username> to (amesGCM3)username>, which indicates that you are INSIDE the virtual environment, even when navigating to a different directory on your machine.

After entering the virtual environment, we can verify that which python and which pip unambiguously point to amesGCM3/bin/python3 and amesGCM3/bin/pip, so there is no need to use the full paths.
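If you prefer to check this from Python rather than from the prompt, the standard library exposes enough information. This is a minimal sketch and the helper name in_virtual_environment is ours. Inside an environment created with python3 -m venv, sys.prefix points to the environment directory while sys.base_prefix still points to the distribution it was derived from.

```python
import sys

def in_virtual_environment():
    """True when the running interpreter belongs to a venv-style virtual environment."""
    return sys.prefix != sys.base_prefix

print("interpreter :", sys.executable)
print("inside venv :", in_virtual_environment())
```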

Installing the pipeline:

From inside the virtual environment, run:

pip install git+https://github.com/alex-kling/amesgcm.git

The Ames Analysis Pipeline will automatically install the required packages in the self-contained virtual environment. If you would also like to permanently install packages such as netCDF4 for your own coding projects, it is easy to do:

  • If which pip points to /Users/username/anaconda3/bin/pip, use: pip install netCDF4
  • If which pip points somewhere else (e.g. an old python2 pip executable in /usr/local/bin/pip), use the full path: /Users/username/anaconda3/bin/pip install netCDF4

Note that it is also possible to install the package from a .zip archive: download and extract the archive anywhere (e.g. in your Downloads directory), run cd amesgcm-master and then pip install .

To make sure the paths to the executables are correctly set in your terminal, exit the virtual environment with

deactivate

This completes the one-time installation of the Ames analysis pipeline:

amesGCM3/
├── bin
│   ├── MarsFiles.py
│   ├── MarsInterp.py
│   ├── MarsPlot.py
│   ├── MarsPull.py
│   ├── MarsVars.py
│   ├── activate
│   ├── activate.csh
│   ├── pip
│   └── python3
├── lib
│   └── python3.7
│       └── site-packages
│           ├── netCDF4
│           └── amesgcm
│               ├── FV3_utils.py
│               ├── Ncdf_wrapper.py
│               └── Script_utils.py
├── mars_data
│   └── Legacy.fixed.nc
└── mars_templates
    └── legacy.in

Use, upgrade, and remove the pipeline

Every time you want to use the analysis pipeline from a new terminal session, simply run:

source amesGCM3/bin/activate (source amesGCM3/bin/activate.csh in csh/tcsh)

You can check that the tools are installed properly by typing Mars and hitting the TAB key.

Check the documentation for any of the executables with:

Mars***.py --help (e.g. MarsPlot.py -h for short)

If no executables show up, consider using full paths, e.g. ~/amesGCM3/bin/MarsPlot.py -h, and add your own aliases:

alias MarsPlot.py='/Users/akling/amesGCM3_bash/bin/MarsPlot.py' (in bash)

alias MarsPlot.py /Users/akling/amesGCM3_bash/bin/MarsPlot.py (in csh/tcsh)
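To see at a glance which of the five executables from the directory tree above can be found on your search path, a small script such as this one can be used. It is a sketch, not a pipeline utility, and the helper name locate is ours.

```python
import shutil

# Executable names taken from the amesGCM3/bin listing above.
EXECUTABLES = ["MarsFiles.py", "MarsInterp.py", "MarsPlot.py", "MarsPull.py", "MarsVars.py"]

def locate(names):
    """Map each name to its full path on the PATH, or None when it is not found."""
    return {name: shutil.which(name) for name in names}

for name, path in locate(EXECUTABLES).items():
    print("%-14s %s" % (name, path or "not found on PATH"))
```

Any entry reported as not found is a candidate for an alias or for a call by full path.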

After you are done with your work, you can exit the analysis pipeline with:

deactivate

To upgrade the pipeline, activate the virtual environment as shown above and run :

pip install git+https://github.com/alex-kling/amesgcm.git --upgrade

To permanently remove the amesgcm pipeline, activate the virtual environment and run :

pip uninstall amesgcm

It is also safe to delete the entire amesGCM3 virtual environment directory as this will not affect your main Python distribution.

TUTORIAL:

Overview of the process

The following steps will be used to access the data, reduce it, compute additional diagnostics, interpolate the diagnostics to standard pressure levels, and visualize the results.

Download raw Legacy GCM outputs

The data from the Legacy GCM is archived every 1.5 hours (i.e. 16 times a day) and packaged in chunks of 10 sols (1 sol = 1 Martian day). Files are available for download on the MCMC Data portal at https://data.nas.nasa.gov/legacygcm/data_legacygcm.php, and are referenced by their solar longitude or "Ls", which is 0° at the vernal equinox (beginning of Northern spring), 90° at the summer solstice, 180° at the autumnal equinox, and 270° at the winter solstice. To download a 30-sol chunk starting at the beginning of the Martian year (Ls=0 to Ls=15), navigate to a place where you would like to store the data and run:

MarsPull.py --help
MarsPull.py --ls 0 15

This will download three LegacyGCM_Ls000***.nc raw outputs, ~280MB each.
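Since the files are referenced by solar longitude, it can help to translate an Ls value into a season. The sketch below simply encodes the four reference points given above (0°, 90°, 180°, 270°) for the Northern hemisphere; the function name northern_season is ours.

```python
def northern_season(ls):
    """Return the Northern-hemisphere season for a solar longitude in degrees."""
    seasons = ["spring", "summer", "autumn", "winter"]
    quadrant = int((ls % 360.0) // 90)   # 0 for Ls in [0, 90), 1 for [90, 180), ...
    return seasons[quadrant % 4]         # the extra % 4 guards against round-off at 360

for ls in (0, 15, 90, 180, 270):
    print("Ls = %3i -> Northern %s" % (ls, northern_season(ls)))
```

The Ls=0 to Ls=15 range requested above therefore falls at the very beginning of Northern spring.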

We can use the --inspect option of MarsPlot.py to peek into the content of one of the raw outputs:

MarsPlot.py -i LegacyGCM_Ls000_Ls004.nc

Note the characteristic structure of the Legacy GCM raw outputs, with 10-day chunks ('time') and 16 times of day ('ntod').
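The numbers quoted in this section fit together as simple arithmetic, sketched below. The length of a Mars year (about 668.6 sols) is an assumption that does not appear in this tutorial, and the "hours" are taken to be 1/24 of a sol.

```python
import math

HOURS_PER_SOL = 24.0          # a sol divided into 24 hours
OUTPUT_INTERVAL_HOURS = 1.5   # archiving interval quoted above
SOLS_PER_FILE = 10            # length of one raw output file
SOLS_PER_MARS_YEAR = 668.6    # assumption: approximate length of a Mars year in sols

outputs_per_sol = int(HOURS_PER_SOL / OUTPUT_INTERVAL_HOURS)     # size of 'ntod'
records_per_file = outputs_per_sol * SOLS_PER_FILE               # time samples per file
files_per_year = math.ceil(SOLS_PER_MARS_YEAR / SOLS_PER_FILE)   # files for one Mars year

print("outputs per sol  :", outputs_per_sol)
print("records per file :", records_per_file)
print("files per year   :", files_per_year)
```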

File format conversion

For analysis purposes, it is useful to reduce the data from the raw outputs into different formats:

  • fixed: static fields (e.g. surface albedo, topography)
  • average: 5-day averages
  • daily : continuous time series
  • diurn : 5-day averages for each time of day
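To make the difference between these formats concrete, here is a NumPy sketch of the three time reductions applied to an array shaped like a raw output, (sols, times of day, ...). It only illustrates the idea and is not the code used by MarsFiles; the function names are ours.

```python
import numpy as np

def five_day_blocks(raw):
    """Reshape (n_sols, n_tod, ...) into (n_blocks, 5, n_tod, ...), dropping leftover sols."""
    n_sols, n_tod = raw.shape[:2]
    n_blocks = n_sols // 5
    return raw[:n_blocks * 5].reshape((n_blocks, 5, n_tod) + raw.shape[2:])

def five_day_average(raw):
    """'average': mean over 5 sols and all times of day -> (n_blocks, ...)."""
    return five_day_blocks(raw).mean(axis=(1, 2))

def five_day_diurnal(raw):
    """'diurn': mean over 5 sols, keeping the time of day -> (n_blocks, n_tod, ...)."""
    return five_day_blocks(raw).mean(axis=1)

def daily_series(raw):
    """'daily': flatten sols and times of day into one continuous time axis."""
    n_sols, n_tod = raw.shape[:2]
    return raw.reshape((n_sols * n_tod,) + raw.shape[2:])

# Synthetic data: 10 sols, 16 times of day, 3 grid points.
raw = np.arange(10 * 16 * 3, dtype=float).reshape(10, 16, 3)
print(five_day_average(raw).shape, five_day_diurnal(raw).shape, daily_series(raw).shape)
```

With one 10-sol file, the average and diurn reductions each produce two time samples, while the daily series keeps all 160.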

New files for each of the formats listed above can be created using the MarsFiles utility, which handles conversions from the Legacy format to this new (FV3) format. To create fixed and average files for each of the 10-day outputs from the Legacy GCM, run:

MarsFiles.py -h
MarsFiles.py LegacyGCM_Ls* -fv3 fixed average

And check the new content for one of the files with:

MarsPlot.py -i 00000.atmos_average.nc
MarsPlot.py -i 00000.fixed.nc

Moving forward with the postprocessing pipeline, it is the user's choice to proceed with individual sets of files (00000, 00010, and 00020 files in our example), or merge those files together into one. All the utilities from the analysis pipeline (including the plotting routine) accept a list of files as input, and keeping separate files can be strategic when computer memory is limited (the daily files remain 280MB each and there are 67 of those in one Mars year).

Since we are working with 5-day averages, which are relatively small, we can use the --combine option of MarsFiles to merge them together:

MarsFiles.py *atmos_average.nc -c
MarsFiles.py *fixed.nc -c

Variable operations

When provided with no arguments, the variable utility MarsVars.py has the same functionality as MarsPlot.py -i and displays the content for the file:

MarsVars.py 00000.atmos_average.nc

To see what MarsVars can do, check the --help option (MarsVars.py -h).

For example, to compute the atmospheric density (rho) from the vertical grid data (pk, bk), surface pressure (ps) and air temperature (temp), run:

MarsVars.py 00000.atmos_average.nc -add rho

Check that a new variable was added to the file by running MarsVars again with no argument:

MarsVars.py 00000.atmos_average.nc
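For reference, the density calculation can be sketched with the ideal gas law, rho = p / (R T). The code below is our own illustration and not necessarily the formula used by MarsVars. It assumes that interface pressures are given by pk + bk * ps, approximates the mid-layer pressure as the mean of the two bounding interfaces, and uses an assumed specific gas constant of 192 J/kg/K for a CO2-dominated atmosphere.

```python
import numpy as np

R_GAS = 192.0  # assumption: specific gas constant for a CO2-dominated atmosphere [J/kg/K]

def layer_pressure(pk, bk, ps):
    """Approximate mid-layer pressures [Pa] for one column from interface coefficients."""
    p_interface = pk + bk * ps                            # pressure at the layer interfaces
    return 0.5 * (p_interface[:-1] + p_interface[1:])     # simple mean of bounding interfaces

def density(p, temp):
    """Ideal gas law: density [kg/m3] from pressure [Pa] and temperature [K]."""
    return p / (R_GAS * temp)

# Made-up two-layer column with a 600 Pa surface pressure.
pk = np.array([0.0, 0.0, 0.0])
bk = np.array([0.0, 0.5, 1.0])
p_mid = layer_pressure(pk, bk, ps=600.0)
print(density(p_mid, np.array([180.0, 210.0])))
```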

Similarly, we will perform a column integration for the water vapor (vap_mass) with -colint. At the same time, we will remove the dust (dst_num) and water ice (ice_num) particle number variables, which we are not planning to use in this analysis (this will free some memory).

MarsVars.py 00000.atmos_average.nc -colint vap_mass -rm ice_num dst_num

We observe that a new variable "colint_vap_mass" was added to the file, while "ice_num" and "dst_num" have disappeared.
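Conceptually, a column integration sums the tracer over all layers, weighting each layer by its mass per unit area, dp / g. The sketch below assumes that the tracer is a mass mixing ratio in kg/kg and uses an assumed Mars surface gravity of 3.72 m/s2; it is an illustration of the idea rather than the MarsVars implementation, and the function name is ours.

```python
import numpy as np

G_MARS = 3.72  # assumption: Mars surface gravity [m/s2]

def column_integrate(q, p_interface):
    """Column mass [kg/m2] of a tracer.

    q           : mass mixing ratio in each of the N layers [kg/kg]
    p_interface : pressure at the N+1 layer interfaces [Pa]
    """
    dp = np.abs(np.diff(p_interface))   # pressure thickness of each layer
    return np.sum(q * dp) / G_MARS      # sum of q * dp / g over the column

# Made-up three-layer column with a uniform mixing ratio.
q = np.full(3, 1.0e-4)
p_interface = np.array([0.0, 100.0, 300.0, 600.0])
print("column mass [kg/m2]:", column_integrate(q, p_interface))
```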

Pressure interpolation

The Ames GCM uses a pressure coordinate in the vertical, which means that a single atmospheric layer will be located at different geometric heights (and pressure levels) between the atmospheric columns. Before we do any zonal averaging, it is therefore necessary to interpolate the data in all the columns to the same standard pressure levels. This operation is done with the MarsInterp utility using the --type pstd option:

MarsInterp.py -h 
MarsInterp.py  00000.atmos_average.nc -t pstd

We observe with MarsPlot.py -i 00000.atmos_average_pstd.nc that the pressure level axis "pfull" (formerly 24 layers) has disappeared and was replaced by a standard pressure axis "pstd". Also, the shapes of the 3-dimensional variables are different and reflect the new shape of "pstd".
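The idea behind this step can be sketched for a single atmospheric column with NumPy. The example below interpolates linearly in log-pressure and returns NaN for standard levels that fall outside the column. Both choices are assumptions made for illustration, and MarsInterp may proceed differently.

```python
import numpy as np

def interp_column_to_pstd(values, p_layers, pstd):
    """Interpolate one column from its own layer pressures to standard pressures.

    values   : field in each model layer
    p_layers : pressure of each model layer in this column [Pa]
    pstd     : requested standard pressure levels [Pa]
    """
    values = np.asarray(values, dtype=float)
    p_layers = np.asarray(p_layers, dtype=float)
    order = np.argsort(p_layers)              # np.interp needs increasing x values
    return np.interp(np.log(pstd), np.log(p_layers[order]), values[order],
                     left=np.nan, right=np.nan)

# Made-up column: temperature at 10, 100 and 1000 Pa.
temp_column = [150.0, 200.0, 250.0]
print(interp_column_to_pstd(temp_column, [10.0, 100.0, 1000.0],
                            np.array([100.0, 500.0, 2000.0])))
```

Repeating this for every column puts all the data on a common vertical axis, which is what makes zonal averaging meaningful.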

Plotting the results with MarsPlot:

While you may use the software of your choice to visualize the results (e.g. Matlab, IDL), a utility is provided to create 2D figures and 1D line plots that are easily configured from an input template. To generate a template in the current directory use:

MarsPlot.py -h
MarsPlot.py --template

and open the file Custom.in with a text editor (you can rename the file to something.in if you want).


Quick Tip: MarsPlot uses text files with a '.in' extension as input files. Select "Python" as the language (in place of "plain text") when editing the file from a text editor (gedit, atom ...) to enable syntax highlighting of keywords. If you are using the vim editor, add the following lines to your ~/.vimrc to recognize "Custom.in" as using Python syntax:

syntax on
colorscheme default
au BufReadPost *.in  set syntax=python

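Close the file. vim reads ~/.vimrc at startup, so the highlighting applies the next time you open a .in file (from inside a running vim session, you can reload it with :source ~/.vimrc).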


In order to access data in a specific file, MarsPlot uses the XXXXX.fileN.var syntax: XXXXX is the sol number (e.g. "03335", optional), file is the file type (e.g. "atmos_average_pstd"), N is the simulation number (e.g. "2" if comparing two different simulations, optional), and var is the requested variable (e.g. "ucomp" for the zonal winds).
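To illustrate how the pieces of this syntax fit together, here is a small parser written for this tutorial. The regular expression and the function name split_request are hypothetical, and MarsPlot's own parsing may differ, in particular for the additional "[ ]" and "{ }" features.

```python
import re

# Hypothetical pattern for illustration only; MarsPlot's own parser may differ.
PATTERN = re.compile(
    r"^(?:(?P<sol>\d{5})\.)?"    # optional 5-digit sol number, e.g. 03335
    r"(?P<file>[A-Za-z_]+?)"     # file type, e.g. atmos_average_pstd
    r"(?P<sim>\d+)?"             # optional simulation number, e.g. 2
    r"\.(?P<var>\w+)$"           # requested variable, e.g. ucomp
)

def split_request(text):
    """Split a request such as '03335.atmos_average_pstd2.ucomp' into its parts."""
    match = PATTERN.match(text)
    if match is None:
        raise ValueError("not in XXXXX.fileN.var form: %r" % text)
    return match.groupdict()

print(split_request("03335.atmos_average_pstd2.ucomp"))
print(split_request("fixed.zsurf"))
```

Optional parts that are left out come back as None, which mirrors how the sol number and simulation number can be omitted in the template.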

When dimensions are omitted with None, MarsPlot makes educated guesses for data selection (e.g, if no layer is requested, use the surface layer etc...) and will tell you exactly how the data is being processed both in the default title for the figures, and in the terminal output. Instructions for additional features are listed at the beginning of Custom.in : Note the use of the brackets "[ ]" for variable operations, "{ }" to overwrite the default dimensions, and the possibility of adding another simulation to the <<<<< Simulations >>>>> block for comparison purposes.

After making edits, feed the template back to MarsPlot with:

MarsPlot.py Custom.in (MarsPlot.py Custom.in -o png if you are not using ghostscript)

[----------]  0 % (2D_lon_lat :fixed.zsurf)
[#####-----] 50 % (2D_lat_press :atmos_average.ucomp, Ls= (MY 1) 13.61, lon=18.0)
[##########]100 % (Done)

By default, MarsPlot will handle errors by itself (e.g. missing data) and report them after completion both in the terminal and in the pdf. To bypass this behavior (when debugging), use the --debug option.

A file Diagnostic.pdf will be generated in the current directory with the requested plots, which can be opened with a pdf viewer (e.g. open Diagnostic.pdf on MacOS, or evince Diagnostic.pdf on Linux). If you have used the --output png formatting option, the images will be located in plots/ in the current directory.

You can try to add a new figure by editing any of the <<<| Plot ... = True |>>> blocks or by copying and pasting an entire block. HOLD ON[...]HOLD OFF is used to put multiple figures on the same page. For example, to compute the zonal average (Lon +/-180 = all) and the time average over the first 10 degrees of solar longitude (Ls 0-360 = 0.,10) of the dust field (dst_mass) from the interpolated file (atmos_average_pstd), we use:

<<<<<<<<<<<<<<| Plot 2D lat X press = True |>>>>>>>>>>>>>
Title          = None
Main Variable  = atmos_average_pstd.dst_mass
Cmin, Cmax     = None
Ls 0-360       = 0.,10
Lon +/-180     = all
2nd Variable   = None
Axis Options  : Lat = [None,None] | level[Pa] = [1e3,0.1] | cmap = Wistia

Note that we also decided to change the color map and adjust the axes with the Axis Options.

By default, MarsPlot.py Custom.in runs the requested analysis on the last set of output files present in the directory (identified by XXXXX.fixed.nc). To run the analysis over a single specific data file or a range of files, use the --date option:

MarsPlot.py Custom.in -d 0
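For readers who want to reproduce by hand the kind of averaging requested in the dst_mass block above, the operation amounts to a mean over longitude and over the time steps that fall inside the Ls window. The NumPy sketch below assumes a (time, level, lat, lon) layout, as in the interpolated files; it is not MarsPlot's implementation and the function name is ours.

```python
import numpy as np

def zonal_time_mean(field, ls, ls_min, ls_max):
    """Average a (time, level, lat, lon) field over longitude and over an Ls window.

    Returns an array of shape (level, lat).
    """
    ls = np.asarray(ls)
    in_window = (ls >= ls_min) & (ls <= ls_max)   # time steps inside the requested Ls range
    return field[in_window].mean(axis=(0, 3))     # mean over time (axis 0) and lon (axis 3)

# Synthetic data: 4 time steps, 2 levels, 3 latitudes, 5 longitudes.
field = np.arange(4 * 2 * 3 * 5, dtype=float).reshape(4, 2, 3, 5)
ls = [2.0, 7.0, 12.0, 17.0]
print(zonal_time_mean(field, ls, 0.0, 10.0).shape)
```

The result has one value per pressure level and latitude, which is the shape needed for a 2D lat X press plot.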

Moving forward with your own analysis

You can customize your own plots using the programming language of your choice. Here is a script to get you started in Python. Unless you have installed python-netCDF4 and the analysis pipeline on top of your main distribution, the script has to be run from inside the virtual environment in order to access the netCDF4 and amesgcm packages. Copy and paste the following into a script named demo.py and run:

python demo.py


#======================= Import python packages ================================
import numpy as np                          # for array operations
import matplotlib.pyplot as plt             # python plotting library
from netCDF4 import Dataset                 # to read .nc files
#===============================================================================

# Open a fixed.nc file, read some variables and close it.
f_fixed=Dataset('/Users/akling/test/00000.fixed.nc','r') 
lon=f_fixed.variables['lon'][:]
lat=f_fixed.variables['lat'][:]
zsurf=f_fixed.variables['zsurf'][:]  
f_fixed.close()

# Open a dataset and read the 'variables' attribute from the NETCDF FILE
f_average_pstd=Dataset('/Users/akling/test/00000.atmos_average_pstd.nc','r')
vars_list     =f_average_pstd.variables.keys() 
print('The variables in the atmos files are: ',vars_list) 

# Read the 'shape' and 'units' attribute from the temperature VARIABLE
Nt,Nz,Ny,Nx = f_average_pstd.variables['temp'].shape 
units_txt   = f_average_pstd.variables['temp'].units
print('The data dimensions are Nt,Nz,Ny,Nx=',Nt,Nz,Ny,Nx)
# Read the pressure, time, and the temperature for an equatorial cross section
pstd       = f_average_pstd.variables['pstd'][:]   
areo       = f_average_pstd.variables['areo'][0] #solar longitude for the 1st timestep
temp       = f_average_pstd.variables['temp'][0,:,18,:] #time, press, lat, lon
f_average_pstd.close()

# Get the latitude of the cross section.
lat_cross=lat[18]

# Example of accessing  functions from the Ames Pipeline if we wanted to plot 
# the data  in a different coordinate system  (0>360 instead of +/-180 )
#----
from amesgcm.FV3_utils import lon180_to_360,shiftgrid_180_to_360 
lon360=lon180_to_360(lon)
temp360=shiftgrid_180_to_360(lon,temp) 

# Define contour for plotting
conts= np.linspace(150,250,32)

#Create a figure with the data 
plt.close('all')
ax=plt.subplot(111)
plt.contourf(lon,pstd,temp,conts,cmap='jet',extend='both')
plt.colorbar()
# Axis labeling 
ax.invert_yaxis()
ax.set_yscale("log") 
plt.xlabel('Longitudes')
plt.ylabel('Pressure [Pa]')
plt.title('Temperature [%s] at Ls %03i, lat= %.2f '%(units_txt,areo,lat_cross))
plt.show()

This will produce the following:
