Summary
The mmp_toolbox code, inspired by an MMP processing toolbox created by John Toole (Woods Hole Oceanographic Institution), consists mostly of novel code (the components that are not novel are listed in the footnote below) written to rapidly import and process large amounts of MMP data within an architecture that enables parallel processing steps, allowing users to quickly visualize and analyze the results. The toolbox is written in Matlab because it is the language of choice for many oceanographers, including those who have also written code to process McLane profiler data, such as the toolbox noted above (reference). Structure arrays are the natural choice for containing data from instruments mounted on profilers: the indexing of their elements corresponds to the chronological ordering of the profiles in the deployment being processed, and each instrument's structure array can contain the different raw and derived data variables associated with that instrument. This simplifies subroutine calls that require different data variables from the same instrument or data inputs from different instruments; the appropriate structure array(s) containing the data are passed instead. Structure arrays are also advantageous because processed data stay in workspace memory between processing steps, rather than being written out as text files and read back in at the next step (reference) or saved to mat-files and reloaded, significantly shortening execution time by avoiding repeated file IO. The toolbox structure arrays are named for the associated instrument and data processing level; for example, the structure array ctd_L0 contains CTD (an instrument that measures conductivity, temperature, and pressure (depth)) data after data import.
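A minimal sketch of this structure-array convention, assuming illustrative field and variable names that may differ from the toolbox's actual schema:

```matlab
% Illustrative only: field names here are hypothetical, not the toolbox's actual schema.
% The element index of each structure array corresponds to the chronological
% profile number, and each element holds that profile's data variables.
p = (0:0.04:50)';                          % example raw pressure record [dbar]
ctd_L0(7).profile_number = 7;
ctd_L0(7).pressure       = p;
ctd_L0(7).temperature    = 12 - 0.02*p;    % example raw temperature [deg C]
ctd_L0(7).conductivity   = 3.8 - 0.005*p;  % example raw conductivity [S/m]

% A processing subroutine needs only the relevant structure array(s) as input,
% e.g. (hypothetical call): ctd_L1 = some_processing_step(ctd_L0, eng_L0, meta);
```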
Processing one profile requires importing data from three instrumentation files (ENG, CTD, ACM) and then applying an ordered sequence of processing steps culminating in the binning of its data products on pressure (depth). The toolbox uses a parallel workflow that processes all profiles at each processing step, so that the ENG and CTD data products (temperature, salinity, dissolved oxygen, photosynthetically active radiation [coastal MMPs only], chlorophyll fluorescence, colored dissolved organic matter fluorescence [coastal MMPs only], and optical backscatter), which do not depend on ACM velocity data, are available regardless of the state of the ACM processing, which must occur later in the sequence. The structure arrays containing binned data are further processed to create scalar structures whose fields contain an entire deployment of binned data for each variable, for convenient plotting.
Conceptually there are two workflow choices: (1) an outer loop over profile number with an inner loop over processing steps, or (2) an outer loop over processing steps with an inner loop over profile number. The latter lends itself to vectorized (parallel) processing for certain operations and is the workflow adopted for the toolbox.
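The chosen workflow can be sketched as follows; the function names are placeholders, not the toolbox's actual subroutines:

```matlab
% Schematic of workflow (2): the outer sequence runs over processing steps and
% each step operates on all selected profiles before the next step begins.
% Function names are placeholders, not the toolbox's actual subroutines.
ctd_L0 = import_ctd_profiles(meta);            % step 1: import every selected profile
ctd_L1 = apply_ctd_corrections(ctd_L0, meta);  % step 2: calibrations, lag corrections
ctd_L2 = bin_on_pressure(ctd_L1, meta);        % step 3: bin L1 data on pressure
% ... ACM processing steps follow later in the sequence, drawing on the CTD results.
```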
All the information needed to process a deployment of profiles (for example, latitude, longitude, magnetic variation, smoothing time constants; folder locations of unpacked McLane data and calibration files; which profiles, numbered chronologically, that are to be processed) is entered by the user as one plain text file using any text editor. The file can freely contain documentation in comment lines and in fact toolbox documentation is provided inside the example files on the repository for the files themselves and for each class of metadata. The metadata file is read by a toolbox routine in such a way that additional metadata parameters can be added to the processing by adding the requisite lines specifying such in an intuitive format in the metadata file. The metadata values are also provided in the fields of a scalar structure data product.
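Purely as an illustration of the plain-text format (the authoritative parameter names and syntax are documented in the example metadata files on the repository), a metadata fragment might look like:

```
% Comment lines documenting the deployment may appear anywhere in the file.
latitude             = 45.00                   % decimal degrees N (illustrative value)
longitude            = -125.00                 % decimal degrees E (illustrative value)
magnetic_variation   = 16.0                    % degrees
unpacked_data_folder = 'C:\MMP\deployment01\unpacked\'
calibration_folder   = 'C:\MMP\deployment01\cal\'
profiles_to_process  = [1:150 163 165:2:301]   % Matlab row vector on one line
```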
One of the useful features of the toolbox is that any subset of the profiles can be chosen for processing by specifying the profile numbers as a Matlab row vector on one line of the metadata text file; the correspondence between structure array index and chronological profile number is maintained. Marine conditions vary during the 6-12 month MMP deployments and the instruments age, so removing hysteresis between neighboring ascending and descending profiles can require a different metadata set at different times (different profile numbers) in the deployment. This feature is also useful when dealing with data that depend on sunlight, for example data from PAR (photosynthetically active radiation) sensors mounted on the coastal profilers; only daytime profiles can easily be selected for processing.
The final data products were chosen to be the instrument structure arrays at three levels of processing: L0, after data import with minor processing; L1, fully processed; and L2, which is L1 data binned on a common pressure record. The details of processing and the processing history for any given instrument profile are found in the corresponding structure array element. For convenience in (pseudocolor) plotting and visualization of an entire deployment of data, the structure array data are converted to a structure-of-arrays data product (reference needed) without explicitly using for loops. Sample plotting programs are included in the repository and can be used on a toy dataset, also provided, which can be downloaded from the repository.
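One way such a conversion can be done without an explicit for loop is by comma-separated list expansion of a structure array field; the sketch below uses synthetic data and hypothetical field names (MMP is the deployment-wide scalar structure discussed later):

```matlab
% Synthetic stand-in for a deployment of L2 (binned) profiles; field names are
% hypothetical. All profiles share the same binned pressure record p_grid.
p_grid = (10:0.25:500)';                        % common pressure bins [dbar]
for k = 1:40                                    % 40 synthetic profiles
    ctd_L2(k).pressure    = p_grid;
    ctd_L2(k).temperature = 12 - 0.02*p_grid + 0.1*randn(size(p_grid));
end

% Conversion to a structure of arrays without an explicit for loop:
% [ctd_L2.temperature] expands every element's field into one matrix whose
% columns are profiles, size [numel(p_grid) x numel(ctd_L2)].
MMP.pressure_grid = p_grid;
MMP.temperature   = [ctd_L2.temperature];

% Deployment-wide survey plot (profile number vs pressure):
pcolor(1:numel(ctd_L2), MMP.pressure_grid, MMP.temperature)
shading flat; axis ij; colorbar
xlabel('profile number'); ylabel('pressure [dbar]')
```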
The toolbox was written to be extensible: the toolbox subroutines themselves serve as templates, and a user can intercalate their own processing step(s) as appropriately placed subroutine(s) between those called in the Main programs. Such a user subroutine would take the appropriate structure array(s) as input and output, add additional inputs as needed, and modify the field data within the new subroutine while retaining the formats of the structure arrays' fields. Extending support to additional instrumentation, although possible, would be more involved.
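A sketch of what such an intercalated step might look like, assuming a hypothetical despiking task and a user-added threshold parameter in the metadata structure (neither is part of the toolbox):

```matlab
function ctd = user_despike_temperature(ctd, meta)
% Hypothetical user-added processing step: it takes and returns the CTD
% structure array so that it can be intercalated between the existing calls
% in a Main program without disturbing the fields' formats.
% Assumes meta carries a user-added despike_threshold parameter.
for k = 1:numel(ctd)
    t = ctd(k).temperature;
    spikes = abs(t - movmedian(t, 7)) > meta.despike_threshold;
    t(spikes) = NaN;                       % simple running-median despike
    ctd(k).temperature = t;
end
end
```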
Footnote: The toolbox also uses OOI data product algorithms (to which the authors contributed; https://github.com/oceanobservatories/ion-functions/tree/master/ion_functions/data) and the Gibbs-SeaWater (GSW) Oceanographic Toolbox (McDougall & Barker, 2011) to calculate standard oceanographic data products, as well as one function (Toolebox_shift.m) derived from the https://github.com/modscripps/MPproc repository. Two functions written by the author (RAD) using SeaSoft® algorithms (Seabird Scientific) are also used.
The resulting scalar structure of arrays MMP contains the original, unprocessed CTD and engineering data (level 0); processed data that have had, for example, calibration coefficients applied and adjustments made for thermal-lag, flow, and sensor time-constant effects (level 1); and final, level 2 data in which the level 1 data have been binned on pressure. The scalar structure of arrays ACM contains only level 2 acoustic current meter data because of the large amount of data contained in a typical deployment; the ACM level 0 and level 1 scalar structure-of-arrays variables are included in the saved acmMatFilename MAT-file. Additional data products can be accessed by loading the saved MAT-files, whose names are given in the respective second output arguments. These include the scalar structures of arrays discussed above and also the structure arrays, indexed by profile number, for each instrument and processing level. The structure arrays, besides containing profile data and a record of the functions used to generate them, enable efficient creation of the structure-of-arrays data products, which are more convenient for creating plots to survey an entire deployment of data variables.
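As a hedged sketch (the file and variable names are set at run time and may differ in detail), the saved products can be examined with standard Matlab calls:

```matlab
% acmMatFilename is returned as the second output argument of the main program.
S = load(acmMatFilename);   % MAT-file written during ACM processing
disp(fieldnames(S))         % lists the saved data products (e.g. L0/L1 scalar structures)
```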