Closed
29 commits
ebaca0f
Add placeholder LRAUV_WORKFLOW.md file and rename original to DORADO_…
MBARIMike Oct 9, 2025
b2ced0d
Update link to DORADO_WORKFLOW.md.
MBARIMike Oct 9, 2025
9425692
Add runner for nc42netcdfs.py using a known_hash.
MBARIMike Oct 11, 2025
dc864cb
Update for supporting LRAUV data processing.
MBARIMike Oct 11, 2025
262d543
WIP: Initial commit
MBARIMike Oct 11, 2025
44e2558
Add extract_groups_to_files_netcdf4() as xarray fails on garbled data.
MBARIMike Oct 14, 2025
6f07788
Save variables from / into Universals.nc.
MBARIMike Oct 14, 2025
649a842
Add test for bad data to "1.3 - nc42netcdfs".
MBARIMike Oct 14, 2025
2340c5c
WIP: Working out lrauv data processing workflow.
MBARIMike Oct 14, 2025
bde0115
WIP: Copy of calibrate.py - rework to combine lrauv data.
MBARIMike Oct 14, 2025
d749bf9
Refactor extract_groups_to_files_netcdf4() into more readable methods.
MBARIMike Oct 15, 2025
405d8bc
WIP: Begin changing for use with LRAUV data.
MBARIMike Oct 15, 2025
b285895
Factor nudge_positions() out of calibrate.py so that combine.py can u…
MBARIMike Oct 15, 2025
7efef8d
WIP: Initial attempt at a process_lrauv.py module.
MBARIMike Oct 16, 2025
1de36bb
Simplify calling methods with just log_file, save using <stem>_Group …
MBARIMike Oct 21, 2025
2c7575a
Add test for process_lrauv.py
MBARIMike Oct 21, 2025
6abcb69
Call process_log_files(), signifying that these are LRAUV data.
MBARIMike Oct 21, 2025
9aecc2a
Implement first and "last" steps in process.py for LRAUV data.
MBARIMike Oct 21, 2025
d805f97
Update EXPECTED_SIZE_GITHUB values.
MBARIMike Oct 21, 2025
886728e
Update EXPECTED_MD5_GITHUB value.
MBARIMike Oct 21, 2025
d97165a
Utility script for seafloor mapping in Monterey Bay in lieu of using …
MBARIMike Oct 22, 2025
7f85fd1
Fixes for ruff and add installation instructions.
MBARIMike Oct 22, 2025
80450c7
Add gsw as we need to migrate from seawater.
MBARIMike Oct 22, 2025
dbbff83
Remove obvious methods that dealt with Dorado log and sensor files.
MBARIMike Oct 22, 2025
77da251
Add netcdf4 dependency to the documentation.
MBARIMike Oct 22, 2025
dd0cc6c
Implement --filter_monotonic_time as a base level QC step.
MBARIMike Oct 23, 2025
e135177
Resolution of Issue writen by Claude Sonnet 4
MBARIMike Oct 28, 2025
c2ef308
Add "2.2 - combine.py"
MBARIMike Oct 28, 2025
b5dd22f
WIP on combine.py. Add GROUP literal for globbing of Group .nc files.
MBARIMike Oct 28, 2025
35 changes: 34 additions & 1 deletion .vscode/launch.json
Original file line number Diff line number Diff line change
@@ -39,13 +39,24 @@
"args": ["-v", "1", "-d", "0", "-i", "data/auv_data/dorado/missionlogs/2009.055.05/lopc.bin", "-n", "data/auv_data/dorado/missionnetcdfs/2009.055.05/lopc.nc", "-f", "--LargeCopepod_AIcrit", "0.3"]
},
{
"name": "1.1 - correct_log_times.py --mission 2017.284.00 --auv_name Dorado389",
"name": "1.2 - correct_log_times.py --mission 2017.284.00 --auv_name Dorado389",
"type": "debugpy",
"request": "launch",
"program": "${workspaceFolder}/src/data/correct_log_times.py",
"console": "integratedTerminal",
"args": ["--auv_name", "Dorado389", "--mission", "2017.284.00", "-v", "2"]
},
{
"name": "1.3 - nc42netcdfs",
"type": "debugpy",
"request": "launch",
"program": "${workspaceFolder}/src/data/nc42netcdfs.py",
"console": "integratedTerminal",
// A small log_file that has a reasonable amount of data, and known_hash to verify download
//"args": ["-v", "1", "--log_file", "ahi/missionlogs/2025/20250908_20250912/20250911T201546/202509112015_202509112115.nc4", "--known_hash", "d1235ead55023bea05e9841465d54a45dfab007a283320322e28b84438fb8a85"]
// Has bad latitude and longitude values
"args": ["-v", "1", "--log_file", "brizo/missionlogs/2025/20250909_20250915/20250914T080941/202509140809_202509150109.nc4"]
},
{
"name": "2.0 - calibrate.py",
"type": "debugpy",
@@ -92,6 +103,16 @@
"program": "${workspaceFolder}/src/data/hs2_proc.py",
"console": "integratedTerminal",
},

{
"name": "2.2 - combine.py",
"type": "debugpy",
"request": "launch",
"program": "${workspaceFolder}/src/data/combine.py",
"console": "integratedTerminal",
"justMyCode": false,
"args": ["-v", "1", "--log_file", "brizo/missionlogs/2025/20250909_20250915/20250914T080941/202509140809_202509150109.nc4"]
},
{
"name": "3.0 - align.py",
"type": "debugpy",
@@ -282,5 +303,17 @@
"console": "integratedTerminal",
"args": ["-v", "1", "--noinput", "--no_cleanup", "--download", "--mission", "2011.256.02"]
},
{
"name": "process_lrauv",
"type": "debugpy",
"request": "launch",
"program": "${workspaceFolder}/src/data/process_lrauv.py",
"console": "integratedTerminal",
//"args": ["-v", "1", "--log_file", "brizo/missionlogs/2025/20250909_20250915/20250914T080941/202509140809_202509150109.nc4"]
//"args": ["-v", "2", "--log_file", "brizo/missionlogs/2025/20250909_20250915/20250914T080941/202509140809_202509150109.nc4", "--clobber"]
"args": ["-v", "2", "--log_file", "brizo/missionlogs/2025/20250909_20250915/20250914T080941/202509140809_202509150109.nc4", "--clobber", "--no_cleanup"]
//"args": ["-v", "1", "--auv_name", "tethys", "--start", "20120901", "--end", "20121101", "--noinput"]
},

]
}
6 changes: 3 additions & 3 deletions WORKFLOW.md → DORADO_WORKFLOW.md
@@ -1,6 +1,6 @@
## Data Workflow
## Dorado Data Workflow

The sequence of steps to process data is as follows:
The sequence of steps to process Dorado data is as follows:

logs2netcdfs.py → calibrate.py → align.py → resample.py → archive.py → plot.py

@@ -70,6 +70,6 @@ on the local file system's work directory is as follows:

archive.py
Copy the netCDF files to the archive directory. The archive directory
is initally in the AUVCTD share on atlas which is shared with the
is initially in the AUVCTD share on atlas which is shared with the
data from the Dorado Gulper vehicle, but can also be on the M3 share
on thalassa near the original log data.
68 changes: 68 additions & 0 deletions LRAUV_WORKFLOW.md
@@ -0,0 +1,68 @@
## LRAUV Data Workflow

The sequence of steps to process LRAUV data is as follows:

nc42netcdfs.py → combine.py → align.py → resample.py → archive.py → plot.py

Details of each step are described in the respective scripts and in the
description of output netCDF files below. The output file directory structure
on the local file system's work directory is as follows:

├── data
│   ├── lrauv_data
│   │   ├── <auv_name>            <- e.g.: ahi, brizo, pontus, tethys, ...
│   │   │   ├── missionlogs/year/dlist_dir
│   │   │   │   ├── <log_dir>     <- e.g.: ahi/missionlogs/2025/20250908_20250912/20250911T201546/202509112015_202509112115.nc4
│   │   │   │   │   ├── <nc4>     <- .nc4 file containing the original data
│   │   │   │   │   ├── <nc>      <- .nc files, one for each group from the
│   │   │   │   │   │                .nc4 file; data identical to the original,
│   │   │   │   │   │                in NETCDF4 format
│   │   │   │   │   ├── <_cal>    <- a single NETCDF3 .nc file containing all
│   │   │   │   │   │                the variables from the .nc files along with
│   │   │   │   │   │                nudged latitudes and longitudes - created
│   │   │   │   │   │                by combine.py
│   │   │   │   │   ├── <_align>  <- .nc file with all measurement variables
│   │   │   │   │   │                having associated coordinate variables at
│   │   │   │   │   │                the original instrument sampling rate -
│   │   │   │   │   │                created by align.py
│   │   │   │   │   ├── <_nS>     <- .nc file with all measurement variables
│   │   │   │   │                    resampled to a common time grid at n-second
│   │   │   │   │                    intervals - created by resample.py

nc42netcdfs.py
Extract the groups and the variables we want from the groups into
individual .nc files. These data are saved in NETCDF4 format because
the groups contain multiple unlimited dimensions, which NETCDF3 does not allow.
The data in the .nc files are identical to what is in the .nc4 groups.

combine.py
Apply calibration coefficients to the original data. The calibrated data
are written to a new netCDF file in the missionnetcdfs/<mission>
directory ending with _cal.nc. This step also includes nudging the
underwater portions of the navigation positions to the GPS fixes
done at the surface and applying pitch corrections to the sensor
depth for those sensors (instruments) for which offset values are
specified in SensorInfo. Some minimal QC is done in this step, namely
removal of non-monotonic times. The record variables in the netCDF
file have only their original coordinate, time, associated with
them.
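The non-monotonic-time removal can be expressed as a mask against the running maximum of the timestamps; this is a sketch of the idea, not the actual --filter_monotonic_time implementation:

```python
import numpy as np


def monotonic_time_mask(times):
    """Boolean mask keeping only samples whose timestamp strictly exceeds
    every earlier timestamp, dropping repeats and backward time steps."""
    t = np.asarray(times, dtype=float)
    if t.size == 0:
        return np.zeros(0, dtype=bool)
    running_max = np.maximum.accumulate(t)
    mask = np.empty(t.size, dtype=bool)
    mask[0] = True
    # A sample survives if it is greater than the running maximum so far
    mask[1:] = t[1:] > running_max[:-1]
    return mask
```

Applying the mask to the time coordinate and all record variables of an instrument leaves a strictly increasing time axis, which downstream interpolation requires.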

align.py
Interpolate corrected lat/lon variables to the original sampling
intervals for each instrument's record variables. This format is
analogous to the .nc4 files produced by the LRAUV unserialize
process. These are the best files to use for the highest temporal
resolution of the data. Unlike the .nc4 files, align.py's output files
use a naming convention, rather than netCDF4 groups, for each instrument.
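Interpolating the corrected positions onto each instrument's own sample times can be sketched with numpy.interp (hypothetical helper; the real align.py has its own interface):

```python
import numpy as np


def align_position(inst_time, nav_time, nav_lat, nav_lon):
    """Interpolate nudged navigation latitude/longitude onto an
    instrument's sample times (illustrative, not align.py's API)."""
    # np.interp requires nav_time to be increasing -- here provided by
    # the monotonic-time QC applied in the combine step
    lat = np.interp(inst_time, nav_time, nav_lat)
    lon = np.interp(inst_time, nav_time, nav_lon)
    return lat, lon
```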

resample.py
Produce a netCDF file with all of the instrument's record variables
resampled to the same temporal interval. The coordinate variables are
also resampled to the same temporal interval and named with standard
depth, latitude, and longitude names. These are the best files to
use for loading data into STOQS and for analyses requiring all the
data to be on the same spatiotemporal grid.
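Resampling one record variable onto a common n-second grid can be sketched as pandas bin-averaging (the aggregation method here is an assumption; resample.py may average, interpolate, or filter differently):

```python
import numpy as np
import pandas as pd


def resample_variable(time_s, values, interval_s=10):
    """Bin-average a record variable onto a regular n-second time grid
    (sketch; not the actual resample.py implementation)."""
    index = pd.to_datetime(np.asarray(time_s, dtype=float), unit="s")
    series = pd.Series(np.asarray(values, dtype=float), index=index)
    # mean() of each n-second bin; the coordinate variables (depth,
    # latitude, longitude) would be resampled onto the same grid
    return series.resample(f"{interval_s}s").mean()
```

Resampling every instrument's variables with the same interval puts them all on one shared time axis.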

archive.py
Copy the netCDF files to the archive directory. The archive directory
is initially in the AUVCTD share on atlas which is shared with the
data from the Dorado Gulper vehicle, but can also be on the M3 share
on thalassa near the original log data.
2 changes: 1 addition & 1 deletion README.md
@@ -60,7 +60,7 @@ print out the usage information for each of the processing scripts:
uv run src/data/process_i2map.py --help
uv run src/data/process_dorado.py --help

See [WORKFLOW.md](WORKFLOW.md) for more details on the data processing workflow.
See [DORADO_WORKFLOW.md](DORADO_WORKFLOW.md) for more details on the data processing workflow.

### Jupyter Notebooks ###
To run the Jupyter Notebooks, start Jupyter Lab at the command line with:
2 changes: 1 addition & 1 deletion TROUBLESHOOTING.md
@@ -14,7 +14,7 @@ and make sure that it's the only entry in "process_dorado" that is uncommented.

2. From VS Code's Run and Debug panel select "process_dorado" and click the green Start Debugging play button. For data to be copied from the archive the smb://atlas.shore.mbari.org/AUVCTD share must be mounted on your computer. Primary development is done in MacOS where the local mount point is /Volumes. Archive volumes are hard-coded as literals in [src/data/process_dorado.py](https://github.com/mbari-org/auv-python/blob/fc3b58613761b295ab47907993c4d0eb0bceb197/src/data/process_dorado.py) and [src/data/process_i2map.py](https://github.com/mbari-org/auv-python/blob/fc3b58613761b295ab47907993c4d0eb0bceb197/src/data/process_i2map.py). These should be changed if you mount these volumes at a different location.

3. Mission log data will copied to your `auv-python/data/auv_data/` directory into subdirectories organized by vehicle name, mission, and processing step. Data will be processed as described in [WORKFLOW.md](WORKFLOW.md). A typical mission takes about 10 minutes to process.
3. Mission log data will be copied to your `auv-python/data/auv_data/` directory into subdirectories organized by vehicle name, mission, and processing step. Data will be processed as described in [DORADO_WORKFLOW.md](DORADO_WORKFLOW.md). A typical mission takes about 10 minutes to process.

4. After all of the intermediate files are created any step of the workflow may be executed and debugged in VS Code. The `.vscode\launch.json` file has several example entries that can be modified for specific debugging purposes via the menu in the Run and Debug panel.

3 changes: 2 additions & 1 deletion notebooks/README.md
@@ -1,5 +1,6 @@
The Notebooks in this directory are intended to be used to examine the data
generated by each of the steps described in the [workflow]("../WORKFLOW.md"):
generated by each of the steps described in the [Dorado](../DORADO_WORKFLOW.md)
or [LRAUV](../LRAUV_WORKFLOW.md) WORKFLOW documents:

logs2netcdfs.py → calibrate.py → align.py → resample.py → archive.py → <ML operations & analysis>
1.x 2.x 3.x 4.x 5.x 6.x
1 change: 1 addition & 0 deletions pyproject.toml
@@ -14,6 +14,7 @@ dependencies = [
"datashader>=0.18.1",
"defusedxml>=0.7.1",
"gitpython>=3.1.44",
"gsw>=3.6.20",
"hvplot>=0.11.3",
"ipympl>=0.9.7",
"jupyter>=1.1.1",