Expand Export logging abilities (#1164)
* WIP: export fixes, mro issues

* WIP: Add one-time log of fetch/restr. Join issues

* ✅ : Export logging on join, update doc

* pre-commit mdformat

* Add flags to permit disable export log in fetch call

* Use cache to avoid dupe entries. Mod Notebook

* Incorporate feedback from @samuelbray32

* Balance parens

* Remove redundant parens

* Init table in join

* Update src/spyglass/utils/mixins/export.py

Co-authored-by: Samuel Bray <sam.bray@ucsf.edu>

* Include externals

* #1173

* Add hex-blob arg to mysqldump

* Revert pytest defaults

---------

Co-authored-by: Samuel Bray <sam.bray@ucsf.edu>
CBroz1 and samuelbray32 authored Nov 6, 2024
1 parent 6b4ff10 commit 92d9c35
Showing 20 changed files with 979 additions and 411 deletions.
8 changes: 6 additions & 2 deletions CHANGELOG.md
@@ -34,9 +34,10 @@ dj.FreeTable(dj.conn(), "common_session.session_group").drop()
- Remove stored hashes from pytests #1152
- Remove mambaforge from tests #1153
- Remove debug statement #1164
- Allow python < 3.13 #1169
- Add testing for python versions 3.9, 3.10, 3.11, 3.12 #1169
- Allow python \< 3.13 #1169
- Remove numpy version restriction #1169
- Add testing for python versions 3.9, 3.10, 3.11, 3.12 #1169
- Merge table delete removes orphaned master entries #1164

### Pipelines

@@ -45,6 +46,9 @@ dj.FreeTable(dj.conn(), "common_session.session_group").drop()
- Drop `SessionGroup` table #1106
- Improve electrodes import efficiency #1125
- Fix logger method call in `common_task` #1132
- Export fixes #1164
- Allow `get_abs_path` to add selection entry.
- Log restrictions and joins.

- Decoding

66 changes: 51 additions & 15 deletions docs/src/Features/Export.md
@@ -11,40 +11,76 @@ from only one project be shared during publication.

To export data with the current implementation, you must do the following:

- All custom tables must inherit from `SpyglassMixin` (e.g.,
`class MyTable(SpyglassMixin, dj.ManualOrOther):`)
- Only one export can be active at a time.
- All custom tables must inherit from either `SpyglassMixin` or `ExportMixin`
(e.g., `class MyTable(SpyglassMixin, dj.ManualOrOther):`)
- Only one export can be active at a time for a given Python instance.
- Start the export process with `ExportSelection.start_export()`, run all
functions associated with a given analysis, and end the export process with
`ExportSelection.end_export()`.

## How

The current implementation relies on two classes in the Spyglass package
(`SpyglassMixin` and `RestrGraph`) and the `Export` tables.
(`ExportMixin` and `RestrGraph`) and the `Export` tables.

- `SpyglassMixin`: See `spyglass/utils/dj_mixin.py`
- `ExportMixin`: See `spyglass/utils/mixins/export.py`
- `RestrGraph`: See `spyglass/utils/dj_graph.py`
- `Export`: See `spyglass/common/common_usage.py`

### Mixin

The `SpyglassMixin` class adds functionality to DataJoint tables. A subset of
The `ExportMixin` class adds functionality to DataJoint tables. A subset of
methods are used to set an environment variable, `SPYGLASS_EXPORT_ID`, and,
while active, intercept all `fetch`/`fetch_nwb` calls to tables. When `fetch` is
called, the mixin grabs the table name and the restriction applied to the table
and stores them in the `ExportSelection` part tables.
while active, intercept all `fetch`, `fetch_nwb`, `restrict` and `join` calls to
tables. When these functions are called, the mixin grabs the table name and the
restriction applied to the table and stores them in the `ExportSelection` part
tables.

<!-- TODO: Mention intercepting of restrict and join. -->

- `fetch_nwb` is specific to Spyglass and logs all analysis nwb files that are
fetched.
- `fetch` is a DataJoint method that retrieves data from a table.
- `restrict` is a DataJoint method that restricts a table to a subset of data,
typically using the `&` operator.
- `join` is a DataJoint method that joins two tables together, typically using
the `*` operator.

This is designed to capture any way that Spyglass is accessed, including
restricting one table via a join with another table. If this process seems to be
missing a way that Spyglass is accessed in your pipeline, please let us know.
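
The interception described above can be pictured with a small stand-alone sketch. It is not the Spyglass implementation: `ToyExportLog`, `ToyExportMixin` and `ToyTable` are hypothetical stand-ins, and only the environment variable name `SPYGLASS_EXPORT_ID` comes from the text. The sketch assumes the mixin sits ahead of the table class in the method resolution order, so its `fetch` runs first and then defers to the table's own `fetch`.

```python
import os

EXPORT_ENV_VAR = "SPYGLASS_EXPORT_ID"  # name taken from the text above


class ToyExportLog:
    """Hypothetical in-memory stand-in for the `ExportSelection` part tables."""

    def __init__(self):
        self.entries = []

    def insert(self, export_id, table_name, restriction):
        self.entries.append((export_id, table_name, restriction))


class ToyExportMixin:
    """Hypothetical mixin: logs table name and restriction on each fetch."""

    export_log = ToyExportLog()  # shared by every table that uses the mixin

    def _log_if_exporting(self):
        export_id = os.environ.get(EXPORT_ENV_VAR)
        if export_id:  # log only while an export is active
            self.export_log.insert(export_id, self.table_name, self.restriction)

    def fetch(self):
        self._log_if_exporting()
        return super().fetch()  # defer to the underlying table


class ToyTable:
    """Hypothetical stand-in for a DataJoint table."""

    table_name = "my_table"

    def __init__(self, rows, restriction="True"):
        self.rows = rows
        self.restriction = restriction

    def fetch(self):
        return list(self.rows)


class MyTable(ToyExportMixin, ToyTable):
    pass


os.environ.pop(EXPORT_ENV_VAR, None)
MyTable([1, 2]).fetch()  # no export active: nothing is logged

os.environ[EXPORT_ENV_VAR] = "7"  # an export starts
MyTable([1, 2], "a = 1").fetch()  # logged with the active export ID
del os.environ[EXPORT_ENV_VAR]  # the export ends

logged = ToyExportMixin.export_log.entries
```

Because the switch in this sketch is a process-wide environment variable, it also illustrates why only one export can be active at a time for a given Python instance.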

Note that logging all restrictions may log more than is necessary. For example,
`MyTable & restr1 & restr2` will log `MyTable & restr1` and `MyTable & restr2`,
despite returning the combined restriction. Logging will treat compound
restrictions as 'OR' instead of 'AND' statements. This can be avoided by
combining restrictions before using the `&` operator.

```python
MyTable & "a = b" & "c > 5" # Will capture 'a = b' OR 'c > 5'
MyTable & "a = b AND c > 5" # Will capture 'a = b AND c > 5'
MyTable & dj.AndList(["a = b", "c > 5"]) # Will capture 'a = b AND c > 5'
```
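
A toy model may help show why chained restrictions end up as 'OR'. This is a sketch under assumptions, not Spyglass code: `ToyLoggedTable` and `exported_where` are hypothetical, and the sketch assumes each `&` can log only the restriction it receives, with the export later selecting rows that match any logged entry.

```python
class ToyLoggedTable:
    """Hypothetical table that logs every restriction passed to `&`."""

    def __init__(self, name, restrictions=(), log=None):
        self.name = name
        self.restrictions = tuple(restrictions)
        self.log = [] if log is None else log  # shared across chained results

    def __and__(self, restriction):
        # Each `&` sees only its own restriction, so that is all it can log.
        self.log.append(restriction)
        return ToyLoggedTable(
            self.name, self.restrictions + (restriction,), self.log
        )


def exported_where(log):
    """Rows matching ANY logged restriction are exported, hence the OR."""
    return " OR ".join(f"({r})" for r in log)


chained = ToyLoggedTable("my_table")
_ = chained & "a = b" & "c > 5"  # two separate log entries
chained_where = exported_where(chained.log)

combined = ToyLoggedTable("my_table")
_ = combined & "a = b AND c > 5"  # one combined log entry
combined_where = exported_where(combined.log)
```

In this model the chained form yields `(a = b) OR (c > 5)`, while the combined form yields `(a = b AND c > 5)`, which matches the behavior described for the three lines above.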

If this process captures too much, you can either run a process with logging
disabled, or delete these entries from `ExportSelection` after the export is
logged.

Disabling logging with the `log_export` flag:

```python
MyTable().fetch(log_export=False)
MyTable().fetch_nwb(log_export=False)
MyTable().restrict(restr, log_export=False) # Instead of MyTable & restr
MyTable().join(Other, log_export=False) # Instead of MyTable * Other
```
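
The method forms are needed because Python operator syntax cannot carry keyword arguments. The following sketch uses a hypothetical `ToyTable` and assumes, for illustration only, that `&` simply forwards to `restrict` with the default flag value.

```python
class ToyTable:
    """Hypothetical table whose `restrict` accepts a `log_export` flag."""

    log = []  # hypothetical stand-in for the export log

    def __init__(self, restriction=None):
        self.restriction = restriction

    def restrict(self, restriction, log_export=True):
        if log_export:
            self.log.append(restriction)
        return ToyTable(restriction)

    def __and__(self, restriction):
        # Operator syntax has no slot for keyword arguments,
        # so `&` always takes the default and logs.
        return self.restrict(restriction)


ToyTable.log.clear()
ToyTable() & "a = 1"  # logged
ToyTable().restrict("b = 2", log_export=False)  # skipped
ToyTable().restrict("c = 3")  # logged
logged = list(ToyTable.log)
```

Under these assumptions, only the call that passes `log_export=False` is left out of the log.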

### Graph

The `RestrGraph` class uses DataJoint's networkx graph to store each of the
tables and restrictions intercepted by the `SpyglassMixin`'s `fetch` as
'leaves'. The class then cascades these restrictions up from each leaf to all
ancestors. Use is modeled in the methods of `ExportSelection`.
tables and restrictions intercepted by the `ExportMixin`'s `fetch` as 'leaves'.
The class then cascades these restrictions up from each leaf to all ancestors.
Use is modeled in the methods of `ExportSelection`.
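
The cascade idea can be sketched without DataJoint at all. The dictionary `PARENTS`, the table names, and the function `cascade_up` are all hypothetical; the sketch also assumes a restriction can be handed to an ancestor unchanged, whereas a real implementation would have to translate it into each ancestor's own keys.

```python
# Hypothetical child -> parents map; a real pipeline would read this from
# DataJoint's dependency graph rather than a hand-written dictionary.
PARENTS = {
    "session": [],
    "electrode": ["session"],
    "lfp": ["electrode"],
    "position": ["session"],
}


def cascade_up(leaves, parents=PARENTS):
    """Collect restrictions for each leaf and every ancestor of that leaf.

    In this toy version a restriction is passed upward unchanged.
    """
    restricted = {}
    stack = list(leaves.items())
    while stack:
        table, restriction = stack.pop()
        seen = restricted.setdefault(table, set())
        if restriction in seen:
            continue  # this table already carries the restriction
        seen.add(restriction)
        stack.extend((parent, restriction) for parent in parents[table])
    return restricted


result = cascade_up(
    {"lfp": "nwb_file = 'a.nwb'", "position": "nwb_file = 'b.nwb'"}
)
```

Here `session` ends up with both restrictions because it is an ancestor of both leaves, while `electrode` carries only the one that came up from `lfp`.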

```python
from spyglass.utils.dj_graph import RestrGraph
@@ -117,7 +153,7 @@ paper. Each shell script one `mysqldump` command per table.

To implement an export for a non-Spyglass database, you will need to ...

- Create a modified version of `SpyglassMixin`, including ...
- Create a modified version of `ExportMixin`, including ...
- `_export_table` method to lazy load an export table like `ExportSelection`
- `export_id` attribute, plus setter and deleter methods, to manage the status
of the export.
@@ -126,6 +162,6 @@ To implement an export for a non-Spyglass database, you will need to ...
`spyglass_version` to match the new database.

Or, optionally, you can use the `RestrGraph` class to cascade hand-picked tables
and restrictions without the background logging of `SpyglassMixin`. The
assembled list of restricted free tables, `RestrGraph.all_ft`, can be passed to
and restrictions without the background logging of `ExportMixin`. The assembled
list of restricted free tables, `RestrGraph.all_ft`, can be passed to
`Export.write_export` to generate a shell script for exporting the data.