Bump dj dependency to 0.14.2 #1081

Merged 8 commits on Sep 5, 2024

1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -10,6 +10,7 @@

- Disable populate transaction protection for long-populating tables #1066
- Add docstrings to all public methods #1076
- Update DataJoint to 0.14.2 #1081

### Pipelines

2 changes: 1 addition & 1 deletion CITATION.cff
@@ -166,5 +166,5 @@ keywords:
- spike sorting
- kachery
license: MIT
version: 0.5.2
version: 0.5.3
date-released: '2024-04-22'
110 changes: 58 additions & 52 deletions docs/src/Features/Mixin.md
@@ -5,8 +5,9 @@ functionalities that have been added to DataJoint tables. This includes...

- Fetching NWB files
- Long-distance restrictions.
- Delete functionality, including permission checks and part/master pairs
- Permission checks on delete
- Export logging. See [export doc](./Export.md) for more information.
- Miscellaneous helper functions

To add this functionality to your own tables, simply inherit from the mixin:

@@ -102,12 +103,7 @@ my_table << upstream_restriction >> downstream_restriction
When providing a restriction of the parent, use 'up' direction. When providing a
restriction of the child, use 'down' direction.

## Delete Functionality

The mixin overrides the default `delete` function to provide two additional
features.

### Permission Checks
## Delete Permission Checks

By default, DataJoint is unable to set delete permissions on a per-table basis.
If a user is able to delete entries in a given table, she can delete entries in
@@ -127,66 +123,76 @@ curcumvent the default permission checks by adding themselves to the relevant
team or removing the mixin from the class declaration. However, it provides a
reasonable level of security for the average user.

### Master/Part Pairs
Because parts of this process rely on caching, this process will be faster if
you assign the instanced table to a variable.

By default, DataJoint has protections in place to prevent deletion of a part
entry without deleting the corresponding master. This is useful for enforcing
the custom of adding/removing all parts of a master at once and avoids orphaned
masters, or null entry masters without matching data.
```python
# Slower
YourTable().delete()
YourTable().delete()

For [Merge tables](./Merge.md), this is a significant problem. If a user wants
to delete all entries associated with a given session, she must find all part
table entries, including Merge tables, and delete them in the correct order. The
mixin provides a function, `delete_downstream_parts`, to handle this, which is
run by default when calling `delete`.
# Faster
nwbfile = YourTable()
nwbfile.delete()
nwbfile.delete()
```

`delete_downstream_parts`, also aliased as `ddp`, identifies all part tables
with foreign key references downstream of where it is called. If `dry_run=True`,
it will return a list of entries that would be deleted, otherwise it will delete
them.
<details><summary>Deprecated delete feature</summary>

Importantly, `delete_downstream_parts` cannot properly interact with tables that
have not been imported into the current namespace. If you are having trouble
with part deletion errors, import the offending table and rerun the function
with `reload_cache=True`.
Previous versions of Spyglass also deleted masters of parts with foreign key
references. This functionality has been migrated to DataJoint in version 0.14.2
via the `force_masters` delete argument. This argument is `True` by default in
Spyglass tables.

```python
import datajoint as dj
from spyglass.common import Nwbfile
</details>

restricted_nwbfile = Nwbfile() & "nwb_file_name LIKE 'Name%'"
## Populate Calls

vanilla_dj_table = dj.FreeTable(dj.conn(), Nwbfile.full_table_name)
vanilla_dj_table.delete()
# DataJointError("Attempt to delete part table MyMerge.Part before ... ")
The mixin also overrides the default `populate` function to provide additional
functionality for non-daemon process pools and disabling transaction protection.

restricted_nwbfile.delete()
# [WARNING] Spyglass: No part deletes found w/ Nwbfile ...
# OR
# ValueError("Please import MyMerge and try again.")
### Non-Daemon Process Pools

from spyglass.example import MyMerge
To allow the `make` function to spawn a new process pool, the mixin overrides
the default `populate` function for tables with `_parallel_make` set to `True`.
See [issue #1000](https://github.com/LorenFrankLab/spyglass/issues/1000) and
[PR #1001](https://github.com/LorenFrankLab/spyglass/pull/1001) for more
information.

restricted_nwbfile.delete_downstream_parts(reload_cache=True, dry_run=False)
```
### Disable Transaction Protection

By default, DataJoint wraps the `populate` function in a transaction to ensure
data integrity (see
[Transactions](https://docs.datajoint.io/python/definition/05-Transactions.html)).

Because each table keeps a cache of downstream merge tables, it is important to
reload the cache if the table has been imported after the cache was created.
Speed gains can also be achieved by avoiding re-instancing the table each time.
This can cause issues when populating large tables if another user attempts to
declare/modify a table while the transaction is open (see
[issue #1030](https://github.com/LorenFrankLab/spyglass/issues/1030) and
[DataJoint issue #1170](https://github.com/datajoint/datajoint-python/issues/1170)).

```python
# Slow
from spyglass.common import Nwbfile
Tables with `_use_transaction` set to `False` will not be wrapped in a
transaction when calling `populate`. Transaction protection is replaced by a
hash of upstream data to ensure no changes are made to the table during the
unprotected populate. The additional time required to hash the data is a
trade-off for already time-consuming populates, but avoids blocking other users.
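
The hash-based check described here can be pictured with a minimal sketch. This is not the Spyglass implementation: `hash_rows` and `populate_unprotected` are hypothetical names, and the upstream data is stood in by a list of plain dicts rather than DataJoint tables.

```python
import hashlib
import json


def hash_rows(rows):
    """Return a deterministic SHA-256 digest for a list of dict rows."""
    # Serialize each row with sorted keys, then sort the rows themselves,
    # so neither key order nor row order affects the digest.
    canonical = sorted(
        json.dumps(row, sort_keys=True, default=str) for row in rows
    )
    return hashlib.sha256("\n".join(canonical).encode()).hexdigest()


def populate_unprotected(fetch_upstream, compute, insert):
    """Compute and insert a result, guarded by a hash instead of a transaction.

    All three arguments are hypothetical callables used for illustration:
    fetch_upstream returns the current upstream rows, compute runs the
    long computation, and insert stores the result.
    """
    before = hash_rows(fetch_upstream())
    result = compute()  # long-running step; no transaction is held open
    after = hash_rows(fetch_upstream())
    if before != after:
        raise RuntimeError("Upstream data changed during populate; aborting.")
    insert(result)
    return result
```

The cost is hashing the upstream data twice, which is small next to a populate that already takes a long time.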

(Nwbfile() & "nwb_file_name LIKE 'Name%'").ddp(dry_run=False)
(Nwbfile() & "nwb_file_name LIKE 'Other%'").ddp(dry_run=False)
## Miscellaneous Helper functions

# Faster
from spyglass.common import Nwbfile
`file_like` allows you to restrict a table using a substring of a file name.
This is equivalent to the following:

nwbfile = Nwbfile()
(nwbfile & "nwb_file_name LIKE 'Name%'").ddp(dry_run=False)
(nwbfile & "nwb_file_name LIKE 'Other%'").ddp(dry_run=False)
```python
MyTable().file_like("eg")
MyTable() & ('nwb_file_name LIKE "%eg%" OR analysis_file_name LIKE "%eg%"')
```
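
As a rough illustration of how such a restriction string could be assembled, the sketch below builds the same `LIKE` clause from a substring. `file_like_restriction` is a hypothetical helper, not part of Spyglass, and it assumes both file-name fields are present in the table; it also does no escaping of quotes or wildcards in `name`.

```python
def file_like_restriction(name, fields=("nwb_file_name", "analysis_file_name")):
    """Build a SQL restriction matching `name` as a substring of any field."""
    # One LIKE clause per field, joined with OR.
    clauses = [f'{field} LIKE "%{name}%"' for field in fields]
    return " OR ".join(clauses)


restriction = file_like_restriction("eg")
```

The resulting string can be passed to the `&` operator in the same way as the hand-written restriction above.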

`find_insert_fail` is a helper function to find the cause of an `IntegrityError`
when inserting into a table. This checks parent tables for required keys.

```python
my_key = {"key": "value"}
MyTable().insert1(my_key) # Raises IntegrityError
MyTable().find_insert_fail(my_key) # Shows the parent(s) missing the key
```
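
The parent check can be sketched without a database. The example below mirrors the manual loop this helper replaces in the notebooks (restrict each parent by the fields of the key it shares), but it is only an assumption about the logic: `find_missing_parents` is a hypothetical function, and each parent is stood in by a tuple of field names plus a list of dict rows.

```python
def find_missing_parents(parents, key):
    """Return the names of parents with no row matching the relevant key fields.

    parents: dict mapping a parent name to (field_names, rows), a stand-in
             for real tables used only for this illustration.
    key: the dict that failed to insert.
    """
    missing = []
    for name, (fields, rows) in parents.items():
        # Keep only the part of the key that this parent knows about.
        parent_key = {k: v for k, v in key.items() if k in fields}
        has_match = any(
            all(row.get(k) == v for k, v in parent_key.items()) for row in rows
        )
        if not has_match:
            missing.append(name)
    return missing


parents = {
    "Session": (("session_id",), [{"session_id": 1}]),
    "Subject": (("subject_id",), [{"subject_id": "a"}]),
}
missing = find_missing_parents(parents, {"session_id": 1, "subject_id": "b"})
```

Here `missing` names the parent that needs an entry before the insert can succeed.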

## Populate Calls
5 changes: 1 addition & 4 deletions notebooks/01_Concepts.ipynb
@@ -71,10 +71,7 @@
"```python\n",
"my_key = dict(value=key) # whatever you're inserting\n",
"MyTable.insert1(my_key) # error here\n",
"parents = MyTable.parents(as_objects=True) # get the parents as FreeTables\n",
"for parent in parents: # iterate through the parents, with only relevant fields\n",
" parent_key = {k: v for k, v in my_key.items() if k in parent.heading.names}\n",
" print(parent & parent_key) # restricted parent\n",
"parents = MyTable().find_insert_fail(my_key)\n",
"```\n",
"\n",
"If any of the printed tables are empty, you know you need to insert into that\n",
5 changes: 1 addition & 4 deletions notebooks/py_scripts/01_Concepts.py
@@ -60,10 +60,7 @@
# ```python
# my_key = dict(value=key) # whatever you're inserting
# MyTable.insert1(my_key) # error here
# parents = MyTable.parents(as_objects=True) # get the parents as FreeTables
# for parent in parents: # iterate through the parents, with only relevant fields
# parent_key = {k: v for k, v in my_key.items() if k in parent.heading.names}
# print(parent & parent_key) # restricted parent
# parents = MyTable().find_insert_fail(my_key)
# ```
#
# If any of the printed tables are empty, you know you need to insert into that
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -39,7 +39,7 @@ dependencies = [
"black[jupyter]",
"bottleneck",
"dask",
"datajoint>=0.13.6",
"datajoint>=0.14.2",
# "ghostipy", # removed from list bc M1 users need to install pyfftw first
"hdmf>=3.4.6",
"ipympl",
2 changes: 1 addition & 1 deletion src/spyglass/utils/dj_merge_tables.py
@@ -29,7 +29,7 @@ def is_merge_table(table):
def trim_def(definition):
return re_sub(
r"\n\s*\n", "\n", re_sub(r"#.*\n", "\n", definition.strip())
)
).replace(" ", "")

if isinstance(table, str):
table = dj.FreeTable(dj.conn(), table)
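
The effect of the added `.replace(" ", "")` can be checked in isolation. The sketch below restates `trim_def` as a standalone function with the same two substitutions; the sample definitions are made up for illustration, and the reading that the change lets definitions differing only in spacing compare equal is an assumption rather than something stated in the PR.

```python
from re import sub as re_sub


def trim_def(definition):
    """Normalize a table definition: drop comments, blank lines, and spaces."""
    # Remove everything from '#' to the end of the line.
    no_comments = re_sub(r"#.*\n", "\n", definition.strip())
    # Collapse blank (or whitespace-only) lines.
    no_blank_lines = re_sub(r"\n\s*\n", "\n", no_comments)
    # Strip all spaces so spacing differences do not matter.
    return no_blank_lines.replace(" ", "")


# Hypothetical definitions that differ only in spacing, comments, blank lines.
compact = "merge_id: uuid\n---\nsource: varchar(32)"
spaced = "\n  merge_id : uuid  # comment\n\n  ---\n  source : varchar(32)\n"
same = trim_def(compact) == trim_def(spaced)
```

Note that a comment on the final line with no trailing newline is not removed, because the comment pattern requires a newline after it.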