
redundant time-consuming insertion of electrodes during import #1121

Closed
magland opened this issue Sep 19, 2024 · 8 comments · Fixed by #1125

Comments

@magland
Contributor

magland commented Sep 19, 2024

I am finding that the most time-consuming part of importing a new session is the importing of all the electrodes. In my NWB file there are 32 electrode groups, each with 4 electrodes, for a total of 128 electrodes. However, what seems to be happening is that the 128 electrodes get inserted many times (maybe 32 times, once for each group?). I see skip_duplicates=True, so in the end only 128 electrodes are inserted, but this ends up taking > 15 minutes on my system, and it seems to be ~32 times as much work as it should be.

Here's the code that seems to be getting executed once for every electrode group:

for elect_id, elect_data in electrodes.iterrows():
    key.update(
        {
            "electrode_id": elect_id,
            "name": str(elect_id),
            "electrode_group_name": elect_data.group_name,
            "region_id": BrainRegion.fetch_add(
                region_name=elect_data.group.location
            ),
            "x": elect_data.get("x"),
            "y": elect_data.get("y"),
            "z": elect_data.get("z"),
            "filtering": elect_data.get("filtering", "unfiltered"),
            "impedance": elect_data.get("imp"),
            **electrode_constants,
        }
    )
    # rough check of whether the electrodes table was created by
    # rec_to_nwb and has the appropriate custom columns used by
    # rec_to_nwb
    # TODO this could be better resolved by making an extension for the
    # electrodes table
    extra_cols = [
        "probe_shank",
        "probe_electrode",
        "bad_channel",
        "ref_elect_id",
    ]
    if isinstance(
        elect_data.group.device, ndx_franklab_novela.Probe
    ) and all(col in elect_data for col in extra_cols):
        key.update(
            {
                "probe_id": elect_data.group.device.probe_type,
                "probe_shank": elect_data.probe_shank,
                "probe_electrode": elect_data.probe_electrode,
                "bad_channel": (
                    "True" if elect_data.bad_channel else "False"
                ),
                "original_reference_electrode": elect_data.ref_elect_id,
            }
        )
    else:
        logger.warning(
            "Electrode did not match expected novela format.\nPlease "
            + f"ensure the following in YAML config: {extra_cols}."
        )
    # override with information from the config YAML based on primary
    # key (electrode id)
    if elect_id in electrode_config_dicts:
        # check whether the Probe.Electrode being referenced exists
        query = Probe.Electrode & electrode_config_dicts[elect_id]
        if len(query) == 0:
            warnings.warn(
                "No Probe.Electrode exists that matches the data: "
                + f"{electrode_config_dicts[elect_id]}. "
                "The config YAML for Electrode with electrode_id "
                + f"{elect_id} will be ignored."
            )
        else:
            key.update(electrode_config_dicts[elect_id])
    electrode_inserts.append(key.copy())
self.insert(
    electrode_inserts,
    skip_duplicates=True,
    allow_direct_insert=True,  # for no_transaction, pop_all_common
)

@samuelbray32
Collaborator

I see. Each make function is running on all entries in the NWB file:

electrodes = nwbf.electrodes.to_dataframe()

However, the key source of the table is specific to the electrode group, so the make function gets called once per group in the single_transaction_make function here:

key_source = getattr(table, "key_source", None)

I'm a little surprised that each call is taking so long. Do you know what part of the Electrode make function is taking the time?
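
As a rough, hypothetical sketch (not the current spyglass code, and assuming the make key carries electrode_group_name), each per-group call could in principle filter the dataframe down to its own group before looping, so the full 128-row table wouldn't be re-processed once per group:

# hypothetical sketch: restrict the NWB electrode table to the group in `key`
electrodes = nwbf.electrodes.to_dataframe()
group_electrodes = electrodes[
    electrodes["group_name"] == key["electrode_group_name"]
]
for elect_id, elect_data in group_electrodes.iterrows():
    ...  # build and insert rows for this group's electrodes only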

@magland
Contributor Author

magland commented Sep 19, 2024

(disregard my deleted comment... investigating some more to answer your question)

@magland
Contributor Author

magland commented Sep 19, 2024

I think it might be this:

if elect_id in electrode_config_dicts:
    # check whether the Probe.Electrode being referenced exists
    query = Probe.Electrode & electrode_config_dicts[elect_id]
    if len(query) == 0:
        warnings.warn(
            "No Probe.Electrode exists that matches the data: "
            + f"{electrode_config_dicts[elect_id]}. "
            "The config YAML for Electrode with electrode_id "
            + f"{elect_id} will be ignored."
        )
    else:
        key.update(electrode_config_dicts[elect_id])

because it's making a query to the database at each iteration. Would that make sense? Maybe this check is not needed?
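
For what it's worth, a rough, hypothetical sketch of how that check could be batched (untested, and assuming the config dicts map onto Probe.Electrode attributes): DataJoint treats a list of dict restrictions as an OR, so the existence checks could be collapsed into one round trip per make call and resolved in memory afterwards.

# hypothetical sketch: one combined query instead of one per electrode
configs = list(electrode_config_dicts.values())
matched_keys = (Probe.Electrode & configs).fetch("KEY") if configs else []
# ...then check each electrode's config against matched_keys in memory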

@magland
Contributor Author

magland commented Sep 19, 2024

actually it doesn't seem to be that. Investigating further....

@magland
Contributor Author

magland commented Sep 19, 2024

Okay I'm pretty sure that the slow part is the BrainRegion.fetch_add(...) in the following snippet.

key.update(
    {
        "electrode_id": elect_id,
        "name": str(elect_id),
        "electrode_group_name": elect_data.group_name,
        "region_id": BrainRegion.fetch_add(
            region_name=elect_data.group.location
        ),
        "x": elect_data.get("x"),
        "y": elect_data.get("y"),
        "z": elect_data.get("z"),
        "filtering": elect_data.get("filtering", "unfiltered"),
        "impedance": elect_data.get("imp"),
        **electrode_constants,
    }
)

Can anything be done to avoid that?
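
For example, would something like this make sense? (A hypothetical sketch, not a tested change.) Since many electrodes share a location, the lookup could be memoized so fetch_add runs once per unique region name instead of once per electrode:

# hypothetical sketch: memoize region lookups per unique location
region_ids = {}

def get_region_id(region_name):
    if region_name not in region_ids:
        region_ids[region_name] = BrainRegion.fetch_add(region_name=region_name)
    return region_ids[region_name]

# inside the loop, the dict entry would become:
#     "region_id": get_region_id(elect_data.group.location),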

@samuelbray32
Collaborator

It looks like fetch_add is just making a quick query to check the database once the region is inserted, which shouldn't be slow.

We could try to reduce the number of calls here, but in general spyglass makes many queries to the database. If the latency is high enough that this is slowing you down, you'll likely notice it in a bunch of other spots too. It might be more productive to figure out the source of the slowdown in your database queries.
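
If it helps, here's a rough sketch for ballparking the per-query round trip (assuming a configured DataJoint connection):

import time

import datajoint as dj

conn = dj.conn()  # uses your configured DataJoint credentials
n = 20
t0 = time.perf_counter()
for _ in range(n):
    conn.query("SELECT 1").fetchall()  # trivial query, dominated by round-trip time
print(f"~{(time.perf_counter() - t0) / n * 1e3:.1f} ms per round trip")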

@magland
Contributor Author

magland commented Sep 19, 2024

Makes sense. Okay, I'll investigate what our expected latency is.

I think we must be experiencing something like 200 milliseconds of round-trip latency, because that would give

0.2 sec * 128 * 32 ≈ 14 minutes

But even at an extremely good 30 msec round-trip latency this would be ~2 minutes, whereas if it didn't do the 32x redundant work it would be ~4 seconds.
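
Spelled out, just to double-check the arithmetic:

# back-of-the-envelope: latency * electrodes * groups vs. latency * electrodes
for latency in (0.2, 0.03):
    redundant = latency * 128 * 32  # ~819 s (~14 min) at 0.2 s; ~123 s (~2 min) at 0.03 s
    once_only = latency * 128       # ~26 s at 0.2 s; ~4 s at 0.03 s
    print(f"{latency * 1e3:.0f} ms: {redundant / 60:.1f} min vs {once_only:.0f} s")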

So, given that it makes a network request at every iteration, do you think this could be brought down to 128 requests instead of 128 x 32?

@magland
Contributor Author

magland commented Sep 19, 2024

Working on a solution in #1125.
