
redundant time-consuming insertion of electrodes during import #1121

Closed
magland opened this issue Sep 19, 2024 · 8 comments · Fixed by #1125

Comments

@magland
Contributor

magland commented Sep 19, 2024

I am finding that the most time-consuming part of importing a new session is the importing of all the electrodes. In my NWB file there are 32 electrode groups, each with 4 electrodes, for a total of 128 electrodes. However, what seems to be happening is that the 128 electrodes get inserted many times (maybe 32 times, once for each group?). I see skip_duplicates=True, so in the end only 128 electrodes are inserted, but this ends up taking > 15 minutes on my system, and it seems to be ~32 times as much work as it should be.

Here's the code that seems to be getting executed once for every electrode group:

for elect_id, elect_data in electrodes.iterrows():
    key.update(
        {
            "electrode_id": elect_id,
            "name": str(elect_id),
            "electrode_group_name": elect_data.group_name,
            "region_id": BrainRegion.fetch_add(
                region_name=elect_data.group.location
            ),
            "x": elect_data.get("x"),
            "y": elect_data.get("y"),
            "z": elect_data.get("z"),
            "filtering": elect_data.get("filtering", "unfiltered"),
            "impedance": elect_data.get("imp"),
            **electrode_constants,
        }
    )
    # rough check of whether the electrodes table was created by
    # rec_to_nwb and has the appropriate custom columns used by
    # rec_to_nwb
    # TODO this could be better resolved by making an extension for the
    # electrodes table
    extra_cols = [
        "probe_shank",
        "probe_electrode",
        "bad_channel",
        "ref_elect_id",
    ]
    if isinstance(
        elect_data.group.device, ndx_franklab_novela.Probe
    ) and all(col in elect_data for col in extra_cols):
        key.update(
            {
                "probe_id": elect_data.group.device.probe_type,
                "probe_shank": elect_data.probe_shank,
                "probe_electrode": elect_data.probe_electrode,
                "bad_channel": (
                    "True" if elect_data.bad_channel else "False"
                ),
                "original_reference_electrode": elect_data.ref_elect_id,
            }
        )
    else:
        logger.warning(
            "Electrode did not match expected novela format.\nPlease "
            + f"ensure the following in YAML config: {extra_cols}."
        )
    # override with information from the config YAML based on primary
    # key (electrode id)
    if elect_id in electrode_config_dicts:
        # check whether the Probe.Electrode being referenced exists
        query = Probe.Electrode & electrode_config_dicts[elect_id]
        if len(query) == 0:
            warnings.warn(
                "No Probe.Electrode exists that matches the data: "
                + f"{electrode_config_dicts[elect_id]}. "
                "The config YAML for Electrode with electrode_id "
                + f"{elect_id} will be ignored."
            )
        else:
            key.update(electrode_config_dicts[elect_id])
    electrode_inserts.append(key.copy())
self.insert(
    electrode_inserts,
    skip_duplicates=True,
    allow_direct_insert=True,  # for no_transaction, pop_all_common
)

@samuelbray32
Collaborator

I see. Each make function is running on all entries in the NWB file:

electrodes = nwbf.electrodes.to_dataframe()

However, the key source of the table is specific to the electrode group, so the make function gets called once per group in the single_transaction_make function here:

key_source = getattr(table, "key_source", None)

I'm a little surprised that each call is taking so long. Do you know what part of the Electrode make function is taking the time?
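
As a rough, hypothetical sketch (not the current spyglass code, and assuming the make key carries electrode_group_name), each per-group call could in principle filter the dataframe down to its own group before looping, so the full 128-row table wouldn't be re-processed once per group:

# hypothetical sketch: restrict the NWB electrode table to the group in `key`
electrodes = nwbf.electrodes.to_dataframe()
group_electrodes = electrodes[
    electrodes["group_name"] == key["electrode_group_name"]
]
for elect_id, elect_data in group_electrodes.iterrows():
    ...  # build and insert rows for this group's electrodes only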

@magland
Contributor Author

magland commented Sep 19, 2024

(disregard my deleted comment... investigating some more to answer your question)

@magland
Contributor Author

magland commented Sep 19, 2024

I think it might be this:

if elect_id in electrode_config_dicts:
    # check whether the Probe.Electrode being referenced exists
    query = Probe.Electrode & electrode_config_dicts[elect_id]
    if len(query) == 0:
        warnings.warn(
            "No Probe.Electrode exists that matches the data: "
            + f"{electrode_config_dicts[elect_id]}. "
            "The config YAML for Electrode with electrode_id "
            + f"{elect_id} will be ignored."
        )
    else:
        key.update(electrode_config_dicts[elect_id])

because it's making a query to the database at each iteration. Would that make sense? Maybe this check is not needed?
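
For what it's worth, a rough, hypothetical sketch of how that check could be batched (untested, and assuming the config dicts map onto Probe.Electrode attributes): DataJoint treats a list of dict restrictions as an OR, so the existence checks could be collapsed into one round trip per make call and resolved in memory afterwards.

# hypothetical sketch: one combined query instead of one per electrode
configs = list(electrode_config_dicts.values())
matched_keys = (Probe.Electrode & configs).fetch("KEY") if configs else []
# ...then check each electrode's config against matched_keys in memory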

@magland
Contributor Author

magland commented Sep 19, 2024

actually it doesn't seem to be that. Investigating further....

@magland
Contributor Author

magland commented Sep 19, 2024

Okay I'm pretty sure that the slow part is the BrainRegion.fetch_add(...) in the following snippet.

key.update(
    {
        "electrode_id": elect_id,
        "name": str(elect_id),
        "electrode_group_name": elect_data.group_name,
        "region_id": BrainRegion.fetch_add(
            region_name=elect_data.group.location
        ),
        "x": elect_data.get("x"),
        "y": elect_data.get("y"),
        "z": elect_data.get("z"),
        "filtering": elect_data.get("filtering", "unfiltered"),
        "impedance": elect_data.get("imp"),
        **electrode_constants,
    }
)

Can anything be done to avoid that?
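
For example, would something like this make sense? (A hypothetical sketch, not a tested change.) Since many electrodes share a location, the lookup could be memoized so fetch_add runs once per unique region name instead of once per electrode:

# hypothetical sketch: memoize region lookups per unique location
region_ids = {}

def get_region_id(region_name):
    if region_name not in region_ids:
        region_ids[region_name] = BrainRegion.fetch_add(region_name=region_name)
    return region_ids[region_name]

# inside the loop, the dict entry would become:
#     "region_id": get_region_id(elect_data.group.location),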

@samuelbray32
Collaborator

It looks like fetch_add is just making a quick query to check the database once the region is inserted, which shouldn't be slow.

We could try to reduce the number of calls here, but in general spyglass makes many queries to the database. If the latency is high enough that this is slowing you down, you'll likely notice it in a bunch of other spots too. It might be more productive to figure out the source of the slowdown in your database queries.
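
If it helps, here's a rough sketch for ballparking the per-query round trip (assuming a configured DataJoint connection):

import time

import datajoint as dj

conn = dj.conn()  # uses your configured DataJoint credentials
n = 20
t0 = time.perf_counter()
for _ in range(n):
    conn.query("SELECT 1").fetchall()  # trivial query, dominated by round-trip time
print(f"~{(time.perf_counter() - t0) / n * 1e3:.1f} ms per round trip")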

@magland
Contributor Author

magland commented Sep 19, 2024

Makes sense. Okay, I'll investigate what our expected latency is.

I think we must be experiencing something like 200 milliseconds of round-trip latency, because that would give

0.2 sec * 128 * 32 ≈ 14 minutes

But even at an extremely good 30 msec round-trip latency this would be ~2 minutes, whereas if it didn't do the 32x redundant work it would be ~4 seconds.
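
Spelled out, just to double-check the arithmetic:

# back-of-the-envelope: latency * electrodes * groups vs. latency * electrodes
for latency in (0.2, 0.03):
    redundant = latency * 128 * 32  # ~819 s (~14 min) at 0.2 s; ~123 s (~2 min) at 0.03 s
    once_only = latency * 128       # ~26 s at 0.2 s; ~4 s at 0.03 s
    print(f"{latency * 1e3:.0f} ms: {redundant / 60:.1f} min vs {once_only:.0f} s")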

So, given that it makes a network request at every iteration, do you think this could be brought down to 128 requests instead of 128 x 32?

@magland
Contributor Author

magland commented Sep 19, 2024

Working on a solution in #1125.
