[BBPBGLIB-556] Full estimate of memory consumption (#32)
Introduce full memory estimation capabilities for dry run as detailed in
https://bbpteam.epfl.ch/project/issues/browse/BBPBGLIB-556.
When running in dry run mode, neurodamus will now provide a full
estimate of the memory usage for both cells and synapses.
The synapse workflow is mostly untouched from the previous version,
with just a small bug fix.
For the cells estimate we now perform a full calculation with the
following workflow:

1. Find all unique METype combinations and take at most the first
50 elements of each combination.
2. Instantiate these combinations; each set of at most 50 elements is
allocated on a different rank.
3. Get the total memory consumption and average it per cell.
4. Use the above average to extrapolate the total memory usage of any
combination with more than 50 elements.
5. Combine the cells memory total with the synapse one to get the full
estimate.

The MR also introduces exporting the cells memory estimate to a `json`
file. Since this phase is the most time-consuming one, by default the
dry run workflow will automatically export the results for cell memory
usage to a `memory_usage.json` file. It will also try to load this file
on any subsequent run and only perform instantiation for METype
combinations that are not already present in the file.

Furthermore, in dry run we also attempt to estimate the overhead
memory used by every rank, normally needed to load libraries
and data structures. This amount is added to the grand total at
the end of the execution.

---------

Co-authored-by: Fernando Pereira <fernando.pereira@epfl.ch>
st4rl3ss and ferdonline authored Oct 5, 2023
1 parent 0d0bc7c commit 55b958d
Showing 12 changed files with 261 additions and 62 deletions.
4 changes: 2 additions & 2 deletions README.rst
@@ -9,7 +9,7 @@ Neurodamus

Neurodamus is a BBP Simulation Control application for Neuron.

The Python implementation offers a comprehensive Python API for fine tunning of the simulation, initially defined by a BlueConfig file.
The Python implementation offers a comprehensive Python API for fine tuning of the simulation, initially defined by a BlueConfig file.


Description
@@ -81,7 +81,7 @@ An example of a full installation with a simulation run can be found in the work

Docker container
================
Alternaltively, you can start directly a neurodamus docker container where all the packages are built.
Alternatively, you can start directly a neurodamus docker container where all the packages are built.
With the container, you can build your mod files and run simulations.
See instructions in `docker/README.md <https://github.com/BlueBrain/neurodamus/blob/main/docker/README.md>`_.

51 changes: 51 additions & 0 deletions docs/architecture.rst
@@ -317,6 +317,57 @@ Indeed public API represents exactly these 3 cases:
cell_manager.finalize()
conn_manager.create_connections()
Dry Run
-------

A dry run mode was introduced to help users understand how many nodes and tasks are
necessary to run a specific circuit. In the future this mode will also be used to improve
load balancing.

By running a dry run, using the `--dry-run` flag, the user will NOT run an actual simulation but
will get a summary of the estimated memory used for cells and synapses, including the overhead
memory necessary to load libraries and neurodamus data structures.
A grand total is provided to the user, as well as a per-cell-type and per-synapse-type breakdown.

This section describes in more detail how the estimation is done.

Below you can see the workflow of the dry run mode:

.. image:: ./img/neurodamus_dry_run.png

First of all, since the memory usage of cells is strongly connected to their metypes, we create a dictionary
of all the gids corresponding to a certain metype combination. This dictionary is then cross-checked
against the one imported from the external `memory_usage.json` file, which contains the memory usage
of metype combinations coming from a previous dry run execution on this or any other circuit.
As long as the `memory_usage.json` file is present in the working directory, it will be loaded.

If a metype combination is not present in the external file, we compute its memory usage
by instantiating a group of at most 50 cells for that combination and then
measuring memory usage before and after the instantiation. The memory usage is then averaged over
the number of cells instantiated, and the results are saved internally and added to the external
`memory_usage.json` file. Any combination already present in the external file is simply imported
and not instantiated again, in order to speed up the execution. One can simply delete the `memory_usage.json`
file (or any relevant lines) in order to force the re-evaluation of all (or some) metype
combinations.
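
A minimal sketch of this import-and-filter step, assuming — purely for illustration — that the json keys are ``etype,mtype`` strings; the real file layout may differ:

```python
import json
import os

def load_known_usage(path="memory_usage.json"):
    # Load per-metype memory figures from a previous dry run, if present
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        raw = json.load(f)
    # Assumed key format: "etype,mtype" -> average MB per cell
    return {tuple(key.split(",")): mb for key, mb in raw.items()}

def combos_to_instantiate(all_combos, known):
    # Only combinations missing from the imported file get re-evaluated
    return [combo for combo in all_combos if combo not in known]
```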

The memory usage of synapses is instead estimated using a pre-computed lookup table, which is
hardcoded in the `SynapseMemoryUsage` class. The values in this lookup table were computed by using an external script
to instantiate 1M synapses of each type, each with 1K connections, and then measuring the memory
usage before and after the instantiation. The memory usage is then averaged over the number of
synapses instantiated. The script used to perform this operation, `synstat.py`, is available to the user
and is archived in the `_benchmarks` folder of this repository.

Having these pre-computed values allows us to simply count the number of synapses of each type
and multiply it by the corresponding memory usage value.
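
In code, the count-and-multiply step reduces to a dictionary lookup. The per-synapse figures below are made-up placeholders, not the values actually hardcoded in `SynapseMemoryUsage`:

```python
# Hypothetical per-synapse memory cost in KB, standing in for the
# pre-computed lookup table in the SynapseMemoryUsage class.
SYNAPSE_MEMORY_KB = {"ProbAMPANMDA": 1.7, "ProbGABAAB": 2.0}

def estimate_synapse_memory_kb(counts_per_type):
    # Multiply the synapse count of each type by its per-synapse cost
    return sum(SYNAPSE_MEMORY_KB[stype] * n for stype, n in counts_per_type.items())
```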

Apart from cells and synapses, we also need to take into account the memory usage of neurodamus
itself, e.g. data structures, loaded libraries and so on. This is done by measuring the RSS of the neurodamus
process before any of the actual instantiation is done. This value is averaged over all ranks that take
part in the execution and then multiplied by the number of ranks used in the execution.
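
As a sketch, the overhead contribution is just the mean baseline RSS scaled by the rank count (an illustrative helper, not the actual neurodamus code):

```python
def estimate_overhead_mb(baseline_rss_mb_per_rank, n_ranks):
    # Average the RSS measured on each rank before instantiation,
    # then scale by the total number of ranks in the execution
    average = sum(baseline_rss_mb_per_rank) / len(baseline_rss_mb_per_rank)
    return average * n_ranks
```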

The final result is then printed to the user in a human-readable format.


Development
------------

25 changes: 16 additions & 9 deletions docs/examples.rst
@@ -87,21 +87,28 @@ In order to obtain a more accurate estimation of the resources needed for a simu
users can also run Neurodamus in dry run mode. This functionality is only available
for libsonata circuits. MVD3 circuits are not supported.

This mode will instantiate all the cells but won't run the actual simulation.
The user can then check the memory usage of the simulation as it's printed on
the terminal and decide how to proceed.

The mode also provides detailed information on the memory usage of each cell type
and the total memory usage of the simulation.
This mode will partially instantiate cells and synapses to get a statistical overview
of the memory used, but won't run the actual simulation.
The user can then check the estimated memory usage of the simulation, which is printed on
the terminal at the end of the execution. In a future update we will also integrate
indications and suggestions on the number of tasks and nodes to use for that circuit,
based on the amount of memory used during the dry run.

The mode also provides detailed information on the memory usage of each cell metype and
synapse type, as well as the total estimated memory usage of the simulation, including the
memory overhead dictated by the loading of libraries and data structures.

The information on cell memory usage is also automatically saved in a file called
``memory_usage.json`` in the working directory. This json file contains a
dictionary with the memory usage of each cell metype in the circuit and is automatically
loaded in any further execution of Neurodamus in dry run mode, in order to speed up the execution.
In the future we plan to also use this file to improve the load balance of actual simulations.

To run Neurodamus in dry run mode, the user can use the ``--dry-run`` flag when launching
Neurodamus. For example:

``neurodamus --configFile=BlueConfig --dry-run``

At the moment dry run mode only supports memory estimation for cell instantiation. Evaluation
of other resources (e.g. connections) will be added in the future.


Neurodamus for Developers
-------------------------
Binary file added docs/img/neurodamus_dry_run.png
59 changes: 41 additions & 18 deletions neurodamus/cell_distributor.py
@@ -142,6 +142,7 @@ def __init__(self, circuit_conf, target_manager, _run_conf=None, **_kw):
self._binfo = None
self._pc = Nd.pc
self._conn_managers_per_src_pop = weakref.WeakValueDictionary()
self._metype_counts = None

if type(circuit_conf.CircuitPath) is str:
self._init_config(circuit_conf, self._target_spec.population or '')
@@ -163,6 +164,7 @@ def __init__(self, circuit_conf, target_manager, _run_conf=None, **_kw):
is_default = property(lambda self: self._circuit_name is None)
is_virtual = property(lambda self: False)
connection_managers = property(lambda self: self._conn_managers_per_src_pop)
metype_counts = property(lambda self: self._metype_counts)

def is_initialized(self):
return self._local_nodes is not None
@@ -219,6 +221,8 @@ def load_nodes(self, load_balancer=None, *, _loader=None, loader_opts=None):
else:
gidvec, me_infos, *cell_counts = self._load_nodes_balance(loader_f, load_balancer)
self._local_nodes.add_gids(gidvec, me_infos)
if SimConfig.dry_run:
self._metype_counts = me_infos.counts
self._total_cells = cell_counts[0]
logging.info(" => Loaded info about %d target cells (out of %d)", *cell_counts)

@@ -253,7 +257,7 @@ def _load_nodes_balance(self, loader_f, load_balancer):
return gidvec, me_infos, total_cells, full_size

# -
def finalize(self, *_):
def finalize(self, imported_memory_dict=None, *_):
"""Instantiates cells and initializes the network in the simulator.
Note: it should be called after all cell distributors have done load_nodes()
@@ -262,11 +266,13 @@ def finalize(self, *_):
if self._local_nodes is None:
return
logging.info("Finalizing cells... Gid offset: %d", self._local_nodes.offset)
self._instantiate_cells()
memory_dict = self._instantiate_cells(imported_memory_dict)
self._update_targets_local_gids()
self._init_cell_network()
self._local_nodes.clear_cell_info()

return memory_dict

@mpi_no_errors
def _instantiate_cells(self, _CellType=None):
CellType = _CellType or self.CellType
@@ -286,52 +292,69 @@ def _instantiate_cells(self, _CellType=None):
self._store_cell(gid + cell_offset, cell)

@mpi_no_errors
def _instantiate_cells_dry(self, _CellType=None):
def _instantiate_cells_dry(self, _CellType=None, imported_memory_dict=None):
CellType = _CellType or self.CellType
assert CellType is not None, "Undefined CellType in Manager"
Nd.execute("xopen_broadcast_ = 0")

logging.info(" > Dry run on cells... (%d in Rank 0)", len(self._local_nodes))
logging.info("Memory usage for metype combinations:")
logging.info("Memory usage for newly instantiated metype combinations:")
cell_offset = self._local_nodes.offset

gid_info_items = self._local_nodes.items()

prev_emodel = None
prev_etype = None
prev_mtype = None
start_memory = get_mem_usage()
n_cells = 0
memory_dict = {}

for gid, cell_info in gid_info_items:
filtered_gid_info_items = self._filter_memory_dict(imported_memory_dict, gid_info_items)

for gid, cell_info in filtered_gid_info_items:
diff_mtype = prev_mtype != cell_info.mtype
diff_emodel = prev_emodel != cell_info.emodel
first = prev_emodel is None and prev_mtype is None
if (diff_mtype or diff_emodel) and not first:
diff_etype = prev_etype != cell_info.etype
first = prev_etype is None and prev_mtype is None
if (diff_mtype or diff_etype) and not first:
end_memory = get_mem_usage()
memory_allocated = end_memory - start_memory
log_all(logging.INFO, " * %s %s: %f MB averaged over %d cells",
prev_emodel, prev_mtype, memory_allocated/n_cells, n_cells)
memory_dict[(prev_emodel, prev_mtype)] = memory_allocated/n_cells
log_all(logging.INFO, " * %s %s: %.2f MB averaged over %d cells",
prev_etype, prev_mtype, memory_allocated/n_cells, n_cells)
memory_dict[(prev_etype, prev_mtype)] = memory_allocated/n_cells
start_memory = end_memory
n_cells = 0

cell = CellType(gid, cell_info, self._circuit_conf)
self._store_cell(gid + cell_offset, cell)

prev_emodel = cell_info.emodel
prev_etype = cell_info.etype
prev_mtype = cell_info.mtype
n_cells += 1

if prev_emodel is not None and prev_mtype is not None:
if prev_etype is not None and prev_mtype is not None:
end_memory = get_mem_usage()
memory_allocated = end_memory - start_memory
log_all(logging.INFO, " * %s %s: %f MB averaged over %d cells",
prev_emodel, prev_mtype, memory_allocated/n_cells, n_cells)
memory_dict[(prev_emodel, prev_mtype)] = memory_allocated/n_cells
prev_etype, prev_mtype, memory_allocated/n_cells, n_cells)
memory_dict[(prev_etype, prev_mtype)] = memory_allocated/n_cells

if imported_memory_dict is not None:
memory_dict.update(imported_memory_dict)

return memory_dict

def _filter_memory_dict(self, imported_memory_dict, gid_info_items):
if imported_memory_dict is not None:
filtered_gid_info_items = (
(gid, cell_info)
for gid, cell_info in gid_info_items
if (cell_info.etype, cell_info.mtype) not in imported_memory_dict
)
else:
filtered_gid_info_items = gid_info_items

return filtered_gid_info_items

def _update_targets_local_gids(self):
logging.info(" > Updating targets")
cell_offset = self._local_nodes.offset
@@ -559,7 +582,7 @@ def load_nodes(self, load_balancer=None, **kw):
log_verbose("Nodes Format: %s, Loader: %s", self._node_format, loader.__name__)
return super().load_nodes(load_balancer, _loader=loader, loader_opts=loader_opts)

def _instantiate_cells(self, *_):
def _instantiate_cells(self, imported_memory_dict, *_):
if self.CellType is not NotImplemented:
return super()._instantiate_cells(self.CellType)
conf = self._circuit_conf
@@ -570,7 +593,7 @@ def _instantiate_cells(self, *_):
log_verbose("Loading '%s' morphologies from: %s",
CellType.morpho_extension, conf.MorphologyPath)
if SimConfig.dry_run:
super()._instantiate_cells_dry(CellType)
return super()._instantiate_cells_dry(CellType, imported_memory_dict)
else:
super()._instantiate_cells(CellType)

2 changes: 1 addition & 1 deletion neurodamus/core/nodeset.py
@@ -313,7 +313,7 @@ def intersection(self, other: _NodeSetBase, raw_gids=False, _quick_check=False):
# Like that we could still keep ranges internally and have PROPER API to get raw ids
return numpy.add(intersect, 1, dtype=intersect.dtype)
return numpy.add(intersect, self.offset + 1, dtype=intersect.dtype)
return []
return numpy.array([], dtype="uint32")

def intersects(self, other):
return self.intersection(other, _quick_check=True)
33 changes: 23 additions & 10 deletions neurodamus/io/cell_readers.py
@@ -248,6 +248,8 @@ def fetch_MEinfo(node_reader, gidvec, combo_file, meinfo):
mtypes = node_reader.mtypes(indexes)
emodels = node_reader.emodels(indexes) \
if combo_file else None # Rare but we may not need emodels (ngv)
etypes = node_reader.etypes(indexes) \
if combo_file else None
exc_mini_freqs = node_reader.exc_mini_frequencies(indexes) \
if node_reader.hasMiniFrequencies() else None
inh_mini_freqs = node_reader.inh_mini_frequencies(indexes) \
@@ -259,8 +261,8 @@
positions = node_reader.positions(indexes)
rotations = node_reader.rotations(indexes) if node_reader.rotated else None

meinfo.load_infoNP(gidvec, morpho_names, emodels, mtypes, threshold_currents, holding_currents,
exc_mini_freqs, inh_mini_freqs, positions, rotations)
meinfo.load_infoNP(gidvec, morpho_names, emodels, mtypes, etypes, threshold_currents,
holding_currents, exc_mini_freqs, inh_mini_freqs, positions, rotations)


def load_sonata(circuit_conf, all_gids, stride=1, stride_offset=0, *,
@@ -279,7 +281,7 @@ def load_nodes_base_info():
total_cells = node_pop.size
if SimConfig.dry_run:
logging.info("Sonata dry run mode: looking for unique metype instances")
gid_metype_bundle = _retrieve_unique_metypes(node_pop, all_gids)
gid_metype_bundle, count_per_metype = _retrieve_unique_metypes(node_pop, all_gids)
gidvec = dry_run_distribution(gid_metype_bundle, stride, stride_offset, total_cells)
else:
gidvec = split_round_robin(all_gids, stride, stride_offset, total_cells)
@@ -289,8 +291,13 @@
node_sel = libsonata.Selection(gidvec - 1) # 0-based node indices
morpho_names = node_pop.get_attribute("morphology", node_sel)
mtypes = node_pop.get_attribute("mtype", node_sel)
emodels = [emodel.removeprefix("hoc:")
for emodel in node_pop.get_attribute("model_template", node_sel)]
try:
etypes = node_pop.get_attribute("etype", node_sel)
except libsonata.SonataError:
logging.warning("etype not found in node population, setting to None")
etypes = None
_model_templates = node_pop.get_attribute("model_template", node_sel)
emodel_templates = [emodel.removeprefix("hoc:") for emodel in _model_templates]
if set(["exc_mini_frequency", "inh_mini_frequency"]).issubset(attr_names):
exc_mini_freqs = node_pop.get_attribute("exc_mini_frequency", node_sel)
inh_mini_freqs = node_pop.get_attribute("inh_mini_frequency", node_sel)
Expand All @@ -309,13 +316,17 @@ def load_nodes_base_info():
rotations = _get_rotations(node_pop, node_sel)

# For Sonata and new emodel hoc template, we need additional attributes for building metype
# TODO: validate it's really the emodel_templates var we should pass here, or etype
add_params_list = None if not has_extra_data \
else _getNeededAttributes(node_pop, circuit_conf.METypePath, emodels, gidvec-1)
else _getNeededAttributes(node_pop, circuit_conf.METypePath, emodel_templates, gidvec-1)

meinfos = METypeManager()
meinfos.load_infoNP(gidvec, morpho_names, emodels, mtypes, threshold_currents,
holding_currents, exc_mini_freqs, inh_mini_freqs, positions,
rotations, add_params_list)
meinfos.load_infoNP(gidvec, morpho_names, emodel_templates, mtypes, etypes,
threshold_currents, holding_currents,
exc_mini_freqs, inh_mini_freqs, positions, rotations,
add_params_list)
if SimConfig.dry_run:
meinfos.counts = count_per_metype
return gidvec, meinfos, total_cells

# If dynamic properties are not specified simply return early
@@ -480,8 +491,10 @@ def _retrieve_unique_metypes(node_reader, all_gids) -> dict:
raise Exception(f"Reader type {type(node_reader)} incompatible with dry run.")

unique_metypes = defaultdict(list)
count_per_metype = defaultdict(int)
for gid, emodel, mtype in zip(gidvec, emodels, mtypes):
unique_metypes[(emodel, mtype)].append(gid)
count_per_metype[(emodel, mtype)] += 1

logging.info("Out of %d cells, found %d unique mtype+emodel combination",
len(gidvec), len(unique_metypes))
@@ -498,4 +511,4 @@ def _retrieve_unique_metypes(node_reader, all_gids) -> dict:
else:
gid_metype_bundle.append(unique_metypes[key])

return gid_metype_bundle
return gid_metype_bundle, count_per_metype
