
Exporting simulation data

valery edited this page Mar 21, 2023 · 2 revisions

Workflow for exporting simulation data

Once you have run a simulation, you will have a folder containing all the simulation data (e.g. Experiment1). To export this data to a tabular format, you will need to follow the steps outlined below:

  1. Reorganize the file structure: Use the script /ABM/abm/data/metaprotocol/experiments/scripts/organize_distributed_experiment.py to reorganize the file structure. Replace the path in the script with the path to the outer Experiment1 folder that contains all the individual hashed folders. This will result in a folder structure with folders named batch_XX inside.
  2. Load the simulation data: Use ExperimentLoader to load the simulation data.
  3. Select the required data: Select the necessary data from the agent_summary dictionary. The data shape will be (batch, simulation parameter 1 ... simulation parameter n, agent number, timestep).
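To make the array layout from step 3 concrete, the sketch below uses a small NumPy array as a stand-in for one entry of agent_summary. The sizes and the array contents are made up for illustration. Real data comes from ExperimentLoader and may be a zarr array rather than a NumPy one, although the indexing shown here is assumed to work the same way.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration
num_batches, n_param1, n_param2, n_agents, n_timesteps = 2, 3, 4, 5, 6

# Stand-in for experiment.agent_summary['posx'] with two varying parameters:
# axes are (batch, parameter 1, parameter 2, agent number, timestep)
pos_x = np.arange(
    num_batches * n_param1 * n_param2 * n_agents * n_timesteps, dtype=float
).reshape(num_batches, n_param1, n_param2, n_agents, n_timesteps)

# Full trajectory of one agent: batch 0, second value of parameter 1,
# third value of parameter 2, fourth agent
trajectory = pos_x[0, 1, 2, 3, :]

# All agents for the same batch and parameter combination
condition = pos_x[0, 1, 2]

print(trajectory.shape)  # (6,)
print(condition.shape)   # (5, 6)
```

The last two axes are always agent and timestep, so the number of leading indices grows with the number of varying parameters.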

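Here's an example script you can use to export the data to a tabular format. It assumes two varying simulation parameters and writes one CSV file per batch into a separate folder for each parameter combination: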

import os
import uuid
from pathlib import Path

from abm.loader.data_loader import ExperimentLoader
import pandas as pd
import numpy as np

# path to the folder with all the simulation data
data_folder = ".../Experiment1"
simulation_params = ['PARAMETER_1', 'PARAMETER_2']

experiment = ExperimentLoader(data_folder)

# data shape is (batch, simulation parameter 1 ... simulation parameter n, agent number, timestep)
pos_x = experiment.agent_summary['posx']
pos_y = experiment.agent_summary['posy']
num_batches = experiment.num_batches

# convert zarr to the tabular format with columns: time, id, x, y
for i, s in enumerate(experiment.varying_params[simulation_params[0]]):
    for j, v in enumerate(experiment.varying_params[simulation_params[1]]):
        folder_name = f"param1_{s}_param2_{v}"
        # create a folder with the condition name (parameter values) using pathlib
        condition_folder = Path(data_folder) / folder_name
        condition_folder.mkdir(parents=True, exist_ok=True)
        time = np.arange(0, experiment.chunksize)
        for batch in range(num_batches):
            data = []
            for a in range(pos_x.shape[-2]):
                # one table per agent; agents are numbered from 1 in the 'id' column
                data.append(pd.DataFrame({
                    'time': time,
                    'id': [a + 1] * len(time),
                    'x': pos_x[batch, i, j, a, :],
                    'y': pos_y[batch, i, j, a, :],
                }))
            data = pd.concat(data)
            # sort by time and agent ID
            data = data.sort_values(by=['time', 'id'], ignore_index=True)
            # name the file with the batch index plus a single unique ID; joining one
            # UUID per agent can exceed the file-name length limit of the filesystem
            file_name = f"batch_{batch}_{uuid.uuid4().hex}.csv"
            data.to_csv(os.path.join(condition_folder, file_name), index=False)
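The script above is written for exactly two varying parameters. As a rough sketch of how the same conversion could be generalized, the example below iterates over every parameter combination with itertools.product and builds the time, id, x, y table without a per-agent loop. The helper names condition_to_frame and iter_conditions, the folder-label format, and the toy arrays are all hypothetical. It also assumes that the order of the keys in varying_params matches the order of the parameter axes in the data, which should be checked against your own experiment.

```python
import itertools

import numpy as np
import pandas as pd


def condition_to_frame(x, y):
    """Turn two (agent, timestep) arrays into a long table: time, id, x, y."""
    x = np.asarray(x)
    y = np.asarray(y)
    n_agents, n_timesteps = x.shape
    frame = pd.DataFrame({
        # timesteps 0..T-1 repeated once per agent
        "time": np.tile(np.arange(n_timesteps), n_agents),
        # agent IDs start at 1, each repeated for every timestep
        "id": np.repeat(np.arange(1, n_agents + 1), n_timesteps),
        "x": x.ravel(),
        "y": y.ravel(),
    })
    return frame.sort_values(by=["time", "id"], ignore_index=True)


def iter_conditions(varying_params):
    """Yield (index tuple, label) for every combination of parameter values."""
    names = list(varying_params)
    ranges = [range(len(varying_params[name])) for name in names]
    for idx in itertools.product(*ranges):
        label = "_".join(
            f"{name}_{varying_params[name][k]}" for name, k in zip(names, idx)
        )
        yield idx, label


# Toy stand-ins for experiment.varying_params and experiment.agent_summary
varying_params = {"PARAMETER_1": [0.1, 0.2], "PARAMETER_2": [10, 20, 30]}
shape = (1, 2, 3, 4, 5)  # batch, parameter 1, parameter 2, agent, timestep
pos_x = np.random.default_rng(0).random(shape)
pos_y = np.random.default_rng(1).random(shape)

batch = 0
frames = {}
for idx, label in iter_conditions(varying_params):
    # (batch, *idx) selects the (agent, timestep) slice for this condition
    frames[label] = condition_to_frame(pos_x[(batch, *idx)], pos_y[(batch, *idx)])
```

Each entry of frames can then be written with to_csv in the same way as in the script above.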
