Health&Gait: a video dataset for gait-based analysis


The different data modalities contained in the dataset, from left to right: pose, semantic segmentation, silhouette, and optical flow.

Description

This repository contains the Health&Gait dataset, the first dataset that enables gait analysis from visual information alone, relying solely on cameras rather than specific sensors. The dataset includes multimodal features extracted from the videos, together with gait parameters and anthropometric measurements for each participant. It is intended for use in health, sports, and gait analysis research.


Two examples of the different data types from the dataset for two participants (a) and (b).

Dataset Contents

Health&Gait consists of 1,564 videos of 398 participants walking in a controlled, closed environment. Each video has the following associated information:

  • 2D pose estimation of the joints by AlphaPose (JSON files; see the loading sketch after this list).
  • Semantic segmentation by DensePose (PNG images).
  • Optical flow by TVL1 and GMFlow (PNG images).
  • Silhouette by YOLOv8 (JPEG images).
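As a minimal sketch of how a pose file can be inspected, assuming the standard AlphaPose results schema (one entry per detected person per frame, with a flat "keypoints" list of (x, y, confidence) triplets; the file name is a placeholder):

import json

# Placeholder path: one AlphaPose output file from the dataset.
pose_file = "alphapose-results.json"

with open(pose_file, "r") as f:
    detections = json.load(f)

# "keypoints" is a flat list of (x, y, confidence) triplets.
keypoints = detections[0]["keypoints"]
for i in range(0, len(keypoints), 3):
    x, y, conf = keypoints[i:i + 3]
    print(f"joint {i // 3}: ({x:.1f}, {y:.1f}), confidence {conf:.2f}")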

Moreover, for each subject, the following data has been recorded:

  • Anthropometric measurements.
  • Gait parameters obtained from OptoGait and MuscleLAB.
  • Gait parameters estimated from pose information.


Directory and file scheme of the Health&Gait database.

Attributes in the file participants_measures.csv:

| Attribute | Description | Unit |
| --- | --- | --- |
| Sex | Participant sex | 0: female, 1: male |
| Age | Participant age | Years |
| PA_level | Level of physical activity | >= 3 days: Active, < 3 days: Non active |
| Height | Participant height | cm |
| Weight | Participant weight | kg |
| BMI | Body Mass Index | kg/m² |
| WaistC | Waist circumference | cm |
| HipC | Hip circumference | cm |
| NeckC | Neck circumference | cm |
| Percentage fat mass | The total mass of fat divided by total body mass | % |
| Lean mass | The difference between total body weight and body fat weight | kg |
Attributes in the file gait_parameters_estimation.csv:

| Attribute | Description | Unit |
| --- | --- | --- |
| Step_UGS / Step_FGS | The distance between the two toes or heels of the feet in sequence, for usual/fast gait speed | cm |
| Stride_UGS / Stride_FGS | The distance between the two toes or heels of sequential strides of the same foot, for usual/fast gait speed | cm |
| Cadence_UGS / Cadence_FGS | The number of steps taken per unit of time, for usual/fast gait speed | Steps/min |
| MonoSP_UGS / MonoSP_FGS | Time in the swing phase where only one limb is in contact with the ground, for usual/fast gait speed | sec |
| BiSP_UGS / BiSP_FGS | Time that both feet are on the ground, for usual/fast gait speed | sec |
| Speed_UGS / Speed_FGS | Participant velocity, for usual/fast gait speed | m/s |

Getting Started

Download Dataset

The dataset is hosted in a Zenodo repository.

Install dependencies

The first step is to create a Python environment from the requirement.txt file. The use of conda is recommended:

conda create --name <env> --file requirement.txt
conda activate <env>

Check that TensorFlow is installed correctly:

python3 -c "import tensorflow as tf; print(len(tf.config.list_physical_devices('GPU')) > 0)"

Usage

This section describes how to use the scripts provided in the repository and recommends a way to load and use the dataset. First, change to the directory within the repository where the scripts are located:

cd scripts/

Create train, validation and test partitions

The create_partitions.sh script splits the participants into train, validation and test sets in a stratified manner for the sex classification and the weight and age regression tasks (a sanity-check sketch follows the argument list below).

bash create_partitions.sh -p <patient_measures_file> -o <output_path>

where:

  • <patient_measures_file> is the path to the participants_measures.csv file in the dataset.
  • <output_path> is the path where to save the partitions.
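After running the script, a quick sanity check can confirm the stratification. A minimal sketch, assuming the script writes a JSON file with "train", "validation" and "test" keys (as consumed by get_data later in this README); both file paths are placeholders:

import json

import pandas as pd

# Placeholder paths: the partitions file produced by create_partitions.sh
# and the participants_measures.csv file from the dataset.
with open("partitions/partitions.json", "r") as f:
    partitions = json.load(f)

df = pd.read_csv("participants_measures.csv")

# Print the sex distribution per split to check the stratification.
for split in ("train", "validation", "test"):
    subset = df[df["ID"].isin(partitions[split])]
    print(split, subset["Sex"].value_counts().to_dict())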

Technical Validation

The following script can be used to reproduce the results presented in the technical validation section of the paper:

python technical_validation.py --config '../configfiles/train_configfile.json' --targets {sex, weight, age} --methods {MoviNet, XGBoost, MLP}

where train_configfile.json is a configuration file in which you need to set the following fields:

  • "data_path": path where you have downloaded the dataset.
  • "partitions_path": path used in the previous scripts.
  • "save_dir": path where to store the training results.
  • "patients_measures": file with the anthropometric data of the participants.
  • "gait_parameters": file with the information of the gait parameters.
  • "gait_parameters_estimation": file with the information of the estimated gait parameters.
  • "hyperparameters_search_space": JSON file where the hyperparameters search space of the XGBoost method is defined.

The --targets and --methods arguments select the targets to evaluate and the methods to use.
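As an illustration, a minimal configuration file could be written from Python as follows; the field names come from the list above, while every value is a placeholder to adapt to your setup:

import json

# All values are placeholders; only the field names come from this README.
config = {
    "data_path": "/path/to/HealthGait",
    "partitions_path": "/path/to/partitions",
    "save_dir": "/path/to/results",
    "patients_measures": "/path/to/participants_measures.csv",
    "gait_parameters": "/path/to/gait_parameters.csv",
    "gait_parameters_estimation": "/path/to/gait_parameters_estimation.csv",
    "hyperparameters_search_space": "/path/to/xgboost_search_space.json",
}

with open("../configfiles/train_configfile.json", "w") as f:
    json.dump(config, f, indent=4)

For example, passing --targets sex --methods XGBoost would then evaluate only the sex classification task with XGBoost.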

Gait parameters estimation

The following script obtains the gait parameters from the pose information:

bash gait_parameters_estimation.sh -p <patient_measures_file> -s <sensor_bboxes> -e <semantic_segmentation_path> -o <output_csv_path> -k [scale] -f [fps]

where:

  • -p is the path to the participants_measures.csv file.
  • -s is the path to the directory with the bounding boxes of the OptoGait sensors (contact the authors for more information).
  • -e is the path to the directory with the semantic segmentation.
  • -o is the path to the CSV file where the estimates will be stored.
  • -k indicates the scale of the scene, needed to obtain real-world measurements.
  • -f indicates the frame rate (fps), needed to obtain the time-related gait parameters.
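Once the CSV has been generated, a short sketch like the following can inspect the estimates; the column names follow the gait_parameters_estimation.csv table above, and the path is a placeholder:

import pandas as pd

# Placeholder path: the CSV produced by gait_parameters_estimation.sh.
estimates = pd.read_csv("gait_parameters_estimation.csv")

# Summarise the usual-gait-speed parameters described in the table above.
print(estimates[["Step_UGS", "Stride_UGS", "Cadence_UGS", "Speed_UGS"]].describe())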

Recommendations to load and use the dataset

Example of DataGenerator in Tensorflow

To load the multimodal features extracted from the videos (semantic segmentation, silhouette, and optical flow), the use of DataGenerators is recommended.

import os
import random

import cv2
import numpy as np
import tensorflow as tf


class FrameGenerator:

  def __init__(self, videos_names, labels, n_frames, output_size = (224, 224), training = False):

    """ Yields sets of frames with their associated labels.

      Args:
        videos_names: Paths to the per-video frame directories.
        labels: Target value for each video.
        n_frames: Number of frames to sample per video.
        output_size: Spatial size (height, width) of the output frames.
        training: Boolean to determine if a training dataset is being created
          (enables shuffling).
    """
    self.videos_names = videos_names
    self.n_frames = n_frames
    self.training = training
    self.labels = labels
    self.output_size = output_size

  def __call__(self):

    pairs = list(zip(self.videos_names, self.labels))

    if self.training:
      random.shuffle(pairs)

    for video_name, label in pairs:

      video_frames = frames_from_video_file(video_name, self.n_frames, self.output_size) 

      yield video_frames, label


def frames_from_video_file(video_file, num_frames, output_size = (224, 224), skip_percent = 0.15):
  """ Uniformly samples num_frames frames from a directory of frame images,
      skipping skip_percent of the frames at each end of the video.
  """
  frames = sorted(os.listdir(video_file))
  total_frames = len(frames)
  skip_frames = int(total_frames * skip_percent)

  # Sample uniformly from the central portion of the video; if the video is
  # shorter than num_frames, the same frame may be selected more than once.
  effective_frame_count = total_frames - (2 * skip_frames)
  frame_interval = effective_frame_count // num_frames

  selected_frames = []
  for i in range(num_frames):
    frame_num = skip_frames + int(i * frame_interval)
    frame = frames[frame_num]
    selected_frames.append(format_frames(cv2.imread(os.path.join(video_file, frame)), output_size))

  # OpenCV loads images as BGR; reorder the channels to RGB.
  return np.array(selected_frames)[..., [2, 1, 0]]

def format_frames(frame, output_size):
  """
    Pad and resize an image from a video.

    Args:
      frame: Image that needs to be resized and padded. 
      output_size: Pixel size of the output frame image.

    Return:
      Formatted frame with padding of specified output size.
  """
  frame = tf.image.convert_image_dtype(frame, tf.float32)
  frame = tf.image.resize_with_pad(frame, *output_size)

  return frame
Get paths and targets for the DataGenerator

To define the DataGenerator, it is first necessary to obtain the paths and targets of the data type to be used:

import json
import os

import pandas as pd


def get_data(PARTITIONS_FILE, DATA_PATH, DATA_TYPE, DATA_CLASS, OPTICAL_FLOW_METHOD, PATIENTS_INFO, TARGET):

    with open(PARTITIONS_FILE, 'r') as f:
        partitions_data = json.load(f)

    train_patients = partitions_data["train"]
    validation_patients = partitions_data["validation"]
    test_patients = partitions_data["test"]

    df = pd.read_csv(PATIENTS_INFO, sep = ',')

    data = {
        "train": {"path": [], "class": []},
        "validation": {"path": [], "class": []},
        "test": {"path": [], "class": []}
    }

    def add_data(patient, file_path):
        category = ""
        if patient in train_patients:
            category = "train"
        elif patient in validation_patients:
            category = "validation"
        elif patient in test_patients:
            category = "test"

        if category:
            data[category]["path"].append(file_path)
            data[category]["class"].append(df.loc[df['ID'] == patient][TARGET].values[0])

    for patient in os.listdir(DATA_PATH):
        for gait_type in os.listdir(os.path.join(DATA_PATH, patient)):
            directory = os.path.join(DATA_PATH, patient, gait_type)
            directory = directory if DATA_TYPE in ['silhouette', 'semantic_segmentation'] else os.path.join(directory, OPTICAL_FLOW_METHOD)

            for class_folder in os.listdir(directory):
                file_path = os.path.join(directory, class_folder)

                if DATA_CLASS == 'both':
                    add_data(patient, file_path)
                else:
                    # The gait class is encoded in the folder name; keep only
                    # the folders that match the requested DATA_CLASS.
                    first_split = class_folder.split("_")
                    second_split = first_split[0].split("-")

                    if second_split[1] == DATA_CLASS:
                        add_data(patient, file_path)

    return data
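For instance, a call along these lines retrieves the silhouette paths labelled with the participants' sex; all argument values are placeholders following the parameter names above:

# Placeholder arguments; OPTICAL_FLOW_METHOD is ignored for silhouettes.
data = get_data(PARTITIONS_FILE = "partitions/partitions.json",
                DATA_PATH = "/path/to/HealthGait/silhouettes",
                DATA_TYPE = "silhouette",
                DATA_CLASS = "both",
                OPTICAL_FLOW_METHOD = None,
                PATIENTS_INFO = "participants_measures.csv",
                TARGET = "Sex")

print(len(data["train"]["path"]), "training videos")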
Define and use DataGenerators

The following code shows how to define the data generators for the three partitions, making use of the tf.data.Dataset.from_generator method:

output_signature = (tf.TensorSpec(shape = (None, None, None, 3), dtype = tf.float32),
                    tf.TensorSpec(shape = (), dtype = tf.int16))

train_ds = tf.data.Dataset.from_generator(FrameGenerator(data["train"]["path"], data["train"]["class"], NUM_FRAMES, (IMG_SIZE, IMG_SIZE), training = True),
                                          output_signature = output_signature)
train_ds = train_ds.batch(BATCH_SIZE)

val_ds = tf.data.Dataset.from_generator(FrameGenerator(data["validation"]["path"], data["validation"]["class"], NUM_FRAMES, (IMG_SIZE, IMG_SIZE)),
                                        output_signature = output_signature)
val_ds = val_ds.batch(BATCH_SIZE)

test_ds = tf.data.Dataset.from_generator(FrameGenerator(data["test"]["path"], data["test"]["class"], NUM_FRAMES, (IMG_SIZE, IMG_SIZE)),
                                         output_signature = output_signature)
test_ds = test_ds.batch(BATCH_SIZE)
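Optionally, continuing the snippet above, the standard tf.data prefetch transformation can be applied so that data preparation overlaps with model execution; a minimal sketch:

AUTOTUNE = tf.data.AUTOTUNE

# Prefetch batches in the background while the model trains or evaluates.
train_ds = train_ds.prefetch(AUTOTUNE)
val_ds = val_ds.prefetch(AUTOTUNE)
test_ds = test_ds.prefetch(AUTOTUNE)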

These datasets can be used in TensorFlow's fit and predict methods:

results = model.fit(train_ds,
                    validation_data = val_ds,
                    epochs = EPOCHS,
                    validation_freq = 1,
                    callbacks = callbacks_list,
                    verbose = 1)

y_preds = model.predict(test_ds)

Results and Discussion

The document detailing the computational experiments conducted to perform quality control on the data is linked below.

Health&Gait Additional Material

License

Health&Gait is freely available for non-commercial use, but may not be redistributed without the authors' consent. Please see the license for further details.

Citation

@article{zafra2024,
  author  = {Zafra-Palma, Jorge and Marín-Jiménez, Nuria and Castro-Piñero, José and Cuenca-García, Magdalena and Muñoz-Salinas, Rafael and Marín-Jiménez, Manuel J.},
  title   = {Health \& Gait: a dataset for gait-based analysis},
  journal = {Scientific Data},
  volume  = {12},
  number  = {1},
  year    = {2025},
  issn    = {2052-4463},
  doi     = {10.1038/s41597-024-04327-4}
}

Contact

If you have any questions or suggestions, contact us at jzafra@uco.es.
