This repository contains datasets collected and annotated within the scope of the OpenDR project. The datasets can be read in the OpenDR format, but they can also be downloaded in their raw format and used with any tool.
The OpenDR toolkit is an optional requirement of this package and is only needed if you plan to use these datasets with an OpenDR tool. For full integration, install the OpenDR toolkit first and then install this package with:
pip install git+https://github.com/opendr-eu/datasets.git
If you don't plan on using other OpenDR tools, the datasets can be downloaded from their corresponding README pages, and there is no need to install this package.
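If you do use the toolkit, a downloaded dataset can be wrapped for OpenDR learners. The sketch below assumes the `ExternalDataset` wrapper from the toolkit's engine module; the path and dataset type values are placeholders for illustration, not part of this repository.

```python
# A minimal sketch, assuming the OpenDR toolkit is installed and one of the
# datasets has been downloaded locally. The path and dataset_type values are
# placeholders, not part of this repository.
from opendr.engine.datasets import ExternalDataset

# Wrap a locally downloaded dataset so that OpenDR learners can consume it
# through their fit()/eval() methods.
dataset = ExternalDataset(path="./kitti_panoptic", dataset_type="kitti")
print(dataset.path, dataset.dataset_type)
```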
The KITTI panoptic segmentation dataset for urban scene understanding provides panoptic annotations for a subset of images from the KITTI Vision Benchmark Suite. The images we annotate do not intersect with the official KITTI semantic/instance segmentation test set; therefore, in addition to panoptic segmentation, they can also serve as supplementary training data for benchmarking semantic or instance segmentation individually. The dataset consists of a total of 1055 images, of which 855 form the training set and 200 form the validation set. The images have a resolution of 1280×384 pixels. We provide annotations for 11 'stuff' classes and 8 'thing' classes, adhering to the Cityscapes 'stuff' and 'thing' class distribution.
The NuScenes LiDAR panoptic segmentation dataset for urban 3D scene understanding provides panoptic annotations for LiDAR point clouds from the NuScenes dataset. Our dataset consists of a total of 850 scans, out of which 700 are used for the training set and 150 are used for the validation set. We provide annotations for 6 'stuff' classes and 10 'thing' classes.
This dataset was generated with an aerial robot and a ground robot in the Webots simulator using the OpenDR agricultural dataset generator tool. It consists of 13,980 RGB images and their semantic segmentation counterparts, captured under different lighting conditions and robot positions in an agricultural field. It also includes annotation data comprising the object class, the x and y coordinates of the top-left pixel of the object bounding box, and the bounding box width and height. Furthermore, it includes GPS and inertial unit sensor data for the UAV, and GPS, inertial unit, and LiDAR sensor data for the UGV.
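As an illustration of the bounding-box annotation fields listed above, a minimal parsing sketch is shown below; the record layout and the example line are assumptions, since the exact file format is documented with the dataset itself.

```python
# A minimal sketch of parsing a single bounding-box annotation of the form
# described above (object class, top-left x, top-left y, width, height).
# The record layout and the example line are assumptions for illustration.
from dataclasses import dataclass

@dataclass
class BoxAnnotation:
    label: str   # object class
    x: int       # top-left pixel column of the bounding box
    y: int       # top-left pixel row of the bounding box
    width: int   # bounding box width in pixels
    height: int  # bounding box height in pixels

def parse_annotation_line(line: str) -> BoxAnnotation:
    """Parse one 'class x y width height' record."""
    label, x, y, w, h = line.split()
    return BoxAnnotation(label, int(x), int(y), int(w), int(h))

print(parse_annotation_line("plant 120 64 48 96"))
```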
The dataset was generated by capturing 195 real images and annotating them with the correct labels (bounding boxes, keypoints, and polygons). Further augmentation of these images expands the dataset to around 280,000 images. All tools and scripts to generate and replicate the dataset are provided, as well as the trained model and visualization scripts.
A set of synthetically generated multi-view facial images was created within the OpenDR H2020 research project by Aristotle University of Thessaloniki, based on the LFW dataset, which consists of 13,233 in-the-wild facial images of 5,749 person identities collected from the Web. The resulting set, named AUTH-OpenDR Augmented LFW (AUTH-OpenDR ALFW), covers the same 5,749 person identities. From each of the 13,233 images, 13 synthetic images are generated by yaw-axis camera rotation in the interval [0°, +60°] with a step of +5°. Moreover, 10 synthetic images are generated by pitch-axis camera rotation in the interval [0°, +45°] with a step of +5° for each facial image of the aforementioned dataset.
A dataset of facial images from several viewing angles was created by Aristotle University of Thessaloniki based on the CelebA image dataset, using software developed in the OpenDR H2020 research project based on this paper and the respective code provided by the authors. CelebA is a large-scale facial dataset consisting of 202,599 facial images of 10,177 celebrities captured in the wild. The new dataset, named AUTH-OpenDR Augmented CelebA (AUTH-OpenDR ACelebA), was generated from 140,000 facial images corresponding to 9,161 persons, i.e. a subset of CelebA. For each CelebA image used, 13 synthetic images were generated by yaw-axis camera rotation in the interval [0°, +60°] with a step of +5°. Moreover, 10 synthetic images were generated by pitch-axis camera rotation in the interval [0°, +45°] with a step of +5° for each facial image. Since the CelebA license does not allow the distribution of derivative work, we do not make ACelebA directly available; instead, we provide instructions and scripts for recreating it.
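For reference, the angle grids used for both ALFW and ACelebA can be enumerated as follows. This is a sketch only; the actual recreation scripts are the ones referred to above.

```python
# The augmentation grids described above: yaw rotations in [0°, +60°] and
# pitch rotations in [0°, +45°], both with a +5° step, which yields 13 and
# 10 synthetic views per source image.
yaw_angles = list(range(0, 61, 5))    # 0, 5, ..., 60 degrees -> 13 views
pitch_angles = list(range(0, 46, 5))  # 0, 5, ..., 45 degrees -> 10 views
assert len(yaw_angles) == 13 and len(pitch_angles) == 10
```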
Realistic 3D Human Models Generated from Real-World Images
This dataset contains 133 3D human models generated using the Pixel-aligned Implicit Function (PIFu) and full-body images of people from the Clothing Co-Parsing (CCP) dataset. The 3D human models are provided in .OBJ format.
Download Instructions:
wget ftp://opendrdata.csd.auth.gr/simulation/human_data_generation_framework/human_models.tar.gz
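After downloading, the archive can be unpacked and the models inspected with standard tooling; a sketch is shown below. The model file name is hypothetical, as the archive layout is not documented here.

```python
# A minimal sketch of unpacking the archive fetched above and inspecting one
# of the .OBJ human models; the file name inside the archive is hypothetical.
import tarfile

with tarfile.open("human_models.tar.gz", "r:gz") as archive:
    archive.extractall("human_models")

# OBJ is a plain-text format, so vertex and face counts can be read directly.
vertices = faces = 0
with open("human_models/model_001.obj") as obj_file:  # hypothetical file name
    for line in obj_file:
        if line.startswith("v "):
            vertices += 1
        elif line.startswith("f "):
            faces += 1
print(f"model_001.obj: {vertices} vertices, {faces} faces")
```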
The dataset contains 2914 human bodies in various shapes and textures following the SMPL+D template. At its core, the dataset consists of 183 unique SMPL+D bodies, which were generated through non-rigid shape registration of manually created MakeHuman models; the rest were generated by applying shape and texture alterations to those models. In addition, we provide code for converting the human models to the FBX format; note, however, that pose-dependent deformations are not applied to the converted models. Finally, instructions are provided for setting up a demo project in the Webots simulator, in which one of the SMPL+D models in FBX format performs an animation from AMASS.
The dataset is available through the official GitHub repository of the OpenDR toolkit here.
This dataset was generated using a realistic synthetic facial image generation pipeline, built on a modified version of Unity's Perception package in a URP project and designed to support active face recognition. The pipeline can generate images under a wide range of view angles and distances, as well as under different illumination conditions and backgrounds.
The dataset is available here.
ActiveHuman was generated using Unity's Perception package. It consists of 175,428 RGB images and their semantic segmentation counterparts, captured in different environments, lighting conditions, camera distances, and camera angles. In total, the dataset contains images for 8 environments, 33 humans, 4 lighting conditions, 7 camera distances (1 m-4 m), and 36 camera angles (0°-360° at 10° intervals). Alongside each image, 2D bounding box, 3D bounding box, and keypoint ground-truth annotations are generated via Labelers and stored as a JSON-based dataset. Labelers are scripts responsible for capturing ground-truth annotations for each captured image or frame. Keypoint annotations follow the COCO format, as defined by the COCO keypoint annotation template offered in the Perception package.
ActiveHuman is available here.
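Since the keypoint annotations follow the COCO template, they can be read with standard JSON tooling. The sketch below assumes a COCO-style annotation file; the actual file names and field layout produced by the Labelers may differ.

```python
# A minimal sketch of reading COCO-style keypoint annotations such as those
# produced by the Perception Labelers; the file name and exact field layout
# are assumptions for illustration only.
import json

with open("activehuman_keypoints.json") as f:  # hypothetical file name
    dataset = json.load(f)

for annotation in dataset.get("annotations", [])[:5]:
    # COCO keypoints are stored as a flat [x1, y1, v1, x2, y2, v2, ...] list,
    # where v is the per-keypoint visibility flag.
    flat = annotation.get("keypoints", [])
    triplets = [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]
    print(annotation.get("image_id"), len(triplets), "keypoints")
```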
The dataset contains (i) sequences of facial 3D models depicting various expressions, (ii) Webots environments, and (iii) videos with synchronized audio (speech), captured from a grid of virtual cameras placed at 25 angles (−60° to +60° in pan with 30° increments and −30° to +30° in tilt with 15° increments) and 3 distances (0.5, 0.75, and 1 meter). In addition, to simulate various illumination conditions, the video footage was collected at “dark” and “bright” lighting levels. The dataset is suitable for training and evaluating active/static single-modality/multimodal facial expression recognition methods. Each part of the dataset can be used in a standalone manner, i.e., it can be downloaded and used separately from the rest. The facial 3D models were generated with the DECA model, applied to image sequences from the emotional speech part of the RAVDESS (Ryerson Audio-Visual Database of Emotional Speech and Song) dataset. This part contains videos (including audio) of 24 professional actors vocalizing two statements in 8 different emotions/expressions, each at two intensity levels (normal, strong) and with two repetitions per intensity. The neutral expression is included at normal intensity only.
The dataset is available here.
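For clarity, the virtual-camera grid described above can be enumerated as follows; this sketch simply reproduces the pan/tilt/distance values listed in the description.

```python
# The virtual-camera grid described above: 5 pan angles x 5 tilt angles
# (25 in total) at 3 distances, i.e. 75 viewpoints per lighting level.
from itertools import product

pan_angles = range(-60, 61, 30)    # -60, -30, 0, +30, +60 degrees
tilt_angles = range(-30, 31, 15)   # -30, -15, 0, +15, +30 degrees
distances_m = (0.5, 0.75, 1.0)

viewpoints = list(product(pan_angles, tilt_angles, distances_m))
assert len(viewpoints) == 25 * 3
```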
- AUTH - AR-based mixed data generator for human-centric active perception tasks
The tool uses Augmented Reality (AR) technology and allows the easy creation of mixed (real and synthetic) data depicting realistic 3D human models in various user-captured real environments, viewed from different camera positions. It runs on Android mobile devices and was developed using Unity's AR Foundation framework. Once the tool is executed on a mobile device within a certain environment, e.g. a room, it uses the device's rear camera and plane detection to detect the floor. The first of the h publicly available 3D human models bundled with the tool is then automatically (or manually) rendered/placed on the floor, with correct proportions and lighting derived from natural light estimation, at a certain distance from the camera, and rotated 360° in k increments of θ°, while its images, one for each orientation, are automatically stored. Subsequently, the 3D model is moved away from the camera (i.e., in the z-axis direction) in one-meter increments and rotated 360° again at each such position. The same procedure, which is fully customizable in terms of its parameters, is repeated for all available models. Captured images are accompanied by automatically generated 3D and 2D (on the image plane) bounding boxes for the body and the head, and are annotated with the camera angle and position as well as the ID of the depicted subject. To facilitate the capturing procedure, the tool also allows the user to capture a video of a certain environment and perform the 3D model rendering/placement and image data/metadata generation afterwards, in an offline manner.
The tool is available here.
The following simulation-based competition scenarios allow researchers to easily generate sensory datasets for training deep learning robotic systems:
- Highway driving competition: Program a Lincoln MKZ autonomous car to drive as fast as possible on a crowded highway.
- Pick and place competition: Program a youBot mobile manipulator robot to pick and place a cube as quickly as possible.
- Pit escape competition: Program a BB-8 robot lost in a sand desert to climb out of a pit as quickly as possible.
- UAV depth planning environment: Program a quadrotor to avoid randomized obstacles using depth images.
This is a synthetic image generation pipeline specifically designed to support active vision tasks. The pipeline is developed using a realistic simulation framework based on Unity and allows for the generation of images depicting humans, captured at varying view angles, distances, illumination conditions, and backgrounds. The generated data are accompanied by ground-truth 2D/3D bounding boxes, joint keypoints, and semantic segmentation maps.