Reproducibility

This document explains how to reproduce the HeiPorSPECTRAL dataset, including data generation, preprocessing (intermediates) and figure computation.

Prerequisite

Please make sure that the htc package is installed (it contains all required dependencies) and that it works on your machine (see the README in the repository root). The commands may take a while to complete, so it is advisable to run them in a terminal multiplexer session (e.g. screen).
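
Before starting, it can help to confirm that the required commands are available. The following is a minimal sketch and not part of the htc package: require_cmd is a hypothetical helper name, and it only checks that a command is found on PATH, not that the command works correctly.

```shell
# Hypothetical helper: return 0 if the given command is found on PATH,
# otherwise print a message to stderr and return 1.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "Missing command: $1" >&2
    return 1
}

# htc is required for all steps; screen is only a convenience.
require_cmd htc || echo "Install the htc package first (see the README in the repository root)."
require_cmd screen || echo "screen was not found; any other terminal multiplexer works as well."
```

If screen is available, a named session can be started with screen -S NAME, detached with Ctrl-a d and reattached later with screen -r NAME.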

Step 1a: generate the dataset (DKFZ internal only)

The HeiPorSPECTRAL dataset is generated based on our (internal) full version of the tissue atlas. Please make sure that you set the path to the masks and studies dataset correctly and then run:

htc dataset_open_atlas --output-path /mnt/nvme_4tb/HeiPorSPECTRAL

This also generates all the intermediate files and uploads the zip archive.

Step 1b: make the new dataset accessible

The following steps need access to the new (or downloaded) dataset. Please adjust your environment variables (see the README) so that no variables for the network drive are set and no other dataset is registered. This ensures that the scripts use only the HeiPorSPECTRAL dataset. For example, use the following .env:

export PATH_Tivita_HeiPorSPECTRAL=/mnt/nvme_4tb/HeiPorSPECTRAL
export PATH_HTC_RESULTS=~/htc/results

# DKFZ internal only
export PATH_E130_Projekte=""

With these settings, the generated files will be stored in ~/htc/results/open_data.
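
The settings above can be checked before running the remaining steps. The sketch below is an illustration, not a tool from the htc package: check_dataset_env is a hypothetical name, and the check assumes that other datasets would be registered through exported variables starting with PATH_Tivita_, in the same way as the HeiPorSPECTRAL entry.

```shell
# Hypothetical check: succeed only if HeiPorSPECTRAL is the single registered
# dataset and the network drive variable is empty or unset.
check_dataset_env() {
    _status=0
    if [ -z "${PATH_Tivita_HeiPorSPECTRAL:-}" ]; then
        echo "PATH_Tivita_HeiPorSPECTRAL is not set" >&2
        _status=1
    fi
    if [ -n "${PATH_E130_Projekte:-}" ]; then
        echo "PATH_E130_Projekte should be empty" >&2
        _status=1
    fi
    # Assumption: every other registered dataset shows up as PATH_Tivita_*.
    _others=$(env | grep '^PATH_Tivita_' | grep -v '^PATH_Tivita_HeiPorSPECTRAL=' || true)
    if [ -n "$_others" ]; then
        echo "Other datasets are registered:" >&2
        echo "$_others" >&2
        _status=1
    fi
    return "$_status"
}

if check_dataset_env; then
    echo "Environment looks fine."
else
    echo "Please adjust your .env before continuing."
fi
```

Note that env only lists exported variables, so the .env file has to be loaded into the current shell before the check is meaningful.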

Step 2: label profiles

To generate the label profile images (similar to the per-image profile images in the intermediates directory, but with aggregated data), run

htc label_profiles

This will generate a PDF per label.

Step 3: dimensionality reduction figures

The PCA and UMAP figures of the paper can be generated by running the DataVisualizations.ipynb notebook:

jupyter nbconvert --to html --execute --stdout ~/htc/src/paper/NatureData2023/DataVisualizations.ipynb > /dev/null

This will create PDF and HTML files for all PCA and UMAP visualizations.

Step 4: technical validation figures

The colorchecker comparison between the Tivita camera and the spectrometer can be generated by running the TechnicalValidation.ipynb notebook:

jupyter nbconvert --to html --execute --stdout ~/htc/src/paper/NatureData2023/TechnicalValidation.ipynb > /dev/null

This will generate a PDF and an HTML file with the colorchecker figure.
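
Steps 3 and 4 use the same nbconvert invocation with different notebooks, so they could also be run in one go. The following is a sketch under assumptions: run_notebooks and NOTEBOOK_DIR are hypothetical names introduced here, and the default directory is the one used in the commands above.

```shell
# Directory containing the paper notebooks (same path as in the commands above).
NOTEBOOK_DIR="${NOTEBOOK_DIR:-$HOME/htc/src/paper/NatureData2023}"

# Hypothetical helper: execute both notebooks in order and stop at the first failure.
run_notebooks() {
    for _nb in DataVisualizations.ipynb TechnicalValidation.ipynb; do
        echo "Executing $_nb"
        # The converted HTML is written to stdout and discarded here.
        jupyter nbconvert --to html --execute --stdout "$NOTEBOOK_DIR/$_nb" > /dev/null || return 1
    done
}
```

Calling run_notebooks then executes the dimensionality reduction notebook first and the technical validation notebook second.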

Step 5: README assets

To generate the example GIF file which is shown in the README of the dataset, run

htc readme_gif

This will generate a GIF and a PNG file.
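
After all steps have finished, a quick overview of the generated files can serve as a sanity check. The sketch below counts files per extension; summarize_outputs is a hypothetical helper, and the default directory assumes the PATH_HTC_RESULTS setting from Step 1b. Where exactly each step places its files inside that directory is not assumed here, so the search is recursive.

```shell
# Hypothetical helper: count PDF, HTML, GIF and PNG files below a directory.
summarize_outputs() {
    _dir="${1:-$HOME/htc/results/open_data}"
    if [ ! -d "$_dir" ]; then
        echo "Directory not found: $_dir" >&2
        return 1
    fi
    for _ext in pdf html gif png; do
        # tr removes the padding that some wc implementations add.
        _count=$(find "$_dir" -type f -name "*.$_ext" | wc -l | tr -d ' ')
        echo "$_ext: $_count"
    done
}
```

Running summarize_outputs without arguments inspects the default results directory; a different directory can be passed as the first argument.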
