Merged
38 commits
228c945 add bids code (aparnabg, Sep 24, 2025)
739d352 add log _file.py (aparnabg, Sep 24, 2025)
1564f16 add .sh (aparnabg, Sep 25, 2025)
c124e0c add bids code (aparnabg, Oct 1, 2025)
725305a add bids code (aparnabg, Oct 1, 2025)
cb634ce Add participants.tsv population file (manaalm, Oct 14, 2025)
ea61c29 Create README.md (manaalm, Oct 14, 2025)
74bb9f4 Added configuration file for BIDS conversion (Oct 30, 2025)
bb918b9 Cleaned src folder (Oct 30, 2025)
59b66e3 Final script for BIDS conversion added (Oct 30, 2025)
27cb826 Added poetry dependencies (Oct 30, 2025)
47227b1 Modified test for new BIDS convertor script (Oct 30, 2025)
acf2e66 updated README with BIDS-conversion pipeline (Oct 31, 2025)
b1d262c final cleaning and merge after script execution (Oct 31, 2025)
348973b changed number of jobs in submission file (Oct 31, 2025)
e57bcae Added logs to .gitignore (Oct 31, 2025)
84f2e35 fixed last shell scripts (Nov 3, 2025)
d6ac5b8 Update jobs/run_bids_convertor.sh (lucie271, Nov 3, 2025)
770e600 Update jobs/merge_cleanup.sh (lucie271, Nov 3, 2025)
316f227 Update src/tests/test_BIDS_convertor.py (lucie271, Nov 3, 2025)
491a431 Update jobs/merge_cleanup.sh (lucie271, Nov 3, 2025)
f8ed634 untrack poetry.lock and add logs folder (Nov 3, 2025)
b50ba76 BIDS_convertor.py in sailsprep (Nov 3, 2025)
e16f84b Fixed little warnings from PR (Nov 3, 2025)
04bad65 Fixed warnings in BIDS_convertor.py from PR (Nov 4, 2025)
50a80b9 Cleaned /logs handling (Nov 4, 2025)
0fce931 Changed source video to raw folder (Nov 4, 2025)
c8699e4 Update src/tests/test_BIDS_convertor.py (Nov 4, 2025)
1e04269 fixed issues of execution (Nov 4, 2025)
9e5c76d added documentation (Nov 4, 2025)
d80bb46 Merge branch 'main' into BIDS-conversion (lucie271, Nov 4, 2025)
21003fd fixed scripts for unit tests (Nov 4, 2025)
5bad1f5 Merge branch 'main' into BIDS-conversion (Nov 4, 2025)
85f3384 Merge branch 'BIDS-conversion' of https://github.com/sensein/sailspre… (Nov 4, 2025)
333f5f4 updated unit test (Nov 4, 2025)
7768f7e Added unit tests (Nov 5, 2025)
5117e74 Change number of array (Nov 5, 2025)
78e611e Fixed error mypy (Nov 6, 2025)
6 changes: 6 additions & 0 deletions .gitignore
@@ -164,3 +164,9 @@ cython_debug/
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
.idea/

#logs
logs/

#ignore poetry.lock
poetry.lock
2 changes: 2 additions & 0 deletions .pre-commit-config.yaml
@@ -17,6 +17,8 @@ repos:
    hooks:
      - id: mypy
        args: [--ignore-missing-imports]
        additional_dependencies:
          - types-PyYAML
  - repo: https://github.com/macisamuele/language-formatters-pre-commit-hooks
    rev: v2.12.0
    hooks:
32 changes: 27 additions & 5 deletions README.md
@@ -10,25 +10,47 @@ Welcome to the ```sailsprep``` repo! This is a Python repo for doing incredible

**Caution:** this package is still under development and may change rapidly over the next few weeks.

sailsprep converts raw videos into the BIDS format in a clean, standardized way.
## Features
- A few
- Cool
- Things
- These may include a wonderful CLI interface.

## Installation
This project uses Poetry to manage dependencies, so make sure Poetry is installed.
On Engaging, first run the following from the root of the repository:
```
module load miniforge
pip install poetry
poetry install
```

The BIDS-conversion tool of sailsprep requires FFmpeg ≥ 6.0 compiled with the vidstab library.
Because FFmpeg is not a Python package, it must be installed separately.
Run the following outside any Python environment:

```
cd ~
wget https://johnvansickle.com/ffmpeg/releases/ffmpeg-release-amd64-static.tar.xz
tar -xJf ffmpeg-release-amd64-static.tar.xz
mv ffmpeg-*-static ffmpeg_static
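export PATH="$HOME/ffmpeg_static:$PATH"
```

The `export` line only affects the current shell session; add it to your `~/.bashrc` to keep FFmpeg on your `PATH` in future sessions.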

Get the newest development version via:

```sh
pip install git+https://github.com/sensein/sailsprep.git
```

## Quick start
```Python
from sailsprep.app import hello_world

hello_world()
```
Tools developed in sailsprep:

|Tool|Documentation|
|----|--------------|
|BIDS-conversion|[link to documentation](docs/BIDS_convertor.md)|


## Contributing
We welcome contributions from the community! Before getting started, please review our [**CONTRIBUTING.md**](https://github.com/sensein/sailsprep/blob/main/CONTRIBUTING.md).
17 changes: 17 additions & 0 deletions configs/config_bids_convertor.yaml
@@ -0,0 +1,17 @@
# Video Processing Configuration

# Input data
annotation_file: /orcd/data/satra/002/datasets/SAILS/data4analysis/Video Rating Data/SAILS_RATINGS_ALL_DEDUPLICATED_NotForFinalAnalyses_2025.10.csv
video_root: /orcd/data/satra/002/datasets/SAILS/Phase_III_Videos/Videos_from_external
asd_status: /orcd/data/satra/002/datasets/SAILS/data4analysis/ASD_Status.xlsx

# Output data
output_dir: /orcd/scratch/bcs/001/sensein/sails/BIDS_data

# Video processing parameters
target_resolution: 1280x720
target_framerate: 30

# Derived directory names (optional — can be built dynamically)
final_bids_root: final_bids-dataset
derivatives_subdir: derivatives/preprocessed
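For reference, the configuration above can be read in Python with PyYAML (the same `yaml.safe_load` call that `jobs/merge_cleanup.sh` uses). This is an illustrative sketch, not the loader in `sailsprep.BIDS_convertor`: the helper names `load_config` and `resolve_config` are hypothetical, and it assumes that `final_bids_root` is resolved relative to `output_dir` and that `derivatives_subdir` sits inside the final BIDS root, as the dataset tree in `docs/BIDS_convertor.md` suggests.

```python
from pathlib import Path
from typing import Any, Dict


def load_config(path: str) -> Dict[str, Any]:
    """Read the YAML config file (hypothetical helper; requires PyYAML)."""
    import yaml  # imported lazily so resolve_config() also works without PyYAML

    with open(path, encoding="utf-8") as fh:
        return yaml.safe_load(fh)


def resolve_config(cfg: Dict[str, Any]) -> Dict[str, Any]:
    """Turn raw config values into typed values and absolute paths.

    Assumption: final_bids_root is relative to output_dir, and
    derivatives_subdir is relative to the final BIDS root.
    """
    # "1280x720" -> (1280, 720)
    width, height = (int(v) for v in str(cfg["target_resolution"]).lower().split("x"))
    bids_root = Path(cfg["output_dir"]) / cfg.get("final_bids_root", "final_bids-dataset")
    return {
        "width": width,
        "height": height,
        "framerate": float(cfg["target_framerate"]),
        "bids_root": bids_root,
        "derivatives_dir": bids_root / cfg.get("derivatives_subdir", "derivatives/preprocessed"),
    }
```

Typical use would be `resolve_config(load_config("configs/config_bids_convertor.yaml"))` from the repository root. Keeping parsing separate from file I/O makes the derived values easy to unit-test without touching the cluster paths.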
52 changes: 52 additions & 0 deletions docs/BIDS_convertor.md
@@ -0,0 +1,52 @@
## BIDS Format

For reproducibility, organization, and practicality, sailsprep converts its raw data into the BIDS (Brain Imaging Data Structure) format.
BIDS is a community-driven standard for organizing, naming, and describing neuroimaging and related data (e.g., EEG, fMRI, MEG, behavioral, and physiological data).

During BIDS conversion, the raw home videos are standardized, stabilized, denoised, and reformatted.
Relevant metadata and annotations necessary for downstream analysis are also extracted at this stage.

## Structure

The final BIDS dataset follows the structure below:
```graphql
├── sub-ID1 # Contains raw videos in BIDS format
│ ├── ses-01 # Videos recorded at 12 to 16 months of age
│ │ └── beh # Behavioral data
│ │ ├── sub-ID1_ses-01_task-A_run-01_beh.mp4 # Standardized raw video
│ │ ├── sub-ID1_ses-01_task-A_run-01_beh.tsv # Manual annotations
│ │ └── sub-ID1_ses-01_task-A_run-01_beh.json # Info on standardization
│ └── ses-02 # Videos recorded at 34 to 38 months of age
│ └── beh
├── derivatives
│ └── preprocessed # Contains stabilized, denoised, standardized videos
│ ├── sub-ID1
│ │ ├── ses-01
│ │ │ └── beh
│ │ │ ├── sub-ID1_ses-01_task-A_run-01_audio.json # Audio extraction info
│ │ │ ├── sub-ID1_ses-01_task-A_run-01_audio.wav # Extracted audio
│ │ │ ├── sub-ID1_ses-01_task-A_run-01_desc-processed.json # Video preprocessing info
│ │ │ └── sub-ID1_ses-01_task-A_run-01_desc-processed_beh.mp4 # Preprocessed video
│ │ └── ses-02
│ └── sub-ID2
├── README.md # Explains dataset structure and content
├── participants.tsv # Participant information (e.g., ASD status)
├── participants.json # Metadata for participants.tsv
└── dataset_description.json # BIDS dataset description (name, version, etc.)
```
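The naming scheme in the tree can be expressed as a few path-building helpers. This is a minimal sketch, not code from `sailsprep`: the function names are hypothetical, the session windows are taken from the tree comments and assumed to be inclusive, and run numbers are assumed to be zero-padded to two digits as in `run-01`.

```python
from pathlib import Path
from typing import Dict, Optional, Tuple

# Session label -> (min, max) age in months, per the tree comments (assumed inclusive).
SESSION_AGE_RANGES: Dict[str, Tuple[int, int]] = {"01": (12, 16), "02": (34, 38)}


def session_for_age(age_months: float) -> Optional[str]:
    """Return the session label for an age in months, or None outside both windows."""
    for label, (low, high) in SESSION_AGE_RANGES.items():
        if low <= age_months <= high:
            return label
    return None


def bids_stem(sub: str, ses: str, task: str, run: int) -> str:
    """Shared filename stem, e.g. sub-ID1_ses-01_task-A_run-01."""
    return f"sub-{sub}_ses-{ses}_task-{task}_run-{run:02d}"


def raw_video_path(root: Path, sub: str, ses: str, task: str, run: int) -> Path:
    """Standardized raw video: <root>/sub-*/ses-*/beh/<stem>_beh.mp4."""
    stem = bids_stem(sub, ses, task, run)
    return root / f"sub-{sub}" / f"ses-{ses}" / "beh" / f"{stem}_beh.mp4"


def processed_video_path(root: Path, sub: str, ses: str, task: str, run: int) -> Path:
    """Preprocessed video under derivatives/preprocessed, with desc-processed."""
    stem = bids_stem(sub, ses, task, run)
    return (root / "derivatives" / "preprocessed" / f"sub-{sub}" / f"ses-{ses}"
            / "beh" / f"{stem}_desc-processed_beh.mp4")
```

Building every name from one `bids_stem` keeps the raw file, its `.tsv`/`.json` sidecars, and the derivatives aligned on the same entity order.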
## Execution

To verify that FFmpeg is installed (see [README.md](../README.md)) and that its version is at least 6.0, run:

```
ffmpeg -version
```
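`ffmpeg -version` prints the version on its first line and the build flags on a `configuration:` line; a build with vidstab support lists `--enable-libvidstab` there. The sketch below automates both checks. It is a hypothetical helper, not part of `sailsprep`, and it assumes a release-style version string such as `6.1.1`; git snapshot builds (`N-...`) are reported as unknown.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

# Matches e.g. "ffmpeg version 6.1.1-static" or "ffmpeg version n6.0".
_VERSION_RE = re.compile(r"ffmpeg version n?(\d+)\.(\d+)")


def parse_ffmpeg_version(text: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from `ffmpeg -version` output, or None if unrecognized."""
    match = _VERSION_RE.search(text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def has_vidstab(text: str) -> bool:
    """True if the build configuration lists the vid.stab library."""
    return "--enable-libvidstab" in text


def check_ffmpeg(min_version: Tuple[int, int] = (6, 0)) -> bool:
    """Run `ffmpeg -version` and report whether the build meets the requirements."""
    if shutil.which("ffmpeg") is None:
        print("ffmpeg not found on PATH")
        return False
    out = subprocess.run(["ffmpeg", "-version"], capture_output=True, text=True, check=True).stdout
    version = parse_ffmpeg_version(out)
    print(f"version={version}, vidstab={has_vidstab(out)}")
    return version is not None and version >= min_version and has_vidstab(out)
```

Calling `check_ffmpeg()` from a login node, after unloading miniforge, confirms that the static build rather than another FFmpeg is the one on `PATH`.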

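You’ll need to submit the conversion job on Engaging with `sbatch`, from the root directory of the repository.

We provide SLURM submission scripts for convenience. The first command submits the conversion as a job array; the second submits `jobs/merge_cleanup.sh` with an `afterok` dependency, so it starts only once every array task has completed successfully, then merges the per-task logs, deletes the temporary per-task folders, and builds the merged dataset and participants file. Unload the miniforge module first (`module unload miniforge`) so that the correct FFmpeg version is used, then run: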
```
jid=$(sbatch --parsable jobs/run_bids_convertor.sh)
sbatch --dependency=afterok:$jid jobs/merge_cleanup.sh
```
69 changes: 69 additions & 0 deletions jobs/merge_cleanup.sh
@@ -0,0 +1,69 @@
#!/bin/bash
#SBATCH --job-name=merge_cleanup
#SBATCH --output=logs/merge_cleanup_%j.out
#SBATCH --error=logs/merge_cleanup_%j.err
#SBATCH --time=01:00:00
#SBATCH --mem=2G

# Clean up old logs before running
echo "Cleaning up old logs..."
if [ -d logs ]; then
    find logs -mindepth 1 ! -name ".gitkeep" \
        ! -name "merge_cleanup_${SLURM_JOB_ID}.out" \
        ! -name "merge_cleanup_${SLURM_JOB_ID}.err" -delete
fi

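OUTPUT_DIR=$(poetry run python -c "import yaml; print(yaml.safe_load(open('configs/config_bids_convertor.yaml'))['output_dir'])")
# Abort if the config could not be read: an empty OUTPUT_DIR would make the loops below scan /
if [ -z "$OUTPUT_DIR" ]; then
    echo "❌ Could not read output_dir from configs/config_bids_convertor.yaml"
    exit 1
fi
MERGED_DIR="$OUTPUT_DIR"

mkdir -p "$MERGED_DIR"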

echo "Merging logs from numbered folders under $OUTPUT_DIR"
echo "Started at $(date)"

merged_processed="$MERGED_DIR/all_processed.json"
merged_failed="$MERGED_DIR/all_failed.json"

# Start from empty JSON lists (overwrites any previous merge)
echo "[]" > "$merged_processed"
echo "[]" > "$merged_failed"

# Load jq (if not already available)
module load jq 2>/dev/null || true

for folder in "$OUTPUT_DIR"/*/; do
    foldername=$(basename "$folder")

    if [[ "$foldername" =~ ^[0-9]+$ ]]; then
        echo "Merging from folder: $foldername"
        if [[ -f "$folder/processing_log.json" ]]; then
            tmpfile=$(mktemp)
            jq -s 'add' "$merged_processed" "$folder/processing_log.json" > "$tmpfile" && mv "$tmpfile" "$merged_processed"
        fi
        if [[ -f "$folder/not_processed.json" ]]; then
            tmpfile=$(mktemp)
            jq -s 'add' "$merged_failed" "$folder/not_processed.json" > "$tmpfile" && mv "$tmpfile" "$merged_failed"
        fi
    fi
done

echo "Merged logs saved in: $MERGED_DIR"
echo "Now cleaning up numbered folders..."

# Delete only folders with numeric names (avoid final_bids-dataset)
for folder in "$OUTPUT_DIR"/*/; do
    foldername=$(basename "$folder")
    if [[ "$foldername" =~ ^[0-9]+$ ]]; then
        echo "Deleting temporary folder: $foldername"
        rm -rf "$folder"
    else
        echo "Skipping non-numeric folder: $foldername"
    fi
done

echo "Cleanup complete at $(date)"

# --- Run final Python merge ---
echo "Running final Python merge and participant file creation..."
poetry run python -c "from sailsprep.BIDS_convertor import merge_subjects, create_participants_file; merge_subjects(); create_participants_file()"
echo "Final BIDS merge and participant file creation complete ✅"
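The `jq -s 'add'` loop above concatenates the per-task JSON lists into `all_processed.json` and `all_failed.json`. For debugging, the same merge can be reproduced in Python. This is a sketch, not part of the PR: `merge_task_logs` is a hypothetical name, it assumes each `processing_log.json` / `not_processed.json` holds a JSON list (which is what `jq -s 'add'` concatenates), and unlike the script it deletes nothing and orders folders numerically (`2` before `10`) rather than in shell glob order.

```python
import json
from pathlib import Path
from typing import Dict, List, Tuple


def merge_task_logs(output_dir: Path) -> Tuple[int, int]:
    """Concatenate per-task JSON lists from numeric subfolders of output_dir.

    Writes all_processed.json and all_failed.json into output_dir and returns
    (number of processed entries, number of failed entries).
    """
    merged: Dict[str, List] = {"processing_log.json": [], "not_processed.json": []}
    # Only all-digit folder names are per-task outputs (e.g. final_bids-dataset is skipped).
    task_dirs = sorted(
        (d for d in output_dir.iterdir() if d.is_dir() and d.name.isdigit()),
        key=lambda d: int(d.name),
    )
    for task_dir in task_dirs:
        for name, entries in merged.items():
            log_file = task_dir / name
            if log_file.is_file():
                entries.extend(json.loads(log_file.read_text(encoding="utf-8")))
    outputs = {"all_processed.json": "processing_log.json", "all_failed.json": "not_processed.json"}
    for out_name, src_name in outputs.items():
        (output_dir / out_name).write_text(json.dumps(merged[src_name], indent=2), encoding="utf-8")
    return len(merged["processing_log.json"]), len(merged["not_processed.json"])
```

Because it never removes the numbered folders, this can be run before `merge_cleanup.sh` to inspect what the merge will produce.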
41 changes: 41 additions & 0 deletions jobs/run_bids_convertor.sh
@@ -0,0 +1,41 @@
#!/bin/bash
#SBATCH --job-name=bids_processing
#SBATCH --partition=mit_normal
#SBATCH --array=0-18
#SBATCH --output=logs/bids_%A_%a.out
#SBATCH --error=logs/bids_%A_%a.err
#SBATCH --mem=5G
#SBATCH --time=10:00:00
#SBATCH --cpus-per-task=5

mkdir -p logs

# --- Determine project root robustly ---
if [ -n "$SLURM_SUBMIT_DIR" ]; then
    cd "$SLURM_SUBMIT_DIR" || { echo "❌ Cannot cd to SLURM_SUBMIT_DIR=$SLURM_SUBMIT_DIR"; exit 1; }
else
    SCRIPT_DIR="$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )"
    cd "$SCRIPT_DIR/.." || { echo "❌ Cannot cd to project root"; exit 1; }
fi

echo "Running from project root: $(pwd)"
export PYTHONUNBUFFERED=1

ffmpeg -version || echo "⚠️ FFmpeg not available"

# --- Poetry setup ---
if ! poetry env info --path &> /dev/null; then
    echo "Creating Poetry environment..."
    poetry install || { echo "❌ Poetry install failed"; exit 1; }
fi

ENV_PATH=$(poetry env info --path)
source "$ENV_PATH/bin/activate" || { echo "❌ Failed to activate Poetry environment"; exit 1; }

echo "Using Python from: $(which python)"
echo "Task ID: ${SLURM_ARRAY_TASK_ID}"
echo "Starting BIDS conversion at $(date)"

python -m sailsprep.BIDS_convertor "$SLURM_ARRAY_TASK_ID" "$SLURM_ARRAY_TASK_MAX"

echo "Finished at $(date)"
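Each array task passes `$SLURM_ARRAY_TASK_ID` and `$SLURM_ARRAY_TASK_MAX` to `sailsprep.BIDS_convertor`. How that module splits the videos across tasks is not shown in this diff; the sketch below is one hypothetical way to do it (round-robin), assuming the array starts at 0 with step 1 as in `--array=0-18`, so that there are `SLURM_ARRAY_TASK_MAX + 1 = 19` tasks. The name `items_for_task` is illustrative only.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def items_for_task(items: Sequence[T], task_id: int, task_max: int) -> List[T]:
    """Return the share of `items` handled by one SLURM array task.

    Assumes task IDs run 0..task_max with step 1, so there are task_max + 1 tasks.
    Items are dealt out round-robin: task k gets items k, k + n, k + 2n, ...
    """
    n_tasks = task_max + 1
    if not 0 <= task_id < n_tasks:
        raise ValueError(f"task_id {task_id} outside 0..{task_max}")
    return list(items[task_id::n_tasks])
```

Inside a task, `items_for_task(videos, int(os.environ["SLURM_ARRAY_TASK_ID"]), int(os.environ["SLURM_ARRAY_TASK_MAX"]))` would return that task's share. Round-robin gives every task nearly the same number of videos (counts differ by at most one) without computing chunk boundaries.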
Empty file added logs/.gitkeep
6 changes: 5 additions & 1 deletion pyproject.toml
@@ -35,7 +35,11 @@ requires-poetry = ">=2.0"
version = "0.0.0"

[tool.poetry.dependencies]
click = "~=8.3"
click = "~=8.2"
pandas = "^2.3.3"
opencv-python = "^4.12.0.88"
openpyxl = "^3.1.5"
types-pyyaml = "^6.0.12.20250915"

[tool.poetry.group.dev]
optional = true