Official Repo for "DualTrack: Sensorless 3D Ultrasound Needs Local and Global Context" (MICCAI ASMUS Workshop 2025, arXiv paper) and winner of the TUS-REC 2025 challenge.
Motivation. 3D Ultrasound is cost-effective and has many clinical applications. AI models can analyze 2D ultrasound scans to infer the scan trajectory and build a 3D image, eliminating the need for expensive and/or cumbersome hardware used in conventional 3D ultrasound.
Method. Two types of information can be used to infer scan trajectory from a 2D ultrasound sequence:
- Local features: frame-to-frame motion cues and speckle patterns.
- Global features: scan-level context such as anatomical landmarks and the shape/continuity of anatomical structures.
To best exploit these dual, complementary sources of information, we designed a network called DualTrack. DualTrack features a dual-encoder architecture, with separate modules specializing in local and global features, respectively. These features are combined using a powerful fusion module to predict scan trajectory.
Results. On the TUS-REC 2024 benchmark—a large dataset of over 1000 forearm scans with complex trajectory shapes—DualTrack achieved an average error of < 5 mm (a statistically significant 18.3% improvement over prior state-of-the-art).
We’ve since adapted DualTrack to numerous other datasets with excellent results:
Dataset | Avg. Error (mm) |
---|---|
Carotid artery scans | 3.4 |
Thyroid scans | 4.9 |
TUS-REC 2025 Challenge Dataset | 9.2 |
Efficiency. DualTrack is efficient and runs on a consumer GPU in < 0.5 s for a 30-second ultrasound scan.
- 🧭 Dual-encoder design for local and global context
- 🔗 Robust feature fusion for trajectory prediction
- 📏 Accurate: < 5 mm error on TUS-REC 2024; strong cross-dataset results
- ⚡ Fast: sub-second inference on consumer GPUs
Model | Dataset | Avg. GPE Error (mm) | Download link | Config |
---|---|---|---|---|
DualTrack | TUS-REC 2024 | 4.9 (validation set) | dualtrack_final.pt | configs/model/dualtrack.yaml |
DualTrack Finetuned (TUS-REC 2025 Challenge winner) | TUS-REC 2025 | 9.2 | dualtrack_ft_tus_rec_2025_v3_best.pt | configs/model/dualtrack_ft_tus_rec_2025.yaml |
Instantiate the model using the following code snippet:
```python
from omegaconf import OmegaConf

from src.models import get_model

cfg_path = 'path/to/config.yaml'
cfg = OmegaConf.load(cfg_path)
cfg.checkpoint = 'path/to/checkpoint.pt'
model = get_model(**cfg)
```
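Once instantiated, the checkpointed model can be used for inference. The exact call signature and tensor layout of the forward pass depend on the loaded config, so the shapes below are assumptions rather than the definitive interface:

```python
# Hypothetical inference sketch -- the (batch, N_timesteps, H, W) layout and the
# output format are assumptions; consult the evaluation code for the exact
# interface of the model you loaded.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.eval().to(device)

frames = torch.zeros(1, 200, 480, 640, device=device)  # placeholder sweep of 200 frames
with torch.no_grad():
    pred = model(frames)  # predicted trajectory / inter-frame transform parameters
```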
We thank the TUS-REC challenge organizing team for putting together the datasets used for training and benchmarking our models! If you find this work interesting, please also check out the TUS-REC 2024 paper and dataset.
Create a Python environment with `python>=3.10` and install the requirements listed in `requirements.txt`.
To store a tracked ultrasound sweep, this codebase uses an h5 file with the following keys/data structures:
- `images`: $N \times H \times W$ `uint8` array containing the pixel values of each ultrasound image in the sweep. Here, $N$ is the number of timesteps in the sweep, and $H$ and $W$ are the height and width (axial and lateral dimensions) of the ultrasound image.
- `tracking`: $N \times 4 \times 4$ `float` array containing the sequence $T_0, T_1, ..., T_N$ of tracking transforms. Each $T_i$ is stored as a $4 \times 4$ homogeneous transform matrix mapping from the image coordinate system to the world coordinate system. The image coordinate system is in mm relative to the center of the image, with the following orientation for a vector $(x, y, z, 1)$:
- `dimensions`: a single array storing the image dimensions as $(W, H, 1)$
- `spacing`: a single array storing the image spacing (millimeters per pixel) as (`W_spacing`, `H_spacing`, $1$)
- `pixel_to_image`: a single $4 \times 4$ `float` array containing the transform that maps from the pixel coordinate system to the image coordinate system. The pixel coordinate system has the same orientation as the image coordinate system, but its origin is at the top-left of the image and its units are pixels rather than millimeters. This is used for dense displacement field metrics, which are based on the physical positions of image points.
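As a concrete illustration, here is a minimal sketch of writing a dummy sweep in this format with `h5py`. The construction of `pixel_to_image` (scale by spacing, shift the origin to the image center) follows the description above, but the exact axis/sign conventions are assumptions and should be checked against your own calibration:

```python
import h5py
import numpy as np

N, H, W = 200, 480, 640              # timesteps, image height, image width
w_spacing, h_spacing = 0.2, 0.2      # mm per pixel (placeholder values)

images = np.zeros((N, H, W), dtype=np.uint8)                # ultrasound frames
tracking = np.tile(np.eye(4, dtype=np.float32), (N, 1, 1))  # image -> world transforms

# pixel -> image: scale pixels to mm and move the origin from the top-left
# corner to the image center (orientation assumed unchanged)
pixel_to_image = np.array(
    [[w_spacing, 0, 0, -w_spacing * (W - 1) / 2],
     [0, h_spacing, 0, -h_spacing * (H - 1) / 2],
     [0, 0, 1, 0],
     [0, 0, 0, 1]],
    dtype=np.float32,
)

with h5py.File("sweep_0000.h5", "w") as f:
    f.create_dataset("images", data=images)
    f.create_dataset("tracking", data=tracking)
    f.create_dataset("dimensions", data=np.array([W, H, 1]))
    f.create_dataset("spacing", data=np.array([w_spacing, h_spacing, 1.0]))
    f.create_dataset("pixel_to_image", data=pixel_to_image)

# world position (mm) of pixel (u, v) in frame i:
#   world = tracking[i] @ pixel_to_image @ np.array([u, v, 0, 1])
```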
If you have a collection of `.h5` files in this format, it is easy to create and register a "dataset" with the codebase. To prepare a dataset for training and evaluation, first create a `.csv` file containing at least 3 columns:
- `sweep_id`, a unique id for each sweep
- `processed_sweep_path`, the `.h5` filepath corresponding to the sweep
- `split`, one of `[train, val]`, indicating whether the sweep should be used for training or validation
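For example, a metadata file for a hypothetical folder of sweeps could be generated like this (using pandas, which is assumed to be available; the split assignment is just a placeholder):

```python
import glob

import pandas as pd

paths = sorted(glob.glob("/data/my_sweeps/*.h5"))  # hypothetical data location
rows = [
    {
        "sweep_id": f"sweep_{i:04d}",
        "processed_sweep_path": path,
        "split": "val" if i % 10 == 0 else "train",  # e.g. hold out every 10th sweep
    }
    for i, path in enumerate(paths)
]
pd.DataFrame(rows).to_csv("/path/to/metadata.csv", index=False)
```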
Finally, you should register your dataset by creating a file (or adding to a file) located at data/datasets.yaml with the following format:
```yaml
tus-rec:
  data_csv_path: /path/to/metadata.csv
my-dataset-2:
  data_csv_path: "..."
```
Now, the dataset will be registered with the codebase. You can test this by running:
```python
from src.datasets.sweeps_dataset_v2 import SweepsDataset

ds = SweepsDataset(name='tus-rec')
print(ds[0]['images'].shape)  # prints the loaded sweep shape: (N_timesteps, H, W)
```
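If you want to iterate over sweeps with PyTorch, a simple sketch is to wrap the dataset in a `DataLoader`; since sweeps generally have different lengths, `batch_size=1` is assumed here to avoid collation issues:

```python
from torch.utils.data import DataLoader

from src.datasets.sweeps_dataset_v2 import SweepsDataset

ds = SweepsDataset(name='tus-rec')
loader = DataLoader(ds, batch_size=1, shuffle=False, num_workers=2)
for batch in loader:
    images = batch['images']  # shape (1, N_timesteps, H, W)
    print(images.shape)
    break
```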
DualTrack uses the `train.py` script for training and the `evaluate.py` script for evaluation, for example:
```bash
python train.py -c path/to/config --log_dir="experiment/v0"
python evaluate.py -c path/to/config --log_dir="experiment/eval/v0"
```
Note: Training scripts will generate a log directory where checkpoints (best/last) will be saved. Certain experiments will use the checkpoints of a previous experiment to initialize components of the model.
Training configurations are found in the folder configs/dualtrack_train_tus_rec/, and evaluation configurations are found in the folder configs/dualtrack_evaluation. A typical config looks like the following:
```yaml
model:
  name: dualtrack_loc_enc_stg1
data: # dataset options
  version: local_encoder
  dataset: tus-rec # <- use the name you registered your dataset with
  sequence_length_train: 16
  augmentations: true
train: # training options
  lr: 0.0001
  epochs: 5000
  warmup_epochs: 0
  weight_decay: 0.001
  batch_size: 16
  val_every: 100
  seed: 0
  device: cuda
  use_amp: true
  logger: wandb # could be tensorboard, or console if not using wandb
  logger_kw:
    wandb_project: dualtrack # logger specific options
  debug: false
```
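Configs can also be loaded and tweaked programmatically with OmegaConf (already a dependency of the codebase). The config path below is a placeholder, and the field nesting assumed here follows the example above:

```python
from omegaconf import OmegaConf

cfg = OmegaConf.load("path/to/config.yaml")  # e.g. a config from configs/dualtrack_train_tus_rec/
cfg.data.dataset = "my-dataset-2"            # switch to your registered dataset
cfg.train.batch_size = 8
cfg.train.use_amp = True
OmegaConf.save(cfg, "experiment/v0/config.yaml")
```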
Training DualTrack involves three main steps:
- Pretrain the local encoder
- Pretrain the global encoder
- Train the final model
Training the local encoder happens in 3 stages:
Pretraining step 1 - we pretrain the 3D CNN backbone on small subsequences of images for 5000 epochs (this should take 4-5 days on an NVIDIA A40 GPU). Use this config.
Pretraining step 2 - we add a ViT stage for frame-wise spatial self-attention on top of the frozen CNN backbone of stage 1 using this config. You will need to edit the `model.backbone_weights` field to point to the best checkpoint from the step 1 experiment.
Pretraining step 3 - here we add a temporal attention stage and pretrain it on top of the frozen CNN + ViT model of stage 2 using this config. Similarly, edit `model.backbone_weights`.
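For steps 2 and 3, that edit is just a path under the `model` section of the corresponding config; the checkpoint path below is a placeholder for wherever the previous step's log directory saved its best checkpoint:

```yaml
model:
  backbone_weights: experiments/local_encoder/stage1/checkpoint/best.pt  # best checkpoint from the previous step
```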
The second step of DualTrack training is to pretrain the global encoder using sparsely sampled subsequences of the ultrasound frames. The global encoder consists of an image backbone followed by a transformer temporal self-attention stage. There are several options for the image backbone: CNN, iBOT, MedSAM, and USFM; the code can easily be adapted to use other backbones. Note that some backbones require pretrained weights or add dependencies. Choose one of the configs in configs/dualtrack_train_tus_rec/global_encoder (we recommend `cnn.yaml` as a good starting point with no extra dependencies).
The final step is to combine the global and local encoders using a fusion module. The relevant config is configs/dualtrack_train_tus_rec/dualtrack.yaml. You need to edit the config to point to the local encoder and global encoder checkpoints to load their weights.
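As an illustration only, the edit might look like the following; the field names here are hypothetical, so check configs/dualtrack_train_tus_rec/dualtrack.yaml for the actual keys used to load the pretrained encoders:

```yaml
model:
  # hypothetical field names -- use the keys defined in the actual config
  local_encoder_weights: experiments/local_encoder/stage3/checkpoint/best.pt
  global_encoder_weights: experiments/global_encoder/cnn/checkpoint/best.pt
```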
Scripts will log aggregate metrics from the training and validation sets throughout training. Once we have our final model, we can run a full test routine with the `evaluate.py` script. For example, to reproduce the numbers from the bottom row of Table 1 in the paper, we can run:
```bash
DUALTRACK_FINAL_CHECKPOINT_PATH=/path/to/dualtrack_final/ python evaluate.py -c configs/dualtrack_evaluation/dualtrack_final.yaml # -h for help with extra eval options.
```
DualTrack adapts readily to new datasets through finetuning. For example, see configs/tus_rec_challenge_2025, which configures the sequence of finetuning experiments that adapts the original DualTrack model into the TUS-REC 2025 challenge-winning model! In configs/tus_rec_challenge_2025/stg0.yaml, we've highlighted some relevant config flags and their purpose.
We reproduced the following baselines for tracking estimation:
The 2-frame CNN baseline is based on Prevost et al. 2018. To train it, run:
```bash
python scripts/baselines/run_baseline_2_frame_cnn_reprod.py --log_dir experiments/baselines/2-frame-cnn --scheduler=none --model efficientnet_b1 --epochs 6700 --batch_size=16 --optimizer=adam --epoch_mode tus_rec --validate_every 100 --dataset tus-rec --val_datasets tus-rec --flip_h_prob=.5 --reverse_sweep_prob=0.5 --skip_frame_prob=0.2
```
To run a full test loop, run:
```bash
python scripts/baselines/run_baseline_2_frame_cnn_reprod.py --dataset tus-rec-val --model efficientnet_b1 --epochs 6700 --train_dir experiments/baselines/2-frame-cnn --test_dataset tus-rec-val
```
The DCLNet method of Guo et al. 2020 can be run as follows:
```bash
# train
python scripts/baselines/train_dcnet.py -c scripts/baselines/dcnet.yaml --log_dir=experiments/baselines/dcnet
# test
python scripts/baselines/train_dcnet.py test --train_dir=experiments/baselines/dcnet --test_dataset tus-rec-val
```
To run the MoNet baseline:
```bash
# train
python scripts/baselines/run_monet_baseline.py --epochs=3000 --log_dir=experiments/baselines/monet
# test
python scripts/baselines/run_monet_baseline.py --batch_size=1 --use_full_scan_for_val --log_dir=experiments/baselines/monet --dataset=tus-rec-val test
```
The hybrid transformer baseline is implemented based on the paper "Spatial Position Estimation Method for 3D Ultrasound Reconstruction Based on Hybrid Transformers" by Ning et al. 2022.
The standard method as described in the paper can be run as follows:
```bash
python scripts/baselines/run_ning_et_al_reprod.py --hidden_size=128 --log_dir=experiments/baselines/hybrid_transformer
```
For this model, we found that a 2-stage training setup improves performance and reduces the computational cost of training: we pretrain the CNN component, export its features, and then train the transformer component on top of these features (unlike the original paper, which trained end to end). To reproduce these steps, run the following:
- pretrain the CNN:
```bash
python scripts/dualtrack_legacy/train_local_encoder.py --log_dir=experiments/baselines/hybrid_transformer/stage1 --epochs=5000 --lr=1e-4 --weight_decay=1e-3 --run_validation_every_n_epochs=100 --batch_size=16 --sequence_length_train=16 --augmentations --model=vidrn18_small_window_trck_reg_causal
```
- export its features:
```bash
python scripts/dualtrack_legacy/train_local_encoder.py --log_dir=experiments/baselines/hybrid_transformer/stage1 --batch_size=1 --model=vidrn18_small_window_trck_reg_causal --model_kwargs checkpoint=experiments/baselines/hybrid_transformer/stage1/checkpoint/best.pt --cached_features_file experiments/baselines/hybrid_transformer/stage1/features.h5 export_features
python scripts/dualtrack_legacy/train_local_encoder.py --log_dir=experiments/baselines/hybrid_transformer/stage1 --batch_size=1 --model=vidrn18_small_window_trck_reg_causal --model_kwargs checkpoint=experiments/baselines/hybrid_transformer/stage1/checkpoint/best.pt --cached_features_file experiments/baselines/hybrid_transformer/stage1/features.h5 --dataset=tus-rec-val export_features
```
- train on top of these features:
```bash
python scripts/baselines/run_ning_et_al_reprod.py --features_path=data/pre-computed-features/lively-blaze_causal/feats.h5 --hidden_size=128 --log_dir=experiments/baselines/hybrid_transformer/stage2
```