albcab/life-sequence-transformer

Life Sequence Transformer: Generative Modelling for Counterfactual Simulation

The model architecture is implemented in pytorch; experiments are run with pytorch-lightning, with hydra for configuring hyperparameters. Database management uses dask, hyperparameter optimization uses optuna and ray, and the performer implementation is based on performer-pytorch with fast-transformers CUDA builds.

This codebase is based on life2vec from the paper Using Sequences of Life-events to Predict Human Lives.

Overall Structure

The /conf folder contains configs for the experiments:

  1. /experiment contains configuration for training.
  2. /tasks contains configuration for data augmentation.
  3. /trainer and /datamodule contain configuration for lightning's Trainer.
  4. /data_new contains configuration for data loading and processing.
  5. callbacks.yaml contains configuration for lightning's Callbacks.
  6. prepare_data.yaml can be used to run data preprocessing.
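
The folders above line up with the command-line overrides used later in this README. As a rough illustration, an override such as `experiment=decode_only` presumably selects a YAML file inside a config folder, while a dotted override such as `datamodule.batch_size=8` sets a single value. The sketch below mimics only that distinction using the standard library. The `classify_override` helper, the `CONFIG_GROUPS` set, and the resolved path `conf/experiment/decode_only.yaml` are assumptions made for illustration; this is not how hydra parses overrides internally.

```python
from pathlib import PurePosixPath

# Assumption: these folder names are taken from the /conf listing above and
# treated as config groups; the repository may define others.
CONFIG_GROUPS = {"experiment", "tasks", "trainer", "datamodule", "data_new"}


def classify_override(override: str, conf_root: str = "conf"):
    """Split a `key=value` override and report what it would plausibly touch."""
    key, sep, value = override.partition("=")
    if not sep:
        raise ValueError(f"expected key=value, got {override!r}")
    if "." not in key and key in CONFIG_GROUPS:
        # Group selection: picks one YAML file inside the group folder.
        path = PurePosixPath(conf_root) / key / f"{value}.yaml"
        return ("group", str(path))
    # Anything else is treated as setting a single (possibly nested) value.
    return ("value", key.split("."), value)


print(classify_override("experiment=decode_only"))
print(classify_override("datamodule.batch_size=8"))
```

Under these assumptions, the first call reports a file under /conf/experiment, and the second reports the nested key `datamodule.batch_size` with the string value `8`.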

The /src folder contains the source code:

  1. /src/dataloaders contains scripts to preprocess, augment and load data.
  2. /src/models contains the model's source code.
  3. train.py, finetune.py, test.py and tune.py each run a particular stage of the training.
  4. prepare_data.py was used to run the data processing.
  5. sample_idx.py and multiple_idx.py are used to generate sequences for individuals in the database, conditioned on some known years.

If using NVIDIA GPUs, we recommend building a container from the provided Dockerfile.

Run Training and Experiments

# build datasets
HYDRA_FULL_ERROR=1 python -m src.prepare_data experiment=decode_only

# run training
HYDRA_FULL_ERROR=1 python -m src.train experiment=decode_only

# run finetuning
HYDRA_FULL_ERROR=1 python -m src.finetune generate=decode_only

# run sequence generation (requires specifying parameters)
HYDRA_FULL_ERROR=1 python -m src.multiple_idx generate=decode_only datamodule.batch_size=8 generate.dataloader.file_name=...
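
To run the first three stages in sequence from a script, the commands above can be assembled programmatically. This is a minimal sketch that copies the module names and overrides from the commands above; the `STAGES` table and the `stage_command` and `stage_env` helpers are hypothetical and not part of the repository. Sequence generation is left out because its `generate.dataloader.file_name` parameter must be supplied by the user.

```python
import shlex

# Stage -> (module, overrides), copied from the commands above.
STAGES = {
    "prepare_data": ("src.prepare_data", ["experiment=decode_only"]),
    "train": ("src.train", ["experiment=decode_only"]),
    "finetune": ("src.finetune", ["generate=decode_only"]),
}


def stage_command(stage: str, extra_overrides=()):
    """Return the argument list for one stage, plus any extra overrides."""
    module, overrides = STAGES[stage]
    return ["python", "-m", module, *overrides, *extra_overrides]


def stage_env(base_env):
    """Copy an environment mapping and enable full hydra error traces."""
    env = dict(base_env)
    env["HYDRA_FULL_ERROR"] = "1"
    return env


# Print the commands in order without executing them.
for stage in STAGES:
    print(shlex.join(stage_command(stage)))
```

To actually execute a stage, the argument list can be passed to `subprocess.run(stage_command("train"), env=stage_env(os.environ), check=True)` from the repository root.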

About

Code to reproduce experiments in Life Sequence Transformer: Generative Modelling of Socio-Economic Trajectories from Administrative Data
