The model architecture is implemented in PyTorch; experiments are run with PyTorch Lightning and configured with Hydra. Data management uses Dask, hyperparameter optimization uses Optuna and Ray, and the Performer implementation is based on performer-pytorch with the fast-transformers CUDA builds.
This codebase is based on life2vec from the paper *Using Sequences of Life-events to Predict Human Lives*.
The `/conf` folder contains configs for the experiments:
- `/experiment` contains configuration for training.
- `/tasks` contains configuration for data augmentation.
- `/trainer` and `/datamodule` contain configuration for Lightning's `Trainer`.
- `/data_new` contains configuration for data loading and processing.
- `callbacks.yaml` contains configuration for Lightning's `Callbacks`.
- `prepare_data.yaml` can be used to run data preprocessing.
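As a rough sketch of how Hydra composes these config groups (the group names follow the folders above, but the keys and values here are illustrative assumptions, not the repo's actual config), an experiment file might look like:

```yaml
# conf/experiment/decode_only.yaml (hypothetical example)
# @package _global_
defaults:
  - override /datamodule: decode_only   # assumed group entry
  - override /trainer: default          # assumed group entry

trainer:
  max_epochs: 50        # illustrative value
datamodule:
  batch_size: 64        # can be overridden on the CLI, e.g. datamodule.batch_size=8
```

Selecting `experiment=decode_only` on the command line merges such a file on top of the base config.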
The `/src` folder contains the source code:
- `/src/dataloaders` contains scripts to preprocess, augment, and load data.
- `/src/models` contains the model's source code.
- `train.py`, `finetune.py`, `test.py`, and `tune.py` each run a particular stage of the training.
- `prepare_data.py` was used to run the data processing.
- `sample_idx.py` and `multiple_idx.py` generate sequences for individuals in the database, conditioned on some known years.
If using NVIDIA GPUs, we recommend building a container from the provided `Dockerfile`.
```bash
# build datasets
HYDRA_FULL_ERROR=1 python -m src.prepare_data experiment=decode_only
# run training
HYDRA_FULL_ERROR=1 python -m src.train experiment=decode_only
# run finetuning
HYDRA_FULL_ERROR=1 python -m src.finetune generate=decode_only
# run sequence generation (requires specifying parameters)
HYDRA_FULL_ERROR=1 python -m src.multiple_idx generate=decode_only datamodule.batch_size=8 generate.dataloader.file_name=...
```
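The `key=value` tokens in these commands are Hydra overrides that address nested config entries by dotted path. As a toy illustration of the convention (this is a simplified re-implementation for explanation, not Hydra's actual code):

```python
def apply_override(cfg: dict, override: str) -> None:
    """Apply a Hydra-style dotted override such as 'datamodule.batch_size=8'."""
    path, _, raw = override.partition("=")
    keys = path.split(".")
    node = cfg
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    # Best-effort type coercion: integer-looking values become ints,
    # everything else stays a string.
    node[keys[-1]] = int(raw) if raw.lstrip("-").isdigit() else raw

cfg = {"datamodule": {"batch_size": 64}}
apply_override(cfg, "datamodule.batch_size=8")
print(cfg["datamodule"]["batch_size"])  # 8
```

In the real CLI, Hydra resolves these paths against the composed config, so `generate.dataloader.file_name=...` sets a value deep inside the `generate` group.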