This code is not maintained and may contain outdated and vulnerable dependencies.
Code to train a variational autoencoder with a GPT2-based encoder and decoder on password data, as described in
D. Biesner, K. Cvejoski, and R. Sifa, ‘Combining Variational Autoencoders and Transformer Language Models for Improved Password Generation’, in ARES 2022: The 17th International Conference on Availability, Reliability and Security.
Bibtex:
@inproceedings{DBLP:conf/IEEEares/BiesnerCS22,
  author    = {David Biesner and
               Kostadin Cvejoski and
               Rafet Sifa},
  title     = {Combining Variational Autoencoders and Transformer Language Models
               for Improved Password Generation},
  booktitle = {{ARES} 2022: The 17th International Conference on Availability, Reliability
               and Security, Vienna, Austria, August 23 - 26, 2022},
  pages     = {37:1--37:6},
  publisher = {{ACM}},
  year      = {2022},
  url       = {https://doi.org/10.1145/3538969.3539000},
  doi       = {10.1145/3538969.3539000},
  timestamp = {Fri, 19 Aug 2022 10:16:29 +0200},
  biburl    = {https://dblp.org/rec/conf/IEEEares/BiesnerCS22.bib},
}
Contact: David Biesner david.biesner@iais.fraunhofer.de
Install the requirements and the package with:
pip install -r requirements.txt
pip install -e .
Or create a new conda environment:
conda env create -f environment.yml
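After creating the conda environment, activate it; the environment name is whatever the name: field in environment.yml specifies (the name below is a hypothetical placeholder):
conda activate password_generation  # hypothetical name, check environment.yml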
Use the download script to fetch password datasets, or use your own. The experiments in the paper require the rockyou dataset (~140 MB):
python scripts/download_raw_data.py --datasets rockyou --output data/
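As a quick sanity check you can count the downloaded passwords, assuming the script writes the list to data/rockyou.txt with one password per line (the exact output filename may differ):
wc -l data/rockyou.txt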
Use a config.yaml file and the training script to train a new model:
python scripts/train.py --config configs/vae_gpt2.yaml
You will need to adjust the config.yaml file for your system.
Update the input and output paths:
data_path: &DATA_PATH ~/password_generation/data/rockyou.txt
logging_dir: &LOGGING_DIR ~/password_generation/logging/
checkpoints_dir: &CHECKPOINTS_DIR ~/password_generation/checkpoints/
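The &NAME entries above are standard YAML anchors; later sections of the config reference them with *NAME aliases, as the logging section below does with *LOGGING_DIR. A minimal illustrative sketch (the dataset section and key names here are hypothetical, not necessarily the repo's actual schema):
dataset:
  data_path: *DATA_PATH  # resolves to the value anchored above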
Disable wandb logging to use only tensorboard (set use_wandb to False):
logging:
  use_wandb: False
  logging_dir: *LOGGING_DIR
Set in_memory to load the entire dataset into memory before training, and use_cache to cache the tokenization process:
in_memory: &IN_MEMORY True
use_cache: &USE_CACHE True
Cached password tokens are stored in the same directory as the raw password .txt file (e.g. next to rockyou.txt).
To generate data from a model checkpoint, use the generation script:
python scripts/generate.py -m checkpoints/vae_gpt2/run_timestamp/model.pth --config configs/vae_gpt2.yaml --num-passwords 100000 --batch-size 1000
The model checkpoint must match the model definition in the config.yaml file!
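For a quick look at the generated output, a minimal sketch like the following works, assuming the generation script writes one password per line to a plain-text file (the filename here is a hypothetical placeholder):

from collections import Counter

# Read the generated passwords, assuming one password per line.
with open("generated_passwords.txt", encoding="utf-8", errors="replace") as f:
    passwords = [line.rstrip("\n") for line in f]

# Report how many generated passwords are unique.
unique = set(passwords)
print(f"{len(passwords)} generated, {len(unique)} unique")

# Show the ten most frequently generated passwords.
for pw, count in Counter(passwords).most_common(10):
    print(f"{count:6d}  {pw}")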
Due to Git LFS quotas, we cannot upload the pretrained model to this repository. Email us for access to the model checkpoint files.