Skip to content

Latest commit

 

History

History
26 lines (21 loc) · 1.1 KB

to-llama2-format.md

File metadata and controls

26 lines (21 loc) · 1.1 KB

to-llama2-format

  • domain(s): pretrain
  • accepts: ldc.api.pretrain.PretrainData
  • generates: ldc.api.pretrain.PretrainData

Turns pretrain records into llama2 format. Based on: https://github.com/facebookresearch/llama/blob/ef351e9cd9496c579bf9f2bb036ef11bdc5ca3d2/llama/generation.py#L320

usage: to-llama2-format [-h] [-l {DEBUG,INFO,WARNING,ERROR,CRITICAL}]
                        [-N LOGGER_NAME] [--skip_tokens]

Turns pretrain records into llama2 format. Based on: https://github.com/facebo
okresearch/llama/blob/ef351e9cd9496c579bf9f2bb036ef11bdc5ca3d2/llama/generatio
n.py#L320

optional arguments:
  -h, --help            show this help message and exit
  -l {DEBUG,INFO,WARNING,ERROR,CRITICAL}, --logging_level {DEBUG,INFO,WARNING,ERROR,CRITICAL}
                        The logging level to use. (default: WARN)
  -N LOGGER_NAME, --logger_name LOGGER_NAME
                        The custom name to use for the logger, uses the plugin
                        name by default (default: None)
  --skip_tokens         Whether to leave out the [INST] [/INST] tokens
                        (default: False)