Marian transformer configuration files #1

Open
ZJaume opened this issue Feb 14, 2023 · 0 comments
ZJaume commented Feb 14, 2023

These are mostly borrowed from the predefined aliases of Marian's --task option.

transformer-base

mini-batch-fit: True
shuffle-in-ram: true

after: 600000u
keep-best: True
save-freq: 5000
overwrite: True
disp-freq: 1000
disp-first: 10
quiet-translation: true
early-stopping: 10
early-stopping-on: first
valid-freq: 5000
valid-mini-batch: 64
valid-metrics:
    - chrf
    - ce-mean-words
    - bleu-detok

beam-size: 6
normalize: 1
exponential-smoothing: 0.0001
max-length: 200

cost-type: ce-mean-words
type: transformer
enc-depth: 6
dec-depth: 6
dim-emb: 512
transformer-heads: 8
transformer-dim-ffn: 2048
transformer-ffn-depth: 2
transformer-ffn-activation: swish
transformer-decoder-autoreg: self-attention

transformer-dropout: 0.1
label-smoothing: 0.1
layer-normalization: True

learn-rate: 0.0003
lr-warmup: 16000
lr-decay-inv-sqrt: 16000
lr-report: True
optimizer-params:
    - 0.9
    - 0.98
    - 1e-09
clip-norm: 0  # disable clip-norm because it's buggy
sync-sgd: true

transformer-big

mini-batch-fit: True
shuffle-in-ram: true

after: 600000u
keep-best: True
save-freq: 5000
overwrite: True
disp-freq: 1000
disp-first: 10
quiet-translation: true
early-stopping: 10
early-stopping-on: first
valid-freq: 5000
valid-mini-batch: 32
valid-metrics:
    - chrf
    - ce-mean-words
    - bleu-detok

beam-size: 6
normalize: 1.0
exponential-smoothing: 1e-4
max-length: 200

cost-type: ce-mean-words
type: transformer
enc-depth: 6
dec-depth: 6
dim-emb: 1024
transformer-heads: 16
transformer-dim-ffn: 4096
transformer-ffn-depth: 2
transformer-ffn-activation: swish
transformer-decoder-autoreg: self-attention

transformer-dropout: 0.1
label-smoothing: 0.1
layer-normalization: True

learn-rate: 0.0002
lr-warmup: 8000
lr-decay-inv-sqrt: 8000
lr-report: True
optimizer-params:
    - 0.9
    - 0.998
    - 1e-09
clip-norm: 0
sync-sgd: true

To get a large enough effective batch size depending on the GPUs used, my use cases have been (a sketch of the corresponding config lines follows the list):

  • transformer-base on 4x 12 GB GPUs: workspace 8000 and optimizer-delay 4.
  • transformer-base on 1x 12 GB GPU: workspace 8000 and optimizer-delay 8.
  • transformer-big on 4x 40 GB GPUs: workspace 30000 and optimizer-delay 2.
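
For the first case, the relevant config lines would roughly be the sketch below; the devices list is just an example and the comments reflect my reading of the options, so adjust to the GPUs actually available:

devices:
  - 0
  - 1
  - 2
  - 3
workspace: 8000      # pre-allocated working memory per GPU, in MB
optimizer-delay: 4   # accumulate gradients over 4 batches before each update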

The vocabulary part would be:

dim-vocabs:
  - 32000
  - 32000
vocabs:
  - vocab.spm # shared vocab
  - vocab.spm
tied-embeddings-all: true  # tie source embeddings with target and output embeddings

to share the SentencePiece vocabulary and the embeddings. This makes Marian very easy to use, since it only needs raw text as input and handles all the tokenization itself.
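
For concreteness, a minimal data/vocab section of a full config using this shared vocabulary might look like the sketch below; the corpus and model file names are placeholders rather than part of any particular recipe:

model: model.npz
train-sets:              # raw, untokenized parallel text
  - corpus.src
  - corpus.trg
valid-sets:
  - dev.src
  - dev.trg
vocabs:
  - vocab.spm            # shared SentencePiece vocab
  - vocab.spm
dim-vocabs:
  - 32000
  - 32000
tied-embeddings-all: true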

For language pairs that don't share a script:

dim-vocabs:
  - 32000
  - 32000
vocabs:
  - vocab.src.spm  # separate vocabs
  - vocab.trg.spm
tied-embeddings: true # tie only target and output embeddings

We might want to enable byte fallback for all the SentencePiece vocabularies to mitigate broken outputs when an unusual character comes in:

sentence-piece-options: '--byte_fallback'