Reference
Chenghao MOU edited this page Oct 22, 2019
| Models | Size | Category |
|---|---|---|
| BERT base | 110 M | base |
| BERT large | 340 M | large |
| OpenAI GPT | 110 M | base |
| GPT-2 | 117 M | weird large |
| XLM | >= 295 M | super large |
| XLNet base | 110 M | base |
| XLNet large | 340 M | large |
| RoBERTa base | 125 M | base |
| RoBERTa large | 355 M | large |
| DistilBERT | 60 M | small |
Super large models do not fit on the P100s on HPC. "Weird large" models are base-sized models that consume memory like large ones.
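The parameter counts in the table can be sanity-checked from the published BERT-base hyper-parameters (30522-token vocabulary, 512 positions, hidden size 768, 12 layers, FFN size 3072). A minimal sketch, counting embeddings, per-layer attention/FFN/LayerNorm weights, and the pooler:

```python
def transformer_encoder_params(vocab, max_pos, hidden, layers, ffn, type_vocab=2):
    """Approximate parameter count for a BERT-style encoder."""
    # Embeddings: token + position + segment tables, plus one LayerNorm (gamma, beta).
    emb = (vocab + max_pos + type_vocab) * hidden + 2 * hidden
    # Per layer: Q/K/V/output projections (weights + biases).
    attn = 4 * (hidden * hidden + hidden)
    # Per layer: two FFN matrices (weights + biases).
    feedforward = (hidden * ffn + ffn) + (ffn * hidden + hidden)
    # Per layer: two LayerNorms (gamma, beta each).
    norms = 2 * 2 * hidden
    # Pooler: one dense layer on the [CLS] representation.
    pooler = hidden * hidden + hidden
    return emb + layers * (attn + feedforward + norms) + pooler

bert_base = transformer_encoder_params(30522, 512, 768, 12, 3072)
print(f"BERT-base ≈ {bert_base / 1e6:.1f} M parameters")  # ≈ 109.5 M
```

This lands within rounding distance of the 110 M figure above; the same function with large-model hyper-parameters (24 layers, hidden 1024, FFN 4096) approximates the 340 M entry.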
| Models | aNLI | hellaswag | piqa | siqa | Config Commit |
|---|---|---|---|---|---|
| BERT (bert-base-cased) | 63.32 | 37.83 | 65.29 | 60.33 | commit |
| BERT (bert-large-cased) | 66.28 | 43.84 | 68.67 | 65.00 | commit |
| RoBERTa (roberta-base) | 71.54 | 58.51 | 48.03 | 69.09 | commit |
| RoBERTa (roberta-large) | 84.39 | 82.42 | 76.96 | 77.12 | commit |
| XLNet (xlnet-base-cased) | 68.15 | 52.99 | 52.94 | 65.79 | commit |
| XLNet (xlnet-large-cased) | 80.16 | 80.38 | 69.27 | 75.23 | commit |
| GPT (openai-gpt) | 64.23 | 38.15 | 67.11 | 61.73 | commit |
| GPT-2 (gpt2) | 53.46 | 26.52 | 48.05 | 35.16 | commit |
| DistilBERT (distilbert-base-uncased) | 60.17 | 35.57 | 64.96 | 52.92 | commit |
With two P100s on HPC, fine-tuning a model takes roughly the following time.
| Tasks | Base Model (3 epochs) | Large Model (3 epochs) |
|---|---|---|
| aNLI | 1-2 hrs | ~7 hrs |
| hellaswag | 6-8 hrs | ~24 hrs |
| physicaliqa | ~1 hr | 3-4 hrs |
| socialiqa | ~1 hr | 4-5 hrs |
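A back-of-the-envelope memory estimate helps explain why the super large models are out of reach on a P100. Assuming fp32 training with Adam, each parameter needs roughly 16 bytes (4 for weights, 4 for gradients, 8 for the two moment estimates), before counting activations, which usually dominate. A sketch, assuming the 16 GB P100 variant and the parameter counts from the size table:

```python
P100_GB = 16  # assuming the 16 GB variant of the P100

# Parameter counts from the size table above.
models = {
    "bert-base":     110e6,
    "bert-large":    340e6,
    "roberta-large": 355e6,
    "xlm":           295e6,
}

BYTES_PER_PARAM = 16  # fp32 weights + gradients + Adam moments; activations not included
for name, params in models.items():
    gb = params * BYTES_PER_PARAM / 1024**3
    print(f"{name:14s} >= {gb:4.1f} GB of {P100_GB} GB before activations")
```

Large models already commit ~5 GB of the card to optimizer state alone, so activation memory for long sequences and reasonable batch sizes is what pushes XLM-scale runs over the edge.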