- internlm-7b
seq_len | nodes | zero stage | micro bs | global bs | time (s/iter) | tokens/s | mem (GiB) | checkpoint layers |
---|---|---|---|---|---|---|---|---|
2048 | 1 | 1 | 1 | 512 | 31.3 | 4187 | 68 | 0 |
4096 | 1 | 1 | 1 | 512 | 60.9 | 4304 | 71 | 0 |
8192 | 1 | 2 | 1 | 512 | 146.2 | 3586 | 79 | 5 |
- qwen-14b
seq_len | nodes | zero stage | micro bs | global bs | time (s/iter) | tokens/s | mem (GiB) | checkpoint layers |
---|---|---|---|---|---|---|---|---|
2048 | 1 | 2 | 1 | 512 | 62.8 | 2087 | 80 | 0 |
4096 | 1 | 2 | 1 | 512 | 141.9 | 1847 | 80 | 30 |
8192 | 1 | 2 | 1 | 512 | 314.5 | 1667 | 80 | all |
- llama-2-70b
seq_len | parallelism | pp partition method | checkpoint layers | micro bs | global bs | time (s/iter) | tokens/s | mem (GiB) |
---|---|---|---|---|---|---|---|---|
2048 | TP4 PP8 DP1 | parameters | none | 1 | 512 | 95.5 | 343 | 66 |
4096 | TP4 PP8 DP1 | parameters | none | 1 | 512 | 172.3 | 380 | 80 |
8192 | TP4 PP8 DP8 | parameters | 3 * pp | 1 | 1024 | 96 | 341 | 80 |
32k | TP8 PP8 DP2 | parameters | all | 1 | 512 | 607 | 216 | 78 |
64k | TP8 PP8 DP2 | parameters | all | 1 | 512 | 2377 | 110 | 80 |
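The throughput columns above are consistent with per-GPU tokens/s, i.e. `seq_len * global_bs / (iter_time * num_gpus)`. A minimal sanity check, assuming 8 GPUs per node for the single-node runs (the GPU count per node is not stated in the tables):

```python
# Per-GPU throughput from the benchmark table columns.
# Assumption: single-node runs use 8 GPUs; for the 70b runs, num_gpus = TP * PP * DP.

def tokens_per_sec_per_gpu(seq_len, global_bs, iter_time_s, num_gpus):
    """Tokens processed per second per GPU for one training iteration."""
    return seq_len * global_bs / (iter_time_s * num_gpus)

# internlm-7b, seq_len 2048, 1 node (assumed 8 GPUs): table reports 4187 tokens/s
print(round(tokens_per_sec_per_gpu(2048, 512, 31.3, 8)))   # ≈ 4188

# llama-2-70b, seq_len 2048, TP4 * PP8 * DP1 = 32 GPUs: table reports 343 tokens/s
print(round(tokens_per_sec_per_gpu(2048, 512, 95.5, 32)))  # ≈ 343
```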