This repo provides the whole pizza for fine-tuning GPTBigCode models (e.g. StarCoder) on code generation tasks. It includes:
- Constant Length Dataset Loader
- Scaling laws for computing the correct number of training steps, given the number of GPUs, the effective batch size, and the number of epochs
- LoRA fine-tuning, with 8-bit and 4-bit quantization, including QLoRA (double quantization) support
- DeepSpeed support for fine-tuning large models
- Edu-score filtering to remove non-educational data
- Multi-language loss evaluation (using MultiPL-E evaluation datasets)
- Custom tokenizer injection
- Automatic mixed-precision quantization
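
The constant-length loader packs tokenized examples into fixed-size sequences so every batch has the same shape and no tokens are wasted on padding. A minimal sketch of the idea (the function name, the EOS separator convention, and the eager-yield buffering are illustrative assumptions, not this repo's exact implementation):

```python
from typing import Iterable, Iterator, List

def constant_length_chunks(token_streams: Iterable[List[int]],
                           seq_len: int,
                           eos_token_id: int) -> Iterator[List[int]]:
    """Concatenate tokenized examples, separated by an EOS token,
    and yield fixed-length chunks of `seq_len` tokens."""
    buffer: List[int] = []
    for tokens in token_streams:
        buffer.extend(tokens + [eos_token_id])
        # Emit full chunks as soon as the buffer can supply them.
        while len(buffer) >= seq_len:
            yield buffer[:seq_len]
            buffer = buffer[seq_len:]
    # Any trailing partial chunk is dropped (or could be padded).

samples = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
chunks = list(constant_length_chunks(samples, seq_len=4, eos_token_id=0))
print(chunks)  # → [[1, 2, 3, 0], [4, 5, 0, 6], [7, 8, 9, 0]]
```

Note that a single training sequence may span an example boundary; the EOS separator is what lets the model learn where one document ends and the next begins.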
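
The step computation mentioned above boils down to dividing the total number of samples seen over all epochs by the effective batch size. A hedged sketch (the helper name and argument set are assumptions; the repo's actual script may also account for sequence packing):

```python
import math

def training_steps(num_samples: int, num_epochs: int, num_gpus: int,
                   per_device_batch: int, grad_accum: int = 1) -> int:
    """Total optimizer steps: effective batch size is
    per-device batch x number of GPUs x gradient-accumulation steps."""
    effective_batch = per_device_batch * num_gpus * grad_accum
    return math.ceil(num_epochs * num_samples / effective_batch)

# e.g. 10k samples, 3 epochs, 8 GPUs, per-device batch 4, grad accumulation 8
print(training_steps(10_000, 3, 8, 4, 8))  # → 118
```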
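
For the QLoRA (double-quantization) path, the usual Hugging Face setup combines a 4-bit `BitsAndBytesConfig` with a PEFT `LoraConfig`. This is a configuration sketch under the assumption that the repo uses `transformers` + `peft` + `bitsandbytes`; the rank, alpha, and target modules shown are illustrative defaults, not values taken from this repo:

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization with double quantization (the "QLoRA" recipe).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA adapters on the attention projections (illustrative choices).
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
```

The quantized base model stays frozen while only the low-rank adapter weights are trained, which is what makes fine-tuning large GPTBigCode checkpoints feasible on a single GPU.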