Merge remote-tracking branch 'upstream/main'
dchourasia committed Sep 27, 2024
Commit 4e878bf, 2 parents: 7f6d6be + 0c6a062
Showing 4 changed files with 22 additions and 4 deletions.
README.md (15 changes: 13 additions & 2 deletions)
@@ -647,10 +647,10 @@ The list of configurations for various `fms_acceleration` plugins:
- [quantized_lora_config](./tuning/config/acceleration_configs/quantized_lora_config.py): For quantized 4bit LoRA training
- `--auto_gptq`: 4bit GPTQ-LoRA with AutoGPTQ
- `--bnb_qlora`: 4bit QLoRA with bitsandbytes
- - [fused_ops_and_kernels](./tuning/config/acceleration_configs/fused_ops_and_kernels.py) (experimental):
+ - [fused_ops_and_kernels](./tuning/config/acceleration_configs/fused_ops_and_kernels.py):
- `--fused_lora`: fused lora for more efficient LoRA training.
- `--fast_kernels`: fast cross-entropy, rope, rms loss kernels.
- - [attention_and_distributed_packing](./tuning/config/acceleration_configs/attention_and_distributed_packing.py) (experimental):
+ - [attention_and_distributed_packing](./tuning/config/acceleration_configs/attention_and_distributed_packing.py):
- `--padding_free`: technique to process multiple examples in a single batch without adding padding tokens that waste compute.
- `--multipack`: technique for *multi-gpu training* that balances the number of tokens processed on each device, minimizing waiting time.

@@ -663,6 +663,7 @@ Notes:
- pass `--fast_kernels True True True` for full finetuning/LoRA
- pass `--fast_kernels True True True --auto_gptq triton_v2 --fused_lora auto_gptq True` for GPTQ-LoRA
- pass `--fast_kernels True True True --bitsandbytes nf4 --fused_lora bitsandbytes True` for QLoRA
- Note the list of supported models [here](https://github.com/foundation-model-stack/fms-acceleration/blob/main/plugins/fused-ops-and-kernels/README.md#supported-models).
* Notes on Padding Free
- works for both *single* and *multi-gpu*.
- works on both *pretokenized* and *untokenized* datasets.
@@ -671,6 +672,16 @@ Notes:
- works only for *multi-gpu*.
- currently only includes the version of *multipack* optimized for linear attention implementations like *flash-attn*.

Note: when passing the above flags via a JSON config, each flag expects its value to be a list (the entries may be of mixed types). For example:
```json
{
"fast_kernels": [true, true, true],
"padding_free": ["huggingface"],
"multipack": [16],
"auto_gptq": ["triton_v2"]
}
```
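
As an illustration of that list requirement, here is a minimal sketch (not part of this commit, and not the repository's actual parser) that loads the example config above and flattens it into the CLI form shown in the notes; the file name is illustrative:

```python
import json

# Load the example config shown above (the file name is illustrative)
with open("config.json") as f:
    cfg = json.load(f)

argv = []
for key, values in cfg.items():
    # Every acceleration flag expects a list value, even with a single entry
    if not isinstance(values, list):
        raise ValueError(f"'{key}' must be a list, got {type(values).__name__}")
    argv.append(f"--{key}")
    argv.extend(str(v) for v in values)

print(argv)
# ['--fast_kernels', 'True', 'True', 'True', '--padding_free', 'huggingface',
#  '--multipack', '16', '--auto_gptq', 'triton_v2']
```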

Set `TRANSFORMERS_VERBOSITY=info` to see the Hugging Face trainer printouts and verify that `AccelerationFramework` is activated!

build/Dockerfile (5 changes: 5 additions & 0 deletions)
@@ -137,9 +137,14 @@ RUN --mount=type=cache,target=/home/${USER}/.cache/pip,uid=${USER_UID} \
python -m pip install --user "$(head bdist_name)" && \
python -m pip install --user "$(head bdist_name)[flash-attn]"

# fms_acceleration_peft = PEFT-training, e.g., 4bit QLoRA
# fms_acceleration_foak = Fused LoRA and triton kernels
# fms_acceleration_aadp = Padding-Free Flash Attention Computation
RUN if [[ "${ENABLE_FMS_ACCELERATION}" == "true" ]]; then \
python -m pip install --user "$(head bdist_name)[fms-accel]"; \
python -m fms_acceleration.cli install fms_acceleration_peft; \
python -m fms_acceleration.cli install fms_acceleration_foak; \
python -m fms_acceleration.cli install fms_acceleration_aadp; \
fi

RUN if [[ "${ENABLE_AIM}" == "true" ]]; then \
build/accelerate_launch.py (2 changes: 2 additions & 0 deletions)
@@ -98,6 +98,8 @@ def main():
#
##########
output_dir = job_config.get("output_dir")
if not os.path.exists(output_dir):
os.makedirs(output_dir)
try:
# checkpoints are written to a tempdir; only the final checkpoint is copied to the output dir
launch_command(args)
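
A small aside on the directory creation added above: `os.makedirs` accepts an `exist_ok` flag that folds the existence check and the creation into one call, which also avoids a race if the directory appears between the check and the `makedirs` call. A minimal equivalent sketch:

```python
import os

output_dir = "/tmp/output"  # stand-in for job_config.get("output_dir")

# Equivalent to the check-then-create pattern above, without the race
os.makedirs(output_dir, exist_ok=True)
```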
tuning/config/acceleration_configs/acceleration_framework_config.py (4 changes: 2 additions & 2 deletions)
@@ -103,7 +103,7 @@ class AccelerationFrameworkConfig:
PaddingFree,
ConfigAnnotation(
path="training.attention",
-            experimental=True,
+            experimental=False,
required_packages=["aadp"],
),
] = None
@@ -112,7 +112,7 @@ class AccelerationFrameworkConfig:
MultiPack,
ConfigAnnotation(
path="training.dataloader",
-            experimental=True,
+            experimental=False,
required_packages=["aadp"],
),
] = None
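
For readers unfamiliar with the pattern in this file, these fields use `typing.Annotated` to attach plugin metadata to dataclass fields. A stripped-down sketch of the idea, where `ConfigAnnotation` is a stand-in for the repository's class, not its actual definition:

```python
from dataclasses import dataclass
from typing import Annotated, Optional, get_type_hints

# Stand-in for the repository's ConfigAnnotation (illustrative only)
@dataclass
class ConfigAnnotation:
    path: str
    experimental: bool = False
    required_packages: tuple = ()

@dataclass
class ExampleConfig:
    # Metadata travels with the field's type, as in the diff above
    padding_free: Annotated[
        Optional[dict],
        ConfigAnnotation(path="training.attention", required_packages=("aadp",)),
    ] = None

# The metadata can be recovered from the type hints at runtime
hints = get_type_hints(ExampleConfig, include_extras=True)
meta = hints["padding_free"].__metadata__[0]
print(meta.path, meta.experimental)  # -> training.attention False
```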
