This repo hosts the code for the paper "Asymmetry in Low-Rank Adapters of Foundation Models". We discover and analyze the asymmetry between the LoRA adapter matrices B and A.
Step 1: Make sure you have PyTorch installed.

```bash
pip3 install torch==1.13.0 torchvision
```

Step 2: Then install the rest of the required packages:

```bash
cd AsymmetryLoRA
pip install -r requirement.txt
```
Our LoRASYM module follows the structure of the `peft` module. Specifically, we provide a flexible interface to control the initialization of matrices A and B:

- `V` and `U`: Right and left singular matrices of the original weight matrix.
- `rand`: Initializes with a random orthonormal matrix.
- `he`: Uses `torch.nn.init.kaiming_uniform_` (He initialization) to draw values from a scaled uniform distribution.
- `zero`: Initializes with an all-zero matrix.

You can customize matrices A and B with these options, as summarized in the table below.
| Matrix | Options | Example | Explanation |
|---|---|---|---|
| A | `V`, `rand`, `he`, `zero` | `A_rand` | A is initialized as a random orthonormal matrix and is frozen during training. |
| B | `U`, `rand`, `he`, `zero` | `hB_zero` | B is initialized as zero and will be updated. |

Explanation: `A_rand_hB_zero` means A is initialized as a random orthonormal matrix and kept frozen, while B starts at zero and is updated during training.
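To make the options concrete, here is a minimal PyTorch sketch of how the different initializations for A and B could be produced. The function `init_factors` is illustrative only and not the repo's API; the actual logic lives in the LoRASYM module.

```python
# Minimal sketch of the initialization options, assuming a weight matrix
# W of shape (d_out, d_in) and LoRA rank r, with W ~ W0 + B @ A.
# This is not the repo's implementation, only an illustration.
import torch


def init_factors(W: torch.Tensor, r: int, a_init: str, b_init: str):
    d_out, d_in = W.shape
    A = torch.empty(r, d_in)   # A multiplies the input
    B = torch.empty(d_out, r)  # B maps back to the output dimension

    # "V"/"U": top-r right/left singular vectors of the original weight.
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)

    if a_init == "V":
        A = Vh[:r, :].clone()
    elif a_init == "rand":
        # Random orthonormal rows via QR decomposition.
        Q, _ = torch.linalg.qr(torch.randn(d_in, r))
        A = Q.T.contiguous()
    elif a_init == "he":
        torch.nn.init.kaiming_uniform_(A, a=5 ** 0.5)
    elif a_init == "zero":
        A.zero_()

    if b_init == "U":
        B = U[:, :r].clone()
    elif b_init == "rand":
        Q, _ = torch.linalg.qr(torch.randn(d_out, r))
        B = Q.contiguous()
    elif b_init == "he":
        torch.nn.init.kaiming_uniform_(B, a=5 ** 0.5)
    elif b_init == "zero":
        B.zero_()

    return A, B
```

For example, `init_factors(W, r=8, a_init="rand", b_init="zero")` would correspond to the `A_rand_hB_zero` setting described above.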
We provide a wrapper that is compatible with models from Hugging Face's transformers library. The following is an example of usage:
```python
from transformers import AutoModelForSequenceClassification

from LoRASYM_peft.local_peft_model_all import PeftModelForCausalLM_local
from LoRASYM_peft.local_lorasym_all import LoRASYMConfig

model = AutoModelForSequenceClassification.from_pretrained(
    model_args.model_name_or_path,
)

# Freeze A (random orthonormal init) and train only B (zero init).
update_rule_dict = {"update_A": False, "update_B": True,
                    "A_init": "rand", "B_init": "zero"}

lorasym_config = LoRASYMConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    modules_to_save=["classifier"],
    update_rule=update_rule_dict,
    task_type="SEQ_CLS",
)

lora_model = PeftModelForCausalLM_local(model, lorasym_config)
```
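As a quick sanity check (a sketch, assuming the wrapped model behaves like a standard `torch.nn.Module`; the exact parameter names depend on the wrapped model), you can list which tensors remain trainable. With `update_A=False` and `update_B=True`, only the B adapter matrices and the modules in `modules_to_save` (here the classifier head) should require gradients.

```python
# List trainable parameter tensors of the wrapped model.
trainable = [name for name, p in lora_model.named_parameters() if p.requires_grad]
print(f"{len(trainable)} trainable parameter tensors")
for name in trainable[:10]:
    print(name)
```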
Use the following command to fine-tune the RoBERTa-large model on tasks from the GLUE benchmark:

```bash
cd GPT_experiments
python -m run_glue_origin_ft --model_name_or_path roberta-large \
    --task_name rte \
    --ft_method LoRASYM \
    --bf16 True \
    --tf32 True \
    --do_train \
    --do_eval \
    --learning_rate 4e-4 \
    --num_train_epochs 20 \
    --input_seed 7 \
    --lora_svd_method A_rand_hB_zero \
    --lora_rank 8 \
    --lora_alpha 16 \
    --overwrite_output_dir
```
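If you want to sweep several GLUE tasks or seeds with the same settings, a small driver script along these lines can help. This is a convenience sketch, not part of the repo; the task and seed lists are arbitrary, and you should adjust the flags to your setup.

```python
# Convenience sketch (not part of the repo): run the same LoRASYM
# fine-tuning command over several GLUE tasks and seeds.
# Run this from the GPT_experiments directory.
import subprocess

TASKS = ["rte", "mrpc", "stsb"]  # example GLUE tasks
SEEDS = [7, 13, 42]              # example seeds

for task in TASKS:
    for seed in SEEDS:
        subprocess.run(
            [
                "python", "-m", "run_glue_origin_ft",
                "--model_name_or_path", "roberta-large",
                "--task_name", task,
                "--ft_method", "LoRASYM",
                "--bf16", "True",
                "--tf32", "True",
                "--do_train",
                "--do_eval",
                "--learning_rate", "4e-4",
                "--num_train_epochs", "20",
                "--input_seed", str(seed),
                "--lora_svd_method", "A_rand_hB_zero",
                "--lora_rank", "8",
                "--lora_alpha", "16",
                "--overwrite_output_dir",
            ],
            check=True,
        )
```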
If you have any questions about the code or the paper, feel free to email Jiacheng Zhu (zjc@mit.edu), or open an issue if you encounter any problems when using the code.
Please cite our paper if you find the repo helpful in your work:
```bibtex
@article{zhu2024asymmetry,
  title={Asymmetry in Low-Rank Adapters of Foundation Models},
  author={Jiacheng Zhu and Kristjan Greenewald and Kimia Nadjahi and Haitz Sáez de Ocáriz Borde and Rickard Brüel Gabrielsson and Leshem Choshen and Marzyeh Ghassemi and Mikhail Yurochkin and Justin Solomon},
  year={2024},
}
```