I am a beginner in deep learning, and I would like to know whether the gradients being 0 is caused by vanishing gradients or by my data being too small (batch_size=32).
I tried to add LoRA to a three-layer neural network, but only the gradients of the lora_A and lora_B matrices in the last layer were nonzero (though below 1e-2); the gradients of all the other layers were exactly 0.
My definition of the lora.Linear layers is as follows:
self.prednet_full1_lora = lora.Linear(self.prednet_input_len, self.prednet_len1, r=4)
self.prednet_full2_lora = lora.Linear(self.prednet_len1, self.prednet_len2, r=4)
self.prednet_full3_lora = lora.Linear(self.prednet_len2, 1, r=4)
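For context, these layers come from the loralib package; the surrounding setup would look roughly like the sketch below (the import alias and the freezing call are assumptions about the standard loralib workflow, not code shown above):

import loralib as lora

# Sketch: in the usual loralib workflow, everything except the
# lora_A / lora_B matrices is frozen after the model is built.
lora.mark_only_lora_as_trainable(net)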
The forward part of the model is shown below (assuming input_x is the input):
input_x = torch.sigmoid(self.prednet_full1_lora(input_x))
input_x = torch.sigmoid(self.prednet_full2_lora(input_x))
output = torch.sigmoid(self.prednet_full3_lora(input_x))
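Note that the sigmoid derivative is at most 0.25, so three stacked sigmoids already shrink upstream gradients substantially. One way to test whether that is the cause would be a diagnostic variant with ReLU on the hidden layers (a sketch for comparison only, not my actual model):

# Diagnostic sketch: swap the hidden sigmoids for ReLU, keep the
# final sigmoid, then compare the gradient magnitudes again.
input_x = torch.relu(self.prednet_full1_lora(input_x))
input_x = torch.relu(self.prednet_full2_lora(input_x))
output = torch.sigmoid(self.prednet_full3_lora(input_x))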
and I did not forget to call:
loss.backward()
optimizer.step()
net.apply_clipper()
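To tell exact zeros apart from merely tiny values, the per-parameter gradient norms can be printed right after loss.backward() with a loop like this (a sketch; the parameter names follow the loralib definitions above):

# Sketch: inspect the gradient magnitude of each LoRA matrix per layer.
for name, param in net.named_parameters():
    if 'lora_' in name:
        grad_norm = None if param.grad is None else param.grad.norm().item()
        print(name, grad_norm)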
I would greatly appreciate any ideas or solutions.