
Gradient explosion when using img_feats #80

Open
Garibelhj opened this issue Oct 14, 2024 · 0 comments

@Garibelhj

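When training on the ScienceQA dataset with CLIP image features (img_type: clip), the gradients explode: grad_norm is logged as inf from the very first logging step, and the loss stays at about 29. Below is my run log.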
====Input Arguments====
{
"data_root": "data",
"output_dir": "experiments",
"model": "allenai/unifiedqa-t5-base",
"options": [
"A",
"B",
"C",
"D",
"E"
],
"epoch": 50,
"lr": 5e-05,
"bs": 4,
"input_len": 512,
"output_len": 512,
"eval_bs": 4,
"eval_acc": null,
"train_split": "train",
"val_split": "val",
"test_split": "test",
"use_generate": true,
"final_eval": false,
"user_msg": "rationale",
"img_type": "clip",
"eval_le": null,
"test_le": null,
"evaluate_dir": null,
"caption_file": "data/instruct_captions.json",
"use_caption": true,
"prompt_format": "QCM-E",
"seed": 42
}
img_features size: (11208, 49, 2048)
number of train problems: 12726

number of val problems: 4241

number of test problems: 4241

You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the legacy (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set legacy=False. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in huggingface/transformers#24565
[14:58:49] [Model]: Loading allenai/unifiedqa-t5-base... main.py:66
           [Data]: Reading data... main.py:67

experiments/rationale_allenai-unifiedqa-t5-base_clip_QCM-E_lr5e-05_bs4_op512_ep50
Some weights of T5ForMultimodalGeneration were not initialized from the model checkpoint at allenai/unifiedqa-t5-base and are newly initialized: ['encoder.gate_dense.bias', 'encoder.gate_dense.weight', 'encoder.image_dense.bias', 'encoder.image_dense.weight', 'encoder.mha_layer.in_proj_bias', 'encoder.mha_layer.in_proj_weight', 'encoder.mha_layer.out_proj.bias', 'encoder.mha_layer.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
model parameters: 228019968
{'loss': 29.2632, 'grad_norm': inf, 'learning_rate': 4.984286612193589e-05, 'epoch': 0.16}
{'loss': 29.2109, 'grad_norm': inf, 'learning_rate': 4.968573224387178e-05, 'epoch': 0.31}
1%|▉ | 1106/159100 [26:08<60:19:10, 1.37s/it]
{'loss': 29.2953, 'grad_norm': inf, 'learning_rate': 4.952859836580767e-05, 'epoch': 0.47}
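For reference, the grad_norm values above appear to be the total gradient norm reported by the training loop. Assuming it uses the Hugging Face Trainer's default gradient clipping (max_grad_norm=1.0, a global L2 norm in the style of torch.nn.utils.clip_grad_norm_), a single inf entry in any parameter's gradient is enough to make the logged norm inf and drive the clipping factor to 0. The plain-Python sketch below mimics that computation on toy lists. global_l2_norm, clip_coefficient and nonfinite_entries are hypothetical helpers, not PyTorch or transformers APIs.

```python
import math
from typing import Iterable, List, Sequence, Tuple


def global_l2_norm(grads: Iterable[Sequence[float]]) -> float:
    """Hypothetical helper: L2 norm over every gradient entry of every parameter."""
    total_sq = 0.0
    for g in grads:
        for x in g:
            total_sq += x * x  # inf * inf stays inf, so one bad entry poisons the sum
    return math.sqrt(total_sq)


def clip_coefficient(total_norm: float, max_norm: float = 1.0, eps: float = 1e-6) -> float:
    """Hypothetical helper: factor applied to all gradients, capped at 1.0 (no upscaling)."""
    return min(1.0, max_norm / (total_norm + eps))


def nonfinite_entries(grads: List[List[float]]) -> List[Tuple[int, int]]:
    """Hypothetical helper: (param_index, entry_index) of every NaN/inf gradient entry."""
    return [
        (i, j)
        for i, g in enumerate(grads)
        for j, x in enumerate(g)
        if not math.isfinite(x)
    ]


if __name__ == "__main__":
    healthy = [[3.0, -4.0], [12.0]]
    # Same toy gradients, but one entry has overflowed to inf.
    broken = [[3.0, -4.0], [float("inf")]]

    print(global_l2_norm(healthy))                   # 13.0
    print(global_l2_norm(broken))                    # inf
    print(clip_coefficient(global_l2_norm(broken)))  # 0.0
    print(nonfinite_entries(broken))                 # [(1, 0)]
```

The same idea could be applied to the real model by checking each named parameter's gradient after the backward pass, starting with the newly initialized encoder.image_dense, encoder.gate_dense and encoder.mha_layer weights listed in the log, to see where non-finite values first appear.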
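Since the problem appears when image features are used, one thing worth ruling out is non-finite or extremely large values in the loaded CLIP feature array (shape (11208, 49, 2048) in the log). Below is a minimal pure-Python sketch, assuming the features can be viewed as nested lists. iter_flat and feature_stats are hypothetical helpers, and the 1e3 cutoff is an arbitrary threshold chosen only for the toy data.

```python
import math
from typing import Any, Dict, Iterator


def iter_flat(nested: Any) -> Iterator[float]:
    """Hypothetical helper: yield every scalar from an arbitrarily nested list."""
    if isinstance(nested, (list, tuple)):
        for item in nested:
            yield from iter_flat(item)
    else:
        yield float(nested)


def feature_stats(nested: Any) -> Dict[str, float]:
    """Hypothetical helper: count non-finite values; report max |x| and mean of finite ones."""
    count = nonfinite = 0
    max_abs = total = 0.0
    for x in iter_flat(nested):
        count += 1
        if not math.isfinite(x):
            nonfinite += 1
            continue
        max_abs = max(max_abs, abs(x))
        total += x
    finite = count - nonfinite
    return {
        "count": count,
        "nonfinite": nonfinite,
        "max_abs": max_abs,
        "mean": total / finite if finite else float("nan"),
    }


if __name__ == "__main__":
    # Toy stand-in: 2 images x 2 patches x 3 channels (the real array is 11208 x 49 x 2048).
    toy = [
        [[0.5, -1.0, 2.0], [0.0, 1.5, -0.5]],
        [[1.0e6, 0.25, float("nan")], [0.75, -0.25, 1.0]],
    ]
    stats = feature_stats(toy)
    print(stats)
    if stats["nonfinite"] or stats["max_abs"] > 1e3:  # arbitrary illustrative threshold
        print("features contain NaN/inf or unusually large values")
```

If the features are already a NumPy array, np.isfinite(feats).all() and np.abs(feats).max() give the same checks in vectorized form.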
