Loss, MAE, RMSE of energy from training are higher than those of validation #396
-
Hi, I'm training a model of a multi-oxide system using NVT AIMD data at room temperature and at high temperature, 14000 frames in total. These are VASP OUTCAR data that I converted to extxyz. The loss terms are stress, energy, and force. I mostly used the example config for training with stress, and my config file includes the relevant loss and EMA settings, roughly as sketched below:
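(A simplified sketch following the layout of the example config; the coefficients shown here are illustrative rather than my exact values, and key names may differ slightly between NequIP versions.)

```yaml
# Loss terms on forces, per-atom energy, and stress,
# following the structure of the NequIP example config.
loss_coeffs:
  forces: 1.
  total_energy:
    - 1.
    - PerAtomMSELoss
  stress: 1.

# EMA settings, as in the example config.
use_ema: true
ema_decay: 0.99
```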
Now, in the metrics_epoch.csv from my current training, I'm seeing that the loss, RMSE, and MAE of energy from training are higher than those of validation. I don't think I'm using any dropout option (#66). Force and stress are fine: their training and validation curves are almost on top of each other and converging. The energy loss and errors are also converging with reasonable-looking curves, but the training values are higher than the validation ones. The generalization gap of the energy loss and error does appear to be shrinking, though. I tested without stress and with different hyperparameters, but the result is mostly the same: the training loss, RMSE, and MAE of energy are still higher than those of validation. I can't understand why this happens or how to prevent it. Is something wrong with my AIMD data?
Replies: 1 comment
-
Hi @turbosonics,
This is likely a result of our default use of EMA averaging for the model in validation and deployment (and not, of course, in training). See, for example, #329. If you want an apples-to-apples generalization gap, you can use nequip-evaluate to "validate" the EMA-averaged model (the one used for validation) on the training set.