Loss, MAE, RMSE of energy from training are higher than those of validation #396
-
Hi, I'm training a model of a multi-oxide system using NVT AIMD data at room temperature and at high temperature, 14000 frames in total. These are VASP OUTCAR data that I converted to extxyz. The loss terms are stress, energy, and force. I mostly used the example config for training with stress, and my config file includes the relevant loss and EMA settings, roughly as sketched below:
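(A simplified sketch following the layout of the example config; the coefficients shown here are illustrative rather than my exact values, and key names may differ slightly between NequIP versions.)

```yaml
# Loss terms on forces, per-atom energy, and stress,
# following the structure of the NequIP example config.
loss_coeffs:
  forces: 1.
  total_energy:
    - 1.
    - PerAtomMSELoss
  stress: 1.

# EMA settings, as in the example config.
use_ema: true
ema_decay: 0.99
```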
Now, in the metrics_epoch.csv from my current training, I'm seeing that the loss, RMSE, and MAE of energy from training are higher than those of validation. I don't think I'm using any dropout option (#66). Force and stress are fine: their training and validation curves are almost on top of each other and converging. The energy loss and errors are also converging with reasonable-looking curves, but the training values are higher than the validation ones. The generalization gap of the energy loss and error does appear to be shrinking, though. I tested without stress and with different hyperparameters, but the result is mostly the same: the training loss, RMSE, and MAE of energy are still higher than those of validation. I can't understand why this happens or how to prevent it. Is something wrong with my AIMD data?
Replies: 1 comment
-
Hi @turbosonics,
This is likely a result of our default use of EMA averaging for the model in validation and deployment (and not, of course, in training). See, for example, #329. If you want an apples-to-apples generalization gap, you can use nequip-evaluate to "validate" the EMA-averaged model (the one used for validation) on the training set.