
evaluation script vs evaluation during training #286

Open

mhoibo opened this issue Nov 20, 2024 · 0 comments

mhoibo commented Nov 20, 2024

Hi, thank you for your great work. I have been using CLAM for a classification task. During training the validation loss decreases and the final accuracies are fairly good, but when I evaluate the trained model with eval.py (on the same validation splits), I get completely different AUC and accuracy values. Do you know why that could be?

The summary file in the results directory, the TensorBoard logs, and the .pkl files created during training all agree with each other, but when I reload the model checkpoints with the eval.py script I get very different results on the same data.
For training and eval.py I used commands like the following (example):

CUDA_VISIBLE_DEVICES=1 python main.py --drop_out 0.25 --max_epochs 100 --early_stopping --lr 2e-4 --k 10 --exp_code <model name> --results_dir <path to save> --weighted_sample --bag_loss ce --inst_loss svm --task <task> --model_type clam_sb --log_data --data_root_dir <path to features> --embed_dim 1024

python eval.py --k 10 --models_exp_code <model name> --save_exp_code <path to save eval results> --task <task> --model_type clam_sb --results_dir <path to training results> --data_root_dir <path to features> --embed_dim 1024 --drop_out 0.25 --model_size small --split val
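
In case it is useful, here is a minimal sketch (plain Python, not part of CLAM) of how I compared the flags passed to the two scripts; the dictionaries simply mirror the two commands above, and any flag that differs between them, or that silently falls back to a different default in one script, would be a candidate explanation for the mismatch.

```python
# Minimal sketch: diff the flags given to main.py vs. eval.py.
# The dictionaries below just mirror the two commands above; they are
# illustrative and are not read from CLAM itself.
train_args = {
    "drop_out": 0.25, "model_type": "clam_sb", "model_size": "small",
    "embed_dim": 1024, "k": 10,
}
eval_args = {
    "drop_out": 0.25, "model_type": "clam_sb", "model_size": "small",
    "embed_dim": 1024, "k": 10, "split": "val",
}

for key in sorted(set(train_args) | set(eval_args)):
    t = train_args.get(key, "<script default>")
    e = eval_args.get(key, "<script default>")
    if t != e:
        print(f"possible mismatch: {key}: train={t} eval={e}")
```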

I would appreciate any pointers on how to solve or understand this.

I checked that the splits are the same. I also verified that during training the checkpoint files are not overwritten at every epoch, so the best model (selected by early stopping) should be what is saved in the checkpoints, and that same checkpoint should then be used both for the evaluation at the end of training and by eval.py. Is that correct?
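
If it helps, here is a small sanity-check sketch (plain PyTorch, not CLAM code) for inspecting the checkpoint that eval.py is expected to reload; the "s_0_checkpoint.pt" filename is just what my fold-0 run produced and may need adjusting for other setups.

```python
import torch

# Sanity check: inspect the checkpoint that eval.py should be reloading.
# The path pattern below matches my own results directory (fold 0) and is
# only an example.
ckpt_path = "<path to training results>/s_0_checkpoint.pt"

state_dict = torch.load(ckpt_path, map_location="cpu")
total_params = sum(v.numel() for v in state_dict.values())
print(f"{len(state_dict)} tensors, {total_params} parameters")

# Printing a few tensor shapes and norms makes it easy to compare the
# checkpoint used for the in-training evaluation with the one eval.py loads.
for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape), float(tensor.float().norm()))
```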
