
replicating results of leaderboard #8

Open
husseinmozannar opened this issue Mar 17, 2020 · 5 comments

Comments

@husseinmozannar

I've been trying to replicate the results of your leaderboard, but I found a number of things confusing (based on the "medium" data in the linked colab):

  1. The leaderboard is based on the "realworld" level, but the colab is based on the "medium" level. Do you have ready medium results?
  2. Using a VGG-16 model (the one found in mc_dropout/model) and training it, I got the results below:

For deterministic:
[figure: accuracy plot, deterministic shown in pink]

And for MC dropout:
[figure: accuracy plot]
[figure: numbers, first column MC dropout, second deterministic]

In your paper, MC dropout outperformed the deterministic approach by quite a bit, but I didn't expect the deterministic approach to perform this badly. These results seem a bit more sensible, but not to this extent. Can you find the reason for this discrepancy?

  3. AUC results behave strangely; for MC dropout:

[figure: AUC plot for MC dropout]

Here is a colab to replicate the above. I also recommend updating your linked colab with the proper required packages, as in its current form it does not run.
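For clarity, the two prediction modes being compared are a single forward pass with dropout off versus averaging several stochastic passes with dropout kept on. A minimal sketch of that difference (the model and names here are illustrative, not the exact repo code):

```python
import numpy as np

def deterministic_predict(model, x):
    # Single forward pass with dropout disabled (standard Keras inference).
    return model(x, training=False).numpy()

def mc_dropout_predict(model, x, num_samples=20):
    # Average `num_samples` stochastic forward passes with dropout kept active;
    # the spread across samples gives a simple per-class uncertainty estimate.
    samples = np.stack([model(x, training=True).numpy()
                        for _ in range(num_samples)], axis=0)
    return samples.mean(axis=0), samples.std(axis=0)
```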

@maryam2013

Dear Hussein,
The code you have already put at https://colab.research.google.com/drive/1eyRquycs6PFNoCTJVLcPym8g8ptddci8 works well.
I was wondering whether you changed the code of "Bayesian Deep Learning Benchmarks" at https://github.com/OATML/bdl-benchmarks or not.
If you have modified any part of the bdl-benchmarks code, I would be grateful if you could explain the changes.
For instance, you added a part, namely VGG-16:
https://colab.research.google.com/drive/1eyRquycs6PFNoCTJVLcPym8g8ptddci8#scrollTo=5JZupfWHkCIO
I would like to know how you created this VGG-16 architecture. In other words, which Bayesian method did you use to build it?
I am really looking forward to your answer.
Thank you in advance.
Best
Maryam

@husseinmozannar
Author

Here is a small update: I wrote my own evaluation code again to compare deterministic and MC dropout on the medium dataset. Here are the results:
[figure: accuracy vs. coverage for deterministic and MC dropout]
Here coverage is the percentage of data points predicted on; there is a maximum difference of 1 accuracy point. This is again with the VGG-16 model in the repo, and the AUC results show the same pattern.
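For reference, the coverage-based evaluation follows this pattern: rank test points by predictive uncertainty and compute accuracy on the most-certain fraction. A minimal sketch, assuming predictions and per-example uncertainties are already available as numpy arrays (names are illustrative, not the exact script):

```python
import numpy as np

def accuracy_at_coverage(y_true, y_pred, uncertainty, coverage):
    """Accuracy on the most-certain `coverage` fraction of the test points."""
    order = np.argsort(uncertainty)          # most certain predictions first
    n_keep = max(1, int(round(coverage * len(order))))
    kept = order[:n_keep]
    return np.mean(y_pred[kept] == y_true[kept])

# Example: accuracy when the 50% most uncertain points are referred to an expert.
# acc_at_half = accuracy_at_coverage(y_true, y_pred, uncertainty, coverage=0.5)
```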

@maryam2013

Dear Hussein,
I am grateful for your response.
But may I look at your evaluation code?
I want to know the difference between your code and this code: https://github.com/OATML/bdl-benchmarks/issues/url .

By the way, I wonder whether I can change the dataset in bdlb. I mean, I want to use bdlb with my own dataset instead of the "Diabetic Retinopathy Diagnosis" benchmark.

I am really looking forward to your answer.
Thank you in advance for answering my questions.

Maryam

@jarrodhaas

I spent a few dozen hours with this codebase. I was not able to replicate, or even come close to replicating, the results for either the real world or medium size datasets. FYI, there are a number of bugs in the code that have to be fixed to get things to work on the real world dataset. But even after these are fixed, it's still difficult, if not impossible, to replicate the results of the leaderboard.

I believe the authors are aware of this, and there was some talk of fixing the code to make things easier to replicate.

Cheers!

@maryam2013

Hi there,
I am confused about how to use
Deterministic
Monte Carlo Dropout
Mean-Field Variational Inference
Deep Ensembles
Ensemble MC Dropout
on my own datasets.
I have read this tutorial about implementing ensemble MC dropout:
https://www.depends-on-the-definition.com/model-uncertainty-in-deep-learning-with-monte-carlo-dropout
and have implemented it on my datasets, but I still have no idea how to apply the others, such as
Monte Carlo Dropout
Mean-Field Variational Inference
Deep Ensembles
Deterministic
on my datasets.
As I use Keras, I have not found any tutorial on applying the other uncertainty techniques.
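For context, here is a rough Keras sketch of how the simpler variants differ at prediction time (the classifier is generic and the names are illustrative, not from bdlb):

```python
import numpy as np
from tensorflow import keras

def build_model(input_dim, num_classes, dropout_rate=0.5):
    # Any classifier with Dropout layers can be reused for MC dropout.
    return keras.Sequential([
        keras.layers.Dense(128, activation="relu", input_shape=(input_dim,)),
        keras.layers.Dropout(dropout_rate),
        keras.layers.Dense(num_classes, activation="softmax"),
    ])

# Deterministic: one model, dropout off at prediction time.
def predict_deterministic(model, x):
    return model(x, training=False).numpy()

# Monte Carlo dropout: one model, dropout kept on, average T stochastic passes.
def predict_mc_dropout(model, x, T=20):
    return np.mean([model(x, training=True).numpy() for _ in range(T)], axis=0)

# Deep ensemble: several independently trained models, average their predictions.
def predict_ensemble(models, x):
    return np.mean([m(x, training=False).numpy() for m in models], axis=0)

# Ensemble MC dropout combines the two: average MC dropout predictions over the
# ensemble. Mean-field variational inference requires probabilistic layers (e.g.
# TensorFlow Probability's DenseFlipout) instead of plain Dense layers, so it is
# not sketched here.
```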

Thank you in advance for helping me.
