
replicating results of leaderboard #8

Open
husseinmozannar opened this issue Mar 17, 2020 · 5 comments

Comments

@husseinmozannar

I've been trying to replicate the results of your leaderboard, but I found a number of things confusing (based on the "medium" data in the linked colab):

  1. The leaderboard is based on the "realworld" level, but the colab is based on the "medium" level. Do you have ready medium results?
  2. Using a VGG-16 model (the one found in mc_dropout/model) and training it, I got the results below:

For deterministic:
[figure: accuracy plot, deterministic shown in pink]

And for MC dropout:
[figure: accuracy plot]
[figure: numbers, first column MC dropout, second deterministic]

In your paper, MC dropout outperformed the deterministic approach by quite a bit, but I didn't expect the deterministic approach to perform this badly. These results seem a bit more sensible, but not to this extent. Can you find the reason for this discrepancy?

  3. AUC results behave strangely; for MC dropout:

[figure: AUC plot for MC dropout]

Here is a colab to replicate the above. I also recommend updating your linked colab with the proper required packages, as in its current form it does not run.
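For clarity, the two prediction modes being compared are a single forward pass with dropout off versus averaging several stochastic passes with dropout kept on. A minimal sketch of that difference (the model and names here are illustrative, not the exact repo code):

```python
import numpy as np

def deterministic_predict(model, x):
    # Single forward pass with dropout disabled (standard Keras inference).
    return model(x, training=False).numpy()

def mc_dropout_predict(model, x, num_samples=20):
    # Average `num_samples` stochastic forward passes with dropout kept active;
    # the spread across samples gives a simple per-class uncertainty estimate.
    samples = np.stack([model(x, training=True).numpy()
                        for _ in range(num_samples)], axis=0)
    return samples.mean(axis=0), samples.std(axis=0)
```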

@maryam2013

Dear Hussein,
The code you have already put at https://colab.research.google.com/drive/1eyRquycs6PFNoCTJVLcPym8g8ptddci8 works well.
I was wondering whether you changed the code of "Bayesian Deep Learning Benchmarks" at https://github.com/OATML/bdl-benchmarks or not.
If you have modified any part of the bdl-benchmarks code, I would be grateful if you could explain the changes.
For instance, you added a part, namely VGG-16:
https://colab.research.google.com/drive/1eyRquycs6PFNoCTJVLcPym8g8ptddci8#scrollTo=5JZupfWHkCIO
I would like to know how you created this VGG-16 architecture. In other words, which Bayesian method did you use to build it?
I am really looking forward to your answer.
Thank you in advance.
Best
Maryam

@husseinmozannar
Author

Here is a small update: I wrote my own evaluation code again to compare deterministic and MC dropout on the medium dataset. Here are the results:
[figure: accuracy vs. coverage for deterministic and MC dropout]
Here coverage is the percentage of data points predicted on; there is a maximum difference of 1 accuracy point. This is again with the VGG-16 model in the repo, and the AUC results show the same pattern.
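For reference, the coverage-based evaluation follows this pattern: rank test points by predictive uncertainty and compute accuracy on the most-certain fraction. A minimal sketch, assuming predictions and per-example uncertainties are already available as numpy arrays (names are illustrative, not the exact script):

```python
import numpy as np

def accuracy_at_coverage(y_true, y_pred, uncertainty, coverage):
    """Accuracy on the most-certain `coverage` fraction of the test points."""
    order = np.argsort(uncertainty)          # most certain predictions first
    n_keep = max(1, int(round(coverage * len(order))))
    kept = order[:n_keep]
    return np.mean(y_pred[kept] == y_true[kept])

# Example: accuracy when the 50% most uncertain points are referred to an expert.
# acc_at_half = accuracy_at_coverage(y_true, y_pred, uncertainty, coverage=0.5)
```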

@maryam2013

Dear Hussein,
I am grateful for your response.
But may I look at your evaluation code?
I want to know the difference between your code and this code: https://github.com/OATML/bdl-benchmarks/issues/url .

By the way, I wonder whether I can change the dataset in bdlb. I mean, I want to use bdlb with my own dataset instead of the "Diabetic Retinopathy Diagnosis" benchmark.

I am really looking forward to your answer.
Thank you in advance for answering my questions.

Maryam

@jarrodhaas

I spent a few dozen hours with this codebase. I was not able to replicate, or even come close to replicating, the results for either the real world or medium size datasets. FYI, there are a number of bugs in the code that have to be fixed to get things to work on the real world dataset. But even after these are fixed, it's still difficult, if not impossible, to replicate the results of the leaderboard.

I believe the authors are aware of this, and there was some talk of fixing the code to make things easier to replicate.

Cheers!

@maryam2013

Hi there,
I am confused about how to use
Deterministic
Monte Carlo Dropout
Mean-Field Variational Inference
Deep Ensembles
Ensemble MC Dropout
on my own datasets.
I have read this tutorial about implementing ensemble MC dropout:
https://www.depends-on-the-definition.com/model-uncertainty-in-deep-learning-with-monte-carlo-dropout
and have implemented it on my datasets, but I still have no idea how to apply the others, such as
Monte Carlo Dropout
Mean-Field Variational Inference
Deep Ensembles
Deterministic
on my datasets.
As I use Keras, I have not found any tutorial on applying the other uncertainty techniques.
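For context, here is a rough Keras sketch of how the simpler variants differ at prediction time (the classifier is generic and the names are illustrative, not from bdlb):

```python
import numpy as np
from tensorflow import keras

def build_model(input_dim, num_classes, dropout_rate=0.5):
    # Any classifier with Dropout layers can be reused for MC dropout.
    return keras.Sequential([
        keras.layers.Dense(128, activation="relu", input_shape=(input_dim,)),
        keras.layers.Dropout(dropout_rate),
        keras.layers.Dense(num_classes, activation="softmax"),
    ])

# Deterministic: one model, dropout off at prediction time.
def predict_deterministic(model, x):
    return model(x, training=False).numpy()

# Monte Carlo dropout: one model, dropout kept on, average T stochastic passes.
def predict_mc_dropout(model, x, T=20):
    return np.mean([model(x, training=True).numpy() for _ in range(T)], axis=0)

# Deep ensemble: several independently trained models, average their predictions.
def predict_ensemble(models, x):
    return np.mean([m(x, training=False).numpy() for m in models], axis=0)

# Ensemble MC dropout combines the two: average MC dropout predictions over the
# ensemble. Mean-field variational inference requires probabilistic layers (e.g.
# TensorFlow Probability's DenseFlipout) instead of plain Dense layers, so it is
# not sketched here.
```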

Thank you in advance for helping me.
