Loss sometimes goes to nan even with the gradient clipping #2
Comments
Not sure if it's related, but try softmax(xxxx + eps).
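One way to read this tip is to add a small epsilon so that a downstream `log` never sees an exact zero probability, combined with the usual max-subtraction trick inside the softmax itself. This is a hedged sketch of that interpretation; the `EPS` value and the helper names are my own, since the comment doesn't specify them:

```python
import numpy as np

EPS = 1e-8  # hypothetical small constant; the comment above doesn't give a value


def stable_softmax(x):
    """Softmax with max-subtraction so exp() cannot overflow to inf."""
    z = x - np.max(x, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)


def safe_log_prob(x):
    """Add EPS before log so log(0) never yields -inf (and then NaN in gradients)."""
    return np.log(stable_softmax(x) + EPS)
```

With extreme logits like `[1e3, 0, -1e3]`, a naive `exp(x) / sum(exp(x))` overflows to `inf/inf = NaN`, while the version above stays finite.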
@jli05 Thanks! I'll try it. I could only learn NTM with
@carpedm20 in my NTM implementation (and in a couple of others I saw out there)
I think there was a third case but I don't remember right now. Good luck debugging! :D
@EderSantana could you explain what a negative sharpening value would mean? Thanks
Having a negative sharpening value wouldn't make a real number become imaginary, since a^(-b) = 1/(a^b). But in the paper Graves explicitly states that the sharpening value is >= 1, so softplus(gamma) + 1 works fine.
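The constraint above can be sketched directly: parameterize gamma as `1 + softplus(raw)`, which guarantees gamma >= 1 for any real input, then apply the NTM sharpening `w_i^gamma / sum_j w_j^gamma`. This is a minimal NumPy illustration, not the repo's actual code:

```python
import numpy as np


def softplus(x):
    # numerically stable log(1 + exp(x))
    return np.logaddexp(0.0, x)


def sharpen(w, raw_gamma):
    """NTM-style sharpening of an attention weighting w.

    gamma = 1 + softplus(raw_gamma) >= 1, so w ** gamma is always
    well-defined for the non-negative weights w, even when the raw
    (unconstrained) network output is very negative.
    """
    gamma = 1.0 + softplus(raw_gamma)
    w_pow = w ** gamma
    return w_pow / np.sum(w_pow)
```

Since gamma >= 1, sharpening can only concentrate the weighting: the largest weight's share grows as the raw parameter increases.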
I haven't figured out why yet, and any advice on this is welcome!
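Since the title notes that the loss goes to NaN even *with* gradient clipping, it's worth spelling out why: clipping by global norm (the idea behind `tf.clip_by_global_norm`) only rescales finite gradients, and NaN times any scale is still NaN, so clipping can't repair gradients that are already NaN from a `log(0)` or overflowed softmax. A minimal NumPy sketch of the clipping operation, with my own function name:

```python
import numpy as np


def clip_by_global_norm(grads, clip_norm):
    """Rescale a list of gradient arrays so their joint L2 norm is
    at most clip_norm (same idea as tf.clip_by_global_norm)."""
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    # scale is 1.0 when global_norm <= clip_norm, else clip_norm / global_norm
    scale = clip_norm / max(global_norm, clip_norm)
    return [g * scale for g in grads], global_norm
```

If any gradient entry is NaN, `global_norm` is NaN and the "clipped" gradients stay NaN, which is why the numerical fixes discussed above are needed in addition to clipping.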