Loss sometimes goes to nan even with the gradient clipping #2
Comments
Not sure if it's related, but try softmax(xxxx + eps).
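One way to read this tip is to add a small epsilon so that a downstream `log` never sees an exact zero probability, combined with the usual max-subtraction trick inside the softmax itself. This is a hedged sketch of that interpretation; the `EPS` value and the helper names are my own, since the comment doesn't specify them:

```python
import numpy as np

EPS = 1e-8  # hypothetical small constant; the comment above doesn't give a value


def stable_softmax(x):
    """Softmax with max-subtraction so exp() cannot overflow to inf."""
    z = x - np.max(x, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)


def safe_log_prob(x):
    """Add EPS before log so log(0) never yields -inf (and then NaN in gradients)."""
    return np.log(stable_softmax(x) + EPS)
```

With extreme logits like `[1e3, 0, -1e3]`, a naive `exp(x) / sum(exp(x))` overflows to `inf/inf = NaN`, while the version above stays finite.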
@jli05 Thanks! I'll try it. I could only learn NTM with
@carpedm20 in my NTM implementation (and in a couple of others I saw out there)
I think there was a third case but I don't remember right now. Good luck debugging! :D
@EderSantana could you explain what a negative sharpening value would mean? Thanks
Having a negative sharpening value wouldn't make a real number become imaginary, since a^(-b) = 1/(a^b). But in the paper Graves explicitly states that the sharpening value is >= 1, so softplus(gamma) + 1 works fine.
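The constraint above can be sketched directly: parameterize gamma as `1 + softplus(raw)`, which guarantees gamma >= 1 for any real input, then apply the NTM sharpening `w_i^gamma / sum_j w_j^gamma`. This is a minimal NumPy illustration, not the repo's actual code:

```python
import numpy as np


def softplus(x):
    # numerically stable log(1 + exp(x))
    return np.logaddexp(0.0, x)


def sharpen(w, raw_gamma):
    """NTM-style sharpening of an attention weighting w.

    gamma = 1 + softplus(raw_gamma) >= 1, so w ** gamma is always
    well-defined for the non-negative weights w, even when the raw
    (unconstrained) network output is very negative.
    """
    gamma = 1.0 + softplus(raw_gamma)
    w_pow = w ** gamma
    return w_pow / np.sum(w_pow)
```

Since gamma >= 1, sharpening can only concentrate the weighting: the largest weight's share grows as the raw parameter increases.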
I haven't figured out why yet, and any advice on this is welcome!
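Since the title notes that the loss goes to NaN even *with* gradient clipping, it's worth spelling out why: clipping by global norm (the idea behind `tf.clip_by_global_norm`) only rescales finite gradients, and NaN times any scale is still NaN, so clipping can't repair gradients that are already NaN from a `log(0)` or overflowed softmax. A minimal NumPy sketch of the clipping operation, with my own function name:

```python
import numpy as np


def clip_by_global_norm(grads, clip_norm):
    """Rescale a list of gradient arrays so their joint L2 norm is
    at most clip_norm (same idea as tf.clip_by_global_norm)."""
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    # scale is 1.0 when global_norm <= clip_norm, else clip_norm / global_norm
    scale = clip_norm / max(global_norm, clip_norm)
    return [g * scale for g in grads], global_norm
```

If any gradient entry is NaN, `global_norm` is NaN and the "clipped" gradients stay NaN, which is why the numerical fixes discussed above are needed in addition to clipping.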