RNN Models #20
Comments
BPTT is far more widely used than RTRL, so it might be better to prioritize BPTT over RTRL.
BPTT doesn't solve the stateful RNN problem, unfortunately. The only way to have an online RNN is to use RTRL, because each step uses the full Jacobian product from the previous time step, which allows for smooth information flow.
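To make the RTRL recursion concrete, here is a minimal sketch in plain Python (not ArrayFire code), assuming a scalar Elman cell `h_t = tanh(w * h_{t-1} + u * x_t)`. The function name `rtrl_scalar_rnn` and the parameters `w`, `u`, and `h0` are hypothetical and chosen only for illustration. In the scalar case the "full Jacobian" collapses to two numbers, `dh_dw` and `dh_du`, which are carried forward at every step, so no past inputs need to be stored.

```python
import math


def rtrl_scalar_rnn(xs, w, u, h0=0.0):
    """Run h_t = tanh(w * h_{t-1} + u * x_t) over xs.

    Returns the final hidden state together with dh/dw and dh/du,
    both computed online with the RTRL recursion.
    """
    h = h0
    dh_dw = 0.0  # sensitivity of the hidden state to w (h0 is a constant)
    dh_du = 0.0  # sensitivity of the hidden state to u
    for x in xs:
        h_prev = h
        h = math.tanh(w * h_prev + u * x)
        local = 1.0 - h * h  # derivative of tanh at the pre-activation
        # RTRL step: direct dependence on the weight, plus the previous
        # sensitivity propagated through the recurrent connection.
        dh_dw = local * (h_prev + w * dh_dw)
        dh_du = local * (x + w * dh_du)
    return h, dh_dw, dh_du
```

For a vector-valued hidden state the same recursion applies, but the carried sensitivity becomes a (hidden size) x (number of parameters) matrix, which is the main cost of RTRL compared with BPTT.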
Pretty much all other RNN implementations are slow and not suitable for production. My own reason for betting on ArrayFire is that it might yield production-ready implementations of deep learning algorithms.
I will be doing an internship until September and most likely won't have time to update this until then. If someone wants to implement these first, that would be great, now that we have AD set up.
Hi, I would be interested in either LSTM or GRU. The forward pass would be a good first step before implementing the backward pass and training.
@jramapuram I am going to try to implement this. Maybe you and @WilliamTambellini can review it once I send a PR.
Hi @pavanky
I suggest a simple char-rnn type problem.
@jramapuram @WilliamTambellini If you have specific examples in mind, please let me know. Preferably ones already implemented as an example in another ML toolkit :)
@jramapuram @WilliamTambellini I think I am going to target this example as a first step: https://github.com/pytorch/examples/tree/master/word_language_model
Hi @pavanky, that sounds very good: the Penn db is quite small (about 5M) and training time shouldn't be long. Perfect for an example. Have you decided between Elman, GRU, and LSTM?
@WilliamTambellini Will start with plain (Elman) RNNs first.
Once we have implementations of the Layer class (#17), the Optimizer class, and the DataSet class, we can go about creating RNN flavors. There are 3 models that should be implemented:
These will require implementing both their forward-propagation values and their derivatives.
Certain details to consider:
To enable the above two methods of learning we should consider inheriting from Layer and implementing a Recurrent Layer.
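As a rough illustration of what such a Recurrent layer could look like, here is a minimal sketch in plain Python rather than ArrayFire code. Everything in it is an assumption: the `Layer` base class is only a stand-in for the one proposed in #17, and the names `Recurrent`, `reset_state`, `w_xh`, `w_hh`, and `b_h` are hypothetical. It implements only the forward step of a plain (Elman) cell and keeps the hidden state between calls, which is the part any training method would build on.

```python
import math


class Layer:
    """Hypothetical base class; a stand-in for the Layer proposed in #17."""

    def forward(self, x):
        raise NotImplementedError


class Recurrent(Layer):
    """Elman-style layer: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)."""

    def __init__(self, w_xh, w_hh, b_h):
        self.w_xh = w_xh  # list of rows, shape (hidden, input)
        self.w_hh = w_hh  # list of rows, shape (hidden, hidden)
        self.b_h = b_h    # list, length hidden
        self.reset_state()

    def reset_state(self):
        # Start a new sequence with a zero hidden state.
        self.h = [0.0] * len(self.b_h)

    def forward(self, x):
        # One time step: combine the input with the previous hidden state.
        new_h = []
        for i in range(len(self.b_h)):
            s = self.b_h[i]
            s += sum(self.w_xh[i][j] * x[j] for j in range(len(x)))
            s += sum(self.w_hh[i][k] * self.h[k] for k in range(len(self.h)))
            new_h.append(math.tanh(s))
        self.h = new_h  # the state persists until reset_state() is called
        return new_h
```

Calling `forward` once per time step and `reset_state` between sequences keeps the layer stateful; LSTM and GRU variants could follow the same interface with a richer state.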