Sparse net leaf/predict #12
Thanks for your question @yvdriess. If I understand correctly, you still want a dense forward-prop for your loss, but the gradient that needs to be backpropagated is sparse, because it only involves a subset of the output neurons (presumably on the coordinates matching what's in your SparseF[] labels).
Thanks for the response. I bit the bullet, dove in, and indeed found the SparseF[] grad overload for the loss functions. That takes care of (a), but I am still not sure how to do (b) without implementing a SparseLinearLayer or something similar myself. The basic idea is this:
Optimizing for the above case by making a sparse linear layer that takes an extra coordinates argument and produces SparseF[] output would decrease the amount of memory touched. (The compute is probably not a factor here.) I guess the real question is: is float[] network output baked in everywhere, or should it be relatively straightforward for me to make a SparseLinearLayer that produces a SparseF[] network output?
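To make the idea concrete, here is a minimal sketch of such a layer's forward pass that only computes the requested output coordinates. None of this is existing vectorflow code: SparseF is redefined locally to mirror vectorflow's (index, value) pair, and sparseForward is a hypothetical name.

```d
import std.stdio;

// Mirrors vectorflow's sparse pair: an (index, value) entry.
struct SparseF { uint id; float val; }

// Hypothetical forward pass of a "SparseLinearLayer": given the dense
// activations of the previous layer and the output coordinates we actually
// need (the non-zero label indices), compute only those dot products and
// return them as a sparse vector. W is row-major, one row per output neuron.
SparseF[] sparseForward(const float[][] W, const float[] prev, const uint[] coords)
{
    auto outp = new SparseF[](coords.length);
    foreach (k, c; coords)
    {
        float acc = 0;
        foreach (j, x; prev)
            acc += W[c][j] * x;
        outp[k] = SparseF(c, acc);
    }
    return outp;
}

void main()
{
    // Toy example: 4 outputs, 3 inputs, but only outputs 1 and 3 are needed.
    float[][] W = [[0.1f, 0.2f, 0.3f],
                   [0.4f, 0.5f, 0.6f],
                   [0.7f, 0.8f, 0.9f],
                   [1.0f, 1.1f, 1.2f]];
    float[] prev = [1.0f, 2.0f, 3.0f];
    writeln(sparseForward(W, prev, [1u, 3u]));
}
```

The point of the coordinates argument is that the work and the memory touched scale with the number of labeled outputs rather than with the full width of the output layer.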
If I may resurrect this issue: I just noticed, while adding a SparseLinear layer to Vectorflow, that only the first layer can be sparse (see vectorflow/src/vectorflow/layers.d, line 119 at aed9977).
Is this work in progress, or should I provide an implementation?
I have a multi-target workload with very sparse labels. In other words, a single observable/sample is not a (bool label, SparseF[] features) tuple, but rather a (SparseF[] labels, SparseF[] features) one.
During the learn/optimize steps, the predict run currently appears to produce a dense vector of predictions (e.g. a 65k-element vector), but only a few elements (e.g. 200) of that prediction vector will have a label and contribute to the gradient.
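For concreteness, a tiny sketch of the observation shape described above, assuming SparseF is vectorflow's (index, value) pair type; the Sample alias and the numbers are purely illustrative.

```d
import std.typecons : Tuple;

// vectorflow-style sparse pair: an (index, value) entry.
struct SparseF { uint id; float val; }

// One observation in this workload: sparse multi-target labels plus sparse
// features, instead of the usual (bool label, SparseF[] features).
alias Sample = Tuple!(SparseF[], "labels", SparseF[], "features");

void main()
{
    Sample s;
    // e.g. ~200 labeled coordinates out of a 65k-wide output layer
    s.labels   = [SparseF(17, 1.0f), SparseF(42_031, 0.0f)];
    s.features = [SparseF(3, 1.0f), SparseF(9, 2.5f)];
}
```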
a) Is there a way in vectorflow to deal with this multi-target learning scenario? I suppose I should make a loss function that scatters the non-zero gradients into a zero vector of the predict-vector size? (A sketch of this scatter idea follows after (b) below.)
b) While I can see myself doing (a), it is rather wasteful, since it means producing a large vector of predictions of which only a handful are used to produce the gradient.
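A minimal sketch of the scatter idea from (a), assuming the loss sees a dense float[] prediction of the full output size and SparseF[] labels. scatterGrad and the squared-error gradient are stand-ins, not vectorflow's loss API.

```d
import std.stdio;

struct SparseF { uint id; float val; }

// Approach (a): build a dense gradient the size of the predict vector and
// scatter each labeled coordinate's contribution into it; everything else
// stays zero. A squared-error gradient (pred - label) is used as a stand-in.
float[] scatterGrad(const float[] pred, const SparseF[] labels)
{
    auto grad = new float[](pred.length);
    grad[] = 0;                     // float.init is NaN in D, so zero explicitly
    foreach (l; labels)
        grad[l.id] = pred[l.id] - l.val;
    return grad;
}

void main()
{
    float[] pred = [0.2f, 0.9f, 0.4f, 0.7f];
    auto grad = scatterGrad(pred, [SparseF(1, 1.0f), SparseF(3, 0.0f)]);
    writeln(grad);                  // ≈ [0, -0.1, 0, 0.7]
}
```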
I implemented (b) in TensorFlow by gathering columns of the last-layer weight matrix, where the gathered columns match the labels' non-zero coordinates (viz. the target indices). For Vectorflow, it probably makes more sense to have a SparseData() leaf/output layer. It appears that float[] dense output is currently hardcoded in the learning interface. The loss function interface could probably be kept the same, as it should not care about the actual target index, but there needs to be some logic that produces a float[] from the SparseF[] output before passing it to the loss function, and then transforms the dense gradient vector back into a sparse gradient for back-prop. A SparseLinear layer, perhaps? (dense previous-layer vector + SparseData input, SparseData output)
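A rough sketch of that densify-for-the-loss / re-sparsify-for-backprop plumbing; densify and sparsifyGrad are hypothetical names, not existing vectorflow API, and SparseF is again redefined locally.

```d
// Hypothetical glue around an unmodified float[]-based loss: densify the
// sparse layer output so the loss can consume it, then keep only the
// gradient entries at the touched coordinates for the sparse backward pass.
struct SparseF { uint id; float val; }

float[] densify(const SparseF[] sparseOut, size_t dim)
{
    auto dense = new float[](dim);
    dense[] = 0;
    foreach (s; sparseOut)
        dense[s.id] = s.val;
    return dense;
}

SparseF[] sparsifyGrad(const float[] denseGrad, const SparseF[] sparseOut)
{
    auto g = new SparseF[](sparseOut.length);
    foreach (k, s; sparseOut)
        g[k] = SparseF(s.id, denseGrad[s.id]);
    return g;
}
```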