adding optimizations #6

hi there!
I'm running DGD on my own data (not from the tutorials), and I'm getting a really high loss at the end of the two optimizations:
INFO:bulkDGD.core.model:Epoch 50: loss 14.552, epoch CPU time 0.780 s, backward step CPU time 0.676 s, epoch wall clock time 0.563 s, backward step wall clock time 0.469 s.
I'm really new to machine learning, so please bear with me. Should I just be adding more optimizations in the format of the provided YAML files?

Comments
Hi Rachel! It is great to see people using bulkDGD. We are happy to help! It would be useful if you described your experimental set-up a bit (how many samples? which tissue?) and what you are trying to achieve. We can take it from there. If you do not want to describe your set-up publicly, you can find my email here: https://di.ku.dk/english/staff/?pure=en/persons/525785 best, Iñigo
Hi Iñigo, thanks for responding :-) I am using breast epithelium RNA-seq data from this study: GSE141828. I am only focusing on the "susceptible" breast tissue samples (for now). After converting HGNC symbols to ENSEMBL IDs, dropping non-uniquely mapped genes, and preprocessing the samples using ioutil.preprocess_samples, I have a data frame of 7 samples × 16883 genes.
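For reference, here is a minimal sketch of that preprocessing step, assuming the counts come as a samples × HGNC-symbols dataframe and that ioutil is importable as bulkDGD.ioutil. The file names, the mapping table's column names, and the exact call signature and return value of preprocess_samples are assumptions, not bulkDGD's documented interface.

```python
# Sketch of the preprocessing described above, under stated assumptions.
import pandas as pd
from bulkDGD import ioutil   # assuming ioutil lives at bulkDGD.ioutil

# Raw counts: rows = samples, columns = HGNC gene symbols (hypothetical file name).
counts = pd.read_csv("raw_counts_hgnc.csv", index_col=0)

# Hypothetical two-column mapping table: HGNC symbol -> Ensembl gene ID.
mapping = pd.read_csv("hgnc_to_ensembl.csv")

# Drop HGNC symbols that map to more than one Ensembl ID (non-unique mappings).
unique_map = mapping.drop_duplicates(subset="hgnc_symbol", keep=False)
symbol_to_ensembl = dict(zip(unique_map["hgnc_symbol"], unique_map["ensembl_id"]))

# Keep only mappable genes and rename the columns to Ensembl IDs.
counts = counts.loc[:, counts.columns.isin(list(symbol_to_ensembl))]
counts.columns = [symbol_to_ensembl[s] for s in counts.columns]
counts = counts.loc[:, ~counts.columns.duplicated()]   # drop duplicate Ensembl IDs

# Let bulkDGD restrict/reorder the genes to the set the model expects.
# The exact signature and return value(s) may differ between versions;
# check help(ioutil.preprocess_samples).
preprocessed = ioutil.preprocess_samples(counts)
```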
I see. We are working on providing the loss curves as a dataframe, but that might take some time. In the meantime, I would plot the loss curves (x axis = epoch; y axis = loss) to see how they behave. From what you sent, the loss does not look crazy high. best, Iñigo
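Until that dataframe export is available, one way to get the curves is to parse the "Epoch N: loss X" lines that bulkDGD writes to the log (like the line quoted in the question) and plot them. A rough sketch, assuming the log was saved to a file; the file name is hypothetical and the regex should be adjusted if your log format differs.

```python
# Parse "Epoch N: loss X" lines from a bulkDGD log and plot loss vs. epoch.
import re
import matplotlib.pyplot as plt

pattern = re.compile(r"Epoch (\d+): loss ([\d.]+)")

epochs, losses = [], []
with open("bulkdgd_run.log") as log:      # hypothetical log file name
    for line in log:
        match = pattern.search(line)
        if match:
            epochs.append(int(match.group(1)))
            losses.append(float(match.group(2)))

# If the two optimization rounds each restart from epoch 1, consider plotting
# each round separately or against range(len(losses)) instead.
plt.plot(epochs, losses, marker="o")
plt.xlabel("epoch")
plt.ylabel("loss")
plt.title("bulkDGD optimization loss")
plt.savefig("loss_curve.png", dpi=150)
```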
Another important question: should I be providing DGD with raw or normalized counts?
Hi! DGD should be provided with raw counts. It takes care of the normalization internally (mean scaling). The loss curves look fine, and the high numbers you observe are probably the GMM penalty, but I would increase the number of epochs a bit to ensure they converge. I think if you set opt1 epochs to 20 and opt2 epochs to 100, they will look stable. best, Iñigo
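A minimal sketch of making that change with PyYAML by editing a copy of one of the provided configuration files. The file name and the key layout (opt1/opt2 blocks, each with an epochs field) are guesses based on this thread, not the actual bulkDGD schema; mirror the structure of the YAML file you are actually using.

```python
# Raise the epoch counts for the two optimization rounds in a copy of a
# provided YAML configuration file. File name and key layout are guesses
# based on this thread; check the real file for its actual structure.
import yaml

with open("config_rep.yaml") as f:                 # hypothetical config file
    config = yaml.safe_load(f)

config["opt1"]["epochs"] = 20                      # first optimization round
config["opt2"]["epochs"] = 100                     # second optimization round

with open("config_rep_more_epochs.yaml", "w") as f:
    yaml.safe_dump(config, f, sort_keys=False)
```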
Hi, definitely. The decoder has a negative-binomial layer with a log_prob method implemented. You can use that to calculate the probability of your samples. You simply pass your data through the log_prob method and you should get sample probabilities. good luck! Iñigo
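The exact attribute path to bulkDGD's negative-binomial layer is not spelled out here, so the sketch below only illustrates the general idea with torch.distributions.NegativeBinomial: given predicted per-gene means and a dispersion, log_prob yields per-gene log-probabilities of the observed counts, which can be summed into one value per sample. Substitute the log_prob method of the model's own layer, which handles its actual parameterization.

```python
# Generic illustration with torch.distributions.NegativeBinomial (not the
# bulkDGD layer itself): per-gene log-probabilities of observed raw counts
# given predicted means and a dispersion, summed into one value per sample.
import torch
from torch.distributions import NegativeBinomial

observed = torch.tensor([[3., 0., 12.],            # raw counts, samples x genes
                         [7., 1.,  5.]])
predicted_mean = torch.tensor([[4.0, 0.5, 10.0],   # illustrative decoder output
                               [6.0, 0.8,  6.0]])
r = torch.tensor(2.0)                              # illustrative dispersion

# Convert (mean, dispersion) into the total_count/probs parameterization.
probs = predicted_mean / (predicted_mean + r)
dist = NegativeBinomial(total_count=r, probs=probs)

per_gene_logp = dist.log_prob(observed)            # samples x genes
per_sample_logp = per_gene_logp.sum(dim=1)         # one log-probability per sample
print(per_sample_logp)
```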
Thank you so much! :) I do have another question: is there a way to figure out which components correspond to which tissue?
Sorry this took so long. Good news: component 28 is the breast component. Component 23 is a brain component. We should publish this data, but we need to think about how to. In the meantime, if you email me, I can share the data. You can find my email here.