Countmatrix and sample metadata table not uploading properly #16

Open
Justin1609 opened this issue Nov 20, 2021 · 7 comments

Comments

@Justin1609

Hi there

I am trying to upload my own count matrix and sample metadata table using the interactive version of the tool, but it doesn't seem to be reading my input tables correctly. I made my tables in Excel and modeled them on the "airway" demo data. I saved each Excel file as a CSV file, but it still doesn't work. I urgently need to plot this data, so any help would be greatly appreciated. I can send you my count matrix and sample metadata table on request.

Many thanks

J

@federicomarini
Owner

Hi @Justin1609,
did you check whether the files really are in CSV format? Despite the extension, a file exported from Excel is sometimes not actually comma-delimited. You can check by opening the files in any text editor.
If that looks fine, try reading them in offline (before calling the app), and then call the app with the count matrix and the metadata table passed to the respective parameters.
HTH,
Federico
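
A minimal sketch of the check-then-read-offline route suggested above, in base R. The toy files and their contents are made up for illustration (they stand in for the Excel exports), and the final call is left commented out because it needs pcaExplorer installed; `countmatrix` and `coldata` are the argument names as I understand them from the package documentation, so please confirm with `?pcaExplorer`.

```r
# Toy files standing in for the Excel exports (hypothetical content).
counts_file <- tempfile(fileext = ".csv")
writeLines(c("gene;s1;s2;s3;s4",
             "YAL001C;10;12;30;28",
             "YAL002W;0;3;5;4",
             "YAL003W;100;90;210;190"), counts_file)
coldata_file <- tempfile(fileext = ".csv")
writeLines(c("sample;condition",
             "s1;control", "s2;control",
             "s3;treated", "s4;treated"), coldata_file)

# Peek at the raw text: a semicolon in the header means the file is not
# comma-delimited, even though the extension says .csv.
first_line <- readLines(counts_file, n = 1)
sep <- if (grepl(";", first_line, fixed = TRUE)) ";" else ","

# Genes in rows, samples in columns; first column holds the gene IDs.
countmatrix <- as.matrix(read.table(counts_file, header = TRUE, sep = sep,
                                    row.names = 1, check.names = FALSE))
coldata <- read.table(coldata_file, header = TRUE, sep = sep,
                      row.names = 1, stringsAsFactors = TRUE)

# Sample names must agree between the two tables.
stopifnot(identical(colnames(countmatrix), rownames(coldata)))

# With pcaExplorer installed, the app could then be launched on these objects:
# pcaExplorer::pcaExplorer(countmatrix = countmatrix, coldata = coldata)
```

Semicolon-delimited exports are typical of Excel in locales that use a decimal comma; in that case `read.csv2()` may be the more convenient reader.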

@Justin1609
Author

Justin1609 commented Nov 24, 2021 via email

@Justin1609
Author

Justin1609 commented Nov 24, 2021 via email

@federicomarini
Owner

> Hi there Federico, thanks so much, I managed to get it sorted out. I didn't realize that you don't have to transpose the count matrix before inputting it. Why is it that you don't transpose the data in your tool? By transpose I mean having samples as objects and genes as variables.

Well, the reason is more "historical": in bioinformatics, it is more common to see genes as features on the rows and samples on the columns. So I stuck to the "classical" version.
Yes, an even more classical biostatistics-tailored view would be indeed the transposed one.
But hey... 🤷

> I am doing PCA on count data from an RNA-seq analysis of Saccharomyces cerevisiae. Would the output look different if the count matrix was transposed as I described?

Sure - stick to the expected format, and it will be fine.

> I am also having issues with the gene annotation file for S. cerevisiae, as there is no entry for it in the databases used in the pcaExplorer examples. Do you maybe know of another database I can use for the annotation of S. cerevisiae? I have a CSV file with the S. cerevisiae gene IDs in column 1 and the standard gene name for each gene ID in column 2. If you could help out with these issues I would really appreciate it. Kind regards, Justin

I don't have much experience with yeast, TBH. Some annotation packages are available in Bioconductor; have a look at org.Sc.sgd.db.

Federico
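
As an alternative to an annotation package, the two-column CSV described in the question could probably be used directly. The sketch below is base R only and rests on an assumption taken from my reading of the pcaExplorer documentation: that the `annotation` argument expects a data.frame with gene IDs as row names and a `gene_name` column. The file content is a toy stand-in, and the column headers `gene_id` and `gene_name` in it are hypothetical.

```r
# Toy stand-in for the CSV with gene IDs (column 1) and gene names (column 2).
anno_file <- tempfile(fileext = ".csv")
writeLines(c("gene_id,gene_name",
             "YAL001C,TFC3",
             "YAL002W,VPS8",
             "YAL003W,EFB1"), anno_file)

anno_raw <- read.csv(anno_file, stringsAsFactors = FALSE)

# Reshape into the assumed layout: gene IDs as row names, one gene_name column.
annotation <- data.frame(gene_name = anno_raw$gene_name,
                         row.names = anno_raw$gene_id,
                         stringsAsFactors = FALSE)

# The row names should match the row names of the count matrix, so that
# every gene ID can be looked up.
# With pcaExplorer installed, this could then be passed along, for example:
# pcaExplorer::pcaExplorer(countmatrix = countmatrix, coldata = coldata,
#                          annotation = annotation)
```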

@federicomarini
Owner

> Hi Federico, could you also please tell me how I can edit the title and the legend name of the plot? I would also like to remove the sample labels from the plot and change the color scheme of the sample groups. I tried to do this in R, but pcaplot doesn't seem to generate the object that ggplot requires to edit these details. Kind regards, Justin

For these types of requests, you are probably best served by building the ggplot object from scratch.
That said, the objects returned by pcaplot are ggplot objects, so the customization should actually work.
If you are a little familiar with code, feel free to use the source and adapt it to your needs.
HTH,
Federico
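
To illustrate the kind of customization meant here, below is a rough sketch using plain ggplot2. The plot `p` is a made-up stand-in built from toy data, not the actual output of pcaplot; the assumption is only that a ggplot object with a colour aesthetic accepts the same additions. The title, legend name, and colors are arbitrary examples.

```r
library(ggplot2)

# Stand-in for the plot returned by pcaplot(); the data here are made up.
pca_df <- data.frame(
  PC1 = c(-2.1, -1.8, 2.0, 1.9),
  PC2 = c(0.5, -0.4, 0.6, -0.7),
  group = c("control", "control", "treated", "treated")
)
p <- ggplot(pca_df, aes(x = PC1, y = PC2, colour = group)) +
  geom_point(size = 3)

# Standard ggplot2 additions for the title, legend name, and color scheme.
p_custom <- p +
  ggtitle("PCA of yeast RNA-seq counts") +
  labs(colour = "Condition") +
  scale_colour_manual(values = c(control = "grey40", treated = "firebrick"))

# print(p_custom)  # draws the plot in an interactive session
```

If I recall correctly, pcaplot itself also takes arguments for the title and for switching off the sample labels (`title` and `text_labels`), but treat those names as an assumption and check `?pcaplot`.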

@Justin1609
Author

Justin1609 commented Nov 24, 2021 via email

@federicomarini
Owner

No problem, I am aware we in bioinformatics are doing things by default in a transposed way 😉

If you transpose it: well, in the end you do change the point of view on it: so, no more samples as linear combinations of the genes but the other way around!
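
A small base R sketch of what the change in point of view means in practice. This is not pcaExplorer's own code, just `prcomp()` on a toy matrix with made-up counts: `prcomp()` treats rows as observations, so which dimension ends up as the "points" in the PCA depends on the orientation passed in.

```r
set.seed(1)
# Toy matrix in the expected orientation: 6 genes (rows) x 4 samples (columns).
mat <- matrix(rpois(24, lambda = 50), nrow = 6,
              dimnames = list(paste0("gene", 1:6), paste0("s", 1:4)))

# Samples moved to rows first: each sample becomes one point in PC space.
pca_samples <- prcomp(t(mat))
rownames(pca_samples$x)   # sample names

# Without the transpose, the genes become the points instead.
pca_genes <- prcomp(mat)
rownames(pca_genes$x)     # gene names
```

So feeding an already transposed matrix to a tool that expects genes in rows gives a PCA of the genes rather than of the samples.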

For editing the ggplot object: a generic resource such as a ggplot tutorial should do it. I have no specific one to recommend at the moment, but do check out https://datavizm20.classes.andrewheiss.com/, which I have recommended for many other reasons!

If you want to do an OPLS analysis, this is outside pcaExplorer's scope "per se", but very much within the broader dimensionality reduction field. Do have a look at the Holmes & Huber MSMB book (Modern Statistics for Modern Biology), available online. IIRC it introduces a couple of these alternatives to PCA.

Federico
