scSubtype related code，Is it possible to call directly？ #16

zhangjl-work · 2022-06-06T08:17:00Z

hi， I have a batch of breast cancer single cell data, can I directly call the Highest_calls.R script？  

I call this code directly, and use the NatGen_Supplementary_table_S4.csv file, and the results don't seem right, it seems to divide the cells in each sample (after extracting the cancer cells) into the four subtypes, and the dozen or so samples are So, do you know why，Is this result reasonable?  At present, it is not consistent with the clinical results. But I don't know where is the problem。

Looking forward to your reply！

The text was updated successfully, but these errors were encountered:

dlroden · 2022-06-27T10:54:31Z

Hi, thanks for your interest in our work. Could you please provide more detail to allow us to understand your question more clearly? At a minimum, a reproducible example of the input data, the specific code that you are running and the output would be needed to help further. Thanks

zhangjl-work · 2022-06-27T11:18:29Z

hi,thank you for your reply！ I use your trained file NatGen_Supplementary_table_S4.csv，as input file。 and use my own rds file。the following is my code. library(Seurat) # read in scsubtype gene signatures sigdat <- read.csv("NatGen_Supplementary_table_S4.csv") temp_allgenes <- c(as.vector(sigdat[,"Basal_SC"]), as.vector(sigdat[,"Her2E_SC"]), as.vector(sigdat[,"LumA_SC"]), as.vector(sigdat[,"LumB_SC"])) temp_allgenes <- unique(temp_allgenes[!temp_allgenes == ""]) #读取数据 allfile <- list.files("/PROJ2/development/zhangjunling/project/breast_cancer/scSubtype/data/", pattern = 'rds') rds_list <- list() for (i in allfile) { print (i) file_path <- paste0("./data/",i) Mydata <- readRDS(file_path) #Read in the single cell RDS object as 'Mydata' #Mydata <- readRDS("./B_epi_filter_T.rds") Mydata <- ScaleData(Mydata, features=temp_allgenes) ***@***.******@***.***) #calculate mean scsubtype scores outdat <- matrix(0, nrow=ncol(sigdat), ncol=ncol(tocalc), dimnames=list(colnames(sigdat), colnames(tocalc))) for(i in 1:ncol(sigdat)){ row <- as.character(sigdat[,i]) row<-unique(row[row != ""]) genes<-which(rownames(tocalc) %in% row) temp<-apply(tocalc[genes,],2,function(x){mean(as.numeric(x),na.rm=TRUE)}) outdat[i,]<-as.numeric(temp) } final<-outdat[which(rowSums(outdat,na.rm=TRUE)!=0),] final<-as.data.frame(final) is.num <- sapply(final, is.numeric);final[is.num] <- lapply(final[is.num], round, 4) finalm<-as.matrix(final) ##Scaling scores function before calling the highest Call center_sweep <- function(x, row.w = rep(1, nrow(x))/nrow(x)) { get_average <- function(v) sum(v * row.w)/sum(row.w) average <- apply(x, 2, get_average) sweep(x, 2, average) } ##Obtaining the highest call finalmt<-as.data.frame(t(finalm)) #print (head(finalmt)) finalm.sweep.t<-center_sweep(finalmt) Finalnames<-colnames(finalm.sweep.t)[max.col(finalm.sweep.t,ties.method="first")] finalm.sweep.t$SCSubtypeCall <- Finalnames result <- as.data.frame(table(Finalnames)) print (result) } Below is my output： [1] "G_epi_filter_LN1.rds" Centering and scaling data matrix Finalnames Freq 1 Basal_SC 1184 2 Her2E_SC 1116 3 LumA_SC 1464 4 LumB_SC 1559 [1] "G_epi_filter_LN2.rds" Centering and scaling data matrix Finalnames Freq 1 Basal_SC 906 2 Her2E_SC 848 3 LumA_SC 1073 4 LumB_SC 1111 [1] "G_epi_filter_T.rds" Centering and scaling data matrix Finalnames Freq 1 Basal_SC 754 2 Her2E_SC 766 3 LumA_SC 923 4 LumB_SC 936 Seems to split my data equally across the four subtypes，why is that？

…

2022年6月27日下午6:54，Daniel Roden ***@***.***> 写道： Hi, thanks for your interest in our work. Could you please provide more detail to allow us to understand your question more clearly? At a minimum, a reproducible example of the input data, the specific code that you are running and the output would be needed to help further. Thanks — Reply to this email directly, view it on GitHub <#16 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AP747ND4PRHSX4YJIGDINZDVRGB7HANCNFSM5X6TTC4Q>. You are receiving this because you authored the thread.

RegnerM2015 · 2022-06-27T16:37:20Z

Hi @zhangjl-work,

You actually have to integrate your data with their training data first. Then you apply the scSubtyping script to the rescaled integrated expression data (your data + their training data).

Applying the scSubtyping script directly to your data alone will result in a random or even distribution of subtype calls.

zhangjl-work · 2022-06-28T03:09:10Z

Thank you very much for your reply, but there is one more thing I don't understand. Question 1: Do I have to have my own training set? If I don't have my own training set data, can I use the training set data from your article? Question 2: What is the file format of the training set? The format of the TNBCmerged_Training_SwarbrickInferCNV.txt file, can you take a few lines to see it? Question 3: What does the SinglecellMolecularSubtypesignaturesonlytumor_SEPTEMBER2019.tx file in Calculatingscoresandplotting.R do? Can you take a few lines and take a look? Looking forward to your reply, thanks!

…

2022年6月28日上午1:08，Matt Regner ***@***.***> 写道： Hi @zhangjl-work <https://github.com/zhangjl-work>, You actually have to integrate your data with their training data first. Then you apply the scSubtyping script to the rescaled integrated expression data (your data + their training data). Applying the scSubtyping script directly to your data alone will result in a random or even distribution of subtype calls. — Reply to this email directly, view it on GitHub <#16 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AP747NCKZO4XTUB4CTWGPATVRHNXXANCNFSM5X6TTC4Q>. You are receiving this because you were mentioned.

RegnerM2015 · 2022-06-28T12:36:34Z

Hi @zhangjl-work,

Question 1: Yes, use the training set data from the article.

Question 2: The format of the training set should be four Seurat objects (saved as RDS files). There should be one rds file for each subtype (LumA, LumB, Her2, and Basal). I am not familiar with the file TNBCmerged_Training_SwarbrickInferCNV.txt, perhaps @dlroden can clarify.

Question 3: I think SinglecellMolecularSubtypesignaturesonlytumor_SEPTEMBER2019.txt holds the gene signatures for each molecular subtype and Calculatingscoresandplotting.R computes the signature enrichment scores for these gene signatures.

I hope @dlroden and @sunnyzwu can clarify these.

zhangjl-work · 2022-06-29T01:41:18Z

Thank you very much for your reply, I will communicate with the person you recommended！ Thanks again！

shufanzhang · 2022-08-10T11:27:23Z

Hi @zhangjl-work,
Is your problem solved? I also want to know the format of the input data for scSubtype and where to get the training set file.

zhangjl-work changed the title ~~scSubtype ，Is it possible to call directly？~~ scSubtype related code，Is it possible to call directly？ Jun 6, 2022

miguecass mentioned this issue Aug 9, 2023

Reproducing SC subtype calls #27

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

scSubtype related code，Is it possible to call directly？ #16

scSubtype related code，Is it possible to call directly？ #16

zhangjl-work commented Jun 6, 2022 •

edited

Loading

dlroden commented Jun 27, 2022

zhangjl-work commented Jun 27, 2022 via email

RegnerM2015 commented Jun 27, 2022

zhangjl-work commented Jun 28, 2022 via email

RegnerM2015 commented Jun 28, 2022 •

edited

Loading

zhangjl-work commented Jun 29, 2022

shufanzhang commented Aug 10, 2022

scSubtype related code，Is it possible to call directly？ #16

scSubtype related code，Is it possible to call directly？ #16

Comments

zhangjl-work commented Jun 6, 2022 • edited Loading

dlroden commented Jun 27, 2022

zhangjl-work commented Jun 27, 2022 via email

RegnerM2015 commented Jun 27, 2022

zhangjl-work commented Jun 28, 2022 via email

RegnerM2015 commented Jun 28, 2022 • edited Loading

zhangjl-work commented Jun 29, 2022

shufanzhang commented Aug 10, 2022

zhangjl-work commented Jun 6, 2022 •

edited

Loading

RegnerM2015 commented Jun 28, 2022 •

edited

Loading