scSubtype-related code: is it possible to call it directly? #16
Comments
Hi, thanks for your interest in our work. Could you please provide more detail to allow us to understand your question more clearly? At a minimum, a reproducible example of the input data, the specific code that you are running and the output would be needed to help further. Thanks.
Hi, thank you for your reply!
I used your trained file NatGen_Supplementary_table_S4.csv as the input file, together with my own RDS files. The following is my code.
library(Seurat)
# read in scsubtype gene signatures
sigdat <- read.csv("NatGen_Supplementary_table_S4.csv")
temp_allgenes <- c(as.vector(sigdat[,"Basal_SC"]),
                   as.vector(sigdat[,"Her2E_SC"]),
                   as.vector(sigdat[,"LumA_SC"]),
                   as.vector(sigdat[,"LumB_SC"]))
temp_allgenes <- unique(temp_allgenes[!temp_allgenes == ""])
# read the data
allfile <- list.files("/PROJ2/development/zhangjunling/project/breast_cancer/scSubtype/data/", pattern = 'rds')
rds_list <- list()
for (i in allfile) {
  print (i)
  file_path <- paste0("./data/",i)
  Mydata <- readRDS(file_path)
  #Read in the single cell RDS object as 'Mydata'
  #Mydata <- readRDS("./B_epi_filter_T.rds")
  Mydata <- ScaleData(Mydata, features=temp_allgenes)
  # extract the scaled expression matrix (genes x cells)
  tocalc <- as.data.frame(Mydata@assays$RNA@scale.data)
  #calculate mean scsubtype scores
  outdat <- matrix(0,
                   nrow=ncol(sigdat),
                   ncol=ncol(tocalc),
                   dimnames=list(colnames(sigdat),
                                 colnames(tocalc)))
  # j indexes the signature columns (kept distinct from the file loop variable i)
  for(j in 1:ncol(sigdat)){
    row <- as.character(sigdat[,j])
    row<-unique(row[row != ""])
    genes<-which(rownames(tocalc) %in% row)
    temp<-apply(tocalc[genes,],2,function(x){mean(as.numeric(x),na.rm=TRUE)})
    outdat[j,]<-as.numeric(temp)
  }
  final<-outdat[which(rowSums(outdat,na.rm=TRUE)!=0),]
  final<-as.data.frame(final)
  is.num <- sapply(final, is.numeric)
  final[is.num] <- lapply(final[is.num], round, 4)
  finalm<-as.matrix(final)
  ##Scaling scores function before calling the highest Call
  center_sweep <- function(x, row.w = rep(1, nrow(x))/nrow(x)) {
    get_average <- function(v) sum(v * row.w)/sum(row.w)
    average <- apply(x, 2, get_average)
    sweep(x, 2, average)
  }
  ##Obtaining the highest call
  finalmt<-as.data.frame(t(finalm))
  #print (head(finalmt))
  finalm.sweep.t<-center_sweep(finalmt)
  Finalnames<-colnames(finalm.sweep.t)[max.col(finalm.sweep.t,ties.method="first")]
  finalm.sweep.t$SCSubtypeCall <- Finalnames
  result <- as.data.frame(table(Finalnames))
  print (result)
}
Below is my output:
[1] "G_epi_filter_LN1.rds"
Centering and scaling data matrix
Finalnames Freq
1 Basal_SC 1184
2 Her2E_SC 1116
3 LumA_SC 1464
4 LumB_SC 1559
[1] "G_epi_filter_LN2.rds"
Centering and scaling data matrix
Finalnames Freq
1 Basal_SC 906
2 Her2E_SC 848
3 LumA_SC 1073
4 LumB_SC 1111
[1] "G_epi_filter_T.rds"
Centering and scaling data matrix
Finalnames Freq
1 Basal_SC 754
2 Her2E_SC 766
3 LumA_SC 923
4 LumB_SC 936
It seems to split my data roughly equally across the four subtypes. Why is that?
(In reply to Daniel Roden's message of 27 June 2022, 6:54 PM.)
Hi @zhangjl-work,
You actually have to integrate your data with their training data first. Then you apply the scSubtyping script to the rescaled integrated expression data (your data + their training data).
Applying the scSubtyping script directly to your data alone will result in a random or even distribution of subtype calls.
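To make the point about even subtype calls concrete, here is a toy simulation in base R. It is only a sketch under stated assumptions, not the authors' pipeline: the scores are synthetic, and the helper names (`simulate_scores`, `center_columns`, `call_subtype`), the shift of 4 and the cell counts are all hypothetical choices made for illustration. It compares centering a single-subtype query on its own with centering it together with a balanced reference.

```r
# Toy illustration with synthetic scores (base R only); not the authors' pipeline.
set.seed(1)
subtypes <- c("Basal_SC", "Her2E_SC", "LumA_SC", "LumB_SC")

# Simulate per-cell signature scores: unit-variance noise on every signature,
# plus a positive shift on the signature of the cell's true subtype.
# (Hypothetical helper; the shift value is an assumption.)
simulate_scores <- function(n_cells, true_subtype, shift = 4) {
  m <- matrix(rnorm(n_cells * length(subtypes)), nrow = n_cells,
              dimnames = list(NULL, subtypes))
  m[, true_subtype] <- m[, true_subtype] + shift
  m
}

# Subtract each column's mean across cells (unweighted column centering).
center_columns <- function(x) sweep(x, 2, colMeans(x))

# For each cell, call the subtype with the highest centered score.
call_subtype <- function(x) {
  best <- max.col(center_columns(x), ties.method = "first")
  factor(subtypes[best], levels = subtypes)
}

# Query: 1000 cells that are all LumA-like by construction.
query <- simulate_scores(1000, "LumA_SC")

# Hypothetical balanced reference: 1000 cells of each subtype.
reference <- do.call(rbind, lapply(subtypes, simulate_scores, n_cells = 1000))

# (a) Center the query alone: the shared LumA shift is removed with the mean.
calls_alone <- table(call_subtype(query))

# (b) Center query and reference together, then keep only the query cells.
combined <- rbind(query, reference)
calls_joint <- table(call_subtype(combined)[seq_len(nrow(query))])

print(calls_alone)
print(calls_joint)
```

In this toy setup, the calls from the query centered alone should come out close to evenly split, because the shift shared by all query cells is subtracted away with the column mean. When the query is centered together with the reference, most query cells should be called LumA_SC.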
Thank you very much for your reply, but there is one more thing I don't understand.
Question 1: Do I have to have my own training set? If I don't have my own training set data, can I use the training set data from your article?
Question 2: What is the file format of the training set? Could you show a few lines of the TNBCmerged_Training_SwarbrickInferCNV.txt file so I can see its format?
Question 3: What does the SinglecellMolecularSubtypesignaturesonlytumor_SEPTEMBER2019.txt file in Calculatingscoresandplotting.R do? Could you show a few lines of it as well?
Looking forward to your reply, thanks!
(In reply to Matt Regner's message of 28 June 2022, 1:08 AM.)
Hi @zhangjl-work,
Question 1: Yes, use the training set data from the article.
Question 2: The format of the training set should be four Seurat objects (saved as RDS files). There should be one rds file for each subtype (LumA, LumB, Her2, and Basal). I am not familiar with the file TNBCmerged_Training_SwarbrickInferCNV.txt, perhaps @dlroden can clarify.
Question 3: I think SinglecellMolecularSubtypesignaturesonlytumor_SEPTEMBER2019.txt holds the gene signatures for each molecular subtype and Calculatingscoresandplotting.R computes the signature enrichment scores for these gene signatures.
Thank you very much for your reply. I will communicate with the person you recommended! Thanks again!
Looking forward to your reply!