Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NULL Row Names in NMF Matrix When Using sig_tally() with Wang Method #465

Open
selkamand opened this issue Sep 5, 2024 · 1 comment
Open

Comments

@selkamand
Copy link
Contributor

Hi @ShixiangWang,

Hope you're well.

Just noticed that when using the sig_tally() function with the Wang method, the row names in the resulting NMF matrix (tally_wang$nmf_matrix) are unexpectedly NULL. This behavior is inconsistent with the expected outcome, where the row names would ideally be the sample IDs, as observed when using the S (Steele) method. Reprex below:

# Load the sigminer library
library(sigminer)
#> Registered S3 method overwritten by 'sigminer':
#>   method      from
#>   print.bytes Rcpp
#> sigminer version 2.3.2
#> - Star me at https://github.com/ShixiangWang/sigminer
#> - Run hello() to see usage and citation.
# Load a toy segmentation table included with the sigminer package
load(system.file("extdata", "toy_segTab.RData",
                 package = "sigminer", mustWork = TRUE
))

# Set a seed for reproducibility
set.seed(1234)

# Add a new column 'minor_cn' with random values of either 0 or 1
segTabs$minor_cn <- sample(c(0, 1), size = nrow(segTabs), replace = TRUE)

# Subset the segmentation table for a single sample
singleSampleSegTabs <- subset(segTabs, sample == "TCGA-A8-A07S-01A-11D-A036-01")

# Read the copy number data for the single sample
cn <- read_copynumber(singleSampleSegTabs,
                      seg_cols = c("chromosome", "start", "end", "segVal"),
                      genome_measure = "wg", complement = TRUE, add_loh = TRUE
)
#> ℹ [2024-09-05 14:58:13.990813]: Started.
#> ℹ [2024-09-05 14:58:13.995483]: Genome build  : hg19.
#> ℹ [2024-09-05 14:58:13.996023]: Genome measure: wg.
#> ℹ [2024-09-05 14:58:13.996513]: When add_loh is TRUE, use_all is forced to TRUE.
#> Please drop columns you don't want to keep before reading.
#> ✔ [2024-09-05 14:58:14.002538]: Chromosome size database for build obtained.
#> ℹ [2024-09-05 14:58:14.003157]: Reading input.
#> ✔ [2024-09-05 14:58:14.003672]: A data frame as input detected.
#> ✔ [2024-09-05 14:58:14.004327]: Column names checked.
#> ✔ [2024-09-05 14:58:14.005043]: Column order set.
#> ✔ [2024-09-05 14:58:14.006148]: Chromosomes unified.
#> ✔ [2024-09-05 14:58:14.008141]: Value 2 (normal copy) filled to uncalled chromosomes.
#> ✔ [2024-09-05 14:58:14.009723]: Data imported.
#> ℹ [2024-09-05 14:58:14.010245]: Segments info:
#> ℹ [2024-09-05 14:58:14.010774]:     Keep - 45
#> ℹ [2024-09-05 14:58:14.011272]:     Drop - 0
#> ✔ [2024-09-05 14:58:14.011938]: Segments sorted.
#> ℹ [2024-09-05 14:58:14.012445]: Adding LOH labels...
#> ℹ [2024-09-05 14:58:14.013251]: Joining adjacent segments with same copy number value. Be patient...
#> ✔ [2024-09-05 14:58:14.018461]: 39 segments left after joining.
#> ✔ [2024-09-05 14:58:14.019082]: Segmental table cleaned.
#> ℹ [2024-09-05 14:58:14.019575]: Annotating.
#> ✔ [2024-09-05 14:58:14.024584]: Annotation done.
#> ℹ [2024-09-05 14:58:14.02513]: Summarizing per sample.
#> ✔ [2024-09-05 14:58:14.033565]: Summarized.
#> ℹ [2024-09-05 14:58:14.034175]: Generating CopyNumber object.
#> ✔ [2024-09-05 14:58:14.034964]: Generated.
#> ℹ [2024-09-05 14:58:14.03548]: Validating object.
#> ✔ [2024-09-05 14:58:14.036014]: Done.
#> ℹ [2024-09-05 14:58:14.036588]: 0.046 secs elapsed.
# Tally signatures using the Wang method
tally_wang <- sigminer::sig_tally(cn, method = "Wang", keep_only_matrix = FALSE)
#> ℹ [2024-09-05 14:58:14.03747]: Started.
#> ℹ [2024-09-05 14:58:14.038619]: Step: getting copy number features.
#> ℹ [2024-09-05 14:58:14.076979]: Getting breakpoint count per 10 Mb...
#> ℹ [2024-09-05 14:58:14.104202]: Getting breakpoint count per chromosome arm...
#> ℹ [2024-09-05 14:58:14.111208]: Getting copy number...
#> ℹ [2024-09-05 14:58:14.112797]: Getting change-point copy number change...
#> ℹ [2024-09-05 14:58:14.118523]: Getting length of chains of oscillating copy number...
#> ℹ [2024-09-05 14:58:14.124469]: Getting (log10 based) segment size...
#> ℹ [2024-09-05 14:58:14.125993]: Getting the minimal number of chromosome with 50% CNV...
#> ℹ [2024-09-05 14:58:14.133015]: Getting burden of chromosome...
#> ✔ [2024-09-05 14:58:14.140732]: Gotten.
#> ℹ [2024-09-05 14:58:14.14139]: Step: generating copy number components.
#> ✔ [2024-09-05 14:58:14.141898]: `feature_setting` checked.
#> ℹ [2024-09-05 14:58:14.143294]: Step: counting components.
#> ✔ [2024-09-05 14:58:14.250663]: Counted.
#> ℹ [2024-09-05 14:58:14.251405]: Step: generating components by sample matrix.
#> ✔ [2024-09-05 14:58:14.252393]: Matrix generated.
#> ℹ [2024-09-05 14:58:14.252925]: 0.215 secs elapsed.
# Tally signatures using the Steele method
tally_steele <- sigminer::sig_tally(cn, method = "S", keep_only_matrix = FALSE)
#> ℹ [2024-09-05 14:58:14.253776]: Started.
#> ℹ [2024-09-05 14:58:14.25461]: When you use method 'S', please make sure you have set 'join_adj_seg' to FALSE and 'add_loh' to TRUE in 'read_copynumber() in the previous step!
#> ✔ [2024-09-05 14:58:14.260849]: Matrix generated.
#> ℹ [2024-09-05 14:58:14.261393]: 0.008 secs elapsed.
# ISSUE: The row names of the NMF matrix for the Wang tally are NULL,
# but they are expected to be the sample IDs.
rownames(tally_wang$nmf_matrix)
#> NULL
# EXPECTED BEHAVIOR: The row names for the Steele tally include the sample names as expected.
rownames(tally_steele$nmf_matrix)
#> [1] "TCGA-A8-A07S-01A-11D-A036-01"
# SessionInfo
sessionInfo()
#> R version 4.4.0 (2024-04-24)
#> Platform: aarch64-apple-darwin20
#> Running under: macOS Sonoma 14.4.1
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0
#> 
#> locale:
#> [1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8
#> 
#> time zone: Australia/Sydney
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] purrr_1.0.2         sigminer_2.3.2      Biobase_2.64.0     
#> [4] BiocGenerics_0.50.0
#> 
#> loaded via a namespace (and not attached):
#>  [1] styler_1.10.3      tidyr_1.3.1        future_1.34.0      utf8_1.2.4        
#>  [5] generics_0.1.3     stringi_1.8.4      listenv_0.9.1      digest_0.6.36     
#>  [9] magrittr_2.0.3     evaluate_0.24.0    grid_4.4.0         RColorBrewer_1.1-3
#> [13] iterators_1.0.14   fastmap_1.2.0      R.oo_1.26.0        foreach_1.5.2     
#> [17] doParallel_1.0.17  R.cache_0.16.0     plyr_1.8.9         R.utils_2.12.3    
#> [21] fansi_1.0.6        scales_1.3.0       codetools_0.2-20   cli_3.6.3         
#> [25] registry_0.5-1     rlang_1.1.4        R.methodsS3_1.8.2  parallelly_1.38.0 
#> [29] munsell_0.5.1      reprex_2.1.0       withr_3.0.1        yaml_2.3.8        
#> [33] tools_4.4.0        parallel_4.4.0     reshape2_1.4.4     dplyr_1.1.4       
#> [37] colorspace_2.1-1   ggplot2_3.5.1      rngtools_1.5.2     NMF_0.27          
#> [41] globals_0.16.3     vctrs_0.6.5        R6_2.5.1           lifecycle_1.0.4   
#> [45] stringr_1.5.1      fs_1.6.4           furrr_0.3.1        cluster_2.1.6     
#> [49] pkgconfig_2.0.3    pillar_1.9.0       gtable_0.3.5       data.table_1.15.4 
#> [53] glue_1.7.0         Rcpp_1.0.13        xfun_0.45          tibble_3.2.1      
#> [57] tidyselect_1.2.1   rstudioapi_0.16.0  knitr_1.47         htmltools_0.5.8.1 
#> [61] rmarkdown_2.27     compiler_4.4.0     gridBase_0.4-7

Created on 2024-09-05 with reprex v2.1.0

@ShixiangWang
Copy link
Owner

@selkamand Thanks for your potential bug report. I will take a look and fix it in a few days.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants