ERROR - reading gene covariate file: duplicate variable name 'Fetal_Neuron' #156

Closed
2 tasks done
bschilder opened this issue Oct 4, 2024 · 6 comments

bschilder commented Oct 4, 2024

Checklist

  • I am able to reproduce the bug with the latest version
  • I checked, but didn't find any duplicates (open OR closed) of this issue in the repo

Affected version

2.0.13
I'm guessing @Al-Murphy is using the latest version.

Steps to reproduce the bug

HCL <- MSTExplorer::load_example_ctd(c("ctd_HumanCellLandscape.rds"),
                                     multi_dataset = FALSE)

path_formatted <- MAGMA.Celltyping::get_example_gwas(
  trait = "prospective_memory")

genesOutPath <- MAGMA.Celltyping::map_snps_to_genes(
  path_formatted = path_formatted,
  force_new = TRUE,
  genome_build = "GRCh37")

MAGMA_results <- MAGMA.Celltyping::celltype_associations_pipeline(
  magma_dirs = dirname(genesOutPath),
  ctd = HCL,
  ctd_species = "human",
  ctd_name = "Test",
  run_linear = TRUE,
  run_top10 = TRUE,
  force_new = TRUE)

Actual behavior


> HCL <- MSTExplorer::load_example_ctd(c("ctd_HumanCellLandscape.rds"),
+                                      multi_dataset=FALSE) 
Loading ctd_HumanCellLandscape.rds
> path_formatted <- MAGMA.Celltyping::get_example_gwas(
+   trait = "prospective_memory")
Importing munged GWAS summary statistics: prospective_memory
ℹ All local files already up-to-date!
Saving decompressed copy of path_formatted ==>  /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T//RtmpWR0d37/prospective_memory.ukb.tsv
> genesOutPath <- MAGMA.Celltyping::map_snps_to_genes(
+   path_formatted = path_formatted,
+   force_new = TRUE,
+   genome_build = "GRCh37")
Installed MAGMA version: v1.10
Skipping MAGMA installation.
The desired_version of MAGMA is currently installed: v1.10
Using: magma_v1.10_mac
Using existing genome_ref found in storage_dir.
ℹ All local files already up-to-date!

==== MAGMA Step 1: Generate genes.annot file ====

Welcome to MAGMA v1.10 (custom)
Using flags:
	--annotate window=35,10
	--snp-loc /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/prospective_memory.ukb.tsv
	--gene-loc /Users/alanmurphy/Library/Caches/org.R-project.R/R/MAGMA.Celltyping/NCBI37.3.gene.loc
	--out /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN

Start time is 16:32:28, Thursday 03 Oct 2024

Starting annotation...
Reading gene locations from file /Users/alanmurphy/Library/Caches/org.R-project.R/R/MAGMA.Celltyping/NCBI37.3.gene.loc... 
	adding window: 35000bp (before), 10000bp (after)
	19161 gene locations read from file
	chromosome  1: 2016 genes
	chromosome  2: 1226 genes
	chromosome  3: 1050 genes
	chromosome  4: 745 genes
	chromosome  5: 856 genes
	chromosome  6: 750 genes
	chromosome  7: 906 genes
	chromosome  8: 669 genes
	chromosome  9: 775 genes
	chromosome 10: 723 genes
	chromosome 11: 1275 genes
	chromosome 12: 1009 genes
	chromosome 13: 320 genes
	chromosome 14: 595 genes
	chromosome 15: 586 genes
	chromosome 16: 817 genes
	chromosome 17: 1147 genes
	chromosome 18: 271 genes
	chromosome 19: 1389 genes
	chromosome 20: 527 genes
	chromosome 21: 215 genes
	chromosome 22: 442 genes
	chromosome  X: 805 genes
	chromosome  Y: 47 genes
Reading SNP locations from file /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/prospective_memory.ukb.tsv... 
	WARNING: on line 1, chromosome code 'CHR' not recognised; skipping SNP (ID = SNP)
	398092 SNP locations read from file                                                             
	of those, 215415 (54.11%) mapped to at least one gene
Writing annotation to file /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.genes.annot
	for chromosome  1, 744 genes are empty (out of 2016)
	for chromosome  2, 425 genes are empty (out of 1226)
	for chromosome  3, 319 genes are empty (out of 1050)
	for chromosome  4, 275 genes are empty (out of 745)
	for chromosome  5, 252 genes are empty (out of 856)
	for chromosome  6, 234 genes are empty (out of 750)
	for chromosome  7, 275 genes are empty (out of 906)
	for chromosome  8, 272 genes are empty (out of 669)
	for chromosome  9, 238 genes are empty (out of 775)
	for chromosome 10, 239 genes are empty (out of 723)
	for chromosome 11, 394 genes are empty (out of 1275)
	for chromosome 12, 313 genes are empty (out of 1009)
	for chromosome 13, 121 genes are empty (out of 320)
	for chromosome 14, 211 genes are empty (out of 595)
	for chromosome 15, 220 genes are empty (out of 586)
	for chromosome 16, 307 genes are empty (out of 817)
	for chromosome 17, 305 genes are empty (out of 1147)
	for chromosome 18, 84 genes are empty (out of 271)
	for chromosome 19, 393 genes are empty (out of 1389)
	for chromosome 20, 151 genes are empty (out of 527)
	for chromosome 21, 65 genes are empty (out of 215)
	for chromosome 22, 150 genes are empty (out of 442)
	for chromosome  X, 805 genes are empty (out of 805)
	for chromosome  Y, 47 genes are empty (out of 47)
	at least one SNP mapped to each of a total of 12322 genes (out of 19161)


End time is 16:32:29, Thursday 03 Oct 2024 (elapsed: 00:00:01)

==== MAGMA Step 2: Generate genes.out ====

Welcome to MAGMA v1.10 (custom)
Using flags:
	--bfile /Users/alanmurphy/Library/Caches/org.R-project.R/R/MAGMA.Celltyping/g1000_eur/g1000_eur synonym-dup=skip
	--pval /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/prospective_memory.ukb.tsv ncol=N duplicate=drop
	--gene-annot /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.genes.annot
	--out /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN

Start time is 16:32:29, Thursday 03 Oct 2024

Loading PLINK-format data...
Reading file /Users/alanmurphy/Library/Caches/org.R-project.R/R/MAGMA.Celltyping/g1000_eur/g1000_eur.fam... 503 individuals read
Reading file /Users/alanmurphy/Library/Caches/org.R-project.R/R/MAGMA.Celltyping/g1000_eur/g1000_eur.bim... 22665064 SNPs read
Preparing file /Users/alanmurphy/Library/Caches/org.R-project.R/R/MAGMA.Celltyping/g1000_eur/g1000_eur.bed... 

Reading SNP synonyms from file /Users/alanmurphy/Library/Caches/org.R-project.R/R/MAGMA.Celltyping/g1000_eur/g1000_eur.synonyms (auto-detected)
	read 6016767 mapped synonyms from file, mapping to 3921040 SNPs in the data
	WARNING: detected 133 synonymous SNP pairs in the data
	         skipped all synonym entries involved, synonymous SNPs are kept in analysis
	         writing list of detected synonyms in data to supplementary log file
Reading SNP p-values from file /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/prospective_memory.ukb.tsv... 
	detected 14 variables in file
	using variable: SNP (SNP id)
	using variable: P (p-value)
	using variable: N (sample size; discarding SNPs with N < 50)
	read 398093 lines from file, containing valid SNP p-values for 387654 SNPs in data (97.38% of lines, 1.71% of SNPs in data)
	WARNING: file contained 149 SNPs (same IDs or synonyms) with duplications
	         dropped all occurrences of each from analysis
	         writing list of duplicated IDs to supplementary log file
Loading gene annotation from file /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.genes.annot... 
	12322 gene definitions read from file
	found 12190 genes containing valid SNPs in genotype data


Starting gene analysis... 
	using model: SNPwise-mean
	writing gene analysis results to file /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.genes.out
	writing intermediate output to file /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.genes.raw


End time is 16:34:33, Thursday 03 Oct 2024 (elapsed: 00:02:04)
> MAGMA_results <- MAGMA.Celltyping::celltype_associations_pipeline(
+   magma_dirs = dirname(genesOutPath),
+   ctd = HCL,
+   ctd_species = "human", 
+   ctd_name = "Test", 
+   run_linear = TRUE, 
+   run_top10 = TRUE,
+   force_new = TRUE)
Preparing CellTypeDataset.
Converting to sparse matrix.
Converting to sparse matrix.
Converting to sparse matrix.
Converting to sparse matrix.
Converting to sparse matrix.
Checking CTD: level 1
Checking CTD: level 2
Checking CTD: level 3
Checking CTD: level 4
Checking CTD: level 5
Checking CTD: level 1
Checking CTD: level 2
Checking CTD: level 3
Checking CTD: level 4
Checking CTD: level 5
prospective_memory.ukb.tsv.35UP.10DOWN
======= Calculating celltype associations: linear mode =======
Installed MAGMA version: v1.10
Skipping MAGMA installation.
The desired_version of MAGMA is currently installed: v1.10
Using: magma_v1.10_mac
Running MAGMA: Linear mode
Mapping gene symbols in specificity_quantiles matrix to entrez IDs.
Welcome to MAGMA v1.10 (custom)
Using flags:
	--gene-results /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.genes.raw
	--gene-covar /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T//RtmpWR0d37/file1c2f5d22076d
	--model direction=pos
	--out /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.level1.Test_linear.Linear

Start time is 16:30:43, Thursday 03 Oct 2024

Reading file /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.genes.raw... 
	12190 genes read from file
Loading gene-level covariates...
Reading file /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T//RtmpWR0d37/file1c2f5d22076d... 
	detected 59 variables in file (using all)
	found 59 valid gene covariates, for 10651 genes defined in genotype data
Processing missing values...
	found 1539 genes not present in all input files: removing these from analysis
	10651 genes remaining in analysis
Preparing variables for analysis...
	truncating Z-scores 3 points below zero or 6 standard deviations above the mean
	truncating covariate values more than 5 standard deviations from the mean
	total variables available for analysis: 59 gene covariates

Parsing model specifications...
Inverting gene-gene correlation matrix...
Performing regression analysis...                                                                                  
	testing direction: one-sided, positive (sets), one-sided, positive (covar)
	conditioning on internal variables:
		gene size, log(gene size)
		gene density, log(gene density)
		inverse mac, log(inverse mac)
	analysing individual variables

	analysing single-variable models (number of models: 59)
	writing results to file /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.level1.Test_linear.Linear.gsa.out

End time is 16:30:45, Thursday 03 Oct 2024 (elapsed: 00:00:02)
Reading enrichment results file into R.
Running MAGMA: Linear mode
Mapping gene symbols in specificity_quantiles matrix to entrez IDs.
Welcome to MAGMA v1.10 (custom)
Using flags:
	--gene-results /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.genes.raw
	--gene-covar /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T//RtmpWR0d37/file1c2f6515b4df
	--model direction=pos
	--out /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.level2.Test_linear.Linear

Start time is 16:30:46, Thursday 03 Oct 2024

Reading file /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.genes.raw... 
	12190 genes read from file
Loading gene-level covariates...
Reading file /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T//RtmpWR0d37/file1c2f6515b4df... 

ERROR - reading gene covariate file: duplicate variable name 'Fetal_Neuron'

Terminating program.
Reading enrichment results file into R.
Error in file(file, "rt"): cannot open the connection

======= Calculating celltype associations: top10% mode =======
Installed MAGMA version: v1.10
Skipping MAGMA installation.
The desired_version of MAGMA is currently installed: v1.10
Using: magma_v1.10_mac
Running MAGMA: Top 10% mode
Mapping gene symbols in specificity_deciles matrix to entrez IDs.
Constructing top10% gene marker sets for 60 cell-types.
Welcome to MAGMA v1.10 (custom)
Using flags:
	--gene-results /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.genes.raw
	--set-annot /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T//RtmpWR0d37/file1c2f7415d051
	--out /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.level1.Test_top10.Top10pct

Start time is 16:30:46, Thursday 03 Oct 2024

Reading file /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.genes.raw... 
	12190 genes read from file
Loading gene-set annotation...
Reading file /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T//RtmpWR0d37/file1c2f7415d051... 
	59 gene-set definitions read from file
	found 59 gene sets containing genes defined in genotype data (containing a total of 8583 unique genes)
Preparing variables for analysis...
	truncating Z-scores 3 points below zero or 6 standard deviations above the mean
	truncating covariate values more than 5 standard deviations from the mean
	total variables available for analysis: 59 gene sets

Parsing model specifications...
Inverting gene-gene correlation matrix...
Performing regression analysis...                                                                                  
	testing direction: one-sided, positive (sets), two-sided (covar)
	conditioning on internal variables:
		gene size, log(gene size)
		gene density, log(gene density)
		inverse mac, log(inverse mac)
	analysing individual variables

	analysing single-variable models (number of models: 59)
	writing results to file /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.level1.Test_top10.Top10pct.gsa.out

End time is 16:30:49, Thursday 03 Oct 2024 (elapsed: 00:00:03)
Reading enrichment results file into R.
Running MAGMA: Top 10% mode
Mapping gene symbols in specificity_deciles matrix to entrez IDs.
Constructing top10% gene marker sets for 64 cell-types.
Welcome to MAGMA v1.10 (custom)
Using flags:
	--gene-results /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.genes.raw
	--set-annot /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T//RtmpWR0d37/file1c2f441fb161
	--out /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.level2.Test_top10.Top10pct

Start time is 16:30:50, Thursday 03 Oct 2024

Reading file /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.genes.raw... 
	12190 genes read from file
Loading gene-set annotation...
Reading file /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T//RtmpWR0d37/file1c2f441fb161... 
	63 gene-set definitions read from file
	found 63 gene sets containing genes defined in genotype data (containing a total of 10359 unique genes)
Preparing variables for analysis...
	truncating Z-scores 3 points below zero or 6 standard deviations above the mean
	truncating covariate values more than 5 standard deviations from the mean
	total variables available for analysis: 63 gene sets

Parsing model specifications...
Inverting gene-gene correlation matrix...
Performing regression analysis...                                                                                  
	testing direction: one-sided, positive (sets), two-sided (covar)
	conditioning on internal variables:
		gene size, log(gene size)
		gene density, log(gene density)
		inverse mac, log(inverse mac)
	analysing individual variables

	analysing single-variable models (number of models: 63)
	writing results to file /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.level2.Test_top10.Top10pct.gsa.out

End time is 16:30:53, Thursday 03 Oct 2024 (elapsed: 00:00:03)
Reading enrichment results file into R.
Running MAGMA: Top 10% mode
Mapping gene symbols in specificity_deciles matrix to entrez IDs.
Constructing top10% gene marker sets for 125 cell-types.
Welcome to MAGMA v1.10 (custom)
Using flags:
	--gene-results /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.genes.raw
	--set-annot /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T//RtmpWR0d37/file1c2f7cb67a37
	--out /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.level3.Test_top10.Top10pct

Start time is 16:30:53, Thursday 03 Oct 2024

Reading file /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.genes.raw... 
	12190 genes read from file
Loading gene-set annotation...
Reading file /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T//RtmpWR0d37/file1c2f7cb67a37... 
	124 gene-set definitions read from file
	found 124 gene sets containing genes defined in genotype data (containing a total of 10158 unique genes)
Preparing variables for analysis...
	truncating Z-scores 3 points below zero or 6 standard deviations above the mean
	truncating covariate values more than 5 standard deviations from the mean
	total variables available for analysis: 124 gene sets

Parsing model specifications...
Inverting gene-gene correlation matrix...
Performing regression analysis...                                                                                  
	testing direction: one-sided, positive (sets), two-sided (covar)
	conditioning on internal variables:
		gene size, log(gene size)
		gene density, log(gene density)
		inverse mac, log(inverse mac)
	analysing individual variables

	analysing single-variable models (number of models: 124)
	writing results to file /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.level3.Test_top10.Top10pct.gsa.out

End time is 16:30:56, Thursday 03 Oct 2024 (elapsed: 00:00:03)
Reading enrichment results file into R.
Running MAGMA: Top 10% mode
Mapping gene symbols in specificity_deciles matrix to entrez IDs.
Constructing top10% gene marker sets for 1362 cell-types.
Welcome to MAGMA v1.10 (custom)
Using flags:
	--gene-results /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.genes.raw
	--set-annot /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T//RtmpWR0d37/file1c2f3011f0d8
	--out /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.level4.Test_top10.Top10pct

Start time is 16:30:59, Thursday 03 Oct 2024

Reading file /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.genes.raw... 
	12190 genes read from file
Loading gene-set annotation...
Reading file /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T//RtmpWR0d37/file1c2f3011f0d8... 
	1361 gene-set definitions read from file
	found 1361 gene sets containing genes defined in genotype data (containing a total of 10590 unique genes)
Preparing variables for analysis...
	truncating Z-scores 3 points below zero or 6 standard deviations above the mean
	truncating covariate values more than 5 standard deviations from the mean
	total variables available for analysis: 1361 gene sets

Parsing model specifications...
Inverting gene-gene correlation matrix...
Performing regression analysis...                                                                                  
	testing direction: one-sided, positive (sets), two-sided (covar)
	conditioning on internal variables:
		gene size, log(gene size)
		gene density, log(gene density)
		inverse mac, log(inverse mac)
	analysing individual variables

	analysing single-variable models (number of models: 1361)
	writing results to file /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.level4.Test_top10.Top10pct.gsa.out
	writing gene information to file /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.level4.Test_top10.Top10pct.gsa.genes.out
	writing gene analysis results per significant result (after multiple testing correction, at alpha = 0.05) to file /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.level4.Test_top10.Top10pct.gsa.sets.genes.out

End time is 16:31:02, Thursday 03 Oct 2024 (elapsed: 00:00:03)
Reading enrichment results file into R.
Running MAGMA: Top 10% mode
Mapping gene symbols in specificity_deciles matrix to entrez IDs.
Constructing top10% gene marker sets for 1865 cell-types.
Welcome to MAGMA v1.10 (custom)
Using flags:
	--gene-results /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.genes.raw
	--set-annot /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T//RtmpWR0d37/file1c2f69db0d8f
	--out /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.level5.Test_top10.Top10pct

Start time is 16:31:04, Thursday 03 Oct 2024

Reading file /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.genes.raw... 
	12190 genes read from file
Loading gene-set annotation...
Reading file /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T//RtmpWR0d37/file1c2f69db0d8f... 
	1864 gene-set definitions read from file
	found 1864 gene sets containing genes defined in genotype data (containing a total of 10612 unique genes)
Preparing variables for analysis...
	truncating Z-scores 3 points below zero or 6 standard deviations above the mean
	truncating covariate values more than 5 standard deviations from the mean
	total variables available for analysis: 1864 gene sets

Parsing model specifications...
Inverting gene-gene correlation matrix...
Performing regression analysis...                                                                                  
	testing direction: one-sided, positive (sets), two-sided (covar)
	conditioning on internal variables:
		gene size, log(gene size)
		gene density, log(gene density)
		inverse mac, log(inverse mac)
	analysing individual variables

	analysing single-variable models (number of models: 1864)
	writing results to file /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.level5.Test_top10.Top10pct.gsa.out
	writing gene information to file /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.level5.Test_top10.Top10pct.gsa.genes.out
	writing gene analysis results per significant result (after multiple testing correction, at alpha = 0.05) to file /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.level5.Test_top10.Top10pct.gsa.sets.genes.out

End time is 16:31:08, Thursday 03 Oct 2024 (elapsed: 00:00:04)
Reading enrichment results file into R.
Saving results ==> /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T//RtmpWR0d37/Test/MAGMA_celltyping.Test.rds
Warning message:
In file(file, "rt") :
  cannot open file '/var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.level2.Test_linear.Linear.gsa.out': No such file or directory
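
The closing warning appears to follow directly from the MAGMA failure: the level 2 linear run terminated before writing its `.gsa.out` file, so the later attempt to read that file could not open it. Below is a minimal sketch of a guard against this, assuming a hypothetical helper `read_gsa_if_present()` (not part of MAGMA.Celltyping) and assuming the results file is a whitespace-delimited table with `#` comment lines.

```r
# Hypothetical helper (not part of MAGMA.Celltyping): read a MAGMA results
# file only if it exists, so an upstream failure gives a clear message.
read_gsa_if_present <- function(path) {
  if (!file.exists(path)) {
    warning("MAGMA output not found (did the run terminate early?): ", path)
    return(NULL)
  }
  # Assumption: whitespace-delimited table, '#' marks comment lines
  utils::read.table(path, header = TRUE, comment.char = "#",
                    stringsAsFactors = FALSE)
}

# A path that does not exist yields NULL instead of a connection error
missing_path <- file.path(tempdir(), "does_not_exist.gsa.out")
res_missing <- suppressWarnings(read_gsa_if_present(missing_path))

# A toy results file (made-up columns) is read as usual
toy_path <- tempfile(fileext = ".gsa.out")
writeLines(c("# toy comment line", "VARIABLE P", "Fetal_Neuron 0.5"), toy_path)
res_present <- read_gsa_if_present(toy_path)
```

With a guard like this, the pipeline could report which level and mode produced no output rather than failing on the file connection.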

Expected behavior

Returns enrichment results.

Session info


R version 4.4.1 (2024-06-14)
Platform: aarch64-apple-darwin20
Running under: macOS 15.0

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/London
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] MAGMA.Celltyping_2.0.13

loaded via a namespace (and not attached):
  [1] splines_4.4.1                 later_1.3.2                   BiocIO_1.14.0                
  [4] bitops_1.0-8                  ggplotify_0.1.2               filelock_1.0.3               
  [7] tibble_3.2.1                  R.oo_1.26.0                   rex_1.2.1                    
 [10] XML_3.99-0.17                 lifecycle_1.0.4               rstatix_0.7.2                
 [13] rprojroot_2.0.4               lattice_0.22-6                MASS_7.3-61                  
 [16] crosstalk_1.2.1               backports_1.5.0               magrittr_2.0.3               
 [19] sass_0.4.9                    limma_3.60.5                  plotly_4.10.4                
 [22] rmarkdown_2.28                jquerylib_0.1.4               remotes_2.5.0.9000           
 [25] dlstats_0.1.7                 yaml_2.3.10                   httpuv_1.6.15                
 [28] sessioninfo_1.2.2             pkgbuild_1.4.4                HGNChelper_0.8.14            
 [31] RColorBrewer_1.1-3            DBI_1.2.3                     minqa_1.2.8                  
 [34] abind_1.4-8                   pkgload_1.4.0                 zlibbioc_1.50.0              
 [37] rvcheck_0.2.1                 GenomicRanges_1.56.1          purrr_1.0.2                  
 [40] R.utils_2.12.3                BiocGenerics_0.50.0           RCurl_1.98-1.16              
 [43] yulab.utils_0.1.7             VariantAnnotation_1.50.0      rappdirs_0.3.3               
 [46] rworkflows_1.0.3              GenomeInfoDbData_1.2.12       IRanges_2.38.1               
 [49] S4Vectors_0.42.1              tidytree_0.4.6                testthat_3.2.1.1             
 [52] codetools_0.2-20              DelayedArray_0.30.1           DT_0.33                      
 [55] tidyselect_1.2.1              aplot_0.2.3                   UCSC.utils_1.0.0             
 [58] farver_2.1.2                  lme4_1.1-35.5                 matrixStats_1.4.1            
 [61] stats4_4.4.1                  BiocFileCache_2.12.0          GenomicAlignments_1.40.0     
 [64] jsonlite_1.8.9                ellipsis_0.3.2                Formula_1.2-5                
 [67] tools_4.4.1                   treeio_1.28.0                 Rcpp_1.0.13                  
 [70] glue_1.8.0                    SparseArray_1.4.8             here_1.0.1                   
 [73] xfun_0.47                     usethis_3.0.0                 MatrixGenerics_1.16.0        
 [76] GenomeInfoDb_1.40.1           RNOmni_1.0.1.2                dplyr_1.1.4                  
 [79] withr_3.0.1                   BiocManager_1.30.25           fastmap_1.2.0                
 [82] boot_1.3-31                   fansi_1.0.6                   digest_0.6.37                
 [85] R6_2.5.1                      mime_0.12                     gridGraphics_0.5-1           
 [88] colorspace_2.1-1              RSQLite_2.3.7                 R.methodsS3_1.8.2            
 [91] utf8_1.2.4                    tidyr_1.3.1                   generics_0.1.3               
 [94] renv_1.0.9                    data.table_1.16.0             rtracklayer_1.64.0           
 [97] httr_1.4.7                    htmlwidgets_1.6.4             S4Arrays_1.4.1               
[100] pkgconfig_2.0.3               gtable_0.3.5                  blob_1.2.4                   
[103] covr_3.6.4                    SingleCellExperiment_1.26.0   XVector_0.44.0               
[106] brio_1.1.5                    htmltools_0.5.8.1             carData_3.0-5                
[109] profvis_0.4.0                 scales_1.3.0                  Biobase_2.64.0               
[112] png_0.1-8                     ggfun_0.1.6                   ggdendro_0.2.0               
[115] knitr_1.48                    rstudioapi_0.16.0             reshape2_1.4.4               
[118] rjson_0.2.23                  badger_0.2.4                  nlme_3.1-166                 
[121] curl_5.2.3                    nloptr_2.1.1                  cachem_1.1.0                 
[124] stringr_1.5.1                 BiocVersion_3.19.1            miniUI_0.1.1.1               
[127] parallel_4.4.1                AnnotationDbi_1.66.0          desc_1.4.3                   
[130] restfulr_0.0.15               pillar_1.9.0                  grid_4.4.1                   
[133] vctrs_0.6.5                   urlchecker_1.0.1              promises_1.3.0               
[136] ggpubr_0.6.0                  car_3.1-3                     dbplyr_2.5.0                 
[139] xtable_1.8-4                  evaluate_1.0.0                orthogene_1.10.0             
[142] GenomicFeatures_1.56.0        cli_3.6.3                     compiler_4.4.1               
[145] Rsamtools_2.20.0              rlang_1.1.4                   crayon_1.5.3                 
[148] grr_0.9.5                     ggsignif_0.6.4                gprofiler2_0.2.3             
[151] EWCE_1.12.0                   plyr_1.8.9                    fs_1.6.4                     
[154] stringi_1.8.4                 viridisLite_0.4.2             ewceData_1.12.0              
[157] BiocParallel_1.38.0           assertthat_0.2.1              babelgene_22.9               
[160] munsell_0.5.1                 Biostrings_2.72.1             lazyeval_0.2.2               
[163] gh_1.4.1                      devtools_2.4.5                homologene_1.4.68.19.3.27    
[166] Matrix_1.7-0                  ExperimentHub_2.12.0          MungeSumstats_1.13.4         
[169] BSgenome_1.72.0               patchwork_1.3.0               bit64_4.5.2                  
[172] ggplot2_3.5.1                 KEGGREST_1.44.1               statmod_1.5.0                
[175] shiny_1.9.1                   SummarizedExperiment_1.34.0   interactiveDisplayBase_1.42.0
[178] AnnotationHub_3.12.0          googleAuthR_2.0.2             gargle_1.5.2                 
[181] broom_1.0.7                   memoise_2.0.1                 bslib_0.8.0                  
[184] ggtree_3.12.0                 bit_4.5.0                     splitstackshape_1.4.8        
[187] ape_5.8   
@bschilder bschilder added the bug Something isn't working label Oct 4, 2024
@bschilder bschilder self-assigned this Oct 4, 2024
@bschilder
Collaborator Author

bschilder commented Oct 4, 2024

Originally reported by @Al-Murphy. Potentially related to:

Just as an update, I also tried different versions of the Human Cell Landscape CTD using the GitHub tags 'v0.1.10' and 'v0.0.1', but this didn't help either!

@bschilder
Collaborator Author

bschilder commented Oct 4, 2024

One thing I'm noticing is that the error only occurs with specific combinations of CTD level and test type.

Specifically, CTD level 2 with the linear tests is the only one that's failing.

@bschilder
Collaborator Author

We can see the celltype names aren't duplicated in the original CTD:

colnames(HCL$level_2$specificity_quantiles)[duplicated(colnames(HCL$level_2$specificity_quantiles))]
> character(0)

This remains true even after restandardising the CTD:

HCL2 <- EWCE::standardise_ctd(HCL, force_standardise = TRUE)
colnames(HCL2$level_2$specificity_quantiles)[duplicated(colnames(HCL2$level_2$specificity_quantiles))]
> character(0)

So something is happening further downstream of this step.

@bschilder
Collaborator Author

OK, I think I've pinpointed the reason.

At level 2 the CTD contains the cell types "Fetal_Neuron" and "Fetal_neuron".
I think this is simply an inconsistency in how the original HCL authors annotated their cell types (I've noticed this a lot in that dataset).
You can see this by reading in the gene covariate file referenced in the error message.

gcf <- data.table::fread("/var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T//RtmpWR0d37/file1c2f6515b4df")
cols <- grep("fetal_neuron",names(gcf), ignore.case = TRUE, value = TRUE)
cols 
>  "Fetal_neuron" "Fetal_Neuron"
gcf[,cols,with=FALSE]
       Fetal_neuron Fetal_Neuron
              <int>        <int>
    1:           19           13
    2:           19           26
    3:            6           11
    4:            0            0
    5:            4            0
   ---                          
17956:            0            0
17957:           27            0
17958:            0            0
17959:            0            0
17960:           35            9

R treats column names as case-sensitive, so it doesn't recognize these as duplicates. MAGMA, however, must be ignoring case internally, so it does recognize them as duplicates and throws the error. This happens at the following step:

cca_out <- calculate_celltype_associations_linear(
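
The case-insensitive collision described above can be checked from the R side before anything is passed to MAGMA. This is a minimal sketch in base R: `find_case_collisions` is a hypothetical helper name (not a function from MAGMA.Celltyping or EWCE), and the example names other than the two fetal neuron labels are made up for illustration.

```r
# Hypothetical helper: group names that become identical when case is ignored.
find_case_collisions <- function(x) {
  key <- tolower(x)
  # Lowercased keys that occur more than once
  dup_keys <- unique(key[duplicated(key)])
  keep <- key %in% dup_keys
  # Return the original spellings, grouped by their lowercased key
  split(x[keep], key[keep])
}

# Illustrative names; only the two fetal neuron labels come from this thread.
celltypes <- c("Fetal_neuron", "Astrocyte", "Fetal_Neuron", "Microglia")

any(duplicated(celltypes))       # case-sensitive check: no duplicates found
find_case_collisions(celltypes)  # both spellings grouped under "fetal_neuron"
```

Running the same helper on `colnames(HCL$level_2$specificity_quantiles)` should surface the same pair, assuming the CTD is loaded as in the reproduction steps.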

@bschilder
Collaborator Author

I could add a step that drops columns whose names are duplicated when case is ignored, but the real solution is to regenerate the CTD after correcting the cell type annotations, because correcting them will alter the expression and specificity scores.
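
For the annotation-correcting route, the labels would need to be harmonised before the CTD is regenerated. The sketch below is one possible way to do that in base R, under the assumption that the first spelling encountered is the one to keep; `harmonise_labels` is a hypothetical helper, and which spelling is actually correct is a judgment call for whoever curates the dataset.

```r
# Hypothetical helper: map every case variant of a label to a single
# canonical spelling (assumption: the first spelling encountered wins).
harmonise_labels <- function(labels) {
  key <- tolower(labels)
  # match(key, key) gives, for each label, the index of the first label
  # sharing the same lowercased key
  labels[match(key, key)]
}

# Illustrative per-cell annotation vector
annot <- c("Fetal_neuron", "Astrocyte", "Fetal_Neuron", "Fetal_neuron")
harmonise_labels(annot)
```

The harmonised vector would then replace the original annotations when the CTD is rebuilt, so that cells from both spellings contribute to a single cell type's expression and specificity scores.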

@bschilder bschilder changed the title ERROR - reading gene covariate file: duplicate variable name 'Adult_Fetal_Neuron' ERROR - reading gene covariate file: duplicate variable name 'Fetal_Neuron' Oct 4, 2024
@bschilder
Collaborator Author

I've made some updates in MAGMA.Celltyping 2.0.14 (now pushed to GitHub), so that it automatically drops duplicate celltypes and gives users more informative messages about which ones are being dropped and why. It also recommends that they reprocess the CTD accordingly.
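
As a rough illustration of that behaviour (not the package's actual implementation), dropping case-insensitive duplicates with an explanatory message could look like the sketch below. `drop_case_insensitive_dups` is a hypothetical name, the `GENE` column is invented, and the code assumes a plain `data.frame` or matrix rather than a `data.table`.

```r
# Hypothetical helper, NOT the code shipped in MAGMA.Celltyping.
# Keeps the first column of each case-insensitive group and drops the rest.
drop_case_insensitive_dups <- function(df, verbose = TRUE) {
  key <- tolower(colnames(df))
  is_dup <- duplicated(key)
  if (any(is_dup) && verbose) {
    message(
      "Dropping ", sum(is_dup), " column(s) whose names duplicate an earlier ",
      "column when case is ignored: ",
      paste(colnames(df)[is_dup], collapse = ", "),
      ". Consider correcting the annotations and regenerating the CTD."
    )
  }
  df[, !is_dup, drop = FALSE]
}

# Toy covariate table; the two values per column echo the first rows above.
gcf <- data.frame(
  GENE = c("g1", "g2"),
  Fetal_neuron = c(19, 19),
  Fetal_Neuron = c(13, 26),
  check.names = FALSE
)
cleaned <- drop_case_insensitive_dups(gcf)
colnames(cleaned)
```

Which of the colliding columns survives is arbitrary here (the first one), which is one reason reprocessing the CTD remains the recommended fix.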
