Exception on differential enrichment #312

Open
Mitmischer opened this issue Mar 17, 2024 · 1 comment

Comments

@Mitmischer

Describe the bug
To try out differential enrichment, I ran gimme maelstrom on a single gene, but gimme crashes.

To Reproduce
Steps to reproduce the behavior:

gimme maelstrom -N120 IFIH1.fasta Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa /tmp/maelstrom_out

IFIH1.fasta is attached: IFIH1.fasta.txt

Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa can be downloaded from here.

Expected behavior
I expected the program to run in full or to provide a concise error message.

Error logs

2024-03-17 11:07:42,884 - INFO - Starting maelstrom
2024-03-17 11:07:42,890 - INFO - motif scanning (counts)
2024-03-17 11:07:42,890 - INFO - reading table                                                                                                                                 
2024-03-17 11:07:45,717 - INFO - using 14000 sequences                                                                                            
2024-03-17 11:08:34,427 - INFO - setting threshold                                                                                                
Determining FPR-based threshold: 100%|██████████████████████████████████████████████████████████████| 10633/10633 [12:40<00:00, 13.98 sequences/s]
2024-03-17 11:21:23,647 - INFO - creating count table                                                                                             
Scanning: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:09<00:00,  9.37s/ sequences]
2024-03-17 11:21:33,022 - INFO - done                                                                                                             
2024-03-17 11:21:33,022 - INFO - creating dataframe                                                                                               
2024-03-17 11:21:33,435 - INFO - motif scanning (scores)                                                                                          
2024-03-17 11:21:33,435 - INFO - reading table                                                                                                    
2024-03-17 11:21:39,620 - INFO - using 14000 sequences                                                                                            
2024-03-17 11:22:13,126 - INFO - creating score table (z-score, GC%)                                                                              
Determining mean and stddev for motifs: 100%|██████████████████████████████████████████████████████████| 19756/19756 [11:18<00:00, 29.13 motifs/s]             
Scanning: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:05<00:00,  5.55s/ sequences]
2024-03-17 11:33:43,722 - INFO - done                                                                                                                                          
2024-03-17 11:33:43,722 - INFO - creating dataframe                                                                                               
2024-03-17 11:33:44,235 - INFO - Selecting non-redundant motifs                                                                                   
Traceback (most recent call last):                                                                                                                
  File "/home/mabe/.conda/envs/mabe/bin/gimme", line 12, in <module>                                                                                                           
    cli(sys.argv[1:])                                                                                                                                          
  File "/home/mabe/.conda/envs/mabe/lib/python3.10/site-packages/gimmemotifs/cli.py", line 755, in cli                                                                         
    args.func(args)                                                                                                                                                            
  File "/home/mabe/.conda/envs/mabe/lib/python3.10/site-packages/gimmemotifs/commands/maelstrom.py", line 42, in maelstrom                                                                                                 
    run_maelstrom(                                                                                                                                                                                 
  File "/home/mabe/.conda/envs/mabe/lib/python3.10/site-packages/gimmemotifs/maelstrom/__init__.py", line 239, in run_maelstrom                                                
    fa.fit(scores)                                                                                                                                                             
  File "/home/mabe/.conda/envs/mabe/lib/python3.10/site-packages/sklearn/base.py", line 1474, in wrapper                                                                                                                   
    return fit_method(estimator, *args, **kwargs)                                                                                                                              
  File "/home/mabe/.conda/envs/mabe/lib/python3.10/site-packages/sklearn/cluster/_agglomerative.py", line 1329, in fit                                                                             
    super()._fit(X.T)                                                                                                                                                          
  File "/home/mabe/.conda/envs/mabe/lib/python3.10/site-packages/sklearn/cluster/_agglomerative.py", line 1066, in _fit                                                                            
    out = memory.cache(tree_builder)(                                                            
  File "/home/mabe/.conda/envs/mabe/lib/python3.10/site-packages/joblib/memory.py", line 353, in __call__                                                                                          
    return self.func(*args, **kwargs)                                                                                                                                                                                      
  File "/home/mabe/.conda/envs/mabe/lib/python3.10/site-packages/sklearn/cluster/_agglomerative.py", line 706, in _complete_linkage                                                                                        
    return linkage_tree(*args, **kwargs)                                                                                                                                                                                   
  File "/home/mabe/.conda/envs/mabe/lib/python3.10/site-packages/sklearn/cluster/_agglomerative.py", line 585, in linkage_tree                                                                                             
    out = hierarchy.linkage(X, method=linkage, metric=affinity)                                                                                                                                                            
  File "/home/mabe/.conda/envs/mabe/lib/python3.10/site-packages/scipy/cluster/hierarchy.py", line 1030, in linkage
    raise ValueError("The condensed distance matrix must contain only "
ValueError: The condensed distance matrix must contain only finite values.

Installation information (please complete the following information):

  • OS: [Ubuntu 22.04.4 LTS]
  • Installation [conda]
  • Version [0.18.0]

Additional context
As I am new to the software, this may well be an error on my side (or maybe the statistics just don't work out on a single gene). Still, I think the error handling and the error message should be better!
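
To illustrate why a single input sequence could plausibly produce this ValueError, here is a minimal sketch. It assumes the motif clustering uses a correlation-based distance; the traceback shows complete linkage but does not confirm the metric. The names correlation_distance and check_clusterable are hypothetical and not part of gimmemotifs. With one sequence every motif has a single score, its variance is zero, and the correlation is undefined (NaN), which a guard could report in plain words before clustering starts.

```python
import math


def correlation_distance(u, v):
    """Return 1 - Pearson correlation of u and v (NaN when undefined)."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    du = [x - mean_u for x in u]
    dv = [x - mean_v for x in v]
    norm_u = math.sqrt(sum(x * x for x in du))
    norm_v = math.sqrt(sum(x * x for x in dv))
    if norm_u == 0.0 or norm_v == 0.0:
        # Zero variance: the correlation is 0/0, i.e. undefined.
        return math.nan
    dot = sum(a * b for a, b in zip(du, dv))
    return 1.0 - dot / (norm_u * norm_v)


def check_clusterable(scores):
    """Hypothetical guard: scores maps motif name -> per-sequence scores."""
    names = list(scores)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if not math.isfinite(correlation_distance(scores[a], scores[b])):
                raise ValueError(
                    f"cannot cluster motifs {a!r} and {b!r}: correlation "
                    f"distance is undefined for {len(scores[a])} sequence(s) "
                    "or constant scores"
                )


# One input sequence: each motif has a single score, so the variance is zero.
single = {"motif_A": [1.7], "motif_B": [-0.4]}
# Several sequences with varying scores give finite distances.
several = {"motif_A": [1.7, 0.2, -1.1], "motif_B": [-0.4, 0.9, 0.3]}
```

Calling check_clusterable(several) returns without error, while check_clusterable(single) raises a ValueError that names the offending motifs. If the assumption about the metric holds, running maelstrom on more than one sequence would be the obvious thing to try.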

@maxfieldk

I am having the same issue. @Mitmischer, you haven't managed to find a way around this, have you?
Thanks!
