Seeing some issues when working with very large data sets (~1000 samples or more, over 1M ASVs), where the simple text outputs take a long time. This primarily happens when prepping for QIIME2:

TADA/templates/GenerateSeqTables.R, line 29 in d0ee9aa:
# Generate OTU table output (rows = samples, cols = ASV)

or when generating a new seq table with the modified IDs:

TADA/templates/GenerateSeqTables.R, line 22 in d0ee9aa

One workaround is to simply generate the default outputs (seq tables and tax tables for phyloseq) and time out on the other outputs, but this will require splitting out those steps, which currently live in GenerateSeqTables.R and GenerateTaxTables.R.
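A minimal sketch of one way to speed up that export, assuming the slow step is a base write.table() call on the full samples x ASVs count matrix; the helper name, the `seqtab` object, and the output path below are illustrative, not the actual template code:

```r
library(data.table)

# Hypothetical drop-in for the slow plain-text export of the seq table.
# `seqtab` stands in for the samples x ASVs count matrix from dada2.
write_seqtab_fast <- function(seqtab, path) {
  dt <- as.data.table(seqtab, keep.rownames = TRUE)  # row names become column "rn"
  setnames(dt, "rn", "SampleID")
  # fwrite() is multithreaded and generally much faster than write.table()
  # on very wide tables (here on the order of 1M+ ASV columns).
  fwrite(dt, path, sep = "\t", quote = FALSE)
}

# write_seqtab_fast(seqtab, "seqtab_final.simple.txt")
```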
The main culprit is really the seq table and the number of samples. With the current run we have a matrix of 960 sample IDs x 1.3M ASVs (with counts). The tax table with 1.3M ASVs and seven ranks (KPCOFGS) is relatively fast.
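For a sense of scale, rough arithmetic on the numbers quoted above (assumed, not measured from the run):

```r
# Back-of-the-envelope size of the in-memory seq table described above.
n_samples <- 960
n_asvs    <- 1.3e6
bytes_int <- n_samples * n_asvs * 4  # dada2 count matrices are typically integer (4 bytes/cell)
bytes_dbl <- n_samples * n_asvs * 8  # roughly double that if coerced to numeric on the way out
sprintf("~%.0f GB as integer, ~%.0f GB as double", bytes_int / 1e9, bytes_dbl / 1e9)
#> "~5 GB as integer, ~10 GB as double"
```

so any per-cell formatting in the text export is touching on the order of a billion cells.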
There is a bit of redundancy in GenerateSeqTables.R that should also be addressed, namely that seqtab_final.txt and seqtab_final.simple.txt are the same file; this likely stems from some code rework we did when renaming ASVs. We can wait to address this until the split_denoise branch lands.
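A possible shape for the de-duplicated output, assuming the intent is that seqtab_final.txt keeps the full sequences as column headers while seqtab_final.simple.txt uses the renamed ASV IDs; this is an assumption about the intended behavior, and `seqtab` / `asv_ids` are placeholders for the template's objects:

```r
library(data.table)

# `seqtab` (samples x ASVs counts) and `asv_ids` ("ASV1", "ASV2", ...) are placeholders.
dt <- as.data.table(seqtab, keep.rownames = TRUE)
setnames(dt, "rn", "SampleID")
fwrite(dt, "seqtab_final.txt", sep = "\t", quote = FALSE)         # sequences as headers

setnames(dt, old = colnames(seqtab), new = asv_ids)               # rename columns in place
fwrite(dt, "seqtab_final.simple.txt", sep = "\t", quote = FALSE)  # ASV IDs as headers
```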