This repository contains code used to reanalyse Human Microbiome Project data to benchmark MAPseq.
Raw data from the HMP is pre-processed and matched to samples and sequencing subregions by a pre-processing script. This generates one folder per sample for subsequent sample-wise (parallel) processing, as well as global mapping tables.
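As a rough illustration of the "one folder per sample" layout, the sketch below splits a read-to-sample mapping table into per-sample folders. The table format (tab-separated: read ID, then sample name), the function name split_by_sample and the output file name read_ids.txt are all assumptions made for this example. They are not taken from the repository's scripts.

```shell
# Hypothetical helper: split_by_sample MAPPING_TSV OUT_DIR
# Assumes MAPPING_TSV has two tab-separated columns: read ID, sample name.
split_by_sample() {
    map="$1"
    out="$2"

    # Create one folder per distinct sample name (column 2).
    cut -f2 "$map" | sort -u | while IFS= read -r sample; do
        mkdir -p "$out/$sample"
    done

    # Append each read ID (column 1) to the list inside its sample's folder.
    # The file is closed after every write to avoid exhausting file handles
    # when there are many samples.
    awk -F '\t' -v out="$out" '{
        f = out "/" $2 "/read_ids.txt"
        print $1 >> f
        close(f)
    }' "$map"
}
```

Because the lists are written in append mode, running the function twice on the same output folder would duplicate entries. Start from an empty output folder.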
A wrapper script then calls a per-sample script, in parallel across samples, to remove chimeras.
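The wrapper pattern used here (one call per sample folder, run in parallel) can be sketched as follows. This is a minimal illustration only: process_sample is a hypothetical stand-in that counts FASTA records instead of removing chimeras, and the file names reads.fasta and n_reads.txt are assumptions, not the names used in this repository.

```shell
# Hypothetical per-sample step. The real pipeline would run chimera removal
# here; this placeholder just counts FASTA records in the sample folder.
process_sample() {
    sample_dir="$1"
    grep -c '^>' "$sample_dir/reads.fasta" > "$sample_dir/n_reads.txt"
}

# Run the per-sample step for every sub-folder of ROOT as a background job,
# then wait for all jobs to finish.
# Note: this starts all jobs at once and does not limit concurrency.
run_all_samples() {
    root="$1"
    for d in "$root"/*; do
        [ -d "$d" ] || continue
        process_sample "$d" &
    done
    wait
}
```

For a large number of samples, a job limit (for example via xargs -P or a cluster scheduler) would be preferable to starting every job at once.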
A second wrapper script calls a per-sample script in parallel to run INFERNAL. Afterwards, another script stitches together the individual per-sample alignments, then de-replicates and lightly de-noises them. This script also defines the "consensus" lists of sequences to be used downstream by the clustering methods, keeping only those sequences which are non-chimeric and align satisfactorily to the target subregion.
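The "consensus" list is in essence an intersection: a sequence is kept only if it appears both among the non-chimeric sequences and among those that aligned to the target subregion. A minimal sketch of that intersection is given below, assuming two plain-text files with one sequence ID per line. The function name consensus_ids and the file layout are hypothetical, and the alignment-quality criterion itself is not shown.

```shell
# Hypothetical helper: consensus_ids NONCHIMERIC_IDS ALIGNED_IDS
# Prints the IDs present in both files, sorted and without duplicates.
consensus_ids() {
    # -F: treat patterns as fixed strings, -x: match whole lines only,
    # -f: read the patterns (non-chimeric IDs) from the first file.
    grep -Fxf "$1" "$2" | sort -u
}
```

Usage would look like: consensus_ids nonchimeric.txt aligned.txt > consensus.txt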
The scripts "make_otus.*.sh" then call the different mapping/clustering tools to generate OTU sets and translate them into R-readable formats (OTU tables etc.).
Finally, the script hmp.benchmark.R contains all the code for R-based analyses, as detailed in the manuscript.
(Raw) results are available in the folder results.