The taxonomy of the model filamentous fungus Podospora anserina

Code for the paper Ament-Velásquez et al. (2020) Mycokeys 75: 51-69.

In this study we compiled sequence data for a number of molecular markers (ITS, LSU, rpb2, and beta-tubulin) to resolve the phylogenetic relationships of lineages within the Podosporaceae family, sensu Wang et al. (2019) Studies in Mycology 93:155-252. The primary objective was to determine the relative position of the model species Podospora anserina and the type species of the genus, Podospora fimiseda. Unfortunately, there is a lot of missing data.

The data required for this pipeline is:

The master concatenated alignment in nexus format with annotated limits of the markers (Podosporaceae_20200714.nxs). You can open it with SeaView, for example. We submitted it to TreeBase (accession number XXXX) but I had to fit it to the Mesquite nexus format for that. To make the Mesquite file compatible with this pipeline, you can open it in SeaView and save it as a new nexus file. It should work then. In any case, I put the original nexus here in this repo just in case.
A partition file (in the RAxML style) with the genes to be analyzed by the scripts of Shen et al. (2017) to obtain gene-wise log-likelihood scores (dGLS) values: allmarkers_combining_orders.txt.

The scripts needed:

The Snakemake pipeline Podosporaceae.smk
The configuration file of the pipeline Podosporaceae_config.yaml
1_sitewise_analyzer.pl and 2_genewise_analyzer.pl from Shen et al. (2017)
An R script for plotting: Shen2017_podofam.R

The configuration file

The configuration file contains the paths to the necessary files to run the pipeline.

$ cat Podosporaceae_config.yaml

# Configuration file of the Podosporaceae.smk pipeline

## Master nexus file
masternex: "data/Podosporaceae_20200714.nxs"

## Name of the full concatenation of all markers in the master nexus file
allmarkersname: "allmarkers"

## RAxML-like partition file to calculate the SLS and GLS metrics (It must be named "{allmarkersname}_combining_orders.txt")
orders: "data/allmarkers_combining_orders.txt"

## Filtering for alignments of individual markers
minfrac: 0.45 # min fraction of overlap with the whole alignment length for a sequence to be considered
minlen: 250 

## Scripts
sitewise_analyzer: "scripts/1_sitewise_analyzer.pl"
genewise_analyzer: "scripts/2_genewise_analyzer.pl" 
plotShen: "scripts/Shen2017_podofam.R"

# Outgroup clade
outgroup: ["Lasiosphaeria_ovina_SMH1538", "Zopfiella_tabulata_CBS230.78", "Sordaria_fimicola_SMH4106", "Diplogelasinospora_princeps_FMR13414", "Chaetomium_globosum_CBS148.51", 'Chaetomium_globosum_CBS160.62', 'Cercophora_mirabilis_CBS120402']
# A representative of clades A, B and C in that order
testmonophyly: ["Podospora_anserina_S", "Cercophora_grandiuscula_CBS120013", "Podospora_fimiseda_CBS990.96"]

Building the environment

To run the Snakemake pipeline, I constructed a conda environment. I assume in the following that you have conda installed already.

I named the environment LorePhylogenetics, but you can call it whatever you like :).

$ conda create -n LorePhylogenetics -c bioconda

$ conda activate LorePhylogenetics
$ conda install -c bioconda snakemake-minimal=5.4.4 biopython=1.72 mafft=7.407 iqtree=1.6.8 raxml=8.2.12
$ conda install r-tidyr=1.1.0 # included dplyr 1.0.0 
$ conda install r-cowplot=1.0.0 # it comes with ggplot2 3.1.1
$ conda install -c etetoolkit ete3=3.1.1

Run pipeline locally

First, to get an idea of how the pipeline looks like we can make a rulegraph:

$ conda install -c pkgs/main graphviz=2.40.1 # already there
$ snakemake --snakefile Podosporaceae.smk --configfile Podosporaceae_config.yaml --rulegraph | dot -Tpng > rulegraph.png

However, in this case I'm using Checkpoints right away so almost all of the graph won't be calculated until that checkpoint is ran.

For testing without running the pipeline:

$ conda activate LorePhylogenetics
$ snakemake --snakefile Podosporaceae.smk --configfile Podosporaceae_config.yaml -pn

To run the pipeline I like to make a screen in case the computer is turned off or something.

$ screen -R phylo
$ conda activate LorePhylogenetics
$ snakemake --snakefile Podosporaceae.smk --configfile Podosporaceae_config.yaml -p -j 40 --keep-going --use-conda &> Podosporaceae.log &
[1] 40092

Notice the use of the -j argument, where I specify the number of threads available for the pipeline.

If we produce the graph again:

$ snakemake --snakefile Podosporaceae.smk --configfile Podosporaceae_config.yaml --rulegraph | dot -Tpng > rulegraph.png

Results

After the pipeline runs successfully, there will be a new folder called results that contains the individual ML trees of each partition in the nexus file, as well as a plot with the dGLS per marker and the number of sites supporting T1 and T2.

ShenMetrics.pdf - Figure 4 in the paper; dGLS values per gene (A) and the number of sites supporting each topology for each gene (B) (see below)
Tree files of all partitions (post filtering)

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
data		data
scripts		scripts
.gitignore		.gitignore
LICENSE		LICENSE
Podosporaceae.smk		Podosporaceae.smk
Podosporaceae_config.yaml		Podosporaceae_config.yaml
README.md		README.md
ShenMetrics.png		ShenMetrics.png
rulegraph_after.png		rulegraph_after.png
rulegraph_before.png		rulegraph_before.png

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

The taxonomy of the model filamentous fungus Podospora anserina

The configuration file

Building the environment

Run pipeline locally

Results

About

Releases

Packages

Languages

License

SLAment/Podosporaceae

Folders and files

Latest commit

History

Repository files navigation

The taxonomy of the model filamentous fungus Podospora anserina

The configuration file

Building the environment

Run pipeline locally

Results

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages