This generator can be used to generate a specified number of phylogenetic trees (or clusters of trees in the cluster version) in Newick format with a variable number of leaves and with some level of overlap between trees in clusters.

tahiri-lab/GPTree

GPTree

Generator of Phylogenetic trees (for supertrees)


💡 A new version of the Generator of clusters of phylogenetic trees with overlapping and HGT is available here.

GPTree (Generator of Phylogenetic trees) generates phylogenetic trees in Newick format, with the number of leaves per tree bounded by a specified range and a controlled level of overlap between the trees. The generator simulates gene trees with horizontal gene transfer (HGT) and is useful for scientific experiments such as testing clustering algorithms or inferring supertrees.

📑 If you use GPTree generator in your research or experiments, please consider citing the following paper:

Koshkarov, A., & Tahiri, N. (2023). GPTree: Generator of Phylogenetic Trees with Overlapping and Biological Events for Supertree Inference. In BIOINFORMATICS (pp. 212-219). DOI: Link to Paper

🏆 Thank you for your contribution to the community!

The generator is built on the AsymmeTree library.

Features

  • Generates phylogenetic trees with horizontal gene transfer (HGT).
  • Allows users to specify the number of leaves and overlap level between trees.
  • Outputs gene trees and species trees in Newick format.
  • Designed to handle large datasets with configurable parameters.

Requirements

The script depends on the following Python libraries:

  • ete3
  • PyQt5
  • asymmetree
  • pandas

Input Parameters

The user needs to provide several initial parameters:

  1. Lmin: Minimum number of leaves per tree (integer, 5 ≤ Lmin < 500).
  2. Lmax: Maximum number of leaves per tree (integer, Lmin < Lmax ≤ 500).
  3. Ngen: Number of trees to generate (integer, 3 ≤ Ngen ≤ 500).
  4. plevel: Average level of overlap (common leaves) between trees, as a decimal (0.2 ≤ plevel ≤ 0.7).
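
The ranges above can be checked before launching a run. The following is a minimal sketch, not part of gptree.py: the function name `validate_params` is hypothetical, and only the bounds themselves come from the list above.

```python
def validate_params(lmin: int, lmax: int, ngen: int, plevel: float) -> None:
    """Raise ValueError if a parameter is outside the range documented above.

    Hypothetical helper for illustration; gptree.py may validate differently.
    """
    if not 5 <= lmin < 500:
        raise ValueError(f"Lmin must satisfy 5 <= Lmin < 500, got {lmin}")
    if not lmin < lmax <= 500:
        raise ValueError(f"Lmax must satisfy Lmin < Lmax <= 500, got {lmax}")
    if not 3 <= ngen <= 500:
        raise ValueError(f"Ngen must satisfy 3 <= Ngen <= 500, got {ngen}")
    if not 0.2 <= plevel <= 0.7:
        raise ValueError(f"plevel must satisfy 0.2 <= plevel <= 0.7, got {plevel}")


# The values from the usage example pass silently.
validate_params(15, 25, 30, 0.5)
```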

The overlap level between trees is calculated based on the number of common leaves between them, with additional controls to ensure the desired level of overlap.
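
This README does not give the exact overlap formula, so the sketch below is only an illustration of measuring overlap from common leaves. It assumes the Jaccard index (shared leaves divided by all distinct leaves) as a stand-in for the pairwise measure and averages it over all pairs of trees. The function names are hypothetical and the generator's own calculation may differ.

```python
from itertools import combinations


def pairwise_overlap(leaves_a, leaves_b) -> float:
    """Jaccard index of two leaf sets (an assumed stand-in for the overlap)."""
    a, b = set(leaves_a), set(leaves_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def average_overlap(leaf_sets) -> float:
    """Mean pairwise overlap over every pair of trees."""
    pairs = list(combinations(leaf_sets, 2))
    if not pairs:
        raise ValueError("at least two leaf sets are required")
    return sum(pairwise_overlap(a, b) for a, b in pairs) / len(pairs)


# Two trees sharing 2 of 4 distinct leaves.
print(pairwise_overlap({"a", "b", "c"}, {"b", "c", "d"}))  # 0.5
```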

Currently, the generator runs slowly for overlap levels below 0.2 and above 0.7.

Figure: The basic workflow.

Usage

To run the script, use the following command:

python gptree.py Lmin Lmax Ngen plevel

Example:

python gptree.py 15 25 30 0.5

This generates 30 trees, each with between 15 and 25 leaves, and an average overlap of 0.5. The trees are saved in the following files:

  • Gene trees: genetrees_50.txt
  • Species trees: speciestrees_50.txt

Output

The generated trees are saved in Newick format:

  • genetrees_XX.txt: Contains the gene trees with the specified overlap level (XX = plevel * 100).
  • speciestrees_XX.txt: Contains the species trees used for generating the gene trees.
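
The naming rule XX = plevel * 100 can be sketched as follows. The helper `output_filenames` is hypothetical, and rounding to the nearest integer is an assumption made here to absorb floating-point error; how gptree.py formats XX internally is not stated in this README.

```python
def output_filenames(plevel: float) -> tuple:
    """Return (gene tree file, species tree file) for a given overlap level.

    Assumes XX is plevel * 100 rounded to the nearest integer.
    """
    xx = round(plevel * 100)
    return f"genetrees_{xx}.txt", f"speciestrees_{xx}.txt"


print(output_filenames(0.5))  # ('genetrees_50.txt', 'speciestrees_50.txt')
```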

See examples of generated datasets here.

The Jupyter notebook also contains steps to validate the generated dataset (tree visualization, number of trees and leaves, and level of overlap).
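
A check on the number of trees and leaves can be sketched without the notebook. The code below is not the notebook's code: it uses a simplified regular expression rather than a full Newick parser, and it assumes unquoted leaf labels and one tree per string. The names `leaf_names` and `check_dataset` are hypothetical.

```python
import re

# A leaf label is taken to be any unquoted name that directly follows "(" or ",".
# Internal node labels follow ")" and are therefore not matched.
LEAF_PATTERN = re.compile(r"[(,]\s*([^(),:;\s]+)")


def leaf_names(newick: str) -> list:
    """Extract leaf labels from a Newick string (simplified, unquoted labels only)."""
    return LEAF_PATTERN.findall(newick)


def check_dataset(newicks, lmin: int, lmax: int, ngen: int) -> list:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if len(newicks) != ngen:
        problems.append(f"expected {ngen} trees, found {len(newicks)}")
    for i, nwk in enumerate(newicks, start=1):
        n = len(leaf_names(nwk))
        if not lmin <= n <= lmax:
            problems.append(f"tree {i} has {n} leaves, outside [{lmin}, {lmax}]")
    return problems


# Small in-memory example; real input would be the lines of a generated file.
trees = [
    "((A:1,B:1):1,(C:1,D:1):1,E:2);",
    "((A:1,C:1):1,(D:1,F:1):1,(G:1,H:1):1);",
    "(((B:1,C:1):1,D:2):1,(E:1,F:1):2);",
]
print(check_dataset(trees, lmin=5, lmax=6, ngen=3))  # []
```

For trees with quoted labels or embedded comments, a dedicated parser such as ete3 (already listed under Requirements) would be the more robust choice.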
