KMPTC

Multi-platform application for clustering phylogenetic trees using K-means and inferring multiple supertrees

Table of Contents

About the project
Installation
Examples
Contact

About the project

A new fast method for clustering phylogenetic trees using K-means and inferring multiple supertrees.

About

Program: KMeansPhylogeneticTreesClustering - 2022
Authors: Benjamin Albertelli and Nadia Tahiri (University of Sherbrooke)
Version: 1.0.0

This program clusters phylogenetic trees using the k-means partitioning algorithm.
These trees may have the same (the multiple consensus tree problem) or different, but mutually overlapping, sets of leaves (the multiple supertree problem).

Phylogenetic trees must be given in the Newick format (program input). A partitioning of the input trees into K clusters is returned as output. The optimal number of clusters can be determined by either of two cluster validity indices adapted for tree clustering:

the Calinski-Harabasz (CH) index, or
the Ball-Hall (BH) index.

A supertree can then be inferred for each cluster of trees. The Robinson and Foulds topological distance is used in the objective function of K-means. The program parameters are listed below.
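To make the distance concrete, here is a minimal sketch of a Robinson and Foulds style distance in Python. It is an illustration under several assumptions, not the program's implementation: trees are written as nested tuples rather than parsed from Newick, they are treated as rooted, both trees are assumed to share the same set of leaves, and no normalization or overlap penalty is applied. The names `clades` and `rf_distance` are hypothetical.

```python
def clades(tree):
    """Return the non-trivial clades of a rooted tree as frozensets of leaf names.

    A tree is a nested tuple of leaf-name strings, e.g. (("A", "B"), ("C", "D")).
    """
    found = set()

    def leaves_below(node):
        if isinstance(node, str):  # a leaf
            return frozenset([node])
        below = frozenset().union(*(leaves_below(child) for child in node))
        found.add(below)  # record the clade rooted at this internal node
        return below

    all_leaves = leaves_below(tree)
    found.discard(all_leaves)  # the root clade is common to all trees on these leaves
    return found


def rf_distance(tree_a, tree_b):
    """Rooted RF distance: number of clades present in exactly one of the two trees."""
    return len(clades(tree_a) ^ clades(tree_b))


t1 = (("A", "B"), ("C", "D"))
t2 = (("A", "C"), ("B", "D"))
t3 = ((("A", "B"), "C"), "D")

print(rf_distance(t1, t1))  # 0: identical topologies
print(rf_distance(t1, t2))  # 4: no clade in common
print(rf_distance(t1, t3))  # 2: only the clade {A, B} is shared
```

In a K-means setting, a distance of this kind takes the place of the Euclidean distance between points; how the program handles trees with different leaf sets is not covered by this sketch.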

Requirements

This package has been tested with:

boost 1.78.0_1 for the boost regex library (boost must be added to your PATH; see doc/Boost_Installation.pdf)
git 2.35.1
macOS Monterey version 12.5
macOS Terminal version 2.12.7

Note: these are not minimum requirements; the software should also work with earlier versions of these tools.

Installation

First, go to the folder where you want to download the software, then run the following command in your shell:

$ git clone https://github.com/tahiri-lab/KMeansPhylogeneticTreesClustering.git

Then execute these two lines:

$ cd src
$ make install

Help

If you need help running the program, execute this line in the src folder:

$ make help

Examples

Please execute the following command line:

=> For trees: ./KMPTC -tree input_file cluster_validity_index α Kmin Kmax

=> input_file: the input file for the program
=> cluster_validity_index: the cluster validity index used in K-means (1 for Calinski-Harabasz and 2 for Ball-Hall)
=> α: the penalty parameter for species overlap in phylogenetic trees (must be between 0 and 1)
=> Kmin: the minimum number of clusters in K-means.

For CH, Kmin>=2,
For BH, Kmin>=1.

=> Kmax: the maximum number of clusters in K-means.

Kmax must be less than or equal to N-1 (where N is the number of input trees).
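The constraints above can be checked before launching the program. The following Python sketch is only an illustration: the function `check_parameters` is hypothetical and not part of KMPTC, the number of trees `n_trees` has to be supplied by the caller, and the check that Kmin does not exceed Kmax is an assumption that is not stated above.

```python
# Lowest allowed Kmin for each cluster validity index (1 = CH, 2 = BH).
LOWEST_K_MIN = {1: 2, 2: 1}


def check_parameters(index, alpha, k_min, k_max, n_trees):
    """Return a list of problems with a parameter set (empty if none were found)."""
    problems = []
    if index not in LOWEST_K_MIN:
        problems.append("cluster_validity_index must be 1 (CH) or 2 (BH)")
    elif k_min < LOWEST_K_MIN[index]:
        problems.append(f"Kmin must be >= {LOWEST_K_MIN[index]} for this index")
    if not 0 <= alpha <= 1:
        problems.append("alpha must be between 0 and 1")
    if k_max > n_trees - 1:
        problems.append("Kmax must be <= N-1")
    if k_min > k_max:  # assumption: not stated in the parameter list
        problems.append("Kmin must be <= Kmax")
    return problems


# n_trees=20 is a made-up value used only for this illustration.
print(check_parameters(1, 0.1, 3, 8, n_trees=20))  # []
print(check_parameters(1, 0.1, 1, 8, n_trees=20))  # ['Kmin must be >= 2 for this index']
```

An empty list means that none of the listed constraints is violated; it does not guarantee that the program will accept the input file.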

Command line execution examples:

(input_file = data/Covid-19_trees.txt, cluster_validity_index = CH, α = 0.1, Kmin = 3, Kmax = 8):

$ ./KMPTC -tree ../data/Covid-19_trees.txt 1 0.1 3 8

Alternatively, use the Makefile instruction as follows (it runs the previous example):

$ make execute

(input_file = data/all_trees_woese.txt, cluster_validity_index = CH, α = 1, Kmin = 2, Kmax = 10):

$ ./KMPTC -tree ../data/all_trees_woese.txt 1 1 2 10

Input

The input data sets are located in the folder "data".
You can also use your own data; please ensure that the file respects the required format (trees in the Newick format).
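Before running the program on your own data, a rough sanity check of the Newick strings can catch simple mistakes. The Python sketch below is an assumption-laden illustration, not a validator for the program's exact input format: the function `check_newick` is hypothetical, it only verifies that each tree ends with ';' and has balanced parentheses, and it does not handle quoted labels that contain parentheses or semicolons.

```python
def check_newick(text):
    """Return a list of problems found in a string of ';'-terminated Newick trees."""
    problems = []
    chunks = text.split(";")
    # Anything left after the final ';' must be whitespace only.
    if chunks[-1].strip():
        problems.append("last tree is not terminated by ';'")
    trees = [chunk.strip() for chunk in chunks if chunk.strip()]
    if not trees:
        problems.append("no trees found")
    for number, tree in enumerate(trees, start=1):
        depth = 0
        for char in tree:
            if char == "(":
                depth += 1
            elif char == ")":
                depth -= 1
                if depth < 0:  # a ')' appeared before its matching '('
                    break
        if depth != 0:
            problems.append(f"tree {number}: unbalanced parentheses")
    return problems


print(check_newick("((A,B),(C,D));\n((A,C),(B,D));\n"))  # []
print(check_newick("((A,B),(C,D);"))  # ['tree 1: unbalanced parentheses']
```

To check a file, pass its content to the function, for example `check_newick(open(path).read())` where `path` is your own input file.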

Output

The output folder is created on your machine when you execute the "make install" command from the src directory.

See the folder "output".
The output is written to the following files:

stat.csv - for the clustering statistics.
output.txt - for the cluster content.

Clean Project

To clean the project, please execute:

$ make clean

Contact

Please email us at Nadia.Tahiri@USherbrooke.ca with any questions or feedback.
