BigNmf (Big Data NMF) is a Python 3 package for conducting analysis using NMF algorithms.
NMF (non-negative matrix factorization) factorizes a non-negative input matrix into non-negative factors. The algorithm has an inherent clustering property and has been gaining attention in various fields, especially biological data analysis.
In their paper, Brunet et al. demonstrated that NMF clusters the leukemia dataset better than standard clustering algorithms such as hierarchical clustering and self-organizing maps.
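To make the factorization and its clustering property concrete, here is a minimal NumPy sketch of NMF using the classic multiplicative update rules for the squared-error objective. It is an illustration only and does not use BigNmf; the function name `nmf_multiplicative`, its parameters, and the random test matrix are assumptions made for this example.

```python
import numpy as np


def nmf_multiplicative(x, k, n_iter=200, seed=0, eps=1e-10):
    """Factorize a non-negative matrix x (m x n) into w (m x k) and h (k x n).

    Illustrative helper, not part of BigNmf.
    """
    rng = np.random.default_rng(seed)
    m, n = x.shape
    w = rng.random((m, k))
    h = rng.random((k, n))
    for _ in range(n_iter):
        # Multiplicative updates only rescale entries by non-negative ratios,
        # so w and h stay non-negative throughout.
        h *= (w.T @ x) / (w.T @ w @ h + eps)
        w *= (x @ h.T) / (w @ h @ h.T + eps)
    return w, h


# Build a non-negative matrix with an exact rank-3 structure.
rng = np.random.default_rng(1)
x = rng.random((20, 3)) @ rng.random((3, 30))

w, h = nmf_multiplicative(x, k=3)
error = np.linalg.norm(x - w @ h)  # Frobenius reconstruction error

# Clustering property: assign each column to its dominant factor.
labels = h.argmax(axis=0)
```

Because the result depends on the random starting point, NMF is usually run several times and the best or most stable solution is kept.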
The following algorithms are currently available. To learn more about an algorithm, follow its link below to the original paper.
- Single NMF
- Joint NMF
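The difference between the two can be sketched as follows: single NMF factorizes one matrix, while joint NMF factorizes several matrices that share the same rows using one common factor. The NumPy sketch below is a simplified illustration under that assumption, not the package's implementation; `joint_nmf` and its parameters are hypothetical names, and the sketch includes no regularization term, so it does not show how a parameter such as `lamb` would enter the objective.

```python
import numpy as np


def joint_nmf(xs, k, n_iter=200, seed=0, eps=1e-10):
    """Approximate every matrix in the dict xs as w @ hs[name] with one shared w.

    Illustrative helper, not part of BigNmf. All matrices must be
    non-negative and have the same number of rows.
    """
    rng = np.random.default_rng(seed)
    m = next(iter(xs.values())).shape[0]
    w = rng.random((m, k))
    hs = {name: rng.random((k, x.shape[1])) for name, x in xs.items()}
    for _ in range(n_iter):
        # Each dataset keeps its own h, updated as in single NMF.
        for name, x in xs.items():
            hs[name] *= (w.T @ x) / (w.T @ w @ hs[name] + eps)
        # The shared w pools the evidence from all datasets.
        numer = sum(x @ hs[name].T for name, x in xs.items())
        denom = w @ sum(hs[name] @ hs[name].T for name in xs) + eps
        w *= numer / denom
    return w, hs


# Two synthetic datasets generated from the same row factor.
rng = np.random.default_rng(1)
w_true = rng.random((20, 3))
xs = {
    "a": w_true @ rng.random((3, 30)),
    "b": w_true @ rng.random((3, 40)),
}

w, hs = joint_nmf(xs, k=3)
errors = {name: np.linalg.norm(x - w @ hs[name]) for name, x in xs.items()}
```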
This package is available on PyPI, so you can install it by running the following command.
pip3 install bignmf
The following examples illustrate typical usage of the algorithms.
from bignmf.datasets.datasets import Datasets
from bignmf.models.snmf.standard import StandardNmf
Datasets.list_all()
data = Datasets.read("SimulatedX1")
k = 3
n_iter = 100  # named n_iter to avoid shadowing the built-in iter()
trials = 50
model = StandardNmf(data, k)
# Runs the model
model.run(trials, n_iter, verbose=0)
print(model.error)
# Clusters the data
model.cluster_data()
print(model.h_cluster)
# Calculates the consensus matrices
model.calc_consensus_matrices()
print(model.consensus_matrix_w)
from bignmf.models.jnmf.integrative import IntegrativeJnmf
from bignmf.datasets.datasets import Datasets
Datasets.list_all()
data_dict = {}
data_dict["sim1"] = Datasets.read("SimulatedX1")
data_dict["sim2"] = Datasets.read("SimulatedX2")
k = 3
n_iter = 100  # named n_iter to avoid shadowing the built-in iter()
trials = 50
lamb = 0.1
model = IntegrativeJnmf(data_dict, k, lamb)
# Runs the model
model.run(trials, n_iter, verbose=0)
print(model.error)
# Clusters the data
model.cluster_data()
print(model.h_cluster)
# Calculates the consensus matrices
model.calc_consensus_matrices()
print(model.consensus_matrix_w)
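Both examples end by computing consensus matrices. As a rough guide to what such a matrix represents, the sketch below follows the usual consensus clustering idea: for each trial, record which pairs of items land in the same cluster, then average over trials. Whether BigNmf computes its consensus matrices in exactly this way is an assumption; `consensus_matrix` is a hypothetical helper and the label lists are made up for illustration.

```python
import numpy as np


def consensus_matrix(label_runs):
    """Average the connectivity matrices of several clustering runs.

    Illustrative helper, not part of BigNmf.
    """
    label_runs = np.asarray(label_runs)  # shape: (n_runs, n_items)
    # connectivity[r, i, j] is True when items i and j share a cluster in run r.
    connectivity = label_runs[:, :, None] == label_runs[:, None, :]
    return connectivity.mean(axis=0)


# Cluster labels of four items from three hypothetical trials.
runs = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],  # same partition as the first trial, different label names
    [0, 1, 1, 1],
]
consensus = consensus_matrix(runs)
```

Entries close to 1 mark pairs that are grouped together in nearly every trial, and entries close to 0 mark pairs that are almost never grouped together, so a matrix made up mostly of such extreme values indicates a stable clustering.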
See the extensive documentation for more details.