This repository has been archived by the owner on Mar 2, 2023. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 6
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
Showing
29 changed files
with
1,858 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
############################################################################# | ||
# Development make file. | ||
############################################################################# | ||
|
||
SOURCE_DIR := diffusion_maps | ||
|
||
$(SOURCE_DIR)/version.py: | ||
@scripts/make-version > $@ | ||
|
||
.PHONY: $(SOURCE_DIR)/version.py |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
The MIT License (MIT) | ||
Copyright (c) 2017 Juan M. Bello Rivas <jmbr@superadditive.com> | ||
|
||
Permission is hereby granted, free of charge, to any person obtaining | ||
a copy of this software and associated documentation files (the | ||
"Software"), to deal in the Software without restriction, including | ||
without limitation the rights to use, copy, modify, merge, publish, | ||
distribute, sublicense, and/or sell copies of the Software, and to | ||
permit persons to whom the Software is furnished to do so, subject to | ||
the following conditions: | ||
|
||
The above copyright notice and this permission notice shall be | ||
included in all copies or substantial portions of the Software. | ||
|
||
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, | ||
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF | ||
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND | ||
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE | ||
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION | ||
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION | ||
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,65 @@ | ||
#+TITLE: Diffusion Maps and Geometric Harmonics for Python | ||
#+AUTHOR: Juan M. Bello Rivas | ||
#+EMAIL: jmbr@superadditive.com | ||
#+DATE: <2017-05-20 Sat> | ||
|
||
* Overview | ||
|
||
The =diffusion-maps= library for Python provides a fast and accurate implementation of diffusion maps[fn:1] and geometric harmonics[fn:2]. Its speed stems from the use of sparse linear algebra and (optionally) graphics processing units to accelerate computations. | ||
The included code routinely solves eigenvalue problems 3 x faster than SciPy using GPUs on matrices with over 200 million non-zero entries. | ||
|
||
The package includes a command-line utility for the quick calculation of diffusion maps on data sets. | ||
|
||
Some of the features of the =diffusion-maps= module include: | ||
|
||
- Fast evaluation of distance matrices using nearest neighbors. | ||
|
||
- Fast and accurate computation of eigenvalue/eigenvector pairs using sparse linear algebra. | ||
|
||
- Optional GPU-accelerated sparse linear algebra routines. | ||
|
||
- Optional interface to the [[https://github.com/opencollab/arpack-ng][ARPACK-NG]] library. | ||
|
||
- Simple and easily modifiable code. | ||
|
||
[fn:1] Coifman, R. R., & Lafon, S. (2006). Diffusion maps. Applied and Computational Harmonic Analysis, 21(1), 5–30. http://doi.org/10.1016/j.acha.2006.04.006 | ||
|
||
[fn:2] Coifman, R. R., & Lafon, S. (2006). Geometric harmonics: A novel tool for multiscale out-of-sample extension of empirical functions. Applied and Computational Harmonic Analysis, 21(1), 31–52. http://doi.org/10.1016/j.acha.2005.07.005 | ||
|
||
#+CAPTION: Geometric harmonics for $z = sin(x^2 + y^2)$. | ||
#+NAME: fig:geometric-harmonics | ||
[[./geometric-harmonics.png]] | ||
|
||
* Prerequisites | ||
|
||
The library is implemented in Python 3.5+ and uses [[http://www.numpy.org/][NumPy]] and [[https://www.scipy.org/][SciPy]]. It is recommended to install [[https://mathema.tician.de/software/pycuda/][PyCUDA]] to enable the GPU-accelerated eigenvalue solver. | ||
|
||
The =diffusion-maps= command can display the resulting diffusion maps using [[https://matplotlib.org/][Matplotlib]] if it is available. | ||
|
||
* Installation | ||
|
||
Use ~python setup.py install~ to install on your system or ~python setup.py install --user~ for a user-specific installation. | ||
|
||
* Command-line utility | ||
|
||
The ~diffusion-maps~ command reads data sets stored in NPY format. The simplest way to use it is to invoke it as follows: | ||
|
||
#+BEGIN_SRC bash | ||
diffusion-maps DATA-SET.NPY EPSILON-VALUE | ||
#+END_SRC | ||
|
||
There exist parameters to save and visualize different types of results, to specify how many eigenvalue/eigenvector pairs to compute, etc. See the help page displayed by: | ||
|
||
#+BEGIN_SRC bash | ||
diffusion-maps --help | ||
#+END_SRC | ||
|
||
* Additional documentation | ||
|
||
[[http://www.sphinx-doc.org/en/stable/][Sphinx]]-based API documentation is available in the =doc/= folder. Run | ||
|
||
#+BEGIN_SRC bash | ||
make -C doc html | ||
#+END_SRC | ||
|
||
to build the documentation. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
"""Diffusion maps module. | ||
""" | ||
|
||
from .diffusion_maps import * | ||
from .geometric_harmonics import * | ||
from .version import * |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,61 @@ | ||
"""Clock for keeping track of the wall time. | ||
""" | ||
|
||
|
||
__all__ = ['ClockError', 'Clock'] | ||
|
||
import datetime | ||
import time | ||
|
||
from typing import Optional # noqa: F401. Used for mypy. | ||
|
||
|
||
class ClockError(Exception): | ||
"""Invalid clock operation.""" | ||
pass | ||
|
||
|
||
class Clock: | ||
"""Clock for keeping track of time. | ||
""" | ||
def __init__(self) -> None: | ||
self.start = None # type: Optional[float] | ||
self.stop = None # type: Optional[float] | ||
|
||
def tic(self) -> None: | ||
"""Start the clock.""" | ||
self.start = time.monotonic() | ||
self.stop = None | ||
|
||
def toc(self) -> None: | ||
"""Stop the clock.""" | ||
assert self.start is not None | ||
self.stop = time.monotonic() | ||
|
||
def __str__(self) -> str: | ||
"""Human-readable representation of elapsed time.""" | ||
if self.start is None: | ||
raise ClockError('The clock has not been started') | ||
else: | ||
start = datetime.datetime.fromtimestamp(self.start) | ||
|
||
if self.stop is None: | ||
stop = datetime.datetime.fromtimestamp(time.monotonic()) | ||
else: | ||
stop = datetime.datetime.fromtimestamp(self.stop) | ||
|
||
delta = stop - start | ||
|
||
return str(delta) | ||
|
||
def __enter__(self): | ||
if self.start is None and self.stop is None: | ||
self.tic() | ||
|
||
return self | ||
|
||
def __exit__(self, exc_type, exc_value, traceback): | ||
if self.start is not None: | ||
self.toc() |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,152 @@ | ||
#!/usr/bin/env python | ||
# PYTHON_ARGCOMPLETE_OK -*- mode: python -*- | ||
|
||
"""Command line interface for the computation of diffusion maps. | ||
""" | ||
|
||
import argparse | ||
try: | ||
import argcomplete | ||
except ImportError: | ||
argcomplete = None | ||
import logging | ||
import os | ||
import sys | ||
|
||
import numpy as np | ||
|
||
import diffusion_maps.default as default | ||
import diffusion_maps.version as version | ||
from diffusion_maps import downsample, DiffusionMaps | ||
from diffusion_maps.profiler import Profiler | ||
from diffusion_maps.plot import plot_eigenvectors | ||
|
||
|
||
def output_eigenvalues(ew: np.array) -> None: | ||
"""Output the table of eigenvalues. | ||
""" | ||
logging.info('List of eigenvalues:') | ||
fields = ['Real part', 'Imaginary part'] | ||
fmt = '{:>12} | {:<21}' | ||
logging.info('-' * 30) | ||
logging.info(fmt.format(*fields)) | ||
logging.info('-' * 30) | ||
fmt = '{:+2.9f} | {:+2.9f}' | ||
for eigenvalue in ew: | ||
logging.info(fmt.format(eigenvalue.real, eigenvalue.imag)) | ||
|
||
|
||
def use_cuda(args: argparse.Namespace) -> bool: | ||
"""Determine whether to use GPU-accelerated code or not. | ||
""" | ||
try: | ||
import pycuda # noqa | ||
use_cuda = True and not args.no_gpu | ||
except ImportError: | ||
use_cuda = False | ||
|
||
return use_cuda | ||
|
||
|
||
def main(): | ||
parser = argparse.ArgumentParser(description='Diffusion maps') | ||
parser.add_argument('data_file', metavar='FILE', type=str, | ||
help='process %(metavar)s (should be in NPY format)') | ||
parser.add_argument('epsilon', metavar='VALUE', type=float, | ||
help='kernel bandwidth') | ||
parser.add_argument('-n', '--num-samples', type=float, metavar='NUM', | ||
required=False, help='number of data points to use') | ||
parser.add_argument('-e', '--num-eigenpairs', type=int, metavar='NUM', | ||
required=False, default=default.num_eigenpairs, | ||
help='number of eigenvalue/eigenvector pairs to ' | ||
'compute') | ||
parser.add_argument('-c', '--cut-off', type=float, required=False, | ||
metavar='DISTANCE', help='cut-off to use to enforce ' | ||
'sparsity in the diffusion maps computation.') | ||
parser.add_argument('-o', '--output-data', type=str, required=False, | ||
default='actual-data.npy', metavar='FILE', help='save ' | ||
'actual data used in computation to %(metavar)s') | ||
parser.add_argument('-w', '--eigenvalues', type=str, | ||
default='eigenvalues.dat', required=False, | ||
metavar='FILE', help='save eigenvalues to ' | ||
'%(metavar)s') | ||
parser.add_argument('-v', '--eigenvectors', type=str, | ||
default='eigenvectors.npy', required=False, | ||
metavar='FILE', help='save eigenvectors to ' | ||
'%(metavar)s') | ||
parser.add_argument('-m', '--matrix', type=str, required=False, | ||
metavar='FILE', help='save transition matrix to ' | ||
'%(metavar)s') | ||
parser.add_argument('-p', '--plot', action='store_true', default=False, | ||
help='plot first two eigenvectors') | ||
parser.add_argument('--no-gpu', action='store_true', required=False, | ||
help='disable GPU eigensolver') | ||
parser.add_argument('--debug', action='store_true', required=False, | ||
help='print debugging information') | ||
parser.add_argument('--profile', required=False, metavar='FILE', | ||
type=argparse.FileType('w', encoding='utf-8'), | ||
help='run under profiler and save report to ' | ||
'%(metavar)s') | ||
|
||
args = parser.parse_args(sys.argv[1:]) | ||
|
||
if args.debug is True: | ||
logging.basicConfig(level=logging.DEBUG, format='%(message)s') | ||
else: | ||
logging.basicConfig(level=logging.INFO, format='%(message)s') | ||
|
||
prog_name = os.path.basename(sys.argv[0]) | ||
logging.info('{} {}'.format(prog_name, version.v_long)) | ||
logging.info('') | ||
logging.info('Reading data from {!r}...'.format(args.data_file)) | ||
|
||
orig_data = np.load(args.data_file) | ||
if args.num_samples: | ||
data = downsample(orig_data, int(args.num_samples)) | ||
else: | ||
data = orig_data | ||
|
||
logging.info('Computing {} diffusion maps with epsilon = {:g} ' | ||
'on {} data points...' | ||
.format(args.num_eigenpairs-1, args.epsilon, data.shape[0])) | ||
|
||
with Profiler(args.profile): | ||
dm = DiffusionMaps(data, args.epsilon, | ||
num_eigenpairs=args.num_eigenpairs, | ||
use_cuda=use_cuda(args)) | ||
|
||
if args.profile: | ||
args.profile.close() | ||
|
||
output_eigenvalues(dm.eigenvalues) | ||
|
||
if args.matrix: | ||
logging.info('Saving transition matrix to {!r}' | ||
.format(args.matrix)) | ||
import scipy.io | ||
scipy.io.mmwrite(args.matrix, dm.kernel_matrix) | ||
|
||
if args.eigenvalues: | ||
logging.info('Saving eigenvalues to {!r}' | ||
.format(args.eigenvalues)) | ||
np.savetxt(args.eigenvalues, dm.eigenvalues) | ||
|
||
if args.eigenvectors: | ||
logging.info('Saving eigenvectors to {!r}' | ||
.format(args.eigenvectors)) | ||
np.save(args.eigenvectors, dm.eigenvectors) | ||
|
||
if args.output_data and args.num_samples: | ||
logging.info('Saving downsampled data to {!r}' | ||
.format(args.output_data)) | ||
np.save(args.output_data, data) | ||
|
||
if args.plot is True: | ||
plot_eigenvectors(data, dm.eigenvectors) | ||
|
||
|
||
if __name__ == '__main__': | ||
main() |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,49 @@ | ||
"""Compute dominant eigenvectors of a sparse matrix. | ||
""" | ||
|
||
__all__ = ['eigensolver'] | ||
|
||
from typing import Optional, Tuple | ||
|
||
import numpy as np | ||
import scipy.sparse | ||
import scipy.sparse.linalg | ||
|
||
from . import default | ||
|
||
|
||
def eigensolver(matrix: scipy.sparse.csr_matrix, | ||
num_eigenpairs: int = default.num_eigenpairs, | ||
sigma: Optional[float] = None, | ||
initial_vector: Optional[np.array] = None) \ | ||
-> Tuple[np.array, np.array]: | ||
"""Solve eigenvalue problem for sparse matrix. | ||
Parameters | ||
---------- | ||
matrix : scipy.sparse.csr_matrix | ||
A matrix in compressed sparse row format. | ||
num_eigenpairs : int, optional | ||
Number of eigenvalue/eigenvector pairs to obtain. | ||
sigma : float, optional | ||
Find eigenvalues close to the value of sigma. The default value is | ||
near 1.0. | ||
initial_vector : np.array, optional | ||
Initial vector to use in the Arnoldi iteration. If not set, a vector | ||
with all entries equal to one will be used. | ||
Returns | ||
------- | ||
ew : np.array | ||
Eigenvalues in descending order of magnitude. | ||
ev : np.array | ||
Eigenvectors corresponding to the eigenvalues in `ew`. | ||
""" | ||
if initial_vector is None: | ||
initial_vector = np.ones(matrix.shape[0]) | ||
ew, ev = scipy.sparse.linalg.eigs(matrix, k=num_eigenpairs, which='LM', | ||
sigma=sigma, v0=initial_vector) | ||
ii = np.argsort(np.abs(ew))[::-1] | ||
return ew[ii], ev[:, ii].T |
Oops, something went wrong.