Tools for training and evaluating CIDER functionals for use in Density Functional Theory calculations. Interfaces to the GPAW and PySCF codes are included.
Machine Learning (ML) has recently gained attention as a means to fit more accurate Exchange-Correlation (XC) functionals for use in Density Functional Theory (DFT). We have developed CIDER, a set of features, models, and training techniques for efficiently learning the exchange energy, with an eye toward learning full XC functionals. CIDER stands for Compressed scale-Invariant DEnsity Representation, which refers to the fact that the descriptors are invariant under squishing or expanding of the density while maintaining its shape. This property makes the representation efficient for learning the XC functional, especially the exchange energy.
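To make the scale-invariance statement concrete, recall the standard uniform scaling of a density, which squeezes ($\lambda > 1$) or expands ($\lambda < 1$) it while preserving its shape and electron number. The reduced gradient $s$ below is used only as a familiar textbook example of a scale-invariant quantity; the actual CIDER descriptors are defined in the paper listed under "Citing".

$$
n_\lambda(\mathbf{r}) = \lambda^3\, n(\lambda \mathbf{r}), \qquad \int n_\lambda(\mathbf{r})\, d^3r = \int n(\mathbf{r})\, d^3r
$$

$$
s[n](\mathbf{r}) = \frac{|\nabla n(\mathbf{r})|}{2\,(3\pi^2)^{1/3}\, n(\mathbf{r})^{4/3}}
\quad\Longrightarrow\quad
s[n_\lambda](\mathbf{r}) = s[n](\lambda \mathbf{r})
$$

$$
E_x[n_\lambda] = \lambda\, E_x[n]
$$

The gradient picks up a factor $\lambda^4$ under scaling, exactly as $n^{4/3}$ does, so the ratio is unchanged apart from the shifted argument. Because the exact exchange energy scales linearly in $\lambda$, a model built on scale-invariant inputs only has to learn the shape dependence.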
We want to make clear that both the code and the functionals themselves are experimental. The code base will likely change significantly in the next few years. Therefore, please read the installation guidance, usage instructions, examples, and known issues thoroughly before using CiderPress.
The SL-GGA, NL-GGA, SL-MGGA, NL-MGGA, NL-MGGA-PBE, and NL-MGGA-DTR functionals presented in the paper at https://arxiv.org/abs/2303.00682 are available for download. To get them, simply enter the CiderPressLite directory and run the download_functionals.py
script:
python scripts/download_functionals.py
This will place all six of these functionals in a subdirectory called functionals
in the CiderPressLite directory. If you use these functionals or the CiderPress code, please cite the paper in the "Citing" section below.
We do not yet have a PyPI package for CiderPress, as the setup procedure and dependencies are a bit complex and might undergo significant changes when we release the full version. However, we have done our best to make installation fast and straightforward. We recommend creating a conda environment from scratch for setting up CiderPressLite as described below, as this makes it much easier to quickly install compatible versions of the dependencies of CiderPress, PySCF, and GPAW. In case you want to install using a different setup, here is a list of dependencies:
- Python 3.9-3.11 (Python 3.12 not supported)
- An installation of PySCF
- An installation of GPAW compiled with LibXC and FFTW (if you want to run periodic DFT calculations)
- Intel Math Kernel Library
- The Python package requirements in
requirements.txt
- Fortran, C, and C++ compilers with OpenMP support.
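Before building, it can help to confirm that the active interpreter and the optional back ends match the list above. The following is a minimal sketch using only the standard library; the helper names `python_supported` and `missing_modules` are illustrative and not part of CiderPress.

```python
import importlib.util
import sys


def python_supported(version=None):
    """Return True for Python 3.9-3.11, the range listed above."""
    major, minor = (version or sys.version_info)[:2]
    return major == 3 and 9 <= minor <= 11


def missing_modules(names):
    """Return the top-level modules in `names` that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python version supported:", python_supported())
    # gpaw is only needed for periodic calculations.
    print("Missing modules:", missing_modules(["pyscf", "gpaw"]))
```

This only checks that the modules can be located; it does not verify that GPAW was compiled with LibXC and FFTW.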
One of the requirements in
requirements.txt
is the Intel Math Kernel Library. This module is particularly important because the CiderPress C extensions need to link to it and make use of the MKL DFTI library. Please make sure that either your MKL headers and shared libraries are in the include and lib directories of your Python environment, respectively, or that they are in your C_INCLUDE_PATH and LIBRARY_PATH/LD_LIBRARY_PATH, respectively.
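The search locations described above can be checked with a short script before running the build. This is a sketch under stated assumptions: it looks for the file names `mkl_dfti.h` and `libmkl_rt` (the usual names of the MKL DFTI header and the single dynamic runtime library on Linux and macOS), and the helpers `candidate_dirs` and `find_first` are hypothetical, not part of CiderPress. It does not reproduce the logic of the CiderPress build itself.

```python
import os
import sys
from pathlib import Path


def candidate_dirs(env_subdir, env_vars):
    """Collect the Python environment's directory plus any path-variable entries."""
    dirs = [Path(sys.prefix) / env_subdir]
    for var in env_vars:
        for entry in os.environ.get(var, "").split(os.pathsep):
            if entry:
                dirs.append(Path(entry))
    return dirs


def find_first(dirs, patterns):
    """Return the first file in `dirs` matching any glob pattern, else None."""
    for directory in dirs:
        if not directory.is_dir():
            continue
        for pattern in patterns:
            matches = sorted(directory.glob(pattern))
            if matches:
                return matches[0]
    return None


if __name__ == "__main__":
    header = find_first(
        candidate_dirs("include", ["C_INCLUDE_PATH"]),
        ["mkl_dfti.h"],  # assumed header name for the DFTI interface
    )
    library = find_first(
        candidate_dirs("lib", ["LIBRARY_PATH", "LD_LIBRARY_PATH"]),
        ["libmkl_rt.so*", "libmkl_rt*.dylib"],  # assumed library names
    )
    print("MKL DFTI header:", header or "not found")
    print("MKL runtime library:", library or "not found")
```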
This section covers how to install CiderPress and its dependencies from a fresh conda environment. Micromamba is also supported; you will just need to replace the 'conda' commands with 'micromamba' below.
- Create a new conda environment.
conda create -n <my_env> python=3.11
conda activate <my_env>
Python 3.9-3.11 are supported (3.12 is not). In later versions, we will try to move away from the deprecated distutils
tools so that we can support Python 3.12.
- Install dependencies (three options).
Option 1: (Preferred option for multi-node jobs) If you provide your own MPI, C compiler, and Fortran compiler, you can install all the other dependencies using
nocomp_env.yml
:
conda env update --file <CiderPressLite>/nocomp_env.yaml
Option 2: The simplest way to install everything is to use full_env.yml
, which also installs the mpicc, gcc, g++, and gfortran compilers. Note that this approach will likely not work if you need to run MPI jobs on a cluster, as the conda openmpi will not be linked to the scheduler properly.
conda env update --file <CiderPressLite>/full_env.yaml
Option 3: As one more option, you can install components more manually. Start with the compilers and libxc. You can also use your own compilers, but this step helps ensure that everything is compatible. Note that all these compilers (plus libxc) must be installed via conda in one step, or else there will be compiler and library compatibility issues.
conda install openmpi-mpicc gfortran libxc conda-forge::fftw gxx gcc
You may exclude openmpi-mpicc
, fftw
, and libxc
if you will not be using the GPAW interface. Next, install MKL
conda install mkl mkl-devel mkl-include mkl_fft
Finally, install the other dependencies
cd <CiderPressLite>
pip install -r requirements.txt
- Build C and Fortran extensions and install CiderPress
python setup.py build install
- (If using GPAW) Install GPAW from source. We recommend using our siteconfig.py to link gpaw to MPI and MKL for simplicity and speed. (gitlab.com/gpaw/gpaw)
cd <place you want to save the GPAW source>
git clone https://gitlab.com/gpaw/gpaw.git
cd gpaw
cp <CiderPressLite>/gpaw_siteconfig.py .
python setup.py build install
Note: Currently CiderPress does not support the new GPAW version (gpaw.new), but we plan to support it in the future.
CIDER calculations can be run in PySCF (for non-periodic, all-electron calculations) and GPAW (for periodic, plane-wave PAW calculations) using the functional initializers ciderpress.dft.ri_cider.setup_cider_calc
and ciderpress.gpaw.cider_paw.get_cider_functional
, respectively. Periodic PySCF calculations are not yet supported. See examples/pyscf/simple_calc.py
and examples/gpaw/simple_calc.py
for a demonstration of setting up a typical calculation, and refer to the docstrings of the initializers for a more detailed explanation of all the input options. As explained in the docstrings, the defaults are sufficient for most of the input options. The recommended functional for most applications in which the goal is to reproduce hybrid DFT is the NL-MGGA-DTR exchange functional (provided in a separate repository), which is a meta-GGA with nonlocal features of the density.
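Since the input options are documented in the docstrings, one convenient way to read them is to print the signature and docstring of each initializer from Python. The sketch below uses only the standard library; `initializer_doc` is a hypothetical helper, and the only CiderPress names it relies on are the two dotted paths given above. No assumptions are made about the arguments the initializers accept.

```python
import importlib
import inspect


def initializer_doc(dotted_path):
    """Import `package.module.function` and return its signature and docstring."""
    module_name, _, attr = dotted_path.rpartition(".")
    obj = getattr(importlib.import_module(module_name), attr)
    return f"{attr}{inspect.signature(obj)}\n\n{inspect.getdoc(obj) or ''}"


if __name__ == "__main__":
    for path in (
        "ciderpress.dft.ri_cider.setup_cider_calc",
        "ciderpress.gpaw.cider_paw.get_cider_functional",
    ):
        try:
            print(initializer_doc(path))
        except ImportError as exc:
            # For example, GPAW may not be installed in this environment.
            print(f"Could not import {path}: {exc}")
```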
We are currently keeping the various scripts and workflow tools to train CIDER functionals in a separate, private code base. We plan to release this eventually, but it requires significantly more development before public release. CiderPress currently only contains the ML model classes themselves, but not the various training tools. If you are interested in training your own CIDER model, we suggest reaching out to us to discuss (email kylebystrom@gmail.com).
CiderPress has a few known issues that we are currently investigating. Please be aware of these when attempting calculations with CIDER functionals. We will make a note and publish a new release when we fix these issues. If you run into any other problems, please post an issue on the GitHub repository.
- For some periodic systems in GPAW within the PAW formalism, significant numerical instability issues arise for the nonlocal functionals. In our experience thus far, these issues are uncommon and seem to be caused by the nonlocal PAW corrections to the CIDER features as opposed to the functionals themselves.
- For the GPAW interface, the memory overhead for the nonlocal features can be fairly high, occasionally causing memory issues. Please be aware that you might need to allocate more memory for a nonlocal CIDER calculation than for, say, a PBE calculation.
- For the PySCF interface, there are (mostly minor) convergence issues for some systems. These issues are much less common and less severe for our most robust functionals (like NL-MGGA-DTR). Even for NL-MGGA-DTR, occasionally a system will not quite converge. Usually the energy convergence is fine, but the orbital gradients are somewhat unstable; it might be necessary to set
conv_tol_grad
to a higher value than the default. These issues are likely a mix between inherent functional stability and the stability of the fast feature evaluation algorithm.
- The code spits out a lot of divide-by-zero and invalid value warnings from numpy, which occur because (as with many functionals) some terms in CIDER functionals become numerically unstable at very small densities. These issues are corrected by setting the XC energy and potential at very low density to zero, and we will clean up various warnings and unnecessary debug statements as soon as possible.
- The construction of the CIDER PAW corrections within GPAW has a very small numerical stability issue that results in different energies on different runs (with energy differences of roughly
$10^{-11}$ eV). The difference is so small that it is insignificant for most applications, but it might affect finite difference calculations with very small perturbations.
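For the PySCF convergence and warning issues listed above, one possible workaround is to retry the SCF with a progressively looser orbital-gradient tolerance while silencing the NumPy warnings. This is a minimal sketch, not a recommendation from the authors: the helper `run_with_looser_grad_tol` and the tolerance values are illustrative assumptions, and the code relies only on the standard PySCF SCF attributes `kernel()`, `converged`, and `conv_tol_grad`.

```python
import warnings


def run_with_looser_grad_tol(mf, grad_tols=(1e-4, 1e-3), quiet=True):
    """Run mf.kernel(), loosening conv_tol_grad step by step if SCF fails.

    `mf` is any object exposing PySCF's SCF interface: a kernel() method,
    a `converged` flag, and a `conv_tol_grad` attribute.
    """
    energy = None
    # None keeps whatever tolerance mf already has for the first attempt.
    for tol in (None, *grad_tols):
        if tol is not None:
            mf.conv_tol_grad = tol
        with warnings.catch_warnings():
            if quiet:
                # NumPy reports divide-by-zero / invalid-value as RuntimeWarning.
                warnings.simplefilter("ignore", category=RuntimeWarning)
            energy = mf.kernel()
        if mf.converged:
            break
    return energy, mf.converged, mf.conv_tol_grad
```

Here `mf` would be any PySCF SCF object, for example a `pyscf.dft.RKS` instance. Each retry in this sketch restarts from the default initial guess, and silencing every `RuntimeWarning` can also hide unrelated problems, so `quiet=False` is the safer choice while debugging.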
Find a bug? Areas of code unclearly documented? Other questions? Feel free to contact Kyle Bystrom at kylebystrom@gmail.com and/or create an issue on the GitHub page at https://github.com/mir-group/CiderPressLite.
If you find CiderPress useful, please encourage its development by citing the following paper in your research:
@ARTICLE{2023arXiv230300682B,
author = {{Bystrom}, Kyle and {Kozinsky}, Boris},
title = "{Nonlocal Machine-Learned Exchange Functional for Molecules and Solids}",
journal = {arXiv e-prints},
keywords = {Physics - Chemical Physics, Condensed Matter - Materials Science, Physics - Computational Physics},
year = 2023,
month = mar,
eid = {arXiv:2303.00682},
pages = {arXiv:2303.00682},
doi = {10.48550/arXiv.2303.00682},
archivePrefix = {arXiv},
eprint = {2303.00682},
primaryClass = {physics.chem-ph},
adsurl = {https://ui.adsabs.harvard.edu/abs/2023arXiv230300682B},
adsnote = {Provided by the SAO/NASA Astrophysics Data System}
}