Merge pull request #113 from mir-group/develop
v0.5.0 & v0.4.0
Linux-cpp-lisp authored Nov 24, 2021
2 parents 83b8887 + 59f5011 commit fe73530
Showing 127 changed files with 6,255 additions and 2,319 deletions.
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/question.md
@@ -7,4 +7,4 @@ assignees: ''

---


If this isn't an issue with the code or a request, please use our [GitHub Discussions](https://github.com/mir-group/nequip/discussions) instead.
9 changes: 5 additions & 4 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -24,8 +24,9 @@ Resolves: #???
<!-- Put an `x` in all the boxes that apply. If you're unsure about any of
these, don't hesitate to ask. We're here to help! -->
- [ ] My code follows the code style of this project and has been formatted using `black`.
- [ ] I have updated the documentation (if relevant).
- [ ] I have added tests that cover my changes (if relevant).
- [ ] All new and existing tests passed.
- [ ] `example.yaml` (and other relevant `configs/`) have been updated with new or changed options.
- [ ] I have updated `CHANGELOG.md`.
- [ ] I have added tests that cover my changes (if relevant).
- [ ] The option documentation (`docs/options`) has been updated with new or changed options.
- [ ] I have updated `CHANGELOG.md`.
- [ ] I have updated the documentation (if relevant).

48 changes: 48 additions & 0 deletions .github/workflows/tests.yml
@@ -0,0 +1,48 @@
name: Check Syntax and Run Tests

on:
  push:
    branches:
      - main

  pull_request:
    branches:
      - main

jobs:
  build:

    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: [3.6, 3.9]
        torch-version: [1.8.0, 1.9.0]

    steps:
      - uses: actions/checkout@v2
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v2
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install flake8
        run: |
          pip install flake8
      - name: Lint with flake8
        run: |
          flake8 . --count --show-source --statistics
      - name: Install dependencies
        env:
          TORCH: "${{ matrix.torch-version }}"
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          python -m pip install --upgrade pip
          pip install torch==${TORCH} -f https://download.pytorch.org/whl/cpu/torch_stable.html
          pip install .
      - name: Install pytest
        run: |
          pip install pytest
          pip install pytest-xdist[psutil]
      - name: Test with pytest
        run: |
          # See https://github.com/pytest-dev/pytest/issues/1075
          PYTHONHASHSEED=0 pytest -n auto --ignore=docs/ .
48 changes: 48 additions & 0 deletions .github/workflows/tests_develop.yml
@@ -0,0 +1,48 @@
name: Check Syntax and Run Tests

on:
  push:
    branches:
      - develop

  pull_request:
    branches:
      - develop

jobs:
  build:

    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: [3.9]
        torch-version: [1.9.0]

    steps:
      - uses: actions/checkout@v2
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v2
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install flake8
        run: |
          pip install flake8
      - name: Lint with flake8
        run: |
          flake8 . --count --show-source --statistics
      - name: Install dependencies
        env:
          TORCH: "${{ matrix.torch-version }}"
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          python -m pip install --upgrade pip
          pip install torch==${TORCH} -f https://download.pytorch.org/whl/cpu/torch_stable.html
          pip install .
      - name: Install pytest
        run: |
          pip install pytest
          pip install pytest-xdist[psutil]
      - name: Test with pytest
        run: |
          # See https://github.com/pytest-dev/pytest/issues/1075
          PYTHONHASHSEED=0 pytest -n auto --ignore=docs/ .
57 changes: 57 additions & 0 deletions CHANGELOG.md
@@ -8,6 +8,63 @@ Most recent change on the bottom.

## [Unreleased]

## [0.5.0] - 2021-11-24
### Changed
- Allow e3nn 0.4.*, which changes the default normalization of `TensorProduct`s; this change _should_ not affect typical NequIP networks
- Deployed models are now frozen on load, rather than at compile time

### Fixed
- `load_deployed_model` respects global JIT settings
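For illustration, loading a deployed model from Python looks like the following (a minimal sketch, assuming the `load_deployed_model` helper in `nequip.scripts.deploy` returns the TorchScript module together with its metadata dictionary; check the source if the signature differs at your version):

```python
from nequip.scripts.deploy import load_deployed_model

# As of 0.5.0, the model is frozen here at load time rather than at
# compile time, and global JIT settings are respected.
model, metadata = load_deployed_model("deployed.pth", device="cpu")
```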

## [0.4.0] - not released
### Added
- Support for `e3nn`'s `soft_one_hot_linspace` as radial bases
- Support for parallel dataloader workers with `dataloader_num_workers`
- Optionally independently configure validation and training datasets
- Save dataset parameters along with processed data
- Gradient clipping
- Arbitrary atom type support
- Unified, modular model building and initialization architecture
- Added `nequip-benchmark` script for benchmarking and profiling models
- Added `before` option to `SequentialGraphNetwork.insert`
- Normalize total energy loss by the number of atoms via `PerAtomLoss`
- Model builder to initialize training from previous checkpoint
- Better error when instantiation fails
- Rename `npz_keys` to `include_keys`
- Allow user to register `graph_fields`, `node_fields`, and `edge_fields` via YAML (see the sketch after this list)
- Deployed models save the e3nn and torch versions they were created with
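A minimal sketch of the field-registration API (the field names here are hypothetical, and the keyword arguments are our reading of `nequip.data`; verify against the module):

```python
from nequip.data import register_fields

# Register custom fields so NequIP knows how to batch them:
# per-node, per-edge, and per-graph quantities respectively.
register_fields(
    node_fields=["my_node_feature"],   # hypothetical names
    edge_fields=["my_edge_weight"],
    graph_fields=["my_global_label"],
)
```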

### Changed
- Updated `example.yaml` to use wandb by default, train for only 100 epochs, set a very large batch logging frequency, and rename `Validation_loss` to `validation_loss`
- Name processed datasets based on a hash of their parameters to ensure only valid cached data is used
- Do not use TensorFloat32 by default on Ampere GPUs until we understand it better
- No atomic numbers in networks
- `dataset_energy_std`/`dataset_energy_mean` to `dataset_total_energy_*`
- `nequip.dynamics` -> `nequip.ase`
- Updated `example.yaml` and `full.yaml` with better defaults, a new loss function, and toluene-ccsd(t) as the example data
- `use_sc` defaults to `True`
- `register_fields` is now in `nequip.data`
- Default total energy scaling is changed from global mode to per-species mode (see the sketch after this list).
- Renamed `trainable_global_rescale_scale` to `global_rescale_scale_trainable`
- Renamed `trainable_global_rescale_shift` to `global_rescale_shift_trainable`
- Renamed `PerSpeciesScaleShift_` to `per_species_rescale`
- Change default and allowed values of `metrics_key` from `loss` to `validation_loss`. The old default `loss` will no longer be accepted.
- Renamed `per_species_rescale_trainable` to `per_species_rescale_scales_trainable` and `per_species_rescale_shifts_trainable`
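To make the per-species convention concrete (a sketch based on our reading of the options above, not a normative definition): writing $\epsilon_i$ for the predicted atomic energy of atom $i$ with species $s_i$, the total energy is assembled as

$$E = \sum_i \left( \sigma_{s_i}\,\epsilon_i + \mu_{s_i} \right),$$

where the per-species scales $\sigma_s$ and shifts $\mu_s$ are the quantities controlled by the renamed `per_species_rescale_*` options.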

### Fixed
- The first ~20 epochs/calls of inference are no longer painfully slow due to recompilation
- Set global options like TF32, dtype in `nequip-evaluate`
- Avoid possible race condition in caching of processed datasets across multiple training runs
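For context, the TF32 behavior referenced above is controlled through standard PyTorch global flags; a sketch of the relevant switches (standard PyTorch API, shown only to illustrate what these global options toggle):

```python
import torch

# Disable TensorFloat32 on Ampere GPUs (NequIP's default per the entry above).
torch.backends.cuda.matmul.allow_tf32 = False
torch.backends.cudnn.allow_tf32 = False
```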

### Removed
- Removed `allowed_species`
- Removed `--update-config`; start a new training and load old state instead
- Removed dependency on `pytorch_geometric`
- `nequip-train` no longer prints the full config, which can be found in the training dir as `config.yaml`.
- `nequip.datasets.AspirinDataset` & `nequip.datasets.WaterDataset`
- Dependency on `pytorch_scatter`

## [0.3.3] - 2021-08-11
### Added
- `to_ase` method in `AtomicData.py` to convert `AtomicData` object to (list of) `ase.Atoms` object(s)
39 changes: 17 additions & 22 deletions README.md
@@ -13,17 +13,10 @@ NequIP is an open-source code for building E(3)-equivariant interatomic potentia
NequIP requires:

* Python >= 3.6
* PyTorch >= 1.8, <1.10 (PyTorch 1.10 support is in the works on `develop`.)
* PyTorch >= 1.8, < 1.10. PyTorch can be installed following the [instructions from their documentation](https://pytorch.org/get-started/locally/). Note that neither `torchvision` nor `torchaudio`, included in the default install command, are needed for NequIP. NequIP is not currently compatible with PyTorch 1.10; PyTorch 1.9 can be specified with `pytorch==1.9` in the install command.

To install:

* Install [PyTorch Geometric](https://github.com/rusty1s/pytorch_geometric), following [their installation instructions](https://pytorch-geometric.readthedocs.io/en/1.7.2/notes/installation.html) and making sure to install with the correct version of CUDA. Please note that `torch_geometric==1.7.2` is required.

* Install our fork of [`pytorch_ema`](https://github.com/Linux-cpp-lisp/pytorch_ema) for using an Exponential Moving Average on the weights:
```bash
$ pip install "git+https://github.com/Linux-cpp-lisp/pytorch_ema@context_manager#egg=torch_ema"
```

* We use [Weights&Biases](https://wandb.ai) to keep track of experiments. This is not a strict requirement — you can use our package without it — but it may make your life easier. If you want to use it, create an account [here](https://wandb.ai) and install the Python package:

```
@@ -40,14 +33,24 @@ pip install .

### Installation Issues

We recommend running the tests using `pytest`:
The easiest way to check if your installation is working is to train a toy model:
```bash
$ nequip-train configs/minimal.yaml
```

If you suspect something is wrong, encounter errors, or just want to confirm that everything is in working order, you can also run the unit tests:

```
pip install pytest
pytest ./tests/
pytest tests/unit/
```

While the tests are somewhat compute intensive, we've known them to hang on certain systems that have GPUs. If this happens to you, please report it along with information on your software environment in the [Issues](https://github.com/mir-group/nequip/issues)!
To run the full tests, including a set of longer/more intensive integration tests, run:
```
pytest tests/
```

Note: the integration tests have hung in the past on certain systems that have GPUs. If this happens to you, please report it along with information on your software environment in the [Issues](https://github.com/mir-group/nequip/issues)!

## Usage

@@ -64,7 +67,7 @@ $ nequip-train configs/example.yaml
A number of example configuration files are provided:
- [`configs/minimal.yaml`](configs/minimal.yaml): A minimal example of training a toy model on force data.
- [`configs/minimal_eng.yaml`](configs/minimal_eng.yaml): The same, but for a toy model that predicts and trains on only energy labels.
- [`configs/example.yaml`](configs/example.yaml): Training a more realistic model on forces and energies.
- [`configs/example.yaml`](configs/example.yaml): Training a more realistic model on forces and energies. Start here for real models.
- [`configs/full.yaml`](configs/full.yaml): A complete configuration file containing all available options along with documenting comments.

Training runs can be restarted using `nequip-restart`; training that starts fresh or restarts depending on the existence of the working directory can be launched using `nequip-requeue`. All `nequip-*` commands accept the `--help` option to show their call signatures and options.
@@ -87,14 +90,12 @@ The `nequip-deploy` command is used to deploy the result of a training session i
It compiles a NequIP model trained in Python to [TorchScript](https://pytorch.org/docs/stable/jit.html).
The result is an optimized model file that has no dependency on the `nequip` Python library, or even on Python itself:
```bash
nequip-deploy build path/to/training/session/ path/to/deployed.pth
nequip-deploy build path/to/training/session/ where/to/put/deployed_model.pth
```
For more details on this command, please run `nequip-deploy --help`.
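Because the deployed file is plain TorchScript, it can be loaded without `nequip` installed at all; a minimal sketch (the path is illustrative):

```python
import torch

# Load the deployed model with bare PyTorch; no nequip import is needed.
model = torch.jit.load("where/to/put/deployed_model.pth")
```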

### Using models in Python

Both deployed and undeployed models can be used in Python code; for details, see the end of the [Developer's tutorial](https://deepnote.com/project/2412ca93-7ad1-4458-972c-5d5add5a667e) mentioned again below.

An ASE calculator is also provided in `nequip.ase`.
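A hedged sketch of using it (this assumes `NequIPCalculator` and a `from_deployed_model` constructor in `nequip.ase`; the file names are illustrative, and the exact signature should be checked against the module at your version):

```python
from ase.io import read
from nequip.ase import NequIPCalculator  # assumed class name

atoms = read("structure.xyz")  # illustrative input structure
atoms.calc = NequIPCalculator.from_deployed_model(
    model_path="deployed.pth", device="cpu"
)
print(atoms.get_potential_energy())
```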

### LAMMPS Integration
@@ -113,18 +114,12 @@ The result is an optimized model file that has no Python dependency and can be u

```
pair_style nequip
pair_coeff * * deployed.pth
pair_coeff * * deployed.pth <NequIP type for LAMMPS type 1> <NequIP type for LAMMPS type 2> ...
```

For installation instructions, please see the [`pair_nequip` repository](https://github.com/mir-group/pair_nequip).


## Developer's tutorial

A more in-depth introduction to the internals of NequIP can be found in the [tutorial notebook](https://deepnote.com/project/2412ca93-7ad1-4458-972c-5d5add5a667e). This notebook discusses theoretical background as well as the Python interfaces that can be used to train and call models.

Please note that for most common use cases, including customized models, the `nequip-*` commands should be preferred for training models.

## References & citing

The theory behind NequIP is described in our preprint (1). NequIP's backend builds on e3nn, a general framework for building E(3)-equivariant neural networks (2). If you use this repository in your work, please consider citing NequIP (1) and e3nn (3):