NAGL2 training optimizations part 1 #409

Merged (6 commits), Nov 21, 2024
3 changes: 0 additions & 3 deletions .gitattributes
@@ -88,6 +88,3 @@
*.zst filter=lfs diff=lfs merge=lfs -text
*.bz filter=lfs diff=lfs merge=lfs -text
*bz2 filter=lfs diff=lfs merge=lfs -text
/mnt/storage/nobackup/nca121/qca-dataset-submission/submissions/2024-10-11-MLPepper-RECAP-Optimized-Fragments-Add-Iodines-v1.0/esp_50k_I_singlepoint_dataset.json.bz2 filter=lfs diff=lfs merge=lfs -text
/mnt/storage/nobackup/nca121/qca-dataset-submission/submissions/2024-10-11-MLPepper-RECAP-Optimized-Fragments-Add-Iodines-v1.0/dataset.pdf filter=lfs diff=lfs merge=lfs -text
/mnt/storage/nobackup/nca121/qca-dataset-submission/submissions/2024-10-11-MLPepper-RECAP-Optimized-Fragments-Add-Iodines-v1.0/iodine_filtered.json.bz2 filter=lfs diff=lfs merge=lfs -text
1 change: 1 addition & 0 deletions README.md
@@ -295,6 +295,7 @@ These are currently used to find a minimum energy conformation of a molecule.
| `OpenFF Sulfur Optimization Training Coverage Supplement v1.0` | [2024-09-11-OpenFF-Sulfur-Optimization-Training-Coverage-Supplement-v1.0](https://github.com/openforcefield/qca-dataset-submission/tree/master/submissions/2024-09-11-OpenFF-Sulfur-Optimization-Training-Coverage-Supplement-v1.0) | Additional optimization training data for Sage sulfur and phosphorus parameters | C, S, F, O, H, Cl, Br, P, N | |
| `OpenFF Sulfur Optimization Benchmarking Coverage Supplement v1.0` | [2024-09-18-OpenFF-Sulfur-Optimization-Benchmarking-Coverage-Supplement-v1.0](https://github.com/openforcefield/qca-dataset-submission/tree/master/submissions/2024-09-18-OpenFF-Sulfur-Optimization-Benchmarking-Coverage-Supplement-v1.0) | Additional optimization benchmarking data for Sage sulfur and phosphorus parameters | S, P, Cl, C, N, O, H, Br, F | |
| `OpenFF Lipid Optimization Training Supplement v1.0` | [2024-10-08-OpenFF-Lipid-Optimization-Training-Supplement-v1.0](https://github.com/openforcefield/qca-dataset-submission/tree/master/submissions/2024-10-08-OpenFF-Lipid-Optimization-Training-Supplement-v1.0) | Additional optimization training data for Sage from representative LIPID MAPS fragments | I, Br, O, H, P, C, N, Cl, F, S | |
| `OpenFF NAGL2 Training Optimization Dataset Part 1 v4.0` | [2024-11-19-OpenFF-NAGL2-Training-Optimization-Dataset-Part-1-v4.0](https://github.com/openforcefield/qca-dataset-submission/tree/master/submissions/2024-11-19-OpenFF-NAGL2-Training-Optimization-Dataset-Part-1-v4.0) | Optimization dataset for NAGL2 training, part 1 | Cl, O, C, P, I, Br, B, S, N, F, H, Si | |

# TorsionDrive Datasets
These are currently used to perform a complete rotation of one or more selected bonds, where optimizations are performed over a discrete set of angles.
@@ -0,0 +1,58 @@
# OpenFF NAGL2 Training Optimization Dataset Part 1 v4.0

## Description
A dataset containing molecules from the [`MLPepper RECAP Optimized Fragments v1.0`](https://github.com/openforcefield/qca-dataset-submission/tree/master/submissions/2024-07-26-MLPepper-RECAP-Optimized-Fragments-v1.0)
and [`MLPepper RECAP Optimized Fragments v1.0 Add Iodines`](https://github.com/openforcefield/qca-dataset-submission/tree/master/submissions/2024-10-11-MLPepper-RECAP-Optimized-Fragments-Add-Iodines-v1.0) datasets,
with newly generated conformers optimized at the OpenFF default level of theory (B3LYP-D3BJ/DZVP).
The dataset is intended for calculating single-point energies and properties,
which will then be used to train our second-generation graph neural network charge model (NAGL2).
This is part 1, covering molecules with a molecular weight below 300 Da.


For each molecule, a set of up to 5 conformers was generated by:

* generating a set of up to 1000 conformers with an RMS cutoff of 0.1 Å
using the OpenEye backend of the OpenFF toolkit

* applying ELF conformer selection (max 5 conformers) using OpenEye
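
The two steps above can be sketched with the OpenFF Toolkit roughly as follows. This is a minimal sketch, not the code from the notebook: the function name and constants are hypothetical, it assumes a licensed OpenEye installation, and it builds the molecule from a plain SMILES string rather than from the CMILES used in the notebook.

```python
# Sketch only: the constants mirror the numbers quoted above; all names are hypothetical.
MAX_INITIAL_CONFORMERS = 1000  # size of the initial conformer pool
RMS_CUTOFF_ANGSTROM = 0.1      # RMS cutoff used to prune near-duplicate conformers
MAX_ELF_CONFORMERS = 5         # conformers kept after ELF selection


def generate_elf_conformers(smiles: str):
    """Return an OpenFF Molecule carrying up to MAX_ELF_CONFORMERS conformers."""
    # Imported lazily so this module can be loaded without the toolkits installed.
    from openff.toolkit import Molecule
    from openff.toolkit.utils import OpenEyeToolkitWrapper
    from openff.units import unit

    openeye = OpenEyeToolkitWrapper()  # assumes OpenEye is installed and licensed
    molecule = Molecule.from_smiles(smiles, allow_undefined_stereo=True)

    # Step 1: generate the initial pool of conformers with the OpenEye backend.
    molecule.generate_conformers(
        n_conformers=MAX_INITIAL_CONFORMERS,
        rms_cutoff=RMS_CUTOFF_ANGSTROM * unit.angstrom,
        toolkit_registry=openeye,
    )

    # Step 2: ELF conformer selection, keeping at most MAX_ELF_CONFORMERS.
    molecule.apply_elf_conformer_selection(
        limit=MAX_ELF_CONFORMERS,
        toolkit_registry=openeye,
    )
    return molecule
```

Calling `generate_elf_conformers("CCO")` would return a molecule whose `n_conformers` is at most 5. ELF selection can keep fewer conformers than the limit, which is consistent with the per-molecule minimum of 1 reported below.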


## General information
* Date: 2024-11-19
* Class: OpenFF Optimization Dataset
* Purpose: Conformer optimization
* Name: OpenFF NAGL2 Training Optimization Dataset Part 1 v4.0
* Number of unique molecules: 55134
* Number of conformers: 131198
* Number of conformers per molecule (min, mean, max): 1.00, 2.38, 5.00
* Molecular weight (min, mean, max): 32.12, 158.53, 299.97
* Charges: -4.0 -3.0 -2.0 -1.0 0.0 1.0 2.0 3.0
* Dataset submitter: Alexandra McIsaac
* Dataset generator: Alexandra McIsaac
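
As a quick consistency check on the figures above, the reported mean number of conformers should equal the total conformer count divided by the number of unique molecules. The variable names below are illustrative only.

```python
# Counts copied from the "General information" list above.
n_molecules = 55134
n_conformers = 131198
min_per_molecule, max_per_molecule = 1, 5

# The mean number of conformers per molecule is total conformers / molecules.
mean_per_molecule = n_conformers / n_molecules
print(f"mean conformers per molecule: {mean_per_molecule:.2f}")

# The total must also lie between the per-molecule minimum and maximum bounds.
assert min_per_molecule * n_molecules <= n_conformers <= max_per_molecule * n_molecules
```

The printed value rounds to 2.38, matching the reported mean.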

## QCSubmit generation pipeline
* `generate-dataset-part1.ipynb` was used to generate conformers from CMILES and create the dataset.

## QCSubmit Manifest
* `dataset_part1.json.bz2`: compressed dataset ready for submission
* `dataset_part1.pdf`: Visualization of dataset molecules
* `dataset_part1.smi`: SMILES strings for dataset molecules
* `generate-dataset-part1.ipynb`: Notebook describing dataset generation and submission
* `input-environment.yaml`: Environment file used to create Python environment for the notebook
* `input-environment-full.yaml`: Fully-resolved environment used to execute the notebook
* `mlpepper.json.bz2`: bzip2-compressed copy of the MLPepper dataset needed to generate conformers
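
The `.json.bz2` files in the manifest are bzip2-compressed JSON, so they can be inspected with the Python standard library alone. The sketch below uses a hypothetical helper, `load_bz2_json`, and demonstrates it on a small stand-in payload. It makes no assumption about the schema of the real dataset files.

```python
import bz2
import json
import tempfile
from pathlib import Path


def load_bz2_json(path):
    """Decompress a .json.bz2 file and parse its contents as JSON."""
    with bz2.open(path, mode="rt", encoding="utf-8") as handle:
        return json.load(handle)


# Round-trip demonstration with a stand-in payload (NOT the real dataset schema).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.json.bz2"
    with bz2.open(demo_path, mode="wt", encoding="utf-8") as handle:
        json.dump({"dataset_name": "demo", "n_entries": 2}, handle)
    demo = load_bz2_json(demo_path)

print(demo)
```

Because `*bz2` files are tracked with Git LFS in this repository, a fresh clone may contain only LFS pointer files until the LFS objects have been fetched. A pointer file will fail to decompress.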

## Metadata
* Elements: {Cl, O, C, P, I, Br, B, S, N, F, H, Si}
* Spec: default
  * basis: DZVP
  * implicit_solvent: None
  * keywords: {}
  * maxiter: 200
  * method: B3LYP-D3BJ
  * program: psi4
  * SCF properties:
    * dipole
    * quadrupole
    * wiberg_lowdin_indices
    * mayer_indices
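
For scripting against this metadata, the `default` spec can be mirrored as a plain Python mapping. This is only an illustrative sketch: the key names are assumptions chosen to match the list above and are not QCSubmit's own schema.

```python
# Plain-dict mirror of the "default" spec listed above (key names are hypothetical).
default_spec = {
    "spec_name": "default",
    "method": "B3LYP-D3BJ",
    "basis": "DZVP",
    "program": "psi4",
    "implicit_solvent": None,
    "keywords": {},
    "maxiter": 200,
    "scf_properties": [
        "dipole",
        "quadrupole",
        "wiberg_lowdin_indices",
        "mayer_indices",
    ],
}

# The method/basis pair should match the level of theory quoted in the description.
level_of_theory = f"{default_spec['method']}/{default_spec['basis']}"
print(level_of_theory)
```
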
Git LFS file not shown
Binary file not shown.