Scripts to run adaptive sampling on Folding@home
To run an adaptive sampling project on Folding@home:
- Set up your project in the usual way (note the systems MUST be the same for adaptive sampling, note tautomers have to be the same in particular!)
- Additionally, in the project folder: create a
SEEDS
folder and copy in thenext_seed.py
script, changing the path in it. Addpython $home/next_state.py $results
to the<next-gen-command>
section of theproject.xml
as the first line (beforemv -f $results/checkpointState.xml.bz2 $jobdir/state.xml.bz2
). - The above have created a project where if any new seeds are present in the
SEEDS
folder, each new gen will randomly draw one of them instead of continuing the trajectory. IfSEEDS
is empty, vanilla (i.e. continue to elongate the trajectories) mode of F@h operation automatically resumes (i.e. this framework is 'very' asynchronous, production can always continue now matter how fast and if seeds are being produced). - After some initial data has been collectd, continuously run one of the adaptive sampling scripts, using multiple threads (change number in the script, also change two paths in the main()). The folder in which the script is run must contain:
- system.xml (copied from one of the RUNs, and bz2-decompressed, remember all systems must be the same!)
- integrator.xml (as above)
SEEDS
folder where new seed state.xml's will be written and copied to theSEEDS
folder in the F@h project folderseeds_pdb
folder where pdbs of the new seeds will be written for visual inspection if desiredtemp
folder which is used as temporary copying location for the features.npy due to needed sudo for file writing in the F@h data folder
The script will output (in addition to the seeds as described above) on every iteration:
current_dtrajs.npy
- discrete trajectories from k-means clustering of the tICA projectioncurrent_trajs_glob.npy
- list of trajectory paths used in the last iteration, in the same order as the discrete trajectoriescurrent_goal.txt
- if using exploit-including methods, the furthest along the sought direction of the reaction coordinate the simulation has progressed
Warnings and TODO:
- This setup has only been verified to speed up a slow event (DFG flip in a kinase) using very exploit-heavy settings. The default FAST settings as set now have NOT been verified that way, this should be an urgent research item to complete, perhaps on the DFG flip or protein folding.
- The testing was also done using a small (50) number of GPUs, same number as number of seeds. Scaling up to the full F@h will require more testing, there is a potential problem that RPW is thinking about now with velocity randomization --- as written currently if there are more GPUs then seeds being produced, multiple trajectories with same seed and same starting velocities will be wastefully produced. Until this is resolved, use only in internal and BETA modes where only a few tens of GPUs will be available, and adjust the number of seeds accordingly.