STNS01 20 Release BEETaskManager

Tim Randles edited this page Mar 5, 2020 · 2 revisions

Activity Description

Develop BEETaskManager.

The BEETaskManager daemon runs on the HPC cluster login node. It accepts tasks from the BEEWorkflowManager, turns each task into an HPC resource manager job (e.g. a Slurm job script), and submits the job to the cluster resource manager. The BEETaskManager then tracks the status of the job (pending, running, complete) and reports it to the BEEWorkflowManager. The BEETaskManager will also cancel a queued or running job when commanded to do so by the BEEWorkflowManager. The first release of the BEETaskManager will support the Slurm resource manager and the Charliecloud Linux container runtime.
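
The status tracking described above could be sketched as a small translation layer between Slurm job state names and the three statuses reported to the BEEWorkflowManager. This is only an illustration: the names `JobStatus` and `to_status` are hypothetical, and the mapping itself is an assumption, not the BEETaskManager's actual behavior. Failure states such as `FAILED` or `CANCELLED` are left unmapped because the description above does not say how they are reported.

```python
from enum import Enum


class JobStatus(Enum):
    """The three job statuses named in the activity description."""
    PENDING = "pending"
    RUNNING = "running"
    COMPLETE = "complete"


# Assumed mapping from Slurm job state names to the reported status.
# Slurm has more states than these; anything not listed is rejected.
_SLURM_TO_STATUS = {
    "PENDING": JobStatus.PENDING,
    "RUNNING": JobStatus.RUNNING,
    "COMPLETING": JobStatus.RUNNING,   # job is still finishing up
    "COMPLETED": JobStatus.COMPLETE,
}


def to_status(slurm_state: str) -> JobStatus:
    """Translate a Slurm state string (e.g. from squeue) into a JobStatus."""
    key = slurm_state.strip().upper()
    if key not in _SLURM_TO_STATUS:
        raise ValueError(f"unmapped Slurm state: {slurm_state!r}")
    return _SLURM_TO_STATUS[key]
```

Raising on an unmapped state, rather than guessing, keeps the sketch honest about what the description leaves unspecified.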

Activity Completion Criteria

This milestone will be complete when the BEETaskManager can successfully perform the following functions on a production Slurm HPC cluster at LANL:

  1. Accept a task from the BEEWorkflowManager
  2. Format the accepted task as a Slurm job script
  3. Use the Charliecloud Linux container runtime to execute the task in the Slurm job
  4. Submit the Slurm job to the HPC cluster
  5. Report back to the BEEWorkflowManager the status of the submitted job
  6. Cancel a submitted but not yet completed job when commanded to do so by the BEEWorkflowManager
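
As a rough illustration of criteria 2, 3, 4, and 6, the sketch below formats a task as a Slurm job script that runs the task through Charliecloud's `ch-run`, then submits or cancels it with the standard `sbatch` and `scancel` commands. The `Task` fields and all function names are hypothetical; the actual task format exchanged with the BEEWorkflowManager is not specified here, and the script assumes the Charliecloud image has already been unpacked to a directory visible to the compute nodes.

```python
import shlex
import subprocess
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    """Hypothetical task record; the real BEE task schema may differ."""
    name: str
    image: str              # path to an unpacked Charliecloud image directory
    command: List[str]      # command to run inside the container
    nodes: int = 1
    time_limit: str = "00:10:00"


def render_job_script(task: Task) -> str:
    """Format a task as a Slurm batch script that runs it via ch-run."""
    container_cmd = " ".join(shlex.quote(arg) for arg in task.command)
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={task.name}",
        f"#SBATCH --nodes={task.nodes}",
        f"#SBATCH --time={task.time_limit}",
        "",
        # ch-run usage: ch-run IMAGE -- CMD [ARG...]
        f"ch-run {shlex.quote(task.image)} -- {container_cmd}",
    ]
    return "\n".join(lines) + "\n"


def parse_job_id(sbatch_output: str) -> str:
    """Extract the job ID from `sbatch --parsable` output (jobid[;cluster])."""
    return sbatch_output.strip().split(";")[0]


def submit(script: str) -> str:
    """Submit a job script to Slurm on stdin and return its job ID."""
    result = subprocess.run(
        ["sbatch", "--parsable"],
        input=script, capture_output=True, text=True, check=True,
    )
    return parse_job_id(result.stdout)


def cancel(job_id: str) -> None:
    """Cancel a queued or running Slurm job."""
    subprocess.run(["scancel", job_id], check=True)
```

On a login node with Slurm available, usage would look like `job_id = submit(render_job_script(task))` followed, if requested, by `cancel(job_id)`. Keeping script rendering separate from submission means the generated script can be inspected or tested without access to a cluster.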

Activity Due Date

March 31, 2020