In this repository, you will find the Common Workflow Scheduler for Kubernetes proposed in the paper "How Workflow Engines Should Talk to Resource Managers: A Proposal for a Common Workflow Scheduling Interface."
```bash
docker build -t cws .
docker tag cws <your docker account>/cws:<version>
docker push <your docker account>/cws:<version>
```
No. | Resource | Method |
---|---|---|
1 | /{version}/scheduler/{execution} | POST |
2 | /{version}/scheduler/{execution} | DELETE |
3 | /{version}/scheduler/{execution}/DAG/vertices | POST |
4 | /{version}/scheduler/{execution}/DAG/vertices | DELETE |
5 | /{version}/scheduler/{execution}/DAG/edges | POST |
6 | /{version}/scheduler/{execution}/DAG/edges | DELETE |
7 | /{version}/scheduler/{execution}/startBatch | PUT |
8 | /{version}/scheduler/{execution}/endBatch | PUT |
9 | /{version}/scheduler/{execution}/task/{id} | POST |
10 | /{version}/scheduler/{execution}/task/{id} | GET |
11 | /{version}/scheduler/{execution}/task/{id} | DELETE |
12 | /{version}/scheduler/{execution}/metrics/task/{id} | POST |
Swagger UI: `/swagger-ui.html`
API docs: `/v3/api-docs`
For more details, we refer to the paper.
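For illustration, here is a minimal sketch of calling the interface with `curl`. The host, the port (8080, Spring Boot's default), the API version `v1`, the execution name, and the empty JSON body are assumptions; consult the Swagger UI for the actual request schema.

```bash
# Illustrative only: host, port, version, and body are assumptions.
# Register an execution with the scheduler (resource no. 1).
curl -X POST "http://workflow-scheduler:8080/v1/scheduler/my-execution" \
     -H "Content-Type: application/json" \
     -d '{}'

# Unregister the execution once the workflow has finished (resource no. 2).
curl -X DELETE "http://workflow-scheduler:8080/v1/scheduler/my-execution"
```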
You need to create a service account to use the Common Workflow Scheduler in Kubernetes. To do so, create a file `account.yaml` with the following content. Afterward, apply it to your Kubernetes cluster: `kubectl apply -f account.yaml`
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cwsaccount
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: cwsrole
rules:
  - apiGroups: [""]
    resources: ["pods", "pods/status", "pods/exec", "nodes", "bindings", "configmaps"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  - apiGroups: ["metrics.k8s.io"]
    resources: ["nodes"]
    verbs: ["get", "list"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: cwsbinding
subjects:
  - kind: ServiceAccount
    name: cwsaccount
    namespace: <your namespace>
roleRef:
  kind: ClusterRole
  name: cwsrole
  apiGroup: rbac.authorization.k8s.io
```
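To verify the setup, you can check whether the service account is allowed to manage pods (fill in your namespace):

```bash
# Should print "yes" if the ClusterRoleBinding is effective.
kubectl auth can-i create pods \
  --as=system:serviceaccount:<your namespace>:cwsaccount
```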
Next, you can start the Common Workflow Scheduler in your Kubernetes environment. (If you use the nf-cws plugin, Nextflow will start the Common Workflow Scheduler automatically.) To start it manually, create a file `cws.yaml` with the following content. Afterward, apply it to your Kubernetes cluster: `kubectl apply -f cws.yaml`
```yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    app: cws
    component: scheduler
    tier: control-plane
  name: workflow-scheduler
spec:
  containers:
    - env:
        - name: SCHEDULER_NAME
          value: workflow-scheduler
        - name: AUTOCLOSE
          value: "false"
      image: commonworkflowscheduler/kubernetesscheduler:v1.0
      imagePullPolicy: Always
      name: workflow-scheduler
      resources:
        limits:
          cpu: "2"
          memory: 1400Mi
        requests:
          cpu: "2"
          memory: 1400Mi
      volumeMounts:
        - mountPath: /input # mount at the same path as you do in your workflow
          name: vol-1
        - mountPath: /data
          name: vol-2
  securityContext:
    runAsUser: 0
  serviceAccount: cwsaccount # use the account created before
  volumes:
    - name: vol-1
      persistentVolumeClaim:
        claimName: api-exp-input # mount the same PVC as you use in your workflow
    - name: vol-2
      persistentVolumeClaim:
        claimName: api-exp-data
```
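Afterward, you can check that the scheduler pod is running and inspect its logs:

```bash
kubectl get pod workflow-scheduler
kubectl logs -f workflow-scheduler
```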
This is a Spring Boot application that can be run with profiles. The `default` profile is used if no configuration is set. The `dev` profile can be enabled by setting the JVM system parameter

```bash
-Dspring.profiles.active=dev
```

or the environment variable

```bash
export spring_profiles_active=dev
```

or via the corresponding setting in your development environment or within the pod definition (see the sketch below).

Example:

```bash
SCHEDULER_NAME=workflow-scheduler java -Dspring.profiles.active=dev -jar cws-k8s-scheduler-1.2.jar
```

The `dev` profile is useful for debugging and reporting problems because it increases the log level.
- The Kubernetes feature gate InPlacePodVerticalScaling must be enabled; it is available starting from Kubernetes v1.27 (a sketch for enabling it on a local test cluster follows below). See KEP 1287 for the current status.
- Supported if used together with nf-cws version 1.0.5 or newer.
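As an example, the feature gate could be enabled on a local test cluster as follows; the minikube flags and the Kubernetes version are assumptions for illustration, and managed clusters require enabling the gate on the control plane instead:

```bash
# Sketch: local test cluster with the required feature gate enabled.
minikube start --kubernetes-version=v1.27.3 \
  --feature-gates=InPlacePodVerticalScaling=true
```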
The memory predictor to use for task scaling is set via the configuration; if it is not set, task scaling is disabled. The memory predictor is provided as a string following the pattern `<memory predictor>[-<additional>=<parameter>]`, e.g., `linear-offset=std` (example strings follow the offset table below). The following strategies are available:
Memory Predictor | Behaviour |
---|---|
linear/lr | The Linear predictor tries to predict a memory usage that is linear in the task's input size. |
linear2/lr2 | The Linear predictor with an unequal loss function; the loss penalizes underprediction more than overprediction. |
mean | The Mean predictor predicts the mean memory usage seen so far. The prediction is independent of the input size. |
ponder | The Ponder predictor is an advanced memory prediction strategy that ponders between linear regression with unequal loss and historic values. Details are provided in our paper: https://arxiv.org/pdf/2408.00047.pdf |
constX | Predicts a constant value X; if no X is given, it predicts 0. |
polyX | The prediction is based on a polynomial of degree X of the task's input size. If no X is provided, X=2 is used. |
To determine the offset, the current prediction model predicts the memory for all finished tasks; the offset is then derived from the differences between the observed and the predicted memory.
Offset | Behaviour |
---|---|
none | No additional offset is applied. |
"" | If no offset is defined, the max offset is used. |
max | The max offset uses the largest underprediction observed. |
Xpercentile | X is an integer between 1 and 100; the Xth percentile over all prediction differences is used as the offset. |
var | Applies the variance as the offset. |
Xstd | Applies X times the standard deviation as the offset. If no X is provided, X=1 is used. |
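Some example predictor strings, combining the options above (the concrete values are illustrative):

```
linear                      # linear regression; without an explicit offset, max is used
linear-offset=none          # linear regression without an additional offset
mean-offset=std             # mean predictor plus one standard deviation
poly3-offset=95percentile   # cubic fit plus the 95th percentile of the differences
const2048                   # always predict the constant value 2048
```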
The scheduling strategy can be set via the configuration. It is provided as a string following the pattern `<scheduling strategy>[-<node assignment strategy>]` (example strings follow the tables below). The following strategies are available:
Scheduling Strategy | Behaviour |
---|---|
fifo | Tasks that have been submitted earlier will be scheduled earlier. |
rank | Tasks are prioritized based on their rank in the DAG. |
rank_min | Rank (min): same as rank, but breaks ties such that tasks with smaller input size are preferred. |
rank_max | Rank (max): same as rank, but breaks ties such that tasks with larger input size are preferred. |
lff_min | Least finished first (min): prioritizes abstract tasks with fewer finished instances; breaks ties using rank_min. |
lff_max | Least finished first (max): prioritizes abstract tasks with fewer finished instances; breaks ties using rank_max. |
gs_min | Generate Samples (min): hybrid of LFF (min) and Rank (max); prioritizes abstract tasks with fewer than five finished instances, afterwards uses Rank (max). |
gs_max | Generate Samples (max): hybrid of LFF (max) and Rank (max); prioritizes abstract tasks with fewer than five finished instances, afterwards uses Rank (max). |
random | Randomly prioritizes tasks. |
max | Prioritizes tasks with larger input size. |
min | Prioritizes tasks with smaller input size. |
Node Assignment Strategy | Behaviour |
---|---|
random | Randomly distributes the tasks to the nodes. |
roundrobin | (default) Assigns tasks to the nodes in a round-robin fashion. |
fair | Distributes the tasks fairly across the nodes, trying to achieve an equal load on all machines. |
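Some example strategy strings, following the pattern above:

```
fifo              # FIFO scheduling with the default (round robin) node assignment
rank_max-fair     # rank (max) scheduling with fair node assignment
lff_min-random    # least finished first (min) with random node assignment
```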
If you use this software or artifacts in a publication, please cite it as:

Fabian Lehmann, Jonathan Bader, Friedrich Tschirpke, Lauritz Thamsen, and Ulf Leser. "How Workflow Engines Should Talk to Resource Managers: A Proposal for a Common Workflow Scheduling Interface." In 2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing (CCGrid), Bangalore, India, 2023.
```bibtex
@inproceedings{lehmannHowWorkflowEngines2023,
  author    = {Lehmann, Fabian and Bader, Jonathan and Tschirpke, Friedrich and Thamsen, Lauritz and Leser, Ulf},
  booktitle = {2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing (CCGrid)},
  title     = {How Workflow Engines Should Talk to Resource Managers: A Proposal for a Common Workflow Scheduling Interface},
  year      = {2023},
  address   = {{Bangalore, India}},
  doi       = {10.1109/CCGrid57682.2023.00025}
}
```
This work was funded by the German Research Foundation (DFG), CRC 1404: "FONDA: Foundations of Workflows for Large-Scale Scientific Data Analysis."