- Set up the project website
- Support node-level SKUs as properties, e.g. CPU, GPU, memory
  - The application admin can choose clusters that have nodes with H100 GPUs.
  - The application admin can choose clusters that have nodes with 128 GB of memory.
- Support network topology
  - The application admin can choose clusters that provide InfiniBand or a 100 Gbps network.
- Support independent scheduling policies for namespaced resources
  - For example, the application admin can place one workload in a namespace on cluster A and another workload in the same namespace on cluster B.
- De-scheduler for the fleet
  - The de-scheduler would move a workload to a better-suited cluster when its current cluster is no longer the best fit.
- Cordon a cluster
  - The fleet admin can cordon a cluster to move all workloads off that cluster.
- Rebalance the workload
  - The application admin can rebalance a workload so that it is spread evenly across the clusters.
- Support affinity/anti-affinity for workloads
  - The application admin can specify that workload A must be placed on the same clusters where workload B runs.
  - The application admin can specify that workload A must not be placed on the same clusters where workload B runs.
- Support user-specified health checks for workloads
  - The application admin can provide a customized health check for their workload.
- The application admin can specify a spread mode for their workload.
- Moves between clusters would follow the max-unavailable/min-available pods rule.
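
To make the property-based and affinity-based selection items above more concrete, here is a minimal Python sketch of how such cluster filtering might behave. It is illustrative only and is not the project's API: the `Cluster` type, its fields, and the `select_clusters` function are all hypothetical names introduced for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical view of a member cluster; not a real project type."""
    name: str
    gpu_models: set[str] = field(default_factory=set)  # GPU models present on any node
    max_node_memory_gb: int = 0                        # largest node memory in the cluster
    workloads: set[str] = field(default_factory=set)   # workloads already placed here


def select_clusters(
    clusters: list[Cluster],
    gpu_model: str | None = None,
    min_node_memory_gb: int = 0,
    affinity_to: str | None = None,
    anti_affinity_to: str | None = None,
) -> list[str]:
    """Return the names of clusters that satisfy every requested constraint."""
    selected = []
    for cluster in clusters:
        # Node-level SKU properties: GPU model and node memory.
        if gpu_model is not None and gpu_model not in cluster.gpu_models:
            continue
        if cluster.max_node_memory_gb < min_node_memory_gb:
            continue
        # Affinity: only clusters that already run the named workload.
        if affinity_to is not None and affinity_to not in cluster.workloads:
            continue
        # Anti-affinity: skip clusters that already run the named workload.
        if anti_affinity_to is not None and anti_affinity_to in cluster.workloads:
            continue
        selected.append(cluster.name)
    return selected


# Hypothetical fleet used only to exercise the sketch.
fleet = [
    Cluster("cluster-a", {"H100"}, 128, {"workload-b"}),
    Cluster("cluster-b", {"A100"}, 256, set()),
    Cluster("cluster-c", {"H100"}, 64, set()),
]

print(select_clusters(fleet, gpu_model="H100"))
print(select_clusters(fleet, min_node_memory_gb=128))
print(select_clusters(fleet, anti_affinity_to="workload-b"))
```

All constraints are combined with a logical AND, so a cluster must pass every requested filter to be eligible. A real implementation would read these properties from the clusters themselves rather than from hard-coded values.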