Commit

Shortly describe projects in related work
timebertt committed Feb 14, 2024
1 parent c3df4c9 commit efcaf63
Showing 6 changed files with 39 additions and 14 deletions.
1 change: 1 addition & 0 deletions .vscode/cspell.json
@@ -41,6 +41,7 @@
"Kubeflow",
"kubelet",
"kubernetes",
"Kustomize",
"Metacontroller",
"newpage",
"pandoc",
2 changes: 1 addition & 1 deletion content/02-abstract.md
@@ -12,7 +12,7 @@ This thesis bridges this gap by proposing an approach to achieve horizontal scal
The design builds upon proven mechanisms from distributed databases to distribute the responsibility for API objects across a ring of controller instances, removing the scalability limitations inherent in traditional leader election setups.
Key features include dynamic membership and failure detection for automatic failovers and rebalancing, a consistent hashing algorithm for ensuring a balanced distribution of API objects, label-based coordination for transparent object assignments without client interaction, and a dedicated handover protocol for preventing concurrent reconciliations.

This thesis presents a reusable implementation that allows for easy integration of the mechanism into arbitrary controllers, opening the potential for adoption and collaboration within the open-source community.
This thesis presents a reusable implementation that allows for easy integration of the mechanism into arbitrary controllers, including built-in controllers, opening the potential for adoption and collaboration within the open-source community.
Systematic evaluation using load test experiments demonstrates that all identified requirements are met.
The mechanism showcases minimal overhead compared to singleton controller setups and an almost linear increase of the controller's load capacity with every added controller instance.
This work contributes to advancing the scalability and efficiency of Kubernetes controllers, offering promising prospects for the future development and usage of Kubernetes and controller-based applications and platforms.
4 changes: 2 additions & 2 deletions content/10-motivation.md
@@ -19,7 +19,7 @@ The Kubernetes community has extensively picked up the operator pattern, and man
- streaming & messaging: strimzi-kafka-operator [@strimzi], Koperator [@koperator]
- storage and backup: Rook [@rook], Velero [@velero]
- machine learning: Kubeflow [@kubeflow]
- networking: Knative [@knative], Istio [@istio]
- serverless and service mesh: Knative [@knative], Istio [@istio]
- infrastructure and application management: Crossplane [@crossplane], Argo CD [@argocd], Flux [@flux], KubeVela [@kubevela]
- cluster management: Gardener [@gardenerdocs], Cluster API [@clusterapi]
- cloud infrastructure: Yaook [@yaook], IronCore [@ironcore]
@@ -47,7 +47,7 @@ It limits the maximum number of objects and the maximum object churn rate to the
[@bondi2000characteristics]

To address the demand for facilitating large-scale deployments, several of the mentioned open-source projects feature sharding mechanisms that distribute reconciliation work across multiple controller instances [@argocddocs; @kubevela].
However, the mechanisms are specific to the individual projects and cannot be reused in other controllers.
However, the mechanisms are specific to the individual projects and cannot be reused in other custom controllers or Kubernetes core controllers.
Many of these sharding implementations still need to be fully matured and face similar challenges, e.g., the mechanism requires clients to be sharding-aware and manually assign API objects to shards, or the implementation does not facilitate automatic failover and rebalancing [@flux].
Furthermore, many other projects also consider sharding mechanisms for achieving higher scalability[^sharding-issues].
The problem is that no standard design or implementation exists that can be applied to arbitrary controllers for scaling them horizontally.
24 changes: 16 additions & 8 deletions content/30-related-work.md
@@ -78,12 +78,15 @@ The assignments per object also require many sharder reconciliations and API req

![Study project memory usage by pod [@studyproject]](../assets/study-project-memory.pdf){#fig:study-project-memory}

## knative
## Knative

In knative [@knative], controllers also use leader election but not for global locking[^knative-issue].
Knative is a platform for running serverless and event-driven workloads on Kubernetes.
Users package their applications as container images and deploy them without managing infrastructure, networking, autoscaling, revision tracking, and other cross-cutting concerns [@knative].

In Knative, controllers also use leader election but not for global locking[^knative-issue].
Instead, the controllers perform leader election per reconciler[^reconciler] and bucket.
When running multiple instances of the controllers, each instance might acquire a subset of all leases and run only the corresponding reconcilers.
Some of knative's reconcilers are leader-aware and run in all instances but behave differently according to the leadership status.
Some of Knative's reconcilers are leader-aware and run in all instances but behave differently according to the leadership status.
For example, the webhook components also use reconcilers for building up indices.
The reconcilers also run in non-leader instances but only perform writing actions in the leader instance.
Additionally, the keys of all objects are split into a configurable number of buckets.
@@ -92,7 +95,7 @@ Before reconciling an object, the reconciler checks if its instance is responsib
Only if it is responsible can it continue with the usual reconciliation.
[@mooresharding]
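
The bucket check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Knative's actual implementation: the function names `bucket_for`, `is_responsible`, and `reconcile` are hypothetical, CRC32 stands in for whatever hash function the real system uses, and the set of held buckets stands in for the `Lease` objects an instance has acquired.

```python
from __future__ import annotations

import zlib


def bucket_for(key: str, num_buckets: int) -> int:
    """Map an object key (e.g. "namespace/name") to a bucket index.

    Assumption: CRC32 is only a stand-in for the real hash function.
    """
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def is_responsible(key: str, num_buckets: int, held_buckets: set[int]) -> bool:
    """Return True if this instance holds the lease for the key's bucket."""
    return bucket_for(key, num_buckets) in held_buckets


def reconcile(key: str, num_buckets: int, held_buckets: set[int]) -> str:
    """Skip objects whose bucket lease is held by another instance."""
    if not is_responsible(key, num_buckets, held_buckets):
        return "skipped"
    # The usual reconciliation logic would run here.
    return "reconciled"
```

With three buckets split between two instances, each key is reconciled by exactly one instance, although both instances still observe every object through their informers.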

![Failover with leader election per controller and bucket in knative [@mooresharding]](../assets/reconciler-buckets.pdf)
![Failover with leader election per controller and bucket in Knative [@mooresharding]](../assets/reconciler-buckets.pdf)

To realize these mechanisms, all controller instances run all informers.
I.e., they watch all objects regardless of whether they need to reconcile them.
@@ -102,7 +105,7 @@ Furthermore, the mechanisms do not guarantee an even distribution of objects acr
Users need to configure a higher number of buckets to achieve an even distribution.
This, in turn, increases the additional API request volume for `Lease` objects even further.

The described sharding mechanisms in knative achieve fast failovers as informers are warmed in all controller instances.
The described sharding mechanisms in Knative achieve fast failovers as informers are warmed in all controller instances.
However, the system's scalability is still limited as the watch caches' resource impact is duplicated and not distributed.
Applying the described concepts to other controllers is complex and requires notable changes to the controller implementation.
To summarize, the system benefits from these mechanisms in terms of availability but not in terms of scalability.
@@ -112,7 +115,10 @@ To summarize, the system benefits from these mechanisms in terms of availability

## Flux

The Flux controllers offer a command line option `--watch-label-selector` that filters the controllers' watch caches using a label selector.
Flux [@flux] is a tool for facilitating continuous delivery of Kubernetes-based applications.
It consists of multiple components that pull application configuration from sources such as Git repositories and deploy it using Kustomize [@kustomizedocs] and Helm [@helmdocs].

The Flux components offer a command line option `--watch-label-selector` that filters the controllers' watch caches using a label selector.
This can be used to scale out Flux controllers horizontally using a sharding strategy.
For this, users deploy multiple instances of the same controller with unique label selectors used as the sharding key[^flux-sharding].
Then, users assign objects to shards by adding the shard key label to the respective manifests ([@lst:flux-sharding]).
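
To illustrate how a label selector partitions the watched objects, the following minimal sketch models equality-based selector matching. It is an assumption-laden illustration rather than Flux code: the label key `example.com/shard`, the helper names `matches` and `watched_objects`, and the dictionary-based object model are all hypothetical.

```python
from __future__ import annotations


def matches(selector: dict[str, str], labels: dict[str, str]) -> bool:
    """Equality-based selector: every selector pair must be present in labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def watched_objects(selector: dict[str, str], objects: list[dict]) -> list[str]:
    """Return the names of the objects an instance with this selector would watch."""
    return [
        obj["name"]
        for obj in objects
        if matches(selector, obj.get("labels", {}))
    ]


# Hypothetical objects; only labeled objects are picked up by a shard instance.
objects = [
    {"name": "a", "labels": {"example.com/shard": "shard1"}},
    {"name": "b", "labels": {"example.com/shard": "shard2"}},
    {"name": "c"},  # unlabeled: matched by neither shard selector
]
```

In this model, an object without the shard label is not picked up by any instance that selects on a specific shard value, which is why the object-to-shard assignment has to be maintained by the users.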
@@ -157,7 +163,8 @@ The sharding strategy is limited to a static number of instances and does not al

## Argo CD

In Argo CD [@argocddocs], the application controller is the central component that deploys manifests pulled from Git repositories to Kubernetes.
Argo CD [@argocddocs] is a continuous delivery tool for Kubernetes similar to Flux.
In Argo CD, the application controller is the central component that deploys manifests pulled from Git repositories to Kubernetes.
It works with one or more clusters configured via `Secrets` that contain credentials for the cluster.
The application reconciliation process can become memory-intensive depending on the number and size of the deployed manifests.

@@ -180,7 +187,8 @@ However, this is specific to Argo CD's application controller, and the mechanism

## KubeVela

KubeVela also allows running multiple instances of its core controller responsible for deploying applications to support large-scale use cases.
KubeVela is a platform for delivery and management of Kubernetes-based applications.
The project also allows running multiple instances of its core controller responsible for deploying applications to support large-scale use cases.
For this, users deploy multiple instances of vela-core – one in master mode (primary) and the others in slave mode (shards).
The primary instance runs all controllers and webhooks and schedules applications to one of the available shard instances.
On the other hand, the shard instances are labeled with a unique `shard-id` label and only run the application controller.
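
The division of work between the primary and the shard instances can be sketched as follows. This is a simplified model, not KubeVela's implementation: the application label key `scheduled-shard-id`, the helper names `schedule` and `handles`, and the hash-based choice of a shard are assumptions made for illustration only.

```python
from __future__ import annotations

import zlib

# Hypothetical label key that records which shard an application was assigned to.
SHARD_LABEL = "scheduled-shard-id"


def schedule(app: dict, shard_ids: list[str]) -> dict:
    """Primary: assign an unscheduled application to one available shard.

    Assumption: the shard is picked by hashing the application name; an
    existing assignment is never changed.
    """
    labels = app.setdefault("labels", {})
    if SHARD_LABEL not in labels and shard_ids:
        index = zlib.crc32(app["name"].encode("utf-8")) % len(shard_ids)
        labels[SHARD_LABEL] = sorted(shard_ids)[index]
    return app


def handles(app: dict, shard_id: str) -> bool:
    """Shard: only process applications that were scheduled to this shard."""
    return app.get("labels", {}).get(SHARD_LABEL) == shard_id
```

In this sketch, the assignment is sticky: once an application carries the label, a later change in the set of available shards does not move it.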
2 changes: 1 addition & 1 deletion content/70-conclusion.md
@@ -32,7 +32,7 @@ To conclude, the systematic evaluation has shown that all identified requirement
As the mechanism can be easily applied to existing controllers, it opens opportunities for adopting the presented work, discussion, and collaboration in the open-source community.
Further development is simplified because the implementation does not depend on a specific Kubernetes version.

As future work on horizontally scalable Kubernetes controllers, the design and implementation from this thesis should be further evaluated through usage in real-world controllers.
As future work on horizontally scalable Kubernetes controllers, the design and implementation from this thesis should be further evaluated through usage in real-world controllers, including built-in and custom controllers.
The implementation's performance during rolling updates, automatic scaling, chaos engineering experiments [@chaos2016], and more scenarios should be investigated and enhanced if necessary.
For this, feedback from the community on the presented development needs to be collected.
New requirements shall be collected and explored if certain use cases cannot adopt the presented work.
20 changes: 18 additions & 2 deletions content/bibliography.bib
@@ -560,10 +560,10 @@ @misc{argocd
}

@misc{flux,
title = {{Flux}},
title = {{Flux Documentation}},
author = {{The Flux Authors}},
date = {2024},
url = {https://fluxcd.io/},
url = {https://fluxcd.io/flux/},
urldate = {2024-02-07}
}

@@ -607,6 +607,22 @@ @misc{argocddocs
urldate = {2024-02-07}
}

@misc{kustomizedocs,
title = {{Kustomize Documentation}},
author = {{The Kubernetes Authors}},
date = {2024},
url = {https://kustomize.io/},
urldate = {2024-02-14}
}

@misc{helmdocs,
title = {{Helm: The package manager for Kubernetes}},
author = {{The Helm Authors}},
date = {2024},
url = {https://helm.sh/},
urldate = {2024-02-14}
}

@article{argoaws,
author = {Andrew Lee and Christina Andonov and Carlos Santana and Nima Kaviani},
journal = {AWS Open Source Blog},