
Commit

Improve wording and grammar
timebertt committed Feb 14, 2024
1 parent 2b2782d commit d31a534
Showing 6 changed files with 34 additions and 34 deletions.

content/10-motivation.md: 10 changes (5 additions & 5 deletions)
@@ -27,11 +27,11 @@ The Kubernetes community has extensively picked up the operator pattern, and man
With these projects gaining popularity, large-scale Kubernetes and controller-based deployments are becoming more common.
The Kubernetes community recognizes this demand and ensures that the Kubernetes core components scale well.
For this, the special interest group scalability (SIG scalability) defines scalability thresholds (e.g., 5,000 nodes) and verifies that Kubernetes performs well within the recommended thresholds by frequently running automated performance and scalability tests.
-However, SIG scalability is only responsible for ensuring high scalability of the Kubernetes core components.
+However, SIG scalability is only responsible for ensuring high scalability of Kubernetes core components.
[@k8scommunity]

-External components like custom controllers or operators are not included in the scalability considerations and guarantees.
-Still, the used custom controllers must also be scalable for these large-scale deployments to work reliably.
+External components like custom controllers or operators are not included in the community's scalability considerations and guarantees.
+Still, custom controllers must also be scalable for these large-scale deployments to work reliably.
Compared to Kubernetes core controllers, custom controllers typically facilitate heavier reconciliation processes, increasing the demand for a scalable architecture.
[@kubevela]

@@ -70,7 +70,7 @@ First, this thesis describes all relevant fundamentals in detail (chapter [-@sec
These include the most important aspects of Kubernetes API and controller machinery ([@sec:apimachinery; @sec:controller-machinery]) as well as leader election principles ([@sec:leader-election]).
Next, a general definition of the scalability of a distributed system, as found in standard literature, is presented, along with how the scalability of Kubernetes is defined and measured ([@sec:kubernetes-scalability]).
[@Sec:controller-scalability] outlines how the scale and performance of a controller setup can be measured.
-Based on this, [@sec:scalability-limitations] analyzes the current scalability limitations of Kubernetes controllers in detail.
+Based on this, [@sec:scalability-limitations] analyzes current scalability limitations of Kubernetes controllers in detail.

Afterward, this thesis examines existing efforts related to sharding in Kubernetes controllers (chapter [-@sec:related-work]).
Primarily, it assesses the strengths and drawbacks of the design presented in the previous study project ([@sec:related-study-project]).
@@ -79,7 +79,7 @@ Following this, an evolved design is developed step by step in chapter [-@sec:de
It is based on the design presented in the study project but addresses all additional requirements from the previous chapter.

The implementation of the presented design is described in chapter [-@sec:implementation].
-In addition to explaining how the external sharding components are implemented ([@sec:impl-clusterring; @sec:impl-sharder]), the chapter gives instructions for implementing the shard components in existing controllers ([@sec:impl-shard]).
+In addition to explaining how external sharding components are implemented ([@sec:impl-clusterring; @sec:impl-sharder]), the chapter gives instructions for implementing the shard components in existing controllers ([@sec:impl-shard]).
[@Sec:impl-setup] presents an example setup that combines all implemented components into a fully functioning sharded controller setup.
Next, the implementation is evaluated in systematic load test experiments (chapter [-@sec:evaluation]).
After precisely describing the experiment setup ([@sec:experiment-setup]) and how measurements are performed ([@sec:measurements]), different experiment scenarios are executed ([@sec:experiments]), and their results are discussed ([@sec:discussion]).

content/20-fundamentals.md: 20 changes (10 additions & 10 deletions)
@@ -6,9 +6,9 @@ Kubernetes is an open-source system for orchestrating container-based applicatio
It is an API-centric and declarative system in which clients specify the desired state of applications and infrastructure instead of managing them via imperative commands.
This approach is essential to Kubernetes' reliability, scalability, and manageability. [@k8sdesign]

-The architecture of Kubernetes is divided into two parts: the control plane and the data plane.
+The architecture of Kubernetes is divided into two parts: control plane and data plane.
The control plane oversees the cluster's state and orchestrates various operations, while the data plane executes workload containers and serves application traffic.
-The core of the control plane is the API server, which stores the cluster's metadata and state in etcd, a highly-available key-value store that acts as the source of truth for the entire cluster [@etcddocs].
+The core of the control plane is the API server, which stores cluster metadata and state in etcd, a highly-available key-value store that acts as the source of truth for the entire cluster [@etcddocs].
The state is specified and managed in the form of objects via RESTful [@fielding2000architectural] HTTP endpoints (resources).
All human and machine clients interact with the system through these resources. [@k8sdocs]
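
For illustration, a minimal client-go sketch of declaring desired state through such a resource (not part of the referenced files; names like `clientset` are assumed):

```go
package example

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// createDesiredState declares a ConfigMap as desired state; the API server
// persists it in etcd. The call corresponds to a RESTful
// POST /api/v1/namespaces/default/configmaps request.
func createDesiredState(ctx context.Context, clientset kubernetes.Interface) error {
	cm := &corev1.ConfigMap{
		ObjectMeta: metav1.ObjectMeta{Name: "demo", Namespace: "default"},
		Data:       map[string]string{"key": "value"},
	}
	_, err := clientset.CoreV1().ConfigMaps("default").Create(ctx, cm, metav1.CreateOptions{})
	return err
}
```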

@@ -83,7 +83,7 @@ This can be used to retrieve a specific recent revision of objects or the latest

Clients can add label selectors to list and watch requests for filtering API objects based on key-value pairs in their `metadata.labels`.
Note that the API server always retrieves the complete list of objects from etcd or events from its watch cache.
-It subsequently filters the objects or events based on the specified label criteria and transmits the filtered list to the client.
+It subsequently filters objects or events based on the specified label criteria and transmits a filtered list to the client.
This approach reduces the transferred data size, the effort for encoding and decoding in the API server and client, and the needed memory on the client side.
However, it reduces neither the data transferred between etcd and the API server nor the effort for processing objects and events in the API server.
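
For illustration, a client-go sketch of such a filtered list request (assumed names, not part of the referenced files); the label filtering happens in the API server as described above:

```go
package example

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// listWebPods requests only Pods whose labels match app=web. The API server
// still reads the full Pod list from etcd (or events from its watch cache)
// and filters before encoding the response for the client.
func listWebPods(ctx context.Context, clientset kubernetes.Interface) (*corev1.PodList, error) {
	return clientset.CoreV1().Pods("default").List(ctx, metav1.ListOptions{
		LabelSelector: "app=web",
	})
}
```
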

Expand All @@ -106,7 +106,7 @@ Apart from admission plugins for built-in resources, the admission control logic
These webhooks can be applied to both built-in and custom resources, and can be registered through `ValidatingWebhookConfiguration` and `MutatingWebhookConfiguration` objects.
A webhook configuration defines which server should be contacted for specific requests and allows selection based on the requested resource, the operation type, as well as namespace and object labels.
When a client performs a relevant request, the API server dispatches an `AdmissionReview` object including the requested object and relevant request metadata to the designated webhook server.
-After processing the `AdmissionReview` object, the webhook server responds with a validation result and in the case of mutating webhooks, it may include optional patches to the object.
+After processing the `AdmissionReview` object, the webhook server responds with a validation result and in case of mutating webhooks, it may include patches to the object.
The API server considers the returned validation result and optionally applies the returned patches to the object before storing the updated object in etcd. [@k8sdocs]
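
To make the request flow concrete, a simplified sketch of a webhook server handler (assumed and illustrative only; a real webhook server additionally needs TLS and proper error handling) that decodes the `AdmissionReview` and returns a validation result:

```go
package example

import (
	"encoding/json"
	"net/http"

	admissionv1 "k8s.io/api/admission/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// handleAdmission decodes the AdmissionReview sent by the API server and
// responds with a validation result for the reviewed object.
func handleAdmission(w http.ResponseWriter, r *http.Request) {
	review := admissionv1.AdmissionReview{}
	if err := json.NewDecoder(r.Body).Decode(&review); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}

	review.Response = &admissionv1.AdmissionResponse{
		UID:     review.Request.UID,
		Allowed: true,
		Result:  &metav1.Status{Message: "ok"},
	}
	// A mutating webhook could additionally set Response.Patch and
	// Response.PatchType to return patches for the object.

	_ = json.NewEncoder(w).Encode(review)
}
```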

All Kubernetes API objects can reference owning objects for establishing relationships between objects.
@@ -136,7 +136,7 @@ As such, controllers are stateless components, as their state is persisted exter
If a controller restarts or crashes, it can pick up earlier work by reading the current state from the API server again.

The core components of Kubernetes controllers are the watch cache, event handlers, work queue, and worker routines.
-A controller's cache is responsible for monitoring the object type on the API server, notifying the controller of changes to the objects, and maintaining the objects in memory as an indexed store.
+A controller's cache is responsible for monitoring the object type on the API server, notifying the controller of changes to the objects, and maintaining object copies in memory as an indexed store.
For this, the controller initiates a reflector, which lists and watches the specified object type, emitting delta events added to a queue.
Subsequently, an informer reads events from the queue, updating the store with changed objects.
The flat key-value store has additional indices to increase the performance of frequently used namespaced queries or queries with field selectors.
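
A condensed client-go sketch of this machinery (assumed names, heavily simplified compared to production controllers): the shared informer fills the indexed store, and event handlers enqueue object keys into a work queue for the worker routines:

```go
package example

import (
	"time"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/util/workqueue"
)

// startConfigMapInformer wires up reflector, informer, indexed store, and work
// queue for ConfigMaps. Worker routines would read keys from the returned
// queue, fetch the object from the informer's store, and reconcile it.
func startConfigMapInformer(clientset kubernetes.Interface, stopCh <-chan struct{}) workqueue.RateLimitingInterface {
	queue := workqueue.NewRateLimitingQueue(workqueue.DefaultControllerRateLimiter())

	factory := informers.NewSharedInformerFactory(clientset, 30*time.Second)
	informer := factory.Core().V1().ConfigMaps().Informer()
	informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			if key, err := cache.MetaNamespaceKeyFunc(obj); err == nil {
				queue.Add(key)
			}
		},
		UpdateFunc: func(oldObj, newObj interface{}) {
			if key, err := cache.MetaNamespaceKeyFunc(newObj); err == nil {
				queue.Add(key)
			}
		},
		DeleteFunc: func(obj interface{}) {
			if key, err := cache.DeletionHandlingMetaNamespaceKeyFunc(obj); err == nil {
				queue.Add(key)
			}
		},
	})

	factory.Start(stopCh)
	factory.WaitForCacheSync(stopCh)
	return queue
}
```
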
@@ -191,7 +191,7 @@ Controllers always read the object for the enqueued object key from the watch ca
Furthermore, using filtered watches enhances the system's scalability by reducing the volume of data processed and stored by controllers.
For instance, the kubelet employs a field selector for the `spec.nodeName` field in `Pods` to filter watch requests for `Pods` running on the kubelet's `Node`.
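
For example, such a filtered watch can be opened as follows (a sketch with assumed names, similar in spirit to the kubelet's selector):

```go
package example

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/watch"
	"k8s.io/client-go/kubernetes"
)

// watchPodsOnNode opens a watch that only delivers events for Pods scheduled
// to the given node, reducing the data the client has to process and store.
func watchPodsOnNode(ctx context.Context, clientset kubernetes.Interface, nodeName string) (watch.Interface, error) {
	return clientset.CoreV1().Pods(metav1.NamespaceAll).Watch(ctx, metav1.ListOptions{
		FieldSelector: "spec.nodeName=" + nodeName,
	})
}
```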

-In situations with a high rate of changes or latency of watch connections, controllers might read an outdated version of the object from the cache.
+In situations with a high rate of changes or latency of watch connections, controllers might read an outdated version of the object from their cache.
When this happens, the controller might run into a conflict error when updating the object on the API server due to the use of optimistic concurrency control.
In case of errors like conflicts, controllers rely on an exponential backoff mechanism to retry reconciliations until the cache is up-to-date with the current state and the reconciliation can be performed successfully.
This ensures eventual consistency while reducing the average amount of API requests and network transfers needed for reconciliations.
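
For a single client-side update outside a controller's work queue, the same retry-on-conflict pattern can be sketched with client-go (assumed names, illustrative only):

```go
package example

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/util/retry"
)

// addLabelWithRetry updates a ConfigMap and retries if the API server rejects
// the update with a conflict error, i.e., when the object's resourceVersion
// changed since it was read.
func addLabelWithRetry(ctx context.Context, clientset kubernetes.Interface, namespace, name string) error {
	return retry.RetryOnConflict(retry.DefaultRetry, func() error {
		cm, err := clientset.CoreV1().ConfigMaps(namespace).Get(ctx, name, metav1.GetOptions{})
		if err != nil {
			return err
		}
		if cm.Labels == nil {
			cm.Labels = map[string]string{}
		}
		cm.Labels["example"] = "true"
		_, err = clientset.CoreV1().ConfigMaps(namespace).Update(ctx, cm, metav1.UpdateOptions{})
		return err
	})
}
```
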
@@ -230,8 +230,8 @@ spec:

: Example Lease {#lst:lease}

-The Lease object is always acquired for a specified duration, as defined in the `spec.leaseDurationSeconds`.
-The leader must continually renew the lease to maintain the leadership.
+The Lease object is always acquired for a specified duration, as defined in `spec.leaseDurationSeconds`.
+The leader must continuously renew the lease to keep its leadership.
If the leader fails to renew the lease, it must stop all reconciliations.
When the lease expires, other instances are permitted to contend for leadership, resulting in a leadership change.
When an instance is terminated, such as during rolling updates, the leader can voluntarily release the lease to speed up leadership handovers and minimize disruption.
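
A condensed client-go sketch of lease-based leader election (assumed names and durations, not the referenced implementation); `ReleaseOnCancel` corresponds to the voluntary release described above:

```go
package example

import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
)

// runWithLeaderElection blocks until this instance acquires the Lease, then
// runs the controllers, and stops them again if leadership is lost.
func runWithLeaderElection(ctx context.Context, clientset kubernetes.Interface, identity string, runControllers func(context.Context)) {
	lock := &resourcelock.LeaseLock{
		LeaseMeta:  metav1.ObjectMeta{Namespace: "kube-system", Name: "example-controller"},
		Client:     clientset.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{Identity: identity},
	}

	leaderelection.RunOrDie(ctx, leaderelection.LeaderElectionConfig{
		Lock:            lock,
		LeaseDuration:   15 * time.Second, // corresponds to spec.leaseDurationSeconds
		RenewDeadline:   10 * time.Second, // stop leading if renewal fails for this long
		RetryPeriod:     2 * time.Second,
		ReleaseOnCancel: true, // voluntarily release the Lease on shutdown
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: runControllers,
			OnStoppedLeading: func() {
				// all reconciliations must stop once leadership is lost
			},
		},
	})
}
```
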
@@ -273,7 +273,7 @@ In order to evaluate the scalability of controller setups in the scope of this t
The load on or scale of a Kubernetes cluster has many dimensions, for example: number of nodes, number of pods, pod churn, API request rate.
Evaluating the scalability of Kubernetes in every dimension is difficult and costly.
Hence, the community has declared a set of thresholds[^k8s-thresholds] across these load dimensions, which can be considered the limits for scaling a single Kubernetes cluster.
-Most thresholds define a maximum supported number of API objects, and others define a maximum supported `Pod` churn rate or API request rate.
+Most thresholds define a maximum supported number of API objects, while others define a maximum supported `Pod` churn rate or API request rate.
As long as a cluster is configured correctly and the load is kept within these limits, the cluster is guaranteed to work reliably and perform adequately.
In Kubernetes development, regular load tests [@perftests] put test clusters under load as high as the declared thresholds to detect performance or scalability regressions.
[@k8scommunity]
@@ -307,7 +307,7 @@ These SLIs include in-cluster network programming and execution latency, in-clus
Based on the above definition of Kubernetes scalability, a definition for the scalability of Kubernetes controllers is derived.
In this context, "controller setup" refers to a set of coherent controller instances.
First, it must be determined how to quantify the scale of or load on a specific controller setup.
-As controllers are an essential part of Kubernetes, the load is quantified in a subset of the scaling dimensions of Kubernetes itself.
+As controllers are an essential part of Kubernetes, the load is quantified in a subset of Kubernetes' scaling dimensions.
For a given controller setup, the load has two dimensions:

1. \dimn{count}The number of API objects that the controller watches and reconciles.

content/30-related-work.md: 2 changes (1 addition & 1 deletion)
@@ -181,7 +181,7 @@ However, this is specific to Argo CD's application controller, and the mechanism
## KubeVela

KubeVela also allows running multiple instances of its core controller responsible for deploying applications to support large-scale use cases.
-For this, users deploy multiple instances of the vela-core – one in master mode (primary) and the others in slave mode (shards).
+For this, users deploy multiple instances of vela-core – one in master mode (primary) and the others in slave mode (shards).
The primary instance runs all controllers and webhooks and schedules applications to one of the available shard instances.
On the other hand, the shard instances are labeled with a unique `shard-id` label and only run the application controller.
[@kubevela]

content/35-requirements.md: 6 changes (3 additions & 3 deletions)
@@ -11,7 +11,7 @@ If reconciliation work can be distributed across multiple controller instances,
Another mechanism that does not involve global locking is needed to prevent concurrent reconciliations.

This thesis builds on the requirements posed in the previous study project (req. \refreq*{membership}–\refreq*{scale-out}) [@studyproject].
-While the presented work already fulfills the basic requirements, the evaluation revealed significant drawbacks and limitations that this thesis needs to resolve to make controllers horizontally scalable ([@sec:related-study-project]).
+While the presented work already fulfills the basic requirements, the evaluation revealed significant drawbacks and limitations that this thesis must resolve to make controllers horizontally scalable ([@sec:related-study-project]).
The set of requirements is extended in this thesis to address the identified limitations accordingly.

\subsection*{\req{membership}Membership and Failure Detection}
Expand All @@ -20,9 +20,9 @@ As the foundation, the sharding mechanism needs to populate information about th
In order to distribute object ownership across instances, there must be a mechanism for discovering available members of the sharded setup (controller ring).
Instance failures need to be detected to restore the system's functionality automatically.

-As controllers are deployed in highly dynamic environments, the sharding mechanism must expect that instances are restarted, added, or removed frequently and can fail at any time.
-Furthermore, instances will typically be replaced in quick succession during rolling updates.
+As controllers are deployed in highly dynamic environments, the sharding mechanism must expect that instances are restarted, added, or removed frequently, e.g., during rolling updates or for facilitating automatic scaling.
Hence, the sharding mechanism should handle voluntary disruptions explicitly to achieve fast rebalancing during scale-down and rolling updates.
+Furthermore, instances can fail involuntarily at any time, which must also be handled by the sharding mechanism.
[@studyproject]

\subsection*{\req{partitioning}Partitioning}

content/50-implementation.md: 23 changes (11 additions & 12 deletions)
@@ -94,9 +94,9 @@ It reports the total number, the number of available shards, and the `Ready` con

The shard lease controller is responsible for detecting instance failures based on the `Lease` objects that every shard maintains.
Like in the study project implementation, it watches all shard `Leases` and determines the state of each shard following the conditions in [@tbl:shard-states].
-When a shard becomes uncertain, it tries to acquire the `Lease` to ensure connectivity with the API server.
-Only if the controller can acquire the `Lease` is the shard considered unavailable and removed from partitioning.
-I.e., all shards in states ready, expired, and uncertain are considered available and included in partitioning.
+When a shard becomes uncertain, the shard lease controller tries to acquire the `Lease` to ensure connectivity with the API server.
+Only if it can acquire the `Lease` is the shard considered unavailable and removed from partitioning.
+I.e., all shards in states ready, expired, and uncertain are available for object assignments.
Orphaned `Leases` are garbage collected by the shard lease controller.
For increased observability, the shard lease controller writes the determined state to the `alpha.sharding.timebertt.dev/state` label on the respective `Lease` objects.
[@studyproject]
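
As a simplified illustration of the failure detection (a sketch only; the actual conditions are given in [@tbl:shard-states]), a shard `Lease` can be treated as expired once its renew time plus lease duration lies in the past:

```go
package example

import (
	"time"

	coordinationv1 "k8s.io/api/coordination/v1"
)

// leaseExpired is a simplified check: the Lease counts as expired if its
// holder has not renewed it within spec.leaseDurationSeconds. The shard lease
// controller distinguishes further states (see the referenced table) before
// it acquires the Lease and removes the shard from partitioning.
func leaseExpired(lease *coordinationv1.Lease, now time.Time) bool {
	spec := lease.Spec
	if spec.HolderIdentity == nil || spec.RenewTime == nil || spec.LeaseDurationSeconds == nil {
		return true
	}
	expiration := spec.RenewTime.Add(time.Duration(*spec.LeaseDurationSeconds) * time.Second)
	return expiration.Before(now)
}
```
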
@@ -113,14 +113,14 @@ For increased observability, the shard lease controller writes the determined st

The controller watches the `Lease` objects for relevant changes to ensure responsiveness.
However, it also revisits `Leases` after a specific duration when their state would change if no update event occurs.
-E.g., the transition from ready to expired happens if the shard fails to renew the `Lease` in time, which does not incur a watch update event.
+E.g., the transition from ready to expired happens if the shard fails to renew the `Lease` in time, which does not incur a watch event.
For this, the controller calculates the time until the transition would occur and revisits the `Lease` after this duration.
It also watches `ClusterRings` and triggers reconciliations for all `Lease` objects belonging to a ring, e.g., to ensure correct sharding decisions when the `ClusterRing` object is created after the shard `Leases`.
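
In controller-runtime terms, this revisit logic can be sketched as returning a `RequeueAfter` equal to the time left until the expected transition (assumed and simplified; not the actual sharder code):

```go
package example

import (
	"time"

	coordinationv1 "k8s.io/api/coordination/v1"
	ctrl "sigs.k8s.io/controller-runtime"
)

// requeueUntilExpiry schedules the next reconciliation for the point in time
// at which the Lease would transition from ready to expired if the shard stops
// renewing it, since no watch event is emitted for that transition.
func requeueUntilExpiry(lease *coordinationv1.Lease, now time.Time) ctrl.Result {
	if lease.Spec.RenewTime == nil || lease.Spec.LeaseDurationSeconds == nil {
		return ctrl.Result{}
	}
	expiration := lease.Spec.RenewTime.Add(time.Duration(*lease.Spec.LeaseDurationSeconds) * time.Second)
	if d := expiration.Sub(now); d > 0 {
		return ctrl.Result{RequeueAfter: d}
	}
	return ctrl.Result{} // the transition already happened; handle it immediately
}
```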

-For partitioning, the sharder reads all shard `Leases` of a given `ClusterRing` from its watch cache and constructs a consistent hash ring of all available shards.
+When the sharder's webhook is called, it needs to determine the desired shard for an object based on the discovered membership information.
+The sharder reads all shard `Leases` of a given `ClusterRing` from its watch cache and constructs a consistent hash ring of all available shards.
In this process, `Leases` are always read from the cache, and a new ring is constructed every time instead of keeping a single hash ring up-to-date.
-This ensures consistency with the cache and watch events seen by the individual controllers.
-Reading from a shared hash ring, which can be considered another cache layer, can lead to race conditions where the hash ring still needs to be updated to reflect the state changes for which a controller has been triggered.
+This ensures consistency with the cache and watch events seen by the individual controllers to avoid race conditions.
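
To illustrate the partitioning step (a generic consistent-hashing sketch, not the sharder's actual implementation), the available shard names are hashed onto a ring and an object key is assigned to the next shard on the ring:

```go
package example

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// assignShard builds a consistent hash ring from the available shard names,
// placing each shard on the ring multiple times (virtual nodes) for a more
// even distribution, and returns the shard responsible for the given key.
func assignShard(shards []string, key string, virtualNodes int) string {
	if len(shards) == 0 {
		return ""
	}

	hash := func(s string) uint64 {
		h := fnv.New64a()
		h.Write([]byte(s))
		return h.Sum64()
	}

	ring := make([]uint64, 0, len(shards)*virtualNodes)
	byToken := make(map[uint64]string, len(shards)*virtualNodes)
	for _, shard := range shards {
		for i := 0; i < virtualNodes; i++ {
			token := hash(fmt.Sprintf("%s-%d", shard, i))
			ring = append(ring, token)
			byToken[token] = shard
		}
	}
	sort.Slice(ring, func(i, j int) bool { return ring[i] < ring[j] })

	// Walk clockwise from the key's position and wrap around at the end.
	t := hash(key)
	idx := sort.Search(len(ring), func(i int) bool { return ring[i] >= t })
	if idx == len(ring) {
		idx = 0
	}
	return byToken[ring[idx]]
}
```

Because the ring is derived deterministically from the set of available shards, rebuilding it from the cached `Leases` on every call yields the same assignments as long as the membership information has not changed.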

The partitioning key of an object is determined based on whether it is configured as a main or controlled resource in the corresponding `ClusterRing`.
For main resources, the key is composed of `apiVersion`, `kind`, `namespace`, and `name`.
Expand Down Expand Up @@ -374,22 +374,21 @@ import (
func add(mgr manager.Manager, clusterRingName, shardName string) error {
var r reconcile.Reconciler
-// Use the shardcontroller package as helpers for:
-// - a predicate that triggers when the drain label is present
-// (even if the actual predicates don't trigger)
-// - wrapping the actual reconciler in a reconciler that handles the drain
-// operation
+// Use pkg/shard/controller as a helper when building controllers:
return builder.ControllerManagedBy(mgr).
Named("example").
For(
&corev1.ConfigMap{}, builder.WithPredicates(
+// wrap the predicate to trigger when the drain label is present
+// (even if the actual predicates don't trigger)
shardcontroller.Predicate(
clusterRingName, shardName, MyConfigMapPredicate(),
),
),
).
Owns(&corev1.Secret{}, builder.WithPredicates(MySecretPredicate())).
Complete(
+// wrap the reconciler to handle the drain operation
shardcontroller.NewShardedReconciler(mgr).
For(&corev1.ConfigMap{}). // must match the kind in For() above
InClusterRing(clusterRingName).