diff --git a/content/35-requirements.md b/content/35-requirements.md index efca8ca..817cc65 100644 --- a/content/35-requirements.md +++ b/content/35-requirements.md @@ -14,7 +14,7 @@ This thesis builds on the requirements posted in the previous study project (req While the presented work already fulfills the basic requirements, the evaluation revealed significant drawbacks and limitations that this thesis needs to resolve to make controllers horizontally scalable ([@sec:related-study-project]). The set of requirements is extended in this thesis to address the identified limitations accordingly. -\subsection*{\requirement\label{req:membership}Membership and Failure Detection} +\subsection*{\req{membership}Membership and Failure Detection} As the foundation, the sharding mechanism needs to populate information about the set of controller instances. In order to distribute object ownership across instances, there must be a mechanism for discovering available members of the sharded setup (controller ring). @@ -25,7 +25,7 @@ Furthermore, instances will typically be replaced in quick succession during rol Hence, the sharding mechanism should handle voluntary disruptions explicitly to achieve fast rebalancing during scale-down and rolling updates. [@studyproject] -\subsection*{\requirement\label{req:partitioning}Partitioning} +\subsection*{\req{partitioning}Partitioning} The sharding mechanism must provide a partitioning algorithm for determining ownership of a given object based on information about the set of available controller instances. It must map every sharded API object to exactly one instance. @@ -36,7 +36,7 @@ As controllers commonly watch controlled objects to trigger reconciliation of th I.e., all controlled objects must map to the same partition key as their owners. [@studyproject] -\subsection*{\requirement\label{req:coordination}Coordination and Object Assignment} +\subsection*{\req{coordination}Coordination and Object Assignment} Based on the partitioning results, the sharding mechanism must provide some form of coordination between individual controller instances and assign API objects to the instances. Individual controller instances need to know which objects are assigned to them to perform the necessary reconciliations. @@ -52,20 +52,20 @@ Essentially, the sharding mechanism must not add additional points of failure on During normal operations, reconciliations should not be blocked for a longer period. [@studyproject] -\subsection*{\requirement\label{req:concurrency}Preventing Concurrency} +\subsection*{\req{concurrency}Preventing Concurrency} Even if object ownership is distributed across multiple controller instances, the controllers must not perform concurrent reconciliations of a single object in different instances. Only a single instance may perform mutations on a given object at any time. The sharding mechanism must assign all API objects to a single instance and ensure that only one instance performs reconciliations at any given time. [@studyproject] -\subsection*{\requirement\label{req:scale-out}Incremental Scale-Out} +\subsection*{\req{scale-out}Incremental Scale-Out} The sharding mechanism must provide incremental scale-out properties. For this, the system's load capacity must increase almost linearly with the number of added controller instances. 
[@studyproject] -\subsection*{\requirement\label{req:reusable}Reusable Implementation} +\subsection*{\req{reusable}Reusable Implementation} As controllers are implemented in arbitrary programming languages and use different frameworks ([@sec:controller-machinery]), the sharding mechanism must be independent of the controller framework and programming language. The design should allow a generic implementation to apply the sharding mechanism to any controller. @@ -75,7 +75,7 @@ Where the mechanism requires compliance of the controller instances, e.g., for a However, the core logic of the sharding mechanism should be implemented externally in a reusable way. The necessary logic that needs to be reimplemented in the controllers themselves should be limited in scope and simple to implement. -\subsection*{\requirement\label{req:constant}Constant Overhead} +\subsection*{\req{constant}Constant Overhead} A sharding mechanism always incurs an unavoidable overhead for tasks like instance management and coordination compared to a non-distributed setup. However, the inherent overhead of the sharding mechanism must not increase proportionally with the controller's load. @@ -83,7 +83,7 @@ Otherwise, the sharding components would face the original scalability limitatio I.e., the sharding overhead must be almost constant and independent of the number of objects and the object churn rate. Adding more controller instances achieves horizontal scalability only if the design fulfills this requirement. -\subsection*{\requirement\label{req:ecosystem}Only Use API and Controller Machinery} +\subsection*{\req{ecosystem}Only Use API and Controller Machinery} The sharding mechanism should stay in the ecosystem of API and controller machinery. It should only use the mechanisms provided by the Kubernetes API ([@sec:apimachinery]) and match the core concepts of Kubernetes controllers ([@sec:controller-machinery]). diff --git a/content/40-design.md b/content/40-design.md index 0071173..00d335e 100644 --- a/content/40-design.md +++ b/content/40-design.md @@ -11,7 +11,7 @@ Based on this, the following sections develop changes to the design regarding ho The following is a complete list of sharding events that must be considered by the sharder and what actions it needs to perform for each event. -\subsubsection*{\event\label{evt:new-shard}A new shard becomes available} +\subsubsection*{\evt{new-shard}A new shard becomes available} When a new shard becomes available for assignments, no existing objects are assigned to the instance. Accordingly, the sharder needs to rebalance assignments of existing objects to achieve a good distribution of reconciliation work. @@ -23,22 +23,22 @@ It needs to consider all objects and perform one of the following actions accord - If the object is assigned to an unavailable shard, assign it directly to the desired available shard. - If the object is assigned to an available shard but should be moved to another shard, start the handover protocol by draining the object. -\subsubsection*{\event\label{evt:shard-down}An existing shard becomes unavailable} +\subsubsection*{\evt{shard-down}An existing shard becomes unavailable} If an existing shard becomes unavailable, the sharder must move all objects assigned to it to another available shard. Here, the sharder must consider all objects with the `shard` label set to the unavailable shard. It must determine the desired available shard using the partitioning algorithm for every object and add the `shard` label accordingly. 
If the object is in the process of being drained – i.e., it still carries the `drain` label – the sharder must remove the `drain` label together with adding the `shard` label. If there is no remaining available shard, the sharder does not need to take any action. -In this case, objects effectively stay unassigned until a new shard becomes available (evt. \ref{evt:new-shard}). +In this case, objects effectively stay unassigned until a new shard becomes available (\refevt{new-shard}). -\subsubsection*{\event\label{evt:new-object}A new object is created, or an object is drained} +\subsubsection*{\evt{new-object}A new object is created, or an object is drained} When a client creates a new API object, it is unassigned, and neither carries the `shard` nor the `drain` label. This is also the case when an existing object should be moved to another shard and drained successfully by the currently responsible shard. In these cases, the sharder must directly assign the object to one of the available shards. -If there is no available shard, the assignment is deferred until a new shard becomes available (evt. \ref{evt:new-shard}). +If there is no available shard, the assignment is deferred until a new shard becomes available (\refevt{new-shard}). ## Architecture @@ -58,17 +58,17 @@ Both components realize object assignments in response to different sharding eve The evolved design addresses the extended requirements by two different architectural changes. First, moving partitioning, assignment, and coordination logic to an external sharder deployment configurable via custom resources makes the sharding mechanism independent of the used controller framework and programming language. -With this, the sharding implementation becomes reusable for any arbitrary Kubernetes controller, fulfilling req. \ref{req:reusable} ([@sec:design-external]). +With this, the sharding implementation becomes reusable for any arbitrary Kubernetes controller, fulfilling \refreq{reusable} ([@sec:design-external]). -Second, the design limits the overhead of the sharding mechanism to be independent of the controller's load (req. \ref{req:constant}) by performing assignments during object admission when required by event \ref{evt:new-object}. +Second, the design limits the overhead of the sharding mechanism to be independent of the controller's load (\refreq{constant}) by performing assignments during object admission when required by event \refevt*{new-object}. A mutating webhook is triggered whenever a client creates a new unassigned object or the currently responsible shard removes the `drain` label from an existing object ([@sec:design-admission]). With this, watching the sharded objects is obsolete and allows removing the watch cache that causes a resource usage proportional to the number of objects. Additionally, this change reduces the API request volume caused by assignments and coordination. ## External Sharder {#sec:design-external} -The first architectural change generalizes the sharding design and makes the implementation reusable to address req. \ref{req:reusable}. -Note that this change does not reduce the resource overhead or API request volume to address req. \ref{req:constant}, but only move it to an external deployment. +The first architectural change generalizes the sharding design and makes the implementation reusable to address \refreq{reusable}. +Note that this change does not reduce the resource overhead or API request volume to address \refreq{constant}, but only moves it to an external deployment.
Instead of running the sharder as another controller in the sharded controller deployment itself, it is extracted to a dedicated external deployment without changing its core logic. This allows for reusing the sharder for multiple sharded controller deployments in the same cluster. @@ -128,11 +128,11 @@ However, implementing these changes generically in the respective controller fra ## Assignments in Admission {#sec:design-admission} The second architectural change ensures that the overhead of the sharding mechanism stays the same independent of the controller's load. -It ensures a constant overhead for true horizontal scalability of sharded controller setups, addressing req. \ref{req:scale-out} and \ref{req:constant}. +It ensures a constant overhead for true horizontal scalability of sharded controller setups, addressing \refreq{scale-out} and \refreq*{constant}. This change builds upon the previous one to limit the sharder's resource usage and to reduce the API request volume caused by assignments and coordination. The evolved design achieves these goals by performing assignments in the API admission phase, which replaces the sharder's costly watch cache for sharded objects. -Considering the sharding-related events that need to be handled by the sharding mechanism ([@sec:sharding-events]), the watch cache for sharded objects is only needed to detect and handle evt. \ref{evt:new-object}, i.e., when new unassigned objects are created, or existing objects are drained. +Considering the sharding-related events that need to be handled by the sharding mechanism ([@sec:sharding-events]), the watch cache for sharded objects is only needed to detect and handle \refevt{new-object}, i.e., when new unassigned objects are created, or existing objects are drained. This event always involves a mutating API request for the sharded object itself. Hence, the admission control logic can be leveraged to perform actions in response to the request instead of triggering sharder reconciliations in response to a watch event. @@ -142,7 +142,7 @@ Hence, the admission webhook is more flexible but adds latency to API requests. The sharder is responsible for setting up the `MutatingWebhookConfigurations` as needed. For this, the `ClusterRing` controller creates one webhook configuration for each ring with a matching list of sharded API resources. -The sharder still watches shard leases for detecting ring state changes (evt. \ref{evt:new-shard} and \ref{evt:shard-down}). +The sharder still watches shard leases for detecting ring state changes (\refevt{new-shard}, \refevt*{shard-down}). It runs a controller that accordingly handles both events as described in [@sec:sharding-events]. For this, the sharder does not need to watch the sharded objects themselves. Instead, it can use lightweight metadata-only list requests whenever object assignments of a `ClusterRing` need to be reconciled. diff --git a/content/50-implementation.md b/content/50-implementation.md index 858f178..eff48c4 100644 --- a/content/50-implementation.md +++ b/content/50-implementation.md @@ -215,7 +215,7 @@ It then reads all shards of the ring from the `Lease` cache and constructs a con Afterward, it determines the desired shard and responds to the API server with an object patch, adding the `shard` label as shown in [@lst:webhook-response]. Finally, the sharder runs the "sharder" controller that handles changes to the set of available shards and the ring's configuration. 
-It watches `ClusterRings` and shard `Leases` to reconcile all object assignments of a ring whenever its configuration changes or when a shard becomes available or unavailable (evt. \ref{evt:new-shard} and \ref{evt:shard-down}). +It watches `ClusterRings` and shard `Leases` to reconcile all object assignments of a ring whenever its configuration changes or when a shard becomes available or unavailable (\refevt{new-shard}, \refevt*{shard-down}). With this, the sharder can perform automatic rebalancing in response to dynamic instance changes or configuration changes (e.g., additional sharded resources) without human interaction. Additionally, it is triggered periodically (every 5 minutes by default) to perform assignments of objects not assigned by the sharder webhook due to intermediate failures. @@ -225,7 +225,7 @@ First, it lists only the metadata of the sharded objects to reduce the amount of Second, it sets the `resourceVersion` request parameter to `0`, instructing the API server to respond with a recent list state from its internal watch cache instead of performing a quorum read from etcd [@k8sdocs]. Finally, the controller performs paginated list requests to keep a limited number of objects in memory at any given time (500 objects by default). This prevents excessive spikes in the sharder's memory consumption. -Such spikes would be proportional to the number of sharded objects, limiting the system's scalability and conflict with req. \ref{req:constant}. +Such spikes would be proportional to the number of sharded objects, limiting the system's scalability and conflicting with \refreq{constant}. ## Shard Components {#sec:impl-shard} diff --git a/content/60-evaluation.md b/content/60-evaluation.md index 820bdbb..2ee44ce 100644 --- a/content/60-evaluation.md +++ b/content/60-evaluation.md @@ -346,7 +346,7 @@ The results show that the external sharder setup distributes the controller's re Each shard roughly consumes a third of the resources consumed by the singleton controller. The results also show that performing sharding for controllers comes with an unavoidable resource overhead. However, the external sharder's overhead is constant and does not increase with the controller's load in contrast to the internal sharder's overhead. -With this, the external sharder setup fulfills req. \ref{req:constant}, while the internal sharder setup does not. +With this, the external sharder setup fulfills \refreq{constant}, while the internal sharder setup does not. ### Horizontal Scalability {#sec:eval-scale-out} @@ -409,7 +409,7 @@ Note that the load capacity values cannot be interpreted as absolute values but ![Load capacity increase with added instances in scale-out scenario](../results/scale-out/capacity.pdf){#fig:scale-out-capacity} The results show that adding more controller instances brings more performance and increases the maximum load capacity of the system. -The load capacity grows almost linearly with the number of added instances, so the setup fulfills req. \ref{req:scale-out}. +The load capacity grows almost linearly with the number of added instances, so the setup fulfills \refreq{scale-out}. With this, applying the external sharding design makes Kubernetes controllers horizontally scalable. ### Rolling Updates @@ -446,9 +446,9 @@ Each of the $n$ shards is responsible for roughly $1/n$ objects and roughly requ Although the sharding mechanism incurs an overhead in resource consumption, the overhead is minimal and negligible at scale.
Most importantly, the sharder's resource footprint is constant even when the controller's load increases, i.e., when it has to handle more objects or a higher object churn rate. In total, the external sharder setup only requires a little more resources than the singleton controller setup to sustain the same load. -With this, the evaluation has shown that the presented design and implementation fulfills req. \ref{req:constant}. +With this, the evaluation has shown that the presented design and implementation fulfills \refreq{constant}. -Furthermore, the evaluation shows that the external sharder design provides incremental scale-out properties and fulfills req. \ref{req:scale-out}. +Furthermore, the evaluation shows that the external sharder design provides incremental scale-out properties and fulfills \refreq{scale-out}. The load capacity of the controller system can be incrementally increased by adding more instances. With the presented design, the load capacity increases almost linearly with the number of added controller instances. This proves that the architecture is horizontally scalable as defined in [@bondi2000characteristics; @duboc2007framework] ([@sec:kubernetes-scalability]). diff --git a/content/70-conclusion.md b/content/70-conclusion.md index 39b018f..e6933c4 100644 --- a/content/70-conclusion.md +++ b/content/70-conclusion.md @@ -10,23 +10,23 @@ The key properties of the presented work are: - Controller instances (shards) announce themselves to the sharder via individual heartbeat resources. The sharder determines the availability of shards and detects failures based on this membership information to perform automatic failovers and rebalancing as soon as the set of available shards changes. -This makes the mechanism suitable for highly dynamic environments (req. \ref{req:membership}). +This makes the mechanism suitable for highly dynamic environments (\refreq{membership}). - A consistent hashing algorithm achieves a balanced distribution of API objects for all ring sizes while minimizing movements on shard changes. -The partitioning mechanism considers object ownership so that controlled objects are assigned to the same shard as their owner (req. \ref{req:partitioning}). +The partitioning mechanism considers object ownership so that controlled objects are assigned to the same shard as their owner (\refreq{partitioning}). - Coordination is achieved by using label-based object assignments. Assignments are performed by the sharder transparently and automatically using a mutating webhook and controller. Adding corresponding label selectors in the shards distributes the reconciliation work and watch caches' footprint across shards. -None of the existing API semantics are changed, and clients need not be aware of the sharded architecture (req. \ref{req:coordination}). +None of the existing API semantics are changed, and clients need not be aware of the sharded architecture (\refreq{coordination}). - Following a dedicated handover protocol prevents concurrent reconciliations in multiple controller instances. -It ensures that a single instance is responsible for each object at any given time without locking reconciliations on a global level (req. \ref{req:concurrency}). -- The load capacity of the overall system is increased almost linearly with every added controller instance, as shown in [@sec:eval-scale-out] (req. \ref{req:scale-out}). 
+It ensures that a single instance is responsible for each object at any given time without locking reconciliations on a global level (\refreq{concurrency}). +- The load capacity of the overall system is increased almost linearly with every added controller instance, as shown in [@sec:eval-scale-out] (\refreq{scale-out}). - The core logic of the sharding mechanism is implemented in the dedicated sharder component, which can be installed easily into any cluster. Sharded controllers can be developed in any programming language, and only a few aspects need to be realized to enable controller sharding. -Reusable example implementations of the shard components in Go are presented in [@sec:impl-shard] and can be added for other programming languages as well (req. \ref{req:reusable}). +Reusable example implementations of the shard components in Go are presented in [@sec:impl-shard] and can be added for other programming languages as well (\refreq{reusable}). - The sharding mechanism incurs an overhead compared to a singleton controller setup. -However, the overhead is constant and independent of the controller's load, as shown in [@sec:eval-comparison] (req. \ref{req:constant}). +However, the overhead is constant and independent of the controller's load, as shown in [@sec:eval-comparison] (\refreq{constant}). - The sharding mechanism relies only on existing Kubernetes API and controller machinery. -It does not require managing external components or infrastructure other than controllers, keeping the added complexity low (req. \ref{req:ecosystem}). +It does not require managing external components or infrastructure other than controllers, keeping the added complexity low (\refreq{ecosystem}). To conclude, the systematic evaluation has shown that all identified requirements listed in chapter [-@sec:requirements] are fulfilled by the presented design and implementation. As the mechanism can be easily applied to existing controllers, it opens opportunities for adopting the presented work, discussion, and collaboration in the open-source community. diff --git a/pandoc/includes/header.tex b/pandoc/includes/header.tex index 460da6a..34d1bbe 100644 --- a/pandoc/includes/header.tex +++ b/pandoc/includes/header.tex @@ -269,18 +269,40 @@ % https://tex.stackexchange.com/questions/166416/continue-previous-roman-page-numbering-after-changing-to-arabic \newcounter{savepage} -%%% custom numbering -\newcounter{reqCounter} -\newcommand{\requirement}{% -\refstepcounter{reqCounter}% -Req. \arabic{reqCounter}: +%%% custom labels and cross-references +% requirements +\newcounter{req} +\newcommand{\req}[1]{% +\refstepcounter{req}% +\label{req:#1}% +Req. \arabic{req}: } +\makeatletter +\newcommand{\refreq}{\@ifstar\refreq@star\refreq@nostar} +\newcommand{\refreq@nostar}[1]{% +req. \ref{req:#1}% +} +\newcommand{\refreq@star}[1]{% +\ref{req:#1}% +} +\makeatother -\newcounter{evtCounter} -\newcommand{\event}{% -\refstepcounter{evtCounter}% -Event \arabic{evtCounter}: +% events +\newcounter{evt} +\newcommand{\evt}[1]{% +\refstepcounter{evt}% +\label{evt:#1}% +Event \arabic{evt}: } +\makeatletter +\newcommand{\refevt}{\@ifstar\refevt@star\refevt@nostar} +\newcommand{\refevt@nostar}[1]{% +evt. \ref{evt:#1}% +} +\newcommand{\refevt@star}[1]{% +\ref{evt:#1}% +} +\makeatother %%% spacing \usepackage{setspace}
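
For reference, a minimal standalone sketch of how the newly added label and cross-reference macros compose. It is not part of the patch; the document class and the requirement keys used here are illustrative. The plain `\refreq{...}` prints the lowercase "req." prefix, while the starred `\refreq*{...}` prints only the number, so chained mentions such as `\refreq{scale-out} and \refreq*{constant}` avoid repeating the prefix (the `\evt`/`\refevt` macros behave analogously):

```latex
% Minimal standalone sketch of the macros introduced in pandoc/includes/header.tex.
% Only the requirement macros are shown; the event macros (\evt/\refevt) work the same way.
\documentclass{article}

\makeatletter
% Heading prefix: steps the counter, attaches the label, prints "Req. N: ".
\newcounter{req}
\newcommand{\req}[1]{%
\refstepcounter{req}%
\label{req:#1}%
Req. \arabic{req}: }

% Cross-references: \refreq{x} prints "req. N", the starred \refreq*{x} prints only "N".
\newcommand{\refreq}{\@ifstar\refreq@star\refreq@nostar}
\newcommand{\refreq@nostar}[1]{req. \ref{req:#1}}
\newcommand{\refreq@star}[1]{\ref{req:#1}}
\makeatother

\begin{document}

\subsection*{\req{scale-out}Incremental Scale-Out}
\subsection*{\req{constant}Constant Overhead}

% After two LaTeX runs, this sentence renders as:
% "This change addresses req. 1 and 2."
This change addresses \refreq{scale-out} and \refreq*{constant}.

\end{document}
```

Folding the `\label` into `\req`/`\evt` keeps the `req:`/`evt:` prefixes consistent and drops the separate `\label` calls from each heading, as the content diffs above show.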