Commit

Add labels and references for load dimensions
timebertt committed Feb 10, 2024
1 parent 58bbedb commit a2b4cbc
Showing 3 changed files with 25 additions and 9 deletions.
8 changes: 4 additions & 4 deletions content/20-fundamentals.md
@@ -310,8 +310,8 @@ First, it is required to devise how to quantify the scale of or load on a specif
As controllers are an essential part of Kubernetes, the load is quantified in a subset of the scaling dimensions of Kubernetes itself.
For a given controller setup, the load has two dimensions:

-1. The number of API objects that the controller watches and reconciles.
-2. The churn rate of API objects, i.e., the rate of object creations, updates, and deletions.
+1. \dimn{count}The number of API objects that the controller watches and reconciles.
+2. \dimn{churn}The churn rate of API objects, i.e., the rate of object creations, updates, and deletions.

Next, the key SLIs and corresponding SLOs of a controller setup need to be specified.
As a prerequisite for these performance indicators to be meaningful, the official Kubernetes SLOs need to be satisfied by the cluster that the controllers are running on.
@@ -346,10 +346,10 @@ Other important parameters are the controller's compute resources and the number
While Kubernetes and its controllers are already scalable to a good extent, there are limitations to scaling controllers inherent in the leader election mechanism.
Understanding how a controller's load dimensions, SLIs, and resource usage are related is essential to discuss these limitations.

-When increasing the load by adding more objects, the controller's watch cache requires more memory for caching the additional objects.
+When increasing the load by adding more objects (\refdimn{count}), the controller's watch cache requires more memory for caching the additional objects.
This doesn't have a direct impact on the SLIs.
However, when consuming more memory than available, the controller might fail due to out-of-memory faults.
-When the load on a controller grows by increasing the object churn rate, more watch events for relevant objects are transferred over the network.
+When the load on a controller grows by increasing the object churn rate (\refdimn{churn}), more watch events for relevant objects are transferred over the network.
The processing of the additional watch events also results in a higher CPU usage for decoding and for performing reconciliations.
If the number of worker routines is not high enough to facilitate the needed rate of reconciliations, the queue time (SLI 1) increases.
Also, if performing reconciliations is computationally intensive, the extra CPU usage might exhaust the available CPU cycles, increasing the reconciliation latency (SLI 2).
10 changes: 5 additions & 5 deletions content/60-evaluation.md
@@ -13,7 +13,7 @@ To evaluate the new external sharder mechanism presented in this thesis, the web
With this, the webhosting-operator can be deployed in three different configurations: singleton (sharding disabled), internal sharder (design from study project), and external sharder (design from this thesis).
This allows for the comparison of all three setups using load test experiments.

-As described in [@sec:controller-scalability], the scale of controller setups can be described in two dimensions: the number of API objects and the churn rate of API objects.
+As described in [@sec:controller-scalability], the scale of controller setups can be described in two dimensions: the number of API objects (\refdimn{count}) and the churn rate of API objects (\refdimn{churn}).
The webhosting-operator's main resources are `Website` objects, which control `Deployment`, `ConfigMap`, `Service`, and `Ingress` objects.
Accordingly, increasing the load on the webhosting-operator involves creating many `Website` objects and triggering `Website` reconciliations.
Additionally, changing the `Theme` referenced by a `Website` also triggers a `Website` reconciliation.
@@ -156,9 +156,9 @@ I.e., SLIs are not measured per cluster-day but only for the load test duration.

Next, experiments record the load on the evaluated controller setup to determine the load capacity of a setup using a given resource configuration and to allow for the comparison of results of different scenarios.
For this, both load dimensions of controllers defined in [@sec:controller-scalability] are measured for the tested controller.
-Applied to the webhosting-operator, the number of objects watched and reconciled by the controller (dimension 1) is the number of `Website` objects in the cluster.
+Applied to the webhosting-operator, the number of objects watched and reconciled by the controller (\refdimn{count}) is the number of `Website` objects in the cluster.
This can be measured using the `kube_website_info` metric exposed for every `Website` object by the operator.
-On the other hand, the churn rate of API objects (dimension 2) for the webhosting-operator is the rate at which clients create, update, and delete `Website` objects.
+On the other hand, the churn rate of API objects (\refdimn{churn}) for the webhosting-operator is the rate at which clients create, update, and delete `Website` objects.
In experiments, `Website` reconciliations are triggered by mutating the `spec.theme` field.
The experiment tool builds upon controller-runtime, and different controllers perform individual actions of scenarios in the form of reconciliations.
Hence, this load dimension can be measured using the reconciliation-related metrics exposed by controller-runtime.
@@ -323,7 +323,7 @@ For this, it runs three controllers:
- The `website-mutator` updates each `Website` spec every minute.

With this, the generated load slowly increases over 15 minutes.
-The number of objects (dimension 1) grows to roughly 9,000, while the churn rate grows to roughly 160 changes per second ([@fig:basic-load]).
+The number of objects (\refdimn{count}) grows to roughly 9,000, while the churn rate (\refdimn{churn}) grows to roughly 160 changes per second ([@fig:basic-load]).

![Generated load in basic scenario](../results/basic/load.pdf){#fig:basic-load}

@@ -356,7 +356,7 @@ The system is scalable if the load capacity can be increased by adding more reso
The system is said to be horizontally scalable if resources are added as additional instances without adding resources to individual instances.

In the `scale-out` scenario, the experiment tool generates a load with a high churn rate over 15 minutes.
-The number of objects (dimension 1) grows up to roughly 9,000, and the churn rate grows up to roughly 300 changes per second ([@fig:scale-out-load]):
+The number of objects (\refdimn{count}) grows up to roughly 9,000, and the churn rate (\refdimn{churn}) grows up to roughly 300 changes per second ([@fig:scale-out-load]):

- The `website-generator` creates 10 random `Websites` per second.
- The `website-mutator` updates each `Website` spec twice per minute.
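
The figures quoted for the `scale-out` scenario can be cross-checked with a rough calculation. This is only a sketch: it assumes the generator runs at its nominal rate for the full 15 minutes, that no `Website` objects are deleted in the meantime, and that the two controllers listed above dominate the churn. The symbols $N$, $r_{\text{create}}$, and $r_{\text{update}}$ are introduced here for illustration and do not appear in the thesis.

$$
\begin{aligned}
N &\approx r_{\text{create}} \cdot t = 10\,\text{s}^{-1} \times (15 \times 60\,\text{s}) = 9{,}000 \\
r_{\text{update}} &\approx \frac{2 \cdot N}{60\,\text{s}} = \frac{2 \times 9{,}000}{60\,\text{s}} = 300\,\text{s}^{-1}
\end{aligned}
$$

Under these assumptions, both values agree with the reported peaks of roughly 9,000 objects and roughly 300 changes per second. The update rate is reached only at the end of the run, once all objects exist, and the measured churn may deviate from this estimate.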
16 changes: 16 additions & 0 deletions pandoc/includes/header.tex
@@ -304,6 +304,22 @@
}
\makeatother

+% dimensions
+\newcounter{dimn}
+\newcommand{\dimn}[1]{%
+\refstepcounter{dimn}%
+\label{dimn:#1}%
+}
+\makeatletter
+\newcommand{\refdimn}{\@ifstar\refdimn@star\refdimn@nostar}
+\newcommand{\refdimn@nostar}[1]{%
+dimension \ref{dimn:#1}%
+}
+\newcommand{\refdimn@star}[1]{%
+\ref{dimn:#1}%
+}
+\makeatother
+
%%% spacing
\usepackage{setspace}
\onehalfspacing % spacing between lines
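
To illustrate how the new macros behave, the following is a minimal standalone sketch. It copies the definitions from this commit into a plain `article` document; the document class, the list text, and the two example sentences are assumptions made for illustration and are not part of the thesis sources. References resolve only after a second LaTeX run.

```latex
\documentclass{article}

% Definitions as added to pandoc/includes/header.tex in this commit.
\newcounter{dimn}
\newcommand{\dimn}[1]{%
  \refstepcounter{dimn}% advance the counter and make it the current label target
  \label{dimn:#1}%       store the counter value under the key dimn:<name>
}
\makeatletter
\newcommand{\refdimn}{\@ifstar\refdimn@star\refdimn@nostar}
\newcommand{\refdimn@nostar}[1]{dimension \ref{dimn:#1}} % with the "dimension" prefix
\newcommand{\refdimn@star}[1]{\ref{dimn:#1}}             % starred form: number only
\makeatother

\begin{document}

\begin{enumerate}
  \item \dimn{count}The number of API objects.
  \item \dimn{churn}The churn rate of API objects.
\end{enumerate}

% Unstarred form: typesets the word "dimension" followed by the number.
Adding more objects (\refdimn{count}) increases memory usage.

% Starred form: typesets only the number, e.g. for enumerations.
Dimensions \refdimn*{count} and \refdimn*{churn} are measured separately.

\end{document}
```

Because `\dimn` uses its own counter instead of the enumerate item number, the references stay correct even if the definitions are moved out of a numbered list, as long as the order of the `\dimn` calls is unchanged.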