This is a utility to enhance the monitoring process for simulator invocation. The core of this utility is the Helm charts, which are described in the next sections.
In order for the test to run, it needs:

- A Kubernetes cluster which will be used as a client to install the simulators. This cluster will from now on be referred to as the *test harness*. The simulator invocation test expects a kubeconfig file which points to the client cluster. The kubeconfig file should be mounted at `$HOME/.kube/config` for the default behavior. Details on the test harness used for testing these changes are shared below.
- `apoctl` installed in the `$PATH` (a quick check for all required tools is sketched after this list).
- `kubectl` installed in the `$PATH`.
- `helm` installed in the `$PATH`.
- `jq` installed in the `$PATH`.
- Charts to use for enforcer installation (see under `charts/enforcer-sim`). By default the test will look for `enforcer-sim-<release>.tgz` under the current directory.
- `apoctl.json` (or any other path passed via the `--creds` flag) with application credentials that have namespace administrator privileges:

```shell
# sample example to get application credentials for preprod
export APOCTL_API=https://api.preprod.aporeto.us
export APOCTL_NAMESPACE=/aporetodev/username
eval $(apoctl auth google -e)
apoctl appcred create "simulators" --role @auth:role=namespace.administrator \
  > apoctl.json
```
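A minimal sketch to verify that the required tools are available before running the test (assumes a POSIX shell; this is not part of the test script itself):

```shell
# Check that each required tool is resolvable in $PATH (illustrative only)
for tool in apoctl kubectl helm jq; do
  command -v "$tool" >/dev/null 2>&1 || echo "missing: $tool"
done
```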
If a `values.yaml` file exists, it will be used to provide default values for the charts. The pods, simulators, and initial delay will be overwritten by the values of the script (defaults 30, 10, and 5 respectively). The `enforcerTagPrefix` will also be overwritten by the test with a random value. Here is an example of a valid `values.yaml` file:
```yaml
# number of pods
pods: 25
# number of simulators per pod
simulatorsPerPod: 10
# number of PUs per simulator
pusPerSimulator: 20
# the type of PUs (a gaia.ProcessingUnitTypeValue or "random")
puType: random
# list of external tags for each PU (must all start with "@"). If null, plan-gen defaults will be
# used.
puMeta: null
# number of flows per PU
flowsPerPU: 50
# Example values for PU lifecycle. This set of values will run each simulator for one hour.
puLife:
  puIter: "1"
  puInterval: 30 # value in seconds
  puCleanup: 1 # value in seconds
  flowIter: "12"
  flowInterval: 60 # value in seconds
# PU jitter
jitter:
  variance: 20 # value in percentage (%)
  puStart: 10 # value in seconds
  puReport: 1 # value in seconds
  flowReport: 500 # value in milliseconds
# enforcer image configuration
image:
  # the full name of the image, including registry and organization, is needed here
  name: "docker.io/aporeto/enforcerd"
  tag: "release-5.0.14"
# plan generation image configuration (plan-gen is incorporated in the simulator image)
simulatorImage:
  # the full name of the image, including registry and organization, is needed here
  name: "docker.io/aporeto/benchmark"
  tag: "v1.17.0"
# log level
log:
  level: debug
  format: console
  toConsole: 1
  disableWrite: false
# Enforcer jitter configuration
enforcerJitters:
  puSync: "20%"
  puFailureRetry: "20%"
  puStatusUpdateRetry: "20%"
  policiesSync: "20%"
  apiReconnect: "20%"
  flowReportDispatch: "20%"
# By default 2 tags are added to each enforcer (simulator) with the current charts.
# The first one has enforcerTagPrefix as prefix, as nsim=<prefix>-$i where i ranges
# through pods * simulatorsPerPod.
enforcerTagPrefix: simulator
# The second is the enforcerTag; it is fixed for all enforcers and should
# always be key=value.
enforcerTag: simbase=simulator
# Do not edit these parameters, exclusively used by automation. Values set here will be ignored.
depName: ""
k8sNS: ""
```
To see available options run:

```shell
simulator.sh -h
```
The test runs in batches, deploying `pods * simulatorsPerPod` enforcers in each batch. By specifying the total number of enforcers you want, the test will deploy `TOTAL_ENFORCERS/BATCH_SIZE + 1` batches.
Each batch is configured as a deployment into a single test harness namespace. The namespace will be created in the format `simulator-<6 character random string>`.
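As a rough sketch of the batch arithmetic (variable names here are illustrative, not the script's internals):

```shell
# Illustrative batch arithmetic only
PODS=30
SIMULATORS_PER_POD=10
TOTAL_ENFORCERS=3000

BATCH_SIZE=$((PODS * SIMULATORS_PER_POD))      # 300 enforcers per batch
BATCHES=$((TOTAL_ENFORCERS / BATCH_SIZE + 1))  # 11 batches
echo "deploying ${BATCHES} batches of ${BATCH_SIZE} enforcers"
```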
A simple example to run:
```shell
# BATCH_SIZE=pods*simulators
simulator.sh --pods 30 --simulators 10 --enforcers 3000 --namespace /base/namespace
```
The default number of pods is 30 and the default number of simulators per pod is 10. By default the prefix is `simulator`; it is used as the prefix of the Kubernetes namespaces in which the test creates its batches, and also for the control plane namespaces under the base namespace. The base namespace can be given via the `--namespace` flag.
The test will create one namespace as `simulator-<random>` and n flat sub-namespaces, as follows:

```
/base/namespace/simulator-random/simulator-random-1
/base/namespace/simulator-random/simulator-random-2
...
/base/namespace/simulator-random/simulator-random-n
```
The number of sub-namespaces, n, is calculated as the total number of enforcers divided by the namespace capacity (`--capacity`). After the namespace creation, a set of mapping policies will be applied from `/base/namespace/simulator-random` towards `/base/namespace/simulator-random/simulator-random-i` for the enforcers.
Thus all the enforcers will be stored under the namespace `/base/namespace/simulator-random`, but will eventually be mapped into the respective sub-namespace to comply with the capacity per namespace.
NOTE: If the capacity and batch size are not equal, then as many mapping policies as the total number of enforcers will be created.
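For example, the sub-namespace count can be estimated as follows (a sketch of the arithmetic only; the `--capacity` value shown is hypothetical and the division is rounded up here):

```shell
# Illustrative: n is roughly total enforcers / namespace capacity
TOTAL_ENFORCERS=3000
CAPACITY=1000
SUB_NAMESPACES=$(( (TOTAL_ENFORCERS + CAPACITY - 1) / CAPACITY ))
echo "creating ${SUB_NAMESPACES} sub-namespaces under /base/namespace/simulator-<random>"
```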
If one needs to run another test on the same backend, in order to add more simulators, a different prefix can be used for isolation purposes:

```shell
simulator.sh --enforcers 3000 \
  --namespace /aporeto/test \
  --prefix complementary-batches
```
There is a cleanup switch which deletes the Aporeto namespace `<base namespace>/<prefix>`, and, on the client cluster, all namespaces found with the prefix passed:

```shell
simulator.sh --cleanup --namespace /base/namespace --prefix simulator
```

So the above will delete the Aporeto namespace `/base/namespace/simulator` and the cluster namespaces that match the regular expression `simulator*`.
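Roughly, the cluster side of the cleanup amounts to something like the following (illustrative only; the script's actual implementation may differ):

```shell
# Delete every cluster namespace whose name starts with the given prefix
for ns in $(kubectl get namespaces -o name | grep '^namespace/simulator'); do
  kubectl delete "$ns"
done
```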
The simulator script wraps the procedures to deploy the scale test charts, which are described as follows.
To build the charts you can run:

```shell
helm package charts/enforcer-sim
```
You must first create an appcred, the same way you would to install an enforcer on Kubernetes:

```shell
apoctl appcred create enforcerd \
  --type k8s \
  --role "@auth:role=enforcer" \
  --namespace /target/namespace \
  | kubectl apply -f -
```
Alternatively, if you have a credentials file (it must be named `aporeto.creds`) locally:

```shell
kubectl create secret generic enforcerd --from-file=aporeto.creds
```
Then, to run the default test:

```shell
helm install nsim enforcer-sim-<release>.tgz
```
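If you want to tweak a few chart values without editing `values.yaml`, Helm's `--set` flag can be used; the values below are only an illustration:

```shell
helm install nsim enforcer-sim-<release>.tgz \
  --set pods=30 \
  --set simulatorsPerPod=10 \
  --set image.tag=release-5.0.14
```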
NOTE: GKE, at least, applies by default a LimitRange to the `default` namespace of 100m CPU per container. You might want to delete (or alter) it to achieve higher simulator density.
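To check for and remove such a LimitRange, something like the following should work (the LimitRange name varies by cluster, so inspect first):

```shell
# List LimitRanges in the default namespace, then delete the offending one
kubectl get limitrange -n default
kubectl delete limitrange <limitrange-name> -n default
```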
The charts accept the following values:

- `image.name` (required): enforcer image path (e.g. server/org/imagename)
- `image.tag` (required): enforcer image tag
- `simulatorImage.name` (required): plan generation image path (e.g. server/org/imagename)
- `simulatorImage.tag` (required): plan generation image tag
- `prefix`: prepended to resource names, allowing for multiple deployments in the same namespace
- `pods`: the number of pods to deploy
- `simulatorsPerPod`: the number of simulators to run per pod
- `pusPerSimulator`: the number of PUs per enforcer to simulate
- `puType`: the type of PUs
- `puMeta`: the external tags to add to PUs
- `flowsPerPU`: the number of flows per PU to simulate
- `initDelay`: the delay (in seconds) between bringing up consecutive simulators in a pod
- `restarts`: max simulator restarts before deleting a pod
- `puLife`: the PU lifecycle parameters (see the default values for details)
- `jitter`: the jitter parameters (see the default values for details)
- `log.level`: log level for the simulated enforcer
- `log.format`: log format for the simulated enforcer
- `enforcerTagPrefix`: prefix of the enforcer tag assigned to all enforcers
- `enforcerTag`: tag to assign to all enforcers
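As an example, a minimal custom values file could cover only the required image settings and be passed with `-f` (the file name and image tags here are illustrative):

```shell
cat > my-values.yaml <<'EOF'
image:
  name: "docker.io/aporeto/enforcerd"
  tag: "release-5.0.14"
simulatorImage:
  name: "docker.io/aporeto/benchmark"
  tag: "v1.17.0"
EOF
helm install nsim enforcer-sim-<release>.tgz -f my-values.yaml
```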
NOTE: The total number of simulators will be `pods * simulatorsPerPod`. How to split the
simulators into pods is up to you. Bear in mind that larger pods (more simulators per pod) make managing the test a bit easier
(fewer k8s objects) and could prove a bit more efficient (less pod overhead and fewer issues with per-node pod limits),
but give away some flexibility and robustness (e.g. if a simulator reaches the restarts limit, the
whole pod goes down with it). A suggestion is:

- `simulatorsPerPod: 10` for bulk testing, e.g. the first few thousand simulators in a large test.
- As low as `simulatorsPerPod: 2` (or even lower) in the more unstable, later phases of a test (when simulator timeouts and thus restarts are more likely).
Generally, the limiting factor for simulator density is the simulator's memory consumption. Testing
has shown that a single simulator container will use ~45-55 MiB at runtime. A single n1-standard-4
node should be able to handle up to 200 simulators (assuming that the node is not running anything
else): roughly 200 × ~50 MiB ≈ 10 GiB, which fits within the ~15 GB of memory of that machine type.
We have tested the simulator test harness with the following configuration.

Kubernetes cluster configured with two node pools:

- default: reserved for system pods, nodes labeled as `pod: system`, on `e2-standard-8` machines
- workload: to be used for the simulator pods, nodes labeled as `pod: workload`, on `e2-standard-32` machines
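If you are reproducing this setup, labeling the nodes in each pool could look like the following (node names are placeholders; on managed clusters the labels are typically set on the node pool itself):

```shell
kubectl label nodes <system-pool-node> pod=system
kubectl label nodes <workload-pool-node> pod=workload
```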
Simulator pods will be deployed with topology spread constraints on zone and node, with a maximum skew of 1. This keeps the pod load balanced across the cluster.
In our experiments we observed each pod containing 10 simulators consume ~450Mi of memory and ~100m of CPU. This guidance allowed us to pick node sizes and images based on the test requirements.
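One way to sanity-check the spread once a batch is running (the namespace name is a placeholder):

```shell
# Count simulator pods per node; the counts should stay within a skew of 1
kubectl get pods -n <simulator-namespace> -o wide --no-headers | awk '{print $7}' | sort | uniq -c
```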
The plans for the simulators are generated at runtime by the `plan-gen` utility of the benchmark suite. This runs in an init container per pod, producing the plans for that pod's simulators. Refer to the benchmark suite for further details.
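If plan generation needs debugging, the init container's logs can be inspected with something like the following (the pod and init container names depend on the chart and are placeholders here):

```shell
kubectl logs -n <simulator-namespace> <simulator-pod-name> -c <plan-gen-init-container>
```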