BOSH Splitter is a tool to help with large BOSH deployments that have a high instance count for a subset of jobs. In large deployments, Cloud Foundry DEA runners and Diego cells are jobs that scale above 100 instances. BOSH Splitter works by carving a single large deployment into multiple smaller deployments.
Updating a large Cloud Foundry deployment (500+ DEAs) results in BOSH deployments that take 36+ hours, especially if stemcells also need to be upgraded. Breaking a single deployment into multiple smaller deployments makes it easier for Ops to manage.
The tool also removes the need to create and maintain multiple deployment repos, and it still generates manifest.yml to allow for rollback.
You need to have the following:
- Spruce 1.8.1 or newer
- A Genesis style deployment architecture
The Cloud Foundry deployment in useast1-sb was modified to extract each group of runners into its own deployment. Implementation was done in the following order:
- Creation of the split bash script
- Modification of the environment-level Genesis Makefile
- Modification of the site-level networking.yml
- Creation of the core and runner manifests
- Deployment of the core manifest
- Deployment of each of the runner manifests
Creating the script is a one-time activity and has already been done in this repo. To use BOSH Splitter in another deployment repo, copy the script into that repo's bin/ folder and make it executable.
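The Makefile target further down calls the script as ../../bin/split from <site>/<env>, so it has to live in the top-level bin/ folder of the repo. A minimal sketch of the copy step, assuming both repos share that layout (install_split is a hypothetical helper name, not something this repo provides):

```shell
# Hypothetical helper, not part of this repo: copy bin/split from one
# deployment repo into another and make it executable.
# Usage: install_split <source-repo> <target-repo>
install_split() {
  local src="$1/bin/split" dest_dir="$2/bin"
  if [ ! -f "$src" ]; then
    echo "install_split: no script at $src" >&2
    return 1
  fi
  mkdir -p "$dest_dir" || return 1
  cp "$src" "$dest_dir/split" || return 1
  chmod +x "$dest_dir/split"
}
```

For example, install_split ~/projects/cloud-foundry-deployments ~/projects/other-deployments, where the second path is a placeholder for your own repo.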
A few notes about its execution:
- It requires command line parameters to be passed to it. See the Makefile for an example.
- A temporary folder and yml files are created in <site>/<env>/scripts and removed at the end of the script execution. If the script errors out, these files may be left behind; they can be ignored, as they will be removed on the next successful execution of the script.
- manifest.yml is used as the source of the split and should be updated before running BOSH Splitter.
- The script takes the list of jobs to split out of manifest.yml as command line arguments. Each job specified gets its own deployment manifest file.
- When BOSH Splitter is executed, the following files are created in the manifests/ folder:
  - core.yml - manifest.yml with the jobs specified on the command line removed. It preserves the deployment name from manifest.yml.
  - split_{job_name}.yml - manifest.yml with all but a single job removed. The job name is appended to the deployment name from manifest.yml, so deploying it with BOSH creates a new deployment.
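The naming rules above can be sketched as a small shell function that prints each output file next to the deployment name it carries. It is a hypothetical illustration of the naming scheme only (the hyphen separator matches the deployment names in the bosh deployments output shown later); it does not read or rewrite any YAML, and split_plan is not part of the split script.

```shell
# Hypothetical illustration of the output naming scheme; no YAML is touched.
# Usage: split_plan <deployment-name> <job>...
split_plan() {
  local name="$1" job
  shift
  # core.yml keeps the original deployment name.
  printf 'core.yml %s\n' "$name"
  # Each split job gets split_<job>.yml and the name <deployment>-<job>.
  for job in "$@"; do
    printf 'split_%s.yml %s-%s\n' "$job" "$name" "$job"
  done
}
```

Running split_plan useast1-sb-cloudfoundry runner_z1 runner_z2 prints core.yml useast1-sb-cloudfoundry followed by one line per job, such as split_runner_z1.yml useast1-sb-cloudfoundry-runner_z1.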
This step needs to be performed for each environment where BOSH Splitter will be used. Modify the Makefile to add a space-delimited list of the job names you would like stripped out of manifest.yml into their own deployment manifest files. A split target also needs to be added. The useast1/prod folder already contains a small Makefile that can be modified to suit your needs.
...
SPLIT_JOBS := "runner_z1 runner_z2"
...
split:
	@../../bin/split "$(BUILD_SITE)" "$(BUILD_ENV)" "$(TO)" "$(SPLIT_JOBS)"
...
The networks need to be split: one new network needs to be added for each job that is being split out. The IP range of each network must not overlap the range available to any other network.
In the useast1/prod example, one additional network was added for each of runner_z1 and runner_z2, each with an IP range different from those of cf1 and cf2.
Leveraging BOSH v2 may get around this restriction.
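A quick way to sanity-check the new ranges before deploying is to compare them numerically. The sketch below assumes plain IPv4 addresses and inclusive first/last bounds copied by hand from networking.yml; ip_to_int and ranges_overlap are hypothetical helper names.

```shell
# Hypothetical helpers for checking that two IPv4 ranges do not overlap.

# Convert a dotted-quad IPv4 address (e.g. 10.50.69.1) to an integer.
ip_to_int() {
  local IFS=.
  # Split the address on "." into the positional parameters $1..$4.
  set -- $1
  echo "$(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))"
}

# ranges_overlap FIRST1 LAST1 FIRST2 LAST2
# Exits 0 if the two inclusive ranges share at least one address.
ranges_overlap() {
  local a1 a2 b1 b2
  a1=$(ip_to_int "$1"); a2=$(ip_to_int "$2")
  b1=$(ip_to_int "$3"); b2=$(ip_to_int "$4")
  [ "$a1" -le "$b2" ] && [ "$b1" -le "$a2" ]
}
```

Two ranges overlap exactly when each one starts no later than the other ends, so a single pair of comparisons covers every case, including one range fully inside the other.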
Run the BOSH Splitter from the <site>/<env> folder in the deployment:
make split
Note that split should be run AFTER the manifest has been created, because the split script works by parsing manifest.yml.
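Because the split files are derived from manifest.yml, it is easy to deploy a stale split by accident after the manifest is regenerated. A minimal sketch of a pre-deploy check, assuming the manifests/ layout described above (split_is_current is a hypothetical name), compares file modification times:

```shell
# Hypothetical pre-deploy check: fail if manifest.yml is newer than any of
# the split outputs, i.e. the manifest was rebuilt but `make split` was not.
# Usage: split_is_current [manifests-dir]   (defaults to ./manifests)
split_is_current() {
  local dir="${1:-manifests}" f
  if [ ! -f "$dir/manifest.yml" ]; then
    echo "missing $dir/manifest.yml" >&2
    return 1
  fi
  # If no split_*.yml exists the glob stays literal and fails the -f test.
  for f in "$dir/core.yml" "$dir"/split_*.yml; do
    if [ ! -f "$f" ] || [ "$dir/manifest.yml" -nt "$f" ]; then
      echo "stale or missing: $f (re-run make split)" >&2
      return 1
    fi
  done
}
```

Modification times are only a heuristic; they catch the common case of regenerating manifest.yml and forgetting to re-run make split.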
To deploy Cloud Foundry without the runner_z1 and runner_z2 jobs, deploy manifests/core.yml:
bosh deployment manifests/core.yml
bosh deploy
Now we can deploy the new runner deployments:
bosh deployment manifests/split_runner_z1.yml
bosh deploy
bosh deployment manifests/split_runner_z2.yml
bosh deploy
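The walkthrough above deploys core.yml first, which worked for the useast1-sb sandbox. The pain points at the end of this document note that in an environment running apps, the runner deployments should exist before core.yml removes the original runners. The sketch below therefore deploys every split_*.yml first and core.yml last, using the same BOSH v1 CLI commands shown above; deploy_split and the BOSH override variable are hypothetical.

```shell
# Sketch: deploy each split_*.yml, then core.yml, with the BOSH v1 CLI
# commands used above. Set BOSH=echo to print the commands instead.
# Usage: deploy_split [manifests-dir]   (defaults to ./manifests)
deploy_split() {
  local dir="${1:-manifests}" bosh="${BOSH:-bosh}" f
  # Runner deployments first, so the original runners are only removed
  # (by deploying core.yml) once their replacements exist.
  for f in "$dir"/split_*.yml "$dir/core.yml"; do
    if [ ! -f "$f" ]; then
      echo "missing $f" >&2
      return 1
    fi
    # $bosh is left unquoted on purpose so BOSH may contain extra words.
    $bosh deployment "$f" || return 1
    $bosh deploy || return 1
  done
}
```

Running BOSH=echo deploy_split from <site>/<env> prints the command sequence without touching the director, which is a cheap way to review the order before a real run.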
Once deployed, you will see the two new deployments for the runners:
cweibel@sw-jumpbox:~/projects/cloud-foundry-deployments/useast1/sandbox/manifests$ bosh deployments
+-----------------------------------------+------------------------+-------------------------------------------------+
| Name | Release(s) | Stemcell(s) |
+-----------------------------------------+------------------------+-------------------------------------------------+
| useast1-sb-cloudfoundry | buildpacks/2 | bosh-aws-xen-hvm-ubuntu-trusty-go_agent/3262.14 |
| | cf/251 | |
| | shield/6.3.3 | |
| | toolbelt/3.2.10 | |
+-----------------------------------------+------------------------+-------------------------------------------------+
| useast1-sb-cloudfoundry-runner_z1 | buildpacks/2 | bosh-aws-xen-hvm-ubuntu-trusty-go_agent/3262.14 |
| | cf/251 | |
| | shield/6.3.3 | |
| | toolbelt/3.2.10 | |
+-----------------------------------------+------------------------+-------------------------------------------------+
| useast1-sb-cloudfoundry-runner_z2 | buildpacks/2 | bosh-aws-xen-hvm-ubuntu-trusty-go_agent/3262.14 |
| | cf/251 | |
| | shield/6.3.3 | |
| | toolbelt/3.2.10 | |
+-----------------------------------------+------------------------+-------------------------------------------------+
Looking at one of the new deployments, you will see the runners:
cweibel@sw-jumpbox:~/projects/cloud-foundry-deployments/useast1/sandbox/manifests$ bosh vms useast1-sb-cloudfoundry-runner_small_z1
+----------------------------------------------------+---------+-----+-----------+------------+
| VM | State | AZ | VM Type | IPs |
+----------------------------------------------------+---------+-----+-----------+------------+
| runner_z1/0 (ce5607c9-df72-4ce0-8aaa-438ee1012a53) | running | n/a | runner_z1 | 10.50.69.0 |
| runner_z1/1 (5910e60f-33f3-43f5-85c0-035b13f52170) | running | n/a | runner_z1 | 10.50.69.1 |
+----------------------------------------------------+---------+-----+-----------+------------+
VMs total: 2
- Modify any existing Concourse pipelines to deploy core.yml instead of manifest.yml
- Modify existing Concourse pipelines to add new tasks that deploy each runner group
To revert these changes and go back to a single deployment and manifest:
- Undo the network changes made in Step 3
- Run bosh delete deployment for each of the runner deployments
- Run bosh deployment manifests/manifest.yml; bosh deploy to redeploy the original runners
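The rollback steps above can be sketched as one shell function, assuming it is run from <site>/<env> after the networking.yml changes have been undone and that the split deployments follow the <deployment>-<job> naming seen in the bosh deployments output. rollback_split and the BOSH override variable are hypothetical.

```shell
# Sketch of the rollback steps above, to run from <site>/<env> after the
# networking.yml changes have been undone. Set BOSH=echo for a dry run.
# Usage: rollback_split <deployment-name> <job>...
rollback_split() {
  local name="$1" bosh="${BOSH:-bosh}" job
  shift
  # Split deployments are named <deployment>-<job>.
  for job in "$@"; do
    $bosh delete deployment "$name-$job" || return 1
  done
  # Redeploy the original, unsplit manifest to bring the runners back.
  $bosh deployment manifests/manifest.yml || return 1
  $bosh deploy
}
```

For the sandbox example this would be rollback_split useast1-sb-cloudfoundry runner_z1 runner_z2.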
It may be impractical to apply the splitter to the existing runners or cells of a large deployment. In that case, the easiest path forward is:
- Perform the split before scaling out Diego cells or DEA runners
- Allocate a new /23 (or similarly sized) network for each new cell grouping, and preserve the existing range for the CF core, database, brain, cc_bridge, route_emitter and access jobs
- Explore moving to BOSH v2, so that the networks do not need to be carved into groups to keep deployments from having overlapping IP ranges
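Sizing the new networks is simple arithmetic: a /23 spans 2^(32-23) = 512 addresses, so consecutive /23 networks start 512 addresses apart. The sketch below prints COUNT such networks from an aligned base address; the helper names are hypothetical and any base address you pass in (for example 10.50.68.0) is only illustrative.

```shell
# Hypothetical helpers: list COUNT consecutive /23 networks from BASE.
ip_to_int() {
  local IFS=.
  set -- $1
  echo "$(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))"
}

int_to_ip() {
  echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# carve_23s BASE COUNT: BASE must be a /23 boundary (a multiple of 512).
carve_23s() {
  local base count i=0
  base=$(ip_to_int "$1"); count="$2"
  if [ $(( base % 512 )) -ne 0 ]; then
    echo "$1 is not on a /23 boundary" >&2
    return 1
  fi
  while [ "$i" -lt "$count" ]; do
    # Each /23 holds 2^(32-23) = 512 addresses.
    echo "$(int_to_ip $(( base + i * 512 )))/23"
    i=$(( i + 1 ))
  done
}
```

Keeping every grouping on its own /23 boundary means no two groupings can share an address, which is the property the split networks need.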
The following pain points were identified while creating the PoC with useast1-sb:
- The two existing networks (cf1 and cf2) needed to be split into 6, using the reserved: setting to carve out groups of IP ranges that did not overlap. This is needed because otherwise each deployment will try to use the same IP ranges and fail. If any IPs overlap any of the split groups, the VMs will be recreated.
- The new runner deployments have to be created before the existing jobs (runners or otherwise) are removed. Otherwise you may end up in a situation where there are not enough resources to run apps.
- On some fixed-size infrastructures there may not be enough physical resources or IPs to run double the required number of runners, which is what is needed between creating all the runner deployments and deploying core.yml to remove the runners from the original deployment.
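The reserved: approach from the first pain point can be sketched as a complement calculation: each split network marks everything outside its own slice of the shared subnet as reserved. The sketch below assumes you pass the first and last usable addresses of the subnet and of the slice, and prints the ranges in a first - last form; check that form against your own networking.yml. All helper names are hypothetical.

```shell
# Hypothetical helpers: given the usable range of a shared subnet and the
# slice one split deployment should own, print the ranges to mark reserved.
ip_to_int() {
  local IFS=.
  set -- $1
  echo "$(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))"
}

int_to_ip() {
  echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# reserved_for_slice SUBNET_FIRST SUBNET_LAST SLICE_FIRST SLICE_LAST
reserved_for_slice() {
  local lo hi s e
  lo=$(ip_to_int "$1"); hi=$(ip_to_int "$2")
  s=$(ip_to_int "$3"); e=$(ip_to_int "$4")
  if [ "$s" -lt "$lo" ] || [ "$e" -gt "$hi" ] || [ "$s" -gt "$e" ]; then
    echo "slice $3-$4 is not inside $1-$2" >&2
    return 1
  fi
  # Everything below the slice...
  if [ "$s" -gt "$lo" ]; then
    echo "$1 - $(int_to_ip "$(( s - 1 ))")"
  fi
  # ...and everything above it is reserved for this deployment's network.
  if [ "$e" -lt "$hi" ]; then
    echo "$(int_to_ip "$(( e + 1 ))") - $2"
  fi
}
```

Running it once per split network, with slices that do not overlap one another, yields reserved lists under which no two deployments can be handed the same IP.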