Merge pull request #1 from rfisher001/lab-4.12To4.14
Draft 1 of introductory sections - Dec. 8, 2023
rfisher001 authored Dec 19, 2023
2 parents b8e7743 + 41be28f commit 893c376
Showing 16 changed files with 800 additions and 7 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -1 +1 @@
This is the source code for [https://labs.sysdeseng.com/hypershift-baremetal-lab/4.13/index.html](https://labs.sysdeseng.com/hypershift-baremetal-lab/4.13/index.html)
Please excuse our dust... we are working on the build of this lab.
Empty file.
31 changes: 31 additions & 0 deletions documentation/modules/ROOT/pages/API-Compatibility.adoc
@@ -0,0 +1,31 @@
= OCP API Compatibility Policy
include::_attributes.adoc[]
:profile: core-lcm-lab

== CNF API Compatibility
OpenShift and Kubernetes derive much of their strength from using APIs for so many functions. This also means that the APIs change over time as components are updated. It is therefore important to verify that an API call is still compatible with the cluster so that you receive the desired response. Please review the documentation section https://docs.openshift.com/container-platform/4.14/rest_api/understanding-api-support-tiers.html[“Understanding API Tiers”].
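
One way to check whether workloads are still calling APIs that are scheduled for removal is the APIRequestCount API (a sketch, assuming OCP 4.9 or later, where this API is available):

[source,bash]
----
# List API resources that will be removed in an upcoming release and
# are still receiving requests on this cluster.
oc get apirequestcounts -o jsonpath='{range .items[?(@.status.removedInRelease!="")]}{.status.removedInRelease}{"\t"}{.metadata.name}{"\n"}{end}'
----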

The most important thing to understand when choosing which Z-release of a new Y-release to upgrade to is which patches need to be present in that Z-release. In other words, if you are currently at OCP 4.11.28, you need to upgrade to a Z-release of 4.12 that contains all of the patches that were applied to 4.11.28; otherwise you will break the built-in compatibility of Kubernetes.

== Kubernetes Version Skew
Each cluster component must maintain support for specific API versions, and new component releases bring new APIs, so the changes (or skew) between API versions have to be managed. To a certain extent the APIs remain compatible across several releases of a component. The supported skew between components and releases is documented at: https://kubernetes.io/releases/version-skew-policy
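
As a quick sanity check, you can compare the control-plane version with the kubelet version reported by each node to spot any skew (a minimal sketch):

[source,bash]
----
# Show the client and cluster (control-plane) versions.
oc version
# Show the kubelet version running on each node.
oc get nodes -o custom-columns=NAME:.metadata.name,KUBELET:.status.nodeInfo.kubeletVersion
----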

The easiest way to verify that your application functionality will still work is to make sure that you follow the Kubernetes version skew policy linked above.



== OpenShift Upgrade Path
Please also note that not every release of OCP can be upgraded to an arbitrary Z-release, even if it contains all of the required patches.
The OpenShift upgrade process mandates that:

* if fix “A” is present in a specific X.Y.Z release of OCP,
* then fix “A” MUST also be present in the X.Y+1.Z release that OCP is upgraded TO.

As a consequence, the chosen destination version of 4.12.z defines the maximum versions of OCP 4.11.z, OCP 4.10.z, and OCP 4.9.z from which it can be reached:

* not every 4.9.z version will permit an upgrade to a given version of OCP 4.12.z;
* a given version of OCP 4.12.z imposes a maximum version of OCP 4.9.z that can upgrade to it.

This is due to how fixes are backported into older releases of OCP.

You can use the https://access.redhat.com/labs/ocpupgradegraph/update_path[upgrade graph tool] to determine if the path is valid for your z-release. You should also always verify with your Sales Engineer or Technical Account Manager at Red Hat to make sure the upgrade path is valid for Telco implementations.
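
You can also inspect the update targets the cluster itself currently recommends (a sketch; the channel name is illustrative):

[source,bash]
----
# Show the current channel and the recommended update targets.
oc adm upgrade
# Switch to the channel that contains the desired destination release.
oc adm upgrade channel eus-4.12
----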

.K8s Version Skew
image::../assets/images/k8s-vers-skew.png[]
62 changes: 62 additions & 0 deletions documentation/modules/ROOT/pages/Applying-MCPs.adoc
@@ -0,0 +1,62 @@
= Applying MCPs
include::_attributes.adoc[]
:profile: core-lcm-lab

First, run “oc get mcp” to show your current list of MCPs:

[source,bash]
----
# oc get mcp
----

List out all of your nodes:

[source,bash]
----
# oc get no
----
Determine, from the above suggestions, how you would like to separate out your worker nodes into machine config pools
(MCP). +
In this example we will just use 1 node in each MCP. +
We first need to label the nodes so that they can be put into MCPs. We will do this with the following command:

[source,bash]
----
oc label node euschannel-worker-0.test.corp node-role.kubernetes.io/mcp-1=
----

The new label will show up when you run the “oc get node” command:

[source,bash]
----
# oc get no
----

Now you need to create one YAML file per MCP that turns the labels into pools, using the machineconfiguration.openshift.io/v1 API.
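
A manifest might look like the following sketch; the pool name mcp-1 matches the node label applied earlier, and the field layout follows the canary-rollout pattern, so adapt the names to your own labels:

[source,yaml]
----
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: mcp-1
spec:
  machineConfigSelector:
    matchExpressions:
      # Members of this pool receive both worker and mcp-1 machine configs.
      - key: machineconfiguration.openshift.io/role
        operator: In
        values: [worker, mcp-1]
  nodeSelector:
    matchLabels:
      # Selects the nodes labeled earlier with node-role.kubernetes.io/mcp-1=
      node-role.kubernetes.io/mcp-1: ""
----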

For each of these, just run “oc apply -f {filename.yaml}”:

[source,bash]
----
# oc apply -f test-mcp-2.yaml
----

Now you can run “oc get mcp” again and your new MCPs will appear. Please note that you will still see the original worker
and master MCPs that are part of the cluster.

[source,bash]
----
# oc get mcp
----
46 changes: 46 additions & 0 deletions documentation/modules/ROOT/pages/CNF-Upgrade-Prep.adoc
@@ -0,0 +1,46 @@
= CNF Upgrade Preparation
include::_attributes.adoc[]
:profile: core-lcm-lab

The life of a POD is an important topic to understand. This section describes several practices that are important to
keeping your CNF PODs healthy and to allowing the cluster to schedule them properly during an upgrade.

== CNF Requirements Document

Before you go any further, please read through the https://connect.redhat.com/sites/default/files/2022-05/Cloud%20Native%20Network%20Function%20Requirements%201-3.pdf[CNF requirements document].
This section discusses a few of the most important points, but the CNF Requirements Document provides additional
detail and covers other important topics.

== POD Disruption Budget

Each set of PODs in a deployment can be given a minimum number of PODs that must keep running in order to avoid
disrupting the functionality of the CNF; this is called the POD disruption budget (PDB). However, this budget can be
improperly configured. +
For example, if you have 4 PODs in a deployment and your PDB is set to 4, you are telling the scheduler that you NEED
4 PODs running at all times. In this scenario, ZERO PODs can come down.

.Deployment with no PDB
image::../assets/images/PDB-full.jpg[]

To fix this, the PDB can be set to 2, allowing 2 of the 4 PODs to be down at a time, which in turn allows the worker
nodes where those PODs are located to be rebooted.

.Deployment with PDB
image::../assets/images/PDB-down-2.jpg[]
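
A PDB for the corrected four-pod example might look like the following sketch, assuming the deployment's PODs carry the illustrative label app: my-cnf:

[source,yaml]
----
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-cnf-pdb
spec:
  # With 4 replicas, keeping a minimum of 2 available lets 2 PODs come
  # down at a time, so their worker nodes can be drained and rebooted.
  minAvailable: 2
  selector:
    matchLabels:
      app: my-cnf
----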

== POD Anti-affinity

True high availability requires a duplication of a process to be running on separate hardware, thus making sure that an
application will continue to run if one piece of hardware goes down. OpenShift can easily make that happen since
processes are automatically duplicated in separate PODs within a deployment. However, those PODs need to have
anti-affinity set on them so that they are NOT running on the same hardware. It so happens that anti-affinity also
helps during upgrades because it makes sure that PODs are on different worker nodes, therefore allowing enough PODs to
come down even after considering their PDB.
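
A minimal sketch of what this looks like in a deployment's POD template (under spec.template.spec), again assuming the illustrative label app: my-cnf:

[source,yaml]
----
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: my-cnf
        # Spread the PODs across distinct worker nodes: never co-locate
        # two PODs of this app on the same host.
        topologyKey: kubernetes.io/hostname
----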

== Liveness / Readiness Probes

OpenShift and Kubernetes have built-in features, which not everyone takes advantage of, called
https://docs.openshift.com/container-platform/4.12/applications/application-health.html[liveness and readiness probes].
These are very important when POD deployments depend on keeping state for their application. This document
won't go into detail regarding these probes, but please review the https://docs.openshift.com/container-platform/4.12/applications/application-health.html[OpenShift documentation]
on how to implement their use.
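
For reference, a container spec fragment with both probes; the paths, port, and timings are illustrative:

[source,yaml]
----
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
----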
69 changes: 69 additions & 0 deletions documentation/modules/ROOT/pages/OCP-upgrade-prep.adoc
@@ -0,0 +1,69 @@
= OCP Upgrade Preparation
include::_attributes.adoc[]
:profile: core-lcm-lab

== Firmware compatibility

All hardware vendors advise that it is best to be on the latest certified version of firmware for their
hardware. In the telco world this comes with a trust-but-verify approach, due to the high-throughput nature of telco
CNFs. It is therefore important to have a regimented group that can test the current and latest firmware from any vendor
to make sure that all components work with both. Upgrading firmware in conjunction with an OCP upgrade is not always
recommended; however, testing against the latest firmware release where possible will improve the odds that
you won't run into issues down the road. +
Whether to upgrade firmware is an important debate, because the process can be very intrusive and can leave a node
requiring manual intervention before it comes back online. On the other hand, it may be imperative to
upgrade the firmware for security fixes, new required functionality, or compatibility with components of the new OCP
release. Therefore, it is up to each team to verify compatibility with their hardware vendors and with OCP
components, and to perform tests in their lab before moving forward.

== Layered product compatibility

It is important to make sure all layered products will run on the new version of OCP that you are moving to. This
very much includes all Operators.

Verify the list of Operators currently installed on your cluster. For example:
[source,bash]
----
# oc get csv -A
NAMESPACE NAME DISPLAY VERSION REPLACES PHASE
chapter2 gitlab-operator-kubernetes.v0.17.2 GitLab 0.17.2 gitlab-operator-kubernetes.v0.17.1 Succeeded
openshift-operator-lifecycle-manager packageserver Package Server 0.19.0 Succeeded
----
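
It can also help to check which update channel each Operator subscription tracks when confirming compatibility with the target OCP release (a sketch):

[source,bash]
----
# List Operator subscriptions and the channels they follow.
oc get subscriptions -A -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,CHANNEL:.spec.channel
----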

== Prepare MCPs

Prepare your Machine Config Pool (MCP) labels by grouping your nodes, depending on the number of nodes in your cluster.
MCPs are typically split into groups of 8 to 10 nodes. However, there is no hard and fast rule for how many nodes need to
be in each MCP. The purpose of these MCPs is to group nodes together so that a group of nodes can be controlled
independently of the rest. Additional information and examples can be found https://docs.openshift.com/container-platform/4.12/updating/update-using-custom-machine-config-pools.html[here, under the canary rollout documentation]. +
These MCPs will be used to un-pause a set of nodes during the upgrade process, allowing them to be upgraded and
rebooted at a determined time instead of at the pleasure of the scheduler. Please review the upgrade process flow
section, below, for more details on the pause/un-pause process, and see the sketch that follows.
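
Pausing and un-pausing a pool is a simple patch, as in this sketch (the pool name mcp-1 is illustrative):

[source,bash]
----
# Pause the pool before starting the upgrade...
oc patch mcp mcp-1 --type merge --patch '{"spec":{"paused":true}}'
# ...then un-pause it when its nodes should upgrade and reboot.
oc patch mcp mcp-1 --type merge --patch '{"spec":{"paused":false}}'
----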

// insert image for MCP
.Worker node MCPs in a 5 rack cluster
image::../assets/images/5Rack-MCP.jpg[]

The division and size of these MCPs can vary depending on many factors. In general, the standard division is between 8
and 10 nodes per MCP, which allows the operations team to control how many nodes are taken down at a time.

.Separate MCPs inside of a group of Load Balancer or purpose built nodes
image::../assets/images/LBorHT-MCP.jpg[]

In larger clusters there is quite often a need to separate out several nodes for load balancing or other
high-throughput purposes; these nodes usually have different machine sets to configure SR-IOV. In these cases we do not want
to upgrade all of these nodes without getting a chance to test during the upgrade. Therefore, we need to separate them
out into at least 3 different MCPs and un-pause them individually.

// insert image for MCP
.Small cluster worker MCPs
image::../assets/images/Worker-MCP.jpg[]

The figure above shows a smaller cluster example with a single rack.

The process and pace at which you un-pause the MCPs is determined by your CNFs and their configuration. Please review
the sections on PDB and anti-affinity for CNFs. If your CNFs can properly handle scheduling within an OpenShift cluster,
you can un-pause several MCPs at a time and set maxUnavailable as high as 50%, allowing as many as half
of the nodes in those MCPs to restart and upgrade. This reduces the time needed for a specific
maintenance window and lets your cluster upgrade quickly. Hopefully you can see how keeping your PDB and
anti-affinity correctly configured will help in the long run.
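
Setting a pool's maxUnavailable can likewise be done with a patch; a sketch, mirroring the 50% example above:

[source,bash]
----
# Allow up to half of the nodes in this pool to update at once.
oc patch mcp mcp-1 --type merge --patch '{"spec":{"maxUnavailable":"50%"}}'
----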
5 changes: 5 additions & 0 deletions documentation/modules/ROOT/pages/_attributes.adoc
@@ -0,0 +1,5 @@
:experimental:
:source-highlighter: highlightjs
:branch: lab-4.14
:github-repo: https://github.com/RHsyseng/5g-ran-deployments-on-ocp-lab/blob/{branch}
:profile: core-lcm-lab
4 changes: 2 additions & 2 deletions documentation/modules/ROOT/pages/index.adoc
@@ -16,10 +16,10 @@ IMPORTANT: This is not an official Red Hat Training. Please contact with your Re
|===
|Name |Role

|Alice
|Rob Fisher
|Lab Developer / Maintainer

|Bob
|
|Lab Reviewer

|===
37 changes: 33 additions & 4 deletions documentation/modules/ROOT/pages/introduction.adoc
Original file line number Diff line number Diff line change
@@ -1,20 +1,49 @@
= Introduction
include::_attributes.adoc[]
:profile: telco-ocp-upgrade-lab
:profile: core-lcm-lab

Welcome to this Telco OCP Upgrades Lab.

Why are upgrades important?
This is the question that all Telco platform administrators are faced with answering.

There is the simple response: because there is a new release out there.

The CNF (cloud-native network function) based response: the CNF requires additional functionality from the platform, therefore we need to upgrade OpenShift to get the new functionality.

The pragmatic response: all software needs to be patched because of bugs and potential security vulnerabilities that have been found since the installation of the cluster.

With OpenShift and K8s (Kubernetes) we add yet another: the platform is only supported for a specific period of time, and new releases come out every 4 months.

Platform administrators are also asked: why not just wait to upgrade until support requires it?

If we put all of these together, we find that OpenShift and K8s have made this less painful for Telco companies, because every other release can be skipped. OpenShift provides long-term support (EUS, Extended Update Support) on all even-numbered releases, along with upgrade paths between these EUS releases. This document discusses how to best upgrade from one EUS release to the next and how to plan for that upgrade.

We want you to experience and learn the steps and procedures that are needed to perform an OpenShift in-service upgrade. Listening to and reading about an upgrade can only give you so much experience; this lab turns that discussion into hands-on practice that walks you through the steps.

This document discusses OpenShift lifecycle management for Telecommunication Network Function Core clusters. Cluster size introduces a few differences, which are called out specifically at times in this document, but the material is meant to cover most clusters, from 3 nodes up to the largest cluster certified by the telco scale team, including some scenarios for mixed-workload clusters.

This document will discuss the following upgrade scenarios:

* Z-Stream
* Y-Stream
* EUS to EUS

Each of these scenarios has different considerations, which are called out as needed.


[#lab-aim]
== Who is this lab aimed at?

The lab is aimed at technical roles working with OpenShift who are interested in any of these areas:

* Foo
* Bar
* LifeCycle Management of OpenShift
* Telecommunications Core 5G Baremetal Cluster Upgrades
* Disconnected Environments

[#lab-software-versions]
== Lab Software Versions

The lab is based on the following software versions.

* OpenShift Container Platform v4.X
* OpenShift Container Platform v4.12 upgrading to v4.14