Merge pull request #1 from rfisher001/lab-4.12To4.14
Draft 1 of introductory sections - Dec. 8, 2023
rfisher001 authored Dec 19, 2023
2 parents b8e7743 + 41be28f commit 893c376
Showing 16 changed files with 800 additions and 7 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -1 +1 @@
This is the source code for [https://labs.sysdeseng.com/hypershift-baremetal-lab/4.13/index.html](https://labs.sysdeseng.com/hypershift-baremetal-lab/4.13/index.html)
Please excuse our dust... we are working on the build of this lab.
Empty file.
31 changes: 31 additions & 0 deletions documentation/modules/ROOT/pages/API-Compatibility.adoc
@@ -0,0 +1,31 @@
= OCP API Compatibility Policy
include::_attributes.adoc[]
:profile: core-lcm-lab

== CNF API Compatibility
OpenShift and Kubernetes derive much of their strength from using APIs for so many functions. This also means that the APIs change over time as components are updated. It is therefore important to verify that an API call is still compatible with the cluster so that you receive the desired response. Please review the documentation section https://docs.openshift.com/container-platform/4.14/rest_api/understanding-api-support-tiers.html[“Understanding API Tiers”].
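
One way to check whether workloads are still calling APIs that are scheduled for removal is the APIRequestCount API (a sketch, assuming OCP 4.9 or later, where this API is available):

[source,bash]
----
# List API resources that will be removed in an upcoming release and
# are still receiving requests on this cluster.
oc get apirequestcounts -o jsonpath='{range .items[?(@.status.removedInRelease!="")]}{.status.removedInRelease}{"\t"}{.metadata.name}{"\n"}{end}'
----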

The most important thing to understand when choosing which Z-release of a new Y-release to upgrade to is which patches need to be present in that Z-release. In other words, if you are currently at OCP 4.11.28, you need to upgrade to a Z-release of 4.12 that contains all of the patches that were applied to 4.11.28; otherwise you will break the built-in compatibility of Kubernetes.

== Kubernetes Version Skew
Each cluster component must maintain support for specific API versions, and new component releases bring new APIs, so the changes (or skew) between API versions have to be managed. To a certain extent the APIs remain compatible across several releases of a component. The supported skew between components and releases is documented at: https://kubernetes.io/releases/version-skew-policy
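
As a quick sanity check, you can compare the control-plane version with the kubelet version reported by each node to spot any skew (a minimal sketch):

[source,bash]
----
# Show the client and cluster (control-plane) versions.
oc version
# Show the kubelet version running on each node.
oc get nodes -o custom-columns=NAME:.metadata.name,KUBELET:.status.nodeInfo.kubeletVersion
----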

The easiest way to verify that your application functionality will still work is to make sure that you follow the Kubernetes version skew policy linked above.



== OpenShift Upgrade Path
Please also note that not every release of OCP can be upgraded to an arbitrary Z-release, even if it contains all of the required patches.
The OpenShift upgrade process mandates that:

* if fix “A” is present in a specific X.Y.Z release of OCP,
* then fix “A” MUST also be present in the X.Y+1.Z release that OCP is upgraded TO.

As a consequence, the chosen destination version of 4.12.z defines the maximum versions of OCP 4.11.z, OCP 4.10.z, and OCP 4.9.z from which it can be reached:

* not every 4.9.z version will permit an upgrade to a given version of OCP 4.12.z;
* a given version of OCP 4.12.z imposes a maximum version of OCP 4.9.z that can upgrade to it.

This is due to how fixes are backported into older releases of OCP.

You can use the https://access.redhat.com/labs/ocpupgradegraph/update_path[upgrade graph tool] to determine if the path is valid for your z-release. You should also always verify with your Sales Engineer or Technical Account Manager at Red Hat to make sure the upgrade path is valid for Telco implementations.
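
You can also inspect the update targets the cluster itself currently recommends (a sketch; the channel name is illustrative):

[source,bash]
----
# Show the current channel and the recommended update targets.
oc adm upgrade
# Switch to the channel that contains the desired destination release.
oc adm upgrade channel eus-4.12
----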

.K8s Version Skew
image::../assets/images/k8s-vers-skew.png[]
62 changes: 62 additions & 0 deletions documentation/modules/ROOT/pages/Applying-MCPs.adoc
@@ -0,0 +1,62 @@
= Applying MCPs
include::_attributes.adoc[]
:profile: core-lcm-lab

First, run “oc get mcp” to show your current list of MCPs:

[source,bash]
----
# oc get mcp
----

List out all of your nodes:

[source,bash]
----
# oc get no
----
Determine, from the above suggestions, how you would like to separate out your worker nodes into machine config pools
(MCP). +
In this example we will just use 1 node in each MCP. +
We first need to label the nodes so that they can be put into MCPs. We will do this with the following command:

[source,bash]
----
oc label node euschannel-worker-0.test.corp node-role.kubernetes.io/mcp-1=
----

The new label will show up when you run the “oc get node” command:

[source,bash]
----
# oc get no
----

Now you need to create one YAML file per MCP that turns the labels into pools, using the machineconfiguration.openshift.io/v1 API.
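
A manifest might look like the following sketch; the pool name mcp-1 matches the node label applied earlier, and the field layout follows the canary-rollout pattern, so adapt the names to your own labels:

[source,yaml]
----
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: mcp-1
spec:
  machineConfigSelector:
    matchExpressions:
      # Members of this pool receive both worker and mcp-1 machine configs.
      - key: machineconfiguration.openshift.io/role
        operator: In
        values: [worker, mcp-1]
  nodeSelector:
    matchLabels:
      # Selects the nodes labeled earlier with node-role.kubernetes.io/mcp-1=
      node-role.kubernetes.io/mcp-1: ""
----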

For each of these, just run “oc apply -f {filename.yaml}”:

[source,bash]
----
# oc apply -f test-mcp-2.yaml
----

Now you can run “oc get mcp” again and your new MCPs will appear. Please note that you will still see the original worker
and master MCPs that are part of the cluster.

[source,bash]
----
# oc get mcp
----
46 changes: 46 additions & 0 deletions documentation/modules/ROOT/pages/CNF-Upgrade-Prep.adoc
@@ -0,0 +1,46 @@
= CNF Upgrade Preparation
include::_attributes.adoc[]
:profile: core-lcm-lab

The life of a POD is an important topic to understand. This section describes several practices that are important to
keeping your CNF PODs healthy and to allowing the cluster to schedule them properly during an upgrade.

== CNF Requirements Document

Before you go any further, please read through the https://connect.redhat.com/sites/default/files/2022-05/Cloud%20Native%20Network%20Function%20Requirements%201-3.pdf[CNF requirements document].
This section discusses a few of the most important points, but the CNF Requirements Document provides additional
detail and covers other important topics.

== POD Disruption Budget

Each set of PODs in a deployment can be given a minimum number of PODs that must keep running in order to avoid
disrupting the functionality of the CNF; this is called the POD disruption budget (PDB). However, this budget can be
improperly configured. +
For example, if you have 4 PODs in a deployment and your PDB is set to 4, you are telling the scheduler that you NEED
4 PODs running at all times. In this scenario, ZERO PODs can come down.

.Deployment with no PDB
image::../assets/images/PDB-full.jpg[]

To fix this, the PDB can be set to 2, allowing 2 of the 4 PODs to be down at a time, which in turn allows the worker
nodes where those PODs are located to be rebooted.

.Deployment with PDB
image::../assets/images/PDB-down-2.jpg[]
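
A PDB for the corrected four-pod example might look like the following sketch, assuming the deployment's PODs carry the illustrative label app: my-cnf:

[source,yaml]
----
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-cnf-pdb
spec:
  # With 4 replicas, keeping a minimum of 2 available lets 2 PODs come
  # down at a time, so their worker nodes can be drained and rebooted.
  minAvailable: 2
  selector:
    matchLabels:
      app: my-cnf
----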

== POD Anti-affinity

True high availability requires a duplication of a process to be running on separate hardware, thus making sure that an
application will continue to run if one piece of hardware goes down. OpenShift can easily make that happen since
processes are automatically duplicated in separate PODs within a deployment. However, those PODs need to have
anti-affinity set on them so that they are NOT running on the same hardware. It so happens that anti-affinity also
helps during upgrades because it makes sure that PODs are on different worker nodes, therefore allowing enough PODs to
come down even after considering their PDB.
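
A minimal sketch of what this looks like in a deployment's POD template (under spec.template.spec), again assuming the illustrative label app: my-cnf:

[source,yaml]
----
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: my-cnf
        # Spread the PODs across distinct worker nodes: never co-locate
        # two PODs of this app on the same host.
        topologyKey: kubernetes.io/hostname
----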

== Liveness / Readiness Probes

OpenShift and Kubernetes have built-in features, which not everyone takes advantage of, called
https://docs.openshift.com/container-platform/4.12/applications/application-health.html[liveness and readiness probes].
These are very important when POD deployments depend on keeping state for their application. This document
won't go into detail regarding these probes, but please review the https://docs.openshift.com/container-platform/4.12/applications/application-health.html[OpenShift documentation]
on how to implement their use.
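
For reference, a container spec fragment with both probes; the paths, port, and timings are illustrative:

[source,yaml]
----
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
----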
69 changes: 69 additions & 0 deletions documentation/modules/ROOT/pages/OCP-upgrade-prep.adoc
@@ -0,0 +1,69 @@
= OCP Upgrade Preparation
include::_attributes.adoc[]
:profile: core-lcm-lab

== Firmware compatibility

All hardware vendors advise that it is best to be on the latest certified version of firmware for their
hardware. In the telco world this comes with a trust-but-verify approach, due to the high-throughput nature of telco
CNFs. It is therefore important to have a regimented group that can test the current and latest firmware from any vendor
to make sure that all components work with both. Upgrading firmware in conjunction with an OCP upgrade is not always
recommended; however, testing against the latest firmware release where possible will improve the odds that
you won't run into issues down the road. +
Whether to upgrade firmware is an important debate, because the process can be very intrusive and can leave a node
requiring manual intervention before it comes back online. On the other hand, it may be imperative to
upgrade the firmware for security fixes, new required functionality, or compatibility with components of the new OCP
release. Therefore, it is up to each team to verify compatibility with their hardware vendors and with OCP
components, and to perform tests in their lab before moving forward.

== Layered product compatibility

It is important to make sure all layered products will run on the new version of OCP that you are moving to. This
very much includes all Operators.

Verify the list of Operators currently installed on your cluster. For example:
[source,bash]
----
# oc get csv -A
NAMESPACE NAME DISPLAY VERSION REPLACES PHASE
chapter2 gitlab-operator-kubernetes.v0.17.2 GitLab 0.17.2 gitlab-operator-kubernetes.v0.17.1 Succeeded
openshift-operator-lifecycle-manager packageserver Package Server 0.19.0 Succeeded
----
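
It can also help to check which update channel each Operator subscription tracks when confirming compatibility with the target OCP release (a sketch):

[source,bash]
----
# List Operator subscriptions and the channels they follow.
oc get subscriptions -A -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,CHANNEL:.spec.channel
----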

== Prepare MCPs

Prepare your Machine Config Pool (MCP) labels by grouping your nodes, depending on the number of nodes in your cluster.
MCPs are typically split into groups of 8 to 10 nodes. However, there is no hard and fast rule for how many nodes need to
be in each MCP. The purpose of these MCPs is to group nodes together so that a group of nodes can be controlled
independently of the rest. Additional information and examples can be found https://docs.openshift.com/container-platform/4.12/updating/update-using-custom-machine-config-pools.html[here, under the canary rollout documentation]. +
These MCPs will be used to un-pause a set of nodes during the upgrade process, allowing them to be upgraded and
rebooted at a determined time instead of at the pleasure of the scheduler. Please review the upgrade process flow
section, below, for more details on the pause/un-pause process, and see the sketch that follows.
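
Pausing and un-pausing a pool is a simple patch, as in this sketch (the pool name mcp-1 is illustrative):

[source,bash]
----
# Pause the pool before starting the upgrade...
oc patch mcp mcp-1 --type merge --patch '{"spec":{"paused":true}}'
# ...then un-pause it when its nodes should upgrade and reboot.
oc patch mcp mcp-1 --type merge --patch '{"spec":{"paused":false}}'
----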

// insert image for MCP
.Worker node MCPs in a 5 rack cluster
image::../assets/images/5Rack-MCP.jpg[]

The division and size of these MCPs can vary depending on many factors. In general, the standard division is between 8
and 10 nodes per MCP, which allows the operations team to control how many nodes are taken down at a time.

.Separate MCPs inside of a group of Load Balancer or purpose built nodes
image::../assets/images/LBorHT-MCP.jpg[]

In larger clusters there is quite often a need to separate out several nodes for load balancing or other
high-throughput purposes; these nodes usually have different machine sets to configure SR-IOV. In these cases we do not want
to upgrade all of these nodes without getting a chance to test during the upgrade. Therefore, we need to separate them
out into at least 3 different MCPs and un-pause them individually.

// insert image for MCP
.Small cluster worker MCPs
image::../assets/images/Worker-MCP.jpg[]

The figure above shows a smaller cluster example with a single rack.

The process and pace at which you un-pause the MCPs is determined by your CNFs and their configuration. Please review
the sections on PDB and anti-affinity for CNFs. If your CNFs can properly handle scheduling within an OpenShift cluster,
you can un-pause several MCPs at a time and set maxUnavailable as high as 50%, allowing as many as half
of the nodes in those MCPs to restart and upgrade. This reduces the time needed for a specific
maintenance window and lets your cluster upgrade quickly. Hopefully you can see how keeping your PDB and
anti-affinity correctly configured will help in the long run.
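
Setting a pool's maxUnavailable can likewise be done with a patch; a sketch, mirroring the 50% example above:

[source,bash]
----
# Allow up to half of the nodes in this pool to update at once.
oc patch mcp mcp-1 --type merge --patch '{"spec":{"maxUnavailable":"50%"}}'
----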
5 changes: 5 additions & 0 deletions documentation/modules/ROOT/pages/_attributes.adoc
@@ -0,0 +1,5 @@
:experimental:
:source-highlighter: highlightjs
:branch: lab-4.14
:github-repo: https://github.com/RHsyseng/5g-ran-deployments-on-ocp-lab/blob/{branch}
:profile: core-lcm-lab
4 changes: 2 additions & 2 deletions documentation/modules/ROOT/pages/index.adoc
@@ -16,10 +16,10 @@ IMPORTANT: This is not an official Red Hat Training. Please contact with your Re
|===
|Name |Role

|Alice
|Rob Fisher
|Lab Developer / Maintainer

|Bob
|
|Lab Reviewer

|===
37 changes: 33 additions & 4 deletions documentation/modules/ROOT/pages/introduction.adoc
Original file line number Diff line number Diff line change
@@ -1,20 +1,49 @@
= Introduction
include::_attributes.adoc[]
:profile: telco-ocp-upgrade-lab
:profile: core-lcm-lab

Welcome to this Telco OCP Upgrades Lab.

Why are upgrades important?
This is the question that all Telco platform administrators are faced with answering.

There is the simple response: because there is a new release out there.

The CNF (cloud-native network function) based response: the CNF requires additional functionality from the platform, therefore we need to upgrade OpenShift to get the new functionality.

The pragmatic response: all software needs to be patched because of bugs and potential security vulnerabilities that have been found since the installation of the cluster.

With OpenShift and K8s (Kubernetes) we add yet another: the platform is only supported for a specific period of time, and new releases come out every 4 months.

Platform administrators are also asked: why not just wait to upgrade until support requires it?

If we put all of these together, we find that OpenShift and K8s have made this less painful for Telco companies, because every other release can be skipped. OpenShift provides long-term support (EUS, Extended Update Support) on all even-numbered releases, along with upgrade paths between these EUS releases. This document discusses how to best upgrade from one EUS release to the next and how to plan for that upgrade.

We want you to experience and learn the steps and procedures that are needed to perform an OpenShift in-service upgrade. Listening to and reading about an upgrade can only give you so much experience; this lab turns that discussion into hands-on practice that walks you through the steps.

This document discusses OpenShift lifecycle management for Telecommunication Network Function Core clusters. Cluster size introduces a few differences, which are called out specifically at times in this document, but the material is meant to cover most clusters, from 3 nodes up to the largest cluster certified by the telco scale team, including some scenarios for mixed-workload clusters.

This document will discuss the following upgrade scenarios:

* Z-Stream
* Y-Stream
* EUS to EUS

Each of these scenarios has different considerations, which are called out as needed.


[#lab-aim]
== Who is this lab aimed at?

The lab is aimed at technical roles working with OpenShift who are interested in any of these areas:

* Foo
* Bar
* LifeCycle Management of OpenShift
* Telecommunications Core 5G Baremetal Cluster Upgrades
* Disconnected Environments

[#lab-software-versions]
== Lab Software Versions

The lab is based on the following software versions.

* OpenShift Container Platform v4.X
* OpenShift Container Platform v4.12 upgrading to v4.14