Deploy OpenShift Container Storage

Now that we have added an additional worker node to our lab cluster environment we can deploy OpenShift Container Storage (OCS) on top. The installation mechanism is to utilise the operator model and deploy via the OpenShift Operator Hub (Marketplace) in the web-console. Note that it's entirely possible to deploy via the CLI should you wish to do so, but we're not documenting that mechanism here. We will, however, use both the command line and the web-console to show the progress of the deployment.

From a node perspective, the lab environment should look like the following from a master/worker node count:

[lab-user@provision ~]$ oc get nodes
NAME                                 STATUS   ROLES    AGE   VERSION
master-0.dtchw.dynamic.opentlc.com   Ready    master   63m   v1.18.3+47c0e71
master-1.dtchw.dynamic.opentlc.com   Ready    master   63m   v1.18.3+47c0e71
master-2.dtchw.dynamic.opentlc.com   Ready    master   63m   v1.18.3+47c0e71
worker-0.dtchw.dynamic.opentlc.com   Ready    worker   44m   v1.18.3+47c0e71
worker-1.dtchw.dynamic.opentlc.com   Ready    worker   43m   v1.18.3+47c0e71
worker-2.dtchw.dynamic.opentlc.com   Ready    worker   23m   v1.18.3+47c0e71

NOTE: If you do not have three workers listed here, this lab will not succeed - please return to the previous section and ensure that all three workers are provisioned, noting that the count starts from 0, so worker-0 through worker-2 should be listed.
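As a quick sanity check, you can count the worker nodes directly (this simply counts the nodes carrying the standard worker role label):

[lab-user@provision ~]$ oc get nodes -l node-role.kubernetes.io/worker --no-headers | wc -l
3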

We need to attach a 100GB disk to each of our worker nodes in the lab environment; these disks will provide the storage capacity for the OCS-based disks (or Ceph OSDs under the covers). Thankfully we have a little script to do this for us on the provisioning node, utilising the OpenStack API to attach Cinder volumes to our virtual worker nodes, mimicking a real baremetal node with spare, unused disks. Before we run the script let's take a look at it; you'll note that the script has already been automatically customised to suit your environment:

[lab-user@provision ~]$ cat ~/scripts/10_volume-attach.sh
#!/bin/bash
OSP_PROJECT="dtchw-project"
GUID="dtchw"

attach() {
  for NODE in $( openstack --os-cloud=$OSP_PROJECT server list|grep worker|cut -d\| -f3|sed 's/ //g' )
  do
          openstack --os-cloud=$OSP_PROJECT server add volume $NODE $NODE-volume
  done
}

detach() {
  for NODE in $( openstack --os-cloud=$OSP_PROJECT server list|grep worker|cut -d\| -f3|sed 's/ //g' )
  do
          openstack --os-cloud=$OSP_PROJECT server remove volume $NODE $NODE-volume
  done
}

poweroff() {
  /usr/bin/ipmitool -I lanplus -H10.20.0.3 -p6200 -Uadmin -Predhat chassis power off
  /usr/bin/ipmitool -I lanplus -H10.20.0.3 -p6201 -Uadmin -Predhat chassis power off
  /usr/bin/ipmitool -I lanplus -H10.20.0.3 -p6202 -Uadmin -Predhat chassis power off
  /usr/bin/ipmitool -I lanplus -H10.20.0.3 -p6203 -Uadmin -Predhat chassis power off
  /usr/bin/ipmitool -I lanplus -H10.20.0.3 -p6204 -Uadmin -Predhat chassis power off
  /usr/bin/ipmitool -I lanplus -H10.20.0.3 -p6205 -Uadmin -Predhat chassis power off
}

case $1 in
  attach) attach ;;
  detach) poweroff
          sleep 10
          detach ;;
  power) poweroff ;;
  *) attach ;;
esac

We can see the script attaches volumes to the list of worker nodes within a given lab environment, and can also be used to detach them if required (powering the nodes down first). For now all we want to do is attach them, which is the default behaviour of the script. Let's go ahead and run it:

[lab-user@provision ~]$ unset OS_URL OS_TOKEN
[lab-user@provision ~]$ ~/scripts/10_volume-attach.sh
(no output)
[lab-user@provision ~]$ echo $?
0

NOTE: Ensure that you have a '0' return code and not something else - zero means that the volume attachment was successful.
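If you'd like to double-check from the OpenStack side as well, the volume list should now show each worker's volume attached (the --os-cloud value here matches the OSP_PROJECT variable from the script above; yours reflects your own GUID):

[lab-user@provision ~]$ openstack --os-cloud=dtchw-project volume list

Each worker volume should report a Status of in-use and list the matching worker under the "Attached to" column.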

We can validate that each node has the extra disk by using the debug container on the worker node:

[lab-user@provision ~]$ oc debug node/worker-0.$GUID.dynamic.opentlc.com
Starting pod/worker-0-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.20.0.200
If you don't see a command prompt, try pressing enter.
sh-4.2# 

Once inside the debug container we can look at the block devices available:

sh-4.2# chroot /host
sh-4.4# lsblk
NAME                         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                          252:0    0  100G  0 disk 
|-sda1                       252:1    0  384M  0 part /boot
|-sda2                       252:2    0  127M  0 part /boot/efi
|-sda3                       252:3    0    1M  0 part 
|-sda4                       252:4    0 99.4G  0 part 
| `-coreos-luks-root-nocrypt 253:0    0 99.4G  0 dm   /sysroot
`-sda5                       252:5    0   65M  0 part 
sdb                          252:16   0  100G  0 disk 
sh-4.4# exit
exit
sh-4.2# exit
exit

Removing debug pod ...

We can see from the output above that on worker-0 the new 100GB volume was attached as sdb. Repeat the above steps to confirm that the remaining workers also have their 100GB sdb volume attached, or use the one-shot check sketched below.
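If you'd prefer not to repeat the interactive debug session on every node, a non-interactive loop gives the same confirmation (a sketch, assuming $GUID is still set in your shell):

[lab-user@provision ~]$ for NODE in worker-0 worker-1 worker-2;
   do oc debug node/$NODE.$GUID.dynamic.opentlc.com -- chroot /host lsblk; done

Each worker should report an unpartitioned 100G sdb disk. Once finished, we need to label our nodes for storage; this label tells OCS that these machines can be utilised to satisfy its storage requirements: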

[lab-user@provision ~]$ oc label nodes worker-0.$GUID.dynamic.opentlc.com cluster.ocs.openshift.io/openshift-storage=''
node/worker-0.dtchw.dynamic.opentlc.com labeled

[lab-user@provision ~]$ oc label nodes worker-1.$GUID.dynamic.opentlc.com cluster.ocs.openshift.io/openshift-storage=''
node/worker-1.dtchw.dynamic.opentlc.com labeled

[lab-user@provision ~]$ oc label nodes worker-2.$GUID.dynamic.opentlc.com cluster.ocs.openshift.io/openshift-storage=''
node/worker-2.dtchw.dynamic.opentlc.com labeled

And confirm the change:

[lab-user@provision ~]$ oc get nodes -l cluster.ocs.openshift.io/openshift-storage=
NAME                                 STATUS   ROLES    AGE   VERSION
worker-0.dtchw.dynamic.opentlc.com   Ready    worker   47m   v1.18.3+47c0e71
worker-1.dtchw.dynamic.opentlc.com   Ready    worker   46m   v1.18.3+47c0e71
worker-2.dtchw.dynamic.opentlc.com   Ready    worker   26m   v1.18.3+47c0e71

Deploying the Local Storage Operator

Now that we know the worker nodes have their extra disk ready and labels added, we can proceed. Before installing OCS we first need to install the local-storage operator, which is used to consume local disks and expose them as available persistent volumes (PVs) for OCS to consume; the OCS Ceph OSD pods consume the local-storage based PVs and allow additional RWX PVs to be provisioned on top.

The first step is to create a local storage namespace in the OpenShift console. Navigate to Administration -> Namespaces and click on the create namespace button. Once the below dialogue appears, set the namespace Name to local-storage.
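If you prefer the command line, the equivalent is simply (do one or the other, not both):

[lab-user@provision ~]$ oc create namespace local-storage
namespace/local-storage created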

After creating the namespace we see the details about the namespace and also can confirm that the namespace is active:

Now we can go to Operators -> OperatorHub and search for the local storage operator in the catalog:

If you do not see any available operators in OperatorHub, the marketplace pods may need restarting. This is sometimes a side effect of the initial installation timing out (if yours did):

[lab-user@provision ~]$ for i in $(oc get pods -A | awk '/marketplace/ {print $2;}');
   do oc delete pod $i -n openshift-marketplace; done

pod "certified-operators-7fd49b9b57-jgbkq" deleted
pod "community-operators-748759c64d-c8cmb" deleted
pod "marketplace-operator-d7494b45f-qdgd5" deleted
pod "redhat-marketplace-b96c44cdd-fhvhq" deleted
pod "redhat-operators-585c89dcf9-6wcrr" deleted

You'll need to wait a minute or two, and then try refreshing the operator hub page. You should then be able to search for "local storage" and proceed.
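Rather than guessing at the timing, you can watch the replacement pods come back up:

[lab-user@provision ~]$ oc get pods -n openshift-marketplace

Once they all report Running, the catalogue should repopulate.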

Select the local storage operator and click install:

This will bring up a dialogue of options for configuring the operator before deploying. The defaults are usually acceptable, but note that you can configure the version, installation mode, the namespace where the operator should run, and the approval strategy.

Select the defaults (ensuring that the local-storage namespace is selected) and click install:

Once the operator is installed we can navigate to Operators --> Installed Operators and see the Local Storage Operator in an Installing state, which will eventually change to Succeeded when complete:

We can also validate from the command line that the operator is installed and running as well:

[lab-user@provision ~]$ oc get pods -n local-storage
NAME                                      READY   STATUS    RESTARTS   AGE
local-storage-operator-57455d9cb4-4tj54   1/1     Running   0          10m
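You can also check the operator's ClusterServiceVersion, which should report a Succeeded phase (the exact version string will depend on your catalogue):

[lab-user@provision ~]$ oc get csv -n local-storage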

Now that we have the local storage operator installed, let's create a "LocalVolume" storage definition that will consume the spare disk device in each node:

[lab-user@provision scripts]$ cat << EOF > ~/local-storage.yaml
apiVersion: local.storage.openshift.io/v1
kind: LocalVolume
metadata:
  name: local-block
  namespace: local-storage
spec:
  nodeSelector:
    nodeSelectorTerms:
    - matchExpressions:
        - key: cluster.ocs.openshift.io/openshift-storage
          operator: In
          values:
          - ""
  storageClassDevices:
    - storageClassName: localblock
      volumeMode: Block
      devicePaths:
        - /dev/sdb
EOF

You'll see that this is set to create a local volume from the block device sdb on every host where the selector key cluster.ocs.openshift.io/openshift-storage matches. If we had additional devices on the worker nodes, for example sdc and sdd, we would simply list them under devicePaths to incorporate them into the configuration, as sketched below.
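For illustration only (our lab workers have just the single spare sdb disk), the storageClassDevices stanza would then look something like this:

  storageClassDevices:
    - storageClassName: localblock
      volumeMode: Block
      devicePaths:
        - /dev/sdb
        - /dev/sdc   # hypothetical additional disk
        - /dev/sdd   # hypothetical additional disk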

At this point we should double-check that all three of our worker nodes have the OCS storage label.

[lab-user@provision scripts]$ oc get nodes -l cluster.ocs.openshift.io/openshift-storage
NAME                                 STATUS   ROLES    AGE    VERSION
worker-0.prt8x.dynamic.opentlc.com   Ready    worker   136m   v1.18.3+47c0e71
worker-1.prt8x.dynamic.opentlc.com   Ready    worker   136m   v1.18.3+47c0e71
worker-2.prt8x.dynamic.opentlc.com   Ready    worker   75m    v1.18.3+47c0e71

Now we can go ahead and create the assets for this local-storage configuration using the local-storage.yaml we created above.

[lab-user@provision ~]$ oc create -f ~/local-storage.yaml
localvolume.local.storage.openshift.io/local-block created

If we execute an oc get pods command against the local-storage namespace we will see containers being created relating to the assets from our local-storage.yaml file. The pods generated are a provisioner and a diskmaker on every worker node where the node selector matched:

[lab-user@provision ~]$ oc -n local-storage get pods
NAME                                      READY   STATUS              RESTARTS   AGE
local-block-local-diskmaker-626kf         0/1     ContainerCreating   0          8s
local-block-local-diskmaker-w5l5h         0/1     ContainerCreating   0          9s
local-block-local-diskmaker-xrxmh         0/1     ContainerCreating   0          9s
local-block-local-provisioner-9mhdq       0/1     ContainerCreating   0          9s
local-block-local-provisioner-lw9fm       0/1     ContainerCreating   0          9s
local-block-local-provisioner-xhf2x       0/1     ContainerCreating   0          9s
local-storage-operator-57455d9cb4-4tj54   1/1     Running             0          76m

As you can see from the above, we labelled 3 worker nodes and we now have 3 provisioner and 3 diskmaker pods. To see which node each pod is running on, try adding -o wide to the command above, as shown below. Does it confirm that each worker has a provisioner and a diskmaker pod?
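For example (the NODE column should list each of worker-0, worker-1 and worker-2 once for a diskmaker pod and once for a provisioner pod):

[lab-user@provision ~]$ oc -n local-storage get pods -o wide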

Furthermore, we can now see that 3 PVs have been created, one for each of the 100GB sdb disks we attached at the beginning of the lab:

[lab-user@provision ~]$ oc get pv
NAME                CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS   REASON   AGE
local-pv-40d06fba   100Gi      RWO            Delete           Available           localblock              22s
local-pv-8aea98b7   100Gi      RWO            Delete           Available           localblock              22s
local-pv-e62c1b44   100Gi      RWO            Delete           Available           localblock              22s

And finally a storageclass was created for the local-storage asset we created:

[lab-user@provision scripts]$ oc get sc
NAME         PROVISIONER                    RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
localblock   kubernetes.io/no-provisioner   Delete          WaitForFirstConsumer   false                  4m37s

Deploying OpenShift Container Storage

At this point in the lab we have now completed the prerequisites for OpenShift Container Storage. We can now move onto actually installing the OCS operator. To do that we will use Operator Hub inside of the OpenShift Web Console once again. Navigate to Operators --> OperatorHub and then search for OpenShift Container Storage:

Click on the operator and then select install:

The operator installation will present you with options similar to what we saw when we installed the local storage operator. Again we have the ability to select the version, installation mode, namespace and approval strategy. As with the previous operator we will use the defaults as they are presented and click install:

Once the operator has been installed the OpenShift Console will display the OCS operator and the status of successfully installed:

We can further confirm the operator is up and running by looking at it from the command line where we should see 3 running pods under the openshift-storage namespace:

[lab-user@provision ~]$ oc get pods -n openshift-storage
NAME                                  READY   STATUS    RESTARTS   AGE
noobaa-operator-5567695698-fc8t6      1/1     Running   0          10m
ocs-operator-6888cb5bdf-7w6ct         1/1     Running   0          10m
rook-ceph-operator-7bdb4cd5d9-qmggh   1/1     Running   5          10m

Now that we know the operator is deployed (and the associated pods are running) we can go back to the OpenShift Console and click on the operator to bring us into an operator details page. Here we will want to click on Create Instance in the "Storage Cluster" box:

This will bring up the options page for creating a storage cluster that can either be external or internal; starting with OCS 4.5 we introduced the ability to consume an externally provisioned cluster, but in our case we are going to use internal because we have the necessary resources in our environment to create a cluster.

Further down the page you will notice 3 worker nodes are checked, as these were the nodes we labelled earlier for OCS. In the Storage Class drop-down we have only one option to choose, and that is localblock, which was the storage class we created with the local-storage operator and assets above. Once you select it as the storage class, the capacity and replicas are shown below the storage class box. Notice the capacity is 300GB, the sum of our 3x100GB volumes across the three nodes. Once everything is selected we can click on create:
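For reference, clicking create simply generates a StorageCluster custom resource behind the scenes. A rough sketch of an equivalent CR for this lab is shown below purely for illustration (the field values mirror our setup; do not apply it yourself if you have already clicked create in the console):

apiVersion: ocs.openshift.io/v1
kind: StorageCluster
metadata:
  name: ocs-storagecluster
  namespace: openshift-storage
spec:
  monDataDirHostPath: /var/lib/rook
  storageDeviceSets:
  - name: ocs-deviceset
    count: 1
    replica: 3
    portable: false
    dataPVCTemplate:
      spec:
        storageClassName: localblock
        accessModes:
        - ReadWriteOnce
        volumeMode: Block
        resources:
          requests:
            storage: 100Gi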

If we quickly jump over to the command line and issue an oc get pods against the openshift-storage namespace, we can see new containers being created; you may need to run this a few times to see some of the pods starting up:

[lab-user@provision ~]$ oc get pods -n openshift-storage
NAME                                            READY   STATUS              RESTARTS   AGE
csi-cephfsplugin-6mk78                          0/3     ContainerCreating   0          21s
csi-cephfsplugin-9fglq                          0/3     ContainerCreating   0          21s
csi-cephfsplugin-lsldk                          0/3     ContainerCreating   0          20s
csi-cephfsplugin-provisioner-5f8b66cc96-2z755   0/5     ContainerCreating   0          20s
csi-cephfsplugin-provisioner-5f8b66cc96-wsnsb   0/5     ContainerCreating   0          19s
csi-rbdplugin-b4qh5                             0/3     ContainerCreating   0          22s
csi-rbdplugin-k2fjg                             3/3     Running             0          23s
csi-rbdplugin-lnpgn                             0/3     ContainerCreating   0          22s
csi-rbdplugin-provisioner-66f66699c8-l9g9l      0/5     ContainerCreating   0          22s
csi-rbdplugin-provisioner-66f66699c8-v7ghq      0/5     ContainerCreating   0          21s
noobaa-operator-5567695698-fc8t6                1/1     Running             0          104m
ocs-operator-6888cb5bdf-7w6ct                   0/1     Running             0          104m
rook-ceph-mon-a-canary-587d74787d-wt247         0/1     ContainerCreating   0          8s
rook-ceph-mon-b-canary-6fd99d6865-fgcpx         0/1     ContainerCreating   0          3s
rook-ceph-operator-7bdb4cd5d9-qmggh             1/1     Running             5          104m

Finally we should see in the OpenShift Console that the cluster is marked as "ready", noting that it may say "progressing" for a good few minutes. This tells us that all the pods required to form the cluster are running and active.
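The same status is also visible from the command line via the StorageCluster resource itself, which should eventually report a Ready phase:

[lab-user@provision ~]$ oc get storagecluster -n openshift-storage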

We can further confirm this from the command line by issuing the same oc get pods command against the openshift-storage namespace. Can you see there is a mon (monitor) and osd (object storage daemon) pod on each of our three worker nodes?

[lab-user@provision ~]$ oc get pods -n openshift-storage
NAME                                                              READY   STATUS      RESTARTS   AGE
csi-cephfsplugin-6mk78                                            3/3     Running     0          5m1s
csi-cephfsplugin-9fglq                                            3/3     Running     0          5m1s
csi-cephfsplugin-lsldk                                            3/3     Running     0          5m
csi-cephfsplugin-provisioner-5f8b66cc96-2z755                     5/5     Running     0          5m
csi-cephfsplugin-provisioner-5f8b66cc96-wsnsb                     5/5     Running     0          4m59s
csi-rbdplugin-b4qh5                                               3/3     Running     0          5m2s
csi-rbdplugin-k2fjg                                               3/3     Running     0          5m3s
csi-rbdplugin-lnpgn                                               3/3     Running     0          5m2s
csi-rbdplugin-provisioner-66f66699c8-l9g9l                        5/5     Running     0          5m2s
csi-rbdplugin-provisioner-66f66699c8-v7ghq                        5/5     Running     0          5m1s
noobaa-core-0                                                     1/1     Running     0          2m38s
noobaa-db-0                                                       1/1     Running     0          2m38s
noobaa-endpoint-74b9ddcffc-pcg95                                  1/1     Running     0          41s
noobaa-operator-5567695698-fc8t6                                  1/1     Running     0          109m
ocs-operator-6888cb5bdf-7w6ct                                     1/1     Running     0          109m
rook-ceph-crashcollector-worker-0-6f9d88bcb6-79bmf                1/1     Running     0          3m23s
rook-ceph-crashcollector-worker-1-78b56fbfc5-lqsnl                1/1     Running     0          3m45s
rook-ceph-crashcollector-worker-2-77687b587b-j4h2h                1/1     Running     0          3m53s
rook-ceph-drain-canary-worker-0-5d9d5b4977-8tc62                  1/1     Running     0          2m43s
rook-ceph-drain-canary-worker-1-54644f5c94-mhnzt                  1/1     Running     0          2m44s
rook-ceph-drain-canary-worker-2-598f89d79f-gbcjc                  1/1     Running     0          2m41s
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-dd756495qt9n4   1/1     Running     0          2m6s
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-584dd7bc9hsr4   1/1     Running     0          2m5s
rook-ceph-mgr-a-848f8c59fd-ln6nz                                  1/1     Running     0          3m3s
rook-ceph-mon-a-5896d85b4f-tgc64                                  1/1     Running     0          3m54s
rook-ceph-mon-b-7695696f7d-9l565                                  1/1     Running     0          3m45s
rook-ceph-mon-c-5d95b547dc-gsjhj                                  1/1     Running     0          3m24s
rook-ceph-operator-7bdb4cd5d9-qmggh                               1/1     Running     5          109m
rook-ceph-osd-0-745d68b9df-qglsp                                  1/1     Running     0          2m45s
rook-ceph-osd-1-7946dc4cbc-sxm5w                                  1/1     Running     0          2m43s
rook-ceph-osd-2-85746d789-hhj5n                                   1/1     Running     0          2m42s
rook-ceph-osd-prepare-ocs-deviceset-0-data-0-vw4rb-q9kqg          0/1     Completed   0          2m59s
rook-ceph-osd-prepare-ocs-deviceset-1-data-0-znnlk-cghzf          0/1     Completed   0          2m58s
rook-ceph-osd-prepare-ocs-deviceset-2-data-0-7wdbq-r26b8          0/1     Completed   0          2m58s
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-549f6f742cgb   1/1     Running     0          87s
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-b-9b695d7q8tml   1/1     Running     0          83s

In the OpenShift Web Console if we go to Storage --> Storage Classes we can also see additional storage classes related to OCS have been created:

We can also confirm this from the command line by issuing an oc get storageclass. We should see 4 new storageclasses: one for block (rbd), two for object (rgw/noobaa) and one for file (cephfs).

[lab-user@provision ~]$ oc get storageclass
NAME                          PROVISIONER                             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
localblock                    kubernetes.io/no-provisioner            Delete          WaitForFirstConsumer   false                  115m
ocs-storagecluster-ceph-rbd   openshift-storage.rbd.csi.ceph.com      Delete          Immediate              true                   5m36s
ocs-storagecluster-ceph-rgw   openshift-storage.ceph.rook.io/bucket   Delete          Immediate              false                  5m36s
ocs-storagecluster-cephfs     openshift-storage.cephfs.csi.ceph.com   Delete          Immediate              true                   5m36s
openshift-storage.noobaa.io   openshift-storage.noobaa.io/obc         Delete          Immediate              false                  65s
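To see one of the new storage classes in action, a minimal PersistentVolumeClaim against the RBD class might look like the following (a sketch only, not a required lab step; the claim name is arbitrary):

[lab-user@provision ~]$ cat << EOF | oc create -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-rbd-pvc
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ocs-storagecluster-ceph-rbd
  resources:
    requests:
      storage: 1Gi
EOF

Because the class uses Immediate volume binding, the PVC should bind straight away; clean up with oc delete pvc test-rbd-pvc -n default when you are done.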

At this point you have a fully functional OpenShift Container Storage cluster ready to be consumed by applications. You can optionally deploy the Ceph tools pod, which lets you dive into some of the Ceph internals at your leisure:

[lab-user@provision ~]$ oc patch OCSInitialization ocsinit -n openshift-storage \
    --type json --patch '[{ "op": "replace", "path": "/spec/enableCephTools", "value": true }]'

ocsinitialization.ocs.openshift.io/ocsinit patched

If you look at the pods, you'll find a newly started Ceph tools pod:

[lab-user@provision ~]$ oc get pods -n openshift-storage | grep rook-ceph-tools
rook-ceph-tools-7fcff79f44-g9r6t                                  1/1     Running     0          73s

If you 'exec' into this pod, you'll find a configured environment ready to issue Ceph commands:

[lab-user@provision ~]$ oc exec -it rook-ceph-tools-7fcff79f44-g9r6t -n openshift-storage bash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl kubectl exec [POD] -- [COMMAND] instead.
[root@worker-0 /]# ceph -s
  cluster:
    id:     29fe66fc-b098-4f7d-9e69-b3a9ce661444
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c (age 5m)
    mgr: a(active, since 5m)
    mds: ocs-storagecluster-cephfilesystem:1 {0=ocs-storagecluster-cephfilesystem-a=up:active} 1 up:standby-replay
    osd: 3 osds: 3 up (since 5m), 3 in (since 5m)
    rgw: 2 daemons active (ocs.storagecluster.cephobjectstore.a, ocs.storagecluster.cephobjectstore.b)

  task status:
    scrub status:
        mds.ocs-storagecluster-cephfilesystem-a: idle
        mds.ocs-storagecluster-cephfilesystem-b: idle

  data:
    pools:   10 pools, 176 pgs
    objects: 301 objects, 83 MiB
    usage:   3.1 GiB used, 297 GiB / 300 GiB avail
    pgs:     176 active+clean

  io:
    client:   1.3 KiB/s rd, 7.5 KiB/s wr, 2 op/s rd, 0 op/s wr

[root@worker-0 /]# exit
exit
[lab-user@provision ~]$
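If you want to explore further, other read-only commands worth trying from the toolbox include ceph osd tree, ceph df and ceph health detail; they are safe to run and show how the OSDs map onto the worker nodes and how the raw capacity is being consumed.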

NOTE: Make sure you exit out of this pod before continuing.

Now it's time to do some fun stuff with all this infrastructure! In the next lab you will get to deploy OpenShift Virtualization!