Attempt to run a container scenario for api while count is bigger than 1 results in crash #430

achuzhoy · 2023-05-25T16:07:50Z

How to reproduce:
config.yaml should contain this scenario:
chaos_scenarios: # List of policies/chaos scenarios to load
- container_scenarios: # List of chaos pod scenarios to load
- - scenarios/openshift/container_api.yml

The content of the scenario file:
`
scenarios:
- name: "kill apiserver container"
  namespace: "openshift-apiserver"
  label_selector: "app=openshift-apiserver-a"
  container_name: "openshift-apiserver"
  action: "kill 1"
  count: 2
  expected_recovery_time: 60
`

python3.9 run_kraken.py --config config/kill-api.yaml

2023-05-25 11:58:39,485 [INFO] Starting kraken
2023-05-25 11:58:39,495 [INFO] Initializing client to talk to the Kubernetes cluster
2023-05-25 11:58:42,998 [INFO] Publishing kraken status at http://0.0.0.0:8085
2023-05-25 11:58:42,998 [INFO] Publishing kraken status at http://0.0.0.0:8085
2023-05-25 11:58:42,999 [INFO] Starting http server at http://0.0.0.0:8085

2023-05-25 11:58:43,000 [INFO] Fetching cluster info
2023-05-25 11:58:43,008 [INFO] Cluster version is 4.13.0
2023-05-25 11:58:43,008 [INFO] Server URL: https://api.elvis2.qe.lab.redhat.com:6443
2023-05-25 11:58:43,008 [INFO] Generated a uuid for the run: a713f10c-8b26-4b2c-8a81-8356cff6ef58
2023-05-25 11:58:43,008 [INFO] Daemon mode not enabled, will run through 1 iterations

2023-05-25 11:58:43,009 [INFO] Executing scenarios for iteration 0
2023-05-25 11:58:43,009 [INFO] connection set up
127.0.0.1 - - [25/May/2023 11:58:43] "GET / HTTP/1.1" 200 -
2023-05-25 11:58:43,010 [INFO] response RUN
2023-05-25 11:58:43,010 [INFO] Running container scenarios
2023-05-25 11:58:44,823 [INFO] Killing container openshift-apiserver in pod apiserver-5d45f6d58f-hmpsj (ns openshift-apiserver)
2023-05-25 11:58:44,959 [INFO] Killing container openshift-apiserver in pod apiserver-5d45f6d58f-cd7bv (ns openshift-apiserver)
2023-05-25 11:58:45,071 [INFO] Scenario kill apiserver container successfully injected
Traceback (most recent call last):
  File "/root/krkn/krkn/run_kraken.py", line 421, in <module>
    main(options.cfg)
  File "/root/krkn/krkn/run_kraken.py", line 218, in main
    failed_post_scenarios = pod_scenarios.container_run(
  File "/root/krkn/krkn/kraken/pod_scenarios/setup.py", line 92, in container_run
    failed_post_scenarios = check_failed_containers(
  File "/root/krkn/krkn/kraken/pod_scenarios/setup.py", line 199, in check_failed_containers
    killed_container_list = killed_container_list.remove(item)
AttributeError: 'NoneType' object has no attribute 'remove'

`

The issue also reproduced with count set to 3.
The issue did not reproduce with count set to 1.

Note that the cluster has 3 pods.

When the same scenario was run against SNO (with a single API pod), the following error was logged:
2023-05-25 12:06:17,950 [INFO] Killing container openshift-apiserver in pod apiserver-6b77769b8-6j4gg (ns openshift-apiserver)
2023-05-25 12:06:18,083 [ERROR] Trying to kill more containers than were found, try lowering kill count
2023-05-25 12:06:18,083 [ERROR] Scenario kill apiserver container failed
In this case the error is expected.
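
A likely cause, judging only from the failing line in the traceback (`killed_container_list = killed_container_list.remove(item)`): `list.remove()` mutates the list in place and returns `None`, so after the first removal the name is rebound to `None` and the next call fails. That would also explain why count 1 does not crash, since there is no second removal. The sketch below reproduces this in isolation. The function names and the `(namespace, pod, container)` tuple layout are hypothetical and are not taken from `setup.py`.

```python
def remove_ready_buggy(killed, ready):
    """Mirrors the failing line: rebinds the name to list.remove()'s return value."""
    for item in ready:
        killed = killed.remove(item)  # list.remove() returns None
    return killed


def remove_ready_fixed(killed, ready):
    """Returns a new list without the ready containers; the input is not mutated."""
    return [item for item in killed if item not in ready]


# Hypothetical tuples, using the pod names from the log above.
killed = [
    ("openshift-apiserver", "apiserver-5d45f6d58f-hmpsj", "openshift-apiserver"),
    ("openshift-apiserver", "apiserver-5d45f6d58f-cd7bv", "openshift-apiserver"),
]

# One ready container: no exception, but the result is None rather than a list.
print(remove_ready_buggy(list(killed), killed[:1]))

# Two ready containers: the second iteration calls .remove() on None.
try:
    remove_ready_buggy(list(killed), list(killed))
except AttributeError as err:
    print(f"AttributeError: {err}")

# The fixed variant returns an empty list once every container is ready.
print(remove_ready_fixed(killed, killed))
```

Calling `killed_container_list.remove(item)` without the assignment would be the smallest change. The list comprehension is shown here because it also avoids mutating a list that the caller may still be using.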


achuzhoy · 2023-05-25T16:30:30Z

Same behavior reproduced with killing etcd:

`
chaos_scenarios: # List of policies/chaos scenarios to load
- container_scenarios: # List of chaos pod scenarios to load
- - scenarios/openshift/container_etcd.yml

`

`
scenarios:
- name: "kill etcd container"
  namespace: "openshift-etcd"
  label_selector: "k8s-app=etcd"
  container_name: "etcd"
  action: "kill 1"
  count: 1
  expected_recovery_time: 60
`

python3.9 run_kraken.py --config config/kill-etcd.yaml

2023-05-25 12:23:02,066 [INFO] Starting kraken
2023-05-25 12:23:02,075 [INFO] Initializing client to talk to the Kubernetes cluster
2023-05-25 12:23:05,649 [INFO] Publishing kraken status at http://0.0.0.0:8085
2023-05-25 12:23:05,649 [INFO] Publishing kraken status at http://0.0.0.0:8085
2023-05-25 12:23:05,650 [INFO] Starting http server at http://0.0.0.0:8085

2023-05-25 12:23:05,650 [INFO] Fetching cluster info
2023-05-25 12:23:05,658 [INFO] Cluster version is 4.13.0
2023-05-25 12:23:05,659 [INFO] Server URL: https://api.elvis2.qe.lab.redhat.com:6443
2023-05-25 12:23:05,659 [INFO] Generated a uuid for the run: 77d465f6-2149-4233-b9f7-4642e84dffb0
2023-05-25 12:23:05,659 [INFO] Daemon mode not enabled, will run through 1 iterations

2023-05-25 12:23:05,659 [INFO] Executing scenarios for iteration 0
2023-05-25 12:23:05,659 [INFO] connection set up
127.0.0.1 - - [25/May/2023 12:23:05] "GET / HTTP/1.1" 200 -
2023-05-25 12:23:05,660 [INFO] response RUN
2023-05-25 12:23:05,660 [INFO] Running container scenarios
2023-05-25 12:23:08,343 [INFO] Killing container etcd in pod etcd-master-1-2 (ns openshift-etcd)
2023-05-25 12:23:08,466 [INFO] Killing container etcd in pod etcd-master-1-1 (ns openshift-etcd)
2023-05-25 12:23:08,657 [INFO] Scenario kill etcd container successfully injected
Traceback (most recent call last):
  File "/root/krkn/krkn/run_kraken.py", line 421, in <module>
    main(options.cfg)
  File "/root/krkn/krkn/run_kraken.py", line 218, in main
    failed_post_scenarios = pod_scenarios.container_run(
  File "/root/krkn/krkn/kraken/pod_scenarios/setup.py", line 92, in container_run
    failed_post_scenarios = check_failed_containers(
  File "/root/krkn/krkn/kraken/pod_scenarios/setup.py", line 199, in check_failed_containers
    killed_container_list = killed_container_list.remove(item)
AttributeError: 'NoneType' object has no attribute 'remove'
`
