Running a container scenario against the API server with count greater than 1 results in a crash #430
The same behavior is reproduced when killing etcd:
```
2023-05-25 12:23:02,066 [INFO] Starting kraken
2023-05-25 12:23:05,650 [INFO] Fetching cluster info
2023-05-25 12:23:05,659 [INFO] Executing scenarios for iteration 0
```
How to reproduce:
`config.yaml` should include this scenario:
```yaml
chaos_scenarios:                  # List of policies/chaos scenarios to load
    - container_scenarios:        # List of chaos pod scenarios to load
        - - scenarios/openshift/container_api.yml
```
The content of the scenario file:
```yaml
scenarios:
- namespace: "openshift-apiserver"
  label_selector: "app=openshift-apiserver-a"
  container_name: "openshift-apiserver"
  action: "kill 1"
  count: 2
  expected_recovery_time: 60
```
```
$ python3.9 run_kraken.py --config config/kill-api.yaml
2023-05-25 11:58:39,485 [INFO] Starting kraken
2023-05-25 11:58:39,495 [INFO] Initializing client to talk to the Kubernetes cluster
2023-05-25 11:58:42,998 [INFO] Publishing kraken status at http://0.0.0.0:8085
2023-05-25 11:58:42,998 [INFO] Publishing kraken status at http://0.0.0.0:8085
2023-05-25 11:58:42,999 [INFO] Starting http server at http://0.0.0.0:8085
2023-05-25 11:58:43,000 [INFO] Fetching cluster info
2023-05-25 11:58:43,008 [INFO] Cluster version is 4.13.0
2023-05-25 11:58:43,008 [INFO] Server URL: https://api.elvis2.qe.lab.redhat.com:6443
2023-05-25 11:58:43,008 [INFO] Generated a uuid for the run: a713f10c-8b26-4b2c-8a81-8356cff6ef58
2023-05-25 11:58:43,008 [INFO] Daemon mode not enabled, will run through 1 iterations
2023-05-25 11:58:43,009 [INFO] Executing scenarios for iteration 0
2023-05-25 11:58:43,009 [INFO] connection set up
127.0.0.1 - - [25/May/2023 11:58:43] "GET / HTTP/1.1" 200 -
2023-05-25 11:58:43,010 [INFO] response RUN
2023-05-25 11:58:43,010 [INFO] Running container scenarios
2023-05-25 11:58:44,823 [INFO] Killing container openshift-apiserver in pod apiserver-5d45f6d58f-hmpsj (ns openshift-apiserver)
2023-05-25 11:58:44,959 [INFO] Killing container openshift-apiserver in pod apiserver-5d45f6d58f-cd7bv (ns openshift-apiserver)
2023-05-25 11:58:45,071 [INFO] Scenario kill apiserver container successfully injected
Traceback (most recent call last):
  File "/root/krkn/krkn/run_kraken.py", line 421, in <module>
    main(options.cfg)
  File "/root/krkn/krkn/run_kraken.py", line 218, in main
    failed_post_scenarios = pod_scenarios.container_run(
  File "/root/krkn/krkn/kraken/pod_scenarios/setup.py", line 92, in container_run
    failed_post_scenarios = check_failed_containers(
  File "/root/krkn/krkn/kraken/pod_scenarios/setup.py", line 199, in check_failed_containers
    killed_container_list = killed_container_list.remove(item)
AttributeError: 'NoneType' object has no attribute 'remove'
```
The issue also reproduces with count set to 3.
It does not reproduce with count set to 1.
Note that the cluster has 3 `openshift-apiserver` pods.
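The traceback points at `killed_container_list = killed_container_list.remove(item)`. In Python, `list.remove()` mutates the list in place and returns `None`, so that assignment rebinds `killed_container_list` to `None` after the first removal, and the next `.remove()` call raises exactly the `AttributeError` shown. Below is a hypothetical, simplified reproduction of that pattern; the function names are illustrative and this is not krkn's actual code. It also suggests one plausible reason why count 1 does not crash: with a single killed container only one `.remove()` call happens, although the variable still ends up as `None`, which the real code may or may not go on to handle.

```python
def remove_recovered_buggy(killed, recovered):
    """Hypothetical reproduction of the pattern on the failing line."""
    for item in recovered:
        # list.remove() works in place and returns None, so this rebinds
        # `killed` to None; the next iteration then calls None.remove().
        killed = killed.remove(item)
    return killed


def remove_recovered_fixed(killed, recovered):
    """Minimal fix: call remove() for its side effect and keep the list."""
    for item in recovered:
        killed.remove(item)  # in place, no reassignment
    return killed


if __name__ == "__main__":
    # Pod names taken from the log above; each call gets fresh lists.
    pods = ["apiserver-5d45f6d58f-hmpsj", "apiserver-5d45f6d58f-cd7bv"]

    # One killed container (count: 1): a single remove() call, no exception.
    print(remove_recovered_buggy(pods[:1], pods[:1]))  # None

    # Two killed containers (count: 2): the second remove() hits None.
    try:
        remove_recovered_buggy(list(pods), list(pods))
    except AttributeError as err:
        print(err)  # 'NoneType' object has no attribute 'remove'

    print(remove_recovered_fixed(list(pods), list(pods)))  # []
```

An alternative that also avoids a `ValueError` when an item is not in the list is to rebuild it instead, e.g. `[c for c in killed if c not in recovered]`.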
When the same scenario was run against SNO (with a single API server pod), the following error was thrown:
```
2023-05-25 12:06:17,950 [INFO] Killing container openshift-apiserver in pod apiserver-6b77769b8-6j4gg (ns openshift-apiserver)
2023-05-25 12:06:18,083 [ERROR] Trying to kill more containers than were found, try lowering kill count
2023-05-25 12:06:18,083 [ERROR] Scenario kill apiserver container failed
```
In this case, the error is expected.
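The SNO error suggests a pre-check that compares `count` with the number of containers the label selector matched. A minimal sketch of such a guard follows; the function name and the random selection of `count` distinct containers are assumptions for illustration, not krkn's implementation.

```python
import random


def select_containers_to_kill(matched, count):
    """Hypothetical guard: refuse to kill more containers than were matched."""
    if count > len(matched):
        # Same wording as the SNO error in the log above.
        raise ValueError(
            "Trying to kill more containers than were found, try lowering kill count"
        )
    # Assumption: pick `count` distinct containers at random.
    return random.sample(matched, count)
```

On the 3-pod cluster in this report, counts of 2 and 3 would pass a check like this, so the run proceeds to the recovery check where the traceback occurs.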