Bug fix for switch_controller when using controller chaining #1591

TakashiSato · 2024-06-26T09:51:48Z

This PR is separated to address only the changes related to bug fix #1568 , fixing a bug that occurs under specific conditions when handling chainable controllers in the controller_manager's switch_controller.

Note: To focus on meaningful changes, it is recommended to compare the code using "hide whitespace" mode as follows.
https://github.com/ros-controls/ros2_control/compare/master...TakashiSato:ros2_control:feature/fix_switch_controller_bug?diff=split&w=1

Explanation of the Bugs

Using the typical model for controller chaining, which is also used in this test, the following explanations introduce three situations where bugs occur.

Note that these bugs occur only when the strictness value is set to BEST_EFFORT. When the strictness value is set to STRICT, switch_controller fails, and these bugs do not arise.
Additionally, the test cases to detect these bugs are implemented in this commit.

BUG1

Test case
Initial state
- active: pid_left (not_chained), pid_right (not_chained)
- inactive: position_tracking, diff_drive
Switch requests
- start: diff_drive
- stop: pid_right
Result:
- active: pid_left (in_chained)
- inactive: position_tracking, diff_drive, pid_right

In this situation, the start request for diff_drive should be rejected, and only the stop command for pid_right should be accepted. While this result is achieved, there is an issue where pid_left incorrectly enters in_chained mode.

The cause of this bug lies in this section.
When checking the activate request for diff_drive, the first following controller, pid_left, is checked for any inconsistencies along with the request. Since no issues are detected, pid_left is added to the to_chained_mode_request.
However, when the check is performed with the next controller, pid_right, inconsistencies are detected because pid_right is targeted for deactivation, leading to the rejection of the diff_drive activate request.
Since the to_chained_mode_request is not re-evaluated in subsequent processes, this request remains and results in the actual transition to chained mode during manage_switch.

BUG2

Test case
Initial state
- active: --
- inactive: position_tracking, diff_drive, pid_left, pid_right
Switch requests
- start: diff_drive, pid_left
- stop: --
Result:
- active: pid_left (in_chained)
- inactive: position_tracking, diff_drive, pid_right

This situation is similar to BUG1, where the start request for diff_drive is rejected, and only pid_left is activated. However, pid_left still incorrectly enters in_chained mode.
The cause of this bug is almost the same as BUG1. During the activate check for diff_drive, since the following controller pid_left is also a target for activation, pid_left is added to the to_chained_mode_request.
Then, during the subsequent check with pid_right, since pid_right is not a target for activation, the activate request for diff_drive is rejected, leaving the to_chained_mode_request for pid_left.

Incidentally, in this situation, the checks are performed in the order of pid_left followed by pid_right. Therefore, if pid_right and pid_left in the switch requests for BUG1 and BUG2 are swapped, the checks will function correctly, and these issues will not occur.

BUG3

Test case
Initial state
- active: position_tracking(not_chained), diff_drive(in_chained), pid_left(in_chained), pid_right(in_chained)
- inactive: --
Switch requests
- start: --
- stop: diff_drive
Result: switch controller will deadlock.

In this situation, since it is impossible to deactivate only diff_drive, the request should be rejected and no changes should occur.
However, the following switch requests are internally generated, and since these switches cannot be handled properly, a deadlock occurs.

Switch requests
- start: pid_left, pid_right
- stop: pid_left, pid_right

The sequence in which such switch requests are generated is as follows:

Since diff_drive is a deactivate target, in propagate_deactivation_of_chained_mode, its following controllers, pid_left and pid_right, are added to the from_chained_mode_request.
The activate request is empty, so check_following_controllers_for_activate does nothing. (Note: The process to erase from from_chained_mode_request exists only within this function.)
In check_preceding_controllers_for_deactivate, the preceding controller of diff_drive, position_tracking, is active and not present in the deactivate_request, resulting in the rejection of the diff_drive deactivate request.
Since pid_left and pid_right are included in the from_chained_mode_request and meet the conditions, they are added to the (de)activate_request as restart targets due to this process.

Bug Fix Proposal

In this part of the check process, propagate_deactivation_of_chained_mode, check_following_controllers_for_activate, and check_preceding_controllers_for_deactivate perform judgments based on the content of the (de)activate_request. Therefore, if the content of the (de)activate_request changes, the process should be retried from the beginning. However, during this process, it is also possible that the content of from/to_chained_mode_request has changed, so it is necessary to properly clear these as well.

This commit demonstrates the simplest change to fix the bug based on this idea.

However, since the above commit uses goto, which is generally not recommended, the final implementation in this PR has been rewritten using lambda functions and a while loop for the retry process (The corresponding commit).

…check from the beginning whenever there is a change in the (de)activate request (but use goto)

codecov · 2024-06-26T10:25:54Z

Codecov Report

Attention: Patch coverage is 95.65217% with 4 lines in your changes missing coverage. Please review.

Project coverage is 87.35%. Comparing base (1f8ca08) to head (05e5f80).
Report is 15 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1591      +/-   ##
==========================================
+ Coverage   87.30%   87.35%   +0.04%     
==========================================
  Files         108      108              
  Lines        9866     9929      +63     
  Branches      890      893       +3     
==========================================
+ Hits         8614     8673      +59     
- Misses        929      932       +3     
- Partials      323      324       +1

Flag	Coverage Δ
unittests	`87.35% <95.65%> (+0.04%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files	Coverage Δ
...t_controllers_chaining_with_controller_manager.cpp	`99.27% <100.00%> (+0.04%)`	⬆️
controller_manager/src/controller_manager.cpp	`75.86% <92.00%> (+<0.01%)`	⬆️

... and 1 file with indirect coverage changes

mergify · 2024-07-05T09:12:04Z

This pull request is in conflict. Could you fix it @TakashiSato?

TakashiSato · 2024-07-05T13:59:10Z

I have fixed the conflict.

Due to the changes in the recently merged #1021, BUG3 no longer manifests.
Regarding BUG1 and BUG2, issues still persist.
For BUG3, although the issue no longer occurs, there are still abnormal requests in the from/to_chained_mode request up until just before the error check. Therefore, I believe the fix proposed in this PR remains desirable.

TakashiSato added 4 commits June 26, 2024 15:48

add test cases to reveal bugs in the switch_controller

a863277

to fix the bug in switch_controller, modify the process to retry the …

df56d4f

…check from the beginning whenever there is a change in the (de)activate request (but use goto)

change the bug fix implementation to not use goto

fcc1b96

remove TEST_BUG definition

005a79f

github-actions bot requested review from aprotyas, bijoua29, bmagyar, destogl, fmauch, olivier-stasse and progtologist June 26, 2024 09:52

This was referenced Jun 26, 2024

Controller restart by switch_controller (minimal implementation) #1592

Open

Controller restart by switch_controller #1568

Closed

TakashiSato added 4 commits July 5, 2024 21:40

merge master and fix conflict

f41a52f

fix broken test

84b7597

add additional chack to test

dd87de5

fix conflict

cdf96f8

TakashiSato added 2 commits July 9, 2024 21:56

Merge branch 'master' into feature/fix_switch_controller_bug

0393ca8

Merge branch 'master' into feature/fix_switch_controller_bug

05e5f80

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bug fix for switch_controller when using controller chaining #1591

Bug fix for switch_controller when using controller chaining #1591

TakashiSato commented Jun 26, 2024

codecov bot commented Jun 26, 2024 •

edited

Loading

mergify bot commented Jul 5, 2024

TakashiSato commented Jul 5, 2024

Bug fix for switch_controller when using controller chaining #1591

Are you sure you want to change the base?

Bug fix for switch_controller when using controller chaining #1591

Conversation

TakashiSato commented Jun 26, 2024

Explanation of the Bugs

BUG1

BUG2

BUG3

Bug Fix Proposal

codecov bot commented Jun 26, 2024 • edited Loading

Codecov Report

mergify bot commented Jul 5, 2024

TakashiSato commented Jul 5, 2024

codecov bot commented Jun 26, 2024 •

edited

Loading