Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug fix for switch_controller when using controller chaining #1591

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

TakashiSato
Copy link
Contributor

This PR is separated to address only the changes related to bug fix #1568 , fixing a bug that occurs under specific conditions when handling chainable controllers in the controller_manager's switch_controller.

Note: To focus on meaningful changes, it is recommended to compare the code using "hide whitespace" mode as follows.
https://github.com/ros-controls/ros2_control/compare/master...TakashiSato:ros2_control:feature/fix_switch_controller_bug?diff=split&w=1

Explanation of the Bugs

Using the typical model for controller chaining, which is also used in this test, the following explanations introduce three situations where bugs occur.

Note that these bugs occur only when the strictness value is set to BEST_EFFORT. When the strictness value is set to STRICT, switch_controller fails, and these bugs do not arise.
Additionally, the test cases to detect these bugs are implemented in this commit.

BUG1

  • Test case
  • Initial state
    • active: pid_left (not_chained), pid_right (not_chained)
    • inactive: position_tracking, diff_drive
  • Switch requests
    • start: diff_drive
    • stop: pid_right
  • Result:
    • active: pid_left (in_chained)
    • inactive: position_tracking, diff_drive, pid_right

In this situation, the start request for diff_drive should be rejected, and only the stop command for pid_right should be accepted. While this result is achieved, there is an issue where pid_left incorrectly enters in_chained mode.

The cause of this bug lies in this section.
When checking the activate request for diff_drive, the first following controller, pid_left, is checked for any inconsistencies along with the request. Since no issues are detected, pid_left is added to the to_chained_mode_request.
However, when the check is performed with the next controller, pid_right, inconsistencies are detected because pid_right is targeted for deactivation, leading to the rejection of the diff_drive activate request.
Since the to_chained_mode_request is not re-evaluated in subsequent processes, this request remains and results in the actual transition to chained mode during manage_switch.

BUG2

  • Test case
  • Initial state
    • active: --
    • inactive: position_tracking, diff_drive, pid_left, pid_right
  • Switch requests
    • start: diff_drive, pid_left
    • stop: --
  • Result:
    • active: pid_left (in_chained)
    • inactive: position_tracking, diff_drive, pid_right

This situation is similar to BUG1, where the start request for diff_drive is rejected, and only pid_left is activated. However, pid_left still incorrectly enters in_chained mode.
The cause of this bug is almost the same as BUG1. During the activate check for diff_drive, since the following controller pid_left is also a target for activation, pid_left is added to the to_chained_mode_request.
Then, during the subsequent check with pid_right, since pid_right is not a target for activation, the activate request for diff_drive is rejected, leaving the to_chained_mode_request for pid_left.

Incidentally, in this situation, the checks are performed in the order of pid_left followed by pid_right. Therefore, if pid_right and pid_left in the switch requests for BUG1 and BUG2 are swapped, the checks will function correctly, and these issues will not occur.

BUG3

  • Test case
  • Initial state
    • active: position_tracking(not_chained), diff_drive(in_chained), pid_left(in_chained), pid_right(in_chained)
    • inactive: --
  • Switch requests
    • start: --
    • stop: diff_drive
  • Result: switch controller will deadlock.

In this situation, since it is impossible to deactivate only diff_drive, the request should be rejected and no changes should occur.
However, the following switch requests are internally generated, and since these switches cannot be handled properly, a deadlock occurs.

  • Switch requests
    • start: pid_left, pid_right
    • stop: pid_left, pid_right

The sequence in which such switch requests are generated is as follows:

  1. Since diff_drive is a deactivate target, in propagate_deactivation_of_chained_mode, its following controllers, pid_left and pid_right, are added to the from_chained_mode_request.
  2. The activate request is empty, so check_following_controllers_for_activate does nothing. (Note: The process to erase from from_chained_mode_request exists only within this function.)
  3. In check_preceding_controllers_for_deactivate, the preceding controller of diff_drive, position_tracking, is active and not present in the deactivate_request, resulting in the rejection of the diff_drive deactivate request.
  4. Since pid_left and pid_right are included in the from_chained_mode_request and meet the conditions, they are added to the (de)activate_request as restart targets due to this process.

Bug Fix Proposal

In this part of the check process, propagate_deactivation_of_chained_mode, check_following_controllers_for_activate, and check_preceding_controllers_for_deactivate perform judgments based on the content of the (de)activate_request. Therefore, if the content of the (de)activate_request changes, the process should be retried from the beginning. However, during this process, it is also possible that the content of from/to_chained_mode_request has changed, so it is necessary to properly clear these as well.

This commit demonstrates the simplest change to fix the bug based on this idea.

However, since the above commit uses goto, which is generally not recommended, the final implementation in this PR has been rewritten using lambda functions and a while loop for the retry process (The corresponding commit).

Copy link

codecov bot commented Jun 26, 2024

Codecov Report

Attention: Patch coverage is 95.65217% with 4 lines in your changes missing coverage. Please review.

Project coverage is 87.35%. Comparing base (1f8ca08) to head (05e5f80).
Report is 15 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1591      +/-   ##
==========================================
+ Coverage   87.30%   87.35%   +0.04%     
==========================================
  Files         108      108              
  Lines        9866     9929      +63     
  Branches      890      893       +3     
==========================================
+ Hits         8614     8673      +59     
- Misses        929      932       +3     
- Partials      323      324       +1     
Flag Coverage Δ
unittests 87.35% <95.65%> (+0.04%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
...t_controllers_chaining_with_controller_manager.cpp 99.27% <100.00%> (+0.04%) ⬆️
controller_manager/src/controller_manager.cpp 75.86% <92.00%> (+<0.01%) ⬆️

... and 1 file with indirect coverage changes

Copy link
Contributor

mergify bot commented Jul 5, 2024

This pull request is in conflict. Could you fix it @TakashiSato?

@TakashiSato
Copy link
Contributor Author

I have fixed the conflict.

Due to the changes in the recently merged #1021, BUG3 no longer manifests.
Regarding BUG1 and BUG2, issues still persist.
For BUG3, although the issue no longer occurs, there are still abnormal requests in the from/to_chained_mode request up until just before the error check. Therefore, I believe the fix proposed in this PR remains desirable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant