
test(robot): migrate test_replica_auto_balance_node_least_effort #2163

Open
yangchiu wants to merge 2 commits into master from auto-balance-least-effort

Conversation

yangchiu (Member) commented Nov 13, 2024

Which issue(s) this PR fixes:

Issue longhorn/longhorn#9776

What this PR does / why we need it:

Migrate test_replica_auto_balance_node_least_effort from the Python integration test suite (manager/integration/tests/test_scheduling.py) to the Robot Framework e2e regression suite (e2e/tests/regression/test_scheduling.robot).

Special notes for your reviewer:

Additional documentation or context

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced keywords for validating volume replica counts, including checks for specific nodes and ensuring no additional scheduling occurs.
    • Added methods to wait for a specific number of replicas for a volume across different classes, enhancing volume management capabilities.
  • Tests

    • Implemented a new test case for validating replica auto-balancing behavior under specific conditions, ensuring proper distribution and health of volumes.
  • Chores

    • Added a new resource for node-related keywords to support the new test case.

@yangchiu yangchiu self-assigned this Nov 13, 2024
@yangchiu yangchiu requested a review from a team as a code owner November 13, 2024 05:14

coderabbitai bot commented Nov 13, 2024

Walkthrough

The pull request introduces new functionalities for managing volume replicas. It adds several keywords for validating the number of running replicas on specific nodes and a new method wait_for_replica_count across multiple classes, including volume_keywords, CRD, Rest, and Volume. Additionally, a new test case for validating replica auto-balancing behavior is included, ensuring that replicas are distributed correctly across nodes under specified conditions.

Changes

  • e2e/keywords/volume.resource: Added new keywords for validating volume replica counts and node-specific checks.
  • e2e/libs/keywords/volume_keywords.py: Introduced the wait_for_replica_count method in the volume_keywords class.
  • e2e/libs/volume/crd.py: Added the wait_for_replica_count method in the CRD class to check replica counts on nodes.
  • e2e/libs/volume/rest.py: Implemented the wait_for_replica_count method in the Rest class to monitor running replicas.
  • e2e/libs/volume/volume.py: Added the wait_for_replica_count method in the Volume class to wait for specific replica counts.
  • e2e/tests/regression/test_scheduling.robot: Introduced a new test case "Test Replica Auto Balance Node Least Effort" and added a resource for it.
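
To make the delegation described above concrete, here is a minimal, self-contained Python sketch of the call chain (volume_keywords -> Volume -> CRD -> Rest). The class and method names follow the walkthrough, but the bodies are simplified stand-ins: the real Rest implementation polls the Longhorn API (see the rest.py diff later in this review), while this sketch counts replicas from an injected snapshot so it can run on its own.

class Rest:
    def __init__(self, replica_snapshot):
        # Stand-in for the Longhorn API lookup used by the real implementation.
        self.replica_snapshot = replica_snapshot

    def wait_for_replica_count(self, volume_name, node_name=None, replica_count=None):
        return sum(
            1 for r in self.replica_snapshot
            if r["running"] and (not node_name or r["hostId"] == node_name)
        )


class CRD:
    # The CRD backend reuses the REST-based check, as in the PR.
    def __init__(self, rest):
        self.rest = rest

    def wait_for_replica_count(self, volume_name, node_name, replica_count):
        return self.rest.wait_for_replica_count(volume_name, node_name, replica_count)


class Volume:
    # Facade that forwards to the selected backend (CRD or Rest).
    def __init__(self, backend):
        self.volume = backend

    def wait_for_replica_count(self, volume_name, node_name, replica_count):
        return self.volume.wait_for_replica_count(volume_name, node_name, replica_count)


class volume_keywords:
    # Keyword layer that the Robot Framework resource files call into.
    def __init__(self, volume):
        self.volume = volume

    def wait_for_replica_count(self, volume_name, node_name=None, replica_count=None):
        return self.volume.wait_for_replica_count(volume_name, node_name, replica_count)


snapshot = [{"running": True, "hostId": "node-0"}, {"running": True, "hostId": "node-0"}]
keywords = volume_keywords(Volume(CRD(Rest(snapshot))))
print(keywords.wait_for_replica_count("vol-0", node_name="node-0", replica_count=2))  # -> 2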

Assessment against linked issues

Objective Addressed Explanation
Migrate test_scheduling.test_replica_auto_balance_node_least_effort (#9776)

Possibly related PRs

Suggested reviewers

  • chriscchien

Poem

🐇 In fields of data, we hop and play,
New keywords sprout, brightening the day.
With replicas dancing, in nodes they align,
Auto-balance whispers, all will be fine!
So let’s cheer for changes, both big and small,
For in our code garden, we nurture them all! 🌼


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 36aaa0b and 406ae36.

📒 Files selected for processing (2)
  • e2e/keywords/volume.resource (1 hunks)
  • e2e/libs/volume/crd.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • e2e/libs/volume/crd.py
🔇 Additional comments (4)
e2e/keywords/volume.resource (4)

241-244: LGTM: Basic replica count verification looks good!

The implementation correctly verifies the replica count using the wait_for_replica_count method.


245-252: LGTM: Node-specific replica count verification is well implemented!

The keyword correctly verifies the replica count on a specific node and properly sets test variables for reuse in subsequent keywords.


253-260: LGTM: Node-specific replica verification without count is correct!

The keyword correctly verifies replicas on a specific node without enforcing a specific count.


261-277: 🛠️ Refactor suggestion

Refactor scheduling verification keywords to improve maintainability

The scheduling verification keywords have duplicate code and use hardcoded values. Consider extracting the common logic into a reusable keyword.

Apply this refactor to improve maintainability and configurability:

+Verify No Additional Scheduling For Node
+    [Arguments]    ${volume_name}    ${node_name}    ${replica_count}    ${retry_count}=3    ${sleep_duration}=5
+    FOR    ${i}    IN RANGE    ${retry_count}
+        Log to console    Ensuring there's no additional scheduling for node ${node_name} ... (${i + 1}/${retry_count})
+        ${new_replica_count} =    wait_for_replica_count    ${volume_name}    node_name=${node_name}    replica_count=${replica_count}
+        Should Be Equal As Integers    ${replica_count}    ${new_replica_count}
+        Sleep    ${sleep_duration}
+    END

 Volume ${volume_id} should have ${expected_replica_count} replicas running on node ${node_id} and no additional scheduling occurs
     Volume ${volume_id} should have ${expected_replica_count} replicas running on node ${node_id}
-    FOR    ${i}    IN RANGE    3
-        Log to console    Ensuring there's no additional scheduling for node ${node_name} ... (${i})
-        ${new_replica_count} =    wait_for_replica_count    ${volume_name}    node_name=${node_name}    replica_count=${expected_replica_count}
-        Should Be Equal As Integers    ${replica_count}    ${new_replica_count}
-        Sleep    5
-    END
+    Verify No Additional Scheduling For Node    ${volume_name}    ${node_name}    ${expected_replica_count}

 Volume ${volume_id} should have replicas running on node ${node_id} and no additional scheduling occurs
     Volume ${volume_id} should have replicas running on node ${node_id}
-    FOR    ${i}    IN RANGE    3
-        Log to console    Ensuring there's no additional scheduling for node ${node_name} ... (${i})
-        ${new_replica_count} =    wait_for_replica_count    ${volume_name}    node_name=${node_name}
-        Should Be Equal As Integers    ${replica_count}    ${new_replica_count}
-        Sleep    5
-    END
+    Verify No Additional Scheduling For Node    ${volume_name}    ${node_name}    ${replica_count}

This refactor:

  1. Extracts common scheduling verification logic into a reusable keyword
  2. Makes retry count and sleep duration configurable
  3. Improves progress indication in log messages
  4. Reduces code duplication

Likely invalid or redundant comment.



@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (10)
e2e/tests/regression/test_scheduling.robot (3)

56-57: Add test documentation

Please add a [Documentation] section to describe:

  • Test purpose and objectives
  • Preconditions and assumptions
  • Expected behavior of replica auto-balancing
  • Significance of using 6 replicas

Example:

[Documentation]    Test replica auto-balancing with least-effort strategy.
...    
...    This test verifies that:
...    1. Replicas are correctly distributed when nodes become schedulable
...    2. Auto-balancing maintains the total replica count
...    3. No unnecessary scheduling occurs after balancing
...    
...    Uses 6 replicas to ensure multiple replicas can be balanced across nodes.

72-78: Move wait condition comments to keywords

The comments about loop behavior and wait conditions should be encapsulated in the keywords themselves rather than as inline comments.

Consider creating a new keyword that encapsulates this logic:

*** Keywords ***
Wait For Stable Replica Distribution
    [Arguments]    ${volume_name}    ${node}    ${expected_count}
    FOR    ${i}    IN RANGE    3
        Volume ${volume_name} should have ${expected_count} replicas running on node ${node}
        Sleep    5s
    END

63-66: Consider parameterizing the replica count

The value 6 for numberOfReplicas appears to be significant for this test. Consider making it a variable to improve maintainability and clarity.

*** Variables ***
${REPLICA_COUNT_FOR_BALANCE_TEST}    6    # Ensures multiple replicas per node for balance testing

*** Test Cases ***
    ...
    And Create volume 0 with    numberOfReplicas=${REPLICA_COUNT_FOR_BALANCE_TEST}    dataEngine=${DATA_ENGINE}
e2e/libs/volume/volume.py (1)

128-130: LGTM! Consider adding docstring documentation.

The new method follows the established pattern of delegating to the underlying implementation and aligns well with the class's existing functionality for replica management.

Consider adding a docstring to document the method's purpose and parameters:

 def wait_for_replica_count(self, volume_name, node_name, replica_count):
+    """Wait until the specified number of replicas are present on the given node.
+
+    Args:
+        volume_name (str): Name of the volume
+        node_name (str): Name of the node to check replicas on
+        replica_count (int): Expected number of replicas
+
+    Returns:
+        bool: True if the desired replica count is reached
+    """
     return self.volume.wait_for_replica_count(volume_name, node_name, replica_count)
e2e/libs/keywords/volume_keywords.py (1)

239-241: Add documentation and logging for consistency

The implementation looks good and follows the class patterns. However, for better maintainability and consistency with other methods in the class, consider these improvements:

 def wait_for_replica_count(self, volume_name, node_name=None, replica_count=None):
+    """Wait for the volume to have the specified number of replicas.
+
+    Args:
+        volume_name: Name of the volume to check
+        node_name: Optional node name to check replicas on a specific node
+        replica_count: Optional specific number of replicas to wait for
+
+    Returns:
+        Result from volume.wait_for_replica_count
+    """
+    logging(f'Waiting for volume {volume_name} to have {replica_count} replicas' + 
+           (f' on node {node_name}' if node_name else ''))
     return self.volume.wait_for_replica_count(volume_name, node_name, replica_count)
e2e/keywords/volume.resource (2)

234-237: Remove unused variable assignment

The ${replica_count} variable is assigned but never used. Consider removing the assignment for cleaner code.

-    ${replica_count} =    wait_for_replica_count    ${volume_name}    node_name=    replica_count=${expected_replica_count}
+    wait_for_replica_count    ${volume_name}    node_name=    replica_count=${expected_replica_count}

246-253: Consider refactoring to reduce code duplication

This keyword is very similar to Volume ${volume_id} should have ${expected_replica_count} replicas running on node ${node_id}. Consider refactoring to reuse logic:

Volume ${volume_id} should have replicas running on node ${node_id}
    ${volume_name} =    generate_name_with_suffix    volume    ${volume_id}
    ${node_name} =    get_node_by_index    ${node_id}
-    ${replica_count} =    wait_for_replica_count    ${volume_name}    node_name=${node_name}
-    Set Test Variable    ${volume_name}
-    Set Test Variable    ${node_name}
-    Set Test Variable    ${replica_count}
+    Volume ${volume_id} should have ${1} replicas running on node ${node_id}
e2e/libs/volume/rest.py (2)

234-249: Simplify conditional logic and add error handling

The implementation can be improved in two ways:

  1. Combine the conditional checks for better readability
  2. Add error handling for the volume client call

Apply this diff to improve the code:

 def wait_for_replica_count(self, volume_name, node_name, replica_count):
     for i in range(self.retry_count):
-        running_replica_count = 0
-        volume = get_longhorn_client().by_id_volume(volume_name)
-        for r in volume.replicas:
-            if node_name and r.hostId == node_name and r.running:
-                running_replica_count += 1
-            elif not node_name and r.running:
-                running_replica_count += 1
-        logging(f"Waiting for {replica_count if replica_count else ''} replicas for volume {volume_name} running on {node_name if node_name else 'nodes'}, currently it's {running_replica_count} ... ({i})")
-        if replica_count and running_replica_count == int(replica_count):
-            break
-        elif not replica_count and running_replica_count:
-            break
+        try:
+            running_replica_count = 0
+            volume = get_longhorn_client().by_id_volume(volume_name)
+            for r in volume.replicas:
+                if r.running and (not node_name or r.hostId == node_name):
+                    running_replica_count += 1
+            logging(f"Waiting for {replica_count if replica_count else ''} replicas for volume {volume_name} running on {node_name if node_name else 'nodes'}, currently it's {running_replica_count} ... ({i})")
+            if (replica_count and running_replica_count == int(replica_count)) or \
+               (not replica_count and running_replica_count):
+                break
+        except Exception as e:
+            logging(f"Failed to get volume {volume_name} with error: {e}")
         time.sleep(self.retry_interval)
     return running_replica_count
🧰 Tools
🪛 Ruff

239-242: Combine if branches using logical or operator

Combine if branches

(SIM114)


244-247: Combine if branches using logical or operator

Combine if branches

(SIM114)


Line range hint 1-332: Consider architectural improvements for better maintainability

The Rest class currently handles both volume operations and monitoring. Consider the following architectural improvements:

  1. Split the class into separate concerns:
    • VolumeOperations for basic CRUD operations
    • VolumeMonitor for monitoring and waiting operations
  2. Extract common retry mechanism into a decorator or base class

This would improve maintainability and make the code more modular.
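
To picture the second suggestion, here is a hedged sketch of a reusable retry decorator; the names retry_until, retry_count, and retry_interval are illustrative assumptions rather than existing framework code, and the monitor class below only inspects a plain dict for demonstration.

import functools
import time


def retry_until(retry_count=150, retry_interval=1):
    # Re-run the wrapped check until it returns a truthy value or attempts run out.
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = None
            for _ in range(retry_count):
                result = func(*args, **kwargs)
                if result:
                    return result
                time.sleep(retry_interval)
            return result
        return wrapper
    return decorator


class VolumeMonitor:
    # Sketch of the monitoring half of the suggested VolumeOperations/VolumeMonitor split.
    @retry_until(retry_count=3, retry_interval=0.1)
    def is_volume_healthy(self, volume):
        return volume.get("robustness") == "healthy"


print(VolumeMonitor().is_volume_healthy({"robustness": "healthy"}))  # -> True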

🧰 Tools
🪛 Ruff

239-242: Combine if branches using logical or operator

Combine if branches

(SIM114)


244-247: Combine if branches using logical or operator

Combine if branches

(SIM114)

e2e/libs/volume/crd.py (1)

267-269: Add documentation and type hints.

The implementation looks good and follows the established delegation pattern. Consider adding documentation and type hints for better maintainability:

-    def wait_for_replica_count(self, volume_name, node_name, replica_count):
+    def wait_for_replica_count(self, volume_name: str, node_name: str, replica_count: int) -> bool:
+        """Wait until the specified number of replicas are running on the given node.
+
+        Args:
+            volume_name: Name of the volume to check
+            node_name: Name of the node to check
+            replica_count: Expected number of replicas
+
+        Returns:
+            bool: True if the replica count matches, False otherwise
+        """
         return Rest().wait_for_replica_count(volume_name, node_name, replica_count)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between a360760 and 2053fd1.

📒 Files selected for processing (6)
  • e2e/keywords/volume.resource (1 hunks)
  • e2e/libs/keywords/volume_keywords.py (1 hunks)
  • e2e/libs/volume/crd.py (1 hunks)
  • e2e/libs/volume/rest.py (1 hunks)
  • e2e/libs/volume/volume.py (1 hunks)
  • e2e/tests/regression/test_scheduling.robot (2 hunks)
🧰 Additional context used
🪛 Ruff
e2e/libs/volume/rest.py

239-242: Combine if branches using logical or operator

Combine if branches

(SIM114)


244-247: Combine if branches using logical or operator

Combine if branches

(SIM114)

🔇 Additional comments (4)
e2e/tests/regression/test_scheduling.robot (1)

13-13: LGTM: Resource addition is appropriate

The addition of node.resource aligns with the test's requirements for node management operations.

e2e/libs/keywords/volume_keywords.py (1)

239-241: Implementation aligns well with test migration needs

The method provides the necessary functionality for the migrated test case test_replica_auto_balance_node_least_effort to verify replica counts and distribution during auto-balancing scenarios.

e2e/keywords/volume.resource (1)

238-245: LGTM! Well-structured keyword with proper variable management

The implementation correctly validates node-specific replica count and sets test variables for reuse.

e2e/libs/volume/rest.py (1)

Line range hint 251-332: LGTM! Enhanced logging improves observability

The added logging statements improve the observability of the replica rebuilding process, making it easier to track progress and debug issues.

🧰 Tools
🪛 Ruff

239-242: Combine if branches using logical or operator

Combine if branches

(SIM114)


244-247: Combine if branches using logical or operator

Combine if branches

(SIM114)

Comment on lines +73 to +75
Then Volume 0 should have replicas running on node 1
And Volume 0 should have 6 replicas running

🛠️ Refactor suggestion

Add timeout parameters to wait conditions

The test should include explicit timeout parameters for waiting for replica distribution to ensure test stability.

Then Volume 0 should have replicas running on node 1    timeout=300
And Volume 0 should have ${REPLICA_COUNT_FOR_BALANCE_TEST} replicas running    timeout=300

Also applies to: 84-85

Comment on lines 58 to 95
Given Set setting replica-soft-anti-affinity to true
And Set setting replica-auto-balance to least-effort

When Disable node 1 scheduling
And Disable node 2 scheduling
And Create volume 0 with    numberOfReplicas=6    dataEngine=${DATA_ENGINE}
And Attach volume 0
And Wait for volume 0 healthy
And Write data to volume 0
Then Volume 0 should have 6 replicas running on node 0
And Volume 0 should have 0 replicas running on node 1
And Volume 0 should have 0 replicas running on node 2

When Enable node 1 scheduling
# wait for auto balance
Then Volume 0 should have replicas running on node 1
And Volume 0 should have 6 replicas running
# loop 3 times with 5-second wait and compare the replica count to:
# ensure no additional scheduling occurs
# the replica count remains unchanged
And Volume 0 should have replicas running on node 0 and no additional scheduling occurs
And Volume 0 should have replicas running on node 1 and no additional scheduling occurs
And Volume 0 should have 0 replicas running on node 2 and no additional scheduling occurs

When Enable node 2 scheduling
# wait for auto balance
Then Volume 0 should have replicas running on node 2
And Volume 0 should have 6 replicas running
# loop 3 times with 5-second wait and compare the replica count to:
# ensure no additional scheduling occurs
# the replica count remains unchanged
And Volume 0 should have replicas running on node 0 and no additional scheduling occurs
And Volume 0 should have replicas running on node 1 and no additional scheduling occurs
And Volume 0 should have replicas running on node 2 and no additional scheduling occurs

And Wait for volume 0 healthy
And Check volume 0 data is intact

💡 Codebase verification

Missing Test Coverage for Edge Cases
The following edge cases are not covered by the existing tests:

  • Node becomes unschedulable during balancing
  • Volume operations during balancing
  • Multiple volumes being balanced simultaneously
🔗 Analysis chain

Verify test coverage for edge cases

The test covers the happy path but should verify additional scenarios:

  1. Node becomes unschedulable during balancing
  2. Volume operations during balancing
  3. Multiple volumes being balanced simultaneously
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for existing test coverage of these scenarios
rg -l "Test.*Auto.*Balance" "e2e/tests"

# Look for related test implementations
ast-grep --pattern 'Test.*Auto.*Balance.*\n.*\[Documentation\].*\n.*' 

Length of output: 154


Script:

#!/bin/bash
# Check for tests where a node becomes unschedulable during balancing
rg -l "unschedulable.*balancing" "e2e/tests"

# Check for tests involving volume operations during balancing
rg -l "volume.*operations.*balancing" "e2e/tests"

# Check for tests handling multiple volumes being balanced simultaneously
rg -l "multiple.*volumes.*balancing" "e2e/tests"

# Look for related test implementations with documentation
ast-grep --pattern 'Test.*unschedulable.*balancing.*\n.*\[Documentation\].*\n.*' 
ast-grep --pattern 'Test.*volume.*operations.*balancing.*\n.*\[Documentation\].*\n.*' 
ast-grep --pattern 'Test.*multiple.*volumes.*balancing.*\n.*\[Documentation\].*\n.*' 

Length of output: 401


Script:

#!/bin/bash
# Broaden search for tests where a node becomes unschedulable during balancing
rg -il "node.*unschedulable.*balance" "e2e/tests"

# Broaden search for tests involving volume operations during balancing
rg -il "volume.*operation.*balance" "e2e/tests"

# Broaden search for tests handling multiple volumes being balanced simultaneously
rg -il "multiple.*volume.*balance" "e2e/tests"

# Look for any tests related to node scheduling changes with documentation
ast-grep --pattern 'Test.*node.*scheduling.*\n.*\[Documentation\].*\n.*' 

# Look for any tests related to volume operations with documentation
ast-grep --pattern 'Test.*volume.*operation.*\n.*\[Documentation\].*\n.*' 

# Look for any tests related to multiple volumes with documentation
ast-grep --pattern 'Test.*multiple.*volumes.*\n.*\[Documentation\].*\n.*' 

Length of output: 371

Comment on lines +263 to +270
Volume ${volume_id} should have replicas running on node ${node_id} and no additional scheduling occurs
    Volume ${volume_id} should have replicas running on node ${node_id}
    FOR    ${i}    IN RANGE    3
        Log to console    Ensuring there's no additional scheduling for node ${node_name} ... (${i})
        ${new_replica_count} =    wait_for_replica_count    ${volume_name}    node_name=${node_name}
        Should Be Equal As Integers    ${replica_count}    ${new_replica_count}
        Sleep    5
    END

🛠️ Refactor suggestion

Refactor to reduce duplication and improve robustness

This keyword duplicates most of the logic from the previous keyword. Consider refactoring both keywords to share common logic and apply the same improvements regarding configurable retry counts and sleep duration.

Volume ${volume_id} should have replicas running on node ${node_id} and no additional scheduling occurs
    Volume ${volume_id} should have replicas running on node ${node_id}
-    FOR    ${i}    IN RANGE    3
-        Log to console    Ensuring there's no additional scheduling for node ${node_name} ... (${i})
+    ${retry_count} =    Set Variable    3
+    ${sleep_duration} =    Set Variable    5
+    FOR    ${i}    IN RANGE    ${retry_count}
+        Log to console    Ensuring there's no additional scheduling for node ${node_name} ... (${i + 1}/${retry_count})
         ${new_replica_count} =    wait_for_replica_count    ${volume_name}    node_name=${node_name}
         Should Be Equal As Integers    ${replica_count}    ${new_replica_count}
-        Sleep    5
+        Sleep    ${sleep_duration}
     END

Consider creating a new keyword to handle the scheduling verification:

Verify No Additional Scheduling For Volume ${volume_id} On Node ${node_id} With Count ${expected_count}
    [Arguments]    ${retry_count}=3    ${sleep_duration}=5
    ${node_name} =    get_node_by_index    ${node_id}
    FOR    ${i}    IN RANGE    ${retry_count}
        Log to console    Ensuring there's no additional scheduling for node ${node_name} ... (${i + 1}/${retry_count})
        ${new_replica_count} =    wait_for_replica_count    ${volume_name}    node_name=${node_name}    replica_count=${expected_count}
        Should Be Equal As Integers    ${replica_count}    ${new_replica_count}
        Sleep    ${sleep_duration}
    END

Comment on lines +254 to +261
Volume ${volume_id} should have ${expected_replica_count} replicas running on node ${node_id} and no additional scheduling occurs
    Volume ${volume_id} should have ${expected_replica_count} replicas running on node ${node_id}
    FOR    ${i}    IN RANGE    3
        Log to console    Ensuring there's no additional scheduling for node ${node_name} ... (${i})
        ${new_replica_count} =    wait_for_replica_count    ${volume_name}    node_name=${node_name}    replica_count=${expected_replica_count}
        Should Be Equal As Integers    ${replica_count}    ${new_replica_count}
        Sleep    5
    END

🛠️ Refactor suggestion

Improve robustness of scheduling verification

A few suggestions to improve this keyword:

  1. The iteration count (3) and sleep duration (5) are hardcoded. Consider making these configurable.
  2. The logging message uses ${node_name} which might not be set if the first keyword fails.
Volume ${volume_id} should have ${expected_replica_count} replicas running on node ${node_id} and no additional scheduling occurs
    Volume ${volume_id} should have ${expected_replica_count} replicas running on node ${node_id}
-    FOR    ${i}    IN RANGE    3
-        Log to console    Ensuring there's no additional scheduling for node ${node_name} ... (${i})
+    ${node_name} =    get_node_by_index    ${node_id}    # Set node_name for logging
+    ${retry_count} =    Set Variable    3
+    ${sleep_duration} =    Set Variable    5
+    FOR    ${i}    IN RANGE    ${retry_count}
+        Log to console    Ensuring there's no additional scheduling for node ${node_name} ... (${i + 1}/${retry_count})
         ${new_replica_count} =    wait_for_replica_count    ${volume_name}    node_name=${node_name}    replica_count=${expected_replica_count}
         Should Be Equal As Integers    ${replica_count}    ${new_replica_count}
-        Sleep    5
+        Sleep    ${sleep_duration}
     END

Committable suggestion skipped: line range outside the PR's diff.

@@ -51,3 +52,43 @@ Test Soft Anti Affinity Scheduling
Then Wait until volume 0 replicas rebuilding completed
And Wait for volume 0 healthy
And Check volume 0 data is intact

Test Replica Auto Balance Node Least Effort
Collaborator

Can we add a brief description?

Member Author

Added.

Comment on lines 78 to 80
And Volume 0 should have replicas running on node 0 and no additional scheduling occurs
And Volume 0 should have replicas running on node 1 and no additional scheduling occurs
Collaborator
Should we also verify that 3 replicas are running on node 0 and node 1 as part of this check?

Member Author

Added.

Comment on lines 89 to 92
And Volume 0 should have replicas running on node 0 and no additional scheduling occurs
And Volume 0 should have replicas running on node 1 and no additional scheduling occurs
And Volume 0 should have replicas running on node 2 and no additional scheduling occurs
Collaborator

Same as above comment, including the replica count.

Member Author

Added.


mergify bot commented Nov 14, 2024

This pull request is now in conflict. Could you fix it @yangchiu? 🙏

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (4)
e2e/tests/regression/test_scheduling.robot (2)

56-58: Enhance test case documentation

The documentation should include:

  • Purpose of testing replica auto-balance with least-effort strategy
  • Expected behavior of replica distribution
  • Key test steps and assertions

Example documentation:

[Documentation]    Test replica auto-balance behavior with least-effort strategy
...    
...    This test verifies that replicas are automatically balanced across nodes when:
...    1. Nodes become available for scheduling
...    2. replica-auto-balance is set to least-effort
...    3. replica-soft-anti-affinity is enabled
...    
...    Expected behavior:
...    - Initially all replicas should be on node 0 (only schedulable node)
...    - When node 1 becomes schedulable, replicas should be balanced between node 0 and 1
...    - When node 2 becomes schedulable, replicas should be balanced across all nodes
...    - Total replica count should remain constant during balancing

74-82: Reduce code duplication with a helper keyword

The verification blocks are duplicated. Consider creating a helper keyword to improve maintainability.

Example:

*** Keywords ***
Verify Replica Distribution
    [Arguments]    ${volume_name}    ${expected_nodes}    ${timeout}=300
    FOR    ${node}    IN    @{expected_nodes}
        Volume ${volume_name} should have replicas running on node ${node}    timeout=${timeout}
        Volume ${volume_name} should have replicas running on node ${node} and no additional scheduling occurs
    END
    Volume ${volume_name} should have 6 replicas running    timeout=${timeout}

Usage:

Then Verify Replica Distribution    0    ${expected_nodes}=['0', '1']
...
Then Verify Replica Distribution    0    ${expected_nodes}=['0', '1', '2']

Also applies to: 84-92

e2e/libs/volume/rest.py (2)

220-236: Add docstring documentation for the new method.

The method lacks documentation describing its parameters, return value, and behavior. This is especially important for test automation code.

Add a docstring like this:

 def wait_for_replica_count(self, volume_name, node_name, replica_count):
+    """Wait for the specified number of running replicas for a volume.
+
+    Args:
+        volume_name: Name of the volume to check
+        node_name: If provided, only count replicas on this node
+        replica_count: If provided, wait for this exact count, otherwise wait for any running replica
+
+    Returns:
+        int: The final count of running replicas
+    """
🧰 Tools
🪛 Ruff

225-228: Combine if branches using logical or operator

Combine if branches

(SIM114)


230-233: Combine if branches using logical or operator

Combine if branches

(SIM114)


225-228: Simplify replica counting conditions.

The conditional logic for counting replicas can be simplified using a logical OR operator.

-                if node_name and r.hostId == node_name and r.running:
-                    running_replica_count += 1
-                elif not node_name and r.running:
-                    running_replica_count += 1
+                if r.running and (not node_name or r.hostId == node_name):
+                    running_replica_count += 1
🧰 Tools
🪛 Ruff

225-228: Combine if branches using logical or operator

Combine if branches

(SIM114)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 2053fd1 and 433fd33.

📒 Files selected for processing (6)
  • e2e/keywords/volume.resource (1 hunks)
  • e2e/libs/keywords/volume_keywords.py (1 hunks)
  • e2e/libs/volume/crd.py (1 hunks)
  • e2e/libs/volume/rest.py (1 hunks)
  • e2e/libs/volume/volume.py (1 hunks)
  • e2e/tests/regression/test_scheduling.robot (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • e2e/libs/keywords/volume_keywords.py
  • e2e/libs/volume/crd.py
  • e2e/libs/volume/volume.py
🧰 Additional context used
🪛 Ruff
e2e/libs/volume/rest.py

225-228: Combine if branches using logical or operator

Combine if branches

(SIM114)


230-233: Combine if branches using logical or operator

Combine if branches

(SIM114)

🔇 Additional comments (10)
e2e/tests/regression/test_scheduling.robot (4)

13-13: LGTM!

The node resource import is properly added and aligns with the test requirements.


62-71: Add timeout parameters to wait conditions

For test stability, explicit timeout parameters should be added to the wait conditions.

Apply this pattern to the following lines:

And Wait for volume 0 healthy    timeout=300
Then Volume 0 should have 6 replicas running on node 0    timeout=300
And Volume 0 should have 0 replicas running on node 1    timeout=300
And Volume 0 should have 0 replicas running on node 2    timeout=300

94-95: LGTM!

Good practice to verify volume health and data integrity after balancing operations.


59-61: Verify settings persistence

The test relies on these settings. Let's ensure they persist throughout the test.

✅ Verification successful

Based on the search results, I can see that these settings are used in two test files:

  1. test_scheduling.robot: Sets both settings as part of the test setup
  2. node_reboot.robot: Sets replica-soft-anti-affinity to false

Let me verify if there are any teardown or cleanup steps for these settings.


Let me check what's included in the Cleanup test resources keyword since both test files use it.


Based on the investigation:

  1. Both test files use Cleanup test resources in their teardown
  2. The cleanup function doesn't explicitly reset these settings
  3. The setting.resource file shows no default value restoration
  4. The settings are modified in different tests with different values

Let me check if there are any default settings or global setup that might handle this.


I notice there's a reset_settings() method in the Python code. Let me check if and where it's used, and also look at the Setting class implementation.


Settings are properly managed through global reset mechanism

The investigation reveals that settings persistence is properly handled:

  1. The reset_settings() method in Setting class systematically resets all non-readonly settings to their default values
  2. This reset is available through the Cleanup test resources keyword which is used in the test teardown for both files
  3. The settings replica-soft-anti-affinity and replica-auto-balance are not in the exclusion list of reset_settings()
  4. Different test files setting different values won't interfere as each test has its own setup and teardown
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if there are any other tests that might modify these settings
rg -B 2 -A 2 "Set setting (replica-soft-anti-affinity|replica-auto-balance)" "e2e/tests"

# Check if settings are reset in teardown
rg "Set setting.*(replica-soft-anti-affinity|replica-auto-balance)" "e2e/keywords/common.resource"

Length of output: 1612


Script:

#!/bin/bash
# Check for teardown or cleanup sections in these files
rg -B 5 -A 5 "^\[Teardown\]|Test Teardown|Suite Teardown" "e2e/tests/regression/test_scheduling.robot" "e2e/tests/negative/node_reboot.robot"

# Check for any global teardown in robot framework resource files
fd -e resource -x rg -B 5 -A 5 "^\[Teardown\]|Test Teardown|Suite Teardown" {}

Length of output: 1663


Script:

#!/bin/bash
# Search for the definition of Cleanup test resources
rg -B 5 -A 10 "Cleanup test resources" "e2e/keywords/"

# Also check setting.resource for any default settings restoration
fd "setting.resource" -x cat {}

Length of output: 1493


Script:

#!/bin/bash
# Check for any global setup or default settings
rg -B 5 -A 5 "Set test environment" "e2e/keywords/"

# Check Python implementation for any default settings
fd "setting_keywords.py" -x cat {}

Length of output: 1139


Script:

#!/bin/bash
# Check for usage of reset_settings
rg "reset_settings" "e2e/"

# Look for the Setting class implementation
fd "setting.py" -x cat {}

Length of output: 4992

e2e/libs/volume/rest.py (1)

220-236: Verify integration with replica auto-balancing tests.

The new method is part of migrating replica auto-balancing tests. Let's verify its usage in test cases.

✅ Verification successful

Integration with replica auto-balancing tests verified successfully

The wait_for_replica_count method is properly integrated with the auto-balancing test migration:

  1. It's part of the commit 433fd33 that migrates the test_replica_auto_balance_node_least_effort test
  2. The method is implemented across the volume handling layers (rest.py, crd.py, volume.py) and is used in Robot Framework tests
  3. The test_scheduling.robot file contains the auto-balance test that uses this functionality
  4. The implementation supports both specific node counting (when node_name is provided) and total replica counting (when node_name is None), which is essential for auto-balancing verification

The method is correctly implemented and integrated for its intended use in replica auto-balancing tests.
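
As a hedged illustration of the dual counting behavior noted in point 4 above, the core rule can be reduced to a small pure function; the replica dicts here are simplified stand-ins for the Longhorn API objects, not the PR's actual data structures.

def count_running_replicas(replicas, node_name=None):
    # With node_name given, only running replicas on that node count;
    # with node_name None, every running replica counts toward the total.
    return sum(1 for r in replicas
               if r["running"] and (not node_name or r["hostId"] == node_name))


replicas = [
    {"running": True, "hostId": "node-0"},
    {"running": True, "hostId": "node-1"},
    {"running": False, "hostId": "node-1"},
]
print(count_running_replicas(replicas, node_name="node-1"))  # -> 1 (node-specific mode)
print(count_running_replicas(replicas))                      # -> 2 (total running replicas)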

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Find test files using this method and verify integration
# Expected: Find usage in auto-balancing test files

# Search for test files using wait_for_replica_count
echo "Searching for test files using wait_for_replica_count..."
rg -l "wait_for_replica_count.*node.*replica" --type python

# Search for auto-balance test files
echo "Searching for auto-balance test files..."
rg -l "test_replica_auto_balance" --type python

Length of output: 385


Script:

#!/bin/bash
# Let's try without the --type flag since python type is not recognized
# Search for test files using wait_for_replica_count
echo "Searching for test files using wait_for_replica_count..."
rg -l "wait_for_replica_count.*node.*replica"

# Search for auto-balance test files
echo "Searching for auto-balance test files..."
rg -l "test_replica_auto_balance"

# Also search for the class definition to understand the context
echo "Searching for the class containing wait_for_replica_count..."
rg -B5 "def wait_for_replica_count"

Length of output: 2850


Script:

#!/bin/bash
# Let's check how this method is used in the auto-balance test files
echo "Checking usage in test_zone.py..."
rg -B5 -A5 "wait_for_replica_count" manager/integration/tests/test_zone.py

echo "Checking usage in test_scheduling.py..."
rg -B5 -A5 "wait_for_replica_count" manager/integration/tests/test_scheduling.py

# Let's also check if there are any recent changes or additions related to auto-balancing
echo "Checking recent git history for auto-balance related changes..."
git log --grep="auto.balance\|autobalance" --since="3 months ago" --pretty=format:"%h %s"

Length of output: 623


Script:

#!/bin/bash
# Let's examine the commit that migrated the auto-balance test
echo "Checking the auto-balance test migration commit..."
git show 433fd33 --name-status

# Let's also look for Robot Framework test files that might be using this method
echo "Checking Robot Framework test files for auto-balance tests..."
rg -l "Test Replica Auto Balance" --type robot || true

# And check the actual usage in Robot Framework files
echo "Checking usage of wait_for_replica_count in Robot files..."
rg -B5 -A5 "Wait For Replica Count" --type robot || true

Length of output: 999


Script:

#!/bin/bash
# Let's examine the test_scheduling.robot file for the auto-balance test
echo "Checking test_scheduling.robot for auto-balance test..."
rg -B10 -A10 "Test Replica Auto Balance" e2e/tests/regression/test_scheduling.robot

# Let's also check the implementation in the modified files
echo "Checking volume.resource file..."
cat e2e/keywords/volume.resource

Length of output: 18193

🧰 Tools
🪛 Ruff

225-228: Combine if branches using logical or operator

Combine if branches

(SIM114)


230-233: Combine if branches using logical or operator

Combine if branches

(SIM114)

e2e/keywords/volume.resource (5)

222-225: LGTM!

The implementation is correct and provides a clear way to validate the total replica count for a volume.


226-233: LGTM!

The implementation correctly validates node-specific replica counts and properly sets test variables for reuse in subsequent keywords.


234-241: LGTM!

The implementation correctly validates the presence of replicas on a specific node without requiring an exact count.


242-249: 🛠️ Refactor suggestion

Consider refactoring to improve robustness and reduce duplication

The implementation could be improved by:

  1. Making retry count and sleep duration configurable
  2. Adding iteration count to logging message
  3. Creating a shared keyword for scheduling verification

Apply this diff to improve the implementation:

Volume ${volume_id} should have ${expected_replica_count} replicas running on node ${node_id} and no additional scheduling occurs
    Volume ${volume_id} should have ${expected_replica_count} replicas running on node ${node_id}
-    FOR    ${i}    IN RANGE    3
-        Log to console    Ensuring there's no additional scheduling for node ${node_name} ... (${i})
+    ${retry_count} =    Set Variable    3
+    ${sleep_duration} =    Set Variable    5
+    FOR    ${i}    IN RANGE    ${retry_count}
+        Log to console    Ensuring there's no additional scheduling for node ${node_name} ... (${i + 1}/${retry_count})
         ${new_replica_count} =    wait_for_replica_count    ${volume_name}    node_name=${node_name}    replica_count=${expected_replica_count}
         Should Be Equal As Integers    ${replica_count}    ${new_replica_count}
-        Sleep    5
+        Sleep    ${sleep_duration}
     END

Consider creating a new shared keyword for scheduling verification:

Verify No Additional Scheduling
    [Arguments]    ${volume_name}    ${node_name}    ${expected_count}    ${retry_count}=3    ${sleep_duration}=5
    FOR    ${i}    IN RANGE    ${retry_count}
        Log to console    Ensuring there's no additional scheduling for node ${node_name} ... (${i + 1}/${retry_count})
        ${new_replica_count} =    wait_for_replica_count    ${volume_name}    node_name=${node_name}    replica_count=${expected_count}
        Should Be Equal As Integers    ${expected_count}    ${new_replica_count}
        Sleep    ${sleep_duration}
    END

251-258: 🛠️ Refactor suggestion

Refactor to reduce code duplication

This keyword is nearly identical to the previous one. Consider using the same shared keyword for scheduling verification to reduce duplication.

Apply this diff:

Volume ${volume_id} should have replicas running on node ${node_id} and no additional scheduling occurs
    Volume ${volume_id} should have replicas running on node ${node_id}
-    FOR    ${i}    IN RANGE    3
-        Log to console    Ensuring there's no additional scheduling for node ${node_name} ... (${i})
-        ${new_replica_count} =    wait_for_replica_count    ${volume_name}    node_name=${node_name}
-        Should Be Equal As Integers    ${replica_count}    ${new_replica_count}
-        Sleep    5
-    END
+    Verify No Additional Scheduling    ${volume_name}    ${node_name}    ${replica_count}

Comment on lines +230 to +248
            if replica_count and running_replica_count == int(replica_count):
                break
            elif not replica_count and running_replica_count:
                break
            time.sleep(self.retry_interval)

⚠️ Potential issue

Add assertion for expected replica count.

The method should assert that the expected count is met before returning. Currently, it silently returns even if the count doesn't match expectations.

             if replica_count and running_replica_count == int(replica_count):
+                logging(f"Successfully found {running_replica_count} running replicas")
                 break
             elif not replica_count and running_replica_count:
+                logging(f"Successfully found {running_replica_count} running replicas")
                 break
             time.sleep(self.retry_interval)
+        if replica_count:
+            assert running_replica_count == int(replica_count), \
+                f"Expected {replica_count} running replicas but got {running_replica_count}"
         return running_replica_count

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 Ruff

230-233: Combine if branches using logical or operator

Combine if branches

(SIM114)

@yangchiu yangchiu force-pushed the auto-balance-least-effort branch 2 times, most recently from 9bdeade to a429022 on November 14, 2024 00:31
Signed-off-by: Yang Chiu <yang.chiu@suse.com>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (4)
e2e/tests/regression/test_scheduling.robot (3)

58-60: Enhance test case documentation.

While the scenario is mentioned, the documentation could be more comprehensive by including:

  • Prerequisites (e.g., required node count, initial state)
  • Expected behavior of replica auto-balancing with least-effort setting
  • Validation steps and expected outcomes at each stage

Example documentation structure:

[Documentation]    Test replica auto-balance behavior with least-effort setting
...    Prerequisites:
...    - At least 3 nodes available
...    - Initial state: All nodes schedulable
...    
...    Expected Behavior:
...    - With least-effort, replicas should gradually redistribute when nodes become schedulable
...    - Distribution should maintain volume health while minimizing replica movement
...    
...    Steps:
...    1. Configure soft anti-affinity and least-effort auto-balance
...    2. Disable scheduling on nodes 1 & 2
...    3. Create and verify volume with 6 replicas on node 0
...    4. Enable node scheduling one by one
...    5. Verify gradual replica redistribution
...    6. Ensure data integrity throughout the process

64-68: Add volume creation verification steps.

Consider adding explicit verification steps after volume creation:

 And Create volume 0 with    numberOfReplicas=6    dataEngine=${DATA_ENGINE}
+And Wait for volume 0 state running
 And Attach volume 0
 And Wait for volume 0 healthy
 And Write data to volume 0

77-78: Consolidate duplicate comments.

The same comments are repeated in two places. Consider moving these to a [Documentation] section or creating a custom keyword that encapsulates this behavior.

Example:

*** Keywords ***
Verify No Additional Scheduling Occurs
    [Arguments]    ${volume_index}    ${node_index}    ${expected_count}
    [Documentation]    Verifies that the replica count remains stable
    ...    by checking multiple times with delays
    Volume ${volume_index} should have ${expected_count} replicas running on node ${node_index} and no additional scheduling occurs

Also applies to: 88-89

e2e/libs/volume/rest.py (1)

239-242: Simplify conditional logic using logical operators.

The nested if conditions can be simplified for better readability.

-                if node_name and r.hostId == node_name and r.running:
-                    running_replica_count += 1
-                elif not node_name and r.running:
-                    running_replica_count += 1
+                if r.running and (not node_name or r.hostId == node_name):
+                    running_replica_count += 1
🧰 Tools
🪛 Ruff

239-242: Combine if branches using logical or operator

Combine if branches

(SIM114)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 433fd33 and 36aaa0b.

📒 Files selected for processing (6)
  • e2e/keywords/volume.resource (1 hunks)
  • e2e/libs/keywords/volume_keywords.py (1 hunks)
  • e2e/libs/volume/crd.py (1 hunks)
  • e2e/libs/volume/rest.py (1 hunks)
  • e2e/libs/volume/volume.py (1 hunks)
  • e2e/tests/regression/test_scheduling.robot (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • e2e/libs/keywords/volume_keywords.py
  • e2e/libs/volume/crd.py
  • e2e/libs/volume/volume.py
🧰 Additional context used
🪛 Ruff
e2e/libs/volume/rest.py

239-242: Combine if branches using logical or operator

Combine if branches

(SIM114)


244-247: Combine if branches using logical or operator

Combine if branches

(SIM114)

🔇 Additional comments (5)
e2e/tests/regression/test_scheduling.robot (1)

13-13: LGTM!

The addition of node.resource follows the existing import convention and is required for the node-related operations in the new test case.

e2e/keywords/volume.resource (3)

234-237: LGTM: Clean and focused implementation

The keyword provides a clear way to validate the total replica count for a volume.


238-245: LGTM: Well-structured with proper variable management

The keyword correctly validates node-specific replica counts and sets test variables for reuse in subsequent keywords.


246-253: LGTM: Consistent with established patterns

The keyword maintains consistency with the previous implementation while providing flexibility for unspecified replica counts.

e2e/libs/volume/rest.py (1)

Line range hint 251-319: LGTM! Improved logging for better observability.

The added logging statements enhance the ability to track replica rebuilding progress without changing the core functionality.

🧰 Tools
🪛 Ruff

239-242: Combine if branches using logical or operator

Combine if branches

(SIM114)


244-247: Combine if branches using logical or operator

Combine if branches

(SIM114)

Comment on lines +83 to +86
When Enable node 2 scheduling
# wait for auto balance
Then Volume 0 should have replicas running on node 2
And Volume 0 should have 6 replicas running

🛠️ Refactor suggestion

Add delay after enabling scheduling.

There might be a race condition between enabling scheduling and checking replica distribution. Consider adding a small delay or wait condition.

 When Enable node 2 scheduling
+And Sleep    5s    # Allow scheduling to take effect
 # wait for auto balance
 Then Volume 0 should have replicas running on node 2
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
When Enable node 2 scheduling
And Sleep    5s    # Allow scheduling to take effect
# wait for auto balance
Then Volume 0 should have replicas running on node 2
And Volume 0 should have 6 replicas running

Comment on lines +254 to +270
Volume ${volume_id} should have ${expected_replica_count} replicas running on node ${node_id} and no additional scheduling occurs
    Volume ${volume_id} should have ${expected_replica_count} replicas running on node ${node_id}
    FOR    ${i}    IN RANGE    3
        Log to console    Ensuring there's no additional scheduling for node ${node_name} ... (${i})
        ${new_replica_count} =    wait_for_replica_count    ${volume_name}    node_name=${node_name}    replica_count=${expected_replica_count}
        Should Be Equal As Integers    ${replica_count}    ${new_replica_count}
        Sleep    5
    END

Volume ${volume_id} should have replicas running on node ${node_id} and no additional scheduling occurs
    Volume ${volume_id} should have replicas running on node ${node_id}
    FOR    ${i}    IN RANGE    3
        Log to console    Ensuring there's no additional scheduling for node ${node_name} ... (${i})
        ${new_replica_count} =    wait_for_replica_count    ${volume_name}    node_name=${node_name}
        Should Be Equal As Integers    ${replica_count}    ${new_replica_count}
        Sleep    5
    END

🛠️ Refactor suggestion

Extract common scheduling verification logic

Both keywords share similar scheduling verification logic. To improve maintainability and configurability:

  1. Extract the common scheduling verification logic into a new keyword
  2. Make retry count and sleep duration configurable
  3. Improve logging with progress indication

Apply this refactor:

+Verify No Additional Scheduling
+    [Arguments]    ${volume_name}    ${node_name}    ${replica_count}    ${retry_count}=3    ${sleep_duration}=5
+    FOR    ${i}    IN RANGE    ${retry_count}
+        Log to console    Ensuring there's no additional scheduling for node ${node_name} ... (${i + 1}/${retry_count})
+        ${new_replica_count} =    wait_for_replica_count    ${volume_name}    node_name=${node_name}    replica_count=${replica_count}
+        Should Be Equal As Integers    ${replica_count}    ${new_replica_count}
+        Sleep    ${sleep_duration}
+    END

 Volume ${volume_id} should have ${expected_replica_count} replicas running on node ${node_id} and no additional scheduling occurs
     Volume ${volume_id} should have ${expected_replica_count} replicas running on node ${node_id}
-    FOR    ${i}    IN RANGE    3
-        Log to console    Ensuring there's no additional scheduling for node ${node_name} ... (${i})
-        ${new_replica_count} =    wait_for_replica_count    ${volume_name}    node_name=${node_name}    replica_count=${expected_replica_count}
-        Should Be Equal As Integers    ${replica_count}    ${new_replica_count}
-        Sleep    5
-    END
+    Verify No Additional Scheduling    ${volume_name}    ${node_name}    ${expected_replica_count}

 Volume ${volume_id} should have replicas running on node ${node_id} and no additional scheduling occurs
     Volume ${volume_id} should have replicas running on node ${node_id}
-    FOR    ${i}    IN RANGE    3
-        Log to console    Ensuring there's no additional scheduling for node ${node_name} ... (${i})
-        ${new_replica_count} =    wait_for_replica_count    ${volume_name}    node_name=${node_name}
-        Should Be Equal As Integers    ${replica_count}    ${new_replica_count}
-        Sleep    5
-    END
+    Verify No Additional Scheduling    ${volume_name}    ${node_name}    ${replica_count}
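
If the team would rather keep the retry loop in the Python keyword library than in Robot, an equivalent helper could look roughly like the sketch below; the function name, arguments, and defaults are assumptions for illustration, not code from this PR.

import time

def verify_no_additional_scheduling(get_replica_count_on_node, node_name,
                                    expected_count, retry_count=3,
                                    sleep_seconds=5):
    """Re-check the replica count a few times and fail if it ever drifts."""
    for attempt in range(1, retry_count + 1):
        current = get_replica_count_on_node(node_name)
        assert current == expected_count, (
            f"replica count on {node_name} changed: expected {expected_count}, "
            f"got {current} (check {attempt}/{retry_count})")
        time.sleep(sleep_seconds)

Either way, exposing the retry count and sleep duration as parameters keeps the stability check tunable per test.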

Comment on lines +234 to +249
    def wait_for_replica_count(self, volume_name, node_name, replica_count):
        for i in range(self.retry_count):
            running_replica_count = 0
            volume = get_longhorn_client().by_id_volume(volume_name)
            for r in volume.replicas:
                if node_name and r.hostId == node_name and r.running:
                    running_replica_count += 1
                elif not node_name and r.running:
                    running_replica_count += 1
            logging(f"Waiting for {replica_count if replica_count else ''} replicas for volume {volume_name} running on {node_name if node_name else 'nodes'}, currently it's {running_replica_count} ... ({i})")
            if replica_count and running_replica_count == int(replica_count):
                break
            elif not replica_count and running_replica_count:
                break
            time.sleep(self.retry_interval)
        return running_replica_count

🛠️ Refactor suggestion

Add assertion for expected replica count.

The method should validate that the expected count is met before returning, as suggested in the past review. This ensures that test failures are caught early with clear error messages.

     def wait_for_replica_count(self, volume_name, node_name, replica_count):
         for i in range(self.retry_count):
             running_replica_count = 0
             volume = get_longhorn_client().by_id_volume(volume_name)
             for r in volume.replicas:
-                if node_name and r.hostId == node_name and r.running:
-                    running_replica_count += 1
-                elif not node_name and r.running:
-                    running_replica_count += 1
+                if r.running and (not node_name or r.hostId == node_name):
+                    running_replica_count += 1
             logging(f"Waiting for {replica_count if replica_count else ''} replicas for volume {volume_name} running on {node_name if node_name else 'nodes'}, currently it's {running_replica_count} ... ({i})")
             if replica_count and running_replica_count == int(replica_count):
+                logging(f"Successfully found {running_replica_count} running replicas")
                 break
             elif not replica_count and running_replica_count:
+                logging(f"Successfully found {running_replica_count} running replicas")
                 break
             time.sleep(self.retry_interval)
+        if replica_count:
+            assert running_replica_count == int(replica_count), \
+                f"Expected {replica_count} running replicas but got {running_replica_count}"
         return running_replica_count
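
To illustrate the combined condition proposed above, here is a small self-contained check with fake replica objects (SimpleNamespace stands in for the Longhorn replica objects; this is not the project's test code): with no node filter every running replica counts, otherwise only replicas on the given node.

from types import SimpleNamespace

def count_running(replicas, node_name=None):
    # Same predicate as in the suggestion: no node filter counts every
    # running replica, otherwise only replicas on node_name.
    return sum(1 for r in replicas
               if r.running and (not node_name or r.hostId == node_name))

replicas = [
    SimpleNamespace(hostId="node-1", running=True),
    SimpleNamespace(hostId="node-2", running=True),
    SimpleNamespace(hostId="node-2", running=False),
]

assert count_running(replicas) == 2
assert count_running(replicas, "node-2") == 1

Note that the Robot keywords earlier in this review already compare the returned count with Should Be Equal As Integers, so the in-library assert mainly fails earlier and with a clearer error message.
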
🧰 Tools
🪛 Ruff

239-242: Combine if branches using logical or operator (SIM114)

244-247: Combine if branches using logical or operator (SIM114)
