test(robot): migrate test_replica_auto_balance_node_least_effort #2163
base: master
Conversation
Walkthrough
The pull request introduces new functionality for managing volume replicas. It adds several keywords for validating the number of running replicas on specific nodes and a new wait_for_replica_count method.
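For orientation, here is a minimal, hypothetical Python sketch of the delegation pattern the review describes: a keyword-facing Volume facade that forwards wait_for_replica_count to a REST-backed implementation that polls the replica list. The class names, the placeholder replica query, and the retry defaults are illustrative assumptions, not the actual Longhorn e2e library code.

```python
import time

# Simplified stand-in for the REST-backed implementation layer (rest.py).
class RestVolumeClient:
    def __init__(self, retry_count=5, retry_interval=1):
        self.retry_count = retry_count
        self.retry_interval = retry_interval

    def _list_replicas(self, volume_name):
        # Placeholder: the real code queries the Longhorn API for the volume's replicas.
        return [{"hostId": "node-0", "running": True}]

    def wait_for_replica_count(self, volume_name, node_name=None, replica_count=None):
        running = 0
        for _ in range(self.retry_count):
            replicas = self._list_replicas(volume_name)
            # Count running replicas, optionally restricted to one node.
            running = sum(
                1 for r in replicas
                if r["running"] and (not node_name or r["hostId"] == node_name)
            )
            if (replica_count and running == int(replica_count)) or \
               (not replica_count and running):
                break
            time.sleep(self.retry_interval)
        return running

# Facade layer (volume.py) that the keyword layer calls; it simply delegates.
class Volume:
    def __init__(self):
        self.volume = RestVolumeClient()

    def wait_for_replica_count(self, volume_name, node_name=None, replica_count=None):
        return self.volume.wait_for_replica_count(volume_name, node_name, replica_count)

if __name__ == "__main__":
    print(Volume().wait_for_replica_count("vol-0", node_name="node-0", replica_count=1))
```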
📜 Recent review details
Configuration used: CodeRabbit UI
📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (1)
🔇 Additional comments (4)
e2e/keywords/volume.resource (4)
The implementation correctly verifies the replica count using the wait_for_replica_count keyword.
The keyword correctly verifies the replica count on a specific node and properly sets test variables for reuse in subsequent keywords.
The keyword correctly verifies replicas on a specific node without enforcing a specific count.
Refactor scheduling verification keywords to improve maintainability
The scheduling verification keywords have duplicate code and use hardcoded values. Consider extracting the common logic into a reusable keyword. Apply this refactor to improve maintainability and configurability:
+Verify No Additional Scheduling For Node
+ [Arguments] ${volume_name} ${node_name} ${replica_count} ${retry_count}=3 ${sleep_duration}=5
+ FOR ${i} IN RANGE ${retry_count}
+ Log to console Ensuring there's no additional scheduling for node ${node_name} ... (${i + 1}/${retry_count})
+ ${new_replica_count} = wait_for_replica_count ${volume_name} node_name=${node_name} replica_count=${replica_count}
+ Should Be Equal As Integers ${replica_count} ${new_replica_count}
+ Sleep ${sleep_duration}
+ END
Volume ${volume_id} should have ${expected_replica_count} replicas running on node ${node_id} and no additional scheduling occurs
Volume ${volume_id} should have ${expected_replica_count} replicas running on node ${node_id}
- FOR ${i} IN RANGE 3
- Log to console Ensuring there's no additional scheduling for node ${node_name} ... (${i})
- ${new_replica_count} = wait_for_replica_count ${volume_name} node_name=${node_name} replica_count=${expected_replica_count}
- Should Be Equal As Integers ${replica_count} ${new_replica_count}
- Sleep 5
- END
+ Verify No Additional Scheduling For Node ${volume_name} ${node_name} ${expected_replica_count}
Volume ${volume_id} should have replicas running on node ${node_id} and no additional scheduling occurs
Volume ${volume_id} should have replicas running on node ${node_id}
- FOR ${i} IN RANGE 3
- Log to console Ensuring there's no additional scheduling for node ${node_name} ... (${i})
- ${new_replica_count} = wait_for_replica_count ${volume_name} node_name=${node_name}
- Should Be Equal As Integers ${replica_count} ${new_replica_count}
- Sleep 5
- END
+    Verify No Additional Scheduling For Node    ${volume_name}    ${node_name}    ${replica_count}

This refactor:
- Extracts the duplicated loop logic into a single reusable keyword
- Makes the retry count and sleep duration configurable
- Adds progress indication to the log message
Force-pushed from 92c0e2c to 2053fd1
Actionable comments posted: 4
🧹 Outside diff range and nitpick comments (10)
e2e/tests/regression/test_scheduling.robot (3)
56-57: Add test documentation
Please add a [Documentation] section to describe:
- Test purpose and objectives
- Preconditions and assumptions
- Expected behavior of replica auto-balancing
- Significance of using 6 replicas
Example:
    [Documentation]    Test replica auto-balancing with least-effort strategy.
    ...
    ...                This test verifies that:
    ...                1. Replicas are correctly distributed when nodes become schedulable
    ...                2. Auto-balancing maintains the total replica count
    ...                3. No unnecessary scheduling occurs after balancing
    ...
    ...                Uses 6 replicas to ensure multiple replicas can be balanced across nodes.
72-78: Move wait condition comments to keywords
The comments about loop behavior and wait conditions should be encapsulated in the keywords themselves rather than as inline comments.
Consider creating a new keyword that encapsulates this logic:
*** Keywords ***
Wait For Stable Replica Distribution
    [Arguments]    ${volume_name}    ${node}    ${expected_count}
    FOR    ${i}    IN RANGE    3
        Volume ${volume_name} should have ${expected_count} replicas running on node ${node}
        Sleep    5s
    END
63-66: Consider parameterizing the replica count
The value 6 for numberOfReplicas appears to be significant for this test. Consider making it a variable to improve maintainability and clarity.
*** Variables ***
${REPLICA_COUNT_FOR_BALANCE_TEST}    6    # Ensures multiple replicas per node for balance testing

*** Test Cases ***
...
    And Create volume 0 with numberOfReplicas=${REPLICA_COUNT_FOR_BALANCE_TEST} dataEngine=${DATA_ENGINE}

e2e/libs/volume/volume.py (1)
128-130: LGTM! Consider adding docstring documentation.
The new method follows the established pattern of delegating to the underlying implementation and aligns well with the class's existing functionality for replica management.
Consider adding a docstring to document the method's purpose and parameters:
 def wait_for_replica_count(self, volume_name, node_name, replica_count):
+    """Wait until the specified number of replicas are present on the given node.
+
+    Args:
+        volume_name (str): Name of the volume
+        node_name (str): Name of the node to check replicas on
+        replica_count (int): Expected number of replicas
+
+    Returns:
+        bool: True if the desired replica count is reached
+    """
     return self.volume.wait_for_replica_count(volume_name, node_name, replica_count)

e2e/libs/keywords/volume_keywords.py (1)
239-241: Add documentation and logging for consistency
The implementation looks good and follows the class patterns. However, for better maintainability and consistency with other methods in the class, consider these improvements:
 def wait_for_replica_count(self, volume_name, node_name=None, replica_count=None):
+    """Wait for the volume to have the specified number of replicas.
+
+    Args:
+        volume_name: Name of the volume to check
+        node_name: Optional node name to check replicas on a specific node
+        replica_count: Optional specific number of replicas to wait for
+
+    Returns:
+        Result from volume.wait_for_replica_count
+    """
+    logging(f'Waiting for volume {volume_name} to have {replica_count} replicas'
+            + (f' on node {node_name}' if node_name else ''))
     return self.volume.wait_for_replica_count(volume_name, node_name, replica_count)

e2e/keywords/volume.resource (2)
234-237: Remove unused variable assignment
The ${replica_count} variable is assigned but never used. Consider removing the assignment for cleaner code.
-    ${replica_count} =    wait_for_replica_count    ${volume_name}    node_name=    replica_count=${expected_replica_count}
+    wait_for_replica_count    ${volume_name}    node_name=    replica_count=${expected_replica_count}
246-253: Consider refactoring to reduce code duplication
This keyword is very similar to `Volume ${volume_id} should have ${expected_replica_count} replicas running on node ${node_id}`. Consider refactoring to reuse logic:
 Volume ${volume_id} should have replicas running on node ${node_id}
     ${volume_name} =    generate_name_with_suffix    volume    ${volume_id}
     ${node_name} =    get_node_by_index    ${node_id}
-    ${replica_count} =    wait_for_replica_count    ${volume_name}    node_name=${node_name}
-    Set Test Variable    ${volume_name}
-    Set Test Variable    ${node_name}
-    Set Test Variable    ${replica_count}
+    Volume ${volume_id} should have ${1} replicas running on node ${node_id}

e2e/libs/volume/rest.py (2)
234-249: Simplify conditional logic and add error handling
The implementation can be improved in two ways:
- Combine the conditional checks for better readability
- Add error handling for the volume client call
Apply this diff to improve the code:
 def wait_for_replica_count(self, volume_name, node_name, replica_count):
     for i in range(self.retry_count):
-        running_replica_count = 0
-        volume = get_longhorn_client().by_id_volume(volume_name)
-        for r in volume.replicas:
-            if node_name and r.hostId == node_name and r.running:
-                running_replica_count += 1
-            elif not node_name and r.running:
-                running_replica_count += 1
-        logging(f"Waiting for {replica_count if replica_count else ''} replicas for volume {volume_name} running on {node_name if node_name else 'nodes'}, currently it's {running_replica_count} ... ({i})")
-        if replica_count and running_replica_count == int(replica_count):
-            break
-        elif not replica_count and running_replica_count:
-            break
+        try:
+            running_replica_count = 0
+            volume = get_longhorn_client().by_id_volume(volume_name)
+            for r in volume.replicas:
+                if r.running and (not node_name or r.hostId == node_name):
+                    running_replica_count += 1
+            logging(f"Waiting for {replica_count if replica_count else ''} replicas for volume {volume_name} running on {node_name if node_name else 'nodes'}, currently it's {running_replica_count} ... ({i})")
+            if (replica_count and running_replica_count == int(replica_count)) or \
+               (not replica_count and running_replica_count):
+                break
+        except Exception as e:
+            logging(f"Failed to get volume {volume_name} with error: {e}")
         time.sleep(self.retry_interval)
     return running_replica_count
🧰 Tools
🪛 Ruff
239-242: Combine `if` branches using logical `or` operator (SIM114)
244-247: Combine `if` branches using logical `or` operator (SIM114)
Line range hint 1-332: Consider architectural improvements for better maintainability
The Rest class currently handles both volume operations and monitoring. Consider the following architectural improvements:
- Split the class into separate concerns: VolumeOperations for basic CRUD operations and VolumeMonitor for monitoring and waiting operations
- Extract the common retry mechanism into a decorator or base class (see the sketch after this comment)
This would improve maintainability and make the code more modular.
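One way to act on the "extract the retry mechanism" point is a small polling decorator. This is only a sketch under assumed names: retry_until, its parameters, and the placeholder replica query are hypothetical and not existing helpers in this repository.

```python
import functools
import time

def retry_until(condition, retry_count=150, retry_interval=1):
    """Retry the wrapped call until condition(result) is truthy or retries run out.

    Hypothetical helper: the name, defaults, and callback shape are illustrative only.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = None
            for _ in range(retry_count):
                result = func(*args, **kwargs)
                if condition(result):
                    break
                time.sleep(retry_interval)
            return result
        return wrapper
    return decorator

# Example: poll a replica-count query until it reaches the expected value.
@retry_until(condition=lambda count: count >= 3, retry_count=5, retry_interval=0.1)
def count_running_replicas():
    # Placeholder query; the real code would ask the Longhorn API.
    return 3

if __name__ == "__main__":
    print(count_running_replicas())
```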
🧰 Tools
🪛 Ruff
239-242: Combine `if` branches using logical `or` operator (SIM114)
244-247: Combine `if` branches using logical `or` operator (SIM114)
e2e/libs/volume/crd.py (1)
267-269: Add documentation and type hints.
The implementation looks good and follows the established delegation pattern. Consider adding documentation and type hints for better maintainability:
-def wait_for_replica_count(self, volume_name, node_name, replica_count):
+def wait_for_replica_count(self, volume_name: str, node_name: str, replica_count: int) -> bool:
+    """Wait until the specified number of replicas are running on the given node.
+
+    Args:
+        volume_name: Name of the volume to check
+        node_name: Name of the node to check
+        replica_count: Expected number of replicas
+
+    Returns:
+        bool: True if the replica count matches, False otherwise
+    """
     return Rest().wait_for_replica_count(volume_name, node_name, replica_count)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (6)
- e2e/keywords/volume.resource (1 hunks)
- e2e/libs/keywords/volume_keywords.py (1 hunks)
- e2e/libs/volume/crd.py (1 hunks)
- e2e/libs/volume/rest.py (1 hunks)
- e2e/libs/volume/volume.py (1 hunks)
- e2e/tests/regression/test_scheduling.robot (2 hunks)
🧰 Additional context used
🪛 Ruff
e2e/libs/volume/rest.py
239-242: Combine `if` branches using logical `or` operator (SIM114)
244-247: Combine `if` branches using logical `or` operator (SIM114)
🔇 Additional comments (4)
e2e/tests/regression/test_scheduling.robot (1)
13-13
: LGTM: Resource addition is appropriate
The addition of node.resource aligns with the test's requirements for node management operations.
e2e/libs/keywords/volume_keywords.py (1)
239-241
: Implementation aligns well with test migration needs
The method provides the necessary functionality for the migrated test case test_replica_auto_balance_node_least_effort
to verify replica counts and distribution during auto-balancing scenarios.
e2e/keywords/volume.resource (1)
238-245
: LGTM! Well-structured keyword with proper variable management
The implementation correctly validates node-specific replica count and sets test variables for reuse.
e2e/libs/volume/rest.py (1)
Line range hint 251-332
: LGTM! Enhanced logging improves observability
The added logging statements improve the observability of the replica rebuilding process, making it easier to track progress and debug issues.
🧰 Tools
🪛 Ruff
239-242: Combine `if` branches using logical `or` operator (SIM114)
244-247: Combine `if` branches using logical `or` operator (SIM114)
Then Volume 0 should have replicas running on node 1
And Volume 0 should have 6 replicas running
🛠️ Refactor suggestion
Add timeout parameters to wait conditions
The test should include explicit timeout parameters for waiting for replica distribution to ensure test stability.
Then Volume 0 should have replicas running on node 1 timeout=300
And Volume 0 should have ${REPLICA_COUNT_FOR_BALANCE_TEST} replicas running timeout=300
Also applies to: 84-85
Given Set setting replica-soft-anti-affinity to true
And Set setting replica-auto-balance to least-effort

When Disable node 1 scheduling
And Disable node 2 scheduling
And Create volume 0 with numberOfReplicas=6 dataEngine=${DATA_ENGINE}
And Attach volume 0
And Wait for volume 0 healthy
And Write data to volume 0
Then Volume 0 should have 6 replicas running on node 0
And Volume 0 should have 0 replicas running on node 1
And Volume 0 should have 0 replicas running on node 2

When Enable node 1 scheduling
# wait for auto balance
Then Volume 0 should have replicas running on node 1
And Volume 0 should have 6 replicas running
# loop 3 times with 5-second wait and compare the replica count to:
# ensure no additional scheduling occurs
# the replica count remains unchanged
And Volume 0 should have replicas running on node 0 and no additional scheduling occurs
And Volume 0 should have replicas running on node 1 and no additional scheduling occurs
And Volume 0 should have 0 replicas running on node 2 and no additional scheduling occurs

When Enable node 2 scheduling
# wait for auto balance
Then Volume 0 should have replicas running on node 2
And Volume 0 should have 6 replicas running
# loop 3 times with 5-second wait and compare the replica count to:
# ensure no additional scheduling occurs
# the replica count remains unchanged
And Volume 0 should have replicas running on node 0 and no additional scheduling occurs
And Volume 0 should have replicas running on node 1 and no additional scheduling occurs
And Volume 0 should have replicas running on node 2 and no additional scheduling occurs

And Wait for volume 0 healthy
And Check volume 0 data is intact
💡 Codebase verification
Missing Test Coverage for Edge Cases
The following edge cases are not covered by the existing tests:
- Node becomes unschedulable during balancing
- Volume operations during balancing
- Multiple volumes being balanced simultaneously
🔗 Analysis chain
Verify test coverage for edge cases
The test covers the happy path but should verify additional scenarios:
- Node becomes unschedulable during balancing
- Volume operations during balancing
- Multiple volumes being balanced simultaneously
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check for existing test coverage of these scenarios
rg -l "Test.*Auto.*Balance" "e2e/tests"
# Look for related test implementations
ast-grep --pattern 'Test.*Auto.*Balance.*\n.*\[Documentation\].*\n.*'
Length of output: 154
Script:
#!/bin/bash
# Check for tests where a node becomes unschedulable during balancing
rg -l "unschedulable.*balancing" "e2e/tests"
# Check for tests involving volume operations during balancing
rg -l "volume.*operations.*balancing" "e2e/tests"
# Check for tests handling multiple volumes being balanced simultaneously
rg -l "multiple.*volumes.*balancing" "e2e/tests"
# Look for related test implementations with documentation
ast-grep --pattern 'Test.*unschedulable.*balancing.*\n.*\[Documentation\].*\n.*'
ast-grep --pattern 'Test.*volume.*operations.*balancing.*\n.*\[Documentation\].*\n.*'
ast-grep --pattern 'Test.*multiple.*volumes.*balancing.*\n.*\[Documentation\].*\n.*'
Length of output: 401
Script:
#!/bin/bash
# Broaden search for tests where a node becomes unschedulable during balancing
rg -il "node.*unschedulable.*balance" "e2e/tests"
# Broaden search for tests involving volume operations during balancing
rg -il "volume.*operation.*balance" "e2e/tests"
# Broaden search for tests handling multiple volumes being balanced simultaneously
rg -il "multiple.*volume.*balance" "e2e/tests"
# Look for any tests related to node scheduling changes with documentation
ast-grep --pattern 'Test.*node.*scheduling.*\n.*\[Documentation\].*\n.*'
# Look for any tests related to volume operations with documentation
ast-grep --pattern 'Test.*volume.*operation.*\n.*\[Documentation\].*\n.*'
# Look for any tests related to multiple volumes with documentation
ast-grep --pattern 'Test.*multiple.*volumes.*\n.*\[Documentation\].*\n.*'
Length of output: 371
Volume ${volume_id} should have replicas running on node ${node_id} and no additional scheduling occurs
    Volume ${volume_id} should have replicas running on node ${node_id}
    FOR    ${i}    IN RANGE    3
        Log to console    Ensuring there's no additional scheduling for node ${node_name} ... (${i})
        ${new_replica_count} =    wait_for_replica_count    ${volume_name}    node_name=${node_name}
        Should Be Equal As Integers    ${replica_count}    ${new_replica_count}
        Sleep    5
    END
🛠️ Refactor suggestion
Refactor to reduce duplication and improve robustness
This keyword duplicates most of the logic from the previous keyword. Consider refactoring both keywords to share common logic and apply the same improvements regarding configurable retry counts and sleep duration.
Volume ${volume_id} should have replicas running on node ${node_id} and no additional scheduling occurs
Volume ${volume_id} should have replicas running on node ${node_id}
- FOR ${i} IN RANGE 3
- Log to console Ensuring there's no additional scheduling for node ${node_name} ... (${i})
+ ${retry_count} = Set Variable 3
+ ${sleep_duration} = Set Variable 5
+ FOR ${i} IN RANGE ${retry_count}
+ Log to console Ensuring there's no additional scheduling for node ${node_name} ... (${i + 1}/${retry_count})
${new_replica_count} = wait_for_replica_count ${volume_name} node_name=${node_name}
Should Be Equal As Integers ${replica_count} ${new_replica_count}
- Sleep 5
+ Sleep ${sleep_duration}
END
Consider creating a new keyword to handle the scheduling verification:
Verify No Additional Scheduling For Volume ${volume_id} On Node ${node_id} With Count ${expected_count}
[Arguments] ${retry_count}=3 ${sleep_duration}=5
${node_name} = get_node_by_index ${node_id}
FOR ${i} IN RANGE ${retry_count}
Log to console Ensuring there's no additional scheduling for node ${node_name} ... (${i + 1}/${retry_count})
${new_replica_count} = wait_for_replica_count ${volume_name} node_name=${node_name} replica_count=${expected_count}
Should Be Equal As Integers ${replica_count} ${new_replica_count}
Sleep ${sleep_duration}
END
Volume ${volume_id} should have ${expected_replica_count} replicas running on node ${node_id} and no additional scheduling occurs
    Volume ${volume_id} should have ${expected_replica_count} replicas running on node ${node_id}
    FOR    ${i}    IN RANGE    3
        Log to console    Ensuring there's no additional scheduling for node ${node_name} ... (${i})
        ${new_replica_count} =    wait_for_replica_count    ${volume_name}    node_name=${node_name}    replica_count=${expected_replica_count}
        Should Be Equal As Integers    ${replica_count}    ${new_replica_count}
        Sleep    5
    END
🛠️ Refactor suggestion
Improve robustness of scheduling verification
A few suggestions to improve this keyword:
- The iteration count (3) and sleep duration (5) are hardcoded. Consider making these configurable.
- The logging message uses ${node_name}, which might not be set if the first keyword fails.
Volume ${volume_id} should have ${expected_replica_count} replicas running on node ${node_id} and no additional scheduling occurs
Volume ${volume_id} should have ${expected_replica_count} replicas running on node ${node_id}
- FOR ${i} IN RANGE 3
- Log to console Ensuring there's no additional scheduling for node ${node_name} ... (${i})
+ ${node_name} = get_node_by_index ${node_id} # Set node_name for logging
+ ${retry_count} = Set Variable 3
+ ${sleep_duration} = Set Variable 5
+ FOR ${i} IN RANGE ${retry_count}
+ Log to console Ensuring there's no additional scheduling for node ${node_name} ... (${i + 1}/${retry_count})
${new_replica_count} = wait_for_replica_count ${volume_name} node_name=${node_name} replica_count=${expected_replica_count}
Should Be Equal As Integers ${replica_count} ${new_replica_count}
- Sleep 5
+ Sleep ${sleep_duration}
END
Committable suggestion skipped: line range outside the PR's diff.
@@ -51,3 +52,43 @@ Test Soft Anti Affinity Scheduling
    Then Wait until volume 0 replicas rebuilding completed
    And Wait for volume 0 healthy
    And Check volume 0 data is intact

Test Replica Auto Balance Node Least Effort
Can we add a brief description?
Added.
And Volume 0 should have replicas running on node 0 and no additional scheduling occurs
And Volume 0 should have replicas running on node 1 and no additional scheduling occurs
Should we include the number of replicas as 3 running on node 0 & node 1 also in the verification?
Added.
And Volume 0 should have replicas running on node 0 and no additional scheduling occurs
And Volume 0 should have replicas running on node 1 and no additional scheduling occurs
And Volume 0 should have replicas running on node 2 and no additional scheduling occurs
Same as above comment, including the replica count.
Added.
Force-pushed from 2053fd1 to 433fd33
This pull request is now in conflict. Could you fix it @yangchiu? 🙏
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (4)
e2e/tests/regression/test_scheduling.robot (2)
56-58: Enhance test case documentation
The documentation should include:
- Purpose of testing replica auto-balance with least-effort strategy
- Expected behavior of replica distribution
- Key test steps and assertions
Example documentation:
    [Documentation]    Test replica auto-balance behavior with least-effort strategy
    ...
    ...                This test verifies that replicas are automatically balanced across nodes when:
    ...                1. Nodes become available for scheduling
    ...                2. replica-auto-balance is set to least-effort
    ...                3. replica-soft-anti-affinity is enabled
    ...
    ...                Expected behavior:
    ...                - Initially all replicas should be on node 0 (only schedulable node)
    ...                - When node 1 becomes schedulable, replicas should be balanced between node 0 and 1
    ...                - When node 2 becomes schedulable, replicas should be balanced across all nodes
    ...                - Total replica count should remain constant during balancing
74-82: Reduce code duplication with a helper keyword
The verification blocks are duplicated. Consider creating a helper keyword to improve maintainability.
Example:
*** Keywords ***
Verify Replica Distribution
    [Arguments]    ${volume_name}    ${expected_nodes}    ${timeout}=300
    FOR    ${node}    IN    @{expected_nodes}
        Volume ${volume_name} should have replicas running on node ${node}    timeout=${timeout}
        Volume ${volume_name} should have replicas running on node ${node} and no additional scheduling occurs
    END
    Volume ${volume_name} should have 6 replicas running    timeout=${timeout}

Usage:
Then Verify Replica Distribution    0    ${expected_nodes}=['0', '1']
...
Then Verify Replica Distribution    0    ${expected_nodes}=['0', '1', '2']

Also applies to: 84-92
e2e/libs/volume/rest.py (2)
220-236: Add docstring documentation for the new method.
The method lacks documentation describing its parameters, return value, and behavior. This is especially important for test automation code.
Add a docstring like this:
 def wait_for_replica_count(self, volume_name, node_name, replica_count):
+    """Wait for the specified number of running replicas for a volume.
+
+    Args:
+        volume_name: Name of the volume to check
+        node_name: If provided, only count replicas on this node
+        replica_count: If provided, wait for this exact count, otherwise wait for any running replica
+
+    Returns:
+        int: The final count of running replicas
+    """
🧰 Tools
🪛 Ruff
225-228: Combine `if` branches using logical `or` operator (SIM114)
230-233: Combine `if` branches using logical `or` operator (SIM114)
225-228: Simplify replica counting conditions.
The conditional logic for counting replicas can be simplified using a logical OR operator.
-            if node_name and r.hostId == node_name and r.running:
-                running_replica_count += 1
-            elif not node_name and r.running:
-                running_replica_count += 1
+            if r.running and (not node_name or r.hostId == node_name):
+                running_replica_count += 1
🧰 Tools
🪛 Ruff
225-228: Combine `if` branches using logical `or` operator (SIM114)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (6)
- e2e/keywords/volume.resource (1 hunks)
- e2e/libs/keywords/volume_keywords.py (1 hunks)
- e2e/libs/volume/crd.py (1 hunks)
- e2e/libs/volume/rest.py (1 hunks)
- e2e/libs/volume/volume.py (1 hunks)
- e2e/tests/regression/test_scheduling.robot (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- e2e/libs/keywords/volume_keywords.py
- e2e/libs/volume/crd.py
- e2e/libs/volume/volume.py
🧰 Additional context used
🪛 Ruff
e2e/libs/volume/rest.py
225-228: Combine `if` branches using logical `or` operator (SIM114)
230-233: Combine `if` branches using logical `or` operator (SIM114)
🔇 Additional comments (10)
e2e/tests/regression/test_scheduling.robot (4)
13-13
: LGTM!
The node resource import is properly added and aligns with the test requirements.
62-71
: Add timeout parameters to wait conditions
For test stability, explicit timeout parameters should be added to the wait conditions.
Apply this pattern to the following lines:
And Wait for volume 0 healthy timeout=300
Then Volume 0 should have 6 replicas running on node 0 timeout=300
And Volume 0 should have 0 replicas running on node 1 timeout=300
And Volume 0 should have 0 replicas running on node 2 timeout=300
94-95
: LGTM!
Good practice to verify volume health and data integrity after balancing operations.
59-61
: Verify settings persistence
The test relies on these settings. Let's ensure they persist throughout the test.
✅ Verification successful
Based on the search results, I can see that these settings are used in two test files:
- test_scheduling.robot: Sets both settings as part of the test setup
- node_reboot.robot: Sets replica-soft-anti-affinity to false
Let me verify if there are any teardown or cleanup steps for these settings.
Let me check what's included in the Cleanup test resources keyword, since both test files use it.
Based on the investigation:
- Both test files use Cleanup test resources in their teardown
- The cleanup function doesn't explicitly reset these settings
- The setting.resource file shows no default value restoration
- The settings are modified in different tests with different values
Let me check if there are any default settings or global setup that might handle this.
I notice there's a reset_settings() method in the Python code. Let me check if and where it's used, and also look at the Setting class implementation.
Settings are properly managed through global reset mechanism
The investigation reveals that settings persistence is properly handled:
- The reset_settings() method in the Setting class systematically resets all non-readonly settings to their default values (a minimal sketch of this pattern follows after this list)
- This reset is available through the Cleanup test resources keyword, which is used in the test teardown for both files
- The settings replica-soft-anti-affinity and replica-auto-balance are not in the exclusion list of reset_settings()
- Different test files setting different values won't interfere, as each test has its own setup and teardown
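To make that reset pattern concrete, here is an illustrative Python sketch only: the client interface, field names, and exclusion list are assumptions for demonstration, not the actual Setting class from the Longhorn test libraries.

```python
# Hypothetical sketch of a settings-reset helper; the client API shown is assumed.
EXCLUDED_SETTINGS = {"registry-secret"}  # assumed exclusion list for illustration

class SettingReset:
    def __init__(self, client):
        self.client = client  # assumed to expose list_settings() and update_setting()

    def reset_settings(self):
        for setting in self.client.list_settings():
            # Skip read-only settings and anything explicitly excluded.
            if setting.get("readOnly") or setting["name"] in EXCLUDED_SETTINGS:
                continue
            # Restore the recorded default so later tests start from a clean state.
            self.client.update_setting(setting["name"], setting["definition"]["default"])

class FakeClient:
    """Tiny in-memory client so the sketch runs standalone."""
    def __init__(self):
        self._settings = [
            {"name": "replica-auto-balance", "readOnly": False, "value": "least-effort",
             "definition": {"default": "disabled"}},
        ]

    def list_settings(self):
        return self._settings

    def update_setting(self, name, value):
        for s in self._settings:
            if s["name"] == name:
                s["value"] = value

if __name__ == "__main__":
    client = FakeClient()
    SettingReset(client).reset_settings()
    print(client.list_settings()[0]["value"])  # -> "disabled"
```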
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check if there are any other tests that might modify these settings
rg -B 2 -A 2 "Set setting (replica-soft-anti-affinity|replica-auto-balance)" "e2e/tests"
# Check if settings are reset in teardown
rg "Set setting.*(replica-soft-anti-affinity|replica-auto-balance)" "e2e/keywords/common.resource"
Length of output: 1612
Script:
#!/bin/bash
# Check for teardown or cleanup sections in these files
rg -B 5 -A 5 "^\[Teardown\]|Test Teardown|Suite Teardown" "e2e/tests/regression/test_scheduling.robot" "e2e/tests/negative/node_reboot.robot"
# Check for any global teardown in robot framework resource files
fd -e resource -x rg -B 5 -A 5 "^\[Teardown\]|Test Teardown|Suite Teardown" {}
Length of output: 1663
Script:
#!/bin/bash
# Search for the definition of Cleanup test resources
rg -B 5 -A 10 "Cleanup test resources" "e2e/keywords/"
# Also check setting.resource for any default settings restoration
fd "setting.resource" -x cat {}
Length of output: 1493
Script:
#!/bin/bash
# Check for any global setup or default settings
rg -B 5 -A 5 "Set test environment" "e2e/keywords/"
# Check Python implementation for any default settings
fd "setting_keywords.py" -x cat {}
Length of output: 1139
Script:
#!/bin/bash
# Check for usage of reset_settings
rg "reset_settings" "e2e/"
# Look for the Setting class implementation
fd "setting.py" -x cat {}
Length of output: 4992
e2e/libs/volume/rest.py (1)
220-236
: Verify integration with replica auto-balancing tests.
The new method is part of migrating replica auto-balancing tests. Let's verify its usage in test cases.
✅ Verification successful
Integration with replica auto-balancing tests verified successfully
The wait_for_replica_count method is properly integrated with the auto-balancing test migration:
- It's part of the commit 433fd33 that migrates the test_replica_auto_balance_node_least_effort test
- The method is implemented across the volume handling layers (rest.py, crd.py, volume.py) and is used in Robot Framework tests
- The test_scheduling.robot file contains the auto-balance test that uses this functionality
- The implementation supports both specific node counting (when node_name is provided) and total replica counting (when node_name is None), which is essential for auto-balancing verification
The method is correctly implemented and integrated for its intended use in replica auto-balancing tests.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Find test files using this method and verify integration
# Expected: Find usage in auto-balancing test files
# Search for test files using wait_for_replica_count
echo "Searching for test files using wait_for_replica_count..."
rg -l "wait_for_replica_count.*node.*replica" --type python
# Search for auto-balance test files
echo "Searching for auto-balance test files..."
rg -l "test_replica_auto_balance" --type python
Length of output: 385
Script:
#!/bin/bash
# Let's try without the --type flag since python type is not recognized
# Search for test files using wait_for_replica_count
echo "Searching for test files using wait_for_replica_count..."
rg -l "wait_for_replica_count.*node.*replica"
# Search for auto-balance test files
echo "Searching for auto-balance test files..."
rg -l "test_replica_auto_balance"
# Also search for the class definition to understand the context
echo "Searching for the class containing wait_for_replica_count..."
rg -B5 "def wait_for_replica_count"
Length of output: 2850
Script:
#!/bin/bash
# Let's check how this method is used in the auto-balance test files
echo "Checking usage in test_zone.py..."
rg -B5 -A5 "wait_for_replica_count" manager/integration/tests/test_zone.py
echo "Checking usage in test_scheduling.py..."
rg -B5 -A5 "wait_for_replica_count" manager/integration/tests/test_scheduling.py
# Let's also check if there are any recent changes or additions related to auto-balancing
echo "Checking recent git history for auto-balance related changes..."
git log --grep="auto.balance\|autobalance" --since="3 months ago" --pretty=format:"%h %s"
Length of output: 623
Script:
#!/bin/bash
# Let's examine the commit that migrated the auto-balance test
echo "Checking the auto-balance test migration commit..."
git show 433fd33 --name-status
# Let's also look for Robot Framework test files that might be using this method
echo "Checking Robot Framework test files for auto-balance tests..."
rg -l "Test Replica Auto Balance" --type robot || true
# And check the actual usage in Robot Framework files
echo "Checking usage of wait_for_replica_count in Robot files..."
rg -B5 -A5 "Wait For Replica Count" --type robot || true
Length of output: 999
Script:
#!/bin/bash
# Let's examine the test_scheduling.robot file for the auto-balance test
echo "Checking test_scheduling.robot for auto-balance test..."
rg -B10 -A10 "Test Replica Auto Balance" e2e/tests/regression/test_scheduling.robot
# Let's also check the implementation in the modified files
echo "Checking volume.resource file..."
cat e2e/keywords/volume.resource
Length of output: 18193
🧰 Tools
🪛 Ruff
225-228: Combine `if` branches using logical `or` operator (SIM114)
230-233: Combine `if` branches using logical `or` operator (SIM114)
e2e/keywords/volume.resource (5)
222-225
: LGTM!
The implementation is correct and provides a clear way to validate the total replica count for a volume.
226-233
: LGTM!
The implementation correctly validates node-specific replica counts and properly sets test variables for reuse in subsequent keywords.
234-241
: LGTM!
The implementation correctly validates the presence of replicas on a specific node without requiring an exact count.
242-249
: 🛠️ Refactor suggestion
Consider refactoring to improve robustness and reduce duplication
The implementation could be improved by:
- Making retry count and sleep duration configurable
- Adding iteration count to logging message
- Creating a shared keyword for scheduling verification
Apply this diff to improve the implementation:
Volume ${volume_id} should have ${expected_replica_count} replicas running on node ${node_id} and no additional scheduling occurs
Volume ${volume_id} should have ${expected_replica_count} replicas running on node ${node_id}
- FOR ${i} IN RANGE 3
- Log to console Ensuring there's no additional scheduling for node ${node_name} ... (${i})
+ ${retry_count} = Set Variable 3
+ ${sleep_duration} = Set Variable 5
+ FOR ${i} IN RANGE ${retry_count}
+ Log to console Ensuring there's no additional scheduling for node ${node_name} ... (${i + 1}/${retry_count})
${new_replica_count} = wait_for_replica_count ${volume_name} node_name=${node_name} replica_count=${expected_replica_count}
Should Be Equal As Integers ${replica_count} ${new_replica_count}
- Sleep 5
+ Sleep ${sleep_duration}
END
Consider creating a new shared keyword for scheduling verification:
Verify No Additional Scheduling
[Arguments] ${volume_name} ${node_name} ${expected_count} ${retry_count}=3 ${sleep_duration}=5
FOR ${i} IN RANGE ${retry_count}
Log to console Ensuring there's no additional scheduling for node ${node_name} ... (${i + 1}/${retry_count})
${new_replica_count} = wait_for_replica_count ${volume_name} node_name=${node_name} replica_count=${expected_count}
Should Be Equal As Integers ${expected_count} ${new_replica_count}
Sleep ${sleep_duration}
END
251-258
: 🛠️ Refactor suggestion
Refactor to reduce code duplication
This keyword is nearly identical to the previous one. Consider using the same shared keyword for scheduling verification to reduce duplication.
Apply this diff:
Volume ${volume_id} should have replicas running on node ${node_id} and no additional scheduling occurs
Volume ${volume_id} should have replicas running on node ${node_id}
- FOR ${i} IN RANGE 3
- Log to console Ensuring there's no additional scheduling for node ${node_name} ... (${i})
- ${new_replica_count} = wait_for_replica_count ${volume_name} node_name=${node_name}
- Should Be Equal As Integers ${replica_count} ${new_replica_count}
- Sleep 5
- END
+ Verify No Additional Scheduling ${volume_name} ${node_name} ${replica_count}
            if replica_count and running_replica_count == int(replica_count):
                break
            elif not replica_count and running_replica_count:
                break
            time.sleep(self.retry_interval)
Add assertion for expected replica count.
The method should assert that the expected count is met before returning. Currently, it silently returns even if the count doesn't match expectations.
if replica_count and running_replica_count == int(replica_count):
+ logging(f"Successfully found {running_replica_count} running replicas")
break
elif not replica_count and running_replica_count:
+ logging(f"Successfully found {running_replica_count} running replicas")
break
time.sleep(self.retry_interval)
+ if replica_count:
+ assert running_replica_count == int(replica_count), \
+ f"Expected {replica_count} running replicas but got {running_replica_count}"
return running_replica_count
Committable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 Ruff
230-233: Combine `if` branches using logical `or` operator (SIM114)
Force-pushed from 9bdeade to a429022
Signed-off-by: Yang Chiu <yang.chiu@suse.com>
Force-pushed from a429022 to 36aaa0b
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (4)
e2e/tests/regression/test_scheduling.robot (3)
58-60: Enhance test case documentation.
While the scenario is mentioned, the documentation could be more comprehensive by including:
- Prerequisites (e.g., required node count, initial state)
- Expected behavior of replica auto-balancing with the least-effort setting
- Validation steps and expected outcomes at each stage
Example documentation structure:
    [Documentation]    Test replica auto-balance behavior with least-effort setting
    ...                Prerequisites:
    ...                - At least 3 nodes available
    ...                - Initial state: All nodes schedulable
    ...
    ...                Expected Behavior:
    ...                - With least-effort, replicas should gradually redistribute when nodes become schedulable
    ...                - Distribution should maintain volume health while minimizing replica movement
    ...
    ...                Steps:
    ...                1. Configure soft anti-affinity and least-effort auto-balance
    ...                2. Disable scheduling on nodes 1 & 2
    ...                3. Create and verify volume with 6 replicas on node 0
    ...                4. Enable node scheduling one by one
    ...                5. Verify gradual replica redistribution
    ...                6. Ensure data integrity throughout the process
64-68: Add volume creation verification steps.
Consider adding explicit verification steps after volume creation:
 And Create volume 0 with numberOfReplicas=6 dataEngine=${DATA_ENGINE}
+And Wait for volume 0 state running
 And Attach volume 0
 And Wait for volume 0 healthy
 And Write data to volume 0
77-78: Consolidate duplicate comments.
The same comments are repeated in two places. Consider moving these to a [Documentation] section or creating a custom keyword that encapsulates this behavior.
Example:
*** Keywords ***
Verify No Additional Scheduling Occurs
    [Arguments]    ${volume_index}    ${node_index}    ${expected_count}
    [Documentation]    Verifies that the replica count remains stable
    ...                by checking multiple times with delays

Also applies to: 88-89
e2e/libs/volume/rest.py (1)
239-242: Simplify conditional logic using logical operators.
The nested if conditions can be simplified for better readability.
-            if node_name and r.hostId == node_name and r.running:
-                running_replica_count += 1
-            elif not node_name and r.running:
-                running_replica_count += 1
+            if r.running and (not node_name or r.hostId == node_name):
+                running_replica_count += 1
🧰 Tools
🪛 Ruff
239-242: Combine `if` branches using logical `or` operator (SIM114)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (6)
- e2e/keywords/volume.resource (1 hunks)
- e2e/libs/keywords/volume_keywords.py (1 hunks)
- e2e/libs/volume/crd.py (1 hunks)
- e2e/libs/volume/rest.py (1 hunks)
- e2e/libs/volume/volume.py (1 hunks)
- e2e/tests/regression/test_scheduling.robot (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- e2e/libs/keywords/volume_keywords.py
- e2e/libs/volume/crd.py
- e2e/libs/volume/volume.py
🧰 Additional context used
🪛 Ruff
e2e/libs/volume/rest.py
239-242: Combine `if` branches using logical `or` operator (SIM114)
244-247: Combine `if` branches using logical `or` operator (SIM114)
🔇 Additional comments (5)
e2e/tests/regression/test_scheduling.robot (1)
13-13
: LGTM!
The addition of node.resource
follows the existing import convention and is required for the node-related operations in the new test case.
e2e/keywords/volume.resource (3)
234-237
: LGTM: Clean and focused implementation
The keyword provides a clear way to validate the total replica count for a volume.
238-245
: LGTM: Well-structured with proper variable management
The keyword correctly validates node-specific replica counts and sets test variables for reuse in subsequent keywords.
246-253
: LGTM: Consistent with established patterns
The keyword maintains consistency with the previous implementation while providing flexibility for unspecified replica counts.
e2e/libs/volume/rest.py (1)
Line range hint 251-319
: LGTM! Improved logging for better observability.
The added logging statements enhance the ability to track replica rebuilding progress without changing the core functionality.
🧰 Tools
🪛 Ruff
239-242: Combine `if` branches using logical `or` operator (SIM114)
244-247: Combine `if` branches using logical `or` operator (SIM114)
When Enable node 2 scheduling
# wait for auto balance
Then Volume 0 should have replicas running on node 2
And Volume 0 should have 6 replicas running
🛠️ Refactor suggestion
Add delay after enabling scheduling.
There might be a race condition between enabling scheduling and checking replica distribution. Consider adding a small delay or wait condition.
When Enable node 2 scheduling
+And Sleep 5s # Allow scheduling to take effect
# wait for auto balance
Then Volume 0 should have replicas running on node 2
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
When Enable node 2 scheduling
And Sleep 5s    # Allow scheduling to take effect
# wait for auto balance
Then Volume 0 should have replicas running on node 2
And Volume 0 should have 6 replicas running
Volume ${volume_id} should have ${expected_replica_count} replicas running on node ${node_id} and no additional scheduling occurs
    Volume ${volume_id} should have ${expected_replica_count} replicas running on node ${node_id}
    FOR    ${i}    IN RANGE    3
        Log to console    Ensuring there's no additional scheduling for node ${node_name} ... (${i})
        ${new_replica_count} =    wait_for_replica_count    ${volume_name}    node_name=${node_name}    replica_count=${expected_replica_count}
        Should Be Equal As Integers    ${replica_count}    ${new_replica_count}
        Sleep    5
    END

Volume ${volume_id} should have replicas running on node ${node_id} and no additional scheduling occurs
    Volume ${volume_id} should have replicas running on node ${node_id}
    FOR    ${i}    IN RANGE    3
        Log to console    Ensuring there's no additional scheduling for node ${node_name} ... (${i})
        ${new_replica_count} =    wait_for_replica_count    ${volume_name}    node_name=${node_name}
        Should Be Equal As Integers    ${replica_count}    ${new_replica_count}
        Sleep    5
    END
🛠️ Refactor suggestion
Extract common scheduling verification logic
Both keywords share similar scheduling verification logic. To improve maintainability and configurability:
- Extract the common scheduling verification logic into a new keyword
- Make retry count and sleep duration configurable
- Improve logging with progress indication
Apply this refactor:
+Verify No Additional Scheduling
+ [Arguments] ${volume_name} ${node_name} ${replica_count} ${retry_count}=3 ${sleep_duration}=5
+ FOR ${i} IN RANGE ${retry_count}
+ Log to console Ensuring there's no additional scheduling for node ${node_name} ... (${i + 1}/${retry_count})
+ ${new_replica_count} = wait_for_replica_count ${volume_name} node_name=${node_name} replica_count=${replica_count}
+ Should Be Equal As Integers ${replica_count} ${new_replica_count}
+ Sleep ${sleep_duration}
+ END
Volume ${volume_id} should have ${expected_replica_count} replicas running on node ${node_id} and no additional scheduling occurs
Volume ${volume_id} should have ${expected_replica_count} replicas running on node ${node_id}
- FOR ${i} IN RANGE 3
- Log to console Ensuring there's no additional scheduling for node ${node_name} ... (${i})
- ${new_replica_count} = wait_for_replica_count ${volume_name} node_name=${node_name} replica_count=${expected_replica_count}
- Should Be Equal As Integers ${replica_count} ${new_replica_count}
- Sleep 5
- END
+ Verify No Additional Scheduling ${volume_name} ${node_name} ${expected_replica_count}
Volume ${volume_id} should have replicas running on node ${node_id} and no additional scheduling occurs
Volume ${volume_id} should have replicas running on node ${node_id}
- FOR ${i} IN RANGE 3
- Log to console Ensuring there's no additional scheduling for node ${node_name} ... (${i})
- ${new_replica_count} = wait_for_replica_count ${volume_name} node_name=${node_name}
- Should Be Equal As Integers ${replica_count} ${new_replica_count}
- Sleep 5
- END
+ Verify No Additional Scheduling ${volume_name} ${node_name} ${replica_count}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Verify No Additional Scheduling
    [Arguments]    ${volume_name}    ${node_name}    ${replica_count}    ${retry_count}=3    ${sleep_duration}=5
    FOR    ${i}    IN RANGE    ${retry_count}
        Log to console    Ensuring there's no additional scheduling for node ${node_name} ... (${i + 1}/${retry_count})
        ${new_replica_count} =    wait_for_replica_count    ${volume_name}    node_name=${node_name}    replica_count=${replica_count}
        Should Be Equal As Integers    ${replica_count}    ${new_replica_count}
        Sleep    ${sleep_duration}
    END

Volume ${volume_id} should have ${expected_replica_count} replicas running on node ${node_id} and no additional scheduling occurs
    Volume ${volume_id} should have ${expected_replica_count} replicas running on node ${node_id}
    Verify No Additional Scheduling    ${volume_name}    ${node_name}    ${expected_replica_count}

Volume ${volume_id} should have replicas running on node ${node_id} and no additional scheduling occurs
    Volume ${volume_id} should have replicas running on node ${node_id}
    Verify No Additional Scheduling    ${volume_name}    ${node_name}    ${replica_count}
    def wait_for_replica_count(self, volume_name, node_name, replica_count):
        for i in range(self.retry_count):
            running_replica_count = 0
            volume = get_longhorn_client().by_id_volume(volume_name)
            for r in volume.replicas:
                if node_name and r.hostId == node_name and r.running:
                    running_replica_count += 1
                elif not node_name and r.running:
                    running_replica_count += 1
            logging(f"Waiting for {replica_count if replica_count else ''} replicas for volume {volume_name} running on {node_name if node_name else 'nodes'}, currently it's {running_replica_count} ... ({i})")
            if replica_count and running_replica_count == int(replica_count):
                break
            elif not replica_count and running_replica_count:
                break
            time.sleep(self.retry_interval)
        return running_replica_count
🛠️ Refactor suggestion
Add assertion for expected replica count.
The method should validate that the expected count is met before returning, as suggested in the past review. This ensures that test failures are caught early with clear error messages.
def wait_for_replica_count(self, volume_name, node_name, replica_count):
for i in range(self.retry_count):
running_replica_count = 0
volume = get_longhorn_client().by_id_volume(volume_name)
for r in volume.replicas:
- if node_name and r.hostId == node_name and r.running:
- running_replica_count += 1
- elif not node_name and r.running:
- running_replica_count += 1
+ if r.running and (not node_name or r.hostId == node_name):
+ running_replica_count += 1
logging(f"Waiting for {replica_count if replica_count else ''} replicas for volume {volume_name} running on {node_name if node_name else 'nodes'}, currently it's {running_replica_count} ... ({i})")
if replica_count and running_replica_count == int(replica_count):
+ logging(f"Successfully found {running_replica_count} running replicas")
break
elif not replica_count and running_replica_count:
+ logging(f"Successfully found {running_replica_count} running replicas")
break
time.sleep(self.retry_interval)
+ if replica_count:
+ assert running_replica_count == int(replica_count), \
+ f"Expected {replica_count} running replicas but got {running_replica_count}"
return running_replica_count
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
    def wait_for_replica_count(self, volume_name, node_name, replica_count):
        for i in range(self.retry_count):
            running_replica_count = 0
            volume = get_longhorn_client().by_id_volume(volume_name)
            for r in volume.replicas:
                if r.running and (not node_name or r.hostId == node_name):
                    running_replica_count += 1
            logging(f"Waiting for {replica_count if replica_count else ''} replicas for volume {volume_name} running on {node_name if node_name else 'nodes'}, currently it's {running_replica_count} ... ({i})")
            if replica_count and running_replica_count == int(replica_count):
                logging(f"Successfully found {running_replica_count} running replicas")
                break
            elif not replica_count and running_replica_count:
                logging(f"Successfully found {running_replica_count} running replicas")
                break
            time.sleep(self.retry_interval)
        if replica_count:
            assert running_replica_count == int(replica_count), \
                f"Expected {replica_count} running replicas but got {running_replica_count}"
        return running_replica_count
🧰 Tools
🪛 Ruff
239-242: Combine `if` branches using logical `or` operator (SIM114)
244-247: Combine `if` branches using logical `or` operator (SIM114)
Which issue(s) this PR fixes:
Issue longhorn/longhorn#9776
What this PR does / why we need it:
migrate test_replica_auto_balance_node_least_effort
Special notes for your reviewer:
Additional documentation or context
Summary by CodeRabbit
Release Notes
New Features
Tests
Chores