
High error rate for HTTPS load tests with 200 concurrent users #42628

Closed
xlight05 opened this issue Apr 23, 2024 · 5 comments · Fixed by #42634
Labels
Area/Scheduler jBallerina runtime scheduler related issues Type/Bug

Comments

@xlight05
Contributor

Description:
The HTTPS load tests show a high error rate with 200 concurrent users.
The same tests work fine with 60 concurrent users.
This applies to both the passthrough and transformation use cases we have.

https://github.com/ballerina-platform/ballerina-performance-cloud/actions/runs/8782386068/job/24096556961
https://github.com/ballerina-platform/ballerina-performance-cloud/blob/1d735a28d62a06265a7ccad3c72bfa78764b476a/load-tests/https_passthrough/results/summary.csv#L2609


@TharmiganK
Contributor

TharmiganK commented Apr 24, 2024

I tried running the existing load tests with 200 concurrent requests using the workflow.

Workflow run: https://github.com/ballerina-platform/ballerina-performance-cloud/actions/runs/8794152427
Results: https://github.com/ballerina-platform/module-ballerina-http/pull/1964/files

There were some significant differences between the code used in the https_passthrough load test and the code used in the h1_h1_passthrough load test:

  1. The one in ballerina-performance-cloud uses an h2-h2 approach, whereas the one in the http module uses h1-h1.
  2. The one in ballerina-performance-cloud uses http:Caller to respond, whereas the one in the http module just returns the http:Response.

So I added two more load tests, h2_h2_passthrough and h2_transformation, but still got a 0% error rate:

Workflow run: https://github.com/ballerina-platform/ballerina-performance-cloud/actions/runs/8797283914
Results: https://github.com/ballerina-platform/module-ballerina-http/pull/1963/files

I also tried to reproduce this issue locally using the code in https_passthrough and running a load test with 200 concurrent users for 5 minutes, but I could not reproduce the issue.

@xlight05 Can we check the configurations used to run these load tests?

@xlight05
Contributor Author

Had an offline chat about this. We were able to get a strand dump when the issue was reproduced.

Strand dump - https://gist.github.com/xlight05/9ef16bbe1ea7f733d43a398429920a32

@TharmiganK
Contributor

TharmiganK commented Apr 24, 2024

I was able to reproduce this issue with the help of @xlight05 in a constrained environment. Please find the steps below:

  1. Clone the following repo: https://github.com/xlight05/bal_https_hello
  2. Run bal build
  3. Run docker-compose up
  4. Run the load test using this JMX file: https://gist.github.com/TharmiganK/8f78d8a3ec820661c4fdab7ee723ad7e/

Please note that this issue is only reproducible when you make multiple requests at short intervals. Strangely, if we first make a single request and wait for its response, the subsequent requests pass.

I have checked the following:

  1. With update 9 - Failing
  2. With update 9 and without http changes - Failing
  3. With update 8 and new http changes - Passing

So it seems the issue comes from the lang changes in update 9. Adding @HindujaB and @gabilang to check on this.

@TharmiganK TharmiganK transferred this issue from ballerina-platform/ballerina-library Apr 24, 2024
@TharmiganK TharmiganK added the Area/Scheduler jBallerina runtime scheduler related issues label Apr 24, 2024
@TharmiganK
Contributor

TharmiganK commented Apr 24, 2024

I was able to reduce the reproducer to the following code (no need for Docker; just use bal run):

import ballerina/http;

listener http:Listener securedEP = new (9090);

final http:Client nettyEP = check new ("http://localhost:8688");

service /passthrough on securedEP {
    resource function post .(http:Request clientRequest) returns http:Response|error {
        return nettyEP->/'service/EchoService.post(clientRequest);
    }
}

However, to reproduce the issue, I have to use 1000 users with a 5s ramp-up period. (I checked a similar configuration against an update 8 service, and it worked without any hanging.)

If I remove clientRequest from the resource signature, the service works without hanging. So this might be related to the previous memory issue, #42566. The difference here is that there is no memory increase now, but some strands used to populate the default values seem to be in the runnable state.

Please note that I have removed SSL here, so I am not 100% sure that the two cases are related. (The service also hangs with SSL.) However, I think the issue is more likely to occur with SSL.

When the service hangs, most of the jbal threads are in the monitor state.

Strand dump: https://gist.github.com/TharmiganK/932d0274a391aa55f8fbe9e9da5135a1
Thread dump: https://drive.google.com/file/d/14Y4x7b5Vdm-8RCT_VyLTQSC8sSsqSfbT/view?usp=drive_link


This issue is NOT closed with a proper Reason/ label. Make sure to add proper reason label before closing. Please add or leave a comment with the proper reason label now.

      - Reason/EngineeringMistake - The issue occurred due to a mistake made in the past.
      - Reason/Regression - The issue has introduced a regression.
      - Reason/MultipleComponentInteraction - Issue occurred due to interactions in multiple components.
      - Reason/Complex - Issue occurred due to complex scenario.
      - Reason/Invalid - Issue is invalid.
      - Reason/Other - None of the above cases.
