Deflake etcd tests #13167

serathius · 2021-06-30T12:41:22Z

If we look at the test results for commits on the main branch since we migrated to GitHub Actions, we get:

7 out of 32 failures on 1st page
14 out of 31 failures on 2nd page
14 out of 33 failures on 3rd page
15 out of 29 failures on 4th page
13 out of 25 failures on 5th page
13 out of 22 failures on 6th page
19 out of 33 failures on 7th page

Here failure/success is based on the green check or red cross under the commit message (commits without either were not tested, because they were one of multiple commits in a single PR).

Those are all test failures on the main branch, that is, after a PR passed tests and was approved. We can use those failures to estimate the chance of any PR failing tests purely due to flakiness.

(7 + 14 + 14 + 15 + 13 + 13 + 19) / (32 + 31 + 33 + 29 + 25 + 22 + 33) = 95 / 205 ≈ 46%

Having a flakiness ratio of almost 50% means that the average PR needs to be run about 2 times, but sequences of failures may be much longer; 3-5 failures in a row is not uncommon. This can be frustrating, especially to new contributors, as there is no easy way to retrigger tests (you need to amend the commit with no changes and push).
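
The arithmetic can be rechecked with a short sketch. It sums the per-page counts listed above and then, assuming purely for illustration that every run fails independently with the same probability, estimates the expected number of runs per PR and the chance of a streak of failures. The variable names are hypothetical and not part of any etcd script.

```python
# Per-page (failures, total) counts copied from the list above.
pages = [(7, 32), (14, 31), (14, 33), (15, 29), (13, 25), (13, 22), (19, 33)]

failures = sum(f for f, _ in pages)
total = sum(t for _, t in pages)
flake_rate = failures / total

# Assumption: runs fail independently with probability flake_rate.
expected_runs = 1 / (1 - flake_rate)  # mean of a geometric distribution
streak_3 = flake_rate ** 3            # three failures in a row

print(f"{failures}/{total} = {flake_rate:.0%}")
print(f"expected runs per PR: {expected_runs:.2f}")
print(f"chance of 3 failures in a row: {streak_3:.1%}")
```

Real runs are unlikely to be fully independent (a broken main branch fails repeatedly), so the streak estimate is a rough lower bound at best.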

Proposal

The etcd community should settle on a test flakiness target, measure it, and establish a process to fix flaky tests.

For a start, I would propose targeting a 10% failure rate for the whole test suite. It should be reachable by fixing only a couple of tests, as the latest runs gave us 22% (7 out of the last 32). Measuring flakiness could start from something simple, for example running a script once a week that checks the last 100 test results. If the measured flakiness is over our target, we should identify the most flaky tests, create issues for them, and encourage the community to fix them.

For the first couple of runs we could depend on executing the scripts manually, but we should plan to automate them.
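
As a rough illustration of the weekly check described above, the sketch below compares a failure percentage against the proposed 10% target. The state strings ("success", "failure") and the function names are assumptions made for this example; it does not fetch anything from GitHub and only operates on a list of states gathered by some other means.

```python
def failure_percentage(states):
    """Return the percentage of entries equal to 'failure'."""
    if not states:
        raise ValueError("no commit states to measure")
    failed = sum(1 for s in states if s == "failure")
    return 100 * failed / len(states)


def over_target(states, target_percent=10.0):
    """True when the measured failure percentage exceeds the target."""
    return failure_percentage(states) > target_percent


# Example using the figure quoted above: 7 failures in the last 32 runs.
recent = ["failure"] * 7 + ["success"] * 25
print(f"{failure_percentage(recent):.0f}% failed, over target: {over_target(recent)}")
```

Keeping the measurement as a pure function makes it easy to swap the data source later, for example when the process is automated.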

TODO:

- Create a script to measure flakiness (@karuppiah7890)
- Create a script to identify flaky tests
  - Export JUnit report from tests (scripts: add option to generate junit xml reports #13112)
  - Upload the reports to GitHub artifacts (*: Upload test junit results #13152)
  - Implement script that analyses the reports
- Automate the process
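
The "script that analyses the reports" step could look roughly like the sketch below. It assumes the common JUnit XML layout, in which a failed `<testcase>` contains a `<failure>` or `<error>` child, and counts in how many reports each test failed. The sample reports and the test names `TestA` and `TestB` are made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def failed_tests(junit_xml):
    """Return names of test cases with a <failure> or <error> child."""
    root = ET.fromstring(junit_xml)
    names = []
    for case in root.iter("testcase"):
        if case.find("failure") is not None or case.find("error") is not None:
            names.append(f"{case.get('classname', '')}.{case.get('name', '')}")
    return names


def rank_flaky(reports):
    """Count, per test, the number of reports in which it failed."""
    counts = Counter()
    for xml_text in reports:
        counts.update(set(failed_tests(xml_text)))
    return counts.most_common()


# Hypothetical reports from two separate runs.
REPORT_1 = """<testsuites><testsuite name="e2e">
  <testcase classname="pkg" name="TestA"><failure message="boom"/></testcase>
  <testcase classname="pkg" name="TestB"/>
</testsuite></testsuites>"""
REPORT_2 = """<testsuites><testsuite name="e2e">
  <testcase classname="pkg" name="TestA"><failure message="boom"/></testcase>
  <testcase classname="pkg" name="TestB"><error message="timeout"/></testcase>
</testsuite></testsuites>"""

ranking = rank_flaky([REPORT_1, REPORT_2])
print(ranking)
```

A test that fails in every report is more likely broken than flaky, so a real script would probably also track how often each test passed.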

cc @hexfusion @Rajalakshmi-Girish

karuppiah7890 · 2021-07-01T18:34:56Z

This sounds like a pretty interesting thing, and also something that alleviates a lot of pain and improves the developer experience!

karuppiah7890 · 2021-07-01T20:35:50Z

I was able to get a basic bash script working using the GitHub GraphQL API - https://github.com/karuppiah7890/issues-info/blob/main/etcd-io/etcd/issue-13167/find-flaky-tests-data.sh . It gives data like this - https://github.com/karuppiah7890/issues-info/blob/main/etcd-io/etcd/issue-13167/commit-and-check-data.json

karuppiah7890 · 2021-07-01T20:36:51Z

I'm able to get the number of successes and we can get failures too. Given total (for example 100) and any one of those (successes / failures), we get the other value too

serathius · 2021-07-02T08:18:18Z

Great! Would you be interested in sending a PR that adds it to the etcd scripts?

karuppiah7890 · 2021-07-02T08:22:23Z

Sure @serathius! I was also wondering if I should try out a golang script too, so anyone can run it with just "go run" or similar on any platform. No need to worry about the OS, a bash shell being available, other tools being available, etc. What do you think?

serathius · 2021-07-02T08:29:34Z

Letting everyone run it is a good initiative, but on the other hand, long term we should just automate it. Most scripts are already written in bash, and I don't think there is any need to invest in this script too much. It should be simple enough (2-3 commands) that it could be replaced when needed.

I think it would make sense to revisit those improvements once we have established the whole process and automated it.

karuppiah7890 · 2021-07-02T08:56:22Z

Makes sense @serathius! 👍 I'll raise the PR and we can discuss the bash script further as part of the PR.

This is to start measuring the test flakiness and see the numbers improving once we improve and deflake flaky tests Fixes etcd-io#13167

stale · 2021-10-01T06:36:13Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

karuppiah7890 · 2021-10-01T06:49:18Z

commenting to avoid closing of issue

stale · 2021-12-30T10:26:39Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

endocrimes · 2022-04-05T13:30:16Z

I hacked together a tool for finding/tracking/fixing flakes the other day: https://github.com/endocrimes/etcd-test-analyzer

Because it parses all of the test results from every run in a given time period, it is relatively easy to modify in place to ask new questions, but it definitely isn't a widely useful tool in its current form.

serathius · 2022-06-09T12:18:15Z

Status update, running ./scripts/measure-test-flakiness.sh gave me:

Commit status failure percentage is - 23 %

So in the last 100 merged commits we got 23 test failures. Excluding 7 coverage failures (which do not block merging) and 2 recent failures due to post-merge bug #14101, we get 14% flakiness.

Going down from 50% to 14% is a great result!!
Thanks to everyone who helped.
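
The 14% figure follows from the script output by subtracting the failures that are not flakes. A minimal sketch of that arithmetic, with hypothetical variable names:

```python
runs = 100
reported_failures = 23   # from "Commit status failure percentage is - 23 %"
coverage_failures = 7    # coverage failures, which do not block merging
known_bug_failures = 2   # failures caused by the post-merge bug #14101

flaky = reported_failures - coverage_failures - known_bug_failures
print(f"{flaky}/{runs} = {flaky / runs:.0%}")
```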

serathius · 2022-06-09T12:28:54Z

Looking into the failures from the last 100 runs (excluding coverage and known issues), we get:

4 failures in e2e tests of TestDowngradeUpgradeClusterOf3 (example) - @serathius
4 failures in functional tests of BLACKHOLE_PEER_PORT_TX_RX_LEADER (example)
4 failures in functional tests of NO_FAIL_WITH_NO_STRESS_FOR_LIVENESS (example)
2 timeouts in grpcproxy test of TestLeasingReconnectOwnerConsistency (example)
2 failures in grpcproxy test of TestWatchCancelOnServer (example)
1 failure in integration test of TestDropReadUnderNetworkPartition (example) (possible goroutine leak in previous test)
1 failure in integration test of TestBalancerUnderNetworkPartitionTxn
1 failure in integration test of TestAuthority (example)
1 timeout in grpcproxy test of TestLeasingReconnectNonOwnerGet (example)
1 failure in integration tests of TestMaxLearnerInCluster (example)
1 failure in functional tests of DELAY_PEER_PORT_TX_RX_LEADER_UNTIL_TRIGGER_SNAPSHOT (example)
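
To see which suite contributes most, the counts above can be grouped by suite. The sketch below simply transcribes the list (timeouts are counted as failures); the suite labels are shorthand chosen for this example.

```python
from collections import Counter

# (count, suite) pairs transcribed from the list above, in order.
observed = [
    (4, "e2e"), (4, "functional"), (4, "functional"),
    (2, "grpcproxy"), (2, "grpcproxy"),
    (1, "integration"), (1, "integration"), (1, "integration"),
    (1, "grpcproxy"), (1, "integration"), (1, "functional"),
]

by_suite = Counter()
for count, suite in observed:
    by_suite[suite] += count

for suite, count in by_suite.most_common():
    print(f"{suite}: {count}")
```

The per-test counts add up to more than the number of failed runs, which suggests that some runs failed in more than one test.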

serathius · 2022-06-09T12:29:57Z

As there are a lot of tests, it would be great to get some help. Please let me know if you are interested in tackling one of the tests listed.

This should aid in debugging test flakes, especially in tests where the process is restarted very often and thus changes its pid. Now it's a lot easier to grep for different members, also when different tests fail at the same time. The test TestDowngradeUpgradeClusterOf3 as mentioned in etcd-io#13167 is a good example for that. Signed-off-by: Thomas Jungblut <tjungblu@redhat.com>

serathius · 2022-09-21T09:10:16Z

Status: 28% flakiness
https://github.com/etcd-io/etcd/actions/runs/3075242479/jobs/4968492878

chaochn47 · 2022-09-30T05:43:14Z

Thanks for raising this issue. It is really annoying for any contributor to etcd when unrelated tests fail.

I can take TestDowngradeUpgradeClusterOf3 because I just ran into it in #14331. It's also a good opportunity to learn how downgrade works.

Tracking this in #14540

serathius · 2023-03-18T09:49:05Z

I noticed a recent increase in flakes (at least in my PRs). From https://github.com/etcd-io/etcd/actions/runs/4394774437/jobs/7696017126 we see 26% flakiness.

Loved the recent initiative by @chaochn47 to use tools developed by @endocrimes in #15501.

It would be great to integrate them into https://github.com/etcd-io/etcd/actions/workflows/measure-test-flakiness.yaml
@chaochn47 would you be interested in this?

chaochn47 · 2023-03-18T16:29:20Z

Yeah, I can help add them to the existing workflow. ETA next Monday.

nitishfy · 2024-02-19T17:22:16Z

Hi, I'd like to work on this!

serathius · 2024-02-20T09:41:10Z

Thanks @nitishfy for your interest. The issue was created some time ago, so not everything is up to date; however, the high-level goals remain relevant. We want to improve our visibility into test flakes so we can fix them more effectively.

For the original plan, we instrumented etcd e2e tests to export JUnit reports, and @endocrimes and @karuppiah7890 implemented custom scripts to analyse them. This approach allowed us to start reporting flakes and manually creating issues to fix them.

One thing we can do better is to avoid developing our own scripting. The etcd community is not very big, so we want to avoid spreading ourselves too thin by maintaining too many custom tools. With the introduction of SIG-etcd we now have an option to benefit from the whole ecosystem of tools built by the Kubernetes community. We should do that.

One example of such a tool is testgrid, a test result visualization tool that uses the same JUnit reports to create a grid showing which tests passed and which failed. It makes it really easy to track flakes. For example: https://testgrid.k8s.io/sig-etcd-periodics#ci-etcd-e2e-amd64

I think we should work more on integrating with K8s tools. This first requires migrating etcd testing to Prow, the K8s CI tool. This work can be tracked in kubernetes/k8s.io#6102.

In the meantime, we could ensure that all etcd tests generate a JUnit report that can be used later.

Looking at the GitHub workflows, only in https://github.com/etcd-io/etcd/blob/main/.github/workflows/tests-template.yaml do we set JUNIT_REPORT_DIR and export JUnit files:

etcd/.github/workflows/tests-template.yaml

Lines 69 to 73 in 11ff264

    
    - uses: actions/upload-artifact@5d5d22a31266ced268874388b861e4b58bb5c2f3 # v4.3.1
      if: always()
      with:
        name: "${{ matrix.target }}"
        path: ./**/junit_*.xml

We should look into adding it to more test scenarios.
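
One way to check locally whether a test scenario actually produced JUnit files is to search the working tree with the same kind of pattern the upload step uses. The sketch below is an assumption-laden helper (the function name `find_junit_reports` is hypothetical); it relies only on Python's standard `glob` module with `recursive=True`, where `**` matches zero or more directories.

```python
import glob
import os


def find_junit_reports(root="."):
    """Return sorted paths of junit_*.xml files anywhere under root."""
    pattern = os.path.join(root, "**", "junit_*.xml")
    return sorted(glob.glob(pattern, recursive=True))


if __name__ == "__main__":
    reports = find_junit_reports(".")
    if not reports:
        print("no JUnit reports found; this scenario may not export them")
    for path in reports:
        print(path)
```

Running it after a test target finishes gives a quick signal of whether that target would contribute anything to the uploaded artifact.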

karuppiah7890 added a commit to karuppiah7890/issues-info that referenced this issue Jul 1, 2021

add etcd-io/etcd#13167 issue

8255a33

karuppiah7890 added a commit to karuppiah7890/issues-info that referenced this issue Jul 1, 2021

update etcd-io/etcd#13167 issue

51ea75c

karuppiah7890 mentioned this issue Jul 2, 2021

scripts: add script to measure percentage of commits with failed status #13175

Merged

karuppiah7890 added a commit to karuppiah7890/issues-info that referenced this issue Jul 2, 2021

update etcd-io/etcd#13167 issue details

3f12b80

karuppiah7890 added a commit to karuppiah7890/issues-info that referenced this issue Jul 6, 2021

update etcd-io/etcd#13167 issue

189ae6a

karuppiah7890 added a commit to karuppiah7890/etcd that referenced this issue Jul 13, 2021

workflow: add workflow to invoke script that measures percentage of c…

3317716

…ommits with failed status The workflow runs on a cron schedule on a weekly basis - once every week Fixes etcd-io#13167

karuppiah7890 added a commit to karuppiah7890/issues-info that referenced this issue Jul 29, 2021

update issue etcd-io/etcd#13167

b026311

karuppiah7890 added a commit to karuppiah7890/issues-info that referenced this issue Aug 13, 2021

update info about issue etcd-io/etcd#13167

72384a6

karuppiah7890 mentioned this issue Aug 28, 2021

Fix Failing pipeline - tikv_ghpr_integration_common_test tikv/tikv#10850

Closed

stale bot added the stale label Oct 1, 2021

stale bot removed the stale label Oct 1, 2021

ysksuzuki mentioned this issue Oct 3, 2021

Replace github.com/dgrijalva/jwt-go with github.com/golang-jwt/jwt #13378

Merged

stale bot removed the stale label Jan 24, 2022

serathius mentioned this issue Jan 24, 2022

Unify testing framework #13637

Open

serathius closed this as completed in #13175 Apr 5, 2022

serathius reopened this Apr 5, 2022

serathius added the help wanted label Jun 9, 2022

serathius added area/testing type/flake labels Jun 14, 2022

tjungblu mentioned this issue Aug 1, 2022

Add test name to e2e cluster members #14292

Merged

stale bot added the stale label Sep 21, 2022

serathius added stage/tracked and removed stale labels Sep 21, 2022

etcd-io deleted a comment from stale bot Sep 21, 2022

chaochn47 mentioned this issue Mar 18, 2023

deflake TestTracing #15501

Merged

chaochn47 mentioned this issue Mar 20, 2023

add etcd test analyzer build and integrate into measure-test-flakiness workflow #15513

Merged

siyuanfoundation mentioned this issue Feb 28, 2024

Add a test status section to display testgrid status. #17508

Merged

siyuanfoundation mentioned this issue Mar 27, 2024

Add script and workflow to detect flaky tests in testgrid. #17662

Merged
