Interop runner counts one connection attempt too many when parsing logs #125

huitema · 2020-07-27T00:51:48Z

Looking at the logs of this handshake loss trial, I find that the test failed because Expected 50 handshakes. Got: 51. Yet, the client and server logs only mention 50 connections, and the client downloaded the expected 50 files.

The test of the number of handshakes is probably mistaking a failed attempt for a successful one, leading to a false negative in the result. False negatives are unhelpful. They waste time when people investigate, and they push people to just dismiss test results without investigating them further.

The text was updated successfully, but these errors were encountered:

marten-seemann · 2020-07-27T01:17:22Z

Here's how the number of handshakes is counted: https://github.com/marten-seemann/quic-interop-runner/blob/56fcba0285f063d8e0f4f8d068335d2c85712f5f/testcases.py#L163-L168

Why is this leading to overcounting in this case?

huitema · 2020-07-27T04:36:10Z

In the C1 case, it happens if a client's initial is corrupted in a way that changes the CID. The server will reply to both the correct Initial and the corrupted Initial. In the L1 case, it happens if a server times out one connection attempt and starts a new one later. This is indeed what happened here:

The client started a connection attempt with ICID 6e89084ca8a26853
The server allocated the SCID 6dbee6fcc50be74b to the context and sent a first flight
The server never got any packet back from the client and closed the context after 15 seconds
The server then received a repeated Initial from the client, sequence number 7, same ICID
The server allocated the SCID 3f9e5813a5f2167b to that new connection
That second connection completed correctly.

I confirmed that scenario by checking the client side qlog and the corresponding server side qlog. (I am creating an issue in Picoquic because the two connections where logged under the same ICID-derived file ID and the second file erased the first one -- I should really derive the file ID from the SCID.)

If you counted the handshake with the client side PCAP, you would could 50 connections because none of the Server Initial with the first SCID made it to the client.

In fact, there is a remarkable set of circumstance that caused the client to not receive any of the server first flight, and the server to not receive any of the repeated client initial with sequences 1 to 6. Given a 30% loss rate, this kind of loss sequence is supposed to have a probability of about 0.002%, but it seems to happen much more frequently than that.

marten-seemann · 2020-07-27T04:39:23Z

In the C1 case, it happens if a client's initial is corrupted in a way that changes the CID. The server will reply to both the correct Initial and the corrupted Initial.

That shouldn't happen though. If the CID is corrupted, the AEAD tag won't be correct. Or am I missing something here?

huitema · 2020-07-27T05:20:18Z

Yes, you are right about C1. But the L1 case was pretty clear once I managed to identify the logs.

huitema mentioned this issue Jul 27, 2020

Client chosen ICID is not a unique Identifier of server-side logs private-octopus/picoquic#998

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Interop runner counts one connection attempt too many when parsing logs #125

Interop runner counts one connection attempt too many when parsing logs #125

huitema commented Jul 27, 2020

marten-seemann commented Jul 27, 2020

huitema commented Jul 27, 2020 •

edited

Loading

marten-seemann commented Jul 27, 2020

huitema commented Jul 27, 2020

Interop runner counts one connection attempt too many when parsing logs #125

Interop runner counts one connection attempt too many when parsing logs #125

Comments

huitema commented Jul 27, 2020

marten-seemann commented Jul 27, 2020

huitema commented Jul 27, 2020 • edited Loading

marten-seemann commented Jul 27, 2020

huitema commented Jul 27, 2020

huitema commented Jul 27, 2020 •

edited

Loading