Elusive problem in ktcp #9

Open
Mellvik opened this issue Feb 22, 2023 · 0 comments
Mellvik commented Feb 22, 2023

Under high load, ktcp may fail to open new connections and eventually hang completely. The issue is very timing-sensitive: even the smallest printf in related code eliminates the problem, and rearranging some code in the ktcp.c main loop is enough to change the behaviour (but not to eliminate the problem).

Here's the scenario most likely to reproduce the problem (this is on an AMI386SX/40 SBC):

  • Telnet into TLVC and issue the command hd /dev/hda, which will run for a long time.
  • Wait a couple of minutes, then connect to the TLVC system with ftp. A message should say connect <something>, after which the ftp client hangs while the output in the telnet window continues. If the ftp client does not hang, ktcp is working normally and will continue to do so for a while; run some other commands in the meantime and try again in a few minutes. Make sure no debug statements in the code emit output while connections are being opened.
  • If the ftp command hangs, issue a netstat command from the console or a serial terminal window. This will hang, and the output in the telnet window will stop. Interrupt the hanging netstat with ^C and the output in the telnet window will continue. This stop/start sequence can be repeated any number of times.
  • Aborting the hung ftp connection from the client is likely to hang ktcp completely (and make all network processes uninterruptible, including netstat).
  • This is not the only way to trigger the behaviour: a second telnet session may have the same effect (or not). This part is unpredictable.
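
The ftp step in the list above can be automated from the client side. The following is a minimal sketch, not part of TLVC: the function name `probe_banner` and its return strings are made up for illustration, and it assumes that the hang described above shows up as a TCP connect that completes without any FTP greeting arriving afterwards.

```python
import socket


def probe_banner(host, port=21, timeout=10.0):
    """Connect to host:port and wait for the first bytes from the server.

    Hypothetical helper. Returns one of:
      "ok"             - the server sent something (e.g. an FTP greeting)
      "no-banner"      - connect succeeded but nothing arrived before the timeout
      "closed"         - the server closed the connection without sending data
      "connect-failed" - the TCP connect itself failed or timed out
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "connect-failed"
    with sock:
        sock.settimeout(timeout)
        try:
            data = sock.recv(256)
        except socket.timeout:
            # Assumed to correspond to the hanging ftp client in the report.
            return "no-banner"
        except OSError:
            return "connect-failed"
        return "ok" if data else "closed"


# Example use against the TLVC address seen in the trace (not run here):
#     print(probe_banner("10.0.2.17", 21, timeout=10.0))
```

Running the probe every few minutes while hd /dev/hda is producing output in the telnet session would give a repeatable check without adding any debug output on the TLVC side, which matters here because such output is reported to hide the problem.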

The ftp startup sequence that hangs gets through the first few packets, then stops:

    10.0.2.2.34938 > 10.0.2.17.ftp: Flags [S], cksum 0x1841 (incorrect -> 0xd3e1), seq 1108791349, win 29200, options [mss 1460,sackOK,TS val 2374025631 ecr 0,nop,wscale 7], length 0
12:48:23.230917 00:80:29:ef:d9:51 (oui Unknown) > b8:27:eb:9a:77:bc (oui Unknown), ethertype IPv4 (0x0800), length 64: (tos 0x0, ttl 64, id 558, offset 0, flags [none], proto TCP (6), length 44)
    10.0.2.17.ftp > 10.0.2.2.34938: Flags [S.], cksum 0x2338 (correct), seq 1057845702, ack 1108791350, win 4380, options [mss 1460], length 0
12:48:23.231067 b8:27:eb:9a:77:bc (oui Unknown) > 00:80:29:ef:d9:51 (oui Unknown), ethertype IPv4 (0x0800), length 54: (tos 0x0, ttl 64, id 55864, offset 0, flags [DF], proto TCP (6), length 40)
    10.0.2.2.34938 > 10.0.2.17.ftp: Flags [.], cksum 0x182d (incorrect -> 0xda00), seq 1, ack 1, win 29200, length 0
12:48:23.235371 00:80:29:ef:d9:51 (oui Unknown) > b8:27:eb:9a:77:bc (oui Unknown), ethertype IPv4 (0x0800), length 282: (tos 0x0, ttl 64, id 559, offset 0, flags [none], proto TCP (6), length 268)

In other words, ktcp accepts the connection by responding with a SYN-ACK and the client follows with an ACK as appropriate, but this ACK never seems to be registered on the TLVC side of the connection.
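
To make the suspected failure concrete, here is a small textbook-style model of the server side of the three-way handshake, fed with the sequence numbers from the trace. It is only a sketch: `ServerConn` and `on_segment` are hypothetical names, and none of this is ktcp's actual code. The absolute ack value of the final ACK is derived from the trace, since tcpdump prints relative numbers (`ack 1`) once the handshake is under way.

```python
from dataclasses import dataclass

LISTEN, SYN_RCVD, ESTABLISHED = "LISTEN", "SYN_RCVD", "ESTABLISHED"
SEQ_MOD = 2**32  # TCP sequence numbers wrap at 32 bits


@dataclass
class ServerConn:
    """Hypothetical, simplified server-side handshake state (not ktcp code)."""
    iss: int              # server's initial sequence number
    state: str = LISTEN
    rcv_nxt: int = 0      # next sequence number expected from the client

    def on_segment(self, flags, seq, ack=None):
        """Process one incoming segment; return a reply tuple or None."""
        if self.state == LISTEN and flags == "S":
            self.rcv_nxt = (seq + 1) % SEQ_MOD
            self.state = SYN_RCVD
            return ("S.", self.iss, self.rcv_nxt)  # SYN-ACK: flags, seq, ack
        if (self.state == SYN_RCVD and flags == "."
                and ack == (self.iss + 1) % SEQ_MOD):
            self.state = ESTABLISHED
        return None


conn = ServerConn(iss=1057845702)          # server ISN from the trace
reply = conn.on_segment("S", 1108791349)   # client SYN from the trace
print(reply, conn.state)

# If the client's ACK is never processed, the model stays in SYN_RCVD.
# Delivering it (absolute ack = server ISN + 1) completes the handshake:
conn.on_segment(".", 1108791350, ack=1057845703)
print(conn.state)
```

In this model, a connection whose final ACK is lost or ignored stays in SYN_RCVD indefinitely, which is one way the symptom described above could arise. Whether ktcp behaves this way is exactly what remains to be established.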

A lot of time has been spent tracking down the problem, with little success so far. It seems likely, though, that the problem is located in the devtcp code.

@Mellvik Mellvik added the bug Something isn't working label Feb 22, 2023
@Mellvik Mellvik self-assigned this Feb 22, 2023