TLS failure: End_of_file #8
Since "it works fine for any small instances", I don't think you need to tweak the TLS config. Is there rate limiting on the Slack endpoint? Do you send hundreds of hooks at once? Could you increase the debug level (with the Logs library; the tls library has some log sources) to see at which point the TLS handshake fails (the server sending an "End of file")?
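For reference, raising the Logs level as suggested could look like the sketch below. This is an illustration, not code from the thread; it assumes the standard `logs` and `logs.fmt` opam packages (note that a reporter must be installed, or nothing is printed at all):

```ocaml
(* Sketch: install a reporter and raise the level to Debug so that the
   log sources registered by the tls library actually print. *)
let () =
  Logs.set_reporter (Logs_fmt.reporter ());
  Logs.set_level (Some Logs.Debug)
```

Individual sources can also be tuned via `Logs.Src.set_level` if only the TLS output is of interest.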
No, only one, at the end of each run. I'll try to increase the log level. Sadly, the only way I've been able to reproduce the problem is to let the main instance complete its run, which takes several days, so this will take some time.
Hmm. Maybe in addition it makes sense to capture the IP packets of the communication between opam-health-check and slack.
Yes. This is as simple as a curl call; no connection is kept waiting. I’ve re-run opam-health-check with the increased log level.
Setting the log level did not yield anything (nothing got printed). The interesting change from last time is that now the
I hope the tcpdump log helps. If I can give you more information, feel free to ask.
Reading the doc, I seem to have forgotten to call |
@hannesm I have the logs but they contain private information. What should I look for? The log is quite big, but on first glance this seems a bit weird:
From the logs it looks like some stuff happens in between the start of the call to
Many of the actions seem to happen out of the order I would expect (especially the part where the Discord webhook is triggered before the Slack webhook is done). Do you see an obvious problem in these summarized logs?
Thanks for your detailed logs - I do not see any obvious problems in the summarized logs. Here is how I expect this to work:
From your log, (1) is part of (c); (3) looks like (e) returned and http-lwt-client attempts a TLS handshake; (4) looks like either it is done asynchronously (in a separate Lwt task) or the webhook task for slack has finished. (6) attempts DNS lookups, figures out that the previous TCP session to the resolver has been terminated (end of file reading from resolver), and initiates new TCP sessions (first again on port 853, which is refused, then on port 53) -- (8) is the continuation thereof (where the connection to your resolver on port 853 was refused, and then the connection is made to the same resolver on port 53 -- "Connected to NS." indicates the connection establishment was fine). The error "Webhook failed with: TLS failure: End_of_file" is still mysterious to me - also 67 and c6 as output. Could you include the webhook URL that failed, and maybe add some more printf to figure out when a connection is established and the HTTP request is sent / the HTTP response is awaited (and received)?
From your tcpdump output (in #8 (comment)):
I did some testing (using the debug branch here) with the following output:
Could you try the debug branch with your webhooks on your server? The 192.168.42.3 is my local DNS resolver that does not listen to DNS-over-TLS (so the connection-refused errors are harmless).
Ah, with some more pressure I can trigger some connection-failure races. Will investigate further; thanks again for your report.
Observations:
Could you try the following patch:
This uses the same happy_eyeballs instance (and thus the same resolver) for all webhook requests, so only a single connection to the resolver is used.
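As an illustration of that idea (the actual patch is elided above), sharing one `Happy_eyeballs_lwt.t` across all webhook requests might look like the sketch below. It assumes the `?happy_eyeballs` parameter of `Http_lwt_client.request` and its body-collecting callback; the exact signature may differ between releases:

```ocaml
(* Sketch only: one happy-eyeballs instance, created once and reused,
   so every request shares the same resolver connection. *)
let he = Happy_eyeballs_lwt.create ()

let post_webhook url body =
  Http_lwt_client.request
    ~happy_eyeballs:he  (* reuse the shared instance *)
    ~meth:`POST
    ~headers:[ "content-type", "application/json" ]
    ~body
    url
    (fun _response () _data -> Lwt.return ())  (* discard the response body *)
    ()
```

The design point is that each `Happy_eyeballs_lwt.create` would otherwise spin up its own DNS client (and TCP session to the resolver), which is what the concurrent webhook tasks were racing on.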
...and then even with such a patch, in dns_client_lwt |
Would you mind testing with the following branches:
With these (and my custom hurl) I'm no longer able to trigger connection failures, apart from connections to MirageOS unikernels (but that is reported at mirage/mirage-tcpip#470 and is a separate issue). Thanks for your patience, and please let me know whether with the pins above your connection establishment issues are gone.
@hannesm I’ve just tested both of these branches and upon first quick testing it seems to work now! 🎉 I’m gonna try a few more times and will report any issues I encounter, but this is really really promising! Thanks a lot! 💚
I can confirm, it is fixed! \o/ |
…s-lwt (0.1.3) CHANGES:
* Happy_eyeballs.create: add v6_connect_timeout parameter - the amount of nanoseconds (default: 200ms) after which to attempt IPv4 connection establishment. (robur-coop/happy-eyeballs#21 @hannesm, review by @reynir, issue reported at robur-coop/http-lwt-client#8 by @kit-ty-kate)
* Happy_eyeballs.create: add resolve_retries - the amount of resolve attempts when a (resolve) timeout occurs (default: 3). (robur-coop/happy-eyeballs#21 @hannesm, review by @reynir, issue reported at robur-coop/http-lwt-client#8 by @kit-ty-kate)
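The two new parameters from the changelog entry could be used roughly as follows. This is a sketch based only on the parameter names above, assuming the `duration` and `mtime.clock.os` packages for the nanosecond values; the exact `Happy_eyeballs.create` signature should be checked against the 0.1.3 API:

```ocaml
(* Sketch only: tune the new 0.1.3 knobs on the core happy-eyeballs state. *)
let he =
  Happy_eyeballs.create
    ~v6_connect_timeout:(Duration.of_ms 200) (* try IPv4 after 200ms of IPv6 *)
    ~resolve_retries:3                       (* retry resolution up to 3 times *)
    (Mtime_clock.elapsed_ns ())              (* current monotonic timestamp *)
```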
Thanks again for your report and analysis -- the dns changes are released now (in 6.1.4) and the happy-eyeballs changes as well (in 0.1.3, pending merge into opam-repository). Closing this issue since it is fixed.
Related commits:
- More debug
- Make the obuilder-spec printing a tiny bit faster
- Use http-lwt-client instead of cohttp-lwt-unix for the client part
- Give better error message from http-lwt-client
- Fix issues with h2 and http-lwt-client
- Show all the logs in debug mode
- Fix last commit (logs requires Logs.set_reporter to be called to actually print anything)
- Be more specific for the debug logs
- Remove debug logging on the webhook requests
- Return 404 instead of 500 (exn Not_found) when getting a request for a non-existant log file
- Return 404 instead of 500 (exn Not_found) when getting a request for a non-existant log directory
- Wait for most of the possibly blocking actions to finish before calling the webhooks
- Return some helpful error message when no run exist yet instead of 500
- Do not wait for the end of initialisation to start the tcp servers
- Fix the display of opam metadata
- Use opam-dev instead of opam 2.1
- Revert "Wait for most of the possibly blocking actions to finish before calling the webhooks" (reverts commit aa89ab8)
- Revert "Do not wait for the end of initialisation to start the tcp servers" (reverts commit 6ed4ce9)
- Fix compilation after the previous git reverts
- Revert 39b14a5
- Try to debug a new Lwt_io.Channel_closed(input) exception at the end of each run when reloading the cache
- Avoid race conditions with the cache
- Revert 4a8e2a6
- Remove possible Lwt_io.Channel_closed("input") exception
- Rewrite some use of Lwt_io.read_line_opt to read line faster (and hopefully elimitate an unknown Lwt_io.Channel_closed("input") race-condition)
- Give the docker image hash explicitly to improve reliablity and fix e56c933
- Use lwt_ppx instead of Lwt.finally
- Fix "TLS failure: End_of_file" (robur-coop/http-lwt-client#8)
With the same test case as in #7 I’m now getting a "TLS failure: End_of_file" error, in some specific conditions I still don’t understand.
I use it for https://github.com/ocurrent/opam-health-check/ and it works fine for any small instance, or if I call the function directly, but it fails with this error on a main instance, specifically for the slack.com webhook (the Discord webhook worked fine). Here is the code: https://github.com/ocurrent/opam-health-check/blob/debug-crashes/server/backend/check.ml#L404
Do I have to tweak the TLS config?