Container stops intermittently - Disruptive behaviour #512
Hi, there can be multiple reasons why the pods are stopped. Since you mention it happens more frequently when multiple users are active, I suspect it's caused by a lack of resources. I see you are already assigning memory and CPU requests/limits, which is great. However, are you using the same value for the requests and the limits? In addition, I advise having a look at the events of the pod when the app is stopped. E.g. you could run the following command and then try to reproduce the problem (e.g. by starting multiple instances of your app: https://shinyproxy.io/documentation/ui/#using-multiple-instances-of-an-app): `kubectl get events -n shinyproxy -w`
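For illustration, a minimal sketch of an app spec where the requests equal the limits — the app id and image are hypothetical; the resource properties are ShinyProxy's documented container resource settings:

```yaml
proxy:
  specs:
    - id: my-app                              # hypothetical app id
      container-image: example/my-app:latest  # hypothetical image
      # Same value for request and limit:
      container-memory-request: 2Gi
      container-memory-limit: 2Gi
      container-cpu-request: "1"
      container-cpu-limit: "1"
```

When requests equal limits for every container, Kubernetes assigns the pod the Guaranteed QoS class, which makes it less likely to be evicted under node memory pressure.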
Hi @LEDfan,
Here are a few things we noticed since we raised the issue:
Thanks for the additional information. Could you check whether your ShinyProxy logs contain a line similar to
We don't see that line. We do see the following error, however:

```
2024-07-24 11:27:20.728 ERROR 1 --- [ XNIO-1 I/O-1] io.undertow.proxy : UT005028: Proxy request to /proxy_endpoint/00d3d5d6-da46-4d61-8260-62948726874d/websocket/ failed

java.io.IOException: UT001000: Connection closed
	at io.undertow.client.http.HttpClientConnection$ClientReadListener.handleEvent(HttpClientConnection.java:600) ~[undertow-core-2.2.21.Final.jar!/:2.2.21.Final]
	at io.undertow.client.http.HttpClientConnection$ClientReadListener.handleEvent(HttpClientConnection.java:535) ~[undertow-core-2.2.21.Final.jar!/:2.2.21.Final]
	at org.xnio.ChannelListeners.invokeChannelListener(ChannelListeners.java:92) ~[xnio-api-3.8.8.Final.jar!/:3.8.8.Final]
	at org.xnio.conduits.ReadReadyHandler$ChannelListenerHandler.readReady(ReadReadyHandler.java:66) ~[xnio-api-3.8.8.Final.jar!/:3.8.8.Final]
	at org.xnio.nio.NioSocketConduit.handleReady(NioSocketConduit.java:89) ~[xnio-nio-3.8.8.Final.jar!/:3.8.8.Final]
	at org.xnio.nio.WorkerThread.run(WorkerThread.java:591) ~[xnio-nio-3.8.8.Final.jar!/:3.8.8.Final]
```

When this error occurs, there are a few failed API requests in the network tab and the screen then displays -

We are running
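One setting worth checking, though it is not mentioned in this thread: ShinyProxy's websocket reconnection mode lets the client transparently re-establish a dropped websocket instead of surfacing an error. A minimal sketch, assuming the documented `default-webSocket-reconnection-mode` property:

```yaml
proxy:
  # Auto: the client silently reconnects a dropped websocket;
  # can also be set per app via webSocket-reconnection-mode.
  default-webSocket-reconnection-mode: Auto
```

This does not explain why the connections drop, but it can reduce the user-facing disruption while the root cause is investigated.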
Hi @saurabh0402, did you perhaps already discover the cause of your issue? We have not yet seen a similar issue ourselves, which makes it difficult to give additional suggestions. It seems to me that ShinyProxy is killing the pod, but then I would expect the
Hi, sadly we weren't able to find the cause. We did try looking into the pod events as well, but couldn't find anything conclusive.
@LEDfan here are the logs for the shinyproxy pod with the log level set to
I also saw this somewhere:
The 503 errors seem to occur when multiple requests come in at once. Could this be causing the issue?
Hi @LEDfan, could someone please check the above logs to see if they help in figuring out the issue with ShinyProxy? It has been a major blocker for us for a long time now.
Hi @saurabh0402 @parul157, I would remove the log file here — it seems to contain some sensitive information. From the log file I see that you are using Istio; I'm wondering whether this could be causing some problems with the connections. Nevertheless, I made some adjustments to the way ShinyProxy handles crashes; this should improve the behavior when requests fail because of network issues. Can you test using the image
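To test the Istio hypothesis, one option (a sketch with a hypothetical app id and image, using ShinyProxy's documented `kubernetes-pod-patches` mechanism) is to disable sidecar injection for the app pods, so the proxied websocket traffic bypasses the mesh:

```yaml
proxy:
  specs:
    - id: my-app                              # hypothetical app id
      container-image: example/my-app:latest  # hypothetical image
      # JSON Patch applied to the generated pod. "/" in the annotation
      # key is escaped as "~1" per RFC 6901; this op assumes the pod
      # metadata already has an annotations map.
      kubernetes-pod-patches: |
        - op: add
          path: /metadata/annotations/sidecar.istio.io~1inject
          value: "false"
```

If the disconnects stop once the sidecar is removed, that points at the mesh configuration rather than at ShinyProxy itself.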
Hello,
We are using ShinyProxy on EKS infrastructure, and intermittently the pods stop and have to be restarted to get the app back up again. In some cases we have noticed that when the app is in use by 10 or more people at once the issue is more prominent, and in some cases it won't even work on restart/refresh.
shinyproxy: v3.0.1
EKS version: 1.27
Below is the error we receive:
We also tried to upgrade to the latest version (3.1.1), but the issue remains. Sharing below the error that we get.
Application Template Configuration we use: