-
Notifications
You must be signed in to change notification settings - Fork 99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add TERM handling #375
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for yet another contribution, Gavin! At a high level, this makes sense to me. At an implementation level, I wonder if this could be done without resorting to the private sun.misc.Signal
API, which should generally be avoided for forward compatibility with future Java versions. I did a quick Google search and found that the official Runtime.getRuntime().addShutdownHook
API should be able to handle SIGTERM
. Can you look into using that API rather than the private sun.misc.Signal
API to implement this RFE?
@basil thanks for the feedback... I did start out with the |
167e22a
to
dc49946
Compare
@basil I've replaced the use of PTAL :) |
This is useful to gracefully stop the Jenkins swarm process, for example when running in Kubernetes in conjunction with a preStop lifecycle hook. Signed-off-by: Gavin Williams <gavin.williams@elastic.co>
dc49946
to
c5b5b1e
Compare
logger.info("Interrupting '" + th.getClass() + "' termination"); | ||
th.interrupt(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you verified, when sending a TERM
signal to a connected Swarm client, that this if
statement is true and that the code inside the if
statement is executed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeh, so I've confirmed that doing a kill <PID>
results in the log line being output:
java -jar client/target/swarm-client.jar -description "fatmcgav swarm test" -disableClientsUniqueId -executors 1 -fsroot jenkins -retry 6 -maxRetryInterval 160 -retryBackOffStrategy exponential -mode exclusive -username local-swarm -password **** -deleteExistingClients -name fatmcgav-swarm-test -master https://*** -username local-swarm -webSocket -labels swarm -labels x86_64
Oct 19, 2021 9:51:54 PM hudson.plugins.swarm.Client logArguments
INFO: Client invoked with: -deleteExistingClients true -description fatmcgav swarm test -disableClientsUniqueId true -executors 1 -fsroot jenkins -labels [swarm, x86_64] -maxRetryInterval 160 -mode exclusive -name fatmcgav-swarm-test -password ***** -retry 6 -retryBackOffStrategy EXPONENTIAL -url https://*** -username ***** -webSocket true
Oct 19, 2021 9:51:54 PM hudson.plugins.swarm.Client run
INFO: Connecting to Jenkins controller
Oct 19, 2021 9:51:54 PM hudson.plugins.swarm.Client run
INFO: Attempting to connect to https://***
...
Oct 19, 2021 9:52:05 PM hudson.plugins.swarm.Client run
WARNING: Remaining retries: 4
Oct 19, 2021 9:52:05 PM hudson.plugins.swarm.Client run
INFO: Retrying in 20 seconds
Oct 19, 2021 9:52:05 PM hudson.plugins.swarm.Client$1 run
INFO: Interrupting threads
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeh, so I've confirmed that doing a
kill <PID>
results in the log line being output:java -jar client/target/swarm-client.jar -description "fatmcgav swarm test" -disableClientsUniqueId -executors 1 -fsroot jenkins -retry 6 -maxRetryInterval 160 -retryBackOffStrategy exponential -mode exclusive -username local-swarm -password **** -deleteExistingClients -name fatmcgav-swarm-test -master https://*** -username local-swarm -webSocket -labels swarm -labels x86_64 Oct 19, 2021 9:51:54 PM hudson.plugins.swarm.Client logArguments INFO: Client invoked with: -deleteExistingClients true -description fatmcgav swarm test -disableClientsUniqueId true -executors 1 -fsroot jenkins -labels [swarm, x86_64] -maxRetryInterval 160 -mode exclusive -name fatmcgav-swarm-test -password ***** -retry 6 -retryBackOffStrategy EXPONENTIAL -url https://*** -username ***** -webSocket true Oct 19, 2021 9:51:54 PM hudson.plugins.swarm.Client run INFO: Connecting to Jenkins controller Oct 19, 2021 9:51:54 PM hudson.plugins.swarm.Client run INFO: Attempting to connect to https://*** ... Oct 19, 2021 9:52:05 PM hudson.plugins.swarm.Client run WARNING: Remaining retries: 4 Oct 19, 2021 9:52:05 PM hudson.plugins.swarm.Client run INFO: Retrying in 20 seconds Oct 19, 2021 9:52:05 PM hudson.plugins.swarm.Client$1 run INFO: Interrupting threads
Do you see a line above that says "Interrupting '[…]' termination"? I can't see one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point... I'll do some debugging...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I ran some tests locally, and even with the swarm-client
successfully connecting to a Jenkins controller, I don't see any threads being interrupted.
I've put a couple of thread dump here: https://gist.github.com/fatmcgav/f848c071bbf5f650408c3e7f0b45238a
The first one is just with the agent idle and connected to Jenkins, the 2nd with a simple sleep script running...
It looks like the runningThreads
set never gets populated with anything based on adding some additional logging...
So is it safe just to remove the logic? Or should the threads be getting interrupted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on https://github.com/jenkinsci/remoting/blob/f4b81bbb7b2df53bc14e1a14a36d4fd3ce570a31/src/main/java/hudson/remoting/jnlp/Main.java#L284-L287 it is sufficient to send an interrupt to the main thread, which will in turn interrupt the Engine
thread. In Swarm's case, the main thread is the one running the while (true)
loop in Client.java
. (One way of reliably getting at this thread is to call Thread.currentThread()
at the start of main()
and hold on to the reference.) That thread should be sitting at line 235 of Client.java
at idle, and when interrupted it would first hit the finally
block of the snippet I linked to above (which calls engine.interrupt()
to interrupt the Engine
thread) then fall into the catch
block at line 240 of Client.java
, where we log a stack trace and eventually loop back around to reconnect. In your case you want to effect a clean shutdown, so you will want to change this to not log a stack trace and reconnect but rather to wait for the Engine
thread to finish shutting down and then exit from the main thread cleanly without looping back around to reconnect. To distinguish this new case from the existing use cases you will likely need to add some sort of AtomicBoolean
that is set by your new shutdown hook.
@basil thanks for the great reply and the pointer... I'll rework... |
This is useful to gracefully stop the Jenkins swarm process, for example
when running in Kubernetes in conjunction with a preStop lifecycle hook.
Signed-off-by: Gavin Williams gavin.williams@elastic.co
Example log output: