Proposing an easy method for graceful application shutdown #198
There are a couple of exception handlers that would need to be modified, but before I do it and send a PR I would love to have feedback from someone deeper in the design of the thing.
Your application (the WSGI app) is running inside a bunch of threads and is not the one that sees the KeyboardInterrupt that is caught in server.py. Your application won't ever receive that exception or be able to handle it. Since the application object is a WSGI application, there is no way to notify it that shutdown is in progress, and waitress doesn't expose any handlers that would allow custom code to be executed. This means there is no way to signal your application that it is going away (and even if there were, there is no good way to signal that per thread, which may be deep inside your WSGI application).

Now if by graceful shutdown you mean "continue servicing existing requests, but don't accept new ones", similar open PRs exist. Neither of them will likely apply to the current waitress (since there were some changes related to how it creates listen sockets), but I would be interested in seeing how to make graceful shutdown happen that way.
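A small demonstration of the point about threads (generic Python, not waitress code): CPython delivers SIGINT, and the resulting KeyboardInterrupt, to the main thread only, so code running in worker threads never sees it.

```python
import threading
import time

def worker():
    # Worker threads never receive KeyboardInterrupt: CPython raises it
    # in the main thread only, regardless of which thread is "busy".
    try:
        time.sleep(60)
        print("worker finished normally")
    except KeyboardInterrupt:
        print("unreachable in practice")

threading.Thread(target=worker).start()

try:
    time.sleep(60)  # press Ctrl+C here
except KeyboardInterrupt:
    print("main thread got KeyboardInterrupt; the worker keeps sleeping")
```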
I think we misunderstand each other. In my deployment, waitress itself seems to shut down satisfactorily; at least I see no evidence of lingering effects when I send it a kill signal (CentOS 7.3, Python 2.7.15). The tickets you cited seem focused on waitress itself. I am specifically talking about shutting down the WSGI application that waitress serves.

My workaround for the moment is to run shutdown operations after the waitress.serve() call, but I would prefer to wrap the waitress.serve() call in an exception handler and catch KeyboardInterrupt and SystemExit (et al.) myself, to have finer control over why I'm actually shutting down the application. Waitress seems to be receiving the signals; I'm not sure why you're confident that re-raising them at the end of waitress' handler would not allow them to be processed at the outer scope. You're more in the weeds of the implementation, so I trust your judgement; it's just not immediately obvious.

In any case, I may look at #56 in a few weeks. I can't promise anything, but I appreciate that this is a volunteer effort. Thanks.
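A minimal sketch of the workaround described above, assuming `app` is your WSGI application and `cleanup()` is whatever shutdown logic the application needs (both names are hypothetical):

```python
import waitress

def app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello\n"]

def cleanup():
    # Hypothetical application shutdown logic (close DB pools, flush logs, ...).
    pass

if __name__ == "__main__":
    # waitress catches KeyboardInterrupt/SystemExit internally, so serve()
    # simply returns once the server loop has shut down. Cleanup code placed
    # after the call runs on any such shutdown, but cannot tell *why* it ran.
    waitress.serve(app, host="127.0.0.1", port=8080)
    cleanup()
```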
#56 wouldn't help you either, since that is related to how long waitress will wait before it shuts down connections hard.
@drone115b What do you expect the application to do if the server intends to shut down? Since waitress is usually the outermost layer of the WSGI stack, it cannot raise exceptions to the application, which is probably the lowermost layer. It might be possible to create a signal similar to the one implemented in #310 so the application can check whether the server intends to shut down and somehow react to it. But I don't know how helpful that would be. Either the application decides to finish the request and write the answer, in which case it will probably not have a way to become faster just because it knows the server is shutting down (if it can be faster, why isn't it always faster?), or it decides to abort execution, which it will also do if any graceful-shutdown period runs out and waitress shuts down anyway. The only case I could imagine is that the application knows the next step in the execution will take a long time and decides it might as well not start if the server is going to shut down anyhow. Is this a use case you have?

@bertjwregeer I would like to start discussing possible implementations of a graceful shutdown that (a) serves the remaining requests without accepting new ones and (b) can be triggered both by receiving a signal and by an exception from within the application. In your opinion, should I open a new issue or continue the discussion in this one, which seems to have been started with a different goal in mind?
@viktordick This is what needs to be done to add a graceful shutdown that allows waitress to finish serving any remaining connections: #269 (comment). If you want to work on that, by all means. I feel uneasy about anything that implements the second part of your (b), though; I am not a huge fan of an app being able to send a signal back to shut stuff down.
I agree with @bertjwregeer that signaling the application should not be part of this initial work. It's a valid concern, but initially the focus should be on:

- no longer accepting new connections, and
- allowing the requests already in flight to complete.

This behavior should be triggered from an API on the server object.

As far as a timeout on the graceful shutdown goes, there's not much to say: no matter what, waitress is thread-based and cannot kill active threads. We should rely on the process manager to follow standard graceful-shutdown logic, which is to send SIGTERM, wait X seconds, then send SIGKILL. Graceful is graceful, and waitress cannot stop connections that don't want to stop because an app is processing them.

Once we're at this point, we can look at how to help the app shut down if it wants to care.
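For illustration, that standard process-manager logic looks roughly like the sketch below (generic Python, not waitress code; "serve.py" is a placeholder for whatever starts waitress):

```python
import subprocess

def graceful_stop(proc, grace_seconds=30):
    """The standard dance: SIGTERM, wait a bounded time, then SIGKILL."""
    proc.terminate()                      # SIGTERM: request a graceful shutdown
    try:
        proc.wait(timeout=grace_seconds)  # let in-flight requests finish
    except subprocess.TimeoutExpired:
        proc.kill()                       # SIGKILL: the last resort
        proc.wait()

# Hypothetical usage: start the server as a child process, later stop it.
proc = subprocess.Popen(["python", "serve.py"])
graceful_stop(proc)
```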
All right, I agree that we can drop idea (b). If required, the app can simply send SIGTERM to its own process.

When implementing a graceful shutdown, I usually use the pattern that a second SIGTERM or SIGINT terminates immediately. That way, if the process was started directly from the command line for debugging and a request does not finish, hitting Ctrl+C will print "Terminating", and if it still takes too long, I can hit Ctrl+C a second time.

I do not have any experience with signal handling on systems other than Linux. Are other systems even supported? I hope I will find some time in the next week to work on this.
@viktordick I agree with your proposed behavior. The idea is that the graceful signal handler is simply unregistered after the signal is received; the second signal then triggers the default behavior in the process.
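A minimal self-contained sketch of that pattern (generic Python, not waitress code; `begin_graceful_shutdown` is a placeholder for whatever actually stops accepting connections):

```python
import signal

def begin_graceful_shutdown():
    # Placeholder: in real code this would stop accepting new connections,
    # e.g. by closing the waitress server object.
    print("graceful shutdown started")

def handle_first_signal(sig, frame):
    print("Terminating; send the signal again to exit immediately.")
    # Unregister this handler so the next SIGINT/SIGTERM gets the default
    # disposition, which terminates the process at once.
    signal.signal(signal.SIGINT, signal.SIG_DFL)
    signal.signal(signal.SIGTERM, signal.SIG_DFL)
    begin_graceful_shutdown()

for sig in (signal.SIGINT, signal.SIGTERM):
    signal.signal(sig, handle_first_signal)
```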
I don't know how realistic this is, but if there's a way to split this feature into a couple of PRs, that'd be really great, because I think it's going to involve touching quite a lot of waitress and become difficult to review. Anyway, just something to think about.

As a friendly warning, I think this shutdown/cleanup work is directly related to the many open sockets left around by the test suite as it runs, since those tests do not clean up the running event loop. That issue should be solvable as part of this work, but maybe not necessarily in the same PR. Just a thing to watch out for.
I am afraid that I will not be able to work directly on this in the near future after all. We decided to first use a different method: unregistering the backend from the load balancer (in our case HAProxy), waiting until the backend has no requests left, and then restarting the waitress process before activating the backend again. With this strategy, I am more confident that I know how to implement each step along the way (and it will also work for older Zope2 installations). I hope to still return to the problem in waitress itself, but I am afraid it will not be anytime soon.
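For reference, a hedged sketch of the drain step described above, using HAProxy's runtime API over its admin socket (the socket path and the `myapp_backend/waitress1` names are assumptions for illustration):

```python
import socket

def haproxy_cmd(cmd, sock_path="/var/run/haproxy.sock"):
    """Send one command to the HAProxy runtime API over its admin socket."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(sock_path)
        s.sendall(cmd.encode() + b"\n")
        return s.recv(65536).decode()

# Stop routing new requests to this backend server ("drain")...
print(haproxy_cmd("set server myapp_backend/waitress1 state drain"))
# ...wait until it has no active sessions, restart waitress, then re-enable:
print(haproxy_cmd("set server myapp_backend/waitress1 state ready"))
```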
@viktordick thank you for keeping us in the loop, thanks for all your efforts!
Use `create_server` from `waitress.server`:

```python
import logging
import signal

from waitress.server import create_server

if __name__ == "__main__":
    #
    # some deleted code.....
    #

    # Signal handler: do some work before shutting down the service.
    # `worker`, `app`, and `metric_port` come from the code elided above.
    def handle_sig(sig, frame):
        logging.warning(f"Got signal {sig}, now close worker...")
        worker.close()
        server.close()  # stop accepting connections; the run loop then exits

    for sig in (signal.SIGINT, signal.SIGTERM, signal.SIGQUIT, signal.SIGHUP):
        signal.signal(sig, handle_sig)

    server = create_server(app, host="0.0.0.0", port=metric_port)
    server.run()
```
There should also be a distinction between INT and TERM, such that TERM lets the application stop listening on the socket for new connections but finish serving the existing open ones, while INT tells the app to just drop everything on the floor. Ideally the application also lets the orchestrator know when it is finally quiescent (probably through an HTTP mechanism). Waitress-based applications containerized for Kubernetes need this behavior; otherwise killing the app is just going to break active connections.
For graceful shutdown the focus is really on handling SIGTERM, as that's the standard signal used for that purpose. There are no required semantics around SIGINT. If I were implementing this feature, I would treat both SIGINT and SIGTERM as a graceful shutdown, which is consistent with other systems I have used that support the feature.
Yeah, I wouldn't mind if INT is not "properly handled", so long as SIGTERM is a graceful shutdown with all connections finishing up properly. I require this for my automation, and Kubernetes folks require it too.
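One way the Kubernetes side of this is commonly handled today, sketched under the assumption of a `/healthz` readiness endpoint (the path, port, and handler are illustrative, not anything waitress provides): on SIGTERM the readiness check starts failing, the orchestrator stops routing new traffic, and existing requests drain during the grace period.

```python
import signal
import waitress

shutting_down = False

def app(environ, start_response):
    if environ["PATH_INFO"] == "/healthz":
        if shutting_down:
            # Failing readiness: Kubernetes stops routing new traffic here.
            start_response("503 Service Unavailable",
                           [("Content-Type", "text/plain")])
            return [b"draining\n"]
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"ok\n"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello\n"]

def on_sigterm(sig, frame):
    global shutting_down
    shutting_down = True  # readiness starts failing; in-flight requests continue

signal.signal(signal.SIGTERM, on_sigterm)
waitress.serve(app, host="0.0.0.0", port=8080)
```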
I think there's an appetite for waitress to be able to shut down applications gracefully. I use waitress independently of Pylons for a wide variety of micro-services.
On line 263 of server.py, SystemExit and KeyboardInterrupt exceptions are caught to shut down the task dispatcher.
I am not very fluent with the code, but if a "raise" statement were inserted after line 264, it seems that higher-level exception handlers could be used to catch the same exceptions and shut down the application object gracefully.
Is there an argument against this? Have I misunderstood the code?
I think the pros are: a 1-line (1-word!) change, highly backward compatible, and it seems like it would address graceful application shutdown robustly.
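A toy model of the proposal (paraphrased; this is not the actual waitress source, and the class below only mimics the structure of the handler around line 263 of server.py):

```python
# A toy model of waitress's server loop, showing where the proposed
# one-word change would go and what it would let callers do.

class ToyServer:
    def run(self):
        try:
            self._loop()  # stand-in for the asyncore event loop
        except (SystemExit, KeyboardInterrupt):
            self._shutdown_task_dispatcher()
            raise  # <-- the proposed change: re-raise to the caller

    def _loop(self):
        raise KeyboardInterrupt  # simulate the user hitting Ctrl+C

    def _shutdown_task_dispatcher(self):
        print("task dispatcher shut down")

try:
    ToyServer().run()
except KeyboardInterrupt:
    print("application-level graceful shutdown runs here")
```

With the re-raise in place, application code could wrap waitress.serve() in its own try/except and run cleanup with full knowledge of which exception ended the loop.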