Question about how to gracefully stop a plugin. #379
Another question related to my previous one. In the application, several plugins interact with (and depend on) each other across the whole workflow. To avoid losing "data" (or "information"), the plugins must be stopped in a defined order. When I stop mfdata, the plugins are stopped in a "random" order. To stop them in a defined order, I thought of uninstalling the plugins (plugins.uninstall) one by one. Unfortunately, this doesn't seem to be the right way to stop a plugin: the process continues to run and then fails. How can I stop the plugins in a specific order? The main use case behind my questions is shutting down the application (plugins) for maintenance (upgrading plugins and/or their configuration).
=> The graceful_timeout parameter in the plugin's config.ini is the way to configure this; the default value is 600 (10 minutes).
=> Can you confirm that you have the problem with an mfdata.stop which doesn't respect the graceful_timeout value of your plugin?
=> Manually, you can control your plugin steps (including the order) with
to see the available "watchers".
=> Do you have the same problem when you manually stop the plugin with this command?
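To double-check which graceful_timeout each step of a plugin will actually use, the config.ini can be read programmatically. The following is only a minimal sketch in Python: the section name `step_main` and the sample value are hypothetical, and the fallback of 600 seconds is simply the default quoted above. Only the `graceful_timeout` key itself comes from this discussion.

```python
import configparser

# Hypothetical excerpt of a plugin config.ini. The section name "step_main"
# and the value 2400 are assumptions made for illustration only.
SAMPLE_CONFIG = """
[step_main]
graceful_timeout = 2400
"""

# Default value quoted in this thread (10 minutes).
DEFAULT_GRACEFUL_TIMEOUT = 600


def read_graceful_timeouts(config_text):
    """Return {section_name: graceful_timeout_in_seconds} for every section."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return {
        section: parser.getint(
            section, "graceful_timeout", fallback=DEFAULT_GRACEFUL_TIMEOUT
        )
        for section in parser.sections()
    }


if __name__ == "__main__":
    for step, timeout in read_graceful_timeouts(SAMPLE_CONFIG).items():
        print(f"{step}: graceful_timeout={timeout}s")
```

A step whose processing can take about 30 minutes would presumably need a value above 1800 seconds; a section without the key falls back to 600 in this sketch.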
Thanks, I'm trying out and checking your suggestions.
_=> graceful_timeout parameter in plugin config.ini is the way to configure this the default value is 600 (10 minutes)_
In the plugin's config.ini, the timeout parameter is 600 (for each step of the plugins).
I said it seems to be 300 seconds because of the "counter" displayed when stopping the plugin, e.g.:
The counter seems to increment by 1 every second.
OK, I'm waiting for your feedback on the circusctl behaviour.
_=> can you confirm you have the problem with a mfdata.stop which doesn't respect the graceful_timeout value of your plugin?_
I'm not sure that mfdata.stop fails to respect the graceful_timeout value of the plugin: I don't get the same behaviour each time I run the test described below.
I ingest a large file (the plugin takes about 30 minutes to process it, mainly because of a command inside the plugin that converts the GRIB file to a NetCDF file with ECMWF's grib_to_netcdf tool). I check with
First check with
I ingest the file; while my plugin's process is running, I stop mfdata (mfdata.stop) and get:
(each increment is 1 second)
During this wait, I run
Then, when 300 is reached, i.e. after 300 seconds (each increment is 1 second), I get:
I check the plugin's log file: the process did not run to completion. It seems to have been killed. I run
Second check (same conditions) with mfdata.stop
I ingest the file; while the plugin's process is running, I stop mfdata (mfdata.stop) and get:
(each increment is 1 second)
During this wait, I run
Then, when 300 is reached, i.e. after 300 seconds (each increment is 1 second), I get:
Here the 'waiting' counter seems to increment by 1 every 15 seconds.
During this wait, I run
In the meantime, the plugin's process ends successfully. Then I get:
I check the plugin's log file: all is fine. In this case, my plugin's process does not appear to have been killed.
Third check with circusctl stop
I ingest the file; while the plugin's process is running, I run
I run
The result says step.wcsingestion.grib is stopping. I check with the ps -ef | grep grib command that the process is still running:
That's OK, the process is still running. However, when I run
it is considered to be stopped. It seems also
What should I do if I want to update my plugin (or its configuration)? Stop and restart mfdata? However, my process has not finished, and before launching this command I must wait for the plugin process to finish (checking visually in the log and/or with the ps -ef | grep ... command). At this stage, I can't say whether I have gracefully stopped the application, or whether I can safely do some "handling" in a production environment (e.g. upgrade plugins or configuration, then stop and restart mfdata to apply the upgrade).
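The manual check described above (watching the log or repeating ps -ef | grep ...) can be automated with a small polling helper. This is only a sketch in Python under stated assumptions: it relies on the standard Linux `pgrep -f` command rather than on any mfdata or circus API, and the function names, the polling interval and the timeout are all hypothetical choices.

```python
import subprocess
import time


def process_is_running(pattern):
    """Return True if at least one process command line matches `pattern`.

    Assumption: `pgrep` is available on the host. It exits with status 0
    when a match is found and 1 when there is none.
    """
    result = subprocess.run(["pgrep", "-f", pattern], stdout=subprocess.DEVNULL)
    return result.returncode == 0


def wait_until_finished(is_running, timeout, poll_interval=5.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Poll `is_running()` until it returns False or `timeout` seconds elapse.

    Returns True if the process finished, False if the timeout was reached.
    """
    deadline = clock() + timeout
    while is_running():
        if clock() >= deadline:
            return False
        sleep(poll_interval)
    return True
```

For example, `wait_until_finished(lambda: process_is_running("grib_to_netcdf"), timeout=3600)` would block until no grib_to_netcdf process is left, or give up after one hour. It only tells you that the process has gone; whether it completed successfully still has to be checked in the plugin's log.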
OK, thanks, let's fix it!
OK, thanks. What I understand is that, after the fix, it will be possible to stop a plugin individually (e.g. with circusctl), so we can stop the application plugins in a specific order. Is that right?
I don't understand why you need to stop plugins in a specific order. If the bug is fixed (and it will be fixed very soon), you don't need to stop your plugins in a specific order, do you?
Simple ("academic") example to illustrate why I want to stop plugins in a specific order:
Plugin "1" processes data and then sends a "message" to other plugins ("2", "3", ...), which must do something with it to keep the system/application consistent.
Let's assume we run mfdata.stop while plugin "1" is processing data. mfdata.stop waits for plugin "1" to end, but plugins "2" and "3" may have been stopped by mfdata.stop before it executes the 'stop plugin "1"' instruction. Then plugin "1" finishes and the "message" is sent to plugins "2" and "3". However, plugins "2" and "3" are stopped, so the "message" is lost and the state of the system/application (a database, for instance) may be inconsistent.
In my mind, to keep consistency, I need to stop plugin "1" first. A "mechanism" waits for plugin "1" to end. Then I can stop plugins "2" and "3" (after a short delay, to be sure the "message" has been received by plugins "2" and "3"). I imagine doing this "stop" through a shell script, for instance (stop plugin "1", wait for "a short time", stop plugin "2", wait, and so on).
In the MET-GATE MF application we have many plugins that each do a specific "task" (like a "micro-service"), so the workflow must not be broken.
To be more concrete, here is the list of plugins and the order in which mfdata.stop stops them:
In my mind, the order that ensures the system/application stays consistent, given this list of plugins, is:
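The ordered shutdown imagined above (stop plugin "1", wait for it to end, pause briefly, then stop plugins "2" and "3") could look roughly like the following, sketched in Python rather than as a shell script. Everything here is an assumption made for illustration: the function names, the settle delay, and in particular the way a single watcher is stopped. The `circusctl stop <watcher>` call mirrors the manual test earlier in this thread, and the exact invocation may differ on a real mfdata installation.

```python
import subprocess
import time


def circusctl_stop(watcher):
    """Stop one watcher. Assumption: plain `circusctl stop <watcher>` works here."""
    subprocess.run(["circusctl", "stop", watcher], check=True)


def stop_in_order(groups, stop_one, wait_finished, settle_seconds=10.0,
                  sleep=time.sleep):
    """Stop watchers group by group, in the given order.

    groups         -- list of lists of watcher names; groups are stopped in order
    stop_one       -- callable(name) that requests the stop of one watcher
    wait_finished  -- callable(name) returning True once its process has ended
    settle_seconds -- pause after each group so in-flight "messages" can arrive
    """
    stopped = []
    for group in groups:
        for name in group:
            stop_one(name)
        for name in group:
            if not wait_finished(name):
                raise RuntimeError(f"{name} did not finish in time")
            stopped.append(name)
        sleep(settle_seconds)
    return stopped
```

A call such as `stop_in_order([["plugin1"], ["plugin2", "plugin3"]], circusctl_stop, wait_finished=...)` would stop the producer first and its consumers afterwards; the watcher names are placeholders. Passing the stop and wait functions as parameters keeps the ordering logic independent of how a watcher is actually stopped.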
The standard communication protocol for mfdata plugins is to exchange files (and tags on these files). If you exchange messages over a message bus (AMQP?) between mfdata plugins, that could explain your problem. But if you have this kind of synchronization problem, you will run into other ones during plugin autorestarts (the max_age feature).
I intentionally put "message" in quotes, because it currently covers both AMQP messages and files that are created by a plugin and sent to "switch". And the problem is even more complex, because the plugins are not necessarily all hosted on the same machine. In my list of plugins, you can see several plugins stopped by the same mfdata.stop command. This is because we are on a development or integration platform here, and all the plugins are on the same host. That's not necessarily the case in the production environment.
About the max_age feature: I understand that the process may be restarted at "any time", which will indeed be an issue. I see that max_age can be disabled (max_age = 0); I imagine that disabling it could cause other issues?
@thefab OK. Thanks.
@thefab: I haven't forgotten to check, I've just been short of time. I'll do it soon 😃
I checked my use case. I have upgraded all the metwork modules (v1.0). The fix works, but there are other errors, something "strange", which I explain below.
The fix works: my plugin
For
I run my 'long process' (huge GRIB file) and then stop mfdata. mfdata waits for my
My process takes about 30 minutes. It runs correctly to completion, as expected, after 30 minutes. That's OK. Then mfdata continues stopping.
But the other issues I see are the following: when stopping, mfdata first schedules the 'stop' for the plugin, and all the scheduling after the plugin whose process is running (
The plugin's log file says (same error for the plugin steps in error except for
About
After this time, mfdata continues stopping the other plugins and displays:
Note that if no plugin process is running during the stop, there is no error, everything is 'OK':
I need to stop the process of the plugin gracefully.
I realized that when I stop mfdata (mfdata.stop), mfdata waits for the running processes to end before stopping the plugin (that's great). The maximum waiting time seems to be 300 seconds.
Unfortunately, some of our plugins deal with large files whose processing time exceeds this 300 s limit. So the process is abruptly killed, and the "workflow" is broken/lost.
My question is: is there a way to change/configure the waiting time? Or is there another way to gracefully stop a plugin while its process is running?
This concerns MFDATA. For MFSERV, I suppose the same mechanism is implemented, i.e. when a request is being processed while mfserv is stopping, mfserv waits for the process to end. Is that right? (The request processing time doesn't exceed 300 s 😊)
Thanks.