Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Graceful stop in repeat mode - part 2 #833

Draft
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

tleedjarv
Copy link
Contributor

@tleedjarv tleedjarv commented Nov 15, 2022

Background and motivation in #554. Closes #554
cc @bruceg

This PR adds several methods to fix the original request from #554, namely a "graceful shutdown". Not all of these methods need to be merged. They all have their pros and cons. I'm hoping for a discussion to reach a decision on which methods (possibly all) will be merged. I've kept commits separate for easier review and selection.

"Graceful shutdown" means that a stop can be requested at any time but the process is stopped only after any ongoing sync is completed (it does not need to be successful, just finished for the moment). This is complementary to #810.

In this PR the following methods for a "greaceful shutdown" are included:

  • signal SIGUSR2
  • reading a "stop condition" from (possibly redirected) stdin:
    • this can be an EOF (from the terminal Ctrl-D) (not available if stdin is a regular file or EOF already at startup (such as /dev/null)), or
    • byte 0x04 (other input is silently ignored), or
    • closing stdin (if it was open at process start)

For more details please also see the changes to the manual in this PR.

With the stdin method one has basically endless possibilities and flexibility to send a stop request. Here are some examples to get the imagination going:

  • sleep 60 | unison ... -repeat -- automatically request to stop the repeat loop after 60 seconds; it stops when the current sync is completed
  • echo | unison -batch ... -- guaranteed to stop after one sync, even if there is a repeat preference in the (default) profile; could be used in a script to guarantee termination when not all preference values are known
  • mkfifo fifoname ; cat fifoname | unison ... -repeat -- request to stop the repeat loop by writing 0x03 or 0x04 into the named pipe (it doesn't even matter what is written because EOF/closing will also work)
  • ncat -l 45678 | unison ... -repeat -- request to stop the repeat loop by opening a TCP connection to port 45678 and writing 0x04 into the socket (it doesn't even matter what is written because EOF/closing will also work) (in bash you can even do echo quit > /dev/tcp/yourhost/45678)

An overview of different methods and some of their strong and weak sides

MethodProsConsComments
SIGUSR2
  • an otherwise unused signal
  • easy for a user to send a signal to any process
  • works even if the process wasn't started in a "special way" (setting up stdin redirection, ...)
  • signals N/A in Windows
  • requires keeping track of or looking up the process id
stdin (terminal)
  • a simple keypress for the user
  • have to use blocking mode (input buffering from user's perspective); this could work much smoother if it wasn't for consideration for terminal settings and background jobs
  • pressing Ctrl-D several times risks quitting the shell (perhaps Unison should keep reading from stdin and that way prevent shell from seeing additonal Ctrl-D presses?)
It would be best to read from stdin in a non-blocking way. This can be done by setting stdin into non-blocking mode (POSIX) or changing terminal settings (POSIX and Windows). But both of these methods are very fragile because the terminal is shared with other applications that will be impacted or will change terminal settings for themselves. On top of that, the process can be run in background or foreground and switched between those modes at any moment. All this becomes very tricky. That's why I haven't included this code in the PR (although it works well, ignoring these concerns).
stdin (not a regular file)
  • very wide usage possibilities
  • available on all platforms
  • user has to start the process in a certain way (redirect stdin)
  • some sources may provide huge amounts of data (avoiding this should be responsibility of the user)
stdin (regular file)
  • relatively easy and instinctive(?) to use for the user
  • available on all platforms
  • user has to start the process in a certain way (redirect stdin)
  • relatively difficult to read from -- you have to keep looping and reading, indefinitely
Not sure if regular file redirected to stdin should be supported or not. Reading from a file needs constant looping (I currently set the interval at 1 s) which means that either it can be unreasonably busy, or the opposite, quite slow to react. If not looping then an EOF would be immediately reached. Also, current code only reads appended input (not a major problem, I think).

I've excluded files that are initially larger than 32 bytes (arbitrary limit).

doc/unison-manual.tex Outdated Show resolved Hide resolved
doc/unison-manual.tex Outdated Show resolved Hide resolved
doc/unison-manual.tex Outdated Show resolved Hide resolved
Unison would keep reading all the input looking for the stop condition.
an existing regular file larger than a few tens of bytes or a device or
a pipe that produces large quantities of data). In such cases, Unison
would keep reading all the input looking for the stop condition.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not following. What is the stop condition then.

This seems like a one-off ad hoc mechanism. It feels like the beginning of a control socket to be able to do things to a long-running background process that has no UI. There's a long history of that in various programs, and (unix-centric view I know) and the usual solution is some simple protocol on a unix-domain socket. That leads to wanting to write the pid, and figure out naming because one could run lots of copies.

I lean to adding a mechanism and using it vs a non-extendable mechanism. But really if you can send SIGINT twice, or SIGUSR2 once, that might be enough.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not following. What is the stop condition then.

Byte 0x03 or 0x04 in the input (arbitrarily chosen, and 0x03 kind of goes against my previous comment about common idiom).

This seems like a one-off ad hoc mechanism. It feels like the beginning of a control socket to be able to do things to a long-running background process that has no UI. There's a long history of that in various programs, and (unix-centric view I know) and the usual solution is some simple protocol on a unix-domain socket. That leads to wanting to write the pid, and figure out naming because one could run lots of copies.

Yes, that's not a far fetched analogy because that's basically what this is.

I lean to adding a mechanism and using it vs a non-extendable mechanism. But really if you can send SIGINT twice, or SIGUSR2 once, that might be enough.

If it wasn't for Windows...

@tleedjarv
Copy link
Contributor Author

You may have noticed a common theme in my comments. Windows. A solution based on signals is fine but that leaves Windows completely out. Except for SIGINT which Windows knows about... but which is sent to all processes, so not really helpful.

@gdt
Copy link
Collaborator

gdt commented Nov 17, 2022

I guess the $64,000 question is: Are we willing to have unix do things the wrong way because there is no right way on windows. I will think more.

@tleedjarv
Copy link
Contributor Author

I've removed the SIGQUIT method. It really was redundant and only made the code more complex. Now the diff is smaller and easier to review.

In principle I'm also ok with allowing /dev/null as stdin but I didn't add the code just yet because it's going to make the code more complex again. I'm waiting with this until after the concept has been reviewed and agreed upon.

@tleedjarv
Copy link
Contributor Author

tleedjarv commented Feb 14, 2023

Status update.

What is currently implemented (not everything is merged yet).

To interrupt/terminate immediately (already merged)

  • SIGINT tries to terminate the process as quickly as possible, interrupting the ongoing sync but still doing the cleanup (which requires communication with the remote end)
    • SIGINT multiple times will terminate immediately, skipping the cleanup
  • SIGTERM works the same way as SIGINT
  • This works in all modes, interactive and batch, single run and repeat mode

To stop the repeat mode without interrupting ongoing sync (this PR)

Continue with any ongoing sync, just don't start the next repeat loop after this one.

  • SIGUSR2 (not available in Windows)
  • stdin
    • was initially open and is now closed
    • EOF received (in some cases only; regular files and < /dev/null will not trigger this condition, for example)
    • byte 0x4 received

@tleedjarv
Copy link
Contributor Author

All the changes are now pushed in this PR.

@@ -310,9 +310,23 @@ or
.Sy SIGTERM
works the same way as described above and will additionally stop the continuous
process. To stop only the continuous process and let the last synchronization
complete normally, send signal
complete normally, press
.Sy Ctrl-D
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this talking about sending a literal ^D 0x4, which one would do on BSD by ^V^D, or about typing the character coded as EOF to the tty so that the reader gets EOF?

Copy link
Collaborator

@gdt gdt Feb 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that was a question, but maybe I mean "it would be nice if this were clearer, even though I realize being precise is harder to understand for many". Perhaps "Send an EOF by pressing the key configured for that, typically ^D. Or, send a literal 0x4."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will try rewording it. Literal ^D and EOF, either will work

.Sy SIGUSR2
instead.
Closing the input, receiving an EOF, or receiving a
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really do not understand "closing the input". stdin is file descriptor zero and unison is in charge of closing/not-closing that. I think you mean closing the other half of some split pty/fifo special file to cause read(2) in unison to return EOF.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right. There may have been some situation where a close on the other end would not produce EOF on the read end but I can't remember what it was. And thinking about it now, that doesn't seem realistic either. I may have been trying to make the language assume less about the reader knowing about conditions when EOF is generated. I'll see if I can rephrase it (no promises).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to speak precisely about how unison behaves based on return values from read(2) syscall on fd 0, and not try to have it make sense to people that do not understand, when that has any fuzziness.

man/unison.1.in Outdated
.Sy ^D
(0x04) from the input terminal or a redirected standard input all have the same
effect. If the standard input is redirected from a regular file, is not open or
is not open for reading already at Unison startup then it is ignored and only
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"is not open" means presumably that read on fd 0 returns some errno that the fd is not valid. Which is a very odd way to invoke a program. And not open for reading I don't follow, do you mean read returns EPERM?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"is not open" means presumably that read on fd 0 returns some errno that the fd is not valid. Which is a very odd way to invoke a program.

Yes, basically getting EBADF. If it is too much to be included in the manual like this, let me know. It is hard for my to judge myself.

And not open for reading I don't follow, do you mean read returns EPERM?

I don't remember the exact errno(s) anymore, but yes, this is the idea. This is something that some versions of nohup do, for example. https://git.savannah.gnu.org/cgit/coreutils.git/tree/src/nohup.c?id=01755d36e7389aef48f712193976205e7381115b#n117

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So "In repeat mode, unison tries to read from stdin immediately on startup. If that results in EOF, or returns any other error, then stdin is ignored and it is not possible to cause a graceful stop via stdin."

any source providing a lot of input (such as a device or a pipe that produces
large quantities of data). In such cases, Unison would keep reading all the
input looking for the stop condition.
.Ed
Copy link
Collaborator

@gdt gdt Feb 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, but it seems that whatever is provided for stdin should produce essentially nothing until it either sends EOF or 0x4, and anything else is odd. This phrasing hints at usages with moderate amounts of data making sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is somewhat unnecessary, yes. I'll try to rephrase or get rid of it completely. I thought about warning users not to shoot themselves in the foot but the same is true for any stdin redirection, so nothing special here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, just dropping it seems fine.

standard input is redirected from a regular file, is not open or is not open
for reading already at Unison startup then it is ignored and only signals can
be used to send the stop request.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to say really clearly that if stdin is /dev/null, then the stdin checking process is not used, as a guarantee.

| { st_kind = S_REG; _ } -> false
| st -> begin try fdIsReadable Unix.stdin with Unix.Unix_error _ -> true end && notEOF st
| exception Unix.Unix_error _ -> false

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the /dev/null check, that if you get EOF on the very first read, then it isn't used? I didn't figure that out from the manual, and it provides a race condition where termination might be signalled before unison does the first read.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct. Yes, this is the check and yes, there is a race condition. However, I decided to accept this race condition because it is extremely unlikely to occur with "normal" usage. I don't know of another way of checking specifically for /dev/null and distinguishing it from something else that just happens to already return EOF.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That may or may not be ok, but the documentation did not explain the behavior or caution, and did not hint at a race.

not interpreted as a stop request when standard input is redirected from a
regular file. If the standard input is not open or is not open for reading
already at Unison startup then it is ignored and only signals can be used to
send the stop request.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to have some idea that the regular file will be written, and that unison will be doing rewind and seeing the content again.

And, if there is an empty/32byte rule, it should be in the manual. Perhaps the manual should say that it must be a 0-byte file and stopping must write a single 0x4, if that's how it is, or maybe 0x4\n is ok too. The specification of what to do should be narrow. We also need to avoid UTF-8/encoding issues.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to have some idea that the regular file will be written, and that unison will be doing rewind and seeing the content again.

Need to improve the language. That's actually not what will happen; there will be no rewind, only appended content is read. (In fact, I thought about rewinding but it seems too much...)

And, if there is an empty/32byte rule, it should be in the manual. Perhaps the manual should say that it must be a 0-byte file and stopping must write a single 0x4, if that's how it is, or maybe 0x4\n is ok too. The specification of what to do should be narrow. We also need to avoid UTF-8/encoding issues.

Can add the 0-byte rule. Currently (can be changed, of course), only 0x4 is checked but any other input before and after that byte is ignored. So, 0x4\n will work but so will \n0x4. Encodings will cause any issues here because it's just the raw byte 0x4 that is checked.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So read(2) will return EOF, and then when a byte is written a further read(2) call with return that byte? Is that guaranteed?

@gdt
Copy link
Collaborator

gdt commented Feb 17, 2023

Now that this has settled I looked through in more detail.

I'm totally ok with merging the USR2 stuff as I see no downsides.

I think the stdin stuff is ok, but I would like to see tighter interface specs to avoid limiting us later, and more clarity about the rules; I find "stdin is open" to be confusing.

There is also some plan presumably (didn't read the code super carefully) to recheck a regular file via seek/read every N seconds. I hope this does not update atime, and in general I'm a little worried about the process doing things and using cpu and fs ops all the time when unison should be more or less quiescent. All of this makes me wonder about a command-line arg (not a preference) to enable the stdin checking behavior.

@tleedjarv
Copy link
Contributor Author

tleedjarv commented Feb 17, 2023

Now that this has settled I looked through in more detail.

I'm totally ok with merging the USR2 stuff as I see no downsides.

Ok, I may very well split the PR to get the foundations out of the way.

I think the stdin stuff is ok, but I would like to see tighter interface specs to avoid limiting us later, and more clarity about the rules; I find "stdin is open" to be confusing.

Ok, let's work on this. I don't quite remember the origins of "stdin is open" (is this possible on POSIX at all?). It may be a Windows thing where it is possible not to have stdin at all but that only applies to GUI applications, which is clearly not the case here. If we conclude that stdin is always open (although, as you can see with nohup, it does not have to be readable) then we can skip all that language.

There is also some plan presumably (didn't read the code super carefully) to recheck a regular file via seek/read every N seconds. I hope this does not update atime, and in general I'm a little worried about the process doing things and using cpu and fs ops all the time when unison should be more or less quiescent. All of this makes me wonder about a command-line arg (not a preference) to enable the stdin checking behavior.

It's not as bad. select(2) is used, so no extra cpu or fs ops are consumed constantly/regularly. I specifically made sure that the process wouldn't do anything extra. If it does then it's a bug. I don't know about updating the atime but if you have redirected a file to stdin then you have requested yourself for that file to be read, so I'd say that's not a worry.

Edit: select(2) doesn't work on regular files -- of course... -- so there is a loop with 1 second interval to read from stdin if it's a regular file. That loop is active only during the wait between sync cycles; but this wait can be several hours long.

@gdt
Copy link
Collaborator

gdt commented Feb 17, 2023

Please do split USR2 and we can merge that right away and maybe get some user testing.

It strikes me that the stdin mechanism is ending up a bit awkward. I see the reason for each step but it is still complicated.

I wonder if there is a way (that works on windows) to instead of the stdin mechanism, do:

unison <profile> -repeat -graceful-id 57 &
[long time later]
unison -graceful-stop 57

and have that under the covers (but the above is the interface spec) open a fifo (or regular file if it has to be on windows) in $UNISONHOME.

It doesn't have the "fg^D" simplicity, but it is very scriptable and straighforward. It avoids the race condition (assuming the 2nd command waits until the fifo appears, or gives up after a minute).

Lwt_unix does not implement all functions on Win32. Make it possible to
differentiate the Lwt_unix implementation at run time to avoid calling
unimplemented functions. Previously this differentiation was done by
probing the compiler OS platform (Win32? with Cygwin?) but this is not
reliable because it does not indicate the actual Lwt_unix implementation
that's linked in.
Add a new stop request method to terminate repeat mode. See commit
e72c1c5.

Monitor stdin to read any of the several stop conditions. Using stdin
for checking the repeat mode stop request opens up a wide range of
possibilities for the user, including manual and automated, local and
remote.

This method is usable only if stdin is an interactive terminal or is
redirected from a readable source that is not a regular file.
Add a new stop request method to terminate repeat mode. See commit
e72c1c5.

Enable the method from commit a1bc7be
to work in Windows.
Improve the method from commit a1bc7be
to allow Unison run as a background job in an interactive terminal.
Reading from stdin while not a foreground process will normally result
in the process being stopped (or receiving EIO if ignoring the stop
signal).

Do not attempt to read from stdin while not in foreground. Guard the
process against being stopped when trying to read from stdin at the
wrong moment.
Add a new stop request method to terminate repeat mode. See commit
e72c1c5.

Enable the method from commit a1bc7be
to work with stdin redirected from a regular file.

Regular files differ from other sources:

 - Since a regular file is always ready for reading, limit the reads to
   once a second.

 - Reaching EOF in a regular file is not considered a stop request. This
   enables the process to be started with input redirected from an empty
   file and then waiting for data to be written into the file.

 - To prevent reading massive amounts of data, only enable the regular
   file method if the file is initially empty or at most 32 bytes in
   size.
@tleedjarv tleedjarv changed the title Graceful stop in repeat mode Graceful stop in repeat mode - part 2 Feb 18, 2023
@tleedjarv tleedjarv marked this pull request as draft March 19, 2023 15:56
@gdt gdt added the DRAFT PR is not ready to merge label Apr 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
DRAFT PR is not ready to merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make SIGINT be more of a "graceful shutdown" than it is now
2 participants