
Detect and propagate renames/moves instead of delete+create #576

Draft: wants to merge 3 commits into master
Conversation

@tleedjarv (Contributor) commented Aug 17, 2021

Motivation

Currently, all renames are detected and propagated as two separate actions: a delete on the old path and a create on the new path. For small files this is usually not an issue, but for big files it may require transferring gigabytes of data, all for a single rename or move. The local copy shortcut may save the user a network transfer, but copying locally is still less than optimal.

This patch detects moves/renames and tries to propagate them as a move/rename in the other replica, with no data copied. It may seem like a gimmick--and perhaps it is, given its limitations--but in theory it can be a huge performance boost. It can be turned on or off by user preference.

Closes #23

Description

Important! This was initially a breaking change; it has since been rebased and made backwards compatible. I was hoping that making this PR available early would give it a better chance of getting reviews and testing.

I considered implementing move detection in any of the three phases: update detection, reconciliation, and propagation.

Doing move detection at propagation is not a good choice: it would be confusing to users (they would still see a separate delete and create), it would allow the user to change the propagation of the deletion and the creation separately, and it would require processing all transfer instructions to detect moves.

Doing move detection during update detection seems reasonable--I even have a working proof of concept--but becomes complex with reconciliation conflicts, and would still require post-processing of reconciliation results.

With the inevitable post-processing requirement, I ended up implementing move detection during reconciliation. Reconciliation will handle all the conflicts first and only then, if still possible, will the moves be injected into reconciliation results. The moves are injected as one new reconItem replacing a combination of a delete and a create reconItem.

At propagation, these move-reconItems will be propagated as a single rename, if possible. If move/rename propagation is not possible then the transport falls back to propagating the create and the delete separately, exactly as they would be propagated without this patch.
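As a rough illustration of the reconciliation post-processing described above, here is a minimal sketch in Python (hypothetical function and field names; Unison itself is written in OCaml): a delete and a create whose content fingerprints match are collapsed into a single move item.

```python
# Hypothetical sketch, not Unison's actual code: pair delete and create
# reconcile items that carry identical content fingerprints into move items.
def inject_moves(recon_items):
    # Index deletes by fingerprint so creates can find a matching source.
    deletes = {}
    for item in recon_items:
        if item["action"] == "delete":
            deletes.setdefault(item["fingerprint"], []).append(item)

    consumed = set()
    result = []
    for item in recon_items:
        if item["action"] == "create" and deletes.get(item["fingerprint"]):
            old = deletes[item["fingerprint"]].pop()
            consumed.add(id(old))
            # One move item replaces the delete+create pair.
            result.append({"action": "move",
                           "from": old["path"], "to": item["path"],
                           "fingerprint": item["fingerprint"]})
        else:
            result.append(item)
    # Drop the delete items that were absorbed into moves.
    return [i for i in result if id(i) not in consumed]
```

In the real patch this pairing happens only after all conflicts have been handled, so only unconflicted delete/create pairs are eligible.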

More details are in the code comments.

My claim is that this patch does not change the properties of Unison's formal specification. Without an actual proof, my reasoning is based on these two facts: i) a "move" is a transport-level optimization akin to the local copy shortcut--even if it is visible to the user during interaction; and ii) if a "move" can't be propagated as just a rename then it falls back to normal propagation of a create and a delete.

The first commit does change the prevState type but this does not change any of the semantics, it just provides more data in the updateItem (and as a bonus, perhaps counter-intuitively, reduces memory consumption).

Limitations

There are quite severe limitations when detecting what can be a move/rename. The contents must be exactly the same. For files, this means file contents. For directories, this means the names of all children and their contents are the same, recursively. In practice, this is not a limitation when doing a simple rename. But when doing a rename and then changing even one byte in a file, or renaming children in a directory or adding a file in the renamed directory, or creating a new directory and moving files into it, then it becomes impossible to detect and/or propagate such changes as a rename.
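To illustrate the exact-contents requirement, here is a small sketch (an assumed helper, not Unison's actual code) of a recursive fingerprint where a directory's identity covers the names and fingerprints of all its children; touching one byte anywhere below a directory changes its fingerprint and so defeats move detection for it.

```python
# Illustrative sketch of the "contents must be exactly the same" rule.
# A directory's fingerprint hashes the names and fingerprints of all
# children, recursively; a file's fingerprint hashes its contents.
import hashlib
import os

def fingerprint(path):
    if os.path.isdir(path):
        h = hashlib.sha1()
        for name in sorted(os.listdir(path)):
            h.update(name.encode())
            h.update(fingerprint(os.path.join(path, name)).encode())
        return h.hexdigest()
    with open(path, "rb") as f:
        return hashlib.sha1(f.read()).hexdigest()
```

Two directory trees match only if every child name and every byte of every file match, which is why a rename combined with any further edit falls outside what this patch can detect as a move.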

These limitations are most likely not going to be a hindrance when using watch mode with fsmonitor. There, each change is propagated almost immediately, so a rename can be propagated before the user makes further changes.

What it looks like

(It's basically the same in the GTK GUI)

Some examples:
[screenshots]

Some examples of conflicts:
[screenshots]

For comparison, how it looked before:
[screenshot]

@acolomb commented Aug 17, 2021

I'm a bit skeptical whether I would like renames to be displayed during reconciliation. As you mentioned, with a delete and a create entry, the user can choose to override each of them individually. I would like to keep that available. I sometimes even deliberately use it to revert some change and end up with two copies of the same file.

Without having considered all the details, my preference would be to optimize during propagation. We carry out what the user chooses, and if that happens to allow renaming a file instead of trashing the one they chose to discard (under its old name), the shortcut can be taken. But keep in mind that such a rename might cross file system boundaries so would have to be broken up again. Or checked for same file system in advance, which is probably hard to do in a portable way.

Also please make sure this won't affect how diligently unison tries to keep a consistent state and never loses data. There is probably a good reason why deletes are started only after all creates are carried out? What happens when the source file changes between re-checking its contents and the actual rename? That's what a temporary copy mostly protects you from.

@tleedjarv (Contributor, Author)

I'm a bit skeptical whether I would like renames to be displayed during reconciliation. As you mentioned, with a delete and a create entry, the user can choose to override each of them individually. I would like to keep that available. I sometimes even deliberately use it to revert some change and end up with two copies of the same file.

That's a fair point. I was thinking even for myself that I'd like that option to be available. The user could skip syncing those items and then do another run with the moves preference set to false. Of course, this would not do any renames during propagation either.

The good news is that with this patch, "injecting" the detected moves into transport items as an optimization even after user interaction is not going to be very difficult. So that is something worth considering.

But keep in mind that such a rename might cross file system boundaries so would have to be broken up again. Or checked for same file system in advance, which is probably hard to do in a portable way.

I currently catch EXDEV (have not tested on Windows) and then fall back to copy+delete.
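That fallback can be sketched like this (a minimal illustration in Python, not the actual OCaml implementation): a rename is attempted first, and if it crosses a file system boundary the OS raises EXDEV, at which point we fall back to copy+delete.

```python
# Minimal sketch of the EXDEV fallback described above (hypothetical
# helper, not Unison's code).
import errno
import os
import shutil

def move_with_fallback(src, dst):
    try:
        os.rename(src, dst)
    except OSError as e:
        if e.errno != errno.EXDEV:
            raise
        shutil.copy2(src, dst)  # copy across the file system boundary...
        os.remove(src)          # ...then delete the original
```

On POSIX systems `rename(2)` fails with EXDEV when source and destination are on different file systems; Windows reports a different error code for the same situation, which is presumably why that path needs separate testing.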

Also please make sure this won't affect how diligently unison tries to keep a consistent state and never loses data.

All those guarantees are intact (save for any bugs), including backup copies that are made.

There is probably a good reason why deletes are started only after all creates are carried out? What happens when the source file changes between re-checking its contents and the actual rename? That's what a temporary copy mostly protects you from.

None of those protections are broken or changed (save for any bugs). Actually, they are even made somewhat stronger.

The way it currently happens is that for non-deletions, many propagation threads are started simultaneously, and they all have to complete (success or failure does not matter, they just have to complete). After that, many propagation threads are again started simultaneously, but this time to propagate all the deletes.

If rename propagation is not possible then it does propagate a delete as part of that first batch of threads, before the final run of all deletes at the same time. But the crucial part here is that this delete is propagated if and only if the related copy propagation succeeded. So this is actually a stronger protection compared to the current non-rename propagation (which will propagate the delete even if the copy failed).
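A sketch of that two-batch ordering (hypothetical names, not Unison's actual code; the move branch shows the copy+delete fallback path): the delete belonging to a move runs in the first batch, and only if its copy succeeded, while plain deletes run only in the second batch.

```python
# Sketch of the two-batch propagation ordering described above.
# do_copy and do_delete stand in for the real propagation primitives.
def propagate(items, do_copy, do_delete):
    # First batch: all non-delete propagations, including move fallbacks.
    for item in (i for i in items if i["action"] != "delete"):
        if item["action"] == "move":
            try:
                ok = do_copy(item["from"], item["to"])
            except Exception:
                ok = False
            if ok:
                # The old path is deleted if and only if the copy succeeded.
                do_delete(item["from"])
        else:
            do_copy(None, item["to"])  # create/update from the other replica
    # Second batch: plain deletes, started only after the first batch is done.
    for item in (i for i in items if i["action"] == "delete"):
        do_delete(item["path"])
```

This is why the fallback is strictly safer than the ordinary delete+create path: an ordinary delete in the second batch runs regardless of whether its counterpart create succeeded.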

As you mentioned, while propagating a copy (this includes new files/dirs, updates), it is first created under a temporary name and then renamed into place with its final name. This is exactly the same mechanism as how the rename propagation works (in fact, it even uses the same function), except that it renames directly from the source, not from the temporary name. There is an unavoidable race condition between re-checking the contents and doing the actual rename. This is inherent and does not change with this patch. Not considering the race condition, if the source file has changed before the rename could be carried out, then the propagation fails and no changes are made.

I have been very careful in maintaining (or even improving) the correctness and safety, and I do expect reviews and testing from anyone interested in this feature.

@acolomb commented Sep 8, 2021

As you mentioned, while propagating a copy (this includes new files/dirs, updates), it is first created under a temporary name and then renamed into place with its final name. This is exactly the same mechanism as how the rename propagation works (in fact, it even uses the same function), except that it renames directly from the source, not from the temporary name. There is an unavoidable race condition between re-checking the contents and doing the actual rename. This is inherent and does not change with this patch. Not considering the race condition, if the source file has changed before the rename could be carried out, then the propagation fails and no changes are made.

The thing that gives us the highest probability of renaming a file which has not been modified since hashing is its temporary name, IMHO. By using the .unison. prefix, only really nasty other software would end up touching the file. Whereas I can imagine that some software might access the file under its original name, e.g. because it is configured to run something at regular intervals, or triggered by some other file changing, etc. That increases the risk of the content going awry, but it could be caught by hashing the target again after the rename operation?

@tleedjarv (Contributor, Author)

You are right, that risk exists but it's not unique to nor introduced by this PR. When doing a copy+delete, there are two such moments. First, when doing the rename from temp name to the final target name (the final target could change/be created and the rename will just overwrite it); second, when deleting the old file (the file could change and still be deleted). The risk exists but it should not be any more probable with this PR than what it is already now. Or perhaps I've overlooked something?

@gdt (Collaborator) commented Sep 12, 2021

My reaction to an earlier comment was that it pointed out that unison does not perform atomic transactions, and that the underlying file systems don't support transactions. Still, it is a fair comment that the risk shouldn't be made much higher.

@acolomb commented Sep 12, 2021

Yes, there is the risk that the target file was modified before unison had a chance to replace it with the temporary file. That would lose this late, unexpected modification on the target file. But the temporary file is relatively safe from unnoticed modification, so there is a stronger probability that at least the state which unison regards as the final target state is reached. Same goes for deletion, where the intended state from unison's view is the file being gone, with only a very slim chance of deleting unnoticed intermediate modifications.

So my point was about the "source thing" which unison will move to the target file name having more or less risk of being different from the intended state. I agree with you that the other risks (overwriting or deleting last minute modifications on the target) are not affected and cannot be completely eliminated on a wide range of file system types.

@acolomb commented Sep 12, 2021

Just a quick idea for how to keep the risk as low as possible. When doing your rename shortcut:

  1. Verify the source (as is done always before deleting).
  2. Rename it to the temporary name (giving the "protection" of unison's prefix namespace).
  3. Verify it again under the temporary name (should be quick if unchanged because the file was just read and buffered).
  4. Verify the target has not changed (as is done always).
  5. Unlink the target path.
  6. Rename temporary to target path.

Compared to making a temporary copy, this is still more efficient on I/O, especially if inode numbers can be used for checking. One corner case to consider is what happens when a temporary file already exists under that name. And of course if the second rename fails, you'd lose both files. But at least the temporary file would stay there to speed up a second attempt which then consists of only the creation part.

What do you think?

@gdt gdt changed the title Detect and propagate renames/moves instead of delete+create BREAK: Detect and propagate renames/moves instead of delete+create Nov 20, 2021
@gdt gdt added the BREAK (PR) Wire or Archive protocol break label Dec 23, 2021
@tleedjarv tleedjarv marked this pull request as draft January 30, 2022 13:55
@gdt gdt changed the title BREAK: Detect and propagate renames/moves instead of delete+create RFC: Detect and propagate renames/moves instead of delete+create Feb 17, 2022
@gdt gdt changed the title RFC: Detect and propagate renames/moves instead of delete+create Detect and propagate renames/moves instead of delete+create Feb 17, 2022
@gdt gdt added the DRAFT PR is not ready to merge label Feb 17, 2022
@tleedjarv (Contributor, Author) commented Apr 28, 2023

With 2.53.3 released, I'm going to work on rebasing this PR and making it backwards compatible.

Let's continue the discussion, @acolomb, if you're interested.

  1. Verify the source (as is done always before deleting).
  2. Rename it to the temporary name (giving the "protection" of unison's prefix namespace).
  3. Verify it again under the temporary name (should be quick if unchanged because the file was just read and buffered).
  4. Verify the target has not changed (as is done always).
  5. Unlink the target path.
  6. Rename temporary to target path.

If I understand correctly, the goal of this procedure is to guarantee the result of step 1 until step 6 is executed. I want to go through it to understand the reasoning.

First, to get the obvious out of the way, the inherent race moments are between steps 1 and 2 and between steps 4 and 5, and even between steps 5 and 6.

One immediate problem with this procedure is that it creates a need for a real rollback and this is a big red flag in my view. Unison currently does not require a rollback in any situation. It basically operates a 2-phase commit with "rollback" meaning that the garbage is just left there to be resumed or collected later; the real target is unaffected. With the proposed procedure, any of the steps 3, 4, 5 failing would require step 2 to be reverted. This is a complete non-starter for me. The revert itself may fail for any number of reasons, not least that there may now be a new file/dir under the original name.

What problem is this procedure supposed to solve? Let's walk through a few scenarios, assuming no temporary rename as proposed above. So, think of the procedure with steps 1, 4, 5, 6 only.

A. Plain file renamed to a new name; nothing previously existed under that name in the target replica.

The target does not exist, so verification is as quick as checking that it still doesn't exist. Even if the source has changed due to the inherent race moments (remember, we're unaware of this change), that would not be a problem. This would simply be picked up as normal file update at the next sync, just this time under the new name. Even if this is not what the user wanted, no data has been lost.

With the procedure proposed above, data could be lost, because due to step 2 we no longer have it under the original name and in case we need to roll back, we may not be able to do so.

B. Plain file renamed to a new name; something else was created under that name in the target replica after update detection.

The target verification is still as quick as checking that a file doesn't exist. This time it fails, and that's the end of the story: no changes were made on disk. With the procedure proposed above, there would have been a change on disk which needs to be rolled back and could fail, just like in scenario A.

C. Plain file renamed to a new name; something else already exists under that name in the target replica at update detection time.

In this case the user has knowingly requested the "something else" under the new name to be overwritten. This time, the target verification (step 4) may take a very long time, so the risk of the source changing increases considerably. However, if either step 4 or 5 fails, nothing has changed on disk and the whole procedure can just stop. If steps 4 and 5 don't fail but the source has changed (we're unaware of this), we're back to the reasoning of scenario A, because the user has explicitly requested the target to be overwritten.

There are many, many more scenarios but I'm going to stop here for the moment.

Let's imagine the proposed procedure was changed slightly:

  1. Verify the source (as is done always before deleting).
  2. Verify the target has not changed (as is done always).
  3. Verify the source again (may be quick if unchanged because the file was just read and buffered).
  4. Unlink the target path.
  5. Rename source to target path.

The main improvement over the first proposal is that it does not require any rollback. The weakness is that it increases the risk of the source changing during step 2. But that weakness is not really a weakness, as we saw with the scenarios above. Another weakness is that it increases the risk of the target changing during step 3. This is a real weakness that can't be ignored. And we're back to square one: if step 3 is there only due to a pseudo-weakness but is itself a major weakness, then we just eliminate it. Now we have:

  1. Verify the source (as is done always before deleting).
  2. Verify the target has not changed (as is done always).
  3. Unlink the target path.
  4. Rename source to target path.
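That final four-step procedure can be sketched as follows (hypothetical verify callbacks; note that if any step fails before the final rename, the source is still intact under its original name, so no rollback is needed):

```python
# Sketch of the four-step rename propagation above (hypothetical helpers,
# not Unison's code): verify both ends, then unlink the target and rename
# the source into place. Nothing on disk changes before step 3.
import os

def propagate_rename(src, dst, verify_source, verify_target):
    if not verify_source(src):    # step 1: source still as expected
        raise RuntimeError("source changed; aborting, disk untouched")
    if not verify_target(dst):    # step 2: target still as expected
        raise RuntimeError("target changed; aborting, disk untouched")
    if os.path.lexists(dst):      # step 3: unlink the old target, if any
        os.remove(dst)
    os.rename(src, dst)           # step 4: rename source into place
```

The inherent race windows (between verification and unlink, and between unlink and rename) remain, but a failure at any point leaves either the original state or at worst a missing target that the next run repairs by re-transfer.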

Edit: fixed typos.

tleedjarv and others added 3 commits May 3, 2023 10:25
The primary intention of this change is to have more information available about the previous archive in the updateItem. As a bonus, this means fewer allocations and lower memory consumption. This gain is unfortunately lost in network transfer.

During network transfer, 0-8 bytes are added for each file change, and up to [file name length + ArchiveFile length (in the range of roughly 57 to 137 bytes, but can be considerably more)] bytes are added for each child, recursively, for a directory deletion or for any change to any child within the directory. When received over a network, in memory this data structure requires for each child [name length + 202 to 282, possibly considerably more] bytes. For example, deleting a largish directory of 100 000 files (regardless of depth of hierarchy) may easily add 10 MB to the network transfer and require an additional 25-35 MB of memory on the receiving end. (Note that these sizes are not fixed, are counted _very_ roughly here, and can change from version to version.)
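For a back-of-the-envelope check of the example above, using the per-child figures from this commit message (the 15-byte average file name length is an assumption for illustration):

```python
# Rough size estimate for deleting a directory of 100,000 files, using the
# per-child byte ranges quoted above. The average name length is assumed.
children = 100_000
avg_name = 15

transfer_low  = children * (avg_name + 57)   # bytes added to the transfer
transfer_high = children * (avg_name + 137)
memory_low    = children * (avg_name + 202)  # bytes needed on the receiver
memory_high   = children * (avg_name + 282)

print(f"transfer: {transfer_low / 1e6:.1f}-{transfer_high / 1e6:.1f} MB")
print(f"memory:   {memory_low / 1e6:.1f}-{memory_high / 1e6:.1f} MB")
```

With these assumptions the transfer overhead lands at roughly 7-15 MB and the receiver-side memory at roughly 22-30 MB, consistent with the rounded figures quoted above.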
A rename or a move is currently seen as a delete on the old path and a create on the new path. This means that propagating what was, from the user's perspective, a simple rename can require copying gigabytes of data for big files and directories (and this is the best-case scenario, when the local copy shortcut is used instead of transmitting those gigabytes over a network).

Add new functionality that enables detecting renames and moves, and
propagating them without copying any data. If this is not possible (due
to conflicts, errors or user actions) then it falls back to copying, as
before. Add a new user preference to control this (defaults to "off").

This is just a shortcut (akin to copying locally instead of transmitting
over a network).

Renames/moves are detected for files and directories, and only if the
contents have not changed (for directories, "contents" means the names
and contents of all its children, recursively).
@tleedjarv (Contributor, Author)

I've finished the rebase and made the change backwards compatible (it won't break earlier versions, that is; the functionality itself requires the new version on both machines). I haven't done too much testing this time around, but I added a test suite with some primitive tests.

While doing the rebase I discovered that the way props were propagated was not optimal (one could perhaps even argue it was incorrect). I tried improving it... and went down a rabbit hole. For now, when moving a directory, props are set only on the moved directory itself and not on any of the children. If there were any props changes (in any replica) that were not propagated together with the move then these will show up at the next sync. This does not seem ideal and it requires a discussion about how to go about this.

Another issue I discovered is with the first commit (changing the prevState type). While it is true that it reduces allocations and memory usage, that is true only on the local machine. Transfer this data over the network and suddenly you've increased the data size both in transfer and in memory, possibly considerably so. This is in theory; I haven't done any practical measurements yet. This overhead may end up being too much and require complete rethinking.

The last thing missing (and maybe not needed) is an addition to the user manual, explaining the semantics and caveats of this functionality.

@gdt gdt removed the BREAK (PR) Wire or Archive protocol break label May 4, 2023
@acolomb commented May 9, 2023

One immediate problem with this procedure is that it creates a need for a real rollback and this is a big red flag in my view. Unison currently does not require a rollback in any situation. It basically operates a 2-phase commit with "rollback" meaning that the garbage is just left there to be resumed or collected later; the real target is unaffected. With the proposed procedure, any of the steps 3, 4, 5 failing would require step 2 to be reverted.

Just picking up this statement, for lack of a complete picture about the situation yet... I'm not sure we would actually need this rollback function. If step 2 (mv src -> tmp) was carried out, we do still have the source file contents available for picking up during later runs. And the deletion of the old path (part of the user-requested action) did in fact complete. We just don't guarantee atomicity in a sense that the deletion is carried out without its matching creation. But that's no different to the current behavior, if the creation part fails for some reason. A re-run will fix it, picking up and using the leftover "garbage" (tmp file) from the previous run.

Isn't the "rollback" / corrective action for a failed rename always to just transfer it again from the origin replica, under the desired final path? I wouldn't consider a failed rename to mean lost data, as long as the database correctly marks the file so that it doesn't get picked up as another deletion on the next run.

@fuzzy7k commented Dec 10, 2023

Hi, I love what you are doing here. This is a much-needed performance improvement. I am just reading the limitations and wondering why it would be impossible to propagate a renamed folder with renamed or updated contents.

But when doing a rename and then changing even one byte in a file, or renaming children in a directory or adding a file in the renamed directory, or creating a new directory and moving files into it, then it becomes impossible to detect and/or propagate such changes as a rename.

It would seem to me that if a folder has the same inode or reference number, then it would be safe to treat that as a rename, followed by an update.

Impossible seems like an extreme word to use. Though, I don't doubt that you have a valid reason for stating that. I'm just trying to understand this limitation. I will soon be checking out your source to test this out.

@tleedjarv (Contributor, Author)

Impossible seems like an extreme word to use. Though, I don't doubt that you have a valid reason for stating that. I'm just trying to understand this limitation.

This was based on some design assumptions that I have had to revise since. I haven't yet changed the code to detect more scenarios as renames/moves, but it should be possible in the future.

As a general remark - to explain why sometimes the simplest solutions are not sufficient - the design is restricted by having to be rather generic. We need to support a single replica spanning multiple file systems, different file system types and different platforms, not all of which provide usable or consistent inode numbers, for example. Even if you never sync with Windows, maybe you do sync a FAT-formatted USB drive.

Labels: DRAFT (PR is not ready to merge)

Successfully merging this pull request may close these issues: Add mechanisms to replace "copy+delete" with "move", for efficiency

4 participants