When moving a file, sync on two local replicas does a copy from replica1 rather than from the file in replica2 #472
Comments
I think this is not about ssh but about some notion of local and remote. But if you can find out details from reading the code, please update the issue. |
I may not have been clear enough about what I meant. Continuous sync does work on 2 local replicas. It's just not very efficient because it will delete moved files first and then copy them again. I mention SSH because it works as it should when using SSH. That is, it moves the files in the destination directly, without any deleting/copying. This makes a huge difference when backing up large files/directories, as well as reducing wear on the disks. I'm not sure the current title reflects what I'm trying to express. |
Also I'm not sure it should be tagged What I'm basically asking for is for the command:
(which works, but is just not "smart" about file moves), to work identically to the command:
which is smart about file moves. The big benefit, then, is that users are no longer required to run sshd for no reason at all. |
@davidde Can you write up a repeatable test case that demonstrates the issue (is it just that a simple |
I adjusted the title; I had a hard time understanding your original report, and I understand better now. The requested test case will help a lot for clarity. Also, you are saying "ssh", but there is local, there is unison over a socket, and there is unison over some program that executes remote commands. Presumably rsh would be the same, and we don't know about the socket case. Narrowing down exactly where the behavior happens is helpful for finding the right place in the code. My guess is that it's going to take 20h total to find out exactly what's wrong, read the code, prepare a patch, test it, and review it. I tend to estimate high when there are a lot of unknowns. In any case, when somebody prepares a PR, it will get looked at, and if not, this won't get fixed, so the label doesn't have any real effect. |
Also, using "smart" to describe moves is not very precise. I understand now that what you mean is that if in one replica there has been "mv A B" then in the other replica in one case unison just executes the rename(2) system call, and in the other it does something different, which is unlink(2) on A and either having copied A to B in that replica or sending B from the first replica. This is different in a practical sense mostly if either replica is on a remote filesystem, but that's important because an sshfs mount can work around not being able to build unison. |
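To make the distinction above concrete, here is a minimal sketch in plain shell. It does not use Unison and is not taken from its code. It assumes that `mv` within a single directory is carried out as a rename, and it uses a hypothetical helper `inode_of` plus a temporary directory. A rename keeps the inode, whereas copying to the new name and unlinking the old one produces a new file.

```shell
#!/bin/sh
# Sketch only: contrasts a rename with copy-then-unlink by comparing inodes.
# Nothing here reflects how Unison itself is implemented.
set -e
workdir=$(mktemp -d)

inode_of() {
    # Hypothetical helper: print the inode number of a path (ls -i is POSIX).
    ls -i "$1" | awk '{print $1}'
}

# Case 1: rename. The data blocks stay where they are; only the name changes.
echo fileB > "$workdir/A"
inode_before=$(inode_of "$workdir/A")
mv "$workdir/A" "$workdir/B"
inode_after_rename=$(inode_of "$workdir/B")

# Case 2: copy to the new name, then unlink the old name.
# The data is read and written again, so the target is a different file.
echo fileB > "$workdir/C"
cp "$workdir/C" "$workdir/D"
inode_copy_source=$(inode_of "$workdir/C")
inode_copy_target=$(inode_of "$workdir/D")
rm "$workdir/C"

echo "rename: $inode_before -> $inode_after_rename"
echo "copy:   $inode_copy_source -> $inode_copy_target"
rm -rf "$workdir"
```

For a large file the second case costs a full read and write of the data, which is the overhead the report is concerned about.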
Correct. That's exactly what I mean. So for a test case:
# Create replica 1:
mkdir -p replica1/dir1
echo fileA > replica1/fileA
echo fileB > replica1/fileB
echo fileC > replica1/dir1/fileC
# Sync with unison to replica 2:
unison -auto replica1 replica2
# Move file in replica 1:
mv replica1/fileB replica1/dir1/fileB
# Sync again without ssh, optionally with debug:
unison -auto replica1 replica2 -debug copy
# This will delete and copy:
# replica1 replica2
# new file ----> dir1/fileB
# deleted ----> fileB
Doing the same with the ssh syntax will move the file in replica2. I believe Let me know if there's anything else I can do to help. |
You're totally right that my original report was not very precise. My apologies for the confusion. Thanks for the clarification about the tagging. I was personally just worried that the |
Thank you for the test case. I have verified that Could you paste the output of your sync run with ssh? |
Sure. Output of
So if I understand correctly, you're saying there is no benefit in using SSH for local files? |
For completeness' sake, the output of the first
|
So it looks like it used the remote file to copy based on finding it somehow (sha1 match?), saving the trip across the network connection. Given all that, I don't see that you are getting anything from using ssh. Am I missing something? |
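The guess above (a file with matching content is found in the same replica and used as the copy source) can be sketched as follows. This is an illustration under stated assumptions, not Unison's actual mechanism: the thread only speculates about a digest match, `cksum` stands in for whatever digest is really used, and `digest_of` and the directory layout are hypothetical.

```shell
#!/bin/sh
# Sketch only: look for a file already in the destination replica whose
# content matches a "new" file in the source replica, and copy from it.
set -e
work=$(mktemp -d)
mkdir -p "$work/replica1/dir1" "$work/replica2/dir1"
echo fileB > "$work/replica1/dir1/fileB"   # new path after the move
echo fileA > "$work/replica2/fileA"
echo fileB > "$work/replica2/fileB"        # old path, not yet deleted

digest_of() {
    # Hypothetical helper: cksum on stdin prints "<crc> <size>";
    # the pair is a stand-in for a stronger digest.
    cksum < "$1" | awk '{print $1 "-" $2}'
}

wanted=$(digest_of "$work/replica1/dir1/fileB")
local_source=""
for candidate in "$work"/replica2/*; do
    [ -f "$candidate" ] || continue
    if [ "$(digest_of "$candidate")" = "$wanted" ]; then
        local_source=$candidate
        break
    fi
done

# If a match exists in the same replica, copy from it instead of
# transferring the content from the other replica.
if [ -n "$local_source" ]; then
    cp "$local_source" "$work/replica2/dir1/fileB"
fi
copied_content=$(cat "$work/replica2/dir1/fileB")
echo "copied from: $local_source"
rm -rf "$work"
```

Note that this still copies the data inside replica2; it only avoids the transfer between replicas, which matters over a network but not necessarily between two local disks.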
It seems to work exactly like this. As you can see from the debug output, |
@gdt I don't think the output is very readable, but indeed, after @tleedjarv's findings, it doesn't look like there is any benefit in using ssh. Note this was a tip on a user forum when I was searching for how to prevent excessive copying. I have not tested it more elaborately on a larger sync, because I wasn't sure I was going to use it in this way if I'm forced to keep sshd running. I guess that dilemma is off the table now ;) The @tleedjarv Thank you for verifying this! Thank you both for the prompt reply and clarifications. |
Wait, wait, people. I just noticed that in the first (non ssh) output, I guess it would still be best to do a more elaborate test with larger files, and see if there is any difference in execution time. @gdt, @tleedjarv What do you think? |
I think unison is already complicated and that trying to optimize runtime for moved files by trying to guess which source is faster (replica1, replica2) when both are local, is not a wise use of developer time, to be blunt. There are many more problems to solve that will make a far bigger difference to users, like the pinned issues -- but obviously that's my opinion because I pinned them. An optimization for an actual move (rename) might be good. But it's pretty low on my personal list sorted by value/effort. Don't let that stop you then - learn OCaml and start hacking! |
I'm in favor of closing this, as I don't see anything to fix, absent a true rename(2) optimization which is something different. It was a useful exercise, as I understand what happens better now. |
I understand the sentiment, but if the above is true, then there is hardly any need for guessing. Unison could simply prefer the files that are already available in the same replica over the files that should be copied over from the other replica. And since this behavior is already the default for unison over ssh, it may be straightforward to just port it over to the local unison variant. I feel this could bring a huge usability improvement, since external storage backups seem to me an ideal use case for unison. Currently, every backup without SSH makes needless copies after moving files, complicating this use case (e.g. media center backup). This may fix that! Could you keep the issue open until next week? I'll try to verify whether the above is the case with a more elaborate test over the weekend. |
I really don't understand. First, I don't do "mv" of huge files very often. Second, if you mount an external disk on /mnt and then use unison, and you did a rename, I do not understand why it matters if the data source for the new name in the external disk replica is the pre-renamed file in the external disk or the computer's internal disk. Arguably, using the computer's copy is better because that disk is likely faster, between being more likely to be SSD and better bus attachment. Making this use rename(2) would be better, but you've more or less said this isn't about that. I also don't understand "huge usability" improvement. Do you have an actual use case where the current behavior is making the sync slow enough to notice, and where doing the copy from the external disk would be faster? For me, unison is almost always fast, except when I add a few GB of data to a replica and sync it to some machine far away across the net, and then it runs at the speed of sending that new data. I don't mind leaving it open a bit while you are actively trying to explain and/or make a repro recipe. It was just that at this point I don't see the problem. |
Thanks for keeping it open for now. I'm not sure yet if it does make a difference, I'm just finding it hard to believe that people on the forum would advise using SSH if they didn't notice it making a difference. Still not impossible I guess. I also think I've seen the same suggestion on Stack Overflow somewhere. But since it's late here, I'll test things out over the weekend and get to the bottom of it then. I'll let you know if it turns out a dud or an actual improvement. |
I echo @gdt's sentiment here. Wouldn't this case be exactly where you don't want a replica-local copy? Assuming spinning disks over USB, you wouldn't want to read and write on the same disk. But I still think it's good to keep the issue open (and renamed) as detecting an actual rename instead of copy+delete may be an interesting feature.
Not that I have verified, but I think you are right in that it most likely is very straightforward. |
Let's leave this issue being about the different behavior for remote vs local. As for using the rename system call instead, that's separate and should be a new issue, if it's in the tracker. I'd prefer to close issues where we've figured things out and just have an issue that crisply and correctly states what's going on, as when looking at open issues, things with lots of discussion that could be summarized take much longer to re-read. |
I had a look to see if the replica-local copy can be enabled even with two local replicas. It is not entirely straightforward, and it is intentionally disabled for local replicas because it would entail additional processing for all files, all the time. I can't think that it would be noticeable in practice, though. If there is a good case to be made that this is useful functionality to users, I think it could be implemented with reasonable effort, based on the preliminary look. |
So I've tested things out more elaborately with larger files over different drives, and verified for myself now that there is indeed no benefit whatsoever in using SSH. It seems this suggestion was completely unfounded (that, or it did work this way at some point). This also means there is currently no benefit in porting the SSH behavior to the local variant. However, if the So I am starting to wonder about a possible |
It sounds like there is no problem to solve regarding the copy behavior, so I'm going to close this. Thanks for doing the tests and figuring out what's going on. We are really trying to keep the tracker for bug reports and concise feature requests. Thus, I decline to engage in anything other than those two categories within GitHub. If you would like to discuss the issues around extending the codebase to use rename -- and that sounds like a useful improvement if someone writes the code -- please join the hackers list and I'll be happy to comment there. There is a wiki page that explains the lists at https://github.com/bcpierce00/unison/wiki/Mailing-Lists |
I didn't see this topic in the mailing list, so just for the record here, I think this is going to be a very complex task. I don't claim to know too much about this functionality but it is some of the most complex code in Unison. |
This is the feature request: #23 |
I'm mostly drawn to unison because of its wonderful feature to keep 2 directories in sync without excessively copying and deleting. In other words, because it can be smart about file moves.
After some testing, it appears this smart behavior is only available when I specify at least 1 remote replica, even for syncing 2 local directories. For example:
Obviously, this requires an SSH daemon to be running, which is not really desirable if you don't need it. It is also kind of obscure to track down why the "smart moving" feature doesn't work by default, and not very intuitive for new users.
I don't know much about OCaml, or how this is implemented internally, so there may be a valid reason for this behavior. But if there is no specific blocker, wouldn't it make more sense to drop the SSH requirement?