Prompt or not to prompt? #5927

dmpetrov · 2019-09-14T01:12:30Z

dmpetrov
Sep 14, 2019
Maintainer

EDITED 9/15/19

There are a couple of discussions around prompting for user input from CLI. This issue-question tries to summarize the rules around prompting.

Now, DVC asks y/n questions from the command line. We have to introduce --yes option (which is not a problem) and users should remember to use this option from their scripts all the time otherwise script stuck. This difference (script experience versus user experience) is an important issue for Unix tools.

Traditional Unix-tools do not ask questions (and do not break the scripting experience) which leads to overwriting files or aborting commands. There is an opinion that DVC should behave the same way. The backside of this - it might lead to data loss.

Examples:

[Overwriting] cp, tar - will overwrite files with
[Aborting] git checkout - fails if there is a conflict
[Prompt] there are some exceptions with prompt: unzip, apt-get. However, zip is not a traditional Unix tool, it was ported from DOS/Windows and follows Windows rules.

First, we need to decide if prompting is okay. Second, what are the set of rules for DVC for prompting, overwriting and aborting command?

The first rules summarization from @shcheklein #2497 (comment)

This is my take on abort vs prompt:

They both give overhead when you use them in scripts. You have to use some CLI argument to force.

If script is not written right (without set -eux) abort is less safer choice.

If script is written right, prompt is more annoying since script hangs.

The biggest difference for us in terms of workflow and usability comes from some long-running operations (as opposed to Git which is very fast in general). In our case it might be the case that we calculate status and/or md5 before even showing the prompt. Abort does not give a chance to continue - you have to run it second time.

Let's discuss the rules.

ghost · 2019-09-18T17:33:32Z

ghost
Sep 18, 2019

We can use conventions as guidelines, but in the end, it is important that we consider the usefulness of prompting in our own terms.

Also, there's no consistency on how to skip those questions (--force vs --yes vs --overwrite-dvcfile).

A quick grep for prompt showed that we ask for confirmation on the following methods:

safe_remove: "File is going to be removed. Are you sure you want to proceed? [y/n]
_collect_used_dir_cache: "Missing cache for directory. Cache for files inside will be lost. Would you like to continue? [y/n/]"
gc: "This command ... (is dangerous). Are you sure you want to proceed? [y/n]"
_is_outs_only: "Are you sure you want to remove dvcfile with its outputs? [y/n]" (on dvc remove --purge)
destroy: "This will destroy all info about pipelines... yada yada. Are you sure you want to continue? [y/n]
check_can_commit: "dependencies changed. Are you sure you commit it? [y/n]" (:warning: there's a typo here)
stage.create: "dvcfile already exists. Do you wish to run the command and overwrite it? [y/n]"
stage.reproduce: "Going to reproduce dvcfile. Are you sure you want to continue? [y/n]" (only if --interactive)

From those prompts, I can point out the following:

All the prompts are "guarding" the user from potential data loss
Some prompts are on commands that are destructive by itself (I mean, don't expect dvc destroy or dvc remove to be safe)
I'm not sure if the user can predict when the command will result in some data loss
- At least for me, it is still not that straight forward to think that dvc run -o foo will remove foo
- Some of those prompts happen in specific cases (the ones if private methods)
- What is data loss? Sometimes, files are in the cache but there's no reference (dvcfile) to them, is this data loss?

With that being said:

If we are going to remove the prompts, we should make the commands more predictable (better documentation, different naming, different levels of verbosity, etc.)
- Tools were you can predict the behavior are safer than the ones that you can't (at least, you develop trust)
I like dvc's flexibility, but I can't imagine "designing" a tool for a lot of purposes and environments.
It shouldn't take us a lot to go through all the commands and discuss them.
It would help to identify common use cases for it (scripts, files sizes, workflows, etc.)

0 replies

shcheklein · 2019-09-18T17:51:42Z

shcheklein
Sep 18, 2019
Maintainer

@dmpetrov @MrOutis nice summaries, thanks!!

Just a few comments from the top of my head:

and do not break the scripting experience

To my mind and in my experience aborting is very similar to prompting in terms of scripts. You have to take that into account and remember to provide an option. Aborting can be even more dangerous because people tend to forget to put set -uex and script goes even if there was a problem in the middle.

there are some exceptions

heroku CLI is another example, even if you specify explicitly that you want to do something it will explain what is going on ask you to put a name of your project (repeating their Web UI) before destroying something for you.

0 replies

dashohoxha · 2019-09-19T12:18:44Z

dashohoxha
Sep 19, 2019

Now, DVC asks y/n questions from the command line. We have to introduce --yes option (which is not a problem) and users should remember to use this option from their scripts all the time otherwise script stuck.

For me it is not a problem if a command asks for confirmation (or is interactive), as long as this can be overridden by a --yes or --force option.

For example the command apt install in debian/ubuntu. If it has to download lots of files it asks you for confirmation (showing the list of packages that is going to be installed). But in a script you can called it like apt install --yes ....

It shouldn't take us a lot to go through all the commands and discuss them.

I also think that discussing the commands one by one is better than formulating some general (abstract) principles/rules and trying to follow them strictly.

0 replies

dmpetrov · 2019-09-26T20:50:45Z

dmpetrov
Sep 26, 2019
Maintainer Author

@dashohoxha I agree that we need to discuss commands one by one to make actionable decisions. However, we need to formulate basic principles to simplify the decision-making process. It should help to each of the team members to make and explain these UX decisions without any holy wars.

Data file importance

The biggest issue related to prompting is data loss. Importance of data files is different. In a high level, we can separate two data types by the importance:

critical data or data sources which DVC cannot recover. In some cases, data might be completely unrecoverable for users (no others copies).
not critical data or data derivatives/outputs which can be reproduced by DVC. But it might require some time and resources from a user.

In general, it is better not to lose any of the data types. However, it is okay to remove not critical data (data derivatives\outputs) in order to avoid prompting or simplifying UX (not to introduce additional options). Failing is always an option of keeping data.

Examples

DVC aligns a lot to Git and we expect users to have Git experience. It will be a mistake not to reuse the knowledge and behavior patterns that users have built with Git. In general, we should use Git UX as a guiding star when we make decisions about UX. Every time we don't agree on something we should ask ourselves a simple question - how it should look like if it becomes a part of a Git. If there is no clear answer, I'd recommend finding analogies with classical Unix commands such as ls, rm etc which most of our users should be familiar with.

Based on my understanding, Git avoids prompting as much as possible (see discussion in #2497). In some exceptional cases (merge, rebase), dialogs are needed. These exceptions are not just prompted questions, they are full dialogs with multiple questions.

0 replies

dmpetrov · 2019-09-26T20:53:09Z

dmpetrov
Sep 26, 2019
Maintainer Author

@MrOutis thank you for the list of methods when DVC ask questions. Is it possible to find DVC commands and scenarios/cases when it is happening?

0 replies

dashohoxha · 2019-09-27T09:19:31Z

dashohoxha
Sep 27, 2019

@dmpetrov The principles that you lay out are simple and reasonable, I completely agree with them.

It should help to each of the team members to make and explain these UX decisions without any holy wars.

My opinion is that developers (in general) are a little bit biased towards the software that they are building, and it might be difficult for them to look at the software from the point of view of a user that knows nothing about implementation. Once you know all the implementation details you can't become a novice again.

So, it might be useful if someone is selected as a UX expert, who decides about the details of user interface and user interaction. The others should try to convince him about their point of view, but at the end it is his decision that prevails. This would avoid any holy wars and would bring consistency.

0 replies

ghost · 2019-09-27T22:45:43Z

ghost
Sep 27, 2019

yes, @dmpetrov ! I can come up with some scenarios, I'll try to do it soon

0 replies

ghost · 2019-10-01T18:56:36Z

ghost
Oct 1, 2019

@dmpetrov , it is taking a while 😅 !

⚠️ this is a work in progress ⚠️

checkout a file without cache:

dvc init --no-scm
echo 'foo' > foo
dvc add foo
rm -rf .dvc/cache/d3/b07384d113edec49eaa6238ad5ff00
dvc checkout

# WARNING: Cache 'd3b07384d113edec49eaa6238ad5ff00' not found. File 'foo' won't be created.
#   0%|          |Checkout                                                                                                 0/1 [00:00<?,     ?file/s]
# file 'foo' is going to be removed. Are you sure you want to proceed? [y/n]

remove "redundant files" during a directory checkout:

dvc init --no-scm
mkdir directory
echo 'foo' > directory/foo
dvc add directory
echo 'bar' > directory/bar
dvc checkout

# 0%|          |Checkout                                                                                                 0/1 [00:00<?,     ?file/sfile 'directory/bar' is going to be removed. Are you sure you want to proceed? [y/n]

checkout a file that changed:

dvc init --no-scm
echo 'foo' > foo
dvc add foo
echo 'bar' > foo
dvc checkout
#   0%|          |Checkout                                                                                                 0/1 [00:00<?,     ?file/s]
# file 'foo' is going to be removed. Are you sure you want to proceed? [y/n]

0 replies

jorgeorpinel · 2019-10-15T15:44:05Z

jorgeorpinel
Oct 15, 2019

it is still not that straight forward to think that dvc run -o foo will remove foo

Agree! Vote for prompt in this case.

0 replies

dmpetrov · 2019-10-15T16:27:49Z

dmpetrov
Oct 15, 2019
Maintainer Author

Thank you @MrOutis for funding these scenarios!

Let's go one by one.

checkout a file without cache

Is there are cases when it makes sense to checkout data files without checking out a particular one (answer Yes case)? It seems like in most of the cases this is going to happen when a user forgets to run dvc pull\fetch - it is fine to fail in this case (wee can also show a proper message like Data file 'foo' is missing in cache. Please run 'dvc fetch' first.). Also, I can imagine when dvc fetch won't give you the data file (corrupted cache and remote) - in this "debug" mode partial checkout can be done by *.dvc files. So, in both of cases, failing seems like a better option.

I propose to fail instead of prompting

remove "redundant files" during a directory checkout:

It seems like there are no conflicts to fail or prompt questions. DVC shouldn't remove files that are not committed. A similar Git example:

git init
echo foo > foo
git add foo
git commit -m 'foo commit'
echo bar > bar
git checkout ffb38d8 # the current commit

No changes, no errors. bar is in its place. I'd expect the same behavior from DVC.

I propose to avoid prompting and outputting any error messages or warning

checkout a file that changed

It looks the same as the previous case (even though the implementation might be different). So, checkout should not raise any errors or prompts or even error messages.

I propose to avoid prompting and outputting any error messages or warning

Please take a look at my proposals. If there are no concerns, I'll create an issue.

@MrOutis are these three the only prompts that DVC has or we can find more?

0 replies

dmpetrov · 2019-10-15T16:38:35Z

dmpetrov
Oct 15, 2019
Maintainer Author

A few more things...

It is okay to remove data without prompting if user runs destructive commands like destroy and remove.

Regarding dvc run. It might be not so important as the other commands (run is just a wrapper). But I'd suggest removing outputs with no prompting because it removes data derivatives, not data sources.

0 replies

jorgeorpinel · 2019-10-15T16:56:28Z

jorgeorpinel
Oct 15, 2019

dvc run... removes data derivatives, not data sources

I thought if you have some files manually added to a directory and then use that dir as -o for run, it will remove your files with no prompt. These could be some other arbitrary files you needed and have no way of recovering.

0 replies

dmpetrov · 2019-10-16T04:44:16Z

dmpetrov
Oct 16, 2019
Maintainer Author

if you have some files manually added to a directory and then use that dir as -o for run, it will remove your files with no prompt.

Good point @jorgeorpinel. It would be great to keep the information about data sources (like source: true in dvc-file) and raise an error if run tries to remove it. But even in this case, I'd prefer to fail and ask a user to remove the data source manually first (through dvc remove added-source-file.dvc).

How do you @iterative/engineering think, is it a good idea to introduce a new 'source: true' parameter in DVC file specifically for the cases not to removing them automatically. I'm wondering, can this information be derived from the dependency graph (and how reliable this information would be in this case)?

0 replies

dmpetrov · 2019-11-16T20:02:03Z

dmpetrov
Nov 16, 2019
Maintainer Author

It looks like we got all the pros and cons of this issue. It makes sense to get rid of the prompting when it is possible. I created 3 issues for all the prompt cases that we've found except dvc run.

dvc run is a special case. From one point of view, it generates data derivatives and it is fine to remove them without asking questions. On another hand - there are some cases when data sources can be removed. Let's keep it as is and return to this question later if it arises.

I'm keeping checkboxes. We can close the current issue when all the checkboxes will be checked:

0 replies

jorgeorpinel · 2019-11-18T01:00:38Z

jorgeorpinel
Nov 18, 2019

We can close the current issue when all the checkboxes will be checked:

Another option is to simply close this issue now. 🙂

0 replies

dmpetrov · 2019-11-19T06:14:58Z

dmpetrov
Nov 19, 2019
Maintainer Author

@jorgeorpinel we can lose all the details if we close it :)

0 replies

dmpetrov · 2019-11-19T06:25:09Z

dmpetrov
Nov 19, 2019
Maintainer Author

@shcheklein brought a good point that this issue is not only about prompting or not-prompting - it is about changing the checkout logic behavior in the first place.

So, we need to make dvc checkout logic align with git checkout logic and fail or proceed as Git does. There are two possible cases for checkout conflicts and none of these requires prompting user input:

Keep the modified files in the workspace without failing when "checkouting" file is the same as changed one.
Fail if an old version of a file needs to be checkout.

More details with Git code examples: #2802 (comment)

All 3 scenarios have to follow this checkout logic.

0 replies

shcheklein · 2019-11-20T06:46:45Z

shcheklein
Nov 20, 2019
Maintainer

Still trying to understand this :) (btw I still don't quite understand how is related to prompt vs not to prompt and why we are discussing this here).

Let's start with a simple case, first:

$ git clone https://github.com/iterative/example-get-started
$ cd example-get-started
$ dvc fetch
$ dvc checkout

What is the effect of running the last command in terms of changes to the workspace?

0 replies

dmpetrov · 2019-11-20T17:36:12Z

dmpetrov
Nov 20, 2019
Maintainer Author

Let's start with a simple case, first:

I'm not sure I understand how it is related to the topic we discuss. Could you please elaborate?

0 replies

shcheklein · 2019-11-20T17:39:46Z

shcheklein
Nov 20, 2019
Maintainer

@dmpetrov we are discussing the changes to the dvc checkout behavior, right? The question is exactly about dvc checkout. I'm trying to understand exactly the new semantics of it. For example, how would it determine if files are new or old in a sense you mentioned. So, I'm going with really simple real scenarios to see outcomes and better understand what do you have in mind.

0 replies

dmpetrov · 2019-11-20T17:51:14Z

dmpetrov
Nov 20, 2019
Maintainer Author

@shcheklein prompts happens in case of conflicts during checkout as you can see from the scenarios that @MrOutis showed. Git in such conflicts avoids prompts and (as we realized) behaves differently. It would be great to align DVC behavior with Git behavior (and avoid prompts as a result).

We found 3 DVC "misbehavior" scenarios (2 are actually related) and I create 3 issues to fix these scenarios.

In your last scenario, I see no checkout conflicts and no prompts. This is why I don't understand how it is related to the issue.

0 replies

shcheklein · 2019-11-20T17:59:59Z

shcheklein
Nov 20, 2019
Maintainer

So, are you saying that checkout will behave differently if data is cached or not? E.g. if the new version is cached then we use the current semantics if not then we don't fail and do not effectively do checkout.

Again, I'm really confused about the change. It would be really beneficial to show some examples and simple scenarios when people do checkouts, modifications, etc.

0 replies

dmpetrov · 2019-11-20T18:07:15Z

dmpetrov
Nov 20, 2019
Maintainer Author

@shcheklein I showed all the examples with Git: #2802 (comment)
We just need to reflect the Git checkout behavior in these 3 scenarios.

0 replies

shcheklein · 2019-11-20T18:12:41Z

shcheklein
Nov 20, 2019
Maintainer

@dmpetrov DVC is not Git by definition. E.g. we use Git with DVC to version DVC-files, it means that Git workflow affects the DVC workflow. DVC is a higher level abstraction in this sense. We can't just reflect even if we want to keep the same philosophy. I'm just saying I don't understand the suggested change (to the DVC semantics) still. And git examples are not enough for me to better understand those. I'm fine if it's only me. Would love to hear what others have to say.

0 replies

dmpetrov · 2019-11-20T18:18:03Z

dmpetrov
Nov 20, 2019
Maintainer Author

@shcheklein you are right but it is not clear how it related to the current issue. It would be great if you can provide an example when Git's failing/proceeding behavior doesn't make sense for DVC in the context of these 3 checkout scenarios.

0 replies

skshetry · 2021-05-27T07:09:36Z

skshetry
May 27, 2021
Maintainer

Reviving the discussion for the UI change here. I agree with @MrOutis on #5927 (comment).

I'd propose that DVC, being a modern tool, it should not be shy about prompting users:

in critical/dangerous places that might result in a data loss (eg: gc/destroy),
or that might waste a lot of time of users (eg: gc takes an hour to collect the cache),
or, that the operation cannot move ahead without a piece of small information (eg: password prompts).

I think that it's okay to poke the user in those cases, given that:

we don't show prompts on obvious commands (eg: asking users to commit in dvc commit is pointless).
we provide a way to bypass the prompts from flags (for the use in scripts), and
we don't run it if it's not a tty (heroku does this and it's annoying and it should just abort).

1 reply

rogermparent May 27, 2021

we provide a way to bypass the prompts from flags (for the use in scripts)

This is also particularly important not only for scripts, but full-blown external applications that run DVC CLI in the background. In these cases, the application would run everything with --yes/--force and maybe use its own UI to replicate the prompt where it makes sense.

casperdcl · 2021-08-10T17:23:44Z

casperdcl
Aug 10, 2021

If DVC should be consistent with another CLI tool, I think Git would be a good reference.

The default is to not prompt
Option flags are provided to be interactive or to dry run
- git rebase --interactive, git {checkout,reset} --patch (analogous to dvc checkout)
- git clean --dry-run (analogous to dvc {exp} gc, dvc remove)
If the operation is dangerous AND the working state makes the operation ambiguous AND no --force/resolution method flags are given, the command does nothing and fails with a warning

Basically, NEVER PROMPT (#5927 (comment)) unless user specifically requested to be in "interactive mode."

This is good user workflow practice and also good for devs to make sure APIs stay clean and script-friendly.

If you disagree, then we'd have to have a stable widely-used interactive mode BEFORE we consider making it the default.

0 replies

Prompt or not to prompt? #5927

dmpetrov Sep 14, 2019 Maintainer

Replies: 27 comments · 1 reply

shcheklein Sep 18, 2019 Maintainer

dmpetrov Sep 26, 2019 Maintainer Author

Data file importance

Examples

dmpetrov Sep 26, 2019 Maintainer Author

dmpetrov Oct 15, 2019 Maintainer Author

dmpetrov Oct 15, 2019 Maintainer Author

dmpetrov Oct 16, 2019 Maintainer Author

dmpetrov Nov 16, 2019 Maintainer Author

dmpetrov Nov 19, 2019 Maintainer Author

dmpetrov Nov 19, 2019 Maintainer Author

shcheklein Nov 20, 2019 Maintainer

dmpetrov Nov 20, 2019 Maintainer Author

shcheklein Nov 20, 2019 Maintainer

dmpetrov Nov 20, 2019 Maintainer Author

shcheklein Nov 20, 2019 Maintainer

dmpetrov Nov 20, 2019 Maintainer Author

shcheklein Nov 20, 2019 Maintainer

dmpetrov Nov 20, 2019 Maintainer Author

skshetry May 27, 2021 Maintainer

dmpetrov
Sep 14, 2019
Maintainer

Replies: 27 comments 1 reply

shcheklein
Sep 18, 2019
Maintainer

dmpetrov
Sep 26, 2019
Maintainer Author

dmpetrov
Sep 26, 2019
Maintainer Author

dmpetrov
Oct 15, 2019
Maintainer Author

dmpetrov
Oct 15, 2019
Maintainer Author

dmpetrov
Oct 16, 2019
Maintainer Author

dmpetrov
Nov 16, 2019
Maintainer Author

dmpetrov
Nov 19, 2019
Maintainer Author

dmpetrov
Nov 19, 2019
Maintainer Author

shcheklein
Nov 20, 2019
Maintainer

dmpetrov
Nov 20, 2019
Maintainer Author

shcheklein
Nov 20, 2019
Maintainer

dmpetrov
Nov 20, 2019
Maintainer Author

shcheklein
Nov 20, 2019
Maintainer

dmpetrov
Nov 20, 2019
Maintainer Author

shcheklein
Nov 20, 2019
Maintainer

dmpetrov
Nov 20, 2019
Maintainer Author

skshetry
May 27, 2021
Maintainer