-
I guess, in the background, there has always been this idea that a lot of different use cases could be covered by DVC, not only ML use cases.
Sorry, swarm review is new to me. I tried watching the Perforce video, but I don't understand what its advantages are over the "usual" review offered by, for example, GitHub. Also, are we talking only about code review, or about reviewing other assets too? For example, would you need to be able to review changes to images?
Since DVC relies on Git to version the project's code and structure, all the tools built for Git, such as GitLab and GitHub, should work fine with it. But that only applies to code. I guess you would probably also need some control over the data? For example -
Are we talking about Git-like hooks? Those should work just fine with DVC when it is used in a Git repo. I think this is a great idea; the question is whether we can incorporate these requirements into our development plans, and I think that requires some discussion.
-
Hmmm... tl;dr: I came to the conclusion that the following should be sufficient: a good API (I haven't dug into this part of DVC yet) to make implementing the back end easier, optional client-side file locking, and a standardized HTTP path for requesting the list of currently locked files (details below).
I've been reading a little more about the current state of DVC. Point 3 seems achievable with a combination of Git hooks and custom Git commands (we ran a few tests, but nothing can be said for 100% until we deploy it in a greenfield project).
User-level permissions could be solved by pointing your HTTP remote ("dvc remote modify myremote url https://example.com/path") at an HTTP server with custom logic. But there are also client-side issues. How does the DVC client react to access denial? Can it display information from the server the way Git does with "remote:" messages? Since the server could simply provide (or deny) different data to different users based on what they are allowed to access, the client would just need to act accordingly.
Making as many DVC operations as possible available through some kind of API (which you already have) would be indispensable for this to work seamlessly and over the long term (especially after adding #829). Without such an API, one would have to develop a back end for DVC repositories from scratch.
Example: the DVC client makes a pull request to an intermediate HTTP server for the content folder, which holds data for programmers, writers, and artists. This server processes the request, makes an API request to the on-server DVC instance for the list of files to send, finds out that this user has access only to art-related resources, filters out the programmers' and writers' files, and then sends only the remaining files to the user.
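The filtering step in the example above could look roughly like this on the intermediate server. This is a minimal sketch, not a DVC feature: it assumes the server has already obtained the list of tracked paths (how it gets them from DVC is out of scope), and the `ACCESS_RULES` table and the `allowed_paths` function are hypothetical names made up for illustration.

```python
from fnmatch import fnmatch

# Hypothetical access rules: group name -> path patterns that group may pull.
# Note: fnmatch's "*" also matches "/", so "content/art/*" covers subfolders too.
ACCESS_RULES: dict[str, list[str]] = {
    "artists": ["content/art/*"],
    "writers": ["content/text/*"],
    "programmers": ["content/code/*", "content/text/*"],
}


def allowed_paths(user_groups: list[str], tracked_paths: list[str]) -> list[str]:
    """Return only the tracked paths that at least one of the user's groups may access."""
    patterns = [pat for group in user_groups for pat in ACCESS_RULES.get(group, [])]
    # Keep the original order; a path is sent if any allowed pattern matches it.
    return [path for path in tracked_paths if any(fnmatch(path, pat) for pat in patterns)]
```

For example, with `tracked = ["content/art/hero.png", "content/text/dialogue.txt", "content/code/main.cpp"]`, an artist would receive only `["content/art/hero.png"]`, and a user with no known group would receive nothing. Keeping the rules on the server means the client only ever sees what it is allowed to pull.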
I think we focused too much on the Perforce review system (GitLab has similar functionality in this regard). If you already knew about it, sorry, but I'll clarify by pasting in a very brief comment about what file locking is.
As stated above, file locking makes changes sequential and saves a TON of time overall. It's a must for game development, where six people may be working on one asset (a level, a character, etc.). DVC would need to support file locking on the client side (even if only as an option), which boils down to making every file the user marks as non-mergeable (for example, with a *.jpg mask) read-only and granting write access only after the file has been successfully locked. It would also need to make an effort to standardize a folder on FTP remotes and a URI path on HTTP remotes for accessing server-side information about which files are locked. We don't expect the DVC core team to maintain a server implementation. Standardization would save the DVC ecosystem from a free-for-all of dozens of incompatible server-side locking implementations. It would also let the DVC team change things under the hood without suddenly breaking everything for customers. I've seen a few different solutions. GitLab built its own tool for its Premium customers before Git LFS came along; I haven't had time to look at the source code to see how it was built, but this is stated on their website.
So they used Git hooks, for which DVC hooks might be redundant; but since it was a Premium feature, I guess it wasn't easy to add. Git LFS went a different way: it created ".git\lfs\cache\locks\refs\heads\main\verifiable" on the client machine and hard-coded a path for retrieving that information from the server. GitLab implemented it and stores all the information in a PostgreSQL database. Ultimately, Git LFS on a supporting host makes a GET request to reponame.git/info/lfs/locks, and the server responds as follows (sample):
{"locks":[{"path":"path/from/root/of/repo/test.gif","id":"17","locked_at":"2022-02-14T18:18:54Z","owner":{"name":"Testing"}}]}
I honestly think this discussion is heading in the right direction, so I would like to continue it until the DVC core team is satisfied. With regards
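The client-side behaviour described in this post (files matching non-mergeable masks are read-only, and write access is granted only while you hold the lock, based on a Git LFS-style lock list from the server) could be sketched as follows. This is a hypothetical option, not an existing DVC feature: `NON_MERGEABLE_MASKS`, `parse_locks`, `is_non_mergeable` and `apply_write_permissions` are illustrative names, the HTTP request itself is left out, and lock paths are assumed to be relative to the repo root, as in the sample response.

```python
import json
import os
import stat
from fnmatch import fnmatch

# Hypothetical user-configured masks for files that cannot be merged.
NON_MERGEABLE_MASKS = ["*.jpg", "*.png", "*.gif", "*.fbx"]


def parse_locks(body: str) -> dict[str, str]:
    """Map locked path -> owner name from a Git LFS-style lock list response."""
    data = json.loads(body)
    return {lock["path"]: lock["owner"]["name"] for lock in data.get("locks", [])}


def is_non_mergeable(path: str) -> bool:
    """True if the file name matches one of the non-mergeable masks."""
    return any(fnmatch(os.path.basename(path), mask) for mask in NON_MERGEABLE_MASKS)


def apply_write_permissions(repo_root: str, paths: list[str],
                            locks: dict[str, str], me: str) -> None:
    """Make non-mergeable files read-only unless `me` currently holds their lock."""
    for rel in paths:
        if not is_non_mergeable(rel):
            continue  # mergeable files (code, text) are left untouched
        full = os.path.join(repo_root, rel)
        mode = stat.S_IMODE(os.stat(full).st_mode)
        if locks.get(rel) == me:
            # Lock held by the current user: allow the owner to write.
            os.chmod(full, mode | stat.S_IWUSR)
        else:
            # Unlocked or locked by someone else: strip all write bits.
            os.chmod(full, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))
```

Feeding the sample response above into `parse_locks` yields `{"path/from/root/of/repo/test.gif": "Testing"}`. Using plain file permissions mirrors what the post describes and needs no daemon on the client, but a determined user can still `chmod` a file back, so the server would still have to be the authority on locks.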
-
DVC by itself does not have a server. Aren't we talking here about actually creating a GitHub-like UI that would extend DVC's capabilities?
As above.
DVC by itself does not react; Git handles versioning, so any problem with the Git repo (e.g. a conflict) is handled as in the usual Git workflow. In the conflict example, GitHub will reject the push if the upstream history has diverged from ours.
Got it. I'm still a bit confused about this functionality, and the GitLab docs you provided help me understand it a bit. My current understanding is that in the ideal scenario we have some kind of server capable of handling Git+DVC repos. This server holds the single source of truth about which files are locked, and doesn't allow users to modify a particular asset until the lock is released, even if Git would allow it in a usual scenario (e.g. someone adds a new commit, with history matching the target
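The server-side rule described in this reply (reject a change to an asset locked by someone else, even when Git alone would accept it) could be checked roughly like this. It is a minimal sketch under stated assumptions: the `Lock` type and `lock_violations` function are hypothetical, the list of changed paths is assumed to come from something like `git diff --name-only <old> <new>` inside a pre-receive hook, and, because Git history only contains DVC's `.dvc` pointer files rather than the data itself, locks would presumably have to be keyed on those pointer files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lock:
    """A hypothetical lock record held by the server (the single source of truth)."""
    path: str
    owner: str


def lock_violations(pusher: str, changed_paths: list[str], locks: list[Lock]) -> list[str]:
    """Return the changed paths that are locked by someone other than the pusher.

    A server-side hook could reject the whole push when this list is non-empty,
    even if Git itself would accept the push as a fast-forward.
    """
    owner_by_path = {lock.path: lock.owner for lock in locks}
    return [path for path in changed_paths
            if path in owner_by_path and owner_by_path[path] != pusher]
```

With a single lock `Lock("art/hero.png.dvc", "alice")`, a push from `bob` touching `art/hero.png.dvc` and `README.md` would be flagged for `art/hero.png.dvc` only, while the same push from `alice` passes. Unlocked files are not blocked here; whether editing an unlocked non-mergeable asset should also be rejected is a policy decision this sketch leaves open.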
-
@Hisamera Have you tried evaluating GitLab for your needs? It seems to have file locking capabilities.
-
Greetings
First things first, I want everybody to know what this discussion is not: it is NOT a "Someone, somewhere, please code these features!!!" request. I want this discussion to be the start of an open-minded collaboration with the DVC maintainers and the DVC community, to establish how to approach this problem and make progress on implementing these features. We (due to a contract, I cannot disclose who "we" are) are fully capable of just taking DVC and implementing these features on our back end ourselves, but we would like them to be merged into the main open-source repo so that we won't have to actively maintain them.
We are fully aware that DVC is geared more towards machine learning projects, but game development projects have a lot in common with them. After taking a look at DVC's features, we have pinpointed these three must-have
With a good implementation of hooks, 1. and 2. could possibly be solved, but maybe the DVC community has other, better ideas. If we switch and try it out on a greenfield project, we would need DVC to be in a state that makes it BETTER than Perforce currently is for US; otherwise, why switch?
We applaud, for example, "Split data into blocks #829", as we think that a data version control system can compress that data far better than a general-purpose compression algorithm ever could; with OpenZFS possibly becoming multi-OS, we could also compress and deduplicate data that way.
Game development produces a lot of artifacts; maybe with #829 a new daily build would not weigh another 10 GB, but only 100 MB? That would allow us to focus more on delivering value for our customers than on our back end.
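The hope that a new daily build could cost 100 MB instead of 10 GB rests on block-level deduplication: blocks that did not change between builds are stored only once. The sketch below shows the idea with simple fixed-size blocks and SHA-256 content addressing; it is a generic illustration, not the design proposed in #829, and `BlockStore`, `split_blocks` and `BLOCK_SIZE` are hypothetical names. The tiny block size only keeps the example readable.

```python
import hashlib

BLOCK_SIZE = 4  # illustration only; real systems use far larger (often variable) blocks


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class BlockStore:
    """Content-addressed store: each distinct block is kept once, keyed by its SHA-256."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}

    def add(self, data: bytes) -> tuple[list[str], int]:
        """Store data as a manifest of block hashes; return (manifest, new block count)."""
        manifest: list[str] = []
        new = 0
        for block in split_blocks(data):
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:  # only unseen blocks cost storage
                self.blocks[digest] = block
                new += 1
            manifest.append(digest)
        return manifest, new

    def restore(self, manifest: list[str]) -> bytes:
        """Rebuild the original bytes from a manifest."""
        return b"".join(self.blocks[digest] for digest in manifest)
```

Storing `b"AAAABBBBCCCCDDDD"` adds 4 blocks; storing the next "build" `b"AAAABBBBXXXXDDDD"` adds only 1, and `restore` returns it unchanged. Fixed-size blocks break down when bytes are inserted near the start of a file (every later block shifts), which is why content-defined chunking is commonly used instead.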
We sacrifice today for tomorrow and are firm believers in tools. Even though there are other projects that are more feature-complete for US... let's just say that we don't like the way they are maintained and/or where they are headed.
We would like this discussion to end with an epic (issue) that describes what needs to be done and how, so that an orderly and efficient effort can be made to implement these features.
With regards
Undisclosed company.