docs: initial proposal for OCI artifact registry #48

Draft
wants to merge 1 commit into
base: main

Conversation

rchincha

Partially addresses kubeflow/community#682

Description

How Has This Been Tested?

Merge criteria:

  • The commits have meaningful messages; the author will squash them after approval or will ask to merge with squash.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work.


google-cla bot commented Mar 20, 2024

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.


[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: rchincha
Once this PR has been reviewed and has the lgtm label, please assign zijianjoy for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Partially addresses kubeflow/community#682

Signed-off-by: Ramkumar Chinchani <rchincha@cisco.com>
@rchincha
Author

cc: @rareddy

@dhirajsb
Contributor

@rchincha thanks for this proposal. There is certainly a lot of interest around using an OCI registry for ML model data. One of the ideas being discussed within the model registry team is to add support, in some fashion, for an OCI registry as a default store for artifacts.
Since an OCI registry has its own data and metadata store, how do you see it being integrated into the current Kubeflow model registry design? Or is this proposal for replacing the current mlmd-based implementation with an OCI registry?

@rchincha
Author

rchincha commented Mar 22, 2024

@dhirajsb This proposal is aligned with https://github.com/kubeflow/community/pull/682/files#diff-aaf54745ecb36016135c83a5a41a03025574ecb492aec56ef6d2c7c902abfe17R180

Basically, why invent a new piece of machinery when a required infra piece like the container registry is evolving to be more general-purpose? Furthermore, there is likely no need to worry about support/maintenance, since it is standards-based and many implementations are likely available.

From what I know and am learning, it appears mlmd client needs to "learn" (somehow) to use the registry as the datastore.

kserve changes will likely look like this: kserve/kserve#3539 (wip, contract-only, needs fleshing out)
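
To make the "registry as datastore" idea concrete, here is a minimal sketch of what a model packaged as an OCI artifact manifest could look like under the image spec v1.1 (`artifactType` plus an empty config descriptor). This is not taken from the proposal or from any Kubeflow code: the `application/vnd.example.*` media types and the `model_manifest` helper are hypothetical names chosen for illustration, and nothing is pushed to a registry.

```python
import hashlib
import json

# Media types defined by the OCI image spec v1.1.
MANIFEST_MEDIA_TYPE = "application/vnd.oci.image.manifest.v1+json"
EMPTY_MEDIA_TYPE = "application/vnd.oci.empty.v1+json"

# Hypothetical media types, for illustration only (not registered anywhere).
MODEL_ARTIFACT_TYPE = "application/vnd.example.ml.model.v1"
MODEL_LAYER_TYPE = "application/vnd.example.ml.model.weights.v1"


def descriptor(media_type: str, blob: bytes) -> dict:
    """Build an OCI content descriptor (media type, digest, size) for a blob."""
    return {
        "mediaType": media_type,
        "digest": "sha256:" + hashlib.sha256(blob).hexdigest(),
        "size": len(blob),
    }


def model_manifest(weights: bytes, annotations: dict) -> dict:
    """Assemble an artifact manifest with one layer holding the model weights."""
    empty_config = b"{}"  # the spec's conventional empty JSON config blob
    return {
        "schemaVersion": 2,
        "mediaType": MANIFEST_MEDIA_TYPE,
        "artifactType": MODEL_ARTIFACT_TYPE,
        "config": descriptor(EMPTY_MEDIA_TYPE, empty_config),
        "layers": [descriptor(MODEL_LAYER_TYPE, weights)],
        "annotations": annotations,
    }


manifest = model_manifest(
    b"fake-model-weights",  # placeholder bytes standing in for a real model file
    {"org.opencontainers.image.title": "demo-model"},
)
print(json.dumps(manifest, indent=2))
```

A client would upload the two blobs and then this manifest through the distribution API; how an mlmd client would do that is exactly the open question in this thread.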

@rchincha
Author

Cross-posting here ...

https://kccnceu2024.sched.com/event/1YeLi
^ This idea is spreading around I suppose ... https://github.com/kubecon EU 2024

@dhirajsb
Contributor

dhirajsb commented Mar 25, 2024

container registry is evolving to be more general-purpose

Although the container registry is becoming more general-purpose, it will still be a single type of storage service for model data. That excludes other data stores like S3, and other data sources like files, DBs, etc. for training data sources/features.

mlmd client needs to "learn" (somehow) to use the registry as the datastore.

The mlmd-based approach is to not store the data in the model registry, but to store references (links) to the data in an external store. Basically, we decouple the storage of the data from the references to the data. So, there is not much learning involved there.
The primary purpose of an ML metadata registry is not just to store references to the model data, though. Its primary purpose is to store information about anything and everything related to the development, evolution, lineage, and even usage of the ML model. This includes artifacts, as well as actions (executions and events) involved in its lifecycle.
It's meant to store metadata about the training data used and its history; information about the notebooks/sources used to create ML models and their history; the ML model data produced and its history as versions of the model; deployments as inference services and their history in different environments; links to metrics/performance measurements and their history; and so on. Put together, these form lineage graphs that can be traversed back and forth, across relationships and in time, to give a single-pane-of-glass view of the entire ML model lifecycle and history for all user personas.
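
A minimal sketch of the decoupling described above, assuming a toy in-memory store: artifacts hold only a URI pointing at an external store, executions are linked to artifacts through input/output events, and lineage is recovered by walking those events backwards. The `MetadataStore` class, its methods, and the example URIs are hypothetical and are not the mlmd or model registry API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Artifact:
    name: str
    uri: str  # reference to data in an external store; the data is not kept here


@dataclass
class MetadataStore:
    artifacts: dict = field(default_factory=dict)
    # Each event is a tuple: (execution name, "INPUT" or "OUTPUT", artifact name).
    events: list = field(default_factory=list)

    def add_artifact(self, artifact: Artifact) -> None:
        self.artifacts[artifact.name] = artifact

    def record(self, execution: str, inputs: list, outputs: list) -> None:
        """Record one execution together with its input and output events."""
        self.events += [(execution, "INPUT", name) for name in inputs]
        self.events += [(execution, "OUTPUT", name) for name in outputs]

    def upstream(self, artifact_name: str) -> list:
        """Walk events backwards to list every artifact this one derives from."""
        seen, queue = [], [artifact_name]
        while queue:
            current = queue.pop(0)
            producers = [e for (e, kind, a) in self.events
                         if kind == "OUTPUT" and a == current]
            for execution in producers:
                for (e, kind, a) in self.events:
                    if e == execution and kind == "INPUT" and a not in seen:
                        seen.append(a)
                        queue.append(a)
        return seen


store = MetadataStore()
# Hypothetical URIs: any scheme works because only the reference is stored.
store.add_artifact(Artifact("training-data", "s3://example-bucket/data.csv"))
store.add_artifact(Artifact("model-v1", "oci://registry.example.com/models/demo:v1"))
store.add_artifact(Artifact("inference-service", "https://serving.example.com/demo"))
store.record("train", inputs=["training-data"], outputs=["model-v1"])
store.record("deploy", inputs=["model-v1"], outputs=["inference-service"])

lineage = store.upstream("inference-service")
print(lineage)
```

In this sketch an OCI registry is just one more URI scheme next to S3, which is the sense in which it would complement rather than replace the metadata store.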

@rchincha
Author

Will try to give a presentation at the next Kubeflow registry meeting on Apr 1.
It is probably best to give a demo of current dist-spec v1.1.0 capabilities and then field follow-up questions.


This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@tarilabs
Member

/lifecycle frozen

likely in some 1.10 roadmap to integrate some default, ootb storage complementary solution for [this] model registry


@tarilabs: The lifecycle/frozen label cannot be applied to Pull Requests.

In response to this:

/lifecycle frozen

likely in some 1.10 roadmap to integrate some default, ootb storage complementary solution for [this] model registry

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@tarilabs
Member

/remove-label lifecycle/stale
likely in some 1.10 roadmap to integrate some default, ootb storage complementary solution for [this] model registry


@tarilabs: The label(s) /remove-label lifecycle/stale cannot be applied. These labels are supported: tide/merge-method-merge, tide/merge-method-rebase, tide/merge-method-squash. Is this label configured under labels -> additional_labels or labels -> restricted_labels in plugin.yaml?

In response to this:

/remove-label lifecycle/stale
likely in some 1.10 roadmap to integrate some default, ootb storage complementary solution for [this] model registry

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.


This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@tarilabs
Member

/remove-label lifecycle/stale
likely in some 1.10 roadmap to integrate some default, ootb storage complementary solution for [this] model registry


@tarilabs: The label(s) /remove-label lifecycle/stale cannot be applied. These labels are supported: tide/merge-method-merge, tide/merge-method-rebase, tide/merge-method-squash, lifecycle/needs-triage. Is this label configured under labels -> additional_labels or labels -> restricted_labels in plugin.yaml?

In response to this:

/remove-label lifecycle/stale
likely in some 1.10 roadmap to integrate some default, ootb storage complementary solution for [this] model registry

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
