
feat:update model loader #178

Merged
merged 1 commit into from
Oct 23, 2024

Conversation

qinguoyi

@qinguoyi qinguoyi commented Sep 28, 2024

What this PR does / why we need it

#163 (comment)

Which issue(s) this PR fixes

None

Special notes for your reviewer

This PR mainly includes:

  1. Add allow_patterns and ignore_patterns, adapted to work with filename.
  2. Clean up the Python code; move hard-coded values into a constants file.
  3. When downloading models, use only snapshot_download; with allow_patterns we can download one or more files.
  4. Chunked model weights vs. complete weights are not handled here; that will come in another PR.
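For context, huggingface_hub's snapshot_download accepts allow_patterns and ignore_patterns lists. A minimal sketch of how the loader might assemble its arguments; build_download_kwargs is a hypothetical helper for illustration, not code from this PR:

```python
from typing import Dict, List, Optional

def build_download_kwargs(
    model_id: str,
    revision: Optional[str] = None,
    filename: Optional[str] = None,
    allow_patterns: Optional[List[str]] = None,
    ignore_patterns: Optional[List[str]] = None,
) -> Dict:
    # filename and the pattern fields are mutually exclusive; when filename
    # is set, it simply becomes the single allow pattern.
    if filename:
        if allow_patterns or ignore_patterns:
            raise ValueError("filename and patterns are mutually exclusive")
        allow_patterns = [filename]
    kwargs: Dict = {"repo_id": model_id}
    if revision:
        kwargs["revision"] = revision
    if allow_patterns:
        kwargs["allow_patterns"] = allow_patterns
    if ignore_patterns:
        kwargs["ignore_patterns"] = ignore_patterns
    return kwargs

# A single snapshot_download call then covers one file or many:
# from huggingface_hub import snapshot_download
# snapshot_download(**build_download_kwargs(
#     "Qwen/Qwen2-0.5B-Instruct-GGUF", allow_patterns=["*.gguf"]))
```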

Does this PR introduce a user-facing change?

Support specifying allow/ignore patterns for multi-file model downloads

@InftyAI-Agent InftyAI-Agent added needs-triage Indicates an issue or PR lacks a label and requires one. needs-priority Indicates a PR lacks a label and requires one. do-not-merge/needs-kind Indicates a PR lacks a label and requires one. labels Sep 28, 2024
@kerthcet

Will review this ASAP. Thanks for the work.

@kerthcet

/kind feature

@InftyAI-Agent InftyAI-Agent added feature Categorizes issue or PR as related to a new feature. and removed do-not-merge/needs-kind Indicates a PR lacks a label and requires one. labels Sep 29, 2024

@kerthcet kerthcet left a comment


We may also need an e2e test to make sure the pattern works as expected, but it can be a follow-up until I publish an official image tag for model-builder.

@@ -49,6 +49,12 @@ type ModelHub struct {
// +kubebuilder:default=main
// +optional
Revision *string `json:"revision,omitempty"`
// AllowPatterns refers to only files matching at least one pattern are downloaded.
// +optional
AllowPatterns *string `json:"allowPatterns,omitempty"`


Let's make it a slice, because huggingface_hub accepts a list.

AllowPatterns *string `json:"allowPatterns,omitempty"`
// IgnorePatterns refers to files matching any of the patterns are not downloaded.
// +optional
IgnorePatterns *string `json:"ignorePatterns,omitempty"`


ditto
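Following the slice suggestion, the fields would look roughly like this. A hedged sketch only: field names follow the diff above, the kubebuilder markers are omitted, and marshalModelHub is an illustrative helper showing how unset fields drop out of the serialized spec:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Sketch of the ModelHub pattern fields as slices, per the review suggestion.
type ModelHub struct {
	Revision       *string  `json:"revision,omitempty"`
	AllowPatterns  []string `json:"allowPatterns,omitempty"`
	IgnorePatterns []string `json:"ignorePatterns,omitempty"`
}

// marshalModelHub serializes the spec; nil fields are dropped by omitempty.
func marshalModelHub(h ModelHub) string {
	b, _ := json.Marshal(h)
	return string(b)
}

func main() {
	fmt.Println(marshalModelHub(ModelHub{AllowPatterns: []string{"*.gguf", "*.json"}}))
	// {"allowPatterns":["*.gguf","*.json"]}
}
```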

llmaz/model_loader/constant.py (thread resolved)
revision=revision,
).add_done_callback(handle_completion)
)
if filename:


Let's keep it simpler here: if filename is set, allow_patterns will be just the filename. Actually, the OP isn't quite accurate, since we may have patterns like *.json.

And we should add validation in the webhook: once filename is set, both patterns should be nil.
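The proposed webhook rule can be sketched in plain Go. validateModelHub is a hypothetical helper for illustration; the real webhook would return a field.ErrorList rather than strings:

```go
package main

import "fmt"

// validateModelHub enforces the proposed rule: once filename is set,
// allowPatterns and ignorePatterns must be left unset.
func validateModelHub(filename *string, allowPatterns, ignorePatterns []string) []string {
	var errs []string
	if filename == nil {
		return errs
	}
	if len(allowPatterns) != 0 {
		errs = append(errs, "once filename is set, allowPatterns must be unset")
	}
	if len(ignorePatterns) != 0 {
		errs = append(errs, "once filename is set, ignorePatterns must be unset")
	}
	return errs
}

func main() {
	f := "qwen2-0_5b-instruct-q5_k_m.gguf"
	// filename plus a pattern is rejected; patterns alone are fine.
	fmt.Println(len(validateModelHub(&f, []string{"*.gguf"}, nil)))
	fmt.Println(len(validateModelHub(nil, []string{"*"}, []string{"*.gguf"})))
}
```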

revision=revision,
).add_done_callback(handle_completion)
)
if filename:


The same here.

coreapi "github.com/inftyai/llmaz/api/core/v1alpha1"
"github.com/inftyai/llmaz/pkg/util"
corev1 "k8s.io/api/core/v1"
"strings"


Let's make sure the imports are grouped like this:

import (
    // the base libs Go provides
    "fmt"
    "strings"

    // third-party libs
    corev1 "k8s.io/api/core/v1"

    // project libs
    coreapi "github.com/inftyai/llmaz/api/core/v1alpha1"
)

@@ -59,12 +60,28 @@ type ModelSourceProvider interface {

func NewModelSourceProvider(model *coreapi.OpenModel) ModelSourceProvider {
if model.Spec.Source.ModelHub != nil {

// fileName should not in ignorePatterns
if model.Spec.Source.ModelHub.Filename != nil && model.Spec.Source.ModelHub.IgnorePatterns != nil {


Can we verify this in the webhook? As I mentioned, once filename is set, the other patterns should not be configured anyway; they would be useless.

}
if p.modelIgnorePatterns != nil {
initContainer.Env = append(initContainer.Env,
corev1.EnvVar{Name: "MODEL_IGNORE_PATTERNS", Value: *p.modelIgnorePatterns},


Isn't this a string value? We want Optional[List[str]] on the Python side; will this work?

I just want to highlight that we hardcoded the loader image name in pkg/defaults.go, so only part of what we changed on the Python side will be tested:

  • what we have tested: that the refactors on the controller side are correct
  • what we haven't tested: the newly added patterns

So you may have to do some extra steps to make sure the code is working:

  • make loader-image-load -> to build the right loader image
  • replace the image name in pkg/defaults.go
  • run the e2e tests (you may need to run kind load image to load the image, since you didn't push to the remote registry)

I'm sorry for the tedious steps, but I haven't found a better way right now.

@qinguoyi

We may also need an e2e test to make sure the pattern works as expected, but it can be a follow-up until I publish an official image tag for model-builder.

Thanks for your review. Please hold this; I will retest the complete process, probably after National Day of the People's Republic of China.

@kerthcet

Kind ping @qinguoyi: we'd like to publish a new release and would be happy to include this feature. Are you still working on this?

@qinguoyi

Kind ping @qinguoyi: we'd like to publish a new release and would be happy to include this feature. Are you still working on this?

Yes, I will finish this work by Oct 20.

@qinguoyi qinguoyi force-pushed the feat-update-modelloader branch 2 times, most recently from ff79237 to 10bee4c Compare October 18, 2024 08:38
@qinguoyi

qinguoyi commented Oct 18, 2024

Hi, I have finished this work. PTAL. @kerthcet

------- What changes: ---------

  1. The allowPatterns and ignorePatterns fields are changed to slices.
     The pattern fields configured in YAML are joined into a single string with ',' and put into an environment variable.
     After the model-loader reads it, the value is split on ',' to recover the list.

  2. Add validation in the webhook: once the filename field is set, the allowPatterns and ignorePatterns fields must be nil.

  3. If the filename is set, in the model-loader the allow_patterns will be the filename.

  4. Add some unit tests for the pattern fields.
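The comma round-trip described in item 1 can be sketched as follows; ENV_SEPARATOR and the helper names are illustrative, not the PR's actual identifiers. Joining on ',' is safe here because glob patterns do not normally contain commas:

```python
from typing import List, Optional

ENV_SEPARATOR = ","  # assumption: the PR joins patterns with ','

def patterns_to_env(patterns: Optional[List[str]]) -> Optional[str]:
    # Controller side: join the YAML slice into one env var value.
    if not patterns:
        return None
    return ENV_SEPARATOR.join(patterns)

def env_to_patterns(value: Optional[str]) -> Optional[List[str]]:
    # Loader side: split the env var back into the Optional[List[str]]
    # that huggingface_hub expects.
    if not value:
        return None
    return value.split(ENV_SEPARATOR)
```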

------- What tests: -----------

llama.cpp inference with a Hugging Face model:

  1. Specifying filename: this scenario works as expected.
     (screenshot)
  2. llama.cpp inference needs the file path rather than the file directory, so test by specifying allowPatterns and ignorePatterns and checking which files are downloaded; this scenario works as expected.

apiVersion: llmaz.io/v1alpha1
kind: OpenModel
metadata:
  name: qwen2-0--5b-gguf
spec:
  familyName: qwen2
  source:
    modelHub:
      modelID: Qwen/Qwen2-0.5B-Instruct-GGUF
      allowPatterns:
        - "*"
      ignorePatterns:
        - "*.gguf"

     (screenshot)
  3. Specifying filename together with allowPatterns or ignorePatterns is rejected; this scenario works as expected.
     (screenshot: specify-filename-allowpattern)

@qinguoyi qinguoyi force-pushed the feat-update-modelloader branch 4 times, most recently from b4acd87 to 380f7f7 Compare October 20, 2024 14:51
@kerthcet

I will take a look today.

@kerthcet

On it now; sorry for the late response, I was busy with other things yesterday.


@kerthcet kerthcet left a comment


Thanks @qinguoyi this is awesome!

I just left some nits; the rest looks great to me. Once we merge this PR, I'll update the official model-loader image.

if model.Spec.Source.ModelHub.AllowPatterns != nil && len(model.Spec.Source.ModelHub.AllowPatterns) != 0 {
allErrs = append(allErrs, field.Invalid(sourcePath.Child("modelHub.allowPatterns"), model.Spec.Source.ModelHub.AllowPatterns, "Once Filename is set, allowPatterns should be nil"))
}
if model.Spec.Source.ModelHub.IgnorePatterns != nil && len(model.Spec.Source.ModelHub.IgnorePatterns) != 0 {


ditto

},
failed: false,
}),
ginkgo.Entry("set filename and allowPatterns when modelHub is Huggingface", &testValidatingCase{


Let's add one more case where both allowPatterns and ignorePatterns are set.

@@ -49,6 +49,12 @@ type ModelHub struct {
// +kubebuilder:default=main
// +optional
Revision *string `json:"revision,omitempty"`
// AllowPatterns refers to only files matching at least one pattern are downloaded.


Nit:

Suggested change
// AllowPatterns refers to only files matching at least one pattern are downloaded.
// AllowPatterns refers to files matched with at least one pattern will be downloaded.

// AllowPatterns refers to only files matching at least one pattern are downloaded.
// +optional
AllowPatterns []string `json:"allowPatterns,omitempty"`
// IgnorePatterns refers to files matching any of the patterns are not downloaded.


Suggested change
// IgnorePatterns refers to files matching any of the patterns are not downloaded.
// IgnorePatterns refers to files matched with any of the patterns will not be downloaded.

@@ -118,5 +118,15 @@ func (w *OpenModelWebhook) generateValidate(obj runtime.Object) field.ErrorList
allErrs = append(allErrs, field.Invalid(sourcePath.Child("modelHub.filename"), *model.Spec.Source.ModelHub.Filename, "Filename can only set once modeHub is Huggingface"))
}
}

if model.Spec.Source.ModelHub != nil && model.Spec.Source.ModelHub.Filename != nil {
if model.Spec.Source.ModelHub.AllowPatterns != nil && len(model.Spec.Source.ModelHub.AllowPatterns) != 0 {


Suggested change
if model.Spec.Source.ModelHub.AllowPatterns != nil && len(model.Spec.Source.ModelHub.AllowPatterns) != 0 {
if model.Spec.Source.ModelHub.AllowPatterns != nil {

@@ -49,6 +49,12 @@ type ModelHub struct {
// +kubebuilder:default=main
// +optional
Revision *string `json:"revision,omitempty"`


Let's update the comment of Filename as well,

	// Filename refers to a specified model file rather than the whole repo.
	// This is helpful to download a specified GGUF model rather than downloading
	// the whole repo which includes all kinds of quantized models.
	// TODO: this is only supported with Huggingface, add support for ModelScope
	// in the near future.
        // Note: once filename is set, allowPatterns and ignorePatterns should be left unset.

@kerthcet

I just tested with the model yaml:

apiVersion: llmaz.io/v1alpha1
kind: OpenModel
metadata:
  name: qwen2-0--5b-gguf
spec:
  familyName: qwen2
  source:
    modelHub:
      modelID: Qwen/Qwen2-0.5B-Instruct-GGUF
      # filename: qwen2-0_5b-instruct-q5_k_m.gguf
      allowPatterns:
      - qwen2-0_5b-instruct-q5_k_m.gguf

It failed just because we didn't specify the model file name; do you see the same? I'm OK with that, because we suggest people use filename rather than patterns when only one file is needed.

The error looks like:

llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model '/workspace/models/models--Qwen--Qwen2-0.5B-Instruct-GGUF'
INFO [                    main] HTTP server is listening | tid="139916821146176" timestamp=1729659912 n_threads_http="7" port="8080" hostname="0.0.0.0"
INFO [                    main] loading model | tid="139916821146176" timestamp=1729659912 n_threads_http="7" port="8080" hostname="0.0.0.0"
 ERR [              load_model] unable to load model | tid="139916821146176" timestamp=1729659912 model="/workspace/models/models--Qwen--Qwen2-0.5B-Instruct-GGUF"
 ERR [                    main] exiting due to model loading error | tid="139916821146176" timestamp=1729659912

@qinguoyi

> I just tested with the model yaml: … It failed just because we didn't specify the model file name; do you see the same?

Yes, I see the same.

To run inference with llama.cpp, you need to specify the file path, not the directory path, so specifying the directory path fails.

Also, as we can see, when filename is not set, the model path will be the directory. I noticed this during development, but I have no good idea how to handle it.

What do you think?

// Example 1:
// - modelID: facebook/opt-125m
// modelPath: /workspace/models/models--facebook--opt-125m
//
// Example 2:
// - modelID: Qwen/Qwen2-0.5B-Instruct-GGUF
// fileName: qwen2-0_5b-instruct-q5_k_m.gguf
// modelPath: /workspace/models/qwen2-0_5b-instruct-q5_k_m.gguf
func (p *ModelHubProvider) ModelPath() string {
if p.fileName != nil {
return CONTAINER_MODEL_PATH + *p.fileName
}
return CONTAINER_MODEL_PATH + "models--" + strings.ReplaceAll(p.modelID, "/", "--")
}

@kerthcet

Let's keep it simple at first; we can evolve in the future. The idea is:

  • If people want to deploy with one file, use filename and leave the patterns empty; we'll combine the repoName + fileName
  • If people want to use the patterns, let's load the whole repo in the runtime

Just as we do today.

@kerthcet

Some kind tips: let's address review comments with new commits, so the reviewers can find the appended changes and understand what's going on; we can squash the commits at the last minute.

@qinguoyi

Some kind tips: let's address review comments with new commits, so the reviewers can find the appended changes and understand what's going on; we can squash the commits at the last minute.

Thanks for your kind tips; I have reverted the squashed commits. PTAL once more.

@qinguoyi

qinguoyi commented Oct 23, 2024

Let's keep it simple at first; we can evolve in the future. The idea is:

  • If people want to deploy with one file, use filename and leave the patterns empty; we'll combine the repoName + fileName
  • If people want to use the patterns, let's load the whole repo in the runtime

Just as we do today.

So, the current code does not need to be changed.

In addition, we need to give user-friendly tips in the documentation on using the pattern fields.

@@ -147,6 +147,12 @@ var _ = ginkgo.Describe("model default and validation", func() {
},
failed: true,
}),
ginkgo.Entry("set filename, allowPatterns and ignorePatterns when modelHub is Huggingface", &testValidatingCase{


What I mean is adding a case with a model repo where allowPatterns and ignorePatterns are both set but filename is empty, to make sure we will not fail.

@kerthcet

Squash as well; I'm LGTM on the other parts.

@kerthcet

/lgtm
/approve

Great work!

@InftyAI-Agent InftyAI-Agent added lgtm Looks good to me, indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Oct 23, 2024
@InftyAI-Agent InftyAI-Agent merged commit bd8e398 into InftyAI:main Oct 23, 2024
18 of 19 checks passed