Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Schema Validation Fix #167

Merged
merged 10 commits into from
May 7, 2021
Merged

Schema Validation Fix #167

merged 10 commits into from
May 7, 2021

Conversation

AdheipSingh
Copy link
Contributor

@AdheipSingh AdheipSingh commented Apr 16, 2021

Fixes #152

  • I did try to re-create issues as faced here Wrong pvc for middle manager  #135 (comment)

  • I am able to run this with tiny-cluster.yaml, and dont see any issues, my k8s version 1.20.

  • Can someone confirm if this does solve there problem, as i was able to run this without any issues, though not able to re-create the validation issue.

@dsx @lum-splunk @y25ean @hugo-costa-tek

Keeping this Draft as of now.
cc @himanshug

Description


This PR has:

  • been tested on a real K8S cluster to ensure creation of a brand new Druid cluster works.
  • been tested for backward compatibility on a real K*S cluster by applying the changes introduced here on an existing Druid cluster. If there are any backward incompatible changes then they have been noted in the PR description.
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added documentation for new or modified features or behaviors.

Key changed/added files in this PR
  • MyFoo
  • OurBar
  • TheirBaz

@AdheipSingh AdheipSingh marked this pull request as draft April 16, 2021 13:06
@himanshug
Copy link
Member

@AdheipSingh did you re-produce the issue before this patch, and checked that this patch indeed fixed the issue?

@AdheipSingh
Copy link
Contributor Author

AdheipSingh commented Apr 21, 2021

@himanshug i dint not get the error as the other users mentioned, i got an error on deleteOrphanPVC key which i fixed, also the file size went over a certain bytes, so when applying the server gave me an error, so to fix that i added this https://github.com/druid-io/druid-operator/pull/167/files#diff-76ed074a9305c04054cdebb9e9aad2d818052b07091de1f20cad0bbac34ffb52R5 ( basically this removes the description key from the crd )
I tested with tiny-cluster.yaml for k8s version 1.20

ok ill try another round to see if i can re-create this. and update :)

@y25ean
Copy link

y25ean commented Apr 21, 2021

I tested this pull request and still face the same issue. Looking at the #135 comment, I can not seem to find this in the file druid.apache.org_druids.yaml, is this where the fix needed to be added?

@AdheipSingh
Copy link
Contributor Author

AdheipSingh commented Apr 21, 2021

@y25ean can you share your yaml.
I use the tiny-cluster.yaml , can you also share k8s version

@y25ean
Copy link

y25ean commented Apr 21, 2021

@AdheipSingh I am using this tiny-cluster.yaml. The only thing changed from the one supplied is the storage class name. I am also running it on k8s version 1.20. In the original pull request #152 that fixes this issue I can see the CRDs file is updated with the code mentioned in the #135 comment , however I can not see it in the branch that is linked to this pull request.

tiny-cluster.txt

@AdheipSingh
Copy link
Contributor Author

@y25ean thanks, i will run few more tests and get back with the updates.

@AdheipSingh
Copy link
Contributor Author

i fail to re-produce the error, my PR works fine for the tiny-cluster.yaml, i too just updated my storage class.
@y25ean can you make sure you used this https://github.com/AdheipSingh/druid-operator/blob/schema/deploy/crds/druid.apache.org_druids.yaml

adheip@singh:~/druid-operator/deploy/crds$ kubectl create -f druid.apache.org_druids.yaml 
customresourcedefinition.apiextensions.k8s.io/druids.druid.apache.org created
adheip@singh:~/druid-operator/deploy/crds$ kubectl create -f test.yaml 
druid.druid.apache.org/tiny-cluster created
adheip@singh:~/druid-operator/deploy/crds$ kubectl get pods 
NAME                                READY   STATUS              RESTARTS   AGE
druid-tiny-cluster-brokers-0        0/1     ContainerCreating   0          4s
druid-tiny-cluster-coordinators-0   0/1     ContainerCreating   0          4s
druid-tiny-cluster-routers-0        0/1     ContainerCreating   0          3s

cc @himanshug

@y25ean
Copy link

y25ean commented Apr 22, 2021

@AdheipSingh The pod that the persistent volume attaches too is the historicals, which you can not see in the list you get by running kubectl get pods. To see the error you need to run the command kubectl describe sts druid-tiny-cluster-historicals as the historical pods don't get to the stage of ContainerCreating like the others. Could please have another look to see if you see the error with the command given. Thanks

@AdheipSingh
Copy link
Contributor Author

AdheipSingh commented Apr 22, 2021

i tried all routes, fail to understand what is causing this. I am getting the same sts error, this error goes off if there is no validation.
What i can see is that the OPENAPI passes on the schema but does not include the name parameter and the operator just get an empty name - and it adds the nodeSpecUniquestring which is not appropriate.

I have even tried updating the go mod nothing worked though

The crd defination is not passing name field in the metadata
image

I wonder what // +kubebuilder tag can add this name #152 as mentioned here, if we are able to find which tag is responsible it should be good.

I have no clue, spent hours on this. :(

Will try again and see, but for now clueless.
Any hints suggestions would be helpfull.
cc @himanshug @y25ean

@himanshug
Copy link
Member

we could go back to not having openapi spec validation by documenting users to use crd spec without it, just so that the bug is "fixed"

@AdheipSingh
Copy link
Contributor Author

AdheipSingh commented Apr 26, 2021

@himanshug i believe for now we can do this.

I think we might be missing out something on change of local kubebuilder version, controller-gen, controller runtime or k8s 1.20.
Not sure, ill try take this up again as we upgrade the controller-runtime inhouse and check on this. again.
ill omit this file in this PR. At least we could have a release since this might be a blocker.

This requires change in helm crd too

cc @y25ean

@y25ean
Copy link

y25ean commented Apr 26, 2021

I was able to fix this issue by using the CRD supplied in the pull request #152. The PVC for the Historicals work fine and get attached no problem. Only issue I subsequently had was that Druid needed to run as root to be able to access the volume attached. Could you try it CRD file mentioned and see where you get to?

@AdheipSingh
Copy link
Contributor Author

@y25ean i agree that #152 solves the issue. My only concern is the validation is a generated file, and any manual changes might be hard to track.

@y25ean
Copy link

y25ean commented May 4, 2021

@AdheipSingh I don't have a lot of experience with Kubernetes or CRD's, however here is where I have got to regarding this issue.

I went looking for any issues surrounding PVC and / or metadata objects not being generated and have found that the controller-tools, which is used to generate the CRD, has issues:

This issue has been merged into their master branch but unfortunately has not been fully released. Fortunately, they have a beta release and I was able to test it and it seems to work for me.

There was two changes that I had to make in the Makefile. First is the URL of the controller-gen was changed from sigs.k8s.io/controller-tools/cmd/controller-gen@v0.4.1 to sigs.k8s.io/controller-tools/cmd/controller-gen@v0.6.0-beta.0. Second was the add the CRD option generateEmbeddedObjectMeta=true to the list of CRD_OPTIONS

If you could test this fix and comment on how we want to proceed with either waiting for a new release of controller-tools or go with the beta.

@AdheipSingh
Copy link
Contributor Author

@y25ean appreciate your help. I see, this definitely needed to be a bug on the generation.
I'll test out the approach and get back by today itself. Thanks a lot for this info again.

@AdheipSingh
Copy link
Contributor Author

AdheipSingh commented May 4, 2021

i have pushed the changes, bumped the helm chart version and update the crd defination there too.
The beta version changes work fine for me, issue resolved, tested on k8s version 1.20. As far i am concerned beta seems fine for controller gen, we can update to GA once released.
Thanks sean for the help
cc @himanshug @y25ean

@AdheipSingh AdheipSingh marked this pull request as ready for review May 4, 2021 22:59
@himanshug
Copy link
Member

LGTM , can you remove the vendor'd files please?

@AdheipSingh
Copy link
Contributor Author

oh sorry i though we are still pushing vendor :)
Removed vendor @himanshug

@himanshug himanshug merged commit 051776a into druid-io:master May 7, 2021
@zhangluva zhangluva mentioned this pull request Dec 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants