Pod fails to start after modifying the Nexus resource #191
Comments
Hi @bszeti! Thanks for filing this bug. I'll take a look at it today. That one could be tricky since we have some use cases that …
Hey @bszeti, thanks for raising the issue. As @ricardozanini mentioned, … @bszeti, can you please tell us how you installed the operator and which version you're running? Along with that, please also share the output from:
Be sure to run it in the project where the failing pod is. I'll try to replicate the issue, but so far here's what I have, installing v0.4.0 via OLM. First, let's install OLM and the operator via OLM:
So far so good. Let's instantiate a Nexus CR using the sample from OperatorHub:
All good. Now let's change the memory limit to 2512 MiB:
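A sketch of what that edit might look like in the Nexus CR. The `apps.m88i.io/v1alpha1` API group, the resource name, and the `resources` field (mirroring Kubernetes resource requirements) are assumptions based on the OperatorHub sample, not verified here:

```yaml
apiVersion: apps.m88i.io/v1alpha1   # assumed API group for the Nexus CRD
kind: Nexus
metadata:
  name: nexus3                      # hypothetical name for illustration
spec:
  replicas: 1
  resources:
    limits:
      memory: 2512Mi                # bumped from the sample's 2Gi limit
```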
The newer deployment faced no issues, and the previous one was terminated as expected once the newer one was available. Of course, this is not OpenShift. I can't test it on OpenShift myself, but we'll see if we can test it there as well. @ricardozanini @Kaitou786, if either of you could try replicating the issue on OCP, that would be great :-)
@LCaparelli I believe he also has a PV updated with some information. That way, Nexus will lock the data directory, preventing the rollingupdate. If this is the case, we must change to "Recreate", since even for updating we won't be able to do it with the data folder locked. Or at least, signal to the server to unlock the data directory before performing the update. |
@ricardozanini So if I enable persistence I should run into this issue, right? Let me give that a swing.
Ah yes, indeed. I have reproduced the same issue simply by using … At the moment no action is requested from you, @bszeti; thanks again for reporting it. :-) Let's enable persistence and go back to the 2 GiB limit:
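A sketch of what enabling persistence might look like in the CR. The `persistence` block and its field names are assumptions based on the operator's samples, not verified against the CRD:

```yaml
apiVersion: apps.m88i.io/v1alpha1   # assumed API group for the Nexus CRD
kind: Nexus
metadata:
  name: nexus3                      # hypothetical name for illustration
spec:
  persistence:
    persistent: true                # assumed field: back /nexus-data with a PV
    volumeSize: 10Gi
  resources:
    limits:
      memory: 2Gi                   # back to the original limit
```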
And then change the memory limit:
Hi, thanks for looking into this. Yes, the issue only shows up if you use persistence. Nexus has a lock file, so the new Pod can't start while the old one is still running. (By the way, isn't this a problem if the number of Nexus replicas is greater than one?) Install the operator:
Install Nexus:
Fix #191 - Changing deployment strategy to recreate and setting replicas to max 1
Signed-off-by: Ricardo Zanini <zanini@redhat.com>
Fix #191 - Changing deployment strategy to recreate and setting replicas to max 1 (#192)
* Fix #191 - Changing deployment strategy to recreate and setting replicas to max 1
  Signed-off-by: Ricardo Zanini <zanini@redhat.com>
* reverting back openapi gen files
  Signed-off-by: Ricardo Zanini <zanini@redhat.com>
* Apply suggestions from code review
  Co-authored-by: Lucas Caparelli <lucas.caparelli112@gmail.com>
* Move mutability to defaults
  Signed-off-by: Ricardo Zanini <ricardozanini@gmail.com>
Co-authored-by: Lucas Caparelli <lucas.caparelli112@gmail.com>
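The merged fix (#192) switches the Deployment strategy to Recreate and caps replicas at 1. A minimal Go sketch of that defaulting logic (hypothetical function and names, not the operator's actual code):

```go
package main

import "fmt"

// ensureSafeDeploymentSettings mirrors the idea behind the fix in #192:
// cap replicas at 1 and force the Recreate strategy, because Nexus takes
// an exclusive lock on /nexus-data and two pods cannot share that volume.
// Hypothetical sketch, not the operator's actual code.
func ensureSafeDeploymentSettings(replicas int32) (int32, string) {
	if replicas > 1 {
		replicas = 1
	}
	return replicas, "Recreate"
}

func main() {
	replicas, strategy := ensureSafeDeploymentSettings(3)
	fmt.Println(replicas, strategy)
}
```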
Describe the bug
Modifying the Nexus resource triggers a new Deployment, but the new Pod can't start (CrashLoopBackOff) because the previous one is still holding /nexus-data/lock. The problem is probably caused by the Deployment using spec.strategy=RollingUpdate. Using "Recreate" may help, so the previous Nexus instance is shut down before the new one is created.
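The suggested change can be sketched as a plain Kubernetes Deployment (a generic illustration of the Recreate strategy, not the operator's actual manifest; names and image are hypothetical):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nexus3        # hypothetical name for illustration
spec:
  replicas: 1         # Nexus holds an exclusive lock on /nexus-data
  strategy:
    type: Recreate    # terminate the old pod before starting the new one,
                      # so /nexus-data/lock is released first
  selector:
    matchLabels:
      app: nexus3
  template:
    metadata:
      labels:
        app: nexus3
    spec:
      containers:
        - name: nexus
          image: sonatype/nexus3
```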
To Reproduce
Steps to reproduce the behavior:
Expected behavior
The Deployment is successfully rolled out.
Environment
OpenShift 4.6.5
Client Version: 4.4.30
Server Version: 4.6.5
Kubernetes Version: v1.19.0+9f84db3