Poll workers of a partitioned step with a database query #4705
base: main
Conversation
Have you tried Spring Batch 5.1.2 or Spring Batch 5.0.6? Both contain a fix for #4598, which looks to me like the same issue as #3790. It would be very helpful to see how the fix for #4598 affects your application. Does it solve your problem? Does it improve the situation a little? Does it not help at all?
The main problem is that all steps are loaded into memory to check the step result. If there are many steps (we have thousands), this leads to an OutOfMemory error.
I feel that your response is evading a reply to my question whether you have tried the most recent fix. Please give the fix in Spring Batch 5.1.2 or 5.0.6 an honest try, if you haven't done so. I'd really appreciate the feedback. You are right that the implementation calls [...]. The reason I'm persisting is this: your proposed implementation of [...]
Perhaps I didn’t express myself entirely clearly. The main problem is that all steps are loaded into memory to check the step result every 10,000 ms.
In other words, this is done regularly at short intervals. With large long-running jobs, we encounter an OOM error because the memory keeps filling up with these steps every 10,000 ms.
You’re right, we have enough memory to load all steps from the database if it’s done as a one-time action (as in my fix). But in the current implementation, we do this repeatedly, on every poll interval.
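To make the shape of the problem concrete, here is a minimal sketch of what one in-memory poll tick looks like, written against the public `JobExplorer` API. It is not the actual Spring Batch polling code and the class name is made up; it only illustrates why every tick re-materializes all worker `StepExecution`s when only their status is needed.

```java
import java.util.Collection;

import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.explore.JobExplorer;

/**
 * Illustration only: one poll tick of an in-memory status check. With
 * thousands of worker steps and a short interval, every tick re-hydrates
 * every StepExecution just to look at its status, which is what keeps
 * filling the heap.
 */
public class InMemoryPollTickSketch {

    private final JobExplorer jobExplorer;

    public InMemoryPollTickSketch(JobExplorer jobExplorer) {
        this.jobExplorer = jobExplorer;
    }

    boolean anyWorkerStillRunning(long jobExecutionId) {
        // Loads the whole object graph: the job execution plus all of its step executions.
        JobExecution jobExecution = jobExplorer.getJobExecution(jobExecutionId);
        if (jobExecution == null) {
            return false;
        }
        Collection<StepExecution> stepExecutions = jobExecution.getStepExecutions();
        // Only the status is needed, yet every StepExecution was materialized.
        return stepExecutions.stream().anyMatch(se -> se.getStatus().isRunning());
    }
}
```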
At the end of the day, I'm not the final arbiter of this discussion as I'm just a contributor and not a maintainer of the project. So feel free to discard what I'm saying. The reproducing example from #4598 reduced the poll interval from the default 10,000 ms to just 2 ms. And the fix that is included in the latest Spring Batch 5.x releases has been successfully tested against this reproducer. I understand that you are very much invested in your proposed solution. But it's a breaking change in public APIs, which would potentially lead to migration efforts for other community members. I'd still be happy to support you, if you publish a minimal complete reproducible example against the latest 5.x releases.
@Ichanskiy @hpoettker Thank you both for your constructive feedback towards each other's comments! It is really a pleasure to see such interesting discussions held in a constructive way.

I understand the issue, and my opinion is that the final arbiter is pragmatism and common sense. As mentioned in my previous comment #3790 (comment), any computation that can be done at the database level should be done at the database level, unless there is a very good reason not to do so.

I think it is important to consider the past context in order to explain the reason behind previous fixes and also to make the best decision now. When the polling performance issue was first reported, we did our best to fix the memory leak in a backward-compatible way in 93800c6 (many thanks to @hpoettker for that!). At that time, we had to do it that way because we were required to backport the patch to 5.0.x. A change set like the one in this PR is clearly breaking and cannot be accepted in a patch version, not even in a minor version (exceptions can be made if we manage to provide default implementations for new methods added to public interfaces).

That said, and based on facts and numbers, I believe that the performance issue was resolved for a single partitioned step. Now if someone runs a thousand partitioned steps in the same JVM with a short polling interval, I think everyone would agree that this becomes out of scope for Spring Batch. So it seems like the OOM discussed here is due to the deployment anti-pattern of running several batch jobs in the same JVM.

Again, ideally the count should be done with a single query on the database side, and we are not there yet as of 5.1.2. While this PR goes in the right direction, it is still not the optimal way (in my opinion) to achieve that. Why is there a new method to get step executions by a given list of statuses, and not something like a count query?

Anyway, I did not review this PR in detail until now, and even though I wanted to include it in 5.2, unfortunately this won't be possible given the breaking changes and API design considerations that need to be discussed further. Now in the meantime, and if it were up to me, I would create a custom extension to do that count on the database side as a temporary solution.

Now that said, I see there are currently three open PRs for this request and we need to wrap things up.
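For anyone who needs relief before an official solution is released, here is one possible shape of such an application-side workaround. It is a hypothetical sketch, not Spring Batch API: it assumes the stock relational schema (`BATCH_STEP_EXECUTION`), a worker step naming pattern, and intervals chosen by the caller, and it replaces the per-poll object loading with a single scalar query.

```java
import java.time.Duration;
import java.time.Instant;

import javax.sql.DataSource;

import org.springframework.jdbc.core.JdbcTemplate;

/**
 * Hypothetical temporary workaround, not part of Spring Batch: waits for the
 * worker steps of one job execution by issuing a single COUNT query per poll
 * instead of loading StepExecutions into memory. Table and column names come
 * from the stock Spring Batch schema scripts; the step name pattern is an
 * assumption about how worker partitions are named in the application.
 */
public class DatabaseWorkerPoller {

    private final JdbcTemplate jdbcTemplate;

    public DatabaseWorkerPoller(DataSource dataSource) {
        this.jdbcTemplate = new JdbcTemplate(dataSource);
    }

    /**
     * Blocks until no matching worker step is in a running state any more,
     * or until the timeout elapses. Returns true if all workers finished.
     */
    public boolean awaitWorkers(long jobExecutionId, String workerStepNamePattern,
            Duration pollInterval, Duration timeout) throws InterruptedException {
        Instant deadline = Instant.now().plus(timeout);
        while (Instant.now().isBefore(deadline)) {
            // Simplification: STARTING and STARTED are treated as "still running".
            Long running = jdbcTemplate.queryForObject(
                    "SELECT COUNT(*) FROM BATCH_STEP_EXECUTION "
                            + "WHERE JOB_EXECUTION_ID = ? AND STEP_NAME LIKE ? "
                            + "AND STATUS IN ('STARTING', 'STARTED')",
                    Long.class, jobExecutionId, workerStepNamePattern);
            if (running == null || running == 0) {
                return true;
            }
            Thread.sleep(pollInterval.toMillis());
        }
        return false; // timed out while workers were still running
    }
}
```

A bean like this could be called wherever the application currently waits for the workers of a partitioned step, keeping the change outside the framework until the API discussion here is settled.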
Fix OutOfMemory issue by optimizing step result checks. Previously, thousands of StepExecutions were loaded into memory every 10,000 ms, causing memory overload. There’s no point in loading all StepExecutions into memory to check the running status at short intervals.
We've had this code in production for over a year now, and we no longer get any OutOfMemory errors; it works! However, with each Spring Boot upgrade in our services, we have to patch Spring Batch with this fix, which has been a constant inconvenience: every time we have to make the `.patch`, deploy the artifact, etc.

That's why we've created this pull request, which essentially addresses the same problem as PR 3791. Although that PR was approved, it cannot be merged due to conflicts. @fmbenhassine asked that the conflicts be resolved here, but this was not done. In my pull request, I've included the code with the latest changes.
I would greatly appreciate it if you could review my pull request, so we can merge it and permanently resolve the OutOfMemory issue.
Closes #3790