Dealing with optimizer-server zombie processes #622

Open
wants to merge 2 commits into
base: main


coder-y04

Problem Description

Following the process outlined in optimize_nydus_image, we set up an optimizer to generate the accessed files list, deploying optimizer-server and optimizer-nri-plugin on the server. After some time, we noticed that some optimizer-server processes had become defunct (zombies), as shown in the screenshot below:
image
The parent process of every one of these zombies is optimizer-nri-plugin. We have verified that the issue occurs on every server where we deployed the optimizer.
If these zombie processes are left in place, they will keep accumulating, each one holding a process-table entry, and will eventually affect the normal operation of our services.

Modified Sections

We found that the issue comes from two places:

  1. When optimizer-nri-plugin performs the fanotifyServer operation, it invokes the optimizer-server executable, which creates a child process. The plugin must wait for this child to exit so that it is reaped and its resources are released.
  2. While optimizer-server is running, it also creates a child process for the fanotify operation. It likewise must wait for that child to exit and clean up its resources.
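
To illustrate the first point, here is a minimal Go sketch of starting a long-running child and reaping it in the background. It is not the code from this PR: the helper name startServer is hypothetical, and the sleep command merely stands in for the optimizer-server binary. It relies only on os/exec, where Cmd.Wait is the call that collects the exit status of a child started with Cmd.Start.

```go
package main

import (
	"fmt"
	"os/exec"
)

// startServer (hypothetical helper) starts a child process and reaps it in a
// background goroutine. The returned channel receives the result of Wait once
// the child has exited.
func startServer(name string, args ...string) (*exec.Cmd, <-chan error, error) {
	cmd := exec.Command(name, args...)
	if err := cmd.Start(); err != nil {
		return nil, nil, fmt.Errorf("start %s: %w", name, err)
	}

	done := make(chan error, 1)
	go func() {
		// Wait collects the child's exit status. If it is never called,
		// an exited child remains <defunct> for as long as the parent lives.
		done <- cmd.Wait()
	}()
	return cmd, done, nil
}

func main() {
	// "sleep" is a stand-in for the real server binary (assumption).
	_, done, err := startServer("sleep", "0.1")
	if err != nil {
		fmt.Println("failed to start child:", err)
		return
	}
	if err := <-done; err != nil {
		fmt.Println("child exited with error:", err)
		return
	}
	fmt.Println("child exited and was reaped")
}
```

Waiting in a goroutine keeps the caller unblocked while the server runs, and the buffered channel lets the goroutine finish even if nobody reads the result.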

Validation Results

We deployed the modified functionality on four servers for observation. After a full day of monitoring, no zombie processes were observed.

Testing Method

The modified code was recompiled, and the newly generated optimizer-server and optimizer-nri-plugin binaries replaced the existing ones.
The process status was checked using the command ps -ef | grep optimizer to verify that no zombie processes had appeared.
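
The same check can be automated. The sketch below filters `ps -ef` output for lines that mention a given name and carry the `<defunct>` marker that ps prints for zombie processes. The function name defunctLines and the sample output are made up for illustration and are not taken from our servers.

```go
package main

import (
	"fmt"
	"strings"
)

// defunctLines (hypothetical helper) returns the lines of `ps -ef` output
// that contain name and are marked <defunct>.
func defunctLines(psOutput, name string) []string {
	var out []string
	for _, line := range strings.Split(psOutput, "\n") {
		if strings.Contains(line, name) && strings.Contains(line, "<defunct>") {
			out = append(out, strings.TrimSpace(line))
		}
	}
	return out
}

func main() {
	// Illustrative sample only; PIDs, times, and paths are invented.
	sample := `root  1201     1  0 10:00 ?  00:00:01 /usr/local/bin/optimizer-nri-plugin
root  1342  1201  0 10:05 ?  00:00:00 [optimizer-serve] <defunct>
root  1400  1201  0 10:06 ?  00:00:00 /usr/local/bin/optimizer-server`

	zombies := defunctLines(sample, "optimizer")
	fmt.Printf("found %d defunct process(es)\n", len(zombies))
	for _, z := range zombies {
		fmt.Println(z)
	}
}
```

In a real check the sample string would be replaced by the captured output of `ps -ef`; an empty result corresponds to what we observed after the fix.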


codecov bot commented Nov 14, 2024

Codecov Report

Attention: Patch coverage is 0% with 4 lines in your changes missing coverage. Please review.

Project coverage is 21.29%. Comparing base (29243e3) to head (917375e).
Report is 15 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pkg/fanotify/fanotify.go | 0.00% | 4 Missing ⚠️ |

@@            Coverage Diff             @@
##             main     #622      +/-   ##
==========================================
- Coverage   21.93%   21.29%   -0.64%     
==========================================
  Files         122      122              
  Lines       10839    13682    +2843     
==========================================
+ Hits         2377     2913     +536     
- Misses       8140    10447    +2307     
  Partials      322      322              

| Files with missing lines | Coverage Δ |
|---|---|
| pkg/fanotify/fanotify.go | 0.00% <0.00%> (ø) |

... and 112 files with indirect coverage changes

@coder-y04
Author

@imeoer PTLK
