Overlaybd observability support #140

Open
shuaichang opened this issue Oct 2, 2022 · 2 comments

Comments

@shuaichang

shuaichang commented Oct 2, 2022

When using OverlayBD in production, we need to monitor the health of OverlayBD components using popular cloud-native instrumentation tooling.

A similar issue was brought up here: containerd/overlaybd#101. There are certain workarounds users could try, but it would be great if this were supported by the DADI service itself so that it can be standardized and reused. I believe this is key to DADI adoption.

The following metrics are a rough idea of what we'd like to monitor:

  • Overlaybd:

    1. Healthcheck ping for the Overlaybd daemon
    2. Number of failed blob reads, grouped by HTTP status (500 for registry errors, 404 for missing blobs, 403 for auth failures, etc.)
    3. Blob read latency for each block (e.g. 1M)
    4. Other unexpected errors, such as failures to write to the local cache or online decompression failures.
    5. Virtual block device IO hang monitoring
    6. Virtual block device IO latency
  • Overlaybd-snapshotter:

    1. Healthcheck ping for the snapshotter daemon
    2. Error count for all gRPC APIs (prepare, commit, etc.)
    3. Latency for all gRPC APIs
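To make the list above more concrete, here is a minimal, stdlib-only Python sketch of how two kinds of these metrics could be recorded and rendered in the Prometheus text exposition format: a counter of failed blob reads labeled by HTTP status (item 2) and a latency histogram with cumulative `le` buckets (items 3 and 6, and equally the per-API gRPC latency). The metric names, label names, and bucket bounds are hypothetical and not something overlaybd or overlaybd-snapshotter currently defines; a real implementation would more likely use an official Prometheus client library in C++ or Go.

```python
import bisect
from collections import defaultdict

# Hypothetical metric names, for illustration only.
BLOB_READ_FAILURES = "overlaybd_blob_read_failures_total"
BLOB_READ_LATENCY = "overlaybd_blob_read_duration_seconds"


class LabeledCounter:
    """Monotonic counter keyed by one label value (e.g. the HTTP status)."""

    def __init__(self, name, label):
        self.name, self.label = name, label
        self.values = defaultdict(int)

    def inc(self, label_value, amount=1):
        self.values[str(label_value)] += amount

    def render(self):
        lines = [f"# TYPE {self.name} counter"]
        for value, count in sorted(self.values.items()):
            lines.append(f'{self.name}{{{self.label}="{value}"}} {count}')
        return lines


class Histogram:
    """Latency histogram rendered with cumulative buckets, as Prometheus expects."""

    def __init__(self, name, buckets):
        self.name = name
        self.buckets = sorted(buckets)
        self.counts = [0] * len(self.buckets)  # per-bucket (non-cumulative) counts
        self.total = 0
        self.sum = 0.0

    def observe(self, seconds):
        # First bucket whose upper bound is >= seconds ("le" semantics);
        # values above the last bound are only counted in +Inf.
        i = bisect.bisect_left(self.buckets, seconds)
        if i < len(self.buckets):
            self.counts[i] += 1
        self.total += 1
        self.sum += seconds

    def render(self):
        lines = [f"# TYPE {self.name} histogram"]
        cumulative = 0
        for bound, count in zip(self.buckets, self.counts):
            cumulative += count
            lines.append(f'{self.name}_bucket{{le="{bound}"}} {cumulative}')
        lines.append(f'{self.name}_bucket{{le="+Inf"}} {self.total}')
        lines.append(f"{self.name}_sum {self.sum}")
        lines.append(f"{self.name}_count {self.total}")
        return lines
```

For example, a blob fetcher could call `failures.inc(404)` when the registry reports a missing blob and `latency.observe(elapsed)` after each block read; the rendered lines are what a `/metrics` endpoint would return. Keeping buckets cumulative at render time matches the exposition format, where each `le` bucket includes every observation at or below its bound.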

Ideally, the above metrics would be exposed in Prometheus format so that it's easy to monitor DADI in cloud-native environments.
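As a rough illustration of what "exposed in Prometheus" plus a healthcheck ping could look like, the following stdlib-only Python sketch serves a `/metrics` endpoint (using the `text/plain; version=0.0.4` content type of the Prometheus text format) and a `/healthz` endpoint that returns 200 or 503. The endpoint paths, the `serve` helper, and the `render_metrics` / `is_healthy` callbacks are hypothetical hooks, not an existing overlaybd or snapshotter API; the daemons themselves would implement this in C++ or Go.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Content type of the Prometheus text exposition format, version 0.0.4.
PROM_CONTENT_TYPE = "text/plain; version=0.0.4; charset=utf-8"


def make_handler(render_metrics, is_healthy):
    """Build a handler class; both callbacks are hypothetical daemon hooks."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path == "/healthz":
                ok = is_healthy()
                body = b"ok\n" if ok else b"unhealthy\n"
                self._reply(200 if ok else 503, "text/plain", body)
            elif self.path == "/metrics":
                body = render_metrics().encode("utf-8")
                self._reply(200, PROM_CONTENT_TYPE, body)
            else:
                self._reply(404, "text/plain", b"not found\n")

        def _reply(self, status, content_type, body):
            self.send_response(status)
            self.send_header("Content-Type", content_type)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # silence per-request logging in this sketch

    return Handler


def serve(render_metrics, is_healthy, host="127.0.0.1", port=0):
    """Start the endpoints on a background thread; port=0 picks a free port."""
    server = ThreadingHTTPServer((host, port), make_handler(render_metrics, is_healthy))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Returning 503 from `/healthz` lets both a Prometheus blackbox probe and a Kubernetes liveness probe treat the daemon as down without parsing the body.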

Some similar monitoring support:

Please let me know your thoughts. The metrics mentioned above are just some quick ideas; I'd be happy to discuss them.

@lihuiba

lihuiba commented Oct 12, 2022

Thanks for your great suggestions! We'll implement such observability support.

@HileQAQ
Contributor

HileQAQ commented Oct 13, 2022

Thanks for your suggestions! We'll implement it once we've finished our current work.
