Parent's inclusive time may be smaller than child's in spot caliper data #6

Open
slabasan opened this issue Feb 2, 2022 · 0 comments

slabasan commented Feb 2, 2022

This is not a bug in how Hatchet reads the data, but users may be confused by some of the Spot/Caliper data. Tracking the Caliper discussion here.

This case can happen if node N (and its subgraph) occurs on only a subset of ranks.

Caliper computes the metrics from the records it has. For example, if some node N exists on 4 out of 8 ranks, it computes the average (and min) over only those 4 records, whereas the result for the root is based on all 8 ranks.
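A minimal sketch of this effect, using made-up per-rank times (not taken from any real Spot data) and hypothetical helper names. It only illustrates the arithmetic described above and is not Caliper's actual aggregation code: the child is never larger than the parent on any single rank, yet its average over the ranks that have a record exceeds the parent's average over all ranks.

```python
NUM_RANKS = 8

# Hypothetical per-rank inclusive times in seconds.
# None means the region was never entered on that rank, so no record exists.
parent_times = [10.0, 10.0, 10.0, 10.0, 2.0, 2.0, 2.0, 2.0]
child_times = [9.0, 9.0, 9.0, 9.0, None, None, None, None]


def avg_over_records(times):
    """Average over only the ranks that have a record for this node."""
    present = [t for t in times if t is not None]
    return sum(present) / len(present)


def avg_over_all_ranks(times, num_ranks):
    """Average over every rank, treating a missing record as zero time."""
    return sum(t for t in times if t is not None) / num_ranks


parent_avg = avg_over_records(parent_times)                 # 48 / 8
child_avg = avg_over_records(child_times)                   # 36 / 4
child_avg_all = avg_over_all_ranks(child_times, NUM_RANKS)  # 36 / 8

print(f"parent avg (8 records):      {parent_avg}")
print(f"child avg (4 records):       {child_avg}")
print(f"child avg (all 8 ranks):     {child_avg_all}")
```

With a common denominator (all 8 ranks) the child's average stays below the parent's; with per-node record counts it does not.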

One of the issues here is maintaining compatibility with existing Spot data. If we change the way Caliper computes the min/max/avg, it will change the metric name, and we won't be able to compare new data with old data anymore, not just in Hatchet but also in the Spot web GUI.

The issue is that in the Average tree, F6 is about 5x larger than its ancestor F1. I do not understand how that is possible mathematically: the global sum of F1 should include the global sum of F6, and therefore ave_F1 >= ave_F6 (the division by num_procs should not change that).

Ave time (inc)
   ├─ 14.145 F1
   │  └─ 14.140 F2
   │     ├─ 0.445 F3
   │     │  ├─ 0.359 F4
   │     │  └─ 0.045 F5
   │     └─ 70.790 F6
   │        ├─ 38.856 F7
   │        │  └─ 37.614 F8
   │        │     ├─ 10.260 F9
   │        │     └─ 12.561 F10
   │        └─ 11.160 F11
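
A short worked inequality for the tree above, assuming the subset-of-ranks explanation applies here. The symbols are introduced only for illustration: P is the total number of ranks, k is the number of ranks that have a record for F6, and S_F1 and S_F6 are the global sums of inclusive time. It also assumes F1 has a record on every rank.

```latex
\mathrm{ave}_{F1} = \frac{S_{F1}}{P}, \qquad
\mathrm{ave}_{F6} = \frac{S_{F6}}{k}, \qquad
S_{F6} \le S_{F1}
\;\Longrightarrow\;
\frac{\mathrm{ave}_{F6}}{\mathrm{ave}_{F1}}
  = \frac{S_{F6}}{S_{F1}} \cdot \frac{P}{k}
  \le \frac{P}{k}
```

So the sums do nest, but the averages need not, because the two divisions use different denominators. The ratio in the tree is 70.790 / 14.145, about 5.0, which under these assumptions would require P/k of at least about 5, meaning F6 was recorded on at most roughly one fifth of the ranks. The issue does not state the rank counts, so this is only a consistency check.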