Tdigest #418

Merged
merged 6 commits into from
Jan 30, 2024

Conversation

AlexanderSaydakov
Contributor

See #409

@coveralls

coveralls commented Jan 24, 2024

Pull Request Test Coverage Report for Build 7703245801

  • 272 of 300 (90.67%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.2%) to 98.788%

Changes Missing Coverage | Covered Lines | Changed/Added Lines | %
tdigest/include/tdigest.hpp | 24 | 26 | 92.31%
tdigest/include/tdigest_impl.hpp | 248 | 274 | 90.51%

Totals
Change from base Build 7574934365: -0.2%
Covered Lines: 16308
Relevant Lines: 16508

💛 - Coveralls

Contributor

@jmalkin jmalkin left a comment

A few minor comments.

Overall, I can mostly follow what it's doing, but not necessarily why, without consulting the original paper (which I really should do but haven't). The scale function in particular, and how it's used, makes the least sense.

Anyway, approving, but there's at least one place where a comment would be quite helpful in the future.


add_library(tdigest INTERFACE)

add_library(${PROJECT_NAME}::QUANTILES ALIAS tdigest)
Contributor

${PROJECT_NAME}::TDIGEST?

Contributor Author

Oops, will fix.

REQUIRE(td.get_quantile(0.5) == deserialized_td.get_quantile(0.5));
}

TEST_CASE("serialize deserialize steam and bytes equivalence", "[tdigest]") {
Contributor

I think we've not done a robust job of testing this explicitly elsewhere, so I appreciated this.
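
For readers unfamiliar with this kind of test, the idea is that a sketch usually offers two serialization paths (to a stream and to a byte buffer), and the test asserts that both produce identical bytes. A minimal, self-contained sketch of that check follows. The `toy_sketch` type and the `stream_and_bytes_match` helper are hypothetical stand-ins invented for illustration; they are not the `tdigest` class or the test from this PR.

```cpp
#include <cstdint>
#include <cstring>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical toy type for illustration; not the tdigest class from this PR.
struct toy_sketch {
  uint64_t count = 0;
  double sum = 0;

  void update(double value) { ++count; sum += value; }

  // Path 1: write the fields to a binary stream.
  void serialize(std::ostream& os) const {
    os.write(reinterpret_cast<const char*>(&count), sizeof(count));
    os.write(reinterpret_cast<const char*>(&sum), sizeof(sum));
  }

  // Path 2: write the same fields, in the same order, to a byte vector.
  std::vector<uint8_t> serialize() const {
    std::vector<uint8_t> bytes(sizeof(count) + sizeof(sum));
    std::memcpy(bytes.data(), &count, sizeof(count));
    std::memcpy(bytes.data() + sizeof(count), &sum, sizeof(sum));
    return bytes;
  }
};

// Returns true when both serialization paths produce byte-identical output.
bool stream_and_bytes_match(const toy_sketch& sketch) {
  std::ostringstream os(std::ios::binary);
  sketch.serialize(os);
  const std::string from_stream = os.str();
  const std::vector<uint8_t> from_bytes = sketch.serialize();
  return from_stream.size() == from_bytes.size()
      && std::memcmp(from_stream.data(), from_bytes.data(), from_bytes.size()) == 0;
}
```

Checking both an empty and a populated instance is worthwhile, since the two states often take different code paths during serialization.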

static const uint8_t PREAMBLE_LONGS_EMPTY = 1;
static const uint8_t PREAMBLE_LONGS_NON_EMPTY = 2;
static const uint8_t SERIAL_VERSION = 1;
static const uint8_t SKETCH_TYPE = 20;
Contributor

Don't forget to reserve this in Java's Family.java.

const double proposed_weight = centroids_.back().get_weight() + it->get_weight();
const double projected_weight = weight_so_far + proposed_weight;
bool add_this;
if (USE_WEIGHT_LIMIT) {
Contributor

There may be a good reason, but this code reads like "if USE_WEIGHT_LIMIT then don't use the w_limit value" which is a little odd. Could probably use a comment.
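
One possible reading of the naming, offered as an assumption rather than a statement about this PR's code: the "weight limit" may refer to a bound on the weight of a single merged centroid, while `w_limit` may be a precomputed cumulative-weight threshold at which the scale function has advanced by one unit. The sketch below contrasts the two rules. It assumes a logistic-style scale function k(q) = normalizer * ln(q / (1 - q)); all function names and formulas here are hypothetical, and only `USE_WEIGHT_LIMIT`, `w_limit`, `weight_so_far`, `proposed_weight`, and `projected_weight` come from the excerpt.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helpers for illustration; names and formulas are assumptions,
// not the API or the exact math of the code under review.

// Assumed scale function: k(q) = normalizer * ln(q / (1 - q)), for 0 < q < 1.
double scale_k(double q, double normalizer) {
  return normalizer * std::log(q / (1 - q));
}

// Inverse of scale_k: maps a k value back to a quantile in (0, 1).
double scale_q(double k, double normalizer) {
  const double w = std::exp(k / normalizer);
  return w / (1 + w);
}

// Largest fraction of the total weight one centroid may hold near quantile q.
// This equals 1 / k'(q) for the assumed scale function.
double max_fraction(double q, double normalizer) {
  return q * (1 - q) / normalizer;
}

// Rule A (per-centroid bound): the merged centroid's own weight must fit under
// the tighter of the bounds evaluated at its left and right edges.
bool fits_centroid_bound(double weight_so_far, double proposed_weight,
                         double total_weight, double normalizer) {
  assert(total_weight > 0);
  const double q0 = weight_so_far / total_weight;
  const double q2 = (weight_so_far + proposed_weight) / total_weight;
  const double bound = total_weight
      * std::min(max_fraction(q0, normalizer), max_fraction(q2, normalizer));
  return proposed_weight <= bound;
}

// Rule B, step 1: cumulative weight at which k has advanced by one unit
// from the current position. Requires 0 < weight_so_far < total_weight.
double next_cumulative_limit(double weight_so_far, double total_weight,
                             double normalizer) {
  assert(total_weight > 0 && weight_so_far > 0 && weight_so_far < total_weight);
  const double k1 = scale_k(weight_so_far / total_weight, normalizer);
  return total_weight * scale_q(k1 + 1, normalizer);
}

// Rule B, step 2: the projected cumulative weight must stay under that limit.
bool fits_cumulative_limit(double weight_so_far, double proposed_weight,
                           double w_limit) {
  return weight_so_far + proposed_weight <= w_limit;
}
```

Under this reading, both rules keep centroids small near the tails (where q * (1 - q) is small) and allow larger ones near the median; they differ in whether the bound is checked against the centroid's own weight or against the running total. If that matches the intent, a one-line comment at the `if (USE_WEIGHT_LIMIT)` branch stating which limit is meant would address the confusion.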

@AlexanderSaydakov AlexanderSaydakov merged commit 50ad1ba into master Jan 30, 2024
8 checks passed
@AlexanderSaydakov AlexanderSaydakov deleted the tdigest branch January 30, 2024 18:48
3 participants