Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Java client and direct CURL call do not return the same result #843

Open
mbarbet opened this issue Jul 4, 2024 · 9 comments
Open

Java client and direct CURL call do not return the same result #843

mbarbet opened this issue Jul 4, 2024 · 9 comments
Labels

Comments

@mbarbet
Copy link

mbarbet commented Jul 4, 2024

Java API client version

8.13.3

Java version

17.0.11

Elasticsearch Version

8.14.1

Problem description

We do a GeoTile Grid aggregation with two average aggregations.
The curl call and the client java do not return the same json result for the buckets.
Below the curl version response:

{
   "key": "10/455/466",
   "doc_count": 26,
   "avg#avg:AOD565": {
       "value": 0.08212617407692308
   },
   "avg#avg:AOD566": {
       "value": null
   }
}

And the java client version response:

{
   "key": "10/455/466",
   "doc_count": 26,
   "avg#avg:AOD565": {
       "value": 0.08212617407692308
   },
   "avg#avg:AOD566": {
       "value": 0.0
   }
}

All the documents in the 10/455/466 grid have no value for AOD566 property.
With CURL the result of the average on this property is null and with the java client the result is 0.0.
There is a way to obtain the result of the curl with the java client ?

@l-trotta
Copy link
Contributor

l-trotta commented Jul 4, 2024

Hello!
The Java client defaults to zero when deserializing aggregation results because of a design choice that was made in order to avoid boxing numbers, which could increase memory usage substantially; we currently don't plan to change this. To check whether the result is actually zero or if the field is missing like in this case it's possible to use the other response fields (for example source).

@l-trotta l-trotta closed this as completed Jul 4, 2024
@l-trotta
Copy link
Contributor

Hello again!
For most aggregation responses where the java client defaults null values to 0.0, it's possible to disambiguate the correct value by checking the doc_count field. In this specific case it looks like this isn't possible, so we're trying to figure out a way to solve this issue without breaking the current implementation. Would a bool isEmpty() method in the aggregation response class work for you? So it would be possible to call it to understand whether the aggregation value of 0.0 is an actual 0 or it's actually null.

@l-trotta l-trotta reopened this Jul 11, 2024
@mbarbet
Copy link
Author

mbarbet commented Jul 12, 2024

Hello!
Thank you for your feedback and your suggestion. What matters to us is to recover the empty information for each metric in the aggregation. It would be perfect if you added it!

@l-trotta l-trotta added Category: Enhancement New feature or request and removed Category: Not an issue This doesn't seem right labels Jul 15, 2024
@l-trotta
Copy link
Contributor

l-trotta commented Aug 8, 2024

Update: in the end we realized that it would just be easier to just box the numbers and change the field from the primitive type to the object, so in this case from double to the nullable Double, so to reflect more accurately what the server is sending at the cost of a memory usage increase which probably will only be noticeable for large aggregations. Unfortunately this is a breaking change, meaning it will have to wait for the next minor version, after the one that will come out in the following days, possibly in a couple of month. Sorry for the delay!

@Priyadarshanvijay
Copy link

Priyadarshanvijay commented Sep 19, 2024

Bump on this^
We're observing similar issues for min, avg, and max aggregations where they're extending SingleMetricAggregateBase and are incorrectly deserializing null to 0.0 because of this

protected static <BuilderT extends AbstractBuilder<BuilderT>> void setupSingleMetricAggregateBaseDeserializer(
ObjectDeserializer<BuilderT> op) {
AggregateBase.setupAggregateBaseDeserializer(op);
op.add(AbstractBuilder::value, JsonpDeserializer.doubleOrNullDeserializer(0), "value");
op.add(AbstractBuilder::valueAsString, JsonpDeserializer.stringDeserializer(), "value_as_string");

@l-trotta
Copy link
Contributor

l-trotta commented Oct 3, 2024

Hello again, after drafting the solution we came to the conclusion that the changes are too impactful to be released with the next minor version, especially because they won't result in a compiler error: the type of the various aggregations result will be changed from the java primitive type to the corresponding Object, meaning with no null checks users will incur in unexpected NullPointerExceptions.

Because of this, we will be waiting until the next major version of the java client (9.0.0) to release this fix, together with instructions on how to modify the code accordingly, especially for those users who followed our advice and were comparing the "zero" result to the doc count to check if the returned "zero" was originally a json "null".

Sorry again for the delay, if there were any simple workaround we would have implemented it ASAP, but currently this is the only way we can proceed.

@Priyadarshanvijay
Copy link

Priyadarshanvijay commented Oct 3, 2024

Hey @l-trotta! thanks a lot for the update. I agree with the fact that the silent break this change (double -> Double) will introduce, which will manifest as NPE during runtime, is not desirable.
Alternatively, have we looked into keeping the data type as primitive double, but changing it's default value to Double.NaN? Thanks!

@l-trotta
Copy link
Contributor

l-trotta commented Oct 3, 2024

@Priyadarshanvijay unfortunately that would still be a silent breaking change for all the users who are checking for 0 and would have to switch to checking for NaN. The only non breaking way would be to add some boolean method to the response, like wasItActuallyNull(), but that instead would propagate changes throughout all of our deserialization code, which would be breaking for users implementing custom deserializers - no easy way out the way I see it.

@Priyadarshanvijay
Copy link

Priyadarshanvijay commented Oct 4, 2024

@l-trotta I agree with that, but the current implementation seems incorrect and as a consequence, unusable for aggregations. I would argue that the current behavior is also a "silent-break" for clients which upgraded from older versions of ES ( <= ES 7.x ) since they would expect the aggregations to work in the same way, which they do on the server side, but the java client misrepresents them while deserialization.
While adding another doc count query to every aggregation query is a valid approach, it's not a practical one because it would simply add more load to the Elasticsearch server for large aggregations.
Can we instead introduce another minor version of the client where the response type is not double or Double (since there are type casted into each other internally silently), but some other TaggedUnion type (eg. DoubleOrNull) which will be a completely new class containing the precise value returned by the server. Since this will be an actual Java break, existing clients will be alerted loudly when they bump the minor version of their client and can patch the fixes in their consumers as required.
Will this increase the memory allocations done by client? Probably yes, but only for large aggregations and not at the cost of correctness. Let me know what you think of this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants