-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ArrayUtils.binarySearch() with a key extractor #1270
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @kvr000
Thank you for your PR.
Please see my comments.
Waiting for community feedback as well...
* | ||
* @author Zbynek Vyskovsky | ||
*/ | ||
public class SortedListUtils { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Collection utilities do not belong in Commons Lang IMO. This would be for Commons Collections unless this functionality is already there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed it. Unsure whether this was the best choice, as this is about the algorithm, not about implementing collections.
ed7f900
to
ae97b6f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @kvr000,
This code should be in ArrayUtils
.
TY!
e974c36
to
d3e3623
Compare
@garydgregory : Yeah, got it, was busy to finish that. Fixed now, thank you. |
Run |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @kvr000
- See my scattered comments.
- Add a method called
createSortedList(int...)
that guarantees sorted results and use it. - Add tests for non-sorted input, make sure the results are predictable, we do not want a DOS for an infinite loop for example.
src/test/java/org/apache/commons/lang3/ArrayUtilsBinarySearchTest.java
Outdated
Show resolved
Hide resolved
src/test/java/org/apache/commons/lang3/ArrayUtilsBinarySearchTest.java
Outdated
Show resolved
Hide resolved
public class ArrayUtilsBinarySearchTest extends AbstractLangTest { | ||
|
||
@Test | ||
public void binarySearch_whenEmpty_returnM1() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since all tests in the class call binarySearch()
and the class is already called "binary search", you do not need to repeat that name in each test method name, that'll de-clutter names a bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't agree with this comment. The methods names are intentionally descriptive with pattern name-condition-expected
. This improves readability. While there are truly only binarySearch
now, it doesn't have to be forever and vice versa - this class may be possibly merged into more generic ones which would work with sorted arrays (which was the very original name).
4c16d1a
to
9063592
Compare
The Good point for non-sorted input, I added a test. The code always does |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some documentation clarifications
* @return | ||
* index of the search key, if it is contained in the array within specified range; otherwise, | ||
* (-first_greater - 1). The first_greater is the index of lowest greater element in the list - if all elements | ||
* are lower, the first_greater is defined as toIndex. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The above para needs rework: there is no specified range here, and no toIndex
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The result should be documented as undefined if all array elements are equals; like the Arrays
version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to match non-index version.
@garydgregory : That is great point. In fact, one of my use cases is array (list) with duplicities. So this can be done better. I introduce binarySearchFirst
and binarySearchLast
methods which return consistently first and last matching element respectively.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WRT binarySearchFirst and Last, let's not add those.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@garydgregory : I already did. I dropped binarySearch
as the randomness of choosing any of equal element effectively disqualifies it from such use cases. Predictable behavior of binarySearchFirst
and binarySearchLast
is huge benefit and comes at almost no cost (is slower only in cases when the lookup is quickly "lucky").
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use caee, I thought, was a version of the 'Arrays' method plus key extraction. I would not necessarily want an API that always allocates an array to return a value. Maybe do this later, as a second API only if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the case of the 'Arrays' method, the entire entry is matched, so to some extent it does not matter which index is returned. That is not the case where a partial key is used, so I think in the user will often want to know all the matches.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sudden thought: surely key extraction can take place in the Comparator, so why is this method needed at all?
Or maybe I am missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sudden thought: surely key extraction can take place in the Comparator, so why is this method needed at all?
Or maybe I am missing something?
Good point @sebbASF !
We might not need this at all...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the case of the 'Arrays' method, the entire entry is matched, so to some extent it does not matter which index is returned. That is not the case where a partial key is used, so I think in the user will often want to know all the matches.
In my particular case, I have list of GPS locations sorted by time and track distance, with distance unchanged if there was no move (causing duplicates). One use case is to find point for particular distance and if there are duplicates, I want the result to be predictable (here last but can be first in different use cases). If the result is unpredictable, it will lead to different results based on initial conditions for the search. I don't care about whole range, though I can imagine someone would appreciate such function with added complexity.
Sudden thought: surely key extraction can take place in the Comparator, so why is this method needed at all?
No, unless we're able to construct the full object which is not always possible - note that Comparator
takes two arguments of the same type. It is technically achievable with hacking and casting one argument to T
and the other one to K
, relying on internal implementation passing the arguments in expected order but that would be super ugly and fragile to implementation changes.
Please revert the First/Last APIs.
For the reasons above, I want to keep those as they better match intended use case. I think we actually don't need original unpredictable binarySearch()
.
return m; | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if there are no higher matches?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then l
reaches h+1
and exits the loop, returning -toIndex - 1
as documented.
9063592
to
f56bf0a
Compare
@kvr000 |
@garydgregory : As explained in #1270 , I want to keep First/Last as I find them more useful due to their predictability. I don't think we really need original unpredictable simple binarySearch. |
There is
Collections.binarySearch()
in Java core. However, it searches for exact element which is often not usable, as the elements are typically sorted by particular field.These methods introduce
binarySearch()
on arrays and (sorted) List, with ability to provide sorting key extraction function and key comparator.The rest of semantics, behavior, parameters and return values remain consistent with
Collections.binarySearch()
from Java core.