
[Enhancement] Implement pruning for neural sparse search #988

Open · wants to merge 14 commits into main
Conversation

zhichao-aws (Member)

Description

Implement pruning for sparse vectors to save disk space and accelerate search, with a small loss in search relevance. #946

  • Implement pruning in the sparse_encoding ingestion processor. Users can configure the pruning strategy when creating the processor, and the processor will prune the sparse vectors before writing them to the index.
  • Implement pruning in neural_sparse two-phase search. Users can configure the pruning strategy when searching with a neural_sparse query, and the query builder will prune the query before searching the index.
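To illustrate the idea (this is a hypothetical sketch, not the PR's actual PruneUtils code), a max_ratio-style prune keeps only tokens whose weight is at least some fraction of the vector's maximum weight:

```java
import java.util.HashMap;
import java.util.Map;

public class PruneSketch {
    // Hypothetical illustration of max_ratio pruning: keep only tokens whose
    // weight is at least `ratio` times the maximum weight in the sparse vector.
    public static Map<String, Float> pruneByMaxRatio(Map<String, Float> sparseVector, float ratio) {
        float maxWeight = sparseVector.values().stream().max(Float::compare).orElse(0f);
        Map<String, Float> pruned = new HashMap<>();
        for (Map.Entry<String, Float> entry : sparseVector.entrySet()) {
            if (entry.getValue() >= ratio * maxWeight) {
                pruned.put(entry.getKey(), entry.getValue());
            }
        }
        return pruned;
    }

    public static void main(String[] args) {
        Map<String, Float> vector = new HashMap<>();
        vector.put("hello", 1.0f);
        vector.put("world", 0.5f);
        vector.put("noise", 0.05f);
        // threshold is 0.1 * 1.0 = 0.1, so "noise" is dropped
        System.out.println(pruneByMaxRatio(vector, 0.1f).size()); // prints 2
    }
}
```

Dropping the long tail of tiny weights shrinks the posting lists written to the index, which is where the disk-space and latency savings come from.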

Related Issues

#946

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: zhichao-aws <zhichaog@amazon.com>
@zhichao-aws (Member Author):

This PR is ready for review now

@heemin32 (Collaborator) left a comment:

Could you provide an overview of how the overall API will look? I initially thought this change would only affect the query side, but it seems it will also modify the parameters for neural_sparse_two_phase_processor.

Additionally, the current implementation appears to be focused on two-phase processing with different strategies for splitting vectors, rather than a combination of pruning and two-phase processing?

@zhichao-aws (Member Author) commented Nov 21, 2024

Could you provide an overview of how the overall API will look? I initially thought this change would only affect the query side, but it seems it will also modify the parameters for neural_sparse_two_phase_processor.

Based on our benchmark results in #946, applying pruning in 2-phase search outperforms applying it in the neural sparse query body, in both precision and latency. Therefore, enhancing the existing 2-phase search pipeline makes more sense.
To maintain compatibility with existing APIs, the overall API will look like:

# ingestion pipeline
PUT /_ingest/pipeline/sparse-pipeline
{
    "description": "Calling sparse model to generate expanded tokens",
    "processors": [
        {
            "sparse_encoding": {
                "model_id": "fousVokBjnSupmOha8aN",
                "pruning_type": "alpha_mass",
                "pruning_ratio": 0.8,
                "field_map": {
                    "body": "body_sparse"
                }
            }
        }
    ]
}

# two phase pipeline
PUT /_search/pipeline/neural_search_pipeline
{
  "request_processors": [
    {
      "neural_sparse_two_phase_processor": {
        "tag": "neural-sparse",
        "description": "Creates a two-phase processor for neural sparse search.",
        "pruning_type": "alpha_mass",
        "pruning_ratio": 0.8
      }
    }
  ]
}

Additionally, the current implementation appears to be focused on two-phase processing with different strategies for splitting vectors, rather than a combination of pruning and two-phase processing?

The existing two-phase processor uses the max_ratio prune criterion, and now we add support for other criteria as well.
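The two-phase flow splits a query vector rather than discarding tokens outright: high-scoring tokens drive the first-phase match, and low-scoring tokens are applied only when rescoring the top candidates. A minimal sketch of such a split under the max_ratio criterion (hypothetical code, assuming a simple two-element list as the return shape):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TwoPhaseSplitSketch {
    // Hypothetical sketch: split a sparse vector into a high-scoring part
    // (first-phase query) and a low-scoring part (rescore-only), using a
    // max_ratio threshold of ratio * maxWeight.
    public static List<Map<String, Float>> splitByMaxRatio(Map<String, Float> vector, float ratio) {
        float maxWeight = 0f;
        for (float w : vector.values()) {
            maxWeight = Math.max(maxWeight, w);
        }
        Map<String, Float> high = new HashMap<>();
        Map<String, Float> low = new HashMap<>();
        for (Map.Entry<String, Float> e : vector.entrySet()) {
            if (e.getValue() >= ratio * maxWeight) {
                high.put(e.getKey(), e.getValue());
            } else {
                low.put(e.getKey(), e.getValue());
            }
        }
        return List.of(high, low);
    }
}
```

Because the low-scoring tokens still participate in rescoring, this tends to lose less relevance than pruning the query body directly, which matches the benchmark observation above.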

Signed-off-by: zhichao-aws <zhichaog@amazon.com>
@zhichao-aws zhichao-aws changed the title [Feature] Implement pruning for neural sparse search [Enhancement] Implement pruning for neural sparse search Nov 22, 2024
Signed-off-by: zhichao-aws <zhichaog@amazon.com>
codecov bot commented Nov 22, 2024

Codecov Report

Attention: Patch coverage is 95.30201% with 7 lines in your changes missing coverage. Please review.

Project coverage is 81.07%. Comparing base (a3fd3a2) to head (2a3e2cf).
Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
...opensearch/neuralsearch/util/prune/PruneUtils.java 94.68% 2 Missing and 3 partials ⚠️
...earch/processor/NeuralSparseTwoPhaseProcessor.java 92.85% 0 Missing and 1 partial ⚠️
...h/neuralsearch/query/NeuralSparseQueryBuilder.java 83.33% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main     #988      +/-   ##
============================================
+ Coverage     78.46%   81.07%   +2.61%     
- Complexity     1027     1053      +26     
============================================
  Files            85       80       -5     
  Lines          3617     3519      -98     
  Branches        604      610       +6     
============================================
+ Hits           2838     2853      +15     
+ Misses          529      424     -105     
+ Partials        250      242       -8     


Signed-off-by: zhichao-aws <zhichaog@amazon.com>
@heemin32 (Collaborator) left a comment:

LGTM. Thanks!

@martin-gaievski (Member) left a comment:

Apart from a minor comment, why is this PR trying to merge into main?
If this changes the API used to define the processor, it should be checked with application security; for that we need to merge to a feature branch in the main repo, and only after that's cleared, merge from the feature branch to main.

}
if (!PruneUtils.isValidPruneRatio(pruneType, pruneRatio)) throw new IllegalArgumentException(
Member:

Can you please mark the block of code with curly braces and use String.format to form the error message?

@@ -49,17 +60,19 @@ public void doExecute(
BiConsumer<IngestDocument, Exception> handler
) {
mlCommonsClientAccessor.inferenceSentencesWithMapResult(this.modelId, inferenceList, ActionListener.wrap(resultMaps -> {
setVectorFieldsToDocument(ingestDocument, ProcessMap, TokenWeightUtil.fetchListOfTokenWeightMap(resultMaps));
List<Map<String, Float>> sparseVectors = TokenWeightUtil.fetchListOfTokenWeightMap(resultMaps);
Member:

Suggested change
List<Map<String, Float>> sparseVectors = TokenWeightUtil.fetchListOfTokenWeightMap(resultMaps);
List<Map<String, Float>> sparseVectors = TokenWeightUtil.fetchListOfTokenWeightMap(resultMaps).stream().map(vector -> PruneUtils.pruneSparseVector(pruneType, pruneRatio, vector)).toList();

// if we have prune type, then prune ratio field must have value
// readDoubleProperty will throw exception if value is not present
pruneRatio = readDoubleProperty(TYPE, tag, config, PruneUtils.PRUNE_RATIO_FIELD).floatValue();
if (!PruneUtils.isValidPruneRatio(pruneType, pruneRatio)) throw new IllegalArgumentException(
Member:

Same comment as before: please put the throw-exception code into a block marked with "{ }", and use String.format to format the exception message.

);
} else {
// if we don't have prune type, then prune ratio field must not have value
if (config.containsKey(PruneUtils.PRUNE_RATIO_FIELD)) {
Member:

We can merge this if with the previous else into a single else if block.

return type;
}
}
throw new IllegalArgumentException("Unknown prune type: " + value);
Member:

please use String.format

* the second with low-scoring elements
*/
public static Tuple<Map<String, Float>, Map<String, Float>> splitSparseVector(
PruneType pruneType,
Member:

please use final for arguments of public method. Same for other methods in this class

Collaborator:

We should introduce some static checker to enforce this :)


switch (pruneType) {
case TOP_K:
return pruneByTopK(sparseVector, (int) pruneRatio, true);
Member:

Is it possible to move this type cast inside the pruneByTopK method?
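The suggestion is to accept the ratio as a float and cast to int inside the method, so callers don't repeat the cast. A hypothetical sketch (not the PR's actual implementation):

```java
import java.util.Comparator;
import java.util.Map;
import java.util.stream.Collectors;

public class TopKSketch {
    // Hypothetical sketch: the float-to-int cast lives inside the method,
    // since for TOP_K the prune ratio is an integral count of tokens to keep.
    public static Map<String, Float> pruneByTopK(Map<String, Float> vector, float pruneRatio) {
        int k = (int) pruneRatio;
        return vector.entrySet().stream()
            .sorted(Map.Entry.<String, Float>comparingByValue(Comparator.reverseOrder()))
            .limit(k)
            .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }
}
```

This keeps the switch arms in the caller uniform, since every strategy then takes the same float parameter.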

}
}

switch (pruneType) {
Member:

Can you think of modifying this into a map of <prune_type> -> <functional_interface>, so that instead of a switch structure we use map.get()?
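A hedged sketch of this suggestion (hypothetical names; the enum constants and strategies are illustrative, not the PR's full set): a static dispatch table mapping each prune type to a pruning function replaces the switch.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.stream.Collectors;

public class PruneDispatchSketch {
    enum PruneType { TOP_K, MAX_RATIO }

    // Dispatch table: prune type -> (vector, ratio) -> pruned vector.
    static final Map<PruneType, BiFunction<Map<String, Float>, Float, Map<String, Float>>> PRUNERS = Map.of(
        PruneType.TOP_K, (vector, ratio) -> vector.entrySet().stream()
            .sorted(Map.Entry.<String, Float>comparingByValue(Comparator.reverseOrder()))
            .limit(ratio.intValue())
            .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue)),
        PruneType.MAX_RATIO, (vector, ratio) -> {
            float max = vector.values().stream().max(Float::compare).orElse(0f);
            return vector.entrySet().stream()
                .filter(e -> e.getValue() >= ratio * max)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
        }
    );

    public static Map<String, Float> prune(PruneType type, Map<String, Float> vector, float ratio) {
        // Fall back to the unpruned vector for unknown types.
        return PRUNERS.getOrDefault(type, (v, r) -> v).apply(vector, ratio);
    }
}
```

Adding a new strategy then means adding one map entry rather than editing every switch statement.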


switch (pruneType) {
case TOP_K:
return pruneRatio > 0 && pruneRatio == Math.floor(pruneRatio);
Member:

Suggested change
return pruneRatio > 0 && pruneRatio == Math.floor(pruneRatio);
return pruneRatio > 0 && pruneRatio == Math.rint(pruneRatio);

This is more reliable for floating-point numbers; otherwise there is a chance of a false positive.
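For context on the two functions being compared (a small illustration, not part of the PR): Math.floor always rounds toward negative infinity, while Math.rint rounds to the nearest integer, with ties going to the even integer.

```java
public class RintVsFloor {
    public static void main(String[] args) {
        // Math.floor rounds toward negative infinity.
        System.out.println(Math.floor(2.7)); // 2.0
        // Math.rint rounds to the nearest integer...
        System.out.println(Math.rint(2.7));  // 3.0
        // ...with ties resolved to the even integer (banker's rounding).
        System.out.println(Math.rint(2.5));  // 2.0
    }
}
```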

}
}

switch (pruneType) {
Member:

Same as above: can we use a map instead of a switch?

@zhichao-aws (Member Author):

@martin-gaievski Thanks for the comments. We didn't create a feature branch because no other contributors are working on this, and we regard the PR branch as the feature branch.

I'm on PTO this week; I will follow up on the app sec issue and address the comments next week.
