
[Enhancement] Implement pruning for neural sparse search #988

Open · wants to merge 14 commits into main
Conversation

zhichao-aws (Member)

Description

Implement pruning for sparse vectors to save disk space and accelerate search, with a small loss in search relevance. #946

  • Implement pruning in the sparse_encoding ingestion processor. Users can configure the pruning strategy when creating the processor, and the processor will prune the sparse vectors before writing them to the index.
  • Implement pruning in neural_sparse two-phase search. Users can configure the pruning strategy when searching with a neural_sparse query, and the query builder will prune the query before searching the index.
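To illustrate the idea (this is a hypothetical sketch, not the PR's actual PruneUtils code), a max_ratio-style prune keeps only tokens whose weight is at least some fraction of the vector's maximum weight:

```java
import java.util.HashMap;
import java.util.Map;

public class PruneSketch {
    // Hypothetical illustration of max_ratio pruning: keep only tokens whose
    // weight is at least `ratio` times the maximum weight in the sparse vector.
    public static Map<String, Float> pruneByMaxRatio(Map<String, Float> sparseVector, float ratio) {
        float maxWeight = sparseVector.values().stream().max(Float::compare).orElse(0f);
        Map<String, Float> pruned = new HashMap<>();
        for (Map.Entry<String, Float> entry : sparseVector.entrySet()) {
            if (entry.getValue() >= ratio * maxWeight) {
                pruned.put(entry.getKey(), entry.getValue());
            }
        }
        return pruned;
    }

    public static void main(String[] args) {
        Map<String, Float> vector = new HashMap<>();
        vector.put("hello", 1.0f);
        vector.put("world", 0.5f);
        vector.put("noise", 0.05f);
        // threshold is 0.1 * 1.0 = 0.1, so "noise" is dropped
        System.out.println(pruneByMaxRatio(vector, 0.1f).size()); // prints 2
    }
}
```

Dropping the long tail of tiny weights shrinks the posting lists written to the index, which is where the disk-space and latency savings come from.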

Related Issues

#946

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: zhichao-aws <zhichaog@amazon.com>
@zhichao-aws (Member Author):

This PR is ready for review now

@heemin32 (Collaborator) left a comment:

Could you provide an overview of how the overall API will look? I initially thought this change would only affect the query side, but it seems it will also modify the parameters for neural_sparse_two_phase_processor.

Additionally, the current implementation appears to be focused on two-phase processing with different strategies for splitting vectors, rather than a combination of pruning and two-phase processing?

@zhichao-aws (Member Author) commented Nov 21, 2024

Could you provide an overview of how the overall API will look? I initially thought this change would only affect the query side, but it seems it will also modify the parameters for neural_sparse_two_phase_processor.

Based on our benchmark results in #946, applying pruning in 2-phase search outperforms applying it in the neural sparse query body, in both precision and latency. Therefore, enhancing the existing 2-phase search pipeline makes more sense.
To maintain compatibility with existing APIs, the overall API will look like:

# ingestion pipeline
PUT /_ingest/pipeline/sparse-pipeline
{
    "description": "Calling sparse model to generate expanded tokens",
    "processors": [
        {
            "sparse_encoding": {
                "model_id": "fousVokBjnSupmOha8aN",
                "pruning_type": "alpha_mass",
                "pruning_ratio": 0.8,
                "field_map": {
                    "body": "body_sparse"
                }
            }
        }
    ]
}

# two phase pipeline
PUT /_search/pipeline/neural_search_pipeline
{
  "request_processors": [
    {
      "neural_sparse_two_phase_processor": {
        "tag": "neural-sparse",
        "description": "Creates a two-phase processor for neural sparse search.",
        "pruning_type": "alpha_mass",
        "pruning_ratio": 0.8
      }
    }
  ]
}

Additionally, the current implementation appears to be focused on two-phase processing with different strategies for splitting vectors, rather than a combination of pruning and two-phase processing?

The existing two-phase processor uses the max_ratio prune criterion, and now we add support for other criteria as well.
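The two-phase flow splits a query vector rather than discarding tokens outright: high-scoring tokens drive the first-phase match, and low-scoring tokens are applied only when rescoring the top candidates. A minimal sketch of such a split under the max_ratio criterion (hypothetical code, assuming a simple two-element list as the return shape):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TwoPhaseSplitSketch {
    // Hypothetical sketch: split a sparse vector into a high-scoring part
    // (first-phase query) and a low-scoring part (rescore-only), using a
    // max_ratio threshold of ratio * maxWeight.
    public static List<Map<String, Float>> splitByMaxRatio(Map<String, Float> vector, float ratio) {
        float maxWeight = 0f;
        for (float w : vector.values()) {
            maxWeight = Math.max(maxWeight, w);
        }
        Map<String, Float> high = new HashMap<>();
        Map<String, Float> low = new HashMap<>();
        for (Map.Entry<String, Float> e : vector.entrySet()) {
            if (e.getValue() >= ratio * maxWeight) {
                high.put(e.getKey(), e.getValue());
            } else {
                low.put(e.getKey(), e.getValue());
            }
        }
        return List.of(high, low);
    }
}
```

Because the low-scoring tokens still participate in rescoring, this tends to lose less relevance than pruning the query body directly, which matches the benchmark observation above.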

Signed-off-by: zhichao-aws <zhichaog@amazon.com>
@zhichao-aws zhichao-aws changed the title [Feature] Implement pruning for neural sparse search [Enhancement] Implement pruning for neural sparse search Nov 22, 2024
Signed-off-by: zhichao-aws <zhichaog@amazon.com>
codecov bot commented Nov 22, 2024

Codecov Report

Attention: Patch coverage is 95.30201% with 7 lines in your changes missing coverage. Please review.

Project coverage is 81.07%. Comparing base (a3fd3a2) to head (2a3e2cf).
Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
...opensearch/neuralsearch/util/prune/PruneUtils.java 94.68% 2 Missing and 3 partials ⚠️
...earch/processor/NeuralSparseTwoPhaseProcessor.java 92.85% 0 Missing and 1 partial ⚠️
...h/neuralsearch/query/NeuralSparseQueryBuilder.java 83.33% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main     #988      +/-   ##
============================================
+ Coverage     78.46%   81.07%   +2.61%     
- Complexity     1027     1053      +26     
============================================
  Files            85       80       -5     
  Lines          3617     3519      -98     
  Branches        604      610       +6     
============================================
+ Hits           2838     2853      +15     
+ Misses          529      424     -105     
+ Partials        250      242       -8     


Signed-off-by: zhichao-aws <zhichaog@amazon.com>
@heemin32 (Collaborator) left a comment:

LGTM. Thanks!

@martin-gaievski (Member) left a comment:

Apart from a minor comment, why is this PR trying to merge into main?
If this changes the API used to define the processor, it should be checked with application security; for that we need to merge to a feature branch in the main repo, and only after that's cleared, merge from the feature branch to main.

}
if (!PruneUtils.isValidPruneRatio(pruneType, pruneRatio)) throw new IllegalArgumentException(
Member:

Can you please mark the block of code with curly braces and use String.format to form the error message?

@@ -49,17 +60,19 @@ public void doExecute(
BiConsumer<IngestDocument, Exception> handler
) {
mlCommonsClientAccessor.inferenceSentencesWithMapResult(this.modelId, inferenceList, ActionListener.wrap(resultMaps -> {
setVectorFieldsToDocument(ingestDocument, ProcessMap, TokenWeightUtil.fetchListOfTokenWeightMap(resultMaps));
List<Map<String, Float>> sparseVectors = TokenWeightUtil.fetchListOfTokenWeightMap(resultMaps);
Member:

Suggested change
List<Map<String, Float>> sparseVectors = TokenWeightUtil.fetchListOfTokenWeightMap(resultMaps);
List<Map<String, Float>> sparseVectors = TokenWeightUtil.fetchListOfTokenWeightMap(resultMaps).stream().map(vector -> PruneUtils.pruneSparseVector(pruneType, pruneRatio, vector)).toList();

// if we have prune type, then prune ratio field must have value
// readDoubleProperty will throw exception if value is not present
pruneRatio = readDoubleProperty(TYPE, tag, config, PruneUtils.PRUNE_RATIO_FIELD).floatValue();
if (!PruneUtils.isValidPruneRatio(pruneType, pruneRatio)) throw new IllegalArgumentException(
Member:

Same comment as before: please put the throw-exception code into a block marked with "{ }", and use String.format to format the exception message.

);
} else {
// if we don't have prune type, then prune ratio field must not have value
if (config.containsKey(PruneUtils.PRUNE_RATIO_FIELD)) {
Member:

We can merge this if with the previous else into a single else if block.

return type;
}
}
throw new IllegalArgumentException("Unknown prune type: " + value);
Member:

please use String.format

* the second with low-scoring elements
*/
public static Tuple<Map<String, Float>, Map<String, Float>> splitSparseVector(
PruneType pruneType,
Member:

please use final for arguments of public method. Same for other methods in this class

Collaborator:

We should introduce some static checker to enforce this :)


switch (pruneType) {
case TOP_K:
return pruneByTopK(sparseVector, (int) pruneRatio, true);
Member:

Is it possible to move this type cast inside the pruneByTopK method?
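The suggestion is to accept the ratio as a float and cast to int inside the method, so callers don't repeat the cast. A hypothetical sketch (not the PR's actual implementation):

```java
import java.util.Comparator;
import java.util.Map;
import java.util.stream.Collectors;

public class TopKSketch {
    // Hypothetical sketch: the float-to-int cast lives inside the method,
    // since for TOP_K the prune ratio is an integral count of tokens to keep.
    public static Map<String, Float> pruneByTopK(Map<String, Float> vector, float pruneRatio) {
        int k = (int) pruneRatio;
        return vector.entrySet().stream()
            .sorted(Map.Entry.<String, Float>comparingByValue(Comparator.reverseOrder()))
            .limit(k)
            .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }
}
```

This keeps the switch arms in the caller uniform, since every strategy then takes the same float parameter.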

}
}

switch (pruneType) {
Member:

Can you think of modifying this into a map of <prune_type> -> <functional_interface>, so that instead of a switch structure we use map.get()?
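A hedged sketch of this suggestion (hypothetical names; the enum constants and strategies are illustrative, not the PR's full set): a static dispatch table mapping each prune type to a pruning function replaces the switch.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.stream.Collectors;

public class PruneDispatchSketch {
    enum PruneType { TOP_K, MAX_RATIO }

    // Dispatch table: prune type -> (vector, ratio) -> pruned vector.
    static final Map<PruneType, BiFunction<Map<String, Float>, Float, Map<String, Float>>> PRUNERS = Map.of(
        PruneType.TOP_K, (vector, ratio) -> vector.entrySet().stream()
            .sorted(Map.Entry.<String, Float>comparingByValue(Comparator.reverseOrder()))
            .limit(ratio.intValue())
            .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue)),
        PruneType.MAX_RATIO, (vector, ratio) -> {
            float max = vector.values().stream().max(Float::compare).orElse(0f);
            return vector.entrySet().stream()
                .filter(e -> e.getValue() >= ratio * max)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
        }
    );

    public static Map<String, Float> prune(PruneType type, Map<String, Float> vector, float ratio) {
        // Fall back to the unpruned vector for unknown types.
        return PRUNERS.getOrDefault(type, (v, r) -> v).apply(vector, ratio);
    }
}
```

Adding a new strategy then means adding one map entry rather than editing every switch statement.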


switch (pruneType) {
case TOP_K:
return pruneRatio > 0 && pruneRatio == Math.floor(pruneRatio);
Member:

Suggested change
return pruneRatio > 0 && pruneRatio == Math.floor(pruneRatio);
return pruneRatio > 0 && pruneRatio == Math.rint(pruneRatio);

This is more reliable for floating-point numbers; otherwise there is a chance of a false positive.
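For context on the two functions being compared (a small illustration, not part of the PR): Math.floor always rounds toward negative infinity, while Math.rint rounds to the nearest integer, with ties going to the even integer.

```java
public class RintVsFloor {
    public static void main(String[] args) {
        // Math.floor rounds toward negative infinity.
        System.out.println(Math.floor(2.7)); // 2.0
        // Math.rint rounds to the nearest integer...
        System.out.println(Math.rint(2.7));  // 3.0
        // ...with ties resolved to the even integer (banker's rounding).
        System.out.println(Math.rint(2.5));  // 2.0
    }
}
```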

}
}

switch (pruneType) {
Member:

Same as above: can we use a map instead of a switch?

@zhichao-aws (Member Author):

@martin-gaievski Thanks for the comments. We didn't create a feature branch because no other contributors are working on this, and we regard the PR branch as the feature branch.

I'm on PTO this week; I will follow up on the app sec issue and address the comments next week.
