Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Weights and score normalization for JoinDocuments node with reciprocal rank fusion #5704

Merged
merged 6 commits into from
Dec 21, 2023

Conversation

robpasternak
Copy link
Contributor

@robpasternak robpasternak commented Sep 1, 2023

Related Issues

Proposed Changes:

  • Modifies calculation of RRF scores by including weights, as well as normalizing by dividing by maximum possible score.
  • Non-breaking: providing equal weights (or not providing weights) leads to same doc ordering as before.

How did you test it?

  • Manual tests ongoing, but weighting and normalization calculations are pretty simple math.

Notes for the reviewer

  • Node is currently configured to obligatorily perform score normalization, but can easily be revised so that normalization is optional. However, I haven't encountered a use for raw RRF scores in the wild.
  • In this PR draft, raw (pre-normalization) RRF scores are each multiplied by the number of input rankings (len(results) in the formula (weight * len(results)) / (K + rank)). This has no impact on the rankings or post-normalization scores. But it's useful if we decide to make normalization optional, since in this case the raw scores with equal (or unspecified) weights will be identical to the current raw scores.
  • As noted above, manual tests still ongoing, but weighting and normalization calculations are pretty simple math.

Checklist

@coveralls
Copy link
Collaborator

coveralls commented Sep 18, 2023

Pull Request Test Coverage Report for Build 6341046291

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage increased (+0.002%) to 50.187%

Files with Coverage Reduction New Missed Lines %
utils/context_matching.py 1 95.7%
Totals Coverage Status
Change from base Build 6340961471: 0.002%
Covered Lines: 12466
Relevant Lines: 24839

💛 - Coveralls

@robpasternak robpasternak marked this pull request as ready for review September 28, 2023 15:30
@robpasternak robpasternak requested a review from a team as a code owner September 28, 2023 15:30
@robpasternak robpasternak requested review from anakin87 and removed request for a team September 28, 2023 15:30
@anakin87
Copy link
Member

Hey, @robpasternak...

Thanks for this contribution! 💙

  1. Please add a release note about this new feature as described here.
  2. Can you add 1 or 2 unit tests here, to make sure that this new feature performs as expected?

@robpasternak
Copy link
Contributor Author

Will do, and I'll let you know when it's done. Thanks @anakin87!

@Timoeller
Copy link
Contributor

Timoeller commented Sep 29, 2023

Please also use standard syntax when linking issues @robpasternak . Because when you use "addresses issue #xyz" it does not automatically link PR and issue, if you use "fixes #xyz" it does 🎉

@masci masci changed the base branch from main to v1.x November 24, 2023 12:06
@masci masci added the 1.x label Nov 24, 2023
@anakin87
Copy link
Member

Hey, @robpasternak...

Do you think you can add the release note and a unit test or two?

if you can't, we'll do it when possible...

@robpasternak
Copy link
Contributor Author

@anakin87 Sorry, this slipped off the docket. I'll get that done tomorrow.

@anakin87
Copy link
Member

Thanks, @robpasternak!

Please make sure to merge the v1.x branch.
I remember that there has been work on this topic: see #6261.

silvanocerza and others added 2 commits December 15, 2023 18:10
* cast to PromptModelInvocationLayer

* fix pylint pointless-exception-statement

* use two variables to avoid re-assignment

* black

* use mocked tokenizer in unit test
@anakin87
Copy link
Member

anakin87 commented Dec 15, 2023

@robpasternak I took the liberty of rebasing your branch.
Please go on from there.

Apart from one failed test, it seems to be in good shape...

@robpasternak
Copy link
Contributor Author

@anakin87 Great, thanks! I'll sort the failed tests out.

@github-actions github-actions bot added the type:documentation Improvements on the docs label Dec 20, 2023
@robpasternak
Copy link
Contributor Author

@anakin87 Fixed the tests and the checks all passed! Not sure of the procedure here, should I go ahead and merge?

@anakin87
Copy link
Member

I will give it one last look and then merge it...

Thanks!

(Sorry for not supporting you more but these last few days have been crazy) 😄

Copy link
Member

@anakin87 anakin87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good!

@anakin87 anakin87 merged commit 28c0c01 into v1.x Dec 21, 2023
64 checks passed
@anakin87 anakin87 deleted the join_docs_rrf branch December 21, 2023 15:21
@robpasternak
Copy link
Contributor Author

Thanks, and no worries whatsoever!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Weights and score normalization for JoinDocuments node with reciprocal rank fusion (PR ready)
7 participants