You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Some applications call for a partially fuzzy join, meaning fuzzy joining within groups of exactly matched entities.
For instance, matching loans from two tables of users having multiple loans, when there is no loan_id. In this scenario, constraining the fuzzy join on loans belonging to the same users (having a user_id) would make sense. Within these groups, we would next perform fuzzy joining on loan prices and loan creation dates, for example.
Feature Description
We could have multiple strategies to use constraints and units that have a business meaning:
The user could pass a custom distance function to define the weights between loan price distance and creation date distance.
The user could indicate a single column to minimize while constraining other columns to some threshold distance. For instance, minimizing the price distance while keeping the date distances within a range of one day.
Alternative Solutions
No response
Additional Context
Fuzzy joining different columns on the same l2 space currently limits the application of Joiner and fuzzy_join to tangible use cases.
The text was updated successfully, but these errors were encountered:
I'm not sure that exact matching is what you are looking for.
Indeed, constraints of exact matching can lead to crashing on new objects, which is a behavior I would really strive to avoid by default.
However, I can see that you want to prioritize the corresponding column in the fuzzy match when it's in a multi-column setting.
We need to find an API to do this that is easily understandable by people (as always, I fight for not adding feature that are going to be used by 0.01 of users)
I'm not sure that exact matching is what you are looking for.
I guess it is when IDs must match exactly before performing fuzzy join, right? In a scenario where joining on an ID that is close but different would be a mistake.
could there also be situations where this helps narrow down the nearest neighbor search and thus reduce computation & memory? in the example you give above we would only compute pairwise distances between loans of a given user, not of all users
Problem Description
Some applications call for a partially fuzzy join, meaning fuzzy joining within groups of exactly matched entities.
For instance, matching loans from two tables of users having multiple loans, when there is no
loan_id
. In this scenario, constraining the fuzzy join on loans belonging to the same users (having auser_id
) would make sense. Within these groups, we would next perform fuzzy joining on loan prices and loan creation dates, for example.Feature Description
We could have multiple strategies to use constraints and units that have a business meaning:
Alternative Solutions
No response
Additional Context
Fuzzy joining different columns on the same l2 space currently limits the application of
Joiner
andfuzzy_join
to tangible use cases.The text was updated successfully, but these errors were encountered: