
Clustering is dropping input nodes #2457

Answered by RobinL
AdamChapnik1 asked this question in Q&A

You're not doing anything obviously wrong, and you're right that the number of output rows should equal the number of input rows.
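A quick way to confirm which nodes go missing is to compare the set of input IDs with the IDs in the clustering output. The sketch below is plain Python and assumes each node is identified by a hypothetical `(source_dataset, unique_id)` key. In practice you would build these lists from your combined input tables and from the clustered result.

```python
def find_dropped_nodes(input_ids, clustered_ids):
    """Return input IDs that do not appear in the clustering output.

    IDs are assumed to be hashable keys, e.g. hypothetical
    (source_dataset, unique_id) tuples; adapt to your own ID columns.
    """
    clustered = set(clustered_ids)
    return [node for node in input_ids if node not in clustered]


# Hypothetical example: keys from three input tables o, l, e.
input_ids = [("o", 1), ("o", 2), ("l", 1), ("e", 7)]
clustered_ids = [("o", 1), ("o", 2), ("l", 1)]

dropped = find_dropped_nodes(input_ids, clustered_ids)
print(len(input_ids), len(clustered_ids), dropped)
```

Listing the dropped IDs, rather than only comparing row counts, makes it easier to inspect whether the missing nodes share some property (e.g. all from one input table).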

My suspicion is that something about the format of your input data (the tables o, l, e, or the predictions) is not quite right. Possibly some IDs are not unique, or there are predictions without a corresponding node.
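Both suspected problems can be checked directly before clustering. This is a minimal plain-Python sketch, not Splink functionality: it assumes nodes are keyed by hypothetical `(source_dataset, unique_id)` tuples and that each prediction (edge) is a `(left_key, right_key)` pair you have extracted from the predictions table.

```python
from collections import Counter


def duplicate_ids(node_ids):
    """Return node keys that occur more than once across the inputs."""
    counts = Counter(node_ids)
    return sorted(node for node, n in counts.items() if n > 1)


def orphan_edges(node_ids, edges):
    """Return edges whose left or right key has no matching input node."""
    known = set(node_ids)
    return [
        (left, right)
        for left, right in edges
        if left not in known or right not in known
    ]


# Hypothetical data: ("o", 2) is duplicated and ("e", 9) is not an input node.
node_ids = [("o", 1), ("o", 2), ("o", 2), ("l", 1)]
edges = [(("o", 1), ("l", 1)), (("o", 2), ("e", 9))]

print(duplicate_ids(node_ids))
print(orphan_edges(node_ids, edges))
```

If either function returns a non-empty list for your real data, that is a likely cause of the row-count mismatch.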

This is a bit difficult for us to debug without a reprex, but there are a few things you can try. If I have time, I will try to create one from the tables you've pasted. [EDIT] Here's an example which seems to work correctly:

import duckdb
import pandas as pd
from splink import Linker, DuckDBAPI, Setti…

Answer selected by AdamChapnik1