Can a trained model be used to predict multiple columns with missing data? #175

Open
pnoyens opened this issue Jul 8, 2024 · 2 comments

pnoyens commented Jul 8, 2024

Hi there,

I'm interested in trying out this library for a specific problem I'm dealing with. However, it is currently unclear to me whether a model can be trained to predict missing values in more than one column of a tabular dataset.

According to the documentation, the SimpleImputer has an output_column parameter, which suggests that only one column can be defined as the target. The Imputer interface, however, has a label_encoder_cols parameter, which suggests that multiple columns can be defined for prediction.

Is this a typo, or does it mean that the library can indeed be used to predict multiple columns at a time?

@felixbiessmann
Contributor

Hi,

Thanks a lot for your interest in this package. It's not maintained anymore, and for your use case I'd recommend using an actively maintained AutoML package for tabular data such as AutoGluon. Most of the functionality in datawig is available in AutoGluon, and the implementation is actually a lot better.

For the tabular prediction problem with all columns, I'd suggest following this tutorial and wrapping it in a for loop that goes round robin over all columns:
https://auto.gluon.ai/stable/tutorials/tabular/tabular-quick-start.html
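
A minimal sketch of what such a round-robin loop might look like, assuming the data is a pandas DataFrame. The helper names `impute_round_robin`, `autogluon_fit_predict` and `most_frequent_fit_predict` are made up for illustration and are not part of AutoGluon or datawig. Only `TabularPredictor(label=...).fit(...)` and `.predict(...)` come from AutoGluon, and that import is done lazily so the loop can be tried with the cheap baseline first:

```python
import pandas as pd


def impute_round_robin(df, fit_predict, columns=None):
    """Fill missing values one column at a time (hypothetical helper).

    fit_predict(train_df, label, query_df) must return one prediction
    per row of query_df, in the same order.
    """
    columns = list(df.columns) if columns is None else list(columns)
    result = df.copy()
    for label in columns:
        missing = df[label].isna()
        # Nothing to impute, or nothing to learn from: skip this column.
        if not missing.any() or missing.all():
            continue
        train = df.loc[~missing]                       # rows where the label is observed
        query = df.loc[missing].drop(columns=[label])  # rows where it has to be predicted
        result.loc[missing, label] = list(fit_predict(train, label, query))
    return result


def autogluon_fit_predict(train_df, label, query_df):
    """Train one AutoGluon model for `label` and predict the missing rows."""
    # Imported lazily so the rest of the sketch works without AutoGluon installed.
    from autogluon.tabular import TabularPredictor

    predictor = TabularPredictor(label=label).fit(train_df)
    return predictor.predict(query_df)


def most_frequent_fit_predict(train_df, label, query_df):
    """Cheap stand-in model: always predict the most frequent observed value."""
    return [train_df[label].mode().iloc[0]] * len(query_df)
```

Calling `impute_round_robin(df, autogluon_fit_predict)` would then train one model per column that has gaps. Each model in this sketch is trained on the original data rather than on previously imputed columns, which is a simplification; feeding imputed values back in over several passes is a possible refinement.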

Alternatively, you could try the scikit-learn imputation tools; they also support random-forest and KNN/hot-deck based imputation:
https://scikit-learn.org/stable/modules/impute.html
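
As a rough illustration of the scikit-learn route, the sketch below fills every column of a small numeric array at once, first with `KNNImputer` and then with `IterativeImputer` wrapped around a random forest. The toy array and the hyperparameters are made up for the example. Both imputers expect numeric input, so categorical columns would need to be encoded first:

```python
import numpy as np
# IterativeImputer is still experimental and must be enabled explicitly.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import IterativeImputer, KNNImputer

# Toy data with one missing value in each of the three columns.
X = np.array([
    [1.0, 2.0, np.nan],
    [3.0, np.nan, 6.0],
    [np.nan, 8.0, 9.0],
    [4.0, 5.0, 7.0],
])

# KNN imputation: each gap is filled from the nearest rows that have that feature.
knn_filled = KNNImputer(n_neighbors=2).fit_transform(X)

# Random-forest imputation: each column is modelled from the others, round robin.
rf_imputer = IterativeImputer(
    estimator=RandomForestRegressor(n_estimators=50, random_state=0),
    max_iter=5,
    random_state=0,
)
rf_filled = rf_imputer.fit_transform(X)
```

Both calls return a dense array of the same shape as `X` in which the observed values are left unchanged and only the gaps are filled.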

Best wishes
Felix

@felixbiessmann
Contributor

... I forgot to mention: if you'd like to use datawig after all (and manage to get it installed), then I guess SimpleImputer.complete would do the job:
https://github.com/awslabs/datawig/tree/master?tab=readme-ov-file#quickstart-example
