datawig.SimpleImputer.complete is not imputing any columns #153

imsazzad · 2021-07-23T04:58:21Z

import datawig

# prepare_training_data() is the poster's own helper (not shown here)
df_with_missing = prepare_training_data().iloc[:, :12]
print("Null value in every column\n", df_with_missing.isnull().sum(axis=0))

# impute missing values
df_with_missing_imputed = datawig.SimpleImputer.complete(df_with_missing, precision_threshold=0.8)
print("Null value in every column\n", df_with_missing_imputed.isnull().sum(axis=0))

Mainly two problems:

The null counts are the same before and after running the model.
With the 12 features above, the run takes an indefinite amount of time (I ran the code for 30 minutes and it was still running).

versions

python 3.7.11
sklearn-pandas==1.8.0 numpy==1.14.6 pandas==0.25.3 scikit-learn==0.22.1
mxnet==1.4.0
datawig==0.2.0

I have string, float, and integer data as input
Am I missing something?
@felixbiessmann


maqboolkhan · 2021-12-09T23:47:36Z

Facing the same problem.

felixbiessmann · 2021-12-10T06:57:10Z

When the precision threshold is set to a value above 0.0, datawig only imputes a value when it is 'certain' enough that the imputation will be correct. If you set the threshold to 0.8, you only get an imputation where the model reached a precision of at least 0.8 on an independent validation set. This threshold is calibrated for each value separately. So if datawig cannot impute values with reasonably high precision, you will get Nones/NaNs. If you'd like to have more imputations (with lower precision), you can lower the precision threshold.
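
To make the effect of the threshold concrete, here is a minimal pure-Python sketch of per-value precision filtering. It is an illustration of the idea only, not datawig's actual implementation; the function names `calibrate_thresholds` and `apply_thresholds` and the toy data are made up for this example.

```python
from collections import defaultdict


def calibrate_thresholds(validation, precision_threshold):
    """Find, per predicted value, the lowest confidence cutoff that still
    reaches `precision_threshold` on validation data.

    `validation` is a list of (predicted_value, confidence, true_value).
    Values that never reach the requested precision map to None.
    """
    by_value = defaultdict(list)
    for predicted, confidence, truth in validation:
        by_value[predicted].append((confidence, predicted == truth))

    cutoffs = {}
    for value, items in by_value.items():
        items.sort(key=lambda item: item[0], reverse=True)
        correct = 0
        best = None
        for n, (confidence, is_correct) in enumerate(items, start=1):
            correct += is_correct
            # Only evaluate a cutoff once all predictions tied at it are counted.
            last_of_tie = n == len(items) or items[n][0] != confidence
            if last_of_tie and correct / n >= precision_threshold:
                best = confidence
        cutoffs[value] = best
    return cutoffs


def apply_thresholds(predictions, cutoffs):
    """Keep a prediction only if its confidence clears the cutoff for its value."""
    result = []
    for predicted, confidence in predictions:
        cutoff = cutoffs.get(predicted)
        keep = cutoff is not None and confidence >= cutoff
        result.append(predicted if keep else None)
    return result


# Toy validation data: (predicted, confidence, true value)
validation = [
    ("a", 0.9, "a"), ("a", 0.8, "a"), ("a", 0.6, "b"),
    ("b", 0.7, "a"), ("b", 0.5, "b"),
]
new_predictions = [("a", 0.85), ("a", 0.7), ("b", 0.99)]

strict = apply_thresholds(new_predictions, calibrate_thresholds(validation, 0.8))
lenient = apply_thresholds(new_predictions, calibrate_thresholds(validation, 0.0))
print(strict)   # ['a', None, None]
print(lenient)  # ['a', 'a', 'b']
```

In this toy setup the value "b" never reaches a precision of 0.8 on the validation data, so every "b" prediction is dropped and stays missing. With a threshold of 0.0 all predictions are kept.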

As for the long runtime: the model selection / hyperparameter optimization (HPO) can take a long time. You can try turning off HPO or reducing the number of dimensions when calling complete.
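
A rough sketch of how both suggestions could be tried together is below. It reads "reducing the number of dimensions" as imputing a smaller set of columns, which is only one interpretation. The helper `complete_subset` is hypothetical, and the keyword names `hpo` and `num_epochs` in the commented example call are assumed from the `SimpleImputer.complete` signature in datawig 0.2.0, so check them against your installed version.

```python
import pandas as pd


def complete_subset(df: pd.DataFrame, columns, complete_fn=None, **complete_kwargs):
    """Impute only `columns` of `df`.

    Returns (imputed_frame, null_counts_before, null_counts_after) so the
    effect of the run can be checked directly.
    """
    if complete_fn is None:
        # Imported lazily so the helper can also be tried with a stand-in imputer.
        import datawig
        complete_fn = datawig.SimpleImputer.complete

    subset = df.loc[:, columns].copy()
    nulls_before = subset.isnull().sum()
    imputed = complete_fn(subset, **complete_kwargs)
    nulls_after = imputed.isnull().sum()
    return imputed, nulls_before, nulls_after


# Example call (column names are placeholders, keyword names are assumptions):
# imputed, before, after = complete_subset(
#     df_with_missing,
#     ["col_a", "col_b", "col_c"],
#     precision_threshold=0.0,  # keep every imputation
#     hpo=False,                # skip hyperparameter optimization
#     num_epochs=10,            # shorter training than the default
# )
# print(before, after, sep="\n")
```

Starting with a few columns and a threshold of 0.0 makes it easier to tell whether missing values remain because of the precision filter or for some other reason.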
