
Rename/Alias GeneralImputer to MICE #59

Open
ParadaCarleton opened this issue Oct 15, 2023 · 5 comments

Comments

@ParadaCarleton

The algorithm listed as GeneralImputer here is more widely-known as MICE (Multiple imputation by chained equations) in statistics. I'm not sure if the name used here is standard in ML, but the lack of a solid MICE implementation is a common complaint in the Julia statistics ecosystem, so I was very surprised to stumble across this pure-Julia implementation of MICE under a completely different name. Would it make sense to either rename or alias GeneralImputer to make this easier to discover?

@sylvaticus
Owner

Hmmm... I am aware of the MICE package in R, but there the idea is that the multiple imputations are "chained" along the whole statistical procedure.
Also, I am not a big fan of their usage in ML models in general.
The issue is that there is no guarantee about the origin of the differences between the various imputations: there is no probabilistic model determining them. Sometimes they even depend on parameters of the imputation algorithm, so the variance between imputations cannot be taken as a measure of the quality of, or trust in, the imputation.
But for sure I should add MICE to the models' docstring...
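(Editor's note: a toy illustration of the point above. This is hypothetical code, not from BetaML or Mice.jl: an imputer that fills a missing value with the mean of a bootstrap resample of the observed values. Its between-imputation spread is driven entirely by its own `n_boot` hyperparameter, even though the data, and hence the real uncertainty, never change.)

```python
import random
import statistics

def bootstrap_mean_impute(observed, n_boot, rng):
    # Impute one missing value with the mean of a bootstrap resample
    # (size n_boot) of the observed values: a toy imputer whose
    # between-imputation variability is a pure artifact of n_boot.
    sample = [rng.choice(observed) for _ in range(n_boot)]
    return sum(sample) / n_boot

observed = [1.0, 2.0, 3.0, 4.0, 5.0]
rng = random.Random(0)
spread = {}
for n_boot in (5, 500):
    imputations = [bootstrap_mean_impute(observed, n_boot, rng)
                   for _ in range(200)]
    spread[n_boot] = statistics.stdev(imputations)
# The spread between imputations shrinks as n_boot grows, so it reflects
# an algorithm setting rather than statistical uncertainty in the data.
```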

@ParadaCarleton
Author

> Hmmm... I am aware of the MICE package in R, but there the idea is that the multiple imputations are "chained" along the whole statistical procedure.

I'm not sure what you mean here, sorry! 😅 Is this different from GeneralImputer? The docstring is a bit vague.

> The issue is that there is no guarantee about the origin of the differences between the various imputations: there is no probabilistic model determining them. Sometimes they even depend on parameters of the imputation algorithm, so the variance between imputations cannot be taken as a measure of the quality of, or trust in, the imputation.

If you're doing cross-validation or some other resampling strategy, shouldn't that give a good estimate of the model-based uncertainty? Though you could also try something fancier, like a Bayesian bootstrap or another ensemble method.

@sylvaticus
Owner

sylvaticus commented Nov 10, 2023

You may be interested in this new package: https://github.com/tom-metherell/Mice.jl

Compared to the imputers in BetaML, it provides pooling of the analyses you perform using the imputed values, which you don't get here (you just get the multiple imputations in a vector).

Conversely, BetaML supports random forests, which in my (limited) experience do a better job than PMM (predictive mean matching) on real datasets where I erased some data at random and then checked the quality of the imputation.

@ParadaCarleton
Author

> Compared to the imputers in BetaML, it provides pooling of the analyses you perform using the imputed values, which you don't get here (you just get the multiple imputations in a vector).

As in, BetaML just performs one imputation per missing data point, by randomly sampling a possible imputed value?

@sylvaticus
Owner

sylvaticus commented Nov 13, 2023

> As in, BetaML just performs one imputation per missing data point, by randomly sampling a possible imputed value?

No. Let's consider some tabular data with N rows (records) and C columns (dimensions).
For each imputation, BetaML builds C supervised models, one predicting each column c from the remaining columns, and then uses these models to predict the missing values.
There is no "sampling" of the missing values. Each imputation is an independent set of models and their predictions, and the output is a vector of the imputed tables. What distinguishes the imputations is the randomness specific to each supervised model: for a random forest, the records used to train each individual decision tree and the subset of dimensions employed for that tree; for a neural network estimator, the initial weights of the layers; and so on.
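(Editor's note: the per-column scheme described above can be sketched as follows. This is an illustrative Python toy, not BetaML's actual code: BetaML is Julia and plugs in random forests or neural networks as the per-column learners, whereas this sketch uses ordinary least squares and gets its per-model randomness from bootstrapping the training rows. The `impute_once`/`impute` names are made up for the example, and it assumes the predictor columns of a row with a missing entry are themselves observed.)

```python
import numpy as np

def impute_once(X, rng):
    """One imputation: for each column c with missing entries, fit a model
    of c on the remaining columns using the complete rows, then predict
    c's missing entries. Here the model is OLS; BetaML would use e.g. a
    random forest instead."""
    X = X.copy()
    n, C = X.shape
    complete = ~np.isnan(X).any(axis=1)          # fully observed rows
    for c in range(C):
        miss = np.isnan(X[:, c])
        if not miss.any():
            continue
        other = [j for j in range(C) if j != c]
        # Bootstrap the complete rows: this per-model randomness is what
        # makes the imputations differ from one another.
        idx = rng.choice(np.flatnonzero(complete),
                         size=int(complete.sum()), replace=True)
        A = np.column_stack([np.ones(len(idx)), X[np.ix_(idx, other)]])
        beta, *_ = np.linalg.lstsq(A, X[idx, c], rcond=None)
        # Predict the missing entries (assumes predictors observed there).
        rows = np.flatnonzero(miss)
        B = np.column_stack([np.ones(len(rows)), X[np.ix_(rows, other)]])
        X[rows, c] = B @ beta
    return X

def impute(X, n_imputations=5, seed=0):
    # Each imputation is an independent set of models and predictions;
    # the output is a vector (list) of imputed tables.
    rng = np.random.default_rng(seed)
    return [impute_once(X, rng) for _ in range(n_imputations)]
```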
