K-Means Initialization with Missing Values #64

Open
jaanisfehling opened this issue Sep 19, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

@jaanisfehling

Hi, I think StepMix should not allow combining a NaN-compatible measurement model (e.g. GaussianNan) with init_params="kmeans".
With that combination I currently get the following error:

ValueError: Input X contains NaN.
KMeans does not accept missing values encoded as NaN natively. For supervised learning, you might want to consider sklearn.ensemble.HistGradientBoostingClassifier and Regressor which accept missing values encoded as NaNs natively. Alternatively, it is possible to preprocess the data, for instance by using an imputer transformer in a pipeline or drop samples with missing values. See https://scikit-learn.org/stable/modules/impute.html You can find a list of all estimators that handle NaN values at the following page: https://scikit-learn.org/stable/modules/impute.html#estimators-that-handle-nan-values

The error itself makes sense, though: the data contains NaN values, and scikit-learn's default k-means implementation does not handle them.
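
One way the requested behavior could look is an early check that rejects the incompatible combination before k-means is ever called. The sketch below is only an illustration: the function name `validate_init_params` is hypothetical and not part of StepMix, and it assumes that NaN-compatible measurement models can be recognized by a name ending in "nan" (as in the GaussianNan example above).

```python
# Hypothetical guard, not StepMix's actual API.
# Assumption: NaN-compatible measurement models have names ending in "nan".
def validate_init_params(measurement: str, init_params: str) -> None:
    """Raise early if k-means initialization is paired with a NaN-aware model."""
    nan_aware = measurement.lower().endswith("nan")
    if nan_aware and init_params == "kmeans":
        raise ValueError(
            f"init_params='kmeans' cannot be used with the NaN-compatible "
            f"measurement model {measurement!r}: scikit-learn's KMeans "
            f"rejects input that contains NaN."
        )


# Compatible combinations pass silently.
validate_init_params("gaussian", "kmeans")

# The combination from this issue is rejected with a targeted message.
try:
    validate_init_params("GaussianNan", "kmeans")
except ValueError as err:
    print(err)
```

A check like this would surface the problem at construction or fit time with a message that names the conflicting options, instead of the generic scikit-learn error above.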

@sachaMorin added the enhancement (New feature or request) label on Sep 20, 2024