Inconsistent attribution of individuals to clusters #120
Comments
Thanks.
I'm not likely to address this in the near future. But if you propose a
fix, I'd be happy to review it.
Thanks.
On Mon, Jul 29, 2024 at 12:35 PM aloboa wrote:
Given
hc <- hclust(dist(USArrests[c(1, 6, 13, 20, 23), ]), "ave")
dend <- as.dendrogram(hc)
plot(dend)
I think the following difference should be considered as a bug:
a <- cutree(dend, h=50)
b <- cutree(dend, h=50, order_clusters_as_data = FALSE)
table(a)
a
1 2 3
3 1 1
table(b)
b
1 2 3
1 1 3
Changing the order of the labels in the vector is one thing; changing the
cluster to which a given element is assigned is another. In this example,
Minnesota should be in the same cluster in both cases, and the number of
individuals within each cluster should be the same.
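One way to check whether the two results differ only in how the clusters are numbered is to cross-tabulate them after aligning the observations by name. The sketch below uses base R only and emulates dendrogram-order numbering by hand; that emulation is an assumption about what `order_clusters_as_data = FALSE` does, not dendextend's actual code, and the names `a_dend`, `b_emulated` and `xt` are made up for illustration.

```r
# Same five states as in the report above
hc <- hclust(dist(USArrests[c(1, 6, 13, 20, 23), ]), "ave")

# Base R: clusters are numbered by first appearance in data order
a <- stats::cutree(hc, h = 50)

# Assumed emulation of dendrogram-order numbering:
# walk the leaves left to right and number clusters as they first appear
a_dend <- a[hc$order]
b_emulated <- match(a_dend, unique(a_dend))
names(b_emulated) <- names(a_dend)

# Align by observation name before comparing
xt <- table(data_order = a, dend_order = b_emulated[names(a)])
print(xt)
```

If every row and every column of `xt` has exactly one non-zero cell, the two vectors describe the same partition and only the cluster numbers (and the order of the returned vector) differ.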
If you do not fix this issue, please clarify the documentation of dendextend::cutree() as soon as possible. If you fix the issue, you probably want to create a new function named The documentation would be: In case The fix is very simple, just look at this example:
@aloboa sorry to see that things did not behave as you expected, but I am slightly confused by your opening description of this issue.
the whole point of the option ( Now I agree that for some purposes you might wish to return the integer cluster membership vector for each individual observation ordered by the input data rather than by the dendrogram. But that is a choice, and because this does something different from base R, I don't think you can call one behaviour or the other a bug. I suppose one could add yet another argument asking to return the cluster membership in data order (e.g. In other words, Minnesota should be in a different cluster in the two cases. But you could discuss the ordering of the return vector.
Also, although I understand the intent behind your suggestion to change the docs:
clusters are named and ordered according to their sequence in the data.
I don't think it works, because for the clusters naming and ordering are the same thing. What you want is to change the sort order of the returned cluster membership vector for observations. If you want some ideas about proposing changes to the docs to avoid surprise, then maybe take a look at dendroextras::slice, which does the same thing as cutree(order_clusters_as_data = FALSE). Note also the example
slice(hc,k=5)[order(hc$order)]
which gives the output you want. The group membership vector is not just an ascending or descending set of cluster ids as in your smaller examples. This may help to highlight that observations!=clusters. As a side note, I have to say that anyone I have ever tried to teach clustering methods to finds it very strange that clusters in base R are not assigned in the order they appear in the dendrogram.
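The `[order(hc$order)]` idiom works because `hc$order` is the permutation that puts observations into dendrogram order, and `order()` of a permutation is its inverse. A minimal base-R sketch of that round trip, using the labels of the same five states (the variable names `labels_dend` and `labels_data` are only illustrative):

```r
hc <- hclust(dist(USArrests[c(1, 6, 13, 20, 23), ]), "ave")

# hc$labels is in data order; indexing with hc$order gives dendrogram order
labels_dend <- hc$labels[hc$order]

# Indexing a dendrogram-ordered vector with order(hc$order) restores data order
labels_data <- labels_dend[order(hc$order)]

print(labels_dend)
print(labels_data)
```

The same indexing applies to any vector that is returned in dendrogram order, which is presumably how the `slice()` example above gets back to data order.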
Thanks Aloboa.
Related to what Gregory wrote, I don't think it's a bug but rather a
behaviour which is not documented well enough to avoid all possible
confusion.
I'll keep this issue open and take a look at it in the coming weeks
(assuming nothing critical would stop me from taking a look).
On Tue, 30 Jul 2024, 21:42 Gregory Jefferis wrote:
Also although I understand the intent behind your suggestion to change the
docs:
clusters are named and ordered according to their sequence in the data.
I don't think it works because for the *clusters* naming/ordering are the
same thing. What you want is to change the sort order of the returned
cluster membership vector for *observations*.
If you want some ideas about proposing changes to the docs to avoid
surprise then maybe take a look at dendroextras::slice
<https://rdrr.io/cran/dendroextras/man/slice.html> which does the same
thing as cutree(order_clusters_as_data = FALSE). Note also the example
slice(hc,k=5)[order(hc$order)]
which give the output you want. The group membership vector is not just an
ascending or descending set of cluster ids as in your smaller examples.
This may help to highlight that observations!=clusters.
As a side note, I have to say that anyone I have ever tried to teach
clustering methods to finds it very strange that clusters in base R are not
assigned in the order they appear in the dendrogram.