Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tnse() treats input matrix X rows as observations, should be columns #19

Open
alyst opened this issue Nov 21, 2017 · 6 comments
Open

tnse() treats input matrix X rows as observations, should be columns #19

alyst opened this issue Nov 21, 2017 · 6 comments
Labels
api API change up for grabs Contribution is welcome

Comments

@alyst
Copy link
Collaborator

alyst commented Nov 21, 2017

tsne(X) treats X rows as observations (points) and columns as their features (dimentions). I guess it's because the original implementation is written in Python/NumPy, which uses row-major order.
However, Julia packages (Clustering.jl, Distances.jl, BlackBoxOptim.jl) tend to treat rows as features and columns as observations, because in Julia matrices are stored in column-major order (one column/observation occupies continuous block of memory).

IMO it would be nice to switch to the default Julian behaviour at some point, but that would be a big breaking change, of course.

@oxinabox
Copy link

oxinabox commented Dec 1, 2017

Other notable package also treat columns as observations:
MultivariateStats.jl, MLDataUtils.jl (though that is configurable, columns is the default),
Distributions.jl

@xukai92
Copy link

xukai92 commented Jan 15, 2018

I agree. Row-major can actually cause performance issue: fetching a row is slow than fetching a column in Julia

@juliohm
Copy link
Contributor

juliohm commented May 13, 2018

One more vote for column-major ordering.

TSne.jl could be absorbed by MultivariateStats.jl for better maintenance, peer review of PRs, and performance improvements.

@alyst
Copy link
Collaborator Author

alyst commented May 13, 2018

@juliohm Technically speaking, you can submit PRs and get peer review right away. E.g. you are very welcome to submit col-major PR. Integration into MultivariateStats.jl (or the move under JuliaStats umbrella, as a more modest alternative) could make it easier for the users to discover tsne() and guarantee some minimal level of maintenance. But, based on my experience, I don't expect the overflow of PRs because of the code relocation.

@oxinabox
Copy link

oxinabox commented May 14, 2018

@juliohm if someone were to implement this into MultivariateStats (which I think is a good idea),
it will need to be clean-room implemented from the paper, or some other description of the algorithm
as the license on this repo is not MIT compatible

@juliohm
Copy link
Contributor

juliohm commented May 14, 2018

Yes, it is unfortunate that t-SNE is currently separate from the rest of the embedding methods in MultivariateStats.jl. I hope someone will tackle this issue in the future.

@alyst alyst added api API change up for grabs Contribution is welcome labels Aug 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api API change up for grabs Contribution is welcome
Projects
None yet
Development

No branches or pull requests

4 participants