tsne() treats input matrix X rows as observations, should be columns #19
Comments
Other notable packages also treat columns as observations:
I agree. Row-major layout can actually cause performance issues: fetching a row is slower than fetching a column in Julia.
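To illustrate the point about row versus column access, here is a minimal sketch using only Julia's `Base` functions. The matrix `X` and its sizes are made up for illustration; nothing here comes from TSne.jl itself.

```julia
# Hypothetical 3×4 matrix: 3 features (rows), 4 observations (columns).
X = reshape(collect(1.0:12.0), 3, 4)

# Column-major storage: the linear memory order walks down each column first,
# so flattening X gives back the original 1.0:12.0 sequence.
flat_matches = vec(X) == collect(1.0:12.0)

# Strides in elements: 1 step along a column, 3 steps (the row count) along a row.
matrix_strides = strides(X)

# A column view is contiguous in memory; a row view jumps 3 elements at a time.
col = view(X, :, 2)
row = view(X, 2, :)
col_stride = strides(col)
row_stride = strides(row)

println("matrix strides: ", matrix_strides)
println("column stride: ", col_stride, ", row stride: ", row_stride)
```

Because a column occupies a contiguous block, iterating over observations stored as columns touches memory sequentially, which is generally friendlier to the CPU cache than the strided access a row requires.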
One more vote for column-major ordering. TSne.jl could be absorbed by MultivariateStats.jl for better maintenance, peer review of PRs, and performance improvements.
@juliohm Technically speaking, you can submit PRs and get peer review right away. E.g. you are very welcome to submit a column-major PR. Integration into MultivariateStats.jl (or the move under the JuliaStats umbrella, as a more modest alternative) could make it easier for the users to discover
@juliohm if someone were to implement this into MultivariateStats (which I think is a good idea),
Yes, it is unfortunate that t-SNE is currently separate from the rest of the embedding methods in MultivariateStats.jl. I hope someone will tackle this issue in the future.
`tsne(X)` treats `X` rows as observations (points) and columns as their features (dimensions). I guess it's because the original implementation is written in Python/NumPy, which uses row-major order.

However, Julia packages (Clustering.jl, Distances.jl, BlackBoxOptim.jl) tend to treat rows as features and columns as observations, because in Julia matrices are stored in column-major order (one column/observation occupies a contiguous block of memory).

IMO it would be nice to switch to the default Julian behaviour at some point, but that would be a big breaking change, of course.
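As a rough sketch of what the two conventions mean in practice, the snippet below converts a rows-as-observations matrix into the columns-as-observations layout with `permutedims`. The data and variable names are hypothetical, and the snippet deliberately does not call TSne.jl.

```julia
# Hypothetical data in the rows-as-observations layout:
# 5 observations (rows), 2 features (columns).
X_rows = [1.0 2.0; 3.0 4.0; 5.0 6.0; 7.0 8.0; 9.0 10.0]

# permutedims returns a materialized copy with the axes swapped, so each
# observation now occupies one contiguous column.
X_cols = permutedims(X_rows)

# The third observation is row 3 in one layout and column 3 in the other.
obs_row_layout = X_rows[3, :]
obs_col_layout = X_cols[:, 3]

# eachcol iterates over observations in the column-oriented layout.
n_obs = length(collect(eachcol(X_cols)))

println("size(X_rows) = ", size(X_rows), ", size(X_cols) = ", size(X_cols))
println("observation 3: ", obs_col_layout)
```

Under the behaviour described in this issue, a caller holding column-oriented data would presumably apply the same kind of transposition before passing the matrix to `tsne`, which costs an extra copy.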