Materials
Mikhail Koltsov edited this page Nov 9, 2016
- VP trees: A data structure for finding stuff fast
  The article from which the bhtsne implementation of Vantage Point trees originates.
- How to use t-SNE effectively
  An article about common pitfalls when interpreting t-SNE results.
- Design at Large - Laurens van der Maaten, Visualizing Data Using Embeddings.
  Interesting points:
  - PCA preserves global structure, while t-SNE aims to preserve local structure (nearest neighbours);
  - the Student-t distribution lets us place dissimilar points farther apart on the map;
  - we can use t-SNE to evaluate our machine learning feature design (i.e. to check that similar objects have similar features);
  - we can use t-SNE to reveal weaknesses in the data (e.g. denormalization);
  - matrix factorization is used in machine learning because it gives a compact representation of the data, and we can use the matrix rows as points;
  - to plot co-authorship or synonym data, we can use multiple maps t-SNE; the number of maps can be chosen by looking at the KL divergence as a function of the number of maps;
  - larger datasets can use a perplexity higher than 50.
- van der Maaten L., Hinton G. Visualizing data using t-SNE //Journal of Machine Learning Research. – 2008.
- van der Maaten L. Accelerating t-SNE using tree-based algorithms //Journal of Machine Learning Research. – 2014.
- Hinton G. E., Roweis S. T. Stochastic neighbor embedding //Advances in Neural Information Processing Systems. – 2002.
- Yang Z., Peltonen J., Kaski S. Optimization Equivalence of Divergences Improves Neighbor Embedding //ICML. – 2014.
  They prove a form of "equivalence" between graph- and point-visualization approaches and give examples of t-SNE performance on graph visualization (in the context of showing that their ws-SNE approach is superior).
- Biuk-Aghai R. P. Visualizing co-authorship networks in online Wikipedia //2006 International Symposium on Communications and Information Technologies. – IEEE, 2006.
- Venna J. et al. Information retrieval perspective to nonlinear dimensionality reduction for data visualization //Journal of Machine Learning Research. – 2010.
- Vladymyrov M., Carreira-Perpinan M. Partial-Hessian strategies for fast learning of nonlinear embeddings //arXiv preprint arXiv:1206.4646. – 2012.
- Vihrovs J. et al. An inverse distance-based potential field function for overlapping point set visualization //Information Visualization Theory and Applications (IVAPP), 2014 International Conference on. – IEEE, 2014.
- Santamaría R., Therón R. Overlapping clustered graphs: co-authorship networks visualization //International Symposium on Smart Graphics. – Springer Berlin Heidelberg, 2008.
- Vehlow C., Beck F., Weiskopf D. The state of the art in visualizing group structures in graphs //Eurographics Conference on Visualization (EuroVis)-STARs. – 2015.
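The VP-tree article listed above describes the structure behind the vantage-point tree in bhtsne, which is used for nearest-neighbour search. What follows is a minimal Python sketch of the idea, not the bhtsne code. It assumes Euclidean points stored as tuples, picks a random vantage point per node and splits the remaining points at the median distance. The names `VPNode`, `build_vp_tree` and `knn_search` are hypothetical.

```python
import heapq
import math
import random
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


class VPNode:
    """A node of a vantage-point tree (simplified sketch)."""

    def __init__(self, point: Point, radius: float,
                 inside: "Optional[VPNode]", outside: "Optional[VPNode]") -> None:
        self.point = point      # the vantage point stored at this node
        self.radius = radius    # median distance from the vantage point to the rest
        self.inside = inside    # points strictly closer than `radius`
        self.outside = outside  # points at distance >= `radius`


def build_vp_tree(points: List[Point], rng: random.Random) -> Optional[VPNode]:
    """Recursively build a VP tree, choosing a random vantage point per node."""
    if not points:
        return None
    rest = list(points)  # copy so the caller's list is not modified
    vp = rest.pop(rng.randrange(len(rest)))
    if not rest:
        return VPNode(vp, 0.0, None, None)
    dists = [math.dist(vp, p) for p in rest]
    radius = sorted(dists)[len(dists) // 2]  # median split distance
    inside = [p for p, d in zip(rest, dists) if d < radius]
    outside = [p for p, d in zip(rest, dists) if d >= radius]
    return VPNode(vp, radius, build_vp_tree(inside, rng), build_vp_tree(outside, rng))


def knn_search(root: Optional[VPNode], query: Point, k: int) -> List[Tuple[float, Point]]:
    """Return the k nearest points to `query` as (distance, point), closest first."""
    heap: List[Tuple[float, Point]] = []  # best k so far, as a max-heap of negated distances
    tau = math.inf  # distance to the current k-th nearest neighbour

    def search(node: Optional[VPNode]) -> None:
        nonlocal tau
        if node is None:
            return
        d = math.dist(query, node.point)
        if d < tau:
            heapq.heappush(heap, (-d, node.point))
            if len(heap) > k:
                heapq.heappop(heap)  # drop the farthest candidate
            if len(heap) == k:
                tau = -heap[0][0]
        # Visit the side that contains the query first; visit the other side only
        # if the ball of radius tau around the query crosses the split boundary.
        if d < node.radius:
            search(node.inside)
            if d + tau >= node.radius:
                search(node.outside)
        else:
            search(node.outside)
            if d - tau <= node.radius:
                search(node.inside)

    search(root)
    return sorted((-neg_d, p) for neg_d, p in heap)
```

The pruning relies only on the triangle inequality, so any metric could replace `math.dist` here. The median split keeps the tree roughly balanced.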
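The Design at Large point that the Student-t distribution lets t-SNE place dissimilar points farther apart can be checked numerically. In t-SNE the low-dimensional similarity uses the kernel (1 + d²)⁻¹, a Student-t with one degree of freedom, while the original SNE uses the Gaussian exp(-d²). This sketch uses hypothetical helper names. It inverts both kernels to find the map distance at which a pair reaches a given similarity, and it computes a normalized t-SNE similarity matrix Q for a small embedding.

```python
import math
from typing import List, Sequence


def gaussian_kernel(d: float) -> float:
    """Unnormalized SNE similarity for map distance d."""
    return math.exp(-d * d)


def student_t_kernel(d: float) -> float:
    """Unnormalized t-SNE similarity (Student-t, one degree of freedom)."""
    return 1.0 / (1.0 + d * d)


def gaussian_distance_for(s: float) -> float:
    """Map distance at which gaussian_kernel(d) == s, for 0 < s <= 1."""
    return math.sqrt(-math.log(s))


def student_t_distance_for(s: float) -> float:
    """Map distance at which student_t_kernel(d) == s, for 0 < s <= 1."""
    return math.sqrt(1.0 / s - 1.0)


def tsne_q_matrix(Y: Sequence[Sequence[float]]) -> List[List[float]]:
    """Joint low-dimensional similarities q_ij, normalized over all pairs i != j."""
    n = len(Y)
    w = [[0.0 if i == j else student_t_kernel(math.dist(Y[i], Y[j]))
          for j in range(n)] for i in range(n)]
    total = sum(sum(row) for row in w)
    return [[x / total for x in row] for row in w]


if __name__ == "__main__":
    for s in (0.5, 0.1, 0.01):
        print(f"s={s}: gaussian d={gaussian_distance_for(s):.2f}, "
              f"student-t d={student_t_distance_for(s):.2f}")
```

For a similarity of 0.01, the Gaussian kernel gives a distance of about 2.15, but the Student-t kernel gives about 9.95. The heavy tail is what leaves room to push dissimilar points far apart on the map.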
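The note on perplexity values above 50 concerns the t-SNE parameter that sets the effective number of neighbours. In the t-SNE paper listed above, each point i gets a Gaussian bandwidth σ_i, chosen by binary search so that the perplexity 2^H(P_i) of its conditional distribution matches a user-given target. The sketch below follows that scheme under stated assumptions and uses hypothetical helper names. It parametrizes the search by β = 1/(2σ_i²) and measures entropy in nats, so the perplexity is e^H. This equals 2^H when H is measured in bits.

```python
import math
from typing import List, Sequence, Tuple


def conditional_probs(sq_dists: Sequence[float], beta: float) -> List[float]:
    """p_{j|i} for one point i, given squared distances to the other points."""
    m = min(sq_dists)  # shift for numerical stability; cancels after normalizing
    w = [math.exp(-beta * (d - m)) for d in sq_dists]
    total = sum(w)
    return [x / total for x in w]


def perplexity(p: Sequence[float]) -> float:
    """exp of the Shannon entropy in nats (equals 2**H with H in bits)."""
    h = -sum(x * math.log(x) for x in p if x > 0.0)
    return math.exp(h)


def calibrate_beta(sq_dists: Sequence[float], target: float,
                   tol: float = 1e-5, max_iter: int = 200) -> Tuple[float, List[float]]:
    """Binary search for beta = 1 / (2 sigma_i^2) so that p_{j|i} hits the target perplexity."""
    lo, hi = 0.0, math.inf
    beta = 1.0
    for _ in range(max_iter):
        perp = perplexity(conditional_probs(sq_dists, beta))
        if abs(perp - target) < tol:
            break
        if perp > target:
            # Distribution too flat: sharpen the kernel by increasing beta.
            lo = beta
            beta = beta * 2.0 if math.isinf(hi) else (beta + hi) / 2.0
        else:
            # Distribution too peaked: widen the kernel by decreasing beta.
            hi = beta
            beta = (beta + lo) / 2.0
    return beta, conditional_probs(sq_dists, beta)
```

The perplexity of a distribution over the n - 1 other points is at most n - 1, reached when all of them are equally likely. A target perplexity therefore only makes sense well below the number of points, which fits the observation that larger datasets can use larger values.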