
Materials

Mikhail Koltsov edited this page Nov 8, 2016 · 9 revisions

Articles

  1. VP trees: A data structure for finding stuff fast
    The article from which the bhtsne implementation of vantage-point (VP) trees originates.
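As a rough illustration of the data structure the article describes, here is a minimal vantage-point tree sketch. It is not the bhtsne code: the names `VPNode`, `build` and `nearest` are made up for this example, and it assumes Euclidean distance and a randomly chosen vantage point.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class VPNode:
    """One node: a vantage point, a radius, and two subtrees (hypothetical layout)."""

    def __init__(self, point, radius, inside, outside):
        self.point = point
        self.radius = radius
        self.inside = inside    # points closer to the vantage point than radius
        self.outside = outside  # points at distance >= radius


def build(points, rng=None):
    """Recursively build a VP tree from a list of points."""
    rng = rng or random.Random(0)
    if not points:
        return None
    points = list(points)
    vantage = points.pop(rng.randrange(len(points)))
    if not points:
        return VPNode(vantage, 0.0, None, None)
    # Split the remaining points at the median distance to the vantage point.
    dists = sorted(euclidean(vantage, p) for p in points)
    radius = dists[len(dists) // 2]
    inside = [p for p in points if euclidean(vantage, p) < radius]
    outside = [p for p in points if euclidean(vantage, p) >= radius]
    return VPNode(vantage, radius, build(inside, rng), build(outside, rng))


def nearest(node, query, best=None):
    """Return (distance, point) for the nearest neighbour of query."""
    if node is None:
        return best
    d = euclidean(query, node.point)
    if best is None or d < best[0]:
        best = (d, node.point)
    # Search the side that contains the query first.
    if d < node.radius:
        first, second = node.inside, node.outside
    else:
        first, second = node.outside, node.inside
    best = nearest(first, query, best)
    # By the triangle inequality, the other side can only hold a closer
    # point if the current best ball crosses the splitting sphere.
    if abs(d - node.radius) <= best[0]:
        best = nearest(second, query, best)
    return best
```

Usage: `tree = build([(0.0, 0.0), (1.0, 1.0), (3.0, 4.0)])`, then `nearest(tree, (0.9, 1.2))` returns the distance and the closest stored point. The pruning test is what lets the tree skip whole subtrees during a search.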

Videos

  1. Design at Large - Laurens van der Maaten, Visualizing Data Using Embeddings.
    Interesting points:
  • PCA preserves global structure, while t-SNE aims to preserve local structure (nearest neighbours);
  • the Student-t distribution lets us place dissimilar points farther apart on the map;
  • we can use t-SNE to evaluate our machine learning feature design (i.e. to check that similar objects get similar features);
  • we can use t-SNE to spot weaknesses in the data (e.g. denormalization);
  • matrix factorization is used in machine learning because it gives a compact representation of the data, and we can use the matrix rows as points;
  • to plot co-authorship or synonym data we can use multiple maps t-SNE. The number of maps can be chosen by looking at the KL divergence as a function of the number of maps;
  • for larger datasets, perplexity values above 50 can be used.
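The Student-t point above can be checked with a small calculation. The sketch below is only illustrative: it compares unnormalized kernels, a Gaussian `exp(-d^2)` and a Student-t with one degree of freedom `1 / (1 + d^2)`, and ignores the normalization over all pairs that t-SNE applies. The function names are made up for this example.

```python
import math


def gaussian_kernel(d):
    """Unnormalized Gaussian similarity at distance d (unit bandwidth assumed)."""
    return math.exp(-d * d)


def student_t_kernel(d):
    """Unnormalized Student-t similarity (one degree of freedom) at distance d."""
    return 1.0 / (1.0 + d * d)


def gaussian_distance_for(s):
    """Distance at which the Gaussian kernel equals similarity s, 0 < s <= 1."""
    return math.sqrt(-math.log(s))


def student_t_distance_for(s):
    """Distance at which the Student-t kernel equals similarity s, 0 < s <= 1."""
    return math.sqrt(1.0 / s - 1.0)


if __name__ == "__main__":
    # For each target similarity, print the distance each kernel needs to reach it.
    for s in (0.5, 0.1, 0.01, 0.001):
        print(s, gaussian_distance_for(s), student_t_distance_for(s))
```

For any similarity below 1, the Student-t kernel needs a larger distance than the Gaussian to reach the same value, and the gap widens as the similarity shrinks. That is the sense in which the heavy tail leaves room to push dissimilar points apart on the map.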

Research Papers

Studied

  1. Van Der Maaten L., Hinton G. Visualizing data using t-SNE //Journal of Machine Learning Research. – 2008.
  2. Van Der Maaten L. Accelerating t-SNE using tree-based algorithms //Journal of Machine Learning Research. – 2014.

Viewed

  1. Hinton G. E., Roweis S. T. Stochastic neighbor embedding //Advances in neural information processing systems. – 2002.
  2. Yang Z., Peltonen J., Kaski S. Optimization Equivalence of Divergences Improves Neighbor Embedding //ICML. – 2014.
    The authors prove a result relating graph-visualization and point-visualization approaches, and show how t-SNE performs on graph visualization (in the context of arguing that their ws-SNE approach is superior).
  3. Biuk-Aghai R. P. Visualizing co-authorship networks in online Wikipedia //2006 International Symposium on Communications and Information Technologies. – IEEE, 2006.

To read

  1. Venna J. et al. Information retrieval perspective to nonlinear dimensionality reduction for data visualization //Journal of Machine Learning Research. – 2010.
  2. Vladymyrov M., Carreira-Perpinan M. Partial-Hessian strategies for fast learning of nonlinear embeddings //arXiv preprint arXiv:1206.4646. – 2012.

On visualizing clustered overlapping data

  1. Vihrovs J. et al. An inverse distance-based potential field function for overlapping point set visualization //Information Visualization Theory and Applications (IVAPP), 2014 International Conference on. – IEEE, 2014.
  2. Santamaría R., Therón R. Overlapping clustered graphs: co-authorship networks visualization //International Symposium on Smart Graphics. – Springer Berlin Heidelberg, 2008.
  3. Vehlow C., Beck F., Weiskopf D. The state of the art in visualizing group structures in graphs //Eurographics Conference on Visualization (EuroVis)-STARs. – 2015.