Materials

Articles

VP trees: A data structure for finding stuff fast
The article from which the bhtsne implementation of vantage-point (VP) trees originates;
How to use t-SNE effectively
An article about common pitfalls when interpreting t-SNE results.
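
For readers who have not met VP trees before, here is a minimal sketch of the idea in Python. It is not the bhtsne code, and the names VPNode, build_vp_tree and nearest are made up for this illustration. Each node stores a vantage point and the median distance from it to the remaining points, so points closer than that radius go into one subtree and the rest into the other. During a search, the triangle inequality lets us skip a subtree that cannot contain a closer point.

```python
import math
import random


class VPNode:
    """One node of the tree (hypothetical helper for this sketch)."""

    def __init__(self, point, radius, inside, outside):
        self.point = point      # the vantage point
        self.radius = radius    # median distance from the vantage point
        self.inside = inside    # subtree of points with distance < radius
        self.outside = outside  # subtree of points with distance >= radius


def build_vp_tree(points, dist, rng):
    """Build a VP tree recursively; vantage points are picked at random."""
    points = list(points)
    if not points:
        return None
    vantage = points.pop(rng.randrange(len(points)))
    if not points:
        return VPNode(vantage, 0.0, None, None)
    dists = [dist(vantage, p) for p in points]
    radius = sorted(dists)[len(dists) // 2]
    inside = [p for p, d in zip(points, dists) if d < radius]
    outside = [p for p, d in zip(points, dists) if d >= radius]
    return VPNode(vantage, radius,
                  build_vp_tree(inside, dist, rng),
                  build_vp_tree(outside, dist, rng))


def nearest(node, query, dist, best=None):
    """Return (distance, point) for the exact nearest neighbour of query."""
    if node is None:
        return best
    d = dist(query, node.point)
    if best is None or d < best[0]:
        best = (d, node.point)
    if d < node.radius:
        # The query lies inside the ball, so look there first.
        best = nearest(node.inside, query, dist, best)
        # The outside can only help if the search ball crosses the boundary.
        if d + best[0] >= node.radius:
            best = nearest(node.outside, query, dist, best)
    else:
        best = nearest(node.outside, query, dist, best)
        if d - best[0] < node.radius:
            best = nearest(node.inside, query, dist, best)
    return best


points = [(0, 0), (1, 1), (2, 2), (5, 5)]
tree = build_vp_tree(points, math.dist, random.Random(0))
print(nearest(tree, (1.2, 0.9), math.dist)[1])
```

The search is exact, so the example prints (1, 1) whichever vantage points happen to be picked. The recursion assumes a small dataset with few duplicate points; many identical points would make the tree very deep.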

PCA preserves global structure, while t-SNE aims to preserve local structure (nearest neighbours);
the heavy tails of the Student-t distribution let us place dissimilar points farther apart on the map;
we can use t-SNE to evaluate our machine learning feature design (i.e. to check that similar objects get similar features);
we can use t-SNE to observe data weaknesses (e.g. denormalization);
matrix factorization is used in machine learning because it allows a compact representation of the data, and we can use the matrix rows as points;
to plot co-authorship or synonym data we can use multiple maps t-SNE. The number of maps can be chosen by looking at the KL divergence as a function of the number of maps;
larger datasets can use a perplexity higher than 50.
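
The note about the Student-t distribution can be made concrete with a small calculation. The sketch below is a simplification: it compares the unnormalized kernels exp(-d^2) and 1 / (1 + d^2) and ignores the normalization over all pairs that t-SNE actually applies. The function names are made up for this illustration. For a given similarity value q it computes the distance at which each kernel produces q.

```python
import math


def gaussian_distance(q):
    """Distance d at which the Gaussian kernel exp(-d**2) equals q."""
    return math.sqrt(-math.log(q))


def student_t_distance(q):
    """Distance d at which the Student-t kernel 1 / (1 + d**2) equals q."""
    return math.sqrt(1.0 / q - 1.0)


for q in (0.5, 0.1, 0.01):
    print(f"similarity {q}: "
          f"Gaussian d = {gaussian_distance(q):.2f}, "
          f"Student-t d = {student_t_distance(q):.2f}")
```

For q = 0.01 the Gaussian kernel needs a distance of about 2.15, while the Student-t kernel needs about 9.95. The same low similarity therefore corresponds to a much larger distance on the map, which is what gives dissimilar points room to spread out.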