GitHub - poldrack/LatentStructure: code and data from Latent Structure modeling paper

Branches Tags

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
CCA		CCA
NIF-Disorders		NIF-Disorders
clustering		clustering
cogatlas		cogatlas
src		src
stopwords		stopwords
topic_modeling		topic_modeling
README		README

Repository files navigation

This repository contains code and relevant files for the Latent Structure
modeling project described in:

Poldrack RA, Mumford JA, Schonberg T, Kalar D, Barman B, Yarkoni T (2012). Discovering relations between mind, brain, and mental disorders using topic mapping. PLOS Computational Biology, in press.

Preprint available at:

http://talyarkoni.org/papers/Poldrack_et_al_in_press_PLoS_CompBio.pdf

Guide to subdirectories:
CCA - contains results from CCA analysis

NIF-Disorders - contains files used for disorders topic modeling

clustering - contains files used for clustering of disorders

cogatlas - contains files used for cogatlas topic modeling

src: contains the source files used for all analyses. these require the following
external libraries:

http://numpy.scipy.org/
http://scikit-learn.org/stable/
http://rpy.sourceforge.net/rpy2.html
http://nipy.sourceforge.net/nibabel/

also note that src/utils needs to be in the python path

1_db_foci_to_image.py - obtains coordinates from neurosynth database and creates images
- uses utils/tal_to_mni.py

1.1_merge_peakfiles.sh - creates merged image and computes mask of voxels with activation
on at least 1% of papers

1.2_extract_nidag_text.py - extract full text of each article from database

2_get_cogat_concepts.py - read cognitive atlas concepts from RDF and get loading for each document

3_create_disorder_list.py - load NIF dysfunction ontology, grab synonyms, and add
additional missing terms

4_get_disorders_NIF.py - get loading for each disorder term from corpus

5_mk_cogatlas_docs.py - make documents based on cogatlas loadings

6_mk_disorders_docs.py - make documents based on disorder loadings

7_mk_pickled_data.py - created a pickle with the full dataset to make loading easier

8_mk_8fold_cogatlas.py - make files for 8-fold CV

9_mk_8fold_disorders.py - make files for 8-fold CV

10_mk_8fold_cogatlas_mallet_data.py - make mallet data from 8-fold files

11_mk_8fold_disorders_mallet_data.py - make mallet data from 8-fold files

12_mk_8fold_cogatlas_topic_models.py - create scripts to run mallet jobs

13_mk_8fold_disorders_topic_models.py - create scripts to run mallet jobs

14_get_best_topic_likelihood.py - check likelihoods to get get dimenstionality

15.1_get_disorders_dimensionality.py - generate additional topic models to get disorder dimensionality

15.3_get_best_disorder_dimensionality.py - get dimensionality that has unique topic dists

15_run_topic_models.py - generate scripts to run final topic models

16_mk_cogatlas_loadingdata.py - load topic data and save loadingdata.txt

17_mk_disorders_loadingdata.py - load topic data and save loadingdata.txt

18_mk_all_chisq_maps.py - make all chisquare maps - uses utils/mk_chisq_maps_filter.py

18.2_mk_6mm_chisq_maps.sh - make 6mm versions of topic

19_mk_slice_images_cogatlas.py - make slice images using p values

20_mk_slice_images_disorders.py - make slice images using p values

22_mk_latexreport_cogatlas.py - create latex report

23_mk_latexreport_disorders.py - create latex report

24_run_CCA_nonneg.py - run CCA analysis

25_cluster_disorders.R - run clustering on disorders data

26_make_cognitive_topic_figure.py - generate figure 2 from initial submission

27_make_disorders_topic_figure.py - generate figure 3 from initial submission

28_mk_8fold_fulltext_topic_models.py - create scripts to run fulltext topic models

28.1_mk_8fold_fulltext_mallet_data.py - make 8-fold data for full text analysis

28.2_run_all_fulltext_nfold.sh - run mallet jobs on 8-fold full text

28.3_get_best_dimensionality.py - get best dimensionality for full text

30_get_wordhist.py - get histograms of # of docs for each filtered paper

31_count_locations.py - count # of locations reported across all papers

32_doc_topic_hist.py - make histograms of docs/topic and topics/doc (for Figure 3)

33_plot_likelihood.py - plot empirical likelihood as function of # of topics (for Figure 2)

34_run_CCA_on_topics.py - run CCA directly on topic distributions

35_mk_fold1_loadingdata.py - get loadingdata for fold1 for each ntopics

36.1_mk_slice_corr_images.disorders.py - make images using correlation rather than p value

36_mk_slice_corr_images.cogatlas.py - make images using correlation rather than p value (for figure 

37_make_cognitive_topic_corr_figure.py - make Figure 4

38_make_disorder_topic_corr_figure.py - make Figure 6

39_get_topic_hierarchy.py - create graphviz .dot file for topic hierarchy (Figure 5)