mGSEA (modified gene set enrichment analysis)
Performs GSEA on a fuzzy-ranked gene list, i.e., a list in which genes are not strictly ordered.
If you use this script in your research work, please cite the following paper:
Genomic diversity and post-admixture adaptation in the Uyghurs (National Science Review, 2021)
- python2.7
- argparse
- numpy
- pandas
- concurrent (concurrent.futures; on Python 2.7 this is provided by the futures backport package)
python mGSEA.py -h
python mGSEA.py \
--qrank genelist.rnk \
--gset kegg.gmt \
--qsize 0.005 \
--perm 2000 \
--threads 50 \
--out output
Details about the arguments
--qrank (required): Ranked gene list. File format: two columns (gene, "quantile value"), tab-delimited, no header, no duplicated genes. It is better to keep only genes in the background gene set, e.g., KEGG genes. The "quantile value" must lie between 0 and 1.0; otherwise the script will not work correctly.
--gset (required): A priori target gene sets in gmt format. All of the genes should be present in the ranked gene list.
--qsize (required): Window size of the non-overlapping quantile regions, between 0 and 1.0, e.g., 0.005
--perm (optional): Number of permutations. Default: 2000
--threads (optional): Number of threads. Default: 50
--out (required): Prefix for the output file names
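The input requirements above can be checked before a long run. The following is a minimal sketch, not part of mGSEA.py: the function names (read_rank, read_gmt, missing_genes) are hypothetical, and the gmt layout is assumed to be the usual one (set name, description, then gene names, all tab-delimited).

```python
import io


def read_rank(handle):
    """Read a two-column, tab-delimited rank file into {gene: quantile}.

    Raises ValueError on a malformed line, a duplicated gene,
    or a quantile value outside [0, 1].
    """
    ranks = {}
    for lineno, line in enumerate(handle, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue  # tolerate blank lines
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError("line %d: expected 2 tab-delimited columns" % lineno)
        gene, value = fields[0], float(fields[1])
        if gene in ranks:
            raise ValueError("line %d: duplicated gene %s" % (lineno, gene))
        if not 0.0 <= value <= 1.0:
            raise ValueError("line %d: quantile %r outside [0, 1]" % (lineno, value))
        ranks[gene] = value
    return ranks


def read_gmt(handle):
    """Read gene sets from gmt lines (assumed: name, description, genes...)."""
    gene_sets = {}
    for line in handle:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            continue  # skip lines that carry no genes
        gene_sets[fields[0]] = fields[2:]
    return gene_sets


def missing_genes(ranks, gene_sets):
    """Return {set name: sorted genes absent from the ranked list}."""
    missing = {}
    for name, genes in gene_sets.items():
        absent = sorted(set(genes) - set(ranks))
        if absent:
            missing[name] = absent
    return missing


if __name__ == "__main__":
    # In-memory stand-ins for genelist.rnk and kegg.gmt (made-up gene names).
    rank_file = io.StringIO(u"GENE1\t0.995\nGENE2\t0.400\n")
    gmt_file = io.StringIO(u"PATHWAY_A\tna\tGENE1\tGENE3\n")
    ranks = read_rank(rank_file)
    print(missing_genes(ranks, read_gmt(gmt_file)))
```

With real files, replace the io.StringIO objects with open("genelist.rnk") and open("kegg.gmt"). Any gene reported by missing_genes should be removed from the gene sets or added to the ranked list before running mGSEA.py.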
*.enrichment_score_quarter.txt.gz: enrichment scores after permutation
*.cumulative_score.txt.gz: cumulative scores for the observed data
*.enrichment_Pvalue.txt: combined summary statistics
Gene Set Enrichment Analysis (GSEA) is a statistical method based on the Kolmogorov-Smirnov test that identifies the enrichment of biologically functional categories in a ranked gene list. In our study, however, the gene list was not strictly ranked, owing to the limited resolution of the empirical P-values. We therefore modified the traditional GSEA by sorting genes according to their quantiles (1.0 - P-value) into 200 bins of size 0.005.
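The binning step can be illustrated with a short sketch. It is an assumption-laden illustration, not the code in mGSEA.py: the function names are hypothetical, bins are assumed to be left-closed, and a quantile of exactly 1.0 is assumed to fall into the last bin.

```python
def quantile_bin(quantile, qsize=0.005):
    """Map a quantile in [0, 1] to the index of its non-overlapping bin.

    With qsize=0.005 there are 200 bins, indexed 0..199.
    Assumption: bin k covers [k * qsize, (k + 1) * qsize), and a quantile
    of exactly 1.0 is placed in the last bin.
    """
    if not 0.0 <= quantile <= 1.0:
        raise ValueError("quantile %r outside [0, 1]" % quantile)
    n_bins = int(round(1.0 / qsize))
    return min(int(quantile / qsize), n_bins - 1)


def bin_genes(ranks, qsize=0.005):
    """Group genes by bin index; genes in the same bin are treated as tied."""
    bins = {}
    for gene, quantile in ranks.items():
        bins.setdefault(quantile_bin(quantile, qsize), []).append(gene)
    return bins


if __name__ == "__main__":
    # Made-up quantiles (1.0 - empirical P-value) for three genes.
    example = {"GENE1": 0.9999, "GENE2": 0.9990, "GENE3": 0.2012}
    for index, genes in sorted(bin_genes(example).items(), reverse=True):
        print(index, sorted(genes))
```

Here GENE1 and GENE2 land in the same top bin, so their relative order is not used, which is the sense in which the ranking is fuzzy.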
For a given gene set
in which the
The cumulative score
in which the
By: Yuwen Pan, 2021
Contact: panyuwen.x@gmail.com