This is an implementation of the epiBAT algorithm for SNP epistasis detection. The epiBAT algorithm is based on the bat algorithm and the tabu search. It uses two objective functions: K2-score and Gini score. Objectives are not combined, but are separately used in two populations. The best bats of both populations are then combined together to obtain the candidate set, which is evaluated for significance with modified G-test. The epiBAT algorithm is implemented in Python3 with numpy, pandas and scipy library.
The epiBAT algorithm requires Python 3.6 with numpy, pandas and scipy library
Input parameters must be specified in the head of the epiBAT.py script. There are 6 parameters:
gini_population
represents the number of agents in a population of bats with Gini score as its objective
k2_population
represents the number of agents in a population of bats with K2 score as its objective
iteration_size
denotes the number of iterations for which the epiBAT algorithm will be running
alpha_value
denotes p-value threshold (before Bonferroni correction) which must be passed by the SNP combination to be said as significant
searching_path
path to the input file
output_file
path to the file, where the results of the epiBAT algorithm will be outputted
Other optional parameters that can be specified by the user are:
freq_min
represents the minimum possible frequency of a bat
freq_max
represents the maximum possible frequency of a bat
alpha
represents the alpha parameter in the bat algorithm (should hold that 0<alpha<1)
gamma
represents the gamma parameter in the bat algorithm (should hold that 0<gamma)
min_loudness_A
represents the minimum initial loudness of a bat
max_loudness_A
represents the maximum initial loudness of a bat
required_iterations_for_taboo
represents the number of iterations when the best solution in a population is not changed, that is needed to add that best solution to the tabu table
zeta_radius
defines the approximity when comparing solutions
Input data must be in a usual comma-delimited file containing the case-control genotype data. The first line in a file denotes the SNP IDs, whereas the last column denotes the class, i.e. case or control. The following rows contain the genotype data and the disease state, while the genotype data should have values 0, 1, 2 (i.e. the homozygous major allele, heterozygous allele, and homozygous minor allele), and the disease state should have values 0 and 1 (i.e. control and case). Example of the input data containing 5 SNPs and 3 samples is as follows:
X0,X1,X2,X3,X4,X5,Class
0,0,0,1,0,0
0,2,0,2,1,0
1,0,0,1,1,1
After setting the input parameters and preparing your input data, open the command line, go to the directory, where you have downloaded the epiBAT.py script and call:
epiBAT.py
The epiBAT algorithm outputs in a specified file results of its 1st stage (i.e. candidate set before evaluation by the G-test) and also final results (i.e. candidate set after evaluation by the G-test) with the corresponding p-value.