### CONFIGURATION FILE FOR GWASTOOLKIT ###
### Precede your comments with a #-sign.
###
### Set the directory variables, the order doesn't matter.
### Don't end the directory variables with '/' (forward-slash)!
### REQUIRED: Path_to where the software resides on the server.
SOFTWARE="/hpc/local/Rocky8/dhl_ec/software"
### REQUIRED: Path_to where GWASToolKit resides on the server.
# GWASTOOLKITDIR="${SOFTWARE}/GWASToolKit"
GWASTOOLKITDIR="/hpc/dhl_ec/tpeters/git_repos/GWASToolKit"
### REQUIRED: Path_to support programs on the server
SNPTEST="${SOFTWARE}/snptest_v2.5.4"
PLINK2="${SOFTWARE}/plink2"
LOCUSZOOM13="${SOFTWARE}/locuszoom_1.3/bin/locuszoom"
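### The rule near the top of this file (directory variables must not end with '/') can be
### checked mechanically. A minimal sketch, assuming a hypothetical helper name;
### strip_trailing_slash is not part of GWASToolKit, and the path used is only an example.

```shell
# Hypothetical helper: remove any trailing '/' from a path, leaving a lone "/" intact.
strip_trailing_slash() {
  local path="$1"
  while [ "${path}" != "/" ] && [ "${path%/}" != "${path}" ]; do
    path="${path%/}"
  done
  printf '%s\n' "${path}"
}

strip_trailing_slash "/hpc/local/software/"   # prints /hpc/local/software
```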
### REQUIRED: SLURM settings -- these should work universally
### FOR GWAS
QMEMGWAS="8G" # '8Gb' for GWAS
QTIMEGWAS="12:00:00" # 12 hours for GWAS
QMEMGWASCLUMP="164G" # 164Gb requested for clumping
QTIMEGWASCLUMP="12:00:00" # 12 hours for clumping
QMEMGWASPLOT="8G" # 8Gb for snptest plotter
QTIMEGWASPLOT="12:00:00" # 12 hours for plotter
QMEMGWASPLOTQC="8G" # 8Gb for plotter qc
QTIMEGWASPLOTQC="12:00:00" # 12 hours for plotter qc
QMEMGWASLZOOM="4G" # 4Gb needed for locuszoom
QTIMEGWASLZOOM="01:00:00" # 1 hour for locuszoom
QMEMGWASCLEANER="4G" # 4Gb needed for cleaner
QTIMEGWASCLEANER="01:00:00" # 1 hour to clean
### FOR VARIANT
QMEMVAR="8G" # 8Gb for variants
QTIMEVAR="00:15:00" # 15mins for variants
QMEMVARCLEANER="4G" # 4Gb needed for cleaner
QTIMEVARCLEANER="01:00:00" # 1 hour to clean
### FOR REGION
QMEMREG="8G" # 8Gb for regions
QTIMEREG="00:30:00" # 30mins for regions
QMEMREGCLEANER="4G" # 4Gb needed for cleaner
QTIMEREGCLEANER="01:00:00" # 1 hour to clean
### FOR GENE
QMEMGENE="8G" # 8Gb for genes
QTIMEGENE="00:30:00" # 30 minutes for genes
QMEMGENEQC="4G" # 4 Gb for snptest qc
QTIMEGENEQC="00:30:00" # 30 minutes for snptest qc
QMEMGENELZOOM="4G" # 4Gb for locuszoom
QTIMEGENELZOOM="00:15:00" # 15mins for locuszoom
QMEMGENECLEANER="4G" # 4Gb needed for cleaner
QTIMEGENECLEANER="01:00:00" # 1 hour to clean
### REQUIRED: mailing settings
### Your e-mail address; you'll get an email when the job has ended or when it was aborted
### 'BEGIN' Mail is sent at the beginning of the job;
### 'END' Mail is sent at the end of the job;
### 'FAIL' Mail is sent when the job fails.
### 'REQUEUE' Mail is sent when the job is re-queued;
### 'ALL' Mail sent for all the above.
YOUREMAIL="t.s.peters-4@umcutrecht.nl"
MAILSETTINGS="FAIL"
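### The memory, time, and mail settings above correspond to standard sbatch options
### (--mem, --time, --mail-user, --mail-type). A minimal sketch of that mapping; how
### GWASToolKit itself submits jobs is not shown in this file, so the helper name
### slurm_opts and the example e-mail address are assumptions for illustration only.

```shell
# Hypothetical helper (not part of GWASToolKit): print the sbatch options that a
# memory/time pair and the mail settings would translate into. Nothing is submitted.
slurm_opts() {
  local mem="$1" time="$2" email="$3" mailtype="$4"
  printf -- '--mem=%s --time=%s --mail-user=%s --mail-type=%s\n' \
    "$mem" "$time" "$email" "$mailtype"
}

# Example with the GWAS defaults and a placeholder address.
slurm_opts "8G" "12:00:00" "user@example.org" "FAIL"
# prints: --mem=8G --time=12:00:00 --mail-user=user@example.org --mail-type=FAIL
```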
### ANALYSIS SETTINGS
### REQUIRED: Path_to where the main analysis directory resides. Make sure that it exists
PROJECTDIR="/hpc/dhl_ec/tpeters/git_repos/GWASToolKit/test_output"
### REQUIRED: Name of the project, this will automatically be made.
PROJECTNAME="test_project"
### REQUIRED: Analysis settings.
### You can choose one of these options [GWAS/VARIANT/REGION/GENES].
ANALYSIS_TYPE="GWAS"
### You can choose one of these options if GWAS is chosen [SNPTEST/REGENIE]
GWAS_TYPE="REGENIE"
### You can choose one of these options [AEGS/AAAGS/CTMM/UCORBIO/MYOMARKER/HELPFULL/RIVM].
STUDY_TYPE="AEGS"
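### Each of the three settings above accepts only a fixed set of values. A minimal sketch
### of a check that could catch typos before any job is submitted; is_one_of is a
### hypothetical helper and not part of GWASToolKit.

```shell
# Hypothetical helper: succeed if the first argument equals one of the remaining ones.
is_one_of() {
  local value="$1"
  shift
  local option
  for option in "$@"; do
    [ "$value" = "$option" ] && return 0
  done
  return 1
}

# Example: validate an analysis type against the options listed above.
if is_one_of "GWAS" GWAS VARIANT REGION GENES; then
  echo "ANALYSIS_TYPE is valid"
fi
```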
### REQUIRED
### Indicate the file extension used for the genetic data [bgen, gen, gen.gz, vcf, vcf.gz]
GENETICEXTENSION="vcf.gz"
### REQUIRED: give a list of covariates in a file
### Example covariate-list format:
### COHORT Age sex PC1 PC2
COVARIATE_FILE="${PROJECTDIR}/covariates.txt"
### REQUIRED: give a list of phenotypes to be analyzed
PHENOTYPE_FILE="${PROJECTDIR}/phenotypes.txt"
### SPECIFIC DATA SETTINGS
### REQUIRED: location of [imputed] data to use -- the file format must match GENETICEXTENSION.
#
### AEGS, 1000G phase 3 and HRC r1.1 combined (Michigan Imputation Server) - vcf.gz files
# (note that b37 codes chromosomes as '[#]', for example '1'; this is reflected in the code hence the inclusion of 'chr' in the file names)
# b37 -- AEGS_COMBINED_EAGLE2_1000Gp3v5HRCr11 version
# IMPUTEDDATA="/hpc/dhl_ec/data/_ae_originals/AEGS_COMBINED_EAGLE2_1000Gp3v5HRCr11/aegs.qc.1kgp3hrcr11.chr"
# IMPUTEDDATA_CHRX="/hpc/dhl_ec/data/_ae_originals/AEGS_COMBINED_EAGLE2_1000Gp3v5HRCr11/_chr23_1kg_gonl5/aegs.1kgp3gonl5.chr"
#
### AEGS, TOPMed r3, f10, b38 - vcf.gz files
# (note that b38 codes chromosomes as 'chr[#]', for example 'chr1')
# IMPUTEDDATA="/hpc/dhl_ec/data/_ae_originals/AEGS_QC_imputation_2023/aegscombo/_topmed_r3_f10_b38/aegscombo.topmed_r3_f10_b38.split_norm_af_filter.chr"
IMPUTEDDATA="/hpc/dhl_ec/tpeters/regenie_pgen/converted/aegscombo_topmed_r3_f10_b38.chr"
### AAAGS, 1000G phase 3, GoNL5
# IMPUTEDDATA="/hpc/dhl_ec/data/_aaa_originals/AAAGS_IMPUTE2_1000Gp3_GoNL5/aaags_1kGp3GoNL5_RAW_chr"
### AAAGS, 1000G phase 3 (Michigan Imputation Server)
# IMPUTEDDATA="/hpc/dhl_ec/data/_aaa_originals/AAAGS_EAGLE2_1000Gp3/aaags.1kgp3.chr"
### AAAGS, HRC r1.1 (Michigan Imputation Server)
# IMPUTEDDATA="/hpc/dhl_ec/data/_aaa_originals/AAAGS_EAGLE2_HRC_r11_2016/aaags.hrc_r11_2016.chr"
### CTMMGS, 1000G phase 3, GoNL5
# IMPUTEDDATA="/hpc/dhl_ec/data/_ctmm_originals/CTMMAxiomTX_IMPUTE2_1000Gp3_GoNL5/ctmm_1kGp3GoNL5_RAW_chr"
### CTMMGS, 1000G phase 3 (Michigan Imputation Server)
# IMPUTEDDATA="/hpc/dhl_ec/data/_ctmm_originals/CTMMAxiomTX_EAGLE2_1000Gp3/ctmmgs.1kgp3.chr"
### CTMMGS, HRC r1.1 (Michigan Imputation Server)
# IMPUTEDDATA="/hpc/dhl_ec/data/_ctmm_originals/CTMMAxiomTX_EAGLE2_HRC_r11_2016/ctmmgs.hrc_r11_2016.chr"
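### The IMPUTEDDATA values above are file-name prefixes that end in 'chr'. A minimal sketch
### of how a per-chromosome file name could be derived, ASSUMING the pattern
### <prefix><chromosome>.<GENETICEXTENSION>; this file implies that pattern but does not
### state it, and imputed_file and the example prefix are hypothetical.

```shell
# Hypothetical helper: build a per-chromosome file name from prefix, chromosome, extension.
imputed_file() {
  local prefix="$1" chromosome="$2" extension="$3"
  printf '%s%s.%s\n' "$prefix" "$chromosome" "$extension"
}

# Example with a made-up prefix; lists the names for chromosomes 1 to 3.
for chromosome in 1 2 3; do
  imputed_file "/data/example/study.chr" "$chromosome" "vcf.gz"
done
```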
### REQUIRED: location of sample file.
#
### AEGS
# b37 -- AEGS_COMBINED_EAGLE2_1000Gp3v5HRCr11 version
# SAMPLE_FILE="${PROJECTDIR}/20230523.PCSK9.AEGS123.females.sample"
# SAMPLE_FILE="${PROJECTDIR}/20230523.PCSK9.AEGS123.males.sample"
# SAMPLE_FILE="${PROJECTDIR}/20230523.PCSK9.AEGS123.sample"
# SAMPLE_FILE_CHRX="${PROJECTDIR}/20230523.PCSK9.AEGS123.chrX.sample"
SAMPLE_FILE="/hpc/dhl_ec/tpeters/regenie_pgen/20240905.IPH_Binary.AEGS123.pheno"
### AAAGS
# SAMPLE_FILE="/hpc/dhl_ec/data/_aaa_originals/pheno_cov_exclusions/aaags_phenocov.sample"
### CTMMGS
# SAMPLE_FILE="/hpc/dhl_ec/data/_ctmm_originals/pheno_cov_exclusions/ctmm_phenocov.sample"
### REQUIRED: exclusion criteria according to the format "-[in/ex]clude_samples_where <name> [=|==|!=] <value>"
### OLD Version
### exclusion-lists
### SampleID123X
### SampleID123Y
### SampleID123Z
### REQUIRED: exclusion requirement; DEFAULT
EXCLUSION_CRITERIA="-exclude_samples_where \"SELECTION\"==\"not_selected\" "
### REQUIRED: provide specific exclusion description, no space, all capitals
EXCLUSION="EXCL_DEFAULT"
# EXCLUSION="EXCL_FEMALES"
# EXCLUSION="EXCL_MALES"
# EXCLUSION="EXCL_CKD"
# EXCLUSION="EXCL_NONCKD"
# EXCLUSION="EXCL_T2D"
# EXCLUSION="EXCL_NONT2D"
# EXCLUSION="EXCL_SMOKER"
# EXCLUSION="EXCL_NONSMOKER"
# EXCLUSION="EXCL_DIURETICS"
# EXCLUSION="EXCL_NONDIURETICS"
### REQUIRED: ANALYSIS SPECIFIC ARGUMENTS
### For per-variant analysis
### EXAMPLE FORMAT
### rs1234 1 12345567
### rs5678 2 12345567
### rs4321 14 12345567
### rs9876 20 12345567
VARIANTLIST="${PROJECTDIR}/variantlist.txt"
### REQUIRED: For GWAS, GENE, REGIONAL, and VARIANT analyses -- options: [STANDARDIZE/RAW]
STANDARDIZE="RAW"
### REQUIRED: You can choose one of these method options [expected/score/newml] -- expected is likely best;
### refer to the SNPTEST documentation for more method options.
### If you choose `-method newml`, you must supply the baseline-phenotype to which the other
### discrete phenotypes are compared.
METHOD="expected"
BASELINEPHENOTYPE="control"
### REQUIRED: You can condition on a (list of) variant(s) [NORMAL/CONDITION]; refer to the SNPTEST documentation.
CONDITION="NORMAL"
# CONDITIONLIST="${PROJECTDIR}/conditionvariants.rs2521501.txt"
CONDITIONLIST="${PROJECTDIR}/conditionvariants.rs17514846.txt"
### REQUIRED: For GWAS -- make PLINK/this work with VCF files NEW VERSION
CLUMP_P2="1"
CLUMP_P1="0.000005" # should be of the form 0.005 rather than 5e-3
CLUMP_R2="0.2"
CLUMP_KB="500"
CLUMP_FIELD="P"
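### CLUMP_P1 above must be written in plain decimal form rather than scientific notation.
### A minimal sketch of converting a value such as 5e-6 with printf; to_decimal is a
### hypothetical helper, and the number of decimals (default 6) is an assumption that
### must be large enough for the threshold in question.

```shell
# Hypothetical helper: print a number in fixed decimal notation.
# $1 = value (e.g. 5e-6), $2 = number of decimals (defaults to 6).
to_decimal() {
  LC_ALL=C printf "%.${2:-6}f\n" "$1"
}

to_decimal 5e-6      # prints 0.000005
to_decimal 5e-3 3    # prints 0.005
```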
### REQUIRED: For regional analysis -- handle this via a file! NEW VERSION
CHR="1" # e.g. 1
REGION_START="154376264" # e.g. 154376264
REGION_END="154476264" # e.g. 154476264
### REQUIRED: For per-gene analysis
GENES_FILE="${PROJECTDIR}/genelist.txt"
### REQUIRED: For GWAS/REGION/GENE analysis
RANGE="500000" # 500000=500kb, needed for GWAS (LocusZoom plots); and GENE analyses (analysis and LocusZoom plots)
### REQUIRED: Filter settings -- specifically, GWAS, GENE and REGIONAL analyses
INFO="0.3"
MAC="6"
CAF="0.005"
BETA_SE="100"
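### The INFO, MAC, and CAF thresholds above act as lower bounds on per-variant statistics.
### A minimal sketch of such a filter with awk, ASSUMING a whitespace-separated table with
### a header line and the hypothetical column order ID INFO MAC CAF. The real result
### columns and the exact use of BETA_SE are not specified in this file, so BETA_SE is
### left out of the sketch.

```shell
# Hypothetical helper: keep the header plus rows that pass all three lower bounds.
# $1 = INFO threshold, $2 = MAC threshold, $3 = CAF threshold; reads the table on stdin.
filter_variants() {
  awk -v info="$1" -v mac="$2" -v caf="$3" \
    'NR == 1 || ($2 >= info && $3 >= mac && $4 >= caf)'
}

# Example with made-up rows: only rs1 passes INFO >= 0.3, MAC >= 6, CAF >= 0.005.
printf '%s\n' \
  'ID INFO MAC CAF' \
  'rs1 0.95 120 0.210' \
  'rs2 0.20 80 0.150' \
  'rs3 0.90 3 0.001' | filter_variants 0.3 6 0.005
```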
### SYSTEM REQUIRED | NEVER CHANGE
OUTPUT_DIR=${PROJECT}/snptest_results # review use of this considering the updates to scripts NEW VERSION
VARIANTID="2" # this can be handled by parseTable! NEW VERSION
PVALUE="17" # this can be handled by parseTable! NEW VERSION
RANGELZ=$(expr "$RANGE" / 1000) # move this to the locuszoom-script! NEW VERSION
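### RANGELZ above converts RANGE from base pairs to kilobases by integer division.
### A minimal sketch of the same arithmetic, using separate example variable names
### (assumptions, so that the real settings are not touched) and showing that expr and
### shell arithmetic expansion give the same result.

```shell
# Example values only; these names are not used by GWASToolKit.
RANGE_EXAMPLE="500000"
RANGELZ_EXPR=$(expr "$RANGE_EXAMPLE" / 1000)   # external expr, as in the line above
RANGELZ_ARITH=$(( RANGE_EXAMPLE / 1000 ))      # built-in arithmetic expansion
echo "${RANGELZ_EXPR} ${RANGELZ_ARITH}"        # prints: 500 500
```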
### REQUIRED: References -- these will be created upon installation
### You can choose one of these options [1kGp3v5GoNL5/1kGp1v3/GoNL4].
REFERENCE="1kGp3v5GoNL5"
REFERENCEDATA="${GWASTOOLKITDIR}/RESOURCES/1000Gp3v5_EUR/1000Gp3v5.20130502.EUR"
### You can choose one of these:
### - refSeq based: refseq_GRCh37_hg19_Feb2009.txt.gz
### - GENCODE based: gencode_v19_GRCh37_hg19_Feb2009.txt.gz
### - PLINK-style gene list: glist-hg19.gz
HG19_GENES="${GWASTOOLKITDIR}/RESOURCES/glist-hg19.gz"
# REGENIE Addition
REGENIE="${SOFTWARE}/mambaforge3/envs/gwasregenie/bin/regenie"
REGENIE_CALL_RATE="0.10" # PLINK call rate with the flag --geno
REGENIE_MAF="0.10" # PLINK MAF with the flag --maf
REGENIE_HWE="1e-3" # PLINK Hardy-Weinberg Equilibrium (HWE) with the flag --hwe
REGENIE_PRUNE="100 10 0.2" # PLINK LD pruning with the flag --indep-pairwise: window of 100 variants, step of 10 variants, r^2 threshold of 0.2
REGENIE_STEP1_BZISE="1000"
REGENIE_STEP2_BZISE="1000"
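### The QC settings above name the PLINK flags they feed (--geno, --maf, --hwe,
### --indep-pairwise). A minimal sketch that only prints the resulting options; how the
### GWASToolKit scripts actually call PLINK is not shown here, and plink_qc_args is a
### hypothetical helper.

```shell
# Hypothetical helper: print the PLINK QC options for the four settings.
# $1 = call rate, $2 = MAF, $3 = HWE p-value, $4 = pruning triple "window step r2".
plink_qc_args() {
  printf -- '--geno %s --maf %s --hwe %s --indep-pairwise %s\n' "$1" "$2" "$3" "$4"
}

plink_qc_args "0.10" "0.10" "1e-3" "100 10 0.2"
# prints: --geno 0.10 --maf 0.10 --hwe 1e-3 --indep-pairwise 100 10 0.2
```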
IMPUTEDDATA_ALLCHR="/hpc/dhl_ec/tpeters/regenie_pgen/OUT/aegscombo_topmed_r3_f10_b38.allChrs"
QMEMGWASREGENIE="16G" # 16Gb for GWAS Regenie
QTIMEGWASREGENIE="00:15:00" # 15 minutes for GWAS Regenie
QMEMGWASREGENIE1="64G" # 64Gb for GWAS Regenie
QTIMEGWASREGENIE1="2:00:00" # 2 hours for GWAS Regenie
QMEMGWASREGENIE2="64G" # 64Gb for GWAS Regenie
QTIMEGWASREGENIE2="8:00:00" # 8 hours for GWAS Regenie
QMEMGWASREGENIEWRAP="16G" # 16Gb for GWAS Regenie
QTIMEGWASREGENIEWRAP="00:15:00" # 15 minutes for GWAS Regenie
EXCLUDE_RANGE_FILE="/hpc/dhl_ec/tpeters/regenie_pgen/exclude_problematic_range_b38.txt"
COVARIATE_QUANTATIVE="Age,PC1,PC2,ORyear"
COVARIATE_BINARY="SEX"
# PHENOTYPE_QUANTATIVE="IPH_CLAM_prob,IPH_CLAM_area,IPH_CLAM_prob_rankNorm,IPH_CLAM_area_rankNorm"
# PHENOTYPE_QUANTATIVE=""
PHENOTYPE_BINARY="IPH,IPH_CLAM"
# PHENOTYPE_BINARY=""
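### The covariate and phenotype lists above are comma-separated, which matches the list
### format of the regenie options --covarColList, --catCovarList, and --phenoColList.
### A minimal sketch that only prints step 1 options. Passing COVARIATE_BINARY through
### --catCovarList and the helper name regenie_step1_args are assumptions; the
### GWASToolKit scripts may wire these settings differently.

```shell
# Hypothetical helper: print REGENIE step 1 options for binary phenotypes (--bt).
# $1 = block size, $2 = quantitative covariates, $3 = categorical covariates,
# $4 = binary phenotypes (all lists comma-separated).
regenie_step1_args() {
  local bsize="$1" quant_covars="$2" cat_covars="$3" binary_phenos="$4"
  printf -- '--step 1 --bsize %s --covarColList %s --catCovarList %s --phenoColList %s --bt\n' \
    "$bsize" "$quant_covars" "$cat_covars" "$binary_phenos"
}

regenie_step1_args "1000" "Age,PC1,PC2,ORyear" "SEX" "IPH,IPH_CLAM"
```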