Supporting Code and Data for "Sex-biased reduction in reproductive success drives selective constraint on human genes"

Introduction

This repository contains the supporting code for our manuscript on the relationship between rare genetic variant burden, as measured by s_het, and fertility. It consists of several resources needed to replicate our findings:

An RStudio project consisting of three RMarkdown documents in RMarkdown/:
- SNVCalling_Filtering.Rmd - Examples of how SNV annotation was performed.
- CNVCalling_Filtering.Rmd - Examples of how CNV QC and annotation were performed.
- PhenotypeTesting.Rmd - Code to replicate all main text figures, supplementary figures, and findings of the manuscript.
These documents can be loaded into RStudio via File -> Open Project and selecting the UKBBFertility.Rproj file. The first two documents (i.e. *Calling_Filtering.Rmd) are not intended to be run, but are provided as examples of how we processed data for the project. PhenotypeTesting.Rmd, on the other hand, is intended to be run, but requires the user to download protected UK Biobank participant data using their own UK Biobank access. While our manuscript is undergoing peer review, UK Biobank will not provide the files required to run this document. We will update this document when the data are made available through UK Biobank data access. We also provide, in the directory compiled_html/, HTML documents produced by knitr that show the results as we ran them on our system. Please view these documents in your browser for better formatting.
Scripts used by the RMarkdown documents and for other data processing, in scripts/. Please see the individual RMarkdown documents for more details.
Raw data used as input for the RMarkdown documents. This is provided as a tarball to save space. To use it, run the following:
```
tar -zxf rawdata.tar.gz
```
You should then be ready to use at least PhenotypeTesting.Rmd pending acquisition of UK Biobank data.
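The extraction step above can be wrapped in a small check that catches a missing or truncated download before unpacking. This is a minimal sketch, not part of the repository: the function name `extract_rawdata` and the optional destination argument are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: verify and unpack the raw data archive.
# extract_rawdata is a hypothetical helper name, not part of the repository.
extract_rawdata() {
    archive="${1:-rawdata.tar.gz}"   # archive name taken from this README
    dest="${2:-.}"                   # destination directory (assumed default: current directory)

    if [ ! -f "$archive" ]; then
        echo "error: $archive not found" >&2
        return 1
    fi

    # Listing the archive first catches a truncated or corrupt download
    if ! tar -tzf "$archive" > /dev/null 2>&1; then
        echo "error: $archive is not a readable gzip tarball" >&2
        return 1
    fi

    mkdir -p "$dest" && tar -zxf "$archive" -C "$dest"
}
```

After sourcing the function, `extract_rawdata rawdata.tar.gz .` should behave like the plain `tar -zxf` command, but it returns a non-zero status instead of partially unpacking a damaged archive.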
Java source code for the tools we created for CNV and SNV/InDel annotation and QC, in src/. This source code is provided with an Eclipse IDE project file to enable easy loading into Eclipse. Both of these projects require external jars to compile:
- Apache Commons Math, CLI, and Exec
- htsjdk
Compiled jars, runnable with a Java 14 JRE/JDK distribution, are also provided in scripts/. Please see the CNV/SNV Calling and Filtering RMarkdown documents for more information.
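Before running the compiled jars, it may help to confirm which Java runtime is on the PATH. The sketch below parses the output of `java -version`. The helper names `parse_java_major` and `require_java` are hypothetical, and treating any release from 14 upward as acceptable is an assumption, since this README only mentions Java 14.

```shell
#!/bin/sh
# Sketch: check that the local Java runtime is version 14 or newer.
# parse_java_major and require_java are hypothetical helper names.

# Extract the major version from `java -version` output, e.g.
# 'openjdk version "14.0.2" 2020-07-14' or 'java version "1.8.0_281"'.
parse_java_major() {
    ver=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
    case "$ver" in
        1.*) ver=${ver#1.} ;;   # legacy scheme: 1.8.0 means Java 8
    esac
    # Keep only the leading digits of the remaining string
    printf '%s\n' "$ver" | sed 's/[^0-9].*//'
}

require_java() {
    min="${1:-14}"   # assumed minimum; newer releases are assumed to work
    # `java -version` writes its report to stderr
    major=$(parse_java_major "$(java -version 2>&1)")
    if [ -z "$major" ] || [ "$major" -lt "$min" ]; then
        echo "error: Java $min or newer is required (found: ${major:-none})" >&2
        return 1
    fi
    echo "Java $major found"
}
```

Calling `require_java` with no argument checks against 14; pass a different number to change the threshold.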

Required Packages:

This project requires the following packages/dependencies:

R:
- biomaRt - Get gene lists we need (must be installed via Bioconductor)
- readxl - Read Supplemental Excel tables
- data.table - Better than data.frame
- patchwork - Arranging ggplots
- broom - Makes getting odds ratios/betas/errors/p.values out of glm() much easier
- meta - For doing meta analysis
- mratios - Need this to calculate 95% CIs for ratios of two means
- svglite - Need to create main text figures properly (ggsave doesn't like anything with an alpha shading)
- tidyverse - Loads ggplot2, tidyr, dplyr, and stringr
- randomForest - R randomForest implementation
- ROCR - Builds ROC curves
- cvAUC - Determines CI for AUC from ROCR
- gdata - Allows for accurate random sampling when making training sets for Random Forest
- rcompanion - Used for calculating Nagelkerke's pseudo r-squared for logistic regression
- lubridate - For measuring time between patient birth/ICD-code incidence
Perl:
- None
Java:
- Apache Commons Math
- Apache Commons CLI
- Apache Commons Exec
- HTSJDK
Python:
- scipy
- pandas
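
The R and Python dependencies listed above can be installed from the command line. The sketch below only prints the install commands so they can be reviewed first; nothing is installed until the output is run. The function names and the CRAN mirror URL are assumptions for illustration, and the Java libraries are not covered because they are external jars rather than installable packages.

```shell
#!/bin/sh
# Sketch: print the commands that would install the R and Python dependencies.
# Function names and the CRAN mirror are assumptions, not part of the repository.

CRAN_PKGS="readxl data.table patchwork broom meta mratios svglite tidyverse randomForest ROCR cvAUC gdata rcompanion lubridate"
PY_PKGS="scipy pandas"
CRAN_MIRROR="https://cloud.r-project.org"

# Turn a list of names into an R character vector, e.g. c("a", "b")
r_vector() {
    out=""
    for pkg in "$@"; do
        out="${out:+$out, }\"$pkg\""
    done
    printf 'c(%s)' "$out"
}

print_install_commands() {
    # CRAN packages (word splitting of $CRAN_PKGS is intentional)
    printf "Rscript -e 'install.packages(%s, repos = \"%s\")'\n" \
        "$(r_vector $CRAN_PKGS)" "$CRAN_MIRROR"
    # biomaRt is distributed through Bioconductor, so it goes through BiocManager
    printf "Rscript -e 'if (!requireNamespace(\"BiocManager\", quietly = TRUE)) install.packages(\"BiocManager\", repos = \"%s\"); BiocManager::install(\"biomaRt\")'\n" \
        "$CRAN_MIRROR"
    # Python packages
    printf 'python3 -m pip install %s\n' "$PY_PKGS"
}
```

Running `print_install_commands` shows three commands; piping that output to `sh` would execute them, assuming `Rscript` and `python3` are available on the PATH.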

Citing this Repo

If you use any of the materials in the repository, we would appreciate it if you cited our manuscript:

(https://www.biorxiv.org/content/10.1101/2020.05.26.116111v2)
