Disclaimer: I am actively learning and exploring Google Cloud Platform, bioinformatics tools, and Snakemake.
As a beginner in these domains, I aim to share my journey and progress with others who may find it helpful.
Please note that the content provided here may not always reflect the most advanced or expert-level knowledge.
Instead, it represents my ongoing learning process as I figure some things out and test my understanding.
I encourage feedback, collaboration, and suggestions!
In this repo, we utilize the following products from Google Cloud (example commands for each are sketched after the list):

- Google Compute Engine VM instances
  - As a cheap workspace (an e2-micro instance is free-tier eligible)
  - Where we modify code and submit jobs
- Google Kubernetes Engine (GKE) clusters
  - To do a lot of the heavy processing
  - Snakemake has built-in functionality to run jobs on Kubernetes clusters
- Google Cloud Storage buckets
  - To store raw data, processed data, and metadata
  - Also to store our workspace for cheap! (for objects not pushed to GitHub)
- Google Dataflow
  - For the simple task of gzip-compressing FASTQ files
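
For reference, here is roughly how the workspace VM can be created. This is a minimal sketch; the instance name, zone, and image are placeholders, so adjust them to your project:

```bash
# Create a free-tier-eligible e2-micro workspace VM.
# Free-tier eligibility requires a US region such as us-central1.
gcloud compute instances create snakemake-workspace \
    --machine-type=e2-micro \
    --zone=us-central1-a \
    --image-family=debian-12 \
    --image-project=debian-cloud
```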
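
And a sketch of standing up a small GKE cluster and pointing Snakemake at it. The cluster name, node counts, and bucket are illustrative, and the `--kubernetes` flag plus the GS remote flags are the Snakemake 7-style interface:

```bash
# Create a small autoscaling cluster for the heavy processing.
gcloud container clusters create snakemake-cluster \
    --zone=us-central1-a \
    --num-nodes=1 \
    --enable-autoscaling --min-nodes=0 --max-nodes=4

# Fetch credentials so Snakemake can talk to the cluster.
gcloud container clusters get-credentials snakemake-cluster \
    --zone=us-central1-a

# Submit jobs to the cluster, staging files through a GCS bucket.
snakemake --kubernetes \
    --default-remote-provider GS \
    --default-remote-prefix my-project-bucket \
    --jobs 8
```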
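
Creating the buckets and backing up the workspace is just a couple of commands (bucket names and paths are placeholders):

```bash
# Buckets for raw inputs, processed outputs, and the workspace backup.
gsutil mb -l us-central1 gs://my-project-raw-data
gsutil mb -l us-central1 gs://my-project-processed
gsutil mb -l us-central1 gs://my-project-workspace

# Mirror the VM workspace into its bucket (cheap storage for anything
# not pushed to GitHub).
gsutil -m rsync -r ~/workspace gs://my-project-workspace
```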
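
For the gzip step, Google ships a "Bulk Compress Cloud Storage Files" Dataflow template, which (if I understand it correctly) can be launched like this; the job name and all paths are placeholders:

```bash
# Launch the Google-provided bulk-compression template over the raw FASTQs.
gcloud dataflow jobs run gzip-fastqs \
    --gcs-location gs://dataflow-templates/latest/Bulk_Compress_GCS_Files \
    --region us-central1 \
    --parameters "inputFilePattern=gs://my-project-raw-data/*.fastq,outputDirectory=gs://my-project-raw-data/compressed,outputFailureFile=gs://my-project-raw-data/compressed/failures.csv,compression=GZIP"
```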
Looking to learn and incorporate:

- Google BigQuery (eventually)
- Google Cloud Functions | Cloud Run
  - Adding metadata to Google Cloud Storage buckets?
  - Performing quality control?
  - Creating visualizations
  - Postprocessing/archival
  - (a deployment sketch follows this list)
- Batch
  - With Google Cloud Life Sciences deprecated in favor of the preferred Batch, the Snakemake project is working on a collaboration to bring Batch in as a Snakemake executor.
  - This might provide an added benefit: there may be no need to set up a Kubernetes cluster at all.
  - All that is required is an idea of what resources each job/rule in Snakemake will need, and GCP Batch will handle the rest (see the resource sketch after this list).
- Google Workflows (https://console.cloud.google.com/workflows/)
  - Seems cool
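
As a starting point for the Cloud Functions idea above, a storage-triggered function could fire whenever a new file lands in the raw-data bucket. Everything here is hypothetical: the function name, the bucket, and the `on_upload` handler that would live in `main.py`:

```bash
# Deploy a function that runs on every upload to the raw-data bucket;
# the handler itself (metadata tagging, QC, etc.) is left to write.
gcloud functions deploy tag-new-fastq \
    --runtime python311 \
    --trigger-bucket my-project-raw-data \
    --entry-point on_upload \
    --source .
```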
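
On the Batch point: even before that executor lands, per-rule resources can already be declared from the Snakemake CLI, which is exactly the information a Batch backend would consume. A minimal sketch, assuming a hypothetical rule named `align`:

```bash
# Tell Snakemake what each rule needs; a Batch executor could map these
# straight onto machine requests. "align" is a made-up rule name.
snakemake --jobs 4 \
    --default-resources mem_mb=2000 \
    --set-resources align:mem_mb=8000 align:disk_mb=20000
```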