A server implementation of the htsget protocol for bioinformatics in Rust. It is:
- Fully-featured: supports BAM and CRAM for reads, and VCF and BCF for variants, as well as other aspects of the protocol such as TLS and CORS.
- Serverless: supports local server instances using Axum and Actix Web, and serverless instances using the AWS Lambda Rust Runtime.
- Storage interchangeable: supports local filesystem storage as well as objects via Minio and AWS S3.
- Thoroughly tested and benchmarked: tested using a purpose-built test suite and benchmarked using criterion-rs.
To run a local instance of htsget-rs, run htsget-axum:
cargo run -p htsget-axum
Then fetch tickets from 127.0.0.1:8080, which serves files from the data directory:
curl 'http://127.0.0.1:8080/variants/data/vcf/sample1-bcbio-cancer'
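The htsget protocol also lets a client restrict a request to a genomic region through the `referenceName`, `start`, and `end` query parameters. As a minimal sketch, the hypothetical helper below (`ticket_url` is not part of htsget-rs) builds such a ticket URL in Rust. It assumes the inputs need no percent-encoding, and any reference name or coordinates passed to it are the caller's choice rather than values taken from the example data.

```rust
/// Hypothetical helper (not part of htsget-rs): build a htsget ticket URL
/// for a region of a file.
///
/// `endpoint` is either "reads" or "variants". `reference_name`, `start`
/// and `end` map to the `referenceName`, `start` and `end` query
/// parameters of the htsget protocol.
///
/// Assumption: no input contains characters that need percent-encoding.
fn ticket_url(
    base: &str,
    endpoint: &str,
    id: &str,
    reference_name: &str,
    start: u32,
    end: u32,
) -> String {
    format!(
        "{}/{}/{}?referenceName={}&start={}&end={}",
        base, endpoint, id, reference_name, start, end
    )
}
```

For instance, calling `ticket_url("http://127.0.0.1:8080", "variants", "data/vcf/sample1-bcbio-cancer", "chrM", 0, 1000)` yields a URL that can be passed to curl in the same way as the command above; `chrM` is only an illustrative reference name.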
Htsget-rs is configured using environment variables or config files; see htsget-config for details.
Cloud-based htsget-rs uses htsget-lambda. For an example deployment of this crate, see deploy.
Htsget-rs implements the htsget protocol, which is an HTTP-based protocol for querying bioinformatics files. The htsget protocol outlines how a htsget server should behave, and it is an effective way to fetch regions of large bioinformatics files.
A htsget server responds to queries that ask for regions of bioinformatics files. It does this by returning an array of URL tickets that the client must fetch, in order, and concatenate. This process is outlined in the diagram below:
htsget-rs implements this process as closely as possible, and aims to return byte ranges that are as small as possible. htsget-rs is written asynchronously using the Tokio runtime. It aims to be as efficient and safe as possible, having a thorough set of tests and benchmarks.
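The client side of the ticket flow described above can be sketched in Rust as follows. This is an illustration under stated assumptions, not the htsget-rs implementation: `UrlTicket`, `assemble`, and `parse_range` are hypothetical names, the HTTP call is injected as a closure so the sketch needs only the standard library, and each ticket is assumed to carry an optional `Range` header of the form `bytes=START-END` with inclusive bounds.

```rust
use std::collections::HashMap;

/// Hypothetical model of one entry in the `urls` array of a ticket response.
struct UrlTicket {
    /// Where the data block lives.
    url: String,
    /// Headers the client must send, e.g. "Range": "bytes=0-1023".
    headers: HashMap<String, String>,
}

/// Fetch every ticket in the order given and concatenate the bodies.
///
/// `fetch` stands in for a real HTTP client (an assumption of this sketch).
/// The first failing fetch aborts the whole assembly.
fn assemble<F, E>(tickets: &[UrlTicket], mut fetch: F) -> Result<Vec<u8>, E>
where
    F: FnMut(&UrlTicket) -> Result<Vec<u8>, E>,
{
    let mut output = Vec::new();
    for ticket in tickets {
        output.extend(fetch(ticket)?);
    }
    Ok(output)
}

/// Parse a single-range header value such as "bytes=0-1023" into
/// inclusive (start, end) offsets. Returns `None` for any other shape.
fn parse_range(value: &str) -> Option<(u64, u64)> {
    let (start, end) = value.strip_prefix("bytes=")?.split_once('-')?;
    Some((start.parse().ok()?, end.parse().ok()?))
}
```

Because the tickets are concatenated in order, a server that returns smaller byte ranges lets the client skip the parts of the file that fall outside the requested region.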
htsget-rs implements the following components of the protocol:
- GET requests.
- POST requests.
- BAM and CRAM for the reads endpoint.
- VCF and BCF for the variants endpoint.
- service-info endpoint.
- TLS on the data block server.
- CORS support on the ticket and data block servers.
Tests can be run by executing:
cargo test --all-features
To run benchmarks, see the benchmark sections of htsget-actix and htsget-search.
This repository is a workspace of crates:
- htsget-config: Configuration of the server.
- htsget-actix: Local instance of the htsget server. Contains framework dependent code using Actix Web.
- htsget-axum: Local instance of the htsget server. Contains framework dependent code using Axum.
- htsget-http: Handling of htsget HTTP requests. Framework independent code.
- htsget-lambda: Cloud-based instance of the htsget server. Contains framework dependent code using the Rust Runtime for AWS Lambda.
- htsget-search: Core logic needed to search bioinformatics files based on htsget queries.
- htsget-storage: Storage interfaces for local and cloud-based files.
- htsget-test: Test suite used by other crates in the project.
Other directories contain further applications or data:
- data: Contains example data files used by htsget-rs and in tests.
- deploy: Deployments for htsget-rs.
Thanks for your interest in contributing, we would love to have you! See the contributing guide for more information.
This project is licensed under the MIT license.