PipeVal is an easy-to-use CLI tool for validating inputs and parameters in various settings, including Nextflow scripts and pipelines. It can be run standalone or from a Docker container.
Its primary functions are to generate and/or compare checksum files and validate input files.
The tool can be used via the Docker image ghcr.io/uclahs-cds/pipeval:<tag>
The tool can be installed as a standalone command line tool. The following dependencies must be installed for this option:
Tool | Version |
---|---|
Python | 3.10 |
VCFtools | 0.1.16 |
Additionally, the libmagic C library must be installed on the system.
On Debian/Ubuntu, install through:
sudo apt-get install libmagic-dev
On macOS, install through Homebrew (https://brew.sh/):
brew install libmagic
libmagic can also be installed through the conda package manager:
conda install -c conda-forge libmagic
With the dependencies installed at the versions listed above, install pipeval through one of the options below:
pip install git+ssh://git@github.com/uclahs-cds/package-PipeVal.git
pip install git+https://git@github.com/uclahs-cds/package-PipeVal.git
<clone the PipeVal GitHub repository>
cd </path/to/cloned/repository>
pip install .
usage: pipeval validate [-h] [-v] [-r CRAM_REFERENCE] [-p PROCESSES] [-t] path [path ...]

positional arguments:
  path                  one or more paths of files to validate

options:
  -h, --help            show this help message and exit
  -r CRAM_REFERENCE, --cram-reference CRAM_REFERENCE
                        Path to reference file for CRAM
  -p PROCESSES, --processes PROCESSES
                        Number of processes to run in parallel when validating multiple files
  -t, --test-integrity  Whether to perform a full integrity test on compressed files
The tool attempts to detect the file type automatically from the file extension and runs the appropriate validations. Regardless of file type, it also checks that the file exists and, if an MD5 or SHA512 checksum exists for it, verifies the checksum.
File Type | Validation |
---|---|
BAM | Validate BAM/CRAM/SAM using pysam. Check for an index file in the same directory as the BAM. Note: If a BAM input is missing an accompanying BAM index file in the same directory, validate will not throw an exception but will print a warning. |
SAM | Validate SAM file using pysam. |
CRAM | Validate CRAM file using pysam. Check for an index file in the same directory as the CRAM. Accept an optional reference genome parameter for use with CRAM. In the absence of the parameter, the reference URL from the CRAM header will be used. Note: If a CRAM input is missing an accompanying CRAM index file in the same directory, validate will not throw an exception but will print a warning. |
VCF | Validate VCF using VCFtools |
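The extension-based dispatch described above can be pictured with a small Python sketch. It is not PipeVal's implementation: the `EXTENSION_TO_TYPE` mapping and the `detect_file_type` helper are hypothetical names introduced for illustration, and the real tool may recognize additional extensions.

```python
from pathlib import Path

# Hypothetical mapping for illustration only; PipeVal's own lookup may differ.
EXTENSION_TO_TYPE = {
    ".bam": "BAM",
    ".sam": "SAM",
    ".cram": "CRAM",
    ".vcf": "VCF",
}


def detect_file_type(path):
    """Return the file type implied by the final extension, or None if unrecognized."""
    suffix = Path(path).suffix.lower()
    return EXTENSION_TO_TYPE.get(suffix)
```

A file whose extension is not in the mapping returns `None` here; how PipeVal itself treats such files is not covered by this sketch.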
Note: If the input is invalid in any way, validate will exit with a non-zero status code.
- Valid input:
Input: path/to/input is valid <file-type>
- Invalid input or failed validation:
Error: path/to/input <error message>
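Because a failed validation is signalled through the exit status, a calling script can branch on the return code. The following is a minimal sketch, assuming `pipeval` is installed and on the `PATH`; the helper names `build_validate_command` and `inputs_are_valid` are hypothetical and not part of PipeVal. Only the `-r` and `-p` options documented above are used.

```python
import subprocess


def build_validate_command(paths, cram_reference=None, processes=None):
    """Assemble the argument list for `pipeval validate` (hypothetical helper)."""
    command = ["pipeval", "validate"]
    if cram_reference is not None:
        command += ["-r", str(cram_reference)]
    if processes is not None:
        command += ["-p", str(processes)]
    command += [str(path) for path in paths]
    return command


def inputs_are_valid(paths, **options):
    """Run the validation and return True only if pipeval exits with status 0."""
    result = subprocess.run(build_validate_command(paths, **options), check=False)
    return result.returncode == 0
```

For example, `inputs_are_valid(["sample.cram"], cram_reference="ref.fa")` would return `False` whenever the command exits with a non-zero status.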
Certain validations can be skipped through environment variables.
ENV VAR | Notes |
---|---|
PIPEVAL_SKIP_CHECKSUM | Set to true to disable checksum validation within PipeVal. |
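When PipeVal is launched from another Python process, the variable can be set for that one invocation without changing the parent environment. This is a sketch under that assumption; `environment_without_checksum_validation` is a hypothetical helper name.

```python
import os


def environment_without_checksum_validation(base_env=None):
    """Return a copy of the environment with PIPEVAL_SKIP_CHECKSUM set to 'true'."""
    env = dict(os.environ if base_env is None else base_env)
    env["PIPEVAL_SKIP_CHECKSUM"] = "true"
    return env
```

The returned dictionary can be passed as the `env` argument of `subprocess.run`, so that only the child `pipeval` process sees the setting.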
usage: pipeval generate-checksum [-h] [-t {md5,sha512}] [-v] path [path ...]

positional arguments:
  path                  one or more paths of files to validate

options:
  -h, --help            show this help message and exit
  -t {md5,sha512}, --type {md5,sha512}
                        Checksum type
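To illustrate what the two checksum types represent, the sketch below computes an MD5 or SHA512 digest of a file with Python's standard `hashlib` module. It is an independent illustration, not PipeVal's code; `compute_checksum` is a hypothetical name, and how PipeVal names or stores the resulting checksum files is not shown here.

```python
import hashlib


def compute_checksum(path, checksum_type, chunk_size=1024 * 1024):
    """Return the hex digest of a file, reading it in chunks to bound memory use."""
    if checksum_type not in ("md5", "sha512"):
        raise ValueError(f"Unsupported checksum type: {checksum_type}")
    digest = hashlib.new(checksum_type)
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Reading in fixed-size chunks keeps memory use constant, which matters for large sequencing files such as BAMs.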
PipeVal itself can be tested with pytest by running the following:
pytest
- Repository: pysam-developers/pysam
- Repository: vcftools/vcftools
- Issue tracker to report errors and enhancement ideas.
- Discussions can take place in package-PipeVal Discussions
- package-PipeVal pull requests are also open for discussion
Please see the list of Contributors on GitHub.
Authors: Yash Patel (YashPatel@mednet.ucla.edu), Arpi Beshlikyan (abeshlikyan@mednet.ucla.edu), Madison Jordan (MBJordan@mednet.ucla.edu), Gina Kim (ginakim@mednet.ucla.edu)
PipeVal is licensed under the GNU General Public License version 2. See the file LICENSE for the terms of the GNU GPL license.
PipeVal is a tool which can be used to validate the inputs and outputs of various bioinformatic pipelines.
Copyright (C) 2020-2023 University of California Los Angeles ("Boutros Lab") All rights reserved.
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.