AliNe is a pipeline written in Nextflow that aims to efficiently align reads against a reference genome using the tools of your choice.
- Handles short reads (paired or single), PacBio and ONT (Nanopore) data (see the list of aligners in Table 1).
- Runs a QC with FastQC at each step if the option is activated.
- Trims reads before alignment if the option is activated.
- Deals automatically with all quality encodings ('sanger', 'solexa', 'illumina-1.3+', 'illumina-1.5+', 'illumina-1.8+'). All FASTQ files are standardised to Phred+33 by seqkit for downstream alignment.
- Deals automatically with the type of library used: stranded or not, firststrand, secondstrand, etc. (see the list of aligners in Table 2).
- Can pass an annotation file to the aligners that accept one (see the list of aligners in Table 3).

You can choose to run one or several aligners in parallel.
**Table 1** Here is the list of implemented aligners and the type of reads accepted:
Tool | Single End (short reads) | Paired end (short reads) | Pacbio | ONT |
---|---|---|---|---|
bbmap | X | X | x | x |
bowtie2 | X | X | x | x |
bwaaln | X | X R1 and R2 independently aligned then merged with bwa sampe | X | X |
bwamem | X | X | x | x |
bwasw | X | X | x | x |
graphmap2 | x | x R1 and R2 independently aligned then merged with cat | X | X |
hisat2 | x | x | x | x |
minimap2 | x | x | X | X |
ngmlr | x | na | X | X |
novoalign | X | X | X | x |
nucmer | X | X R1 and R2 are concatenated then aligned | x | x |
star | X | X | x | x |
star 2pass mode | X | X | x | x |
subread | X | X | x | x |
sublong | x | na | X | X |
tophat2 | X | X | na | na |
Legend
X Recommended
x Not recommended
na Not applicable
It is possible to bypass the default authorized read type using the AliNe --relax parameter.
The pipeline deals automatically with the library types. It extracts 10,000 reads by default and runs salmon to guess the library type. The result is then translated into the correct option for the following aligners:
Tool | tool option | Library type by salmon | Comment |
---|---|---|---|
bbmap | xs=fr / xs=ss / xs=us | ISF ISR / OSF OSR / U | strand information |
bbmap | - / rcs=f / | ISF ISR IU / OSF OSR OU MSF MSR MU | read orientation |
bowtie2 | --fr / --rf / --ff | ISF ISR IU / OSF OSR OU / MSF MSR MU | read orientation |
bwaaln | na | na | |
bwamem | na | na | |
bwasw | na | na | |
graphmap2 | na | na | |
hisat2 | --rna-strandness [ F / R / FR / RF ] | SF / SR / ISF OSF MSF / ISR OSR MSR | strand information |
hisat2 | --fr / --rf / --ff | I / O / M | read orientation |
minimap2 | na | na | |
ngmlr | na | na | |
novoalign | na | na | |
nucmer | na | na | |
star | na | na | |
star 2pass mode | na | na | |
subread | -S fr / -S rf / -S ff | ISF ISR IU / OSF OSR OU / MSF MSR MU | read orientation |
sublong | na | na | |
tophat2 | fr-unstranded / fr-firststrand / fr-secondstrand | U / SR / SF | strand information |
If the --skip_libray_usage parameter is set, the library type provided by the user or guessed by the pipeline via the --library_type parameter is not used. /!\ If you provide the library type yourself via the aligner-specific parameter, it takes precedence over the information provided or guessed via --library_type.
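The translation described in the table above can be sketched as a small shell helper. This is an illustrative sketch only, not AliNe's actual code: the function names `salmon_to_hisat2_strandness` and `salmon_to_hisat2_orientation` are hypothetical, and the mapping simply restates the two hisat2 rows of the table.

```shell
#!/bin/sh
# Hypothetical helpers (not part of AliNe): translate a salmon library type
# into the hisat2 options listed in the table above.

# Strand information: SF / SR / ISF OSF MSF / ISR OSR MSR
salmon_to_hisat2_strandness() {
  case "$1" in
    SF)          echo "--rna-strandness F"  ;;
    SR)          echo "--rna-strandness R"  ;;
    ISF|OSF|MSF) echo "--rna-strandness FR" ;;
    ISR|OSR|MSR) echo "--rna-strandness RF" ;;
    *)           echo ""                    ;;  # unstranded: no option emitted
  esac
}

# Read orientation: I (inward) / O (outward) / M (matching)
salmon_to_hisat2_orientation() {
  case "$1" in
    I*) echo "--fr" ;;
    O*) echo "--rf" ;;
    M*) echo "--ff" ;;
    *)  echo ""     ;;  # single-end types (U, SF, SR): no option emitted
  esac
}

salmon_to_hisat2_strandness ISR    # prints: --rna-strandness RF
salmon_to_hisat2_orientation ISR   # prints: --fr
```

The same pattern would apply to the other aligners in the table (bbmap, bowtie2, subread, tophat2), each with its own option names.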
If you provide an annotation file, the pipeline will automatically pass the file to the following aligners:
Tool | accept |
---|---|
bbmap | na |
bowtie2 | na |
bwaaln | na |
bwamem | na |
bwasw | na |
graphmap2 | GTF (--gtf) |
hisat2 | na |
minimap2 | na |
ngmlr | na |
novoalign | na |
nucmer | na |
star | GTF / GFF ( --sjdbGTFfile + --sjdbGTFtagExonParentTranscript Parent in case of GFF ) |
star 2pass mode | GTF / GFF (--sjdbGTFfile + --sjdbGTFtagExonParentTranscript Parent in case of GFF ) |
subread | GTF or compatible GFF format (-a) |
sublong | na |
tophat2 | GTF/GFF3 (-G) |
---
config:
look: handDrawn
theme: neutral
---
graph TD;
Genome-->Index;
Index-->Aligner1;
Index-->Aligner2;
Annotation--> Aligner1;
Annotation--> Aligner2;
Reads --> QCraw;
Reads --> StandardizeScore
StandardizeScore --> Trim;
Trim --> LibraryGuessing;
Trim --> QCtrim;
LibraryGuessing --> Aligner1;
LibraryGuessing --> Aligner2;
Trim --> Aligner1;
Aligner1 --> QCaligner1;
Trim --> Aligner2;
Aligner2 --> QCaligner2;
QCraw --> MultiQC;
QCtrim --> MultiQC;
QCaligner1 --> MultiQC;
QCaligner2 --> MultiQC;
The prerequisites to run the pipeline are:
- The AliNe repository
- Nextflow >= 22.04.0
- Docker or Singularity
# clone the workflow repository
git clone https://github.com/Juke34/AliNe.git
# Move into it
cd AliNe
-
Via conda
See here
```
conda create -n nextflow
conda activate nextflow
conda install nextflow
```
-
Manually
See here
Nextflow runs on most POSIX systems (Linux, macOS, etc.) and can typically be installed by running these commands:
# Make sure Java 11 or later is installed on your computer by using the command:
java -version
# Install Nextflow by entering this command in your terminal (it creates a file nextflow in the current dir):
curl -s https://get.nextflow.io | bash
# Add the Nextflow binary to your user's PATH:
mv nextflow ~/bin/
# OR system-wide installation:
# sudo mv nextflow /usr/local/bin
To run the workflow you will need a container platform: Docker or Singularity.
Please follow the instructions at the Docker website
Please follow the instructions at the Singularity website
You can first check the available options and parameters by running:
nextflow run aline.nf --help
To run the workflow you must select a profile according to the container platform you want to use:
- singularity, a profile using Singularity to run the containers
- docker, a profile using Docker to run the containers
The command will look like this:
nextflow run aline.nf -profile docker <rest of parameters>
Another profile is available (/!\ currently not yet implemented):
- slurm, to add if your system has a slurm executor (local by default)

The use of the slurm profile will give a command like this one:
nextflow run aline.nf -profile singularity,slurm <rest of parameters>
Test data are included in the AliNe repository in the test folder.
Test with short single reads:
nextflow run -profile docker,test_illumina_single aline.nf
Test with short paired reads:
nextflow run -profile docker,test_illumina_paired aline.nf
Test with ont reads:
nextflow run -profile docker,test_ont aline.nf
Test with pacbio reads:
nextflow run -profile docker,test_pacbio aline.nf
On success you should get a message looking like this:
AliNe Pipeline execution summary
--------------------------------------
Completed at : 2024-03-07T21:40:23.180547+01:00
UUID : e2a131e3-3652-4c90-b3ad-78f758c06070
Duration : 8.4s
Success : true
Exit Status : 0
Error report : -
--help prints the help section
General Parameters
--reads path to the reads file or folder
--reads_extension extension of the reads files (default: .fastq.gz)
--genome path to the genome file
--aligner aligner(s) to use among this list (comma or space separated) [bbmap, bowtie2, bwaaln, bwamem, bwasw, graphmap2, hisat2, minimap2, novoalign, nucmer, ngmlr, star, subread, sublong, tophat2]
--outdir path to the output directory (default: alignment_results)
--annotation [Optional][used by STAR, Tophat2] Absolute path to the annotation file (gtf or gff3)
Type of input reads
--read_type type of reads among this list [short_paired, short_single, pacbio, ont] (default: short_paired)
--paired_reads_pattern pattern to detect paired reads (default: {1,2})
--library_type Set the library_type of your reads (default: auto). In auto mode salmon will guess the library type for each sample.
If you know the library type you can set it to one of the following: [U, IU, MU, OU, ISF, ISR, MSF, MSR, OSF, OSR]. See https://salmon.readthedocs.io/en/latest/library_type.html for more information.
In such case the sample library type will be used for all the samples.
--skip_libray_usage Skip the usage of library type provided by the user or guessed by salmon.
Extra steps
--trimming_fastp run fastp for trimming (default: false)
--fastqc run fastqc on raw and aligned reads (default: false)
--multiqc_config path to the multiqc config file (default: config/multiqc_conf.yml)
Aligner specific options
--bbmap_options additional options for bbmap
--bowtie2_options additional options for bowtie2
--bwaaln_options additional options for bwaaln
--bwamem_options additional options for bwamem
--bwasw_options additional options for bwasw
--graphmap2_options additional options for graphmap2
--hisat2_options additional options for hisat2
--minimap2_options additional options for minimap2 (default: -a (to get sam output))
--minimap2_index_options additional options for minimap2 index
--ngmlr_options additional options for ngmlr
--novoalign_options additional options for novoalign
--novoalign_license license for novoalign. You can ask for one month free trial license at http://www.novocraft.com/products/novoalign/
--nucmer_options additional options for nucmer
--star_options additional options for star
--star_2pass set to true to run STAR in 2pass mode (default: false)
--read_length [Optional][used by STAR] length of the reads, if none provided it is automatically deduced
--subread_options additional options for subread
--sublong_options additional options for sublong
--tophat2_options additional options for tophat
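
As a sketch of how the parameters above combine, the following example builds a run for ONT reads with two aligners in parallel. The input paths and the `run` dry-run wrapper are hypothetical and only there for illustration. `-x map-ont` is minimap2's Nanopore preset. Repeating `-a` assumes that a user-supplied `--minimap2_options` value replaces the default rather than being appended to it, which is an assumption to check against the pipeline's behaviour.

```shell
#!/bin/sh
# Hypothetical input locations: replace with your own paths.
READS_DIR="data/reads"
GENOME="data/genome.fa"

# Hypothetical wrapper: print the command when DRY_RUN=1 (the default here),
# execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# ONT reads aligned with minimap2 and ngmlr in parallel.
# -a is repeated on the assumption that --minimap2_options replaces the
# default value (-a, SAM output) instead of extending it.
run nextflow run aline.nf \
  -profile docker \
  --reads "$READS_DIR" \
  --genome "$GENOME" \
  --read_type ont \
  --aligner minimap2,ngmlr \
  --minimap2_options "-a -x map-ont" \
  --outdir alignment_results
```

With the default `DRY_RUN=1` the script only prints the assembled command for review. Setting `DRY_RUN=0` launches it.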
You can simply remove the AliNe directory from your computer, and remove the nextflow conda environment:
conda remove -n nextflow --all