"You must feel the Force around you." Yoda
- Convert comma-separated files into tab-separated files
Convert delimiters to TAB
- FASTA files with unique sequences
FASTA-to-Tabular → Unique occurrences of each record (advanced parameters) → Tabular-to-FASTA
- Remove sequences with N or any other character
FASTA-to-Tabular → Filter data on any column using simple expressions with (condition: c2.find('N') == -1) → Tabular-to-FASTA
- Extract the 3rd column from a 5-column file
Cut columns from a table with c3
- Reorder or swap columns
Cut columns from a table with c3,c2,c1
- Count how often one entry appears in column 1
Datamash with Group by fields: 1 and Operation to perform: count
- Remove all lines that contain a character (comma in this case)
Text transformation with sed with SED Program: /,/d
- Group all rows where columns 1, 4 and 5 are identical
Datamash with Group by fields: 1,4,5
- Column-to-rows and rows-to-columns (transpose matrix)
Transpose rows/columns
- Make your files smaller, e.g. for testing; subsampling of files
Select random lines from a file
- Make your sequence files smaller, e.g. for testing; subsampling sequences
Sub-sample sequences files
- Merge two files together according to one column in every file
Join two files
- Add unique column
Add column to an existing dataset with iterate: Yes
- Get rid of all rows where column 2 has values greater than 0
Filter data on any column using simple expressions with c2<=0
- Get all rows where column 4 starts with hsa
Filter data on any column using simple expressions with c4.startswith('hsa')
- Get rid of all rows where the sum of columns 2 and 3 is greater than 10
Filter data on any column using simple expressions with c2+c3<=10
- Get rid of all rows where the length of the text in column 2 is greater than 10
Filter data on any column using simple expressions with len(c2)<=10
- Create new rows for every comma-separated value in column 3 (unfolding)
Unfold columns from a table with Column 3 and Comma
- Split the first four characters of a line into their own column
Replace Text in entire line with Find Pattern: ^(.{4}) and Replace Pattern: &\t
- Add the base pairs "TA" to the end of each sequence
FASTA to Tabular → Add column with TA → Merge Columns → Cut columns → Tabular to FASTA
- Add a quotation mark to every row
Compute an expression on every row with chr(34) (34 is the ASCII code for ")
- Count all columns with numbers that are not 0. Useful if you want to calculate the mean but exclude all columns that are 0.
Compute an expression on every row with bool(c1) + bool(c2) + bool(c3) ...
- Calculate log2 (not log10) of a column (e.g. c1), adding a new column
Compute an expression on every row with log(c1,2)
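The filter conditions and compute expressions in the recipes above look like ordinary Python expressions applied to one row at a time, with c1, c2, ... standing for the columns. The sketch below mimics that behaviour in plain Python so the expressions can be tried outside Galaxy. The rows, the `keep_rows` helper and all variable names are made up for illustration and are not part of any Galaxy tool.

```python
from math import log

# Hypothetical rows standing in for a four-column tabular dataset (c1..c4).
ROWS = [
    ("geneA", 0, 4, "hsa-mir-21"),
    ("geneB", 7, 2, "mmu-mir-155"),
    ("geneC", -2, 15, "hsa-let-7a-5p"),
]


def keep_rows(rows, condition):
    """Keep only the rows for which condition(c1, c2, c3, c4) is true."""
    return [row for row in rows if condition(*row)]


# Filter conditions from the recipes: rows that satisfy them are kept,
# so "get rid of c2 > 0" is written as "keep c2 <= 0".
c2_not_positive = keep_rows(ROWS, lambda c1, c2, c3, c4: c2 <= 0)
starts_with_hsa = keep_rows(ROWS, lambda c1, c2, c3, c4: c4.startswith("hsa"))
sum_at_most_10 = keep_rows(ROWS, lambda c1, c2, c3, c4: c2 + c3 <= 10)
short_text = keep_rows(ROWS, lambda c1, c2, c3, c4: len(c4) <= 10)

# Compute-style expressions, evaluated here for a single made-up row.
quote = chr(34)                              # the double quotation mark
nonzero_count = bool(0) + bool(4) + bool(7)  # counts the non-zero values
log2_of_8 = log(8, 2)                        # logarithm to base 2
```

Writing the condition as "what to keep" rather than "what to drop" is the main thing to get right; the inverted comparisons in the recipes (<= instead of >) follow from that.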
- Map RNA-seq data
HISAT or TopHat
- Map DNA-seq data
Bowtie or BWA
- Map methylC-seq data
Bismark
- Downsample BAM/SAM files
BAM/SAM Mapping Stats will give you the number of reads/read pairs in your BAM file in case you don't know it already. Then divide the number of reads you want to downsample to by the number of reads you have, and use this fraction as the probability in Picard – Downsample SAM/BAM.
- Get all genes that are covered by reads
htseq-count with a gene annotation GTF file on your BAM file → Filter data on any column using simple expressions with c2>0
- Extract sequences from interval files, like GFF, BED or GTF, returning a FASTA file
Extract Genomic DNA using coordinates from assembled/unassembled genomes
- Find two genes located nearby
Description :: Tool Shed
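The arithmetic behind the downsampling recipe above is a single division, sketched here in Python as a quick check before typing the value into the tool. The function name and the read counts are hypothetical; only the rule "reads wanted divided by reads available" comes from the recipe.

```python
def downsample_probability(total_reads: int, wanted_reads: int) -> float:
    """Fraction of reads to keep so that roughly wanted_reads remain."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if wanted_reads < 0:
        raise ValueError("wanted_reads must not be negative")
    # A probability cannot exceed 1, so asking for more reads than exist
    # simply keeps everything.
    return min(1.0, wanted_reads / total_reads)


# Example with made-up numbers: keep about 10 million of 40 million reads.
probability = downsample_probability(40_000_000, 10_000_000)
```

Because the tool keeps each read with this probability, the resulting file will usually contain approximately, not exactly, the requested number of reads.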
- Galaxy 101 - a must read for all HTS Padawan: https://galaxyproject.org/learn/galaxy-ngs101/
- A lot of videos to learn using Galaxy while eating popcorn: https://vimeo.com/galaxyproject
All tools mentioned here are available from the Galaxy Tool Shed. Kindly ask your Galaxy Administrator to get access to them.