
Usage Documentation

Quarkins edited this page Aug 25, 2016 · 21 revisions

The files used in this example can be found in the Examples folder of the repository.

Producing a SuperTranscript

mkdir Test  # Make the output directory
python Ribbon.py Example/Example_genome.fasta Example/clusters.txt -a -o Test

Here, the genome file (Example/Example_genome.fasta) is a fasta file containing all the transcripts for all the genes/clusters you wish to construct a SuperTranscript for. The cluster file (Example/clusters.txt) is a text file with two tab-separated columns: the transcript/contig name in the first column and the cluster/gene name in the second column (the same layout as the output of Corset).
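To make the cluster file format concrete, the following Python sketch groups the transcripts listed in such a file by gene. It is not part of Ribbon; the function names parse_clusters and read_clusters, and the transcript names used when testing it, are hypothetical.

```python
import csv
from collections import defaultdict


def parse_clusters(lines):
    """Group transcript/contig names by cluster/gene.

    Each line holds two tab-separated columns: transcript name, then
    cluster/gene name (the layout of Corset's cluster output).
    """
    clusters = defaultdict(list)
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 2:  # skip blank or malformed lines
            continue
        transcript, cluster = row[0], row[1]
        clusters[cluster].append(transcript)
    return dict(clusters)


def read_clusters(path):
    """Read a cluster file from disk and group its transcripts by cluster."""
    with open(path, newline="") as handle:
        return parse_clusters(handle)
```

For example, read_clusters("Example/clusters.txt") returns a dict mapping each gene name to the list of its transcript names.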

In this example, there are two mock genes (A and B), each expressed by two different transcripts.

Ribbon runs in parallel mode: each gene is processed independently, so each gene can be run as a separate stand-alone thread (the number of cores used is set with --cores).
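A minimal sketch of this kind of per-gene parallelism, assuming the work for each gene is fully independent. process_gene and run_all are hypothetical names standing in for Ribbon's own per-gene steps; this is an illustration of the idea, not Ribbon's implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def process_gene(gene, transcripts):
    """Hypothetical stand-in for the per-gene work.

    In a real run this would write the gene's transcripts to a fasta file,
    align them with BLAT and build the SuperTranscript; here it only
    returns a short summary so the sketch runs on its own.
    """
    return f"{gene}: {len(transcripts)} transcripts"


def run_all(clusters, cores=4):
    """Process every gene independently, using up to `cores` worker threads."""
    with ThreadPoolExecutor(max_workers=cores) as pool:
        futures = {gene: pool.submit(process_gene, gene, transcripts)
                   for gene, transcripts in clusters.items()}
        # Collect results; .result() re-raises any error from a worker.
        return {gene: future.result() for gene, future in futures.items()}


if __name__ == "__main__":
    print(run_all({"GeneA": ["A1", "A2"], "GeneB": ["B1", "B2"]}, cores=2))
```

Threads are a reasonable fit when the heavy lifting is an external program such as BLAT, since waiting on a subprocess does not hold Python's global interpreter lock; for CPU-bound pure-Python work, ProcessPoolExecutor would be the usual choice.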

To get the help options, type:
python Ribbon.py --help

usage: Ribbon.py [-h] [--cores CORES] [--alternate] [--clear] GenomeFile ClusterFile

positional arguments:  
  GenomeFile        The name of the fasta file containing all transcripts  
  ClusterFile       The name of the text file with the transcript to cluster
                mapping  

optional arguments:  
  -h, --help        show this help message and exit  
  --cores CORES     The number of cores you wish to run the job on (default =  
                4)  
  --alternate, -a  Create alternate annotations and create metrics on success  
                of SuperTranscript Building  
  --maxTran MAXTRAN  Set a maximum for the number of transcripts from a
                 cluster to be included for building the SuperTranscript
                 (default=50). 
  --outputDir OUTPUTDIR, -o OUTPUTDIR
                    Output Directory
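For readers scripting around Ribbon.py, the documented options can be mirrored with Python's argparse as sketched below. This is a reconstruction from the help text only, not Ribbon's source; --clear appears in the usage line but is not described, so it is left out, and treating a missing --outputDir as the current directory follows the note that comes next.

```python
import argparse


def make_parser():
    """Rebuild Ribbon.py's documented command-line options (from --help)."""
    parser = argparse.ArgumentParser(prog="Ribbon.py")
    parser.add_argument("GenomeFile",
                        help="The name of the fasta file containing all transcripts")
    parser.add_argument("ClusterFile",
                        help="The name of the text file with the transcript to cluster mapping")
    parser.add_argument("--cores", type=int, default=4,
                        help="The number of cores you wish to run the job on")
    parser.add_argument("--alternate", "-a", action="store_true",
                        help="Create alternate annotations and quality metrics")
    parser.add_argument("--maxTran", type=int, default=50,
                        help="Maximum number of transcripts per cluster to include")
    # None stands for "not given": files then go to the current directory.
    parser.add_argument("--outputDir", "-o", default=None,
                        help="Output directory")
    return parser
```

Parsing the example command line, make_parser().parse_args(["Example/Example_genome.fasta", "Example/clusters.txt", "-a", "-o", "Test"]) sets alternate to True and outputDir to "Test", with cores and maxTran left at their defaults.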

Note: unless an output directory is specified with --outputDir (-o), all the fasta and psl files required for the BLAT pairwise alignment are written to the directory you run the script from.

The outputs of this script are:

  • A .fasta file containing all transcripts for each gene.
  • A .psl file containing the BLAT pairwise alignment of all transcripts, for each gene.
  • SuperDuper.fasta - the SuperTranscript sequence for each gene.
  • SuperDuper.gff - the annotation of each SuperTranscript obtained from the overlap graph.
  • SuperDuperTrans.gff - the annotation of the transcripts on the SuperTranscript [Optional - only if the --alternate flag is used]
  • LogOut.pdf - a pdf documenting various metrics for assessing the quality of the SuperTranscript construction [Optional]
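To inspect these outputs programmatically, a small Python sketch can read SuperDuper.fasta and the .gff files. It assumes only the standard FASTA layout and the standard tab-separated, nine-column GFF layout; the function names are hypothetical, and the record names and GFF values used when testing it are made up, since the exact contents Ribbon writes are not shown here.

```python
def read_fasta(lines):
    """Return {record_id: sequence} from FASTA-formatted lines."""
    records = {}
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name = line[1:].split()[0]  # ID is the first word of the header
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def read_gff(lines):
    """Return (seqid, type, start, end, attributes) tuples from GFF lines."""
    features = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/directive lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete nine-column feature line
        features.append((cols[0], cols[2], int(cols[3]), int(cols[4]), cols[8]))
    return features
```

For example, opening Test/SuperDuper.fasta and calling read_fasta on the file handle gives each SuperTranscript's sequence, from which its length follows with len().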

Extracting the annotation of transcripts against the SuperTranscript

If you did not create the alternate annotation by passing the --alternate flag in the previous step, you can easily create it afterwards:

Move into the output directory: cd Test

Create the alternate annotation (only needed if --alternate was not passed to Ribbon.py):

python ../Checker.py SuperDuper.fasta

usage: Checker.py [-h] [--cores CORES] SuperFile  

positional arguments:  
  SuperFile      The name of the SuperDuper.fasta file created by  
             SuperTranscript  

optional arguments:  
  -h, --help     show this help message and exit   
  --cores CORES  The number of cores you wish to run the job on (default = 1)  

Outputs:

  • SuperDuperTrans.gff - the annotation of the transcripts on the SuperTranscript.
  • LogOut.pdf - a pdf documenting various metrics for assessing the quality of the SuperTranscript construction.
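The idea of annotating transcripts on a SuperTranscript can be illustrated by converting a BLAT alignment of each transcript against the SuperTranscript (PSL format) into GFF features, one per aligned block. This is a sketch of the general technique under that assumption, not a description of what Checker.py does internally; the function name psl_to_gff, the source label "sketch", the "exon" feature type and the alignment values used when testing it are arbitrary choices.

```python
def psl_to_gff(psl_lines, source="sketch"):
    """Turn BLAT PSL rows (transcript vs SuperTranscript) into GFF lines.

    Each aligned block becomes one GFF feature on the target
    (SuperTranscript). PSL block starts are 0-based; GFF coordinates are
    1-based and inclusive, hence start + 1 and start + size.
    """
    gff = []
    for line in psl_lines:
        cols = line.rstrip("\n").split("\t")
        # Skip the psLayout header, blank lines and anything malformed.
        if len(cols) < 21 or not cols[0].isdigit():
            continue
        strand = cols[8][0]  # '+' or '-' for nucleotide alignments
        q_name, t_name = cols[9], cols[13]
        sizes = [int(x) for x in cols[18].rstrip(",").split(",")]
        t_starts = [int(x) for x in cols[20].rstrip(",").split(",")]
        for size, start in zip(sizes, t_starts):
            gff.append("\t".join([t_name, source, "exon",
                                  str(start + 1), str(start + size),
                                  ".", strand, ".", f"Name={q_name}"]))
    return gff
```

Because BLAT reports target block starts on the forward strand for nucleotide alignments, the GFF coordinates can be taken directly from tStarts even when the transcript aligned in reverse.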

IGV viewer

To start IGV from the command line, type igv. This launches IGV (if it is installed). Then load SuperDuper.fasta, which contains the SuperTranscript sequence for each gene, followed by the sorted .bam files containing the reads mapped to SuperDuper.fasta and the annotation files SuperDuper.gff and SuperDuperTrans.gff. Remember to expand the annotation tracks by right-clicking them in IGV and choosing the expanded display mode.
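Loading the same files by hand for each session can be scripted with an IGV batch file. The sketch below builds one using IGV's new, genome, load and expand batch commands; the function name igv_batch and the .bam names used when testing it are hypothetical, and it assumes each track is named after its file, as IGV does by default.

```python
import os


def igv_batch(fasta, bam_files, annotation_files):
    """Return the text of an IGV batch script that loads a SuperTranscript
    reference, read alignments and annotation tracks."""
    lines = ["new", f"genome {fasta}"]
    lines += [f"load {path}" for path in bam_files]
    lines += [f"load {path}" for path in annotation_files]
    # Assumption: each track is named after its file, as IGV does by default.
    lines += [f"expand {os.path.basename(path)}" for path in annotation_files]
    return "\n".join(lines) + "\n"
```

Write the returned text to a file and pass it to the IGV launcher's -b (batch) option; check the documentation of your IGV version for the exact invocation.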

Viewing transcript coverage on SuperTranscript

The Ribbon package can also show, for a given gene, the coverage of each transcript on the SuperTranscript. Run this script from the directory containing SuperDuper.fasta:

python ../STViewer.py GeneA

usage: STViewer.py [-h] GeneName  

positional arguments:  
  GeneName    The name of the gene whom you wish to view  

optional arguments:  
  -h, --help  show this help message and exit  

Outputs:

  • Visualise.pdf - a pdf displaying the coverage of each transcript on the SuperTranscript.
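As an illustration of what transcript coverage on a SuperTranscript means, the sketch below takes each transcript's aligned blocks as 1-based inclusive (start, end) intervals and computes the fraction of the SuperTranscript each transcript covers and how many transcripts cover each base. It is not STViewer.py's code; the function names are hypothetical and the intervals used when testing it are made up.

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent 1-based inclusive (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def coverage_fraction(blocks, st_length):
    """Fraction of a SuperTranscript of length st_length covered by blocks."""
    covered = sum(end - start + 1 for start, end in merge_intervals(blocks))
    return covered / st_length


def per_base_depth(transcript_blocks, st_length):
    """Number of transcripts covering each base of the SuperTranscript.

    transcript_blocks maps a transcript name to its list of aligned blocks.
    """
    depth = [0] * st_length
    for blocks in transcript_blocks.values():
        # Merge first so a transcript is counted at most once per base.
        for start, end in merge_intervals(blocks):
            for pos in range(start - 1, end):  # convert to 0-based indices
                depth[pos] += 1
    return depth
```

Merging before counting keeps overlapping blocks from the same transcript from inflating either the covered fraction or the per-base depth.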