
4. Running the pipeline

Breon Schmidt edited this page Jul 5, 2018 · 16 revisions

The Operations

The pipeline contains a grand total of four operations.

  1. A Python program which generates the superTranscriptome and various annotation files.
  2. An alignment of the FASTQ files to the superTranscriptome (the same FASTQ files used to generate the fusion finder's results).
  3. A bash script that prepares files for the plotting stage.
  4. An R script that plots the figure.

How best to use this tool

There are two methods that can be used to visualise the results of this tool: through a genome browser such as IGV or through the inbuilt plotting function.

Why not both?

My advice would be to perform the first two operations described above and then investigate the Fusion superTranscriptome with IGV. This allows you to interact with the reference and explore the subtleties of individual fusions. You may even end your investigation here, as IGV allows you to view sashimi plots and save the results.

Once you have discovered fusions that you would like individual figures of (perhaps for a paper), simply complete steps 3-4 for each fusion to create a PDF. The R plotting function does some extra filtering, colouring and highlighting to give the best (I think) representation of the results.

Instructions on how to do this can be found below under Execution.

Viewing Results

IGV

  1. Load the fst_reference.fasta as a genome.

    1. In the IGV menu, click Genomes > Load Genome from File...
    2. Navigate to the output folder that you originally specified in the python/bpipe file.
    3. Open reference/fst_reference.fasta

    You should now see a list of fused superTranscripts in the dropdowns in the IGV toolbar.

  2. Load the alignment and annotation data.

    1. In the IGV menu, click File > Load from File...
    2. Navigate to the output folder that you originally specified in the python/bpipe file.
    3. Open alignment/sample_name/Aligned.sortedByCoord.out.bam
    4. Repeat the above steps for each of
      • annotation/transcripts.gtf
      • annotation/protein_domains.bed
      • annotation/gene_boundaries.bed (after loading, right-click this track and select "Expanded" from the menu)
  3. Viewing fusions.

    1. Choose a fusion from the dropdown, e.g. BCR:ABL1
    2. A few different ways to view the data:
      • Right-click on the Aligned.sortedByCoord.out.bam track, click "Sashimi Plot" in the menu, then select either annotation track (you might want to view both!). Once the plot has been generated, click on the transcripts below it (if jaffa_annotation.gtf) to filter the splice junctions. Neat!
      • Right click on the Aligned.sortedByCoord.out.bam track, click "Show Splice Junction Track". This will create a new track that shows you the splice/fusion junctions, but can slow down IGV quite considerably.
    _Note: the image above involved some Photoshop magic to differentiate the curve indicating the fusion._
    3. Save an image of your fusion.
      • You can save an image at any time from IGV by going to File > Save Image
      • You can also right-click the sashimi plot and select the "Save Image..." option (perhaps after filtering the junctions)
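
If you load the same tracks repeatedly, the manual steps above can be approximated with an IGV batch script. The following is a minimal sketch, not part of Clinker itself: the function name write_igv_batch and the batch file name are hypothetical, and it assumes the reference is named fst_reference.fasta, that your paths contain no spaces, and that the gene boundaries track is displayed under its file name. It uses only the standard IGV batch commands new, genome, load and expand.

```shell
#!/usr/bin/env bash
# Write an IGV batch script that mirrors the manual loading steps.
# Arguments: <clinker output folder> <sample name> <batch file to create>
# NOTE: the function name and the layout assumptions are illustrative only.
write_igv_batch() {
    local out="$1" sample="$2" batch="$3"
    cat > "$batch" <<EOF
new
genome $out/reference/fst_reference.fasta
load $out/alignment/$sample/Aligned.sortedByCoord.out.bam
load $out/annotation/transcripts.gtf
load $out/annotation/protein_domains.bed
load $out/annotation/gene_boundaries.bed
expand gene_boundaries.bed
EOF
}

# Example with placeholder paths:
# write_igv_batch /path/to/clinker/output sample_name clinker_igv_batch.txt
```

The generated file can then be run from IGV via Tools > Run Batch Script... Choosing a fusion and creating sashimi plots is still done interactively, as described above.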

R Plot

If you have executed steps 3 and 4 from The Operations, via bpipe or the manual method, you will find your requested fusion PDF at: path/to/output/plot/sample_name/fusion_name.pdf

Execution

Bpipe

To run through the pipeline automatically, you must have Bpipe installed.

  1. The workflow/clinker.pipe script has been developed to accept all parameters that Clinker requires through the command line. The only time you will need to edit this file is if you have different naming conventions for your FASTQ files (see note below).

  2. To run Clinker, simply change the parameters in the following snippet and enter/paste it into the command line:
    bpipe -p out="/path/to/clinker/output" -p caller="/path/to/jaffa_results.csv" -p col="3,4,5,6" -p del="c" -p genome="19" -p print="true" -p fusions="BCR:ABL1" -p pdf_width="9" -p pdf_height="16" -p sizing="1,3,1,2,4,2" -p support=2 -p competitive="false" /path/to/clinker/clinker.pipe /path/to/*.fastq.gz

    Note: you should be able to pass through a space-delimited list of FASTQ files. Please make sure that the naming convention mimics the following; otherwise you will need to change it in the clinker.pipe file: samplename_R1.fastq.gz samplename_R2.fastq.gz

    Note: you might want to consider running this in the background (e.g. using screen).

    Note: if you're interested in multiple fusions, simply enter a comma-delimited list into the fusions parameter, e.g. "BCR:ABL1,ETV6:RUNX1"

    For further information on the parameters used in the snippet above, simply look under the hood.

  3. Let it finish!

  4. View your results in IGV or through the printed PDF (depending on whether find_fusion = true)
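
Because the pipeline expects paired files named samplename_R1.fastq.gz and samplename_R2.fastq.gz, it can be worth checking a folder before launching Bpipe. The sketch below is a hypothetical helper, not part of Clinker: the function name check_fastq_pairs is made up for illustration, and it assumes all FASTQ files for a run sit in one directory.

```shell
#!/usr/bin/env bash
# Check that every *_R1.fastq.gz file has a matching *_R2.fastq.gz mate.
# Prints one line per problem and returns non-zero if anything is wrong.
# NOTE: check_fastq_pairs is an illustrative helper, not a Clinker command.
check_fastq_pairs() {
    local dir="$1" status=0 r1 r2
    for r1 in "$dir"/*_R1.fastq.gz; do
        # With no matches the glob stays literal, so test that the path exists.
        if [ ! -e "$r1" ]; then
            echo "no *_R1.fastq.gz files in $dir"
            return 1
        fi
        # Derive the expected mate name by swapping the _R1 suffix for _R2.
        r2="${r1%_R1.fastq.gz}_R2.fastq.gz"
        if [ ! -e "$r2" ]; then
            echo "missing mate for $(basename "$r1")"
            status=1
        fi
    done
    return "$status"
}

# Example with a placeholder path:
# check_fastq_pairs /path/to/fastq && echo "naming looks fine"
```

If the check reports a missing mate, either rename the files to match the convention or adjust the pattern in clinker.pipe as noted above.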
