Reference Sequence Library (RSL)

DIGS is a project-based framework in which each investigation is centered on a genome feature of interest. In principle, any genome feature can be investigated, as long as it is conserved enough at the sequence level to be reliably detected in a similarity search.

DIGS requires a library of FASTA-formatted reference sequences for the gene or genetic element under investigation. Probes for screening are selected from the reference library.

The 'reference sequence library' is a curated set of sequences relevant to the genome feature under investigation. Usually, this will consist of:

A set of conserved DNA or polypeptide sequences derived from the genome feature of interest.

However, depending on the kind of investigation being performed, it may also contain:

Sequences that do not derive from the genome feature under investigation, but can provide useful information about the locus in which it occurs.
Sequences representing genome features that are not relevant to the investigation, but are sufficiently similar to the feature of interest to generate 'false positive' matches.

The DIGS tool uses a simple rule to capture data from the headers of FASTA-formatted reference (and probe) sequences. Headers should be structured to define two hierarchical name elements, ‘name’ and ‘gene_name’, separated by an underscore. In the example shown above these are a virus name and a gene name. Other two-level hierarchical naming schemes (e.g. species and gene name, or gene and subdomain name) can also be used, provided the same scheme is used consistently throughout the project. Reference sequences should be stored in a file whose path is specified in the DIGS control file.
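The header rule described above can be illustrated with a minimal Python sketch. This is not the DIGS tool's own code: the function names `parse_header` and `read_fasta` and the header `>HIV-1_gag` are hypothetical, chosen only for illustration. The text says only that the two elements are separated by an underscore, so splitting at the last underscore (which lets the first element itself contain underscores) is an assumption.

```python
from typing import Iterable, Iterator, Tuple


def parse_header(header: str) -> Tuple[str, str]:
    """Split a FASTA header into (name, gene_name).

    Assumption: the header is split at the LAST underscore, so the
    'name' element may itself contain underscores. The DIGS tool's
    actual rule may differ.
    """
    text = header.lstrip(">").strip()
    name, sep, gene_name = text.rpartition("_")
    if not sep or not name or not gene_name:
        raise ValueError(f"header does not follow 'name_gene_name': {header!r}")
    return name, gene_name


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, gene_name, sequence) for each record in FASTA-formatted lines."""
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield (*parse_header(header), "".join(chunks))
            header = line
            chunks = []
        elif header is not None:
            chunks.append(line)
    # Emit the final record.
    if header is not None:
        yield (*parse_header(header), "".join(chunks))
```

Running `read_fasta` over an open reference file and collecting the `(name, gene_name)` pairs is one way to check, before screening, that every header in a project follows the same two-level scheme; a header without an underscore raises a `ValueError`.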
