Readers

sacha schutz edited this page Sep 3, 2019 · 8 revisions

Cutevariant imports variant data through an abstract class called AbstractReader. To support a custom format that contains variants, inherit from it. VCF readers are already implemented and are located in the readers package.

Methods to override

Override the following three methods (get_samples() only if your data has samples):

AbstractReader.get_fields()

Yields fields as dictionaries with the following structure:

  • name (str): the field name
  • category (str): the table the field belongs to: variants, annotations or samples #TODO rename
  • description (str): the definition of the field
  • type (str): the field type, as a Python type name (str, int, float, bool)
  • constraint (str, optional): SQL constraints
{
    "name": "chr",
    "category": "variants",
    "description": "chromosome",
    "type": "str",
    "constraint": "NOT NULL",
}
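The structure above can be checked before a reader is plugged in. The following is a minimal sketch, not part of Cutevariant: `validate_field`, `REQUIRED_KEYS`, `CATEGORIES` and `TYPES` are hypothetical names introduced here, and the sketch assumes that every key except `constraint` is mandatory.

```python
# Hypothetical helper, not part of Cutevariant: checks a field dictionary
# against the structure described above.
REQUIRED_KEYS = {"name", "category", "description", "type"}  # "constraint" is optional
CATEGORIES = {"variants", "annotations", "samples"}
TYPES = {"str": str, "int": int, "float": float, "bool": bool}


def validate_field(field):
    """Return `field` unchanged, or raise ValueError if it is malformed."""
    missing = REQUIRED_KEYS - set(field)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if field["category"] not in CATEGORIES:
        raise ValueError(f"unknown category: {field['category']!r}")
    if field["type"] not in TYPES:
        raise ValueError(f"unknown type: {field['type']!r}")
    return field


chr_field = {
    "name": "chr",
    "category": "variants",
    "description": "chromosome",
    "type": "str",
    "constraint": "NOT NULL",
}
validate_field(chr_field)  # passes silently
```

Running such a check over everything `get_fields()` yields catches a misspelled category or type early, before any import is attempted.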

AbstractReader.get_variants()

Yields variants as dictionaries with the following structure:

  • chr (str): chromosome name
  • pos (int): position on the chromosome
  • ref (str): reference base
  • alt (str): alternative base
  • field n (type): any additional fields declared by get_fields() with category variants
  • annotations (list):
    • transcript (str): transcript name
    • gene (str): gene name
    • field n (type): any additional fields declared by get_fields() with category annotations
  • samples (list):
    • name (str): sample name
    • gt (int): genotype of the variant for the sample (0: homozygous wild type, 1: heterozygous, 2: homozygous mutant, -1: unknown)
{
    "chr": "11",
    "pos": 125010,
    "ref": "T",
    "alt": "A",
    "dp": 32,
    "annotations": [
        {"transcript": "NM_234234", "gene": "CFTR", "in_exon": True, "pathogen_score": 0.2},
        {"transcript": "NM_234235", "gene": "CFTR", "in_exon": False, "pathogen_score": 0.5},
    ],
    "samples": [{"name": "sacha", "gt": 1, "af": 0.4}],
}
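The integer gt code usually has to be derived from whatever the source format stores. The sketch below maps a VCF-style genotype string to that code. `gt_code` is a hypothetical helper, not Cutevariant's own conversion. Two choices in it are assumptions: a genotype with two different non-reference alleles (such as "1/2") is reported as heterozygous, and anything that is not a fully called diploid genotype is reported as unknown.

```python
import re


def gt_code(vcf_gt):
    """Map a VCF GT string such as "0/1" or "1|1" to the gt code listed above.

    Hypothetical helper: returns 0 (homozygous wild type), 1 (heterozygous),
    2 (homozygous mutant) or -1 (unknown).
    """
    alleles = re.split(r"[/|]", vcf_gt)  # "/" is unphased, "|" is phased
    if len(alleles) != 2 or "." in alleles or "" in alleles:
        return -1  # missing call, haploid call, or malformed string
    first, second = alleles
    if first == "0" and second == "0":
        return 0
    if first == second:
        return 2
    return 1  # assumption: "1/2" counts as heterozygous


for text in ("0/0", "0/1", "1|1", "./."):
    print(text, gt_code(text))
```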

AbstractReader.get_samples()

Returns a list of sample names. If your data has no samples, you do not need to override this method.
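As an illustration, here is a sketch of get_samples() for an invented format whose first line lists the sample names. Everything in it is hypothetical: the `#samples` header convention, the `HeaderReader` class, and the stand-in `AbstractReader`, which only assumes that the real base class stores the device it receives.

```python
import io


class AbstractReader:
    """Stand-in for Cutevariant's AbstractReader (assumption: it stores the device)."""

    def __init__(self, device):
        self.device = device


class HeaderReader(AbstractReader):
    """Hypothetical reader: first line is '#samples<TAB>name1<TAB>name2...'."""

    def get_samples(self):
        self.device.seek(0)  # always read the header from the start
        header = self.device.readline().rstrip("\n")
        if not header.startswith("#samples"):
            return []  # no header line means no samples
        return header.split("\t")[1:]


# An in-memory device stands in for an open file.
reader = HeaderReader(io.StringIO("#samples\tsacha\tsample2\n11\t125010\tT\tA\n"))
print(reader.get_samples())
```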

Example

The FakeReader below is a useful starting point:

from .abstractreader import AbstractReader

class FakeReader(AbstractReader):
    def __init__(self):
        super().__init__(None)

    def get_variants(self):
        yield {
            "chr": "11",
            "pos": 125010,
            "ref": "T",
            "alt": "A",
            "annotations": [
                {"transcript": "NM_234234", "gene": "CFTR"},
                {"transcript": "NM_234235", "gene": "CFTR"},
            ],
            "samples": [{"name": "sacha", "gt": 1}],
        }

        yield {
            "chr": "12",
            "pos": 125010,
            "ref": "T",
            "alt": "A",
            "annotations": [
                {"transcript": "NM_234234", "gene": "CFTR"},
                {"transcript": "NM_234235", "gene": "CFTR"},
            ],
            "samples": [{"name": "sacha", "gt": 1}],
        }

        yield {
            "chr": "13",
            "pos": 125010,
            "ref": "T",
            "alt": "A",
            "annotations": [
                {"transcript": "NM_234234", "gene": "CFTR"},
                {"transcript": "NM_234235", "gene": "CFTR"},
            ],
            "samples": [{"name": "sacha", "gt": 1}],
        }

    def get_fields(self):
        yield {
            "name": "chr",
            "category": "variants",
            "description": "chromosome",
            "type": "str",
            "constraint": "NOT NULL",
        }
        yield {
            "name": "pos",
            "category": "variants",
            "description": "position",
            "type": "int",
            "constraint": "NOT NULL",
        }

        yield {
            "name": "ref",
            "category": "variants",
            "description": "reference base",
            "type": "str",
            "constraint": "NOT NULL",
        }
        yield {
            "name": "alt",
            "category": "variants",
            "description": "alternative base",
            "type": "str",
            "constraint": "NOT NULL",
        }

        yield {
            "name": "gt",
            "category": "samples",
            "description": "genotype",
            "type": "int",
        }

        yield {
            "name": "af",
            "category": "samples",
            "description": "allele frequency",
            "type": "float",
        }

        yield {
            "name": "gene",
            "category": "annotations",
            "description": "gene name",
            "type": "str",
        }

        yield {
            "name": "transcript",
            "category": "annotations",
            "description": "gene transcripts",
            "type": "str",
        }

    def get_samples(self):
        return ["sacha"]
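FakeReader yields hard-coded variants. A reader that actually parses its device might look like the sketch below, which handles an invented tab-separated format. The `TsvReader` class, its column layout, and the fixed sample name are hypothetical. The `AbstractReader` defined here is only a stand-in so the example runs on its own, and whether Cutevariant accepts an empty annotations list is an assumption.

```python
import io


class AbstractReader:
    """Stand-in for Cutevariant's AbstractReader (assumption: it stores the device)."""

    def __init__(self, device):
        self.device = device


class TsvReader(AbstractReader):
    """Hypothetical reader for lines 'chr<TAB>pos<TAB>ref<TAB>alt<TAB>gt' of one sample."""

    SAMPLE = "sacha"  # illustrative sample name

    def get_fields(self):
        variant_fields = [
            ("chr", "str", "chromosome"),
            ("pos", "int", "position"),
            ("ref", "str", "reference base"),
            ("alt", "str", "alternative base"),
        ]
        for name, type_name, description in variant_fields:
            yield {
                "name": name,
                "category": "variants",
                "description": description,
                "type": type_name,
                "constraint": "NOT NULL",
            }
        yield {"name": "gt", "category": "samples", "description": "genotype", "type": "int"}

    def get_variants(self):
        self.device.seek(0)
        for line in self.device:
            line = line.rstrip("\n")
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            chrom, pos, ref, alt, gt = line.split("\t")
            yield {
                "chr": chrom,
                "pos": int(pos),
                "ref": ref,
                "alt": alt,
                "annotations": [],  # assumption: an empty list is accepted
                "samples": [{"name": self.SAMPLE, "gt": int(gt)}],
            }

    def get_samples(self):
        return [self.SAMPLE]


device = io.StringIO("#chr\tpos\tref\talt\tgt\n11\t125010\tT\tA\t1\n12\t125010\tT\tA\t2\n")
reader = TsvReader(device)
variants = list(reader.get_variants())
for variant in variants:
    print(variant)
```

Because get_variants() is a generator, the file is read one line at a time, so large inputs never need to fit in memory.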

Usage

AbstractReader takes a device (for example, an open file object) as input.

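# "MyReader" is a placeholder for your own AbstractReader subclass;
# FakeReader above ignores its device and is created with FakeReader().
with open("yourfile", "r") as device:
    reader = MyReader(device)
    for variant in reader.get_variants():
        print(variant)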