Merge branch 'update_doc' of github.com:NBISweden/AGAT into update_doc
Juke34 committed Feb 23, 2024
2 parents 8603626 + c829dbd commit a65ac9a
Showing 1 changed file (README.md) with 25 additions and 3 deletions.
@@ -313,25 +313,47 @@ See the AGAT parser section for more information about it.

#### with \_sq\_ prefix => Means SEQUENTIAL

The GFF file is read and processed line by line, from top to bottom, without sanity checks (e.g. of the relationships between features). This is memory efficient.
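
To picture the sequential mode, here is a small Perl sketch (not AGAT code). It streams a GFF3 text line by line and keeps only a counter per feature type. Each line is handled as soon as it is read. Nothing checks that `Parent` values point to existing `ID`s, which is the kind of sanity check the `_sq_` tools skip. The in-memory input is made up; a real tool would read from a file.

```perl
use strict;
use warnings;

# Made-up GFF3 input held in memory; "\t" escapes produce the tab separators.
my $gff = "##gff-version 3\n"
        . "chr1\tsrc\tgene\t1000\t2000\t.\t+\t.\tID=gene1\n"
        . "chr1\tsrc\tmRNA\t1000\t2000\t.\t+\t.\tID=mrna1;Parent=gene1\n"
        . "chr1\tsrc\texon\t1000\t1200\t.\t+\t.\tID=exon1;Parent=mrna1\n"
        . "chr1\tsrc\texon\t1500\t2000\t.\t+\t.\tID=exon2;Parent=mrna1\n";

# Open a read handle on the string (a real tool would open a file path).
open my $fh, '<', \$gff or die "Cannot open in-memory GFF: $!";

my %count_by_type;    # the only state kept across lines
while (my $line = <$fh>) {
    chomp $line;
    next if $line =~ /^#/ || $line !~ /\S/;    # skip header/comment and blank lines
    my @col = split /\t/, $line;
    next unless @col == 9;                     # silently skip malformed lines
    # Use the feature right away, then forget it: no Parent/ID check is done.
    $count_by_type{ $col[2] }++;
}
close $fh;

print "$_\t$count_by_type{$_}\n" for sort keys %count_by_type;
```

Only the counters are kept, so memory use stays roughly constant whatever the file size. The price is that problems such as a missing parent go unnoticed.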

## The AGAT parser - Standardisation to create GXF files compliant with any tool

All tools with the `agat_sp_` prefix parse and slurp the entire data into a specific data structure.
Below you will find more information about the peculiarities of the data structure
and the parsing approach used.

#### The data structure

<details>
<summary>See data structure details</summary>

The method creates a hash structure holding all the data in memory. We call it OMNISCIENT.
OMNISCIENT holds the GFF/GTF header information in this structure:
```
$omniscient{other}{header} = header information from the beginning of the file (lines starting with #)
```
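
Below is a minimal Perl sketch of how the leading `#` lines could be collected into `$omniscient{other}{header}`. Two choices here are assumptions for illustration: the lines are kept as an array reference of raw strings, and the first non-`#` line is treated as the end of the header. AGAT's own parser may store them differently.

```perl
use strict;
use warnings;

# Made-up input: two header lines followed by the first feature line.
my $gff = "##gff-version 3\n"
        . "##sequence-region chr1 1 5000\n"
        . "chr1\tsrc\tgene\t1000\t2000\t.\t+\t.\tID=gene1\n";

my %omniscient;
open my $fh, '<', \$gff or die "Cannot open in-memory GFF: $!";
while (my $line = <$fh>) {
    last unless $line =~ /^#/;    # assumption: the header ends at the first non-# line
    chomp $line;
    push @{ $omniscient{other}{header} }, $line;    # keep each header line verbatim
}
close $fh;

print "$_\n" for @{ $omniscient{other}{header} };
```
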
OMNISCIENT holds the GFF/GTF feature information in this structure:
```
$omniscient{level1}{tag_l1}{level1_id} = feature <= tag could be gene, match
$omniscient{level2}{tag_l2}{idY} = @featureListL2 <= tag could be mRNA, rRNA, tRNA, etc. idY is a level1_id (known as the Parent attribute within the level2 feature). @featureListL2 is a list so that isoforms can be managed.
$omniscient{level3}{tag_l3}{idZ} = @featureListL3 <= tag could be exon, cds, utr3, utr5, etc. idZ is the ID of a level2 feature (known as the Parent attribute within the level3 feature). @featureListL3 is a list so that all features with the same tag are kept together.
```
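
The three levels can be illustrated with the plain-Perl sketch below. It is a hypothetical simplification, not AGAT's implementation. Features are plain hash references rather than AGAT's feature objects. The type-to-level mapping is hard-coded instead of coming from `feature_levels.yaml`. Types and IDs are used as hash keys exactly as written. The example shows why the level2 and level3 values are lists: `gene1` has two mRNAs (isoforms), and `mrna1` has two exons.

```perl
use strict;
use warnings;

# Hard-coded type-to-level mapping for this sketch only.
my %level_of = (gene => 'level1', mRNA => 'level2', exon => 'level3', CDS => 'level3');

# Made-up GFF3 feature lines: one gene, two isoforms, two exons on the first isoform.
my @lines = (
    "chr1\tsrc\tgene\t1000\t2000\t.\t+\t.\tID=gene1",
    "chr1\tsrc\tmRNA\t1000\t2000\t.\t+\t.\tID=mrna1;Parent=gene1",
    "chr1\tsrc\tmRNA\t1000\t1800\t.\t+\t.\tID=mrna2;Parent=gene1",
    "chr1\tsrc\texon\t1000\t1200\t.\t+\t.\tID=exon1;Parent=mrna1",
    "chr1\tsrc\texon\t1500\t2000\t.\t+\t.\tID=exon2;Parent=mrna1",
);

my %omniscient;
for my $line (@lines) {
    my @col   = split /\t/, $line;
    my $type  = $col[2];
    my $level = $level_of{$type} or next;    # skip types missing from the mapping
    # Parse column 9 ("key=value;key=value") into a hash.
    my %attr = map { split /=/, $_, 2 } split /;/, $col[8];
    my $feature = { type => $type, start => $col[3], end => $col[4], attr => \%attr };
    if ($level eq 'level1') {
        # level1: one feature per ID
        $omniscient{level1}{$type}{ $attr{ID} } = $feature;
    }
    else {
        # level2/level3: a list of features stored under their Parent ID
        push @{ $omniscient{$level}{$type}{ $attr{Parent} } }, $feature;
    }
}

# Walk the structure top-down: gene -> mRNAs (isoforms) -> exons.
for my $gene_id (sort keys %{ $omniscient{level1}{gene} }) {
    for my $mrna (@{ $omniscient{level2}{mRNA}{$gene_id} // [] }) {
        my $mrna_id = $mrna->{attr}{ID};
        my $n_exons = @{ $omniscient{level3}{exon}{$mrna_id} // [] };
        print "$gene_id\t$mrna_id\t$n_exons exon(s)\n";
    }
}
```

The walk goes top-down: first the level1 ID, then the level2 features stored under that ID, then the level3 features stored under each level2 ID. Following these keys is what allows complete gene models to be rebuilt from the hash.
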
OMNISCIENT holds the `agat_config.yml` information in this structure:
```
$omniscient{config}{parameter1} = value of parameter1
$omniscient{config}{parameter2} = value of parameter2
```
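
As a hypothetical illustration of the `config` branch, the sketch below fills `$omniscient{config}` from flat `key: value` lines and reads a value back with a fallback default. The naive line parser only stands in for a real YAML parser and handles no nesting, lists or quoting. `parameter1` and `parameter2` are the same placeholders as above, not real AGAT parameter names. `config_value` is a hypothetical helper.

```perl
use strict;
use warnings;

# Made-up flat config text using the README's placeholder names.
my $yaml = "parameter1: 1\nparameter2: gff3\n# a comment\n";

my %omniscient;
for my $line (split /\n/, $yaml) {
    next if $line =~ /^\s*(?:#|$)/;    # skip comments and blank lines
    # Naive "key: value" split; NOT a YAML parser.
    my ($key, $value) = $line =~ /^\s*([^:]+?)\s*:\s*(.*?)\s*$/ or next;
    $omniscient{config}{$key} = $value;
}

# Look up a parameter, falling back to a default when it is absent.
sub config_value {
    my ($omni, $name, $default) = @_;
    return exists $omni->{config}{$name} ? $omni->{config}{$name} : $default;
}

print config_value(\%omniscient, 'parameter2', 'default'), "\n";
```
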
OMNISCIENT holds the `feature_levels.yaml` information in this structure:
```
$omniscient{other}{level}{level1}{featureTypeX} = value of featureTypeX (standalone, topfeature)
$omniscient{other}{level}{level2}{featureTypeY} = value of featureTypeY
$omniscient{other}{level}{level2}{featureTypeZ} = value of featureTypeZ
```
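
The `{other}{level}` branch lets a parser decide which level a feature type belongs to. Below is a minimal Perl sketch, assuming the mapping has already been loaded into the hash. The type names are illustrative, and the values are placeholders rather than a copy of `feature_levels.yaml`. `level_of_type` is a hypothetical helper.

```perl
use strict;
use warnings;

my %omniscient;
# Illustrative excerpt of the level mapping; the values are placeholders.
$omniscient{other}{level}{level1}{gene} = 1;
$omniscient{other}{level}{level2}{mRNA} = 1;
$omniscient{other}{level}{level3}{exon} = 1;
$omniscient{other}{level}{level3}{CDS}  = 1;

# Return 'level1', 'level2' or 'level3' for a feature type, or undef if unknown.
sub level_of_type {
    my ($omni, $type) = @_;
    for my $level (qw(level1 level2 level3)) {
        return $level if exists $omni->{other}{level}{$level}{$type};
    }
    return;
}

for my $type (qw(gene mRNA exon ncRNA_gene)) {
    my $level = level_of_type(\%omniscient, $type);
    print "$type => ", ($level // 'unknown'), "\n";
}
```
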
</details>

#### How does the AGAT parser work?

[<img align="right" src="docs/img/agat_parsing_overview.jpg" width="400" height="250" />](https://nbis.se)

The AGAT parser philosophy:
* 1) Parse by Parent/child relationships or by gene_id/transcript_id relationships.
* 2) ELSE parse by a common tag (an attribute value shared by features that must be grouped together; by default `locus_tag` is used, but this can be changed via a parameter).
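
The decision order above can be sketched as a small Perl function that picks the attribute used to attach a level2 or level3 feature to its group. This is a simplified, hypothetical illustration, not AGAT's code. The `grouping_key` helper, its arguments and its return value are made up. The real parser handles many more cases, such as multiple `Parent` values.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (attribute_name, value) used to group a
# level2 or level3 feature, or an empty list if nothing can be used.
sub grouping_key {
    my ($attr, $level, $common_tag) = @_;
    $common_tag //= 'locus_tag';    # default common tag, as described above
    # 1) Parent/child relationship (GFF3)
    return ('Parent', $attr->{Parent}) if defined $attr->{Parent};
    # 1) gene_id/transcript_id relationship (GTF): level3 -> transcript, level2 -> gene
    my $gtf_key = $level eq 'level3' ? 'transcript_id' : 'gene_id';
    return ($gtf_key, $attr->{$gtf_key}) if defined $attr->{$gtf_key};
    # 2) ELSE fall back to an attribute value shared by the features of one locus
    return ($common_tag, $attr->{$common_tag}) if defined $attr->{$common_tag};
    return;    # nothing to group on
}

my @examples = (
    [ { ID => 'mrna1', Parent => 'gene1' },      'level2' ],
    [ { gene_id => 'g1', transcript_id => 't1' }, 'level3' ],
    [ { locus_tag => 'LOC_001' },                 'level3' ],
);
for my $ex (@examples) {
    my ($attr, $level) = @$ex;
    my ($tag, $value) = grouping_key($attr, $level);
    print "$level grouped by $tag=$value\n";
}
```

The common tag is passed as an argument to reflect the point that `locus_tag` is only the default and can be changed.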