FGS+ doesn't use the reverse complement in negative strand ORF's #19

Open
nielsdg opened this issue Mar 15, 2017 · 1 comment
nielsdg commented Mar 15, 2017

FragGeneScanPlus mistranslates protein fragments predicted in negative frames, because it doesn't convert the sequence to its reverse complement before translating.

Attached to this issue is a small FASTA file, which is wrongly translated to:

ELNLNILSFNTNWVRT*VSTPGSTF*LTCLNVKNCFVQWTFHSLIFDKSFR**GIRMRANIFDSEVIFIDKKNTNFFPFNFDGFRLIFANVTLLTN*DPTHTLRPH

The correct translation should be:

MRTQGMSWISVCQQGDVSEDEPKAVEVEGKKIGVFFVDENYFAIENVCPHAYALLTEGFIEDQTVECPLHEAIFDIQTGELKSGPGCRNLCTYPVRVEGQDIQIQL
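
To make the difference concrete, here is a minimal sketch in plain Python (no Biopython needed). It takes the last 27 nucleotides of the ORF region from the attached sequence, the stretch that FGS+ translates to `...DPTHTLRPH`, and shows that its reverse complement begins with the `ATG` start codon of the expected protein. The names `reverse_complement` and `orf_tail` are only illustrative and are not taken from the FGS+ source.

```python
# Illustrative helper, not FGS+ code: reverse complement of an unambiguous DNA string.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Complement every base, then reverse the result."""
    return dna.translate(COMPLEMENT)[::-1]


# Last 27 nucleotides of the ORF region on the forward strand.
orf_tail = "GATCCAACTCATACCTTGCGTCCTCAT"

# On the negative strand, translation has to start from this string instead.
print(reverse_complement(orf_tail))
```

Read on the forward strand, the fragment starts with `GAT` (D); after taking the reverse complement it starts with `ATG` (M), which is where the expected protein begins.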

Full test example:

Translating the following sequence with length 551:

TGTTCTGCTTCCTTGTACATGTGAGGACTAGAGTTGAATTTGAATATCCTGTCCTTCAACACGAACTGGGTACGTACATAGGTTTCTACACCCGGGTCCACTTTTTAGCTCACCTGTCTGAATGTCAAAAATTGCTTCGTGCAATGGACATTCCACAGTTTGATCTTCGATAAATCCTTCCGTTAATAAGGCATACGCATGAGGGCAAACATTTTCGATAGCGAAGTAATTTTCATCGACAAAAAAAACACCAATTTTTTTCCCTTCAACTTCGACGGCTTTCGGCTCATCTTCGCTAACGTCACCCTGCTGACAAACTGAGATCCAACTCATACCTTGCGTCCTCATTTTGTTTTATATACAAAACATAATTTGATTTTCAAAACACAAGCTAAGCATAATCCTCTTGATTAATTTTTGTCAAAGTAAAAATAAACATTAAAATCAATTGATTAATAAATTTTAAATAATTTGTTACGTTTCAAGTCAGAAACAATGTTTTAAATATAAAAATTGTTTTATGTAATCTTTATAATTACAATAGTTCTAAA


Performing 6-frame translation:
+1: CSASLYM*GLELNLNILSFNTNWVRT*VSTPGSTF*LTCLNVKNCFVQWTFHSLIFDKSFR**GIRMRANIFDSEVIFIDKKNTNFFPFNFDGFRLIFANVTLLTN*DPTHTLRPHFVLYTKHNLIFKTQAKHNPLD*FLSK*K*TLKSID**ILNNLLRFKSETMF*I*KLFYVIFIITIVL
+2: VLLPCTCED*S*I*ISCPSTRTGYVHRFLHPGPLFSSPV*MSKIASCNGHSTV*SSINPSVNKAYA*GQTFSIAK*FSSTKKTPIFFPSTSTAFGSSSLTSPC*QTEIQLIPCVLILFYIQNII*FSKHKLSIILLINFCQSKNKH*NQLINKF*IICYVSSQKQCFKYKNCFM*SL*LQ*F*
+3: FCFLVHVRTRVEFEYPVLQHELGTYIGFYTRVHFLAHLSECQKLLRAMDIPQFDLR*ILPLIRHTHEGKHFR*RSNFHRQKKHQFFSLQLRRLSAHLR*RHPADKLRSNSYLASSFCFIYKT*FDFQNTS*A*SS*LIFVKVKINIKIN*LINFK*FVTFQVRNNVLNIKIVLCNLYNYNSSK
-1: FRTIVIIKIT*NNFYI*NIVSDLKRNKLFKIY*SIDFNVYFYFDKN*SRGLCLACVLKIKLCFVYKTK*GRKV*VGSQFVSRVTLAKMSRKPSKLKGKKLVFFLSMKITSLSKMFALMRMPY*RKDLSKIKLWNVHCTKQFLTFRQVS*KVDPGVETYVRTQFVLKDRIFKFNSSPHMYKEAE
-2: LELL*L*RLHKTIFIFKTLFLT*NVTNYLKFINQLILMFIFTLTKINQEDYA*LVF*KSNYVLYIKQNEDARYELDLSLSAG*R*RR*AESRRS*REKNWCFFCR*KLLRYRKCLPSCVCLINGRIYRRSNCGMSIARSNF*HSDR*AKKWTRV*KPMYVPSSC*RTGYSNSTLVLTCTRKQN
-3: *NYCNYKDYIKQFLYLKHCF*LET*QII*NLLIN*F*CLFLL*QKLIKRIMLSLCFENQIMFCI*NKMRTQGMSWISVCQQGDVSEDEPKAVEVEGKKIGVFFVDENYFAIENVCPHAYALLTEGFIEDQTVECPLHEAIFDIQTGELKSGPGCRNLCTYPVRVEGQDIQIQL*SSHVQGSRT

Solution FragGeneScanPlus:
ELNLNILSFNTNWVRT*VSTPGSTF*LTCLNVKNCFVQWTFHSLIFDKSFR**GIRMRANIFDSEVIFIDKKNTNFFPFNFDGFRLIFANVTLLTN*DPTHTLRPH

Correct solution (using reverse complement of ORF):
MRTQGMSWISVCQQGDVSEDEPKAVEVEGKKIGVFFVDENYFAIENVCPHAYALLTEGFIEDQTVECPLHEAIFDIQTGELKSGPGCRNLCTYPVRVEGQDIQIQL

Code that generates the output above (it needs the attached test_fasta.txt in the same directory):

#!/usr/bin/env python3

from Bio import SeqIO

seq = SeqIO.read('test_fasta.txt', 'fasta').seq
print('Translating the following sequence with length {}:'.format(len(seq)))
print('\n{}\n\n'.format(seq))

print('Performing 6-frame translation:\n')
for s, strand in ((seq, 1), (seq.reverse_complement(), -1)):
    for frame in range(3):
        print('{:+2d}: {}'.format((frame + 1) * strand, s[frame:].translate(table=11)))
print()


# Known ORF on the negative strand (visible in the 6-frame translation on -3)
orf_start = 28
orf_stop = 348
orf = seq[(orf_start + 2):orf_stop]

print('Solution FragGeneScanPlus:\n{}\n'.format(orf.translate(table=11)))

orf = orf.reverse_complement()
print('Correct solution (using reverse complement of ORF):\n{}\n'.format(orf.translate(table=11)))
nielsdg commented Mar 15, 2017

I found the culprit, which I think would be solved by PR #15, which I sent in some time ago. More specifically, it comes down to the if-else that checks the strand in print_outputs().
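
For illustration only, a strand check of that kind could look like the sketch below. It is written in plain Python rather than taken from the FGS+ source, and `translate_orf`, its parameters, and the `"+"`/`"-"` strand encoding are all hypothetical. The codon table uses the standard codon-to-amino-acid assignments, which NCBI translation table 11 shares with the standard code.

```python
# Sketch of a strand-aware translation; names and strand encoding are assumptions.
BASES = "TCAG"
# Amino acids for all 64 codons in TCAG order (same assignments in tables 1 and 11).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Complement every base, then reverse the result."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def translate_orf(seq, start, stop, strand):
    """Translate seq[start:stop], reverse complementing first on the negative strand."""
    orf = seq[start:stop]
    if strand == "-":
        orf = reverse_complement(orf)
    return translate(orf)


# Last 27 nucleotides of the ORF region from the test sequence in this issue.
fragment = "GATCCAACTCATACCTTGCGTCCTCAT"
print(translate_orf(fragment, 0, len(fragment), "+"))
print(translate_orf(fragment, 0, len(fragment), "-"))
```

With the strand set to `"+"` the fragment gives the tail of the current FGS+ output (`DPTHTLRPH`); with `"-"` it gives the head of the expected protein (`MRTQGMSWI`).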
