-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Apply cutoffs for domains #65
Comments
A bit of background before I give you the easy answer 😁 So there are two kind of thresholds in HMMER, reporting and inclusion. When you use bitscore cutoffs with the pipeline, you actually change the inclusion thresholds, not the reporting thresholds. That's why you sometimes get hits that are not "included", but all hits should have The reason why there are different thresholds for hits and domains is to account for multi-domain hits, where sometimes a hit passes the inclusion thresholds because the combination of domains do, however individual domains do not pass the threshold (this is reported in HMMER's output, sometimes you'll see "no domain above threshold" or something like that). Now in your example, to only iterate on included hits, and for each hit on included domains, you can use the
|
Thanks for the explanation, Martin! That makes sense. However, I've seem at least a case where the hit was reported, but it had a single domain that wasn't. If having multiple domains is what makes hit with unreported domains being reported, how can a hit with a single unreported domain be reported? Here's an example:
print("pfam\tprotein\tdomain_number\thit_included\tdomain_included")
with pyhmmer.easel.SequenceFile("test.faa", digital=True) as seq_file:
seqs = seq_file.read_block()
with pyhmmer.plan7.HMMFile("Pfam-A.hmm") as hmm_file:
for hits in pyhmmer.hmmsearch(hmm_file, seqs, bit_cutoffs="gathering"):
for hit in hits:
for i, domain in enumerate(hit.domains, 1):
print(
f"{hit.name.decode()}\t{hits.query_accession.decode()}\t"
f"{i}\t{hit.included}\t{domain.included}"
) Output:
As you can see, |
Also interested here! Maybe useful - the domain.score (26.71) for This behavior is consistent with using HMMER3 though, where similarly domain.score of 26.7 and the full sequence score is 27.9.
Results in test_hmmer3.txt:
|
This is more of a question than a bug.
My understanding is that when the user set a cutoff, PyHMMER will not report hits that would have
hit.included == False
. However, the same is not applied at the domain level, so there are cases where the hit is reported because one of the domains passes the cutoffs, but domains that do not satisfy the cutoffs are also reported.For example:
Generates this output:
To be honest, I don't really know what determines if a domain is included or not, given that the cutoffs are applied to the hit. I expected that when a cutoff is applied, domains were also filtered.
The text was updated successfully, but these errors were encountered: