Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Boomer: How to deal with huge cliques? #292

Open
matentzn opened this issue May 25, 2022 · 3 comments
Open

Boomer: How to deal with huge cliques? #292

matentzn opened this issue May 25, 2022 · 3 comments

Comments

@matentzn
Copy link

This issue is just so I can link to some discussion while I make other tickets. The question is what boomer should do when faced with an enormous clique:

  1. Ignore it and spit the clique out for people to break it
  2. Try to break by applying some trivial heuristics (fast ones, like bulk dropping low probability axioms)
  3. Anything else come to mind?

I would like boomer to at least try 2, but its hard to do this in a principled manner. Maybe you have a better idea @balhoff?

@cmungall
Copy link
Contributor

Two steps:

1 additional local probability refinement (optional, but will help avoid throwing out things accidentally in next step)
2 for each clique, gradually raise a threshold and throw out any axioms from the clique with Pr < threshold until clique is broken. recursively apply to each sub clique until below desired size.

I think there are some other tickets for step 1. I will add more detail later. A very simple naive approach is to simply look for parallel structures

E.g. given:

  • A' eq A
  • B' eq B.

then the following reinforce one another:

  1. A r B
  2. A' r B'

Boost the weight of both by a constant factor if both present. If only one is present then decrease the weight of the other

This can be done in a more formal way with a probabilistic open world approach with priors for missing information but as a heuristic this can help modify local probs to help bust cliques than are then tractable with global calculations

@matentzn
Copy link
Author

Does this make sense to you @balhoff? I think it does to me!

@cmungall
Copy link
Contributor

I'm also playing with some code in python that uses networkx to break cliques.

The algorithm is:

PartitionCliques(G):
  for clique in cliques(G):
    if size(clique) > N:
        SG = subgraph(clique, G)
        E = sort(SG.edge, key=confidence)
        while |E|:
            e = pop(E)
            SG.remove(e)
            subcliques = cliques(SG)
            if len(subcliques) > 1:
                  PartitionCliques(subgraph(c), SG) for c in subcliques
                  break

there are possibly more efficient ways. The way this is written it works for bidi and unidi graphs. For boomer we would use diagraphs and asserted edges both directions for equivalence. The existing boomer clique extract should just swap in here.

In theory this could quite brutally remove a lot of informative edges, but this is worth it not to have boomer run forever. So long as broken cliques are reported a curator can handle accordingly. We could also just reduce the overall confidence in any broken cliques. Even a crude measure like reducing confidence by 50% in any clique that was broken from a larger one should be sufficient

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants