Boomer: How to deal with huge cliques? #292

matentzn · 2022-05-25T11:09:14Z

This issue is just so I can link to some discussion while I make other tickets. The question is what boomer should do when faced with an enormous clique:

1. Ignore it and spit the clique out for people to break manually
2. Try to break it by applying some trivial heuristics (fast ones, like bulk-dropping low-probability axioms)
3. Anything else that comes to mind?

I would like boomer to at least try option 2, but it's hard to do this in a principled manner. Maybe you have a better idea, @balhoff?
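Option 2 could be as simple as a single pass that drops every mapping below a fixed probability cutoff and then recomputes the cliques. Below is a minimal sketch. It assumes mappings are `(a, b, prob)` tuples and treats a clique as a connected component of the mapping graph. `bulk_drop` and `cutoff` are hypothetical names. The union-find helper uses only the standard library, and NetworkX's `connected_components` could replace it.

```python
from collections import defaultdict


def components(nodes, edges):
    """Connected components of an undirected graph, via union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)
    groups = defaultdict(set)
    for n in parent:
        groups[find(n)].add(n)
    return list(groups.values())


def bulk_drop(mappings, cutoff):
    """Drop every mapping whose probability is below `cutoff`, in one pass.

    mappings: list of (a, b, prob). `cutoff` is a hypothetical parameter.
    Returns the kept mappings and the cliques (components) they leave.
    """
    nodes = {n for a, b, _ in mappings for n in (a, b)}
    kept = [(a, b, p) for a, b, p in mappings if p >= cutoff]
    return kept, components(nodes, [(a, b) for a, b, _ in kept])


mappings = [("a1", "b1", 0.9), ("b1", "c1", 0.2), ("c1", "d1", 0.95)]
kept, cliques = bulk_drop(mappings, cutoff=0.5)
print(sorted(sorted(c) for c in cliques))  # [['a1', 'b1'], ['c1', 'd1']]
```

A fixed cutoff is fast but blunt: it also drops low-probability mappings inside cliques that were already small enough to solve.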

cmungall · 2022-06-29T23:27:05Z

Two steps:

1. Additional local probability refinement (optional, but it will help avoid accidentally throwing things out in the next step)
2. For each clique, gradually raise a threshold and throw out any axioms from the clique with Pr < threshold until the clique is broken. Recursively apply this to each sub-clique until it is below the desired size.
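Step 2 could look roughly like the following per clique. This is a minimal sketch with several assumptions:

- Mappings are `(a, b, prob)` tuples.
- A clique is a connected component.
- The threshold rises in fixed increments.

`break_clique`, `max_size` and `step` are hypothetical names, and `step` must be positive for the loop to terminate.

```python
from collections import defaultdict


def components(nodes, edges):
    """Connected components of an undirected graph, via union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)
    groups = defaultdict(set)
    for n in parent:
        groups[find(n)].add(n)
    return list(groups.values())


def break_clique(nodes, mappings, max_size, step=0.05):
    """Raise a threshold until the clique splits, then recurse on the parts.

    nodes: set of IDs; mappings: list of (a, b, prob) inside the clique.
    max_size and step are hypothetical tuning parameters.
    Returns a list of (nodes, mappings) pairs, each no larger than max_size.
    """
    if len(nodes) <= max(max_size, 1):
        return [(nodes, mappings)]
    threshold = 0.0
    while True:
        threshold += step
        # Throw out any axiom with Pr < threshold.
        kept = [m for m in mappings if m[2] >= threshold]
        parts = components(nodes, [(a, b) for a, b, _ in kept])
        if len(parts) > 1:
            break
    result = []
    for part in parts:
        sub = [m for m in kept if m[0] in part]
        result.extend(break_clique(part, sub, max_size, step))
    return result


mappings = [("a", "b", 0.9), ("b", "c", 0.3), ("c", "d", 0.8), ("d", "e", 0.95)]
pieces = break_clique({"a", "b", "c", "d", "e"}, mappings, max_size=3)
print(sorted(sorted(n) for n, _ in pieces))  # [['a', 'b'], ['c', 'd', 'e']]
```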

I think there are some other tickets for step 1. I will add more detail later. A very simple, naive approach is to look for parallel structures.

E.g. given:

A' eq A
B' eq B

then the following reinforce one another:

A r B
A' r B'

Boost the weight of both by a constant factor if both are present. If only one is present, decrease the weight of the other.

This could be done more formally with a probabilistic open-world approach that uses priors for missing information. As a heuristic, though, it can adjust local probabilities enough to bust cliques into pieces that are then tractable for the global calculation.
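The parallel-structure heuristic could be sketched as a local re-weighting pass. This is a minimal sketch under assumptions the comment does not spell out:

- The weights being adjusted are the probabilities of the two equivalence mappings (A' eq A and B' eq B).
- A one-sided edge penalizes both of those mappings.

`reinforce`, `boost`, `penalty` and `cap` are hypothetical names and values.

```python
from itertools import permutations


def reinforce(mappings, triples, boost=1.2, penalty=0.8, cap=0.99):
    """Adjust mapping probabilities using parallel structure (a heuristic).

    mappings: dict {(x_prime, x): prob}, read as "x_prime eq x".
    triples: set of (subject, relation, object) edges from both ontologies.
    boost, penalty and cap are hypothetical constants.
    """
    adjusted = dict(mappings)
    relations = {r for _, r, _ in triples}
    # Every ordered pair of distinct mappings (A' eq A, B' eq B).
    for (a2, a), (b2, b) in permutations(mappings, 2):
        for r in relations:
            left = (a, r, b) in triples     # A r B
            right = (a2, r, b2) in triples  # A' r B'
            if left and right:
                factor = boost    # parallel structure: both edges present
            elif left or right:
                factor = penalty  # only one side present
            else:
                continue
            for key in ((a2, a), (b2, b)):
                adjusted[key] = min(adjusted[key] * factor, cap)
    return adjusted


mappings = {("A'", "A"): 0.5, ("B'", "B"): 0.5, ("C'", "C"): 0.5}
triples = {("A", "is_a", "B"), ("A'", "is_a", "B'"), ("B", "is_a", "C")}
out = reinforce(mappings, triples)
# A' eq A is boosted; B' eq B is boosted then penalized; C' eq C is penalized.
```

Looping over every ordered pair of mappings is quadratic. For large inputs, restricting the pairs to mappings whose classes already share an edge would be the obvious speed-up.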

matentzn · 2022-06-30T12:27:37Z

Does this make sense to you, @balhoff? It does to me!

cmungall · 2023-06-12T19:20:21Z

I'm also playing with some code in Python that uses NetworkX to break cliques.

The algorithm is:

PartitionCliques(G):
  for clique in cliques(G):
    if size(clique) > N:
      SG = subgraph(clique, G)
      E = sort(SG.edges, key=confidence)   # ascending: least confident first
      while |E|:
        e = pop_first(E)                    # remove the least confident edge
        SG.remove(e)
        subcliques = cliques(SG)
        if len(subcliques) > 1:
          for c in subcliques:
            PartitionCliques(subgraph(c, SG))
          break
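Here is a runnable version of the pseudocode above. It uses only the Python standard library, so it runs without NetworkX; NetworkX's `connected_components` and `Graph.subgraph` would do the same jobs. The sketch makes these assumptions:

- A "clique" is a connected component of the mapping graph.
- Edges are a dict of `(a, b) -> confidence`.
- `N` becomes the hypothetical `max_size` parameter.

Returning the removed edges is an addition, so that broken cliques can be reported.

```python
from collections import defaultdict


def components(nodes, edges):
    """Connected components of an undirected graph, via union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)
    groups = defaultdict(set)
    for n in parent:
        groups[find(n)].add(n)
    return list(groups.values())


def partition_cliques(nodes, edges, max_size):
    """Remove least-confident edges from an oversized clique until it splits.

    nodes: iterable of IDs; edges: dict {(a, b): confidence}.
    Returns (final cliques, removed edges).
    """
    final, removed = [], []
    for clique in components(nodes, edges):
        sub = {e: c for e, c in edges.items() if e[0] in clique}
        if len(clique) <= max(max_size, 1):
            final.append(clique)
            continue
        # Sorted high-to-low, so pop() yields the least confident edge.
        queue = sorted(sub, key=sub.get, reverse=True)
        while queue:
            edge = queue.pop()
            del sub[edge]
            removed.append(edge)
            parts = components(clique, sub)
            if len(parts) > 1:
                for part in parts:
                    part_edges = {e: c for e, c in sub.items() if e[0] in part}
                    f, r = partition_cliques(part, part_edges, max_size)
                    final.extend(f)
                    removed.extend(r)
                break
    return final, removed


edges = {("a", "b"): 0.9, ("b", "c"): 0.3, ("c", "d"): 0.8,
         ("d", "e"): 0.95, ("a", "c"): 0.4}
final, removed = partition_cliques(["a", "b", "c", "d", "e"], edges, max_size=3)
print(sorted(sorted(c) for c in final))  # [['a', 'b'], ['c', 'd', 'e']]
print(removed)  # [('b', 'c'), ('a', 'c')]
```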

There are possibly more efficient ways. As written, the algorithm works for both bidirectional and unidirectional graphs. For boomer we would use digraphs and assert edges in both directions for equivalence. The existing boomer clique extraction should just swap in here.
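The digraph setup might look like the following sketch. It makes these assumptions:

- Mappings arrive as `(subject, predicate, object, confidence)` tuples.
- The predicate labels `"equivalent"` and `"subclass"` are hypothetical.
- A clique in a digraph is a weakly connected component, so edge direction is ignored when grouping.

NetworkX's `DiGraph` with `weakly_connected_components` would be the library equivalent.

```python
from collections import defaultdict


def to_digraph(mappings):
    """Directed, confidence-weighted edges from typed mappings.

    mappings: list of (subject, predicate, object, confidence). The predicate
    labels "equivalent" and "subclass" are hypothetical. Equivalence is
    asserted in both directions; anything else in one direction only.
    """
    edges = {}
    for s, pred, o, conf in mappings:
        edges[(s, o)] = conf
        if pred == "equivalent":
            edges[(o, s)] = conf
    return edges


def weak_components(edges):
    """Cliques as weakly connected components (edge direction ignored)."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, result = set(), []
    for start in list(neighbours):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node not in comp:
                comp.add(node)
                stack.extend(neighbours[node] - comp)
        seen |= comp
        result.append(comp)
    return result


edges = to_digraph([("A", "equivalent", "A'", 0.9),
                    ("B", "subclass", "A", 0.6),
                    ("C", "equivalent", "C'", 0.7)])
print(sorted(sorted(c) for c in weak_components(edges)))
# [['A', "A'", 'B'], ['C', "C'"]]
```

In this representation an equivalence is stored as two directed edges with the same confidence. As a result, an edge-removal loop only cuts an equivalence once both directions have been removed.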

In theory this could quite brutally remove a lot of informative edges, but that is worth it to keep boomer from running forever. So long as broken cliques are reported, a curator can handle them accordingly. We could also just reduce the overall confidence in any broken clique. Even a crude measure, like reducing confidence by 50% in any clique that was broken out of a larger one, should be sufficient.
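The reporting and the crude 50% reduction could be combined in one post-processing step. This is a minimal sketch. It assumes the cliques before and after breaking are available as node sets and that surviving mappings are a dict of `(a, b) -> confidence`. `penalize_broken` and `factor` are hypothetical names.

```python
def penalize_broken(original_cliques, final_cliques, confidences, factor=0.5):
    """Report cliques broken out of larger ones and scale their confidences.

    original_cliques / final_cliques: lists of node sets before and after
    breaking; every final clique is a subset of one original clique.
    confidences: dict {(a, b): confidence} for the mappings that survived.
    factor=0.5 is the crude 50% reduction; the function name is hypothetical.
    """
    origin = {n: frozenset(c) for c in original_cliques for n in c}
    broken = [c for c in final_cliques if len(origin[next(iter(c))]) > len(c)]
    broken_nodes = set().union(*broken)
    adjusted = {
        e: p * factor if e[0] in broken_nodes else p
        for e, p in confidences.items()
    }
    # One report row per broken clique: (its members, the clique it came from).
    report = [(sorted(c), sorted(origin[next(iter(c))])) for c in broken]
    return adjusted, report


original = [{"a", "b", "c", "d", "e"}, {"x", "y"}]
final = [{"a", "b"}, {"c", "d", "e"}, {"x", "y"}]
conf = {("a", "b"): 0.9, ("c", "d"): 0.8, ("d", "e"): 0.6, ("x", "y"): 0.7}
adjusted, report = penalize_broken(original, final, conf)
print(adjusted[("a", "b")], adjusted[("x", "y")])  # 0.45 0.7
```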

matentzn mentioned this issue May 25, 2022: Experiment with Mondo+OMIM+DO+ORDO monarch-initiative/monarch-mapping-commons#8 (Open)

cmungall mentioned this issue Jun 12, 2023: Add additional diagnostics to figure out points where boomer doesn't complete #361 (Open)
