Design of subgraph_sum and subtree_sum leads to very suboptimal performance #145

ilumsden · 2024-07-28T04:16:54Z

Rule number 1 of any dataframe library is "don't perform operations by iterating over rows." However, that is exactly what subgraph_sum and subtree_sum do. We need to refactor them to use a better mechanism (e.g., DataFrame.apply).

To give a sense of the performance impact: anecdotally, subgraph_sum is 3-4x slower than the query language, even though the query language is solving a version of subgraph isomorphism, an NP-hard problem.
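
To make the row-iteration concern concrete, here is a minimal sketch on a toy tree. It is not the library's actual code: the `parent` and `depth` columns, the data, and both function names are hypothetical, and it assumes every node has a single parent (a tree, not a general graph). The first function does Python-level work for every row; the second does one grouped add per tree depth.

```python
import pandas as pd

# Hypothetical toy data: one row per tree node, with a "parent" column
# (-1 marks the root), a precomputed "depth", and one metric column.
df = pd.DataFrame(
    {
        "parent": [-1, 0, 0, 1, 1],
        "depth": [0, 1, 1, 2, 2],
        "time": [1.0, 2.0, 3.0, 4.0, 5.0],
    },
    index=pd.Index([0, 1, 2, 3, 4], name="node"),
)


def subtree_sum_rowwise(df, metric):
    """Row-by-row: for each row, walk up the ancestors and add its value."""
    totals = {node: 0.0 for node in df.index}
    for node, row in df.iterrows():
        current = node
        while current != -1:
            totals[current] += row[metric]
            current = df.at[current, "parent"]
    return pd.Series(totals, name=metric).rename_axis("node")


def subtree_sum_levelwise(df, metric):
    """Level-by-level: one grouped, vectorized add per tree depth."""
    totals = df[metric].copy()
    for depth in range(int(df["depth"].max()), 0, -1):
        level = df.index[df["depth"] == depth]
        # Totals at this depth are final; sum them per parent in one call.
        per_parent = totals.loc[level].groupby(df.loc[level, "parent"]).sum()
        # Add each parent's children total to the parent (aligned by label).
        totals.loc[per_parent.index] += per_parent
    return totals


rowwise = subtree_sum_rowwise(df, "time")
levelwise = subtree_sum_levelwise(df, "time")
print(levelwise.tolist())
```

Both functions return the same inclusive sums here; the difference is that the second loops over tree depths rather than rows. Note that DataFrame.apply with axis=1 still calls a Python function once per row, so a grouped or otherwise vectorized formulation like this may be needed to see a real speedup.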
