Biclique detection is a well-known problem in graph theory and data mining, with numerous real-world applications across different domain. Take for example text networks depicting the relationship between words and documents. Bicliques reveal documents that share common words, which can later be utilized for automated topic classification and tagging.
Given a bipartite graph and its corresponding partition into two disjoint sets of vertices, a biclique is a complete bipartite subgraph such that every vertex of the first partition is connected to every vertex of the second partition. Mathematically, the notion of biclique is defined as follows. Let G = (U ∪ V, E) be a bipartite graph, where U and V are two disjoint sets of vertices, and E is an edge set such that ∀(i,j) ∈ E, i ∈ U, j ∈ V . A biclique within G is a couple (set pair) (I,J) such that I ⊆ U, J ⊆ V and ∀i ∈ I, j ∈ J, (i,j) ∈ E.
The computational complexity of finding the maximum biclique depends on the exact objective function used. In this benchmark, we focus on the maximum edge biclique, i.e., |I| × |J|. Finding a maximum edge biclique was proved to be NP-complete [3].
dataset info | nodes | edges | density | maximum biclique |
---|---|---|---|---|
Snap database | ||||
com-LiveJournal | 2,325,297 x 3,996,402 | 34,681,189 | 0.00037% | 33,984 = 192 x 177 |
com-Youtube | 374,785 x 1,134,140 | 2,987,624 | 0.00070% | 8,008 = 2 x 4,004 |
wiki-topcats | 1,791,489 x 1,791,489 | 28,511,807 | 0.00089% | 126,824 = 31,706 x 4 |
com-Amazon | 265,933 x 264,147 | 925,872 | 0.0013% | 188 = 94 x 2 |
com-Orkut | 2,723,360 x 3,072,440 | 117,185,083 | 0.0014% | 16,726 = 2 x 8,363 |
com-DBLP | 189,114 x 304,374 | 1,049,866 | 0.0018% | 3,249 = 57 x 57 |
email-Eu-core | 868 x 991 | 25,571 | 2.97% | 390 = 5 x 78 |
Konect database | ||||
movieLens 10M | 69,878 x 10,677 | 10,000,054 | 1.34% | 67,736 = 8,467 x 8 |
movieLens 1M | 6,040 x 3,706 | 1,000,209 | 4.47% | 7,002 = 778 x 9 |
movieLens 100K | 943 x 1,682 | 100,000 | 6.30% | 1,267 = 181 x 7 |
dataset info | nodes | edges | density | maximum biclique |
---|---|---|---|---|
random.m_10000_n_1000_p_0.02 | 10,000 x 1,000 | 200,000 | 2% | 32 = 16 x 2 |
random.m_10000_n_5000_p_0.02 | 10,000 x 5,000 | 1,000,000 | 2% | 36 = 18 x 2 |
random.m_10000_n_10000_p_0.02 | 10,000 x 10,000 | 2,000,000 | 2% | 38 = 2 x 19 |
random.m_10000_n_1000_p_0.045_cv_0.3 | 9,992 x 1,000 | 450,042 | 4.5% | 96 = 48 x 2 |
random.m_10000_n_1000_p_0.045_cv_0.8 | 8,928 x 1,000 | 468,303 | 5.25% | 128 = 64 x 2 |
random.m_10000_n_1000_p_0.045_cv_1.3 | 7,781 x 1,000 | 522,023 | 6.7% | 170 = 85 x 2 |
random.m_300_n_300_p_0.1 | 300 x 300 | 9,000 | 10% | 26 = 13 x 2 |
random.m_300_n_300_p_0.2 | 300 x 300 | 18,000 | 20% | 58 = 29 x 2 |
random.m_300_n_300_p_0.3 | 300 x 300 | 27,000 | 30% | 110 = 2 x 55 |
random.m_10000_n_100_p_0.03 | 9,513 x 100 | 30,000 | 3.15% | 44 = 22 x 2 |
random.m_10000_n_500_p_0.03 | 10,000 x 500 | 150,000 | 3% | 50 = 25 x 2 |
random.m_10000_n_1000_p_0.03 | 10,000 x 1,000 | 300,000 | 3% | 52 = 26 x 2 |
random.m_5000_n_500_p_0.03 | 5,000 x 500 | 75,000 | 3% | 32 = 16 x 2 |
random.m_25000_n_500_p_0.03 | 25,000 x 500 | 375,000 | 3% | 90 = 45 x 2 |
random.m_50000_n_500_p_0.03 | 50,000 x 500 | 750,000 | 3% | 150 = 75 x 2 |
random.m_100_n_100_r_0.2_p_0.2 | 100 x 100 | 2,381 | 23.81% | 400 = 20 x 20 |
random.m_100_n_100_r_0.2_p_0.5 | 100 x 100 | 5,126 | 51.26% | 400 = 20 x 20 |
random.m_100_n_100_r_0.2_p_0.8 | 100 x 100 | 8,092 | 80.92% | 456 = 24 x 19 |
random.m_100_n_100_r_0.5_p_0.2 | 100 x 100 | 4,008 | 40.08% | 2,500 = 50 x 50 |
random.m_100_n_100_r_0.5_p_0.5 | 2,500 | 100 x 100 | 6,208 | 62.08% |
random.m_100_n_100_r_0.5_p_0.8 | 2,500 | 100 x 100 | 8,456 | 84.56% |
random.m_100_n_100_r_0.8_p_0.2 | 6,400 | 100 x 100 | 7,127 | 71.27% |
random.m_100_n_100_r_0.8_p_0.5 | 6,400 | 100 x 100 | 8,236 | 82.36% |
random.m_100_n_100_r_0.8_p_0.8 | 6,400 | 100 x 100 | 9,277 | 92.77% |
When publishing results obtained using this benchmark, the original dataset authors should be cited. Please use the following BibTeX citation for citing this benchmark:
@misc{MaximumBicliqueBenchmark,
author = {Eran Shaham},
title = {maximum biclique benchmark},
howpublished = {\url{https://github.com/shahamer/maximum-biclique-benchmark}},
month = dec,
year = 2019
}
- Eran Shaham, Honghai Yu, and Xiao-Li Li. On finding the maximum edge biclique in a bipartite graph: a subspace clustering approach. In SIAM International Conference on Data Mining, 2016, pp. 315–323.
- Yun Zhang, Charles A Phillips, Gary L. Rogers, Erich J. Baker, Elissa J. Chesler, and Michael A. Langston. On finding bicliques in bipartite graphs: a novel algorithm and its application to the integration of diverse biological data types. BMC Bioinformatics, 15, 2014, pp. 110–127.
- Peeters, Rene. The maximum edge biclique problem is NP-complete. Discrete Applied Mathematics, 131(3), 2003, pp. 651–654.
Please email any feedback or suggestions