Today increasing amount of data demands appropriate algorithm to discover the unknown. Mining of frequent itemsets over a data stream still witholds some challenges. As a remedy, frequent closed itemsets (FCIs) were studied, yet they rise their own issues, especially with sparse datasets. We introduce ciclad
, an efficient stream CI miner that leverages in-depth insights into the mathematics of FCI evolution into an intersection-based approach. It combines efficient storage and access techniques into a sliding-window method of a surprising simplicity. Experimental results indicate that ciclad~largely outperforms its main competitors by a large margin (up to 70 times faster) on sparse dataset whereas on dense ones its performance remains comparable.
Leaving data behind because they are not frequent is not reasonable. Not frequent may be rare and rare itemsets could be useful to take appropriate decisions. Furthermore, some algorithms compute using min_supp to optimize the production of results. Some may be using approximate computation. These techniques are producing near acceptable results in some case, but why approximate when you can get the real picture in all cases. Ciclad is by far the fastest algorithm for years. Not only fast but also great on RAM usage. Stay away from RAMovore algorithms. This makes ciclad a superb solution for embedded devices, FPGA or any device.
Keywords : frequent pattern mining, closed itemsets, data streams, sliding window, inverted lists
- Tomas Martin 2015-2020
- Guy Francoeur 2015-2020
- Petko Valtchev ©️
NOTA: /ciclad_nogen is the folder with the implementation used in the KDD'20 paper.
NOTA: /alternatives contains sub-folders with implementations of Moment, NewMoment, CloStream and CFI-Stream. All were implemented and tested for Windows.
NOTA: The original Moment(FP) implementation used in our KDD-20 paper is not publicly available, you should contact its authors.
Contributors :
- Mickael Wajnberg 2018
- see github contributors
Accepted paper at KDD 2020 :
Reference :
- A framework for incremental generation of closed itemsets
- Generating frequent itemsets incrementally: two novel approaches based on Galois lattice theory
ciclad ©️ 2015-2020