Extracting MFI (Maximum Frequent Itemset)
is one of the problem in computer science. It is famously known as Market Basket Analysis
. In this project I am using a Data Structure known as CLM (Completely Linked Matrix)
to find MFI more efficiently than ever requiring less time and space. The current project works for extracting 3-FI (FI with 3 itemset length)
and the MFI from them. Further work is being done to enhance the scope to finding FI of greater itemset length.
To be able to have this project work on your local machine, all you need is a C++ compiler such as Clang which can be downloaded through MinGW; a cross compilation toolchain. If you are having difficulties downloading MinGW, it can also be downloaded through MSYS2. After installing a compiler you can download a Code Editor like VSC or an IDE like Clion for viewing the code and editing it.
You can simply add this project to your system using Git Clone or just downloading this project as a zip file and then extracting it.
Following are the few arguments to be provided when the program is run:
Transactions
: The transactions using which MFI are foundMinSup
: The minimum support of the transactions for finding MFI
A simple example of a run having:
Transactions
: BCDE ABDE BCDE ABCDE DE CMinSup
: 2
Input:
clm-miner -transactions "BCDE ABDE BCDE ABCDE DE C" -MinSup 2
Output:
Graph:
Node : Weight
A : 2
B : 4
C : 4
D : 5
E : 5
From : To : Additional information : weight
A : B : D E : 1
A : D : E : 2
A : E : : 2
A : B : C D E : 1
A : C : D E : 1
B : C : D E : 3
B : D : E : 4
B : E : : 4
C : D : E : 3
C : E : : 3
D : E : : 5
CLM:
| A | A B C D E | B | A B C D E | C | A B C D E | D | A B C D E | E | A B C D E
A | 2 | 0 0 0 0 0 | 2 | 0 0 1 2 2 | 1 | 0 0 0 1 1 | 2 | 0 0 0 0 2 | 2 | 0 0 0 0 0
B | 0 | 0 0 0 0 0 | 4 | 0 0 0 0 0 | 3 | 0 0 0 3 3 | 4 | 0 0 0 0 4 | 4 | 0 0 0 0 0
C | 0 | 0 0 0 0 0 | 0 | 0 0 0 0 0 | 4 | 0 0 0 0 0 | 3 | 0 0 0 0 3 | 3 | 0 0 0 0 0
D | 0 | 0 0 0 0 0 | 0 | 0 0 0 0 0 | 0 | 0 0 0 0 0 | 5 | 0 0 0 0 0 | 5 | 0 0 0 0 0
E | 0 | 0 0 0 0 0 | 0 | 0 0 0 0 0 | 0 | 0 0 0 0 0 | 0 | 0 0 0 0 0 | 5 | 0 0 0 0 0
FIs : (minSup : 2)
A AB ABD ABE AD ADE AE B BC BCD BCE BD BDE BE C CD CDE CE D DE E
MFI : ABD ABE ADE BCD BCE BDE CDE
- Ubaid Ur Rehman (Graph, CLM, CLM-Miner, UI)
- Abdur Rahman Goraya (Alpha Tester)
- Course Instructor: Dr. Zahid Halim
- Course Lab Instructor: Sir Usama Arshad
This project was based on the following paper:
On the Efficient Representation of Datasets as Graphs to Mine Maximal Frequent Itemsets
by
- Zahid Halim, Senior Member, IEEE
- Omer Ali
- Muhammad Ghufran Khan