Skip to content

CLM is a new data structure that uses matrices in which data from graph is stored and CLM-Miner is the algorithm that is used to extract MFI from the CLM.

License

Notifications You must be signed in to change notification settings

Xiron7077/Extracting-MFI-Using-CLM

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Extracting MFI Using CLM

Extracting MFI (Maximum Frequent Itemset) is one of the problem in computer science. It is famously known as Market Basket Analysis. In this project I am using a Data Structure known as CLM (Completely Linked Matrix) to find MFI more efficiently than ever requiring less time and space. The current project works for extracting 3-FI (FI with 3 itemset length) and the MFI from them. Further work is being done to enhance the scope to finding FI of greater itemset length.

Getting Started

To be able to have this project work on your local machine, all you need is a C++ compiler such as Clang which can be downloaded through MinGW; a cross compilation toolchain. If you are having difficulties downloading MinGW, it can also be downloaded through MSYS2. After installing a compiler you can download a Code Editor like VSC or an IDE like Clion for viewing the code and editing it.

Installing

You can simply add this project to your system using Git Clone or just downloading this project as a zip file and then extracting it.

Running

Following are the few arguments to be provided when the program is run:

  • Transactions : The transactions using which MFI are found
  • MinSup : The minimum support of the transactions for finding MFI

Example Run

A simple example of a run having:

  • Transactions : BCDE ABDE BCDE ABCDE DE C
  • MinSup : 2

Input:

clm-miner -transactions "BCDE ABDE BCDE ABCDE DE C" -MinSup 2

Output:

Graph:

Node : Weight
A : 2
B : 4
C : 4
D : 5
E : 5

From : To : Additional information : weight
A : B : D E  : 1
A : D : E  : 2
A : E :  : 2
A : B : C D E  : 1
A : C : D E  : 1
B : C : D E  : 3
B : D : E  : 4
B : E :  : 4
C : D : E  : 3
C : E :  : 3
D : E :  : 5

CLM:

  | A | A B C D E | B | A B C D E | C | A B C D E | D | A B C D E | E | A B C D E
A | 2 | 0 0 0 0 0 | 2 | 0 0 1 2 2 | 1 | 0 0 0 1 1 | 2 | 0 0 0 0 2 | 2 | 0 0 0 0 0
B | 0 | 0 0 0 0 0 | 4 | 0 0 0 0 0 | 3 | 0 0 0 3 3 | 4 | 0 0 0 0 4 | 4 | 0 0 0 0 0
C | 0 | 0 0 0 0 0 | 0 | 0 0 0 0 0 | 4 | 0 0 0 0 0 | 3 | 0 0 0 0 3 | 3 | 0 0 0 0 0
D | 0 | 0 0 0 0 0 | 0 | 0 0 0 0 0 | 0 | 0 0 0 0 0 | 5 | 0 0 0 0 0 | 5 | 0 0 0 0 0
E | 0 | 0 0 0 0 0 | 0 | 0 0 0 0 0 | 0 | 0 0 0 0 0 | 0 | 0 0 0 0 0 | 5 | 0 0 0 0 0


FIs : (minSup : 2)
A AB ABD ABE AD ADE AE B BC BCD BCE BD BDE BE C CD CDE CE D DE E

MFI : ABD ABE ADE BCD BCE BDE CDE

Built With

  • MinGW - The compiler used
  • CLion - The IDE used
  • Qt 6 - The GUI software

Contributors

- Ubaid Ur Rehman (Graph, CLM, CLM-Miner, UI)
- Abdur Rahman Goraya (Alpha Tester)
- Course Instructor: Dr. Zahid Halim 
- Course Lab Instructor: Sir Usama Arshad

Based On

This project was based on the following paper:

On the Efficient Representation of Datasets as Graphs to Mine Maximal Frequent Itemsets

by

  • Zahid Halim, Senior Member, IEEE
  • Omer Ali
  • Muhammad Ghufran Khan