File Exchange


FP-Growth Association Rule Mining

version 1.0.0.0 (8.57 KB) by Yarpiz
A MATLAB implementation of FP-Growth for Association Rule Mining in Transactional Datasets

4 Downloads

Updated 04 Sep 2015


For more information, see the following link:
http://yarpiz.com/98/ypml116-fp-growth

Comments and Ratings (9)

Hi
How long does it take to enumerate the rules from a 100-by-100 matrix?
Thanks

If you want to test it, please just use these two transactions as the input dataset:

1 4 9 12 13 16 19 23 27 30 31 35
1 4 7 10 13 17 19 23 26 29 31 35
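A quick count suggests why even this tiny input can be a demanding test. The sketch below assumes that MST is a relative support threshold in [0, 1] and stores the transactions as a cell array of numeric row vectors; that storage format is an assumption for illustration, not necessarily the input format FPGrowth.m expects. It counts how many frequent itemsets exist at two thresholds using inclusion-exclusion over the subsets of each transaction.

```matlab
% The two test transactions above, stored as numeric row vectors
% (assumed format, for illustration only).
T = {[1 4 9 12 13 16 19 23 27 30 31 35], ...
     [1 4 7 10 13 17 19 23 26 29 31 35]};

% Items in both transactions have support 1; all others have support 0.5.
common = intersect(T{1}, T{2});   % shared items
n1 = numel(T{1});                 % size of transaction 1
n2 = numel(T{2});                 % size of transaction 2
nc = numel(common);               % number of shared items

% MST = 1: every non-empty subset of the shared items is frequent.
numFrequentAt1 = 2^nc - 1;

% MST = 0.5: every non-empty subset of either transaction is frequent.
% Subsets of the shared items belong to both, so count them only once.
numFrequentAtHalf = (2^n1 - 1) + (2^n2 - 1) - (2^nc - 1);

fprintf('MST = 1:   %d frequent itemsets\n', numFrequentAt1);
fprintf('MST = 0.5: %d frequent itemsets\n', numFrequentAtHalf);
```

With 7 shared items and 12 items per transaction, this prints 127 itemsets for MST = 1 and 8063 for MST = 0.5. Halving the threshold multiplies the output by more than 60, the kind of combinatorial growth that likely makes low support values expensive for any frequent-itemset miner.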

Hi
I have the same issue as @Shokry, and I cannot locate your changes at line 26 of the code:

"After filling the Count vector at the beginning of FPGrowth.m (line 26), I added the following:
support = Count/length(T)
Items(support < MST) = [];
Count(support < MST) = [];"

Thanks for your help

Shokry

The algorithm is extremely slow on relatively large datasets (it runs for days and does not finish). Specifically, I compared it with the “apriori” algorithm for finding the frequent itemsets in the standard “Mushroom” dataset from http://fimi.ua.ac.be/data/ (8124 transactions and 119 items):

With high support values (at least 4000/8124), the algorithm runs quickly (a matter of a few minutes) and finds all the frequent itemsets that apriori finds.

However, with lower support values (e.g. 3500/8124), the algorithm never finishes; it keeps running for days. What puzzles me is that apriori still finishes in a few minutes at the same support value, contrary to the common claim in the literature that “FP-Growth is faster and more efficient than apriori”. Can anyone help?

Note: I am using MATLAB 2015 on a machine with a 2.3 GHz Core i5 CPU and 8 GB of RAM.

Josh

Thanks for the helpful implementation!

I found that the efficiency can be greatly improved (especially for higher numbers of items in each transaction) by removing items that don't meet the minimum support threshold before constructing the FP-tree. After filling the Count vector at the beginning of FPGrowth.m (line 26), I added the following:

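support = Count/length(T);   % relative support of each item
isRare = support < MST;      % items below the minimum support threshold
Items(isRare) = [];          % drop them before building the FP-tree
Count(isRare) = [];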

Depending on the specific data you're applying this to, this can increase the speed by multiple orders of magnitude while discovering the same rules.

Josh
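The pruning above shortens the item list. A standalone sketch of the fuller preprocessing step, as FP-Growth is usually described, also strips the pruned items out of every transaction and orders the remaining items by descending frequency before they are inserted into the tree. This is not taken from FPGrowth.m, which may already do some of these steps; the toy transactions, the cell-array input format, and the requirement that item IDs be positive integers are all assumptions made for illustration. MST is the same relative threshold name used in the snippet above.

```matlab
% Toy input (assumed format): each transaction is a row vector of
% positive integer item IDs; MST is a relative support threshold.
T = {[1 2 3], [1 3], [2 3 4], [1 3 4]};
MST = 0.6;

% Count how many transactions contain each distinct item.
Items = unique([T{:}]);
Count = zeros(size(Items));
for i = 1:numel(T)
    Count = Count + ismember(Items, T{i});
end

% Drop items whose relative support is below MST.
support = Count / numel(T);
isRare = support < MST;
Items(isRare) = [];
Count(isRare) = [];

% Rank surviving items by descending count (1 = most frequent).
% Indexing by item ID requires positive integer IDs.
[~, order] = sort(Count, 'descend');
itemRank = zeros(1, max([T{:}]));
itemRank(Items(order)) = 1:numel(Items);

% Remove pruned items from each transaction and reorder the rest by rank.
for i = 1:numel(T)
    t = T{i}(ismember(T{i}, Items));
    [~, idx] = sort(itemRank(t));
    T{i} = t(idx);
end
```

With MST = 0.6, only items 1 (support 0.75) and 3 (support 1.0) survive, and every transaction is rewritten with item 3 first, e.g. [1 2 3] becomes [3 1]. Shorter, consistently ordered transactions share longer prefixes, which keeps the FP-tree smaller.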

JennyH

The issue has been resolved. Thanks very much!

JennyH

There are some errors when I run it in R2014b: Undefined variable "data" or function "data.T".

MATLAB Release Compatibility
Created with R2012b
Compatible with any release
Platform Compatibility
Windows, macOS, Linux