Wednesday, March 20, 2013

EMFP algorithm: early stop implemented

My algorithm now stops early if FP Growth finds the same set of columns in two sequential runs.

This likely means that the attributes do not form statistically significant clusters. Further attribute sets are also unlikely to be statistically significant, because FP Growth finds the most common column sets first.
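
The early-stop check can be sketched roughly as follows. This is a minimal illustration and not the actual EMFP code: `run_until_stable`, `fake_fp_growth`, and the column indices are hypothetical names and made-up data, and counting the difference between two runs as the size of the symmetric difference of their column sets is an assumption.

```python
from typing import Callable, FrozenSet, Iterable, Tuple


def run_until_stable(
    find_columns: Callable[[int], Iterable[int]],
    max_iters: int = 10,
) -> Tuple[int, FrozenSet[int]]:
    """Call find_columns(iteration) until two sequential runs agree.

    Returns the iteration at which the loop stopped and the last column set.
    """
    previous = None
    current: FrozenSet[int] = frozenset()
    for iteration in range(1, max_iters + 1):
        current = frozenset(find_columns(iteration))
        if previous is not None:
            # Assumed definition: columns present in exactly one of the two runs.
            num_diff = len(current ^ previous)
            if num_diff == 0:
                # Same columns as the previous run, so stop early.
                return iteration, current
        previous = current
    return max_iters, current


# Made-up stand-in for an FP Growth run: the column set changes once,
# then repeats on the third call.
_FAKE_RUNS = {1: range(0, 9), 2: range(3, 12)}


def fake_fp_growth(iteration: int) -> Iterable[int]:
    return _FAKE_RUNS.get(iteration, range(3, 12))


stopped_at, columns = run_until_stable(fake_fp_growth)
```

With this stub the loop stops on the third pass, because the second and third runs return identical column sets.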

It is currently running with a cutoff of a 95% binomial confidence interval (CI); I can lower the CI percentage, as Dr. Menzies suggested.
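
To show what lowering the CI percentage does, here is a small sketch of a binomial confidence interval. It uses the Wilson score interval, which is an assumption on my part: the post does not say which binomial interval the algorithm uses or how the cutoff is applied to a cluster. `wilson_interval` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist
from typing import Tuple


def wilson_interval(
    successes: int, trials: int, confidence: float = 0.95
) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (assumed interval type)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")

    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    p_hat = successes / trials
    denom = 1.0 + z * z / trials
    center = (p_hat + z * z / (2.0 * trials)) / denom
    half_width = (
        z * sqrt(p_hat * (1.0 - p_hat) / trials + z * z / (4.0 * trials * trials))
    ) / denom
    return center - half_width, center + half_width


# Same observed proportion at two confidence levels.
low95, high95 = wilson_interval(50, 100, confidence=0.95)
low80, high80 = wilson_interval(50, 100, confidence=0.80)
```

A lower confidence level gives a narrower interval for the same counts, so under a cutoff of this kind more attribute sets would be expected to pass.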

Below is the latest printout demonstrating the early exit.


 time ./trial.sh main synthDb_M100_N500_m5_n10_p0.050000_num1-1.arff

***Initial Values***
confidence levels:      0.99 0.95 0.90 0.85 0.80
min perc levels (rows): 0.05 0.04 0.03 0.02 0.01
min cols:               2
FPGrowth Step level:    .01
FPGrowth Upper limit:   0.5
---------------------------------------
number of times in loop=  1
Confidence Perc =  0.99
Min Rules Perc= 0.05
dvAvg dims=500
 dbAvg rows=100
 running FPGrowth
500,Number of Cols from FPGrowth=  9
in idCentroids file  3 cluster1 cluster2 cluster3 numArffs=  0
End Clause
#Cols
#Cols
#Cols
#Cols
#Cols
#Cols
numArffs=  1
---------------------------------------
number of times in loop=  2
Confidence Perc =  0.99
Min Rules Perc= 0.05
dvAvg dims=500
 dbAvg rows=100
 running FPGrowth
500,Number of Cols from FPGrowth=  9
NumDiff  6
in idCentroids file  3 cluster1 cluster3 cluster4 numArffs=  1
End Clause
#Cols
#Cols
#Cols
#Cols
#Cols
#Cols
numArffs=  2
---------------------------------------
number of times in loop=  3
Confidence Perc =  0.99
Min Rules Perc= 0.05
dvAvg dims=500
 dbAvg rows=100
 running FPGrowth
500,Number of Cols from FPGrowth=  9
NumDiff  0
FP Same Cols-break

real 0m19.057s
user 0m15.725s
sys 0m1.425s
