Wednesday, March 20, 2013

EMFP Alg, early stop implemented

My algorithm now stops early if FP Growth finds the same set of columns in two sequential runs.

This likely means that the remaining attributes do not form statistically significant clusters: because FP Growth finds the most common column sets first, any attribute sets it has not yet returned are unlikely to be statistically significant.
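The early-exit test is just a set comparison between consecutive passes. A minimal Python sketch (run_fpgrowth is a hypothetical stand-in for the actual FP Growth call in trial.sh):

    # Sketch of the early-exit loop. run_fpgrowth() is a hypothetical helper
    # returning the frequent column sets found in one pass.
    def mine_until_stable(run_fpgrowth, max_loops=10):
        prev = None
        for loop in range(1, max_loops + 1):
            cols = frozenset(run_fpgrowth())
            if prev is not None:
                num_diff = len(cols ^ prev)   # the "NumDiff" in the printout below
                if num_diff == 0:             # same column sets twice in a row
                    print("FP Same Cols-break")
                    break
            prev = cols
        return prev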

It is currently running with a 95% binomial CI as the cutoff; I can lower the CI percentage, as Dr. Menzies suggested.
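For reference, the cutoff idea with a normal-approximation binomial CI (a generic sketch, not the exact code; z = 1.96 gives 95%, and z = 1.645 would give the looser 90% cutoff):

    import math

    # Normal-approximation CI for a binomial proportion: p +/- z*sqrt(p(1-p)/n).
    def binomial_ci(successes, n, z=1.96):   # z=1.96 -> 95%, z=1.645 -> 90%
        p = successes / n
        half = z * math.sqrt(p * (1 - p) / n)
        return max(0.0, p - half), min(1.0, p + half)

    # Illustration: a column set covering 12 of 100 rows, tested against
    # the 5% "min perc" floor from the initial values below.
    lo, hi = binomial_ci(12, 100)
    keep = lo > 0.05   # significant only if the whole interval clears the floor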

Below is the latest printout demonstrating the early exit.


 time ./trial.sh main synthDb_M100_N500_m5_n10_p0.050000_num1-1.arff

***Initial Values***
confidence levels:      0.99 0.95 0.90 0.85 0.80
min perc levels (rows): 0.05 0.04 0.03 0.02 0.01
min cols:               2
FPGrowth Step level:    .01
FPGrowth Upper limit:   0.5
---------------------------------------
number of times in loop=  1
Confidence Perc =  0.99
Min Rules Perc= 0.05
dvAvg dims=500
 dbAvg rows=100
 running FPGrowth
500,Number of Cols from FPGrowth=  9
in idCentroids file  3 cluster1 cluster2 cluster3 numArffs=  0
End Clause
#Cols
#Cols
#Cols
#Cols
#Cols
#Cols
numArffs=  1
---------------------------------------
number of times in loop=  2
Confidence Perc =  0.99
Min Rules Perc= 0.05
dvAvg dims=500
 dbAvg rows=100
 running FPGrowth
500,Number of Cols from FPGrowth=  9
NumDiff  6
in idCentroids file  3 cluster1 cluster3 cluster4 numArffs=  1
End Clause
#Cols
#Cols
#Cols
#Cols
#Cols
#Cols
numArffs=  2
---------------------------------------
number of times in loop=  3
Confidence Perc =  0.99
Min Rules Perc= 0.05
dvAvg dims=500
 dbAvg rows=100
 running FPGrowth
500,Number of Cols from FPGrowth=  9
NumDiff  0
FP Same Cols-break

real 0m19.057s
user 0m15.725s
sys 0m1.425s

Playing with DIMACS feature models -- updated 4/10/2013

Running IBEA for 1 hour over the available DIMACS feature models
Parameters: Population = 300, constrained mutation at rate = 0.001, NO crossover.
Technique: Before evolution, TWO rounds of rule checking are made:
1) First round: all features standing by themselves in a line are fixed: some are selected (mandatory), and some are deselected ("prohibited"!!!)
2) Second round: all features that share a line with the "fixed features" from the first round become fixed as well.
Mutation is constrained so that it doesn't mess with the fixed features (see the sketch below).
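A rough Python sketch of the two rounds, assuming the DIMACS rules are already parsed into lists of signed ints, and reading round 2 as one step of unit-style propagation over clauses that touch round-1 fixes:

    # Round 1: a unit clause fixes its feature outright (positive literal =
    # mandatory/selected, negative literal = prohibited/deselected).
    # Round 2: if every other literal in a clause is falsified by round-1
    # fixes, the one remaining literal is forced, so that feature is fixed too.
    def fix_features(clauses):
        fixed = {}                                  # var -> True/False
        for c in clauses:                           # round 1
            if len(c) == 1:
                fixed[abs(c[0])] = c[0] > 0
        for c in clauses:                           # round 2
            unknown = [l for l in c if abs(l) not in fixed]
            others_false = all(fixed[abs(l)] != (l > 0)
                               for l in c if abs(l) in fixed)
            if others_false and len(unknown) == 1:
                fixed[abs(unknown[0])] = unknown[0] > 0
        return fixed

    # Constrained mutation then simply skips every variable in `fixed`.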


FM              Features    Rules  %correct  Skipped feat. (round 1)  Skipped feat. (total)  Skipped rules (total)
toybox               544     1020      100%          361 (66%)            363 (67%)            394 (39%)
axTLS                684     2155      100%          382 (56%)            384 (56%)            259 (12%)
ecos                1244     3146      100%           10  (1%)             19  (2%)             11  (0%)
freebsd             1396    62183      100%            3  (0%)              3  (0%)             20  (0%)
fiasco              1638     5228      100%          995 (61%)            995 (61%)            553 (11%)
uClinux             1850     2468      100%         1234 (67%)           1244 (67%)           1850 (75%)
busybox             6796    17836      100%         3947 (58%)           3949 (58%)           2644 (15%)
uClinuxconfig      11254    31637        4%         6025 (54%)           6027 (54%)           4641 (15%)
coreboot           12268    47091        1%         4592 (37%)           4672 (38%)           2060  (4%)
buildroot          14910    45603        0%         6755 (45%)           6759 (45%)           3534  (8%)
embtoolkit         23516   180511        0%         6370 (27%)           6619 (28%)            657  (0%)
freetz             31012   102705        0%        14444 (47%)          14493 (47%)           3911  (4%)
Linux 2.6.32       60072   268223        0%        32329 (54%)          32479 (54%)          18734  (7%)
Linux 2.6.33       62482   273799        0%        33597 (54%)          33766 (54%)          19394  (7%)
Linux 2.6.28.6      6888   343944        0%

1) Verify how many features are fixed in the first and second rounds of rule checking. Do we benefit from making further rounds? Done. No benefit expected from further rounds.
2) The first 7 feature models are "easy"... why? Do they have a lot of "fixed" features? Or are they just smaller? Done. ecos and freebsd have very few "fixed" features, yet they are solved easily... Size is a factor that is part of the larger notion of "hardness" or "complexity".
3) These 7 can be used in experiments to compare IBEA with others (NSGAII etc)... We already know others will suck. This could be our "scale-up" verification. No.
4) We could also compare this "rule detection" method with complete randomness (ICSE'13 style) but we've already learned not to rely on complete randomness. No.
5) We could try to optimize two objectives only, then plug the result into 5-objective optimization... This could save a lot of wasted "number crunching". We could run 2-objective and identify the rules that are most frequently violated, make recommendations, take feedback, fix a bunch of features... run again... Sound familiar?

Intrinsic Dimensions for Defects and Effort

Wednesday, March 13, 2013

Sticking points in TSE paper


1-      We’re actually expanding two papers: ICSE and CMSBSE.
2-      In ICSE we fix the number of fitness evaluations. In CMSBSE (and TSE) we fix the run time… A reviewer complained about this in CMSBSE…
3-      Statistical analysis needed in addition to effect size (see the sketch below).
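For item 3, the usual pairing is a non-parametric significance test plus an effect-size measure. A sketch with scipy's Mann-Whitney U and Vargha-Delaney A12 (names and thresholds are illustrative, not what the paper ships):

    from scipy.stats import mannwhitneyu

    # Vargha-Delaney A12: probability a value drawn from xs beats one from ys.
    def a12(xs, ys):
        gt = sum(1 for x in xs for y in ys if x > y)
        eq = sum(1 for x in xs for y in ys if x == y)
        return (gt + 0.5 * eq) / (len(xs) * len(ys))

    def compare(xs, ys, alpha=0.05):
        _, p = mannwhitneyu(xs, ys, alternative="two-sided")
        return p < alpha, a12(xs, ys)   # (statistically different?, effect size)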

Wednesday, March 6, 2013

bagging+FSS+dataset selection for defect prediction

http://unbox.org/things/var/zhimin/

SBSE MOO in Few Evals: GALE

GALE = Geometric Active Learning Evolution


Hypervolume versus IBD and IBS
  -  IBD = how good we are
  -  IBS = how spread out we are
  -  Both are based on the IBEA indicator (sketched below)
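As a reminder of the machinery, here is a sketch of the additive epsilon indicator that IBEA is typically run with (for minimization); the exact IBD/IBS formulas are our own summaries built on top of it:

    # Additive epsilon indicator I_eps+(A, B) for minimization: the smallest
    # shift by which front A must be translated to weakly dominate front B.
    def eps_plus(a, b):
        # shift needed for point a to weakly dominate point b
        return max(ai - bi for ai, bi in zip(a, b))

    def indicator(front_a, front_b):
        # best (smallest) shift over A for each b; then the worst case over B
        return max(min(eps_plus(a, b) for a in front_a) for b in front_b)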



More Results:
 - added some more flavors just so we have one of every cookie
Notes:
  - Left table shows the raw averages for everyone over 20 repeats
  - Right table uses the t-statistic to generate 99% confidence intervals
  - 7-3-2 on wins-losses-ties when considering only IBD (quality)
  - WIN or LOSS = no overlap of the confidence intervals; otherwise a tie (test sketched below)
  - RRSL in these tables is the algorithm now called GALE
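The win/loss/tie call, sketched (t = 2.861 is the two-sided 99% critical value at df = 19, i.e. 20 repeats):

    import math

    # 99% t-based confidence interval over n repeats (t=2.861 for df=19).
    def t_ci(samples, t=2.861):
        n = len(samples)
        mean = sum(samples) / n
        var = sum((x - mean) ** 2 for x in samples) / (n - 1)
        half = t * math.sqrt(var / n)
        return mean - half, mean + half

    def verdict(a, b):
        lo_a, hi_a = t_ci(a)
        lo_b, hi_b = t_ci(b)
        if hi_a < lo_b or hi_b < lo_a:              # no overlap
            return "WIN" if hi_a < lo_b else "LOSS"  # assuming smaller is better
        return "TIE"                                 # overlap = no decision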

Analysis Notes:
  - RRSL wins badly on NumEval
  - NSGA-II outperforms RRSL in runtime for small MOPs that are cheap to evaluate.  But when models become expensive (e.g. POM2), NSGA-II becomes slow and RRSL wins in runtime
  - RRSL is a very decision-sided routine.  For ZDT1 (30 decisions) and Osyczka2 (5 decisions), the runtime of RRSL is punished

Paper "3/4 Draft":
 - nothing in here yet: http://unbox.org/things/var/joe/active/
 - promised to be fully integrated into linux/latex by full-draft time
 - https://www.dropbox.com/s/y9ey0hr6agsolzt/solving_problems_in_few_evaluations.pdf?m