Wednesday, March 20, 2013

EMFP Alg, early stop implemented

My algorithm now stops early if FP-Growth finds the same set of columns in two sequential runs.

This suggests that the remaining attributes do not form statistically significant clusters: since FP-Growth finds the most common column sets first, there are likely no more attribute sets that will be statistically significant.

It currently uses a 95% binomial confidence interval as the cutoff; I can lower the confidence level, as Dr. Menzies suggested.

Below is the latest printout demonstrating the early exit.


 time ./trial.sh main synthDb_M100_N500_m5_n10_p0.050000_num1-1.arff

***Initial Values***
confidence levels:      0.99 0.95 0.90 0.85 0.80
min perc levels (rows): 0.05 0.04 0.03 0.02 0.01
min cols:               2
FPGrowth Step level:    .01
FPGrowth Upper limit:   0.5
---------------------------------------
number of times in loop=  1
Confidence Perc =  0.99
Min Rules Perc= 0.05
dvAvg dims=500
 dbAvg rows=100
 running FPGrowth
500,Number of Cols from FPGrowth=  9
in idCentroids file  3 cluster1 cluster2 cluster3 numArffs=  0
End Clause
#Cols
#Cols
#Cols
#Cols
#Cols
#Cols
numArffs=  1
---------------------------------------
number of times in loop=  2
Confidence Perc =  0.99
Min Rules Perc= 0.05
dvAvg dims=500
 dbAvg rows=100
 running FPGrowth
500,Number of Cols from FPGrowth=  9
NumDiff  6
in idCentroids file  3 cluster1 cluster3 cluster4 numArffs=  1
End Clause
#Cols
#Cols
#Cols
#Cols
#Cols
#Cols
numArffs=  2
---------------------------------------
number of times in loop=  3
Confidence Perc =  0.99
Min Rules Perc= 0.05
dvAvg dims=500
 dbAvg rows=100
 running FPGrowth
500,Number of Cols from FPGrowth=  9
NumDiff  0
FP Same Cols-break

real 0m19.057s
user 0m15.725s
sys 0m1.425s

Playing with DIMACS feature models -- updated 4/10/2013

Running IBEA for 1 hour over the available DIMACS feature models.
Parameters: population = 300, constrained mutation at rate 0.001, NO crossover.
Technique: before evolution, TWO rounds of rule checking are performed:
1) First round: every feature standing by itself in a line is fixed: some are selected (mandatory), and some are deselected ("prohibited"!!!)
2) Second round: every feature that shares a line with a feature fixed in the first round becomes fixed as well.
Mutation is constrained so that it doesn't mess with the fixed features.
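
A sketch of the two rounds over DIMACS-style clauses (lists of signed ints: positive = feature selected, negative = deselected). Reading "become fixed as well" as unit-style propagation is an assumption about the exact rule used:

```python
def fix_features(clauses):
    """Two rounds of rule checking over DIMACS clauses.
    Returns {feature: True (mandatory) / False (prohibited)}."""
    fixed = {}

    # Round 1: features standing alone in a line (unit clauses) are fixed.
    for clause in clauses:
        if len(clause) == 1:
            lit = clause[0]
            fixed[abs(lit)] = lit > 0

    # Round 2: for a clause sharing a line with fixed features, if the fixed
    # literals do not satisfy it and only one literal is still free, that
    # last literal is forced (assumed unit-propagation reading).
    for clause in clauses:
        free = [l for l in clause if abs(l) not in fixed]
        satisfied = any(fixed.get(abs(l)) == (l > 0)
                        for l in clause if abs(l) in fixed)
        if not satisfied and len(free) == 1:
            lit = free[0]
            fixed[abs(lit)] = lit > 0
    return fixed
```

Further rounds would just repeat the second loop until a fixpoint; per item 1) below, that was checked and no benefit is expected.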


                                          Skipped features            Skipped rules
FM              Features   Rules  %correct  Round 1       Total         Total
toybox               544    1020     100%     361 (66%)     363 (67%)     394 (39%)
axTLS                684    2155     100%     382 (56%)     384 (56%)     259 (12%)
ecos                1244    3146     100%      10  (1%)      19  (2%)      11  (0%)
freebsd             1396   62183     100%       3  (0%)       3  (0%)      20  (0%)
fiasco              1638    5228     100%     995 (61%)     995 (61%)     553 (11%)
uClinux             1850    2468     100%    1234 (67%)    1244 (67%)    1850 (75%)
busybox             6796   17836     100%    3947 (58%)    3949 (58%)    2644 (15%)
uClinuxconfig      11254   31637       4%    6025 (54%)    6027 (54%)    4641 (15%)
coreboot           12268   47091       1%    4592 (37%)    4672 (38%)    2060  (4%)
buildroot          14910   45603       0%    6755 (45%)    6759 (45%)    3534  (8%)
embtoolkit         23516  180511       0%    6370 (27%)    6619 (28%)     657  (0%)
freetz             31012  102705       0%   14444 (47%)   14493 (47%)    3911  (4%)
Linux 2.6.32       60072  268223       0%   32329 (54%)   32479 (54%)   18734  (7%)
Linux 2.6.33       62482  273799       0%   33597 (54%)   33766 (54%)   19394  (7%)
Linux 2.6.28.6      6888  343944       0%

1) Verify how many features are fixed in the first and second rounds of rule checking. Do we benefit by making further rounds? Done. No benefit expected from further rounds.
2) The first 7 feature models are "easy"... why? Do they have a lot of "fixed" features? or just because they're smaller? Done. ecos and freebsd have very few "fixed" features, yet they are solved easily... Size is a factor that is part of the larger notion of "hardness" or "complexity".
3) These 7 can be used in experiments to compare IBEA with others (NSGAII etc)... We already know others will suck. This could be our "scale-up" verification. No.
4) We could also compare this "rule detection" method with complete randomness (ICSE'13 style) but we've already learned not to rely on complete randomness. No.
5) We could try to optimize two objectives only, then plug the result into 5-objective optimization... This can save a lot of wasted "number crunching". We could run 2-objective and identify the rules that are most frequently violated, make recommendations, take feedback, fix a bunch of features... run again.. sound familiar?

Intrinsic Dimensions for Defects and Effort

Wednesday, March 13, 2013

Sticking points in TSE paper


1-      We’re actually expanding two papers: ICSE and CMSBSE.
2-      In ICSE we fix the number of fitness evaluations. In CMSBSE (and TSE) we fix the run time… A reviewer complained about this in CMSBSE…
3-      Statistical analysis needed in addition to effect size.

Tuesday, March 12, 2013

Wednesday, March 6, 2013

bagging+FSS+dataset selection for defect prediction

http://unbox.org/things/var/zhimin/

SBSE MOO in Few Evals: GALE

GALE = Geometric Active Learning Evolution


Hypervolume versus IBD and IBS
  -  IBD = how good we are
  -  IBS = how spread out we are
  -  Both based on IBEA indicator



More Results:
 - added some more flavors just so we have one of every cookie
Notes:
  - Left table shows the Raw Averages for everyone over 20 repeats
  - Right table uses t-statistic to generate confidence interval (99%) 
  -  7-3-2 on wins-losses-ties when only considering IBD (quality)
  - WIN or LOSS = no overlap of confidence interval
  - RRSL should be GALE, here
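
The win/loss test in the notes above (99% CI from the t-statistic, no overlap = decision) can be sketched as follows; the t critical value is hard-coded for the 20-repeat case (df = 19), and "lower is better" is assumed:

```python
from math import sqrt
from statistics import mean, stdev

T_99_DF19 = 2.861  # two-sided 99% t critical value for n = 20 repeats (df = 19)

def ci99(xs):
    """99% confidence interval on the mean of one treatment's repeats."""
    m = mean(xs)
    half = T_99_DF19 * stdev(xs) / sqrt(len(xs))
    return (m - half, m + half)

def compare(a, b):
    """WIN/LOSS only when the 99% CIs do not overlap; else TIE.
    Assumes lower scores are better."""
    (lo_a, hi_a), (lo_b, hi_b) = ci99(a), ci99(b)
    if hi_a < lo_b:
        return "WIN"
    if hi_b < lo_a:
        return "LOSS"
    return "TIE"
```

This matches the rule in the notes: any overlap at all between the two intervals is scored as a tie.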

Analysis Notes:
  - RRSL wins badly on NumEval
  - NSGA-II outperforms RRSL in runtime for small MOPs that are cheap to evaluate.  But when models become expensive to evaluate (e.g. POM2), NSGA-II becomes slow and RRSL wins in runtime
  - RRSL is a very decision-sided routine.  For ZDT1 (30 decisions) and Osyczka2 (5 decisions), the runtime of RRSL is punished

Paper "3/4 Draft":
 - nothing in here, yet: http://unbox.org/things/var/joe/active/
 - promise to be integrated in linux/latex fully by full draft time
 - https://www.dropbox.com/s/y9ey0hr6agsolzt/solving_problems_in_few_evaluations.pdf?m