Wednesday, April 21, 2010

Tuesday, April 13, 2010

Monday, April 12, 2010

More Context Variables on Nasa93


Possible Context Variables for Datasets
  • Cocomo81: Development mode
  • Nasa93: Dev. center, flight-ground, category of application, year of development
  • Desharnais: Development language
  • Albrecht: None (all numeric attributes)
Previous results suggested that there is not a strong correlation between above context variables and selection. Further experiments for Nasa93 (flight-ground, category of application, year of development) point to a similar conclusion: Afore mentioned context variables do not have strong influence on selection.



Component vs. All Learning - Summary


It appears, from our experiments, that learning on dense components and testing on all components wins over the traditional learning on all, and testing on all. Here is the summary table.



Monday, April 5, 2010

Context Variables

About the context variables, we have the following matrices. In parenthesis, the total number of instances located in this particular context variable is written. Those context variables are the ones which we used to divide the datasets into subsets.

It is difficult to see a pattern here. In each case, there is one dominant column (or context variable) from which most of the instances are selected. On the other hand, this particular column is not the most densely populated context variable. Therefore:
1) Center (for nasa93), development type (for cocomo81) or development language (for desharnais) are influential context variables
2) There is one particular context variable from which always most of the instances are selected but this context variable is not the most densely populated one, so size is not a dominant indicator either.




Master's Defense - The Robust Optimization of Non-Linear Requirements Models



Master's Defense - WVU Lane Dept. CSEE
Gregory Gay
Wednesday, April 7, 2010
1:00 PM, room G112 ESB

Open to the public, so feel free to come and see me make a fool of myself. :D

which2 : a stochastic anytime rule learner

(Code: http://unbox.org/wisp/tags/which2/0.1/which2.awk.)

WHICH2 builds rules by ranking ideas, then repeatedly building new ideas by picking and combining two old ideas (favoring those with higher ranks). New ideas generated in this way are ranked and thrown back into the same pot as the old ideas so, if they are any good, they might be picked and extended in subsequent rounds. Alternatively, if the new idea stinks, it gets buried by the better ideas and is ignore.

One important aspect of the following is that the scoring routines for ideas are completely seperate from the rest of the code (see the "score1" function). Hence, it is a simple matter to try our different search biases.
For example, here's the current score routine:
function score(i,rows,data, 
            cols,row, col,a,b,c,d,triggered,pd,pf,
prec,acc,supports,s,fits) 
{
    a=b=c=d=Pinch # stop divide by zero errors
    cols=Name[0]
    for(row=1;row<=rows;row++) {
        triggered = matched(row,cols,data,i)
        if (data[row,cols]) {
            if (triggered) {d++} else {b++}
        } else {
            if (triggered) {c++} else {a++}
        }
    } 
        fits    = c + d
    pd      = d/(b+d)
    pf      = a/(a+c)
    prec    = d/(c+d)
    acc     = (a+d)/(a+b+c+d)
    support = (c+d)/(a+b+c+d)
    return score1(pd,pf,prec,acc,support,fits)
}
function score1(pd,pf,prec,acc,supportm,fits) {
        if (fits <= Overfitted)
                return 0
    if (Eval==1) return acc
    if (Eval==2) return 2 * pd * prec/(pd+prec)
    if (Eval==3) return 2 * pd * pf/(pd+pf)
    if (Eval==4) return support * 2 * pd * pf/(pd+pf)
    return support * 2 * pd * prec/(pd+prec)
}
Various students (ZachM and Joe) have implemented this before and found it surprisingly slow.  For example, in the above, note that scoring one rule means running it over the entire data set. Which2 therefore uses two speed up tricks: 
  • Rather that sorting new ideas straight away into the old ones, it generates (say) 20 new ones at each round before sorting them back into the old.
  • Which2 keeps a cache of how ideas scored before. If we re-score an old idea, we just return that score (no need to recompute it).
The result is pretty zip: under a second for the 2000+ records of titanic.arff. e.g. This call produces the following output.

gawk -f which2.awk weather.nominal.arff 

In the following, the "candidates" are ideas that look promising and "score" ranks the candidates. If the max score does not improve from the last round, then "lives" decreases.

Each round tries random combinations of the stuff from prior rounds (favoring those things with higher scores). Hence, at round 1, all the candidates are singletons. But. later on (see line 54) the candidates can grow to combinations of things.

Each round prunes the candiates so that only the better candidates survive to round+1.
The final output are the candidates of the last round. In this example, the best rule is tempareture = cool or mild.


----------------------------------------------------
% class: yes seed: 1 round: 1 max: 0 lives: 5

% candidate: yes,humidity=normal
% candidate: yes,humidity=normal,outlook=overcast
% candidate: yes,humidity=normal,temperature=cool
% candidate: yes,humidity=normal,temperature=mild
% candidate: yes,humidity=normal,windy=FALSE
% candidate: yes,outlook=overcast
% candidate: yes,outlook=overcast,temperature=cool
% candidate: yes,outlook=overcast,windy=FALSE
% candidate: yes,temperature=cool
% candidate: yes,temperature=cool,temperature=mild
% candidate: yes,temperature=cool,windy=FALSE
% candidate: yes,temperature=mild
% candidate: yes,temperature=mild,windy=FALSE
% candidate: yes,windy=FALSE

[0.072352013,yes,temperature=mild,windy=FALSE].
[0.13229596,yes,humidity=normal,temperature=cool].
[0.13264848,yes,temperature=cool].
[0.17611714,yes,humidity=normal,windy=FALSE].
[0.17652966,yes,outlook=overcast].
[0.22876845,yes,temperature=mild].
[0.37582652,yes,humidity=normal].
[0.40354402,yes,windy=FALSE].
[0.52640158,yes,temperature=cool,temperature=mild].

----------------------------------------------------
% class: yes seed: 1 round: 2 max: 0.52640158 lives: 5

% candidate: yes,humidity=normal
% candidate: yes,humidity=normal,outlook=overcast
% candidate: yes,humidity=normal,outlook=overcast,temperature=cool
% candidate: yes,humidity=normal,temperature=cool
% candidate: yes,humidity=normal,temperature=cool,temperature=mild
% candidate: yes,humidity=normal,windy=FALSE
% candidate: yes,outlook=overcast
% candidate: yes,outlook=overcast,temperature=cool
% candidate: yes,outlook=overcast,temperature=cool,temperature=mild
% candidate: yes,outlook=overcast,windy=FALSE
% candidate: yes,temperature=cool
% candidate: yes,temperature=cool,temperature=mild
% candidate: yes,temperature=cool,temperature=mild,windy=FALSE
% candidate: yes,temperature=mild
% candidate: yes,temperature=mild,windy=FALSE
% candidate: yes,windy=FALSE

[0.13229596,yes,humidity=normal,temperature=cool].
[0.13264848,yes,temperature=cool].
[0.17611714,yes,humidity=normal,windy=FALSE].
[0.17652966,yes,outlook=overcast].
[0.20419978,yes,temperature=cool,temperature=mild,windy=FALSE].
[0.22876845,yes,temperature=mild].
[0.28604711,yes,humidity=normal,temperature=cool,temperature=mild].
[0.37582652,yes,humidity=normal].
[0.40354402,yes,windy=FALSE].
[0.52640158,yes,temperature=cool,temperature=mild].

----------------------------------------------------
% class: yes seed: 1 round: 3 max: 0.52640158 lives: 4

% candidate: yes,humidity=normal
% candidate: yes,humidity=normal,outlook=overcast
% candidate: yes,humidity=normal,temperature=cool
% candidate: yes,humidity=normal,temperature=cool,temperature=mild
% candidate: yes,humidity=normal,temperature=cool,temperature=mild,windy=FALSE
% candidate: yes,humidity=normal,temperature=cool,windy=FALSE
% candidate: yes,humidity=normal,windy=FALSE
% candidate: yes,outlook=overcast
% candidate: yes,outlook=overcast,temperature=cool,temperature=mild
% candidate: yes,outlook=overcast,windy=FALSE
% candidate: yes,temperature=cool
% candidate: yes,temperature=cool,temperature=mild
% candidate: yes,temperature=cool,temperature=mild,windy=FALSE
% candidate: yes,temperature=cool,windy=FALSE
% candidate: yes,temperature=mild
% candidate: yes,temperature=mild,windy=FALSE
% candidate: yes,windy=FALSE

[0.13229596,yes,humidity=normal,temperature=cool].
[0.13264848,yes,temperature=cool].
[0.17611714,yes,humidity=normal,windy=FALSE].
[0.17652966,yes,outlook=overcast].
[0.20419978,yes,temperature=cool,temperature=mild,windy=FALSE].
[0.22876845,yes,temperature=mild].
[0.28604711,yes,humidity=normal,temperature=cool,temperature=mild].
[0.37582652,yes,humidity=normal].
[0.40354402,yes,windy=FALSE].
[0.52640158,yes,temperature=cool,temperature=mild].

----------------------------------------------------
% class: yes seed: 1 round: 4 max: 0.52640158 lives: 3

% candidate: yes,humidity=normal
% candidate: yes,humidity=normal,outlook=overcast,windy=FALSE
% candidate: yes,humidity=normal,temperature=cool
% candidate: yes,humidity=normal,temperature=cool,temperature=mild
% candidate: yes,humidity=normal,temperature=cool,temperature=mild,windy=FALSE
% candidate: yes,humidity=normal,windy=FALSE
% candidate: yes,outlook=overcast
% candidate: yes,outlook=overcast,temperature=cool
% candidate: yes,outlook=overcast,temperature=cool,temperature=mild
% candidate: yes,outlook=overcast,temperature=cool,temperature=mild,windy=FALSE
% candidate: yes,outlook=overcast,windy=FALSE
% candidate: yes,temperature=cool
% candidate: yes,temperature=cool,temperature=mild
% candidate: yes,temperature=cool,temperature=mild,windy=FALSE
% candidate: yes,temperature=cool,windy=FALSE
% candidate: yes,temperature=mild
% candidate: yes,temperature=mild,windy=FALSE
% candidate: yes,windy=FALSE

[0.13229596,yes,humidity=normal,temperature=cool].
[0.13264848,yes,temperature=cool].
[0.17611714,yes,humidity=normal,windy=FALSE].
[0.17652966,yes,outlook=overcast].
[0.20419978,yes,temperature=cool,temperature=mild,windy=FALSE].
[0.22876845,yes,temperature=mild].
[0.28604711,yes,humidity=normal,temperature=cool,temperature=mild].
[0.37582652,yes,humidity=normal].
[0.40354402,yes,windy=FALSE].
[0.52640158,yes,temperature=cool,temperature=mild].

----------------------------------------------------
% class: yes seed: 1 round: 5 max: 0.52640158 lives: 2

% candidate: yes,humidity=normal
% candidate: yes,humidity=normal,outlook=overcast,temperature=cool
% candidate: yes,humidity=normal,temperature=cool
% candidate: yes,humidity=normal,temperature=cool,temperature=mild
% candidate: yes,humidity=normal,temperature=cool,temperature=mild,windy=FALSE
% candidate: yes,humidity=normal,temperature=cool,windy=FALSE
% candidate: yes,humidity=normal,temperature=mild
% candidate: yes,humidity=normal,windy=FALSE
% candidate: yes,outlook=overcast
% candidate: yes,temperature=cool
% candidate: yes,temperature=cool,temperature=mild
% candidate: yes,temperature=cool,temperature=mild,windy=FALSE
% candidate: yes,temperature=cool,windy=FALSE
% candidate: yes,temperature=mild
% candidate: yes,windy=FALSE

[0.13229596,yes,humidity=normal,temperature=cool].
[0.13264848,yes,temperature=cool].
[0.17611714,yes,humidity=normal,windy=FALSE].
[0.17652966,yes,outlook=overcast].
[0.20419978,yes,temperature=cool,temperature=mild,windy=FALSE].
[0.22876845,yes,temperature=mild].
[0.28604711,yes,humidity=normal,temperature=cool,temperature=mild].
[0.37582652,yes,humidity=normal].
[0.40354402,yes,windy=FALSE].
[0.52640158,yes,temperature=cool,temperature=mild].

----------------------------------------------------
% class: yes seed: 1 round: 6 max: 0.52640158 lives: 1

% candidate: yes,humidity=normal
% candidate: yes,humidity=normal,outlook=overcast
% candidate: yes,humidity=normal,temperature=cool
% candidate: yes,humidity=normal,temperature=cool,temperature=mild
% candidate: yes,humidity=normal,temperature=cool,temperature=mild,windy=FALSE
% candidate: yes,humidity=normal,temperature=cool,windy=FALSE
% candidate: yes,humidity=normal,windy=FALSE
% candidate: yes,outlook=overcast
% candidate: yes,outlook=overcast,temperature=cool,temperature=mild,windy=FALSE
% candidate: yes,temperature=cool
% candidate: yes,temperature=cool,temperature=mild
% candidate: yes,temperature=cool,temperature=mild,windy=FALSE
% candidate: yes,temperature=mild
% candidate: yes,windy=FALSE

[0.13229596,yes,humidity=normal,temperature=cool].
[0.13264848,yes,temperature=cool].
[0.17611714,yes,humidity=normal,windy=FALSE].
[0.17652966,yes,outlook=overcast].
[0.20419978,yes,temperature=cool,temperature=mild,windy=FALSE].
[0.22876845,yes,temperature=mild].
[0.28604711,yes,humidity=normal,temperature=cool,temperature=mild].
[0.37582652,yes,humidity=normal].
[0.40354402,yes,windy=FALSE].
[0.52640158,yes,temperature=cool,temperature=mild].

----------------------------------------------------
% class: yes seed: 1 round: 7 max: 0.52640158 lives: 0

% candidate: yes,humidity=normal
% candidate: yes,humidity=normal,outlook=overcast
% candidate: yes,humidity=normal,temperature=cool
% candidate: yes,humidity=normal,temperature=cool,temperature=mild
% candidate: yes,humidity=normal,temperature=cool,temperature=mild,windy=FALSE
% candidate: yes,humidity=normal,windy=FALSE
% candidate: yes,outlook=overcast
% candidate: yes,outlook=overcast,temperature=cool,temperature=mild
% candidate: yes,outlook=overcast,temperature=mild
% candidate: yes,outlook=overcast,windy=FALSE
% candidate: yes,temperature=cool
% candidate: yes,temperature=cool,temperature=mild
% candidate: yes,temperature=cool,temperature=mild,windy=FALSE
% candidate: yes,temperature=mild
% candidate: yes,windy=FALSE

[0.13229596,yes,humidity=normal,temperature=cool].
[0.13264848,yes,temperature=cool].
[0.17611714,yes,humidity=normal,windy=FALSE].
[0.17652966,yes,outlook=overcast].
[0.20419978,yes,temperature=cool,temperature=mild,windy=FALSE].
[0.22876845,yes,temperature=mild].
[0.28604711,yes,humidity=normal,temperature=cool,temperature=mild].
[0.37582652,yes,humidity=normal].
[0.40354402,yes,windy=FALSE].
[0.52640158,yes,temperature=cool,temperature=mild].

This is not the usual solution to weather but note how it covers most of the examples. The reason for this selection is that the scoring function (shown above) has a support term. Hence, this run favors things with high support.