Tuesday, December 3, 2013

Explanation in MOEA = Planning

                                                                           
                                                                .-''-.     
       __.....__        _..._ .----.     .----.               .' .-.  )    
   .-''         '.    .'     '.\    \   /    /.-.          .-/ .'  / /     
  /     .-''"'-.  `. .   .-.   .'   '. /'   /  \ \        / (_/   / /      
 /     /________\   \|  '   '  ||    |'    /    \ \      / /     / /       
 |                  ||  |   |  ||    ||    |     \ \    / /     / /        
 \    .-------------'|  |   |  |'.   `'   .'      \ \  / /     . '         
  \    '-.____...---.|  |   |  | \        /        \ `  /     / /    _.-') 
   `.             .' |  |   |  |  \      /          \  /    .' '  _.'.-''  
     `''-...... -'   |  |   |  |   '----'           / /    /  /.-'_.'      
                     |  |   |  |                |`-' /    /    _.'         
                     '--'   '--'                 '..'    ( _.-'       
  

Here's the problem:

The above shows 93 projects (with 23 decisions and 3 objectives) mapped into a 2-d space. 
  • The x-axis is the cosine dimension (its two end points are found via the FastMap heuristic).
  • The y-axis is the Pythagoras dimension, y = sqrt(a^2 - x^2); i.e. how far a point sits off the line joining those two end points (the projection is sketched in code below).
  • The colors denote clusters (so all the red ones at the bottom left are in one cluster). 
    • Clustering is done via 4-way recursive splits on the mean x,y points.
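
For concreteness, here is a minimal sketch of that projection, assuming only a list of rows and some distance function dist(row1, row2); the name fastmap_xy is mine, not the post's:

import math, random

def fastmap_xy(rows, dist):
  # FastMap heuristic: pick any row, find the row farthest from it (east),
  # then the row farthest from east (west); these are the two end points.
  anywhere = random.choice(rows)
  east = max(rows, key=lambda r: dist(r, anywhere))
  west = max(rows, key=lambda r: dist(r, east))
  c = dist(east, west) or 1e-32           # length of the line joining the end points
  xy = []
  for row in rows:
    a, b = dist(row, east), dist(row, west)
    x = (a**2 + c**2 - b**2) / (2 * c)    # cosine rule: projection onto the line
    y = math.sqrt(max(a**2 - x**2, 0))    # Pythagoras: distance off the line
    xy.append((x, y))
  return xy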
The principle of envy says
  • Find your nearest cluster with a better class score
    • We use the mean IBEA score of the objectives: sum_i e^(obj[i] - min[i]), where every objective has first been normalized via (x - min)/(max - min) (a sketch of this scoring follows this list).
  • Find the delta from you to them.
  • That delta is your plan on how to make your life better.
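
One reading of that score, as a minimal sketch: each row gets an exponential sum over its normalized objectives, and a cluster's class score is the mean over its rows. The lo/hi/mins bookkeeping and the function names are mine:

import math

def norm(x, lo, hi):
  # Map one objective value onto 0..1 using that column's min and max.
  return (x - lo) / (hi - lo + 1e-32)

def row_score(objs, lo, hi, mins):
  # IBEA-style sum_i e^(obj[i] - min[i]) over the normalized objectives;
  # mins[i] is the smallest normalized value seen for objective i.
  return sum(math.exp(norm(o, l, h) - m)
             for o, l, h, m in zip(objs, lo, hi, mins))

def class_score(cluster, lo, hi, mins):
  # Mean row score; since defects, effort, and months are all minimized,
  # lower class scores are better.
  return sum(row_score(objs, lo, hi, mins) for objs in cluster) / len(cluster)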
And here's the problem: 
  • how to explain that plan 
  • when "nearest" is expressed in the above weirdo  cosine and Pythagorus dimensions.

Here's one solution:

  • Given a pair of centroids (asIs, Envied)
  • Sample:
    • N times repeat
      • Generate an example Eg at random from the examples in those two clusters
      • Let value(Eg) be the score of that example, expressed as its distance to the Envied cluster
        • So smallest scores are better
      • For each range in the example
        • Add value(Eg) to value(range)
  • Prune:
    • Sort ranges by their value
    • Let gap = distance(centroid(asIs), centroid(Envied))
    • Score ranges by their ability to move rows in asIs towards Envied, as follows.
    • For i = 1 to 10 (some magic number) 
      • Let elite be the best i ranges, stored as (colNum, range) pairs; e.g.
        [(18, 'l'), (15, 'vh'), (19, 'n'), (8, 'xh'), ...]
      • For each row in asIs (i.e. the things you want to change)
        • Let before = distance(row,centroid(Envied))
        • Let mutant = copy(row)
        • Inject all the ranges in elite into mutant
        • Let after = distance(mutant, centroid(Envied))
        • Let movement(elite) = (before - after)/gap
    • Policy = the ranges with the most movement (see the code sketch just below)
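
Putting the Sample and Prune steps together, here is a minimal sketch under several assumptions of mine: rows are lists of column values; distance() and the two centroids are supplied by the caller; ranges(eg) yields the (colNum, range) pairs covering a discretized example; and generate() builds a random example from the two clusters (e.g. via the smear() interpolator shown under Details). All of these names are mine, not the post's.

from collections import defaultdict

def inject(row, elite):
  # Copy a row, then overwrite the elite columns with their ranges.
  mutant = list(row)
  for colNum, rng in elite:
    mutant[colNum] = rng
  return mutant

def plan(asIs, asIsCentroid, enviedCentroid, distance, ranges, generate,
         n=100, top=10):
  # Sample: credit each range with the scores of the random examples it
  # appears in (score = distance to the Envied centroid, so smaller is better).
  value = defaultdict(float)
  for _ in range(n):
    eg = generate()
    score = distance(eg, enviedCentroid)
    for rng in ranges(eg):
      value[rng] += score
  # Prune: try the best 1..top ranges and keep the elite set that moves the
  # asIs rows furthest towards the Envied centroid.
  ranked = sorted(value, key=value.get)        # smallest accumulated value first
  gap = distance(asIsCentroid, enviedCentroid)
  policy, best = [], -1
  for i in range(1, top + 1):
    elite = ranked[:i]
    movement = 0
    for row in asIs:
      before = distance(row, enviedCentroid)
      after  = distance(inject(row, elite), enviedCentroid)
      movement += (before - after) / gap
    movement /= len(asIs)
    if movement > best:
      policy, best = elite, movement
  return policy                                # the ranges with the most movement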

Details:

  • Estimates of defects, effort, and months come from Vasil's algorithm (interpolation between the 2 nearest clusters; a sketch appears after the code below).
  • To generate an example, I used the DE (differential evolution) trick of interpolating between values, as in smear() below. This has the benefit of preserving known distributions.
from random import choice, randint, random

def smear(rows, cols, cr=0.5, f=0.5):
  # Differential-evolution style interpolation: build one new example
  # from three randomly picked parent rows.
  # (isa() and Num are helpers from the surrounding code base; Num marks numeric columns.)
  i, j, k = choice(rows), choice(rows), choice(rows)
  must = randint(0, len(i) - 1)           # force at least one column to change
  new  = []
  for n, (a, b, c, col) in enumerate(zip(i, j, k, cols)):
    if a == "?" or b == "?" or c == "?":  # missing values stay missing
      x = "?"
    else:
      x = a
      if n == must or random() < cr:      # cr = crossover rate
        if isa(col, Num):                 # numeric column: interpolate, scaled by f
          x = a + f * (b - c)
        else:                             # symbolic column: maybe copy a parent value
          if random() < f:
            x = b if random() < 0.5 else c
    new += [x]
  return new
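
The post only names Vasil's estimator in passing, so here is a minimal sketch of interpolating between the two nearest clusters, under my own assumptions (each cluster carries a centroid and the mean defects, effort, and months of its rows; all names are mine):

def estimate(row, clusters, distance):
  # Keep the two clusters whose centroids are nearest to the row.
  c1, c2 = sorted(clusters, key=lambda c: distance(row, c["centroid"]))[:2]
  d1 = distance(row, c1["centroid"])
  d2 = distance(row, c2["centroid"])
  w = d2 / (d1 + d2 + 1e-32)              # the nearer cluster gets the larger weight
  return {key: w * c1["means"][key] + (1 - w) * c2["means"][key]
          for key in ("defects", "effort", "months")}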

Results

When applied in a 5*5 cross-val, the 25th, 50th, 75th, and 100th percentiles of defects, effort, and months were as follows. Note the large decreases in effort and defects at the 75th percentile (a rough sketch of the cross-val rig follows the numbers):

effort
before: [0.44, 0.98, 4.52, 199.5]
after: [0.39, 0.76, 2.01, 120.2]

defects
before: [0.34, 0.84, 3.57, 124.76]
after: [0.37, 0.72, 2.09, 127.61]

months
before: [0.15, 0.33, 0.58, 3.7]
after: [0.16, 0.35, 0.51, 3.76]
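
For completeness, here is a rough sketch of one way such a 5*5 cross-val percentile report could be wired up; the split strategy and every name here (estimate returning a single number per row, apply_policy, and so on) are my assumptions, not details from the post:

import random

def percentiles(xs, ps=(25, 50, 75, 100)):
  # Report the requested percentiles of a list of numbers.
  xs = sorted(xs)
  return [xs[min(len(xs) - 1, int(p / 100 * len(xs)))] for p in ps]

def cross_val(rows, estimate, apply_policy, m=5, n=5):
  # m*n cross-validation: for each held-out bin, estimate each row before
  # and after the planner's policy has been injected into it.
  before, after = [], []
  for _ in range(m):
    random.shuffle(rows)
    for b in range(n):
      test  = [r for i, r in enumerate(rows) if i % n == b]
      train = [r for i, r in enumerate(rows) if i % n != b]
      for row in test:
        before.append(estimate(row, train))
        after.append(estimate(apply_policy(row, train), train))
  return percentiles(before), percentiles(after)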


