Monday, May 21, 2012

Rapid recursive spectral learning (RRSL):

Executive summary:

  • For our current rig, we have a new optimizer that does interesting things for 2-d goals.
  • If we added more dimensions to the goals, then we might see some really interesting results.
    • Abdel: 
      • We need those distributions for size, effort, and defects (have we done this before?). That would blow us out to 6-d.
      • We need to baseline this against something else. GAs? DEs?

Rapid recursive spectral learning (RRSL):

  • A spectral learner learns important parameters by reflecting over training data’s eigenvectors.
  • If applied recursively, this approach generates a tree of clusters where each split is defined in terms of the principal component of the data at that level of the tree.
  • Traditional recursive spectral learners are slow since the matrix calculations of PCA take polynomial time. Our RRSL algorithm splits data on parameters found by a very fast Nyström approximation [1].
  • RRSL supports domination pruning and Bayesian contrast set learning.
  • At any level in the recursion, RRSL finds two distant points, then divides the data at the median position between them. RRSL checks if one of these two points dominates the other; if so, it prunes the dominated half of the data.
  • For each leaf cluster generated after domination pruning, RRSL sorts attribute ranges according to how much more frequently they occur in the leaf cluster (compared to the rest of the data). RRSL then generates contrast rules from the first i items in that sort. The rules are then applied to data not used in training. (A sketch of the split-and-prune loop follows this list.)
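
To make that concrete, here is a minimal Python sketch of the split-and-prune loop. It is an illustration under assumptions, not the actual RRSL code: examples are assumed to be dicts holding a "decisions" vector plus an "objectives" vector where lower scores are better, and pivot selection is the FastMap-style heuristic that Platt [1] shows is a Nyström method.

import math, random

def dist(a, b):
    # Euclidean distance over the decision attributes.
    return math.dist(a["decisions"], b["decisions"])

def pivots(data):
    # FastMap-style pivot selection: from a random point, find the
    # farthest point ("east"), then the point farthest from east
    # ("west"). Linear time; no PCA matrix algebra needed.
    anywhere = random.choice(data)
    east = max(data, key=lambda x: dist(x, anywhere))
    west = max(data, key=lambda x: dist(x, east))
    return east, west

def projection(x, east, west):
    # Project x onto the east-west line via the cosine rule.
    c = dist(east, west)
    a, b = dist(x, east), dist(x, west)
    return (a*a + c*c - b*b) / (2*c) if c > 0 else 0

def dominates(x, y):
    # Binary domination, assuming every objective is minimized:
    # x dominates y if it is no worse on all objectives and
    # strictly better on at least one.
    pairs = list(zip(x["objectives"], y["objectives"]))
    return all(xi <= yi for xi, yi in pairs) and any(xi < yi for xi, yi in pairs)

def rrsl(data, min_size, prune=True):
    # Recurse until clusters shrink to min_size, optionally pruning
    # the half whose pivot is dominated by the other pivot.
    if len(data) <= min_size:
        return [data]
    east, west = pivots(data)
    data = sorted(data, key=lambda x: projection(x, east, west))
    mid = len(data) // 2
    easts, wests = data[:mid], data[mid:]
    if prune and dominates(east, west):
        return rrsl(easts, min_size, prune)   # drop the dominated west half
    if prune and dominates(west, east):
        return rrsl(wests, min_size, prune)   # drop the dominated east half
    return rrsl(easts, min_size, prune) + rrsl(wests, min_size, prune)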

For example:

Recursive spectral learning

Here's RRSL working on a 2-d objective space from feature maps (the goal is most features with fewest violations). In the following:
  •  the thickest black line is the first found dimension 
  •  each thinner line represents a binary split of some thicker line.

Note that this results in 33 leaf classes (since RRSL recursed down to clusters of around sqrt(N) examples, and there are N=1000 examples in this data).
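
Continuing the sketch above with placeholder data of this shape (random decision vectors standing in for the real feature-map examples, two objective scores each), the sqrt(N) stopping rule gives a leaf count in the same ballpark:

import math, random

# Placeholder data standing in for the real feature-map examples: random
# decision vectors, plus two objective scores rescaled so lower is better.
examples = [{"decisions":  [random.random() for _ in range(10)],
             "objectives": [random.random(), random.random()]}
            for _ in range(1000)]

leaves = rrsl(examples, min_size=int(math.sqrt(len(examples))), prune=False)
print(len(leaves))  # a few dozen leaves, in the ballpark of the 33 above
# With prune=True, dominated halves are dropped and far fewer classes remain.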

Domination pruning

After applying domination pruning, we only get the following eight classes:

Range sorting

When RRSL takes one of the above clusters and sorts its attribute ranges (by how much more often each appears in that cluster vs anything else), we get the following list, sorted most important at the top to least important at the bottom (a sketch of this sorting step follows the list).
(_ID_192    0)
(REGISTRATION_ENFORCEMENT    1)
(REGISTRATION    1)
(_ID_14    1)
(_ID_13    1)
(CUSTOMER_PREFERENCES    0)
(WISH_LIST_CONTENT    0)
(SHIPPING_OPTIONS    0)
(_ID_92    0)
(_ID_193    0)
(_ID_237    0)
(_ID_89    0)
(_ID_226    0)
(_ID_73    0)
(PERMISSIONS    0)
(_ID_191    0)
(_ID_233    0)
(_ID_194    0)
(_ID_187    0)
(_ID_186    0)
(_ID_190    0)
(_ID_189    0)
(PREVIOUSLY_VISITED_PAGES    0)
(_ID_196    0)
(DISCOUNTS    0)
(_ID_239    0)
(_ID_76    0)
(_ID_90    0)
(_ID_75    0)
(_ID_68    0)
(_ID_77    0)
(TARGETING_CRITERIA_PREVIOUS_PURCHASES    0)
(WISH_LIST    0)
(_ID_53    0)
(_ID_235    0)
(_ID_236    0)
(_ID_88    0)
(_ID_91    0)
(_ID_66    0)
(EMAIL_WISH_LIST    0)
(_ID_69    0)
(Aside: curiously, most of the advice is about what not to do.)
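
Here is a hedged sketch of that sorting step: each (attribute, value) range is scored by how much more often it appears in the chosen leaf cluster than in the rest of the data. The b*b/(b+r) weighting is an assumption (it rewards ranges that are frequent inside the cluster and rare outside it); the actual RRSL scoring may differ.

from collections import Counter

def range_sort(cluster, rest):
    # Count how often each (attribute, value) range appears inside the
    # leaf cluster versus everywhere else; rows are attribute->value dicts.
    inside, outside = Counter(), Counter()
    for row in cluster:
        inside.update(row.items())
    for row in rest:
        outside.update(row.items())
    scores = {}
    for rng, n in inside.items():
        b = n / len(cluster)                   # frequency inside the cluster
        r = outside[rng] / max(1, len(rest))   # frequency in the rest of the data
        scores[rng] = b * b / (b + r)          # assumed contrast weighting
    # Highest score first: the most characteristic ranges of this cluster,
    # matching the (attribute value) listing above.
    return sorted(scores, key=scores.get, reverse=True)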

Contrast set rule testing

If we build rules using the first "x" items in the above list, then generate 10,000 more projects (rejecting any that contradict the rules), we get the following effects.
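
A minimal sketch of that test, assuming some generator of candidate projects (generate_project below is a stand-in for whatever samples new feature-map configurations): take the first x ranges as a rule, then rejection-sample projects that satisfy it and measure their objectives.

def rule_from(sorted_ranges, x):
    # A contrast rule is just the first x (attribute, value) ranges.
    return dict(sorted_ranges[:x])

def satisfies(project, rule):
    # A project contradicts the rule if it sets any constrained
    # attribute to a different value.
    return all(project.get(attr) == val for attr, val in rule.items())

def test_rule(rule, generate_project, wanted=10000):
    # Rejection sampling: generate candidates, discard any that
    # contradict the rule, and keep going until enough survive to
    # measure the rule's effect on the objectives.
    kept = []
    while len(kept) < wanted:
        candidate = generate_project()
        if satisfies(candidate, rule):
            kept.append(candidate)
    return kept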


Note that
  •  After the first 16 items shown above, further changes become mostly noise.
  • In terms of maximizing features with fewest violations, 6 changes look most interesting.

References


[1] John C. Platt, "FastMap, MetricMap, and Landmark MDS are all Nyström Algorithms", Proceedings of the 10th International Workshop on Artificial Intelligence and Statistics, 2005.

Tuesday, May 15, 2012