Monday, October 10, 2011

Reviews and unsupervised feature-selectors

a) Papers: In case you want to take a look at some of the recent reviews we have submitted, here are two papers: 1) kernel density estimation and 2) ensemble methods.

Another paper, on active learning, has about 40% of the review comments addressed (here).

b) QUICK+ (under implementation): An unsupervised instance and feature selector, which works in two steps (a sketch of Step 2 follows this list):

  • Step 1: Run QUICK on the dataset and find how many instances (q) you need.
  • Step 2:
    • For those q instances, transpose the dataset and find the popularity of the features in a space defined by the instances.
    • Throw away features from least popular to most popular until there is a statistically significant deterioration in performance.
    • When performance drops significantly, stop and put the last-thrown feature back.
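Roughly, Step 2 will look something like the sketch below (Python; evaluate is a hypothetical scorer, and counting nearest-neighbor hits is just one plausible reading of "popularity" here):

    import numpy as np
    from scipy.stats import mannwhitneyu

    def quickplus_step2(X, evaluate, alpha=0.05):
        # X: the q instances chosen by QUICK in Step 1
        #    (rows = instances, columns = features).
        # evaluate: hypothetical scorer returning a list of performance
        #    values for a feature subset (assumed: higher is better).

        # Popularity in the transposed space: each feature is a point in a
        # space defined by the instances; count how often each feature is
        # another feature's nearest neighbor.
        T = X.T
        d = np.linalg.norm(T[:, None, :] - T[None, :, :], axis=2)
        np.fill_diagonal(d, np.inf)
        popularity = np.bincount(d.argmin(axis=1), minlength=T.shape[0])

        kept = list(range(X.shape[1]))
        base = evaluate(X[:, kept])
        for f in np.argsort(popularity):        # least popular first
            trial = [k for k in kept if k != f]
            if not trial:
                break
            scores = evaluate(X[:, trial])
            # Stop at the first statistically significant drop; breaking
            # before updating 'kept' leaves the last-thrown feature in.
            if mannwhitneyu(base, scores, alternative='greater')[1] < alpha:
                break
            kept = trial
        return kept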

NSGA-II with Clustering Results

Clustering appears capable of helping NSGA-II converge faster, but not reliably. For the two easier problems (Schaffer and Fonseca), the clustering version converges in significantly fewer generations than the baseline version. For the harder problems (Kursawe and Tanaka), it appears to have had little impact.

The clustering appears to have no impact on the variance in convergence.

The following are graphs of the convergences of the two NSGA-II versions on the four problems. The left column shows the median performance, and the right shows the 25th and 75th percentile performances.

[Graphs: convergence of the baseline and clustering versions on Schaffer, Fonseca, Kursawe, and Tanaka; medians on the left, 25th/75th percentiles on the right]

The following are graphs of the running times for solving the previous problems. The clustering version is slower than the baseline version, but only by a linear factor.

[Graphs: running times of both versions on the four problems]

There are some odd blips in some of the running times that seem to be noise. I tested the Tanaka problem with a population size of 2000 for 2000 generations, and I got a very linear result:

[Graph: Tanaka running time, population 2000, 2000 generations]

Only a distribution metric can determine whether clustering truly helps NSGA-II perform better. If clustering distributes solutions across the Pareto frontier in fewer generations than the baseline does, it may be a viable technique even when it provides no convergence benefit.

Tuesday, October 4, 2011

The Continuing Adventures of NSGA-II

I have implemented clustering inside the official NSGA-II implementation in C. The clustered version seems to converge faster and with more spread than the baseline:

[Graphs: convergence and spread of the clustering version vs. the baseline]

The trick is processing these results to determine whether the apparent convergence is real. The only way I know to test the convergence and distribution is to use the measures Deb used: Upsilon (the average distance from the obtained solutions to the true Pareto frontier) and Delta (the deviation of the neighbor-to-neighbor distances from their average). Both require knowing the mathematical definition of the Pareto frontier in order to calculate equally distributed points across it.
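
For reference, both measures are simple to compute once the true frontier is in hand; here is a minimal sketch in Python, assuming the true frontier is available as a dense array of uniformly spaced points (which is exactly the part that is missing):

    import numpy as np

    def upsilon(front, true_front):
        # Convergence: mean distance from each obtained point to the
        # nearest of the uniformly spaced points on the true frontier.
        d = np.linalg.norm(front[:, None, :] - true_front[None, :, :], axis=2)
        return d.min(axis=1).mean()

    def delta(front, extremes):
        # Deb's diversity measure: consecutive-neighbor distances d_i along
        # the sorted front, their mean d_bar, and the gaps d_f, d_l between
        # the obtained extremes and the true extreme points of the frontier.
        f = front[np.argsort(front[:, 0])]
        d = np.linalg.norm(np.diff(f, axis=0), axis=1)
        d_bar = d.mean()
        d_f = np.linalg.norm(f[0] - extremes[0])
        d_l = np.linalg.norm(f[-1] - extremes[1])
        return (d_f + d_l + np.abs(d - d_bar).sum()) / (d_f + d_l + len(d) * d_bar)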

To approximate this, I tried writing an incremental search through the problem space, using kd-clustering to refine the search region. To my surprise, it did not seem to work well:

[Graphs: results of the incremental kd-clustering search]

So, I've switched to using calculus to determine these points by hand. This is not going well either:

Schaffer Pareto frontier: f2 = (sqrt(f1) - 2)^2
Area under the frontier: Integral of (sqrt(x) - 2)^2 dx = (1/2)x^2 - (8/3)x^(3/2) + 4x + C
Calculating the x values that generate precise areas... ?
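
One way around the closed-form step: the antiderivative above is monotone over the frontier's range, so the f1 values that cut the area into equal slices can be found numerically instead. A sketch in Python (the interval [0, 4] comes from x in [0, 2] implying f1 = x^2 in [0, 4]; brentq just does the root-finding):

    import numpy as np
    from scipy.optimize import brentq

    # Antiderivative of the frontier (sqrt(x) - 2)^2 = x - 4*sqrt(x) + 4.
    F = lambda x: 0.5 * x**2 - (8.0 / 3.0) * x * np.sqrt(x) + 4.0 * x

    lo, hi = 0.0, 4.0                  # f1 range on the Schaffer frontier
    total = F(hi) - F(lo)

    # f1 values whose cumulative area under the frontier is i/n of the total.
    n = 10
    cuts = [brentq(lambda x, t=t: F(x) - F(lo) - t, lo, hi)
            for t in np.linspace(total / n, total, n)]

(Note that equal-area cuts are not the same as equally spaced points along the curve.)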

Crowd pruning

Tried 2 methods:

1) Based on the ratio (max allowed / current rule count), I randomly decide whether to include each point.

2) I sort by the first dimension, then take every nth item in order to get down to 100. (If n is not an integer, I take each item whose index makes the cumulative "step" cross the next integer value.)
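
In sketch form (Python; the function names are mine), the two methods look like this:

    import random

    def prune_random(points, max_allowed):
        # Method 1: keep each point with probability max_allowed / count.
        p = max_allowed / float(len(points))
        return [pt for pt in points if random.random() < p]

    def prune_stride(points, max_allowed):
        # Method 2: sort by the first dimension, then take every nth item;
        # when n is fractional, take each item whose index crosses the next
        # cumulative step.
        pts = sorted(points, key=lambda pt: pt[0])
        n = len(pts) / float(max_allowed)
        kept, mark = [], 0.0
        for i, pt in enumerate(pts):
            if i >= mark:
                kept.append(pt)
                mark += n
        return kept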

Using algorithm one, I saw a fair amount of performance loss, as gaps began to appear in the frontier.

[Graph: frontier with gaps under random pruning]

However, when I used the second algorithm, I saw mostly the same performance as with the unpruned rules, and I generated the results in minutes rather than a few hours.

[Graph: frontier under the stride-based pruning]

For comparison, here's the Fonseca curve at generation 9 without rule pruning.

Still reviewing...

Finished the reviews for the Comba paper's minor revision for TSE. The paper is here. It still needs a proofread.

The version of the kernel paper submitted to EMSE is here.

Major revisions of the active learning paper for TSE are in progress...

Discussion on the TSE paper asserting that OLS is the best.

Experiments: 1) unsupervised feature selectors, 2) different ranking methods.