## Results 1/21/14

## Results of A/B/C/D prediction: dismal

## Results 2

Back to the CSV: class names are listed

Type A: 4% B: 11% C: 12% D: 71% NoMatch: 0%

Type A: 3% B: 17% C: 8% D: 63% NoMatch: 5%

Type A: 5% B: 5% C: 18% D: 69% NoMatch: 0%

Type A: 17% B: 8% C: 15% D: 58% NoMatch: 0%

['camel-1.0.csv', 'camel-1.2.csv', 'camel-1.4.csv', 'camel-1.6.csv']

Type A: 3% B: 0% C: 22% D: 51% NoMatch: 22%

Type A: 15% B: 18% C: 3% D: 55% NoMatch: 6%

Type A: 9% B: 7% C: 10% D: 71% NoMatch: 1%

['ivy-1.1.csv', 'ivy-1.4.csv', 'ivy-2.0.csv']

Type A: 7% B: 47% C: 2% D: 40% NoMatch: 1%

Type A: 0% B: 0% C: 0% D: 0% NoMatch: 100%

['jedit-3.2.csv', 'jedit-4.0.csv', 'jedit-4.1.csv', 'jedit-4.2.csv', 'jedit-4.3.csv']

Type A: 17% B: 15% C: 5% D: 58% NoMatch: 2%

Type A: 16% B: 7% C: 9% D: 62% NoMatch: 4%

Type A: 9% B: 15% C: 3% D: 64% NoMatch: 6%

Type A: 0% B: 11% C: 0% D: 47% NoMatch: 38%

['log4j-1.0.csv', 'log4j-1.1.csv', 'log4j-1.2.csv']

Type A: 16% B: 6% C: 8% D: 41% NoMatch: 27%

Type A: 30% B: 1% C: 56% D: 5% NoMatch: 5%

['lucene-2.0.csv', 'lucene-2.2.csv', 'lucene-2.4.csv']

Type A: 33% B: 12% C: 24% D: 28% NoMatch: 1%

Type A: 42% B: 15% C: 21% D: 15% NoMatch: 4%

['synapse-1.0.csv', 'synapse-1.1.csv', 'synapse-1.2.csv']

Type A: 5% B: 4% C: 22% D: 63% NoMatch: 3%

Type A: 13% B: 12% C: 19% D: 53% NoMatch: 1%

['velocity-1.4.csv', 'velocity-1.5.csv', 'velocity-1.6.csv']

Type A: 40% B: 34% C: 2% D: 2% NoMatch: 20%

Type A: 26% B: 37% C: 3% D: 29% NoMatch: 2%

['xalan-2.4.csv', 'xalan-2.5.csv', 'xalan-2.6.csv', 'xalan-2.7.csv']

Type A: 9% B: 4% C: 36% D: 44% NoMatch: 4%

Type A: 27% B: 20% C: 15% D: 31% NoMatch: 4%

Type A: 44% B: 0% C: 51% D: 1% NoMatch: 2%

['xerces-1.2.csv', 'xerces-1.3.csv', 'xerces-1.4.csv']

Type A: 3% B: 11% C: 10% D: 72% NoMatch: 1%

Type A: 7% B: 0% C: 38% D: 25% NoMatch: 27%

Idea: New dataset consisting of:

- All attributes of N
- All attributes of N+1
- The delta between N and N+1
- Class of defect change
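
A minimal sketch of how such a dataset could be assembled, assuming each version has been loaded into a dict keyed by class name and that a numeric defect-count column exists. The column name `bug`, the function name `transition_rows`, and the `clean->buggy` style labels are assumptions made for illustration; the notes do not say how these labels would map onto Types A to D.

```python
def transition_rows(version_n, version_n1, metrics, defect_col="bug"):
    """Pair classes present in both versions and describe the change between them.

    version_n / version_n1: dict mapping class name -> dict of numeric columns.
    metrics: names of the attribute columns to carry over.
    defect_col: hypothetical name of the defect-count column.
    """
    rows = []
    for name, before in version_n.items():
        after = version_n1.get(name)
        if after is None:
            continue  # class has no match in version N+1
        row = {"name": name}
        for m in metrics:
            row[m + "_n"] = before[m]                 # attribute in N
            row[m + "_n1"] = after[m]                 # attribute in N+1
            row[m + "_delta"] = after[m] - before[m]  # delta between N and N+1
        was_buggy = before[defect_col] > 0
        is_buggy = after[defect_col] > 0
        # Illustrative labelling only; not the notes' A/B/C/D definition.
        row["change"] = "{}->{}".format(
            "buggy" if was_buggy else "clean",
            "buggy" if is_buggy else "clean",
        )
        rows.append(row)
    return rows
```

Classes that appear in only one of the two versions are skipped here; they could instead be kept under a separate "no match" label.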

## Results 1

- Preliminary feature selection with info gain selecting top 50%
- Normalized and discretized with Fayyad-Irani
- PCA via FastMap
- Grid clustering
- Centroids plotted along with version n+1 nearest neighbor lines. (Not terribly useful)
- Do I smell transforms of best fit around the corner?
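
The first step in the list above, ranking attributes by information gain and keeping the top 50%, might look roughly like the following. This is a sketch that assumes the attributes are already discrete (in the pipeline above discretization is listed after selection, so the actual ordering or tooling may differ); the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def info_gain(values, labels):
    """Entropy reduction obtained by splitting the labels on a discrete attribute."""
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[v].append(y)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def top_half_by_info_gain(columns, labels):
    """Return the names of the best 50% of attributes (at least one)."""
    ranked = sorted(columns,
                    key=lambda name: info_gain(columns[name], labels),
                    reverse=True)
    return ranked[:max(1, len(ranked) // 2)]
```

For example, with `columns = {"a": [0, 0, 1, 1], "b": [0, 1, 0, 1]}` and `labels = [0, 0, 1, 1]`, attribute `a` separates the labels perfectly and is the one kept.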

## Results 0

k-means 5 to cluster each data-set within itself
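
For reference, a plain Lloyd-style k-means could be sketched as below, assuming the "5" means k = 5 and that each row is a tuple of numeric attributes. The function name, the seeding strategy, and the iteration cap are illustrative choices, not something recorded in these notes.

```python
import random


def kmeans(points, k=5, iterations=100, seed=0):
    """Cluster numeric tuples into k groups; returns (assignment, centroids)."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignment = None
    for _ in range(iterations):
        # Assign every point to its nearest centroid (squared Euclidean distance).
        new_assignment = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Move each centroid to the mean of its members; empty clusters stay put.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignment, centroids
```

Running this once per CSV file keeps each dataset clustered within itself, as described above.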

Eigenvalues used to select the features with the most influence

Actual selected columns are plotted, not synthesized dimensions

-- significant correlations could be reported as synonyms

rules for connecting the dots?