Saturday, September 29, 2012

experiments with random projections


Vast literature on random projections: 
Doing it the simplest way possible...

Given the best 3 (ish) random projections of the following data, then a 1-D rnn (reverse nearest neighbor) on each projection, how often are you someone else's nearest neighbor?

---| weather |-----------------------

0,    3,    21%,    *********************
1,    2,    14%,    **************
2,    6,    42%,    ******************************************
3,    1,     7%,    *******
4,    2,    14%,    **************

---| autompg |-----------------------

0,    5,      1%,    *
1,    40,    10%,    **********
2,    99,    24%,    ************************
3,    111,   27%,    ***************************
4,    105,   26%,    **************************
5,    31,     7%,    *******
6,    7,      1%,    *

---| china |-----------------------

0,    7,       1%,    *
1,    43,      8%,    ********
2,    131,    26%,    **************************
3,    156,    31%,    *******************************
4,    101,    20%,    ********************
5,    52,     10%,    **********
6,    9,       1%,    *

---| nasa93 |-----------------------

0,    5,     5%,    *****
1,    27,   29%,    *****************************
2,    34,   36%,    ************************************
3,    21,   22%,    **********************
4,    6,     6%,    ******
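A minimal sketch of how counts like the ones above could be produced. It is not the code behind these numbers. Assumptions: rows are numeric, a projection is a dot product with a Gaussian random direction, the "best 3 (ish)" selection is skipped (the sketch just uses 3 plain random projections), and the columns above are read as (times picked as a nearest neighbor, number of rows, percent of rows). All function names are made up for illustration.

```python
import random
from collections import Counter

def random_direction(dim, rng):
    # One Gaussian random direction (assumed projection scheme).
    return [rng.gauss(0, 1) for _ in range(dim)]

def project(rows, direction):
    # Dot product of every row with the direction: one number per row.
    return [sum(x * w for x, w in zip(row, direction)) for row in rows]

def rnn_counts_1d(values):
    # For each point, count how many other points have it as their
    # nearest neighbor on the line. After sorting, only the left and
    # right neighbors can be nearest, so each count is at most 2.
    order = sorted(range(len(values)), key=lambda i: values[i])
    counts = [0] * len(values)
    for pos, i in enumerate(order):
        candidates = []
        if pos > 0:
            candidates.append(order[pos - 1])
        if pos < len(order) - 1:
            candidates.append(order[pos + 1])
        if candidates:
            nearest = min(candidates, key=lambda j: abs(values[j] - values[i]))
            counts[nearest] += 1
    return counts

def rnn_histogram(rows, n_projections=3, seed=1):
    # Sum the 1-D counts over all projections, then tally how many
    # rows were picked 0 times, 1 time, 2 times, and so on.
    rng = random.Random(seed)
    totals = [0] * len(rows)
    for _ in range(n_projections):
        direction = random_direction(len(rows[0]), rng)
        for i, c in enumerate(rnn_counts_1d(project(rows, direction))):
            totals[i] += c
    return Counter(totals)

if __name__ == "__main__":
    data_rng = random.Random(0)
    rows = [[data_rng.random() for _ in range(4)] for _ in range(50)]
    hist = rnn_histogram(rows)
    for k in sorted(hist):
        pct = 100 * hist[k] // len(rows)
        print(f"{k},    {hist[k]},    {pct}%,    {'*' * pct}")
```

With 3 projections and at most 2 picks per projection, no row can score above 6, which matches the largest count in the tables above.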

Monday, September 24, 2012

Tukutuku paper

The first set of corrections is here.
The second set of corrections is here.
The final version of the paper after these corrections is here.

Sunday, September 23, 2012



We're achieving the ICSE'13 results in 7 minutes (as compared with 3 hours)... The only parameter that falls short is spread... but I'm sure we're still performing better than NSGA-II and SPEA2... We haven't yet run those with the new "orderly" mutation operator.



Profiling indicates that we're spending a long time (65%) evaluating the population before sorting and removing the worst... This is done 100 times for our size-100 population... I think if we remove the 5 worst individuals at once we'll be able to shave about half the execution time with little effect on the final results.

So we've got things to try:
1- run more feature models.
2- run more algorithms with orderly mutation.
3- shave time off of IBEA by removing 5 worst.
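A rough sketch of idea 3, just to count evaluation passes. Everything here is a stand-in: `truncate` and `toy_fitness` are hypothetical names, and `toy_fitness` is not the IBEA indicator-based fitness. The only point is that removing 5 at a time cuts the number of full-population evaluations by a factor of 5.

```python
def toy_fitness(population):
    # Stand-in fitness (NOT the IBEA indicator): closer to the
    # population mean is better. Like IBEA's fitness, it depends on
    # who is currently in the population.
    mean = sum(population) / len(population)
    return [-abs(x - mean) for x in population]

def truncate(population, target_size, evaluate, batch=1):
    # Repeatedly evaluate the whole population and drop the `batch`
    # worst individuals until only `target_size` remain.
    survivors = list(population)
    passes = 0
    while len(survivors) > target_size:
        scores = evaluate(survivors)  # one full evaluation pass
        passes += 1
        k = min(batch, len(survivors) - target_size)
        worst_first = sorted(range(len(survivors)), key=lambda i: scores[i])
        doomed = set(worst_first[:k])
        survivors = [s for i, s in enumerate(survivors) if i not in doomed]
    return survivors, passes

if __name__ == "__main__":
    pop = list(range(200))
    for batch in (1, 5):
        kept, passes = truncate(pop, 100, toy_fitness, batch=batch)
        print(f"batch={batch}: {passes} evaluation passes, {len(kept)} survivors")
```

Since fitness depends on the current population, batch removal can keep a slightly different set of survivors than one-at-a-time removal. Whether that matters for the final results is exactly what needs to be measured.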

Tuesday, September 18, 2012

Metrics of Interest: Keeping Players in the Game


Period: A period of time (i.e. 1 day, 1 week, 1 month)

Retention: How many players are still playing in period i+1

Churn: 1 - Retention

Stickiness: [Users per Period] / [Users per next Largest Period].  i.e. DAU/MAU (daily vs monthly)

Viral Rate: New Unique Users per period / Total Users per period

Research Objectives:
* Maximize Retention
* Minimize Churn
* Maximize Stickiness (shoot for 15-20%)
* For every 1% increase in Churn, want a 2.3% increase in Viral Rate (reference)
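The metric definitions above can be sketched in a few lines of Python. Assumptions: each period is represented as a set of user ids, and Retention is taken as a fraction of the earlier period's players (it has to be a fraction for Churn = 1 - Retention to make sense). The function names and the sample ids are made up.

```python
def retention(current, following):
    # Fraction of this period's players still playing next period.
    if not current:
        return 0.0
    return len(current & following) / len(current)

def churn(current, following):
    # Churn = 1 - Retention.
    return 1.0 - retention(current, following)

def stickiness(users_per_period, users_per_larger_period):
    # E.g. DAU / MAU: daily active users over monthly active users.
    return users_per_period / users_per_larger_period

def viral_rate(period_users, seen_before):
    # New unique users this period over total users this period.
    if not period_users:
        return 0.0
    return len(period_users - seen_before) / len(period_users)

if __name__ == "__main__":
    week1 = {"ann", "bob", "cat", "dan"}
    week2 = {"ann", "bob", "eve"}
    print("retention:", retention(week1, week2))
    print("churn:", churn(week1, week2))
    print("viral rate:", viral_rate(week2, week1))
    print("stickiness:", stickiness(150, 1000))
```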


--------------------------------------------
pom2 charts: http://i.imgur.com/LQxLw.png

Our Guest from China: Zhimin He



Predicting Scientific Success

The following is bogus and stupid but oh my it will be visited by 1,000,000 deans:
http://klab.smpp.northwestern.edu/h-index.html


talks for promise

Top two talks: does size matter and learning to change projects.

Monday, September 17, 2012

Aptamer Deadend Estimation


Cluster A3 doubles from the initial round to round 3, indicating that the cluster is a likely candidate for Atrazine aptamers.





Where the clusters contain a combination of the following attributes:


The cluster A3 has all of the attributes in the A cluster as 1 for its centroid.



The concentration of cluster A3 in subsequent rounds varies in a manner that suggests that the system has reached a steady state.  The concentration in round 3 is a reasonable estimate for the final value reached in round 12.


If the concentration of cluster A3 at round 3 is used as an estimate for the concentration of the target cluster, then the non-target clusters, i.e. deadends, are those illustrated by the grey section of the chart.



The Bromacil experiment shows a similar trend, with an increase of 1 1/2 from the initial round to round 3.


The experiment is once again dominated by cluster A3.



Tuesday, September 11, 2012

The Cloud is Large

pom2:

http://i.imgur.com/IMzQj.png

games: 

http://aimazed2d-web-dev-env.elasticbeanstalk.com/

cloud pricing:


EC2 Service for Web Hosting a container for Game Applications

Cliff notes:
* ~ 10 cents per HOUR of CPU
* ~ 12 cents per GB of data transferred

RDS Service for Relational Database Storage

Cliff notes:
* ~ 12 cents per GB of storage

Typically, standard projects can expect to pay ~$72/month.
Several high-end gaming companies use Amazon Web Services: http://aws.amazon.com/solutions/case-studies/
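A back-of-the-envelope check on that monthly figure, using the cliff-notes rates above. Assumptions: one instance left running around the clock, a 30-day month, and no data transfer or storage unless you pass them in. `monthly_cost` is a made-up helper, and the rates are the approximate 2012 numbers quoted here, not current pricing.

```python
CPU_DOLLARS_PER_HOUR = 0.10     # ~10 cents per hour of CPU (EC2)
TRANSFER_DOLLARS_PER_GB = 0.12  # ~12 cents per GB transferred (EC2)
STORAGE_DOLLARS_PER_GB = 0.12   # ~12 cents per GB of storage (RDS)

def monthly_cost(cpu_hours, transfer_gb=0.0, storage_gb=0.0):
    # Simple linear estimate in dollars; ignores tiers, free
    # allowances, and instance types.
    total = (cpu_hours * CPU_DOLLARS_PER_HOUR
             + transfer_gb * TRANSFER_DOLLARS_PER_GB
             + storage_gb * STORAGE_DOLLARS_PER_GB)
    return round(total, 2)

if __name__ == "__main__":
    always_on_hours = 24 * 30  # one instance, 30-day month
    print("CPU only:", monthly_cost(always_on_hours))
    print("plus 50 GB transfer and 10 GB storage:",
          monthly_cost(always_on_hours, transfer_gb=50, storage_gb=10))
```

At 10 cents an hour, 720 hours of CPU alone comes to about $72, so the monthly figure looks like an always-on instance before any transfer or storage charges.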

Joe Ingram's profiling

The two cases below are not exactly comparable, but the results are intuitive.

Original IBEA with FM43


IBEA with "good mutation"...


Mobile Phone Feature Model for Z3 (in Python)


from z3 import *

# All features are declared as Boolean variables
Mobile_Phone = Bool('Mobile_Phone')
Calls = Bool('Calls')
Screen = Bool('Screen')
GPS = Bool('GPS')
Media = Bool('Media')
Basic = Bool('Basic')
Color = Bool('Color')
High_res = Bool('High_res')
Camera = Bool('Camera')
MP3 = Bool('MP3')

s = Solver()

# Mandatory features; double implication
s.add(Mobile_Phone == Calls)
s.add(Mobile_Phone == Screen)

# Optional features; child implies parent
s.add(Implies(GPS, Mobile_Phone))
s.add(Implies(Media, Mobile_Phone))
s.add(Implies(Basic, Screen))
s.add(Implies(Color, Screen))
s.add(Implies(High_res, Screen))
s.add(Implies(Camera, Media))
s.add(Implies(MP3, Media))

# [1,1] group: exactly one of Basic, Color, High_res
s.add(Or(And(Basic, Not(Color), Not(High_res)),
         And(Not(Basic), Color, Not(High_res)),
         And(Not(Basic), Not(Color), High_res)))

# [1,*] group: at least one of Camera, MP3
s.add(Or(Camera, MP3))

# print() calls so this also runs on Python 3;
# only ask for a model when the constraints are satisfiable
result = s.check()
print(result)
if result == sat:
    print(s.model())
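As a sanity check on the encoding, the same constraints can be brute-forced in plain Python without Z3 (there are only 2^10 assignments). This is a sketch for cross-checking, not a replacement for the solver. The helper names `implies` and `is_valid` are made up.

```python
from itertools import product

FEATURES = ["Mobile_Phone", "Calls", "Screen", "GPS", "Media",
            "Basic", "Color", "High_res", "Camera", "MP3"]

def implies(a, b):
    return (not a) or b

def is_valid(c):
    # Same constraints as the Z3 model, in the same order.
    return all([
        c["Mobile_Phone"] == c["Calls"],               # mandatory
        c["Mobile_Phone"] == c["Screen"],              # mandatory
        implies(c["GPS"], c["Mobile_Phone"]),          # optional
        implies(c["Media"], c["Mobile_Phone"]),
        implies(c["Basic"], c["Screen"]),
        implies(c["Color"], c["Screen"]),
        implies(c["High_res"], c["Screen"]),
        implies(c["Camera"], c["Media"]),
        implies(c["MP3"], c["Media"]),
        c["Basic"] + c["Color"] + c["High_res"] == 1,  # [1,1] group
        c["Camera"] or c["MP3"],                       # [1,*] group
    ])

# Try every true/false assignment and keep the valid ones.
valid_products = [
    c for c in (dict(zip(FEATURES, bits))
                for bits in product([False, True], repeat=len(FEATURES)))
    if is_valid(c)
]

if __name__ == "__main__":
    print(len(valid_products), "valid products")
    print("Media on in every product:",
          all(c["Media"] for c in valid_products))
```

One thing the enumeration makes visible: because the [1,*] group is asserted unconditionally, Media ends up selected in every valid product. If Media is meant to be truly optional, tying the group to its parent (Media holds exactly when Camera or MP3 holds) would be one way to express that. That is a suggestion, not something the model above does.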

NEMS Graph

Selex Experiment Questions

Questions for Drs. Sooter and Gannett

Sunday, September 9, 2012

Predicting the Future of Predictive Modelling

by Tim Menzies

A discussion paper for AISE'12.

It is now well established that predictive models can be generated from the artifacts of software projects. So it is time to ask “what’s next?”.

 I suggest that predictive modelling tools can and should be refactored to address the near-term issue of decision systems and the long-term goal of social reasoning.

To download, right click and save to disk.