Tuesday, August 30, 2011

Results on CITRE from last semester

5 "bands" further examination

ReviewDocs2 = 250
ReviewDocs2 = 275
ReviewDocs2 = 300
ReviewDocs2 = 325
ReviewDocs2 = 350

Density Distribution Map

Here is the density distribution map that I came up with for the China Data set. The larger the density, the darker the shade of gray. The pure white spots are where there are no clusters present.
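The shading rule described above can be sketched in a few lines. This is only an illustration, not the code that produced the map: the function names, the linear scaling against the densest cell, and the 8-bit gray range are all assumptions.

```python
def density_to_gray(density, max_density):
    """Map a cell density to an 8-bit gray level (255 = white, 0 = black).

    Assumption: shade scales linearly with density relative to the densest cell.
    """
    if density <= 0 or max_density <= 0:
        return 255  # no clusters present: pure white
    frac = min(density / max_density, 1.0)
    return int(255 * (1.0 - frac))  # larger density -> darker gray


def shade_grid(grid):
    """Convert a 2D list of densities into a 2D list of gray levels."""
    max_density = max(max(row) for row in grid)
    return [[density_to_gray(d, max_density) for d in row] for row in grid]


if __name__ == "__main__":
    # Hypothetical densities, not taken from the China data set.
    for row in shade_grid([[0, 2], [4, 1]]):
        print(row)
```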

Monday, August 29, 2011

Which2 Multidimensional optimizer

Immediate results using the given multi-dimensional functions are poor, but promising

Fonseca data

All of our rules with 2-bin discretization using Which were in the tiny green square. However, the goal with Fonseca is to minimize, so being in the top right corner is very bad. By comparison, with 8-bin, our rules were mostly in the top right blue square, but we had one rule with coordinates f1=0.2497 f2=0.9575.
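For reference, here is a minimal sketch of the two Fonseca objectives, assuming the standard Fonseca-Fleming definition is the one used here (the post does not spell it out). Both objectives lie in [0, 1) and both are minimized, which is why the top right corner is the bad one.

```python
import math


def fonseca(x):
    """Return (f1, f2) for the Fonseca-Fleming problem; both are minimized.

    Assumption: the standard formulation, usually with each x_i in [-4, 4].
    """
    n = len(x)
    shift = 1.0 / math.sqrt(n)
    f1 = 1.0 - math.exp(-sum((xi - shift) ** 2 for xi in x))
    f2 = 1.0 - math.exp(-sum((xi + shift) ** 2 for xi in x))
    return f1, f2


if __name__ == "__main__":
    print(fonseca([0.0, 0.0, 0.0]))  # a compromise between the two objectives
    s = 1.0 / math.sqrt(3)
    print(fonseca([s, s, s]))        # best possible f1, poor f2
```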

In Kursawe

Our rules were all in the mass in the center left when I chose maximize as the optimization goal. With 8-bin, the rules were more spread out than with 2-bin.

This is because 8-bin allows more detail than 2-bin. However, I posit that once I am able to recurse this process, applying the constraints of the rules, 2-bin will be better overall.
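To make the difference in detail concrete, here is a small sketch of 2-bin versus 8-bin discretization. It assumes equal-width bins over a known range; the post does not say which discretization scheme was used, and the function name is hypothetical.

```python
def equal_width_bin(value, lo, hi, bins):
    """Return the index (0 .. bins-1) of the equal-width bin holding value.

    Assumption: equal-width binning over the closed range [lo, hi].
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    if value <= lo:
        return 0
    if value >= hi:
        return bins - 1
    return int((value - lo) / (hi - lo) * bins)


if __name__ == "__main__":
    # Two inputs that 2-bin cannot tell apart but 8-bin can.
    for v in (0.3, 3.9):
        print(v, equal_width_bin(v, -4, 4, 2), equal_width_bin(v, -4, 4, 8))
```

With 2 bins a rule can only say "lower half" or "upper half" of an input range, while 8 bins can single out a much narrower slice.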

Further exploring these rules (by applying them as new constraints on the randomized input vectors in the database) will involve a massive recoding. Having applied some of the generated rules manually as constraints, I found that the results improve in round 2 (treating this as round 1). However, you cannot simply pick one rule to explore. Basically, your unconstrained start point is the root of an infinite tree. The branches from each node are the rules generated by a run of Which that uses, as constraints on the input data, the rules of that node and of all its ancestors. The rules can then be mapped to coordinates in the space of (f1, f2, ..., fn). Ideally, these rules will approach the Pareto frontier.
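Once rules are mapped to coordinates, checking how close they are to the Pareto frontier starts with finding which of them are non-dominated. The sketch below assumes every objective is minimized; the function names are hypothetical, and apart from the one rule reported above, the points are made up for illustration.

```python
def dominates(a, b):
    """True if point a is no worse than b everywhere and strictly better once.

    Assumption: every objective is being minimized.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    rules = [
        (0.2497, 0.9575),  # the 8-bin rule reported for Fonseca
        (0.90, 0.90),      # hypothetical
        (0.95, 0.99),      # hypothetical
        (0.50, 0.50),      # hypothetical
    ]
    print(pareto_front(rules))
```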

Which is running through the data very quickly, but until I have further results, which will take a massive reworking of code, I can't say anything definitive about its long-term usefulness just yet.

Tuesday, August 23, 2011

Privacy Algorithms

4 privacy algorithms tested against 4 learners (random forests, naive Bayes, k-nearest neighbour and logistic regression).


what i did on my holidays

think before you report

fayola eg1

fayola eg2


This summer I worked on creating a stronger version of idea in Lisp. The first piece of code was fastmap, the 2D point generator. Then, instead of storing the data in a tree structure, it was stored in a grid structure. The size and number of quadrants was determined by the square root of the amount of data divided by two. When looking to cluster the quadrants, the gridclus function could look in all directions around a quadrant to find its closest neighbors. Then the neighbors of the neighbors were also searched for an acceptance rate of 0.5. Gaps was then coded to compare clusters and locate the closest neighbor feared by the cluster being looked at. Finally, Keys was created to look for the best treatment. It used (b/B)^2/((b/B) + (r/R)) to determine the best rule overall. b is the frequency with which the rule appears in the 20% best, while r is the frequency with which the rule appears in the 80% rest. While all of these functions were coded up individually, they are not yielding the correct results.
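The Keys scoring formula can be sketched as follows. This is a Python illustration rather than the Lisp code itself; it assumes that B and R are the number of rows in "best" and "rest" respectively (the post does not define them), and the function name is hypothetical.

```python
def keys_score(b, B, r, R):
    """Score a rule with (b/B)^2 / ((b/B) + (r/R)).

    b: how often the rule appears in the 20% best
    r: how often the rule appears in the 80% rest
    B, R: assumed here to be the sizes of best and rest
    """
    pb = b / B
    pr = r / R
    if pb + pr == 0:
        return 0.0  # the rule never appears at all
    return pb ** 2 / (pb + pr)


if __name__ == "__main__":
    # Hypothetical counts: the rule is common in best and rare in rest.
    print(keys_score(b=8, B=10, r=4, R=40))
    # Equally common in both: the score is halved relative to b/B.
    print(keys_score(b=8, B=10, r=32, R=40))
```

Squaring b/B rewards rules that are frequent in best, while the r/R term in the denominator penalizes rules that are just as frequent in rest.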

As seen from the image, the grid still contains too many clusters. On closer inspection, it can be seen that several of these clusters should be merged into one. The gridclus error is the start of the problems with the different functions interacting.

This example is from the velocity 1.6 data.

There are 30 clusters, with the largest cluster containing 4 quadrants.
An example of the printout when keys is run on this data follows:

(TREATMENT #((0.0 $MOA)))
(TREATMENT #((0.0 $NOC)))
(TREATMENT #((0.0 $LCOM)))

Monday, August 22, 2011

Lua Games

Over the summer, I developed three games in Lua. The game concepts come from the book Land of Lisp.

Attack of the Robots
- Avoid robots, and as they chase you down, get them to run into each other

Grand Theft Wumpus
- The Wumpus is hiding in Congestion City, but where?
- Use clues to track down his blood trail, and avoid Glow-worm gangs and Police
- Fire your one and only shot at the Wumpus if you think you've got his location tracked down

- Watch as animals reproduce while eating and expanding across the game world
- Notice how most animals will remain in the lush jungle; very few wander out across the steppes.

The graphics engine chosen for development was Lua LOVE: http://love2d.org/

Dungeon Gen & Explorer

The dungeon is built using an algorithm by Jamis Buck: http://blog.kromatyk.fr/wp-content/random-dungeon-design.pdf.

The agent doing the exploring is what is interesting in the Dungeon Project - I developed an algorithm on my own in which the agent discovers and uses pathfinding to track down "darkways" - points of interest which the agent wants to go explore. The algorithm marks down all darkways, and chooses the closest one to go explore.

This is a human-like way of exploring an unknown dungeon, and it is human-like because the agent has been restricted in what it knows about the dungeon. For instance, the agent only knows what has been revealed into vision, and its pathfinding includes only visible regions.
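The exploration step can be sketched as a breadth-first search that is only allowed to walk through cells the agent has already seen. This is a Python illustration under assumptions, not the Lua code from the project: the grid representation, the 4-way movement, the function name, and the use of path length as the meaning of "closest" are all hypothetical.

```python
from collections import deque


def path_to_nearest_darkway(start, visible_floor, darkways):
    """Return the shortest path from start to the closest darkway, or None.

    visible_floor: set of (x, y) floor cells already revealed to the agent
    darkways: set of (x, y) points of interest; must lie inside visible_floor
    Assumption: 4-way movement, and "closest" means fewest steps.
    """
    queue = deque([start])
    came_from = {start: None}
    while queue:
        cell = queue.popleft()
        if cell in darkways:
            path = []
            while cell is not None:  # walk back to the start
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            # Unseen cells are never entered: the agent plans only with what it knows.
            if nxt in visible_floor and nxt not in came_from:
                came_from[nxt] = cell
                queue.append(nxt)
    return None  # no reachable darkway left


if __name__ == "__main__":
    seen = {(0, 0), (1, 0), (2, 0), (0, 1), (0, 2), (0, 3)}
    print(path_to_nearest_darkway((0, 0), seen, {(2, 0), (0, 3)}))
```

Because breadth-first search expands cells in order of distance, the first darkway it reaches is the nearest one by walking distance rather than by straight-line distance.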

Left: Regions the agent has explored. Right: The entire dungeon; explored.