Monday, March 15, 2010

Active Machine Learning

Unlabeled data is easy to come by and labeling that data can be a tedious task. Imagine that you've been tasked with gathering sound bytes. All you have to do is walk around with a microphone and you've got your data. Once you have the data you have to label each individual sound byte and catalogue it. Obviously, combing and labeling all the data wouldn't be fun -- regardless of the domain.

Active machine learning is a supervised learning technique whose goal is to produce adequately labeled (classified) data with as little human interference as possible. The active learning process takes in a small chunk of data which has already been assigned a classification by a human (oracle) with extensive domain knowledge. The learner then uses that data to create a classifier and applies it to a larger set of unlabeled data. The entire learning process aims at keeping human annotation to a minimum -- only referring to the oracle when the cost of querying the data is high.

Active learning typically allows for monitoring of the learning process and offers the human expert the ability to halt learning. If the classification error grows beyond a heuristic the oracle can pull the plug on the learner and attempt to rectify the problem...

For a less high level view of Active Machine Learning see the following literature survey on the topic.

No comments:

Post a Comment