Active learning is a form of semi-supervised machine learning in which the algorithm chooses which data it wants to learn from. With this approach, the program can actively query an authority source, either the programmer or a labeled dataset, to learn the correct label for a given instance.
The goal of this iterative approach is to speed up learning, especially when you lack the large labeled dataset that traditional supervised learning requires.
One of the most popular applications of active learning is natural language processing, a famously labeling-intensive field. The method can produce results comparable to supervised learning with a fraction of the human involvement.
How does Active Learning Work in Practice?
While there are many specific query strategies, such as least confidence, margin sampling, and entropy sampling, there are just three broad scenarios in which an active learner queries for the proper labels of data.
- Membership Query Synthesis: Here the learner generates its own synthetic instance, rather than drawing one from the underlying natural distribution, and asks the teacher to label it. For example, if the dataset consists of pictures of humans and animals, the learner could send a clipped image of a leg to the teacher and ask whether the appendage belongs to an animal or a human. This is particularly useful if your dataset is small.
- Stream-Based Selective Sampling: Here, each unlabeled data point is examined one at a time, with the machine evaluating its informativeness against the query parameters. For each data point, the learner decides for itself whether to assign a label or query the teacher.
- Pool-Based Sampling: In this scenario, instances are drawn from the entire data pool and assigned an informativeness score, a measurement of how well the learner “understands” the data. The system then selects the most informative instances and queries the teacher for their labels.
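The query strategies named above (least confidence, margin sampling, entropy sampling) can be sketched as scoring functions over a model's predicted class probabilities, with pool-based sampling then querying the highest-scoring instance. This is a minimal illustration; the function names and toy probability vectors are assumptions, not from any particular library.

```python
import numpy as np

def least_confidence(probs):
    # Higher score = the model is less confident in its top prediction
    return 1.0 - probs.max(axis=1)

def margin_score(probs):
    # A small gap between the top-two class probabilities = high uncertainty
    sorted_p = np.sort(probs, axis=1)
    return 1.0 - (sorted_p[:, -1] - sorted_p[:, -2])

def entropy_score(probs):
    # Shannon entropy of the predictive distribution (epsilon avoids log(0))
    return -(probs * np.log(probs + 1e-12)).sum(axis=1)

# Toy predictive distributions for three unlabeled items in the pool
probs = np.array([
    [0.90, 0.05, 0.05],  # confident prediction
    [0.40, 0.35, 0.25],  # uncertain
    [0.34, 0.33, 0.33],  # near-uniform: most informative
])

# Pool-based sampling: query the teacher for the most informative instance
query_idx = int(np.argmax(entropy_score(probs)))
```

All three scores agree here that the near-uniform third item is the best candidate to send to the teacher; in practice the strategies can disagree, which is why all three remain in common use.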
Minimum Marginal Hyperplane
Some active learning algorithms are built upon support-vector machines (SVMs) and exploit the structure of the SVM to determine which data points to label. Such methods usually calculate the margin, W, of each unlabeled datum in the unlabeled set T_U,i and treat W as the n-dimensional distance from that datum to the separating hyperplane.
Minimum Marginal Hyperplane methods assume that the data with the smallest W are those the SVM is most uncertain about, and that they should therefore be placed in the query set T_C,i to be labeled. Other, similar methods, such as Maximum Marginal Hyperplane, choose the data with the largest W. Tradeoff methods choose a mix of the smallest and largest Ws.
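For a linear SVM with weight vector w and bias b, the margin W of a point is its distance to the hyperplane w·x + b = 0, and Minimum Marginal Hyperplane simply queries the points with the smallest W. A minimal NumPy sketch, assuming the hyperplane is already trained (all variable names and the toy data are illustrative):

```python
import numpy as np

def hyperplane_distance(X, w, b):
    # Distance from each point to the separating hyperplane w.x + b = 0
    return np.abs(X @ w + b) / np.linalg.norm(w)

def min_marginal_query(X_unlabeled, w, b, n_queries=2):
    # Select the n_queries points with the smallest margin W,
    # i.e., the points the SVM is most uncertain about
    W = hyperplane_distance(X_unlabeled, w, b)
    return np.argsort(W)[:n_queries]

# Toy 2-D unlabeled pool and hyperplane x1 + x2 - 1 = 0
w, b = np.array([1.0, 1.0]), -1.0
X_pool = np.array([
    [0.5, 0.5],  # on the hyperplane: W = 0
    [2.0, 2.0],  # far from the boundary
    [0.6, 0.5],  # very close to the boundary
])
picked = min_marginal_query(X_pool, w, b, n_queries=2)
```

Maximum Marginal Hyperplane is the same selection with `np.argsort(W)[::-1]`, and a tradeoff method would take some indices from each end of the sorted order.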