
An Introduction to Active Learning

by Louis Bouchard, June 18th, 2023

Too Long; Didn't Read

Active learning aims to optimize the annotation of your dataset and train the best possible model using the least amount of training data. It's a supervised learning approach that involves an iterative process between your model's predictions and your data. By annotating fewer images overall, you save time and money while achieving an optimized model.

In today's world, powerful AI models like ChatGPT, as well as vision models and other similar technologies, are trained on enormous amounts of data. However, these models rely not just on the quantity of data, but also on its quality. Creating a good dataset quickly and at scale can be a challenging and costly task.


That's where active learning comes in.

In simple terms, active learning aims to optimize the annotation of your dataset and train the best possible model using the least amount of training data.


It's a supervised learning approach that involves an iterative process between your model's predictions and your data. Instead of waiting for a complete dataset, you can start with a small batch of curated annotated data and train your model with it.


Then, using active learning, you can use your model to predict labels for unseen data, evaluate how reliable those predictions are, and select the next set of data to annotate based on acquisition functions.
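To make this loop concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the "model" is a toy nearest-centroid classifier on one-dimensional points, the acquisition function is a simple margin score, and `oracle` stands in for the human annotator. All of these names are hypothetical and are not taken from any particular library.

```python
def train_centroids(labeled):
    """Fit a toy nearest-centroid 'model': one mean value per class."""
    sums, counts = {}, {}
    for x, y in labeled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def margin(model, x):
    """Gap between the distances to the two closest centroids.

    A small gap means the model can barely decide between two classes.
    Assumes the model has seen at least two classes.
    """
    distances = sorted(abs(x - c) for c in model.values())
    return distances[1] - distances[0]


def active_learning_loop(seed, pool, oracle, rounds=3, batch_size=2):
    """Train, score the unlabeled pool, annotate the most uncertain batch, repeat."""
    labeled, pool = list(seed), list(pool)
    for _ in range(rounds):
        model = train_centroids(labeled)
        # Acquisition function (assumption): smallest margin first.
        pool.sort(key=lambda x: margin(model, x))
        batch, pool = pool[:batch_size], pool[batch_size:]
        # The oracle plays the role of the human annotator.
        labeled += [(x, oracle(x)) for x in batch]
    return train_centroids(labeled), labeled, pool


# Hypothetical toy problem: points above 5.0 belong to class 1.
oracle = lambda x: int(x > 5.0)
seed = [(1.0, 0), (9.0, 1)]
pool = [0.5, 2.0, 4.0, 4.8, 5.2, 6.0, 8.0, 9.5]

model, labeled, remaining = active_learning_loop(seed, pool, oracle, rounds=1)
print(sorted(x for x, _ in labeled))
```

In this toy run, the first round asks for labels on the two points closest to the decision boundary instead of the ones the model already classifies easily. In a real project, the model, the acquisition function, and the annotation step would each be far more involved.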


One advantage of active learning is that you can analyze the confidence level of your model's predictions.


If a prediction has low confidence, the model will request additional images of that type to be labeled. On the other hand, predictions with high confidence won't require more data. By annotating fewer images overall, you save time and money while achieving an optimized model. Active learning is a highly promising approach for working with large-scale datasets.


Representation of active learning. Image from Kumar et al.



There are a few key points to remember about active learning.

First, it involves human annotation, giving you control over the quality of your model's predictions. It's not a black box trained on millions of images. You actively participate in its development and assist in improving its performance. This aspect makes active learning important and interesting, even though it may increase costs compared to unsupervised approaches. However, the time saved in training and deploying the model often outweighs these costs.


Additionally, you can use automatic annotation tools and manually correct their output, further reducing expenses.


In active learning, you have a labeled set of data that your model is trained on, and an unlabeled set containing candidate data that hasn't been annotated yet. A crucial concept is the query strategy, which determines which data to label next. There are various approaches to finding the most informative subsets in the large pool of unlabeled data. For example, uncertainty sampling involves running your model on the unlabeled data and selecting the least confidently classified examples for annotation.
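As a rough sketch of uncertainty sampling, the snippet below uses the "least confidence" score: for each unlabeled sample, look at the probability of the model's top class and pick the samples where it is lowest. The function name `least_confident` and the probability values are made up for illustration; in practice the probabilities would come from your own model's output.

```python
def least_confident(probabilities, k):
    """Return the indices of the k samples whose top class probability is lowest."""
    confidence = [max(p) for p in probabilities]
    ranked = sorted(range(len(probabilities)), key=lambda i: confidence[i])
    return ranked[:k]


# Hypothetical class probabilities for four unlabeled images (one row per image).
probs = [
    [0.98, 0.01, 0.01],  # very confident
    [0.40, 0.35, 0.25],  # unsure between all three classes
    [0.55, 0.44, 0.01],  # unsure between two classes
    [0.70, 0.20, 0.10],
]

to_annotate = least_confident(probs, k=2)
print(to_annotate)
```

Least confidence is only one possible score. Margin-based or entropy-based scores follow the same pattern: compute an uncertainty value per sample, then send the most uncertain ones to annotation.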


Representation of active learning with the Query by Committee approach. Image from Kumar et al.



Another technique in active learning is Query by Committee (QBC), where multiple models, each trained on a different subset of the labeled data, form a committee. These models have distinct perspectives on the classification problem, just as people with different experiences have varying understandings of certain concepts. The data to be annotated is selected based on the disagreement among the committee models: the more they disagree on a sample, the harder and more informative that sample is likely to be. The process then repeats, with each newly annotated batch added to the labeled set and the committee retrained.
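One common way to measure committee disagreement is vote entropy, sketched below. This is a simplified illustration that assumes each committee member has already predicted a label for every unlabeled sample; the helper names and the example votes are hypothetical.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Entropy of the committee's votes for one sample (0 means full agreement)."""
    total = len(votes)
    return -sum((c / total) * math.log(c / total) for c in Counter(votes).values())


def select_by_disagreement(committee_votes, k):
    """Return the indices of the k samples the committee disagrees on most."""
    scores = [vote_entropy(v) for v in committee_votes]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Hypothetical votes from a three-model committee on four unlabeled images.
committee_votes = [
    ["cat", "cat", "cat"],     # full agreement
    ["cat", "dog", "bird"],    # everyone disagrees
    ["dog", "dog", "cat"],     # partial disagreement
    ["bird", "bird", "bird"],  # full agreement
]

selected = select_by_disagreement(committee_votes, k=2)
print(selected)
```

Samples on which the committee fully agrees score zero and stay in the unlabeled pool, while the contested ones are sent to a human annotator.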


This is just a basic explanation of active learning, showcasing a couple of example query strategies.

If you're interested, I can provide more information or videos on other machine learning strategies. A real-life example of active learning is when you answer captchas on Google. By doing so, you help them identify complex images and build datasets with the collective input of multiple users, ensuring both dataset quality and human verification. So, the next time you encounter a captcha, remember that you're contributing to the progress of AI models!


To learn more and see a practical example using an excellent tool developed by my friends at Encord, check out the video: