Active Learning with Uncertain Annotators
Towards Dedicated Collaborative Interactive Learning
- 160 pages
- 6 hours of reading time
In the digital age, data collection is crucial for many applications, particularly for classification algorithms that predict class labels for samples. Because these algorithms require labeled instances for training, active learning has emerged: a machine learning paradigm in which a model is trained in a supervised manner on a dataset containing only a limited number of labeled samples. The active learner queries an annotator, often referred to as an oracle, for the labels of unlabeled samples, aiming to maximize a task performance metric such as classification accuracy while minimizing the number of queries. Many strategies, however, assume an omniscient annotator who always provides accurate labels. This assumption is unrealistic, since human annotators are error-prone, and it is frequently violated in real-world scenarios involving multiple annotators.

The text discusses dedicated collaborative interactive learning, focusing on the challenges posed by uncertain oracles and by multiple uncertain oracles. It reviews the current state of active learning and, because publicly available datasets reflecting annotator confidence are scarce, introduces methods for simulating uncertain annotators. A novel approach is proposed that transforms annotator confidence into gradual labels; it is evaluated in a case study with 30,000 handwritten images. In addition, meritocratic learning is introduced, which selects and weights annotators based on their quality to improve label accuracy.
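To make the described setting concrete, the following is a minimal, illustrative sketch of an active learning loop with simulated error-prone annotators. It is not the book's method: uncertainty sampling, the per-annotator error rates (`annotator_error`), the assumed-known quality weights (`annotator_quality`), and the quality-weighted aggregation that yields a soft, "gradual" label are all simplified assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic pool of samples with hidden ground truth (stand-in for a real dataset).
X, y_true = make_classification(n_samples=500, n_features=10, random_state=0)

# Simulated uncertain annotators: each flips the true label with some error rate.
annotator_error = np.array([0.05, 0.20, 0.35])    # hypothetical per-annotator noise
annotator_quality = 1.0 - annotator_error         # assumed known here, for illustration only

def query_annotators(idx):
    """Return one (possibly wrong) label per annotator for sample idx."""
    flips = rng.random(len(annotator_error)) < annotator_error
    return np.where(flips, 1 - y_true[idx], y_true[idx])

def aggregate(labels):
    """Quality-weighted vote producing a soft label in [0, 1], then a hard decision."""
    score = np.dot(annotator_quality, labels) / annotator_quality.sum()
    return int(score >= 0.5), score

# Seed the labeled set with a few random samples, then query by uncertainty sampling.
labeled = list(rng.choice(len(X), size=10, replace=False))
labels = {i: aggregate(query_annotators(i))[0] for i in labeled}

clf = LogisticRegression(max_iter=1000)
for _ in range(40):                                # query budget
    clf.fit(X[labeled], [labels[i] for i in labeled])
    proba = clf.predict_proba(X)[:, 1]
    margin = np.abs(proba - 0.5)
    margin[labeled] = np.inf                       # never re-query already labeled samples
    idx = int(np.argmin(margin))                   # most uncertain sample in the pool
    labels[idx], _ = aggregate(query_annotators(idx))
    labeled.append(idx)

print("accuracy on pool:", clf.score(X, y_true))
```

The weighting step hints at the meritocratic idea: annotators with higher assumed quality contribute more to the aggregated label, while the loop keeps the total number of queries small.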
