A Comprehensive Survey of Few-Shot Learning Techniques

In a world where new domains are constantly emerging and machine learning (ML) is increasingly used to automate new tasks, the limited availability of training samples poses a significant challenge. Traditional ML training relies heavily on large datasets, but acquiring a large dataset with sufficient usable samples is often difficult and time-consuming. Few-shot learning (FSL) emerges as a solution for this problem. FSL leverages prior knowledge to quickly generalize to new tasks, even when only a few labeled samples are available. This article presents a comprehensive survey of FSL, exploring its challenges, opportunities, and diverse applications across various domains.

The Essence of Few-Shot Learning

Few-shot learning is a critical tool in situations where only small datasets are initially available, and decisions must be made before larger datasets can be acquired. In such scenarios, a machine learning model is designed to train with a minimal dataset and then to make reasonably accurate predictions.

For instance, during a sudden outbreak like the COVID-19 pandemic, few-shot learning can be used to identify infected individuals quickly and help control the spread of the disease through human-to-human transmission, reducing the death toll and the associated economic loss. The COVID-19 pandemic, caused by the SARS-CoV-2 virus and beginning in late 2019, led to a global healthcare crisis as well as economic disruption and societal strain due to illness, lockdowns, and vaccine rollouts. By March 2020, an estimated 120,000 people had been infected and more than 4,000 had died, yet little of the data needed to develop traditional ML models was available.

Few-shot learning can also play a role in forecasting an economic recession. When a recession approaches, there are often early signs in areas such as the job and housing markets. However, these signs appear slowly at first, so only a small amount of data is available to evaluate whether a recession will occur. After the economic downturn that began in 2007, research on its impact on the elderly found that an estimated 42% of older adults struggled to buy essential items, leading them to reduce or skip meals.

Formal Definition of Few-Shot Learning

Few-shot learning (FSL) was proposed to tackle the problem of machine learning being hampered when the dataset is small. Using prior knowledge, FSL can rapidly generalize to new tasks containing only a few samples with supervised information. The core issue in FSL is that the empirical risk minimizer is unreliable. Based on how prior knowledge is used to handle this core issue, FSL methods can be categorized from three perspectives:


  • (i) Data: uses prior knowledge to augment the supervised experience
  • (ii) Model: uses prior knowledge to reduce the size of the hypothesis space
  • (iii) Algorithm: uses prior knowledge to alter the search for the best hypothesis in the given hypothesis space.
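FSL methods are typically trained and evaluated on episodes, the standard N-way K-shot task setup. The sketch below shows how one episode is sampled from a labeled dataset; the dataset and class names are hypothetical toy values, and real pipelines sample tensors rather than strings.

```python
import random

def sample_episode(dataset, n_way=5, k_shot=1, q_queries=2, seed=0):
    """Sample one N-way K-shot episode: a support set with k_shot labeled
    examples per class and a query set used to evaluate adaptation."""
    rng = random.Random(seed)
    classes = rng.sample(sorted(dataset), n_way)  # pick N novel classes
    support, query = [], []
    for label in classes:
        examples = rng.sample(dataset[label], k_shot + q_queries)
        support += [(x, label) for x in examples[:k_shot]]
        query += [(x, label) for x in examples[k_shot:]]
    return support, query

# Toy dataset: class name -> list of "examples"
data = {c: [f"{c}_{i}" for i in range(10)] for c in "abcdefg"}
support, query = sample_episode(data, n_way=5, k_shot=1, q_queries=2)
print(len(support), len(query))  # -> 5 10
```

Episodic sampling is what lets the prior knowledge (learned across many such tasks) transfer to a new task that supplies only the small support set.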

Few-Shot Learning in the Audio Domain

Few-shot learning in the audio domain takes on different forms with different methods. In some cases, few-shot sound detection is applied to classify unseen sounds using only a few labeled support samples to fine-tune the model. Another audio problem few-shot learning can address is multilingual speech emotion recognition, since large datasets often do not exist for less widely used languages. Few-shot learning has also been developed for extracting a target sound from a mixture of other sounds, such as picking out the violin from a concert recording. One noted issue in the audio domain is that higher-quality labels are more labor-intensive to create, making model development more challenging.

Several algorithm families serve these applications. Meta-learning methods, including model-agnostic meta-learning and meta-curvature, learn across tasks in order to handle new, similar ones. Transfer-learning algorithms are based on learning from previous models and adapting that knowledge to a new, smaller dataset. Dynamic few-shot learning algorithms round out the approaches commonly taken in the audio domain.

Model-Agnostic Meta-Learning

Meta-learning is an approach that was popularized for image-related tasks and is now being extended to audio tasks, including audio event recognition, text-to-speech, speaker recognition, speech recognition, and other areas related to speech processing. Meta-learning can produce models that adapt to new environments and learn new skills quickly from only a few training samples, making it well suited to few-shot tasks. The model accomplishes this by using multiple subtasks to learn a parameter initialization, so that fine-tuning from that initialization with only a few labels still performs well on the target tasks. Each approach pairs a task-independent encoding function with a task-specific classifier.

ProtoNet learns an embedding space in which classification is performed by matching queries to class prototypes; Ridge uses regularized linear regression to prevent overfitting; and MetaOptNet optimizes feature representations. The three models use nearest-neighbor matching, linear regression, and a linear SVM, respectively, as classifiers. For the SVM-based method, approximate gradients are computed via implicit differentiation to enable end-to-end training. Meta-learning methods typically have two stages: meta-train and meta-test. During the meta-train stage, the model is trained on base classes using an episodic training strategy; during the meta-test stage, the network is transferred to previously unseen classes. Because the transfer is task-agnostic, the model may not generate the most effective feature representations for distinguishing the novel classes in a specific task.
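The prototype idea behind ProtoNet can be sketched in a few lines: average the support embeddings of each class into a prototype, then assign a query to the class whose prototype is nearest. The 2-D embeddings and class names below are hypothetical toy values; a real ProtoNet computes the embeddings with a learned encoder.

```python
import math

def prototype(vectors):
    """Class prototype = mean of the support embeddings for that class."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify(query, prototypes):
    """Assign the query to the class with the nearest prototype."""
    return min(prototypes, key=lambda c: euclidean(query, prototypes[c]))

# Toy 2-way, 2-shot episode with hand-picked 2-D embeddings
support = {"dog": [[1.0, 1.0], [1.2, 0.8]],
           "cat": [[-1.0, -1.0], [-0.8, -1.2]]}
protos = {c: prototype(vs) for c, vs in support.items()}
print(classify([0.9, 1.1], protos))  # -> dog
```

Because the classifier is just a distance to prototypes, no weights need to be trained at meta-test time, which is what makes the approach attractive when only a few labeled samples exist.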

One of the most effective meta-learning methods observed is model-agnostic meta-learning (MAML), which is gradient-based. The MAML algorithm learns initialization parameters from the meta-training set so that the model can quickly adapt to new tasks after only a few steps of gradient descent. Applied to a multi-speaker TTS model, MAML enabled the resulting Meta-TTS model to generate speech with high speaker similarity from only a small number of samples.
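The MAML update can be illustrated on a deliberately tiny problem. Below, each task is a 1-D quadratic loss (theta - c)^2 with its own optimum c; the inner loop takes one gradient step per task, and the outer loop moves the shared initialization using the post-adaptation gradients. This is a hand-derived sketch for scalar parameters, not the full algorithm with neural networks.

```python
def maml_step(theta, tasks, alpha=0.1, beta=0.05):
    """One MAML meta-update on toy 1-D tasks with loss (theta - c)^2."""
    meta_grad = 0.0
    for c in tasks:  # each task is defined by its optimum c
        inner_grad = 2 * (theta - c)
        adapted = theta - alpha * inner_grad  # task-specific fine-tuning step
        # d/d(theta) of (adapted - c)^2, differentiated through the inner step
        meta_grad += 2 * (adapted - c) * (1 - 2 * alpha)
    return theta - beta * meta_grad / len(tasks)

theta = 0.0
for _ in range(200):
    theta = maml_step(theta, tasks=[-1.0, 3.0])
# The learned initialization settles between the task optima, so a single
# inner gradient step adapts it quickly to either task.
print(round(theta, 2))  # -> 1.0
```

The key point visible even in this toy: the outer gradient flows *through* the inner update (the `(1 - 2 * alpha)` factor), which is what distinguishes MAML from simply averaging task gradients.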


Gradient-based meta-learning refers to models that rapidly adapt to new tasks through gradient-based updates, usually via a bilevel optimization procedure. Research has shown that these approaches consistently outperform both metric-based and baseline methods. However, MAML has drawbacks: when the source tasks are too diverse, it has been noted to fail to generalize their common knowledge. In the DeepASC model, the team overcame this by incorporating both task-wide general knowledge and task-specific knowledge into the MAML-initialized parameters.

Transfer Learning

Another approach that can be applied to audio-based few-shot learning is transfer learning. Transfer learning takes advantage of the knowledge learned from other data or features to solve given tasks and to ensure that the model is not completely re-trained. When using a neural network, transferring pre-trained weights can significantly reduce the number of trainable parameters in the target-task model, thus enabling effective learning with a smaller dataset.
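The parameter-count argument above can be made concrete. The sketch below uses hypothetical layer names and parameter counts (loosely ResNet/AlexNet-scale numbers, not taken from any specific model) to show how freezing the pre-trained backbone shrinks the trainable portion to just the new classification head.

```python
# Hypothetical pre-trained backbone layers and their parameter counts
backbone = [("conv1", 9_408), ("conv2", 73_728), ("fc", 524_288)]
# New head replacing the source-task classifier for the target task
head = [("new_fc", 5_130)]

def trainable_params(layers, frozen):
    """Count parameters that will actually receive gradient updates."""
    return sum(count for name, count in layers if name not in frozen)

total = trainable_params(backbone + head, frozen=set())
transfer = trainable_params(backbone + head,
                            frozen={"conv1", "conv2", "fc"})
print(total, transfer)  # -> 612554 5130
```

Training roughly five thousand parameters instead of six hundred thousand is what allows a few labeled target samples to suffice without severe overfitting.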

Transfer learning consists of two tasks: a source task and a target task. The source task uses existing training data, of which there is an abundance. The target task has a limited amount of data, and any knowledge learned in the source task that may be useful is transferred to it. The transfer-learning approach has been used in constructing acoustic models for low-resource languages, adapting generative adversarial networks, and transferring knowledge from the visual to the audio domain. It has been noted that, while in visual few-shot learning it is standard to reuse networks pre-trained on large datasets, this is not yet the case in audio-based few-shot learning.

It has been noted that a basic transfer-learning approach becomes increasingly effective as the number of support samples grows. Wang et al. found that retraining a supervised model with additional novel examples yielded the highest mean average precision (mAP) on both base and novel classes, while also achieving an F-measure comparable to other methods, likely owing to the use of transfer learning. To address an emerging overfitting problem, a novel method using hierarchical long short-term memory (LSTM) neural networks has been proposed. This approach integrates the transfer-learning framework and accelerates the convergence of the task model; it also helps maintain sensitivity to the input data while minimizing overfitting when only a limited number of new task samples is available.

Another proposed solution for overfitting is transfer learning with deep networks. One deep convolutional neural network (CNN) from the image domain, AlexNet, has been applied with transfer learning to audio tasks: the proposed model is a CNN consisting of a pre-trained AlexNet whose preserved parameters serve as the initialization for training. The trained classification model accurately identified different whale sounds, with a low test time of 9.5 seconds and F1 scores ranging from 0.9533 to 0.9850.

Dynamic Few-Shot Learning

Dynamic few-shot learning (DFSL) was first proposed by Gidaris and Komodakis for few-shot class-incremental learning in the visual domain. Since then, the same approach has been applied in the audio domain by Wang et al. for few-shot continual learning. DFSL is well suited to few-shot problems because it adds an episodic training stage in which a few-shot weight generator is trained. The goal of DFSL is to learn new categories from only a few labeled points while retaining the base categories on which the model was initially trained.
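A simplified version of the DFSL weight generator can be sketched as follows: average the normalized support embeddings of a novel class into a classification weight, then score queries with cosine similarity against base-class and generated novel-class weights jointly. The 2-D embeddings and class names are hypothetical toy values, and this averaging generator is only the simplest variant described by Gidaris and Komodakis.

```python
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def generate_weight(support_embeddings):
    """Simplified few-shot weight generator: average the support
    embeddings of a novel class into a unit-norm classifier weight."""
    dim = len(support_embeddings[0])
    mean = [sum(v[i] for v in support_embeddings) / len(support_embeddings)
            for i in range(dim)]
    return normalize(mean)

def cosine_score(embedding, weight):
    e = normalize(embedding)
    return sum(a * b for a, b in zip(e, weight))

# One base-class weight learned in the initial stage, plus one weight
# generated on the fly for a novel class (toy 2-D embeddings).
weights = {"base_siren": normalize([1.0, 0.0]),
           "novel_dog_bark": generate_weight([[0.1, 1.0], [-0.1, 0.9]])}
query = [0.0, 1.0]
best = max(weights, key=lambda c: cosine_score(query, weights[c]))
print(best)  # -> novel_dog_bark
```

Because base and novel weights live in the same cosine-similarity classifier, the model can recognize the new category without forgetting the base categories, which is the class-incremental property DFSL targets.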


Few-Shot Learning Methods for Image Classification

In recent years, increasingly powerful computing devices have enabled deep learning (DL) to achieve success across many fields. That success also relies on large-scale datasets, which expose models to a wide variety of images; the rich information in these images helps a model learn about the various categories, improving its classification performance and generalization ability. In real application scenarios, however, many tasks make it difficult to collect enough images for model training, which restricts the performance of the trained model. How to train a high-performing model with limited samples therefore becomes the key question. Few-shot learning (FSL) addresses this problem by aiming to obtain a strong model from a small amount of data, making it valuable in real-world tasks where large training sets cannot be obtained.

This review mainly introduces DL-based FSL methods for image classification, divided into four categories: methods based on data augmentation, metric learning, meta-learning, and the addition of auxiliary tasks. First, we introduce some classic and advanced FSL methods category by category. Second, we introduce the datasets commonly used to benchmark FSL methods and report the performance of some classic and advanced methods on two common datasets.

Data Augmentation Methods

Data augmentation techniques aim to expand the training set by creating new samples from existing ones. These methods can be broadly categorized into:

  • Mixing-based methods: These techniques create new samples by mixing existing images or features. Examples include SuperMix and StyleMix.
  • Transformation-based methods: These methods apply various transformations to existing images, such as rotations, flips, and crops. AutoAugment is a popular technique that automatically learns augmentation strategies from data.
  • Generative methods: These approaches use generative models, such as Generative Adversarial Networks (GANs), to create synthetic images. CycleGAN is used for unpaired image-to-image translation.
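The mixing-based category can be illustrated with a minimal mixup-style sketch: blend two samples and their one-hot labels with a Beta-distributed weight. This is plain mixup as a stand-in for the family, not SuperMix or StyleMix themselves, and the 4-feature vectors below are toy values.

```python
import random

def mixup(x1, y1, x2, y2, alpha=0.4, rng=random.Random(0)):
    """Mixing-based augmentation: blend two samples and their one-hot
    labels with a random weight drawn from Beta(alpha, alpha)."""
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y

x, y = mixup([1.0, 0.0, 0.0, 0.0], [1.0, 0.0],
             [0.0, 0.0, 0.0, 1.0], [0.0, 1.0])
print(sum(y))  # mixed label weights still sum to ~1
```

Each call yields a new interpolated sample, so even a few originals can populate the space between classes and regularize the few-shot classifier.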

Metric Learning Methods

Metric learning aims to learn a distance metric that can effectively compare different images. These methods typically involve:

  • Learning embeddings: Mapping images to a lower-dimensional space where similar images are close together and dissimilar images are far apart.
  • Comparing embeddings: Using a distance function (e.g., Euclidean distance, cosine similarity) to compare the embeddings of different images. Relation Network is a popular metric learning method.

Meta-Learning Methods

Meta-learning, also known as "learning to learn," aims to train models that can quickly adapt to new tasks with limited data. Meta-learning approaches typically involve:

  • Training on a distribution of tasks: The model is trained on a set of tasks, each with its own training and test data.
  • Learning to initialize or adapt: The model learns to initialize its parameters or adapt its learning process to new tasks. Meta-Transfer Learning is a popular meta-learning method.

Applications of Few-Shot Learning

The ability to learn from limited data makes FSL valuable in various applications, including:

  • Medical imaging: FSL can be used to classify medical images, such as X-rays and MRIs, even when only a small number of labeled images are available.
  • Smart agriculture: FSL can be applied to tasks such as plant disease identification and crop monitoring.
  • Remaining useful life prediction: FSL can be used to predict the remaining useful life of equipment based on limited data.

tags: #few-shot-learning #survey #papers