Unveiling the Learning Dynamics of LLM Finetuning
Modern Large Language Models (LLMs) adapt readily to finetuning data, yet they often struggle when encountering unseen examples. To equip models with genuine reasoning capabilities rather than superficial pattern matching, it is essential to understand the learning dynamics of LLM finetuning and how these dynamics shape downstream generalization. This article surveys research on the learning dynamics of LLM finetuning, covering the influence of individual training examples, the "squeezing effect" in Direct Preference Optimization (DPO), and the role of pre-memorization train accuracy in predicting generalization.
Understanding Learning Dynamics in LLMs
Learning dynamics, in the context of LLMs, describe how learning specific training examples influences the model's predictions on other examples. Analyzing these dynamics provides a potent tool for deciphering the behavior of deep learning systems. One approach is a step-wise decomposition of how influence accumulates among different potential responses during various types of finetuning. This framework supports a uniform interpretation of training observations across both instruction tuning and preference tuning algorithms.
This analysis focuses on reasoning tasks, where the problem structure allows differentiation between memorization (exact replication of reasoning steps from training data) and performance (correctness of the final solution).
Key Components of Learning Dynamics Analysis
The change in model prediction can be formalized using a decomposition into three key terms, adaptable to finetuning algorithms like Supervised Finetuning (SFT) and Direct Preference Optimization (DPO). The decomposition has the shape $\mathcal{A}(x_o)\,\mathcal{K}(x_o, x_u)\,\mathcal{G}(x_u)$, where $\mathcal{K}$ is an empirical neural tangent kernel (eNTK) term measuring the similarity between the updating example $x_u$ and the observed example $x_o$. This framework describes how learning on example $x_u$ influences the model's confidence on $x_o$. It is a qualitative, fine-grained lens, offering complementary insights into how specific training instances shape the model's behavior.
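The decomposition can be made concrete in a setting where it holds exactly: for a linear softmax model, the eNTK term is just the inner product of the two inputs, and one SGD step on an update example $x_u$ changes the logits at an observed example $x_o$ by exactly $-\eta\,\mathcal{K}(x_o, x_u)\,\mathcal{G}(x_u)$ (the $\mathcal{A}$ term then maps logit changes to probability changes). A minimal sketch, with toy dimensions chosen for illustration:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
d, v = 4, 3                      # input dim, number of candidate responses
W = rng.normal(size=(v, d))      # toy "model": logits z = W @ x
x_u = rng.normal(size=d)         # update example (the one we train on)
x_o = rng.normal(size=d)         # observed example (the one we watch)
y_u = np.eye(v)[1]               # one-hot label for x_u
eta = 0.1

# One SGD step of cross-entropy loss on (x_u, y_u);
# the residual term plays the role of G(x_u)
G = softmax(W @ x_u) - y_u
W_new = W - eta * np.outer(G, x_u)

# Exact change in logits at x_o ...
delta = W_new @ x_o - W @ x_o
# ... equals -eta * K(x_o, x_u) * G(x_u), with K the (e)NTK = x_o . x_u
K = x_o @ x_u
assert np.allclose(delta, -eta * K * G)
```

The sign structure is the interesting part: if $x_o$ and $x_u$ are similar ($\mathcal{K} > 0$), training on $x_u$ pushes the prediction on $x_o$ in the same direction as the label $y_u$.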
The "Squeezing Effect" in Direct Preference Optimization (DPO)
A significant aspect of learning dynamics is the "squeezing effect" observed during Direct Preference Optimization (DPO). During DPO training, especially when the negative sample has very low probability, not only does the probability of the negative answer decrease as expected, but the probability of the positive answer can also decrease. The probability mass instead concentrates on outputs that were already very likely before DPO. This phenomenon occurs because negative gradients during preference tuning can reduce the confidence of most responses, especially when the model is already confident or finetuning is off-policy. This means that running DPO for too long can make even the desired outputs less likely.
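The logit-level mechanism behind the squeezing effect can be isolated in a toy example (this ignores the shared-parameter effects in a real model, so it is only a sketch). Taking a gradient step that pushes down $\log p(y^-)$ under a softmax adds $\eta\, p_k$ to every other logit $k$, so the already-most-likely token gains the most, and a moderately likely preferred answer can lose probability even though it was never pushed down directly:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy vocabulary: [already-likely, preferred y+, other, dispreferred y-]
z = np.array([5.0, 2.0, 1.0, -3.0])
p = softmax(z)                     # y- already has very low probability

# Negative-gradient step that only pushes DOWN log p(y-):
#   d/dz log p(y-) = onehot(y-) - p
eta = 2.0                          # large step to make the effect visible
z_new = z - eta * (np.eye(4)[3] - p)
p_new = softmax(z_new)

# Mass concentrates on the argmax token; the preferred answer y+
# loses probability alongside y-.
print(p.round(4), p_new.round(4))
```

Running this shows `p_new[0] > p[0]` (the dominant token absorbs the freed mass) while both `p_new[1] < p[1]` and `p_new[3] < p[3]`, matching the squeezing behavior described above.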
To mitigate this, it has been proposed to first run SFT that also includes the negative examples, and only then apply DPO.
Hallucinations and Finetuning
The framework of learning dynamics also provides insights into why certain types of hallucination are strengthened after finetuning. For example, the model might use phrases or facts in the response for question B to answer question A, or the model might keep repeating similar simple phrases when generating responses. Training on a target pair not only increases the probability of the correct answer, but also slightly increases the probability of similar responses, especially early in training. If a similar response is also the preferred answer for another input, that response gets reinforced twice. This creates a concrete mechanism for a common SFT failure mode: the model answers with something that is correct, but for the wrong question.
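The cross-question leakage can be sketched in the same linear-softmax toy setting as before (an illustrative assumption, not the paper's experimental setup): when question A is similar to question B, one SFT step on B's answer also raises that answer's probability for A.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(1)
d, v = 8, 5
W = rng.normal(scale=0.1, size=(v, d))

x_B = rng.normal(size=d)
x_A = x_B + 0.1 * rng.normal(size=d)   # question A resembles question B
r = 2                                  # index of B's ground-truth response

before = softmax(W @ x_A)[r]
# One SFT step on (x_B, r)
G = softmax(W @ x_B) - np.eye(v)[r]
W = W - 0.5 * np.outer(G, x_B)
after = softmax(W @ x_A)[r]

assert after > before   # B's answer also became likelier for question A
```

If B's answer is additionally the correct answer for some third question, it receives this side-channel boost on top of its own direct update, which is the "reinforced twice" mechanism described above.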
Pre-Memorization Train Accuracy as a Predictor of Generalization
A model’s performance on test prompts can be effectively characterized by a training metric called pre-memorization train accuracy: the accuracy of model samples on training queries before they begin to copy the exact reasoning steps from the training set. On the dataset level, this metric is able to almost perfectly predict test accuracy, achieving $R^2$ of $\geq 0.9$ across various models (Llama3 8B, Gemma2 9B), datasets (GSM8k, MATH), and training configurations. On a per-example level, this metric is also indicative of whether individual model predictions are robust to perturbations in the training query.
By connecting a model's learning dynamics to test performance, pre-memorization train accuracy can inform training decisions, such as the makeup of the training data. Experiments on data curation show that prioritizing examples with low pre-memorization accuracy leads to 1.5-2x improvements in data efficiency compared to i.i.d. sampling of the training data.
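One plausible operationalization of the per-example metric (the helper name and the log format are illustrative assumptions, not the paper's code): track, per epoch, whether a sampled answer is correct and whether it reproduces the training chain-of-thought verbatim, then average correctness over the epochs before memorization sets in.

```python
from statistics import mean

def pre_memorization_accuracy(records):
    """records: per-epoch list of (is_correct, matches_train_cot) for one
    training example. Returns mean correctness over the epochs *before*
    the model first reproduces the training reasoning verbatim."""
    pre = []
    for is_correct, memorized in records:
        if memorized:
            break
        pre.append(is_correct)
    return mean(pre) if pre else 0.0

# Example solved early without copying -> high pre-memorization accuracy
easy = [(1, False), (1, False), (1, True)]
# Example only "solved" once it memorizes -> low pre-memorization accuracy
hard = [(0, False), (0, False), (1, True)]

scores = {"easy": pre_memorization_accuracy(easy),
          "hard": pre_memorization_accuracy(hard)}
# Curation heuristic from the text: prioritize low-scoring examples
priority = sorted(scores, key=scores.get)
```

Under the data-curation result above, examples like `hard` (low score) would be prioritized, since the model only reaches them by memorizing rather than generalizing.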
Experimental Validation and Datasets
The framework is validated through experiments on both the MNIST dataset and LLM finetuning, demonstrating its ability to explain counter-intuitive phenomena like the confidence decay observed in DPO. The methodology is grounded in systematic experiments on datasets such as Anthropic-HH and UltraFeedback. The probing datasets ($D_{\text{prob}}$ and $D_{\text{prob}}^{\text{test}}$) allow the researchers to measure how the model's prediction changes in response to various stimulus inputs. Results from finetuning with SFT and DPO are compared rigorously, illustrating variations in influence on desired outputs.
Implications for Training Strategies
Understanding the learning dynamics of LLM finetuning has significant implications for developing effective training strategies. By understanding how models learn and generalize, researchers and practitioners can:
- Mitigate the "Squeezing Effect": Employ strategies such as initial SFT training with negative examples before DPO to prevent the reduction of probability in desired outputs.
- Reduce Hallucinations: Design training regimes that minimize the reinforcement of incorrect associations between questions and answers.
- Improve Data Efficiency: Prioritize training examples with low pre-memorization accuracy to enhance generalization and reduce the need for large datasets.

