pith. machine review for the scientific record.

arxiv: 2604.16332 · v1 · submitted 2026-03-12 · 💻 cs.LG · cs.CL

Recognition: no theorem link

Annotation Entropy Predicts Per-Example Learning Dynamics in LoRA Fine-Tuning


Pith reviewed 2026-05-15 12:30 UTC · model grok-4.3

classification 💻 cs.LG cs.CL
keywords annotation entropy · LoRA fine-tuning · learning dynamics · un-learning · SNLI · MNLI · ChaosNLI · per-example loss

The pith

LoRA fine-tuning produces rising loss on high-annotator-disagreement examples, unlike full fine-tuning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that examples carrying high annotation entropy exhibit increasing per-example loss during LoRA adaptation, a dynamic the authors interpret as un-learning. The pattern appears across six models and both SNLI and MNLI, yet remains largely absent when the same tasks are trained with full fine-tuning. Annotation entropy is computed from the 100-label distributions supplied by ChaosNLI and is shown to correlate positively with the area under each example's loss curve. The correlation survives partial-correlation controls and replicates across random seeds, pointing to a systematic difference in how LoRA versus full updates handle contested training signals. A preliminary noise-injection check yields results consistent with this entropy-driven account.

Core claim

LoRA fine-tuning exhibits un-learning on contested examples: items with high annotator disagreement show increasing loss during training, a qualitatively distinct pattern largely absent under full fine-tuning and consistent across all six models tested (four encoder, two decoder-only). This discovery emerges from correlating annotation entropy, computed from ChaosNLI's 100 labels per example, with per-example area under the loss curve (AULC) on SNLI and MNLI.
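The AULC statistic the claim rests on is just the integral of one example's loss trajectory over training. A minimal sketch, assuming trapezoidal integration over logged steps (the paper's exact integration limits and spacing are not specified here):

```python
import numpy as np

# np.trapz was renamed np.trapezoid in NumPy 2.0; support both.
_trapezoid = getattr(np, "trapezoid", getattr(np, "trapz", None))

def aulc(loss_curve, steps=None):
    """Area under one example's loss curve (AULC).

    Sketch: trapezoidal integration of raw per-example cross-entropy
    over logged training steps, with uniform spacing by default.
    """
    loss_curve = np.asarray(loss_curve, dtype=float)
    x = np.arange(len(loss_curve)) if steps is None else np.asarray(steps, dtype=float)
    return float(_trapezoid(loss_curve, x))
```

An example whose loss drifts upward, the claimed un-learning signature, accumulates more area than one whose loss decays from the same starting point, which is why a positive entropy-AULC correlation captures the pattern described above.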

What carries the argument

Annotation entropy, derived from the label distribution over 100 annotators per example, correlated with each example's area under the loss curve during adaptation.
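The entropy side of that correlation is plain Shannon entropy over the empirical label distribution. A minimal sketch, in nats (the base only rescales values and cannot change a rank correlation):

```python
import math
from collections import Counter

def annotation_entropy(labels):
    """Shannon entropy (nats) of one example's annotator-label distribution,
    e.g. ChaosNLI's 100 votes over {entailment, neutral, contradiction}."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

# A unanimous example has zero entropy; a near-even three-way split
# approaches the maximum of log(3) ≈ 1.10 nats.
```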

If this is right

  • Positive Spearman correlation between annotation entropy and AULC holds in every one of the 25 tested conditions.
  • Decoder-only models display stronger correlations than encoder models at matched LoRA rank.
  • The entropy-AULC relationship survives partial-correlation controls and replicates across random seeds and datasets.
  • A noise-injection experiment produces loss trajectories consistent with the entropy-driven pattern.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Practitioners facing ambiguous labels may obtain more stable adaptation by switching from LoRA to full fine-tuning on the same data.
  • The restricted parameter space of LoRA could make its updates more vulnerable to label noise than full-rank updates.
  • Similar entropy-based diagnostics could be applied to other low-rank adaptation techniques to test whether the un-learning pattern generalizes beyond LoRA.

Load-bearing premise

The observed rise in loss on high-entropy items is produced by un-learning driven by annotation disagreement rather than by other training dynamics or dataset artifacts.

What would settle it

If a controlled experiment that equalizes annotator labels on the same examples eliminates the loss increase under LoRA while preserving the same training schedule, the claimed link between entropy and un-learning would be falsified.
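As a sketch of that control, equalizing annotator labels amounts to collapsing each example's votes onto a single label before re-running the identical training schedule. `equalize_labels` is a hypothetical helper, not from the paper:

```python
from collections import Counter

def equalize_labels(annotations):
    """Hypothetical control: replace every annotator vote with the majority
    label, driving annotation entropy to zero while leaving the example's
    text, position in the data order, and training schedule untouched."""
    majority_label, _ = Counter(annotations).most_common(1)[0]
    return [majority_label] * len(annotations)
```

If the loss rise under LoRA disappears on the equalized copies, disagreement is doing the work; if it persists, some other property of those examples is.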

Figures

Figures reproduced from arXiv: 2604.16332 by Brady Steele.

Figure 1. Per-example loss trajectories grouped by an…
Figure 2. Per-example loss under full fine-tuning (RoBERTa, SNLI, seed 42). Unlike LoRA…
Figure 3. Distribution of annotation entropy across the…
Figure 4. Per-example gradient norms by entropy cate…
Figure 5. Mean AULC by entropy bin under three bin…
Figure 7. Dataset Cartography map (confidence vs. variability) colored by annotation entropy category, pooled across all available tracker runs (54 main-matrix runs plus additional rank-sweep configurations). Clean examples cluster at high confidence / low variability, while contested examples spread across the cartography space, confirming that entropy categories and cartography regions capture related…
Figure 6. Cross-architecture comparison of AULC–entropy Spearman ρ on SNLI. Decoder-only models (Qwen2.5-1.5B, Qwen2.5-3B) show stronger and more consistent correlations than the encoder baseline (DeBERTa v3). Within each architecture family, higher LoRA rank produces stronger correlations. Error bars show ±1 sample standard deviation across seeds.
Figure 9. Prediction entropy (left) and expected calibra…
original abstract

We find that LoRA fine-tuning exhibits un-learning on contested examples: items with high annotator disagreement show increasing loss during training, a qualitatively distinct pattern largely absent under full fine-tuning and consistent across all six models tested (four encoder, two decoder-only). This discovery emerges from correlating annotation entropy, computed from ChaosNLI's 100 labels per example, with per-example area under the loss curve (AULC) on SNLI and MNLI. The correlation is positive in all 25 conditions tested (Spearman $\rho = 0.06$-$0.43$), with decoder-only models showing stronger correlations than encoders at matched LoRA rank. The effect survives partial-correlation controls and replicates across seeds and datasets. A preliminary noise-injection experiment is consistent with these findings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated author's rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper claims that LoRA fine-tuning on SNLI/MNLI exhibits a distinct 'un-learning' pattern on high-entropy examples from ChaosNLI: per-example area under the loss curve (AULC) shows positive Spearman correlations (ρ = 0.06–0.43) with annotation entropy across all 25 tested conditions, six models (four encoder, two decoder-only), and two datasets. This pattern is largely absent under full fine-tuning, survives partial-correlation controls, replicates across seeds, and is supported by a preliminary noise-injection check.

Significance. If the central empirical correlation holds, the work provides a concrete, replicable link between annotator disagreement and per-example loss dynamics that differentiates LoRA from full fine-tuning. This has potential value for understanding parameter-efficient adaptation, designing noise-robust training schedules, and prioritizing data curation for contested examples. The use of public ChaosNLI labels and standard loss curves makes the finding falsifiable and extensible.

major comments (3)
  1. [Experimental Setup / Methods] The manuscript reports consistent positive correlations but provides insufficient methodological detail on AULC computation (e.g., exact integration limits, handling of early-stopping or epoch boundaries), example exclusion criteria, and whether loss curves are normalized per example. These omissions are load-bearing for interpreting the reported ρ values and the 'un-learning' claim.
  2. [Results] The partial-correlation controls are described only at a high level; the specific covariates, the resulting controlled ρ values, and any multiple-testing correction are not reported. Without these numbers it is impossible to assess whether annotation entropy retains independent explanatory power over simpler proxies such as example length or initial loss.
  3. [Results / Figures] Error bars, confidence intervals, or per-condition p-values are absent from the correlation tables and figures. Given the modest effect sizes (ρ down to 0.06) and the claim of replication across 25 conditions, statistical characterization is required to evaluate robustness.
minor comments (2)
  1. [Experimental Setup] Decoder-only models are reported to show stronger correlations than encoders at matched LoRA rank, but the precise rank values, adapter placement, and learning-rate schedules used for each model family are not tabulated.
  2. [Results] The noise-injection experiment is described as 'preliminary' and 'consistent'; a brief quantitative summary (e.g., change in ρ after noise injection) would strengthen the causal interpretation.
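The statistical characterization the referee asks for in major comment 3 is commonly supplied by a percentile bootstrap over examples. A sketch, assuming paired resampling so each (entropy, AULC) observation stays together; ties introduced by resampling are rank-broken arbitrarily here, where average ranks would be more careful:

```python
import numpy as np

def _ranks(a):
    # argsort-based ranks; ties broken arbitrarily
    # (scipy.stats.rankdata assigns average ranks instead)
    order = np.argsort(a)
    r = np.empty(len(a))
    r[order] = np.arange(len(a))
    return r

def spearman(x, y):
    return float(np.corrcoef(_ranks(x), _ranks(y))[0, 1])

def bootstrap_ci(x, y, n_boot=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap interval for Spearman rho over paired
    observations (default: 1,000 resamples, 95% interval)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(x), size=(n_boot, len(x)))
    stats = [spearman(x[i], y[i]) for i in idx]
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return float(lo), float(hi)
```

A condition at the low end of the reported range (ρ = 0.06) survives this check only if its interval excludes zero, which is exactly the robustness question raised above.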

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major point below and will revise the manuscript to provide the requested methodological clarifications, numerical details, and statistical characterizations.

point-by-point responses
  1. Referee: [Experimental Setup / Methods] The manuscript reports consistent positive correlations but provides insufficient methodological detail on AULC computation (e.g., exact integration limits, handling of early-stopping or epoch boundaries), example exclusion criteria, and whether loss curves are normalized per example. These omissions are load-bearing for interpreting the reported ρ values and the 'un-learning' claim.

    Authors: We agree that the current description of AULC is insufficiently precise. In the revised manuscript we will add an explicit subsection in Methods that states: AULC is computed via trapezoidal integration of the raw per-example cross-entropy loss from the first to the final training step; training uses a fixed number of epochs with no early stopping; loss curves are not normalized per example; and example exclusion is limited to the small fraction of items that produce NaN losses after tokenization (reported in the appendix). These additions will directly support interpretation of the reported correlations. revision: yes

  2. Referee: [Results] The partial-correlation controls are described only at a high level; the specific covariates, the resulting controlled ρ values, and any multiple-testing correction are not reported. Without these numbers it is impossible to assess whether annotation entropy retains independent explanatory power over simpler proxies such as example length or initial loss.

    Authors: We will expand the partial-correlation section to list the exact covariates (sequence length in tokens, initial loss at epoch 0, and original SNLI/MNLI label entropy), report the controlled Spearman ρ values for each of the 25 conditions, and describe the multiple-testing correction applied. These numbers and the full procedure will appear in a new table and accompanying text in the revised Results. revision: yes

  3. Referee: [Results / Figures] Error bars, confidence intervals, or per-condition p-values are absent from the correlation tables and figures. Given the modest effect sizes (ρ down to 0.06) and the claim of replication across 25 conditions, statistical characterization is required to evaluate robustness.

    Authors: We acknowledge the absence of statistical detail in the current tables and figures. The revision will add 95% bootstrap confidence intervals (1,000 resamples) and per-condition p-values (with FDR correction) to all reported ρ values, and will include error bars on the relevant figures. This will allow readers to assess robustness given the observed effect-size range. revision: yes
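The partial-correlation control promised in response 2 has a standard recipe: rank-transform everything, residualize the entropy and AULC ranks against the covariate ranks, and correlate the residuals. A sketch under that recipe, with covariate names taken from the response above and ties rank-broken arbitrarily:

```python
import numpy as np

def partial_spearman(x, y, covariates):
    """Partial Spearman correlation of x and y controlling for covariates,
    e.g. x = annotation entropy, y = AULC, covariates = [token length,
    initial loss, original label entropy]."""
    def ranks(a):
        a = np.asarray(a, dtype=float)
        order = np.argsort(a)
        r = np.empty_like(a)
        r[order] = np.arange(len(a), dtype=float)
        return r

    rx, ry = ranks(x), ranks(y)
    # Regress both rank vectors on the covariate ranks (plus intercept)...
    Z = np.column_stack([ranks(c) for c in covariates] + [np.ones(len(rx))])
    res_x = rx - Z @ np.linalg.lstsq(Z, rx, rcond=None)[0]
    res_y = ry - Z @ np.linalg.lstsq(Z, ry, rcond=None)[0]
    # ...then correlate the residuals.
    return float(np.corrcoef(res_x, res_y)[0, 1])
```

If entropy retains a positive partial ρ after controlling for simpler proxies such as length and initial loss, it carries explanatory power of its own; if the partial ρ collapses toward zero, the proxies were doing the work.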

Circularity Check

0 steps flagged

Empirical correlation with no self-referential derivation

full rationale

The paper reports an observed positive Spearman correlation between annotation entropy (computed directly from ChaosNLI's 100 labels per example) and per-example area under the loss curve (AULC) during LoRA fine-tuning on SNLI/MNLI. AULC is a standard integral of training loss trajectories with no fitted parameters or equations that reduce the reported statistic to its inputs by construction. The analysis replicates across models, datasets, and controls without invoking self-citations, uniqueness theorems, or ansatzes as load-bearing steps. The central claim is therefore a descriptive empirical pattern rather than a derived result that collapses to its own definitions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on direct empirical correlations from public multi-annotation datasets without introducing new fitted parameters, axioms beyond standard statistics, or invented entities.

axioms (1)
  • standard math Standard assumptions underlying Spearman rank correlation and partial correlation controls
    Invoked to establish the reported positive correlations and their robustness.

pith-pipeline@v0.9.0 · 5423 in / 1066 out tokens · 32655 ms · 2026-05-15T12:30:20.053843+00:00 · methodology

