hub

Intrinsic dimen- sionality explains the effectiveness of language model fine-tuning

Intrinsic dimensionality explains the effectiveness of language model fine-tuning , author= · 2012 · arXiv 2012.13255

22 Pith papers cite this work. Polarity classification is still indexing.

22 Pith papers citing it

read on arXiv browse 22 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 2 method 2

citation-polarity summary

background 3 use method 1

representative citing papers

FRAME: Learning the Adaptation Domain with a Mixture of Fractional-Fourier Experts

cs.LG · 2026-06-30 · unverdicted · novelty 7.0

FRAME adds a learnable fractional-Fourier order per expert in a MoE-LoRA setup so that low-rank updates are placed in the domain where they are most compact, yielding gains over fixed-domain baselines on LLaMA-3.1-8B and Qwen2.5-7B.

DataDignity: Training Data Attribution for Large Language Models

cs.AI · 2026-05-07 · unverdicted · novelty 7.0

ScoringModel raises mean Recall@10 to 52.2 on the FakeWiki provenance benchmark from 35.0 for the best baseline, winning 41 of 45 model-by-condition comparisons and gaining 15.7 points on jailbreak-style queries.

Transformers for Learning on Noisy and Task-Level Manifolds: Approximation and Generalization Insights

cs.LG · 2025-05-06 · unverdicted · novelty 7.0

Transformers achieve approximation and generalization error bounds for noisy manifold regression that scale with the intrinsic dimension of the task-level manifold.

LoRA: Low-Rank Adaptation of Large Language Models

cs.CL · 2021-06-17 · accept · novelty 7.0

Adapting large language models by training only a low-rank decomposition BA added to frozen weight matrices matches full fine-tuning while cutting trainable parameters by orders of magnitude and adding no inference latency.

Prefix-Tuning: Optimizing Continuous Prompts for Generation

cs.CL · 2021-01-01 · conditional · novelty 7.0

Prefix-tuning matches or exceeds fine-tuning on NLG tasks by optimizing a continuous prefix using 0.1% of parameters while keeping the LM frozen.

Interpretable Discriminative Text Representations via Agreement and Label Disentanglement

cs.CL · 2026-05-20 · unverdicted · novelty 6.0

LFD discovers predictive text features via LLM contrastive proposals, cross-LLM Cohen's kappa screening, and residual held-out gain selection, matching baseline accuracy while achieving higher human agreement and lower label leakage on ten tasks.

Combining pre-trained models via localized model averaging

stat.ME · 2026-05-13 · unverdicted · novelty 6.0

Localized model averaging with covariate-dependent weights achieves asymptotic optimality and weight consistency for combining pre-trained models under a general loss framework.

Emergent Symbolic Structure in Health Foundation Models: Extraction, Alignment, and Cross-Modal Transfer

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

Health foundation model embeddings contain an interpretable symbolic organization shared across modalities that supports cross-domain transfer without joint training.

UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors

cs.CV · 2026-05-01 · unverdicted · novelty 6.0

UniVidX unifies diverse video generation tasks into one conditional diffusion model using stochastic condition masking, decoupled gated LoRAs, and cross-modal self-attention.

TLoRA: Task-aware Low Rank Adaptation of Large Language Models

cs.CL · 2026-04-20 · unverdicted · novelty 6.0

TLoRA jointly optimizes LoRA initialization via task-data SVD and sensitivity-driven rank allocation, delivering stronger results than standard LoRA across NLU, reasoning, math, code, and chat tasks while using fewer trainable parameters.

CR-Net: Scaling Parameter-Efficient Training with Cross-Layer Low-Rank Structure

cs.LG · 2025-09-23 · unverdicted · novelty 6.0

CR-Net uses cross-layer low-rank residuals in a dual-path network plus specialized recomputation to outperform prior low-rank methods on 60M-7B model pre-training while using less compute and memory.

HyperAdapt: Simple High-Rank Adaptation

cs.LG · 2025-09-23 · unverdicted · novelty 6.0

HyperAdapt performs parameter-efficient fine-tuning by row- and column-wise diagonal scaling to induce high-rank updates with only n+m trainable parameters.

The Fine-Tuning Trap: Evaluating Negative Transfer and the Role of PEFT in Sub-1B Mathematical Reasoning

cs.LG · 2026-06-05 · unverdicted · novelty 5.0

Full fine-tuning causes negative transfer and performance collapse in sub-300M SLMs on math tasks, establishing PEFT as a stability requirement.

LoCO: Low-rank Compositional Rotation Fine-tuning

cs.LG · 2026-05-15 · unverdicted · novelty 5.0

LoCO is a PEFT technique that constructs orthogonal transformations via low-rank skew-symmetric matrices and compositional rotation chains with a parallelizable approximation, validated on transformer adaptations.

Training Transformers in Cosine Coefficient Space

cs.PF · 2026-04-06 · unverdicted · novelty 5.0

Training transformers by optimizing only half the DCT coefficients per linear layer achieves validation loss within 0.024 of a dense baseline on Shakespeare character prediction, outperforming matched-parameter LoRA due to preserved rank flexibility.

ARIA: Adaptive Retrieval Intelligence Assistant -- A Multimodal RAG Framework for Domain-Specific Engineering Education

cs.IR · 2026-02-04 · conditional · novelty 5.0

ARIA is a multimodal RAG framework that filters domain-specific questions with 97.5% accuracy and outperforms ChatGPT-5 on pedagogical quality for a university civil engineering course.

Document Retrieval Augmented Fine-Tuning (DRAFT) for safety-critical software assessments

cs.SE · 2025-05-02 · unverdicted · novelty 5.0

DRAFT fine-tunes LLMs with a dual-retrieval architecture and semi-automated datasets containing distractors to achieve 7% higher correctness in safety compliance assessments.

Reversible Foundations: Training a 120B Sparse MoE through State-Preserving Scaling

cs.LG · 2026-06-05 · unverdicted · novelty 4.0

A 120B sparse MoE model with 460 experts was trained on one 8-GPU node to loss 1.78 using reversible recurrence and state-preserving scaling from a 1.78B dense seed, with 5.93B active parameters.

DP-FlogTinyLLM: Differentially private federated log anomaly detection using Tiny LLMs

cs.CR · 2026-04-21 · unverdicted · novelty 4.0

DP-FLogTinyLLM combines federated learning, differential privacy, and LoRA-tuned tiny LLMs to match centralized log anomaly detection performance on Thunderbird and BGL datasets while preserving privacy.

Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey

cs.LG · 2024-03-21 · accept · novelty 4.0

A comprehensive survey of PEFT algorithms for large models, covering their performance, overhead, applications, and real-world system implementations.

Small Language Models (SLMs) Can Still Pack a Punch: A survey (updated 2026)

cs.CL · 2025-01-03 · unverdicted · novelty 2.0

A literature survey of Small Language Models (1-8B parameters) that can perform comparably or better than larger models, covering general-purpose and task-specific approaches plus creation techniques.

Little by Little: Continual Learning via Incremental Mixture of Rank-1 Associative Memory Experts

cs.LG · 2025-06-26

citing papers explorer

Showing 2 of 2 citing papers after filters.

LoRA: Low-Rank Adaptation of Large Language Models cs.CL · 2021-06-17 · accept · none · ref 1
Adapting large language models by training only a low-rank decomposition BA added to frozen weight matrices matches full fine-tuning while cutting trainable parameters by orders of magnitude and adding no inference latency.
Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey cs.LG · 2024-03-21 · accept · none · ref 75
A comprehensive survey of PEFT algorithms for large models, covering their performance, overhead, applications, and real-world system implementations.

Intrinsic dimen- sionality explains the effectiveness of language model fine-tuning

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer