hub

Intrinsic dimen- sionality explains the effectiveness of language model fine-tuning

Aghajanyan, A · 2012 · arXiv 2012.13255

19 Pith papers cite this work. Polarity classification is still indexing.

19 Pith papers citing it

read on arXiv browse 19 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 2 method 2

citation-polarity summary

background 3 use method 1

representative citing papers

DataDignity: Training Data Attribution for Large Language Models

cs.AI · 2026-05-07 · unverdicted · novelty 7.0

ScoringModel raises mean Recall@10 to 52.2 on the FakeWiki provenance benchmark from 35.0 for the best baseline, winning 41 of 45 model-by-condition comparisons and gaining 15.7 points on jailbreak-style queries.

Transformers for Learning on Noisy and Task-Level Manifolds: Approximation and Generalization Insights

cs.LG · 2025-05-06 · unverdicted · novelty 7.0

Transformers achieve approximation and generalization error bounds for noisy manifold regression that scale with the intrinsic dimension of the task-level manifold.

LoRA: Low-Rank Adaptation of Large Language Models

cs.CL · 2021-06-17 · accept · novelty 7.0

Adapting large language models by training only a low-rank decomposition BA added to frozen weight matrices matches full fine-tuning while cutting trainable parameters by orders of magnitude and adding no inference latency.

Prefix-Tuning: Optimizing Continuous Prompts for Generation

cs.CL · 2021-01-01 · conditional · novelty 7.0

Prefix-tuning matches or exceeds fine-tuning on NLG tasks by optimizing a continuous prefix using 0.1% of parameters while keeping the LM frozen.

Interpretable Discriminative Text Representations via Agreement and Label Disentanglement

cs.CL · 2026-05-20 · unverdicted · novelty 6.0

LFD discovers predictive text features via LLM contrastive proposals, cross-LLM Cohen's kappa screening, and residual held-out gain selection, matching baseline accuracy while achieving higher human agreement and lower label leakage on ten tasks.

Combining pre-trained models via localized model averaging

stat.ME · 2026-05-13 · unverdicted · novelty 6.0

Localized model averaging with covariate-dependent weights achieves asymptotic optimality and weight consistency for combining pre-trained models under a general loss framework.

Emergent Symbolic Structure in Health Foundation Models: Extraction, Alignment, and Cross-Modal Transfer

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

Health foundation model embeddings contain an interpretable symbolic organization shared across modalities that supports cross-domain transfer without joint training.

UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors

cs.CV · 2026-05-01 · unverdicted · novelty 6.0

UniVidX unifies diverse video generation tasks into one conditional diffusion model using stochastic condition masking, decoupled gated LoRAs, and cross-modal self-attention.

TLoRA: Task-aware Low Rank Adaptation of Large Language Models

cs.CL · 2026-04-20 · unverdicted · novelty 6.0

TLoRA jointly optimizes LoRA initialization via task-data SVD and sensitivity-driven rank allocation, delivering stronger results than standard LoRA across NLU, reasoning, math, code, and chat tasks while using fewer trainable parameters.

CR-Net: Scaling Parameter-Efficient Training with Cross-Layer Low-Rank Structure

cs.LG · 2025-09-23 · unverdicted · novelty 6.0

CR-Net uses cross-layer low-rank residuals in a dual-path network plus specialized recomputation to outperform prior low-rank methods on 60M-7B model pre-training while using less compute and memory.

HyperAdapt: Simple High-Rank Adaptation

cs.LG · 2025-09-23 · unverdicted · novelty 6.0

HyperAdapt performs parameter-efficient fine-tuning by row- and column-wise diagonal scaling to induce high-rank updates with only n+m trainable parameters.

LoCO: Low-rank Compositional Rotation Fine-tuning

cs.LG · 2026-05-15 · unverdicted · novelty 5.0

LoCO is a PEFT technique that constructs orthogonal transformations via low-rank skew-symmetric matrices and compositional rotation chains with a parallelizable approximation, validated on transformer adaptations.

Training Transformers in Cosine Coefficient Space

cs.PF · 2026-04-06 · unverdicted · novelty 5.0

Training transformers by optimizing only half the DCT coefficients per linear layer achieves validation loss within 0.024 of a dense baseline on Shakespeare character prediction, outperforming matched-parameter LoRA due to preserved rank flexibility.

ARIA: Adaptive Retrieval Intelligence Assistant -- A Multimodal RAG Framework for Domain-Specific Engineering Education

cs.IR · 2026-02-04 · conditional · novelty 5.0

ARIA is a multimodal RAG framework that filters domain-specific questions with 97.5% accuracy and outperforms ChatGPT-5 on pedagogical quality for a university civil engineering course.

Document Retrieval Augmented Fine-Tuning (DRAFT) for safety-critical software assessments

cs.SE · 2025-05-02 · unverdicted · novelty 5.0

DRAFT fine-tunes LLMs with a dual-retrieval architecture and semi-automated datasets containing distractors to achieve 7% higher correctness in safety compliance assessments.

DP-FlogTinyLLM: Differentially private federated log anomaly detection using Tiny LLMs

cs.CR · 2026-04-21 · unverdicted · novelty 4.0

DP-FLogTinyLLM combines federated learning, differential privacy, and LoRA-tuned tiny LLMs to match centralized log anomaly detection performance on Thunderbird and BGL datasets while preserving privacy.

Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey

cs.LG · 2024-03-21 · accept · novelty 4.0

A comprehensive survey of PEFT algorithms for large models, covering their performance, overhead, applications, and real-world system implementations.

Small Language Models (SLMs) Can Still Pack a Punch: A survey (updated 2026)

cs.CL · 2025-01-03 · unverdicted · novelty 2.0

A literature survey of Small Language Models (1-8B parameters) that can perform comparably or better than larger models, covering general-purpose and task-specific approaches plus creation techniques.

Little by Little: Continual Learning via Incremental Mixture of Rank-1 Associative Memory Experts

cs.LG · 2025-06-26

citing papers explorer

Showing 19 of 19 citing papers.

DataDignity: Training Data Attribution for Large Language Models cs.AI · 2026-05-07 · unverdicted · none · ref 12
ScoringModel raises mean Recall@10 to 52.2 on the FakeWiki provenance benchmark from 35.0 for the best baseline, winning 41 of 45 model-by-condition comparisons and gaining 15.7 points on jailbreak-style queries.
Transformers for Learning on Noisy and Task-Level Manifolds: Approximation and Generalization Insights cs.LG · 2025-05-06 · unverdicted · none · ref 2
Transformers achieve approximation and generalization error bounds for noisy manifold regression that scale with the intrinsic dimension of the task-level manifold.
LoRA: Low-Rank Adaptation of Large Language Models cs.CL · 2021-06-17 · accept · none · ref 1
Adapting large language models by training only a low-rank decomposition BA added to frozen weight matrices matches full fine-tuning while cutting trainable parameters by orders of magnitude and adding no inference latency.
Prefix-Tuning: Optimizing Continuous Prompts for Generation cs.CL · 2021-01-01 · conditional · none · ref 1
Prefix-tuning matches or exceeds fine-tuning on NLG tasks by optimizing a continuous prefix using 0.1% of parameters while keeping the LM frozen.
Interpretable Discriminative Text Representations via Agreement and Label Disentanglement cs.CL · 2026-05-20 · unverdicted · none · ref 1
LFD discovers predictive text features via LLM contrastive proposals, cross-LLM Cohen's kappa screening, and residual held-out gain selection, matching baseline accuracy while achieving higher human agreement and lower label leakage on ten tasks.
Combining pre-trained models via localized model averaging stat.ME · 2026-05-13 · unverdicted · none · ref 162
Localized model averaging with covariate-dependent weights achieves asymptotic optimality and weight consistency for combining pre-trained models under a general loss framework.
Emergent Symbolic Structure in Health Foundation Models: Extraction, Alignment, and Cross-Modal Transfer cs.LG · 2026-05-08 · unverdicted · none · ref 62
Health foundation model embeddings contain an interpretable symbolic organization shared across modalities that supports cross-domain transfer without joint training.
UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors cs.CV · 2026-05-01 · unverdicted · none · ref 138
UniVidX unifies diverse video generation tasks into one conditional diffusion model using stochastic condition masking, decoupled gated LoRAs, and cross-modal self-attention.
TLoRA: Task-aware Low Rank Adaptation of Large Language Models cs.CL · 2026-04-20 · unverdicted · none · ref 53
TLoRA jointly optimizes LoRA initialization via task-data SVD and sensitivity-driven rank allocation, delivering stronger results than standard LoRA across NLU, reasoning, math, code, and chat tasks while using fewer trainable parameters.
CR-Net: Scaling Parameter-Efficient Training with Cross-Layer Low-Rank Structure cs.LG · 2025-09-23 · unverdicted · none · ref 1
CR-Net uses cross-layer low-rank residuals in a dual-path network plus specialized recomputation to outperform prior low-rank methods on 60M-7B model pre-training while using less compute and memory.
HyperAdapt: Simple High-Rank Adaptation cs.LG · 2025-09-23 · unverdicted · none · ref 2
HyperAdapt performs parameter-efficient fine-tuning by row- and column-wise diagonal scaling to induce high-rank updates with only n+m trainable parameters.
LoCO: Low-rank Compositional Rotation Fine-tuning cs.LG · 2026-05-15 · unverdicted · none · ref 1
LoCO is a PEFT technique that constructs orthogonal transformations via low-rank skew-symmetric matrices and compositional rotation chains with a parallelizable approximation, validated on transformer adaptations.
Training Transformers in Cosine Coefficient Space cs.PF · 2026-04-06 · unverdicted · none · ref 1
Training transformers by optimizing only half the DCT coefficients per linear layer achieves validation loss within 0.024 of a dense baseline on Shakespeare character prediction, outperforming matched-parameter LoRA due to preserved rank flexibility.
ARIA: Adaptive Retrieval Intelligence Assistant -- A Multimodal RAG Framework for Domain-Specific Engineering Education cs.IR · 2026-02-04 · conditional · none · ref 29
ARIA is a multimodal RAG framework that filters domain-specific questions with 97.5% accuracy and outperforms ChatGPT-5 on pedagogical quality for a university civil engineering course.
Document Retrieval Augmented Fine-Tuning (DRAFT) for safety-critical software assessments cs.SE · 2025-05-02 · unverdicted · none · ref 1
DRAFT fine-tunes LLMs with a dual-retrieval architecture and semi-automated datasets containing distractors to achieve 7% higher correctness in safety compliance assessments.
DP-FlogTinyLLM: Differentially private federated log anomaly detection using Tiny LLMs cs.CR · 2026-04-21 · unverdicted · none · ref 39
DP-FLogTinyLLM combines federated learning, differential privacy, and LoRA-tuned tiny LLMs to match centralized log anomaly detection performance on Thunderbird and BGL datasets while preserving privacy.
Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey cs.LG · 2024-03-21 · accept · none · ref 75
A comprehensive survey of PEFT algorithms for large models, covering their performance, overhead, applications, and real-world system implementations.
Small Language Models (SLMs) Can Still Pack a Punch: A survey (updated 2026) cs.CL · 2025-01-03 · unverdicted · none · ref 8
A literature survey of Small Language Models (1-8B parameters) that can perform comparably or better than larger models, covering general-purpose and task-specific approaches plus creation techniques.
Little by Little: Continual Learning via Incremental Mixture of Rank-1 Associative Memory Experts cs.LG · 2025-06-26 · unreviewed · ref 1

Intrinsic dimen- sionality explains the effectiveness of language model fine-tuning

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer