
Measuring the intrinsic dimension of objective landscapes

· 2018 · cs.LG · arXiv 1804.08838

15 Pith papers cite this work. Polarity classification is still indexing.

abstract

Many recently trained neural networks employ large numbers of parameters to achieve good performance. One may intuitively use the number of parameters required as a rough gauge of the difficulty of a problem. But how accurate are such notions? How many parameters are really needed? In this paper we attempt to answer this question by training networks not in their native parameter space, but instead in a smaller, randomly oriented subspace. We slowly increase the dimension of this subspace, note at which dimension solutions first appear, and define this to be the intrinsic dimension of the objective landscape. The approach is simple to implement, computationally tractable, and produces several suggestive conclusions. Many problems have smaller intrinsic dimensions than one might suspect, and the intrinsic dimension for a given dataset varies little across a family of models with vastly different sizes. This latter result has the profound implication that once a parameter space is large enough to solve a problem, extra parameters serve directly to increase the dimensionality of the solution manifold. Intrinsic dimension allows some quantitative comparison of problem difficulty across supervised, reinforcement, and other types of learning where we conclude, for example, that solving the inverted pendulum problem is 100 times easier than classifying digits from MNIST, and playing Atari Pong from pixels is about as hard as classifying CIFAR-10. In addition to providing new cartography of the objective landscapes wandered by parameterized models, the method is a simple technique for constructively obtaining an upper bound on the minimum description length of a solution. A byproduct of this construction is a simple approach for compressing networks, in some cases by more than 100 times.
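The subspace procedure described in the abstract can be sketched on a toy problem. The sketch below assumes the parameterization theta = theta0 + P z, where P is a fixed random D x d matrix and only the d-dimensional vector z is optimized. The objective is a hypothetical least-squares problem with s independent constraints, chosen so that the answer is known in advance: solutions should first appear at d = s. The function names, the tolerance, the Gaussian scaling of P, and the closed-form solve (in place of gradient-based training) are all illustrative assumptions, not the paper's implementation.

```python
import numpy as np


def subspace_loss(A, b, theta0, P):
    """Best loss reachable on the affine subspace theta = theta0 + P @ z.

    Toy objective (an assumption for illustration): L(theta) = ||A theta - b||^2.
    A closed-form least-squares solve stands in for training z by gradient descent.
    """
    M = A @ P                      # objective restricted to the subspace
    r0 = b - A @ theta0            # residual at the starting point theta0
    z, *_ = np.linalg.lstsq(M, r0, rcond=None)
    resid = M @ z - r0
    return float(resid @ resid)


def measure_intrinsic_dimension(A, b, max_d, tol=1e-8, seed=0):
    """Increase the subspace dimension d until a solution (loss < tol) first appears."""
    rng = np.random.default_rng(seed)
    D = A.shape[1]
    theta0 = np.zeros(D)           # hypothetical initial parameters
    for d in range(1, max_d + 1):
        # Fresh random projection for each d; the 1/sqrt(D) scaling is an arbitrary choice.
        P = rng.standard_normal((D, d)) / np.sqrt(D)
        if subspace_loss(A, b, theta0, P) < tol:
            return d
    return None                    # no solution found up to max_d


# Toy problem: D = 50 native parameters, s = 7 independent linear constraints.
rng = np.random.default_rng(1)
D, s = 50, 7
A = rng.standard_normal((s, D))
b = rng.standard_normal(s)
d_int = measure_intrinsic_dimension(A, b, max_d=20)
print("native dimension:", D, "measured intrinsic dimension:", d_int)
```

In this toy setting the solution set has dimension D - s, so a random d-dimensional subspace generically intersects it once d reaches s. This mirrors the abstract's point that extra native parameters enlarge the solution manifold rather than the intrinsic dimension. For real networks the loss is not quadratic, so the search over d would rely on ordinary training inside the subspace and a performance threshold rather than an exact zero-loss test.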

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Rank Is Not Capacity: Spectral Occupancy for Latent Graph Models

cs.LG · 2026-05-11 · unverdicted · novelty 7.0

Spectra defines and controls effective capacity in graph embeddings via the Shannon effective rank of a trace-normalized kernel spectrum, making capacity a post-fit property rather than a pre-training hyperparameter.

LoRA: Low-Rank Adaptation of Large Language Models

cs.CL · 2021-06-17 · accept · novelty 7.0

Adapting large language models by training only a low-rank decomposition BA added to frozen weight matrices matches full fine-tuning while cutting trainable parameters by orders of magnitude and adding no inference latency.

HyperAdapt: Simple High-Rank Adaptation

cs.LG · 2025-09-23 · unverdicted · novelty 6.0

HyperAdapt performs parameter-efficient fine-tuning by row- and column-wise diagonal scaling to induce high-rank updates with only n+m trainable parameters.

Enhancing Chat Language Models by Scaling High-quality Instructional Conversations

cs.CL · 2023-05-23 · conditional · novelty 6.0

UltraChat supplies 1.5 million high-quality multi-turn dialogues that, when used to fine-tune LLaMA, produce UltraLLaMA, which outperforms prior open-source chat models including Vicuna.

Scaling Laws for Transfer

cs.LG · 2021-02-02 · unverdicted · novelty 6.0

Effective data transferred from pre-training to fine-tuning is described by a power law in model parameter count and fine-tuning dataset size, acting like a multiplier on the fine-tuning data.

Multi-task Self-Supervised Learning for Human Activity Detection

cs.LG · 2019-07-27 · unverdicted · novelty 6.0

A multi-task self-supervised approach trains a temporal CNN to detect transformations on sensory data, yielding features that match or exceed fully supervised performance in semi-supervised and transfer settings for smartphone-based HAR.

Pretraining Induces a Reusable Spectral Basis for Downstream Task Adaptation

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

Pretraining induces stable leading singular vectors that form a reusable spectral basis inherited by downstream tasks, enabling competitive performance with 0.2% trainable parameters on GLUE.

TLoRA: Task-aware Low Rank Adaptation of Large Language Models

cs.CL · 2026-04-20 · unverdicted · novelty 6.0

TLoRA jointly optimizes LoRA initialization via task-data SVD and sensitivity-driven rank allocation, delivering stronger results than standard LoRA across NLU, reasoning, math, code, and chat tasks while using fewer trainable parameters.

Language Models (Mostly) Know What They Know

cs.CL · 2022-07-11 · unverdicted · novelty 6.0

Language models show good calibration when asked to estimate the probability that their own answers are correct, with performance improving as models get larger.

A General Language Assistant as a Laboratory for Alignment

cs.CL · 2021-12-01 · conditional · novelty 6.0

Ranked preference modeling outperforms imitation learning for language model alignment and scales more favorably with model size.

ARIA: Adaptive Retrieval Intelligence Assistant -- A Multimodal RAG Framework for Domain-Specific Engineering Education

cs.IR · 2026-02-04 · conditional · novelty 5.0

ARIA is a multimodal RAG framework that filters domain-specific questions with 97.5% accuracy and outperforms ChatGPT-5 on pedagogical quality for a university civil engineering course.

Geometric Analysis of Neural Regression Collapse via Intrinsic Dimension

cs.LG · 2025-10-01 · unverdicted · novelty 5.0

Neural regression collapse occurs when last-layer feature intrinsic dimension falls below target intrinsic dimension, creating over-compressed and under-compressed regimes that govern generalization based on data quantity and noise.

Using predefined vector systems to speed up neural network multimillion class classification

cs.LG · 2026-04-01 · unverdicted · novelty 5.0

Predefined vector systems structure neural network latent spaces to allow O(1) label prediction via index searches on embedding vectors, delivering up to 11.6x speedup on multimillion-class tasks while preserving accuracy and enabling new-class detection.

What does it mean to understand a neural network?

cs.LG · 2019-07-15 · unverdicted · novelty 4.0

Simple training code produces complex neural networks, suggesting that brain learning rules may be easier to understand than mature brain properties and that neuroscience should shift focus accordingly.

Little by Little: Continual Learning via Incremental Mixture of Rank-1 Associative Memory Experts

cs.LG · 2025-06-26

citing papers explorer

Showing 15 of 15 citing papers.

Rank Is Not Capacity: Spectral Occupancy for Latent Graph Models · cs.LG · 2026-05-11 · unverdicted · none · ref 34
LoRA: Low-Rank Adaptation of Large Language Models · cs.CL · 2021-06-17 · accept · none · ref 28
HyperAdapt: Simple High-Rank Adaptation · cs.LG · 2025-09-23 · unverdicted · none · ref 22 · internal anchor
Enhancing Chat Language Models by Scaling High-quality Instructional Conversations · cs.CL · 2023-05-23 · conditional · none · ref 182 · internal anchor
Scaling Laws for Transfer · cs.LG · 2021-02-02 · unverdicted · none · ref 132 · internal anchor
Multi-task Self-Supervised Learning for Human Activity Detection · cs.LG · 2019-07-27 · unverdicted · none · ref 36 · internal anchor
Pretraining Induces a Reusable Spectral Basis for Downstream Task Adaptation · cs.LG · 2026-05-08 · unverdicted · none · ref 10
TLoRA: Task-aware Low Rank Adaptation of Large Language Models · cs.CL · 2026-04-20 · unverdicted · none · ref 67
Language Models (Mostly) Know What They Know · cs.CL · 2022-07-11 · unverdicted · none · ref 251
A General Language Assistant as a Laboratory for Alignment · cs.CL · 2021-12-01 · conditional · none · ref 174
ARIA: Adaptive Retrieval Intelligence Assistant -- A Multimodal RAG Framework for Domain-Specific Engineering Education · cs.IR · 2026-02-04 · conditional · none · ref 28 · internal anchor
Geometric Analysis of Neural Regression Collapse via Intrinsic Dimension · cs.LG · 2025-10-01 · unverdicted · none · ref 11 · internal anchor
Using predefined vector systems to speed up neural network multimillion class classification · cs.LG · 2026-04-01 · unverdicted · none · ref 15
What does it mean to understand a neural network? · cs.LG · 2019-07-15 · unverdicted · none · ref 20 · internal anchor
Little by Little: Continual Learning via Incremental Mixture of Rank-1 Associative Memory Experts · cs.LG · 2025-06-26 · unreviewed · ref 41 · internal anchor
