pith. sign in

hub

Measuring the intrinsic dimension of objective landscapes

15 Pith papers cite this work. Polarity classification is still indexing.

15 Pith papers citing it
abstract

Many recently trained neural networks employ large numbers of parameters to achieve good performance. One may intuitively use the number of parameters required as a rough gauge of the difficulty of a problem. But how accurate are such notions? How many parameters are really needed? In this paper we attempt to answer this question by training networks not in their native parameter space, but instead in a smaller, randomly oriented subspace. We slowly increase the dimension of this subspace, note at which dimension solutions first appear, and define this to be the intrinsic dimension of the objective landscape. The approach is simple to implement, computationally tractable, and produces several suggestive conclusions. Many problems have smaller intrinsic dimensions than one might suspect, and the intrinsic dimension for a given dataset varies little across a family of models with vastly different sizes. This latter result has the profound implication that once a parameter space is large enough to solve a problem, extra parameters serve directly to increase the dimensionality of the solution manifold. Intrinsic dimension allows some quantitative comparison of problem difficulty across supervised, reinforcement, and other types of learning where we conclude, for example, that solving the inverted pendulum problem is 100 times easier than classifying digits from MNIST, and playing Atari Pong from pixels is about as hard as classifying CIFAR-10. In addition to providing new cartography of the objective landscapes wandered by parameterized models, the method is a simple technique for constructively obtaining an upper bound on the minimum description length of a solution. A byproduct of this construction is a simple approach for compressing networks, in some cases by more than 100 times.

hub tools

citation-role summary

background 2

citation-polarity summary

roles

background 2

polarities

background 2

representative citing papers

Rank Is Not Capacity: Spectral Occupancy for Latent Graph Models

cs.LG · 2026-05-11 · unverdicted · novelty 7.0

Spectra defines and controls effective capacity in graph embeddings via the Shannon effective rank of a trace-normalized kernel spectrum, making capacity a post-fit property rather than a pre-training hyperparameter.

LoRA: Low-Rank Adaptation of Large Language Models

cs.CL · 2021-06-17 · accept · novelty 7.0

Adapting large language models by training only a low-rank decomposition BA added to frozen weight matrices matches full fine-tuning while cutting trainable parameters by orders of magnitude and adding no inference latency.

HyperAdapt: Simple High-Rank Adaptation

cs.LG · 2025-09-23 · unverdicted · novelty 6.0

HyperAdapt performs parameter-efficient fine-tuning by row- and column-wise diagonal scaling to induce high-rank updates with only n+m trainable parameters.

Scaling Laws for Transfer

cs.LG · 2021-02-02 · unverdicted · novelty 6.0

Effective data transferred from pre-training to fine-tuning is described by a power law in model parameter count and fine-tuning dataset size, acting like a multiplier on the fine-tuning data.

Multi-task Self-Supervised Learning for Human Activity Detection

cs.LG · 2019-07-27 · unverdicted · novelty 6.0

A multi-task self-supervised approach trains a temporal CNN to detect transformations on sensory data, yielding features that match or exceed fully supervised performance in semi-supervised and transfer settings for smartphone-based HAR.

TLoRA: Task-aware Low Rank Adaptation of Large Language Models

cs.CL · 2026-04-20 · unverdicted · novelty 6.0

TLoRA jointly optimizes LoRA initialization via task-data SVD and sensitivity-driven rank allocation, delivering stronger results than standard LoRA across NLU, reasoning, math, code, and chat tasks while using fewer trainable parameters.

Language Models (Mostly) Know What They Know

cs.CL · 2022-07-11 · unverdicted · novelty 6.0

Language models show good calibration when asked to estimate the probability that their own answers are correct, with performance improving as models get larger.

What does it mean to understand a neural network?

cs.LG · 2019-07-15 · unverdicted · novelty 4.0

Simple training code produces complex neural networks, suggesting that brain learning rules may be easier to understand than mature brain properties and that neuroscience should shift focus accordingly.

citing papers explorer

Showing 15 of 15 citing papers.