pith. machine review for the scientific record.

arxiv: 1804.08838 · v1 · submitted 2018-04-24 · 💻 cs.LG · cs.NE · stat.ML

Recognition: unknown

Measuring the Intrinsic Dimension of Objective Landscapes

classification 💻 cs.LG · cs.NE · stat.ML
keywords: dimension, intrinsic, parameters, problem, many, networks, objective, simple

Many recently trained neural networks employ large numbers of parameters to achieve good performance. One may intuitively use the number of parameters required as a rough gauge of the difficulty of a problem. But how accurate are such notions? How many parameters are really needed? In this paper we attempt to answer this question by training networks not in their native parameter space, but instead in a smaller, randomly oriented subspace. We slowly increase the dimension of this subspace, note at which dimension solutions first appear, and define this to be the intrinsic dimension of the objective landscape. The approach is simple to implement, computationally tractable, and produces several suggestive conclusions. Many problems have smaller intrinsic dimensions than one might suspect, and the intrinsic dimension for a given dataset varies little across a family of models with vastly different sizes. This latter result has the profound implication that once a parameter space is large enough to solve a problem, extra parameters serve directly to increase the dimensionality of the solution manifold. Intrinsic dimension allows some quantitative comparison of problem difficulty across supervised, reinforcement, and other types of learning where we conclude, for example, that solving the inverted pendulum problem is 100 times easier than classifying digits from MNIST, and playing Atari Pong from pixels is about as hard as classifying CIFAR-10. In addition to providing new cartography of the objective landscapes wandered by parameterized models, the method is a simple technique for constructively obtaining an upper bound on the minimum description length of a solution. A byproduct of this construction is a simple approach for compressing networks, in some cases by more than 100 times.
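The subspace procedure described in the abstract can be illustrated on a toy problem. The sketch below is not the authors' code; it assumes the parameterization theta = theta0 + P @ z, where theta0 is a fixed random starting point, P is a random D x d matrix with unit-norm columns, and only the d coordinates in z are optimized. To keep the example deterministic, a closed-form least-squares solve stands in for gradient-based training, and "a solution appears" is taken to mean the squared error falls below a hypothetical tolerance `tol`. The objective is a linear regression with n equations and D > n unknowns, whose zero-loss solutions form an affine set of dimension D - n, so a random d-dimensional subspace generically meets it once d reaches n. The names `subspace_loss` and `measure_intrinsic_dimension` are illustrative.

```python
import numpy as np


def subspace_loss(X, y, theta0, P):
    """Smallest squared error reachable with theta = theta0 + P @ z, optimizing only z."""
    A = X @ P            # (n, d): effect of the d subspace coordinates on predictions
    b = y - X @ theta0   # what the subspace coordinates still have to explain
    # Assumption: an exact least-squares solve replaces gradient-based training.
    z, *_ = np.linalg.lstsq(A, b, rcond=None)
    r = A @ z - b
    return float(r @ r)


def measure_intrinsic_dimension(X, y, dims, tol=1e-8, seed=0):
    """Return the first d in `dims` whose random subspace contains a solution, else None."""
    rng = np.random.default_rng(seed)
    D = X.shape[1]
    theta0 = rng.normal(size=D)  # fixed starting point in the native D-dim space
    for d in dims:
        P = rng.normal(size=(D, d))
        P /= np.linalg.norm(P, axis=0, keepdims=True)  # unit-norm random directions
        if subspace_loss(X, y, theta0, P) < tol:
            return d
    return None


# Toy objective: n linear constraints on D parameters (n < D).
data_rng = np.random.default_rng(1)
n, D = 10, 200
X = data_rng.normal(size=(n, D))
y = data_rng.normal(size=n)

d_int = measure_intrinsic_dimension(X, y, dims=range(1, 31))
print("measured intrinsic dimension:", d_int)
```

In this toy setting the measured value depends on the number of constraints n rather than on the native parameter count D, which mirrors the abstract's observation that intrinsic dimension varies little as models grow. For real networks the solve would be replaced by iterative training of z, and the success criterion by a performance threshold.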

This paper has not been read by Pith yet.


Forward citations

Cited by 8 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Rank Is Not Capacity: Spectral Occupancy for Latent Graph Models

    cs.LG 2026-05 unverdicted novelty 7.0

    Spectra defines and controls effective capacity in graph embeddings via the Shannon effective rank of a trace-normalized kernel spectrum, making capacity a post-fit property rather than a pre-training hyperparameter.

  2. LoRA: Low-Rank Adaptation of Large Language Models

    cs.CL 2021-06 accept novelty 7.0

    Adapting large language models by training only a low-rank decomposition BA added to frozen weight matrices matches full fine-tuning while cutting trainable parameters by orders of magnitude and adding no inference latency.

  3. Pretraining Induces a Reusable Spectral Basis for Downstream Task Adaptation

    cs.LG 2026-05 unverdicted novelty 6.0

    Pretraining induces stable leading singular vectors that form a reusable spectral basis inherited by downstream tasks, enabling competitive performance with 0.2% trainable parameters on GLUE.

  4. TLoRA: Task-aware Low Rank Adaptation of Large Language Models

    cs.CL 2026-04 unverdicted novelty 6.0

    TLoRA jointly optimizes LoRA initialization via task-data SVD and sensitivity-driven rank allocation, delivering stronger results than standard LoRA across NLU, reasoning, math, code, and chat tasks while using fewer ...

  5. Enhancing Chat Language Models by Scaling High-quality Instructional Conversations

    cs.CL 2023-05 conditional novelty 6.0

    UltraChat supplies 1.5 million high-quality multi-turn dialogues that, when used to fine-tune LLaMA, produce UltraLLaMA, which outperforms prior open-source chat models including Vicuna.

  6. Language Models (Mostly) Know What They Know

    cs.CL 2022-07 unverdicted novelty 6.0

    Language models show good calibration when asked to estimate the probability that their own answers are correct, with performance improving as models get larger.

  7. A General Language Assistant as a Laboratory for Alignment

    cs.CL 2021-12 conditional novelty 6.0

    Ranked preference modeling outperforms imitation learning for language model alignment and scales more favorably with model size.

  8. Using predefined vector systems to speed up neural network multimillion class classification

    cs.LG 2026-04 unverdicted novelty 5.0

    Predefined vector systems structure neural network latent spaces to allow O(1) label prediction via index searches on embedding vectors, delivering up to 11.6x speedup on multimillion-class tasks while preserving accu...