Title resolution pending

Devlin, Jacob, Chang, Ming-Wei, Lee, Kenton, Toutanova, Kristina , booktitle =

6 Pith papers cite this work. Polarity classification is still indexing.

6 Pith papers citing it

browse 6 citing papers

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

citation-role summary

method 2

citation-polarity summary

use method 2

representative citing papers

Characterizing Learning in Deep Neural Networks using Tractable Algorithmic Complexity Analysis

cs.LG · 2026-05-15 · unverdicted · novelty 7.0

QuBD extends algorithmic complexity estimation to quantized DNN weights, revealing that complexity decreases during learning, increases with overfitting, follows grokking patterns, and correlates with generalization.

Generating Pretraining Tokens from Organic Data for Data-Bound Scaling

cs.CL · 2026-05-18 · unverdicted · novelty 6.0

SynPro uses RL-optimized rephrasing and reformatting of organic data to generate synthetic pretraining tokens that deliver 3.7-5.2x the effective learning of simple repetition and can exceed training on unique data at 1.1B scale.

Spectrum-Adaptive Generalization Bounds for Trained Deep Transformers

stat.ML · 2026-05-08 · unverdicted · novelty 6.0

Spectrum-adaptive post-hoc generalization bounds for multi-layer Transformers are derived using layerwise Schatten quantities whose indices are chosen after training based on singular-value profiles.

Understanding the Mechanism of Altruism in Large Language Models

econ.GN · 2026-04-21 · unverdicted · novelty 6.0

A small set of sparse autoencoder features in LLMs drives shifts between generous and selfish allocations in dictator games, with causal patching and steering confirming their role and generalization to other social games.

Scaling Diffusion Language Models via Adaptation from Autoregressive Models

cs.CL · 2024-10-23 · conditional · novelty 6.0

Adapting autoregressive models via continual pre-training yields diffusion language models from 127M to 7B parameters that outperform prior diffusion models and compete with their autoregressive counterparts on language, reasoning, and commonsense benchmarks.

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

cs.CL · 2022-11-09 · unverdicted · novelty 6.0

BLOOM is a 176B-parameter open-access multilingual language model trained on the ROOTS corpus that achieves competitive performance on benchmarks, with improved results after multitask prompted finetuning.

citing papers explorer

Showing 6 of 6 citing papers.

Characterizing Learning in Deep Neural Networks using Tractable Algorithmic Complexity Analysis cs.LG · 2026-05-15 · unverdicted · none · ref 118
QuBD extends algorithmic complexity estimation to quantized DNN weights, revealing that complexity decreases during learning, increases with overfitting, follows grokking patterns, and correlates with generalization.
Generating Pretraining Tokens from Organic Data for Data-Bound Scaling cs.CL · 2026-05-18 · unverdicted · none · ref 15
SynPro uses RL-optimized rephrasing and reformatting of organic data to generate synthetic pretraining tokens that deliver 3.7-5.2x the effective learning of simple repetition and can exceed training on unique data at 1.1B scale.
Spectrum-Adaptive Generalization Bounds for Trained Deep Transformers stat.ML · 2026-05-08 · unverdicted · none · ref 7
Spectrum-adaptive post-hoc generalization bounds for multi-layer Transformers are derived using layerwise Schatten quantities whose indices are chosen after training based on singular-value profiles.
Understanding the Mechanism of Altruism in Large Language Models econ.GN · 2026-04-21 · unverdicted · none · ref 139
A small set of sparse autoencoder features in LLMs drives shifts between generous and selfish allocations in dictator games, with causal patching and steering confirming their role and generalization to other social games.
Scaling Diffusion Language Models via Adaptation from Autoregressive Models cs.CL · 2024-10-23 · conditional · none · ref 84
Adapting autoregressive models via continual pre-training yields diffusion language models from 127M to 7B parameters that outperform prior diffusion models and compete with their autoregressive counterparts on language, reasoning, and commonsense benchmarks.
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model cs.CL · 2022-11-09 · unverdicted · none · ref 45
BLOOM is a 176B-parameter open-access multilingual language model trained on the ROOTS corpus that achieves competitive performance on benchmarks, with improved results after multitask prompted finetuning.

Title resolution pending

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer