Compressed Chain of Thought: Efficient Reasoning Through Dense Representations
Pith reviewed 2026-05-17 04:44 UTC · model grok-4.3
The pith
Language models can reason more accurately by generating compressed continuous tokens that stand in for full reasoning chains.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Compressed Chain-of-Thought generates contentful and continuous contemplation tokens of variable sequence length that serve as compressed representations of explicit reasoning chains; when decoder language models perform additional reasoning over these dense representations, accuracy improves, and the amount of improvement can be controlled on demand simply by varying the number of contemplation tokens produced.
What carries the argument
The generation of variable-length continuous contemplation tokens as compressed representations of explicit reasoning chains; these tokens allow extra computation inside the model without producing discrete text output.
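An appendix excerpt captured on this page describes the feedback loop that produces these tokens: CCoT uses the hidden state at layer l and position i as the input embedding at the next position. The excerpt is cut off at "index i +", so the step to i + 1 is an assumption here. The NumPy sketch below illustrates only that loop under stated assumptions. The toy decoder (mean-pooling in place of causal attention, random tanh layers, no training) and the names `prefix_states` and `generate_contemplation_tokens` are hypothetical. This is not the authors' implementation.

```python
import numpy as np


def prefix_states(seq: np.ndarray, weights: list[np.ndarray]) -> list[np.ndarray]:
    """Hidden states for the last position of `seq`, one entry per layer.

    Toy stand-in for a decoder (assumption): causal attention is replaced by
    the mean of all input embeddings so far, followed by a stack of tanh
    layers. states[0] is the pooled input; states[j] is layer j's output.
    """
    h = seq.mean(axis=0)
    states = [h]
    for w in weights:
        h = np.tanh(h @ w)
        states.append(h)
    return states


def generate_contemplation_tokens(prompt: np.ndarray, weights: list[np.ndarray],
                                  num_tokens: int, layer: int) -> np.ndarray:
    """Autoregressively emit `num_tokens` continuous contemplation tokens.

    At each step the hidden state of `layer` at the current position is
    appended as the input embedding of the next position; nothing is
    projected onto, or sampled from, a discrete vocabulary.
    """
    if not 1 <= layer <= len(weights):
        raise ValueError("layer must index one of the model's layers")
    seq = prompt
    for _ in range(num_tokens):
        next_embedding = prefix_states(seq, weights)[layer]
        seq = np.vstack([seq, next_embedding])  # 1-D row is promoted to (1, d)
    return seq[len(prompt):]  # keep only the generated contemplation tokens


rng = np.random.default_rng(0)
d, n_layers = 8, 4
weights = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(n_layers)]
prompt = rng.standard_normal((5, d))  # five prompt-token embeddings
tokens = generate_contemplation_tokens(prompt, weights, num_tokens=3, layer=2)
print(tokens.shape)  # (3, 8)
```

Exposing `layer` as a parameter mirrors the appendix excerpt, which reports GSM8K accuracy while varying the autoregressive layer l. In a real decoder the fed-back vectors would enter through the model's input-embedding path instead of a pooled mean.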
Load-bearing premise
The continuous contemplation tokens actually encode and preserve the semantic content of explicit reasoning chains instead of acting mainly as extra learned parameters whose benefit is unrelated to interpretable reasoning.
What would settle it
An experiment that replaces the generated contemplation tokens at inference time with random vectors of the same length and dimensionality. If the accuracy gains persist, the tokens are not carrying semantic reasoning content; if the gains disappear, the specific learned values, rather than the extra positions alone, drive the improvement.
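A minimal sketch of the random-vector ablation described above, assuming the contemplation tokens for each example are available as a (k, d) array. To isolate content from scale, the sketch matches each random vector's L2 norm to the learned token it replaces, which is a design choice not stated in the source. `answer_fn`, `norm_matched_random` and `random_token_ablation` are hypothetical names, and the toy "model" at the bottom exists only to exercise the harness.

```python
import numpy as np


def norm_matched_random(tokens: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Random Gaussian directions rescaled to each token's original L2 norm."""
    noise = rng.standard_normal(tokens.shape)
    noise /= np.linalg.norm(noise, axis=-1, keepdims=True)
    return noise * np.linalg.norm(tokens, axis=-1, keepdims=True)


def random_token_ablation(examples, answer_fn, rng):
    """Return (accuracy with learned tokens, accuracy with random tokens).

    examples:  list of (contemplation_tokens, gold_answer) pairs, where
               contemplation_tokens is a (k, d) array.
    answer_fn: hypothetical hook that decodes an answer conditioned on a
               (k, d) array of contemplation tokens.
    """
    learned_hits = ablated_hits = 0
    for tokens, gold in examples:
        learned_hits += int(answer_fn(tokens) == gold)
        ablated_hits += int(answer_fn(norm_matched_random(tokens, rng)) == gold)
    n = len(examples)
    return learned_hits / n, ablated_hits / n


# Toy check: a "model" that answers 1 iff the first coordinate of the first
# contemplation token is positive, paired with learned tokens that are all positive.
def toy_answer(tokens: np.ndarray) -> int:
    return int(tokens[0, 0] > 0)


rng = np.random.default_rng(0)
examples = [(np.abs(rng.standard_normal((4, 16))), 1) for _ in range(200)]
learned_acc, random_acc = random_token_ablation(examples, toy_answer, rng)
print(learned_acc, random_acc)  # 1.0 and a value near 0.5
```

Reading the result: a gap near zero means the specific learned values are not what drives accuracy. A large gap is evidence that they matter, though it does not by itself show that the content is interpretable reasoning.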
Original abstract
Chain-of-thought (CoT) decoding enables language models to improve reasoning performance at the cost of high generation latency in decoding. Recent proposals have explored variants of contemplation tokens, a term we introduce that refers to special tokens used during inference to allow for extra computation. Prior work has considered fixed-length sequences drawn from a discrete set of embeddings as contemplation tokens. Here we propose Compressed Chain-of-Thought (CCoT), a framework to generate contentful and continuous contemplation tokens of variable sequence length. The generated contemplation tokens are compressed representations of explicit reasoning chains, and our method can be applied to off-the-shelf decoder language models. Through experiments, we illustrate how CCoT enables additional reasoning over dense contentful representations to achieve corresponding improvements in accuracy. Moreover, the reasoning improvements can be adaptively modified on demand by controlling the number of contemplation tokens generated.
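The abstract says the improvement can be tuned by controlling how many contemplation tokens are generated, and an appendix excerpt on this page reports GSM8K runs at a compression ratio of r = 0.05. As a sketch, assume the token count is k = ceil(r · m) for an explicit chain of m tokens, with at least one token. This mapping and the helper name `num_contemplation_tokens` are assumptions, not the paper's stated formula.

```python
import math


def num_contemplation_tokens(chain_length: int, ratio: float) -> int:
    """Contemplation tokens for an explicit chain of `chain_length` tokens.

    Assumed mapping: k = ceil(ratio * chain_length), never fewer than one.
    Rounding first guards against float noise (e.g. 0.1 * 3 == 0.30000000000000004).
    """
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must lie in (0, 1]")
    return max(1, math.ceil(round(ratio * chain_length, 9)))


for r in (0.05, 0.10, 0.25):
    print(r, num_contemplation_tokens(200, r))
```

Under this assumption, a 200-token reasoning chain maps to 10, 20 and 50 contemplation tokens. Raising r therefore buys more latent computation in exchange for longer generation, which is the on-demand control the abstract describes.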
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes Compressed Chain-of-Thought (CCoT), a framework for generating variable-length continuous contemplation tokens as compressed representations of explicit reasoning chains. These tokens are intended to enable additional reasoning over dense contentful representations in off-the-shelf decoder language models, yielding accuracy gains that can be adaptively controlled by varying the number of tokens generated.
Significance. If the central claims are substantiated, the work could advance efficient inference-time reasoning in language models by replacing explicit discrete chains with dense continuous representations, offering potential latency benefits and adaptive control. The extension from fixed discrete contemplation tokens to continuous variable-length ones is a clear technical step. However, the significance depends heavily on evidence that the tokens preserve specific semantic content from reasoning steps rather than providing generic extra computation.
major comments (2)
- [Abstract] The claim that experiments demonstrate accuracy improvements from 'additional reasoning over dense contentful representations' is not supported by any quantitative results, baselines, error bars, or details on how the continuous tokens are trained and decoded; without these, the central empirical claim cannot be evaluated.
- [Abstract / Methods] The framing that contemplation tokens are 'compressed representations of explicit reasoning chains' and 'contentful' requires load-bearing evidence such as ablations separating semantic content from token count/length effects or probing/reconstruction experiments linking individual tokens to specific reasoning steps; absent this, gains are consistent with standard inference compute scaling.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive comments. We address each major comment below and commit to revisions that strengthen the presentation of our empirical results and the evidence for the semantic content of the contemplation tokens.
Point-by-point responses
-
Referee: [Abstract] The claim that experiments demonstrate accuracy improvements from 'additional reasoning over dense contentful representations' is not supported by any quantitative results, baselines, error bars, or details on how the continuous tokens are trained and decoded; without these, the central empirical claim cannot be evaluated.
Authors: We agree that the abstract is too high-level. The full manuscript reports concrete results on standard reasoning benchmarks (including accuracy deltas versus CoT and other baselines, with standard error bars across multiple random seeds) and describes the training objective and decoding procedure for the continuous tokens. We will revise the abstract to include a concise summary of these quantitative findings and methodological details so that the central claim can be evaluated directly from the abstract. Revision: yes.
-
Referee: [Abstract / Methods] The framing that contemplation tokens are 'compressed representations of explicit reasoning chains' and 'contentful' requires load-bearing evidence such as ablations separating semantic content from token count/length effects or probing/reconstruction experiments linking individual tokens to specific reasoning steps; absent this, gains are consistent with standard inference compute scaling.
Authors: We acknowledge that the current manuscript does not contain explicit probing or reconstruction experiments that directly map individual tokens to specific reasoning steps. However, our experiments already include controls that vary the number of contemplation tokens while holding total inference compute roughly constant and compare against both standard CoT and fixed-length discrete contemplation baselines; the observed accuracy gains exceed what is explained by additional compute alone. To address the referee's concern more directly, we will add an ablation that replaces the learned continuous tokens with random vectors of identical length and dimensionality, thereby isolating semantic content from mere length effects. Revision: partial.
Circularity Check
No significant circularity: empirical training procedure with no derivation chain
full rationale
The paper describes CCoT as a framework for generating variable-length continuous contemplation tokens from off-the-shelf decoder LMs, with the claim that these tokens serve as compressed representations of explicit reasoning chains. No equations, first-principles derivations, or closed-form predictions appear in the provided abstract or description. The approach is presented as a training/inference procedure whose benefits are illustrated through experiments on accuracy improvements, rather than any mathematical reduction that equates outputs to inputs by construction. Self-citations or ansatzes are not load-bearing in the given material, and the central claim does not reduce to a fitted parameter renamed as a prediction.
Axiom & Free-Parameter Ledger
Forward citations
Cited by 49 Pith papers
-
Learning through Internalization
A simplified one-layer transformer provably learns parities first with explicit CoT supervision, then internalizes to direct computation as CoT tokens are removed.
-
DeepLatent: Think with Images via Parallel Latent Visual Reasoning
DeepLatent introduces a parallel latent visual reasoning framework with learnable 2D tokens and continuous RL, trained via distillation then RL, plus a new 180K dataset, claiming SOTA benchmark results.
-
Unlocking the Working Memory of Large Language Models for Latent Reasoning
RiM trains LLMs to perform latent reasoning via fixed memory blocks processed in one forward pass using a two-stage curriculum, matching or exceeding prior latent methods on benchmarks.
-
Training-Free Looped Transformers
Training-free looped transformers retrofit recurrence to frozen models via damped ODE sub-steps on mid-stack blocks, yielding gains such as +2.64 pp on MMLU-Pro for Qwen3-4B.
-
On the Cost and Benefit of Chain of Thought: A Learning-Theoretic Perspective
Chain of Thought risk decomposes into oracle-trajectory benefit and trajectory-mismatch cost, with stability determining bounded, linear, or exponential error growth.
-
LatentRAG: Latent Reasoning and Retrieval for Efficient Agentic RAG
LatentRAG performs agentic RAG by generating latent tokens for thoughts and subqueries in one forward pass, matching explicit methods' accuracy on seven benchmarks while reducing latency by ~90%.
-
Post Reasoning: Improving the Performance of Non-Thinking Models at No Cost
Post-Reasoning boosts LLM accuracy by reversing the usual answer-after-reasoning order, delivering mean relative gains of 17.37% across 117 model-benchmark pairs with zero extra cost.
-
Thinking Without Words: Efficient Latent Reasoning with Abstract Chain-of-Thought
Abstract-CoT lets models reason with short discrete latent token sequences from a reserved vocabulary, using warm-up training and RL to match verbal CoT performance with up to 11.6x fewer tokens.
-
V-Reflection: Transforming MLLMs from Passive Observers to Active Interrogators
V-Reflection introduces a think-then-look mechanism where MLLM latent states actively interrogate visual features via two-stage distillation from a box-guided teacher to a dynamic autoregressive student, narrowing the...
-
Coevolutionary Continuous Discrete Diffusion: Make Your Diffusion Language Model a Latent Reasoner
CCDD defines a joint multimodal diffusion on continuous representation space and discrete token space to combine expressivity with explicit token supervision for diffusion language models.
-
Latent Visual Reasoning
Latent Visual Reasoning enables autoregressive generation of latent visual states that reconstruct critical image tokens, yielding gains on perception-heavy VQA benchmarks such as 71.67% on MMVP.
-
CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation
CODI compresses explicit CoT into continuous space via self-distillation and is the first implicit method to match explicit CoT performance on GSM8k at GPT-2 scale with 3.1x compression and 28.2% higher accuracy than ...
-
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
A recurrent-depth architecture enables language models to improve reasoning performance by iterating computation in latent space, achieving gains equivalent to much larger models on benchmarks.
-
CoLT: Teaching Multi-Modal Models to Think with Chain of Latent Thoughts
CoLT replaces text-based chain-of-thought in MLLMs with 3-step latent thought chains supervised by a removable external decoder in forward and backward modes, yielding 10.1x faster inference on eight benchmarks.
-
Bridging the Gap Between Latent and Explicit Reasoning with Looped Transformers
LOTUS uses a looped padded Transformer with parallel cross-entropy supervision on gold CoT tokens to match explicit CoT performance at 3B parameters while reducing thought-phase latency 2.5x-6.9x.
-
VisReflect: Latent Visual Reflection for Fine-Grained Perception in Long Visual Context
VisReflect generates continuous latent visual reflections to emphasize relevant visual features and guide attention in LVLMs, yielding 4.1% gains on image benchmarks and 1.8% on video benchmarks with 44% less inferenc...
-
When LLMs Develop Languages: Symbolic Communication for Efficient Multi-Agent Reasoning
CLSR lets LLM agents evolve and route symbolic languages that reduce generated tokens by 3-6x versus chain-of-thought while keeping accuracy on benchmarks.
-
Dynamic Rollout Editing for Reducing Overthinking in RL-Trained Reasoning Models
Dynamic Rollout Editing reduces overthinking in RL-trained LLMs by editing post-answer continuations in successful rollouts and preferring the edited versions within GRPO groups.
-
Dropout-GRPO: Variational Stochasticity for Continuous Latent Reasoning
Dropout-GRPO uses structured dropout to generate trajectory variance for GRPO in latent-reasoning models like Coconut, raising GSM8K pass@1 from 27.29% to 29.01%.
-
Think Fast: Estimating No-CoT Task-Completion Time Horizons of Frontier AI Models
No-CoT 50% task-completion time horizons for frontier models have doubled yearly for six years, reaching over 3 minutes for GPT-5.5, with median projections of 7 minutes by 2028 and 25 minutes by 2030.
-
MPCoT: Reward-Guided Multi-Path Latent Reasoning for Test-Time Scalable Vision-Language-Action
MPCoT improves long-horizon VLA performance on LIBERO and CALVIN by initializing M latent hypotheses, refining them over K steps, and aggregating via a reward-trained path scorer while preserving the original 8-step a...
-
Adaptive Latent Agentic Reasoning
ALAR trains LLM agents to perform most reasoning in a latent space supervised by actions and escalates to explicit CoT only when needed, cutting tokens by up to 84.6% while preserving accuracy on search and tool-use b...
-
Spectral-Progressive Thought Flow for Lightweight Multimodal Reasoning
SpecFlow represents intermediate visual thoughts in fixed-size DCT space and uses classifier-free guidance to steer updates from textual thoughts, achieving up to 2.1x lower computation and KV cache costs.
-
ThinkSwitch: Context Distillation with LoRA and Weight Interpolation for Specific-Purpose Reasoning Tasks
ThinkSwitch uses iterative self-distillation with QLoRA and spherical weight interpolation to raise both instruct and thinking checkpoint accuracy on small AIME and PubMedQA sets using only 15 human prompts per domain.
-
Out of Sight, Not Out of Mind: Unveiling Latent Attack in Latent-based Multi-Agent Systems
Latent interventions can reactivate attack effects in clean executions of latent-based multi-agent systems, degrading performance especially via inter-agent KV-cache handoffs.
-
CoWorld-VLA: Thinking in a Multi-Expert World Model for Autonomous Driving
CoWorld-VLA extracts semantic, geometric, dynamic, and trajectory expert tokens from multi-source supervision and feeds them into a diffusion-based hierarchical planner, achieving competitive collision avoidance and t...
-
CoWorld-VLA: Thinking in a Multi-Expert World Model for Autonomous Driving
CoWorld-VLA encodes world information into four expert tokens that condition a diffusion-based planner, yielding competitive collision avoidance and trajectory accuracy on the NAVSIM benchmark.
-
HypEHR: Hyperbolic Modeling of Electronic Health Records for Efficient Question Answering
HypEHR is a hyperbolic embedding model for EHR data that uses Lorentzian geometry and hierarchy-aware pretraining to answer clinical questions nearly as well as large language models but with much smaller size.
-
Xiaomi OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation
OneVL achieves superior accuracy to explicit chain-of-thought reasoning at answer-only latency by supervising latent tokens with a visual world model decoder that predicts future frames.
-
Xiaomi OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation
OneVL is the first latent CoT method to exceed explicit CoT accuracy on four driving benchmarks while running at answer-only speed, by supervising latent tokens with a visual world model decoder.
-
LEPO: Latent Reasoning Policy Optimization for Large Language Models
LEPO applies RL to stochastic latent representations in LLMs via Gumbel-Softmax to support diverse reasoning paths and unified optimization.
-
Visual Enhanced Depth Scaling for Multimodal Latent Reasoning
Visual replay module and adaptive depth scaling improve multimodal latent reasoning, reaching SOTA benchmarks with faster inference than explicit chain-of-thought methods.
-
SeLaR: Selective Latent Reasoning in Large Language Models
SeLaR selectively applies latent soft reasoning in LLMs via entropy gating and contrastive regularization, outperforming standard CoT on five benchmarks without training.
-
LightThinker++: From Reasoning Compression to Memory Management
LightThinker++ adds explicit adaptive memory management and a trajectory synthesis pipeline to LLM reasoning, cutting peak token use by ~70% while gaining accuracy in standard and long-horizon agent tasks.
-
DiscoLoop: Looping Discrete Embeddings and Continuous Hidden States for Multi-hop Reasoning
DiscoLoop adds a discrete embedding channel to looped transformers to fix representational misalignment in two-hop reasoning, yielding near-perfect accuracy on synthetic tasks and better pretraining loss on real data.
-
Think Fast: Estimating No-CoT Task-Completion Time Horizons of Frontier AI Models
Frontier AI models' no-CoT 50% task-completion time horizons have doubled yearly over six years, reaching over 3 minutes for GPT-5.5 with projections to 25 minutes by 2030.
-
Characterize Then Distill: Mechanistic Reasoning in Large Output Spaces
Reasoning in large output spaces proceeds via shortlisting then fine-grained reasoning; this characterization enables a mechanistic distillation strategy that outperforms standard distillation.
-
Thinking Economically: A Hierarchical Framework for Adaptive-Complexity Reasoning in LLMs
HAB applies coarse-to-fine budgeting to LLM reasoning, predicting per-problem depth and learning intra-step token budgets via PPL comparisons and adaptive Pareto optimization, yielding higher accuracy and lower token ...
-
Generative Spatiotemporal Intent Sequence Recommendation via Implicit Reasoning in Amap
GPlan compresses LLM reasoning into small models via Progressive Implicit CoT Distillation and Spatiotemporal Counterfactual DPO to generate logically coherent and physically executable intent sequences for recommendation.
-
Stabilizing Recurrent Dynamics for Test-Time Scalable Latent Reasoning in Looped Language Models
STARS trains looped language models with Jacobian spectral radius regularization and random loop sampling to drive latent states toward asymptotically stable fixed points, yielding reliable test-time scaling on arithm...
-
LEPO: Latent Reasoning Policy Optimization for Large Language Models
LEPO applies RL to continuous latent representations in LLMs by injecting Gumbel-Softmax stochasticity for diverse trajectory sampling and unified gradient estimation, outperforming existing discrete and latent RL methods.
-
Visual Enhanced Depth Scaling for Multimodal Latent Reasoning
Visual replay and depth scaling in latent reasoning produce state-of-the-art multimodal results with faster inference than explicit CoT.
-
Visual Enhanced Depth Scaling for Multimodal Latent Reasoning
A visual replay module combined with adaptive depth scaling improves multimodal latent reasoning, delivering state-of-the-art benchmark results and faster inference than explicit chain-of-thought methods.
-
MedLVR: Latent Visual Reasoning for Reliable Medical Visual Question Answering
MedLVR interleaves latent visual reasoning segments in autoregressive decoding and uses two-stage training to raise average medical VQA accuracy from 48.3% to 53.4% over a Qwen2.5-VL-7B backbone on OmniMedVQA and five...
-
ConFu: Contemplate the Future for Better Speculative Sampling
ConFu boosts speculative decoding acceptance rates 8-20% over EAGLE-3 by letting draft models use contemplate tokens and MoE to anticipate future generation direction.
-
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models
A survey organizing techniques to achieve efficient reasoning in LLMs by shortening chain-of-thought outputs.
-
Efficient Reasoning with Hidden Thinking
Heima compresses verbose CoT into hidden thinking tokens via information-theoretic analysis and an adaptive interpreter, claiming maintained or improved zero-shot accuracy on reasoning benchmarks.
-
The Periodic Table of LLM Reasoning: A Structured Survey of Reasoning Paradigms, Methods, and Failure Modes
A literature survey that introduces a taxonomy for LLM reasoning paradigms, analyzes methodological trends, and synthesizes failure modes from over 300 papers.
-
Token Economics for LLM Agents: A Dual-View Study from Computing and Economics
The paper delivers a unified survey of token economics for LLM agents, conceptualizing tokens as production factors, exchange mediums, and units of account across micro, meso, macro, and security dimensions using esta...
Reference graph
Works this paper leans on
-
[1]
arXiv:2006.11527.
-
[2]
Cobbe, K., Kosaraju, V., Bavarian, M., Chen, M., Jun, H., Kaiser, L., Plappert, M., Tworek, J., Hilton, J., Nakano, R., Hesse, C., and Schulman, J. Training verifiers to solve math word problems. arXiv:2110.14168.
-
[3]
Deng, Y., Prasad, K., Fernandez, R., Smolensky, P., Chaudhary, V., and Shieber, S. Implicit chain of thought reasoning via knowledge distillation. arXiv:2311.01460, 2023.
-
[4]
Deng, Y., Choi, Y., and Shieber, S. From explicit CoT to implicit CoT: Learning to internalize CoT step by step. arXiv:2405.14838.
-
[5]
Ge, T., Hu, J., Wang, L., Wang, X., Chen, S.-Q., and Wei, F. In-context autoencoder for context compression in a large language model. arXiv:2307.06945.
-
[6]
Goyal, S., Ji, Z., Rawat, A. S., Menon, A. K., Kumar, S., and Nagarajan, V. Think before you speak: Training language models with pause tokens. arXiv:2310.02226.
-
[7]
Hao, S., Sukhbaatar, S., Su, D., Li, X., Hu, Z., Weston, J., and Tian, Y. Training large language models to reason in a continuous latent space. arXiv:2412.06769.
-
[8]
Herel, D. and Mikolov, T. Thinking tokens for language modeling.
-
[9]
LoRA: Low-rank adaptation of large language models. arXiv:2106.09685.
-
[10]
Jiang, H., Wu, Q., Lin, C.-Y., Yang, Y., and Qiu, L. LLMLingua: Compressing prompts for accelerated inference of large language models. arXiv:2310.05736.
-
[11]
Kojima, T., Gu, S. S., Reid, M., Matsuo, Y., and Iwasawa, Y. Large language models are zero-shot reasoners. arXiv:2205.11916.
-
[12]
Kou, S., Hu, L., He, Z., Deng, Z., and Zhang, H. CLLMs: Consistency large language models. arXiv:2403.00835.
-
[13]
Lanham, T., Chen, A., Radhakrishnan, A., Steiner, B., Denison, C., Hernandez, D., Li, D., Durmus, E., Hubinger, E., Kernion, J., Lukošiūtė, K., Nguyen, K., Cheng, N., Joseph, N., Schiefer, N., Rausch, O., Larson, R., McCandlish, S., Kundu, S., Kadavath, S., Yang, S., Henighan, T., et al. Measuring faithfulness in chain-of-thought reasoning. arXiv:2307.13702.
-
[14]
Liu, H., Sferrazza, C., and Abbeel, P. Chain of hindsight aligns language models with feedback. arXiv:2302.02676.
-
[15]
Liu, Y., Li, H., Cheng, Y., Ray, S., Huang, Y., Zhang, Q., Du, K., Yao, J., Lu, S., Ananthanarayanan, G., Maire, M., Hoffmann, H., Holtzman, A., and Jiang, J. CacheGen: KV cache compression and streaming for fast large language model serving. In Proceedings of the ACM SIGCOMM 2024 Conference, SIGCOMM '24, pp. 3... Association for Computing Machinery. ISBN 9798400706141. doi:10.1145/3651890.3672274.
-
[16]
Ning, X., Lin, Z., Zhou, Z., Wang, Z., Yang, H., and Wang, Y. Skeleton-of-thought: Prompting LLMs for efficient parallel generation. arXiv:2307.15337.
-
[17]
Pfau, J., Merrill, W., and Bowman, S. R. Let's think dot by dot: Hidden computation in transformer language models.
-
[18]
Llama 2: Open foundation and fine-tuned chat models. arXiv:2307.09288.
-
[19]
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., and Polosukhin, I. Attention is all you need. arXiv:1706.03762.
-
[20]
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., and Zhou, D. Chain-of-thought prompting elicits reasoning in large language models. arXiv:2201.11903.
-
[21]
Yao, S., Yu, D., Zhao, J., Shafran, I., Griffiths, T. L., Cao, Y., and Narasimhan, K. Tree of thoughts: Deliberate problem solving with large language models. arXiv:2305.10601.
-
[22]
Zhang, H., Liu, Z., Zhao, Y., Zheng, J., Zhuang, C., Gu, J., and Chen, G. Fast chain-of-thought: A glance of future from parallel decoding leads to answers faster. arXiv:2311.08263.
-
A. Varying the autoregressive layer. Our method CCoT autoregressively generates contemplation tokens by using the hidden state at the l-th layer at index i as the input embedding at index i + ...
-
Accuracy on GSM8K with our method CCoT with a compression ratio of r = 0.05 when varying the autoregressive layer l. NONE refers to the baseline where no contemplation tokens are decoded during inference.
B. Further Theoretical Considerations. In this section, we formalize the two insights outlined in Section 6.2. We note that an analysis of the enhanced...
-
We formally introduce the new class of problems and outline the assumptions made by Goyal et al. (2024) below. Assumption B.1 (structure of underlying task). Assume a vocabulary V and an embedding dimension of d. Let ◦ be a generic 2-ary operator on the embedding space R^d. For a given input length N, define the class of functions F_{M,K} to be the set of all ...