Hopping too late: Exploring the limitations of large language models on multi-hop queries

Eden Biran, Daniela Gottesman, Sohee Yang, Mor Geva, Amir Globerson · 2024 · arXiv 2406.12775

7 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

dataset 1

citation-polarity summary

use dataset 1

representative citing papers

Operadic consistency: a label-free signal for compositional reasoning failures in LLMs

cs.CL · 2026-06-11 · unverdicted · novelty 7.0

Operadic consistency is a new per-question signal that correlates strongly with accuracy (r 0.86-0.94) across four multi-hop QA datasets and improves selective prediction over CoT-SC baselines.

Training Large Language Models to Reason in a Continuous Latent Space

cs.CL · 2024-12-09 · unverdicted · novelty 7.0

Coconut lets LLMs perform reasoning directly in continuous latent space by recycling hidden states as inputs, outperforming standard chain-of-thought on search-intensive logical tasks with better accuracy-efficiency trade-offs.

The Power of Power Law: Asymmetry Enables Compositional Reasoning

cs.AI · 2026-04-24 · unverdicted · novelty 6.0

Power-law data sampling creates beneficial asymmetry in the loss landscape that lets models acquire high-frequency skill compositions first, enabling more efficient learning of rare long-tail skills than uniform distributions.

How Do Language Models Compose Functions?

cs.CL · 2025-10-02 · conditional · novelty 6.0

LLMs solve compositional factual recall either by computing intermediates or directly, with mechanism choice correlated to translation geometry in embedding spaces.

DiscoLoop: Looping Discrete Embeddings and Continuous Hidden States for Multi-hop Reasoning

cs.CL · 2026-07-01 · unverdicted · novelty 5.0

DiscoLoop adds a discrete embedding channel to looped transformers to fix representational misalignment in two-hop reasoning, yielding near-perfect accuracy on synthetic tasks and better pretraining loss on real data.

NoisyCoconut: Counterfactual Consensus via Latent Space Reasoning

cs.LG · 2026-05-06 · unverdicted · novelty 5.0

Injecting noise into LLM latent trajectories creates diverse reasoning paths whose agreement acts as a confidence signal for selective abstention, cutting error rates from 40-70% to under 15% on math tasks.

Efficient Reasoning with Hidden Thinking

cs.CL · 2025-01-31 · unverdicted · novelty 5.0

Heima compresses verbose CoT into hidden thinking tokens via information-theoretic analysis and an adaptive interpreter, claiming maintained or improved zero-shot accuracy on reasoning benchmarks.

citing papers explorer

Showing 7 of 7 citing papers.

Operadic consistency: a label-free signal for compositional reasoning failures in LLMs · cs.CL · 2026-06-11 · unverdicted · none · ref 47
Training Large Language Models to Reason in a Continuous Latent Space · cs.CL · 2024-12-09 · unverdicted · none · ref 3
The Power of Power Law: Asymmetry Enables Compositional Reasoning · cs.AI · 2026-04-24 · unverdicted · none · ref 9
How Do Language Models Compose Functions? · cs.CL · 2025-10-02 · conditional · none · ref 2
DiscoLoop: Looping Discrete Embeddings and Continuous Hidden States for Multi-hop Reasoning · cs.CL · 2026-07-01 · unverdicted · none · ref 3
NoisyCoconut: Counterfactual Consensus via Latent Space Reasoning · cs.LG · 2026-05-06 · unverdicted · none · ref 44
Efficient Reasoning with Hidden Thinking · cs.CL · 2025-01-31 · unverdicted · none · ref 3
