Faith and fate: Limits of transformers on compositionality (2023). arXiv preprint arXiv:2305.18654

9 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Proper Scoring Rules for Agentic Uncertainty Quantification

cs.AI · 2026-05-23 · unverdicted · novelty 7.0

Introduces Trajectory Proper Score (TPS) as a strictly proper family of trajectory-level scoring rules that elicits the complete prefix-conditioned success probability process.

Training Transformers as a Universal Computer

cs.AI · 2026-04-28 · unverdicted · novelty 7.0

A transformer trained on random meaningless MicroPy programs generalizes to execute diverse human-written programs, providing empirical evidence it can act as a universal computer.

TSVer: A Benchmark for Fact Verification Against Time-Series Evidence

cs.CL · 2025-11-02 · unverdicted · novelty 7.0

TSVer is a new benchmark dataset for fact verification against time-series evidence, with 304 annotated real-world claims, 400 time series, verdicts, and justifications, plus baseline results showing current models struggle.

Arithmetic Pedagogy for Language Models

cs.CL · 2026-06-03 · unverdicted · novelty 6.0

A small GPT-2 model trained from scratch on GASING-derived CoT supervision for arithmetic reaches over 80% held-out accuracy, exhibits three learning phases, and develops both procedural and associative reasoning.

When Should Users Check? Modeling Confirmation Frequency in Multi-Step Agentic AI Tasks

cs.HC · 2025-10-06 · conditional · novelty 6.0

A decision-theoretic model based on the observed Confirmation-Diagnosis-Correction-Redo user pattern places intermediate confirmations in AI agent tasks, yielding 81% user preference and 13.54% faster completion versus confirm-at-end.

How Do Language Models Compose Functions?

cs.CL · 2025-10-02 · conditional · novelty 6.0

LLMs solve compositional factual recall either by computing intermediates or directly, with mechanism choice correlated to translation geometry in embedding spaces.

LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code

cs.SE · 2024-03-12 · unverdicted · novelty 6.0

LiveCodeBench collects 400 recent contest problems to create a contamination-free benchmark evaluating LLMs on code generation and related capabilities like self-repair and execution.

Handling Feature Heterogeneity with Learnable Graph Patches

cs.LG · 2026-06-16 · unverdicted · novelty 5.0

Learnable graph patches enable domain-agnostic pre-training of graph models by decomposing heterogeneous graphs into transferable semantic units via patch encoders and aggregators.

Beyond Exponential Decay: Rethinking Error Accumulation in Large Language Models

cs.CL · 2025-05-30 · unverdicted · novelty 5.0

LLM errors concentrate in sparse key tokens (5-10% of sequence) at semantic decision junctions, yielding a new reliability model that explains sustained long-context coherence.

citing papers explorer

Showing 7 of 7 citing papers after filters.

Proper Scoring Rules for Agentic Uncertainty Quantification · cs.AI · 2026-05-23 · unverdicted · none · ref 27
Introduces Trajectory Proper Score (TPS) as a strictly proper family of trajectory-level scoring rules that elicits the complete prefix-conditioned success probability process.
Training Transformers as a Universal Computer · cs.AI · 2026-04-28 · unverdicted · none · ref 4
A transformer trained on random meaningless MicroPy programs generalizes to execute diverse human-written programs, providing empirical evidence it can act as a universal computer.
TSVer: A Benchmark for Fact Verification Against Time-Series Evidence · cs.CL · 2025-11-02 · unverdicted · none · ref 23
TSVer is a new benchmark dataset for fact verification against time-series evidence, with 304 annotated real-world claims, 400 time series, verdicts, and justifications, plus baseline results showing current models struggle.
Arithmetic Pedagogy for Language Models · cs.CL · 2026-06-03 · unverdicted · none · ref 5
A small GPT-2 model trained from scratch on GASING-derived CoT supervision for arithmetic reaches over 80% held-out accuracy, exhibits three learning phases, and develops both procedural and associative reasoning.
LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code · cs.SE · 2024-03-12 · unverdicted · none · ref 95
LiveCodeBench collects 400 recent contest problems to create a contamination-free benchmark evaluating LLMs on code generation and related capabilities like self-repair and execution.
Handling Feature Heterogeneity with Learnable Graph Patches · cs.LG · 2026-06-16 · unverdicted · none · ref 9
Learnable graph patches enable domain-agnostic pre-training of graph models by decomposing heterogeneous graphs into transferable semantic units via patch encoders and aggregators.
Beyond Exponential Decay: Rethinking Error Accumulation in Large Language Models · cs.CL · 2025-05-30 · unverdicted · none · ref 2
LLM errors concentrate in sparse key tokens (5-10% of sequence) at semantic decision junctions, yielding a new reliability model that explains sustained long-context coherence.
