Title resolution pending

Anthropic · 2025

6 Pith papers cite this work. Polarity classification is still indexing.


Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

SWE-Mutation: Can LLMs Generate Reliable Test Suites in Software Engineering?

cs.SE · 2026-05-21 · unverdicted · novelty 6.0

The SWE-Mutation benchmark shows that current LLMs achieve low verification (10.20%) and detection (36.15%) rates on 2,636 mutated variants, exposing weaknesses in generating reliable test suites.

From Documents to Segments: A Contextual Reformulation for Topic Assignment

cs.CL · 2026-05-18 · unverdicted · novelty 6.0

SBTA reformulates topic modeling to assign topics at the segment level rather than document level, yielding cleaner topics on a new SemEval-STM dataset created via LLM decomposition and human refinement.

What Happens Inside Agent Memory? Circuit Analysis from Emergence to Diagnosis

cs.AI · 2026-05-05 · unverdicted · novelty 6.0

In LLM agents, memory routing circuits emerge at 0.6B scale while content circuits appear only at 4B, and write/read operations recruit a pre-existing late-layer context hub instead of creating a new one, enabling a 76% accurate unsupervised failure diagnostic.

On Training Large Language Models for Long-Horizon Tasks: An Empirical Study of Horizon Length

cs.AI · 2026-05-04 · unverdicted · novelty 5.0

Longer action horizons bottleneck LLM agent training through instability, but training with reduced horizons stabilizes learning and enables better generalization to longer horizons.

Measuring AI Reasoning: A Guide for Researchers

cs.AI · 2026-05-04 · unverdicted · novelty 4.0

Reasoning in language models should be measured by the faithfulness and validity of their multi-step search processes and intermediate traces, not final-answer accuracy.

Human-Guided Harm Recovery for Computer Use Agents

cs.AI · 2026-04-20

citing papers explorer

Showing 6 of 6 citing papers.

SWE-Mutation: Can LLMs Generate Reliable Test Suites in Software Engineering? cs.SE · 2026-05-21 · unverdicted · none · ref 48
From Documents to Segments: A Contextual Reformulation for Topic Assignment cs.CL · 2026-05-18 · unverdicted · none · ref 35
What Happens Inside Agent Memory? Circuit Analysis from Emergence to Diagnosis cs.AI · 2026-05-05 · unverdicted · none · ref 29
On Training Large Language Models for Long-Horizon Tasks: An Empirical Study of Horizon Length cs.AI · 2026-05-04 · unverdicted · none · ref 33
Measuring AI Reasoning: A Guide for Researchers cs.AI · 2026-05-04 · unverdicted · none · ref 61
Human-Guided Harm Recovery for Computer Use Agents cs.AI · 2026-04-20 · unreviewed · ref 30