arXiv preprint arXiv:2412.06512 (2024)

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

TraceFix: Repairing Agent Coordination Protocols with TLA+ Counterexamples

cs.AI · 2026-05-08 · conditional · novelty 7.0

TraceFix repairs LLM-generated multi-agent protocols using TLA+ counterexamples, achieving full verification on all tested tasks and higher completion rates than prompt-only baselines.

MANTRA: Synthesizing SMT-Validated Compliance Benchmarks for Tool-Using LLM Agents

cs.CL · 2026-05-07 · unverdicted · novelty 7.0

MANTRA automatically synthesizes SMT-validated compliance benchmarks for LLM agents from natural language manuals and tool schemas, producing 285 tasks across 6 domains with minimal human effort.

RACC: Representation-Aware Coverage Criteria for LLM Safety Testing

cs.SE · 2026-02-02 · unverdicted · novelty 7.0

RACC defines six representation-aware coverage criteria that score jailbreak test suites by measuring activation of safety concepts extracted from LLM hidden states on a calibration set.

ReGA: Model-Based Safeguard for LLMs via Representation-Guided Abstraction

cs.CR · 2025-06-02 · unverdicted · novelty 5.0

ReGA uses safety-critical representations to guide abstraction in model-based analysis, enabling scalable detection of harmful LLM inputs with a reported AUROC of 0.975 at the prompt level.

Syntax Is Easy, Semantics Is Hard: Evaluating LLMs for LTL Translation

cs.LO · 2026-04-08 · unverdicted · novelty 4.0

LLMs handle LTL syntax better than semantics, improve with detailed prompts, and perform substantially better when the task is reframed as Python code completion.

citing papers explorer

TraceFix: Repairing Agent Coordination Protocols with TLA+ Counterexamples

cs.AI · 2026-05-08 · conditional · none · ref 42

MANTRA: Synthesizing SMT-Validated Compliance Benchmarks for Tool-Using LLM Agents

cs.CL · 2026-05-07 · unverdicted · none · ref 46

RACC: Representation-Aware Coverage Criteria for LLM Safety Testing

cs.SE · 2026-02-02 · unverdicted · none · ref 63

ReGA: Model-Based Safeguard for LLMs via Representation-Guided Abstraction

cs.CR · 2025-06-02 · unverdicted · none · ref 10

Syntax Is Easy, Semantics Is Hard: Evaluating LLMs for LTL Translation

cs.LO · 2026-04-08 · unverdicted · none · ref 70
