Chain-of-thought prompting elicits reasoning in large language models
2 Pith papers cite this work. Polarity classification is still indexing.
Citing years: 2026 · Verdicts: 2 (UNVERDICTED) · Representative citing papers: 2
Citing papers
- TeleResilienceBench: Quantifying Resilience for LLM Reasoning in Telecommunications
  TeleResilienceBench quantifies LLM reasoning resilience in telecom by measuring recovery from mid-trace errors, finding low success rates (at most 29.1% CFR) and that model scale does not reliably improve performance.
- VeriInteresting: An Empirical Study of Model Prompt Interactions in Verilog Code Generation
  An empirical study that identifies patterns in how model classes respond to structured prompts, optimization, and other techniques across two Verilog benchmarks.