Pith Number

pith:RTJFRTJV

pith:2026:RTJFRTJVNTUIQRGSYKVKHLYTSH
not attested · not anchored · not stored · refs resolved

RTL-BenchMT: Dynamic Maintenance of RTL Generation Benchmark Through Agent-Assisted Analysis and Revision

Hangan Zhou, Jing Wang, Shang Liu, Zhiyao Xie

An agentic framework automatically identifies flawed RTL benchmark cases and detects overfitting to produce a refined suite.

arxiv:2605.15537 v1 · 2026-05-15 · cs.AI

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{RTJFRTJVNTUIQRGSYKVKHLYTSH}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
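
The record does not describe the merge algorithm itself, so the following is only a sketch of one common way to make a merge deterministic: deduplicate the events, sort them by a stable key, and fold them in that order. The event fields (`id`, `ts`, `field`, `value`) and the `merge_state` helper are hypothetical and are not taken from the Pith bundle format.

```python
from typing import Iterable


def merge_state(events: Iterable[dict]) -> dict:
    """Fold events into a state dict independently of their arrival order.

    Assumes (hypothetically) that each event carries a unique `id`, a
    sortable timestamp `ts`, and a `field`/`value` pair to apply.
    """
    # Deduplicate by id so that replayed events have no effect.
    unique = {e["id"]: e for e in events}
    state: dict = {}
    # (ts, id) is a total order, so every mirror applies events identically.
    for e in sorted(unique.values(), key=lambda e: (e["ts"], e["id"])):
        state[e["field"]] = e["value"]
    return state


# Two made-up events, delivered in different orders and with a duplicate.
a = {"id": "e1", "ts": "2026-05-20T00:01:04Z", "field": "status", "value": "draft"}
b = {"id": "e2", "ts": "2026-05-21T00:00:00Z", "field": "status", "value": "final"}
state_one = merge_state([a, b])
state_two = merge_state([b, a, a])
```

Both calls yield the same state because the fold order depends only on the events' own keys, never on the order in which a mirror received them.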

Claims

C1 strongest claim

With the assistance of RTL-BenchMT, we conduct a thorough, in-depth analysis of flawed and overfitting cases and produce a refined benchmark suite that will be open-sourced to the community.

C2 weakest assumption

That AI agents can reliably and accurately detect flawed benchmark cases and overfitting instances in RTL generation tasks without introducing new errors or requiring substantial human validation.

C3 one line summary

RTL-BenchMT is an agent-assisted framework for dynamically maintaining RTL generation benchmarks by fixing flaws and reducing overfitting in LLM-based EDA applications.

References

20 extracted · 20 resolved · 3 Pith anchors

[1] GPT-4 Technical Report 2023 · arXiv:2303.08774
[2] Mohammad Akyash, Kimia Azar, and Hadi Kamali. 2025. DecoRTL: A Run-time Decoding Framework for RTL Code Generation with LLMs. arXiv preprint arXiv:2507.02226 (2025)
[3] doi:10.48550/arXiv.2502.07445 arXiv:2502.07445 [cs] 2025
[4] OriGen: Enhancing RTL code generation with code-to-code augmentation and self-reflection 2024
[5] VerilogCoder: Autonomous Verilog coding agents with graph-based planning and abstract syntax tree (AST)-based waveform tracing tool 2024

Formal links

1 machine-checked theorem link

Receipt and verification
First computed 2026-05-20T00:01:04.131587Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

8cd258cd356ce88844d2c2aaa3af1391ebdf2c556c7903a4099e6ad791cca20a

Aliases

arxiv: 2605.15537 · arxiv_version: 2605.15537v1 · doi: 10.48550/arxiv.2605.15537 · pith_short_12: RTJFRTJVNTUI · pith_short_16: RTJFRTJVNTUIQRGS · pith_short_8: RTJFRTJV
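
For this record, the identifier and its short aliases can be reproduced from the canonical hash: the 26-character Pith Number matches the first 26 characters of the standard Base32 encoding of the hash bytes, and each `pith_short_N` alias is its first N characters. This is an observation about the values on this page, not a documented rule of the `pith-number/v1.0` schema, and the helper name `pith_from_hash` is hypothetical.

```python
import base64

# Canonical hash as listed on this record.
CANONICAL_HASH = "8cd258cd356ce88844d2c2aaa3af1391ebdf2c556c7903a4099e6ad791cca20a"


def pith_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the digest bytes (RFC 4648 alphabet) and keep a prefix.

    The 26-character length is inferred from this record only.
    """
    raw = bytes.fromhex(hex_digest)
    return base64.b32encode(raw).decode("ascii")[:length]


pith = pith_from_hash(CANONICAL_HASH)
# Short aliases are plain prefixes of the full identifier.
aliases = {f"pith_short_{n}": pith[:n] for n in (8, 12, 16)}

print(pith)
print(aliases)
```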
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/RTJFRTJVNTUIQRGSYKVKHLYTSH \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 8cd258cd356ce88844d2c2aaa3af1391ebdf2c556c7903a4099e6ad791cca20a
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "77448166df1ba81182aa49d40ee0d1fb855f31b15510469b6de6dfd93c09400b",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.AI",
    "submitted_at": "2026-05-15T02:17:46Z",
    "title_canon_sha256": "1f71be7c1cf100268d7be3729717ec7370b7cae987c2cd84a54e2bb44f5ee70c"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.15537",
    "kind": "arxiv",
    "version": 1
  }
}
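
The same canonicalization can be applied offline to the record printed above. The sketch below mirrors the one-liner (sorted keys, compact separators, UTF-8 encoding, SHA-256). It assumes the JSON shown here is exactly what the API returns under `canonical_record`, which this page does not guarantee, and the function name `canonical_sha256` is illustrative.

```python
import hashlib
import json

# Canonical record copied from this page.
RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "77448166df1ba81182aa49d40ee0d1fb855f31b15510469b6de6dfd93c09400b",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.AI",
    "submitted_at": "2026-05-15T02:17:46Z",
    "title_canon_sha256": "1f71be7c1cf100268d7be3729717ec7370b7cae987c2cd84a54e2bb44f5ee70c"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.15537",
    "kind": "arxiv",
    "version": 1
  }
}
"""


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and no whitespace, then hash the UTF-8 bytes."""
    blob = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


digest = canonical_sha256(json.loads(RECORD_JSON))
print(digest)  # compare against the canonical hash listed on this record
```

Because keys are sorted before hashing, the key order of the input does not affect the digest; only the values and structure do.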