Pith Number

pith:QLSM76VM

pith:2026:QLSM76VMA5KPRGJZMTHSLJUUJB

not attested · not anchored · not stored · refs resolved

Dual Hierarchical Dialogue Policy Learning for Legal Inquisitive Conversational Agents

Grace Hui Yang, Shihao Wang, Xubo Lin, Yang Deng, Zezhii Deng

A dual hierarchical reinforcement learning method lets conversational agents proactively extract information by coordinating high-level strategy and low-level question generation in legal dialogues.

arxiv:2605.14057 v1 · 2026-05-13 · cs.CL


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{QLSM76VMA5KPRGJZMTHSLJUUJB}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

Evaluations on a U.S. Supreme Court dataset show that our method outperforms various baselines across multiple metrics.

C2 weakest assumption

The U.S. Supreme Court dataset captures representative judicial questioning patterns, and the dual RL agents can learn effective strategies aligned with legal objectives without additional human feedback or validation.

C3 one-line summary

A dual hierarchical RL framework lets agents learn when and how to ask probing questions in U.S. Supreme Court arguments, outperforming baselines on a court dataset.

References

206 extracted · 206 resolved · 21 Pith anchors

Receipt and verification

First computed	2026-05-17T23:39:12.573495Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`)
Schema	pith-number/v1.0

Canonical hash

82e4cffaac0754f8993964cf25a69448621e04f30cee1cd7c1e06e9d5f3ecc51

Aliases

arxiv: 2605.14057 · arxiv_version: 2605.14057v1 · doi: 10.48550/arxiv.2605.14057 · pith_short_12: QLSM76VMA5KP · pith_short_16: QLSM76VMA5KPRGJZ · pith_short_8: QLSM76VM
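
The short aliases listed above are plain prefixes of the full 26-character identifier. On this record the full identifier also matches the first 26 characters of the RFC 4648 base32 encoding of the canonical hash. The source does not state that this is how identifiers are defined, so treat the second check as an observation about this one record rather than a specification. The function names below are illustrative and not part of any Pith tooling.

```python
import base64

PITH_NUMBER = "QLSM76VMA5KPRGJZMTHSLJUUJB"
CANONICAL_HASH = "82e4cffaac0754f8993964cf25a69448621e04f30cee1cd7c1e06e9d5f3ecc51"


def short_aliases(pith_number: str, lengths=(8, 12, 16)) -> dict:
    """Build the pith_short_N aliases as prefixes of the full identifier."""
    return {f"pith_short_{n}": pith_number[:n] for n in lengths}


def base32_prefix(hex_digest: str, length: int = 26) -> str:
    """Base32-encode a hex digest and keep the first `length` characters.

    Assumption: truncation to 26 characters is inferred from this record only.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


print(short_aliases(PITH_NUMBER))
print(base32_prefix(CANONICAL_HASH) == PITH_NUMBER)
```

If the second line prints `False` for some other record, the base32 relationship simply does not hold in general and only the prefix rule should be relied on.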

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/QLSM76VMA5KPRGJZMTHSLJUUJB \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 82e4cffaac0754f8993964cf25a69448621e04f30cee1cd7c1e06e9d5f3ecc51
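
The canonicalization step in the command above can be unpacked into a small Python function: keys are sorted, separators carry no whitespace, non-ASCII characters are left unescaped, and SHA-256 is taken over the UTF-8 bytes. This is a minimal sketch that mirrors the one-liner; the name `canonical_sha256` is illustrative and the sample dictionaries are only there to show that key order does not matter.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record with the same serialization settings as the shell one-liner."""
    canonical = json.dumps(
        record,
        sort_keys=True,          # deterministic key order at every nesting level
        separators=(",", ":"),   # no spaces after commas or colons
        ensure_ascii=False,      # keep non-ASCII characters as-is
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two dictionaries with the same content but different key order
a = {"source": {"kind": "arxiv", "id": "2605.14057", "version": 1}, "schema_version": "1.0"}
b = {"schema_version": "1.0", "source": {"version": 1, "id": "2605.14057", "kind": "arxiv"}}

print(canonical_sha256(a) == canonical_sha256(b))  # True
```

To check a real record, pass the parsed `canonical_record` object from the resolver response to `canonical_sha256` and compare the result with the canonical hash shown on this page.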

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "888d9dabcc3bda53c71fc2b98d89ea7e1f39f70a1fd7380d95bbcfa936e07d9e",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2026-05-13T19:29:11Z",
    "title_canon_sha256": "9b56ec121339d94c129fa3c84b0c6d81d58b1987b84b25379563b9d2a64b3eae"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.14057",
    "kind": "arxiv",
    "version": 1
  }
}