{"paper":{"title":"Heterogeneous Dependency Graph-Guided Attention for Patent Representation Learning","license":"http://creativecommons.org/licenses/by/4.0/","headline":"A heterogeneous graph encoder that preserves patent claim dependencies outperforms text-only baselines by treating intra-document topology as the primary inductive bias.","cross_cats":[],"primary_cat":"cs.CL","authors_text":"Longbing Cao, Qiongkai Xu, Yongmin Yoo, Zhangkai Wu","submitted_at":"2026-05-11T06:54:44Z","abstract_excerpt":"Pre-trained language models advance patent classification and retrieval via encoding claims as flat token sequences, yet overlooking the dependency hierarchy among claims. Incorporating the hierarchy into self-attention poses two challenges. First, claim dependencies involve relation types with varying reliability: treating them indiscriminately allows noisy technical relations to corrupt cleaner legal citation signals. Second, when the dependency graph is defined over claims, Transformer models fail as they operate at the token level; broadcasting claim-level adjacency can dilute structural i"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"PHAGE outperforms all baselines on classification, retrieval, and clustering, showing that intra-document claim topology is a stronger inductive bias than inter-document structure and that this bias persists in the encoder weights after training.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The deterministic graph construction pipeline can reliably separate near-deterministic legal citations from noisier rule-based technical relations while preserving type distinctions as heterogeneous edges.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"PHAGE encodes patent claim hierarchies as heterogeneous graphs inside Transformers 
and outperforms baselines on classification, retrieval, and clustering by treating intra-patent topology as a stronger signal than inter-patent links.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"A heterogeneous graph encoder that preserves patent claim dependencies outperforms text-only baselines by treating intra-document topology as the primary inductive bias.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"ec20059d0f9562f386f281c9923f3050574309e7422ae2cf0d8c783da71ecd35"},"source":{"id":"2605.10073","kind":"arxiv","version":2},"verdict":{"id":"74e39ae8-f3b6-4f56-a253-48e80c26c168","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-12T04:13:13.150341Z","strongest_claim":"PHAGE outperforms all baselines on classification, retrieval, and clustering, showing that intra-document claim topology is a stronger inductive bias than inter-document structure and that this bias persists in the encoder weights after training.","one_line_summary":"PHAGE encodes patent claim hierarchies as heterogeneous graphs inside Transformers and outperforms baselines on classification, retrieval, and clustering by treating intra-patent topology as a stronger signal than inter-patent links.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The deterministic graph construction pipeline can reliably separate near-deterministic legal citations from noisier rule-based technical relations while preserving type distinctions as heterogeneous edges.","pith_extraction_headline":"A heterogeneous graph encoder that preserves patent claim dependencies outperforms text-only baselines by treating intra-document topology as the primary inductive 
bias."},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2605.10073/integrity.json","findings":[],"available":true,"detectors_run":[{"name":"claim_evidence","ran_at":"2026-05-20T06:42:01.068834Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"ai_meta_artifact","ran_at":"2026-05-19T15:41:22.474544Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"doi_title_agreement","ran_at":"2026-05-19T12:01:17.918793Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"doi_compliance","ran_at":"2026-05-19T09:41:54.688843Z","status":"completed","version":"1.0.0","findings_count":0}],"snapshot_sha256":"ad74be6cc4685d7bddbdf1b3d7e5ec3e4b3d2905bf70a13178c47aa0da23da13"},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":2,"snapshot_sha256":"9c462749de642a8044fb7ebfd2fd755c4cc81478a2d37869abe5326faecd4c15"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}