{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2021:YPL7GVZFXPHQW4NBPG5OEQ2K4W","short_pith_number":"pith:YPL7GVZF","schema_version":"1.0","canonical_sha256":"c3d7f35725bbcf0b71a179bae2434ae587aff5aaf0474de3a8ef794bf0c15a25","source":{"kind":"arxiv","id":"2105.14491","version":3},"attestation_state":"computed","paper":{"title":"How Attentive are Graph Attention Networks?","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"Graph Attention Networks use static attention that cannot express simple graph problems, fixed by reordering to create dynamic GATv2.","cross_cats":[],"primary_cat":"cs.LG","authors_text":"Eran Yahav, Shaked Brody, Uri Alon","submitted_at":"2021-05-30T10:17:58Z","abstract_excerpt":"Graph Attention Networks (GATs) are one of the most popular GNN architectures and are considered as the state-of-the-art architecture for representation learning with graphs. In GAT, every node attends to its neighbors given its own representation as the query. However, in this paper we show that GAT computes a very limited kind of attention: the ranking of the attention scores is unconditioned on the query node. We formally define this restricted kind of attention as static attention and distinguish it from a strictly more expressive dynamic attention. Because GATs use a static attention mech"},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":true,"formal_links_present":true},"canonical_record":{"source":{"id":"2105.14491","kind":"arxiv","version":3},"metadata":{"license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","primary_cat":"cs.LG","submitted_at":"2021-05-30T10:17:58Z","cross_cats_sorted":[],"title_canon_sha256":"215510f58299ba0a78cba7aa74025151976b6ef2c24a42b5114174673e589208","abstract_canon_sha256":"044bcd14d1a2616432d49ef12852583c4df320ff08cd76da89b98519a36718e6"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-17T23:38:15.390041Z","signature_b64":"GZu4q0LMmgBpV3d8lIW2Rk0t3RgH8T4DiHbrFEf0pNtWbcOzPTvCuynON7Sm6wV5H675A43VKQhjTuf+AEJlDA==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"c3d7f35725bbcf0b71a179bae2434ae587aff5aaf0474de3a8ef794bf0c15a25","last_reissued_at":"2026-05-17T23:38:15.389458Z","signature_status":"signed_v1","first_computed_at":"2026-05-17T23:38:15.389458Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"How Attentive are Graph Attention Networks?","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"Graph Attention Networks use static attention that cannot express simple graph problems, fixed by reordering to create dynamic GATv2.","cross_cats":[],"primary_cat":"cs.LG","authors_text":"Eran Yahav, Shaked Brody, Uri Alon","submitted_at":"2021-05-30T10:17:58Z","abstract_excerpt":"Graph Attention Networks (GATs) are one of the most popular GNN architectures and are considered as the state-of-the-art architecture for representation learning with graphs. In GAT, every node attends to its neighbors given its own representation as the query. However, in this paper we show that GAT computes a very limited kind of attention: the ranking of the attention scores is unconditioned on the query node. We formally define this restricted kind of attention as static attention and distinguish it from a strictly more expressive dynamic attention. Because GATs use a static attention mech"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"Because GATs use a static attention mechanism, there are simple graph problems that GAT cannot express: in a controlled problem, we show that static attention hinders GAT from even fitting the training data. ... GATv2: a dynamic graph attention variant that is strictly more expressive than GAT.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the controlled synthetic problem is representative of the limitations that matter in real benchmarks, and that reordering the attention operations fully converts static attention into dynamic attention without side effects on optimization or generalization.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"GAT uses static attention where neighbor rankings ignore the query node and thus cannot express some graph problems; GATv2 enables dynamic attention and outperforms GAT on 11 OGB and other benchmarks with equal parameters.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Graph Attention Networks use static attention that cannot express simple graph problems, fixed by reordering to create dynamic GATv2.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"14ea25d6a84064fa403263761241d04840d79cec79fe6fe091be2614de82e8bf"},"source":{"id":"2105.14491","kind":"arxiv","version":3},"verdict":{"id":"6b921b2b-a1e4-47a4-a3d8-484679498b1f","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-17T02:26:02.690351Z","strongest_claim":"Because GATs use a static attention mechanism, there are simple graph problems that GAT cannot express: in a controlled problem, we show that static attention hinders GAT from even fitting the training data. ... GATv2: a dynamic graph attention variant that is strictly more expressive than GAT.","one_line_summary":"GAT uses static attention where neighbor rankings ignore the query node and thus cannot express some graph problems; GATv2 enables dynamic attention and outperforms GAT on 11 OGB and other benchmarks with equal parameters.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the controlled synthetic problem is representative of the limitations that matter in real benchmarks, and that reordering the attention operations fully converts static attention into dynamic attention without side effects on optimization or generalization.","pith_extraction_headline":"Graph Attention Networks use static attention that cannot express simple graph problems, fixed by reordering to create dynamic GATv2."},"references":{"count":70,"sample":[{"doi":"","year":2018,"title":"Learning to represent programs with graphs","work_id":"7a597620-0dd2-4229-9c8e-eb2089541ab7","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2021,"title":"On the bottleneck of graph neural networks and its practical implications","work_id":"17b47a81-dc95-47f1-84af-50161b9c0a78","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":1993,"title":"Diffusion-convolutional neural networks","work_id":"103ff6d2-99ee-456c-9336-2bc6c8dbb6aa","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2014,"title":"Neural Machine Translation by Jointly Learning to Align and Translate","work_id":"d831e763-d530-4029-a65c-ac595d82cb2a","ref_index":4,"cited_arxiv_id":"1409.0473","is_internal_anchor":true},{"doi":"","year":2016,"title":"Interaction networks for learning about objects, relations and physics","work_id":"22f5e757-33a3-4521-b989-944e606e92da","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":70,"snapshot_sha256":"c58a8ffe2b6c93cf7db396f01222836bfdf481e12b17b42f93957fb5baf4c820","internal_anchors":5},"formal_canon":{"evidence_count":2,"snapshot_sha256":"7cc9638c3e4f71f1f8b9702bf538ba143ce315f8a77b572ee72032d5a1ba8971"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2105.14491","created_at":"2026-05-17T23:38:15.389562+00:00"},{"alias_kind":"arxiv_version","alias_value":"2105.14491v3","created_at":"2026-05-17T23:38:15.389562+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2105.14491","created_at":"2026-05-17T23:38:15.389562+00:00"},{"alias_kind":"pith_short_12","alias_value":"YPL7GVZFXPHQ","created_at":"2026-05-18T12:33:33.725879+00:00"},{"alias_kind":"pith_short_16","alias_value":"YPL7GVZFXPHQW4NB","created_at":"2026-05-18T12:33:33.725879+00:00"},{"alias_kind":"pith_short_8","alias_value":"YPL7GVZF","created_at":"2026-05-18T12:33:33.725879+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":28,"internal_anchor_count":28,"sample":[{"citing_arxiv_id":"2512.15767","citing_title":"Bridging Data and Physics: A Graph Neural Network-Based Hybrid Twin Framework","ref_index":5,"is_internal_anchor":true},{"citing_arxiv_id":"2512.22597","citing_title":"Energy-Guided Generative Modeling for Low-Energy Molecular Structure Discovery","ref_index":49,"is_internal_anchor":true},{"citing_arxiv_id":"2605.23708","citing_title":"Learning Dynamic Stability Landscapes in Synchronization Networks","ref_index":83,"is_internal_anchor":true},{"citing_arxiv_id":"2411.17429","citing_title":"Graph Rewiring in GNNs to Mitigate Over-Squashing and Over-Smoothing: A Survey","ref_index":14,"is_internal_anchor":true},{"citing_arxiv_id":"2605.21311","citing_title":"DeCoR: Design and Control Co-Optimization for Urban Streets Using Reinforcement Learning","ref_index":3,"is_internal_anchor":true},{"citing_arxiv_id":"2605.02075","citing_title":"Graph Transformers and Stabilized Reinforcement Learning for Large-Scale Dynamic Routing Modulation and Spectrum Allocation in Elastic Optical Networks","ref_index":30,"is_internal_anchor":true},{"citing_arxiv_id":"2605.16451","citing_title":"Physics-Guided Geometric Diffusion for Macro Placement Generation","ref_index":2,"is_internal_anchor":true},{"citing_arxiv_id":"2605.19021","citing_title":"Deep Neural Sheaf Diffusion","ref_index":6,"is_internal_anchor":true},{"citing_arxiv_id":"2506.21107","citing_title":"Doloris: Dual Conditional Diffusion Implicit Bridges with Sparsity Masking Strategy for Unpaired Single-Cell Perturbation Estimation","ref_index":4,"is_internal_anchor":true},{"citing_arxiv_id":"2507.03787","citing_title":"Effective Capacitance Modeling Using Graph Neural Networks","ref_index":9,"is_internal_anchor":true},{"citing_arxiv_id":"2512.06779","citing_title":"A Texture-Generalizable Deep Material Network via Orientation-Aware Interaction Learning for Polycrystal Modeling and Texture Evolution","ref_index":41,"is_internal_anchor":true},{"citing_arxiv_id":"2512.10211","citing_title":"ID-PaS+ : Identity-Aware Predict-and-Search for General Mixed-Integer Linear Programs","ref_index":5,"is_internal_anchor":true},{"citing_arxiv_id":"2605.13899","citing_title":"Frequency-Space Mechanics: A Sequence and Coordinate-Free Representation for Protein Function Prediction","ref_index":30,"is_internal_anchor":true},{"citing_arxiv_id":"2605.11987","citing_title":"Random-Set Graph Neural Networks","ref_index":27,"is_internal_anchor":true},{"citing_arxiv_id":"2604.07383","citing_title":"SCOT: Multi-Source Cross-City Transfer with Optimal-Transport Soft-Correspondence Objective","ref_index":3,"is_internal_anchor":true},{"citing_arxiv_id":"2605.03842","citing_title":"SOAR: Real-Time Joint Optimization of Order Allocation and Robot Scheduling in Robotic Mobile Fulfillment Systems","ref_index":4,"is_internal_anchor":true},{"citing_arxiv_id":"2605.06154","citing_title":"Graphlets as Building Blocks for Structural Vocabulary in Knowledge Graph Foundation Models","ref_index":26,"is_internal_anchor":true},{"citing_arxiv_id":"2604.22194","citing_title":"Qubit-Scalable CVRP via Lagrangian Knapsack Decomposition and Noise-Aware Quantum Execution","ref_index":55,"is_internal_anchor":true},{"citing_arxiv_id":"2605.04449","citing_title":"GEM: Graph-Enhanced Mixture-of-Experts with ReAct Agents for Dialogue State Tracking","ref_index":21,"is_internal_anchor":true},{"citing_arxiv_id":"2604.11202","citing_title":"CapBench: A Multi-PDK Dataset for Machine-Learning-Based Post-Layout Capacitance Extraction","ref_index":4,"is_internal_anchor":true},{"citing_arxiv_id":"2604.07383","citing_title":"SCOT: Multi-Source Cross-City Transfer with Optimal-Transport Soft-Correspondence Objective","ref_index":3,"is_internal_anchor":true},{"citing_arxiv_id":"2604.14811","citing_title":"Learning Ad Hoc Network Dynamics via Graph-Structured World Models","ref_index":16,"is_internal_anchor":true},{"citing_arxiv_id":"2604.14685","citing_title":"Beyond Nodes vs. Edges: A Multi-View Fusion Framework for Provenance-Based Intrusion Detection","ref_index":48,"is_internal_anchor":true},{"citing_arxiv_id":"2604.15617","citing_title":"A Structure-Preserving Graph Neural Solver for Parametric Hyperbolic Conservation Laws","ref_index":85,"is_internal_anchor":true},{"citing_arxiv_id":"2604.18145","citing_title":"Region-Grounded Report Generation for 3D Medical Imaging: A Fine-Grained Dataset and Graph-Enhanced Framework","ref_index":58,"is_internal_anchor":true}]},"formal_canon":{"evidence_count":2,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/YPL7GVZFXPHQW4NBPG5OEQ2K4W","json":"https://pith.science/pith/YPL7GVZFXPHQW4NBPG5OEQ2K4W.json","graph_json":"https://pith.science/api/pith-number/YPL7GVZFXPHQW4NBPG5OEQ2K4W/graph.json","events_json":"https://pith.science/api/pith-number/YPL7GVZFXPHQW4NBPG5OEQ2K4W/events.json","paper":"https://pith.science/paper/YPL7GVZF"},"agent_actions":{"view_html":"https://pith.science/pith/YPL7GVZFXPHQW4NBPG5OEQ2K4W","download_json":"https://pith.science/pith/YPL7GVZFXPHQW4NBPG5OEQ2K4W.json","view_paper":"https://pith.science/paper/YPL7GVZF","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2105.14491&json=true","fetch_graph":"https://pith.science/api/pith-number/YPL7GVZFXPHQW4NBPG5OEQ2K4W/graph.json","fetch_events":"https://pith.science/api/pith-number/YPL7GVZFXPHQW4NBPG5OEQ2K4W/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/YPL7GVZFXPHQW4NBPG5OEQ2K4W/action/timestamp_anchor","attest_storage":"https://pith.science/pith/YPL7GVZFXPHQW4NBPG5OEQ2K4W/action/storage_attestation","attest_author":"https://pith.science/pith/YPL7GVZFXPHQW4NBPG5OEQ2K4W/action/author_attestation","sign_citation":"https://pith.science/pith/YPL7GVZFXPHQW4NBPG5OEQ2K4W/action/citation_signature","submit_replication":"https://pith.science/pith/YPL7GVZFXPHQW4NBPG5OEQ2K4W/action/replication_record"}},"created_at":"2026-05-17T23:38:15.389562+00:00","updated_at":"2026-05-17T23:38:15.389562+00:00"}