Recognition: 2 theorem links · Lean Theorem
AdaTKG: Adaptive Memory for Temporal Knowledge Graph Reasoning
Pith reviewed 2026-05-11 01:21 UTC · model grok-4.3
The pith
Entity representations in temporal knowledge graphs should update adaptively with each new interaction rather than remain fixed after training.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We depart from the static view and propose that each entity be modeled as an adaptive process whose representation is refined every time the entity participates in a fact. AdaTKG maintains a per-entity memory that is updated with every observed interaction, with the memory accumulating online and predictions improving as more interactions arrive. We instantiate the memory update as a learnable exponential moving average governed by a single shared scalar instead of using learnable parameters for each entity, enabling AdaTKG to handle entities unseen during training.
What carries the argument
Per-entity memory updated by a learnable exponential moving average controlled by one shared scalar parameter.
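The mechanism as described can be sketched in a few lines. This is a hedged illustration, not AdaTKG's actual code: the names (`AdaptiveMemory`, `update`, `alpha_logit`) and the logistic parameterization of the shared scalar are our assumptions; in the real model the interaction vector would come from a learned encoder and the scalar would be trained by backpropagation.

```python
import numpy as np

class AdaptiveMemory:
    """Per-entity memory refined by an EMA with one shared scalar (sketch)."""

    def __init__(self, dim: int, alpha_logit: float = 0.0):
        self.dim = dim
        # The single shared learnable parameter; a sigmoid keeps the
        # effective update rate alpha in (0, 1).
        self.alpha_logit = alpha_logit
        self.memory: dict[str, np.ndarray] = {}

    def update(self, entity: str, interaction_vec: np.ndarray) -> np.ndarray:
        alpha = 1.0 / (1.0 + np.exp(-self.alpha_logit))
        # Entities unseen during training simply start from their first
        # interaction: no per-entity parameters exist, which is what makes
        # the scheme inductive.
        prev = self.memory.get(entity, interaction_vec)
        self.memory[entity] = alpha * prev + (1.0 - alpha) * interaction_vec
        return self.memory[entity]

mem = AdaptiveMemory(dim=4, alpha_logit=1.0)   # alpha ≈ 0.73: slow forgetting
m1 = mem.update("entity_a", np.ones(4))        # first fact seeds the memory
m2 = mem.update("entity_a", np.zeros(4))       # EMA blend of old and new
```

Because every entity shares the one scalar, the memory table itself carries no trainable parameters, which is the claimed route to handling unseen entities.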
If this is right
- Representations improve automatically as additional facts about an entity are observed.
- The model can accept new entities at test time without retraining or extra parameters.
- Performance gains appear consistently across standard temporal knowledge graph benchmarks.
- Memory operates online, supporting streaming fact sequences without full retraining.
Where Pith is reading between the lines
- The shared-scalar design could extend to other online graph tasks where entities evolve, such as dynamic social networks or transaction graphs.
- Long-horizon experiments could test whether the single scalar remains stable or whether drift requires occasional recalibration.
- The approach implies that temporal reasoning benefits from treating memory as a running average rather than a fixed lookup table.
Load-bearing premise
A single shared scalar is enough to set the right update rate for every entity and relation without needing entity-specific rules or more complex memory mechanisms.
What would settle it
Run the same model but replace the shared scalar with either per-entity learnable update rates or a more elaborate update rule; if accuracy rises substantially or unseen-entity performance collapses under the shared scalar, the central claim is false.
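A minimal sketch of the per-entity variant in that proposed test (all names illustrative, not from the paper) also makes the structural issue concrete: with no learned rate available for a new entity, this variant must fall back to a default.

```python
import numpy as np

def ema_step(prev: np.ndarray, x: np.ndarray, alpha: float) -> np.ndarray:
    """One exponential-moving-average step."""
    return alpha * prev + (1.0 - alpha) * x

class PerEntityEMA:
    """Ablation variant: a separate learnable update rate per training entity."""

    def __init__(self, default_alpha: float = 0.5):
        self.alphas: dict[str, float] = {}   # filled during training
        self.default_alpha = default_alpha   # forced fallback for new entities
        self.memory: dict[str, np.ndarray] = {}

    def update(self, entity: str, x: np.ndarray) -> np.ndarray:
        # An entity unseen at training time has no learned alpha, so this
        # variant degrades to a fixed default -- the structural reason a
        # shared scalar is preferred for the inductive setting.
        alpha = self.alphas.get(entity, self.default_alpha)
        prev = self.memory.get(entity, x)
        self.memory[entity] = ema_step(prev, x, alpha)
        return self.memory[entity]
```

Running this variant and the shared-scalar model on the same benchmarks, and checking whether per-entity rates buy a substantial accuracy gain on seen entities, is exactly the comparison that would settle the claim.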
read the original abstract
Temporal knowledge graphs (TKGs) represent time-stamped relational facts and support a wide range of reasoning tasks over evolving events. However, existing methods produce entity representations that are static at the entity level, in that each representation is a function of learned parameters only and retains no trace of the interactions in which the entity has participated. In this paper, we depart from this static view and propose that each entity be modeled as an adaptive process whose representation is refined every time the entity participates in a fact. To this end, we propose AdaTKG, which maintains a per-entity memory that is updated with every observed interaction, with the memory accumulating online and predictions improving as more interactions arrive. Specifically, we instantiate the memory update as a learnable exponential moving average governed by a single shared scalar instead of using learnable parameters for each entity, enabling AdaTKG to handle entities unseen during training. Extensive experiments confirm consistent gains over TKG baselines, demonstrating the effectiveness of adaptive memory. Code is publicly available at: https://github.com/seunghan96/AdaTKG.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that standard TKG methods produce static entity representations that ignore interaction history, and introduces AdaTKG to model each entity as an adaptive process whose memory is updated online via a learnable exponential moving average controlled by a single shared scalar. This design is argued to improve reasoning as more facts arrive and to enable generalization to entities unseen at training time, with experiments showing consistent gains over TKG baselines on public benchmarks and with code released.
Significance. If the empirical gains hold under closer scrutiny, the work offers a lightweight, parameter-efficient way to introduce adaptivity into TKG entity representations without per-entity parameters, directly addressing generalization to new entities. The public code release is a clear strength that supports reproducibility and allows independent verification of the reported improvements.
major comments (2)
- [§4 (Experimental Results)] The manuscript reports consistent gains over baselines but provides no details on baseline re-implementations, hyper-parameter tuning protocols, or statistical significance (standard deviations across runs or hypothesis tests); these are required to confirm that the observed improvements are attributable to the adaptive memory rather than to implementation differences.
- [§3.2 (Memory Update Rule)] While the single shared EMA scalar is presented as sufficient for both seen and unseen entities, the paper includes no ablation comparing it against entity-specific scalars or more expressive update functions, leaving the central design choice (a shared scalar for generalization) without direct empirical support for its sufficiency across diverse entity dynamics.
minor comments (2)
- [§3] Notation for the memory state and the shared scalar α should be introduced once with a clear equation reference and then used consistently; occasional reuse of symbols for different quantities appears in the model description.
- [§4] Figure captions and axis labels in the result plots would benefit from explicit mention of the evaluation metric (e.g., MRR or Hits@10) and the exact time-split protocol used.
Simulated Author's Rebuttal
We thank the referee for the constructive comments and the recommendation for minor revision. We address each major comment below with our responses and indicate the revisions planned for the manuscript.
read point-by-point responses
Referee: §4 (Experimental Results): the manuscript reports consistent gains over baselines but provides no details on baseline re-implementations, hyper-parameter tuning protocols, or statistical significance (standard deviations across runs or hypothesis tests), which is required to confirm that the observed improvements are attributable to the adaptive memory rather than implementation differences.
Authors: We agree that these details are important for reproducibility and to isolate the contribution of the adaptive memory. In the revised manuscript, Section 4 will be expanded to describe: (i) baseline re-implementations, including use of official code repositories where available and adherence to the original papers' settings; (ii) the hyper-parameter tuning protocol, specifying the search ranges, validation metric, and selection procedure applied uniformly to AdaTKG and all baselines; and (iii) statistical significance, with mean performance and standard deviations over five independent runs plus paired t-test p-values against the strongest baseline. These additions will be cross-referenced to the already-public code repository, which contains the exact configurations. revision: yes
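The promised reporting protocol (mean ± standard deviation over five runs plus a paired t-test against the strongest baseline) can be sketched with the standard library alone. The MRR numbers below are invented for illustration only.

```python
import math
import statistics

# Hypothetical MRR scores from five seed-matched runs (made-up numbers).
adatkg_runs   = [0.462, 0.458, 0.465, 0.460, 0.463]
baseline_runs = [0.441, 0.444, 0.439, 0.443, 0.440]

mean_a, sd_a = statistics.mean(adatkg_runs), statistics.stdev(adatkg_runs)

# Paired t-test: runs share seeds, so we test the per-run differences.
diffs = [a - b for a, b in zip(adatkg_runs, baseline_runs)]
n = len(diffs)
t_stat = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))

# Two-sided critical value for df = n - 1 = 4 at p = 0.01 is 4.604 (t-table).
significant = abs(t_stat) > 4.604
print(f"AdaTKG MRR {mean_a:.4f} ± {sd_a:.4f}, t = {t_stat:.2f}, "
      f"significant at p < 0.01: {significant}")
```

Pairing by seed is what justifies the paired (rather than unpaired) test; reporting the standard deviations alongside the means is what lets a reader judge run-to-run variance.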
Referee: §3.2 (Memory Update Rule): while the single shared EMA scalar is presented as sufficient for both seen and unseen entities, the paper does not include an ablation comparing it against entity-specific scalars or more expressive update functions, leaving the central design choice (shared scalar for generalization) without direct empirical support for its sufficiency across diverse entity dynamics.
Authors: The single shared scalar is intentionally chosen to support generalization to entities unseen at training time; entity-specific scalars or parameters are inapplicable by design for such entities. We acknowledge that an explicit ablation would strengthen the empirical case. In the revision we will (a) add a paragraph in §3.2 clarifying this design rationale and (b) include new experimental results comparing the learnable shared scalar against a fixed (non-learnable) EMA and against a more expressive update (e.g., a small feed-forward network) on seen entities. These results will be reported for the standard benchmarks while noting that entity-specific alternatives remain infeasible for the unseen-entity setting that motivates the method. revision: partial
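The "more expressive update" floated in (b) could look like the gated variant below, shown next to a fixed EMA for contrast. The architecture (a single linear layer plus sigmoid over the concatenated previous memory and new interaction) is our assumption for illustration, not the paper's design.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8

def fixed_ema(prev: np.ndarray, x: np.ndarray, alpha: float = 0.9) -> np.ndarray:
    """Non-learnable baseline: a constant interpolation weight."""
    return alpha * prev + (1.0 - alpha) * x

# Gated alternative: a tiny learnable layer produces a per-dimension
# interpolation weight from the previous memory and the new interaction.
W = rng.normal(scale=0.1, size=(DIM, 2 * DIM))  # would be trained in practice

def gated_update(prev: np.ndarray, x: np.ndarray) -> np.ndarray:
    gate = 1.0 / (1.0 + np.exp(-(W @ np.concatenate([prev, x]))))
    return gate * prev + (1.0 - gate) * x
```

Both rules keep the memory a convex combination of its previous value and the new evidence; the ablation question is whether the gate's extra parameters pay for themselves on seen entities while remaining usable for unseen ones.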
Circularity Check
No significant circularity detected
full rationale
The paper's core proposal is an architectural design choice: modeling each entity as an adaptive process whose memory is updated online via a single shared learnable EMA scalar. This is explicitly motivated by the goal of parameter-efficient generalization to unseen entities and is not derived from or equivalent to any prior fitted quantity, self-cited result, or input data pattern. The update rule is presented as a new instantiation rather than a renaming or redefinition of existing components, and the claims rest on empirical gains over baselines rather than any reduction by construction. No load-bearing step in the provided derivation chain equates a prediction to its own inputs.
Axiom & Free-Parameter Ledger
free parameters (1)
- shared EMA scalar
axioms (1)
- domain assumption: Entity representations benefit from online refinement via an exponential moving average of observed interactions
invented entities (1)
- per-entity adaptive memory (no independent evidence)
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  The relation between this paper passage and the cited Recognition theorem is unclear:
  "we instantiate the memory update as a learnable exponential moving average governed by a single shared scalar instead of using learnable parameters for each entity"
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · embed_strictMono_of_one_lt · unclear
  The relation between this paper passage and the cited Recognition theorem is unclear:
  "the memory accumulating online and predictions improving as more interactions arrive"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Anonymous. Transfir: Transferable few-shot inductive reasoning for emerging entities on temporal knowledge graphs. arXiv preprint arXiv:2604.10164, 2026. To appear at ICLR 2026.
- [2] Elizabeth Boschee, Jennifer Lautenschlager, Sean O'Brien, Steve Shellman, James Starz, and Michael Ward. ICEWS Coded Event Data, 2015.
- [3] Borui Cai, Yong Xiang, Longxiang Gao, He Zhang, Yunfeng Li, and Jianxin Li. Temporal knowledge graph completion: a survey. In Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, pages 6545–6553, 2023.
- [4] Jiajun Chen, Huarui He, Feng Wu, and Jie Wang. Topology-aware correlations between relations for inductive link prediction in knowledge graphs. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 6271–6278, 2021.
- [5] Mingyang Chen, Wen Zhang, Yushan Zhu, Hongting Zhou, Zonggang Yuan, Changliang Xu, and Huajun Chen. Meta-knowledge transfer for inductive knowledge graph embedding. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 927–937, 2022.
- [6] Wei Chen, Huaiyu Wan, Yuting Wu, Shuyuan Zhao, Jiayaqi Cheng, Yuxin Li, and Youfang Lin. Local-global history-aware contrastive learning for temporal knowledge graph reasoning. In 2024 IEEE 40th International Conference on Data Engineering (ICDE), pages 733–746. IEEE, 2024.
- [7] Kyunghyun Cho, Bart van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning phrase representations using RNN encoder–decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1724–1734, 2014.
- [8] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, 2019.
- [9] Zifeng Ding, Heling Cai, Jingpei Wu, Yunpu Ma, Ruotong Liao, Bo Xiong, and Volker Tresp. zrLLM: Zero-shot relational learning on temporal knowledge graphs with large language models. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pag…
- [10] Zhiyu Fang, Shuai-Long Lei, Xiaobin Zhu, Chun Yang, Shi-Xue Zhang, Xu-Cheng Yin, and Jingyan Qin. Transformer-based reasoning for learning evolutionary chain of events on temporal knowledge graph. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 70–79, 2024.
- [11] Mikhail Galkin, Xinyu Yuan, Hesham Mostafa, Jian Tang, and Zhaocheng Zhu. Towards foundation models for knowledge graph reasoning. In The Twelfth International Conference on Learning Representations, 2024.
- [12] Alberto Garcia-Duran, Sebastijan Dumančić, and Mathias Niepert. Learning sequence encoders for temporal knowledge graph completion. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4816–4821, 2018.
- [13] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In International Conference on Learning Representations (ICLR), 2015.
- [14] Timothée Lacroix, Guillaume Obozinski, and Nicolas Usunier. Tensor decompositions for temporal knowledge base completion. In International Conference on Learning Representations, 2020.
- [15] Dong-Ho Lee, Kian Ahrabian, Woojeong Jin, Fred Morstatter, and Jay Pujara. Temporal knowledge graph forecasting without knowledge using in-context learning. arXiv preprint arXiv:2305.10613, 2023.
- [16] Jaejun Lee, Chanyoung Chung, and Joyce Jiyoung Whang. InGram: Inductive knowledge graph embedding via relation graphs. In International Conference on Machine Learning, pages 18796–18809. PMLR, 2023.
- [17] Kalev Leetaru and Philip A Schrodt. GDELT. In ISA Annual Convention, volume 2, pages 1–49. Citeseer, 2013.
- [18] Zixuan Li, Zhongni Hou, Saiping Guan, Xiaolong Jin, Weihua Peng, Long Bai, Yajuan Lyu, Wei Li, Jiafeng Guo, and Xueqi Cheng. HisMatch: Historical structure matching based temporal knowledge graph reasoning. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 7328–7338, 2022.
- [19] Zixuan Li, Xiaolong Jin, Wei Li, Saiping Guan, Jiafeng Guo, Huawei Shen, Yuanzhuo Wang, and Xueqi Cheng. Temporal knowledge graph reasoning based on evolutional representation learning. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 408–417, 2021.
- [20] Ruotong Liao, Xu Jia, Yangzhe Li, Yunpu Ma, and Volker Tresp. GenTKG: Generative forecasting on temporal knowledge graph with large language models. In Findings of the Association for Computational Linguistics: NAACL 2024, pages 4303–4317, 2024.
- [21] Shuwen Liu, Bernardo Grau, Ian Horrocks, and Egor Kostylev. INDIGO: GNN-based inductive knowledge graph completion using pair-wise encoding. Advances in Neural Information Processing Systems, 34:2034–2045, 2021.
- [22] Yushan Liu, Yunpu Ma, Marcel Hildebrandt, Mitchell Joblin, and Volker Tresp. TLogic: Temporal logical rules for explainable link forecasting on temporal knowledge graphs. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pages 4120–4127, 2022.
- [23] Xin Mei, Libin Yang, Xiaoyan Cai, and Zuowei Jiang. An adaptive logical rule embedding model for inductive reasoning over temporal knowledge graphs. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 7304–7316, Abu Dhabi, United Arab Emirates, December 2022. Association for Computational Linguistics.
- [24] Shi Mingcong, Chunjiang Zhu, Detian Zhang, Shiting Wen, and Li Qing. Multi-granularity history and entity similarity learning for temporal knowledge graph reasoning. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 5232–5243, 2024.
- [25] Jiaxin Pan, Mojtaba Nayyeri, Osama Mohammed, Daniel Hernandez, Rongchuan Zhang, Cheng Cheng, and Steffen Staab. Towards foundation model on temporal knowledge graph reasoning. arXiv preprint arXiv:2506.06367, 2025.
- [26] Ye Qian, Xiaoyan Wang, Fuhui Sun, and Li Pan. Compressing transfer: Mutual learning-empowered knowledge distillation for temporal knowledge graph reasoning. IEEE Transactions on Neural Networks and Learning Systems, 2025.
- [27] Emanuele Rossi, Ben Chamberlain, Fabrizio Frasca, Davide Eynard, Federico Monti, and Michael Bronstein. Temporal graph networks for deep learning on dynamic graphs. In ICML 2020 Workshop on Graph Representation Learning and Beyond, 2020.
- [28] Chao Shang, Yun Tang, Jing Huang, Jinbo Bi, Xiaodong He, and Bowen Zhou. End-to-end structure-aware convolutional networks for knowledge base completion. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 3060–3067, 2019.
- [29] Komal Teru, Etienne Denis, and Will Hamilton. Inductive relation prediction by subgraph reasoning. In International Conference on Machine Learning, pages 9448–9457. PMLR, 2020.
- [30] Rakshit Trivedi, Hanjun Dai, Yichen Wang, and Le Song. Know-Evolve: Deep temporal reasoning for dynamic knowledge graphs. In International Conference on Machine Learning, pages 3462–3471. PMLR, 2017.
- [31] Rakshit Trivedi, Mehrdad Farajtabar, Prasenjeet Biswal, and Hongyuan Zha. DyRep: Learning representations over dynamic graphs. In International Conference on Learning Representations (ICLR), 2019.
- [32] Shikhar Vashishth, Soumya Sanyal, Vikram Nitin, and Partha P Talukdar. Composition-based multi-relational graph convolutional networks. In ICLR, 2020.
- [33] Jiapu Wang, Sun Kai, Linhao Luo, Wei Wei, Yongli Hu, Alan Wee-Chung Liew, Shirui Pan, and Baocai Yin. Large language models-guided dynamic adaptation for temporal knowledge graph reasoning. Advances in Neural Information Processing Systems, 37:8384–8410, 2024.
- [34] Siheng Xiong, Yuan Yang, Faramarz Fekri, and James Clayton Kerce. TILP: Differentiable learning of temporal logical rules on knowledge graphs. arXiv preprint arXiv:2402.12309, 2024.
- [35] Siheng Xiong, Yuan Yang, Ali Payani, James C Kerce, and Faramarz Fekri. TEILP: Time prediction over knowledge graphs via logical reasoning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 16112–16119, 2024.
- [36] Da Xu, Chuanwei Ruan, Evren Korpeoglu, Sushant Kumar, and Kannan Achan. Inductive representation learning on temporal graphs. In International Conference on Learning Representations (ICLR), 2020.
- [37] Wenjie Xu, Ben Liu, Miao Peng, Xu Jia, and Min Peng. Pre-trained language model with prompts for temporal knowledge graph completion. arXiv preprint arXiv:2305.07912, 2023.
- [38] Yi Xu, Junjie Ou, Hui Xu, and Luoyi Fu. Temporal knowledge graph reasoning with historical contrastive learning. In Brian Williams, Yiling Chen, and Jennifer Neville, editors, Thirty-Seventh AAAI Conference on Artificial Intelligence, AAAI 2023, Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence, IAAI 2023, Thirteenth Symposiu…
- [39] Jure Zbontar, Li Jing, Ishan Misra, Yann LeCun, and Stéphane Deny. Barlow Twins: Self-supervised learning via redundancy reduction. In International Conference on Machine Learning, pages 12310–12320. PMLR, 2021.
- [40] Jinchuan Zhang, Ming Sun, Chong Mu, Jinhao Zhang, Quanjiang Guo, and Ling Tian. Historically relevant event structuring for temporal knowledge graph reasoning. In 2025 IEEE 41st International Conference on Data Engineering (ICDE), pages 3179–3192. IEEE, 2025.
- [41] Yanping Zheng, Lu Yi, and Zhewei Wei. A survey of dynamic graph neural networks. Frontiers of Computer Science, 19(6):196323, 2025.
- [42] Cunchao Zhu, Muhao Chen, Changjun Fan, Guangquan Cheng, and Yan Zhang. Learning from history: Modeling temporal knowledge graphs with sequential copy-generation networks. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 4732–4740, 2021.