pith. machine review for the scientific record.

arxiv: 2605.07121 · v1 · submitted 2026-05-08 · 💻 cs.AI · cs.LG

Recognition: 2 theorem links

AdaTKG: Adaptive Memory for Temporal Knowledge Graph Reasoning


Pith reviewed 2026-05-11 01:21 UTC · model grok-4.3

classification 💻 cs.AI · cs.LG
keywords temporal knowledge graphs · adaptive memory · entity representations · exponential moving average · knowledge graph reasoning · dynamic updates · online learning · unseen entities

The pith

Entity representations in temporal knowledge graphs should update adaptively with each new interaction rather than remain fixed after training.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that static entity embeddings limit reasoning over time-stamped facts because they ignore the sequence of interactions an entity has experienced. Instead, it models each entity as an ongoing process whose memory is refreshed online whenever it appears in a new fact. The update uses a learnable exponential moving average controlled by one shared scalar, which lets the model incorporate unseen entities without extra parameters. If this holds, predictions improve as more facts arrive, and the approach extends to streaming or evolving knowledge bases. A sympathetic reader would care because it replaces fixed parameters with accumulating memory, matching the dynamic nature of real events.

Core claim

We depart from the static view and propose that each entity be modeled as an adaptive process whose representation is refined every time the entity participates in a fact. AdaTKG maintains a per-entity memory that is updated with every observed interaction, with the memory accumulating online and predictions improving as more interactions arrive. We instantiate the memory update as a learnable exponential moving average governed by a single shared scalar instead of using learnable parameters for each entity, enabling AdaTKG to handle entities unseen during training.

What carries the argument

Per-entity memory updated by a learnable exponential moving average controlled by one shared scalar parameter.
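
To make that machinery concrete, here is a minimal sketch of the update rule as Figure 3 describes it: a per-entity memory refreshed by m(τ) = α · m(τ−1) + (1 − α) · x(τ), with a single shared learnable α. Class and method names are illustrative, not the authors' code; the released repository is the authoritative implementation.

```python
import torch
import torch.nn as nn

class AdaptiveEMAMemory(nn.Module):
    """Per-entity memory refreshed by a learnable EMA with one shared scalar."""

    def __init__(self, dim: int):
        super().__init__()
        # One shared scalar, stored as a logit so alpha = sigmoid(logit) lies in (0, 1).
        self.alpha_logit = nn.Parameter(torch.zeros(1))
        self.dim = dim
        # Memories are state, not parameters: an unseen entity simply starts
        # from zeros, so no per-entity parameters are ever learned.
        self.memory: dict[int, torch.Tensor] = {}

    def update(self, entity_id: int, x: torch.Tensor) -> torch.Tensor:
        """Fold one interaction signal x into the entity's memory."""
        alpha = torch.sigmoid(self.alpha_logit)
        prev = self.memory.get(entity_id, torch.zeros(self.dim))
        new = alpha * prev + (1.0 - alpha) * x
        # Detached here for simplicity; the paper's training procedure may
        # well backpropagate through the stored state instead.
        self.memory[entity_id] = new.detach()
        return new
```

Because α is the only learned quantity in the update, the memory mechanism adds essentially no parameters, which is what makes the unseen-entity claim plausible.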

If this is right

  • Representations improve automatically as additional facts about an entity are observed.
  • The model can accept new entities at test time without retraining or extra parameters.
  • Performance gains appear consistently across standard temporal knowledge graph benchmarks.
  • Memory operates online, supporting streaming fact sequences without full retraining.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The shared-scalar design could extend to other online graph tasks where entities evolve, such as dynamic social networks or transaction graphs.
  • Long-horizon experiments could test whether the single scalar remains stable or whether drift requires occasional recalibration.
  • The approach implies that temporal reasoning benefits from treating memory as a running average rather than a fixed lookup table.

Load-bearing premise

A single shared scalar is enough to set the right update rate for every entity and relation without needing entity-specific rules or more complex memory mechanisms.

What would settle it

Run the same model with the shared scalar replaced by either per-entity learnable update rates or a more elaborate update rule; if accuracy rises substantially under the richer alternatives, or if unseen-entity performance collapses under the shared scalar, the central claim fails.
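
For concreteness, a hypothetical sketch of the stronger of those controls: the same EMA update, but with one learnable rate per training entity. The class is illustrative only; it also makes the trade-off visible, since per-entity rates simply do not exist for entities outside the training vocabulary.

```python
import torch
import torch.nn as nn

class PerEntityEMAMemory(nn.Module):
    """Ablation control: one learnable EMA rate per training entity."""

    def __init__(self, dim: int, num_train_entities: int):
        super().__init__()
        # One logit per entity seen at training time.
        self.alpha_logits = nn.Parameter(torch.zeros(num_train_entities))
        self.dim = dim
        self.memory: dict[int, torch.Tensor] = {}

    def update(self, entity_id: int, x: torch.Tensor) -> torch.Tensor:
        if entity_id >= self.alpha_logits.numel():
            # The failure mode the shared scalar avoids: an unseen entity
            # has no learned update rate at all.
            raise KeyError(f"entity {entity_id} was unseen at training time")
        alpha = torch.sigmoid(self.alpha_logits[entity_id])
        prev = self.memory.get(entity_id, torch.zeros(self.dim))
        new = alpha * prev + (1.0 - alpha) * x
        self.memory[entity_id] = new.detach()
        return new
```

If this variant substantially outperforms the shared scalar on seen entities, the sufficiency premise weakens even where the unseen-entity argument survives.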

Figures

Figures reproduced from arXiv: 2605.07121 by Dongwan Kang, Hwanil Choi, Jaehoon Lee, Jun Seo, Minjae Kim, Seunghan Lee, Soonyoung Lee, Sungdong Yoo, Tae Yoon Lim, Wonbin Ahn.

Figure 1: Adaptive TKG reasoning. Our framework enables adaptivity through a memory refined with each interaction. While recent work has focused on whether a TKG reasoner can handle emerging entities unseen at training time (transductive vs. inductive), we focus on a different aspect, namely whether each entity's representation is refined each time the entity participates in a fact, which we formalize as the static…

Figure 2: Ours: Adaptive + Inductive. (From the accompanying text, §4, "From Static TKG to Adaptive TKG": the paper casts representative prior methods and its proposal into a common scoring form. Let h_e ∈ R^d denote the base representation of entity e and h_r ∈ R^d the embedding of relation r. For a query (e_q, r_q, ?, t_q), the score assigned to a candidate e_o takes the unified form φ_{t_q}(e_q, r_q, e_o) = f(…).)

Figure 3: AdaTKG architecture. An encoder produces an interaction signal x(τ) which updates the entity's memory m(τ) via a learnable EMA. The decay rate α controls how strongly the previous memory is retained relative to the new signal. (From the accompanying text: the stateful update operator U of Eq. (5) is instantiated as a simple yet effective design, a learnable EMA…)

Figure 4: Performance by number of interactions at train and test time. We measure the memory's contribution by ΔRR = RR_full − RR_zero (i.e., the difference between the per-query RR of the model with and without its memory branch) and stratify it along two axes: (left) train-time history depth of the subject and (right) test-time online updates accumulated during inference. The increase of ΔRR along both axes, consistent acros…

Figure 5: Gate value by number of interactions. Distribution of the learned gate g_e^(t_q), stratified by the number of interactions. (Adjacent panels plot test MRR against training epoch on ICEWS14 and ICEWS18 for w/o Adaptivity, AdaTKG-GRU, AdaTKG-EMA, and AdaTKG-CrossAtt, with early stopping marked.)

Figure 7: Sensitivity to memory-related hyperparameters. Each point is one hyperparameter setting of AdaTKG, compared against the w/o Adaptivity baseline at its best setting (dashed line).

Figure 8: Qualitative gate trace. A qualitative example of how the memory adapts…
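
The analysis behind Figure 4 has a simple recipe: compute the per-query reciprocal-rank gap between the model with and without its memory branch, then average it within buckets of interaction count. A minimal sketch under the caption's stated definition ΔRR = RR_full − RR_zero; the array names are hypothetical.

```python
from collections import defaultdict

def delta_rr_by_history(rank_full, rank_zero, n_interactions):
    """Mean per-query ΔRR = 1/rank_full − 1/rank_zero, bucketed by how many
    interactions the query's subject entity has accumulated."""
    buckets = defaultdict(list)
    for rf, rz, n in zip(rank_full, rank_zero, n_interactions):
        buckets[n].append(1.0 / rf - 1.0 / rz)
    return {n: sum(v) / len(v) for n, v in sorted(buckets.items())}

# Hypothetical usage: ranks from the model with / without the memory branch.
print(delta_rr_by_history([1, 2, 5], [3, 2, 10], [0, 1, 1]))
```

Figure 4's claim is that these bucket means rise monotonically along both the train-time and test-time axes.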
Original abstract

Temporal knowledge graphs (TKGs) represent time-stamped relational facts and support a wide range of reasoning tasks over evolving events. However, existing methods produce entity representations that are static at the entity level, in that each representation is a function of learned parameters only and retains no trace of the interactions in which the entity has participated. In this paper, we depart from this static view and propose that each entity be modeled as an adaptive process whose representation is refined every time the entity participates in a fact. To this end, we propose AdaTKG, which maintains a per-entity memory that is updated with every observed interaction, with the memory accumulating online and predictions improving as more interactions arrive. Specifically, we instantiate the memory update as a learnable exponential moving average governed by a single shared scalar instead of using learnable parameters for each entity, enabling AdaTKG to handle entities unseen during training. Extensive experiments confirm consistent gains over TKG baselines, demonstrating the effectiveness of adaptive memory. Code is publicly available at: https://github.com/seunghan96/AdaTKG.

Editorial analysis

A structured set of objections, weighed in public.

Referee report, simulated author's rebuttal, circularity audit, and an axiom ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that standard TKG methods produce static entity representations that ignore interaction history, and introduces AdaTKG to model each entity as an adaptive process whose memory is updated online via a learnable exponential moving average controlled by a single shared scalar. This design is argued to improve reasoning as more facts arrive and to enable generalization to entities unseen at training time, with experiments showing consistent gains over TKG baselines on public benchmarks and with code released.

Significance. If the empirical gains hold under closer scrutiny, the work offers a lightweight, parameter-efficient way to introduce adaptivity into TKG entity representations without per-entity parameters, directly addressing generalization to new entities. The public code release is a clear strength that supports reproducibility and allows independent verification of the reported improvements.

major comments (2)
  1. [§4 (Experimental Results)] The manuscript reports consistent gains over baselines but gives no details on baseline re-implementations, hyper-parameter tuning protocols, or statistical significance (standard deviations across runs or hypothesis tests). These are required to confirm that the observed improvements stem from the adaptive memory rather than implementation differences.
  2. [§3.2 (Memory Update Rule)] The single shared EMA scalar is presented as sufficient for both seen and unseen entities, yet the paper includes no ablation comparing it against entity-specific scalars or more expressive update functions. This leaves the central design choice (a shared scalar for generalization) without direct empirical support across diverse entity dynamics.
minor comments (2)
  1. [§3] Notation for the memory state and the shared scalar α should be introduced once with a clear equation reference and then used consistently; occasional reuse of symbols for different quantities appears in the model description.
  2. [§4] Figure captions and axis labels in the result plots would benefit from explicit mention of the evaluation metric (e.g., MRR or Hits@10) and the exact time-split protocol used.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments and the recommendation for minor revision. We address each major comment below with our responses and indicate the revisions planned for the manuscript.

Point-by-point responses
  1. Referee: §4 (Experimental Results): the manuscript reports consistent gains over baselines but provides no details on baseline re-implementations, hyper-parameter tuning protocols, or statistical significance (standard deviations across runs or hypothesis tests), which is required to confirm that the observed improvements are attributable to the adaptive memory rather than implementation differences.

    Authors: We agree that these details are important for reproducibility and to isolate the contribution of the adaptive memory. In the revised manuscript, Section 4 will be expanded to describe: (i) baseline re-implementations, including use of official code repositories where available and adherence to the original papers' settings; (ii) the hyper-parameter tuning protocol, specifying the search ranges, validation metric, and selection procedure applied uniformly to AdaTKG and all baselines; and (iii) statistical significance, with mean performance and standard deviations over five independent runs plus paired t-test p-values against the strongest baseline (see the sketch after these responses). These additions will be cross-referenced to the already-public code repository, which contains the exact configurations. revision: yes

  2. Referee: §3.2 (Memory Update Rule): while the single shared EMA scalar is presented as sufficient for both seen and unseen entities, the paper does not include an ablation comparing it against entity-specific scalars or more expressive update functions, leaving the central design choice (shared scalar for generalization) without direct empirical support for its sufficiency across diverse entity dynamics.

    Authors: The single shared scalar is intentionally chosen to support generalization to entities unseen at training time; entity-specific scalars or parameters are inapplicable by design for such entities. We acknowledge that an explicit ablation would strengthen the empirical case. In the revision we will (a) add a paragraph in §3.2 clarifying this design rationale and (b) include new experimental results comparing the learnable shared scalar against a fixed (non-learnable) EMA and against a more expressive update (e.g., a small feed-forward network) on seen entities. These results will be reported for the standard benchmarks while noting that entity-specific alternatives remain infeasible for the unseen-entity setting that motivates the method. revision: partial
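
The reporting protocol promised in response 1 is standard; a minimal sketch, assuming SciPy, with per-seed scores left as inputs rather than invented numbers. The pairing assumes model and baseline share seed and split index.

```python
import statistics
from scipy import stats

def report(model_scores: list[float], baseline_scores: list[float]) -> str:
    """Mean ± std over seeds plus a paired t-test against the strongest baseline.

    model_scores[i] and baseline_scores[i] must come from the same seed/split,
    which is what makes the t-test paired."""
    mean = statistics.mean(model_scores)
    sd = statistics.stdev(model_scores)
    _, p = stats.ttest_rel(model_scores, baseline_scores)  # paired t-test
    return f"MRR {mean:.4f} ± {sd:.4f} (paired t-test vs. baseline: p = {p:.4f})"
```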

Circularity Check

0 steps flagged

No significant circularity detected

Full rationale

The paper's core proposal is an architectural design choice: modeling each entity as an adaptive process whose memory is updated online via a single shared learnable EMA scalar. This is explicitly motivated by the goal of parameter-efficient generalization to unseen entities and is not derived from or equivalent to any prior fitted quantity, self-cited result, or input data pattern. The update rule is presented as a new instantiation rather than a renaming or redefinition of existing components, and the claims rest on empirical gains over baselines rather than any reduction by construction. No load-bearing step in the provided derivation chain equates a prediction to its own inputs.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The central claim rests on one free parameter (the shared EMA scalar) and the domain assumption that EMA-based memory accumulation captures sufficient temporal dynamics for improved reasoning.

free parameters (1)
  • shared EMA scalar
    Single learnable parameter controlling the update rate for all entities, chosen to enable generalization to unseen entities.
axioms (1)
  • domain assumption: Entity representations benefit from online refinement via exponential moving average of observed interactions
    Invoked to justify the memory update mechanism as an effective adaptive process.
invented entities (1)
  • per-entity adaptive memory (no independent evidence)
    purpose: To accumulate interaction history and refine representations dynamically
    New modeling construct introduced to replace static embeddings.

pith-pipeline@v0.9.0 · 5516 in / 1270 out tokens · 36802 ms · 2026-05-11T01:21:06.191340+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.


Reference graph

Works this paper leans on

47 extracted references · 47 canonical work pages · 1 internal anchor

  1. [1]

    Inductive Reasoning for Temporal Knowledge Graphs with Emerging Entities

    Anonymous. Transfir: Transferable few-shot inductive reasoning for emerging entities on temporal knowledge graphs. arXiv preprint arXiv:2604.10164, 2026. To appear at ICLR 2026

  2. [2]

    ICEWS Coded Event Data, 2015

    Elizabeth Boschee, Jennifer Lautenschlager, Sean O’Brien, Steve Shellman, James Starz, and Michael Ward. ICEWS Coded Event Data, 2015

  3. [3]

    Temporal knowledge graph completion: a survey

    Borui Cai, Yong Xiang, Longxiang Gao, He Zhang, Yunfeng Li, and Jianxin Li. Temporal knowledge graph completion: a survey. In Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, pages 6545–6553, 2023

  4. [4]

    Topology-aware correlations between relations for inductive link prediction in knowledge graphs

    Jiajun Chen, Huarui He, Feng Wu, and Jie Wang. Topology-aware correlations between relations for inductive link prediction in knowledge graphs. In Proceedings of the AAAI conference on artificial intelligence, volume 35, pages 6271–6278, 2021

  5. [5]

    Meta-knowledge transfer for inductive knowledge graph embedding

    Mingyang Chen, Wen Zhang, Yushan Zhu, Hongting Zhou, Zonggang Yuan, Changliang Xu, and Huajun Chen. Meta-knowledge transfer for inductive knowledge graph embedding. In Proceedings of the 45th international ACM SIGIR conference on research and development in information retrieval, pages 927–937, 2022

  6. [6]

    Local-global history-aware contrastive learning for temporal knowledge graph reasoning

    Wei Chen, Huaiyu Wan, Yuting Wu, Shuyuan Zhao, Jiayaqi Cheng, Yuxin Li, and Youfang Lin. Local-global history-aware contrastive learning for temporal knowledge graph reasoning. In 2024 IEEE 40th International Conference on Data Engineering (ICDE), pages 733–746. IEEE, 2024

  7. [7]

    Learning phrase representations using RNN encoder–decoder for statistical machine translation

    Kyunghyun Cho, Bart van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning phrase representations using RNN encoder–decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1724–1734, 2014

  8. [8]

    BERT: Pre-training of deep bidirectional transformers for language understanding

    Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers), pages 4171–4186, 2019

  9. [9]

    zrllm: Zero-shot relational learning on temporal knowledge graphs with large language models

    Zifeng Ding, Heling Cai, Jingpei Wu, Yunpu Ma, Ruotong Liao, Bo Xiong, and Volker Tresp. zrllm: Zero-shot relational learning on temporal knowledge graphs with large language models. In Proceedings of the 2024 conference of the North American chapter of the association for computational linguistics: Human language technologies (Volume 1: Long papers), pag...

  10. [10]

    Transformer-based reasoning for learning evolutionary chain of events on temporal knowledge graph

    Zhiyu Fang, Shuai-Long Lei, Xiaobin Zhu, Chun Yang, Shi-Xue Zhang, Xu-Cheng Yin, and Jingyan Qin. Transformer-based reasoning for learning evolutionary chain of events on temporal knowledge graph. In Proceedings of the 47th international ACM SIGIR conference on research and development in information retrieval, pages 70–79, 2024

  11. [11]

    Towards foundation models for knowledge graph reasoning

    Mikhail Galkin, Xinyu Yuan, Hesham Mostafa, Jian Tang, and Zhaocheng Zhu. Towards foundation models for knowledge graph reasoning. In The Twelfth International Conference on Learning Representations, 2024

  12. [12]

    Learning sequence encoders for temporal knowledge graph completion

    Alberto Garcia-Duran, Sebastijan Dumančić, and Mathias Niepert. Learning sequence encoders for temporal knowledge graph completion. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4816–4821, 2018

  13. [13]

    Adam: A method for stochastic optimization

    Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In International Conference on Learning Representations (ICLR), 2015

  14. [14]

    Tensor decompositions for temporal knowledge base completion

    Timothée Lacroix, Guillaume Obozinski, and Nicolas Usunier. Tensor decompositions for temporal knowledge base completion. In International Conference on Learning Representations, 2020

  15. [15]

    Temporal knowledge graph forecasting without knowledge using in-context learning

    Dong-Ho Lee, Kian Ahrabian, Woojeong Jin, Fred Morstatter, and Jay Pujara. Temporal knowledge graph forecasting without knowledge using in-context learning. arXiv preprint arXiv:2305.10613, 2023

  16. [16]

    Ingram: Inductive knowledge graph embedding via relation graphs

    Jaejun Lee, Chanyoung Chung, and Joyce Jiyoung Whang. Ingram: Inductive knowledge graph embedding via relation graphs. In International conference on machine learning, pages 18796–18809. PMLR, 2023

  17. [17]

    Gdelt

    Kalev Leetaru and Philip A Schrodt. Gdelt. In ISA annual convention, volume 2, pages 1–49. Citeseer, 2013

  18. [18]

    Hismatch: Historical structure matching based temporal knowledge graph reasoning

    Zixuan Li, Zhongni Hou, Saiping Guan, Xiaolong Jin, Weihua Peng, Long Bai, Yajuan Lyu, Wei Li, Jiafeng Guo, and Xueqi Cheng. Hismatch: Historical structure matching based temporal knowledge graph reasoning. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 7328–7338, 2022

  19. [19]

    Temporal knowledge graph reasoning based on evolutional representation learning

    Zixuan Li, Xiaolong Jin, Wei Li, Saiping Guan, Jiafeng Guo, Huawei Shen, Yuanzhuo Wang, and Xueqi Cheng. Temporal knowledge graph reasoning based on evolutional representation learning. In Proceedings of the 44th international ACM SIGIR conference on research and development in information retrieval, pages 408–417, 2021

  20. [20]

    Gentkg: Generative forecasting on temporal knowledge graph with large language models

    Ruotong Liao, Xu Jia, Yangzhe Li, Yunpu Ma, and Volker Tresp. Gentkg: Generative forecasting on temporal knowledge graph with large language models. In Findings of the Association for Computational Linguistics: NAACL 2024, pages 4303–4317, 2024

  21. [21]

    Indigo: Gnn-based inductive knowledge graph completion using pair-wise encoding

    Shuwen Liu, Bernardo Grau, Ian Horrocks, and Egor Kostylev. Indigo: Gnn-based inductive knowledge graph completion using pair-wise encoding. Advances in Neural Information Processing Systems, 34:2034–2045, 2021

  22. [22]

    Tlogic: Temporal logical rules for explainable link forecasting on temporal knowledge graphs

    Yushan Liu, Yunpu Ma, Marcel Hildebrandt, Mitchell Joblin, and Volker Tresp. Tlogic: Temporal logical rules for explainable link forecasting on temporal knowledge graphs. In Proceedings of the AAAI conference on artificial intelligence, volume 36, pages 4120–4127, 2022

  23. [23]

    An adaptive logical rule embedding model for inductive reasoning over temporal knowledge graphs

    Xin Mei, Libin Yang, Xiaoyan Cai, and Zuowei Jiang. An adaptive logical rule embedding model for inductive reasoning over temporal knowledge graphs. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 7304–7316, Abu Dhabi, United Arab Emirates, December 2022. Association for Computational Linguistics

  24. [24]

    Multi-granularity history and entity similarity learning for temporal knowledge graph reasoning

    Shi Mingcong, Chunjiang Zhu, Detian Zhang, Shiting Wen, and Li Qing. Multi-granularity history and entity similarity learning for temporal knowledge graph reasoning. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 5232–5243, 2024

  25. [25]

    Towards foundation model on temporal knowledge graph reasoning

    Jiaxin Pan, Mojtaba Nayyeri, Osama Mohammed, Daniel Hernandez, Rongchuan Zhang, Cheng Cheng, and Steffen Staab. Towards foundation model on temporal knowledge graph reasoning. arXiv preprint arXiv:2506.06367, 2025

  26. [26]

    Compressing transfer: Mutual learning-empowered knowledge distillation for temporal knowledge graph reasoning

    Ye Qian, Xiaoyan Wang, Fuhui Sun, and Li Pan. Compressing transfer: Mutual learning-empowered knowledge distillation for temporal knowledge graph reasoning. IEEE Transactions on Neural Networks and Learning Systems, 2025

  27. [27]

    Temporal graph networks for deep learning on dynamic graphs

    Emanuele Rossi, Ben Chamberlain, Fabrizio Frasca, Davide Eynard, Federico Monti, and Michael Bronstein. Temporal graph networks for deep learning on dynamic graphs. In ICML 2020 Workshop on Graph Representation Learning and Beyond, 2020

  28. [28]

    End-to-end structure-aware convolutional networks for knowledge base completion

    Chao Shang, Yun Tang, Jing Huang, Jinbo Bi, Xiaodong He, and Bowen Zhou. End-to-end structure-aware convolutional networks for knowledge base completion. In Proceedings of the AAAI conference on artificial intelligence, volume 33, pages 3060–3067, 2019

  29. [29]

    Inductive relation prediction by subgraph reasoning

    Komal Teru, Etienne Denis, and Will Hamilton. Inductive relation prediction by subgraph reasoning. In International conference on machine learning, pages 9448–9457. PMLR, 2020

  30. [30]

    Know-evolve: Deep temporal reasoning for dynamic knowledge graphs

    Rakshit Trivedi, Hanjun Dai, Yichen Wang, and Le Song. Know-evolve: Deep temporal reasoning for dynamic knowledge graphs. In International conference on machine learning, pages 3462–3471. PMLR, 2017

  31. [31]

    DyRep: Learning representations over dynamic graphs

    Rakshit Trivedi, Mehrdad Farajtabar, Prasenjeet Biswal, and Hongyuan Zha. DyRep: Learning representations over dynamic graphs. In International Conference on Learning Representations (ICLR), 2019

  32. [32]

    Composition-based multi-relational graph convolutional networks

    Shikhar Vashishth, Soumya Sanyal, Vikram Nitin, and Partha P Talukdar. Composition-based multi-relational graph convolutional networks. In ICLR, 2020

  33. [33]

    Large language models-guided dynamic adaptation for temporal knowledge graph reasoning

    Jiapu Wang, Sun Kai, Linhao Luo, Wei Wei, Yongli Hu, Alan Wee-Chung Liew, Shirui Pan, and Baocai Yin. Large language models-guided dynamic adaptation for temporal knowledge graph reasoning. Advances in Neural Information Processing Systems, 37:8384–8410, 2024

  34. [34]

    Tilp: Differentiable learning of temporal logical rules on knowledge graphs

    Siheng Xiong, Yuan Yang, Faramarz Fekri, and James Clayton Kerce. Tilp: Differentiable learning of temporal logical rules on knowledge graphs. arXiv preprint arXiv:2402.12309, 2024

  35. [35]

    Teilp: Time prediction over knowledge graphs via logical reasoning

    Siheng Xiong, Yuan Yang, Ali Payani, James C Kerce, and Faramarz Fekri. Teilp: Time prediction over knowledge graphs via logical reasoning. In Proceedings of the AAAI conference on artificial intelligence, volume 38, pages 16112–16119, 2024

  36. [36]

    Inductive representation learning on temporal graphs

    Da Xu, Chuanwei Ruan, Evren Korpeoglu, Sushant Kumar, and Kannan Achan. Inductive representation learning on temporal graphs. In International Conference on Learning Representations (ICLR), 2020

  37. [37]

    Pre-trained language model with prompts for temporal knowledge graph completion

    Wenjie Xu, Ben Liu, Miao Peng, Xu Jia, and Min Peng. Pre-trained language model with prompts for temporal knowledge graph completion. arXiv preprint arXiv:2305.07912, 2023

  38. [38]

    Temporal knowledge graph reasoning with historical contrastive learning

    Yi Xu, Junjie Ou, Hui Xu, and Luoyi Fu. Temporal knowledge graph reasoning with historical contrastive learning. In Brian Williams, Yiling Chen, and Jennifer Neville, editors, Thirty-Seventh AAAI Conference on Artificial Intelligence, AAAI 2023, Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence, IAAI 2023, Thirteenth Symposiu...

  39. [39]

    Barlow twins: Self-supervised learning via redundancy reduction

    Jure Zbontar, Li Jing, Ishan Misra, Yann LeCun, and Stéphane Deny. Barlow twins: Self-supervised learning via redundancy reduction. In International conference on machine learning, pages 12310–12320. PMLR, 2021

  40. [40]

    Historically relevant event structuring for temporal knowledge graph reasoning

    Jinchuan Zhang, Ming Sun, Chong Mu, Jinhao Zhang, Quanjiang Guo, and Ling Tian. Historically relevant event structuring for temporal knowledge graph reasoning. In 2025 IEEE 41st International Conference on Data Engineering (ICDE), pages 3179–3192. IEEE, 2025

  41. [41]

    A survey of dynamic graph neural networks

    Yanping Zheng, Lu Yi, and Zhewei Wei. A survey of dynamic graph neural networks. Frontiers of Computer Science, 19(6):196323, 2025

  42. [42]

    Learning from history: Modeling temporal knowledge graphs with sequential copy-generation networks

    Cunchao Zhu, Muhao Chen, Changjun Fan, Guangquan Cheng, and Yan Zhang. Learning from history: Modeling temporal knowledge graphs with sequential copy-generation networks. In Proceedings of the AAAI conference on artificial intelligence, volume 35, pages 4732–4740, 2021

  43. [43]

    The transfer gate ω_e = Ψ([h_e ∥ c_{π(e),t}]) ∈ [0,1]^d regulates how much of the prototype each entity inherits

    At every query timestamp t, the cluster prototype c_{π(e),t} is recomputed by pooling the IC-encoder outputs of all cluster mates, so the prototype evolves as the graph evolves. The transfer gate ω_e = Ψ([h_e ∥ c_{π(e),t}]) ∈ [0,1]^d regulates how much of the prototype each entity inherits. The codebook is trained jointly with the rest of the model through the comm...

  44. [44]

    Efficiency and performance by update operator (extracted table)

    Model                   Params (M)      Time (s/epoch)   FLOPs (M/query)   MRR    H@3    H@10
    Base (w/o Adaptivity)   72.77 (–)       330.7 (–)        1643.9 (–)        .1114  .1230  .2252
    AdaTKG EMA (default)    78.02 (+7.2%)   361.3 (+9.2%)    1654.4 (+0.6%)    .1379  .1543  .2612
    AdaTKG GRU              84.32 (+15.9%)  378.0 (+14.3%)   1658.8 (+0.9%)    .1428  .1599  .2605
    AdaTKG Cross-attention  82.22 (+13.0%)  360.5 (+9.0%)    ...

  45. [45]

    Efficiency and performance by update operator (extracted table)

    Model                   Params (M)      Time (s/epoch)   FLOPs (M/query)   MRR    H@3    H@10
    Base (w/o Adaptivity)   72.71 (–)       420.5 (–)        1913.9 (–)        .2177  .2530  .3708
    AdaTKG EMA (default)    77.96 (+7.2%)   466.3 (+10.9%)   1924.4 (+0.5%)    .2270  .2573  .3850
    AdaTKG GRU              84.25 (+15.9%)  491.0 (+16.8%)   1930.2 (+0.8%)    .2330  .2700  .3925
    AdaTKG Cross-attention  82.15 (+13.0%)  459.8 (+9.4...

  46. [46]

    Efficiency and performance by update operator (extracted table)

    Model                   Params (M)      Time (s/epoch)   FLOPs (M/query)   MRR    H@3    H@10
    Base (w/o Adaptivity)   60.08 (–)       1384.6 (–)       1062.6 (–)        .1013  .0994  .2131
    AdaTKG EMA (default)    65.33 (+8.7%)   1488.7 (+7.5%)   1073.1 (+1.0%)    .1051  .1129  .2301
    AdaTKG GRU              71.63 (+19.2%)  1549.0 (+11.9%)  1075.9 (+1.3%)    .1112  .1141  .2243
    AdaTKG Cross-attention  69.53 (+15.7%)  1503.2 (+...

  47. [47]

    Gate distribution across benchmarks (Appendix L)

    Test-time online updates (right). Figure L.1 extends the main-paper Figure 5 by showing the full train-time gate distribution stratified by the number of observed interactions, for every (dataset, update operator) pair. The same monotonic upward shift in the gate value with more interactions holds across all four be...