Recognition: 2 theorem links · Lean Theorem
AdaTKG: Adaptive Memory for Temporal Knowledge Graph Reasoning
Pith reviewed 2026-05-11 01:21 UTC · model grok-4.3
The pith
Entity representations in temporal knowledge graphs should update adaptively with each new interaction rather than remain fixed after training.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We depart from the static view and propose that each entity be modeled as an adaptive process whose representation is refined every time the entity participates in a fact. AdaTKG maintains a per-entity memory that is updated with every observed interaction, with the memory accumulating online and predictions improving as more interactions arrive. We instantiate the memory update as a learnable exponential moving average governed by a single shared scalar instead of using learnable parameters for each entity, enabling AdaTKG to handle entities unseen during training.
What carries the argument
Per-entity memory updated by a learnable exponential moving average controlled by one shared scalar parameter.
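The mechanism as described can be sketched in a few lines. This is a hedged illustration, not AdaTKG's actual code: the names (`AdaptiveMemory`, `update`, `alpha_logit`) and the logistic parameterization of the shared scalar are our assumptions; in the real model the interaction vector would come from a learned encoder and the scalar would be trained by backpropagation.

```python
import numpy as np

class AdaptiveMemory:
    """Per-entity memory refined by an EMA with one shared scalar (sketch)."""

    def __init__(self, dim: int, alpha_logit: float = 0.0):
        self.dim = dim
        # The single shared learnable parameter; a sigmoid keeps the
        # effective update rate alpha in (0, 1).
        self.alpha_logit = alpha_logit
        self.memory: dict[str, np.ndarray] = {}

    def update(self, entity: str, interaction_vec: np.ndarray) -> np.ndarray:
        alpha = 1.0 / (1.0 + np.exp(-self.alpha_logit))
        # Entities unseen during training simply start from their first
        # interaction: no per-entity parameters exist, which is what makes
        # the scheme inductive.
        prev = self.memory.get(entity, interaction_vec)
        self.memory[entity] = alpha * prev + (1.0 - alpha) * interaction_vec
        return self.memory[entity]

mem = AdaptiveMemory(dim=4, alpha_logit=1.0)   # alpha ≈ 0.73: slow forgetting
m1 = mem.update("entity_a", np.ones(4))        # first fact seeds the memory
m2 = mem.update("entity_a", np.zeros(4))       # EMA blend of old and new
```

Because every entity shares the one scalar, the memory table itself carries no trainable parameters, which is the claimed route to handling unseen entities.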
If this is right
- Representations improve automatically as additional facts about an entity are observed.
- The model can accept new entities at test time without retraining or extra parameters.
- Performance gains appear consistently across standard temporal knowledge graph benchmarks.
- Memory operates online, supporting streaming fact sequences without full retraining.
Where Pith is reading between the lines
- The shared-scalar design could extend to other online graph tasks where entities evolve, such as dynamic social networks or transaction graphs.
- Long-horizon experiments could test whether the single scalar remains stable or whether drift requires occasional recalibration.
- The approach implies that temporal reasoning benefits from treating memory as a running average rather than a fixed lookup table.
Load-bearing premise
A single shared scalar is enough to set the right update rate for every entity and relation without needing entity-specific rules or more complex memory mechanisms.
What would settle it
Run the same model but replace the shared scalar with either per-entity learnable update rates or a more elaborate update rule; if accuracy rises substantially or unseen-entity performance collapses under the shared scalar, the central claim is false.
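A minimal sketch of the per-entity variant in that proposed test (all names illustrative, not from the paper) also makes the structural issue concrete: with no learned rate available for a new entity, this variant must fall back to a default.

```python
import numpy as np

def ema_step(prev: np.ndarray, x: np.ndarray, alpha: float) -> np.ndarray:
    """One exponential-moving-average step."""
    return alpha * prev + (1.0 - alpha) * x

class PerEntityEMA:
    """Ablation variant: a separate learnable update rate per training entity."""

    def __init__(self, default_alpha: float = 0.5):
        self.alphas: dict[str, float] = {}   # filled during training
        self.default_alpha = default_alpha   # forced fallback for new entities
        self.memory: dict[str, np.ndarray] = {}

    def update(self, entity: str, x: np.ndarray) -> np.ndarray:
        # An entity unseen at training time has no learned alpha, so this
        # variant degrades to a fixed default -- the structural reason a
        # shared scalar is preferred for the inductive setting.
        alpha = self.alphas.get(entity, self.default_alpha)
        prev = self.memory.get(entity, x)
        self.memory[entity] = ema_step(prev, x, alpha)
        return self.memory[entity]
```

Running this variant and the shared-scalar model on the same benchmarks, and checking whether per-entity rates buy a substantial accuracy gain on seen entities, is exactly the comparison that would settle the claim.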
read the original abstract
Temporal knowledge graphs (TKGs) represent time-stamped relational facts and support a wide range of reasoning tasks over evolving events. However, existing methods produce entity representations that are static at the entity level, in that each representation is a function of learned parameters only and retains no trace of the interactions in which the entity has participated. In this paper, we depart from this static view and propose that each entity be modeled as an adaptive process whose representation is refined every time the entity participates in a fact. To this end, we propose AdaTKG, which maintains a per-entity memory that is updated with every observed interaction, with the memory accumulating online and predictions improving as more interactions arrive. Specifically, we instantiate the memory update as a learnable exponential moving average governed by a single shared scalar instead of using learnable parameters for each entity, enabling AdaTKG to handle entities unseen during training. Extensive experiments confirm consistent gains over TKG baselines, demonstrating the effectiveness of adaptive memory. Code is publicly available at: https://github.com/seunghan96/AdaTKG.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that standard TKG methods produce static entity representations that ignore interaction history, and introduces AdaTKG to model each entity as an adaptive process whose memory is updated online via a learnable exponential moving average controlled by a single shared scalar. This design is argued to improve reasoning as more facts arrive and to enable generalization to entities unseen at training time, with experiments showing consistent gains over TKG baselines on public benchmarks and with code released.
Significance. If the empirical gains hold under closer scrutiny, the work offers a lightweight, parameter-efficient way to introduce adaptivity into TKG entity representations without per-entity parameters, directly addressing generalization to new entities. The public code release is a clear strength that supports reproducibility and allows independent verification of the reported improvements.
major comments (2)
- [§4 (Experimental Results)] The manuscript reports consistent gains over baselines but provides no details on baseline re-implementations, hyper-parameter tuning protocols, or statistical significance (standard deviations across runs or hypothesis tests); these are required to confirm that the observed improvements are attributable to the adaptive memory rather than to implementation differences.
- [§3.2 (Memory Update Rule)] While the single shared EMA scalar is presented as sufficient for both seen and unseen entities, the paper includes no ablation comparing it against entity-specific scalars or more expressive update functions, leaving the central design choice (a shared scalar for generalization) without direct empirical support for its sufficiency across diverse entity dynamics.
minor comments (2)
- [§3] Notation for the memory state and the shared scalar α should be introduced once with a clear equation reference and then used consistently; occasional reuse of symbols for different quantities appears in the model description.
- [§4] Figure captions and axis labels in the result plots would benefit from explicit mention of the evaluation metric (e.g., MRR or Hits@10) and the exact time-split protocol used.
Simulated Author's Rebuttal
We thank the referee for the constructive comments and the recommendation for minor revision. We address each major comment below with our responses and indicate the revisions planned for the manuscript.
read point-by-point responses
Referee: §4 (Experimental Results): the manuscript reports consistent gains over baselines but provides no details on baseline re-implementations, hyper-parameter tuning protocols, or statistical significance (standard deviations across runs or hypothesis tests), which is required to confirm that the observed improvements are attributable to the adaptive memory rather than implementation differences.
Authors: We agree that these details are important for reproducibility and to isolate the contribution of the adaptive memory. In the revised manuscript, Section 4 will be expanded to describe: (i) baseline re-implementations, including use of official code repositories where available and adherence to the original papers' settings; (ii) the hyper-parameter tuning protocol, specifying the search ranges, validation metric, and selection procedure applied uniformly to AdaTKG and all baselines; and (iii) statistical significance, with mean performance and standard deviations over five independent runs plus paired t-test p-values against the strongest baseline. These additions will be cross-referenced to the already-public code repository, which contains the exact configurations. revision: yes
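The promised reporting protocol (mean ± standard deviation over five runs plus a paired t-test against the strongest baseline) can be sketched with the standard library alone. The MRR numbers below are invented for illustration only.

```python
import math
import statistics

# Hypothetical MRR scores from five seed-matched runs (made-up numbers).
adatkg_runs   = [0.462, 0.458, 0.465, 0.460, 0.463]
baseline_runs = [0.441, 0.444, 0.439, 0.443, 0.440]

mean_a, sd_a = statistics.mean(adatkg_runs), statistics.stdev(adatkg_runs)

# Paired t-test: runs share seeds, so we test the per-run differences.
diffs = [a - b for a, b in zip(adatkg_runs, baseline_runs)]
n = len(diffs)
t_stat = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))

# Two-sided critical value for df = n - 1 = 4 at p = 0.01 is 4.604 (t-table).
significant = abs(t_stat) > 4.604
print(f"AdaTKG MRR {mean_a:.4f} ± {sd_a:.4f}, t = {t_stat:.2f}, "
      f"significant at p < 0.01: {significant}")
```

Pairing by seed is what justifies the paired (rather than unpaired) test; reporting the standard deviations alongside the means is what lets a reader judge run-to-run variance.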
Referee: §3.2 (Memory Update Rule): while the single shared EMA scalar is presented as sufficient for both seen and unseen entities, the paper does not include an ablation comparing it against entity-specific scalars or more expressive update functions, leaving the central design choice (shared scalar for generalization) without direct empirical support for its sufficiency across diverse entity dynamics.
Authors: The single shared scalar is intentionally chosen to support generalization to entities unseen at training time; entity-specific scalars or parameters are inapplicable by design for such entities. We acknowledge that an explicit ablation would strengthen the empirical case. In the revision we will (a) add a paragraph in §3.2 clarifying this design rationale and (b) include new experimental results comparing the learnable shared scalar against a fixed (non-learnable) EMA and against a more expressive update (e.g., a small feed-forward network) on seen entities. These results will be reported for the standard benchmarks while noting that entity-specific alternatives remain infeasible for the unseen-entity setting that motivates the method. revision: partial
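The "more expressive update" floated in (b) could look like the gated variant below, shown next to a fixed EMA for contrast. The architecture (a single linear layer plus sigmoid over the concatenated previous memory and new interaction) is our assumption for illustration, not the paper's design.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8

def fixed_ema(prev: np.ndarray, x: np.ndarray, alpha: float = 0.9) -> np.ndarray:
    """Non-learnable baseline: a constant interpolation weight."""
    return alpha * prev + (1.0 - alpha) * x

# Gated alternative: a tiny learnable layer produces a per-dimension
# interpolation weight from the previous memory and the new interaction.
W = rng.normal(scale=0.1, size=(DIM, 2 * DIM))  # would be trained in practice

def gated_update(prev: np.ndarray, x: np.ndarray) -> np.ndarray:
    gate = 1.0 / (1.0 + np.exp(-(W @ np.concatenate([prev, x]))))
    return gate * prev + (1.0 - gate) * x
```

Both rules keep the memory a convex combination of its previous value and the new evidence; the ablation question is whether the gate's extra parameters pay for themselves on seen entities while remaining usable for unseen ones.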
Circularity Check
No significant circularity detected
full rationale
The paper's core proposal is an architectural design choice: modeling each entity as an adaptive process whose memory is updated online via a single shared learnable EMA scalar. This is explicitly motivated by the goal of parameter-efficient generalization to unseen entities and is not derived from or equivalent to any prior fitted quantity, self-cited result, or input data pattern. The update rule is presented as a new instantiation rather than a renaming or redefinition of existing components, and the claims rest on empirical gains over baselines rather than any reduction by construction. No load-bearing step in the provided derivation chain equates a prediction to its own inputs.
Axiom & Free-Parameter Ledger
free parameters (1)
- shared EMA scalar
axioms (1)
- domain assumption: Entity representations benefit from online refinement via an exponential moving average of observed interactions
invented entities (1)
- per-entity adaptive memory (no independent evidence)
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  The relation between this paper passage and the cited Recognition theorem is unclear:
  "we instantiate the memory update as a learnable exponential moving average governed by a single shared scalar instead of using learnable parameters for each entity"
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · embed_strictMono_of_one_lt · unclear
  The relation between this paper passage and the cited Recognition theorem is unclear:
  "the memory accumulating online and predictions improving as more interactions arrive"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Anonymous. Transfir: Transferable few-shot inductive reasoning for emerging entities on temporal knowledge graphs. arXiv preprint arXiv:2604.10164, 2026. To appear at ICLR 2026.
- [2] Elizabeth Boschee, Jennifer Lautenschlager, Sean O'Brien, Steve Shellman, James Starz, and Michael Ward. ICEWS Coded Event Data, 2015.
- [3] Borui Cai, Yong Xiang, Longxiang Gao, He Zhang, Yunfeng Li, and Jianxin Li. Temporal knowledge graph completion: a survey. In Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, pages 6545–6553, 2023.
- [4] Jiajun Chen, Huarui He, Feng Wu, and Jie Wang. Topology-aware correlations between relations for inductive link prediction in knowledge graphs. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 6271–6278, 2021.
- [5] Mingyang Chen, Wen Zhang, Yushan Zhu, Hongting Zhou, Zonggang Yuan, Changliang Xu, and Huajun Chen. Meta-knowledge transfer for inductive knowledge graph embedding. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 927–937, 2022.
- [6] Wei Chen, Huaiyu Wan, Yuting Wu, Shuyuan Zhao, Jiayaqi Cheng, Yuxin Li, and Youfang Lin. Local-global history-aware contrastive learning for temporal knowledge graph reasoning. In 2024 IEEE 40th International Conference on Data Engineering (ICDE), pages 733–746. IEEE, 2024.
- [7] Kyunghyun Cho, Bart van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning phrase representations using RNN encoder–decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1724–1734, 2014.
- [8] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, 2019.
- [9] Zifeng Ding, Heling Cai, Jingpei Wu, Yunpu Ma, Ruotong Liao, Bo Xiong, and Volker Tresp. zrLLM: Zero-shot relational learning on temporal knowledge graphs with large language models. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pag…
- [10] Zhiyu Fang, Shuai-Long Lei, Xiaobin Zhu, Chun Yang, Shi-Xue Zhang, Xu-Cheng Yin, and Jingyan Qin. Transformer-based reasoning for learning evolutionary chain of events on temporal knowledge graph. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 70–79, 2024.
- [11] Mikhail Galkin, Xinyu Yuan, Hesham Mostafa, Jian Tang, and Zhaocheng Zhu. Towards foundation models for knowledge graph reasoning. In The Twelfth International Conference on Learning Representations, 2024.
- [12] Alberto Garcia-Duran, Sebastijan Dumančić, and Mathias Niepert. Learning sequence encoders for temporal knowledge graph completion. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4816–4821, 2018.
- [13] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In International Conference on Learning Representations (ICLR), 2015.
- [14] Timothée Lacroix, Guillaume Obozinski, and Nicolas Usunier. Tensor decompositions for temporal knowledge base completion. In International Conference on Learning Representations, 2020.
- [15] Dong-Ho Lee, Kian Ahrabian, Woojeong Jin, Fred Morstatter, and Jay Pujara. Temporal knowledge graph forecasting without knowledge using in-context learning. arXiv preprint arXiv:2305.10613, 2023.
- [16] Jaejun Lee, Chanyoung Chung, and Joyce Jiyoung Whang. InGram: Inductive knowledge graph embedding via relation graphs. In International Conference on Machine Learning, pages 18796–18809. PMLR, 2023.
- [17] Kalev Leetaru and Philip A Schrodt. GDELT. In ISA Annual Convention, volume 2, pages 1–49. Citeseer, 2013.
- [18] Zixuan Li, Zhongni Hou, Saiping Guan, Xiaolong Jin, Weihua Peng, Long Bai, Yajuan Lyu, Wei Li, Jiafeng Guo, and Xueqi Cheng. HisMatch: Historical structure matching based temporal knowledge graph reasoning. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 7328–7338, 2022.
- [19] Zixuan Li, Xiaolong Jin, Wei Li, Saiping Guan, Jiafeng Guo, Huawei Shen, Yuanzhuo Wang, and Xueqi Cheng. Temporal knowledge graph reasoning based on evolutional representation learning. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 408–417, 2021.
- [20] Ruotong Liao, Xu Jia, Yangzhe Li, Yunpu Ma, and Volker Tresp. GenTKG: Generative forecasting on temporal knowledge graph with large language models. In Findings of the Association for Computational Linguistics: NAACL 2024, pages 4303–4317, 2024.
- [21] Shuwen Liu, Bernardo Grau, Ian Horrocks, and Egor Kostylev. INDIGO: GNN-based inductive knowledge graph completion using pair-wise encoding. Advances in Neural Information Processing Systems, 34:2034–2045, 2021.
- [22] Yushan Liu, Yunpu Ma, Marcel Hildebrandt, Mitchell Joblin, and Volker Tresp. TLogic: Temporal logical rules for explainable link forecasting on temporal knowledge graphs. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pages 4120–4127, 2022.
- [23] Xin Mei, Libin Yang, Xiaoyan Cai, and Zuowei Jiang. An adaptive logical rule embedding model for inductive reasoning over temporal knowledge graphs. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 7304–7316, Abu Dhabi, United Arab Emirates, December 2022. Association for Computational Linguistics.
- [24] Shi Mingcong, Chunjiang Zhu, Detian Zhang, Shiting Wen, and Li Qing. Multi-granularity history and entity similarity learning for temporal knowledge graph reasoning. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 5232–5243, 2024.
- [25] Jiaxin Pan, Mojtaba Nayyeri, Osama Mohammed, Daniel Hernandez, Rongchuan Zhang, Cheng Cheng, and Steffen Staab. Towards foundation model on temporal knowledge graph reasoning. arXiv preprint arXiv:2506.06367, 2025.
- [26] Ye Qian, Xiaoyan Wang, Fuhui Sun, and Li Pan. Compressing transfer: Mutual learning-empowered knowledge distillation for temporal knowledge graph reasoning. IEEE Transactions on Neural Networks and Learning Systems, 2025.
- [27] Emanuele Rossi, Ben Chamberlain, Fabrizio Frasca, Davide Eynard, Federico Monti, and Michael Bronstein. Temporal graph networks for deep learning on dynamic graphs. In ICML 2020 Workshop on Graph Representation Learning and Beyond, 2020.
- [28] Chao Shang, Yun Tang, Jing Huang, Jinbo Bi, Xiaodong He, and Bowen Zhou. End-to-end structure-aware convolutional networks for knowledge base completion. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 3060–3067, 2019.
- [29] Komal Teru, Etienne Denis, and Will Hamilton. Inductive relation prediction by subgraph reasoning. In International Conference on Machine Learning, pages 9448–9457. PMLR, 2020.
- [30] Rakshit Trivedi, Hanjun Dai, Yichen Wang, and Le Song. Know-Evolve: Deep temporal reasoning for dynamic knowledge graphs. In International Conference on Machine Learning, pages 3462–3471. PMLR, 2017.
- [31] Rakshit Trivedi, Mehrdad Farajtabar, Prasenjeet Biswal, and Hongyuan Zha. DyRep: Learning representations over dynamic graphs. In International Conference on Learning Representations (ICLR), 2019.
- [32] Shikhar Vashishth, Soumya Sanyal, Vikram Nitin, and Partha P Talukdar. Composition-based multi-relational graph convolutional networks. In ICLR, 2020.
- [33] Jiapu Wang, Sun Kai, Linhao Luo, Wei Wei, Yongli Hu, Alan Wee-Chung Liew, Shirui Pan, and Baocai Yin. Large language models-guided dynamic adaptation for temporal knowledge graph reasoning. Advances in Neural Information Processing Systems, 37:8384–8410, 2024.
- [34] Siheng Xiong, Yuan Yang, Faramarz Fekri, and James Clayton Kerce. TILP: Differentiable learning of temporal logical rules on knowledge graphs. arXiv preprint arXiv:2402.12309, 2024.
- [35] Siheng Xiong, Yuan Yang, Ali Payani, James C Kerce, and Faramarz Fekri. TEILP: Time prediction over knowledge graphs via logical reasoning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 16112–16119, 2024.
- [36] Da Xu, Chuanwei Ruan, Evren Korpeoglu, Sushant Kumar, and Kannan Achan. Inductive representation learning on temporal graphs. In International Conference on Learning Representations (ICLR), 2020.
- [37] Wenjie Xu, Ben Liu, Miao Peng, Xu Jia, and Min Peng. Pre-trained language model with prompts for temporal knowledge graph completion. arXiv preprint arXiv:2305.07912, 2023.
- [38] Yi Xu, Junjie Ou, Hui Xu, and Luoyi Fu. Temporal knowledge graph reasoning with historical contrastive learning. In Brian Williams, Yiling Chen, and Jennifer Neville, editors, Thirty-Seventh AAAI Conference on Artificial Intelligence, AAAI 2023, Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence, IAAI 2023, Thirteenth Symposiu…
- [39] Jure Zbontar, Li Jing, Ishan Misra, Yann LeCun, and Stéphane Deny. Barlow Twins: Self-supervised learning via redundancy reduction. In International Conference on Machine Learning, pages 12310–12320. PMLR, 2021.
- [40] Jinchuan Zhang, Ming Sun, Chong Mu, Jinhao Zhang, Quanjiang Guo, and Ling Tian. Historically relevant event structuring for temporal knowledge graph reasoning. In 2025 IEEE 41st International Conference on Data Engineering (ICDE), pages 3179–3192. IEEE, 2025.
- [41] Yanping Zheng, Lu Yi, and Zhewei Wei. A survey of dynamic graph neural networks. Frontiers of Computer Science, 19(6):196323, 2025.
- [42] Cunchao Zhu, Muhao Chen, Changjun Fan, Guangquan Cheng, and Yan Zhang. Learning from history: Modeling temporal knowledge graphs with sequential copy-generation networks. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 4732–4740, 2021.