NumCoKE: Ordinal-Aware Numerical Reasoning over Knowledge Graphs with Mixture-of-Experts and Contrastive Learning

Chenyang Tu; Ming Yin; Neng Gao; Qiqing Xia; Zongsheng Cao

arxiv: 2411.12950 · v4 · submitted 2024-11-20 · 💻 cs.AI

Pith reviewed 2026-05-23 17:38 UTC · model grok-4.3

classification 💻 cs.AI

keywords knowledge graphs · numerical reasoning · mixture of experts · contrastive learning · ordinal relationships · attribute integration

The pith

NumCoKE combines a mixture-of-experts encoder with ordinal contrastive learning to integrate numerical attributes into knowledge graph reasoning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to fix two gaps in existing knowledge graph models: they fail to blend symbolic triples with numerical values like lengths or weights into one shared space, and they cannot reliably tell apart close numbers that imply different relations. NumCoKE introduces an MoEKA encoder that routes numeric features through relation-specific experts and an OKCL module that builds contrastive samples based on known ordinal orderings. If these components work as described, models should produce more accurate inferences about new facts that depend on numerical comparisons. The authors test this on three public benchmarks and report consistent gains over prior methods across varied attribute distributions.

Core claim

NumCoKE is a numerical reasoning framework for knowledge graphs that uses a Mixture-of-Experts Knowledge-Aware encoder to jointly align entities, relations, and numerical attributes in a shared space while routing attribute features to relation-specific experts, paired with Ordinal Knowledge Contrastive Learning that constructs ordinal-aware positive and negative samples from prior knowledge to discriminate subtle semantic shifts.
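To make "routing attribute features to relation-specific experts" concrete, here is a minimal sketch of one way such routing could work, assuming linear experts and a softmax gate conditioned on the relation embedding. The class name `ToyRelationRoutedMoE`, the dimensions, and the random initialization are illustrative assumptions; the abstract does not specify the actual MoEKA architecture, so this should not be read as the authors' implementation.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


class ToyRelationRoutedMoE:
    """Hypothetical sketch: a relation-conditioned gate mixes expert outputs."""

    def __init__(self, dim, num_experts, seed=0):
        rng = random.Random(seed)

        def rand_matrix(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        # One linear expert per slot; each maps a dim-sized attribute feature to dim outputs.
        self.experts = [rand_matrix(dim, dim) for _ in range(num_experts)]
        # The gate maps the relation embedding to one logit per expert.
        self.gate = rand_matrix(num_experts, dim)

    def route(self, relation_emb):
        """Return mixing weights over experts for a given relation embedding."""
        return softmax(matvec(self.gate, relation_emb))

    def forward(self, attr_feat, relation_emb):
        """Weighted sum of expert outputs; the weights depend only on the relation."""
        weights = self.route(relation_emb)
        outputs = [matvec(expert, attr_feat) for expert in self.experts]
        return [sum(w * out[i] for w, out in zip(weights, outputs))
                for i in range(len(attr_feat))]
```

In this toy version the same numeric feature is transformed differently depending on the relation, which is the property the core claim relies on. Whether the paper uses dense mixing as here or sparse top-k routing is not stated in the abstract.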

What carries the argument

Mixture-of-Experts Knowledge-Aware (MoEKA) encoder that jointly encodes symbolic and numeric components with dynamic expert routing, together with Ordinal Knowledge Contrastive Learning (OKCL) that builds positive and negative samples using ordinal prior knowledge.
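The ordinal half of the argument can be illustrated with a small sketch, assuming the ordinal prior is a simple "greater than the anchor" rule and that hard negatives are the wrong-side candidates closest in value. The function names `ordinal_samples` and `info_nce`, the selection rule, and the temperature are assumptions made for illustration; the abstract does not describe how OKCL actually builds its samples or which contrastive loss it uses.

```python
import math


def ordinal_samples(anchor_value, candidates, num_hard_negatives=2):
    """Split candidates by the ordinal relation 'greater than anchor'.

    candidates: dict mapping a name to a numeric value.
    The rule below is hypothetical, not taken from the paper.
    """
    positives = [n for n, v in candidates.items() if v > anchor_value]
    negatives = [n for n, v in candidates.items() if v <= anchor_value]
    # Hard negatives: wrong-side candidates whose values are closest to the anchor.
    negatives.sort(key=lambda n: abs(candidates[n] - anchor_value))
    return positives, negatives[:num_hard_negatives]


def info_nce(pos_score, neg_scores, temperature=0.1):
    """InfoNCE loss for one positive against a list of negatives."""
    logits = [pos_score / temperature] + [s / temperature for s in neg_scores]
    m = max(logits)
    # Log-sum-exp with the max subtracted for numerical stability.
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denom - logits[0]
```

With an anchor of 10.0, a candidate at 9.9 is a much harder negative than one at 5.0, which is the "close values" case the paper says existing models struggle with.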

If this is right

Knowledge graph models become able to extract relation-aware semantics directly from numerical attribute values.
Models distinguish fine-grained ordinal relationships even when values are close or hard negatives are present.
Numerical fact inference improves on benchmarks that contain attributes with varying distributions.
The unified representation supports downstream tasks that combine symbolic triples with numeric comparisons.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same routing and contrastive design could be tested on recommendation systems that rely on numerical attributes in knowledge graphs.
Extending the ordinal sampling to handle multi-attribute comparisons would connect this work to broader numerical query answering.
Applying the encoder to larger, noisier real-world graphs would show whether the gains hold beyond the three public benchmarks.

Load-bearing premise

The two shortcomings of prior work (incomplete semantic integration and ordinal indistinguishability) are the main bottlenecks, and the MoEKA encoder plus OKCL resolve them without introducing new failure modes on the chosen benchmarks.

What would settle it

A direct comparison on the three public KG benchmarks showing that NumCoKE does not outperform competitive baselines across diverse attribute distributions would falsify the claim of superiority in semantic integration and ordinal reasoning.
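Such a comparison needs a shared scoring rule. The sketch below assumes the usual link-prediction metrics, mean reciprocal rank (MRR) and Hits@k; this page does not say which metrics the paper reports, so the choice of metric and the helper names are assumptions.

```python
def rank_of_true(scores, true_index):
    """1-based rank of the true candidate; ties are counted against the model."""
    true_score = scores[true_index]
    better_or_equal = sum(
        1 for i, s in enumerate(scores) if i != true_index and s >= true_score
    )
    return better_or_equal + 1


def mrr(ranks):
    """Mean reciprocal rank over a list of 1-based ranks."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks, k):
    """Fraction of queries whose true candidate is ranked in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)
```

Reporting both models under the same tie-handling rule matters here, because close numeric values are exactly where tied or near-tied scores are likely.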

Figures

Figures reproduced from arXiv: 2411.12950 by Chenyang Tu, Ming Yin, Neng Gao, Qiqing Xia, Zongsheng Cao.

**Figure 2.** The Overview of our model. (a) The MoEKA Encoder encodes each entity with the relation and attributes to a unified, ...

**Figure 4.** Proportional test on Credit and Spotify. A relation ...

**Figure 3.** It can be observed that the impact of the number of ...

**Figure 5.** Visualization of relevance scores of each numeric ...

Original abstract

Knowledge graphs (KGs) serve as a vital backbone for a wide range of AI applications, including natural language understanding and recommendation. A promising yet underexplored direction is numerical reasoning over KGs, which involves inferring new facts by leveraging not only symbolic triples but also numerical attribute values (e.g., length, weight). However, existing methods fall short in two key aspects: (1) Incomplete semantic integration: Most models struggle to jointly encode entities, relations, and numerical attributes in a unified representation space, limiting their ability to extract relation-aware semantics from numeric information. (2) Ordinal indistinguishability: Due to subtle differences between close values and sampling imbalance, models often fail to capture fine-grained ordinal relationships (e.g., longer, heavier), especially in the presence of hard negatives. To address these challenges, we propose NumCoKE, a numerical reasoning framework for KGs based on Mixture-of-Experts and Ordinal Contrastive Embedding. To overcome (C1), we introduce a Mixture-of-Experts Knowledge-Aware (MoEKA) encoder that jointly aligns symbolic and numeric components into a shared semantic space, while dynamically routing attribute features to relation-specific experts. To handle (C2), we propose Ordinal Knowledge Contrastive Learning (OKCL), which constructs ordinal-aware positive and negative samples using prior knowledge, enabling the model to better discriminate subtle semantic shifts. Extensive experiments on three public KG benchmarks demonstrate that NumCoKE consistently outperforms competitive baselines across diverse attribute distributions, validating its superiority in both semantic integration and ordinal reasoning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

NumCoKE pairs MoE routing with ordinal contrastive sampling for numerical KG reasoning, but the abstract gives no ablations or controls to show the gains come from those pieces rather than capacity or tuning.

NumCoKE introduces a framework that routes numerical attributes through relation-specific experts in a MoEKA encoder and uses prior-knowledge sampling in OKCL to build contrastive positives and negatives. The goal is to fix incomplete semantic integration of symbols and numbers plus poor handling of close ordinal values. That specific combination for this task is what the abstract presents as new relative to the baselines it cites. The setup is straightforward and directly targets the two gaps the authors name, which is a clear way to frame the problem. If the full experiments back it up with proper isolation, the method could be a useful incremental tool for KG work that involves attributes like length or weight.

The main weakness is that the performance claims rest on experiments whose design is not visible here. There are no ablations separating the contribution of the routing mechanism from extra parameters, no error bars, no dataset statistics, and no probing to confirm that improvements trace to better numeric-symbolic alignment or finer ordinal discrimination rather than standard supervised fitting on the same three benchmarks. The stress-test concern lands: without that evidence, the superiority could come from unrelated factors.

This is scoped work for people already working on numerical extensions to KG embeddings. A reader in that subfield could extract the method description and see whether the benchmarks match their needs, but the current presentation does not let an outsider verify the central claims. I would send it for peer review so the experimental controls and analysis can be checked in detail.

Referee Report

2 major / 0 minor

Summary. The manuscript proposes NumCoKE, a numerical reasoning framework over knowledge graphs that introduces a Mixture-of-Experts Knowledge-Aware (MoEKA) encoder to jointly align symbolic triples and numerical attributes into a shared space with relation-specific routing, and an Ordinal Knowledge Contrastive Learning (OKCL) objective that constructs positive/negative samples using prior knowledge to capture fine-grained ordinal distinctions. It claims these components address incomplete semantic integration and ordinal indistinguishability, respectively, and reports consistent outperformance over competitive baselines on three public KG benchmarks across diverse attribute distributions.

Significance. If the performance gains can be isolated to the proposed mechanisms, the work would offer a concrete, extensible approach for incorporating numerical attributes into KG embeddings while explicitly handling ordinal relations, which could benefit downstream tasks such as recommendation and numerical question answering. The paper merits credit for clearly articulating two failure modes and for designing targeted components (dynamic expert routing and prior-knowledge contrastive sampling) rather than relying on generic capacity increases.

major comments (2)

[Experimental section (results and ablations)] The central claim that MoEKA and OKCL directly resolve the two stated bottlenecks (incomplete semantic integration and ordinal indistinguishability) is load-bearing, yet the experimental section provides no ablation studies that remove or disable each component individually while holding parameter count and optimization fixed; without such controls it is impossible to rule out that observed gains arise from added routing capacity or curated sample construction rather than improved numeric-symbolic alignment.
[§4 (or equivalent experimental analysis subsection)] No error analysis, probing tasks, or per-benchmark breakdown is supplied to verify that baseline failures are primarily attributable to the two identified shortcomings rather than other factors (e.g., embedding dimensionality or negative sampling strategy); this leaves the attribution of NumCoKE's superiority unverified.
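The parameter-matching control requested in the first comment comes down to simple arithmetic. The sketch below assumes each expert is a two-layer MLP with biases and that the gate is a single linear layer; those architectural details and the function names are assumptions for illustration, since the page does not give the actual MoEKA layer shapes.

```python
def mlp_params(d_in, hidden, d_out):
    """Parameter count of a two-layer MLP with biases."""
    return d_in * hidden + hidden + hidden * d_out + d_out


def moe_params(d_in, hidden, d_out, num_experts, d_gate):
    """Parameter count of num_experts MLP experts plus a linear gate with bias."""
    gate = d_gate * num_experts + num_experts
    return num_experts * mlp_params(d_in, hidden, d_out) + gate


def matched_hidden(target_params, d_in, d_out):
    """Smallest hidden width whose single MLP has at least target_params parameters."""
    # mlp_params is linear in hidden: hidden * (d_in + d_out + 1) + d_out
    per_unit = d_in + d_out + 1
    return max(1, -(-(target_params - d_out) // per_unit))  # ceiling division


# Example: 4 experts of width 16 versus one widened expert.
total = moe_params(d_in=8, hidden=16, d_out=8, num_experts=4, d_gate=8)  # 1156
width = matched_hidden(total, d_in=8, d_out=8)  # 68
```

A single-expert baseline at the matched width has roughly the same capacity as the mixture, so any remaining gap is easier to attribute to routing rather than to parameter count.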

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback highlighting the need for stronger experimental controls. We address each major comment below and will revise the manuscript to incorporate the suggested analyses.

Referee: [Experimental section (results and ablations)] The central claim that MoEKA and OKCL directly resolve the two stated bottlenecks (incomplete semantic integration and ordinal indistinguishability) is load-bearing, yet the experimental section provides no ablation studies that remove or disable each component individually while holding parameter count and optimization fixed; without such controls it is impossible to rule out that observed gains arise from added routing capacity or curated sample construction rather than improved numeric-symbolic alignment.

Authors: We agree that isolating the contributions of MoEKA and OKCL via controlled ablations is necessary to substantiate the central claims. The current results show overall gains but lack these specific controls. In the revised manuscript, we will add ablation experiments that disable each component individually (e.g., single-expert variant for MoEKA and standard contrastive objective for OKCL) while matching parameter counts and optimization settings across all variants.

revision: yes

Referee: [§4 (or equivalent experimental analysis subsection)] No error analysis, probing tasks, or per-benchmark breakdown is supplied to verify that baseline failures are primarily attributable to the two identified shortcomings rather than other factors (e.g., embedding dimensionality or negative sampling strategy); this leaves the attribution of NumCoKE's superiority unverified.

Authors: We acknowledge the value of additional diagnostic analysis to link baseline shortcomings directly to the two identified issues. We will expand the experimental section with an error analysis, per-benchmark breakdowns, and targeted probing of numerical attribute handling to better attribute performance differences to semantic integration and ordinal reasoning rather than confounding factors.

revision: yes
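One form the promised error analysis could take is accuracy bucketed by how close the two compared values are. The sketch below assumes each test record is a pair of numeric values plus a correctness flag; the record format, the bucket edges, and the function names are hypothetical and not drawn from the paper or the rebuttal.

```python
from collections import defaultdict


def relative_gap(a, b, eps=1e-12):
    """Absolute difference scaled by the larger magnitude of the two values."""
    return abs(a - b) / max(abs(a), abs(b), eps)


def accuracy_by_gap(records, edges=(0.01, 0.1)):
    """Group (value_a, value_b, correct) records by relative gap and report accuracy.

    edges must be sorted ascending; gaps above the last edge share one bucket.
    """
    buckets = defaultdict(lambda: [0, 0])  # label -> [num_correct, num_total]
    for a, b, correct in records:
        gap = relative_gap(a, b)
        label = f">{edges[-1]}"
        for edge in edges:
            if gap <= edge:
                label = f"<={edge}"
                break
        buckets[label][0] += int(correct)
        buckets[label][1] += 1
    return {label: c / n for label, (c, n) in buckets.items()}
```

If a baseline's accuracy drops sharply in the smallest-gap bucket while NumCoKE's does not, that would support the ordinal-indistinguishability attribution; a flat profile would point to other factors.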

Circularity Check

0 steps flagged

No significant circularity; empirical ML validation on benchmarks

The paper proposes NumCoKE with MoEKA encoder and OKCL objective, then reports empirical outperformance on three public KG benchmarks. No derivation chain, theorem, or first-principles result is claimed that reduces by construction to its inputs. The listed patterns (self-definitional, fitted-input-as-prediction, self-citation load-bearing, etc.) do not apply: performance deltas are measured against external baselines on held-out test splits, with no equations or self-citations shown to force the outcome. This is standard supervised evaluation and remains self-contained against the benchmarks.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 2 invented entities

The performance claim depends on the effectiveness of two newly introduced modules whose internal parameters are learned from the benchmark data; the abstract supplies no independent evidence that these modules generalize beyond the fitted setting.

free parameters (2)

number of experts and routing mechanism in MoEKA
Chosen to dynamically route attribute features to relation-specific experts; value not stated in abstract.
contrastive margin and sampling strategy in OKCL
Used to construct ordinal-aware positive and negative samples; values not stated in abstract.

axioms (2)

[domain assumption] Numerical attribute values in KGs carry relation-aware semantics that can be aligned with symbolic triples in a shared space.
Invoked when stating that prior models fail at semantic integration.
[ad hoc to paper] Hard negatives and sampling imbalance are the main causes of ordinal indistinguishability.
Stated as the reason OKCL is needed.

invented entities (2)

MoEKA encoder [no independent evidence]
purpose: Jointly align symbolic and numeric components with dynamic expert routing.
New component introduced to solve challenge C1; no independent evidence supplied.
OKCL contrastive objective [no independent evidence]
purpose: Construct ordinal-aware positives and negatives to discriminate subtle value shifts.
New training procedure introduced to solve challenge C2; no independent evidence supplied.

