Collaboration of Fusion and Independence: Hypercomplex-driven Robust Multi-Modal Knowledge Graph Completion

Lei Liang; Mengshu Sun; Wen Zhang; Yichi Zhang; Zhiqiang Liu

arxiv: 2509.23714 · v2 · submitted 2025-09-28 · 💻 cs.CL

Zhiqiang Liu, Yichi Zhang, Mengshu Sun, Lei Liang, Wen Zhang

Pith reviewed 2026-05-18 12:23 UTC · model grok-4.3

classification 💻 cs.CL

keywords multi-modal knowledge graph completion · biquaternion · hypercomplex algebra · modality fusion · Hamilton product · knowledge graph embedding · cross-modal interaction · entity representation

The pith

M-Hyper maps three independent modalities and one fused modality to biquaternion bases for collaborative cross-modal interaction.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes M-Hyper to complete missing facts in multi-modal knowledge graphs by balancing fusion and independence. Existing fusion methods lose modality-specific details through fixed strategies, while ensemble methods miss context-dependent interactions between modalities. M-Hyper uses a Fine-grained Entity Representation Factorization module for three independent representations and a Robust Relation-aware Modality Fusion module for one fused representation. These four representations are assigned to the four orthogonal bases of a biquaternion so the Hamilton product can model pairwise interactions. A sympathetic reader would care if this produces more accurate link predictions in graphs that combine structural, textual, and visual data while remaining efficient.

Core claim

We propose a novel MMKGC method M-Hyper, which achieves the coexistence and collaboration of fused and independent modality representations. The resulting four modality representations are then mapped to the four orthogonal bases of a biquaternion for comprehensive modality interaction.

What carries the argument

Biquaternion with four orthogonal bases, each holding one modality representation (three independent plus one fused), where the Hamilton product computes interactions among them.
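
To make the mechanism concrete, here is a minimal sketch of the standard Hamilton product applied to biquaternions, that is, quaternions whose four coefficients are complex (eight real components in total). The function name, the toy values, and the idea of reading the four slots as three independent modalities plus one fused modality are illustrative assumptions; the paper's exact "extended" product and slot assignment are not given on this page.

```python
import numpy as np

def hamilton_product(p, q):
    """Hamilton product of two (bi)quaternions.

    Each operand is a length-4 array [a, b, c, d] standing for
    a + b*i + c*j + d*k. Complex coefficients make it a biquaternion.
    """
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return np.array([
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,  # real part
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,  # i part
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,  # j part
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,  # k part
    ])

# The four basis elements as coefficient vectors.
ONE = np.array([1, 0, 0, 0], dtype=complex)
I = np.array([0, 1, 0, 0], dtype=complex)
J = np.array([0, 0, 1, 0], dtype=complex)
K = np.array([0, 0, 0, 1], dtype=complex)

# Non-commutativity: i*j = k but j*i = -k.
ij = hamilton_product(I, J)
ji = hamilton_product(J, I)

# Toy values only (assumption): one complex coefficient per slot, read here
# as three independent modality slots plus one fused slot.
entity = np.array([0.2 + 0.1j, -0.4 + 0.3j, 0.7 - 0.5j, 0.1 + 0.6j])
relation = np.array([0.9 - 0.2j, 0.3 + 0.4j, -0.1 + 0.8j, 0.5 + 0.0j])
interaction = hamilton_product(entity, relation)
```

Every output slot of `interaction` mixes all four slots of both operands, which is why the product can model pairwise interactions and also why the referee report below asks how modality-specific information survives it.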

If this is right

The method reaches state-of-the-art performance on multi-modal knowledge graph completion benchmarks.
It remains robust when modality relevance changes across different contexts.
It preserves modality-specific information that fixed fusion strategies discard.
It achieves the above with lower computational cost than full ensemble approaches.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same biquaternion construction could be tested on other multi-modal tasks such as visual question answering where independence and fusion must coexist.
Extending the algebra to octonions, which have eight basis elements, might allow up to eight modalities without redesigning the interaction mechanism, although octonion multiplication is not associative.
The approach suggests that hypercomplex numbers can serve as a parameter-light way to enforce both separation and mixing in representation learning.

Load-bearing premise

The assumption that mapping three independent modalities and one fused modality onto the four orthogonal bases of a biquaternion, combined with the Hamilton product, will enable effective cross-modal interactions while preserving modality-specific information without significant loss or added complexity.

What would settle it

An ablation study on standard MMKG benchmarks in which removing the biquaternion mapping and Hamilton product produces no statistically significant drop in Hits@10 or MRR would falsify the claim.
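
As a rough sketch of how such a check could be run, the snippet below computes MRR and Hits@10 from per-query ranks and compares two model variants with a paired sign-flip permutation test on reciprocal ranks. The function names, the choice of test, and the toy ranks are assumptions made for illustration; they are not the paper's evaluation protocol or results.

```python
import numpy as np

def mrr(ranks):
    """Mean reciprocal rank; ranks are 1-based positions of the true entity."""
    ranks = np.asarray(ranks, dtype=float)
    return float(np.mean(1.0 / ranks))

def hits_at_k(ranks, k=10):
    """Fraction of queries whose true entity is ranked within the top k."""
    ranks = np.asarray(ranks)
    return float(np.mean(ranks <= k))

def paired_permutation_pvalue(ranks_a, ranks_b, n_resamples=10000, seed=0):
    """Two-sided sign-flip permutation test on per-query reciprocal ranks."""
    rng = np.random.default_rng(seed)
    diff = 1.0 / np.asarray(ranks_a, dtype=float) - 1.0 / np.asarray(ranks_b, dtype=float)
    observed = abs(diff.mean())
    # Randomly flip the sign of each paired difference under the null hypothesis.
    signs = rng.choice([-1.0, 1.0], size=(n_resamples, diff.size))
    resampled = np.abs((signs * diff).mean(axis=1))
    return float((np.sum(resampled >= observed) + 1) / (n_resamples + 1))

# Toy ranks for illustration only; these are NOT results from the paper.
toy_ranks_full = [1, 2, 4, 20]
toy_ranks_ablated = [2, 2, 5, 25]

full_mrr = mrr(toy_ranks_full)
full_hits10 = hits_at_k(toy_ranks_full, k=10)
p_value = paired_permutation_pvalue(toy_ranks_full, toy_ranks_ablated)
```

A large p-value on real benchmark ranks would mean the ablated variant is statistically indistinguishable from the full model, which is the outcome that would undercut the claim.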

Original abstract

Multi-modal knowledge graph completion (MMKGC) aims to discover missing facts in multi-modal knowledge graphs (MMKGs) by leveraging both structural relationships and diverse modality information of entities. Existing MMKGC methods follow two multi-modal paradigms: fusion-based and ensemble-based. Fusion-based methods employ fixed fusion strategies, which inevitably leads to the loss of modality-specific information and a lack of flexibility to adapt to varying modality relevance across contexts. In contrast, ensemble-based methods retain modality independence through dedicated sub-models but struggle to capture the nuanced, context-dependent semantic interplay between modalities. To overcome these dual limitations, we propose a novel MMKGC method M-Hyper, which achieves the coexistence and collaboration of fused and independent modality representations. Our method integrates the strengths of both paradigms, enabling effective cross-modal interactions while maintaining modality-specific information. Inspired by "quaternion" algebra, we utilize its four orthogonal bases to represent multiple independent modalities and employ the Hamilton product to efficiently model pair-wise interactions among them. Specifically, we introduce a Fine-grained Entity Representation Factorization (FERF) module and a Robust Relation-aware Modality Fusion (R2MF) module to obtain robust representations for three independent modalities and one fused modality. The resulting four modality representations are then mapped to the four orthogonal bases of a biquaternion (a hypercomplex extension of quaternion) for comprehensive modality interaction. Extensive experiments indicate its state-of-the-art performance, robustness, and computational efficiency.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes M-Hyper, a multi-modal knowledge graph completion (MMKGC) method that seeks to combine the strengths of fusion-based and ensemble-based paradigms. It introduces a Fine-grained Entity Representation Factorization (FERF) module and a Robust Relation-aware Modality Fusion (R2MF) module to produce representations for three independent modalities plus one fused modality; these are mapped onto the four orthogonal bases of a biquaternion, with interactions modeled via the (extended) Hamilton product. The central claim is that this algebraic construction simultaneously enables effective cross-modal collaboration and preserves modality-specific information, yielding state-of-the-art performance, robustness, and computational efficiency on MMKG completion tasks.

Significance. If the biquaternion construction demonstrably achieves the claimed coexistence of fused and independent representations without substantial information loss, the work would offer a principled algebraic bridge between the two dominant MMKGC paradigms and could inspire similar hypercomplex approaches in other multi-modal settings. The explicit use of four orthogonal bases and the Hamilton product is a concrete technical choice that distinguishes the method from generic fusion or ensemble baselines. However, the significance hinges on whether the empirical results include rigorous ablations, statistical significance tests, and controls that isolate the contribution of the hypercomplex interaction.

major comments (2)

[Abstract and §3] Abstract and §3 (Method): The load-bearing claim that mapping three independent modality representations plus one fused representation onto the four biquaternion bases, followed by Hamilton-product interactions, simultaneously enables 'comprehensive modality interaction' while 'maintaining modality-specific information' is not supported by any invariance argument, reconstruction loss, or post-interaction separability metric. The Hamilton product algebraically couples all eight real components through its non-commutative rules; without an explicit mechanism (e.g., an orthogonality regularizer or reconstruction term) that enforces effective isolation after the product, the independence guarantee reduces to an unverified assumption.
[§4] §4 (Experiments): The abstract asserts 'state-of-the-art performance, robustness, and computational efficiency,' yet the provided description contains no quantitative metrics, error bars, dataset statistics, ablation tables, or baseline comparisons. Without these results, it is impossible to assess whether the algebraic construction actually delivers the claimed gains or whether performance improvements are attributable to the biquaternion module versus the FERF/R2MF components alone.

minor comments (2)

[§3] Notation for the biquaternion bases and the precise definition of the extended Hamilton product should be stated explicitly (ideally with a small example) to allow readers to verify the claimed orthogonality preservation.
[§3] The manuscript should clarify the dimensionality of the final entity embeddings after the biquaternion interaction and how they are projected back for the link-prediction scoring function.
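
For reference, the textbook Hamilton product that the first minor comment asks to see spelled out can be written as follows. This is the standard quaternion definition with coefficients allowed to be complex (the biquaternion case); the symbols $a_n, b_n, c_n, d_n$ and the operator $\otimes$ are notation chosen here, and the paper's own "extended" product may differ in details not visible on this page.

```latex
% Basis rules: i^2 = j^2 = k^2 = ijk = -1.
% For a biquaternion the coefficients a_n, b_n, c_n, d_n are complex numbers.
\[
\begin{aligned}
q_1 &= a_1 + b_1\,\mathbf{i} + c_1\,\mathbf{j} + d_1\,\mathbf{k}, \qquad
q_2 = a_2 + b_2\,\mathbf{i} + c_2\,\mathbf{j} + d_2\,\mathbf{k}, \\
q_1 \otimes q_2 &= (a_1 a_2 - b_1 b_2 - c_1 c_2 - d_1 d_2) \\
&\quad + (a_1 b_2 + b_1 a_2 + c_1 d_2 - d_1 c_2)\,\mathbf{i} \\
&\quad + (a_1 c_2 - b_1 d_2 + c_1 a_2 + d_1 b_2)\,\mathbf{j} \\
&\quad + (a_1 d_2 + b_1 c_2 - c_1 b_2 + d_1 a_2)\,\mathbf{k}.
\end{aligned}
\]
```

Each output coefficient draws on all four coefficients of both operands, which is the coupling that the first major comment points to.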

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the presentation of our algebraic approach to MMKGC. We respond point by point below.

Referee: [Abstract and §3] Abstract and §3 (Method): The load-bearing claim that mapping three independent modality representations plus one fused representation onto the four biquaternion bases, followed by Hamilton-product interactions, simultaneously enables 'comprehensive modality interaction' while 'maintaining modality-specific information' is not supported by any invariance argument, reconstruction loss, or post-interaction separability metric. The Hamilton product algebraically couples all eight real components through its non-commutative rules; without an explicit mechanism (e.g., an orthogonality regularizer or reconstruction term) that enforces effective isolation after the product, the independence guarantee reduces to an unverified assumption.

Authors: We acknowledge the absence of an explicit invariance proof or reconstruction term in the current draft. The method assigns each of the three independent modality representations and the fused representation to one of the four orthogonal bases of the biquaternion; the subsequent Hamilton product then models pairwise interactions across these bases. Because the bases remain distinct linear dimensions, the structure is intended to permit collaboration without complete loss of modality identity. To make this claim verifiable, we will add a post-interaction separability metric (e.g., cosine similarity between modality-specific components before and after the product) and report an ablation that removes the hypercomplex interaction while retaining FERF and R2MF.
revision: yes
Referee: [§4] §4 (Experiments): The abstract asserts 'state-of-the-art performance, robustness, and computational efficiency,' yet the provided description contains no quantitative metrics, error bars, dataset statistics, ablation tables, or baseline comparisons. Without these results, it is impossible to assess whether the algebraic construction actually delivers the claimed gains or whether performance improvements are attributable to the biquaternion module versus the FERF/R2MF components alone.

Authors: Section 4 of the full manuscript already contains the quantitative results, including performance tables on standard MMKG benchmarks, ablation studies that isolate the biquaternion module from FERF and R2MF, runtime comparisons, and statistical significance tests. We will expand the section with additional error-bar plots and a dedicated table that directly contrasts the full M-Hyper model against variants that replace the hypercomplex interaction with simple concatenation or averaging, thereby clarifying the incremental contribution of the algebraic construction.
revision: yes
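
The separability metric proposed in the first response could look roughly like the sketch below, which compares each component before and after the interaction with cosine similarity. The function names, the component labels, and the noise used as a stand-in for the interaction are hypothetical; the simulated rebuttal does not specify how the metric would be computed.

```python
import numpy as np

def cosine_similarity(u, v, eps=1e-12):
    """Cosine similarity for real or complex vectors.

    Uses the real part of the Hermitian inner product, so it reduces to the
    usual cosine similarity when both vectors are real.
    """
    num = np.real(np.vdot(u, v))
    den = np.linalg.norm(u) * np.linalg.norm(v) + eps
    return float(num / den)

def separability_report(before, after, names):
    """Per-component similarity between pre- and post-interaction vectors.

    `before` and `after` have shape (num_components, dim). Values near 1
    suggest a component kept its identity; values near 0 suggest it was
    overwritten by the interaction.
    """
    return {
        name: cosine_similarity(b, a)
        for name, b, a in zip(names, before, after)
    }

# Hypothetical component labels and toy vectors (assumptions, not paper data).
names = ["structural", "visual", "textual", "fused"]
rng = np.random.default_rng(0)
before = rng.normal(size=(4, 16))
after = before + 0.1 * rng.normal(size=(4, 16))  # stand-in for the real interaction
report = separability_report(before, after, names)
```

In a real check, `after` would hold the four components read back from the biquaternion once the Hamilton product has been applied.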

Circularity Check

0 steps flagged

No significant circularity detected

The paper proposes M-Hyper as a novel architecture that factorizes entity representations via FERF, fuses modalities via R2MF, and maps the resulting four representations onto biquaternion bases with Hamilton products. This construction is presented as an algebraic design choice inspired by quaternion properties rather than a quantity derived from or fitted to the target performance metric. No equation or module output is shown to be equivalent to its input by definition, and the central claims rest on the explicit module definitions plus external experimental validation rather than self-referential loops or load-bearing self-citations that presuppose the result.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that biquaternion algebra provides an effective and efficient way to model modality interactions; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (1)

domain assumption: The four orthogonal bases of a biquaternion can represent three independent modalities plus one fused modality and support pair-wise interactions via the Hamilton product.
Directly invoked in the method description to achieve coexistence of fusion and independence.
