Counting Still Counts: Understanding Neural Complex Query Answering Through Query Relaxation

Daniel Daza; Michael Cochez; Yannick Brunink; Yunjie He

arXiv: 2511.22565 · v2 · submitted 2025-11-27 · 💻 cs.AI · cs.DB · cs.LG

Yannick Brunink, Daniel Daza, Yunjie He, Michael Cochez

Pith reviewed 2026-05-17 04:22 UTC · model grok-4.3

classification 💻 cs.AI · cs.DB · cs.LG

keywords complex query answering · knowledge graphs · neural networks · query relaxation · path counting · baselines · generalization

The pith

Neural complex query answering models perform on par with simple constraint relaxation and path counting, and none consistently outperforms that baseline.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper questions the widespread view that neural models for complex query answering over knowledge graphs learn patterns that go beyond explicit graph structure. It tests this by comparing neural models against a training-free query relaxation method that loosens query constraints and counts the resulting paths to find answers. Across datasets and query types, the two approaches often achieve comparable results, and no neural model consistently beats the relaxation strategy. The two methods retrieve largely different answers, and merging their outputs raises overall performance. This points to a need to reassess how much neural models actually advance reasoning on graphs.

Core claim

Neural CQA models are widely believed to infer answers unreachable by symbolic processing through learned generalization beyond explicit graph structure. Systematic comparison with a training-free query relaxation strategy that relaxes constraints and counts resulting paths shows similar performance across multiple datasets and query structures, with no neural model consistently outperforming the relaxation approach. The retrieved answers exhibit little overlap, yet combining outputs from both consistently improves results, indicating that current neural models do not subsume the reasoning patterns captured by query relaxation.

What carries the argument

A training-free query relaxation strategy that retrieves possible answers by relaxing query constraints and counting the resulting paths in the knowledge graph.
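
The idea of relaxing constraints and counting paths can be illustrated with a minimal sketch. It is not the paper's procedure: the toy graph, the entity and relation names, and the specific relaxation used here (ignoring relation labels on a two-hop path query) are all assumptions made for illustration. Each candidate answer is scored by the number of paths that reach it from the anchor entity.

```python
from collections import Counter, defaultdict

# Toy knowledge graph as (head, relation, tail) triples.
# All entities and relations are hypothetical; the edge
# ("globex", "located_in", "paris") is deliberately missing.
TRIPLES = [
    ("alice", "works_at", "acme"),
    ("alice", "advises", "acme"),
    ("alice", "works_at", "globex"),
    ("acme", "located_in", "berlin"),
    ("globex", "registered_in", "paris"),
]


def build_index(triples):
    """Map each head entity to its outgoing (relation, tail) edges."""
    index = defaultdict(list)
    for head, relation, tail in triples:
        index[head].append((relation, tail))
    return index


def count_paths(index, anchor, relations, relaxed=False):
    """Score entities by the number of paths from `anchor` along `relations`.

    With relaxed=True the relation labels are ignored, so every path of the
    same length counts. This is one possible relaxation, assumed here.
    """
    frontier = Counter({anchor: 1})  # entity -> number of paths reaching it
    for wanted in relations:
        next_frontier = Counter()
        for entity, n_paths in frontier.items():
            for relation, tail in index.get(entity, []):
                if relaxed or relation == wanted:
                    next_frontier[tail] += n_paths
        frontier = next_frontier
    return frontier


index = build_index(TRIPLES)
query = ["works_at", "located_in"]  # "where are Alice's employers located?"
strict = count_paths(index, "alice", query)
relaxed = count_paths(index, "alice", query, relaxed=True)

print(dict(strict))   # answers reachable by exact matching
print(dict(relaxed))  # answers after relaxation, scored by path count
```

In this toy graph, exact matching reaches only "berlin", while the relaxed query also surfaces "paris" with a lower path count, which is the kind of answer that strict symbolic processing would miss on an incomplete graph.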

If this is right

Neural CQA models fail to subsume the reasoning patterns captured by query relaxation.
Combining outputs from neural models and relaxation methods consistently improves performance.
Future neural approaches could benefit from incorporating principles of query relaxation.
Stronger non-neural baselines are required to properly evaluate progress in neural query answering.
Answers from neural and relaxation approaches show little overlap despite similar overall performance.
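
The overlap and combination points above can be made concrete with a small sketch. The source does not specify here how overlap is measured or how outputs are merged, so both choices below are assumptions: Jaccard similarity of the top-k answers as the overlap measure, and reciprocal rank fusion as a stand-in combination scheme. The two rankings are invented placeholders.

```python
def jaccard_at_k(ranking_a, ranking_b, k):
    """Jaccard similarity between the top-k entries of two rankings."""
    top_a, top_b = set(ranking_a[:k]), set(ranking_b[:k])
    union = top_a | top_b
    return len(top_a & top_b) / len(union) if union else 0.0


def reciprocal_rank_fusion(rankings, k=60):
    """Merge rankings by summing 1 / (k + rank) for every entity.

    k=60 is a commonly used default for this fusion scheme; it is an
    assumption here, not a value taken from the paper.
    """
    scores = {}
    for ranking in rankings:
        for rank, entity in enumerate(ranking, start=1):
            scores[entity] = scores.get(entity, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by name for determinism.
    return sorted(scores, key=lambda entity: (-scores[entity], entity))


# Hypothetical ranked answers from the two kinds of system.
neural = ["e1", "e2", "e3", "e4"]
relaxation = ["e5", "e2", "e6", "e1"]

overlap = jaccard_at_k(neural, relaxation, k=3)
fused = reciprocal_rank_fusion([neural, relaxation])

print(overlap)
print(fused)
```

Entities that both systems rank highly rise to the top of the fused list, while answers found by only one system are still retained, which is why fusion can help when the two answer sets overlap little.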

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Neural models may be capturing structural patterns similar to those used in path counting rather than abstract generalization.
Hybrid architectures that blend relaxation techniques with neural components could yield further gains.
Evaluation in CQA should routinely include sophisticated symbolic baselines to measure true advances.
Relaxation methods might serve as a way to augment training data or interpret what neural models have learned.

Load-bearing premise

The chosen query relaxation strategy and path-counting procedure fairly test whether neural models have learned generalization beyond explicit graph structure.

What would settle it

Finding a neural model that consistently and significantly outperforms the relaxation-based path-counting method across all tested datasets and query structures without relying on similar relaxation steps.

Original abstract

Neural methods for Complex Query Answering (CQA) over knowledge graphs (KGs) are widely believed to learn patterns that generalize beyond explicit graph structure, allowing them to infer answers that are unreachable through symbolic query processing. In this work, we critically examine this assumption through a systematic analysis comparing neural CQA models with an alternative, training-free query relaxation strategy that retrieves possible answers by relaxing query constraints and counting resulting paths. Across multiple datasets and query structures, we find several cases where neural and relaxation-based approaches perform similarly, with no neural model consistently outperforming the latter. Moreover, a similarity analysis reveals that their retrieved answers exhibit little overlap, and that combining their outputs consistently improves performance. These results call for a re-evaluation of progress in neural query answering: despite their complexity, current models fail to subsume the reasoning patterns captured by query relaxation. Our findings highlight the importance of stronger non-neural baselines and suggest that future neural approaches could benefit from incorporating principles of query relaxation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript claims that neural complex query answering (CQA) models over knowledge graphs do not consistently outperform a training-free query relaxation baseline based on constraint relaxation and path counting. Across multiple datasets and query structures, neural and relaxation approaches show similar performance with no neural model dominating, low overlap in retrieved answers, and consistent gains from ensembling, implying that current neural models have not subsumed the explicit reasoning captured by query relaxation.

Significance. If the empirical results hold, the work would be significant in calling for stronger non-neural baselines in CQA and highlighting potential for hybrid neural-relaxation methods. The direct comparisons, consistent patterns across query types, and low-overlap findings provide concrete evidence against the assumption of superior neural generalization beyond explicit graph structure.

major comments (1)

§4 (experimental results): performance similarities are reported without error bars, standard deviations, or statistical significance tests across runs, which weakens the load-bearing claim that neural models do not consistently outperform the relaxation baseline and that ensembling yields reliable gains.

minor comments (2)

The description of the path-counting procedure in the relaxation baseline would benefit from an explicit pseudocode or algorithmic outline to facilitate reproducibility.
Figure captions for the overlap and ensembling analyses could include the exact number of queries and datasets aggregated to improve clarity.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback and the recommendation of minor revision. We address the concern regarding the statistical presentation of results in §4 below.

Referee: §4 (experimental results): performance similarities are reported without error bars, standard deviations, or statistical significance tests across runs, which weakens the load-bearing claim that neural models do not consistently outperform the relaxation baseline and that ensembling yields reliable gains.

Authors: We agree that the absence of error bars, standard deviations, and statistical significance tests limits the strength of our claims about performance similarities and ensembling gains. In the revised manuscript we will report results over multiple random seeds (where model training is involved), include standard deviations alongside mean performance metrics, add error bars to all relevant tables and figures, and conduct paired statistical tests (such as the Wilcoxon signed-rank test) to assess whether observed differences between neural models and the relaxation baseline, as well as the improvements from ensembling, are statistically significant. These additions will be incorporated into the updated §4 and associated figures.

revision: yes
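
The paired test mentioned in the response can be sketched as follows. This is a pure-Python, exact two-sided Wilcoxon signed-rank test, suitable only for small samples because it enumerates all sign assignments; a library routine such as scipy.stats.wilcoxon would be the usual choice in practice. The scores are invented placeholders, not results from the paper, and the function names are hypothetical.

```python
from itertools import product


def signed_ranks(differences):
    """Rank non-zero differences by |d| (average ranks for ties), keeping signs."""
    ordered = sorted((d for d in differences if d != 0), key=abs)
    ranks = []
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of the ranks i+1 .. j
        ranks.extend((average_rank, d > 0) for d in ordered[i:j])
        i = j
    return ranks


def wilcoxon_exact(x, y):
    """Return (W+, two-sided exact p-value) for paired samples x and y."""
    ranks = signed_ranks([a - b for a, b in zip(x, y)])
    total = sum(rank for rank, _ in ranks)
    w_plus = sum(rank for rank, positive in ranks if positive)
    observed = abs(w_plus - total / 2)
    # Under the null hypothesis every rank is positive with probability 1/2,
    # so all 2^n sign assignments are equally likely.
    extreme = 0
    for signs in product((0, 1), repeat=len(ranks)):
        w = sum(rank for (rank, _), keep in zip(ranks, signs) if keep)
        if abs(w - total / 2) >= observed - 1e-12:
            extreme += 1
    return w_plus, extreme / 2 ** len(ranks)


# Hypothetical per-query-type scores (in percent) for two systems.
neural_scores = [41, 36, 33, 44, 37, 30]
relaxation_scores = [40, 34, 30, 48, 42, 36]

w_plus, p_value = wilcoxon_exact(neural_scores, relaxation_scores)
print(w_plus, p_value)
```

With these placeholder numbers the p-value is far above common significance thresholds, so the test would give no grounds to claim that either system is better; with real per-seed or per-query-type results the same procedure applies.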

Circularity Check

0 steps flagged

No significant circularity

The paper advances an empirical claim by directly measuring performance of neural CQA models against a training-free relaxation baseline (constraint relaxation plus path counting) on standard datasets and query structures. Reported similarities in accuracy, low answer overlap, and ensembling gains are obtained from explicit experimental runs rather than any derivation, fitted parameter renamed as prediction, or self-citation that reduces the central result to its own inputs. No equations, uniqueness theorems, or ansatzes are invoked whose validity depends on the present work; the evaluation protocol is externally falsifiable on public benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests entirely on experimental comparisons of existing neural models against a query-relaxation baseline; no new free parameters, axioms, or invented entities are introduced.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Paper passage: score(vt) ← number of matching paths

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.