arXiv: 2605.04193 · v1 · submitted 2026-05-05 · 💻 cs.AI · cs.LG · cs.LO

ANDRE: An Attention-based Neuro-symbolic Differentiable Rule Extractor

Pith reviewed 2026-05-08 17:33 UTC · model grok-4.3

classification 💻 cs.AI · cs.LG · cs.LO
keywords inductive logic programming · neuro-symbolic methods · differentiable rule learning · attention mechanisms · first-order logic · probabilistic predicates · rule extraction under noise

The pith

ANDRE learns first-order logic rules from noisy probabilistic data by optimizing a continuous space with attention-driven conjunction and disjunction operators.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a neuro-symbolic framework that replaces fixed rule templates and fuzzy logical operators with fully differentiable attention mechanisms. These mechanisms softly select, negate, or combine predicates while approximating the min-max behavior of classical logic, allowing the system to induce interpretable programs even when predicate valuations are probabilistic and labels contain moderate noise. A sympathetic reader would care because the approach aims to combine the flexibility of gradient-based training with the ability to recover stable symbolic rules that prior differentiable ILP methods lose under uncertainty.

Core claim

ANDRE optimizes first-order logic programs directly over a continuous rule space. It replaces both rule templates and logical operators with attention-based conjunction and disjunction that approximate min-max semantics over probabilistic predicate valuations. The attention operators softly select, negate, or exclude predicates within each rule, preserving symbolic structure while remaining fully differentiable. Experiments on classical ILP benchmarks, large knowledge bases, and synthetic data with noise show competitive predictive accuracy and substantially more reliable recovery of the original rules compared with earlier differentiable ILP approaches.

What carries the argument

attention-based conjunction and disjunction operators that approximate min-max logical semantics while softly selecting or excluding predicates
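
As a rough illustration of how an attention-weighted average can lean toward min or max, here is a minimal sketch. It assumes one common construction (softmax weights over negated or raw valuations, with a temperature); the names `soft_and`, `soft_or`, and `temperature` are hypothetical, and this is not claimed to be the exact operator defined in the paper.

```python
import math


def _softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_and(values, temperature=0.1):
    """Weighted average whose weights favor small values (leans toward min)."""
    weights = _softmax([-v / temperature for v in values])
    return sum(w * v for w, v in zip(weights, values))


def soft_or(values, temperature=0.1):
    """Weighted average whose weights favor large values (leans toward max)."""
    weights = _softmax([v / temperature for v in values])
    return sum(w * v for w, v in zip(weights, values))


# Illustrative probabilistic predicate valuations (made up for this sketch).
vals = [0.9, 0.2, 0.7]
print("soft_and:", soft_and(vals), "hard min:", min(vals))
print("soft_or: ", soft_or(vals), "hard max:", max(vals))
```

Lowering `temperature` moves both outputs closer to the hard min and max, at the cost of sharper (and potentially smaller) gradients for the non-selected inputs.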

If this is right

  • Rule induction no longer requires hand-crafted templates because attention performs soft selection and exclusion inside each clause.
  • The same continuous optimization produces both accurate predictions and stable symbolic rules even when input predicates carry probability values.
  • Moderate label noise does not destroy rule recovery, unlike earlier differentiable ILP methods that become unstable under the same conditions.
  • The framework supports negation and exclusion of predicates without breaking differentiability or interpretability.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The attention mechanism could be reused as a drop-in replacement for fuzzy operators inside other neuro-symbolic pipelines that reason over uncertain facts.
  • Because the rule space is continuous, the method might be extended to jointly optimize rules and the underlying predicate embeddings in an end-to-end fashion.
  • If attention can stably approximate min-max logic, similar operators might be defined for other logical connectives or for higher-arity relations.

Load-bearing premise

Attention operators can approximate min-max logic over probabilistic predicates without vanishing gradients or collapsing the recovered symbolic structure during continuous optimization.

What would settle it

A controlled test in which ANDRE is trained on data with 20 percent random label flips and the extracted rules are then checked for exact match against the ground-truth rules; if the match rate falls below that of a template-based differentiable baseline, the central claim is falsified.
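
The test described above could be scaffolded roughly as follows. This is a sketch under stated assumptions: labels are binary, a rule is represented as a head string plus a list of body-literal strings, and the helper names (`flip_labels`, `exact_match_rate`) are hypothetical. The training step itself is not shown, and the "extracted" rules below are placeholders, not results.

```python
import random


def flip_labels(labels, flip_rate, seed=0):
    """Flip a fixed fraction of binary labels, chosen uniformly at random."""
    rng = random.Random(seed)
    n_flip = round(flip_rate * len(labels))
    flip_idx = set(rng.sample(range(len(labels)), n_flip))
    return [1 - y if i in flip_idx else y for i, y in enumerate(labels)]


def _normalize(rule):
    """Make rule comparison insensitive to the order of body literals."""
    head, body = rule
    return (head, frozenset(body))


def exact_match_rate(extracted, ground_truth):
    """Fraction of ground-truth rules that appear exactly among the extracted rules."""
    truth = {_normalize(r) for r in ground_truth}
    found = {_normalize(r) for r in extracted}
    return len(truth & found) / len(truth)


labels = [0, 1] * 50
noisy = flip_labels(labels, 0.20)  # 20 percent random label flips

head = "grandparent(X1, X3)"
ground_truth = [
    (head, ["father(X1, X2)", "mother(X2, X3)"]),
    (head, ["father(X1, X2)", "father(X2, X3)"]),
    (head, ["mother(X1, X2)", "mother(X2, X3)"]),
    (head, ["father(X2, X3)", "mother(X1, X2)"]),
]
# Placeholder for rules extracted after training on `noisy` (not real output).
extracted = [
    (head, ["mother(X2, X3)", "father(X1, X2)"]),
    (head, ["father(X1, X2)", "father(X2, X3)"]),
    (head, ["mother(X1, X2)", "mother(X2, X3)"]),
]
print(exact_match_rate(extracted, ground_truth))
```

Running the same scaffold for ANDRE and for a template-based baseline, on identical noisy labels and seeds, would give the two match rates the test compares.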

Figures

Figures reproduced from arXiv: 2605.04193 by Iman Sharifi, Peng Wei, Saber Fallah.

Figure 1: Graphical representation of a logical subrule structure within the context of the rule space. Following this strategy, we reformulate the ILP problem by converting Eqs. 2 and 3 into a matrix representation. Let the matrix $B_{n \times m}$ represent $m$ body predicates across $n$ subrules. As discussed, each array $B_{ij}$ is: $B_{ij} = \{\, b \mid b \in \{b_j, \neg b_j, 1\} \,\}$ (4). Let $S_j = \{b_j, \neg b_j, 1\}$; then $B_{ij} \in B$ refers to the target sy…

Figure 2: Overview of ANDRE's architecture. ANDRE includes an innovative rule space with an Attention-based Conjunction-Disjunction Network. … is to efficiently explore this parameterized space and discover a set of subrules that accurately explain the positive and negative examples provided in the training set. 3.2 LOGICAL NETWORK OF ANDRE: Having constructed the continuous rule space with trainable probabilities, we …

Figure 3: Graphical representations of attention-based conjunction and disjunction operators. Soft…

Figure 4: Final softmax-normalized subpredicate probabilities for the Grandparent task. Each sub…

Figure 5: Comparison of logical operators. The proposed attention-based operators closely approx…

Figure 6: Optimization behavior of ANDRE on the Grandparent task. Left: evolution of training and…

Figure 7: ANDRE performance metrics during training.

Figure 8: Final subpredicate probabilities for each…
Original abstract

Inductive Logic Programming (ILP) aims to learn interpretable first-order rules from data, but existing symbolic and neuro-symbolic approaches struggle to scale to noisy and probabilistic settings. Classical ILP relies on discrete combinatorial rule search and is brittle under uncertainty, while differentiable ILP methods typically depend on predefined rule templates or inaccurate fuzzy operators that suffer from vanishing gradients or poor approximation of logical structure when reasoning over probabilistic predicate valuations. This paper proposes an Attention-based Neuro-symbolic Differentiable Rule Extractor (ANDRE), a novel ILP framework that learns first-order logic programs by optimizing over a continuous rule space with attention-based logical operators. ANDRE replaces both rule templates and logical operators with fully differentiable, attention-driven conjunction and disjunction operators that approximate logical min-max semantics, enabling accurate, stable, and interpretable reasoning over probabilistic data. By softly selecting, negating, or excluding predicates within each rule, ANDRE supports flexible rule induction while preserving symbolic structure. Extensive experiments on classical ILP benchmarks, large-scale knowledge bases, and synthetic datasets with probabilistic predicates and noisy supervision demonstrate that ANDRE achieves competitive or superior predictive performance while reliably recovering correct symbolic rules under uncertainty. In particular, ANDRE remains robust to moderate label noise, substantially outperforming existing differentiable ILP methods in both rule extraction quality and stability.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes ANDRE, a neuro-symbolic ILP framework that replaces rule templates and logical operators with fully differentiable attention-based conjunction and disjunction mechanisms to learn first-order rules over probabilistic predicate valuations and noisy supervision; it claims competitive or superior predictive performance on benchmarks and knowledge bases while reliably recovering correct symbolic rules, with particular robustness to moderate label noise over prior differentiable ILP methods.

Significance. If the core approximation and recovery claims hold, ANDRE would offer a template-free, scalable route to interpretable rule induction in uncertain domains, addressing key scalability and brittleness issues in classical and differentiable ILP.

major comments (3)
  1. [Methods section describing attention-based logical operators] The central technical claim (abstract and methods) is that attention-driven operators 'approximate logical min-max semantics' and enable 'accurate, stable' reasoning; however, standard query-key softmax attention computes weighted averages rather than min/max, and no error bounds, limit-case analysis, or gradient-flow proof is provided showing non-vanishing behavior as predicate valuations approach 0/1. This directly threatens the asserted robustness to label noise and superiority in rule-extraction quality.
  2. [Experiments and rule-extraction evaluation] Performance and rule-recovery claims rest on optimizing attention weights over a continuous space whose logical interpretation is itself defined by those same weights (abstract and experiments); without an explicit separation (e.g., post-hoc discretization analysis or independent symbolic verification step), reported rule quality may largely reflect neural fitting rather than genuine logical structure.
  3. [Experiments section] The abstract asserts 'extensive experiments... with superior performance' and 'substantially outperforming existing differentiable ILP methods,' yet the provided description supplies no concrete metrics, baseline tables, ablation results on noise levels, or error analysis; these details are load-bearing for the central empirical claim.
minor comments (2)
  1. [Methods] Clarify the exact parameterization of the attention operators (query/key dimensions, scaling factors) and how negation/exclusion is realized differentiably.
  2. [Experiments] Add explicit comparison tables with numerical results against the cited differentiable ILP baselines.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive comments on our paper. We address each of the major comments below and indicate the revisions made to the manuscript.

Point-by-point responses
  1. Referee: [Methods section describing attention-based logical operators] The central technical claim (abstract and methods) is that attention-driven operators 'approximate logical min-max semantics' and enable 'accurate, stable' reasoning; however, standard query-key softmax attention computes weighted averages rather than min/max, and no error bounds, limit-case analysis, or gradient-flow proof is provided showing non-vanishing behavior as predicate valuations approach 0/1. This directly threatens the asserted robustness to label noise and superiority in rule-extraction quality.

    Authors: We appreciate the referee pointing out the need for stronger theoretical support for our approximation claim. Although our attention-based operators are designed to approximate min-max semantics by using attention scores to select and combine predicates in a soft manner that mimics logical conjunction and disjunction (with negation handled separately), we acknowledge that explicit error bounds and a full gradient-flow analysis were missing. In the revised manuscript, we have added a detailed analysis in the Methods section, including limit-case behavior as the temperature parameter approaches zero (approaching hard min/max) and empirical verification of gradient magnitudes to demonstrate stability. This addition directly addresses the concern regarding robustness to noise. revision: yes
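
The limit-case behavior mentioned in this response can be illustrated on one common construction, a softmin-weighted average with temperature $\tau$. This construction is an assumption made here for illustration; the summary does not state the paper's exact operator, and the symbol $\operatorname{and}_\tau$ is hypothetical.

```latex
% Assumed construction (not taken from the paper): softmin-weighted average.
\[
\operatorname{and}_\tau(x_1,\dots,x_n) = \sum_{i=1}^{n} w_i\, x_i,
\qquad
w_i = \frac{\exp(-x_i/\tau)}{\sum_{k=1}^{n} \exp(-x_k/\tau)},
\qquad \tau > 0 .
\]
% For every tau > 0 the output lies between the hard minimum and the plain mean,
% because the weights are non-increasing in x_i.
\[
\min_i x_i \;\le\; \operatorname{and}_\tau(x_1,\dots,x_n) \;\le\; \frac{1}{n}\sum_{i=1}^{n} x_i .
\]
% Limit cases: hard min as tau -> 0+, arithmetic mean as tau -> infinity.
\[
\lim_{\tau \to 0^{+}} \operatorname{and}_\tau(x_1,\dots,x_n) = \min_i x_i,
\qquad
\lim_{\tau \to \infty} \operatorname{and}_\tau(x_1,\dots,x_n) = \frac{1}{n}\sum_{i=1}^{n} x_i .
\]
```

The disjunction case is symmetric: flipping the sign inside the exponent gives a weighted average bounded between the mean and the maximum, converging to $\max_i x_i$ as $\tau \to 0^{+}$. This shows the referee's point and the authors' point are compatible: a weighted average is not min or max at any fixed $\tau > 0$, but it can be made arbitrarily close.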

  2. Referee: [Experiments and rule-extraction evaluation] Performance and rule-recovery claims rest on optimizing attention weights over a continuous space whose logical interpretation is itself defined by those same weights (abstract and experiments); without an explicit separation (e.g., post-hoc discretization analysis or independent symbolic verification step), reported rule quality may largely reflect neural fitting rather than genuine logical structure.

    Authors: This comment raises a crucial point about ensuring that the reported rule quality reflects true logical structure rather than just the neural component. In our approach, the attention weights provide a direct mapping to rule predicates, allowing for rule extraction by selecting the highest-weighted predicates. To provide the requested separation, we have incorporated a post-hoc discretization step in the revised Experiments section, where we threshold the attention weights to obtain discrete rules and then evaluate these symbolic rules using a standard logic engine on test data. We also include comparisons showing that the extracted rules perform comparably to the continuous model, supporting that the logical structure is indeed captured. revision: yes
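
A post-hoc discretization step of the kind described here might look like the sketch below. It assumes each body-predicate slot carries three normalized probabilities, for including the predicate, including its negation, or excluding it (mirroring the set {b_j, ¬b_j, 1} in the Figure 1 caption), and it takes the argmax per slot. The function name `extract_rule` and the probability values are hypothetical, not the paper's implementation or results.

```python
def extract_rule(head, body_predicates, slot_probs):
    """Turn per-slot (positive, negated, excluded) probabilities into a symbolic rule.

    Each slot keeps the option with the highest probability:
    index 0 keeps the predicate, index 1 keeps its negation, index 2 drops it.
    """
    literals = []
    for predicate, probs in zip(body_predicates, slot_probs):
        choice = max(range(3), key=probs.__getitem__)
        if choice == 0:
            literals.append(predicate)
        elif choice == 1:
            literals.append(f"not({predicate})")
        # choice == 2: the predicate is excluded from this subrule
    return f"{head} :- " + " and ".join(literals) + "."


# Illustrative values only (made up for this sketch).
body = ["father(X1, X2)", "mother(X2, X3)", "father(X1, X3)"]
probs = [
    (0.95, 0.02, 0.03),  # keep father(X1, X2)
    (0.90, 0.04, 0.06),  # keep mother(X2, X3)
    (0.05, 0.05, 0.90),  # exclude father(X1, X3)
]
rule = extract_rule("grandparent(X1, X3)", body, probs)
print(rule)
```

The resulting string can then be handed to an independent logic engine, which is what separates the symbolic evaluation from the continuous model that produced the probabilities.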

  3. Referee: [Experiments section] The abstract asserts 'extensive experiments... with superior performance' and 'substantially outperforming existing differentiable ILP methods,' yet the provided description supplies no concrete metrics, baseline tables, ablation results on noise levels, or error analysis; these details are load-bearing for the central empirical claim.

    Authors: We regret that the experimental details may not have been sufficiently prominent or detailed in the initial version. The manuscript does contain results on multiple benchmarks, but to strengthen the presentation, we have expanded the Experiments section with additional tables showing quantitative metrics (e.g., accuracy, rule recovery precision/recall), direct comparisons to baselines such as dILP and others, ablations across different noise levels (0%, 10%, 20%, 30%), and error bars from multiple runs. These revisions ensure the empirical claims are fully supported with concrete evidence. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected in ANDRE derivation

Full rationale

The paper proposes a new neuro-symbolic ILP method by defining attention-based operators to approximate min-max logic, then validates the approach empirically on benchmarks, knowledge bases, and noisy synthetic data. No load-bearing step reduces by construction to its own inputs: the continuous rule space and attention mechanisms are explicitly introduced as a modeling choice, not derived from the target performance or rule-recovery metrics. Claims of robustness and superiority are presented as experimental outcomes rather than tautological consequences of the operator definitions. No self-citation chains, fitted parameters renamed as predictions, or uniqueness theorems imported from prior author work appear in the abstract or described structure. The derivation remains self-contained as a constructive proposal with independent empirical testing.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The framework rests on the unproven assumption that attention mechanisms can faithfully approximate logical min-max operations in a differentiable manner while preserving rule interpretability; no free parameters are explicitly listed but attention weights are implicitly fitted; the attention operators themselves are the primary invented entity.

free parameters (1)
  • attention weights and scaling factors
    Learnable parameters in the attention-based conjunction and disjunction operators that are optimized during training.
axioms (1)
  • domain assumption: Attention-driven operators can approximate logical min-max semantics over probabilistic valuations without vanishing gradients
    Invoked when claiming stable and accurate reasoning over probabilistic predicate valuations.
invented entities (1)
  • attention-based logical operators (no independent evidence)
    purpose: Replace fixed rule templates and inaccurate fuzzy operators with fully differentiable conjunction and disjunction
    Newly introduced mechanism that softly selects, negates, or excludes predicates while preserving symbolic structure.

pith-pipeline@v0.9.0 · 5538 in / 1402 out tokens · 51384 ms · 2026-05-08T17:33:11.416702+00:00 · methodology

Reference graph

Works this paper leans on

80 extracted references · 1 canonical work page

  1. [1]

    Appear in at least two distinct body predicates, forming a connection between them, or

  2. [2]

    Be completely absent from the rule (i.e., not appear in any body predicates). This condition ensures that auxiliary variables are semantically meaningful and contribute to the logical relationship expressed in the rule. Valid example (connected auxiliary): grandparent(X1, X3) :- parent(X1, X2), parent(X2, X3). Variable X2 does not appear in the head and t...

  3. [3]

    grandparent(X1, X3) :- father(X1, X2) and mother(X2, X3). Training Acc: 0.6825 | Eval Acc: 0.6900 Val Coverage: (N_b=53, N_r=53, N_r/N_b=1.00) Train Coverage: (N_b=219, N_r=219, N_r/N_b=1.00)

  4. [4]

    grandparent(X1, X3) :- father(X1, X2) and father(X2, X3). Training Acc: 0.6725 | Eval Acc: 0.6850 Val Coverage: (N_b=52, N_r=52, N_r/N_b=1.00) Train Coverage: (N_b=211, N_r=211, N_r/N_b=1.00)

  5. [5]

    grandparent(X1, X3) :- mother(X1, X2) and mother(X2, X3). Training Acc: 0.6850 | Eval Acc: 0.6500 Val Coverage: (N_b=49, N_r=47, N_r/N_b=0.9592) Train Coverage: (N_b=225, N_r=223, N_r/N_b=0.9911)

  6. [6]

    grandparent(X1, X3) :- father(X2, X3) and mother(X1, X2). Training Acc: 0.6538 | Eval Acc: 0.65 Val Coverage: (N_b=45, N_r=45, N_r/N_b=1.00) Train Coverage: (N_b=196, N_r=196, N_r/N_b=1.00) These subrules collectively form a complete and interpretable definition of the grandparent relation, fully consistent with first-order logic and human intuition. Disc...

  7. [7]

    The Predecessor Task
    Objective: Learn a rule that identifies when one number is the predecessor of another, using background knowledge about numeric succession.
      • Variable set: X = {X1, X2}
      • Head variable set: X_h = X
      • Domain of constants: E = {0, 1, ..., 8}
      • Body predicates: b = {successor(X2, X1)}
      • Head predicate: h = predecessor(X1, X2)
      • Background knowledg...

  8. [8]

    The Odd Task
    Objective: Learn the logical patterns that define odd numbers using predecessor and parity relationships, and generalize them to unseen numerical values.
      • Variable set: X = {X1, X2, X3}
      • Head variable set: X_h = {X3}
      • Auxiliary variable set: X_a = X \ X_h
      • Domain of constants: E = {0, 1, ..., 30}
      • Body predicates: b = {zero(X1), zero(X2), zero(X...

  9. [9]

    The Even Task
    Objective: Discover the rule structure that governs even numbers, exploiting arithmetic successor relationships and parity predicates.
      • Variable set: X = {X1, X2, X3}
      • Head variable set: X_h = {X3}
      • Auxiliary variable set: X_a = X \ X_h
      • Domain of constants: E = {0, 1, ..., 30}
      • Body predicates: b = {zero(X1), zero(X2), zero(X3), successor(X1, X2),...

  10. [10]

    The Less Than Task
    Objective: Learn transitive and arithmetic rules that define the less-than relation between two integers using a successor-based formulation.
      • Variable set: X = {X1, X2, X3}
      • Head variable set: X_h = {X1, X3}
      • Auxiliary variable set: X_a = {X2}
      • Domain of constants: E = {0, 1, ..., 9}
      • Body predicates: b = {successor(X1, X2), successor(X2,...

  11. [11]

    The Grandparent Task
    Objective: Learn the definition of a grandparent based on transitive parent relationships using both mother and father facts provided in the background knowledge.
      • Variable set: X = {X1, X2, X3}
      • Head variable set: X_h = {X1, X3}
      • Auxiliary variable set: X_a = {X2}
      • Body predicates: b = {father(X1, X2), father(X2, X3), father(X1, X3), mothe...

  12. [12]

    The Son Task
    Objective: Learn how the son relationship can be derived from father, brother, and sister facts using transitivity and kinship inference.
      • Variable set: X = {X1, X2, X3}
      • Head variable set: X_h = {X1, X3}
      • Auxiliary variable set: X_a = {X2}
      • Body predicates: b = {father(X1, X2), father(X2, X3), father(X1, X3), brother(X1, X2), sister(X1, X2), son(X2,...

  13. [13]

    The Relatedness Task
    Objective: Determine whether two individuals are related, based on transitive closure over parent relationships and recursive definitions of related.
      • Variable set: X = {X1, X2, X3}
      • Head variable set: X_h = {X1, X3}
      • Auxiliary variable set: X_a = {X2}
      • Body predicates: b = {parent(X1, X2), parent(X2, X3), parent(X1, X3), related(X1, X2), r...

  14. [14]

    The Father Task
    Objective: Infer the father relationship using background assumptions involving marriage and motherhood.
      • Variable set: X = {X1, X2, X3}
      • Head variable set: X_h = {X1, X3}
      • Auxiliary variable set: X_a = {X2}
      • Body predicates: b = {mother(X1, X2), mother(X2, X3), mother(X1, X3), husband(X1, X2), husband(X2, X3), husband(X1, X3)}
      • Head predicate...

  15. [15]

    The Directed Edge Task
    Objective: Determine whether two nodes are connected by a directed edge in either direction, using basic edge facts.
      • Variable set: X = {X1, X2}
      • Head variable set: X_h = {X1, X2}
      • Body predicates: b = {edge(X1, X2), d-edge(X2, X1)}
      • Head predicate: h = d-edge(X1, X2)
      • Background knowledge: B = {edge(a, b), edge(b, d), edge(c, c), ...} 2...

  16. [16]

    The Connectedness Task
    Objective: Learn the connectedness relation, which holds true if there exists a direct or transitive path (via one or more edge relations) between two nodes.
      • Variable set: X = {X1, X2, X3}
      • Head variable set: X_h = {X1, X3}
      • Auxiliary variable set: X_a = {X2}
      • Body predicates: b = {edge(X1, X3), edge(X3, X1), edge(X2, X3), connectedness...

  17. [17]

    locatedIn(X1, X2) :- locatedIn(X3, X2) and neighborOf(X3, X1). Training Acc: 0.9927 | Eval Acc: 0.9930

  18. [18]

    locatedIn(X1, X2) :- locatedIn(X1, X3) and locatedIn(X3, X2). Training Acc: 0.9926 | Eval Acc: 0.9930

  19. [19]

    locatedIn(X1, X2) :- locatedIn(X3, X1) and neighborOf(X2, X3). Training Acc: 0.9925 | Eval Acc: 0.9928

  20. [20]

    locatedIn(X1, X2) :- locatedIn(X3, X1) and neighborOf(X3, X2). Training Acc: 0.9925 | Eval Acc: 0.9928

  21. [21]

    locatedIn(X1, X2) :- locatedIn(X1, X3) and neighborOf(X3, X2). Training Acc: 0.9925 | Eval Acc: 0.9928

  22. [22]

    locatedIn(X1, X2) :- neighborOf(X1, X3) and neighborOf(X2, X3). Training Acc: 0.9925 | Eval Acc: 0.9928

  23. [23]

    locatedIn(X1, X2) :- locatedIn(X2, X3) and neighborOf(X3, X1). Training Acc: 0.9925 | Eval Acc: 0.9928

    I.2 NATIONS DATASET
    The Nations task aims to learn semantic relations between geopolitical entities using background predicates describing economic, political, and geographic interactions. Below, we report representative rules extracted by ANDRE for diffe...

  24. [24]

    blockpositionindex(X1, X2) :- blockpositionindex(X2, X1). Training Acc: 0.9590 | Eval Acc: 0.9620 Val Coverage: (N_b=524, N_r=488, N_r/N_b=0.9313) Train Coverage: (N_b=9668, N_r=8920, N_r/N_b=0.9226)

  25. [25]

    blockpositionindex(X1, X2) :- not(timesincewar(X1, X2)) and blockpositionindex(X2, X1). Training Acc: 0.9474 | Eval Acc: 0.9474 Val Coverage: (N_b=486, N_r=455, N_r/N_b=0.9362) Train Coverage: (N_b=9019, N_r=8383, N_r/N_b=0.9295)

  26. [26]

    blockpositionindex(X1, X2) :- commonbloc0(X1, X2). Training Acc: 0.8625 | Eval Acc: 0.8568 Val Coverage: (N_b=290, N_r=270, N_r/N_b=0.9310) Train Coverage: (N_b=5394, N_r=5022, N_r/N_b=0.9310)

  27. [27]

    blockpositionindex(X1, X2) :- commonbloc0(X2, X1). Training Acc: 0.8627 | Eval Acc: 0.8536 Val Coverage: (N_b=284, N_r=264, N_r/N_b=0.9296) Train Coverage: (N_b=5400, N_r=5028, N_r/N_b=0.9311)

  28. [28]

    blockpositionindex(X1, X2) :- not(relintergovorgs(X2, X1)) and embassy(X2, X1) and not(commonbloc2(X1, X2)) and not(reltreaties(X1, X2)) and not(reldiplomacy(X1, X2)) and conferences(X2, X1). Training Acc: 0.7538 | Eval Acc: 0.7474 Val Coverage: (N_b=44, N_r=42, N_r/N_b=0.9545) Train Coverage: (N_b=803, N_r=742, N_r/N_b=0.9240)

    Note. We refer to Appendix ...

  29. [29]

    intergovorgs3(X1, X2) :- embassy(X1, X2) and ngoorgs3(X1, X2) and not(ngoorgs3(X2, X1)) and not(exports3(X1, X2)) and not(releconomicaid(X2, X1)) and not(releconomicaid(X3, X1)) and not(duration(X2, X1)) and not(lostterritory(X1, X3)). Training Acc: 0.7466 | Eval Acc: 0.7482 Val Coverage: (N_b=648, N_r=529, N_r/N_b=0.8164) Train Coverage: (N_b=12647, N_r=10135, N_r/N_b=0.8014)

  30. [30]

    intergovorgs3(X1, X2) :- ngoorgs3(X1, X2) and not(economicaid(X2, X1)) and ngo(X2, X1) and not(relngo(X2, X1)) and not(relexportbooks(X1, X3)) and not(violentactions(X2, X1)) and not(warning(X1, X3)). Training Acc: 0.6836 | Eval Acc: 0.6819 Val Coverage: (N_b=430, N_r=370, N_r/N_b=0.8605) Train Coverage: (N_b=8473, N_r=7145, N_r/N_b=0.8433)

  31. [31]

    intergovorgs3(X1, X2) :- not(accusation(X1, X2)) and intergovorgs(X1, X2) and ngo(X2, X1) and not(aidenemy(X2, X3)) and not(releconomicaid(X2, X1)) and not(expeldiplomats(X2, X1)) and treaties(X2, X1) and not(lostterritory(X1, X3)). Training Acc: 0.6867 | Eval Acc: 0.6806 Val Coverage: (N_b=466, N_r=387, N_r/N_b=0.8305) Train Coverage: (N_b=8934, N_r=7420, N_r/N_b=0.8305)

  32. [32]

    intergovorgs3(X1, X2) :- not(accusation(X2, X1)) and not(commonbloc2(X1, X2)) and relngo(X1, X2) and timesinceally(X1, X2) and not(relemigrants(X2, X3)) and not(lostterritory(X2, X3)). Training Acc: 0.6551 | Eval Acc: 0.6733 Val Coverage: (N_b=347, N_r=322, N_r/N_b=0.9280) Train Coverage: (N_b=6473, N_r=5737, N_r/N_b=0.8863)

  33. [33]

    intergovorgs3(X1, X2) :- not(militaryalliance(X1, X2)) and intergovorgs(X1, X2) and not(expeldiplomats(X2, X3)) and not(relngo(X2, X1)) and not(relexportbooks(X1, X3)). Training Acc: 0.6724 | Eval Acc: 0.6713 Val Coverage: (N_b=312, N_r=303, N_r/N_b=0.9712) Train Coverage: (N_b=5949, N_r=5723, N_r/N_b=0.9620)

  34. [34]

    intergovorgs3(X1, X2) :- not(commonbloc2(X1, X2)) and ngoorgs3(X1, X2) and relngo(X1, X2) and not(relngo(X2, X1)) and not(students(X1, X3)) and not(tourism(X2, X1)) and not(dependent(X1, X3)) and not(violentactions(X1, X2)) and not(severdiplomatic(X1, X3)). Training Acc: 0.6517 | Eval Acc: 0.6468 Val Coverage: (N_b=349, N_r=303, N_r/N_b=0.8682) Train Coverage: (N_b=6789, N_r=5846, N_r/N_b=0.8611)

  35. [35]

    intergovorgs3(X1, X2) :- relintergovorgs(X1, X2) and not(economicaid(X2, X3)) and intergovorgs(X2, X1) and not(eemigrants(X3, X2)) and timesinceally(X2, X1). Training Acc: 0.6106 | Eval Acc: 0.6163 Val Coverage: (N_b=261, N_r=236, N_r/N_b=0.9042) Train Coverage: (N_b=4515, N_r=4119, N_r/N_b=0.9123)

  36. [36]

    intergovorgs3(X1, X2) :- relintergovorgs(X1, X2) and intergovorgs(X2, X1) and timesinceally(X2, X1) and not(exportbooks(X1, X2)) and not(dependent(X3, X1)) and not(warning(X1, X3)). Training Acc: 0.6166 | Eval Acc: 0.6130 Val Coverage: (N_b=256, N_r=231, N_r/N_b=0.9023) Train Coverage: (N_b=4676, N_r=4286, N_r/N_b=0.9166)

    NEGATIVECOMM(X1, X2) ...

  37. [37]

    negativecomm(X1, X2) :- negativebehavior(X1, X2) and timesinceally(X2, X1). Training Acc: 0.9084 | Eval Acc: 0.9223 Val Coverage: (N_b=124, N_r=124, N_r/N_b=1.0000) Train Coverage: (N_b=2228, N_r=2228, N_r/N_b=1.0000)

  38. [38]

    negativecomm(X1, X2) :- negativebehavior(X1, X2) and accusation(X1, X2). Training Acc: 0.9208 | Eval Acc: 0.9142 Val Coverage: (N_b=141, N_r=129, N_r/N_b=0.9149) Train Coverage: (N_b=2799, N_r=2615, N_r/N_b=0.9343)

  39. [39]

    negativecomm(X1, X2) :- negativebehavior(X1, X2) and blockpositionindex(X1, X2). Training Acc: 0.8974 | Eval Acc: 0.9049 Val Coverage: (N_b=137, N_r=123, N_r/N_b=0.8978) Train Coverage: (N_b=2411, N_r=2229, N_r/N_b=0.9245)

  40. [40]

    negativecomm(X1, X2) :- negativebehavior(X1, X2) and pprotests(X1, X2). Training Acc: 0.8855 | Eval Acc: 0.9026 Val Coverage: (N_b=107, N_r=107, N_r/N_b=1.0000) Train Coverage: (N_b=1853, N_r=1853, N_r/N_b=1.0000)

  41. [41]

    negativecomm(X1, X2) :- negativebehavior(X1, X2) and negativebehavior(X2, X1) and blockpositionindex(X1, X2). Training Acc: 0.8695 | Eval Acc: 0.8817 Val Coverage: (N_b=111, N_r=100, N_r/N_b=0.9009) Train Coverage: (N_b=2130, N_r=1860, N_r/N_b=0.8732)

  42. [42]

    negativecomm(X1, X2) :- negativebehavior(X1, X2) and commonbloc0(X1, X2). Training Acc: 0.8627 | Eval Acc: 0.8805 Val Coverage: (N_b=98, N_r=93, N_r/N_b=0.9490) Train Coverage: (N_b=1862, N_r=1671, N_r/N_b=0.8974)

  43. [43]

    Training Acc: 0.8754 | Eval Acc: 0.8677 Val Coverage: (N_b=127, N_r=102, N_r/N_b=0.8031) Train Coverage: (N_b=2813, N_r=2250, N_r/N_b=0.7999)

    negativecomm(X1, X2) :- negativecomm(X2, X1) and violentactions(X2, X1). Training Acc: 0.8754 | Eval Acc: 0.8677 Val Coverage: (N_b=127, N_r=102, N_r/N_b=0.8031) Train Coverage: (N_b=2813, N_r=2250, N_r/N_b=0.7999)

  44. [44]

    Training Acc: 0.8754 | Eval Acc: 0.8677 Val Coverage: (N_b=127, N_r=102, N_r/N_b=0.8031) Train Coverage: (N_b=2813, N_r=2250, N_r/N_b=0.7999)

    negativecomm(X1, X2) :- negativecomm(X2, X1). Training Acc: 0.8754 | Eval Acc: 0.8677 Val Coverage: (N_b=127, N_r=102, N_r/N_b=0.8031) Train Coverage: (N_b=2813, N_r=2250, N_r/N_b=0.7999)

  45. [45]

    Training Acc: 0.8595 | Eval Acc: 0.8631 Val Coverage: (N_b=113, N_r=93, N_r/N_b=0.8230) Train Coverage: (N_b=2111, N_r=1769, N_r/N_b=0.8380)

    negativecomm(X1, X2) :- negativebehavior(X2, X1) and blockpositionindex(X1, X2) and negativecomm(X2, X1). Training Acc: 0.8595 | Eval Acc: 0.8631 Val Coverage: (N_b=113, N_r=93, N_r/N_b=0.8230) Train Coverage: (N_b=2111, N_r=1769, N_r/N_b=0.8380)

  46. [46]

    negativecomm(X1, X2) :- accusation(X1, X2) and accusation(X2, X1). Training Acc: 0.8757 | Eval Acc: 0.8619 Val Coverage: (N_b=96, N_r=84, N_r/N_b=0.8750) Train Coverage: (N_b=2060, N_r=1876, N_r/N_b=0.9107)

    I.3 UMLS DATASET

    The UMLS dataset consists of biomedical entities connected through a large set of heterogeneous semantic relations. The task requires l...

  47. [47]

    Training Acc: 0.9449 | Eval Acc: 0.9466

    isa(X1, X2) :- isa(X3, X2) and interacts_with(X1, X3). Training Acc: 0.9449 | Eval Acc: 0.9466

  48. [48]

    Training Acc: 0.9437 | Eval Acc: 0.9453

    isa(X1, X2) :- isa(X3, X2) and conceptually_related_to(X3, X1). Training Acc: 0.9437 | Eval Acc: 0.9453

  49. [49]

    Training Acc: 0.9437 | Eval Acc: 0.9453

    isa(X1, X2) :- connected_to(X3, X1) and practices(X3, X2). Training Acc: 0.9437 | Eval Acc: 0.9453

  50. [50]

    Training Acc: 0.9430 | Eval Acc: 0.9449

    isa(X1, X2) :- not(affects(X3, X1)) and conceptual_part_of(X3, X2). Training Acc: 0.9430 | Eval Acc: 0.9449
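A rule with negation and an auxiliary variable X3, such as the one above, can be scored on probabilistic valuations with plain min-max semantics, which ANDRE's attention operators are described as approximating. The sketch below is not ANDRE's implementation: it assumes negation is 1 - v, conjunction is min, and the body variable X3 is existentially quantified by taking the max over all constants. The entities and valuations are hypothetical.

```python
entities = ["e0", "e1", "e2"]  # hypothetical constants

# Hypothetical probabilistic valuations, indexed as pred[first_arg][second_arg].
affects = [
    [0.1, 0.9, 0.2],
    [0.8, 0.1, 0.3],
    [0.0, 0.4, 0.7],
]
conceptual_part_of = [
    [0.2, 0.9, 0.1],
    [0.6, 0.3, 0.5],
    [0.1, 0.2, 0.8],
]


def rule_value(x1, x2):
    """Score isa(X1, X2) :- not(affects(X3, X1)) and conceptual_part_of(X3, X2).

    Assumed semantics: not(v) = 1 - v, 'and' = min, and X3 is resolved
    by taking the max over every constant (existential quantification)."""
    return max(
        min(1.0 - affects[x3][x1], conceptual_part_of[x3][x2])
        for x3 in range(len(entities))
    )


# Derived valuation of the head predicate for every (X1, X2) pair.
isa_derived = [
    [rule_value(i, j) for j in range(len(entities))]
    for i in range(len(entities))
]
```

Hard min and max pass gradients through a single argument only, which is one motivation for replacing them with soft, attention-weighted operators when the rule structure itself has to be learned.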

  51. [51]

    isa(X1, X2) :- not(isa(X3, X2)) and conceptual_part_of(X1, X3). Training Acc: 0.9429 | Eval Acc: 0.9445

    INTERACTS_WITH(X1, X2)

    The target predicate interacts_with(X1, X2) captures functional, biochemical, or causal interactions between biomedical entities in the UMLS knowledge base. The rules below illustrate how ANDRE infers interaction patter...

  52. [52]

    Training Acc: 0.8920 | Eval Acc: 0.8946

    interacts_with(X1, X2) :- isa(X2, X1) and not(associated_with(X2, X3)) and not(interacts_with(X2, X1)) and not(ingredient_of(X2, X3)). Training Acc: 0.8920 | Eval Acc: 0.8946

  53. [53]

    Training Acc: 0.8912 | Eval Acc: 0.8938

    interacts_with(X1, X2) :- isa(X2, X1) and not(interacts_with(X2, X1)) and not(part_of(X3, X1)) and not(measures(X3, X2)). Training Acc: 0.8912 | Eval Acc: 0.8938
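For readers who want to process these listings programmatically, the rule lines follow a regular `head :- literal and literal ...` shape. The helper below is a hypothetical convenience parser written for this listing format only; it is not part of ANDRE and does not handle nested terms or predicates literally named `not`.

```python
import re


def parse_literal(text):
    """Parse 'pred(A, B)' or 'not(pred(A, B))' into (negated, pred, args)."""
    text = text.strip()
    negated = text.startswith("not(") and text.endswith(")")
    if negated:
        text = text[4:-1].strip()  # drop the surrounding 'not(' ... ')'
    match = re.fullmatch(r"(\w+)\(([^()]*)\)", text)
    if match is None:
        raise ValueError(f"unrecognised literal: {text!r}")
    args = tuple(arg.strip() for arg in match.group(2).split(","))
    return negated, match.group(1), args


def parse_rule(line):
    """Split a rule line into its head literal and a list of body literals."""
    head_text, body_text = line.rstrip(". ").split(":-")
    body = [parse_literal(part) for part in body_text.split(" and ")]
    return parse_literal(head_text), body


rule = (
    "interacts_with(X1, X2) :- isa(X2, X1) and not(interacts_with(X2, X1)) "
    "and not(part_of(X3, X1)) and not(measures(X3, X2))."
)
head, body = parse_rule(rule)
```

The metrics that follow each rule in the listing ("Training Acc: ...") would have to be stripped before calling `parse_rule`, since the sketch expects the rule text alone.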

  54. [54]

    Training Acc: 0.8865 | Eval Acc: 0.8879

    interacts_with(X1, X2) :- not(location_of(X3, X2)) and isa(X1, X2) and not(interacts_with(X2, X1)) and not(complicates(X2, X3)). Training Acc: 0.8865 | Eval Acc: 0.8879

  55. [55]

    Training Acc: 0.8801 | Eval Acc: 0.8823

    interacts_with(X1, X2) :- interacts_with(X1, X3) and interacts_with(X3, X2). Training Acc: 0.8801 | Eval Acc: 0.8823

  56. [56]

    Training Acc: 0.8722 | Eval Acc: 0.8744

    interacts_with(X1, X2) :- associated_with(X1, X3) and performs(X2, X3). Training Acc: 0.8722 | Eval Acc: 0.8744

  57. [57]

    Training Acc: 0.8722 | Eval Acc: 0.8744

    interacts_with(X1, X2) :- co_occurs_with(X1, X3) and indicates(X2, X3). Training Acc: 0.8722 | Eval Acc: 0.8744

  58. [58]

    Training Acc: 0.8722 | Eval Acc: 0.8744

    interacts_with(X1, X2) :- treats(X3, X2) and developmental_form_of(X1, X3). Training Acc: 0.8722 | Eval Acc: 0.8744

  59. [59]

    Training Acc: 0.8722 | Eval Acc: 0.8744

    interacts_with(X1, X2) :- ingredient_of(X1, X3) and interconnects(X3, X2). Training Acc: 0.8722 | Eval Acc: 0.8744

  60. [60]

    Training Acc: 0.8722 | Eval Acc: 0.8744

    interacts_with(X1, X2) :- result_of(X2, X3) and adjacent_to(X1, X3). Training Acc: 0.8722 | Eval Acc: 0.8744

    J RUNTIME COMPARISON

    Table 7 reports the total running time required by NTPλ, NeuralLP, DFORL, and ANDRE to generate complete sets of logic programs on the Countries, Nations, and UMLS datasets. The results highlight substantial differences in co...

  61. [61]

    Training Acc: 0.9216 | Eval Acc: 0.9217

    great_ne(X1, X2) :- not(great_ne(X2, X1)). Training Acc: 0.9216 | Eval Acc: 0.9217

  62. [62]

    Training Acc: 0.8896 | Eval Acc: 0.8786

    great_ne(X1, X2) :- not(great_ne(X2, X1)) and great_ne(X3, X2) and not(r_subst_1(X1, X3)). Training Acc: 0.8896 | Eval Acc: 0.8786

  63. [63]

    Training Acc: 0.7689 | Eval Acc: 0.7609

    great_ne(X1, X2) :- x_subst(X2, X3) and r_subst_1(X1, X3). Training Acc: 0.7689 | Eval Acc: 0.7609

  64. [64]

    Training Acc: 0.7689 | Eval Acc: 0.7609

    great_ne(X1, X2) :- gt(X1, X3) and great_pi_acc(X2, X3). Training Acc: 0.7689 | Eval Acc: 0.7609

  65. [65]

    Training Acc: 0.7689 | Eval Acc: 0.7609

    great_ne(X1, X2) :- not(ring_subst_4(X2, X3)) and ring_subst_4(X3, X1). Training Acc: 0.7689 | Eval Acc: 0.7609

  66. [66]

    Training Acc: 0.7689 | Eval Acc: 0.7609

    great_ne(X1, X2) :- great_ne(X3, X2) and flex(X1, X3). Training Acc: 0.7689 | Eval Acc: 0.7609

  67. [67]

    Training Acc: 0.7689 | Eval Acc: 0.7609

    great_ne(X1, X2) :- r_subst_2(X3, X2) and ring_subst_2(X3, X1). Training Acc: 0.7689 | Eval Acc: 0.7609

  68. [68]

    Training Acc: 0.7689 | Eval Acc: 0.7609

    great_ne(X1, X2) :- great_ne(X3, X2) and ring_subst_2(X3, X1). Training Acc: 0.7689 | Eval Acc: 0.7609

  69. [69]

    Training Acc: 0.7689 | Eval Acc: 0.7609

    great_ne(X1, X2) :- pi_doner(X3, X1) and not(ring_substitutions(X2, X3)). Training Acc: 0.7689 | Eval Acc: 0.7609

  70. [70]

    great_ne(X1, X2) :- great_ne(X2, X3) and pi_doner(X3, X1). Training Acc: 0.7689 | Eval Acc: 0.7609

    K.2 UW-CSE DATASET

    The UW-CSE dataset models academic relationships within a university domain, including roles, courses, projects, and advising relationships. The task is to infer latent advisory relations from heterogeneous academic facts.

    A D V I S E D B...

  71. [71]

    advisedby(X1, X2) :- advisedby(X3, X1) and yearsinprogram(X3, X2)

  72. [72]

    advisedby(X1, X2) :- courselevel(X1, X3) and hasposition(X2, X3)

  73. [73]

    advisedby(X1, X2) :- professor(X3, X2) and taughtby(X3, X1)

  74. [74]

    advisedby(X1, X2) :- courselevel(X2, X3) and not(professor(X3, X1))

  75. [75]

    advisedby(X1, X2) :- advisedby(X3, X1) and not(inphase(X3, X2))

  76. [76]

    advisedby(X1, X2) :- hasposition(X2, X3) and ta(X3, X1)

  77. [77]

    advisedby(X1, X2) :- not(hasposition(X3, X1)) and projectmember(X2, X3)

  78. [78]

    advisedby(X1, X2) :- inphase(X1, X3) and professor(X2, X3)

  79. [79]

    advisedby(X1, X2) :- not(courselevel(X3, X2)) and publication(X1, X3)

  80. [80]

    advisedby(X1, X2) :- hasposition(X2, X3) and inphase(X3, X1).

    L SYNTHETIC DATASETS TABULAR RESULTS

    Table 9: Comparison of Rule Extraction Performance between ANDRE and DFORL on Complex Synthetic Datasets with Varying Number of Subrules

    Columns: Dataset | Sample Size | Accuracy (ANDRE Train, ANDRE Test, DFORL Train, DFORL Test) | Rule Extraction Success (ANDRE, DFORL)

    R1 | 20 | 0.95 | 0.80 | 0.85 | 0....