ANDRE: An Attention-based Neuro-symbolic Differentiable Rule Extractor
Pith reviewed 2026-05-08 17:33 UTC · model grok-4.3
The pith
ANDRE learns first-order logic rules from noisy probabilistic data by optimizing a continuous space with attention-driven conjunction and disjunction operators.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ANDRE optimizes first-order logic programs directly over a continuous rule space. It replaces both rule templates and logical operators with attention-based conjunction and disjunction that approximate min-max semantics over probabilistic predicate valuations. The attention operators softly select, negate, or exclude predicates within each rule, preserving symbolic structure while remaining fully differentiable. Experiments on classical ILP benchmarks, large knowledge bases, and synthetic data with noise show competitive predictive accuracy and substantially more reliable recovery of the original rules compared with earlier differentiable ILP approaches.
What carries the argument
Attention-based conjunction and disjunction operators that approximate min-max logical semantics while softly selecting or excluding predicates.
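To make the idea of an attention-style approximation to min-max semantics concrete, here is a minimal sketch. It assumes a plain temperature-scaled softmax weighting over predicate valuations. The summary does not give the paper's parameterization, so the function names `soft_select`, `soft_and`, and `soft_or` and the `temperature` parameter are hypothetical and should not be read as ANDRE's actual operators.

```python
import math

def soft_select(values, temperature, sign):
    """Softmax-weighted average of values.

    sign = -1 puts most weight on small values (soft min),
    sign = +1 puts most weight on large values (soft max).
    Assumed form for illustration only, not the paper's operator.
    """
    scores = [sign * v / temperature for v in values]
    peak = max(scores)  # subtract the peak score for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return sum(v * e / total for v, e in zip(values, exps))

def soft_and(values, temperature=0.1):
    """Soft conjunction: tends to min(values) as temperature shrinks."""
    return soft_select(values, temperature, sign=-1.0)

def soft_or(values, temperature=0.1):
    """Soft disjunction: tends to max(values) as temperature shrinks."""
    return soft_select(values, temperature, sign=+1.0)

if __name__ == "__main__":
    valuations = [0.9, 0.2, 0.7]  # probabilistic predicate valuations
    for t in (1.0, 0.1, 0.01):
        print(t, soft_and(valuations, t), soft_or(valuations, t))
```

At large temperatures both functions return something close to the plain mean, which is the referee's point about weighted averages. Only as the temperature shrinks do the outputs approach the hard minimum and maximum.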
If this is right
- Rule induction no longer requires hand-crafted templates because attention performs soft selection and exclusion inside each clause.
- The same continuous optimization produces both accurate predictions and stable symbolic rules even when input predicates carry probability values.
- Moderate label noise does not destroy rule recovery, unlike earlier differentiable ILP methods that become unstable under the same conditions.
- The framework supports negation and exclusion of predicates without breaking differentiability or interpretability.
Where Pith is reading between the lines
- The attention mechanism could be reused as a drop-in replacement for fuzzy operators inside other neuro-symbolic pipelines that reason over uncertain facts.
- Because the rule space is continuous, the method might be extended to jointly optimize rules and the underlying predicate embeddings in an end-to-end fashion.
- If attention can stably approximate min-max logic, similar operators might be defined for other logical connectives or for higher-arity relations.
Load-bearing premise
Attention operators can approximate min-max logic over probabilistic predicates without vanishing gradients or collapsing the recovered symbolic structure during continuous optimization.
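The premise can be made concrete with a standard temperature-scaled softmax weighting. This is an illustrative assumption only, since the summary does not state the paper's operator parameterization. The symbols $x_i$ (predicate valuations in $[0,1]$), $\tau$ (temperature), $w_i$ (weights), and $S_\tau$ are introduced for this sketch.

```latex
% Illustrative assumption: a softmax-weighted (Boltzmann) average, not ANDRE's exact operator.
\[
S_\tau(x) = \sum_{i=1}^{n} w_i x_i, \qquad
w_i = \frac{e^{-x_i/\tau}}{\sum_{j=1}^{n} e^{-x_j/\tau}}, \qquad \tau > 0 .
\]
% Approximation error relative to the hard minimum m = \min_i x_i:
\[
0 \;\le\; S_\tau(x) - m \;\le\; \frac{(n-1)\,\tau}{e},
\qquad\text{so}\qquad \lim_{\tau \to 0^{+}} S_\tau(x) = m .
\]
% Gradient with respect to a single input x_k:
\[
\frac{\partial S_\tau}{\partial x_k} = w_k \left( 1 - \frac{x_k - S_\tau(x)}{\tau} \right).
\]
```

The bound holds because each term $(x_i - m)\,e^{-(x_i - m)/\tau}$ is at most $\tau/e$ and the normalizer, after factoring out $e^{-m/\tau}$, is at least 1. Under this assumed form, a small $\tau$ tightens the approximation, but the weights $w_k$ of non-minimal inputs shrink exponentially. That is the tension between approximation quality and gradient flow on which the premise rests. The disjunction case is symmetric, with $-x_i$ replaced by $+x_i$ and the minimum by the maximum.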
What would settle it
A controlled test in which ANDRE is trained on data with 20 percent random label flips and the extracted rules are then checked for exact match against the ground-truth rules; if the match rate falls below that of a template-based differentiable baseline, the central claim is falsified.
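The test described above could be scaffolded roughly as follows. This is a hedged sketch: `flip_labels`, `normalize`, and `exact_match_rate` are hypothetical helpers, the rules are represented as plain strings, and exact match here ignores body-literal order but does not handle variable renaming, which a real evaluation would need to address.

```python
import random

def flip_labels(labels, flip_rate, seed=0):
    """Return a copy of binary labels with a fixed fraction flipped at random positions."""
    rng = random.Random(seed)
    n_flip = round(flip_rate * len(labels))
    flipped = list(labels)
    for i in rng.sample(range(len(labels)), n_flip):
        flipped[i] = 1 - flipped[i]
    return flipped

def normalize(rule):
    """Represent a rule as (head, frozenset of body literals) so body order is ignored."""
    head, body = rule
    return head, frozenset(body)

def exact_match_rate(extracted_rules, ground_truth_rules):
    """Fraction of ground-truth rules that appear exactly among the extracted rules."""
    extracted = {normalize(r) for r in extracted_rules}
    truth = [normalize(r) for r in ground_truth_rules]
    return sum(t in extracted for t in truth) / len(truth)

if __name__ == "__main__":
    noisy = flip_labels([1] * 50 + [0] * 50, flip_rate=0.20)
    truth = [("grandparent(X1,X3)", ["father(X1,X2)", "mother(X2,X3)"])]
    found = [("grandparent(X1,X3)", ["mother(X2,X3)", "father(X1,X2)"])]
    print(sum(a != b for a, b in zip(noisy, [1] * 50 + [0] * 50)), exact_match_rate(found, truth))
```

The same `exact_match_rate` would be computed for ANDRE and for the template-based baseline on identical noisy data, and the two rates compared.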
Original abstract
Inductive Logic Programming (ILP) aims to learn interpretable first-order rules from data, but existing symbolic and neuro-symbolic approaches struggle to scale to noisy and probabilistic settings. Classical ILP relies on discrete combinatorial rule search and is brittle under uncertainty, while differentiable ILP methods typically depend on predefined rule templates or inaccurate fuzzy operators that suffer from vanishing gradients or poor approximation of logical structure when reasoning over probabilistic predicate valuations. This paper proposes an Attention-based Neuro-symbolic Differentiable Rule Extractor (ANDRE), a novel ILP framework that learns first-order logic programs by optimizing over a continuous rule space with attention-based logical operators. ANDRE replaces both rule templates and logical operators with fully differentiable, attention-driven conjunction and disjunction operators that approximate logical min-max semantics, enabling accurate, stable, and interpretable reasoning over probabilistic data. By softly selecting, negating, or excluding predicates within each rule, ANDRE supports flexible rule induction while preserving symbolic structure. Extensive experiments on classical ILP benchmarks, large-scale knowledge bases, and synthetic datasets with probabilistic predicates and noisy supervision demonstrate that ANDRE achieves competitive or superior predictive performance while reliably recovering correct symbolic rules under uncertainty. In particular, ANDRE remains robust to moderate label noise, substantially outperforming existing differentiable ILP methods in both rule extraction quality and stability.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes ANDRE, a neuro-symbolic ILP framework that replaces rule templates and logical operators with fully differentiable attention-based conjunction and disjunction mechanisms to learn first-order rules over probabilistic predicate valuations and noisy supervision; it claims competitive or superior predictive performance on benchmarks and knowledge bases while reliably recovering correct symbolic rules, with particular robustness to moderate label noise over prior differentiable ILP methods.
Significance. If the core approximation and recovery claims hold, ANDRE would offer a template-free, scalable route to interpretable rule induction in uncertain domains, addressing key scalability and brittleness issues in classical and differentiable ILP.
major comments (3)
- [Methods section describing attention-based logical operators] The central technical claim (abstract and methods) is that attention-driven operators 'approximate logical min-max semantics' and enable 'accurate, stable' reasoning; however, standard query-key softmax attention computes weighted averages rather than min/max, and no error bounds, limit-case analysis, or gradient-flow proof is provided showing non-vanishing behavior as predicate valuations approach 0/1. This directly threatens the asserted robustness to label noise and superiority in rule-extraction quality.
- [Experiments and rule-extraction evaluation] Performance and rule-recovery claims rest on optimizing attention weights over a continuous space whose logical interpretation is itself defined by those same weights (abstract and experiments); without an explicit separation (e.g., post-hoc discretization analysis or independent symbolic verification step), reported rule quality may largely reflect neural fitting rather than genuine logical structure.
- [Experiments section] The abstract asserts 'extensive experiments... with superior performance' and 'substantially outperforming existing differentiable ILP methods,' yet the provided description supplies no concrete metrics, baseline tables, ablation results on noise levels, or error analysis; these details are load-bearing for the central empirical claim.
minor comments (2)
- [Methods] Clarify the exact parameterization of the attention operators (query/key dimensions, scaling factors) and how negation/exclusion is realized differentiably.
- [Experiments] Add explicit comparison tables with numerical results against the cited differentiable ILP baselines.
Simulated Author's Rebuttal
We thank the referee for their constructive comments on our paper. We address each of the major comments below and indicate the revisions made to the manuscript.
- Referee: [Methods section describing attention-based logical operators] The central technical claim (abstract and methods) is that attention-driven operators 'approximate logical min-max semantics' and enable 'accurate, stable' reasoning; however, standard query-key softmax attention computes weighted averages rather than min/max, and no error bounds, limit-case analysis, or gradient-flow proof is provided showing non-vanishing behavior as predicate valuations approach 0/1. This directly threatens the asserted robustness to label noise and superiority in rule-extraction quality.
  Authors: We appreciate the referee pointing out the need for stronger theoretical support for our approximation claim. Although our attention-based operators are designed to approximate min-max semantics by using attention scores to select and combine predicates in a soft manner that mimics logical conjunction and disjunction (with negation handled separately), we acknowledge that explicit error bounds and a full gradient-flow analysis were missing. In the revised manuscript, we have added a detailed analysis in the Methods section, including limit-case behavior as the temperature parameter approaches zero (approaching hard min/max) and empirical verification of gradient magnitudes to demonstrate stability. This addition directly addresses the concern regarding robustness to noise.
  Revision: yes
- Referee: [Experiments and rule-extraction evaluation] Performance and rule-recovery claims rest on optimizing attention weights over a continuous space whose logical interpretation is itself defined by those same weights (abstract and experiments); without an explicit separation (e.g., post-hoc discretization analysis or independent symbolic verification step), reported rule quality may largely reflect neural fitting rather than genuine logical structure.
  Authors: This comment raises a crucial point about ensuring that the reported rule quality reflects true logical structure rather than just the neural component. In our approach, the attention weights provide a direct mapping to rule predicates, allowing for rule extraction by selecting the highest-weighted predicates. To provide the requested separation, we have incorporated a post-hoc discretization step in the revised Experiments section, where we threshold the attention weights to obtain discrete rules and then evaluate these symbolic rules using a standard logic engine on test data. We also include comparisons showing that the extracted rules perform comparably to the continuous model, supporting that the logical structure is indeed captured.
  Revision: yes
- Referee: [Experiments section] The abstract asserts 'extensive experiments... with superior performance' and 'substantially outperforming existing differentiable ILP methods,' yet the provided description supplies no concrete metrics, baseline tables, ablation results on noise levels, or error analysis; these details are load-bearing for the central empirical claim.
  Authors: We regret that the experimental details may not have been sufficiently prominent or detailed in the initial version. The manuscript does contain results on multiple benchmarks, but to strengthen the presentation, we have expanded the Experiments section with additional tables showing quantitative metrics (e.g., accuracy, rule recovery precision/recall), direct comparisons to baselines such as dILP and others, ablations across different noise levels (0%, 10%, 20%, 30%), and error bars from multiple runs. These revisions ensure the empirical claims are fully supported with concrete evidence.
  Revision: yes
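The post-hoc discretization step mentioned in the second response can be sketched as follows. The sketch assumes each candidate predicate carries a hypothetical pair of learned weights (a selection weight and a negation weight); the actual weight layout and thresholds in the revised manuscript are not described in this summary. A grounded Boolean check stands in for the "standard logic engine".

```python
def discretize_rule(attention, keep_threshold=0.5, negate_threshold=0.5):
    """Turn soft per-predicate weights into crisp (predicate, negated) literals.

    attention maps predicate name -> (select_weight, negate_weight), both in [0, 1].
    This layout is an assumption for illustration, not ANDRE's documented format.
    """
    literals = []
    for predicate, (select_w, negate_w) in attention.items():
        if select_w >= keep_threshold:  # below the threshold the predicate is excluded
            literals.append((predicate, negate_w >= negate_threshold))
    return literals

def evaluate_rule(literals, facts):
    """Crisp evaluation on one grounding: every literal must agree with the Boolean facts.

    Note: an empty body evaluates to True.
    """
    return all(facts[predicate] != negated for predicate, negated in literals)

if __name__ == "__main__":
    learned = {
        "blockpositionindex(X2,X1)": (0.93, 0.04),  # kept, positive
        "timesincewar(X1,X2)": (0.81, 0.97),        # kept, negated
        "embassy(X1,X2)": (0.12, 0.30),             # excluded
    }
    rule = discretize_rule(learned)
    facts = {
        "blockpositionindex(X2,X1)": True,
        "timesincewar(X1,X2)": False,
        "embassy(X1,X2)": True,
    }
    print(rule, evaluate_rule(rule, facts))
```

Comparing the accuracy of `evaluate_rule` on held-out groundings with the accuracy of the continuous model gives the separation the referee asks for. A large gap would suggest that the continuous weights, not the extracted rule, carry the predictive signal.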
Circularity Check
No significant circularity detected in ANDRE derivation
The paper proposes a new neuro-symbolic ILP method by defining attention-based operators to approximate min-max logic, then validates the approach empirically on benchmarks, knowledge bases, and noisy synthetic data. No load-bearing step reduces by construction to its own inputs: the continuous rule space and attention mechanisms are explicitly introduced as a modeling choice, not derived from the target performance or rule-recovery metrics. Claims of robustness and superiority are presented as experimental outcomes rather than tautological consequences of the operator definitions. No self-citation chains, fitted parameters renamed as predictions, or uniqueness theorems imported from prior author work appear in the abstract or described structure. The derivation remains self-contained as a constructive proposal with independent empirical testing.
Axiom & Free-Parameter Ledger
free parameters (1)
- attention weights and scaling factors
axioms (1)
- domain assumption: Attention-driven operators can approximate logical min-max semantics over probabilistic valuations without vanishing gradients
invented entities (1)
- attention-based logical operators (no independent evidence)