Using Answer Set Programming for Commonsense Reasoning in the Winograd Schema Challenge

Arpit Sharma

arxiv: 1907.11112 · v1 · pith:F7FRFGDCnew · submitted 2019-07-25 · 💻 cs.AI · cs.CL

Pith reviewed 2026-05-24 16:11 UTC · model grok-4.3

keywords: Winograd Schema Challenge · Answer Set Programming · Commonsense Reasoning · Graph Isomorphism · Natural Language Understanding · Elaboration Tolerance · Pronoun Resolution

The pith

Answer Set Programming on graph representations solves 240 out of 291 Winograd Schema problems with added commonsense knowledge.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to show that Winograd Schema Challenge problems, which test pronoun reference through commonsense, can be solved by turning each schema into a graph and using Answer Set Programming to match it against a separate commonsense graph. The ASP encoding supports adding new facts and rules without rewriting the base program. A reader would care because the method makes the required knowledge explicit and testable rather than hidden inside statistical patterns. The reported coverage of 240 of 291 problems (about 82%) indicates that the combination of representation and solver handles most cases in the standard collection.

Core claim

The paper claims that a graph-based representation of WSC problems, combined with an Answer Set Programming encoding of graph-subgraph isomorphism and supplied with relevant commonsense knowledge, correctly resolves 240 out of 291 schemas. This encoding permits additional constraints to be added in an elaboration-tolerant manner.

What carries the argument

Graph-subgraph isomorphism encoded in Answer Set Programming on representations of the schema and commonsense facts.
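
To make the matching idea concrete, here is a minimal brute-force sketch in Python of finding a small labelled pattern inside a larger labelled graph. It is not the paper's ASP encoding: the edge-triple format, the function names `nodes` and `subgraph_matches`, and the toy "lift / weak" data are all assumptions made for illustration, and the check is a plain edge-preserving injective mapping (a non-induced subgraph match) rather than whatever exact notion the paper uses.

```python
from itertools import permutations

def nodes(edges):
    """Collect every node that appears as a source or target of an edge."""
    return sorted({n for s, _, t in edges for n in (s, t)})

def subgraph_matches(pattern, graph):
    """Yield injective node mappings under which every pattern edge is a graph edge."""
    p_nodes, g_nodes = nodes(pattern), nodes(graph)
    # Try every injective assignment of pattern nodes to graph nodes.
    for image in permutations(g_nodes, len(p_nodes)):
        mapping = dict(zip(p_nodes, image))
        if all((mapping[s], lbl, mapping[t]) in graph for s, lbl, t in pattern):
            yield mapping

# Hypothetical toy data: each graph is a set of (source, label, target) edges.
graph = {
    ("lift", "agent", "man"),
    ("lift", "recipient", "son"),
    ("weak", "trait_of", "man"),
}
pattern = {
    ("E", "agent", "X"),      # some event E has agent X
    ("T", "trait_of", "X"),   # some trait T belongs to the same X
}

matches = list(subgraph_matches(pattern, graph))
print(matches)
```

The exhaustive search is exponential in the pattern size, which is acceptable for a handful of nodes; a declarative solver would instead state the same edge-preservation condition as rules and let the solver do the search.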

If this is right

New commonsense rules can be added to the program without changes to the core encoding or solver.
The stable models produced by ASP make the reasoning steps that select one referent over another explicit and inspectable.
Unsolved cases indicate gaps in the supplied knowledge base rather than limits of the matching procedure itself.
The same graph-plus-ASP structure can be reused for other language tasks that require matching a partial description to world knowledge.
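
The first point, adding rules without touching the core, can be illustrated with a plain-Python analogue. This is only a sketch under stated assumptions: the `surviving` function, the `Candidate` shape, and the `not_the_lifted_one` rule are hypothetical, and the rule stands in for an ASP integrity constraint that eliminates candidate answers rather than reproducing any rule from the paper.

```python
from typing import Callable, Dict, List

Candidate = Dict[str, str]                 # e.g. {"pronoun": "he", "referent": "man"}
Constraint = Callable[[Candidate], bool]   # returns True when the candidate is ruled out

def surviving(candidates: List[Candidate], constraints: List[Constraint]) -> List[Candidate]:
    """Keep the candidates that no constraint rules out; this core never changes."""
    return [c for c in candidates if not any(rule(c) for rule in constraints)]

candidates = [
    {"pronoun": "he", "referent": "man"},
    {"pronoun": "he", "referent": "son"},
]

# Hypothetical commonsense rule, added later as a separate unit:
# the weak party is not the one being lifted.
def not_the_lifted_one(c: Candidate) -> bool:
    return c["referent"] == "son"

before = surviving(candidates, [])                    # no extra knowledge yet
after = surviving(candidates, [not_the_lifted_one])   # one rule appended, core untouched
print(len(before), after)
```

The design point is that new knowledge arrives as one more entry in the constraint list, so earlier rules and the filtering core stay as they were.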

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The method could serve as a verification layer that checks or explains decisions made by statistical pronoun-resolution models.
Systematically extending the commonsense graph until all 291 problems are covered would test whether knowledge coverage is the main remaining bottleneck.
Similar graph encodings might transfer to coreference resolution in longer documents where multiple referents must be tracked.

Load-bearing premise

The additional commonsense knowledge supplied to the ASP program is accurate and sufficient to select the correct referent without allowing incorrect solutions.

What would settle it

Running the ASP program on the 51 unsolved problems and checking whether it returns the correct referent, no stable model, or an incorrect referent for each.
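
A check of this kind could be tallied as in the sketch below. The three outcome labels follow the sentence above; everything else is an assumption: the `classify` helper, the problem identifiers, and the three placeholder result rows are hypothetical and are not results from the paper. Only the counts 240 and 291 come from the source.

```python
from collections import Counter
from typing import Optional

def classify(predicted: Optional[str], gold: str) -> str:
    """Map one solver outcome to the three cases named in the text."""
    if predicted is None:          # the solver produced no stable model
        return "no_stable_model"
    return "correct" if predicted == gold else "incorrect"

# Hypothetical placeholder rows: (problem id, predicted referent or None, gold referent).
results = [
    ("wsc_a", "man", "man"),
    ("wsc_b", None, "trophy"),
    ("wsc_c", "suitcase", "trophy"),
]
tally = Counter(classify(pred, gold) for _, pred, gold in results)
print(dict(tally))

# Counts reported in the abstract.
solved, total = 240, 291
unsolved = total - solved
coverage = f"{solved / total:.1%}"
print(unsolved, coverage)
```

A split dominated by "no_stable_model" would point to missing knowledge, while many "incorrect" outcomes would point to knowledge that admits the wrong referent.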

Original abstract

The Winograd Schema Challenge (WSC) is a natural language understanding task proposed as an alternative to the Turing test in 2011. In this work we attempt to solve WSC problems by reasoning with additional knowledge. By using an approach built on top of graph-subgraph isomorphism encoded using Answer Set Programming (ASP) we were able to handle 240 out of 291 WSC problems. The ASP encoding allows us to add additional constraints in an elaboration tolerant manner. In the process we present a graph based representation of WSC problems as well as relevant commonsense knowledge. This paper is under consideration for acceptance in TPLP.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper encodes WSC items as graphs and solves 240/291 via ASP subgraph isomorphism, but the commonsense knowledge source and construction remain unclear.

The core contribution is a graph representation of Winograd schemas together with an ASP program that reduces pronoun resolution to subgraph isomorphism plus extra constraints. The paper reports handling 240 out of 291 cases this way and notes that ASP supports elaboration-tolerant addition of rules. That encoding is the concrete piece that is new relative to the cited prior work on the benchmark. The declarative style is a strength; it makes the added knowledge explicit and lets the solver handle consistency checking in a standard way. The result shows that a symbolic pipeline can clear a large fraction of the test set once the right facts are present.

The stress-test concern about per-schema knowledge engineering is reasonable on the evidence given. The abstract supplies the success count but gives no description of how the commonsense facts were gathered, no indication of whether they form a reusable theory or a collection of instance-specific assertions, no breakdown of the 51 failures, and no confirmation that the chosen models match the gold answers for the intended reasons rather than through some other consistent assignment. If the knowledge base turns out to be largely hand-crafted per schema, the work demonstrates knowledge engineering more than general automated reasoning. A referee could reasonably ask for the KB construction procedure and some per-instance analysis.

This is useful reading for anyone interested in applying answer-set programming or graph-based methods to NLU benchmarks. It deserves peer review because the encoding is reproducible in principle and the empirical target is clear; the missing details on knowledge acquisition are fixable and worth requesting rather than grounds for desk rejection.

Referee Report

1 major / 0 minor

Summary. The paper proposes solving Winograd Schema Challenge (WSC) instances via Answer Set Programming (ASP) encodings of graph-subgraph isomorphism augmented with additional commonsense knowledge constraints. It reports a success rate of 240 out of 291 problems and introduces a graph-based representation of the schemas and knowledge, highlighting the elaboration-tolerant addition of constraints in ASP.

Significance. If the approach can be shown to rely on a reusable, general commonsense theory rather than per-instance encodings, the result would illustrate a concrete application of ASP to commonsense reasoning in NLU. The reported count is an empirical demonstration on hand-crafted encodings; its significance hinges on whether the supplied knowledge is accurate, sufficient, and non-spurious without reverse-engineering from gold answers.

major comments (1)

[Abstract] Abstract: The claim that the method handles 240/291 WSC problems supplies no error analysis, no description of the source or construction procedure for the commonsense knowledge base, and no verification that the ASP models select the intended referent rather than any consistent model. This is load-bearing for the central claim because the success rate depends on the additional knowledge distinguishing the correct referent without introducing extraneous solutions.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their detailed feedback. We address the concerns about the abstract's claims and supporting evidence below, and we will revise the manuscript accordingly to strengthen the presentation of our results.

Referee: [Abstract] Abstract: The claim that the method handles 240/291 WSC problems supplies no error analysis, no description of the source or construction procedure for the commonsense knowledge base, and no verification that the ASP models select the intended referent rather than any consistent model. This is load-bearing for the central claim because the success rate depends on the additional knowledge distinguishing the correct referent without introducing extraneous solutions.

Authors: We agree that the abstract is brief and omits these supporting details, which are important for evaluating the central empirical claim. The full manuscript presents the graph-based representation of WSC instances and the ASP encoding of graph-subgraph isomorphism, along with the elaboration-tolerant addition of commonsense constraints. However, we will revise to include: (1) an error analysis section categorizing solved and unsolved instances; (2) a description of the commonsense knowledge base, including its source in general commonsense principles (drawn from broad axioms rather than instance-specific reverse-engineering from gold labels) and the construction procedure; and (3) verification that the ASP stable models select the intended referent, by showing that the added constraints eliminate the incorrect candidate without permitting extraneous consistent models. These additions will be made in a revised version of the paper.

revision: partial

Circularity Check

0 steps flagged

No circularity: empirical count from ASP solver on hand-crafted encodings

The paper presents an empirical result obtained by encoding WSC problems as graphs, adding commonsense knowledge, and running an off-the-shelf ASP solver to count successes (240/291). No equations, fitted parameters, predictions derived from inputs, or self-citations appear in the provided text. The central claim reduces to an implementation and evaluation against external WSC instances rather than any self-referential derivation or renaming. This matches the default expectation of a non-circular empirical paper.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the assumption that WSC items admit a faithful graph representation and that externally supplied commonsense facts can be added as stable-model constraints without circularity or post-hoc tuning.

axioms (2)

domain assumption: Graph-subgraph isomorphism can be encoded as an Answer Set Program whose stable models correspond to valid matches.
The paper states it builds on top of this standard ASP technique.

domain assumption: Additional commonsense knowledge can be expressed as elaboration-tolerant constraints that do not alter the core graph encoding.
Explicitly claimed in the abstract as a benefit of the ASP approach.

pith-pipeline@v0.9.0 · 5623 in / 1325 out tokens · 21735 ms · 2026-05-24T16:11:18.021593+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · absolute_floor_iff_bare_distinguishability
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "By using an approach built on top of graph-subgraph isomorphism encoded using Answer Set Programming (ASP) we were able to handle 240 out of 291 WSC problems."

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "The ASP encoding allows us to add additional constraints in an elaboration tolerant manner."

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.