Probing the Natural Language Inference Task with Automated Reasoning Tools
Research Summary
The paper investigates whether a logic-based pipeline can solve the Natural Language Inference (NLI) task, focusing on the Stanford Natural Language Inference (SNLI) corpus. The authors employ Attempto Controlled English (ACE), a machine-oriented controlled natural language that can be translated directly into first-order logic (FOL) via the Attempto Parsing Engine (APE). Their workflow is as follows: (1) each premise–hypothesis pair from SNLI is fed to APE; if parsing fails, a set of eight syntactic rewrite rules (R1–R8) is applied to transform the sentences into ACE-compatible form, after which parsing is attempted again. (2) Successfully parsed sentences are converted into TPTP-format FOL formulas (P for the premise, H for the hypothesis) and given to a first-order resolution prover. The prover checks whether H or ¬H can be derived from P within a clause limit of 1500. (3) If the prover yields no decision, two groups of semantic augmentation rules are applied. The first group (S1) adds hypernym implications for nouns using WordNet, inserting formulas of the form ∀x (n₁(x) → n₂(x)) into an auxiliary set A, and adds negative implications ∀x (n₁(x) → ¬n₂(x)) into a set N when no hypernym relation exists. The second group (S2) does the same for verbs, creating predicate-level implication or negation formulas. The augmented sets are then fed back to the prover, and if still no conclusion is reached, the pair is labeled "neutral".
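The S1 augmentation step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the tiny hard-coded hypernym table (standing in for WordNet lookups), the function name `s1_axiom`, and the exact TPTP formula names are all assumptions.

```python
# Sketch of an S1-style semantic augmentation step. The hypernym table
# below is a toy stand-in for WordNet; formula naming is illustrative.
HYPERNYMS = {("dog", "animal"), ("guitar", "instrument")}  # assumed toy data

def s1_axiom(n1: str, n2: str) -> str:
    """Return a TPTP fof axiom relating two noun predicates.

    If (n1, n2) is a hypernym pair, emit  ! [X] : (n1(X) => n2(X))
    (the positive implication for set A); otherwise emit the negative
    variant  ! [X] : (n1(X) => ~n2(X))  (for set N).
    """
    if (n1, n2) in HYPERNYMS:
        return f"fof({n1}_{n2}, axiom, ! [X] : ({n1}(X) => {n2}(X)))."
    return f"fof({n1}_not_{n2}, axiom, ! [X] : ({n1}(X) => ~{n2}(X)))."

print(s1_axiom("dog", "animal"))
# fof(dog_animal, axiom, ! [X] : (dog(X) => animal(X))).
print(s1_axiom("dog", "guitar"))
# fof(dog_not_guitar, axiom, ! [X] : (dog(X) => ~guitar(X))).
```

The generated axioms would simply be concatenated with P and H before re-running the prover.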
The syntactic rules address several recurring obstacles: (R1) noun-phrase adjective chaining, (R2) coreference resolution (replacing pronouns with unique proper-noun identifiers prefixed by "p:"), (R3) conversion of past-tense verbs to present, (R4) spelling out cardinal and ordinal numbers, (R5) removal of predeterminers, (R6) swapping adverbial phrases that precede verbs, (R7) normalizing "but"/"yet" conjunctions to "and", and (R8) simplifying present/past continuous constructions. These transformations preserve the original meaning as much as possible while making the sentences acceptable to ACE.
Empirically, the authors evaluate on a 10,000-pair development subset of SNLI. Without any rewrite rules, APE parses only 7.06% of sentences; after applying the syntactic rules, coverage rises to 16.61%. For the subset that is successfully parsed, the prover's entailment predictions are perfectly accurate (100% precision), but overall classification accuracy remains low at roughly 28.7%, far below current deep-learning baselines and even below random guessing. The low accuracy is largely due to the fact that ACE-generated formulas are predominantly positive descriptions and rarely contain negation, making contradiction detection difficult. Adding the semantic hypernym rules modestly improves contradiction prediction to 37.5% accuracy, still only marginally better than chance. The authors also discover several mislabeled examples in SNLI, illustrating that a logic-based approach can expose annotation errors that are invisible to statistical models.
In conclusion, the study demonstrates that a logic-centric NLI system can dramatically increase the proportion of sentences that can be formally reasoned about through relatively simple syntactic rewrites, but the overall performance is still far from state-of-the-art neural models. The authors suggest future work on refining rewrite rules, incorporating richer semantic resources (e.g., frame semantics, predicate-argument structures), and hybridizing with natural-logic systems such as LangPro, Monolog, or NaturalLI. They also propose using the presented pipeline as a diagnostic tool for dataset quality and as a baseline for subsequent logic-based NLI research.