Event extraction based on open information extraction and ontology

Sihem Sahnoun

arxiv: 1907.00692 · v1 · pith:Q7HRI463new · submitted 2019-06-24 · 💻 cs.CL · cs.AI

Pith reviewed 2026-05-25 17:22 UTC · model grok-4.3

classification 💻 cs.CL cs.AI

keywords event extraction, open information extraction, ontology, relation extraction, entity recognition, natural language processing, adaptation techniques

The pith

An approach using open information extraction and ontology automates event extraction while reducing expert intervention in relations, entities, and reasoning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper describes extracting events from natural language texts by first applying an open information extraction system to pull out relationships and then using an ontology to model the events. It observes that existing two-level methods achieve good results but require extensive manual work to build classifiers. The proposed method makes relation extraction, entity recognition, and reasoning automatic through adaptation and correspondence techniques, thereby cutting down on expert involvement. The relevance of the results is checked through standard test metrics and a comparative study.

Core claim

The two-level event extraction approach performs well but requires substantial expert intervention to construct its classifiers. The thesis therefore proposes an approach that reduces expert intervention by making relation extraction, entity recognition, and reasoning automatic, relying on adaptation and correspondence techniques.

What carries the argument

Open information extraction system for relationship extraction combined with ontology-based event modeling, automated via adaptation and correspondence techniques.

If this is right

Event extraction becomes feasible for larger text collections without the time cost of building custom classifiers by hand.
The extracted events can be directly evaluated and compared using standard test metrics and cross-method studies.
The same automation pattern could apply to other information extraction subtasks that currently rely on expert classifier design.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The method may scale to domains where labeled training data or domain experts are scarce.
Combining the output events with downstream reasoning systems could become simpler once the extraction step is less labor-intensive.
Testing the adaptation techniques on languages other than the one used in the thesis experiments would reveal whether the automation generalizes.

Load-bearing premise

Adaptation and correspondence techniques can reliably automate relation extraction, entity recognition, and reasoning while preserving accuracy without introducing new unstated dependencies on expert knowledge.

What would settle it

An experiment that applies the automated approach and the expert-intensive two-level method to the same test corpus and finds substantially lower precision or recall for the automated version would falsify the claim of maintained performance with reduced intervention.
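The comparison described above can be sketched as follows. This is a minimal illustration, assuming events are represented as hashable (relation, argument, argument) tuples and scored by exact match against a gold annotation; the event tuples, the variable names, and the `prf1` helper are hypothetical and do not come from the thesis.

```python
from typing import Dict, Hashable, Set


def prf1(predicted: Set[Hashable], gold: Set[Hashable]) -> Dict[str, float]:
    """Exact-match precision, recall, and F1 of predicted events against gold."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical gold events for one shared test corpus (illustrative only).
gold_events = {
    ("acquire", "CompanyA", "CompanyB"),
    ("hire", "CompanyA", "PersonX"),
    ("visit", "PersonX", "CityY"),
    ("sue", "CompanyB", "CompanyC"),
}

# Hypothetical output of the automated approach: two correct, one wrong.
automated_events = {
    ("acquire", "CompanyA", "CompanyB"),
    ("hire", "CompanyA", "PersonX"),
    ("visit", "PersonX", "CityZ"),
}

# Hypothetical output of the expert-intensive two-level method: three correct.
expert_events = {
    ("acquire", "CompanyA", "CompanyB"),
    ("hire", "CompanyA", "PersonX"),
    ("visit", "PersonX", "CityY"),
}

automated_scores = prf1(automated_events, gold_events)
expert_scores = prf1(expert_events, gold_events)

# A large negative gap here would count against the "maintained performance" claim.
f1_gap = automated_scores["f1"] - expert_scores["f1"]
```

Scoring both systems against the same gold set is what makes the gap interpretable; how large a gap counts as "substantially lower" is a threshold the experimenter would have to fix in advance.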

Original abstract

The work presented in this master thesis consists of extracting a set of events from texts written in natural language. For this purpose, we have based ourselves on the basic notions of the information extraction as well as the open information extraction. First, we applied an open information extraction (OIE) system for the relationship extraction, to highlight the importance of OIEs in event extraction, and we used the ontology to the event modeling. We tested the results of our approach with test metrics. As a result, the two-level event extraction approach has shown good performance results but requires a lot of expert intervention in the construction of classifiers and this will take time. In this context we have proposed an approach that reduces the expert intervention in the relation extraction, the recognition of entities and the reasoning which are automatic and based on techniques of adaptation and correspondence. Finally, to prove the relevance of the extracted results, we conducted a set of experiments using different test metrics as well as a comparative study.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Master's thesis applies OIE plus ontology to event extraction and claims adaptation techniques cut expert work, but supplies no metrics or method details to back the automation or performance assertions.

The main takeaway is that this master's thesis takes standard open information extraction for relations, layers on ontology modeling for events, and asserts that adaptation and correspondence techniques can automate relation extraction, entity recognition, and reasoning while cutting expert intervention. The abstract frames the initial two-level pipeline as workable but time-consuming for classifier construction, then positions the new steps as automatic. That is the core pitch, and it is presented as an incremental practical improvement rather than a new framework.

Referee Report

3 major / 1 minor

Summary. The manuscript presents a master's thesis on extracting events from natural language texts. It first applies open information extraction (OIE) for relation extraction combined with ontology-based event modeling in a two-level approach, reporting good performance alongside high expert intervention for classifier construction. The author then proposes an alternative method that automates relation extraction, entity recognition, and reasoning via unspecified 'techniques of adaptation and correspondence' to reduce expert intervention, with relevance shown via test metrics and a comparative study.

Significance. If the automation techniques were shown to reduce expert intervention without introducing hidden dependencies or accuracy loss, the work could aid practical deployment of event extraction pipelines. The OIE-ontology integration is a reasonable starting point, but the absence of any numerical results, baseline comparisons, or definitions of the key techniques means the central performance and reduction claims currently lack grounding. No reproducible code, parameter-free derivations, or falsifiable predictions are present.

major comments (3)

[Abstract] Abstract: the assertion that the two-level approach 'has shown good performance results' supplies no metrics, baselines, error analysis, or implementation details, leaving the performance claim unsupported by visible evidence.
[Abstract] Abstract: the claim that the new approach reduces expert intervention because relation extraction, entity recognition, and reasoning 'are automatic and based on techniques of adaptation and correspondence' provides neither definitions nor pseudocode for these techniques, nor any quantification (e.g., rules authored or hours spent) of intervention before versus after, so the reduction cannot be assessed.
[Abstract] Abstract / Experiments: the comparative study and 'different test metrics' are mentioned but no tables, numerical values, or specific metrics (precision, recall, F1, etc.) appear, preventing evaluation of whether the automation preserves accuracy.

minor comments (1)

[Abstract] Abstract contains minor grammatical awkwardness (e.g., 'we have based ourselves on the basic notions') that could be tightened for clarity.

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for the detailed and constructive comments. We agree that the abstract requires more specific evidence to support its claims and will revise accordingly to improve clarity and grounding.

Referee: [Abstract] Abstract: the assertion that the two-level approach 'has shown good performance results' supplies no metrics, baselines, error analysis, or implementation details, leaving the performance claim unsupported by visible evidence.

Authors: We agree the abstract lacks supporting numbers. The thesis performed evaluation with standard metrics on a held-out test set; we will revise the abstract to report the specific precision, recall, and F1 values along with the baseline systems used. revision: yes

Referee: [Abstract] Abstract: the claim that the new approach reduces expert intervention because relation extraction, entity recognition, and reasoning 'are automatic and based on techniques of adaptation and correspondence' provides neither definitions nor pseudocode for these techniques, nor any quantification (e.g., rules authored or hours spent) of intervention before versus after, so the reduction cannot be assessed.

Authors: We will add concise definitions of the adaptation and correspondence techniques plus a short pseudocode sketch to the revised abstract. However, the original work did not record quantitative measures of expert effort (hours or number of rules), so we cannot supply before/after numbers; the reduction claim will be presented as qualitative. revision: partial

Referee: [Abstract] Abstract / Experiments: the comparative study and 'different test metrics' are mentioned but no tables, numerical values, or specific metrics (precision, recall, F1, etc.) appear, preventing evaluation of whether the automation preserves accuracy.

Authors: We agree that tables and concrete metric values are absent. We will insert a results table reporting the test metrics and the comparative study outcomes in the revised manuscript so that accuracy preservation can be directly evaluated. revision: yes

standing simulated objections not resolved

Quantification of expert intervention reduction (hours spent or rules authored before versus after automation)

Circularity Check

0 steps flagged

Empirical NLP proposal with no mathematical derivations or self-referential reductions

The supplied text (abstract plus description) contains no equations, no fitted parameters presented as predictions, and no self-citations or uniqueness theorems. The central claim is an empirical assertion that adaptation and correspondence techniques automate relation extraction, entity recognition, and reasoning while reducing expert intervention; this is presented as validated by test metrics and comparative experiments rather than derived by construction from its own inputs. No load-bearing step reduces to a prior result by the same authors or to a renamed known pattern. The work is therefore self-contained against external benchmarks for the purpose of circularity analysis.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are described in the abstract; the work relies on standard NLP components whose details are not provided.

pith-pipeline@v0.9.0 · 5686 in / 1061 out tokens · 20395 ms · 2026-05-25T17:22:39.896557+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

Relation between the paper passage and the cited Recognition theorem.

The recognition phase combines four steps: OIE for relation extraction, NER, ontology input adaptation and reasoning... tokens will be added as instances to the ontology and linked by relations if... number of named entities ≥2 and lemmatized verb included in ontology relations.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

Relation between the paper passage and the cited Recognition theorem.

Our approach admits 2 phases... learning phase is the modeling of an event by an ontology and constructing a set of rules... recognition phase which includes the RE, the NER and an automatic reasoning between learning rules and an input ontology adaptation.
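
The condition quoted in the paper passages above (at least two named entities, and a lemmatized verb that appears among the ontology relations) can be sketched as a simple filter over OIE triples. This is only an illustration under stated assumptions: the quoted passage is elided, so the sketch assumes the entity count is taken over the two arguments of each triple, that verbs are already lemmatized, and that NER output is a plain set of strings. The `Triple` class, the function names, and the sample data are hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Triple:
    """One OIE extraction: (subject, lemmatized verb, object)."""
    subject: str
    verb_lemma: str
    obj: str


def passes_gate(triple: Triple, named_entities: Set[str],
                ontology_relations: Set[str]) -> bool:
    """Assumed reading of the rule: both arguments are named entities
    and the lemmatized verb is a relation known to the ontology."""
    entity_args = [arg for arg in (triple.subject, triple.obj)
                   if arg in named_entities]
    return len(entity_args) >= 2 and triple.verb_lemma in ontology_relations


def populate(triples: Iterable[Triple], named_entities: Set[str],
             ontology_relations: Set[str]
             ) -> Tuple[Set[str], List[Tuple[str, str, str]]]:
    """Collect the instances and relation assertions that pass the gate."""
    instances: Set[str] = set()
    assertions: List[Tuple[str, str, str]] = []
    for triple in triples:
        if passes_gate(triple, named_entities, ontology_relations):
            instances.update((triple.subject, triple.obj))
            assertions.append((triple.subject, triple.verb_lemma, triple.obj))
    return instances, assertions


# Hypothetical inputs standing in for OIE output, NER output, and the ontology.
sample_relations = {"acquire", "attack"}
sample_entities = {"CompanyA", "CompanyB", "CityY"}
sample_triples = [
    Triple("CompanyA", "acquire", "CompanyB"),  # passes both conditions
    Triple("CompanyA", "announce", "CityY"),    # verb not in the ontology
    Triple("he", "attack", "CityY"),            # only one named entity
]

sample_instances, sample_assertions = populate(
    sample_triples, sample_entities, sample_relations
)
```

In this sketch only the first triple is added to the ontology. A real pipeline would also need the ontology input adaptation and reasoning steps named in the passage, which are not specified here and are left out.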

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.