
arxiv: 1907.04744 · v1 · pith:2INM4VYAnew · submitted 2019-07-10 · 💻 cs.CL

Modeling Semantic Compositionality with Sememe Knowledge

Pith reviewed 2026-05-24 23:52 UTC · model grok-4.3

classification 💻 cs.CL
keywords semantic compositionality · sememes · HowNet · multiword expressions · word representations · knowledge integration · natural language processing

The pith

Incorporating sememe knowledge from HowNet significantly improves models of semantic compositionality for multiword expressions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that sememes, the smallest semantic units of language, can be used to strengthen models of how word meanings combine into larger phrases. Prior approaches to semantic compositionality have emphasized complex mathematical functions while rarely drawing on external knowledge sources. The authors run a confirmatory test and then integrate sememe information drawn from the HowNet knowledge base into existing composition models. They apply the resulting models to the task of building representations for multiword expressions and report clear gains over baselines that ignore sememe knowledge. Intrinsic and extrinsic evaluations plus case studies support the benefit of this addition.

Core claim

We verify the effectiveness of sememes in modeling semantic compositionality by a confirmatory experiment and make the first attempt to incorporate sememe knowledge from HowNet into SC models. When these models are used to learn representations of multiword expressions, they achieve significant performance improvements over baseline methods that do not consider sememe knowledge, as measured by both intrinsic and extrinsic evaluations.

What carries the argument

Sememe knowledge drawn from HowNet and inserted into semantic compositionality functions for enriching multiword expression representations.
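As a rough illustration of what "inserting sememe knowledge into a composition function" can mean, the sketch below extends plain additive composition with the mean of the constituents' sememe embeddings. It is a minimal sketch under stated assumptions, not the paper's architecture: the function names, the `alpha` weight, and the toy sememe labels are all hypothetical.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Vector], dim: int) -> Vector:
    """Element-wise mean of the vectors; a zero vector if none are given."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def compose_with_sememes(
    w1: Vector,
    w2: Vector,
    sememes1: Sequence[str],
    sememes2: Sequence[str],
    sememe_emb: Dict[str, Vector],
    alpha: float = 0.5,  # hypothetical weight on the sememe signal
) -> Vector:
    """Additive composition plus a weighted mean of sememe embeddings.

    Assumption: word and sememe embeddings share one dimensionality.
    Sememes missing from `sememe_emb` are skipped.
    """
    dim = len(w1)
    labels = list(sememes1) + list(sememes2)
    sememe_vectors = [sememe_emb[s] for s in labels if s in sememe_emb]
    s_mean = mean_vector(sememe_vectors, dim)
    return [a + b + alpha * s for a, b, s in zip(w1, w2, s_mean)]
```

With an empty sememe list the function falls back to ordinary additive composition, which makes it easy to compare a sememe-aware variant against a baseline that ignores sememes.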

If this is right

  • Sememe knowledge produces measurable gains on both intrinsic and extrinsic evaluations of multiword expression representations.
  • Quantitative analysis and case studies confirm that sememe information aids semantic compositionality modeling.
  • The approach demonstrates that external sememe knowledge can be added to composition models without task-specific retuning.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same sememe integration technique could be tested on other compositionality problems such as full sentence embeddings.
  • If HowNet sememes prove useful, comparable gains might appear when other structured knowledge bases are substituted.
  • Sememe signals could help models distinguish compositional from non-compositional phrases such as idioms.

Load-bearing premise

Sememe annotations in HowNet are accurate, complete, and directly relevant to the compositionality task so that adding them improves performance without introducing noise.

What would settle it

An experiment that applies the same models and datasets but replaces HowNet sememe labels with random labels and finds no performance difference or a reversal of the reported gains.
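One way such a control could be set up is to permute the sememe sets across words, so that each word keeps a realistic number of sememes but loses its true annotation. The sketch below assumes the annotations are available as a plain word-to-sememe-list mapping; the function name and the mapping layout are hypothetical and do not reflect the format of the released data.

```python
import random
from typing import Dict, List


def shuffle_sememe_labels(
    word_to_sememes: Dict[str, List[str]], seed: int = 0
) -> Dict[str, List[str]]:
    """Return a copy of the mapping with sememe sets permuted across words.

    The overall distribution of sememe-set sizes is preserved, but the
    link between a word and its own sememes is broken.
    """
    rng = random.Random(seed)          # seeded for a reproducible control
    words = sorted(word_to_sememes)    # fixed order so the seed is meaningful
    sememe_sets = [list(word_to_sememes[w]) for w in words]
    rng.shuffle(sememe_sets)
    return dict(zip(words, sememe_sets))
```

Training the same models on the shuffled mapping and on the original one would then isolate the contribution of the annotations themselves from the effect of simply adding extra parameters.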

Original abstract

Semantic compositionality (SC) refers to the phenomenon that the meaning of a complex linguistic unit can be composed of the meanings of its constituents. Most related works focus on using complicated compositionality functions to model SC while few works consider external knowledge in models. In this paper, we verify the effectiveness of sememes, the minimum semantic units of human languages, in modeling SC by a confirmatory experiment. Furthermore, we make the first attempt to incorporate sememe knowledge into SC models, and employ the sememe-incorporated models in learning representations of multiword expressions, a typical task of SC. In experiments, we implement our models by incorporating knowledge from a famous sememe knowledge base HowNet and perform both intrinsic and extrinsic evaluations. Experimental results show that our models achieve significant performance boost as compared to the baseline methods without considering sememe knowledge. We further conduct quantitative analysis and case studies to demonstrate the effectiveness of applying sememe knowledge in modeling SC. All the code and data of this paper can be obtained on https://github.com/thunlp/Sememe-SC.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper proposes incorporating sememe knowledge from the HowNet knowledge base into semantic compositionality (SC) models, with a focus on learning representations for multiword expressions. It conducts a confirmatory experiment on sememes' effectiveness and implements sememe-augmented models, reporting performance gains over baselines in both intrinsic and extrinsic evaluations. Code and data are released on GitHub.

Significance. If the reported gains prove robust, the work demonstrates the value of external sememe knowledge for SC tasks and provides a reproducible baseline via the public code release, which supports verification and extension in knowledge-augmented NLP models.

major comments (1)
  1. [§4] §4 (experimental results): the central claim of a 'significant performance boost' lacks any reported statistical significance tests, error bars, exact baseline implementations, or data-split details, which directly undermines assessment of whether the gains are reliable and load-bearing for the paper's main contribution.
minor comments (2)
  1. [§3] The description of how sememes are integrated into the composition functions (likely §3) could include a clearer formal definition or pseudocode to aid replication.
  2. Table captions and axis labels in the results figures should explicitly state the evaluation metrics and number of runs for clarity.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed review and constructive feedback. We address the single major comment below and will revise the manuscript accordingly to strengthen the experimental reporting.

Point-by-point responses
  1. Referee: [§4] §4 (experimental results): the central claim of a 'significant performance boost' lacks any reported statistical significance tests, error bars, exact baseline implementations, or data-split details, which directly undermines assessment of whether the gains are reliable and load-bearing for the paper's main contribution.

    Authors: We agree that the absence of statistical significance tests, error bars, and more explicit details on baselines and splits weakens the presentation of results. In the revised version we will add paired t-tests (or equivalent) with p-values for all reported improvements, include error bars on performance figures where feasible, and expand §4 with precise descriptions of baseline re-implementations and the train/dev/test splits used. The public GitHub release already contains the code and data, but we will make the experimental protocol clearer in the text itself.

    Revision: yes
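For readers who want to see what the promised paired t-test amounts to, here is a minimal sketch using only the Python standard library. It computes the t statistic and degrees of freedom for per-run (or per-item) scores of two systems evaluated on the same units; the function name is hypothetical. Converting the statistic to a p-value needs the t distribution, which the standard library lacks, so in practice `scipy.stats.ttest_rel` is the usual route.

```python
import math
import statistics
from typing import Sequence, Tuple


def paired_t_statistic(
    scores_a: Sequence[float], scores_b: Sequence[float]
) -> Tuple[float, int]:
    """Return (t, degrees_of_freedom) for paired scores of systems A and B."""
    if len(scores_a) != len(scores_b):
        raise ValueError("paired test needs equally long score lists")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two paired observations")
    sd = statistics.stdev(diffs)  # sample standard deviation of differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1
```

The test is only meaningful when both systems are scored on the same runs or items, which is one reason the referee's request for explicit data splits matters.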

Circularity Check

0 steps flagged

No significant circularity

Full rationale

The paper presents an empirical modeling approach that incorporates external sememe annotations from HowNet into compositionality functions for multiword expressions, with performance gains shown via intrinsic and extrinsic evaluations against baselines. No derivation chain, uniqueness theorem, or prediction reduces by construction to self-defined quantities, fitted inputs, or self-citation load-bearing premises; results depend on external knowledge base and reported comparisons.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The approach depends on the external HowNet sememe knowledge base as an unexamined input and on standard neural composition functions whose hyperparameters are not detailed in the abstract. No new entities are postulated.

free parameters (1)
  • sememe integration hyperparameters
    Weights or gating parameters controlling how sememe vectors are combined with word vectors in the composition function; these are fitted during training.
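To make this ledger entry concrete, one hypothetical form such a gate could take is written out below. Every symbol is an assumption introduced for illustration and none is taken from the paper: $\mathbf{w}_1, \mathbf{w}_2$ are the constituent word vectors, $\mathbf{s}$ is an aggregated sememe vector of the same dimension, and $\mathbf{W}_c, \mathbf{W}_g, \mathbf{b}_c, \mathbf{b}_g$ are the fitted parameters.

```latex
\begin{aligned}
\mathbf{c} &= \tanh\!\left(\mathbf{W}_c\,[\mathbf{w}_1;\mathbf{w}_2] + \mathbf{b}_c\right) \\
\mathbf{g} &= \sigma\!\left(\mathbf{W}_g\,[\mathbf{w}_1;\mathbf{w}_2;\mathbf{s}] + \mathbf{b}_g\right) \\
\mathbf{p} &= \mathbf{g}\odot\mathbf{c} + (\mathbf{1}-\mathbf{g})\odot\mathbf{s}
\end{aligned}
```

Here $\mathbf{p}$ is the phrase representation. Entries of $\mathbf{g}$ close to 1 let the word-only composition $\mathbf{c}$ dominate, while entries close to 0 defer to the sememe signal, so the gate is exactly the kind of free parameter that training would fit.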
axioms (1)
  • domain assumption: Sememes are the minimum semantic units of human languages and HowNet provides a reliable mapping from words to sememes.
    Invoked in the abstract as the justification for using sememe knowledge to model SC.

pith-pipeline@v0.9.0 · 5726 in / 1193 out tokens · 19478 ms · 2026-05-24T23:52:43.647560+00:00 · methodology

