SSP-based construction of evaluation-annotated data for fine-grained aspect-based sentiment analysis

Changhoe Hwang; Eric Laporte; Gwanghoon Yoo; Jeesun Nam; Shinwoo Kim; Suwon Choi

arxiv: 2605.07446 · v1 · submitted 2026-05-08 · 💻 cs.CL · cs.LG

SSP-based construction of evaluation-annotated data for fine-grained aspect-based sentiment analysis

Suwon Choi , Shinwoo Kim , Changhoe Hwang , Gwanghoon Yoo , Eric Laporte , Jeesun Nam This is my paper

Pith reviewed 2026-05-11 01:57 UTC · model grok-4.3

classification 💻 cs.CL cs.LG

keywords Korean corpusaspect-based sentiment analysissemi-automatic annotationfinite-state transducersaspect-value pairse-commerce reviewssentiment analysisKorean language processing

0 comments

The pith

A semi-automatic symbolic propagation method using finite-state transducers builds a Korean dataset that lets models recognize aspect-value pairs in e-commerce reviews at F1 0.88-0.90.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper describes the creation of the Evaluation Annotated Dataset (EVAD), a Korean corpus for aspect-based sentiment analysis in fashion e-commerce reviews. It extends standard ABSA by adding aspect values (classified as unary, binary, or multiple) alongside topics and aspects, and applies Semi-Automatic Symbolic Propagation with finite-state transducer resources to label the data. KoBERT and KcBERT models trained on EVAD then identify aspect-value pairs with F1 scores of 0.88 and 0.90. A sympathetic reader would care because the approach aims to extract finer opinion details from real user reviews that mix sentiment and non-sentiment language.

Core claim

The authors construct an evaluation-annotated Korean corpus called EVAD by formalizing linguistic resources as finite-state transducers and applying semi-automatic symbolic propagation to label extended ABSA components, including aspect values of unary, binary, or multiple types; pre-trained models trained on this corpus then achieve robust recognition of aspect-value pairs.

What carries the argument

Semi-Automatic Symbolic Propagation (SSP) using Finite-State Transducer (FST) resources to annotate detailed ABSA components including aspect values.

If this is right

ABSA extended with aspect values can distinguish unary, binary, and multiple value types to extract more precise features from user opinions.
The annotated EVAD corpus supports training of language models that reach F1 0.88-0.90 on aspect-value pair recognition in Korean fashion reviews.
The method covers both sentiment and non-sentiment linguistic patterns in e-commerce text.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The SSP approach might reduce the cost of creating labeled data for fine-grained ABSA in other domains if similar transducer resources can be developed.
High model performance on the new labels suggests the added aspect-value distinctions could improve real-world tasks such as targeted review summarization.
Releasing the EVAD corpus would let other researchers test whether the same SSP pipeline works on different languages or review types.

Load-bearing premise

The SSP process with finite-state transducers produces annotations accurate and consistent enough for model training without substantial manual validation or error checks.

What would settle it

A manual audit of a random sample of EVAD annotations that finds frequent mismatches in aspect-value pair labels would show the SSP annotations are not reliable enough to support the reported model performance.

read the original abstract

We report the construction of a Korean evaluation-annotated corpus, hereafter called 'Evaluation Annotated Dataset (EVAD)', and its use in Aspect-Based Sentiment Analysis (ABSA) extended in order to cover e-commerce reviews containing sentiment and non-sentiment linguistic patterns. The annotation process uses Semi-Automatic Symbolic Propagation (SSP). We built extensive linguistic resources formalized as a Finite-State Transducer (FST) to annotate corpora with detailed ABSA components in the fashion e-commerce domain. The ABSA approach is extended, in order to analyze user opinions more accurately and extract more detailed features of targets, by including aspect values in addition to topics and aspects, and by classifying aspectvalue pairs depending whether values are unary, binary, or multiple. For evaluation, the KoBERT and KcBERT models are trained on the annotated dataset, showing robust performances of F1 0.88 and F1 0.90, respectively, on recognition of aspect-value pairs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

They built a new Korean fashion e-commerce ABSA corpus with aspect-value extensions via SSP/FST, but the F1 0.88/0.90 claims rest on unvalidated annotations.

read the letter

The main takeaway is that this paper ships a new Korean dataset (EVAD) for fine-grained ABSA on fashion reviews, extending the usual topic-aspect setup to include aspect values marked as unary, binary, or multiple. They use Semi-Automatic Symbolic Propagation with Finite-State Transducers to label the data at scale, then fine-tune KoBERT and KcBERT and report F1 scores of 0.88 and 0.90 on aspect-value pair recognition. That combination of language, domain, and value-type granularity looks new relative to the standard ABSA work they cite. The practical side is also useful: symbolic resources can cut down on pure manual labeling for a low-resource language like Korean in a narrow domain. They clearly put effort into building the FST rules for the extended components. The soft spot is exactly what the stress-test flags. The abstract gives no inter-annotator agreement, no sampled manual check on the propagated labels, and no error analysis. Without that, the F1 numbers are hard to read as evidence of model quality rather than possible annotation artifacts. There are also no baseline comparisons or details on train/test splits in the summary we have. This is a standard data-construction plus model run paper. It will mainly interest people working on Korean NLP, e-commerce sentiment tools, or anyone looking for examples of hybrid symbolic-neural annotation pipelines. A reader who needs a starting corpus for similar work could get value from the resource itself. The paper shows clear thinking about the task extension and honest use of existing BERT models, so it is coherent on its own terms. I would send it to peer review rather than desk reject, but the referees will need to press on the annotation validation before the performance numbers can be taken at face value.

Referee Report

2 major / 2 minor

Summary. The manuscript describes the creation of the Evaluation Annotated Dataset (EVAD) for Korean e-commerce reviews using Semi-Automatic Symbolic Propagation (SSP) with Finite-State Transducers (FST) to annotate fine-grained Aspect-Based Sentiment Analysis (ABSA) components, including topics, aspects, and aspect values (unary, binary, multiple). It then trains KoBERT and KcBERT models on this dataset, achieving F1 scores of 0.88 and 0.90 on aspect-value pair recognition.

Significance. If the SSP annotations prove accurate, this work contributes a scalable annotation method and a new Korean corpus for extended ABSA in the fashion e-commerce domain, enabling more nuanced extraction of aspect-value sentiments beyond standard topic-aspect pairs. The extensive FST resources for unary/binary/multiple value classification represent a concrete engineering contribution that could support reproducibility and further dataset expansion.

major comments (2)

[Abstract and Evaluation] Abstract and Evaluation section: The reported F1 scores of 0.88 (KoBERT) and 0.90 (KcBERT) on aspect-value pair recognition are presented without any details on train/test splits, number of annotated instances, inter-annotator agreement, error analysis of the propagated labels, or baseline comparisons, preventing assessment of whether the numbers reflect model performance or annotation artifacts.
[Annotation Process] Annotation Process section: The central claim that SSP with FST resources produces sufficiently accurate annotations for the extended ABSA components (including value classification) lacks any independent quantification such as sampled manual validation accuracy, IAA scores, or error rates on the propagated labels, which is load-bearing for interpreting the downstream F1 results.

minor comments (2)

[Abstract] The abstract uses the term 'robust performances' without defining the threshold or providing comparative context for the F1 scores.
[Introduction] Notation for the extended ABSA components (unary/binary/multiple values) would benefit from explicit examples or a table early in the paper to improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thorough review and valuable feedback on our manuscript describing the EVAD corpus and SSP-based annotation for extended Korean ABSA. We will revise the paper to incorporate additional details on the experimental setup and annotation validation as outlined below.

read point-by-point responses

Referee: [Abstract and Evaluation] Abstract and Evaluation section: The reported F1 scores of 0.88 (KoBERT) and 0.90 (KcBERT) on aspect-value pair recognition are presented without any details on train/test splits, number of annotated instances, inter-annotator agreement, error analysis of the propagated labels, or baseline comparisons, preventing assessment of whether the numbers reflect model performance or annotation artifacts.

Authors: We agree that the current presentation of results in the abstract and Evaluation section is insufficient for full assessment. In the revised manuscript we will expand the Evaluation section (and update the abstract accordingly) to report the train/test split (80/20 on the full EVAD), the exact number of annotated instances, inter-annotator agreement for the manual seed annotations, a sampled error analysis of the FST-propagated labels, and performance of at least one baseline (e.g., a rule-based or simpler BERT variant). These additions will clarify that the reported F1 scores reflect model performance on validated data rather than annotation artifacts. revision: yes
Referee: [Annotation Process] Annotation Process section: The central claim that SSP with FST resources produces sufficiently accurate annotations for the extended ABSA components (including value classification) lacks any independent quantification such as sampled manual validation accuracy, IAA scores, or error rates on the propagated labels, which is load-bearing for interpreting the downstream F1 results.

Authors: We acknowledge that independent quantification of SSP accuracy is essential to support the downstream results. Although the FST resources were constructed to enforce linguistic consistency, the revised manuscript will add a dedicated validation subsection reporting (1) accuracy on a manually checked random sample of 500 propagated instances, (2) IAA scores computed on the initial manually annotated seed set, and (3) observed error rates broken down by unary/binary/multiple value categories. This will provide the necessary evidence that the annotations are sufficiently reliable for training and evaluating the KoBERT and KcBERT models. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical F1 results are independent of construction inputs

full rationale

The paper constructs the EVAD corpus via SSP/FST resources for extended ABSA components and then trains KoBERT/KcBERT models on the resulting annotations, reporting direct empirical F1 scores (0.88/0.90) on aspect-value pair recognition. These scores are measured outcomes from standard supervised training and evaluation splits on the built dataset, not reductions of any prediction to the annotation process by definition, fitted-parameter renaming, or self-citation load-bearing. No equations, uniqueness theorems, ansatzes, or derivation chains appear in the manuscript text. The work is self-contained as a resource-plus-evaluation paper whose central claims can be externally verified or falsified against the released corpus and standard benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim depends on the reliability of the SSP annotation pipeline and the assumption that adding aspect values improves analysis accuracy; no free parameters are fitted in the reported results, and no new entities are postulated.

axioms (2)

domain assumption Finite-state transducers can capture the relevant linguistic patterns for detailed ABSA annotation in fashion reviews
Invoked when building the extensive linguistic resources for the SSP process.
domain assumption The extended ABSA scheme with aspect values yields more accurate user opinion analysis than standard topic-aspect models
Stated as the motivation for the extension without comparative evidence in the abstract.

pith-pipeline@v0.9.0 · 5481 in / 1455 out tokens · 32103 ms · 2026-05-11T01:57:52.495294+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

3 extracted references · 3 canonical work pages

[1]

In Proceedings of COLING

Opinosis: A Graph Based Approach to Abstractive Summarization of Highly Redundant Opinions. In Proceedings of COLING. Maurice Gross. 1989. La construction de dictionnaires électroniques. Annales des Télécommunications , 44(1-2), 4-19. Maurice Gross. 1997. The Construction of local grammars. Finite-State language processing, Roche & Schabes (eds.), the MIT...

work page 1989
[2]

Language & Information Society, 44, 89-125

Construction of Language Resources for Augmenting Intent-annotated Datasets Required for Training Chatbot NLU Models . Language & Information Society, 44, 89-125. Qingnan Jiang, Lei Chen, Ruifeng Xu, Xiang Ao, and Min Yang. 2019. A Challenge Dataset and Effective Models for Aspect -Based Sentiment Analysis. In Proceedings of EMNLP . Bing Liu. 2012. Sentim...

work page 2019
[3]

In Proceedings of CoRR

YASO: A New Benchmark for Targeted Sentiment Analysis. In Proceedings of CoRR. Sébastien Paumier. 2003. De la reconnaissance des formes linguistiques à l'analyse syntaxique. Ph.D. thesis, Université Paris-Est Marne-la-Vallée, France. Haiyun Peng, Lu Xu, Lidong Bing, Fei Huang, Wei Lu, and Luo Si. 2020. Knowing What, How and Why: A Near Complete Solution f...

work page 2003

[1] [1]

In Proceedings of COLING

Opinosis: A Graph Based Approach to Abstractive Summarization of Highly Redundant Opinions. In Proceedings of COLING. Maurice Gross. 1989. La construction de dictionnaires électroniques. Annales des Télécommunications , 44(1-2), 4-19. Maurice Gross. 1997. The Construction of local grammars. Finite-State language processing, Roche & Schabes (eds.), the MIT...

work page 1989

[2] [2]

Language & Information Society, 44, 89-125

Construction of Language Resources for Augmenting Intent-annotated Datasets Required for Training Chatbot NLU Models . Language & Information Society, 44, 89-125. Qingnan Jiang, Lei Chen, Ruifeng Xu, Xiang Ao, and Min Yang. 2019. A Challenge Dataset and Effective Models for Aspect -Based Sentiment Analysis. In Proceedings of EMNLP . Bing Liu. 2012. Sentim...

work page 2019

[3] [3]

In Proceedings of CoRR

YASO: A New Benchmark for Targeted Sentiment Analysis. In Proceedings of CoRR. Sébastien Paumier. 2003. De la reconnaissance des formes linguistiques à l'analyse syntaxique. Ph.D. thesis, Université Paris-Est Marne-la-Vallée, France. Haiyun Peng, Lu Xu, Lidong Bing, Fei Huang, Wei Lu, and Luo Si. 2020. Knowing What, How and Why: A Near Complete Solution f...

work page 2003