
arxiv: 2604.17134 · v2 · pith:XWO4ZVOJnew · submitted 2026-04-18 · 💻 cs.CL

RoIt-XMASA: Multi-Domain Multilingual Sentiment Analysis Dataset for Romanian and Italian

Pith reviewed 2026-05-25 06:34 UTC · model grok-4.3

classification 💻 cs.CL
keywords multilingual sentiment analysis · cross-domain · adversarial training · Romanian · Italian · dataset · XLM-R · meta-learning

The pith

A new dataset and meta-learned adversarial method improve cross-lingual sentiment analysis for Romanian and Italian.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents RoIt-XMASA, a dataset of 36,000 labeled reviews in Romanian and Italian across books, movies, and music domains, plus over 200,000 unlabeled samples. It introduces a multi-target adversarial training approach that uses loss reversal guided by meta-learned coefficients to help models focus on sentiment while becoming invariant to language and domain shifts. With this method, the XLM-R model reaches an F1-score of 66.23 percent, which is 4.64 percent higher than the standard baseline. The authors also evaluate few-shot performance of Llama-3.1-8B at 58.43 percent F1, highlighting differences between fine-tuning and prompting. Readers interested in practical multilingual NLP applications would care because cross-domain and cross-language sentiment detection is a persistent challenge that limits many real-world systems.

Core claim

The central claim is that the RoIt-XMASA dataset, extending prior Amazon sentiment data to Italian and Romanian, combined with a multi-target adversarial training framework using loss reversal and meta-learned coefficients, enables XLM-R to achieve 66.23% F1-score on the task of cross-lingual and cross-domain sentiment analysis, surpassing the baseline by 4.64%.
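
The abstract's "by 4.64%" can be read as absolute F1 points or as a relative gain; the referee summary further down this page reads it as points. A small sketch of the baseline score each reading would imply (variable names are illustrative, and neither derived baseline is quoted from the paper):

```python
# Implied baseline F1 under two readings of "outperforming the baseline by 4.64%".
# Only 66.23 and 4.64 come from the abstract; the derived baselines are inferences.
reported_f1 = 66.23
reported_gain = 4.64

# Reading 1: the gain is in absolute F1 points.
baseline_if_points = round(reported_f1 - reported_gain, 2)

# Reading 2: the gain is relative to the baseline score.
baseline_if_relative = round(reported_f1 / (1 + reported_gain / 100), 2)

print(baseline_if_points, baseline_if_relative)
```

The two readings differ by well over a point, which is why an explicit baseline figure matters for verifying the claim.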

What carries the argument

Multi-target adversarial training framework with loss reversal and meta-learned coefficients for balancing sentiment discrimination against domain and language invariance.
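
As a minimal sketch of what loss reversal means for a shared encoder, the toy below uses one scalar encoder weight `w` and two linear heads, all hypothetical names. The sentiment gradient reaches the encoder unchanged, while the domain gradient has its sign flipped and is scaled by a coefficient `lam`. This is the generic gradient-reversal idea, assumed here for illustration; the paper's exact losses, heads, and coefficients may differ.

```python
def squared_error_grad(pred, target):
    """Derivative of (pred - target)**2 with respect to pred."""
    return 2.0 * (pred - target)


def encoder_gradient(w, a, b, x, y_sent, y_dom, lam):
    """Gradient applied to the shared encoder weight w under loss reversal.

    feature = w * x feeds a sentiment head (a * feature) and a domain head
    (b * feature). The domain term is subtracted, so a descent step on w
    lowers the sentiment loss while raising the domain loss.
    """
    feature = w * x
    grad_sent = squared_error_grad(a * feature, y_sent) * a * x
    grad_dom = squared_error_grad(b * feature, y_dom) * b * x
    return grad_sent - lam * grad_dom


# Toy values, chosen only for illustration.
plain = encoder_gradient(w=1.0, a=1.0, b=1.0, x=1.0, y_sent=2.0, y_dom=0.0, lam=0.0)
reversed_ = encoder_gradient(w=1.0, a=1.0, b=1.0, x=1.0, y_sent=2.0, y_dom=0.0, lam=0.5)
```

With `lam = 0` the encoder sees only the sentiment gradient; raising `lam` pushes it harder toward features the domain head cannot exploit.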

If this is right

  • The proposed framework dynamically balances multiple objectives without manual coefficient tuning.
  • XLM-R with the approach outperforms baseline training on the new dataset.
  • Few-shot prompting with Llama-3.1-8B offers an alternative to fine-tuning, at a lower 58.43% F1.
  • The dataset covers three domains and two languages to test invariance.
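
For reference, a minimal macro-averaged F1 in plain Python. Whether the paper reports macro, micro, or weighted F1 is not stated on this page, so the averaging scheme and the label strings below are assumptions.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels, for illustration only.
gold = ["pos", "neg", "neg", "pos"]
pred = ["pos", "neg", "pos", "pos"]
score = macro_f1(gold, pred)
```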

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the meta-learning balances invariance well, similar techniques could apply to other multi-objective NLP tasks like translation or classification across many languages.
  • With 202,141 unlabeled samples, the dataset could support semi-supervised methods beyond the supervised results shown.
  • The performance difference suggests that task-specific fine-tuning remains superior for accuracy in this setting compared to prompting.

Load-bearing premise

The meta-learned coefficients successfully balance sentiment discrimination against domain and language invariance on held-out data without instability or overfitting to the particular training distribution.
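
One common way to learn such a coefficient is a one-step lookahead: take a trial training step with the current coefficient, measure a held-out loss, and move the coefficient in the direction that lowers it. The sketch below does this on toy quadratic losses with a finite-difference hypergradient. Every function and constant here is a hypothetical stand-in, and the paper's actual meta-learning procedure is not described on this page.

```python
def train_grad(w, lam):
    """Encoder gradient with loss reversal on toy quadratic losses."""
    grad_sent = 2.0 * (w - 2.0)   # sentiment loss (w - 2)^2
    grad_dom = 2.0 * w            # domain loss w^2, reversed below
    return grad_sent - lam * grad_dom


def lookahead(w, lam, lr=0.1):
    """One trial training step using coefficient lam."""
    return w - lr * train_grad(w, lam)


def val_loss(w):
    """Held-out sentiment loss (toy): (w - 2.5)^2."""
    return (w - 2.5) ** 2


def meta_step(w, lam, meta_lr=0.5, eps=1e-4):
    """Update lam by descending the held-out loss after a lookahead step."""
    hyper_grad = (val_loss(lookahead(w, lam + eps))
                  - val_loss(lookahead(w, lam - eps))) / (2 * eps)
    return lam - meta_lr * hyper_grad


w, lam = 1.0, 0.1
new_lam = meta_step(w, lam)
```

The premise above amounts to requiring that updates like `meta_step` keep helping on data the coefficient was not tuned on, which this toy cannot show.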

What would settle it

A test where the model is evaluated on entirely new domains or languages not seen during meta-learning, checking if the F1 improvement holds or if training becomes unstable due to the coefficients.
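
A minimal sketch of such a held-out setup, splitting by whole (language, domain) cells so that nothing from a reserved cell is seen during meta-learning. With only two languages and three domains in the dataset, reserving cells is a weaker proxy for a truly unseen language or domain. The language codes, reserved cells, and record fields are hypothetical.

```python
LANGUAGES = ["ro", "it"]
DOMAINS = ["books", "movies", "music"]

# Assumed choice of reserved cells: one domain per language.
HELD_OUT = {("ro", "music"), ("it", "books")}


def split_by_cell(records, held_out=HELD_OUT):
    """Send whole (language, domain) cells to the held-out set; the rest is seen."""
    seen, unseen = [], []
    for record in records:
        cell = (record["language"], record["domain"])
        (unseen if cell in held_out else seen).append(record)
    return seen, unseen


# Placeholder records covering every cell once.
records = [{"language": lang, "domain": dom} for lang in LANGUAGES for dom in DOMAINS]
seen, unseen = split_by_cell(records)
```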

Figures

Figures reproduced from arXiv: 2604.17134 by Andrei-Marius Avram, Aureliu Valentin Antonie, Cosmin-Mircea Croitoru, Dumitru-Clementin Cercel, Vlad Andrei Muntean.

Figure 1: Rating distribution across the RoIt-XMASA dataset. Left: overall rating distribution; Center: rating distribution by ...
Figure 2: Token distribution histograms for the text and title fields.
Figure 3: Token distribution of the text in RoIt-XMASA, grouped by language.
Figure 4: Token distribution of the title in RoIt-XMASA, grouped by language.
Figure 5: Token distribution of the text in RoIt-XMASA, grouped by domain.
Figure 6: Token distribution of the title in RoIt-XMASA, grouped by domain.
Original abstract

We present RoIt-XMASA, a multilingual dataset that extends the Cross-lingual Multi-domain Amazon Sentiment Analysis to Italian and Romanian, comprising 36,000 labeled reviews across three domains (books, movies, and music) and 202,141 unlabeled samples. To address cross-lingual and cross-domain challenges, we propose a multi-target adversarial training framework that employs loss reversal with meta-learned coefficients to dynamically balance sentiment discrimination with domain and language invariance. XLM-R achieves an F1-score of 66.23% with our approach, outperforming the baseline by 4.64%. Few-shot evaluation shows that Llama-3.1-8B achieves 58.43% F1-score, revealing a meaningful trade-off between the efficiency of prompting-based approaches and the higher performance of task-specific fine-tuning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces RoIt-XMASA, a multilingual multi-domain sentiment analysis dataset extending prior cross-lingual Amazon review resources to Italian and Romanian. It contains 36,000 labeled reviews across books, movies, and music plus 202,141 unlabeled samples. The authors propose a multi-target adversarial training framework that applies loss reversal with meta-learned coefficients to balance sentiment discrimination against domain and language invariance. They report that XLM-R fine-tuned under this framework reaches 66.23% F1, outperforming a baseline by 4.64 points, while few-shot prompting with Llama-3.1-8B achieves 58.43% F1.

Significance. If the meta-optimization procedure is shown to generalize via proper held-out validation, the dataset would constitute a useful public resource for cross-lingual and cross-domain sentiment analysis, and the framework would demonstrate a concrete method for trading off multiple invariance objectives. The explicit efficiency-performance comparison between fine-tuning and prompting is also of practical value. The contribution is primarily empirical and resource-oriented rather than theoretical.

major comments (2)
  1. [§4] §4 (Proposed Framework): The description of the meta-optimization loop for the loss-reversal coefficients does not specify whether a separate meta-validation set is maintained that respects the three-domain / two-language partition. Without such separation, the reported 4.64-point gain over baseline cannot be confidently attributed to stable trade-offs rather than memorization of the training distribution.
  2. [§5] §5 (Experiments) and associated result tables: The headline XLM-R F1 of 66.23% and the baseline comparison are presented without error bars, statistical significance tests, explicit data-split definitions, or baseline implementation details. These omissions make the central performance claim impossible to verify or reproduce from the reported information.
minor comments (2)
  1. [Abstract] Abstract: The phrase 'with our approach' is used without a one-sentence characterization of the meta-learned loss-reversal mechanism, which would improve readability for a broad audience.
  2. [§3] Dataset description: The split between labeled and unlabeled portions across languages and domains is stated in aggregate; a per-language/domain breakdown table would clarify coverage.
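
A per-language/domain breakdown of the kind requested here is a simple group count. A minimal sketch, assuming each record carries hypothetical `language`, `domain`, and `labeled` fields; the released dataset's actual column names may differ, and the sample records are placeholders, not real counts.

```python
from collections import Counter


def coverage_table(records):
    """Count records per (language, domain, labeled) cell."""
    return Counter((r["language"], r["domain"], r["labeled"]) for r in records)


# Placeholder records, not taken from RoIt-XMASA.
records = [
    {"language": "ro", "domain": "books", "labeled": True},
    {"language": "ro", "domain": "books", "labeled": False},
    {"language": "it", "domain": "music", "labeled": True},
    {"language": "it", "domain": "music", "labeled": True},
]
table = coverage_table(records)
for (language, domain, labeled), count in sorted(table.items()):
    print(language, domain, "labeled" if labeled else "unlabeled", count)
```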

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which highlight important aspects of clarity and reproducibility. We address each major comment below and indicate the revisions we will incorporate.

Point-by-point responses
  1. Referee: [§4] §4 (Proposed Framework): The description of the meta-optimization loop for the loss-reversal coefficients does not specify whether a separate meta-validation set is maintained that respects the three-domain / two-language partition. Without such separation, the reported 4.64-point gain over baseline cannot be confidently attributed to stable trade-offs rather than memorization of the training distribution.

    Authors: We agree that §4 does not explicitly describe the meta-validation procedure. In the revised manuscript we will expand this section to state that a held-out meta-validation set was maintained throughout meta-optimization, constructed by reserving entire domain-language combinations (one domain per language) so that the three-domain / two-language structure is preserved. This separation ensures the learned coefficients reflect stable trade-offs rather than training-distribution memorization, directly addressing the concern about attribution of the 4.64-point improvement.
    revision: yes

  2. Referee: [§5] §5 (Experiments) and associated result tables: The headline XLM-R F1 of 66.23% and the baseline comparison are presented without error bars, statistical significance tests, explicit data-split definitions, or baseline implementation details. These omissions make the central performance claim impossible to verify or reproduce from the reported information.

    Authors: We acknowledge these omissions limit verifiability. In the revised version we will (i) report mean and standard deviation F1 scores over five random seeds, (ii) add paired statistical significance tests against the baseline, (iii) provide explicit train/validation/test split definitions that document the domain-language partitioning, and (iv) include additional baseline implementation details (hyperparameters, optimizer settings, and a pointer to the released code). These additions will make the central claims reproducible.
    revision: yes
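
The reporting promised in (i) and (ii) can be sketched with the standard library alone. The per-seed scores below are made-up placeholders, not results from the paper, and the exact sign-flip permutation test is one possible choice of paired test; the authors do not name theirs.

```python
from itertools import product
from statistics import mean, stdev

# Placeholder per-seed F1 scores (five seeds); NOT values from the paper.
baseline = [60.0, 61.0, 62.0, 61.0, 60.0]
proposed = [65.0, 66.0, 66.0, 67.0, 65.0]


def paired_sign_flip_p(a, b):
    """Exact two-sided sign-flip permutation test on paired differences."""
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs))
    flips = list(product([1, -1], repeat=len(diffs)))
    extreme = sum(
        abs(sum(s * d for s, d in zip(signs, diffs))) >= observed - 1e-12
        for signs in flips
    )
    return extreme / len(flips)


print(f"baseline {mean(baseline):.2f} +/- {stdev(baseline):.2f}")
print(f"proposed {mean(proposed):.2f} +/- {stdev(proposed):.2f}")
p_value = paired_sign_flip_p(proposed, baseline)
```

With five seeds the smallest two-sided p-value this test can produce is 2/32 = 0.0625, so it cannot reach the conventional 0.05 level without more seeds.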

Circularity Check

0 steps flagged

No significant circularity; empirical results on new dataset are independent of inputs

full rationale

The paper creates a new labeled dataset (RoIt-XMASA) and reports XLM-R F1 of 66.23% under a proposed multi-target adversarial framework using meta-learned loss coefficients. This is a standard empirical evaluation against a stated baseline on held-out reviews; the performance number is not shown to reduce by construction to any fitted parameter, self-citation, or renamed input. No equations, uniqueness theorems, or ansatzes are invoked that collapse the central claim into the training distribution itself. The meta-learning step is an explicit component of the method rather than a hidden tautology.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on standard assumptions of adversarial domain adaptation in NLP and on the representativeness of the newly constructed labeled reviews.

free parameters (1)
  • meta-learned coefficients
    Coefficients learned on the fly to weight the competing loss terms; their values are not fixed in advance.
axioms (1)
  • domain assumption: Adversarial training with loss reversal can produce representations that are invariant to language and domain while remaining discriminative for sentiment.
    Invoked to justify the multi-target framework.
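
One standard way to write an objective of this kind follows the generic gradient-reversal formulation. It is offered here only as an assumption about the general shape, not as the paper's equation. All symbols are introduced for illustration: shared encoder parameters $\theta_f$, sentiment head $\theta_s$, domain and language discriminators $\theta_d$ and $\theta_\ell$, and coefficients $\lambda_d, \lambda_\ell \ge 0$.

```latex
\begin{aligned}
(\hat\theta_f, \hat\theta_s) &= \arg\min_{\theta_f,\theta_s}\;
  \mathcal{L}_{\text{sent}}(\theta_f,\theta_s)
  - \lambda_d\,\mathcal{L}_{\text{dom}}(\theta_f,\hat\theta_d)
  - \lambda_\ell\,\mathcal{L}_{\text{lang}}(\theta_f,\hat\theta_\ell), \\
\hat\theta_d &= \arg\min_{\theta_d}\;\mathcal{L}_{\text{dom}}(\hat\theta_f,\theta_d), \qquad
\hat\theta_\ell = \arg\min_{\theta_\ell}\;\mathcal{L}_{\text{lang}}(\hat\theta_f,\theta_\ell).
\end{aligned}
```

The axiom is the claim that a saddle point of this kind exists and is reachable in practice: an encoder that keeps $\mathcal{L}_{\text{sent}}$ low while leaving both discriminators near chance.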


