User eXperience Perception Insights Dataset (UXPID): Synthetic User Feedback from Public Industrial Forums

Choro Ulan uulu; Fabian Ries; Filippos Petridis; Helena Holmström Olsson; Jan Bosch; Jan Joosten; Mikhail Kulyabin; Nuno Miguel Martins Pacheco

arXiv: 2509.11777 · v2 · submitted 2025-09-15 · cs.CL · cs.LG

Mikhail Kulyabin, Jan Joosten, Choro Ulan uulu, Nuno Miguel Martins Pacheco, Fabian Ries, Filippos Petridis, Jan Bosch, Helena Holmström Olsson

Pith reviewed 2026-05-18 17:13 UTC · model grok-4.3

keywords: user experience dataset · industrial forum feedback · LLM annotations · sentiment analysis · requirements extraction · anonymized dataset · UX insights · NLP for software engineering

The pith

The UXPID dataset supplies 7130 synthesized user feedback branches from a public industrial automation forum, each annotated by an LLM for UX insights, expectations, severity, sentiment, and topics.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces the User eXperience Perception Insights Dataset to tackle the difficulty of systematic analysis of customer feedback in industrial forums. Real forum content tends to be unstructured and domain-specific, while privacy and licensing rules block easy access to the original records. The authors extract 7130 feedback branches, anonymize and synthesize them, then attach LLM-generated labels covering UX insights, user expectations, severity ratings, sentiment, and topic classifications. The resulting JSON collection is positioned as training and evaluation material for transformer models that perform issue detection, sentiment analysis, and requirements extraction. A sympathetic reader would see value in a ready-made, shareable resource that lets researchers work on industrial UX problems without needing proprietary data.

Core claim

The paper presents UXPID as a collection of 7130 synthesized and anonymized user feedback branches extracted from a public industrial automation forum, each stored as a JSON record containing multi-post comments together with metadata and LLM annotations for UX insights, user expectations, severity ratings, sentiment, and topic classifications, thereby enabling research in user requirements, UX analysis, and AI-driven feedback processing where privacy and licensing restrictions limit access to real-world data.

What carries the argument

The UXPID dataset itself: a set of structured JSON records of multi-post forum comments enriched with LLM annotations across UX-related attributes.
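To make the shape of such a record concrete, the following is a minimal Python sketch. The field names (`branch_id`, `comments`, `annotation`, and the keys inside it) and the sample values are hypothetical, chosen only to mirror the attributes listed above; the published schema may well differ.

```python
import json
from dataclasses import dataclass
from typing import List


# NOTE: every field name here is a hypothetical stand-in, not the published schema.
@dataclass
class Annotation:
    ux_insights: List[str]
    user_expectations: List[str]
    severity: str
    sentiment: str
    topics: List[str]


@dataclass
class Branch:
    branch_id: str
    comments: List[str]  # the multi-post thread, one string per post
    annotation: Annotation


def parse_branch(raw: str) -> Branch:
    """Parse one JSON record; raises KeyError or TypeError if a field is missing."""
    data = json.loads(raw)
    return Branch(
        branch_id=data["branch_id"],
        comments=list(data["comments"]),
        annotation=Annotation(**data["annotation"]),
    )


# Invented sample record, for illustration only.
SAMPLE = json.dumps({
    "branch_id": "example-001",
    "comments": [
        "[user_name]: The update to [version_no] fails.",
        "[user_name]: Same here.",
    ],
    "annotation": {
        "ux_insights": ["update process is unreliable"],
        "user_expectations": ["updates install without manual steps"],
        "severity": "high",
        "sentiment": "negative",
        "topics": ["installation"],
    },
})

branch = parse_branch(SAMPLE)
print(branch.annotation.severity, len(branch.comments))
```

Typed parsing of this kind fails loudly when a record lacks an expected field, which is one way an adopter might catch schema drift early.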

If this is right

The dataset can be used directly to train and evaluate transformer models on issue detection and requirements extraction in technical forums.
It supplies labeled examples for sentiment analysis tasks specific to industrial product support discussions.
Researchers gain a public benchmark for studying how users articulate expectations and problems in automation contexts.
The resource lowers the barrier to developing AI tools that process forum feedback while respecting privacy constraints.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If the annotations hold up under scrutiny, similar LLM-assisted synthesis pipelines could be reused on forums from other technical fields.
The dataset could serve as seed data for training smaller models that then label much larger volumes of unlabeled forum posts.
Aggregated patterns from the severity and topic labels might inform product teams about recurring user pain points without reading every thread.
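The aggregation idea in the last point could be sketched roughly as follows. This is only an illustration: the record keys `topics` and `severity` and the label values are assumptions, not the dataset's documented field names.

```python
from collections import Counter


def top_pain_points(records, n=3):
    """Count (topic, severity) pairs across records and return the n most common."""
    counts = Counter()
    for rec in records:
        for topic in rec["topics"]:
            counts[(topic, rec["severity"])] += 1
    return counts.most_common(n)


# Invented toy records; keys and values are hypothetical.
records = [
    {"topics": ["licensing"], "severity": "high"},
    {"topics": ["licensing", "installation"], "severity": "high"},
    {"topics": ["installation"], "severity": "low"},
]

result = top_pain_points(records, n=1)
print(result)
```

Any such ranking inherits whatever errors the LLM labels contain, so it is better read as a pointer to threads worth inspecting than as a measurement.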

Load-bearing premise

The LLM-generated annotations for UX insights, severity, sentiment, and topics are accurate and unbiased enough to function as reliable training labels without systematic errors from the model or the synthesis process.

What would settle it

Independent human experts rating a random sample of the records and finding low agreement with the LLM labels on severity ratings or topic classifications would show that the annotations cannot be trusted as ground truth.
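One common way to quantify such agreement is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below is a plain-Python illustration of the statistic, not something taken from the paper; the label values are invented.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items on which both raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Invented example: human expert labels vs. LLM labels for severity.
human = ["high", "high", "low", "low"]
llm = ["high", "low", "low", "low"]
kappa = cohens_kappa(human, llm)
print(round(kappa, 3))
```

In this toy case the raters agree on 3 of 4 items (0.75) while chance agreement is 0.5, giving a kappa of 0.5.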

Figures

Figures reproduced from arXiv: 2509.11777 by Choro Ulan uulu, Fabian Ries, Filippos Petridis, Helena Holmström Olsson, Jan Bosch, Jan Joosten, Mikhail Kulyabin, Nuno Miguel Martins Pacheco.

**Figure 1.** Illustrates the general process for dataset creation. User comments were collected from the company's open public technical forum. Endpoints enable the structured storage of metadata, including user id, date and time of posting, titles, and the content of the comments themselves. For internal processing and analysis, the data are stored in JavaScript Object Notation (JSON) format.

**Figure 2.** Overview of the topic classification process. In parallel to this process, the data was anonymized by using the LLM to preserve privacy. The system prompt asked it to replace company names with "[company_name]", product names with "[product_name]", article numbers with "[article_no]", version numbers with "[version_no]", user names with "[user_name]", URLs with "[url]", and document names with "[document]…

**Figure 3.** Distribution of the branches by year (a), comments (b), severity (c), sentiment (d), type (e), and topic status (f). For all experiments, we utilized the distilbert-base-uncased configuration with a maximum sequence length of 512 tokens. The model architecture included 6 hidden layers with 768 dimensions each and a dropout rate of 0.4 to prevent overfitting. Our training and inference were performed on a T…

**Figure 4.** Example of the record structure from branch id 4661893102: content (a), metadata and analysis (b). … we applied class weighting techniques. Complete model training parameters are available in the configuration file within our repository.
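The excerpt attached to Figure 4 mentions class weighting without saying which scheme was used. As a hedged illustration only, the sketch below computes the widely used "balanced" heuristic, where each class weight is the total sample count divided by the number of classes times that class's count. Whether the authors used this formula is an assumption; the label names are invented.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return weight = n_samples / (n_classes * class_count) for every class."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Invented, imbalanced sentiment labels.
labels = ["negative"] * 6 + ["neutral"] * 3 + ["positive"] * 1
weights = balanced_class_weights(labels)
print(weights)
```

Rare classes receive larger weights, so errors on them cost more in a weighted loss during training.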

Original abstract

Customer feedback in industrial forums offers rich but underexplored insights into real-world product experience. Yet systematic analysis remains challenging due to unstructured, domain-specific content and the scarcity of high-quality labeled datasets. This paper presents the User eXperience Perception Insights Dataset (UXPID), a collection of 7130 synthesized and anonymized user feedback branches extracted from a public industrial automation forum. Each JSON record contains multi-post comments enriched with metadata and annotated by a large language model (LLM) for UX insights, user expectations, severity ratings, sentiment, and topic classifications. UXPID is designed to facilitate research in user requirements, user experience (UX) analysis, and AI-driven feedback processing, particularly where privacy and licensing restrictions limit access to real-world data. It supports the training and evaluation of transformer-based models for tasks such as issue detection, sentiment analysis, and requirements extraction in technical forums, providing a valuable resource for advancing NLP methods within industrial product support and software engineering domains.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper releases a new 7130-record dataset of LLM-annotated industrial forum feedback that fills a real data gap, but the complete absence of any human validation or error analysis on the labels is a load-bearing weakness.

The core offering here is UXPID: 7130 synthesized and anonymized feedback branches pulled from a public industrial automation forum, stored as JSON with metadata and LLM-generated tags for UX insights, user expectations, severity, sentiment, and topics. That is genuinely new in the cited literature, where labeled technical feedback from this domain has been scarce. The structure looks usable for training models on requirements extraction or issue detection, and the choice to work with public forum data to avoid privacy blocks is sensible. The paper does a straightforward job of describing the extraction and annotation pipeline in the abstract.

The main problem is that none of the LLM labels have any reported human check. There are no prompt details, no few-shot examples, no temperature settings, no inter-annotator agreement, and no error analysis against ground truth. In a specialized industrial setting, an LLM can easily mix up safety-critical severity with ordinary usability complaints, and without evidence that this did not happen the dataset's value as training material stays unproven. The central usefulness claim therefore rests on an assumption that has not been tested in the work.

This is the kind of resource that would interest people doing NLP for software engineering or industrial UX, especially if they need synthetic data to start experiments. A reader who already has access to similar forums might still prefer to label their own data rather than rely on unvalidated outputs.

The paper is coherent on its own terms and shows honest engagement with the data scarcity problem, so it is worth sending to a serious referee. The review should focus on whether the authors can add validation experiments or at least transparent methodology for the LLM step before the dataset is recommended for general use.

Referee Report

1 major / 1 minor

Summary. The paper presents the User eXperience Perception Insights Dataset (UXPID), a collection of 7130 synthesized and anonymized user feedback branches extracted from a public industrial automation forum. Each JSON record contains multi-post comments enriched with metadata and annotated by a large language model (LLM) for UX insights, user expectations, severity ratings, sentiment, and topic classifications. The dataset is positioned as a resource to support research in user requirements, UX analysis, and AI-driven feedback processing in technical forums where privacy and licensing restrictions limit real data access.

Significance. If the LLM annotations prove reliable, UXPID could help address the scarcity of labeled domain-specific data for training models on issue detection, sentiment analysis, and requirements extraction in industrial product support and software engineering. The use of public forum data combined with synthesis and anonymization to navigate privacy constraints is a constructive approach that may enable similar dataset efforts in other restricted domains.

major comments (1)

[Abstract] The manuscript claims that UXPID supplies a valuable resource for advancing NLP methods and supports training of transformer-based models, yet provides no description of prompt engineering details, few-shot examples, temperature settings, model version, or, most critically, any human evaluation, inter-annotator agreement metrics, or error analysis of the LLM labels for severity ratings, sentiment, and topic classifications. In a specialized industrial-automation domain, this leaves the central usefulness claim dependent on an untested assumption that the annotations are sufficiently accurate and free of systematic domain-specific errors.

minor comments (1)

[Abstract] The description of record structure would benefit from an explicit statement of the JSON schema or key fields to improve immediate usability for potential adopters.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our manuscript describing the UXPID dataset. We address the major comment point by point below and outline the revisions we will make to improve transparency regarding the LLM annotation process.

Referee: [Abstract] The manuscript claims that UXPID supplies a valuable resource for advancing NLP methods and supports training of transformer-based models, yet provides no description of prompt engineering details, few-shot examples, temperature settings, model version, or, most critically, any human evaluation, inter-annotator agreement metrics, or error analysis of the LLM labels for severity ratings, sentiment, and topic classifications. In a specialized industrial-automation domain, this leaves the central usefulness claim dependent on an untested assumption that the annotations are sufficiently accurate and free of systematic domain-specific errors.

Authors: We agree that the current manuscript would be strengthened by greater transparency on the annotation methodology. In the revised version, we will add a new subsection in the Methods section detailing the LLM model and version used, the full prompt templates, few-shot examples, and temperature settings applied during annotation. We will also include results from a human validation study conducted on a random sample of 300 records, reporting inter-annotator agreement (Cohen's kappa) for severity, sentiment, and topic labels along with a qualitative error analysis that examines potential domain-specific issues in industrial automation feedback. These additions will directly support the usefulness claims for NLP and transformer model training.

Revision: yes
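Drawing the random sample of 300 records mentioned in this response could look roughly like the sketch below. The seed value and the use of consecutive integers as record identifiers are assumptions made for illustration; the rebuttal does not describe the sampling procedure.

```python
import random


def draw_validation_sample(record_ids, k=300, seed=0):
    """Draw k distinct record ids without replacement, reproducibly for a given seed."""
    ids = list(record_ids)
    if k > len(ids):
        raise ValueError("sample size exceeds number of records")
    rng = random.Random(seed)  # local generator, leaves global random state untouched
    return sorted(rng.sample(ids, k))


# Hypothetical ids 0..7129 standing in for the 7130 branches.
sample = draw_validation_sample(range(7130), k=300, seed=42)
print(len(sample), sample[:5])
```

Fixing and reporting the seed lets other researchers audit exactly the same subset that the human annotators saw.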

Circularity Check

0 steps flagged

No significant circularity; dataset paper with no derivations or self-referential predictions

The paper presents UXPID as a new data resource: 7130 synthesized forum threads annotated by an external LLM for UX insights, severity, sentiment, and topics. No equations, predictive models, fitted parameters, or derivation chains are claimed or present. The abstract and description frame the work as data collection and enrichment rather than any result derived from the paper's own inputs. No self-citations function as load-bearing justifications for uniqueness or ansatzes, and the annotations are generated outside the paper rather than reduced to its own definitions. This is a standard honest data-release contribution whose value depends on external use and validation, not internal circular logic.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim depends on the fidelity of the LLM annotation step and the realism of the synthesis procedure; both are introduced without external validation benchmarks in the abstract.

axioms (1)

domain assumption: Large language models can produce accurate and unbiased annotations for UX insights, severity, sentiment, and topics on technical forum text.
The dataset construction relies on LLM labeling as the primary source of structured metadata without reported human verification or agreement metrics.

pith-pipeline@v0.9.0 · 5732 in / 1244 out tokens · 33368 ms · 2026-05-18T17:13:28.474248+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Each JSON record contains multi-post comments enriched with metadata and annotated by a large language model (LLM) for UX insights, user expectations, severity ratings, sentiment, and topic classifications.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

16 extracted references · 16 canonical work pages · 2 internal anchors

[1] Tasnim, M., Rayhan, M., Zhang, Z. & Poranen, T. A systematic literature review on requirements engineering practices and challenges in open-source projects. In 2023 49th Euromicro Conference on Software Engineering and Advanced Applications (SEAA), 278–285, 10.1109/SEAA60479.2023.00050 (2023). 3. Sommerville, I. & Sawyer, P. Requirements engineering: a good ...
work page doi:10.1109/seaa60479.2023.00050 2023

[2] Harte, R. et al. A human-centered design methodology to enhance the usability, human factors, and user experience of connected health systems: A three-phase methodology. JMIR Hum. Factors 4, e8, 10.2196/humanfactors.5443 (2017)
work page doi:10.2196/humanfactors.5443 2017

[3] Sauer, J., Sonderegger, A. & Schmutz, S. Usability, user experience and accessibility: towards an integrative model. Ergonomics 63, 1207–1220, 10.1080/00140139.2020.1774080 (2020). PMID: 32450782, https://doi.org/10.1080/00140139.2020.1774080
work page doi:10.1080/00140139.2020.1774080 2020

[4] Maalej, W., Biryuk, V., Wei, J. & Panse, F. On the automated processing of user feedback. In Handbook on Natural Language Processing for Requirements Engineering, 279–308 (Springer, 2025)
work page 2025

[5] Soares, M. D. S., Vrancken, J. & Verbraeck, A. User requirements modeling and analysis of software-intensive systems. 84, 328–339, 10.1016/j.jss.2010.10.020 (2011)
work page doi:10.1016/j.jss.2010.10.020 2010

[6] Kang, Y., Cai, Z., Tan, C.-W., Huang, Q. & Liu, H. Natural language processing (nlp) in management research: A literature review. J. Manag. Anal. 7, 139–172 (2020). 9. Hirschberg, J. & Manning, C. D. Advances in natural language processing. Science 349, 261–266 (2015)
work page 2020

[7] Just, J. Natural language processing for innovation search–reviewing an emerging non-human innovation intermediary. Technovation 129, 102883 (2024)
work page 2024

[8] Laurer, M., Van Atteveldt, W., Casas, A. & Welbers, K. Less annotating, more classifying: Addressing the data scarcity issue of supervised machine learning with deep transfer learning and bert-nli. Polit. Analysis 32, 84–100 (2024)
work page 2024

[9] Zhang, J. et al. Less is more: On the importance of data quality for unit test generation. Proc. ACM on Softw. Eng. 2, 1293–1316 (2025)
work page 2025

[10] Osman, A., Salim, N. & Saeed, F. Quality dimensions features for identifying high-quality user replies in text forum threads using classification methods. 14, e0215516, 10.1371/journal.pone.0215516. 14. Castelli, V. et al. The techqa dataset. arXiv preprint arXiv:1911.02984 (2019). 15. Sonali, S. FR_nfr_dataset, 10.17632/4YSX9FYZV4.1 (2024)
work page doi:10.1371/journal.pone.0215516 1911

[11] Ferrari, A., Spagnolo, G. O. & Gnesi, S. Pure: A dataset of public requirements documents. In 2017 IEEE 25th international requirements engineering conference (RE), 502–505 (IEEE, 2017). 17. Bozyigit, F. et al. Dataset for: Text requirements to models, 10.21227/r9j6-nd62 (2023)
work page doi:10.21227/r9j6-nd62 2017

[12] Mekala, R. R., Irfan, A., Groen, E. C., Porter, A. & Lindvall, M. Classifying user requirements from online feedback in small dataset environments using deep learning. In 2021 IEEE 29th International requirements engineering conference (RE), 139–149 (IEEE, 2021)
work page 2021

[13] Kadebu, P., Sikka, S., Tyagi, R. K. & Chiurunge, P. A classification approach for software requirements towards maintainable security. Sci. Afr. 19, e01496 (2023)
work page 2023

[14] Neo, G., Moura, J., Almeida, H., Neo, A. & Freitas Júnior, O. User story tutor (UST) to support agile software developers:. In Proceedings of the 16th International Conference on Computer Supported Education, 51–62, 10.5220/0012619200003693 (SCITEPRESS - Science and Technology Publications, 2024). 21. Kulyabin, M. et al. User experience perception insights d...
work page doi:10.5281/zenodo.17091284 2024

[15] Sanh, V., Debut, L., Chaumond, J. & Wolf, T. Distilbert, a distilled version of bert: smaller, faster, cheaper and lighter (2020). 1910.01108
work page internal anchor Pith review Pith/arXiv arXiv 2020

[16] Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding (2018). 1810.04805. Author contributions statement: Data collection, M.K. and J.J.; conceptualization, J.J., M.K. and N.N.M.P; software, M.K. and J.J.; writing-original draft preparation, M.K., J.J., and N.N.M.P.; writing-revi...
work page internal anchor Pith review Pith/arXiv arXiv 2018
