pith. machine review for the scientific record.

arxiv: 2604.23124 · v1 · submitted 2026-04-25 · 💻 cs.SE · cs.AI

Recognition: unknown

ArgRE: Formal Argumentation for Conflict Resolution in Multi-Agent Requirements Negotiation

Pith reviewed 2026-05-08 08:06 UTC · model grok-4.3

classification 💻 cs.SE cs.AI
keywords requirements engineering · abstract argumentation · multi-agent systems · conflict resolution · traceability · compliance · negotiation · Dung semantics

The pith

ArgRE embeds Dung-style argumentation into multi-agent requirements negotiation to deliver explicit traceability for conflict resolutions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces ArgRE as a system that replaces heuristic conflict resolution in multi-agent requirements engineering with a formal argumentation approach. Proposals, critiques, and refinements become arguments, while conflicts become directed attack relations whose accepted sets are computed under grounded and preferred semantics. The method is integrated with KAOS goal modeling, multi-layer verification, and artifact generation for standards compliance. Across five case studies in safety-critical, financial, and information-system domains, independent evaluators gave its decision justifications markedly higher scores than heuristic synthesis, compliance coverage rose substantially, and semantic intent preservation stayed comparable. A sympathetic reader cares because regulated domains need auditable reasons for accepting or rejecting each requirement rather than opaque aggregation.

Core claim

ArgRE models each requirements proposal, critique, and refinement as an argument in a Dung-style abstract argumentation framework, represents conflicts as directed attack relations, and computes the accepted set under grounded and preferred semantics. The pipeline combines this with KAOS goal modeling, multi-layer verification, and standards-oriented artifact generation. Evaluation on five case studies shows argument-level traceability, significantly higher evaluator ratings for decision justifications than heuristic baselines, comparable semantic intent preservation, and substantially higher compliance coverage. Structural analysis confirms that the default pairwise protocol produces acyclic graphs in which grounded and preferred semantics coincide.

What carries the argument

Dung-style abstract argumentation framework in which proposals and critiques are arguments, conflicts are attack relations, and acceptance is computed under grounded and preferred semantics.
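
A minimal sketch of how acceptance under grounded semantics could be computed, assuming arguments are plain string identifiers and attacks are ordered pairs. The identifiers P1, C1, and R1 and the function name grounded_extension are hypothetical illustrations, not taken from the paper. The fixed-point iteration is the textbook construction for finite frameworks and is not necessarily how ArgRE implements it.

```python
from typing import Set, Tuple

Attack = Tuple[str, str]  # (attacker, target)


def grounded_extension(arguments: Set[str], attacks: Set[Attack]) -> Set[str]:
    """Least fixed point of F(S) = {a | every attacker of a is attacked by S}."""
    attackers = {a: {x for (x, y) in attacks if y == a} for a in arguments}
    accepted: Set[str] = set()
    while True:
        # An argument is defended when each of its attackers is itself
        # attacked by some argument that is already accepted.
        defended = {
            a
            for a in arguments
            if all(any((d, b) in attacks for d in accepted) for b in attackers[a])
        }
        if defended == accepted:
            return accepted
        accepted = defended


# Hypothetical negotiation fragment (labels are assumptions for illustration):
#   P1: proposal, e.g. a sensor-fusion verification requirement
#   C1: critique attacking P1, e.g. it exceeds the planning-cycle budget
#   R1: refinement attacking C1, e.g. a cheaper verification variant
arguments = {"P1", "C1", "R1"}
attacks = {("C1", "P1"), ("R1", "C1")}

print(sorted(grounded_extension(arguments, attacks)))  # ['P1', 'R1']
```

In this toy graph the refinement defeats the critique, so the proposal is reinstated. The chain of attacks that led to that outcome is the kind of record that argument-level traceability refers to.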

If this is right

  • Argument-level traceability becomes available for auditability in regulated domains.
  • Compliance coverage reaches 84.7 percent versus 47.6-47.8 percent for heuristic baselines.
  • Independent evaluators rate decision justifications significantly higher (4.32 versus 3.07).
  • Semantic intent preservation remains comparable at 94.9 percent BERTScore F1.
  • Default pairwise protocols yield acyclic graphs where grounded and preferred semantics coincide.
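
The last bullet can be checked on toy graphs. The sketch below assumes a brute-force enumeration of admissible sets, which is only practical for small frameworks, and all argument labels are hypothetical. It shows one acyclic graph where the single preferred extension equals the grounded extension, and one mutual attack (an even cycle) where the two semantics diverge.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set, Tuple

Attack = Tuple[str, str]  # (attacker, target)


def conflict_free(s: FrozenSet[str], attacks: Set[Attack]) -> bool:
    return not any((a, b) in attacks for a in s for b in s)


def defends(s: FrozenSet[str], a: str, args: Iterable[str], attacks: Set[Attack]) -> bool:
    # s defends a when every attacker of a is attacked by a member of s.
    return all(any((d, b) in attacks for d in s) for b in args if (b, a) in attacks)


def admissible(s: FrozenSet[str], args: List[str], attacks: Set[Attack]) -> bool:
    return conflict_free(s, attacks) and all(defends(s, a, args, attacks) for a in s)


def preferred_extensions(args: Set[str], attacks: Set[Attack]) -> List[FrozenSet[str]]:
    """Maximal (by set inclusion) admissible sets, found by exhaustive search."""
    ordered = sorted(args)
    adm = [
        frozenset(c)
        for r in range(len(ordered) + 1)
        for c in combinations(ordered, r)
        if admissible(frozenset(c), ordered, attacks)
    ]
    return [s for s in adm if not any(s < t for t in adm)]


def grounded_extension(args: Set[str], attacks: Set[Attack]) -> FrozenSet[str]:
    s: FrozenSet[str] = frozenset()
    while True:
        nxt = frozenset(a for a in args if defends(s, a, args, attacks))
        if nxt == s:
            return s
        s = nxt


# Acyclic chain: R1 attacks C1, C1 attacks P1 (hypothetical labels).
acyclic_args = {"P1", "C1", "R1"}
acyclic_attacks = {("C1", "P1"), ("R1", "C1")}

# Mutual attack between two arguments A and B (hypothetical labels).
cyclic_args = {"A", "B"}
cyclic_attacks = {("A", "B"), ("B", "A")}

print(preferred_extensions(acyclic_args, acyclic_attacks) == [grounded_extension(acyclic_args, acyclic_attacks)])  # True
print(sorted(sorted(e) for e in preferred_extensions(cyclic_args, cyclic_attacks)))  # [['A'], ['B']]
print(sorted(grounded_extension(cyclic_args, cyclic_attacks)))  # []
```

On the mutual attack, grounded semantics accepts nothing while preferred semantics offers two alternative extensions. Not every cycle behaves this way: a three-argument attack cycle has the empty set as both its grounded and its only preferred extension.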

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same argumentation structure could be tested in multi-agent planning or resource allocation outside requirements engineering.
  • Larger numbers of agents may increase the frequency of controlled cycles and the divergence between semantics.
  • Combining the accepted argument sets with automated theorem provers could further strengthen compliance guarantees.
  • Stakeholder training on interpreting attack graphs might increase adoption of the generated justifications.

Load-bearing premise

That modeling requirements proposals and conflicts as arguments and attack relations in an abstract argumentation framework produces outcomes that accurately reflect stakeholder intent and practical acceptability in real negotiation settings.

What would settle it

A case study in which the set of arguments accepted under grounded or preferred semantics is later rejected by the original stakeholders as failing to capture their intended balance of requirements or produces a non-compliant system.

Figures

Figures reproduced from arXiv: 2604.23124 by Chong Liu, Haowei Cheng, Hironori Washizaki, Jialong Li, Milhan Kim, Naoyasu Ubayashi, Phan Thi Huyen Thanh, Teeradaj Racharak, Truong Vinh Truong Duy.

Figure 1: Overview of the ArgRE framework. A stakeholder provides the initial project requirements (top) and may intervene …
Figure 2: Basic attack pattern structure. P1: critique attacks …
Figure 3: ArgRE argumentation graph for the autonomous …
Figure 4: KAOS goal model for the autonomous-driving running example. The accepted requirements are organized into …
Figure 5: Sensitivity of argumentation graph properties to …
original abstract

As software systems grow in complexity, they must satisfy an increasing number of competing quality attributes, making it essential to balance them in a principled manner; for example, a safety requirement for sensor-fusion verification may conflict with a tight planning-cycle budget. Multi-agent large language model frameworks support this balancing process by assigning specialized agents to different objectives. However, their conflict resolution is typically heuristic. Requirements are aggregated implicitly without explicit acceptance or rejection, limiting auditability in regulated domains. We present ArgRE, a multi-agent requirements negotiation system that embeds Dung-style abstract argumentation into the negotiation stage. Each proposal, critique, and refinement is modeled as an argument, conflicts are represented as directed attack relations, and the accepted set of arguments is computed under grounded and preferred semantics. The pipeline further integrates KAOS goal modeling, multi-layer verification, and standards-oriented artifact generation. Evaluation across five case studies spanning safety-critical, financial, and information-system domains shows that ArgRE provides argument-level traceability absent from existing frameworks. Independent evaluators rated its decision justifications significantly higher than those of heuristic synthesis (4.32 vs. 3.07, p < 0.001), indicating improved auditability, while semantic intent preservation remains comparable (94.9% BERTScore F1) and compliance coverage reaches 84.7% versus 47.6%–47.8% for baselines. Structural analysis further confirms that the default pairwise protocol yields acyclic graphs in which grounded and preferred semantics coincide, whereas cross-pair arbitration introduces controlled cyclicity, leading to predictable divergence between the two semantics.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The paper presents ArgRE, a multi-agent requirements negotiation framework that models proposals, critiques, and refinements as arguments in a Dung-style abstract argumentation framework (AAF), with conflicts as directed attacks. It integrates KAOS goal modeling, computes accepted sets under grounded and preferred semantics, and generates standards-oriented artifacts. Evaluation on five case studies claims superior argument-level traceability, with independent evaluators rating decision justifications higher than heuristic baselines (4.32 vs. 3.07, p<0.001), comparable semantic intent preservation (94.9% BERTScore F1), and higher compliance coverage (84.7% vs. 47.6-47.8%). Structural analysis shows default protocols yield acyclic graphs where semantics coincide.

Significance. If the central claims hold, ArgRE offers a formal, auditable alternative to heuristic conflict resolution in requirements engineering, especially for regulated or safety-critical domains where traceability matters. The explicit embedding of Dung semantics and the acyclicity analysis under different arbitration protocols are strengths that could support reproducible decision-making. The empirical gains in evaluator ratings and compliance provide initial evidence of practical utility, though significance is limited by the untested assumption that binary attacks preserve nuanced stakeholder priorities from KAOS models.

major comments (2)
  1. The evaluation reports statistically significant improvements in traceability and compliance, but does not test whether the grounded/preferred extensions preserve the priority orderings or contribution degrees (+/–) present in the original KAOS goal models. If the attack-relation extraction collapses these into unweighted binary edges, the accepted set may reject high-priority requirements that a weighted or value-based framework would retain; this directly bears on the claim that outcomes reflect stakeholder intent.
  2. The pipeline description indicates that KAOS contributions are mapped to arguments and directed attacks before applying Dung semantics, yet no mechanism is described for encoding trade-off severity or partial support into the attack relation. This risks the central claim that the computed extensions accurately reflect practical acceptability, as standard AAFs discard degree information by construction.
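
The concern in the two major comments can be made concrete with a small sketch. It assumes a value-based variant in the style of Bench-Capon's value-based argumentation frameworks, in which an attack only succeeds when the target's value is not ranked above the attacker's. The arguments S and C, the value labels, and the numeric ranks are hypothetical; nothing here is taken from ArgRE's pipeline.

```python
from typing import Dict, Set, Tuple

Attack = Tuple[str, str]  # (attacker, target)


def grounded_extension(arguments: Set[str], attacks: Set[Attack]) -> Set[str]:
    accepted: Set[str] = set()
    while True:
        defended = {
            a
            for a in arguments
            if all(
                any((d, b) in attacks for d in accepted)
                for b in arguments
                if (b, a) in attacks
            )
        }
        if defended == accepted:
            return accepted
        accepted = defended


def successful_defeats(
    attacks: Set[Attack], value_of: Dict[str, str], rank: Dict[str, int]
) -> Set[Attack]:
    # Keep an attack only if the target's value does not strictly outrank
    # the attacker's value (a higher rank means a more preferred value).
    return {(a, b) for (a, b) in attacks if rank[value_of[b]] <= rank[value_of[a]]}


# Hypothetical case: a performance critique C attacks a safety requirement S,
# and no other argument defends S.
arguments = {"S", "C"}
attacks = {("C", "S")}
value_of = {"S": "safety", "C": "performance"}
rank = {"safety": 2, "performance": 1}  # assumed stakeholder priority

plain = grounded_extension(arguments, attacks)
value_based = grounded_extension(arguments, successful_defeats(attacks, value_of, rank))

print(sorted(plain))        # ['C']
print(sorted(value_based))  # ['C', 'S']
```

With unweighted binary attacks the safety requirement is rejected because nothing counterattacks the critique. Once the assumed priority ordering filters the attack relation, the same requirement is retained, which is the divergence the referee asks the authors to test for.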
minor comments (3)
  1. The abstract states that cross-pair arbitration introduces controlled cyclicity leading to divergence between semantics, but the specific protocol for constructing the attack relation (manual vs. LLM-driven) and its impact on graph properties is not detailed enough to assess reproducibility.
  2. BERTScore F1 is reported at 94.9% with no corresponding baseline values provided for the heuristic synthesis methods, making the 'comparable' claim difficult to evaluate quantitatively.
  3. The p < 0.001 result for evaluator ratings lacks specification of the exact statistical test, sample size per case study, or inter-rater reliability metrics.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback highlighting the interaction between KAOS contribution links and Dung-style abstract argumentation in ArgRE. The comments correctly identify that our mapping produces unweighted binary attacks. We address each point below with clarifications on scope and planned revisions.

point-by-point responses
  1. Referee: The evaluation reports statistically significant improvements in traceability and compliance, but does not test whether the grounded/preferred extensions preserve the priority orderings or contribution degrees (+/–) present in the original KAOS goal models. If the attack-relation extraction collapses these into unweighted binary edges, the accepted set may reject high-priority requirements that a weighted or value-based framework would retain; this directly bears on the claim that outcomes reflect stakeholder intent.

    Authors: We agree that ArgRE derives binary attack relations from KAOS contribution links without retaining ordinal or numeric degrees, and the evaluation does not include a direct test of priority preservation. The framework instead uses the attack graph to compute grounded and preferred extensions, providing argument-level traceability and the reported gains in evaluator-rated justifications and compliance. The paper does not claim that extensions exactly reproduce all original priority orderings; rather, it demonstrates that the formal resolution process yields auditable outcomes that independent evaluators judge as better justified than heuristic baselines. In the revised manuscript we will add an explicit statement of this scope limitation in Section 4 and a short discussion of value-based argumentation as future work. revision: partial

  2. Referee: The pipeline description indicates that KAOS contributions are mapped to arguments and directed attacks before applying Dung semantics, yet no mechanism is described for encoding trade-off severity or partial support into the attack relation. This risks the central claim that the computed extensions accurately reflect practical acceptability, as standard AAFs discard degree information by construction.

    Authors: The referee accurately observes that the current pipeline performs a binary mapping and provides no encoding of severity or partial support. This choice enables the structural analysis showing acyclicity under default protocols and the coincidence of grounded and preferred semantics. While standard AAFs indeed abstract away degrees, the resulting extensions still deliver the measured improvements in traceability and compliance coverage across the five case studies. We will revise the manuscript to state explicitly that the framework targets formal, reproducible conflict resolution rather than degree-preserving optimality, and we will note the potential of gradual or weighted semantics for future extensions that could address trade-off severity. revision: partial

Circularity Check

0 steps flagged

No circularity: derivation uses external evaluation and standard Dung semantics

full rationale

The paper constructs an AAF from KAOS goal models and multi-agent proposals, applies grounded/preferred semantics, and evaluates via independent human raters (4.32 vs 3.07), BERTScore F1 (94.9%), and compliance coverage against external baselines (84.7% vs 47.6-47.8%). No equations or steps reduce a claimed prediction to a fitted input by construction, no load-bearing premise rests on self-citation chains, and graph acyclicity analysis is a post-hoc structural check rather than a definitional tautology. The central claims rest on observable external metrics rather than self-referential reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper relies on established Dung argumentation semantics and KAOS goal modeling without introducing new free parameters, ad-hoc axioms, or invented entities beyond the ArgRE pipeline itself.

axioms (1)
  • standard math: Dung's abstract argumentation framework and its grounded and preferred semantics
    Invoked to compute the accepted set of arguments from attack relations.

pith-pipeline@v0.9.0 · 5620 in / 1247 out tokens · 28392 ms · 2026-05-08T08:06:49.538628+00:00 · methodology

