A Study on the Framework for Evaluating the Ethics and Trustworthiness of Generative AI
Pith reviewed 2026-05-18 19:49 UTC · model grok-4.3
The pith
A framework of eleven dimensions with detailed indicators evaluates the ethics and trustworthiness of generative AI across its lifecycle.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors state that generative AI ethics and trustworthiness can be evaluated through a framework built around eleven dimensions—fairness, transparency, accountability, safety, privacy, accuracy, consistency, robustness, explainability, copyright and intellectual property protection, and source traceability—each equipped with specific indicators and assessment methodologies, informed by a comparison of policies from South Korea, the United States, the European Union, and China, and designed for use across the full AI lifecycle to integrate technical and multidisciplinary views.
What carries the argument
The proposed evaluation framework built from eleven dimensions, each with its own indicators and assessment methodologies.
If this is right
- Supplies practical tools to identify and manage ethical risks in actual AI applications.
- Gives policymakers, developers, and users concrete guidance for responsible AI decisions.
- Helps steer generative AI toward positive contributions to society.
- Creates a shared academic base for ongoing work on trustworthy AI systems.
- Combines technical evaluation with social and policy perspectives in one structure.
Where Pith is reading between the lines
- The framework could be tested by running it on current large language models to see which indicators need adjustment.
- It might serve as a starting point for creating standardized checklists used by regulators in multiple countries.
- Extending the indicators to include measurable scores could make the assessments more repeatable.
- The policy comparison section suggests the framework may adapt differently depending on regional legal contexts.
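One way the "measurable scores" idea above could be made concrete is sketched below. Only the eleven dimension names come from the paper. The 1-to-5 rating scale, the equal weighting of indicators, the helper names `score_dimension` and `score_system`, and the sample ratings are all assumptions made for illustration.

```python
from statistics import mean

# The eleven dimension names are taken from the paper's abstract.
DIMENSIONS = [
    "fairness", "transparency", "accountability", "safety", "privacy",
    "accuracy", "consistency", "robustness", "explainability",
    "copyright and intellectual property protection", "source traceability",
]

SCALE_MIN, SCALE_MAX = 1, 5  # assumed rating scale, not specified by the paper


def score_dimension(ratings):
    """Average indicator ratings for one dimension and rescale to 0..1."""
    if not ratings:
        raise ValueError("a dimension needs at least one rated indicator")
    for r in ratings:
        if not SCALE_MIN <= r <= SCALE_MAX:
            raise ValueError(f"rating {r} outside {SCALE_MIN}..{SCALE_MAX}")
    return (mean(ratings) - SCALE_MIN) / (SCALE_MAX - SCALE_MIN)


def score_system(ratings_by_dimension):
    """Return per-dimension scores and the dimensions that have no ratings yet."""
    unknown = [name for name in ratings_by_dimension if name not in DIMENSIONS]
    if unknown:
        raise ValueError(f"unknown dimensions: {unknown}")
    scores = {
        name: score_dimension(ratings)
        for name, ratings in ratings_by_dimension.items()
    }
    unrated = [name for name in DIMENSIONS if name not in scores]
    return scores, unrated


# Hypothetical ratings covering two dimensions only.
example = {"fairness": [4, 5, 3], "privacy": [2, 2]}
scores, unrated = score_system(example)
print(scores)
print(len(unrated), "dimensions still unrated")
```

Reporting the unrated dimensions alongside the scores keeps a partial assessment from being read as a complete one.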
Load-bearing premise
The eleven dimensions and their indicators are complete enough to cover all relevant ethical and trustworthiness issues and can be applied across the AI lifecycle without further real-world testing.
What would settle it
Apply the framework to a deployed generative AI system and check whether it misses or fails to provide remedies for a major ethical failure such as undetected privacy leakage or widespread hallucinated facts.
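The check described above amounts to recording known failures of a deployed system and asking which of them no indicator flagged. A minimal sketch follows. The `Incident` record, the `missed_incidents` helper, and the `flagged_by` values are invented for illustration and are not results from the paper or from any real evaluation.

```python
from dataclasses import dataclass, field


@dataclass
class Incident:
    """A known failure of the system under evaluation (hypothetical record)."""
    description: str
    # Dimensions whose indicators flagged this incident during the assessment.
    flagged_by: list = field(default_factory=list)


def missed_incidents(incidents):
    """Return descriptions of incidents that no dimension's indicators flagged."""
    return [i.description for i in incidents if not i.flagged_by]


# Made-up data: one incident caught by a dimension, one caught by none.
incidents = [
    Incident("undetected privacy leakage", flagged_by=["privacy"]),
    Incident("widespread hallucinated facts", flagged_by=[]),
]
print(missed_incidents(incidents))
```

A non-empty result would count as evidence against the completeness premise; an empty one would only show that the sampled incidents were covered.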
read the original abstract
This study provides an in_depth analysis of the ethical and trustworthiness challenges emerging alongside the rapid advancement of generative artificial intelligence (AI) technologies and proposes a comprehensive framework for their systematic evaluation. While generative AI, such as ChatGPT, demonstrates remarkable innovative potential, it simultaneously raises ethical and social concerns, including bias, harmfulness, copyright infringement, privacy violations, and hallucination. Current AI evaluation methodologies, which mainly focus on performance and accuracy, are insufficient to address these multifaceted issues. Thus, this study emphasizes the need for new human_centered criteria that also reflect social impact. To this end, it identifies key dimensions for evaluating the ethics and trustworthiness of generative AI_fairness, transparency, accountability, safety, privacy, accuracy, consistency, robustness, explainability, copyright and intellectual property protection, and source traceability and develops detailed indicators and assessment methodologies for each. Moreover, it provides a comparative analysis of AI ethics policies and guidelines in South Korea, the United States, the European Union, and China, deriving key approaches and implications from each. The proposed framework applies across the AI lifecycle and integrates technical assessments with multidisciplinary perspectives, thereby offering practical means to identify and manage ethical risks in real_world contexts. Ultimately, the study establishes an academic foundation for the responsible advancement of generative AI and delivers actionable insights for policymakers, developers, users, and other stakeholders, supporting the positive societal contributions of AI technologies.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to provide an in-depth analysis of ethical and trustworthiness challenges in generative AI and proposes a comprehensive framework identifying 11 key dimensions—fairness, transparency, accountability, safety, privacy, accuracy, consistency, robustness, explainability, copyright and intellectual property protection, and source traceability—along with detailed indicators and assessment methodologies for each. It includes a comparative analysis of AI ethics policies and guidelines from South Korea, the United States, the European Union, and China, deriving implications from each, and asserts that the framework applies across the full AI lifecycle while integrating technical assessments with multidisciplinary perspectives to offer practical means for identifying and managing ethical risks in real-world contexts.
Significance. If the 11 dimensions and associated indicators prove both comprehensive and operationalizable, the work could serve as a useful synthesized reference for policymakers, developers, and stakeholders by consolidating existing policy documents and literature into a structured evaluation approach with explicit assessment methods. The comparative policy analysis across four jurisdictions adds value by surfacing regional differences and common themes. However, the significance is currently constrained by the purely synthetic nature of the contribution, with no demonstrated application or validation to establish practical utility.
major comments (3)
- Abstract: The central claim that the framework 'applies across the AI lifecycle and integrates technical assessments with multidisciplinary perspectives, thereby offering practical means to identify and manage ethical risks in real-world contexts' is load-bearing but unsupported, as the manuscript contains no application of the 11 dimensions or indicators to any concrete generative AI system (e.g., ChatGPT or similar), no pilot evaluation, and no check for completeness or feasibility.
- Framework proposal and indicators section: The sufficiency of the listed 11 dimensions to capture all relevant ethical and trustworthiness issues is asserted without addressing potential overlaps (such as between transparency and explainability) or gaps relative to broader AI ethics literature, which directly affects the claim of comprehensiveness.
- Policy comparison section: While the analysis of policies from South Korea, the US, the EU, and China is informative, the manuscript does not explicitly derive or map the 11 dimensions and their indicators from specific policy elements, leaving the integration of these sources into the framework opaque and difficult to verify.
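The overlap concern in the second major comment could be probed mechanically once indicators are written down. The sketch below compares the indicator sets of each pair of dimensions using Jaccard similarity. The indicator labels, the `overlapping_pairs` helper, and the 0.2 threshold are assumptions for illustration; the paper's actual indicators are not reproduced here.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets: shared items divided by all items."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlapping_pairs(indicators, threshold=0.2):
    """List dimension pairs whose indicator sets overlap at or above threshold."""
    pairs = []
    for (d1, s1), (d2, s2) in combinations(sorted(indicators.items()), 2):
        sim = jaccard(s1, s2)
        if sim >= threshold:
            pairs.append((d1, d2, round(sim, 2)))
    return pairs


# Hypothetical indicator labels, not taken from the paper.
indicators = {
    "transparency": {"model documentation", "data disclosure", "decision rationale"},
    "explainability": {"decision rationale", "feature attribution"},
    "privacy": {"data minimization", "consent records"},
}
print(overlapping_pairs(indicators))
```

Exact label matching is a crude proxy; indicators that overlap in meaning but differ in wording would need manual review.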
minor comments (2)
- Abstract: Formatting artifacts such as 'in_depth', 'human_centered', and 'real_world' should be corrected to standard hyphenation or spacing for professional presentation.
- Throughout the manuscript: Ensure the 11 dimensions are introduced and referenced in a consistent order and with uniform terminology to improve clarity and readability.
Simulated Author's Rebuttal
We appreciate the referee's detailed and constructive feedback on our manuscript. We have carefully considered each major comment and provide point-by-point responses below. Where appropriate, we outline revisions to address the concerns raised.
- Referee: Abstract: The central claim that the framework 'applies across the AI lifecycle and integrates technical assessments with multidisciplinary perspectives, thereby offering practical means to identify and manage ethical risks in real-world contexts' is load-bearing but unsupported, as the manuscript contains no application of the 11 dimensions or indicators to any concrete generative AI system (e.g., ChatGPT or similar), no pilot evaluation, and no check for completeness or feasibility.
  Authors: We acknowledge that the manuscript proposes the framework without including a concrete application or pilot study on a specific generative AI system. The claim in the abstract reflects the intended scope and design of the framework, which is derived from a synthesis of policy documents and literature to cover the AI lifecycle. However, to strengthen the manuscript and avoid overstatement, we will revise the abstract to clarify that the framework provides a structured approach for such identification and management, with practical utility to be validated in future work. Additionally, we will add a brief discussion section on potential applications and feasibility considerations.
  Revision: partial
- Referee: Framework proposal and indicators section: The sufficiency of the listed 11 dimensions to capture all relevant ethical and trustworthiness issues is asserted without addressing potential overlaps (such as between transparency and explainability) or gaps relative to broader AI ethics literature, which directly affects the claim of comprehensiveness.
  Authors: We agree that explicitly addressing potential overlaps and justifying the comprehensiveness is important. In the revised manuscript, we will include a new subsection in the framework section that discusses the rationale for selecting these 11 dimensions, drawing on key references from the broader AI ethics literature (e.g., works on AI principles from OECD, UNESCO). We will also analyze overlaps, such as between transparency and explainability, explaining how they are distinguished in our framework: transparency refers to openness about system operations, while explainability focuses on providing understandable reasons for outputs. This will help substantiate the claim of comprehensiveness.
  Revision: yes
- Referee: Policy comparison section: While the analysis of policies from South Korea, the US, the EU, and China is informative, the manuscript does not explicitly derive or map the 11 dimensions and their indicators from specific policy elements, leaving the integration of these sources into the framework opaque and difficult to verify.
  Authors: We thank the referee for pointing this out. To improve transparency, we will revise the policy comparison section to include explicit mappings. This could be achieved by adding a summary table or detailed explanations linking specific policy elements from each jurisdiction to the corresponding dimensions in our framework. For example, we will highlight how the EU's AI Act informs the accountability and safety dimensions, and similarly for other regions. This will make the derivation process clearer and verifiable.
  Revision: yes
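The summary table promised in the last response could take the form of a coverage matrix from policy elements to dimensions. The sketch below is an illustration under stated assumptions: only the EU AI Act entry (accountability and safety) reflects the example given in the rebuttal, the other jurisdictions are deliberately left empty rather than guessed, and the `coverage` helper is a hypothetical name.

```python
# The eleven dimension names are taken from the paper's abstract.
DIMENSIONS = [
    "fairness", "transparency", "accountability", "safety", "privacy",
    "accuracy", "consistency", "robustness", "explainability",
    "copyright and intellectual property protection", "source traceability",
]

# jurisdiction -> {policy name -> dimensions it informs}
# Only the EU entry mirrors the rebuttal's example; the rest are unfilled.
policy_map = {
    "South Korea": {},
    "United States": {},
    "European Union": {"AI Act": ["accountability", "safety"]},
    "China": {},
}


def coverage(policy_map):
    """For each dimension, collect the (jurisdiction, policy) pairs mapped to it."""
    table = {d: [] for d in DIMENSIONS}
    for jurisdiction, policies in policy_map.items():
        for policy, dims in policies.items():
            for d in dims:
                if d not in table:
                    raise ValueError(f"unknown dimension: {d}")
                table[d].append((jurisdiction, policy))
    return table


table = coverage(policy_map)
unmapped = [d for d, sources in table.items() if not sources]
print(table["safety"])
print(len(unmapped), "dimensions have no policy source recorded yet")
```

Listing the unmapped dimensions makes it visible which parts of the framework still lack a documented policy origin.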
Circularity Check
Framework synthesized from external policies and literature without self-referential reduction.
The manuscript builds its 11-dimension evaluation framework via comparative review of South Korean, US, EU, and Chinese AI ethics policies plus general literature on generative AI risks. Dimensions and indicators are explicitly drawn from these external sources rather than from any internal definitions, fitted parameters, or self-citation chains that would render the output equivalent to the input by construction. No equations, uniqueness theorems, or predictions are present that collapse back onto the paper's own assumptions; the contribution is therefore a synthesis that remains independent of its own outputs.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The eleven dimensions (fairness, transparency, accountability, safety, privacy, accuracy, consistency, robustness, explainability, copyright and intellectual property protection, and source traceability) are the key and sufficient criteria for evaluating ethics and trustworthiness.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean, reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear. Paper passage:
identifies key dimensions for evaluating the ethics and trustworthiness of generative AI—fairness, transparency, accountability, safety, privacy, accuracy, consistency, robustness, explainability, copyright and intellectual property protection, and source traceability—and develops detailed indicators and assessment methodologies for each
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean, LogicNat recovery (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear. Paper passage:
The proposed framework applies across the AI lifecycle and integrates technical assessments with multidisciplinary perspectives
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.