Recognition: unknown
TRUST: A Framework for Decentralized AI Service v.0.1
Pith reviewed 2026-05-07 08:15 UTC · model grok-4.3
The pith
TRUST decentralizes auditing of AI reasoning chains by breaking them into hierarchical graphs and aligning incentives so honest participants profit.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
TRUST decomposes Chain-of-Thought reasoning into five abstraction levels via Hierarchical Directed Acyclic Graphs for distributed auditing, projects multi-agent interactions into Causal Interaction Graphs through the DAAN protocol for deterministic root-cause attribution, and employs a multi-tier consensus with stake-weighted voting among computational checkers, LLM evaluators, and human experts to guarantee correctness under 30 percent adversarial participation. The Safety-Profitability Theorem ensures honest auditors profit while malicious actors incur losses, with all decisions recorded on-chain and privacy preserved by segmentation. Empirical tests show 72.4 percent accuracy, resilience against 20 percent corruption, 70 percent root-cause attribution, and 60 percent token savings.
What carries the argument
Hierarchical Directed Acyclic Graphs (HDAGs) for decomposing reasoning, the DAAN protocol for mapping to Causal Interaction Graphs, and the multi-tier stake-weighted consensus among checkers, evaluators, and experts.
Load-bearing premise
The multi-tier consensus with stake-weighted voting guarantees correctness under 30 percent adversarial participation and the DAAN protocol projects interactions to causal graphs without loss of information for root-cause attribution.
What would settle it
An experiment in which 35 percent of participants act adversarially and the system either drops below 60 percent accuracy or allows a malicious actor to profit, violating the Safety-Profitability Theorem.
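A minimal sketch of what that falsification run could look like, assuming a flat stake-weighted majority vote and a fixed per-audit reward and slash; every name and parameter below is illustrative, not taken from the paper.

# Toy falsification harness: 35% adversarial stake, then check whether accuracy
# stays above 60% and whether any adversarial auditor ends the run in profit.
# The vote, reward, and slash rules are assumptions made for illustration.
import random

def run_trial(n_auditors=100, adv_fraction=0.35, n_audits=1000,
              reward=1.0, slash=1.5, seed=0):
    rng = random.Random(seed)
    adversaries = set(rng.sample(range(n_auditors), int(adv_fraction * n_auditors)))
    stake = {i: 1.0 for i in range(n_auditors)}         # equal stake for simplicity
    profit = {i: 0.0 for i in range(n_auditors)}
    correct_decisions = 0

    for _ in range(n_audits):
        truth = rng.random() < 0.5                       # ground-truth verdict on a reasoning step
        votes = {i: (not truth if i in adversaries else truth) for i in range(n_auditors)}
        yes_stake = sum(stake[i] for i, v in votes.items() if v)
        decision = yes_stake > sum(stake.values()) / 2   # stake-weighted majority
        correct_decisions += int(decision == truth)
        for i, v in votes.items():                       # reward the winning side, slash the rest
            profit[i] += reward if v == decision else -slash

    accuracy = correct_decisions / n_audits
    adversary_in_profit = any(profit[i] > 0 for i in adversaries)
    return accuracy, adversary_in_profit

print(run_trial())                                       # e.g. (1.0, False) in this toy setting

In this naive model the 65 percent honest stake always outvotes the adversaries, so accuracy stays at 100 percent and every adversary ends up slashed; the test only becomes informative once correlated honest errors, collusion, and Sybil stake-splitting are modeled, which is exactly what the paper leaves unspecified.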
Original abstract
Large Reasoning Models (LRMs) and Multi-Agent Systems (MAS) in high-stakes domains demand reliable verification, yet centralized approaches suffer four limitations: (1) Robustness, with single points of failure vulnerable to attacks and bias; (2) Scalability, as reasoning complexity creates bottlenecks; (3) Opacity, as hidden auditing erodes trust; and (4) Privacy, as exposed reasoning traces risk model theft. We introduce TRUST (Transparent, Robust, and Unified Services for Trustworthy AI), a decentralized framework with three innovations: (i) Hierarchical Directed Acyclic Graphs (HDAGs) that decompose Chain-of-Thought reasoning into five abstraction levels for parallel distributed auditing; (ii) the DAAN protocol, which projects multi-agent interactions into Causal Interaction Graphs (CIGs) for deterministic root-cause attribution; and (iii) a multi-tier consensus mechanism among computational checkers, LLM evaluators, and human experts with stake-weighted voting that guarantees correctness under 30% adversarial participation. We prove a Safety-Profitability Theorem ensuring honest auditors profit while malicious actors incur losses. All decisions are recorded on-chain, while privacy-by-design segmentation prevents reconstruction of proprietary logic. Across multiple LLMs and benchmarks, TRUST attains 72.4% accuracy (4-18% above baselines) and remains resilient against 20% corruption. DAAN reaches 70% root-cause attribution (vs. 54-63% for standard methods) with 60% token savings. Human studies validate the design (F1 = 0.89, Brier = 0.074). The framework supports (A1) decentralized auditing, (A2) tamper-proof leaderboards, (A3) trustless data annotation, and (A4) governed autonomous agents, pioneering decentralized AI auditing for safe, accountable deployment of reasoning-capable systems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces TRUST, a decentralized framework for auditing Large Reasoning Models and Multi-Agent Systems. It proposes three main innovations: Hierarchical Directed Acyclic Graphs (HDAGs) to decompose Chain-of-Thought reasoning into five abstraction levels for distributed auditing, the DAAN protocol that projects multi-agent interactions into Causal Interaction Graphs (CIGs) for root-cause attribution, and a multi-tier consensus mechanism involving computational checkers, LLM evaluators, and human experts using stake-weighted voting. The authors assert a proved Safety-Profitability Theorem that ensures honest auditors profit while malicious actors lose, and report empirical results of 72.4% accuracy (4-18% above baselines), 70% root-cause attribution (vs. 54-63% for baselines), 60% token savings, and resilience to 20% corruption, with human validation metrics (F1=0.89, Brier=0.074). The framework is positioned to support decentralized auditing, tamper-proof leaderboards, trustless annotation, and governed agents.
Significance. If the Safety-Profitability Theorem were rigorously derived and the performance claims validated with reproducible experiments, the work would offer a novel approach to addressing robustness, scalability, opacity, and privacy issues in AI auditing through decentralization and on-chain recording. The combination of HDAG decomposition with CIG-based attribution and hybrid consensus could advance trustworthy AI deployment in high-stakes domains. However, the manuscript provides no formal derivations, experimental protocols, or data, so the potential impact cannot be assessed at present.
major comments (4)
- [Abstract] Abstract: The Safety-Profitability Theorem is asserted as proved, ensuring honest auditors profit under 30% adversarial participation, but the manuscript contains no proof sketch, lemmas, utility functions for participants, or analysis of false-positive/negative rates under collusion or sybil attacks. This is load-bearing for the central claim.
- [Abstract] Abstract: Performance claims (72.4% accuracy, 70% root-cause attribution, resilience against 20% corruption, 60% token savings) are stated without any experimental setup, datasets, baselines, number of runs, error bars, or statistical tests. These numbers cannot be evaluated or reproduced from the given text.
- [Framework Description] Framework / DAAN protocol: The claim that projection of multi-agent traces to Causal Interaction Graphs enables deterministic root-cause attribution without loss of information is presented as an axiom, but no formal argument, information-theoretic bound, or ablation study shows that the five-level HDAG decomposition preserves causal information.
- [Consensus Mechanism] Consensus mechanism: The multi-tier stake-weighted voting is said to guarantee correctness under 30% adversarial participation, yet there is no explicit adversary model, game-theoretic analysis, or bounds on detection rates when adversaries control up to 30% of stake.
minor comments (2)
- [Abstract] Abstract: The human studies are summarized only by aggregate metrics (F1 = 0.89, Brier = 0.074) with no details on study design, participant expertise, or task description.
- [Notation and Definitions] Notation: The definitions of HDAGs, the five abstraction levels, and CIGs would benefit from explicit mathematical notation or pseudocode to support reproducibility.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive review of our manuscript on the TRUST framework. We appreciate the recognition of its potential to address key challenges in decentralized AI auditing. We agree that the current version requires additional formal details and experimental documentation to substantiate the central claims. Below we respond point-by-point to the major comments and outline the revisions we will make.
Point-by-point responses
-
Referee: [Abstract] Abstract: The Safety-Profitability Theorem is asserted as proved, ensuring honest auditors profit under 30% adversarial participation, but the manuscript contains no proof sketch, lemmas, utility functions for participants, or analysis of false-positive/negative rates under collusion or sybil attacks. This is load-bearing for the central claim.
Authors: We acknowledge that the Safety-Profitability Theorem is central to the framework and that the current manuscript states the theorem and its high-level implications without including the full derivation. This omission stems from length constraints in the initial v0.1 submission. In the revised manuscript we will add a dedicated Theoretical Analysis section containing a complete proof sketch, the utility functions for honest and malicious auditors, supporting lemmas on incentive compatibility, and an explicit analysis of false-positive and false-negative rates under collusion and Sybil attacks, all within the 30% adversarial participation bound. revision: yes
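As a rough illustration of the shape such a section could take (the symbols r, s, epsilon, and p are introduced here as assumptions, not the paper's notation): with reward r for voting with the accepted verdict, slash s for voting against it, probability \(\varepsilon\) that consensus mis-scores an honest vote, and probability \(p\) that it catches a dishonest one,

\[
\mathbb{E}[U_{\mathrm{honest}}] = r\,(1-\varepsilon) - s\,\varepsilon,
\qquad
\mathbb{E}[U_{\mathrm{malicious}}] = r\,(1-p) - s\,p,
\]

so safety-profitability in this toy form reduces to \(\varepsilon < r/(r+s) < p\). The promised proof would then need to show that the consensus mechanism keeps \(\varepsilon\) below and \(p\) above that break-even point whenever adversarial stake stays under 30%, including under collusion and Sybil splitting.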
-
Referee: [Abstract] Abstract: Performance claims (72.4% accuracy, 70% root-cause attribution, resilience against 20% corruption, 60% token savings) are stated without any experimental setup, datasets, baselines, number of runs, error bars, or statistical tests. These numbers cannot be evaluated or reproduced from the given text.
Authors: We agree that the reported performance figures require full experimental documentation for reproducibility and evaluation. The numbers derive from our internal experiments across multiple LLMs and standard benchmarks, but the manuscript omits the protocol details. In the revision we will insert a comprehensive Experiments section that specifies the datasets, baselines, number of independent runs, error bars, and statistical tests. We will also release the associated code and data artifacts to enable independent verification. revision: yes
-
Referee: [Framework Description] Framework / DAAN protocol: The claim that projection of multi-agent traces to Causal Interaction Graphs enables deterministic root-cause attribution without loss of information is presented as an axiom, but no formal argument, information-theoretic bound, or ablation study shows that the five-level HDAG decomposition preserves causal information.
Authors: The DAAN protocol is designed so that the projection onto Causal Interaction Graphs preserves the necessary causal structure via the five-level HDAG decomposition. While the current text presents this property concisely, we accept that a formal justification is needed. We will expand the Framework Description section with a formal argument, including an information-theoretic bound demonstrating preservation of causal information, together with ablation studies that quantify the contribution of each HDAG level to root-cause attribution accuracy. revision: yes
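A toy version of the projection and the attribution check, under an assumed event schema (event id, agent, causal parents, fault flag); this is only a sketch of the property being claimed, not the DAAN protocol itself.

# Toy projection of a multi-agent trace onto a causal interaction graph (CIG),
# then a check that the injected fault's root cause is still uniquely attributable.
# The event and edge schema here is an assumption for illustration, not the DAAN format.
from collections import defaultdict

# Each event: (event_id, agent, parent_event_ids, faulty)
trace = [
    ("e1", "planner",  [],           False),
    ("e2", "coder",    ["e1"],       True),   # injected fault
    ("e3", "coder",    ["e2"],       False),
    ("e4", "reviewer", ["e1", "e3"], False),
    ("e5", "reviewer", ["e4"],       False),  # observed failure
]

def project_to_cig(trace):
    """Collapse event-level causality to agent-level causal interaction edges."""
    agent_of = {eid: agent for eid, agent, _, _ in trace}
    edges = defaultdict(set)
    for eid, agent, parents, _ in trace:
        for p in parents:
            if agent_of[p] != agent:
                edges[agent_of[p]].add(agent)
    return dict(edges)

def root_cause_agents(trace):
    """Agents owning faulty events that causally precede the final observed failure."""
    parents = {eid: set(ps) for eid, _, ps, _ in trace}
    ancestors, frontier = set(), {trace[-1][0]}
    while frontier:
        eid = frontier.pop()
        for p in parents[eid]:
            if p not in ancestors:
                ancestors.add(p)
                frontier.add(p)
    return {agent for eid, agent, _, faulty in trace if faulty and eid in ancestors}

print(project_to_cig(trace))     # e.g. {'planner': {'coder', 'reviewer'}, 'coder': {'reviewer'}}
print(root_cause_agents(trace))  # {'coder'}

The point of the toy is the final check: attribution is deterministic only when the set of faulty ancestors of the observed failure remains a singleton after projection, and the promised bound should say when the agent-level collapse preserves that property.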
-
Referee: [Consensus Mechanism] Consensus mechanism: The multi-tier stake-weighted voting is said to guarantee correctness under 30% adversarial participation, yet there is no explicit adversary model, game-theoretic analysis, or bounds on detection rates when adversaries control up to 30% of stake.
Authors: We recognize that the current description of the multi-tier consensus mechanism is high-level and lacks an explicit adversary model. In the revised manuscript we will augment the Consensus Mechanism section with a formal adversary model, a game-theoretic analysis of the stake-weighted voting incentives, and derived bounds on detection rates and overall correctness under adversarial control of up to 30% of the stake, building on Byzantine fault tolerance principles adapted to the hybrid checker-LLM-human setting. revision: yes
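For reference, the quorum-intersection argument the authors would presumably adapt, restated here for stake weights as an assumption rather than as the paper's construction: with total stake \(S\) and adversarial stake \(\alpha S\), accept a verdict only when it gathers at least \(\tfrac{2}{3}S\) of stake. Two conflicting \(\tfrac{2}{3}\)-quorums must then overlap in at least \(\tfrac{1}{3}S\) of stake, and

\[
\alpha < \tfrac{1}{3} \;\Longrightarrow\; \tfrac{1}{3}S > \alpha S,
\]

so the overlap contains honest stake, and an honest auditor never endorses two conflicting verdicts; the paper's 30% threshold sits just inside this bound. What the revision still has to supply is the extension to the three heterogeneous tiers (checkers, LLM evaluators, human experts), where stake, accuracy, and collusion behave differently in each tier.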
Circularity Check
Safety-Profitability Theorem reduces to the framework's own 30% adversarial consensus assumptions by construction
specific steps
-
self-definitional
[Abstract]
"We prove a Safety-Profitability Theorem ensuring honest auditors profit while malicious actors incur losses. ... a multi-tier consensus mechanism among computational checkers, LLM evaluators, and human experts with stake-weighted voting that guarantees correctness under 30% adversarial participation."
The theorem's conclusion (honest profit, malicious loss) is identical to the built-in 'guarantees correctness' property of the stake-weighted voting rule at the 30% threshold. The claimed proof therefore reduces to the framework's own design assumptions without additional mathematical content or external adversary analysis.
full rationale
The paper's load-bearing theoretical result is the Safety-Profitability Theorem, which is asserted to follow from the multi-tier consensus mechanism. However, the mechanism is defined to 'guarantee correctness under 30% adversarial participation' via stake-weighted voting, and the theorem simply restates that honest participants profit while malicious ones lose under that same guarantee. No independent game-theoretic model, utility functions, or bounding lemmas are supplied; the profitability outcome is therefore entailed directly by the design choice rather than derived. Experimental accuracy and attribution figures are internal evaluations of the same unverified mechanism and do not break the circularity. This matches the self-definitional pattern exactly.
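The reduction can be written in one line under an assumed reward-and-slash rule (r and s are illustrative symbols, not the paper's): let C be the axiom that the consensus verdict is correct whenever adversarial stake is at most 30%, and let the rule pay \(r > 0\) to auditors who voted with the verdict and slash \(s > 0\) from those who voted against it. Then

\[
C \;\Longrightarrow\; U_{\mathrm{honest}} = r > 0
\quad\text{and}\quad
U_{\mathrm{malicious}} = -s < 0,
\]

which is the statement of the Safety-Profitability Theorem; the implication uses nothing beyond the reward rule itself, so the theorem carries no content that is not already packed into C.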
Axiom & Free-Parameter Ledger
free parameters (2)
- 30% adversarial participation threshold
- stake-weighted voting parameters
axioms (2)
- domain assumption: Multi-tier consensus with stake-weighted voting guarantees correctness under up to 30% adversarial participation
- ad hoc to paper: Projection of multi-agent interactions into Causal Interaction Graphs enables deterministic root-cause attribution
invented entities (3)
- Hierarchical Directed Acyclic Graphs (HDAGs): no independent evidence
- DAAN protocol: no independent evidence
- Causal Interaction Graphs (CIGs): no independent evidence
Reference graph
Works this paper leans on
-
[1]
Artificial intelligence risk management framework (ai rmf 1.0)
NIST. Artificial Intelligence Risk Management Framework (AI RMF 1.0). NIST AI 100-1, 2023. URL: https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.100-1.pdf
2023
-
[2]
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
Bai, Y., Jones, A., Ndousse, K., Askell, A., Chen, A., DasSarma, N., Drain, D., Fort, S., Ganguli, D., Henighan, T., et al. Training a helpful and harmless assistant with reinforcement learning from human feedback. arXiv preprint arXiv:2204.05862, 2022
arXiv 2022
-
[3]
On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?
Bender, E. M., Gebru, T., McMillan-Major, A., and Shmitchell, S. On the dangers of stochastic parrots: Can language models be too big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, pp. 610--623, 2021
2021
-
[4]
Autonomous Chemical Research with Large Language Models
Boiko, D. A., MacKnight, R., Kline, B., and Gomes, G. Autonomous chemical research with large language models. Nature, 624(7992): 570--578, 2023
2023
-
[5]
The foundation model transparency index
Bommasani, R., Klyman, K., Longpre, S., Kapoor, S., Maslej, N., Xiong, B., Zhang, D., and Liang, P. The foundation model transparency index. arXiv preprint arXiv:2310.12941, 2023
-
[6]
Extracting training data from large language models
Carlini, N., Tramer, F., Wallace, E., Jagielski, M., Herbert-Voss, A., Lee, K., Roberts, A., Brown, T., Song, D., Erlingsson, U., et al. Extracting training data from large language models. In 30th USENIX Security Symposium (USENIX Security 21), pp. 2633--2650, 2021
2021
-
[7]
Practical byzantine fault tolerance
Castro, M., Liskov, B., et al. Practical byzantine fault tolerance. In OSDI, volume 99, pp. 173--186, 1999
1999
-
[8]
Why Do Multi-Agent LLM Systems Fail?
Cemri, M., Pan, M. Z., Yang, S., Agrawal, L. A., Chopra, B., Tiwari, R., Keutzer, K., Parameswaran, A., Klein, D., Ramchandran, K., Zaharia, M., Gonzalez, J. E., and Stoica, I. Why do multi-agent llm systems fail? arXiv preprint arXiv:2503.13657, 2025. URL https://arxiv.org/abs/2503.13657
arXiv 2025
-
[9]
LexGLUE: A Benchmark Dataset for Legal Language Understanding in English
Chalkidis, I., Jana, A., Hartung, D., Bommarito, M., Androutsopoulos, I., Katz, D. M., and Aletras, N. Lexglue: A benchmark dataset for legal language understanding in english. arXiv preprint arXiv:2110.00976, 2021
-
[10]
ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate
Chan, C.-M., Chen, W., Su, Y., Yu, J., Xue, W., Zhang, S., Fu, J., and Liu, Z. Chateval: Towards better llm-based evaluators through multi-agent debate. arXiv preprint arXiv:2308.07201, 2023
arXiv 2023
-
[11]
Humans or LLMs as the Judge? A Study on Judgement Biases
Chen, G. H., Chen, S., Liu, Z., Jiang, F., and Wang, B. Humans or llms as the judge? a study on judgement biases. arXiv preprint arXiv:2402.10669, 2024
-
[12]
Laying down harmonised rules on artificial intelligence (artificial intelligence act) and amending certain union legislative acts
COM, E. Laying down harmonised rules on artificial intelligence (artificial intelligence act) and amending certain union legislative acts. Proposal for a regulation of the European parliament and of the council, 2021
2021
-
[13]
Improving Factuality and Reasoning in Language Models through Multiagent Debate
Du, Y., Li, S., Torralba, A., Tenenbaum, J. B., and Mordatch, I. Improving factuality and reasoning in language models through multiagent debate. In Forty-first International Conference on Machine Learning, 2023
2023
-
[14]
A Survey on LLM-as-a-Judge
Gu, J., Jiang, X., Shi, Z., Tan, H., Zhai, X., Xu, C., Li, W., Shen, Y., Ma, S., Liu, H., Wang, S., Zhang, K., Wang, Y., Gao, W., Ni, L., and Guo, J. A survey on LLM-as-a-judge. arXiv preprint arXiv:2411.15594, 2024. URL https://arxiv.org/abs/2411.15594
arXiv 2024
-
[15]
Deepseek-r1 incentivizes reasoning in llms through reinforcement learning
Guo, D., Yang, D., Zhang, H., Song, J., Wang, P., Zhu, Q., Xu, R., Zhang, R., Ma, S., Bi, X., et al. Deepseek-r1 incentivizes reasoning in llms through reinforcement learning. Nature, 645(8081): 633--638, 2025
2025
-
[16]
D3: Dissecting multi-agent debate for LLM evaluation, 2024
Harrasse, A., Roch, R., Valentini, E., Bontempi, G., Schmidhuber, J., et al. D3: Dissecting multi-agent debate for LLM evaluation, 2024. URL https://arxiv.org/abs/2410.04663
-
[17]
MetaGPT: Meta Programming for a Multi-Agent Collaborative Framework
Hong, S., Zhuge, M., Chen, J., Zheng, X., Cheng, Y., Wang, J., Zhang, C., Wang, Z., Yau, S. K. S., Lin, Z., et al. Metagpt: Meta programming for a multi-agent collaborative framework. In The Twelfth International Conference on Learning Representations, 2023
2023
-
[18]
OpenAI o1 System Card
Jaech, A., Kalai, A., Lerer, A., Richardson, A., El-Kishky, A., Low, A., Helyar, A., Madry, A., Beutel, A., Carney, A., et al. Openai o1 system card. arXiv preprint arXiv:2412.16720, 2024
arXiv 2024
-
[19]
Fault localization using interventional causal learning for cloud-native applications
Jha, S., Rios, J., Abe, N., Bagehorn, F., and Shwartz, L. Fault localization using interventional causal learning for cloud-native applications. In 2024 54th Annual IEEE/IFIP International Conference on Dependable Systems and Networks - Supplemental Volume (DSN-S), pp. 141--147. IEEE, 2024
2024
-
[20]
SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
Jimenez, C. E., Yang, J., Wettig, A., Yao, S., Pei, K., Press, O., and Narasimhan, K. SWE-bench: Can language models resolve real-world GitHub issues? 2024. URL https://arxiv.org/abs/2310.06770
arXiv 2024
-
[21]
CoT-ICL Lab: A Synthetic Framework for Studying Chain-of-Thought Learning from In-Context Demonstrations
Kothapalli, V., Firooz, H., and Sanjabi, M. Cot-icl lab: A synthetic framework for studying chain-of-thought learning from in-context demonstrations. arXiv preprint arXiv:2502.15132, 2025
-
[22]
The byzantine generals problem
Lamport, L., Shostak, R., and Pease, M. The byzantine generals problem. In Concurrency: the Works of Leslie Lamport, pp. 203--226. 2019
2019
-
[23]
Measuring Faithfulness in Chain-of-Thought Reasoning
Lanham, T., Chen, A., Radhakrishnan, A., Steiner, B., Denison, C., Hernandez, D., Li, D., Durmus, E., Hubinger, E., Kernion, J., et al. Measuring faithfulness in chain-of-thought reasoning. arXiv preprint arXiv:2307.13702, 2023
arXiv 2023
-
[24]
Holistic Evaluation of Language Models
Liang, P., Bommasani, R., Lee, T., Tsipras, D., Soylu, D., Yasunaga, M., Zhang, Y., Narayanan, D., Wu, Y., Kumar, A., et al. Holistic evaluation of language models. arXiv preprint arXiv:2211.09110, 2022
arXiv 2022
-
[25]
Let's verify step by step
Lightman, H., Kosaraju, V., Burda, Y., Edwards, H., Baker, B., Lee, T., Leike, J., Schulman, J., Sutskever, I., and Cobbe, K. Let's verify step by step. In The Twelfth International Conference on Learning Representations, 2023
2023
-
[26]
Deconstructing Long Chain-of-Thought: A Structured Reasoning Optimization Framework for Long CoT Distillation
Luo, Y., Song, Y., Zhang, X., Liu, J., Wang, W., Chen, G., Su, W., and Zheng, B. Deconstructing long chain-of-thought: A structured reasoning optimization framework for long cot distillation. arXiv preprint arXiv:2503.16385, 2025
-
[27]
Model Cards for Model Reporting
Mitchell, M., Wu, S., Zaldivar, A., Barnes, P., Vasserman, L., Hutchinson, B., Spitzer, E., Raji, I. D., and Gebru, T. Model cards for model reporting. In Proceedings of the Conference on Fairness, Accountability, and Transparency, pp. 220--229, 2019
2019
-
[28]
Scalable Extraction of Training Data from (Production) Language Models
Nasr, M., Carlini, N., Hayase, J., Jagielski, M., Cooper, A. F., Ippolito, D., Choquette-Choo, C. A., Wallace, E., Tramèr, F., and Lee, K. Scalable extraction of training data from (production) language models. arXiv preprint arXiv:2311.17035, 2023
-
[29]
GPT-4 Technical Report
OpenAI. GPT-4 technical report, 2023. URL https://arxiv.org/abs/2303.08774
arXiv 2023
-
[30]
Llm evaluators recognize and favor their own generations
Panickssery, A., Bowman, S., and Feng, S. Llm evaluators recognize and favor their own generations. Advances in Neural Information Processing Systems, 37: 68772--68802, 2024
2024
-
[31]
Ignore Previous Prompt: Attack Techniques For Language Models
Perez, F. and Ribeiro, I. Ignore previous prompt: Attack techniques for language models. arXiv preprint arXiv:2211.09527, 2022
arXiv 2022
-
[32]
Risk Analysis Techniques for Governed LLM-based Multi-Agent Systems
Reid, A., O'Callaghan, S., Carroll, L., and Caetano, T. Risk analysis techniques for governed llm-based multi-agent systems. arXiv preprint arXiv:2508.05687, 2025
-
[33]
Hierarchical Reasoning by Neural Circuits in the Frontal Cortex
Sarafyazd, M. and Jazayeri, M. Hierarchical reasoning by neural circuits in the frontal cortex. Science, 364(6441): eaav8911, 2019. doi:10.1126/science.aav8911. URL https://www.science.org/doi/abs/10.1126/science.aav8911
-
[34]
Toolformer: Language Models Can Teach Themselves to Use Tools
Schick, T., Dwivedi-Yu, J., Dessi, R., Raileanu, R., Lombrozo, T., Zettlemoyer, L., Liang, P., Hwang, J., Lai, C., Tsvetkov, Y., Ranzato, M., and Kim, Y. Toolformer: Language models can teach themselves to use tools, 2023. URL https://arxiv.org/abs/2302.04761
arXiv 2023
-
[35]
Judging the Judges: A Systematic Study of Position Bias in LLM-as-a-Judge
Shi, L., Ma, C., Liang, W., Diao, X., Ma, W., and Vosoughi, S. Judging the judges: A systematic study of position bias in LLM-as-a-judge. In Inui, K., Sakti, S., Wang, H., Wong, D. F., Bhattacharyya, P., Banerjee, B., Ekbal, A., Chakraborty, T., and Singh, D. P. (eds.), Proceedings of the 14th International Joint Conference on Natural Language Processing...
2025
-
[36]
Large Language Models Encode Clinical Knowledge
Singhal, K., Azizi, S., Tu, T., Mahdavi, S. S., Wei, J., Chung, H. W., Scales, N., Tanwani, A., Cole-Lewis, H., Pfohl, S., et al. Large language models encode clinical knowledge. Nature, 620(7972): 172--180, 2023
2023
-
[37]
Language models don't always say what they think: Unfaithful explanations in chain-of-thought prompting
Turpin, M., Michael, J., Perez, E., and Bowman, S. Language models don't always say what they think: Unfaithful explanations in chain-of-thought prompting. Advances in Neural Information Processing Systems, 36: 74952--74965, 2023
2023
-
[39]
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Xia, F., Chi, E., Le, Q. V., Zhou, D., et al. Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems, 35: 24824--24837, 2022
2022
-
[40]
Cloud Atlas: Efficient Fault Localization for Cloud Systems Using Language Models and Causal Insight
Xie, Z., Zheng, Y., Ottens, L., Zhang, K., Kozyrakis, C., and Mace, J. Cloud atlas: Efficient fault localization for cloud systems using language models and causal insight, 2024. URL https://arxiv.org/abs/2407.08694
-
[41]
Tree of thoughts: Deliberate problem solving with large language models
Yao, S., Yu, D., Zhao, J., Shafran, I., Griffiths, T., Cao, Y., and Narasimhan, K. Tree of thoughts: Deliberate problem solving with large language models. Advances in Neural Information Processing Systems, 36: 11809--11822, 2023a
2023
-
[42]
React: Synergizing reasoning and acting in language models
Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., and Cao, Y. React: Synergizing reasoning and acting in language models. In International Conference on Learning Representations (ICLR), 2023b
2023
-
[43]
Justice or prejudice? quantifying biases in llm-as-a-judge
Ye, J., Wang, Y., Huang, Y., Chen, D., Zhang, Q., Moniz, N., Gao, T., Geyer, W., Huang, C., Chen, P.-Y., et al. Justice or prejudice? quantifying biases in llm-as-a-judge. arXiv preprint arXiv:2410.02736, 2024
-
[44]
Judging llm-as-a-judge with mt-bench and chatbot arena
Zheng, L., Chiang, W.-L., Sheng, Y., Zhuang, S., Wu, Z., Zhuang, Y., Lin, Z., Li, Z., Li, D., Xing, E., et al. Judging llm-as-a-judge with mt-bench and chatbot arena. Advances in Neural Information Processing Systems, 36: 46595--46623, 2023
2023
-
[45]
Universal and Transferable Adversarial Attacks on Aligned Language Models
Zou, A., Wang, Z., Carlini, N., Nasr, M., Kolter, J. Z., and Fredrikson, M. Universal and transferable adversarial attacks on aligned language models. arXiv preprint arXiv:2307.15043, 2023
arXiv 2023
discussion (0)