Red teaming ChatGPT via jailbreaking: Bias, robustness, reliability and toxicity

Terry Yue Zhuo, Yujin Huang, Chunyang Chen, Zhenchang Xing · 2023

2 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

AgentCrypt: Advancing Privacy and (Secure) Computation in AI Agent Collaboration

cs.CR · 2025-12-08 · unverdicted · novelty 5.0

AgentCrypt introduces a deterministic three-tier privacy framework for AI agent collaboration that uses masking and homomorphic encryption to protect data independently of model accuracy.

TrustLLM: Trustworthiness in Large Language Models

cs.CL · 2024-01-10 · unverdicted · novelty 5.0

TrustLLM defines eight trustworthiness principles, creates a six-dimension benchmark, and evaluates 16 LLMs, showing that proprietary models generally lead, that some open-source models come close, and that over-calibration can hurt utility.

citing papers explorer

Showing 1 of 1 citing paper after filters.

AgentCrypt: Advancing Privacy and (Secure) Computation in AI Agent Collaboration

cs.CR · 2025-12-08 · unverdicted · none · ref 42

AgentCrypt introduces a deterministic three-tier privacy framework for AI agent collaboration that uses masking and homomorphic encryption to protect data independently of model accuracy.
