In ChatGPT We Trust? Measuring and Characterizing the Reliability of ChatGPT. CoRR abs/2304.08979

Xinyue Shen, Zeyuan Chen, Michael Backes, Yang Zhang · 2023 · arXiv 2304.08979

6 Pith papers cite this work.


representative citing papers

Towards Agentic Runtime Healing

cs.SE · 2024-08-02 · unverdicted · novelty 7.0

Healer uses LLMs to dynamically generate and execute runtime error-handling code, with GPT-4 recovering from 72.8% of errors across four datasets.

Pop Quiz Attack: Black-box Membership Inference Attacks Against Large Language Models

cs.CR · 2026-05-07 · unverdicted · novelty 6.0

PopQuiz Attack infers LLM training data membership by turning examples into quiz questions and measuring answer accuracy, reaching 0.873 average ROC-AUC across six models and outperforming prior methods by 20.6%.

"Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models

cs.CR · 2023-08-07 · unverdicted · novelty 6.0

Real-world jailbreak prompts collected from the wild achieve up to 0.95 attack success rates against major LLMs including GPT-4, with some persisting for over 240 days.

Discerning Authorship in Online Health Communities: Experience, Trust, and Transparency Implications for Moderating AI

cs.HC · 2026-04-21 · unverdicted · novelty 5.0

People show little ability to distinguish AI-generated from human-written health advice in online communities; detection varies by health condition and relies on unreliable heuristics.

GUARD: Guideline Upholding Test through Adaptive Role-play and Jailbreak Diagnostics for LLMs

cs.CL · 2025-08-28 · unverdicted · novelty 5.0

GUARD automates the generation of guideline-violating questions and jailbreak diagnostics to test LLM compliance with government ethics guidelines; it is validated empirically on eight models and extended to vision-language models.

Red Skills or Blue Skills? A Dive Into Skills Published on ClawHub

cs.CL · 2026-03-19 · unverdicted · novelty 4.0

An analysis of ClawHub shows language-based functional divides among agent skills: over 30% of skills are flagged as suspicious, and submission-time documentation enables risk prediction with 73% accuracy.

citing papers explorer


Towards Agentic Runtime Healing cs.SE · 2024-08-02 · unverdicted · none · ref 50
Pop Quiz Attack: Black-box Membership Inference Attacks Against Large Language Models cs.CR · 2026-05-07 · unverdicted · none · ref 40
"Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models cs.CR · 2023-08-07 · unverdicted · none · ref 75
Discerning Authorship in Online Health Communities: Experience, Trust, and Transparency Implications for Moderating AI cs.HC · 2026-04-21 · unverdicted · none · ref 69
GUARD: Guideline Upholding Test through Adaptive Role-play and Jailbreak Diagnostics for LLMs cs.CL · 2025-08-28 · unverdicted · none · ref 21
Red Skills or Blue Skills? A Dive Into Skills Published on ClawHub cs.CL · 2026-03-19 · unverdicted · none · ref 16
