arXiv:2212.08061 (2022), https://arxiv.org/abs/2212.08061

Shaikh, O · 2022 · arXiv 2212.08061

8 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

dataset 2

citation-polarity summary

use dataset 2

representative citing papers

Navigating the Sea of LLM Evaluation: Investigating Bias in Toxicity Benchmarks

cs.AI · 2026-05-11 · unverdicted · novelty 6.0

Toxicity benchmarks for LLMs produce inconsistent results when task type, input domain, or model changes, revealing intrinsic evaluation biases.

AI Failures in the Eyes of the Downstream Developer: A First Look at Concerns, Practices, and Challenges

cs.SE · 2025-03-25 · unverdicted · novelty 6.0

Mixed-methods study maps downstream developers' concerns, practices, and challenges with AI failures in PTM-based software.

Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models

cs.AI · 2024-08-01 · conditional · novelty 6.0

Empirical analysis shows scaling inference compute via strategies like tree search can be more efficient than scaling model parameters, with 7B models plus novel search outperforming 34B models.

A StrongREJECT for Empty Jailbreaks

cs.LG · 2024-02-15 · conditional · novelty 6.0

StrongREJECT provides a standardized benchmark and evaluator for jailbreak attacks that aligns better with human judgments than prior methods and reveals that successful jailbreaks often reduce model capabilities.

Jailbroken: How Does LLM Safety Training Fail?

cs.LG · 2023-07-05 · unverdicted · novelty 6.0

LLM safety training fails due to competing objectives and mismatched generalization, enabling new jailbreaks that succeed on all unsafe prompts from red-teaming sets in GPT-4 and Claude.

When Prompts Become Payloads: A Framework for Mitigating SQL Injection Attacks in Large Language Model-Driven Applications

cs.CR · 2026-05-11 · unverdicted · novelty 5.0

A framework with prompt sanitization, behavioral anomaly detection, and signature controls mitigates SQL injection in LLM-driven database apps, showing high accuracy on adversarial benchmarks.

Mitigating Watermark Forgery in Generative Models via Randomized Key Selection

cs.CR · 2025-07-10 · unverdicted · novelty 5.0

Randomized per-query key selection with single-key detection acceptance bounds forgery success rate independently of collected samples while preserving model utility.

LLM-Safety Evaluations Lack Robustness

cs.CR · 2025-03-04 · unverdicted · novelty 4.0

LLM safety evaluations are hindered by noise in dataset curation, automated red-teaming, response generation, and LLM-judge evaluation, making fair comparisons difficult and slowing progress.

citing papers explorer

Navigating the Sea of LLM Evaluation: Investigating Bias in Toxicity Benchmarks · cs.AI · 2026-05-11 · unverdicted · none · ref 25
AI Failures in the Eyes of the Downstream Developer: A First Look at Concerns, Practices, and Challenges · cs.SE · 2025-03-25 · unverdicted · none · ref 104
Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models · cs.AI · 2024-08-01 · conditional · none · ref 252
A StrongREJECT for Empty Jailbreaks · cs.LG · 2024-02-15 · conditional · none · ref 30
Jailbroken: How Does LLM Safety Training Fail? · cs.LG · 2023-07-05 · unverdicted · none · ref 44
When Prompts Become Payloads: A Framework for Mitigating SQL Injection Attacks in Large Language Model-Driven Applications · cs.CR · 2026-05-11 · unverdicted · none · ref 4
Mitigating Watermark Forgery in Generative Models via Randomized Key Selection · cs.CR · 2025-07-10 · unverdicted · none · ref 34
LLM-Safety Evaluations Lack Robustness · cs.CR · 2025-03-04 · unverdicted · none · ref 50
