Title resolution pending

DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models , author= · 2024

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

browse 3 citing papers

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

Is Escalation Worth It? A Decision-Theoretic Characterization of LLM Cascades

cs.LG · 2026-05-07 · unverdicted · novelty 6.0

Decision theory shows that LLM cascades are structurally limited by always incurring the cheap model's cost before deciding to escalate, with the best performance given by the envelope of pairwise cascades rather than fixed chains or many stages.

Adversarial Humanities Benchmark: Results on Stylistic Robustness in Frontier Model Safety

cs.CL · 2026-04-20 · unverdicted · novelty 6.0

Stylistic rewrites of harmful prompts raise attack success rates from 3.84% to 36.8-65% across 31 frontier models, indicating weak generalization in safety refusals.

Contrastive Analysis of Linguistic Representations in Large Language Model Outputs through Structured Synthetic Data Generation and Abstracted N-gram Associations

cs.CL · 2026-04-19 · unverdicted · novelty 6.0

A methodological framework detects subtle group-associated linguistic biases in LLM outputs by generating controlled synthetic minimal pairs, abstracting n-grams, and ranking high-signal fragments with a PMI variant for expert review.

citing papers explorer

Showing 3 of 3 citing papers.

Is Escalation Worth It? A Decision-Theoretic Characterization of LLM Cascades cs.LG · 2026-05-07 · unverdicted · none · ref 59
Decision theory shows that LLM cascades are structurally limited by always incurring the cheap model's cost before deciding to escalate, with the best performance given by the envelope of pairwise cascades rather than fixed chains or many stages.
Adversarial Humanities Benchmark: Results on Stylistic Robustness in Frontier Model Safety cs.CL · 2026-04-20 · unverdicted · none · ref 55
Stylistic rewrites of harmful prompts raise attack success rates from 3.84% to 36.8-65% across 31 frontier models, indicating weak generalization in safety refusals.
Contrastive Analysis of Linguistic Representations in Large Language Model Outputs through Structured Synthetic Data Generation and Abstracted N-gram Associations cs.CL · 2026-04-19 · unverdicted · none · ref 20
A methodological framework detects subtle group-associated linguistic biases in LLM outputs by generating controlled synthetic minimal pairs, abstracting n-grams, and ranking high-signal fragments with a PMI variant for expert review.

Title resolution pending

fields

years

verdicts

representative citing papers

citing papers explorer