A Unified Approach to Interpreting Model Predictions

Scott M. Lundberg and Su-In Lee

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

LLM Output Detectability and Task Performance Can be Jointly Optimized

cs.CL · 2026-05-02 · unverdicted · novelty 6.0

PUPPET jointly optimizes LLM outputs for high detectability and task performance via RL rewards from a detector and a task evaluator, outperforming watermarking on tasks while matching detectability.

Beyond Performance Disparities: A Three-Level Audit of Representational Harm in CelebA

cs.CY · 2026-05-14 · unverdicted · novelty 5.0

CelebA encodes gendered double standards of ageing and beauty that produce hyper-scrutiny of women and categorical exclusion of older men across labels, feature weights, and spatial attention.

citing papers explorer

Showing 2 of 2 citing papers.

LLM Output Detectability and Task Performance Can be Jointly Optimized · cs.CL · 2026-05-02 · unverdicted · none · ref 59
Beyond Performance Disparities: A Three-Level Audit of Representational Harm in CelebA · cs.CY · 2026-05-14 · unverdicted · none · ref 93
