Traian Rebedea, Razvan Dinu, Makesh Sreedhar, Christopher Parisien, and Jonathan Cohen

Mbias: Mitigating bias in large language models while retaining context · 2023 · arXiv 2405.11290

1 Pith paper cites this work. Polarity classification is still being indexed.

representative citing papers

Auditing Proprietary Alignment in Large Language Models: A Comparative Framework Without a Ground-Truth Standard

cs.CL · 2026-06-07 · unverdicted · novelty 5.0

A comparative statistical framework is proposed to audit proprietary alignment in black-box LLMs by quantifying behavioral divergences from reference models rather than absolute correctness.

citing papers explorer

Showing 1 of 1 citing paper after filters.

Auditing Proprietary Alignment in Large Language Models: A Comparative Framework Without a Ground-Truth Standard

cs.CL · 2026-06-07 · unverdicted · none · ref 9
