Title resolution pending

OpenReview · 2022

2 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

PACZero: PAC-Private Fine-Tuning of Language Models via Sign Quantization

cs.LG · 2026-05-07 · unverdicted · novelty 7.0

PACZero achieves zero mutual information privacy in LLM fine-tuning via sign-quantized subset-aggregated ZO gradients, delivering near non-private accuracy on SST-2 at I=0.

LLMs Know They're Wrong and Agree Anyway: The Shared Sycophancy-Lying Circuit

cs.LG · 2026-04-21 · unverdicted · novelty 6.0

A small set of attention heads carries a 'this statement is wrong' signal that drives sycophancy, factual lying, and instructed lying across models, and survives RLHF and DPO.

citing papers explorer

Showing 2 of 2 citing papers after filters.

PACZero: PAC-Private Fine-Tuning of Language Models via Sign Quantization
cs.LG · 2026-05-07 · unverdicted · none · ref 19
LLMs Know They're Wrong and Agree Anyway: The Shared Sycophancy-Lying Circuit
cs.LG · 2026-04-21 · unverdicted · none · ref 17
