Demystifying prompts in language models via perplexity estimation

Hila Gonen, Srini Iyer, Terra Blevins, Noah A Smith, Luke Zettlemoyer · 2022 · arXiv 2212.04037

6 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

From Competition to Collaboration: Designing Sustainable Mechanisms Between LLMs and Online Forums

cs.AI · 2026-02-04 · unverdicted · novelty 7.0

A new sequential interaction framework lets LLMs propose questions to forums, with simulations on real Stack Exchange data showing players can reach roughly half the utility of an ideal full-information scenario despite incentive misalignment.

Linking Extreme Discourse to Structural Polarization in Signed Interaction Networks

cs.SI · 2026-05-12 · unverdicted · novelty 6.0

A pipeline derives continuous signed edges from LLM stance scores on text and links discourse signals such as toxicity and extreme claims to changes in structural polarization measured by spectral and frustration scores on Reddit Brexit data.

SnapAudit: Active Auditing of Differentially Private In-Context Learning via Snapshot-Based Simulation

cs.CR · 2025-11-17 · conditional · novelty 6.0

SnapAudit decomposes DP-ICL into a deterministic snapshot stage and a stochastic noise stage, using bootstrap simulation to achieve 80-200x faster auditing and exposing privacy bound violations in existing Gaussian and embedding mechanisms.

Optimus: A Robust Defense Framework for Mitigating Toxicity while Fine-Tuning Conversational AI

cs.CR · 2025-07-08 · unverdicted · novelty 6.0

Optimus mitigates toxicity during LLM fine-tuning by combining repurposed LLM safety alignments for detection with synthetic data and DPO alignment, remaining effective even with highly biased classifiers and against attacks.

Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting

cs.CL · 2023-10-17 · conditional · novelty 6.0

LLMs are highly sensitive to prompt formatting in few-shot settings, with accuracy varying by up to 76 points across formats; FormatSpread samples formats to report performance intervals without model weights.

Through the Stealth Lens: Attention-Aware Defenses Against Poisoning in RAG

cs.CR · 2025-06-04 · unverdicted · novelty 5.0

Introduces NPAS and AV Filter using LLM attention weights to defend RAG against poisoning, reporting up to 20% accuracy gains while adaptive attacks reach 35% success.

citing papers explorer

Showing 6 of 6 citing papers.

From Competition to Collaboration: Designing Sustainable Mechanisms Between LLMs and Online Forums · cs.AI · 2026-02-04 · unverdicted · none · ref 22
Linking Extreme Discourse to Structural Polarization in Signed Interaction Networks · cs.SI · 2026-05-12 · unverdicted · none · ref 30
SnapAudit: Active Auditing of Differentially Private In-Context Learning via Snapshot-Based Simulation · cs.CR · 2025-11-17 · conditional · none · ref 3
Optimus: A Robust Defense Framework for Mitigating Toxicity while Fine-Tuning Conversational AI · cs.CR · 2025-07-08 · unverdicted · none · ref 26
Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting · cs.CL · 2023-10-17 · conditional · none · ref 66
Through the Stealth Lens: Attention-Aware Defenses Against Poisoning in RAG · cs.CR · 2025-06-04 · unverdicted · none · ref 22
