Persistent pre-training poisoning of LLMs. arXiv, abs/2410.13722, 2024

3 Pith papers cite this work. Polarity classification is still being indexed.

fields

cs.AI: 2 · cs.CL: 1

years

2026: 2 · 2025: 1

verdicts

unverdicted: 3

roles

background: 1

polarities

unclear: 1

representative citing papers

Iterative Finetuning is Mostly Idempotent

cs.AI · 2026-05-01 · unverdicted · novelty 6.0

Iterative self-finetuning of LLMs mostly fails to amplify seeded behavioral traits, with amplification limited to specific DPO setups and often harming coherence.

When AI reviews science: Can we trust the referee?

cs.AI · 2026-04-26 · unverdicted · novelty 6.0

AI peer review systems are vulnerable to prompt injections, prestige biases, assertion strength effects, and contextual poisoning, as demonstrated by a new attack taxonomy and causal experiments on real conference submissions.

citing papers explorer

  • Iterative Finetuning is Mostly Idempotent cs.AI · 2026-05-01 · unverdicted · none · ref 16

    Iterative self-finetuning of LLMs mostly fails to amplify seeded behavioral traits, with amplification limited to specific DPO setups and often harming coherence.

  • When AI reviews science: Can we trust the referee? cs.AI · 2026-04-26 · unverdicted · none · ref 16

    AI peer review systems are vulnerable to prompt injections, prestige biases, assertion strength effects, and contextual poisoning, as demonstrated by a new attack taxonomy and causal experiments on real conference submissions.

  • LLM Hypnosis: Exploiting User Feedback for Unauthorized Knowledge Injection to All Users cs.CL · 2025-07-03 · unverdicted · none · ref 15

    A single attacker can use strategic upvoting and downvoting on language model outputs to inject facts, security flaws, or fake news that persist in the model for all users after preference tuning.