Poller reduces LLM-human disagreement in evaluating Chinese poetry understanding by having LLMs role-play as authors, with reported error reductions of 94.55% and 89.53% on rhetorical techniques and defamiliarization.
Preconditioned Test-Time Adaptation for Out-of-Distribution Debiasing in Narrative Generation
3 Pith papers cite this work. Polarity classification is still indexing.
abstract
Although debiased large language models (LLMs) excel at handling known or low-bias prompts, they often fail on unfamiliar and high-bias prompts. We demonstrate via out-of-distribution (OOD) detection that these high-bias prompts cause a distribution shift, degrading static model performance. To enable real-time correction, we propose CAP-TTA, a test-time adaptation framework. CAP-TTA triggers context-aware LoRA updates only when a bias-risk score exceeds a set threshold. By utilizing an offline precomputed diagonal preconditioner, it ensures fast and stable optimization. Across multiple benchmarks and human evaluations, CAP-TTA effectively reduces toxicity/bias score with significantly lower latency than standard optimization methods (e.g., AdamW or SGD). Furthermore, it prevents catastrophic forgetting, and substantially improves narrative fluency over state-of-the-art baselines without compromising debiasing performance.
years
2026 3verdicts
UNVERDICTED 3representative citing papers
An image-semantic guided method enhances MLLMs for detecting AI-generated modern Chinese poetry by combining poem text with visual representations of content, achieving 85.65% Macro-F1 with Gemini and outperforming text baselines and RoBERTa.
BV-Blend blends prompt-local and semantic-cluster historical reward statistics via SEM-derived weights to stabilize critic-free RL advantage estimation.
citing papers explorer
-
Poller: Are LLMs Suitable for Evaluating the Poetry Understanding Task?
Poller reduces LLM-human disagreement in evaluating Chinese poetry understanding by having LLMs role-play as authors, with reported error reductions of 94.55% and 89.53% on rhetorical techniques and defamiliarization.
-
Seeing the Poem: Image-Semantic Detection of AI-Generated Modern Chinese Poetry with MLLMs
An image-semantic guided method enhances MLLMs for detecting AI-generated modern Chinese poetry by combining poem text with visual representations of content, achieving 85.65% Macro-F1 with Gemini and outperforming text baselines and RoBERTa.
-
BV-Blend: Uncertainty-Weighted Historical Baselines for Stable Critic-Free RL with Verifiable Rewards
BV-Blend blends prompt-local and semantic-cluster historical reward statistics via SEM-derived weights to stabilize critic-free RL advantage estimation.