Self-evolving rubric with anti-gaming fitness reveals that objective capability scaling fails to transfer to subjective LLM behaviors, with advice-restraint as the universal lowest dimension that can regress.
Mentalchat16k: A benchmark dataset for conversational mental health assistance
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 3verdicts
UNVERDICTED 3roles
dataset 1polarities
use dataset 1representative citing papers
Proposes causal risk minimization via higher-order moment-balancing error decomposition and attribute projection for high-dimensional treatments, with experiments on continuous, discrete, and text data.
Guardian-as-an-Advisor prepends risk labels and explanations from a guardian model to queries, improving LLM safety compliance and reducing over-refusal while adding minimal compute overhead.
citing papers explorer
-
Does Capability Transfer to Subjective Behavior -- and Would Our Instruments Tell Us? A Self-Evolving, Trust-by-Construction Evaluation Paradigm
Self-evolving rubric with anti-gaming fitness reveals that objective capability scaling fails to transfer to subjective LLM behaviors, with advice-restraint as the universal lowest dimension that can regress.
-
Causal Risk Minimization for High-Dimensional Treatments
Proposes causal risk minimization via higher-order moment-balancing error decomposition and attribute projection for high-dimensional treatments, with experiments on continuous, discrete, and text data.
-
Guardian-as-an-Advisor: Advancing Next-Generation Guardian Models for Trustworthy LLMs
Guardian-as-an-Advisor prepends risk labels and explanations from a guardian model to queries, improving LLM safety compliance and reducing over-refusal while adding minimal compute overhead.