Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference
4 Pith papers cite this work. Polarity classification is still indexing.
Citation roles: method (1)
Citation polarities: use method (1)
Years: 2026 (4)
citing papers explorer
- Green Shielding: A User-Centric Approach Towards Trustworthy AI
  Green Shielding introduces CUE criteria and the HCM-Dx benchmark to demonstrate that routine prompt variations systematically alter LLM diagnostic behavior along clinically relevant dimensions, producing Pareto-like tradeoffs in plausibility versus coverage.
- LPM 1.0: Video-based Character Performance Model
  LPM 1.0 generates infinite-length, identity-stable, real-time audio-visual conversational performances for single characters using a distilled causal diffusion transformer and a new benchmark.
- Capturing LLM Capabilities via Evidence-Calibrated Query Clustering
- TabPFN-3: Technical Report