Annalisa Szymanski, Noah Ziems, Heather A. Eicher-Miller, Toby Jia-Jun Li, Meng Jiang, and Ronald A · 2025 · arXiv 8359.371209

5 Pith papers cite this work. Polarity classification is still being indexed.


citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Using LLM-as-a-Judge/Jury to Advance Scalable, Clinically-Validated Safety Evaluations of Model Responses to Users Demonstrating Psychosis

cs.CL · 2026-03-20 · conditional · novelty 7.0

Seven clinician-informed safety criteria enable LLM-as-a-Judge to reach substantial agreement with human consensus (Cohen's κ up to 0.75) on evaluating LLM responses to users demonstrating psychosis.

IdeaBlocks: Expressing and Reusing Divergent Intents for Graphic Design Exploration using Generative AI

cs.HC · 2025-07-29 · unverdicted · novelty 7.0

IdeaBlocks modularizes divergent intents into Exploration Blocks with multi-level reuse options, enabling 2.13 times more images explored and 12.5% greater visual diversity than the baseline in a comparative user study.

MAESTRO: Adapting GUIs and Guiding Navigation with User Preferences in Conversational Agents with GUIs

cs.HC · 2026-04-07 · unverdicted · novelty 6.0

MAESTRO adds a shared preference memory plus GUI-adaptation and workflow-navigation mechanisms to conversational agents with GUIs and tests them in a 33-person movie-booking study.

Safe for Whom? Rethinking How We Evaluate the Safety of LLMs for Real Users

cs.AI · 2025-12-11 · unverdicted · novelty 6.0

LLM safety evaluations for personal advice must test responses against diverse user vulnerability profiles, since context-blind ratings overestimate safety and realistic prompt context does not fix the problem.

High-quality generation of dynamic game content via small language models: A proof of concept

cs.AI · 2026-01-30 · conditional · novelty 5.0

A proof of concept shows that fine-tuned small language models achieve adequate quality for real-time game content generation in a scoped RPG loop, using retry-until-success generation and LLM-as-judge evaluation.

citing papers explorer

Showing 5 of 5 citing papers.

Using LLM-as-a-Judge/Jury to Advance Scalable, Clinically-Validated Safety Evaluations of Model Responses to Users Demonstrating Psychosis cs.CL · 2026-03-20 · conditional · none · ref 63
IdeaBlocks: Expressing and Reusing Divergent Intents for Graphic Design Exploration using Generative AI cs.HC · 2025-07-29 · unverdicted · none · ref 46
MAESTRO: Adapting GUIs and Guiding Navigation with User Preferences in Conversational Agents with GUIs cs.HC · 2026-04-07 · unverdicted · none · ref 34
Safe for Whom? Rethinking How We Evaluate the Safety of LLMs for Real Users cs.AI · 2025-12-11 · unverdicted · none · ref 31
High-quality generation of dynamic game content via small language models: A proof of concept cs.AI · 2026-01-30 · conditional · none · ref 27
