British Journal of Mathematical and Statistical Psychology 61(1), 29–48 (2008)

Gwet, K · 2008 · DOI 10.1348/000711006x126600

10 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

EMPATH: A Multilingual Auditor-Judge Benchmark for Safety Evaluation of Emotional-Support Chatbots

cs.AI · 2026-06-29 · unverdicted · novelty 7.0

The paper presents EMPATH, a new multilingual multi-turn benchmark for safety evaluation of emotional-support chatbots that uses separate auditor and judge models and releases its pipeline and rubrics.

Off-Distribution Voices: Fanfiction Subgenres as Universal Vernacular Jailbreaks for Aligned LLMs

cs.CL · 2026-06-03 · unverdicted · novelty 7.0

Fanfiction subgenres from AO3 function as universal register-based jailbreaks, raising mean attack success rate from 0.278 to 0.731 across eight aligned LLMs on HarmBench and JailbreakBench.

The Harder Text Embedding Benchmark (HTEB): Beyond One-dimensional Static Robustness

cs.CL · 2026-05-27 · unverdicted · novelty 7.0

HTEB introduces dynamic, multi-axis evaluation of text embedding robustness using LLM transformations, finding decoupled profiles across models and that scaling does not close all robustness gaps.

RECOM: A Validity Discrimination Tradeoff in Automatic Metrics for Open Ended Reddit Question Answering

cs.CL · 2026-06-17 · unverdicted · novelty 6.0

RECOM dataset shows automatic metrics for open-ended Reddit QA exhibit a validity-discrimination tradeoff, with cosine similarity strong on validity but weak on model ranking, and BERTScore showing the reverse pattern after length control.

Mixed-Modality Dual Face-Hair Retrieval

cs.CV · 2026-06-02 · unverdicted · novelty 6.0

Introduces DFHR task, DFHR-Bench with over 180K triplets, and MFHC framework for mixed-modality dual face-hair retrieval.

Mitigating Prompt-Induced Cognitive Biases in General-Purpose AI for Software Engineering

cs.SE · 2026-04-18 · unverdicted · novelty 6.0

A prompting method that forces GPAI models to state SE best practices before deciding reduces prompt-induced cognitive biases by 51% on average across eight tested biases.

Usability Analysis of Configurator User Interfaces with Multimodal Large Language Models

cs.SE · 2026-05-28 · unverdicted · novelty 5.0

Multimodal LLMs applied to 16 real-world configurators using 18 synthesized criteria can identify usability issues and generate actionable suggestions, with human review confirming reliability.

Cognitive Agency Surrender: Defending Epistemic Sovereignty via Scaffolded AI Friction

cs.HC · 2026-03-23 · unverdicted · novelty 5.0

Analysis of 1,223 AI-HCI papers shows declining focus on human epistemic sovereignty and rising optimization of autonomous agents, leading to a proposal for scaffolded cognitive friction via multi-agent systems to preserve human cognitive agency.

DialToM: A Theory of Mind Benchmark for Forecasting State-Driven Dialogue Trajectories

cs.CL · 2026-04-22

Multilingual Training and Evaluation Resources for Vision-Language Models

cs.CL · 2026-04-20

citing papers explorer

EMPATH: A Multilingual Auditor-Judge Benchmark for Safety Evaluation of Emotional-Support Chatbots · cs.AI · 2026-06-29 · unverdicted · none · ref 16
The paper presents EMPATH, a new multilingual multi-turn benchmark for safety evaluation of emotional-support chatbots that uses separate auditor and judge models and releases its pipeline and rubrics.
Off-Distribution Voices: Fanfiction Subgenres as Universal Vernacular Jailbreaks for Aligned LLMs · cs.CL · 2026-06-03 · unverdicted · none · ref 12
Fanfiction subgenres from AO3 function as universal register-based jailbreaks, raising mean attack success rate from 0.278 to 0.731 across eight aligned LLMs on HarmBench and JailbreakBench.
The Harder Text Embedding Benchmark (HTEB): Beyond One-dimensional Static Robustness · cs.CL · 2026-05-27 · unverdicted · none · ref 23
HTEB introduces dynamic, multi-axis evaluation of text embedding robustness using LLM transformations, finding decoupled profiles across models and that scaling does not close all robustness gaps.
RECOM: A Validity Discrimination Tradeoff in Automatic Metrics for Open Ended Reddit Question Answering · cs.CL · 2026-06-17 · unverdicted · none · ref 37
RECOM dataset shows automatic metrics for open-ended Reddit QA exhibit a validity-discrimination tradeoff, with cosine similarity strong on validity but weak on model ranking, and BERTScore showing the reverse pattern after length control.
Mixed-Modality Dual Face-Hair Retrieval · cs.CV · 2026-06-02 · unverdicted · none · ref 21
Introduces DFHR task, DFHR-Bench with over 180K triplets, and MFHC framework for mixed-modality dual face-hair retrieval.
Mitigating Prompt-Induced Cognitive Biases in General-Purpose AI for Software Engineering · cs.SE · 2026-04-18 · unverdicted · none · ref 27
A prompting method that forces GPAI models to state SE best practices before deciding reduces prompt-induced cognitive biases by 51% on average across eight tested biases.
Usability Analysis of Configurator User Interfaces with Multimodal Large Language Models · cs.SE · 2026-05-28 · unverdicted · none · ref 10
Multimodal LLMs applied to 16 real-world configurators using 18 synthesized criteria can identify usability issues and generate actionable suggestions, with human review confirming reliability.
Cognitive Agency Surrender: Defending Epistemic Sovereignty via Scaffolded AI Friction · cs.HC · 2026-03-23 · unverdicted · none · ref 103
Analysis of 1,223 AI-HCI papers shows declining focus on human epistemic sovereignty and rising optimization of autonomous agents, leading to a proposal for scaffolded cognitive friction via multi-agent systems to preserve human cognitive agency.
DialToM: A Theory of Mind Benchmark for Forecasting State-Driven Dialogue Trajectories · cs.CL · 2026-04-22 · unreviewed · ref 18
Multilingual Training and Evaluation Resources for Vision-Language Models · cs.CL · 2026-04-20 · unreviewed · ref 16
