Determinants of LLM-assisted decision-making
7 Pith papers cite this work. Polarity classification is still indexing.
citation summary
verdicts: UNVERDICTED 7
roles: background 2
polarities: background 2
citing papers explorer
- OS-SPEAR: A Toolkit for the Safety, Performance, Efficiency, and Robustness Analysis of OS Agents
OS-SPEAR is a new evaluation toolkit that tests 22 OS agents and identifies trade-offs between efficiency and safety or robustness.
- Geometry-Calibrated Conformal Abstention for Language Models
Geometry-calibrated conformal abstention lets language models abstain from uncertain queries with finite-sample guarantees on both participation rate and conditional correctness of answers.
- Efficient Federated Search for Retrieval-Augmented Generation using Lightweight Routing
RAGRoute introduces a neural router for federated RAG that dynamically selects relevant sources, reducing communication by up to 80.65% and latency by 52.50% while preserving accuracy on three benchmarks.
- LLMs are the Ideal Candidate for Mixed-Initiative Game Design Pillar Workflows
Large language models can meaningfully support the creation and decision-making processes for game design pillars in mixed-initiative workflows, as shown by a prototype tested in a game jam and with expert interviews.
- User Detection and Response Patterns of Sycophantic Behavior in Conversational AI
Reddit analysis shows users detect AI sycophancy through comparisons and consistency checks, apply mitigation prompts, and sometimes seek affirmative responses for support, indicating context-aware design is better than total elimination.
- A closer look at how large language models trust humans: patterns and biases
Across 43,200 simulations with five LLMs and five scenarios, model trust in humans aligns with human-like patterns driven by trustworthiness dimensions and is sometimes biased by age, gender, and religion.
- Enhancing Trust in Large Language Models via Uncertainty-Calibrated Fine-Tuning
Uncertainty-aware fine-tuning with a decision-theory-based loss produces better-calibrated uncertainty estimates than standard training on free-form QA tasks.