Large-scale analysis of wild LLM chat logs finds that user interaction patterns stabilize quickly after initial use and correlate with long-term outcomes like retention, creating an agency paradox of limited exploration in unconstrained systems.
In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (Hamburg, Germany) (CHI ’23)
Canonical reference. 82% of citing Pith papers cite this work as background.
roles: background 11
representative citing papers
Persona-driven workflow and interface improve automated and human-AI red-teaming of generative AI by incorporating diverse perspectives into adversarial prompt creation.
uxCUA is a trained computer use agent that assesses GUI usability more accurately than larger models by learning to prioritize and execute important user interactions on labeled interface datasets.
Point&Grasp probabilistically integrates pointing and grasp gestures for out-of-reach object selection in MR, trained on a new ORG dataset, and outperforms single-cue baselines in user studies.
GROVE visualizes distributions of language model generations as overlapping paths through a text graph, with user studies showing that graph summaries aid structural judgments like diversity assessment while raw outputs remain better for details.
GUI agents can transform live web interfaces in real-time via DOM manipulations to deliver contextual assistance directly within the application.
A program synthesis system models collaborative physical activities from narrated demonstrations as editable programs, enabling users to teach, inspect, and correct them, with a study showing 70% success in refining soccer tactics programs.
Evalet applies functional fragmentation to deliver fragment-level qualitative analysis of LLM evaluations, with a user study showing 48% more misalignment detections than holistic scoring.
IdeaBlocks modularizes divergent intents into Exploration Blocks with multi-level reuse options, enabling 2.13 times more images explored and 12.5% greater visual diversity than baseline in a comparative user study.
Introduces xSense Design Cards with four types (Experience, Sensory, Technology, Exploration) to guide multisensory experience design and evaluation in HCI.
Babel is an efficient black-box jailbreaking framework that formalizes sparse safety attention heads via a mathematical obfuscation model and uses iterative distribution refinement to achieve higher attack success rates on models like GPT-4o and Claude-3-5-haiku with around 40 queries.
CanvasConvo presents a spatial canvas interface for branching LLM conversations, evaluated in a 5-7 day field study with 24 participants that found support for exploratory workflows.
A survey of 457 papers yields a six-dimensional design space for abstraction in interactive systems that reframes gulfs of execution and evaluation while articulating cognitive and design processes for bridging abstraction gaps.
Cripping AI is a proposed framework that dismantles ableist assumptions in AI by centering disabled ways of knowing and respecting disabled labor in co-creation.
A modular VR simulator supports four distinct micromobility vehicles on one hardware setup and a preliminary study finds unique riding experiences for each.
JARVIS delivers VLM-powered contextual AR guidance with state verification for cross-reality tasks, improving usability and success rates over baselines in a 14-person study.
A qualitative study with 22 creative writers finds that the reflective value of AI refusals depends on alignment with users' situational thinking phases, cognitive beliefs, and views of AI roles.
A critical incident technique study with 142 participants identifies mechanisms by which games create or block agender euphoria and supplies empirically grounded design criteria for gender-neutral play.
Adaptive Prompt Elicitation (APE) uses an information-theoretic framework to generate visual queries that elicit and compile user intent into better prompts for text-to-image models, showing improved alignment in benchmarks and a user study.
Polite chatbot feedback lowers psychological reactance and boosts behavioral intentions but lacks engagement, whereas verbal leakage heightens surprise and engagement at the expense of increased reactance.
A randomized trial found that a 45-minute prompt-based programming lesson produced modest non-significant performance gains and significant self-efficacy gains compared to code tracing.
Perspective paper proposing an integrated framework for automated decision systems that shifts priority from prediction accuracy to accounting for changes in organizational workflows and intervention effects.
AnimationDiff is a visual comparison tool that combines contextual scene viewing, overlay/side-by-side modes, filtering, and temporal lenses to help users select among generated 3D character animations.
OOPrompt reifies user intents into structured manipulable artifacts to enable modular and iterative prompting in LLM-based interactive systems.
citing papers explorer
- Beyond One Output: Visualizing and Comparing Distributions of Language Model Generations
  GROVE visualizes distributions of language model generations as overlapping paths through a text graph, with user studies showing that graph summaries aid structural judgments like diversity assessment while raw outputs remain better for details.
- Evalet: Evaluating Large Language Models through Functional Fragmentation
  Evalet applies functional fragmentation to deliver fragment-level qualitative analysis of LLM evaluations, with a user study showing 48% more misalignment detections than holistic scoring.
- Teaching Prompt-Based Programming with LLMs: A 45-Minute Lesson with Guided Practice for End-User Programmers
  A randomized trial found that a 45-minute prompt-based programming lesson produced modest non-significant performance gains and significant self-efficacy gains compared to code tracing.