Chain-of-thought prompting, by including intermediate reasoning steps in few-shot examples, elicits strong reasoning abilities in large language models on arithmetic, commonsense, and symbolic tasks.
hub Mixed citations
Stepanova, John Desnoyers-Stewart, Kristina Höök, and Bernhard E
Mixed citation behavior. Most common role is background (62%).
hub tools
citation-role summary
citation-polarity summary
representative citing papers
Users experience fast-food intimacy with Soul's AI boyfriend that conflicts with gradual cultural expectations, introduces technical uncertainty, and shifts emotional labor onto women.
GROVE visualizes distributions of language model generations as overlapping paths through a text graph, with user studies showing that graph summaries aid structural judgments like diversity assessment while raw outputs remain better for details.
Evalet applies functional fragmentation to deliver fragment-level qualitative analysis of LLM evaluations, with a user study showing 48% more misalignment detections than holistic scoring.
A qualitative study of South Korean parents shows that trauma and healing after learning a child is LGBTQ+ leads to identity reconstruction as supportive parents and more critical, protective informating practices.
CanvasConvo presents a spatial canvas interface for branching LLM conversations, evaluated in a 5-7 day field study with 24 participants that found support for exploratory workflows.
Malleable Prompting reifies subjective preferences from natural language into GUI widgets and modulates LLM token probabilities during decoding to enable controllable generation, with a user study showing improved precision and perceived controllability over standard prompting.
CogInstrument represents human reasoning as revisable cognitive motifs in graphical form to support iterative alignment with LLMs during planning tasks, with a N=12 study indicating gains in targeted revision, agency, and trust over standard dialogue interfaces.
Narrix helps novices identify and reuse narrative strategies from examples through visualization and strategy-steered generation, improving retention, confidence, and adaptation over chat interfaces in a 12-person study.
OOPrompt reifies user intents into structured manipulable artifacts to enable modular and iterative prompting in LLM-based interactive systems.
An interactive public hammock captures and replays biodata as embodied traces, with a field study of ten users indicating it fosters anonymous connection and appreciation for shared vitality.
Proof-of-concept shows fine-tuned small language models achieve adequate quality for real-time game content generation in a scoped RPG loop via retry-until-success and LLM-as-judge evaluation.
A survey reviewing benchmark data contamination in LLMs, its impact on evaluation, and alternative assessment approaches.
citing papers explorer
-
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Chain-of-thought prompting, by including intermediate reasoning steps in few-shot examples, elicits strong reasoning abilities in large language models on arithmetic, commonsense, and symbolic tasks.
-
Fast-Food Intimacy: How Chinese Women Navigate Soul's AI Boyfriend
Users experience fast-food intimacy with Soul's AI boyfriend that conflicts with gradual cultural expectations, introduces technical uncertainty, and shifts emotional labor onto women.
-
Beyond One Output: Visualizing and Comparing Distributions of Language Model Generations
GROVE visualizes distributions of language model generations as overlapping paths through a text graph, with user studies showing that graph summaries aid structural judgments like diversity assessment while raw outputs remain better for details.
-
Evalet: Evaluating Large Language Models through Functional Fragmentation
Evalet applies functional fragmentation to deliver fragment-level qualitative analysis of LLM evaluations, with a user study showing 48% more misalignment detections than holistic scoring.
-
Journeys of Parents with LGBTQ+ Children: How Trauma and Healing Reshape Identity and (Mis)Informating Practices
A qualitative study of South Korean parents shows that trauma and healing after learning a child is LGBTQ+ leads to identity reconstruction as supportive parents and more critical, protective informating practices.
-
Conversations in Space: Structuring Non-Linear LLM Interactions on a Canvas
CanvasConvo presents a spatial canvas interface for branching LLM conversations, evaluated in a 5-7 day field study with 24 participants that found support for exploratory workflows.
-
From Words to Widgets for Controllable LLM Generation
Malleable Prompting reifies subjective preferences from natural language into GUI widgets and modulates LLM token probabilities during decoding to enable controllable generation, with a user study showing improved precision and perceived controllability over standard prompting.
-
CogInstrument: Modeling Cognitive Processes for Bidirectional Human-LLM Alignment in Planning Tasks
CogInstrument represents human reasoning as revisable cognitive motifs in graphical form to support iterative alignment with LLMs during planning tasks, with a N=12 study indicating gains in targeted revision, agency, and trust over standard dialogue interfaces.
-
Narrix: Remixing Narrative Strategies from Examples for Story Writing
Narrix helps novices identify and reuse narrative strategies from examples through visualization and strategy-steered generation, improving retention, confidence, and adaptation over chat interfaces in a 12-person study.
-
OOPrompt: Reifying Intents into Structured Artifacts for Modular and Iterative Prompting
OOPrompt reifies user intents into structured manipulable artifacts to enable modular and iterative prompting in LLM-based interactive systems.
-
HeartSway: Exploring Biodata as Poetic Traces in Public Space
An interactive public hammock captures and replays biodata as embodied traces, with a field study of ten users indicating it fosters anonymous connection and appreciation for shared vitality.
-
High-quality generation of dynamic game content via small language models: A proof of concept
Proof-of-concept shows fine-tuned small language models achieve adequate quality for real-time game content generation in a scoped RPG loop via retry-until-success and LLM-as-judge evaluation.
-
Benchmark Data Contamination of Large Language Models: A Survey
A survey reviewing benchmark data contamination in LLMs, its impact on evaluation, and alternative assessment approaches.