DiffCodeGen clusters code candidates by behavioral similarity from fuzzing-synthesized inputs and selects the largest cluster's medoid, matching or exceeding prior test-time scaling methods with far less token and time cost.
Ringland, and Angela D
9 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 9roles
background 4polarities
background 4representative citing papers
Developers using AI assistants exhibit more stable emotions and greater focus on code creation, evaluation, and verification, captured in a new four-dimensional S-IASE model from retrospective labeling of screen recordings, surveys, and interviews.
MLLMs given the same instructions as human participants achieve expert-level performance on perceiving stress in network visualizations and rely on similar visual proxies.
Introduces the Mechanism Plausibility Scale, a four-level framework separating generative sufficiency from mechanistic plausibility in LLM-based agent-based models.
In real human subjects, AI transparency impacts imperfectly cooperative interactions far more than personality traits, unlike simulations where both are comparably influential.
Southeast Asian immigrant mothers in Taiwan navigate structural marginalization to foster children's learning and transmit cultural values, yielding justice-oriented design implications for socio-technical systems at multiple levels.
Generative AI boosts attackers' ability to create harmful content at scale while also enabling defenders to detect threats, support users, and improve moderation processes.
ChatGPT o3-mini achieves 54.5% success on medium Codeforces tasks versus 18.1% for DeepSeek-R1, with both models performing similarly on easy tasks and poorly on hard ones.
A survey of user studies on LLM use in programming that identifies interaction behaviors, mixed benefits and weaknesses, and factors influencing human and task performance.
citing papers explorer
-
Code Generation by Differential Test Time Scaling
DiffCodeGen clusters code candidates by behavioral similarity from fuzzing-synthesized inputs and selects the largest cluster's medoid, matching or exceeding prior test-time scaling methods with far less token and time cost.
-
How Do Developers Interact with AI? An Exploratory Study on Modeling Developer Programming Behavior
Developers using AI assistants exhibit more stable emotions and greater focus on code creation, evaluation, and verification, captured in a new four-dimensional S-IASE model from retrospective labeling of screen recordings, surveys, and interviews.
-
Exploring MLLMs Perception of Network Visualization Principles
MLLMs given the same instructions as human participants achieve expert-level performance on perceiving stress in network visualizations and rely on similar visual proxies.
-
Mechanism Plausibility in Generative Agent-Based Modeling
Introduces the Mechanism Plausibility Scale, a four-level framework separating generative sufficiency from mechanistic plausibility in LLM-based agent-based models.
-
Imperfectly Cooperative Human-AI Interactions: Comparing the Impacts of Human and AI Attributes in Simulated and User Studies
In real human subjects, AI transparency impacts imperfectly cooperative interactions far more than personality traits, unlike simulations where both are comparably influential.
-
Navigating Marginalization: Toward Justice-Oriented Socio-Technical Design for Parent-Child Learning among Southeast Asian Immigrant Mothers in Taiwan
Southeast Asian immigrant mothers in Taiwan navigate structural marginalization to foster children's learning and transmit cultural values, yielding justice-oriented design implications for socio-technical systems at multiple levels.
-
How Generative AI Empowers Attackers and Defenders Across the Trust & Safety Landscape
Generative AI boosts attackers' ability to create harmful content at scale while also enabling defenders to detect threats, support users, and improve moderation processes.
-
A Showdown of ChatGPT vs DeepSeek in Solving Programming Tasks
ChatGPT o3-mini achieves 54.5% success on medium Codeforces tasks versus 18.1% for DeepSeek-R1, with both models performing similarly on easy tasks and poorly on hard ones.
-
Understanding the Human-LLM Dynamic: A Literature Survey of LLM Use in Programming Tasks
A survey of user studies on LLM use in programming that identifies interaction behaviors, mixed benefits and weaknesses, and factors influencing human and task performance.