DiffCodeGen clusters code candidates by behavioral similarity on fuzzing-synthesized inputs and selects the medoid of the largest cluster, matching or exceeding prior test-time scaling methods at far lower token and time cost.
hub
How beginning programmers and code LLMs (mis)read each other
10 Pith papers cite this work. Polarity classification is still indexing.
hub tools
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 10
roles
background 4
polarities
background 4
representative citing papers
Developers using AI assistants exhibit more stable emotions and greater focus on code creation, evaluation, and verification, captured in a new four-dimensional S-IASE model from retrospective labeling of screen recordings, surveys, and interviews.
MLLMs given the same instructions as human participants achieve expert-level performance on perceiving stress in network visualizations and rely on similar visual proxies.
Mixed-methods study of 27 developers characterizes five Copilot chat interaction modes and ten needs linked to problem-solving styles and experience levels.
Introduces the Mechanism Plausibility Scale, a four-level framework separating generative sufficiency from mechanistic plausibility in LLM-based agent-based models.
In real human subjects, AI transparency impacts imperfectly cooperative interactions far more than personality traits, unlike simulations where both are comparably influential.
Southeast Asian immigrant mothers in Taiwan navigate structural marginalization to foster children's learning and transmit cultural values, yielding justice-oriented design implications for socio-technical systems at multiple levels.
Generative AI boosts attackers' ability to create harmful content at scale while also enabling defenders to detect threats, support users, and improve moderation processes.
ChatGPT o3-mini achieves 54.5% success on medium Codeforces tasks versus 18.1% for DeepSeek-R1, with both models performing similarly on easy tasks and poorly on hard ones.
A survey of user studies on LLM use in programming that identifies interaction behaviors, mixed benefits and weaknesses, and factors influencing human and task performance.
citing papers explorer
- Exploring MLLMs Perception of Network Visualization Principles
  MLLMs given the same instructions as human participants achieve expert-level performance on perceiving stress in network visualizations and rely on similar visual proxies.
- How Generative AI Empowers Attackers and Defenders Across the Trust & Safety Landscape
  Generative AI boosts attackers' ability to create harmful content at scale while also enabling defenders to detect threats, support users, and improve moderation processes.
- A Showdown of ChatGPT vs DeepSeek in Solving Programming Tasks
  ChatGPT o3-mini achieves 54.5% success on medium Codeforces tasks versus 18.1% for DeepSeek-R1, with both models performing similarly on easy tasks and poorly on hard ones.