{"total":10,"items":[{"citing_arxiv_id":"2606.19216","ref_index":16,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"No Two Developers Think Alike: How Problem-Solving Styles and Experience Shape Needs in Conversational Interaction with Copilot","primary_cat":"cs.SE","submitted_at":"2026-06-17T15:52:39+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Mixed-methods study of 27 developers characterizes five Copilot chat interaction modes and ten needs linked to problem-solving styles and experience levels.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.20473","ref_index":47,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Code Generation by Differential Test Time Scaling","primary_cat":"cs.SE","submitted_at":"2026-05-19T20:39:14+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"DiffCodeGen clusters code candidates by behavioral similarity from fuzzing-synthesized inputs and selects the largest cluster's medoid, matching or exceeding prior test-time scaling methods with far less token and time cost.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.12824","ref_index":1,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Mechanism Plausibility in Generative Agent-Based Modeling","primary_cat":"cs.MA","submitted_at":"2026-05-12T23:46:39+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Introduces the Mechanism Plausibility Scale, a four-level framework separating generative sufficiency from mechanistic plausibility in LLM-based agent-based 
models.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Using LLMs in simulations offers the tantalizing promise that weights and biases obtained by training on social data may contain relevant distributional information about human behavior, allowing for richer representations of human subjects [4, 61, 76]. On the other hand, critiques have also formed around models failing to capture the complete experiences of the human subjects they substitute [1], which leaves us with questions about if this is tied to the nature of LLMs, and if so, the question of if LLMs should be used at all. When using LLMs in modeling social phenomena, we are left with a few puzzles: For a given simulation, did the results emerge from some correctly retrieved social knowledge encoded in the LLM's weights? Do our agents model the human behavior we are interested in?"},{"citing_arxiv_id":"2604.15607","ref_index":1,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Imperfectly Cooperative Human-AI Interactions: Comparing the Impacts of Human and AI Attributes in Simulated and User Studies","primary_cat":"cs.CL","submitted_at":"2026-04-17T01:10:34+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"In real human subjects, AI transparency impacts imperfectly cooperative interactions far more than personality traits, unlike simulations where both are comparably influential.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"ative and joint impacts are underexplored in imperfectly cooperative scenarios, where people and AI only have partially aligned goals and objectives. 
This study compares a purely simulated dataset comprising 2,000 simulations and a parallel human subjects experiment involving 290 human participants to investigate these effects across two scenario categories: (1) hiring negotiations between human job candidates and AI hiring agents; and (2) human-AI transactions wherein AI agents may conceal information to maximize internal goals. We examine user Extraversion and Agreeableness alongside AI design characteristics, including Adaptability, Expertise, and chain-of-thought Transparency. Our causal discovery analysis"},{"citing_arxiv_id":"2604.06353","ref_index":21,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Navigating Marginalization: Toward Justice-Oriented Socio-Technical Design for Parent-Child Learning among Southeast Asian Immigrant Mothers in Taiwan","primary_cat":"cs.HC","submitted_at":"2026-04-07T18:31:05+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Southeast Asian immigrant mothers in Taiwan navigate structural marginalization to foster children's learning and transmit cultural values, yielding justice-oriented design implications for socio-technical systems at multiple levels.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"communities. In response, HCI researchers have proposed justice as an evolving, situated framework for addressing lived experiences of marginalization and oppression through technology design [6, 11, 23, 28, 29, 32]. Others have extended these frameworks by examining how harms emerge and how justice/injustice is conceptualized in the literature [21, 23], arguing for reflexive practices that respond to injustice across individual, community, and structural levels. Dombrowski et al. 
adapted Lötter's six dimensions (transformation, recognition, enablement, reciprocity, distribution, and accountability) to guide justice-oriented design [29]. This perspective pushes designers to move beyond merely acknowledging"},{"citing_arxiv_id":"2604.16393","ref_index":48,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"How Do Developers Interact with AI? An Exploratory Study on Modeling Developer Programming Behavior","primary_cat":"cs.SE","submitted_at":"2026-03-28T14:37:19+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Developers using AI assistants exhibit more stable emotions and greater focus on code creation, evaluation, and verification, captured in a new four-dimensional S-IASE model from retrospective labeling of screen recordings, surveys, and interviews.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2601.06033","ref_index":127,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"How Generative AI Empowers Attackers and Defenders Across the Trust & Safety Landscape","primary_cat":"cs.HC","submitted_at":"2025-11-10T22:00:58+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Generative AI boosts attackers' ability to create harmful content at scale while also enabling defenders to detect threats, support users, and improve moderation processes.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"counterfeit products [60], cryptocurrency giveaways [61, 64], and \"pig butchering\" attacks [1, 80], among others. The profit incentive for scammers has led to an arms race in detection and evasion, with increasingly sophisticated scam infrastructure and content [128]. 
Common detection strategies include content-based classifiers, account-based classifiers [127], and browser-based warnings [28], most of which are custom to each platform. Violent extremism (VE) glorifies, encourages, or facilitates violence to support ideological goals [5, 70]. Global definitions of VE and the specific entities it encompasses vary by country. Similar to CSAM, platforms rely on hash-based detection, with the Global Internet Forum to Counter Terrorism (GIFCT) acting as a hash-sharing clearinghouse."},{"citing_arxiv_id":"2506.14611","ref_index":1,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Exploring MLLMs Perception of Network Visualization Principles","primary_cat":"cs.HC","submitted_at":"2025-06-17T15:10:01+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"MLLMs given the same instructions as human participants achieve expert-level performance on perceiving stress in network visualizations and rely on similar visual proxies.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2503.13549","ref_index":12,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"A Showdown of ChatGPT vs DeepSeek in Solving Programming Tasks","primary_cat":"cs.SE","submitted_at":"2025-03-16T14:35:36+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"ChatGPT o3-mini achieves 54.5% success on medium Codeforces tasks versus 18.1% for DeepSeek-R1, with both models performing similarly on easy tasks and poorly on hard ones.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2410.01026","ref_index":62,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Understanding the Human-LLM Dynamic: A Literature Survey of LLM Use in Programming 
Tasks","primary_cat":"cs.SE","submitted_at":"2024-10-01T19:34:46+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"A survey of user studies on LLM use in programming that identifies interaction behaviors, mixed benefits and weaknesses, and factors influencing human and task performance.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}