{"total":103,"items":[{"citing_arxiv_id":"2606.30338","ref_index":15,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Sequential Fairness Auditing with Limited Output Access","primary_cat":"cs.AI","submitted_at":"2026-06-29T14:17:33+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"The paper introduces a sequential generalized likelihood-ratio test framework for auditing Statistical Parity and Equal Opportunity fairness metrics under limited model query access.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.28574","ref_index":28,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Correct codes for the wrong reasons? validating LLMs as measurement instruments for theoretical constructs","primary_cat":"cs.CL","submitted_at":"2026-06-26T19:58:39+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Grain calibration decomposes theoretical constructs into clause-level components, tests each with extractive evidence, and combines results through explicit theory-derived rules to validate LLM coding beyond agreement with human annotators.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.28063","ref_index":65,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"How to deal with machine learning bias in economic history","primary_cat":"econ.GN","submitted_at":"2026-06-26T13:10:11+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"The paper guides ML use in economic history, identifies systematic prediction bias that distorts coefficients, and shows debiasing via small expert-labeled samples can correct it while preserving 
scale.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.25923","ref_index":54,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"$\\text{DT}^2$: Decision-Targeted Digital Twins","primary_cat":"cs.LG","submitted_at":"2026-06-24T15:02:24+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"DT² trains digital twins to preserve pairwise policy rankings from fitted Q-evaluation on offline data rather than minimizing one-step transition errors, improving policy ranking and reducing decision regret.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.23204","ref_index":30,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Unmasking LAION-5B: Age, Gender, Race, and Emotion Biases in Large-Scale Image Datasets","primary_cat":"cs.CV","submitted_at":"2026-06-22T11:49:23+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Empirical audit of LAION-2B-en and LAION-2B-multi finds overrepresentation of young adults, White people, and males plus stereotypical emotion associations across two attribute classifiers.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.23057","ref_index":1,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Who Owns the AI Recommendation? 
A Multi-Industry Empirical Map of Brand Category Ownership Across Large Language Models","primary_cat":"cs.IR","submitted_at":"2026-06-22T09:10:54+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Empirical study of LLM brand recommendations across industries finds moderate concentration (mean Gini 0.28) and low cross-model agreement (41.6%) on top brands.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.21195","ref_index":2,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Beyond Hooking Onto the World: Referential Profiles and the Numerical Structure of LLM Grounding","primary_cat":"cs.CL","submitted_at":"2026-06-19T08:06:29+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"LLMs realize derivative referential profiles through distributed numerical structures in their parameters and activations, indirectly supported by mechanistic interpretability findings.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.18656","ref_index":28,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"The Wrong Kind of Right: Quantifying and Localizing Misfired Alignment in LLMs","primary_cat":"cs.CL","submitted_at":"2026-06-17T03:53:42+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"LLMs exhibit misfired alignment on stereotype questions at 4.7-18.9% rates on the new VETO benchmark of 2,032 contrastive pairs, unlike humans at 0%, due to overgeneralized safety cues after instruction 
tuning.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.17831","ref_index":28,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Accountability in Autonomous Drone-Based Firefighting: Insights From a Field Trial","primary_cat":"cs.RO","submitted_at":"2026-06-16T12:01:24+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Field trials of autonomous drones in firefighting reveal uncertainty in accountability attribution, identifying two challenges via Bovens' framework and offering integration recommendations.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.17464","ref_index":106,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"CheckMIABench: Firm Foundations For Membership Inference Attacks on Language Models","primary_cat":"cs.LG","submitted_at":"2026-06-16T03:26:15+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"CheckMIABench converts LLMs with intermediate checkpoints into clean MIA testbeds by using pre- and post-checkpoint training data from the same distribution and evaluates published attacks on Pythia and OLMo models while releasing an open-source library.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.12922","ref_index":10,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Polar: A Benchmark for Evaluating Political Bias in LLMs","primary_cat":"cs.CL","submitted_at":"2026-06-11T05:26:28+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Polar is a new cross-context benchmark showing LLM political bias measurements are not fixed but vary with country, issue, 
model, and language.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.11316","ref_index":41,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Sch\\\"utzen: Evaluating LLM Safety in Bulgarian and German Contexts","primary_cat":"cs.CL","submitted_at":"2026-06-09T18:01:19+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Schützen is a German-Bulgarian LLM safety dataset showing pronounced cross-language differences in model safety behavior.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.09401","ref_index":193,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Benchmarking Empirical Privacy Protection for Adaptations of Large Language Models","primary_cat":"cs.LG","submitted_at":"2026-06-08T12:21:02+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Empirical benchmarks show distribution similarity between adaptation and pretraining data increases practical privacy leakage in DP-adapted LLMs at fixed theoretical guarantees, with LoRA providing strongest protection for OOD cases.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.08076","ref_index":43,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"\"I understand your perspective\": LLM Persuasion and Sycophancy through the Lens of Communicative Action Theory","primary_cat":"cs.CL","submitted_at":"2026-06-06T09:54:31+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"LLMs outperform humans in expressing illocutionary intents and sycophancy in successful persuasive counter-arguments from ChangeMyView, with crowd workers 
preferring LLM versions.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.07969","ref_index":5,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Neutrality Bites: Gender Representation in AI-Generated Animal Stories","primary_cat":"cs.CL","submitted_at":"2026-06-06T04:04:52+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"LLMs exhibit masculine bias when assigning gender to animal characters in generated stories, with neutrality often resulting in erasure of feminine perspectives.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.06694","ref_index":38,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"The Geography of Algorithmic Judgment: LLM Intermediaries, Place Identity, and Racial Steering in Housing Search","primary_cat":"cs.LG","submitted_at":"2026-06-04T20:17:58+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Behavioral audit finds emergent, city-dependent racial steering in LLM housing recommendations that changes with user identity and preference context.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.09881","ref_index":129,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Toward Calibrated, Fair, and accurate Deepfake Detection","primary_cat":"cs.LG","submitted_at":"2026-06-03T05:44:29+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Face-Feature Tuning is a label-free logit remapping method that reduces FPR/TPR gaps across groups in deepfake detection while preserving overall 
accuracy.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.04302","ref_index":70,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"LazyAttention: Efficient Retrieval-Augmented Generation with Deferred Positional Encoding","primary_cat":"cs.CL","submitted_at":"2026-06-03T00:12:22+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"LazyAttention kernelizes deferred positional encoding to enable zero-copy, position-agnostic KV cache reuse, delivering 1.37× lower TTFT and 1.40× higher throughput than Block-Attention under skewed document distributions while preserving output quality.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.04152","ref_index":2,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Thinking Through Signs: PEEL as a Semiotic Scaffolding for Epistemically Accountable AI-Enabled Research","primary_cat":"cs.AI","submitted_at":"2026-06-02T19:19:52+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"PEEL is proposed as a semiotic scaffolding combining deterministic distant reading and LLM analysis to reveal systematic distortions in quantity, term frequency, and epistemic voice within AI condensations of research texts.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.02973","ref_index":61,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Chatbots Output Meaningful (but Problematic) Language","primary_cat":"cs.CL","submitted_at":"2026-06-02T00:24:42+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"LLM outputs are meaningful according to standard theories of human 
language, without requiring anthropomorphic assumptions about the models.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.02911","ref_index":2,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"The Ghost Annotator: a Framework to Explore Human Label Variation in Content Moderation through Conformal Prediction","primary_cat":"cs.CL","submitted_at":"2026-06-01T21:32:37+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"The Ghost Annotator framework applies conformal prediction and collaborative filtering representations to measure LLM divergence from human annotations across four models and datasets, revealing higher confidence in misaligned cases and consistent demographic misalignment.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.02755","ref_index":12,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Acceptance-Test-Driven Evaluation Protocols for Business-Centric LLM Systems","primary_cat":"cs.SE","submitted_at":"2026-06-01T18:21:10+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"The paper introduces a red-train-green lifecycle and governance metric stack that adapts acceptance testing to LLM systems for business use.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.01929","ref_index":115,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"VET: A Framework for Analyzing AI Discourse","primary_cat":"cs.AI","submitted_at":"2026-06-01T08:59:03+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Introduces the VET framework to categorize and critique polarized AI 
narratives including hype, doom, denial, and normalcy.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.01045","ref_index":77,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Child-directed speech facilitates production, not comprehension, in BabyLMs","primary_cat":"cs.CL","submitted_at":"2026-05-31T06:27:58+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"CDS-trained BabyLMs show earlier and more appropriate production in a new frame-completion task while FineWeb-edu models lead on comprehension benchmarks, indicating current tests underestimate CDS benefits.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.00873","ref_index":4,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Prompts for Public-Sector LLMs Should Be Governed as Commons","primary_cat":"cs.CY","submitted_at":"2026-05-30T20:01:53+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Prompts for public-sector LLMs encode value-laden decisions and should be governed through community-maintained Prompt Commons repositories with provenance, licensing, and moderation.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.00250","ref_index":9,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Effects of Varying LLM Access on Essay Writing Behavior","primary_cat":"cs.CL","submitted_at":"2026-05-29T18:30:30+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Pilot experiment shows limited LLM access maintains higher student ownership and strategic use than unlimited access, with no difference in essay 
quality.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.31167","ref_index":4,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"LLM-FACETS: A Privacy-Preserving Framework for Evaluating LLM Transparency and Accountability","primary_cat":"cs.AI","submitted_at":"2026-05-29T11:20:47+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Introduces LLM-FACETS, a privacy-preserving open-source framework for LLM evaluation using deterministic metrics locally, LLM-judge metrics with user-controlled APIs, and mechanisms for uncertainty visualization and hallucination detection.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.31021","ref_index":6,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"A Persona-Based Evaluation Framework for Pluralistic Alignment in Generative AI","primary_cat":"cs.AI","submitted_at":"2026-05-29T08:54:09+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Proposes a state-space constrained emulation framework for pluralistic AI evaluation using synthetic cognitive profiles and reports instability in persona coherence under sequential and perturbed inference.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.30169","ref_index":12,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Dissociative Identity: Language Model Agents Lack Grounding for Reputation Mechanisms","primary_cat":"cs.CY","submitted_at":"2026-05-28T16:20:19+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"LM agents' changeable modules prevent persistent identity and sanction sensitivity, making 
reputation mechanisms structurally inapplicable and requiring protocol-based behavioral harnesses instead.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.29365","ref_index":34,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Casual as an Anchor: Resolving Supervision Misalignment in Formality Transfer Dataset","primary_cat":"cs.CL","submitted_at":"2026-05-28T05:07:02+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"The authors introduce a three-level formality spectrum (informal, casual, formal) and the 3LF dataset to correct supervision misalignment in formality transfer, reporting large gains in informal-to-formal performance on models including GPT variants.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.27564","ref_index":21,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"The Future of Facts: Tracing the Factual Generation-Verification Gap","primary_cat":"cs.CL","submitted_at":"2026-05-26T18:36:42+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Empirical tracing across model families shows verification precedes and outlasts generation for facts, with updates producing simultaneous verification of old and new answers.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.23676","ref_index":24,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"AI at the Front Lines of Platform Governance: Using LLMs to Support Illegal Content Reporting under the Digital Services 
Act","primary_cat":"cs.HC","submitted_at":"2026-05-22T14:22:43+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"EvalAI providing pro/con arguments improves provision-level accuracy and reduces misclassification distance in DSA illegal content reporting under AI error conditions versus conventional XAI.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"[51, 77, 105], but they remain prone to hallucinations and misclassifications [60, 80], raising ethical concerns. Salimzadeh et al. [119] show that complexity and uncertainty shape user performance in AI-supported tasks; in legal content reporting, this makes meaningful contestability especially important because interfaces must enable users to understand, challenge, and revise automated reasoning [24, 69, 133]. A central challenge in AI-supported reporting is not only whether AI assistance is present, but how it is structured and how susceptible users become to inevitable output errors. Conventional explainability often follows a recommend-and-justify logic, foregrounding a single predicted option and rationale.
Such designs may reduce friction, but can also anchor users to erroneous outputs when the reasoning is difficult to verify"},{"citing_arxiv_id":"2605.21299","ref_index":47,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Tracing the ongoing emergence of human-like reasoning in Large Language Models","primary_cat":"cs.CL","submitted_at":"2026-05-20T15:28:52+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"LLMs function as accurate semantic processors for conditionals but do not replicate the pragmatic inferences that define human reasoning.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.21035","ref_index":10,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"The Quiet Path from Seemingly Minor Design Errors to Workplace AI Incidents","primary_cat":"cs.HC","submitted_at":"2026-05-20T11:13:45+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Empirical analysis of 1,524 AI incident reports shows 83% arise from worker-AI trait misalignments, with 74% of those traceable to developers prioritizing efficiency over precision or personalization.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.20512","ref_index":16,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Framing an AI with Values Reduces AI Reliance in AI-supported Writing Tasks","primary_cat":"cs.HC","submitted_at":"2026-05-19T21:33:20+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"An online experiment finds that showing users an overview of an AI's values reduces reliance on AI suggestions during writing 
tasks.","context_count":1,"top_context_role":"background","top_context_polarity":"support","context_text":"[15] Gábor Bella, Paula Helm, Gertraud Koch, and Fausto Giunchiglia. 2024. Tackling Language Modelling Bias in Support of Linguistic Diversity. In Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency (Rio de Janeiro, Brazil) (FAccT '24). Association for Computing Machinery, New York, NY, USA, 562-572. doi:10.1145/3630106.3658925 [16] Emily M. Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell. 2021. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (Virtual Event, Canada) (FAccT '21). Association for Computing Machinery, New York, NY, USA, 610-623. doi:10."},{"citing_arxiv_id":"2605.16993","ref_index":16,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Adversarial Fragility and Language Vulnerability in Clinical AI: A Systematic Audit of Diagnostic Collapse Under Imperceptible Perturbations and Cross-Lingual Drift in Low-Resource Healthcare Settings","primary_cat":"cs.CY","submitted_at":"2026-05-16T13:33:47+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"The study shows clinical AI accuracy collapsing from 89% to 62% on X-rays under imperceptible adversarial perturbations and from 85% to 55% on clinical cases in Nigerian Pidgin and Yoruba-inflected English.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.16538","ref_index":2,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"LLMs in Qualitative Research: Opportunities, Limitations, and Practical
Considerations","primary_cat":"cs.HC","submitted_at":"2026-05-15T18:33:46+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":2.0,"formal_verification":"none","one_line_summary":"The paper outlines opportunities, limitations, and practical parameters for integrating LLMs into qualitative research while aligning with epistemological commitments like reflexivity and interpretive judgment.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.16197","ref_index":111,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Position: AI as Part of Self -- Extending the Mind Requires Cognitive Co-Regulation","primary_cat":"cs.HC","submitted_at":"2026-05-15T17:11:56+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"The paper claims that alignment requires treating AI as part of the self through cognitive co-regulation, identifying risks like deskilling and automation bias while drawing on System 0 cognition theory.","context_count":1,"top_context_role":"background","top_context_polarity":"support","context_text":"of human-AI interactions. This, therefore, avoids resolving the question of whether AI systems truly possess these intrinsic properties. This functional-level response has independent grounding in radical embodied cognitive science, which argues that cognition is constituted by agent-environment dynamics rather than internal representations alone [110]. Chater et al. [111] show that shared intentionality within humans emerges from the interaction process itself, whereby humans choose the tacit agreement formed via hypothetical negotiations. Symbiotic cognition functions similarly in human-AI interactions: cognition is realized dynamically through interactions with AI systems.
Recent work corroborates that human-AI interaction alters human perceptual and cognitive patterns"},{"citing_arxiv_id":"2605.13434","ref_index":7,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Rescaled Asynchronous SGD: Optimal Distributed Optimization under Data and System Heterogeneity","primary_cat":"cs.LG","submitted_at":"2026-05-13T12:27:22+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Rescaled ASGD recovers convergence to the true global objective by rescaling worker stepsizes proportional to computation times, matching the known time lower bound in the leading term under non-convex smoothness and bounded heterogeneity.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"$F(y) \\le F(x) + \\langle \\nabla F(x), y - x \\rangle + \\frac{L}{2} \\|y - x\\|^2$ (DL) Jensen's Inequality: $\\|\\mathbb{E}_m[\\bar{x}]\\|^2 \\le \\mathbb{E}_m\\left[\\|\\bar{x}\\|^2\\right]$ (JN) Young's Inequality: $\\|u\\| \\|v\\| \\le \\frac{s}{2} \\|u\\|^2 + \\frac{1}{2s} \\|v\\|^2, \\; \\forall s > 0$ (YN) Squared Sum Inequality: $\\left\\|\\sum_{i=1}^{n} v_i\\right\\|^2 \\le n \\sum_{i=1}^{n} \\|v_i\\|^2$ (SS). Proof Outline. Theorem B.9 states our main result in a more general form, showing that Rescaled ASGD can target any convex combination (7) of the local objective functions, and establishes a bound on the expected average squared gradient norm of the cycle iterates. To that end, Lemma B.8 establishes a bound in terms of the squared norm of the cycle bias utilizing a standard descent lemma. A bound on the norm of the expected cumulative cycle bias is derived in Lemmas B.2 to B.6.
Similarly, Lemma B."},{"citing_arxiv_id":"2605.13261","ref_index":38,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"\"It became a self-fulfilling prophecy\": How Lived Experiences are Entangled with AI Predictions in Menstrual Cycle Tracking Apps","primary_cat":"cs.HC","submitted_at":"2026-05-13T09:42:37+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Users entangle their lived experiences with AI predictions in menstrual tracking apps, leading to self-fulfilling prophecies, limited critical awareness from UI, and isolation for non-normative users.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (FAccT '21). Association for Computing Machinery, New York, NY, USA, 624-635. https://doi.org/10.1145/3442188.3445923 [37] Jeena Joseph. 2025. The algorithmic self: how AI is reshaping human identity, introspection, and agency. Frontiers in Psychology 16 (July 2025). https://doi.org/10.3389/fpsyg.2025.1645795 Publisher: Frontiers. [38] Annika Kaltenhauser, Evropi Stefanidi, and Johannes Schöning. 2024. Playing with Perspectives and Unveiling the Autoethnographic Kaleidoscope in HCI - A Literature Review of Autoethnographies. In Proceedings of the CHI Conference on Human Factors in Computing Systems. ACM, Honolulu HI USA, 1-20.
https://doi.org/10.1145/3613904.3642355 [39] Jonas Keppel, Marvin Strauss, Luke Haliburton, Henrike Weingärtner, Julia Do-"},{"citing_arxiv_id":"2605.12809","ref_index":257,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Correcting Influence: Unboxing LLM Outputs with Orthogonal Latent Spaces","primary_cat":"cs.LG","submitted_at":"2026-05-12T23:01:29+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A latent mediation framework with sparse autoencoders enables non-additive token-level influence attribution in LLMs by learning orthogonal features and back-propagating attributions.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.12613","ref_index":3,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Creating Group Rules with AI: Human-AI Collaboration in WhatsApp Moderation","primary_cat":"cs.HC","submitted_at":"2026-05-12T18:02:49+00:00","verdict":"ACCEPT","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Admins in India used Meta AI to help create WhatsApp group rules, appreciating reduced workload but remaining cautious about privacy, relational trust, and contextual tone.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.12273","ref_index":22,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Into the Unknown: Accounting for Missing Demographic Data when Mitigating Ad Delivery Skew","primary_cat":"cs.CY","submitted_at":"2026-05-12T15:36:21+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"A budget split intervention reduces gender skew in online ad delivery by incorporating users with unknown demographics alongside targeted inferred-gender
groups.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Journal of Quantitative Description 1 (2021), 1-23. [20] Google. 2026. AdWords for nonprofits - Google Ad Grants Programme Details. https://www.google.com/intl/en_au/grants/details.html. Retrieved 2026-01-13. [21] Google Ads Help. 2026. How to become a Google Partner or Premier Partner. https://support.google.com/google-ads/answer/9702452?sjid=14590807631916300459-NA Retrieved 2026-01-12. [22] Lelia Marie Hampton. 2021. Black Feminist Musings on Algorithmic Oppression. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (Virtual Event, Canada) (FAccT '21). Association for Computing Machinery, New York, NY, USA, 1. doi:10.1145/3442188.3445929 [23] Google Ads Help. 2026. About audience segments. https://support."},{"citing_arxiv_id":"2605.11672","ref_index":4,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"A CAP-like Trilemma for Large Language Models: Correctness, Non-bias, and Utility under Semantic Underdetermination","primary_cat":"cs.AI","submitted_at":"2026-05-12T07:28:38+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Under semantic underdetermination, LLMs cannot always guarantee strong correctness, strict non-bias, and high utility at once.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.11345","ref_index":15,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Evaluating Structured Documentation as a Tool for Reflexivity in Dataset Development","primary_cat":"cs.CY","submitted_at":"2026-05-11T23:55:05+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Structured dataset documentation shows little engagement with major reflexivity themes from FAccT 
literature, leading to a new codebook and extended datasheet questions.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"7 questions that ask \"any other comments\"), we find 22 questions (44%) to be relevant to one or more of the themes of reflexivity we identify in Section 4.1. Accordingly, we make recommendations about how each of the 22 datasheet questions can better prompt dataset creators to be reflexive by proposing extensions to each question. These recommendations are informed by broader discourse on reflexivity and the improvement of data practices within ML [15, 16, 23, 102, 105, 112, 125, 128, 146]. We present the question extensions for a sample of the 22 datasheet questions in Table 2. In Appendix E, we provide the full version of this table which includes further explanation of the extensions we propose. Table 2: Recommendations on incorporating reflexivity in datasheet question prompts. Datasheet Questions (abbre-"},{"citing_arxiv_id":"2605.10234","ref_index":3,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Social Policy of Large Language Models: How GPT, Claude, DeepSeek and Grok Allocate Social Budgets in Spain and Germany","primary_cat":"cs.CY","submitted_at":"2026-05-11T09:10:42+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Four LLMs exhibit a shared implicit social policy that under-allocates pensions by a factor of three and over-allocates housing by four compared to OECD budgets, with only Claude showing meaningful response to national context.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.09793","ref_index":39,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Push and Pushback in Contesting AI: Demands for and Resistance to 
Accountability","primary_cat":"cs.HC","submitted_at":"2026-05-10T22:29:39+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Thematic analysis of 43 AI contestation cases, using Bovens's relational accountability model, produces categories of demands from below, institutional pushback, outcomes, and contextual factors shaping effective contestation.","context_count":1,"top_context_role":"background","top_context_polarity":"support","context_text":"In this paper, we examine both sides of this contestation dynamic: how contestation emerges from below and how it is met from above, with the goal of understanding the strategies, interactions, and conditions that shape accountability in practice. 2.2 Accountability as Relational, Contestation as Accountability-Seeking Bovens's model [23, 24], widely drawn on in the algorithmic accountability literature [39, 41, 125], conceptualizes accountability as an inherently relational process: an actor A is obliged to explain and justify its conduct to a forum F, which then may respond, e.g., by posing questions, evaluating the justifications offered, passing judgment, and imposing consequences [23, 24, 41, 87]. 
Building on this relational understanding, we characterise contestation by actors from below as \"accountability-"},{"citing_arxiv_id":"2605.09647","ref_index":77,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Modeling Implicit Conflict Monitoring Mechanisms against Stereotypes in LLMs","primary_cat":"cs.SI","submitted_at":"2026-05-10T16:46:26+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"LLMs contain identifiable COCO neurons that enable implicit self-correction against stereotypes; targeted editing of these neurons improves fairness and robustness to jailbreaks while preserving generation quality.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.08837","ref_index":66,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"The Grounding Gap: How LLMs Anchor the Meaning of Abstract Concepts Differently from Humans","primary_cat":"cs.CL","submitted_at":"2026-05-09T09:41:12+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"LLMs show a grounding gap with humans on abstract concepts, with property-generation correlations at most r=0.37 versus human-to-human r>0.9, though larger models align better on explicit rating tasks and internal SAE features capture some grounding dimensions.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.07622","ref_index":4,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Is She Even Relevant? 
When BERT Ignores Explicit Gender Cues","primary_cat":"cs.CL","submitted_at":"2026-05-08T11:48:22+00:00","verdict":"CONDITIONAL","verdict_confidence":"MODERATE","novelty_score":7.0,"formal_verification":"none","one_line_summary":"A Dutch BERT model encodes gender linearly by epoch 20 but does not dynamically update its representations when explicit female cues contradict learned stereotypical associations in short sentence templates.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}