{"total":31,"items":[{"citing_arxiv_id":"2606.31474","ref_index":3,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"TabPATE: Differentially Private Tabular In-Context Learning Without Public Data","primary_cat":"cs.LG","submitted_at":"2026-06-30T10:50:04+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"TabPATE applies a PATE-style private aggregation to synthetic tabular queries generated from feature ranges, enabling private in-context learning with near-random membership inference success while keeping competitive utility.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.29251","ref_index":10,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"When Summaries Distort Decisions: Information Fidelity in LLM-Compressed Financial Analysis","primary_cat":"cs.AI","submitted_at":"2026-06-28T07:44:54+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"LLM-based compression of financial source material can alter downstream investment decisions via decontextualization and model dependency, addressed by an agentic auditing approach that checks multiple compressions against the original.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.28917","ref_index":8,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"ML-Powered LDAP Reconnaissance Detection using Weak Supervision","primary_cat":"cs.LG","submitted_at":"2026-06-27T13:48:43+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Weakly supervised ML classifier and hypothesis-testing signature mining detect LDAP reconnaissance at 65% TPR and 81.48% field precision.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.24579","ref_index":45,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Cross-Lingual Exploration for Parametric Knowledge","primary_cat":"cs.CL","submitted_at":"2026-06-23T13:42:40+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Cross-lingual prompt exploration improves factual recall and consistency in LLMs across 17 languages more efficiently than native-language scaling.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.12225","ref_index":19,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Bridging the Smart City Cybersecurity Data Gap Through AI-Driven Synthetic Dataset Generation","primary_cat":"cs.CR","submitted_at":"2026-06-10T15:36:31+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Proposes an AI-driven synthetic data generation framework to create realistic cybersecurity datasets for smart city research where real data is scarce or sensitive.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.11058","ref_index":11,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"The social consequences of AI delegation","primary_cat":"physics.soc-ph","submitted_at":"2026-06-09T16:20:52+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"This perspective paper calls for a research program treating LLMs as consequential social actors whose outputs influence human decisions, norms, and collective dynamics.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.09479","ref_index":47,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Optical Music Recognition for Real-World Manuscripts with Synthetic Data","primary_cat":"cs.CV","submitted_at":"2026-06-08T13:38:48+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Domain adaptation via synthetic manuscript images improves OMR performance on real-world piano manuscripts without requiring in-domain symbols.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.01185","ref_index":29,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"\"Skill issues'': data-centric optimization of lakehouse agents","primary_cat":"cs.AI","submitted_at":"2026-05-31T11:58:04+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"Data-centric optimization of skills for agents on a branching lakehouse improves accuracy by 31.9% on 25 tasks via state-verification evaluation.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.30539","ref_index":9,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"A Theory-Guided LLM Pedagogical Agent for STEM+C Scaffolding Without Over-Reliance","primary_cat":"cs.MA","submitted_at":"2026-05-28T20:13:18+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Copa is a theory-guided multimodal LLM agent that supports high school computational modeling through adaptive feedback, shown in a 33-dyad study to increase student confidence and conceptual verbalization without fostering dependence.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.27090","ref_index":79,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Open Data from LIGO, Virgo, and KAGRA through the Second Part of the Fourth Observing Run","primary_cat":"gr-qc","submitted_at":"2026-05-26T14:36:19+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":2.0,"formal_verification":"none","one_line_summary":"LIGO-Virgo-KAGRA releases calibrated strain time series, noise-subtraction channels, and GWOSC v5.0 analysis products covering April 2024 to January 2025.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.07553","ref_index":21,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"MedicalRec: Medical recommender system for image classification without retraining","primary_cat":"cs.LG","submitted_at":"2026-05-23T13:29:45+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"A transformer recommender system trained on a new benchmark of over 5,000 model performances from medical imaging papers achieves up to 75.5% HitRate@100.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.23026","ref_index":92,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Opportunities and Risks of Generative AI through the Health Information Journey","primary_cat":"cs.CY","submitted_at":"2026-05-21T20:49:21+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Authors propose a four-stage framework to analyze opportunities and risks of generative AI across the health information journey from public sources to clinical care.","context_count":1,"top_context_role":"background","top_context_polarity":"support","context_text":"[91] Arya Rao, John Kim, Meghana Kamineni, Michael Pang, Winston Lie, Keith J. Dreyer, and Marc D. Succi. Evaluating GPT as an Ad- junct for Radiologic Decision Making: GPT-4 Versus GPT-3.5 in a Breast Imaging Pilot.Jour- nal of the American College of Radiology, 20(10): 990-997, October 2023. URLhttp://dx.doi .org/10.1016/j.jacr.2023.05.003. 15 [92] Arya Rao, Michael Pang, John Kim, Meghana Kamineni, Winston Lie, Anoop K Prasad, Adam Landman, Keith Dreyer, and Marc D Succi. Assessing the Utility of ChatGPT Throughout the Entire Clinical Workflow: Development and Usability Study.Journal of Medical Internet Research, 25:e48659, August 2023. URLhttp://dx.doi.org/10.2196/4 8659. [93] Hsu-Ju Kao, Tsair-Wei Chien, Wen-Chung"},{"citing_arxiv_id":"2605.18483","ref_index":79,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Modality vs. Morphology: A Framework for Time Series Classification for Biological Signals","primary_cat":"cs.LG","submitted_at":"2026-05-18T14:36:05+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"A review synthesizes evidence from EEG, EMG, ECG, PPG and ocular signals to argue that waveform morphology, rather than modality or model class, primarily determines TSC performance and interpretability.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Spikes / Bursts EEG; EMG; EOG Decision Tree; KNN; CNNs; Bayesian Networks; Transformers (Vision, Hybrid); Ensembles; Graph NNs; HMMs EEG: Brain Activity [66], [67] EEG: Seizure Detection [59], [68]. EMG: Gesture Recognition [69]-[73] EMG: Motion Prediction [74] EMG: Joint Kinematics [75] EMG: Blind Source Separation [76], [77] EOG: Eye Movement [78], [79] EOG: Sleep Staging [80] Oscillatory EEG; ECG; PPG Random Forest; Regression; SVM; SSM; CNN (Hybrid); LSTM (Bi-LSTM; Transformers EEG: Sleep Staging [81], [82] EEG: Emotion Recognition [83] EEG: Brain State [84] EEG: Schizophrenia Detection [85] ECG: Arrhythmia Detection [60], [86]-[89] ECG: ECG Classification [90] ECG: Stress Detection [91] ECG: Waveform Delineation [92]"},{"citing_arxiv_id":"2605.18199","ref_index":17,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"PIPER: Content-Based Table Search via profiling and LLM-Generated Pseudoqueries","primary_cat":"cs.IR","submitted_at":"2026-05-18T10:39:42+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"PIPER retrieves and ranks tabular datasets by profiling their content and using LLM-generated queries for dense vector search, outperforming metadata baselines and TableQA methods in low-metadata settings.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.11632","ref_index":5,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Macro: Enhancing Multilingual Counterfactual Explanations through Alignment-as-Preference Optimization","primary_cat":"cs.CL","submitted_at":"2026-05-12T06:56:18+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Macro uses DPO on composite preference pairs to raise validity of multilingual self-generated counterfactual explanations by 12.55% on average over chain-of-thought while preserving minimality.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"2.3 Edit Score (R edit) The other objective beyond validity is to minimize modifications to the original input x. Following Dehghanighobadi et al. (2025); Wang et al. (2025); Bhattacharjee et al. (2024), we employ a normal- ized Levenshtein distance d, which captures all applied edits, converted to a similarity score3: Redit(x,˜x) = 1− d(x,˜x) |x| (5) This will penalize ˜xwith extensive modifications to the input x, which may even result in a negative edit score. 3.2.4 Total Score (R total) We compute a scalar total score by a weighted sum over available components: Rtotal = X k wkRk, w k >0(6) where k indexes applicable scores with weights wk and scoresR k ∈ {Rflip,R aug,R edit}(App. B.4). 3.3 Preference Alignment"},{"citing_arxiv_id":"2605.10286","ref_index":56,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"AgentRx: A Benchmark Study of LLM Agents for Multimodal Clinical Prediction Tasks","primary_cat":"cs.AI","submitted_at":"2026-05-11T09:46:41+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Single-agent LLM frameworks outperform naive multi-agent systems in multimodal clinical risk prediction tasks and are better calibrated.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.09533","ref_index":2,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Assessment of RAG and Fine-Tuning for Industrial Question-Answering-Applications","primary_cat":"cs.CL","submitted_at":"2026-05-10T13:35:16+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"RAG is more effective and cost-efficient than fine-tuning for industrial QA adaptation on automotive datasets.","context_count":1,"top_context_role":"extension","top_context_polarity":"extend","context_text":"running and manually tackles the task, e.g., by searching for the answer using more traditional, slower methods. We split the recursive term in Equation 1 into two components: re- running and human fallback, proportionally weighted by the rerun probabilityR, which captures the user's willingness to retry. This yields the revised recursive Equation (2): CoP ex =G+V+ (CoP ex ∗R+H∗(1−R)) (1−S)(2) Solving this recursion gives the closed-form Equation (3): CoP ex = G+V+H(1−R) (1−S) 1−R(1−S) (3) Note that this extended Cost-of-Pass (CoP ex) is equivalent toCoPwhen the repetition rateRis assumed to be one and the validation costVto be zero. While this extended Cost- of-Pass model still has limitations, we believe it more ac-"},{"citing_arxiv_id":"2605.07957","ref_index":46,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Similar Pattern Annotation via Retrieval Knowledge for LLM-Based Test Code Fault Localization","primary_cat":"cs.SE","submitted_at":"2026-05-08T16:20:58+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"SPARK improves LLM-based test code fault localization by retrieving similar past faults and selectively annotating suspicious lines in new failing tests.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"3 Retrieval-Augmented Generation (RAG) for Fault Localization Recent LLM-based system understanding and reasoning methods have increasingly moved beyond vanilla prompts toward augmenting prompts with additional contextual information-such as repository artifacts, change sets, and logs-to improve the effectiveness of LLM reasoning for tasks such as patch generation [ 76], program repair [46], and SUTFL [12, 63]. This trend is largely driven by Retrieval-Augmented Generation (RAG) [ 33], which enhances LLM responses by retrieving relevant external information at inference time and incorporating it into the prompt. Manuscript submitted to ACM 8 Golnaz Gharachorlu, Mahsa Panahandeh, Lionel C. Briand, Ruifeng Gao, and Ruiyuan Wan RAG is particularly useful because vanilla LLMs may lack the domain and project-specific knowledge needed to"},{"citing_arxiv_id":"2605.01189","ref_index":104,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"NEURON: A Neuro-symbolic System for Grounded Clinical Explainability","primary_cat":"cs.AI","submitted_at":"2026-05-02T02:00:02+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"NEURON integrates SNOMED CT, ML, and RAG LLM to raise AUC from 0.74-0.77 to 0.84-0.88 and human-aligned explainability scores from 0.50 to 0.85 on MIMIC-IV acute heart failure data.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.25906","ref_index":5,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Make Any Collection Navigable: Methods for Constructing and Evaluating Hypergraph of Text","primary_cat":"cs.IR","submitted_at":"2026-04-28T17:52:13+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Methods for constructing Hypergraphs of Text are proposed with a new effort ratio metric where TF-IDF baselines match LLM methods in experiments.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.13934","ref_index":43,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Towards Enabling An Artificial Self-Construction Software Life-cycle via Autopoietic Architectures","primary_cat":"cs.SE","submitted_at":"2026-04-15T14:46:12+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Proposes autopoietic architectures for self-constructing software as a fundamental shift in the SDLC, leveraging foundation models for autonomous evolution and maintenance.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"automating repetitive tasks using Machine Learning (ML) and, more recently, foundational models (FMs) - large-scale pre-trained mod- els trained on vast datasets of code and natural language with broad capabilities across diverse tasks. Automating software maintenance ensures that software sys- tems remain efficient, reliable, adaptable, and relevant over time [43, 46]. Several studies have investigated software maintenance automation on downstream tasks such as testing [16, 17], and pro- gram repair [10, 30, 60, 67]. Additionally, foundation model-based code generation has been extensively studied in the field of soft- ware maintenance [7, 11, 25, 52, 57, 61] to assist practitioners with large language models (LLMs) demonstrating remarkable abilities"},{"citing_arxiv_id":"2604.13433","ref_index":40,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"PackSELL: A Sparse Matrix Format for Precision-Agnostic High-Performance SpMV","primary_cat":"cs.DC","submitted_at":"2026-04-15T03:14:22+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"PackSELL packs delta-encoded indices and values into single words with tunable bit allocation, delivering up to 1.63x faster FP16 SpMV and FP32-accurate performance exceeding FP16 cuSPARSE while reducing memory traffic.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.09277","ref_index":125,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"A Catalog of Data Errors","primary_cat":"cs.DB","submitted_at":"2026-04-10T12:46:13+00:00","verdict":"UNVERDICTED","verdict_confidence":"UNKNOWN","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A new catalog classifying 35 data error types into missing, incorrect, and redundant categories for tabular data, with definitions and examples to improve data quality management.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"InProceedings of the International Conference on Data Engineering (ICDE). IEEE, 28-32. doi:10.1109/ICDEW67478.2025.00008 Manuscript submitted to ACM 34 Bhadauria et al. [124] Youran Zhou, Sunil Aryal, and Mohamed Reda Bouadjenek. 2024. Review for Handling Missing Data with special missing mechanism.CoRR abs/2404.04905 (2024). doi:10.48550/ARXIV.2404.04905 [125] Yuhan Zhou, Fengjiao Tu, Kewei Sha, Junhua Ding, and Haihua Chen. 2024. A Survey on Data Quality Dimensions and Tools for Machine Learning Invited Paper. InIEEE International Conference on Artificial Intelligence Testing, AITest. IEEE, Shanghai, China, 120-131. doi:10.1109/AITEST62860. 2024.00023 [126] Jingyu Zhu, Xintong Zhao, Yu Sun, Shaoxu Song, and Xiaojie Yuan."},{"citing_arxiv_id":"2604.07956","ref_index":42,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"MONETA: Multimodal Industry Classification through Geographic Information with Multi Agent Systems","primary_cat":"cs.AI","submitted_at":"2026-04-09T08:21:39+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"MONETA is the first multimodal benchmark for industry classification using text and geographic sources, with MLLM baselines at 62-74% accuracy and up to 22.8% gains from multi-turn context enrichment and explanations.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2603.24858","ref_index":50,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Context-Mediated Domain Adaptation in Multi-Agent Sensemaking Systems","primary_cat":"cs.HC","submitted_at":"2026-03-25T22:57:05+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Context-mediated domain adaptation treats user modifications to AI artifacts as implicit domain specifications that reshape LLM-powered multi-agent reasoning, demonstrated via the Seedentia system which extracted 46 domain knowledge entries from expert edits.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.02359","ref_index":17,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Using LLM-as-a-Judge/Jury to Advance Scalable, Clinically-Validated Safety Evaluations of Model Responses to Users Demonstrating Psychosis","primary_cat":"cs.CL","submitted_at":"2026-03-20T04:31:03+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Seven clinician-informed safety criteria enable LLM-as-a-Judge to reach substantial agreement with human consensus (Cohen's κ up to 0.75) on evaluating LLM responses to users demonstrating psychosis.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2601.14590","ref_index":10,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Counterfactual Modeling with Fine-Tuned LLMs for Health Intervention Design and Sensor Data Augmentation","primary_cat":"cs.LG","submitted_at":"2026-01-21T02:04:08+00:00","verdict":"CONDITIONAL","verdict_confidence":"MODERATE","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Fine-tuned LLMs produce plausible counterfactuals for health interventions and recover 20% F1 via data augmentation in label-scarce sensor datasets.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2601.02735","ref_index":5,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Revisiting Forest Proximities via Sparse Leaf-Incidence Kernels","primary_cat":"cs.LG","submitted_at":"2026-01-06T05:57:43+00:00","verdict":"CONDITIONAL","verdict_confidence":"MODERATE","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Forest proximities admit an exact sparse factorization via separable weighted leaf-collision kernels that reduces computation to sparse linear algebra over leaf collisions.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2512.05929","ref_index":125,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"LLM Harms: A Taxonomy and Discussion","primary_cat":"cs.CY","submitted_at":"2025-12-05T18:12:21+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":1,"top_context_role":"background","top_context_polarity":"support","context_text":"Murdock et al. chronicle the evolution of spam: prompts tuned on Bayesian -filter feedback can produce near -undetectable spam waves, eroding the efficacy of long -standing defences [123]. These ten studies jointly show that deception scales with model accessibility and is already eroding trust in educational, financial and e -commerce domains [124], [125], [126]. 13 4.3.3 Security & Privacy Attacks Misuse escalates when adversaries integrate LLMs into offensive security tool -chains. The OWASP Gen-AI Risk list now ranks Prompt Injection (LLM01) as the top vulnerability; adversaries can coerce a model to reveal hidden system instructions or execute unt rusted plugin calls [92]. Wei et al. extend"},{"citing_arxiv_id":"2509.21080","ref_index":2,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"InsideOut: Measuring and Mitigating Insider-Outsider Bias in Interview Script Generation","primary_cat":"cs.CL","submitted_at":"2025-09-25T12:28:25+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"The paper introduces the InsideOut benchmark to quantify insider-outsider bias in LLM-generated interview scripts across 10 cultures and shows that multi-agent mitigation frameworks substantially reduce the bias on metrics like Cultural Alignment Gap.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2506.13131","ref_index":38,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"AlphaEvolve: A coding agent for scientific and algorithmic discovery","primary_cat":"cs.AI","submitted_at":"2025-06-16T06:37:18+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"AlphaEvolve is an LLM-orchestrated evolutionary coding agent that discovered a 4x4 complex matrix multiplication algorithm using 48 scalar multiplications, the first improvement over Strassen's algorithm in 56 years, plus optimizations for Google data centers and hardware.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"materials science [45, 71, 94, 119], chemistry [12, 64], bioinformatics [67, 85], geoscience [79], and quantum physics [30, 78] (for surveys on the topic, see [36, 65, 81]). Many of these methods use LLMs to automate several distinct stages of the scientific discovery process [37, 59, 106, 109, 112], e.g., for generating and ranking hypotheses and ideas [38, 90]. Of these methods, especially related toAlphaEvolveare the methods that use LLM-guided tree search-based algorithms [11] or LLM-guided evolutionary algorithms [34, 113, 120]. Other works use LLMs to optimize experimental planning and design [7, 10, 43, 75] or experiment execution and workflow [28, 62, 82, 105, 116]. Finally, there are also works focusing on the data analysis stage [80]."}],"limit":50,"offset":0}