{"total":14,"items":[{"citing_arxiv_id":"2607.01006","ref_index":9,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Understanding Large Language Models","primary_cat":"cs.CL","submitted_at":"2026-07-01T14:45:18+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":2.0,"formal_verification":"none","one_line_summary":"The paper reviews Transformer architecture, emergent LLM capabilities resembling cognition, explainable AI methods, and argues against both anthropomorphism and overly reductive views of LLM behavior as mere memorization.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.19638","ref_index":32,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"MiqraBERT: Regression-Based Sentence-BERT Finetuning for Biblical Hebrew Parallel Detection","primary_cat":"cs.CL","submitted_at":"2026-06-17T22:31:36+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"MiqraBERT, a finetuned Sentence-BERT model, achieves 2.7-fold better distributional separation of parallel versus non-parallel Biblical Hebrew verses and reduces ambiguous overlap from 24% to 6%, with strong performance on narrative but weak on poetic parallels.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.03420","ref_index":48,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"PHAF-Personalized Hand Avatars in a Flash","primary_cat":"cs.CV","submitted_at":"2026-06-02T10:05:46+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"A method to generate personalized hand avatars from two views in a fraction of the time of optimization-based 
approaches.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.00537","ref_index":22,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"PACE: Phase-Aware Chunk Execution for Robot Policies with Action Chunking","primary_cat":"cs.RO","submitted_at":"2026-05-30T05:11:06+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"PACE dynamically selects execution horizons for action chunks in robot policies by detecting low-speed transition points in predicted speed profiles, raising success rates from 57.8% to 64.2% on 50 simulation tasks and from 50.7% to 70.4% in real-robot tests.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.24949","ref_index":15,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"APT-Agent: Automated Penetration Testing using Large Language Models","primary_cat":"cs.CR","submitted_at":"2026-05-24T08:54:33+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"APT-Agent automates penetration testing with LLMs using rectification and memory modules, achieving 84.29% end-to-end success on Metasploitable 2 versus lower rates for baselines.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.21962","ref_index":101,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"AI-Enabled Serious Games: Integrating Intelligence and Adaptivity in Training Systems","primary_cat":"cs.AI","submitted_at":"2026-05-21T03:48:31+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"The chapter synthesizes the history of adaptive learning systems and examines how AI can provide instructional 
intelligence and real-time adaptivity in serious games while highlighting challenges such as explainability and limited long-term outcome data.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.19930","ref_index":42,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"High resolution large working distance scanning helium microscopy","primary_cat":"physics.optics","submitted_at":"2026-05-19T14:50:36+00:00","verdict":"ACCEPT","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Sub-micron resolution (340 nm beamwidth) achieved in large-working-distance pinhole scanning helium microscopy through constrained optimization of atom optics, redesigned pinhole plate, smaller pinhole, increased source distance, and larger detector aperture.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.13138","ref_index":47,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Code-Centric Detection of Vulnerability-Fixing Commits: A Unified Benchmark and Empirical Study","primary_cat":"cs.SE","submitted_at":"2026-05-13T08:05:14+00:00","verdict":"ACCEPT","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Code language models show no transferable security understanding from code diffs alone, rely on commit messages, miss over 93% of fixes at 0.5% false positive rate, and suffer large drops under group or temporal splits.","context_count":1,"top_context_role":"dataset","top_context_polarity":"use_dataset","context_text":"on the commit message [32,33,40,50,60,61]; linking vulnerability information from public advisories, bug trackers or similar sources to commits [1,3,8,9,20,24,28,29,31,33,40,47,52,61]; and using other tools, algorithms or machine learning to perform code-based identification of potential VFCs [16,31,34,40,47,61]. Additionally, 
synthetic generation of VFCs has also been explored [47]. Regardless of the technique used for generating labels, it has been shown [9] that the quality of labels is a significant concern. Some works have attempted to quantify this by using manual verification [31,32,40,52,60,61]. An overview of dataset relations and key characteristics is shown in Figure 1. While some connections are present, overall"},{"citing_arxiv_id":"2605.06209","ref_index":53,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"SiblingRepair: Sibling-Based Multi-Hunk Repair with Large Language Models","primary_cat":"cs.SE","submitted_at":"2026-05-07T13:14:36+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"SiblingRepair uses LLMs with semantic sibling detection and simultaneous/iterative repair strategies to outperform prior multi-hunk APR tools like Hercules on Defects4J and GHRB benchmarks.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"identify structurally (lexically or syntactically) similar or functionally equivalent code fragments, whereas SIBLINGREPAIR focuses on identifying code fragments that are likely to require similar modifications, or siblings. Although our approach draws inspiration from token-based and embedding-based techniques commonly used in Type-3 (syntactic similarity) and Type-4 (semantic similarity) [53] clone detection, these techniques are used only to preliminarily identify candidate siblings from suspicious locations. After obtaining candidate siblings, SIBLINGREPAIR uses an LLM to further reason about true siblings in simultaneous and iterative repair stages. 
We therefore focus on repair-oriented semantic relevance rather than structural similarity or functional equivalence."},{"citing_arxiv_id":"2605.00435","ref_index":9,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Escaping Mode Collapse in LLM Generation via Geometric Regulation","primary_cat":"cs.CL","submitted_at":"2026-05-01T06:12:05+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Reinforced Mode Regulation (RMR) applies low-rank damping to the Transformer value cache to prevent geometric collapse and enable stable autoregressive generation at entropy rates as low as 0.8 nats/step.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.21214","ref_index":6,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"A Demonstration of SQLyzr: A Platform for Fine-Grained Text-to-SQL Evaluation and Analysis","primary_cat":"cs.DB","submitted_at":"2026-04-23T02:12:56+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"SQLyzr is a new evaluation platform that adds diverse metrics, realistic settings, query classification, and analysis features to overcome the single-score limitations of existing text-to-SQL benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.20936","ref_index":11,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"AttentionBender: Manipulating Cross-Attention in Video Diffusion Transformers as a Creative Probe","primary_cat":"cs.MM","submitted_at":"2026-04-22T13:11:21+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"AttentionBender applies 2D transforms to cross-attention maps in video diffusion transformers, producing 
distributed distortions and glitch aesthetics that reveal entangled attention mechanisms while serving as both an XAI probe and creative tool.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"tradition becomes explicit through Network Bending, which inserts deterministic transformations into trained generative networks at inference time to reveal the network's internal mechanics and open new expressive possibilities [9]. This material, interventionist approach intersects with a growing discourse around Explainable AI for the Arts (XAIxArts) [11, 12], which argues that explainability should be evaluated by how it supports artist agency, reflection, and creative autonomy, rather than only by fidelity to a single \"ground truth\" explanation. Within this framing, explainability is not only a descriptive report about a model, but something that can be enacted through tools that allow artists to probe, test, and reconfigure internal processes [10]."},{"citing_arxiv_id":"2604.07883","ref_index":3,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"An Agentic Evaluation Architecture for Historical Bias Detection in Educational Textbooks","primary_cat":"cs.AI","submitted_at":"2026-04-09T06:51:32+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"An agentic architecture with multimodal screening, a five-agent jury, meta-synthesis, and source attribution protocol detects biases in Romanian history textbooks more accurately than zero-shot baselines, achieving 83.3% acceptable excerpts and human preference in 64.8% of blind comparisons.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"failure modes through a Source Attribution Protocol that enforces this distinction as a constrained intermediate representation prior to evaluation. 
2.3 Multi-Agent Systems and AI in Education Multi-agent architectures have demonstrated superior robustness and factuality over single-model pipelines in reasoning-intensive tasks [3,4,7,29]. In AI in Education, LLMs have been applied to tutoring and content creation [12], but their application to curriculum governance remains limited [18,28]. We bridge these domains by adapting deliberative multi-agent evaluation, traditionally used to improve generative quality, into a conservative reliability filter for large-scale educational auditing, where the objective is to suppress weakly supported eval-"},{"citing_arxiv_id":"2601.11848","ref_index":23,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Compass vs Railway Tracks: Unpacking User Mental Models for Communicating Long-Horizon Work to Humans vs. AI","primary_cat":"cs.HC","submitted_at":"2026-01-17T00:34:32+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Users treat human delegation for long tasks as a flexible compass but AI delegation as rigid railway tracks due to perceived AI limitations in inference and judgment.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}