{"total":75,"items":[{"citing_arxiv_id":"2606.02812","ref_index":59,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Traj-Evolve: A Self-Evolving Multi-Agent System for Patient Trajectory Modeling in Lung Cancer Early Detection","primary_cat":"cs.AI","submitted_at":"2026-06-01T19:30:07+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Traj-Evolve combines non-parametric experience retrieval and multi-agent RL with a leave-one-out unification strategy to outperform baselines on lung cancer prediction from up to five years of multimodal EHRs, including in never-smokers.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.00605","ref_index":75,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Looped Transformers with Layer Normalization Provably Learn the Power Method","primary_cat":"cs.LG","submitted_at":"2026-05-30T08:05:27+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":8.0,"formal_verification":"none","one_line_summary":"Looped linear transformers with LN provably converge via GD to implement the power method on principal component prediction.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.30784","ref_index":8,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Text-guided Feature Disentanglement for Cross-modal Gait Recognition","primary_cat":"cs.CV","submitted_at":"2026-05-29T03:16:44+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"TCFDNet uses a Gait Modality Text Dictionary from LLMs, CLIP alignment, and text-guided disentanglement modules to achieve SOTA cross-modal gait recognition on SUSTech1K and 
FreeGait.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.28607","ref_index":6,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Adaptive Multimodal Agents-Based Framework for Automatic Workflow Execution","primary_cat":"cs.AI","submitted_at":"2026-05-27T15:23:22+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"A multimodal multi-agent system constructs a fixed topological knowledge base offline from logs and applies adaptive RAG with collaborative verification for automatic workflow execution.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.20382","ref_index":6,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Do as I Say, Not as I Do: Instruction-Induction Conflict in LLMs","primary_cat":"cs.CL","submitted_at":"2026-05-19T18:32:20+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.17152","ref_index":27,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Multilingual and Multimodal LLMs in the Wild: Building for Low-Resource Languages","primary_cat":"cs.CL","submitted_at":"2026-05-16T20:56:15+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":2.0,"formal_verification":"none","one_line_summary":"A tutorial synthesizing foundations, recent models such as PALO and Maya, and low-cost methods for tri-modal multilingual AI in resource-constrained settings.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.11235","ref_index":26,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Internalizing Curriculum 
Judgment for LLM Reinforcement Fine-Tuning","primary_cat":"cs.LG","submitted_at":"2026-05-11T20:50:29+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"METIS internalizes curriculum judgment in LLM reinforcement fine-tuning by predicting within-prompt reward variance via in-context learning and jointly optimizing with a self-judgment reward, yielding superior performance and up to 67% faster convergence across math, code, and agent benchmarks.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Speed-rl: Faster training of reasoning models via online curriculum learning.arXiv preprint arXiv:2506.09016, 2025. [25] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems, 33:1877-1901, 2020. [26] Qingxiu Dong, Lei Li, Damai Dai, Ce Zheng, Xu Ma, Rui Li, Hao Xia, Jingjing Xu, Zhifang Wu, Baobao Chang, Xu Sun, Zhifang Li, and Zhifang Sui. A survey on in-context learning.arXiv preprint arXiv:2301.00234, 2023. URLhttps://arxiv.org/abs/2301.00234. [27] Shiguang Wu, Yaqing Wang, and Quanming Yao. 
Why in-context learning models are good few-shot"},{"citing_arxiv_id":"2605.10936","ref_index":16,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Personal Visual Context Learning in Large Multimodal Models","primary_cat":"cs.CV","submitted_at":"2026-05-11T17:59:17+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Introduces Personal VCL formalization and benchmark revealing LMM context gaps, plus an Agentic Context Bank baseline that boosts personalized visual reasoning.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Grounded question-answering in long egocentric videos. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024. [15] Anxhelo Diko, Tinghuai Wang, Wassim Swaileh, Shiyan Sun, and Ioannis Patras. ReWind: Understanding long videos with instructed learnable memory. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025. [16] Qingxiu Dong, Lei Li, Damai Dai, Ce Zheng, Jingyuan Ma, Rui Li, Heming Xia, Jingjing Xu, Zhiyong Wu, Tianyu Liu, et al. A survey on in-context learning.arXiv preprint arXiv:2301.00234, 2023. 10 [17] Yuhao Dong, Shulin Tian, Shuai Liu, Shuangrui Ding, Yuhang Zang, and Xiao Dong. 
Demo-ICL: In-context learning for procedural video knowledge acquisition."},{"citing_arxiv_id":"2605.05472","ref_index":6,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"The Pedagogy of AI Mistakes: Fostering Higher-Order Thinking","primary_cat":"cs.CY","submitted_at":"2026-05-06T21:50:46+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"AI mistakes can be structured into course activities to foster higher-order thinking, metacognition, and AI literacy in higher education.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.05161","ref_index":11,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Wasserstein-Aligned Localisation for VLM-Based Distributional OOD Detection in Medical Imaging","primary_cat":"cs.CV","submitted_at":"2026-05-06T17:32:34+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.08212","ref_index":10,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"LLMs with in-context learning for Algorithmic Theoretical Physics","primary_cat":"cs.LG","submitted_at":"2026-05-06T09:30:08+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Frontier LLMs with in-context learning and CAS integration solve most algorithmic tasks in theoretical physics when supplied with worked examples.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.04227","ref_index":29,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Pro$^2$Assist: Continuous Step-Aware Proactive Assistance with Multimodal Egocentric Perception for Long-Horizon 
Procedural Tasks","primary_cat":"cs.AI","submitted_at":"2026-05-05T19:12:11+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Pro²Assist uses multimodal egocentric perception from AR glasses to track fine-grained progress in long-horizon procedural tasks and deliver timely proactive assistance, outperforming baselines by over 21% in action understanding and up to 2.29x in timing accuracy.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.03950","ref_index":25,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"UnAC: Adaptive Visual Prompting with Abstraction and Stepwise Checking for Complex Multimodal Reasoning","primary_cat":"cs.CV","submitted_at":"2026-05-05T16:36:58+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"UnAC improves LMM performance on visual reasoning benchmarks by combining adaptive visual prompting, image abstraction, and gradual self-checking.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.01448","ref_index":6,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Decompose and Recompose: Reasoning New Skills from Existing Abilities for Cross-Task Robotic Manipulation","primary_cat":"cs.RO","submitted_at":"2026-05-02T13:55:28+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Decompose and Recompose decomposes seen robotic demonstrations into skill-action alignments and recomposes them via visual-semantic retrieval and planning to enable zero-shot cross-task generalization.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Reasoning New Skills from Existing Abilities for Cross-Task Robotic 
Manipulation A. Methodology and Implementation Details A.1. Low-level control interface We use a standard RLBench control interface that executes an end-effector target pose via motion planning and applies a discrete gripper open/close command. Concretely, each continuous control step is represented as u= [p,q, g]∈R 3 ×H× {0,1},(6) where p∈R 3 is the end-effector position in the robot base frame, q∈H is a unit quaternion specifying orientation, and g is a binary gripper command (e.g.,g=1for open,g=0for close). A.2. Discrete action format for the LLM For LLM interaction, we discretize translation and rotation into integer bins and represent each action as a 7-tuple: a= [i x, iy, iz, ir, ip, iψ, g],(7)"},{"citing_arxiv_id":"2604.27660","ref_index":11,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"From Context to Skills: Can Language Models Learn from Context Skillfully?","primary_cat":"cs.AI","submitted_at":"2026-04-30T09:53:15+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Evaluation.Context learning is a newly emerging research topic, aiming to require LMs to learn from unseen context and leverage new knowledge beyond what was acquired during pre-training to reason and solve tasks [12]. This goes far beyond long-context tasks [ 6, 35] that primarily test retrieval or reading comprehension, and in-context learning tasks [11], where models learn simple task patterns via instructions and demonstrations. We select the CL-bench [12] for our evaluation, which comprises 500 complex contexts, 1,899 tasks, and 31,607 verification rubrics, all crafted by experienced domain experts. 
Each task is designed such that the new content required to resolve it is contained within the corresponding context."},{"citing_arxiv_id":"2604.27244","ref_index":32,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"RAQG-QPP: Query Performance Prediction with Retrieved Query Variants and Retrieval Augmented Query Generation","primary_cat":"cs.IR","submitted_at":"2026-04-29T22:31:56+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Retrieved query variants from logs combined with LLM-augmented generation improve unsupervised QPP accuracy by up to 30% for neural rankers on TREC DL'19 and DL'20.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"used to understand the underlying information of queries, which can help the IR process. For instance, Wang et al. [31] used pseudo-documents generated by LLMs to do query expansion. Moreover, IR can also be applied to improve the generative process of LLMs. Since the advent of GPT-3 [12], LLMs have gained the competence offew-shot learning, i.e. learning from a few demonstrations [ 32]. This paradigm is known as in-context learning (ICL) [ 33]. If a demonstration dataset is provided, one effective way to seek suitable demonstrations from it for a specific task is to retrieve demonstrations according to their similarity to the target task [34]. 
The retrieved contexts for LLM are not restricted to labelled examples; they can also be unlabelled data, such as the"},{"citing_arxiv_id":"2604.23371","ref_index":10,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"When Context Sticks: Studying Interference in In-Context Learning","primary_cat":"cs.LG","submitted_at":"2026-04-25T16:35:25+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"In-context learning shows persistent interference from prior examples, with more misleading linear examples degrading quadratic predictions and training curricula modulating recovery speed.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Chat-based Large Language Models\". In: (2023). arXiv: 2309 . 12727 [cs.AI].URL: https://arxiv.org/abs/2309.12727. [9] Damai Dai, Yutao Sun, Li Dong, Yaru Hao, Shuming Ma, Zhifang Sui, and Furu Wei.Why Can GPT Learn In-Context? Language Models Implicitly Perform Gradient Descent as Meta- Optimizers. 2023. arXiv: 2212.10559 [cs.CL].URL: https://arxiv.org/abs/ 2212.10559. [10] Qingxiu Dong et al. \"A Survey on In-context Learning\". In: (2024). arXiv: 2301.00234 [cs.CL].URL:https://arxiv.org/abs/2301.00234. [11] Shivam Garg, Dimitris Tsipras, Percy Liang, and Gregory Valiant. \"What Can Transformers Learn In-Context? A Case Study of Simple Function Classes\". In: (2023). arXiv: 2208 . 
01066 [cs.CL].URL:https://arxiv.org/abs/2208."},{"citing_arxiv_id":"2604.22906","ref_index":36,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Network Edge Inference for Large Language Models: Principles, Techniques, and Opportunities","primary_cat":"cs.DC","submitted_at":"2026-04-24T16:56:53+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"A survey synthesizing challenges, system architectures, model optimizations, deployment methods, and resource management techniques for large language model inference at the network edge.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Table 9 summarizes the main evaluation platforms. Manuscript submitted to ACM Network Edge Inference for Large Language Models: Principles, Techniques, and Opportunities 27 Table 9. Representative Evaluation platforms and software stacks for edge LLM inference Category Rep. 
tools/platforms Evaluation focus Hardware simulation gem5 [14], Timeloop [117], NVSim [36] Processor/accelerator pipelines and memory hierarchy behavior System simulation or profiling TVM [21], PyTorch profilers, MLPerf [105] Kernel scheduling, operator fusion, quantization effects, backend behavior Network emulation ns-3 [59], Mininet [87] Split/distributed inference under bandwidth, RTT, and packet loss constraints Real-device testbeds (hardware)"},{"citing_arxiv_id":"2604.18907","ref_index":117,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Gradient-Based Program Synthesis with Neurally Interpreted Languages","primary_cat":"cs.LG","submitted_at":"2026-04-20T23:14:48+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":8.0,"formal_verification":"none","one_line_summary":"NLI autonomously discovers a vocabulary of primitive operations and interprets variable-length programs via a neural executor, allowing end-to-end training and gradient-based test-time adaptation that outperforms prior methods on combinatorial generalization tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.18562","ref_index":171,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"AnchorSeg: Language Grounded Query Banks for Reasoning Segmentation","primary_cat":"cs.CV","submitted_at":"2026-04-20T17:49:22+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"AnchorSeg uses ordered query banks of latent reasoning tokens plus a spatial anchor token and a Token-Mask Cycle Consistency loss to achieve 67.7% gIoU and 68.1% cIoU on the ReasonSeg benchmark.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.17214","ref_index":6,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Beyond the Basics: Leveraging Large 
Language Model for Fine-Grained Medical Entity Recognition","primary_cat":"cs.AI","submitted_at":"2026-04-19T02:50:14+00:00","verdict":"CONDITIONAL","verdict_confidence":"MODERATE","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Fine-tuned LLaMA3 with LoRA reaches 81.24% F1 on 18-category fine-grained medical entity recognition, beating zero-shot by 63.11% and few-shot by 35.63%.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"scenarios where annotated data is scarce. These approaches allow rapid adaptation to new tasks, with minimal supervision. In the context of applying ICL to NER, prior work has ex- plored both few-shot and zero-shot capabilities. A recent study [27] proposed GPT-NER, a method that improves few-shot perfor- mance by selecting semantically similar examples for prompts us- ing SimCSE [6], a contrastive sentence embedding model, instead of traditional encoders like RoBERTa-large [16]. This approach demonstrated strong results in low-resource conditions, achieving comparable performance to fully supervised BERT-based models on standard NER benchmark datasets of CoNLL-2003 [23] and OntoNotes5.0 [21]. 
A more recent study [26] aimed to improve few-"},{"citing_arxiv_id":"2604.15547","ref_index":6,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Consistency Analysis of Sentiment Predictions using Syntactic & Semantic Context Assessment Summarization (SSAS)","primary_cat":"cs.CL","submitted_at":"2026-04-16T21:52:11+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"SSAS improves LLM sentiment prediction consistency and data quality by up to 30% on three review datasets via syntactic and semantic context assessment summarization.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"in instruction templates - can induce significant fluctuations in classification accuracy. This volatility suggests that the standard attention mechanism often converges on surface-level patterns rather than underlying logical structures. Furthermore, the architectural constraints of the transformer's context window present a dimensional bottleneck. As noted by Dong et al. [6], fixed token limits necessitate a zero-sum trade-off between the depth of individual examples and the breadth of the reference set. In enterprise analytics, where datasets are high-dimensional and noisy, this limitation often leads to recency bias or the inclusion of non-representative outliers that confound the model's outcomes. 
The SSAS framework departs from traditional ICL by replacing static, heuristically-derived prompts [30, 31] with a"},{"citing_arxiv_id":"2604.14030","ref_index":1,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Dual-Enhancement Product Bundling: Bridging Interactive Graph and Large Language Model","primary_cat":"cs.CL","submitted_at":"2026-04-15T16:09:48+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A graph-to-text paradigm with Dynamic Concept Binding Mechanism integrates interactive graphs and LLMs to recommend product bundles, yielding 6.3%-26.5% gains over baselines on POG, POG_dense, and Steam datasets.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.12049","ref_index":7,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Leveraging Weighted Syntactic and Semantic Context Assessment Summary (wSSAS) Towards Text Categorization Using LLMs","primary_cat":"cs.CL","submitted_at":"2026-04-13T20:41:36+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"wSSAS is a two-phase deterministic framework that uses hierarchical text organization and SNR-based feature prioritization to improve clustering integrity, categorization accuracy, and reproducibility when applying LLMs to large review datasets.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Despite this versatility, the path to enterprise-grade reliability remains obstructed by several technical frictions. Cur- rent LLM performance is sensitive to prompt engineering; such that minor syntactic variations in instructions can yield drastically different classification outcomes [6]. 
Furthermore, the inherent constraints of In-Context Learning token limits restrict the volume of reference examples a model can process [7]. At a cognitive level, LLMs struggle ∗{shreeya, nitin, sharookh}@tellagence.com †nitindra.joglekar@villanova.edu ‡webercm@pdx.edu arXiv:2604.12049v1 [cs.CL] 13 Apr 2026 wSSAS: A Framework for Improved Text Categorization and Summarization using LLMs with nuanced linguistic phenomena such as irony, intensification, and latent bias [ 8]. For organizations, these lim-"},{"citing_arxiv_id":"2604.10946","ref_index":15,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Learning to Adapt: In-Context Learning Beyond Stationarity","primary_cat":"cs.LG","submitted_at":"2026-04-13T03:37:08+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Gated linear attention enables lower training and test errors in non-stationary in-context learning by adaptively modulating past inputs through a learnable recency bias under an autoregressive model of task evolution.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.08752","ref_index":3,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"LLMs Underperform Graph-Based Parsers on Supervised Relation Extraction for Complex Graphs","primary_cat":"cs.CL","submitted_at":"2026-04-09T20:34:33+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Graph-based parsers outperform LLMs on supervised relation extraction as linguistic graph complexity grows with more relations per document.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.05429","ref_index":2,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Bridging Natural Language and Microgrid Dynamics: A Context-Aware 
Simulator and Dataset","primary_cat":"eess.SY","submitted_at":"2026-04-07T04:52:51+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"OpenCEM is the first open-source digital twin that integrates unstructured contextual information with quantitative microgrid dynamics to enable context-aware energy management.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.03048","ref_index":68,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Combining Static Code Analysis and Large Language Models Improves Correctness and Performance of Algorithm Recognition","primary_cat":"cs.SE","submitted_at":"2026-04-03T13:56:39+00:00","verdict":"CONDITIONAL","verdict_confidence":"MODERATE","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Hybrid LLM plus static analysis for algorithm recognition in code cuts required model calls by 72-97% and lifts F1-scores by as much as 12 points.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Our work improves on both aspects and is the first to comprehensively evaluate the combination of LLMs with static analysis for the use case of algorithm recognition. Large Language Models:Many efforts aim to enhance LLMs via improved architectures (attention [ 66], mixture of experts [ 37]), refined prompting (ICL [ 51], CoT [ 54]), and optimization methods (quantization [67], distillation [68]) to im- prove both output quality and runtime efficiency. Additionally, numerous studies investigate LLMs in code related tasks such as code translation [24], [25], bug fixing [26], [27] or code clone detection [69], [70]. For example Khajezade et al. 
[ 70] explore using GPT for code clone detection in both mono-lingual and cross-lingual settings finding that it achieves F1-scores"},{"citing_arxiv_id":"2604.16421","ref_index":8,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Measuring Representation Robustness in Large Language Models for Geometry","primary_cat":"cs.CL","submitted_at":"2026-04-03T11:36:49+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"LLMs display accuracy gaps of up to 14 percentage points on the same geometry problems solely due to representation choice, with vector forms consistently weakest and a convert-then-solve prompt helping only high-capacity models.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"2022. PaLM: Scaling language modeling with pathways.arXiv preprint arXiv:2204.02311. https://arxiv.org/abs/2204. 02311. [7] Karl Cobbe, Vineet Kosaraju, Oleg Klimov, et al. 2021. Training verifiers to solve math word problems. InProceedings of the 35th AAAI Conference on Artificial Intelligence, pages 12563-12571.https://arxiv.org/abs/2009.03393. [8] Qingxiu Dong, Lei Li, Di Xu, et al. 2024. A survey on in-context learning.ACM Computing Surveys, 56(3):1-41.https://arxiv.org/abs/2301.00234. [9] Iddo Drori, Sarah Zhang, Reece Shuttleworth, et al. 2022. 
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.Proceedings of the National Academy of Sciences, 119(32):e2123433119."},{"citing_arxiv_id":"2604.02845","ref_index":7,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Deformation-based In-Context Learning for Point Cloud Understanding","primary_cat":"cs.CV","submitted_at":"2026-04-03T08:01:51+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"DeformPIC deforms query point clouds under prompt guidance for in-context learning, outperforming prior methods with lower Chamfer Distance on reconstruction, denoising, and registration tasks.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Pat Hanrahan, Qixing Huang, Zimo Li, Silvio Savarese, Manolis Savva, Shuran Song, Hao Su, et al. Shapenet: An information-rich 3d model repository.arXiv preprint arXiv:1512.03012, 2015. 5 [6] Li Chen, Penghao Wu, Kashyap Chitta, Bernhard Jaeger, An- dreas Geiger, and Hongyang Li. End-to-end autonomous driving: Challenges and frontiers.IEEE TPAMI, 2024. 2 [7] Qingxiu Dong, Lei Li, Damai Dai, Ce Zheng, Jingyuan Ma, Rui Li, Heming Xia, Jingjing Xu, Zhiyong Wu, Tianyu Liu, et al. A survey on in-context learning.arXiv preprint arXiv:2301.00234, 2022. 2 [8] Runpei Dong, Zekun Qi, Linfeng Zhang, Junbo Zhang, Jian- jian Sun, Zheng Ge, Li Yi, and Kaisheng Ma. 
Autoen- coders as cross-modal teachers: Can pretrained 2d image"},{"citing_arxiv_id":"2604.01707","ref_index":13,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Memory in the LLM Era: Modular Architectures and Strategies in a Unified Framework","primary_cat":"cs.CL","submitted_at":"2026-04-02T07:19:20+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A unified framework for LLM agent memory is benchmarked, with a new hybrid method outperforming state-of-the-art on standard tasks.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"ergistic management and retrieval strategies to respective storage components, hierarchical storage effectively optimizes the trade-off between computational overhead and knowledge persistence. ❸ Vector-based storage.This approach encodes textual memory into high-dimensional embeddings, subsequently indexed in dedi- cated vector libraries or databases, such as FAISS [13] and Qdrant, to enable the agent to perform efficient semantic similarity search. Vector-based storage can function as a standalone repository or serve as a foundational building block frequently integrated into more complex storage architectures. 
❹ Graph-based storage.This approach utilizes diverse graph topologies, such as trees, knowledge graphs, and temporal graphs,"},{"citing_arxiv_id":"2512.14917","ref_index":20,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Evaluating Code Reasoning Abilities of Large Language Models Under Real-World Settings","primary_cat":"cs.SE","submitted_at":"2025-12-16T21:12:53+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"A new dataset and nine-metric majority-vote procedure show that existing code-reasoning benchmarks are dominated by lower-complexity problems that do not reflect real-world code.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2512.12794","ref_index":13,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"A Rule-Aware Prompt Framework for Structured Numeric Reasoning in Cyber-Physical Systems","primary_cat":"eess.SY","submitted_at":"2025-12-14T18:23:07+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A rule-aware modular prompt framework enables LLMs to perform structured numeric reasoning on power grid data by separating rules from normalized deviations, improving anomaly detection consistency and reducing token use in IEEE 118-bus tests.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2512.06721","ref_index":14,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"ProAgent: Harnessing On-Demand Sensory Contexts for Proactive LLM Agent Systems in the Wild","primary_cat":"cs.AI","submitted_at":"2025-12-07T08:21:07+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"ProAgent uses on-demand tiered perception and context-aware LLM reasoning to 
deliver proactive assistance on AR glasses, achieving up to 27.7% higher prediction accuracy and 20.5% lower false detections than baselines.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2511.13502","ref_index":2,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"SnapAudit: Active Auditing of Differentially Private In-Context Learning via Snapshot-Based Simulation","primary_cat":"cs.CR","submitted_at":"2025-11-17T15:39:54+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"SnapAudit decomposes DP-ICL into a deterministic snapshot stage and a stochastic noise stage, using bootstrap simulation to achieve 80-200x faster auditing and exposing privacy bound violations in existing Gaussian and embedding mechanisms.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2511.04570","ref_index":12,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm","primary_cat":"cs.CV","submitted_at":"2025-11-06T17:25:23+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Video generation models demonstrate competitive multimodal reasoning on a new benchmark, matching or exceeding VLMs on visual puzzles and achieving 92% on MATH and 69.2% on MMMU.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2511.01763","ref_index":16,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Context-Guided Decompilation: A Step Towards 
Re-executability","primary_cat":"cs.SE","submitted_at":"2025-11-03T17:21:39+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"ICL4Decomp applies in-context learning to guide LLMs in generating re-executable decompiled code from binaries, reporting roughly 40% higher re-executability than prior methods across datasets and optimization levels.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2510.18333","ref_index":18,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Position: LLM Watermarking Should Align Stakeholders' Incentives for Practical Adoption","primary_cat":"cs.CR","submitted_at":"2025-10-21T06:34:51+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"LLM watermarking adoption is limited by misaligned stakeholder incentives; incentive-aligned approaches such as in-context watermarking can enable practical use in targeted domains like education and peer review.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2510.18117","ref_index":8,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Online In-Context Distillation for Low-Resource Vision Language Models","primary_cat":"cs.CV","submitted_at":"2025-10-20T21:35:17+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Online In-Context Distillation lets small VLMs gain up to 33% performance with as little as 4% teacher annotations by distilling knowledge through dynamic in-context demonstrations at inference.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2510.09689","ref_index":9,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"When 
Search Goes Wrong: Red-Teaming Web-Augmented Large Language Models","primary_cat":"cs.CR","submitted_at":"2025-10-09T09:44:14+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"CREST-Search is a red-teaming framework that crafts seemingly benign search queries to induce unsafe citations from web-augmented LLMs, backed by a new WebSearch-Harm dataset for fine-tuning a specialized attacker model.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2509.24164","ref_index":11,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Localizing Task Recognition and Task Learning in In-Context Learning via Attention Head Analysis","primary_cat":"cs.CL","submitted_at":"2025-09-29T01:28:37+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A new framework using Task Subspace Logit Attribution localizes attention heads specialized for task recognition and task learning in in-context learning, showing they align and rotate hidden states within a task subspace.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2509.23108","ref_index":15,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Artificial Phantasia: Emergent Mental Imagery in Large Language Models","primary_cat":"cs.AI","submitted_at":"2025-09-27T04:36:12+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"LLMs achieve higher accuracy than humans on compositional imagery tasks previously argued to require pictorial representations, supporting emergent propositional mental imagery in 
AI.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2509.20328","ref_index":9,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Video models are zero-shot learners and reasoners","primary_cat":"cs.LG","submitted_at":"2025-09-24T17:17:27+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Generative video models exhibit emergent zero-shot capabilities across perception, manipulation, and basic reasoning tasks.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners.Advances in neural information processing systems, 33:1877-1901, 2020. [8] Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud, Dani Yogatama, Maarten Bosma, Denny Zhou, Donald Metzler, et al. Emergent abilities of large language models.arXiv preprint arXiv:2206.07682, 2022. [9] Qingxiu Dong, Lei Li, Damai Dai, Ce Zheng, Jingyuan Ma, Rui Li, Heming Xia, Jingjing Xu, Zhiyong Wu, Tianyu Liu, et al. A survey on in-context learning.arXiv preprint arXiv:2301.00234, 2022. 10 Video models are zero-shot learners and reasoners [10] Takeshi Kojima, Shixiang Shane Gu, Machel Reid, Yutaka Matsuo, and Yusuke Iwasawa. 
Large language models are zero-shot reasoners."},{"citing_arxiv_id":"2508.14685","ref_index":6,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"SSA: Improving Performance With a Better Scoring Function","primary_cat":"cs.CL","submitted_at":"2025-08-20T13:01:34+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Replacing Softmax with Scaled Signed Averaging in transformer attention improves generalization under distribution shifts for in-context learning and boosts results on NLP benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2507.21046","ref_index":136,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"A Survey of Self-Evolving Agents: What, When, How, and Where to Evolve on the Path to Artificial Super Intelligence","primary_cat":"cs.AI","submitted_at":"2025-07-28T17:59:05+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":4.0,"formal_verification":"none","one_line_summary":"The paper delivers the first systematic review of self-evolving agents, structured around what components evolve, when adaptation occurs, and how it is implemented.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2507.20906","ref_index":3,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Soft Head Selection for Injecting ICL-Derived Task Embeddings","primary_cat":"cs.CL","submitted_at":"2025-07-28T14:59:17+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"SITE applies soft gradient-based head selection to inject ICL-derived task embeddings, outperforming prior embedding adaptation and few-shot ICL across generation, reasoning, and NLU tasks on 12 LLMs from 4B to 70B 
parameters.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2507.21168","ref_index":1,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Diverse LLMs or Diverse Question Interpretations? That is the Ensembling Question","primary_cat":"cs.CL","submitted_at":"2025-07-25T15:26:18+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Question interpretation diversity outperforms model diversity for LLM ensembling on binary QA tasks using majority voting.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2507.15671","ref_index":10,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"BugScope: Learn to Find Bugs Like Human","primary_cat":"cs.SE","submitted_at":"2025-07-21T14:34:01+00:00","verdict":"CONDITIONAL","verdict_confidence":"MODERATE","novelty_score":6.0,"formal_verification":"none","one_line_summary":"BugScope structures LLM bug detection into three human-mirroring steps and distills guidelines from examples, reaching 0.87 F1 on 33 real bugs while outperforming Claude and Cursor tools and uncovering 184 new issues in production code.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2505.15616","ref_index":26,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"LENS: Multi-level Evaluation of Multimodal Reasoning with Large Language Models","primary_cat":"cs.CV","submitted_at":"2025-05-21T15:06:59+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"LENS is a new multi-level benchmark dataset for evaluating MLLMs on perception-to-reasoning tasks using the same images across all levels with recent social media 
content.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2504.20571","ref_index":63,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Reinforcement Learning for Reasoning in Large Language Models with One Training Example","primary_cat":"cs.LG","submitted_at":"2025-04-29T09:24:30+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":7.0,"formal_verification":"none","one_line_summary":"One training example via RLVR boosts LLM math reasoning from 17.6% to 35.7% average across six benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}