{"total":14,"items":[{"citing_arxiv_id":"2606.29100","ref_index":14,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Toward Exascale AI for Science: A Scalable AI Skill for Autonomous Microkinetics Discovery","primary_cat":"cs.CE","submitted_at":"2026-06-27T22:10:01+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"Introduces a scalable AI skill framework for autonomous microkinetics discovery that automates workflows and evaluates surrogate reliability.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.24002","ref_index":49,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Harnessing AtomisticSkills for Agentic Atomistic Research","primary_cat":"physics.chem-ph","submitted_at":"2026-05-18T21:45:36+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"AtomisticSkills is a new harness framework with 100+ human-curated skills that lets general AI agents perform atomistic research tasks including simulations, screening, and analysis, shown on electrolyte design, CO2 capture, drug screening, and catalyst tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.18661","ref_index":42,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"AI for Auto-Research: Roadmap & User Guide","primary_cat":"cs.AI","submitted_at":"2026-05-18T17:08:26+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"The paper delivers a stage-by-stage roadmap for AI in research, showing reliable assistance in retrieval and tool tasks but fragility in novelty and judgment, advocating human-governed 
collaboration.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"with tools, resources, and prompts, and iteratively tests the resulting agent so that users can interact with the paper's methods through natural language. This reframes dissemination as operational access: a paper is no longer only read, but queried and executed. Related systems broaden this idea from paper-specific agents to tool-using scientific agents. Gao et al. [42] study how scientific tool ecosystems can democratize AI scientists by exposing computational capabilities through agent-accessible interfaces. ProteinMCP [227] applies an MCP-based agentic framework to protein engineering, illustrating how domain-specific workflows can be wrapped into interactive, tool-using systems. These systems suggest that future dissemination may increasingly involve executable interfaces, reproducible"},{"citing_arxiv_id":"2605.13045","ref_index":10,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Large Language Models Lack Temporal Awareness of Medical Knowledge","primary_cat":"cs.LG","submitted_at":"2026-05-13T06:04:40+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":8.0,"formal_verification":"none","one_line_summary":"LLMs lack temporal awareness of medical knowledge, showing gradual performance decline on up-to-date facts, much lower accuracy on historical knowledge (25-54% relative), and inconsistent year-to-year predictions.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.02669","ref_index":21,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"An explainable hypothesis-driven approach to Drug-Induced Liver Injury with HADES","primary_cat":"cs.AI","submitted_at":"2026-05-04T14:50:28+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"HADES is 
an agentic AI system that generates mechanistic hypotheses for drug-induced liver injury using molecular, metabolite, and pathway evidence, outperforming prior binary classifiers on the new DILER benchmark while establishing a baseline for hypothesis alignment.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.27725","ref_index":7,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"AgentEconomist: An End-to-end Agentic System Translating Economic Intuitions into Executable Computational Experiments","primary_cat":"cs.HC","submitted_at":"2026-04-30T11:17:04+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"AgentEconomist is an end-to-end agentic system with idea development, experimental design, and execution stages that uses a large economics paper database to produce research ideas with better literature grounding, novelty, and insight than generic LLMs.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.23938","ref_index":3,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"TSAssistant: A Human-in-the-Loop Agentic Framework for Automated Target Safety Assessment","primary_cat":"cs.CL","submitted_at":"2026-04-27T01:22:00+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"TSAssistant decomposes target safety assessment report generation into research and synthesis subagents with tool-based evidence retrieval, hierarchical instructions, and interactive human refinement, reporting high reproducibility and grounding.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.23674","ref_index":30,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Vibe Medicine: 
Redefining Biomedical Research Through Human-AI Co-Work","primary_cat":"cs.AI","submitted_at":"2026-04-26T12:27:25+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Vibe Medicine proposes directing AI agents via natural language for end-to-end biomedical workflows using LLMs, agent frameworks, and a curated collection of over 1,000 medical skills.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"DrugBank remain central because they provide complementary representations of drug knowledge. ChEMBL was designed as a large-scale open bioactivity resource containing binding, functional, and ADMET (absorption, distribution, metabolism, excretion, and toxicity) related data for drug-like molecules and their targets, supporting structure-activity analysis and predictive modeling [30]. DrugBank, in turn, integrates information on drugs, mechanisms, targets, and interactions, and has expanded substantially to include investigational compounds, drug-drug interactions, and repurposing-relevant data [104]. 
An agent capable of orchestrating both resources is therefore well suited for tasks such as target-to-compound mapping, mechanism-of-action"},{"citing_arxiv_id":"2604.22571","ref_index":19,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"LARA: Validation-Driven Agentic Supercomputer Workflows for Atomistic Modeling","primary_cat":"physics.comp-ph","submitted_at":"2026-04-24T14:03:42+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"LARA-HPC introduces a validation-first agentic system with dry-run verification and multi-phase refinement that improves robustness of AI-generated DFT workflows on HPC systems.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.08491","ref_index":33,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Figures as Interfaces: Toward LLM-Native Artifacts for Scientific Discovery","primary_cat":"cs.HC","submitted_at":"2026-04-09T17:30:58+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"LLM-native figures embed provenance and enable direct LLM interaction with scientific visualizations to accelerate discovery and improve reproducibility.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.08224","ref_index":42,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering","primary_cat":"cs.SE","submitted_at":"2026-04-09T13:19:41+00:00","verdict":"ACCEPT","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"LLM agent progress depends on externalizing cognitive functions into memory, skills, protocols, and harness engineering that coordinates them 
reliably.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.03964","ref_index":7,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"SKILLFOUNDRY: Building Self-Evolving Agent Skill Libraries from Heterogeneous Scientific Resources","primary_cat":"cs.AI","submitted_at":"2026-04-05T05:02:18+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"SkillFoundry mines heterogeneous scientific resources into a self-evolving library of validated agent skills, with 71.1% novelty versus prior libraries and measurable gains on coding benchmarks plus two genomics tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.03361","ref_index":43,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"The limits of bio-molecular modeling with large language models : a cross-scale evaluation","primary_cat":"cs.LG","submitted_at":"2026-04-03T17:38:42+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"LLMs perform adequately on bio-molecular classification tasks but remain weak on regression, with hybrid architectures outperforming others on long sequences and fine-tuning hurting generalization.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"imbalance and ensure a representative distribution across the bio-chemical space. 2.1.2 Integration of Computational Tools Current artificial intelligence models, particularly LLMs, possess varying degrees of capability in automatically invoking tools and parsing function parameters. To further enhance the evaluation capability of the benchmark for models, we integrated a suite of domain-specific tool interfaces [43]. 
This integration allows for an agentic workflow where models can invoke specific predictive functions, such as solubility lipophilicity hydration() and BBB penetrance(), to derive ADMETAI (Absorption, Distribution, Metabolism, Excretion, Toxicity, and AI-predicted) properties. 2.1.3 Evaluation Framework and Metrics The final stage of the benchmark involves a standardized evaluation pipeline where formatted"},{"citing_arxiv_id":"2512.15567","ref_index":30,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Evaluating Large Language Models in Scientific Discovery","primary_cat":"cs.AI","submitted_at":"2025-12-17T16:20:03+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":8.0,"formal_verification":"none","one_line_summary":"The SDE benchmark shows LLMs lag on scientific discovery tasks relative to general science tests, with diminishing scaling returns and shared weaknesses across models.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}