Democratizing AI scientists using ToolUniverse

· 2025 · arXiv 2509.23426

14 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 2 · method: 1

citation-polarity summary

background: 2 · use method: 1

representative citing papers

Large Language Models Lack Temporal Awareness of Medical Knowledge

cs.LG · 2026-05-13 · unverdicted · novelty 8.0

LLMs lack temporal awareness of medical knowledge, showing gradual performance decline on up-to-date facts, much lower accuracy on historical knowledge (25-54% relative), and inconsistent year-to-year predictions.

Evaluating Large Language Models in Scientific Discovery

cs.AI · 2025-12-17 · unverdicted · novelty 8.0

The SDE benchmark shows LLMs lag on scientific discovery tasks relative to general science tests, with diminishing scaling returns and shared weaknesses across models.

Figures as Interfaces: Toward LLM-Native Artifacts for Scientific Discovery

cs.HC · 2026-04-09 · unverdicted · novelty 7.0

LLM-native figures embed provenance and enable direct LLM interaction with scientific visualizations to accelerate discovery and improve reproducibility.

SKILLFOUNDRY: Building Self-Evolving Agent Skill Libraries from Heterogeneous Scientific Resources

cs.AI · 2026-04-05 · unverdicted · novelty 7.0

SkillFoundry mines heterogeneous scientific resources into a self-evolving library of validated agent skills, with 71.1% novelty versus prior libraries and measurable gains on coding benchmarks plus two genomics tasks.

The limits of bio-molecular modeling with large language models: a cross-scale evaluation

cs.LG · 2026-04-03 · unverdicted · novelty 7.0

LLMs perform adequately on bio-molecular classification tasks but remain weak on regression, with hybrid architectures outperforming others on long sequences and fine-tuning hurting generalization.

An explainable hypothesis-driven approach to Drug-Induced Liver Injury with HADES

cs.AI · 2026-05-04 · unverdicted · novelty 6.0

HADES is an agentic AI system that generates mechanistic hypotheses for drug-induced liver injury using molecular, metabolite, and pathway evidence, outperforming prior binary classifiers on the new DILER benchmark while establishing a baseline for hypothesis alignment.

AgentEconomist: An End-to-end Agentic System Translating Economic Intuitions into Executable Computational Experiments

cs.HC · 2026-04-30 · unverdicted · novelty 6.0

AgentEconomist is an end-to-end agentic system with idea development, experimental design, and execution stages that uses a large economics paper database to produce research ideas with better literature grounding, novelty, and insight than generic LLMs.

Harnessing AtomisticSkills for Agentic Atomistic Research

physics.chem-ph · 2026-05-18 · unverdicted · novelty 5.0

AtomisticSkills is a new harness framework with 100+ human-curated skills that lets general AI agents perform atomistic research tasks including simulations, screening, and analysis, shown on electrolyte design, CO2 capture, drug screening, and catalyst tasks.

Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering

cs.SE · 2026-04-09 · accept · novelty 5.0

LLM agent progress depends on externalizing cognitive functions into memory, skills, protocols, and harness engineering that coordinates them reliably.

AI for Auto-Research: Roadmap & User Guide

cs.AI · 2026-05-18 · unverdicted · novelty 4.0

The paper delivers a stage-by-stage roadmap for AI in research, showing reliable assistance in retrieval and tool tasks but fragility in novelty and judgment, advocating human-governed collaboration.

TSAssistant: A Human-in-the-Loop Agentic Framework for Automated Target Safety Assessment

cs.CL · 2026-04-27 · unverdicted · novelty 4.0 · 2 refs

TSAssistant decomposes target safety assessment report generation into research and synthesis subagents with tool-based evidence retrieval, hierarchical instructions, and interactive human refinement, reporting high reproducibility and grounding.

Vibe Medicine: Redefining Biomedical Research Through Human-AI Co-Work

cs.AI · 2026-04-26 · unverdicted · novelty 4.0

Vibe Medicine proposes directing AI agents via natural language for end-to-end biomedical workflows using LLMs, agent frameworks, and a curated collection of over 1,000 medical skills.

LARA: Validation-Driven Agentic Supercomputer Workflows for Atomistic Modeling

physics.comp-ph · 2026-04-24 · unverdicted · novelty 4.0

LARA-HPC introduces a validation-first agentic system with dry-run verification and multi-phase refinement that improves robustness of AI-generated DFT workflows on HPC systems.

Toward Exascale AI for Science: A Scalable AI Skill for Autonomous Microkinetics Discovery

cs.CE · 2026-06-27 · unverdicted · novelty 3.0

Introduces a scalable AI skill framework for autonomous microkinetics discovery that automates workflows and evaluates surrogate reliability.
