PDEAgent-Bench is the first multi-metric, multi-library benchmark for AI-generated PDE solvers, evaluating executability, numerical accuracy, and efficiency across DOLFINx, Firedrake, and deal.II.
hub
Towards scientific intelligence: A survey of llm-based scientific agents
13 Pith papers cite this work. Polarity classification is still indexing.
hub tools
citation-role summary
citation-polarity summary
representative citing papers
ReplicatorBench evaluates LLM agents on replicating social and behavioral science claims across retrieval, computation, and interpretation stages, finding strength in experiment execution but weakness in resource retrieval.
Luminol-AIDetect detects machine-generated text zero-shot by extracting perplexity-based features from original and shuffled text versions, using density estimation and ensemble prediction to exploit greater structural fragility in AI output.
WORC improves multi-agent LLM reasoning to 82.2% average accuracy by predicting and compensating for the weakest agent via targeted extra sampling rather than uniform reinforcement.
AlphaEvolve is an LLM-orchestrated evolutionary coding agent that discovered a 4x4 complex matrix multiplication algorithm using 48 scalar multiplications, the first improvement over Strassen's algorithm in 56 years, plus optimizations for Google data centers and hardware.
OpenAaaS is a hierarchical agent-as-a-service system that enables secure multi-agent collaboration for materials informatics by moving code to data rather than data to code.
Malicious actors could use AI agents to submit large numbers of fake papers, inflating the submission count and thereby raising the acceptance odds for a small set of chosen legitimate papers under stable conference acceptance rates.
Agentic architecture automates research-question-to-workflow translation via LLM intent extraction, deterministic generators, and Skills, raising intent accuracy from 44% to 83% and cutting data transfer by 92% in a 1000 Genomes evaluation.
DORA Explorer boosts LLM agent exploration without training by ranking diverse actions using log-probabilities and a tunable parameter, yielding UCB-competitive results on multi-armed bandits and gains on text adventure environments.
TurboAgent uses an LLM as coordinator for specialized agents to autonomously generate, predict, optimize, and validate turbomachinery designs, achieving R² > 0.91 agreement with CFD on a transonic compressor and 1.61% efficiency gains.
The paper proposes a four-role framework for LLMs in scientific innovation and reviews methods, benchmarks, and limitations across Assistant, Collaborator, Scientist, and Evaluator roles.
URSA is a modular agent ecosystem that uses LLMs and scientific tools to accelerate research tasks of varying complexity.
citing papers explorer
-
PDEAgent-Bench: A Multi-Metric, Multi-Library Benchmark for PDE Solver Generation
PDEAgent-Bench is the first multi-metric, multi-library benchmark for AI-generated PDE solvers, evaluating executability, numerical accuracy, and efficiency across DOLFINx, Firedrake, and deal.II.
-
ReplicatorBench: Benchmarking LLM Agents for Replicability in Social and Behavioral Sciences
ReplicatorBench evaluates LLM agents on replicating social and behavioral science claims across retrieval, computation, and interpretation stages, finding strength in experiment execution but weakness in resource retrieval.
-
Luminol-AIDetect: Fast Zero-shot Machine-Generated Text Detection based on Perplexity under Text Shuffling
Luminol-AIDetect detects machine-generated text zero-shot by extracting perplexity-based features from original and shuffled text versions, using density estimation and ensemble prediction to exploit greater structural fragility in AI output.
-
Weak-Link Optimization for Multi-Agent Reasoning and Collaboration
WORC improves multi-agent LLM reasoning to 82.2% average accuracy by predicting and compensating for the weakest agent via targeted extra sampling rather than uniform reinforcement.
-
AlphaEvolve: A coding agent for scientific and algorithmic discovery
AlphaEvolve is an LLM-orchestrated evolutionary coding agent that discovered a 4x4 complex matrix multiplication algorithm using 48 scalar multiplications, the first improvement over Strassen's algorithm in 56 years, plus optimizations for Google data centers and hardware.
-
OpenAaaS: An Open Agent-as-a-Service Framework for Distributed Materials-Informatics Research
OpenAaaS is a hierarchical agent-as-a-service system that enables secure multi-agent collaboration for materials informatics by moving code to data rather than data to code.
-
Position: Academic Conferences are Potentially Facing Denominator Gaming Caused by Fully Automated Scientific Agents
Malicious actors could use AI agents to submit large numbers of fake papers, inflating the submission count and thereby raising the acceptance odds for a small set of chosen legitimate papers under stable conference acceptance rates.
-
From Research Question to Scientific Workflow: Leveraging Agentic AI for Science Automation
Agentic architecture automates research-question-to-workflow translation via LLM intent extraction, deterministic generators, and Skills, raising intent accuracy from 44% to 83% and cutting data transfer by 92% in a 1000 Genomes evaluation.
-
DORA Explorer: Improving the Exploration Ability of LLMs Without Training
DORA Explorer boosts LLM agent exploration without training by ranking diverse actions using log-probabilities and a tunable parameter, yielding UCB-competitive results on multi-armed bandits and gains on text adventure environments.
-
TurboAgent: An LLM-Driven Autonomous Multi-Agent Framework for Turbomachinery Aerodynamic Design
TurboAgent uses an LLM as coordinator for specialized agents to autonomously generate, predict, optimize, and validate turbomachinery designs, achieving R² > 0.91 agreement with CFD on a transonic compressor and 1.61% efficiency gains.
-
Evolving Roles of LLMs in Scientific Innovation: Assistant, Collaborator, Scientist, and Evaluator
The paper proposes a four-role framework for LLMs in scientific innovation and reviews methods, benchmarks, and limitations across Assistant, Collaborator, Scientist, and Evaluator roles.
-
URSA: The Universal Research and Scientific Agent
URSA is a modular agent ecosystem that uses LLMs and scientific tools to accelerate research tasks of varying complexity.
- Continuum: Efficient and Robust Multi-Turn LLM Agent Scheduling with KV Cache Time-to-Live