Foundational Autoraters: Taming Large Language Models for better automatic evaluation
4 Pith papers cite this work. Polarity classification is still indexing.
Citation roles: background (2)
Citation polarities: background (2)
Citing papers
- Towards an AI co-scientist
  A multi-agent AI system generates novel biomedical hypotheses that show promising experimental validation in drug repurposing for leukemia, new targets for liver fibrosis, and a bacterial gene transfer mechanism.
- EvoSkill: Automated Skill Discovery for Multi-Agent Systems
  EvoSkill evolves agent skills via failure analysis and Pareto frontier selection, raising exact-match accuracy 7.3% on OfficeQA and 12.1% on SealQA with 5.3% zero-shot transfer to BrowseComp.
- CASE: An Agentic AI Framework for Enhancing Scam Intelligence in Digital Payments
  CASE is a novel agentic AI system that proactively interviews scam victims using LLMs to collect detailed intelligence, which is then structured for use in scam prevention, resulting in a 21% increase in enforcements on Google Pay India.
- LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods
  A survey that organizes LLMs-as-judges research into functionality, methodology, applications, meta-evaluation, and limitations.