Foundational Autoraters: Taming Large Language Models for better automatic evaluation

2024 · arXiv 2407.10817

4 Pith papers cite this work. Polarity classification is still in progress.


citation-role summary

background: 2

citation-polarity summary

background: 2

representative citing papers

Towards an AI co-scientist

cs.AI · 2025-02-26 · unverdicted · novelty 6.0

A multi-agent AI system generates novel biomedical hypotheses that show promising experimental validation in drug repurposing for leukemia, new targets for liver fibrosis, and a bacterial gene transfer mechanism.

EvoSkill: Automated Skill Discovery for Multi-Agent Systems

cs.AI · 2026-03-03 · unverdicted · novelty 5.0

EvoSkill evolves agent skills via failure analysis and Pareto frontier selection, raising exact-match accuracy 7.3% on OfficeQA and 12.1% on SealQA with 5.3% zero-shot transfer to BrowseComp.

CASE: An Agentic AI Framework for Enhancing Scam Intelligence in Digital Payments

cs.AI · 2025-08-27 · unverdicted · novelty 5.0

CASE is a novel agentic AI system that proactively interviews scam victims using LLMs to collect detailed intelligence, which is then structured for use in scam prevention, resulting in a 21% increase in enforcements on Google Pay India.

LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods

cs.CL · 2024-12-07 · accept · novelty 3.0

A survey that organizes LLMs-as-judges research into functionality, methodology, applications, meta-evaluation, and limitations.

citing papers explorer


Towards an AI co-scientist · cs.AI · 2025-02-26 · unverdicted · none · ref 29
EvoSkill: Automated Skill Discovery for Multi-Agent Systems · cs.AI · 2026-03-03 · unverdicted · none · ref 12
CASE: An Agentic AI Framework for Enhancing Scam Intelligence in Digital Payments · cs.AI · 2025-08-27 · unverdicted · none · ref 12
LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods · cs.CL · 2024-12-07 · accept · none · ref 234
