Foundational Autoraters: Taming Large Language Models for better automatic evaluation

4 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary: background 2

citation-polarity summary

fields: cs.AI 3, cs.CL 1

roles: background 2

polarities: background 2

representative citing papers

Towards an AI co-scientist

cs.AI · 2025-02-26 · unverdicted · novelty 6.0

A multi-agent AI system generates novel biomedical hypotheses that show promising experimental validation in drug repurposing for leukemia, new targets for liver fibrosis, and a bacterial gene transfer mechanism.

EvoSkill: Automated Skill Discovery for Multi-Agent Systems

cs.AI · 2026-03-03 · unverdicted · novelty 5.0

EvoSkill evolves agent skills via failure analysis and Pareto frontier selection, raising exact-match accuracy 7.3% on OfficeQA and 12.1% on SealQA with 5.3% zero-shot transfer to BrowseComp.

citing papers explorer

Showing 4 of 4 citing papers.

  • Towards an AI co-scientist · cs.AI · 2025-02-26 · unverdicted · none · ref 29

    A multi-agent AI system generates novel biomedical hypotheses that show promising experimental validation in drug repurposing for leukemia, new targets for liver fibrosis, and a bacterial gene transfer mechanism.

  • EvoSkill: Automated Skill Discovery for Multi-Agent Systems · cs.AI · 2026-03-03 · unverdicted · none · ref 12

    EvoSkill evolves agent skills via failure analysis and Pareto frontier selection, raising exact-match accuracy 7.3% on OfficeQA and 12.1% on SealQA with 5.3% zero-shot transfer to BrowseComp.

  • CASE: An Agentic AI Framework for Enhancing Scam Intelligence in Digital Payments · cs.AI · 2025-08-27 · unverdicted · none · ref 12

    CASE is a novel agentic AI system that proactively interviews scam victims using LLMs to collect detailed intelligence, which is then structured for use in scam prevention, resulting in a 21% increase in enforcements on Google Pay India.

  • LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods · cs.CL · 2024-12-07 · accept · none · ref 234

    A survey that organizes LLMs-as-judges research into functionality, methodology, applications, meta-evaluation, and limitations.