
Prescience: A benchmark for forecasting scientific contributions

3 Pith papers cite this work. Polarity classification is still indexing.

fields: cs.AI (2), cs.CL (1)

years: 2026 (3)

verdicts: unverdicted (3)

representative citing papers

GIANTS: Generative Insight Anticipation from Scientific Literature

cs.CL · 2026-04-10 · unverdicted · novelty 8.0

GIANTS-4B, trained with RL on a new 17k-example benchmark of parent-to-child paper insights, achieves a 34% relative improvement over gemini-3-pro in LM-judge similarity and is rated higher-impact by a citation predictor.

Forecasting Scientific Progress with Artificial Intelligence

cs.AI · 2026-05-21 · unverdicted · novelty 7.0

Introduces the CUSP benchmark across 4760 events and finds frontier AI models can pick plausible directions but fail to predict whether or when scientific advances will occur, with performance varying by domain and insensitive to training cutoffs.

citing papers explorer

Showing 3 of 3 citing papers after filters.

  • GIANTS: Generative Insight Anticipation from Scientific Literature · cs.CL · 2026-04-10 · unverdicted · none · ref 1 · internal anchor

    GIANTS-4B, trained with RL on a new 17k-example benchmark of parent-to-child paper insights, achieves a 34% relative improvement over gemini-3-pro in LM-judge similarity and is rated higher-impact by a citation predictor.

  • Forecasting Scientific Progress with Artificial Intelligence · cs.AI · 2026-05-21 · unverdicted · none · ref 19 · internal anchor

    Introduces the CUSP benchmark across 4760 events and finds frontier AI models can pick plausible directions but fail to predict whether or when scientific advances will occur, with performance varying by domain and insensitive to training cutoffs.

  • ForeSci: Evaluating LLM Agents for Forward-Looking AI Research Judgment · cs.AI · 2026-05-30 · unverdicted · none · ref 8 · internal anchor

    ForeSci is a temporally controlled benchmark with 500 tasks for assessing LLM agents on forward-looking AI research judgments in four domains using cutoff-aligned knowledge bases.