Table G: Case study combining model-generated AOP reasoning (highlighted for alignment) and LLM-as-a-Judge evaluation (no highlighting)

Hepatic steatosis is directly linked to liver toxicity

1 Pith paper cites this work. Polarity classification is still indexing.


representative citing papers

ToxReason: A Benchmark for Mechanistic Chemical Toxicity Reasoning via Adverse Outcome Pathway

q-bio.QM · 2026-04-07 · unverdicted · novelty 7.0

ToxReason is an AOP-grounded benchmark that evaluates LLMs on mechanistic organ-level toxicity reasoning from molecular initiating events to adverse outcomes, showing that high predictive accuracy does not guarantee faithful biological explanations.

citing papers explorer


ToxReason: A Benchmark for Mechanistic Chemical Toxicity Reasoning via Adverse Outcome Pathway

q-bio.QM · 2026-04-07 · unverdicted · none · ref 27