Nature Methods
5 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (5)
verdicts: UNVERDICTED (5)
roles: background (1)
polarities: background (1)
citing papers explorer
- NatureBench: Can Coding Agents Match the Published SOTA of Nature-Family Papers?
  NatureBench evaluates ten frontier AI coding agents on 90 tasks from Nature papers under web-search-disabled conditions and finds the strongest agent surpasses published SOTA on only 17.8% of tasks, succeeding mainly by translating problems into familiar supervised learning setups.
- Contextualizing Biological Language Models across Modalities via Logit-Space Contrastive Alignment
  LOGICA adds context to pretrained biological LMs via logit-space contrastive alignment with gated adapters, improving AUC on held-out drug-resistance mutation ranking from ~0.55 to ~0.65 while preserving token likelihoods.
- When transformers learn "impossible" languages, what do they learn?
  Transformers on impossible-language variants show gradual grammatical sensitivity loss but sharp long-sentence generation failures, supporting generative deficiency as a link to non-attestation.
- Biological Reasoning-Informed Regression for Interpretable Regulatory DNA Activity Prediction
  R3LM trains LLMs via two-stage reasoning-then-regression on a new dataset CRE-ReasonBench with mechanistic traces, achieving SOTA enhancer activity prediction across three cell types with interpretable outputs.
- Sessa: Selective State Space Attention
  Sessa integrates attention within recurrent paths to achieve power-law memory tails and flexible non-decaying selective retrieval, outperforming baselines on long-context tasks.