CTF4Nuclear proposes a common task framework for benchmarking ML methods on nuclear engineering datasets using 12 metrics and a new sparse-measurement system monitoring paradigm.
hub
Measuring mathematical problem solving with the math dataset
10 Pith papers cite this work. Polarity classification is still indexing.
hub tools
citation-role summary
citation-polarity summary
roles
dataset 3representative citing papers
MAS-Algorithm is a multi-agent workflow that improves AI acceptance rates on algorithmic problems by 6.48% on average, outperforming parameter-efficient fine-tuning.
Prolonged RL training with KL control and reference policy resetting enables LLMs to develop novel reasoning strategies inaccessible to base models even under extensive sampling.
Repeated sampling scales problem coverage log-linearly with sample count, improving SWE-bench Lite performance from 15.9% to 56% using 250 samples.
MindLoom synthesizes frontier-level reasoning data by decomposing solutions into thought mode chains, training a retrieval model for mode selection, composing new problems with distribution-aligned sampling, and applying rollout-based difficulty labeling for fine-tuning.
HAPO is a new token-level policy optimization method for LLMs that continuously adapts four optimization stages using entropy, claiming consistent gains over DAPO on math, code, and logic tasks.
STELLA aligns ESM3 bimodal sequence-structure encodings with Llama-3.1-8B text modeling to claim state-of-the-art results on protein functional description prediction and enzyme-catalyzed reaction prediction.
Reasoning before answering MCQs increases LLM confidence more for incorrect answers and degrades calibration on a 57-subject benchmark across seven models.
The paper surveys reinforced reasoning techniques for LLMs, covering automated data construction, learning-to-reason methods, and test-time scaling as steps toward Large Reasoning Models.
citing papers explorer
-
STELLA: A Multimodal LLM for Protein Functional Annotation via Unified Sequence-Structure Encoding
STELLA aligns ESM3 bimodal sequence-structure encodings with Llama-3.1-8B text modeling to claim state-of-the-art results on protein functional description prediction and enzyme-catalyzed reaction prediction.