Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models, 2024a

Skywork open reasoner series · 2025 · arXiv 2411.07140

5 Pith papers cite this work. Polarity classification is still indexing.


citation-role summary

dataset 1

citation-polarity summary

use dataset 1

representative citing papers

MemSearcher: Training LLMs to Reason, Search and Manage Memory via End-to-End Reinforcement Learning

cs.CL · 2025-11-04 · unverdicted · novelty 7.0

MemSearcher trains LLMs to manage compact memory in multi-turn searches via multi-context GRPO for end-to-end RL, outperforming ReAct-style baselines with stable token counts.

Patterns behind Chaos: Forecasting Data Movement for Efficient Large-Scale MoE LLM Inference

cs.DC · 2025-10-07 · conditional · novelty 6.0

Comprehensive profiling of expert selection in frontier MoE models reveals temporal and spatial patterns that enable 6.6x speedup on wafer-scale GPUs and 1.25x on existing systems via targeted optimizations.

LIMO: Less is More for Reasoning

cs.CL · 2025-02-05 · unverdicted · novelty 6.0

LIMO achieves 63.3% on AIME24 and 95.6% on MATH500 via supervised fine-tuning on roughly 1% of the data used by prior models, supporting the claim that minimal strategic examples suffice when pre-training has already encoded domain knowledge.

Kimi K2: Open Agentic Intelligence

cs.LG · 2025-07-28 · unverdicted · novelty 5.0

Kimi K2 is a 1-trillion-parameter MoE model that leads open-source non-thinking models on agentic benchmarks including 65.8 on SWE-Bench Verified and 66.1 on Tau2-Bench.

KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality

cs.AI · 2025-06-24 · unverdicted · novelty 5.0

KnowRL integrates a knowledge-verification factuality reward into RL training to enforce fact-based reasoning steps and lower hallucination rates in LLMs.

citing papers explorer

Showing 5 of 5 citing papers.

MemSearcher: Training LLMs to Reason, Search and Manage Memory via End-to-End Reinforcement Learning

cs.CL · 2025-11-04 · unverdicted · none · ref 7

Patterns behind Chaos: Forecasting Data Movement for Efficient Large-Scale MoE LLM Inference

cs.DC · 2025-10-07 · conditional · none · ref 22

LIMO: Less is More for Reasoning

cs.CL · 2025-02-05 · unverdicted · none · ref 99

Kimi K2: Open Agentic Intelligence

cs.LG · 2025-07-28 · unverdicted · none · ref 23

KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality

cs.AI · 2025-06-24 · unverdicted · none · ref 3
