Title resolution pending

Geng, Xinyang and Liu, Hao

8 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

citation-role summary

method 1

citation-polarity summary

unclear 1

representative citing papers

MIST: Multimodal Interactive Speech-based Tool-calling Conversational Assistants for Smart Homes

cs.CL · 2026-05-07 · unverdicted · novelty 7.0

MIST is a new synthetic speech-based tool-calling dataset for IoT devices that exposes performance gaps between open- and closed-weight multimodal LLMs.

Generalization or Memorization? Brittleness Testing for Chess-Trained Language Models

cs.AI · 2026-05-17 · conditional · novelty 6.0

A compact 25M chess move predictor exceeds larger fine-tuned models on puzzles, indicating memorization in earlier claims, while LLM-Modulo raises general LLM move accuracy from 1.2% to 21.2% and validity to 95.3%.

Synthetic Pre-Pre-Training Improves Language Model Robustness to Noisy Pre-Training Data

cs.CL · 2026-05-11 · unverdicted · novelty 6.0

Synthetic pre-pre-training on structured data improves LLM robustness to noisy pre-training, matching baseline loss with up to 49% fewer natural tokens for a 1B model.

Training Language Models to Self-Correct via Reinforcement Learning

cs.LG · 2024-09-19 · unverdicted · novelty 6.0

SCoRe uses multi-turn online RL with regularization on self-generated traces to improve LLM self-correction, achieving 15.6% and 9.1% gains on MATH and HumanEval for Gemini models.

DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models

cs.CL · 2023-09-07 · conditional · novelty 6.0

DoLa reduces hallucinations in LLMs by contrasting logits from later versus earlier layers during decoding, improving truthfulness on TruthfulQA by 12-17 absolute points without fine-tuning or retrieval.

The Platonic Representation Hypothesis

cs.LG · 2024-05-13 · unverdicted · novelty 5.0

Representations learned by large AI models are converging toward a shared statistical model of reality.

DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

cs.CL · 2024-01-11 · unverdicted · novelty 5.0

DeepSeekMoE 2B matches GShard 2.9B performance and approaches a dense 2B model; the 16B version matches LLaMA2-7B at 40% compute by using fine-grained expert segmentation plus shared experts.

RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment

cs.LG · 2023-04-13 · unverdicted · novelty 5.0

RAFT aligns generative models by ranking samples with a reward model and fine-tuning only on the top-ranked outputs, reporting gains on reward scores and automated metrics for LLMs and diffusion models.

citing papers explorer

MIST: Multimodal Interactive Speech-based Tool-calling Conversational Assistants for Smart Homes
cs.CL · 2026-05-07 · unverdicted · none · ref 1

Generalization or Memorization? Brittleness Testing for Chess-Trained Language Models
cs.AI · 2026-05-17 · conditional · none · ref 13

Synthetic Pre-Pre-Training Improves Language Model Robustness to Noisy Pre-Training Data
cs.CL · 2026-05-11 · unverdicted · none · ref 15

Training Language Models to Self-Correct via Reinforcement Learning
cs.LG · 2024-09-19 · unverdicted · none · ref 153

DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models
cs.CL · 2023-09-07 · conditional · none · ref 92

The Platonic Representation Hypothesis
cs.LG · 2024-05-13 · unverdicted · none · ref 46

DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
cs.CL · 2024-01-11 · unverdicted · none · ref 139

RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment
cs.LG · 2023-04-13 · unverdicted · none · ref 128
