arXiv preprint arXiv:2410.15226 , year=

Hao Chen, Abdul Waheed, Xiang Li, Yidong Wang, Jindong Wang, Bhiksha Raj, Marah I Abdin · 2024 · arXiv 2410.15226

6 Pith papers cite this work. Polarity classification is still indexing.

6 Pith papers citing it

read on arXiv browse 6 citing papers

citation-role summary

background 3

citation-polarity summary

background 3

representative citing papers

StepCodeReasoner: Aligning Code Reasoning with Stepwise Execution Traces via Reinforcement Learning

cs.SE · 2026-05-12 · unverdicted · novelty 7.0

StepCodeReasoner aligns code reasoning with verifiable stepwise execution traces via print anchors and bi-level GRPO reinforcement learning, reaching SOTA results on CRUXEval (91.1%) and LiveCodeBench (86.5%) for a 7B model.

Polyglot Teachers: Evaluating Language Models for Multilingual Synthetic Data Generation

cs.CL · 2026-04-13 · unverdicted · novelty 7.0

Gemma 3 27B and Aya Expanse 32B are effective multilingual teachers; data qualities such as diversity and fluency explain over 93% of variance in student performance rather than model size.

Synthetic Eggs in Many Baskets: The Impact of Synthetic Data Diversity on LLM Fine-Tuning

cs.CL · 2025-11-03 · unverdicted · novelty 4.0

Fine-tuning LLMs on multi-source synthetic data mitigates distribution collapse and self-preference bias while increasing output quality relative to single-source or human-only fine-tuning.

LLM Harms: A Taxonomy and Discussion

cs.CY · 2025-12-05 · unverdicted · novelty 3.0

This paper proposes a taxonomy of LLM harms in five categories and suggests mitigation strategies plus a dynamic auditing system for responsible development.

Bridging the Linguistic Divide: A Survey on Leveraging Large Language Models for Machine Translation

cs.CL · 2025-04-02 · unverdicted · novelty 3.0

A literature survey that organizes prompting, fine-tuning, preference optimization, and context-aware techniques for LLM-based machine translation with emphasis on low-resource languages.

Will LLMs Scaling Hit the Wall? Breaking Barriers via Distributed Resources on Massive Edge Devices

cs.DC · 2025-03-11 · unverdicted · novelty 2.0

Position paper claiming that distributed training across massive edge devices can overcome data depletion and centralized compute monopolies in LLM scaling.

citing papers explorer

Showing 6 of 6 citing papers.

StepCodeReasoner: Aligning Code Reasoning with Stepwise Execution Traces via Reinforcement Learning cs.SE · 2026-05-12 · unverdicted · none · ref 16
StepCodeReasoner aligns code reasoning with verifiable stepwise execution traces via print anchors and bi-level GRPO reinforcement learning, reaching SOTA results on CRUXEval (91.1%) and LiveCodeBench (86.5%) for a 7B model.
Polyglot Teachers: Evaluating Language Models for Multilingual Synthetic Data Generation cs.CL · 2026-04-13 · unverdicted · none · ref 1
Gemma 3 27B and Aya Expanse 32B are effective multilingual teachers; data qualities such as diversity and fluency explain over 93% of variance in student performance rather than model size.
Synthetic Eggs in Many Baskets: The Impact of Synthetic Data Diversity on LLM Fine-Tuning cs.CL · 2025-11-03 · unverdicted · none · ref 2
Fine-tuning LLMs on multi-source synthetic data mitigates distribution collapse and self-preference bias while increasing output quality relative to single-source or human-only fine-tuning.
LLM Harms: A Taxonomy and Discussion cs.CY · 2025-12-05 · unverdicted · none · ref 46
This paper proposes a taxonomy of LLM harms in five categories and suggests mitigation strategies plus a dynamic auditing system for responsible development.
Bridging the Linguistic Divide: A Survey on Leveraging Large Language Models for Machine Translation cs.CL · 2025-04-02 · unverdicted · none · ref 49
A literature survey that organizes prompting, fine-tuning, preference optimization, and context-aware techniques for LLM-based machine translation with emphasis on low-resource languages.
Will LLMs Scaling Hit the Wall? Breaking Barriers via Distributed Resources on Massive Edge Devices cs.DC · 2025-03-11 · unverdicted · none · ref 19
Position paper claiming that distributed training across massive edge devices can overcome data depletion and centralized compute monopolies in LLM scaling.

arXiv preprint arXiv:2410.15226 , year=

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer