Data-juicer: A one-stop data processing system for large language models

Daoyuan Chen, Yilun Huang, Zhijian Ma, Hesen Chen, Xuchen Pan, Ce Ge, Dawei Gao, Yuexiang Xie, Zhaoyang Liu, Jinyang Gao, et al. · 2024

2 Pith papers cite this work.

representative citing papers

From Instance Selection to Fixed-Pool Data Recipe Search for Supervised Fine-Tuning

cs.LG · 2026-05-13 · conditional · novelty 7.0

AutoSelection discovers data recipes from a 90K instruction pool that outperform full-data training and other selectors on reasoning tasks for SFT across multiple models.

Beyond Factor Aggregation: Gauge-Aware Low-Rank Server Representations for Federated LoRA

cs.LG · 2026-05-07 · unverdicted · novelty 7.0

GLoRA replaces raw factor averaging with gauge-aware aggregation in a consensus subspace estimated from client projectors, enabling consistent low-rank federated LoRA under heterogeneity.
