Title resolution pending

Zichun Yu, Spandan Das, Chenyan Xiong · 2024 · arXiv 2406.06046

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

citation-role summary

background 1 baseline 1

citation-polarity summary

background 1 baseline 1

representative citing papers

Sketching the Readout of Large Language Models for Scalable Data Attribution and Valuation

cs.LG · 2026-04-17 · unverdicted · novelty 6.0

RISE applies CountSketch to dual lexical and semantic channels derived from output-layer gradient outer products, cutting data attribution storage by up to 112x and enabling retrospective and prospective influence analysis on LLMs up to 32B parameters.

GRACE: A Dynamic Coreset Selection Framework for Large Language Model Optimization

cs.DB · 2026-04-09 · unverdicted · novelty 6.0

GRACE dynamically constructs and updates coresets for LLM training using representation diversity, gradient-based importance, and k-NN graph propagation to improve efficiency and performance.

An Empirical Study on Influence-Based Pretraining Data Selection for Code Large Language Models

cs.SE · 2026-04-09 · unverdicted · novelty 4.0

Data-influence-score filtering using validation-set loss on downstream coding tasks improves Code-LLM performance, with the most beneficial training data varying significantly across different programming tasks.

Efficient Dataset Selection for Continual Adaptation of Generative Recommenders

cs.IR · 2026-04-09 · unverdicted · novelty 4.0

Gradient-based representations paired with distribution-matching enable efficient curation of small data subsets that improve performance and training efficiency for continually adapting generative recommenders while maintaining robustness to distributional drift.

citing papers explorer

Showing 4 of 4 citing papers.

Sketching the Readout of Large Language Models for Scalable Data Attribution and Valuation cs.LG · 2026-04-17 · unverdicted · none · ref 57
RISE applies CountSketch to dual lexical and semantic channels derived from output-layer gradient outer products, cutting data attribution storage by up to 112x and enabling retrospective and prospective influence analysis on LLMs up to 32B parameters.
GRACE: A Dynamic Coreset Selection Framework for Large Language Model Optimization cs.DB · 2026-04-09 · unverdicted · none · ref 80
GRACE dynamically constructs and updates coresets for LLM training using representation diversity, gradient-based importance, and k-NN graph propagation to improve efficiency and performance.
An Empirical Study on Influence-Based Pretraining Data Selection for Code Large Language Models cs.SE · 2026-04-09 · unverdicted · none · ref 44
Data-influence-score filtering using validation-set loss on downstream coding tasks improves Code-LLM performance, with the most beneficial training data varying significantly across different programming tasks.
Efficient Dataset Selection for Continual Adaptation of Generative Recommenders cs.IR · 2026-04-09 · unverdicted · none · ref 7
Gradient-based representations paired with distribution-matching enable efficient curation of small data subsets that improve performance and training efficiency for continually adapting generative recommenders while maintaining robustness to distributional drift.

Title resolution pending

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer