Data-similarity and data-influence produce significantly overlapping rankings of training documents for LLM outputs, with asymmetry allowing a favorable cost-accuracy trade-off.
2 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.LG (2)
years: 2026 (2)
verdicts: UNVERDICTED (2)
representative citing papers
IGSR pairs LLM term generation with marginal influence scoring inside MCTS to discover symbolic equations, reporting gains on benchmarks and a novel DNA-methylation / RNA-Pol-II-pausing link in genomic data that wet-lab work later supported.
citing papers explorer
- Quantifying the Agreement Between Data-Influence and Data-Similarity to Understand LLM Behavior
Data-similarity and data-influence produce significantly overlapping rankings of training documents for LLM outputs, with asymmetry allowing a favorable cost-accuracy trade-off.
- Influence-Guided Symbolic Regression: Scientific Discovery via LLM-Driven Equation Search with Granular Feedback
IGSR pairs LLM term generation with marginal influence scoring inside MCTS to discover symbolic equations, reporting gains on benchmarks and a novel DNA-methylation / RNA-Pol-II-pausing link in genomic data that wet-lab work later supported.