Recognition: unknown
Target-Oriented Pretraining Data Selection via Neuron-Activated Graph
Pith reviewed 2026-05-10 08:44 UTC · model grok-4.3
The pith
Neuron-activated graphs select pretraining data that raises target-task accuracy by 4.9 percent on average.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Ranking pretraining data according to its similarity to target examples, measured in a Neuron-Activated Graph constructed from a sparse set of high-impact neurons across layers of an off-the-shelf LLM, produces superior target-oriented pretraining outcomes, yielding an average 4.9% improvement over random sampling and a 5.3% accuracy gain on HellaSwag over prior methods.
What carries the argument
Neuron-Activated Graph (NAG): a compact graph formed by selecting and connecting the most influential neurons for target inputs, which serves as the basis for computing similarity scores to rank pretraining candidates.
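The summary does not spell out how impact is quantified or how graph similarity is computed, so the following is a minimal sketch under stated assumptions (impact as mean absolute activation, one top neuron per layer, edges between adjacent layers, node-set Jaccard as the similarity), not the authors' implementation; synthetic arrays stand in for activations from a real LLM.

```python
# Sketch only: "impact" is assumed to be mean |activation|; the paper's metric may differ.
import numpy as np

rng = np.random.default_rng(0)
N_LAYERS, N_NEURONS, N_TOKENS = 28, 1024, 16   # hypothetical model shape

def impact(activations):
    """Per-neuron impact for one input, here mean |activation| over tokens.
    activations: (n_layers, n_tokens, n_neurons)."""
    return np.abs(activations).mean(axis=1)                 # (n_layers, n_neurons)

def build_nag(activations, top_k=1):
    """Keep the top-k highest-impact neurons per layer; connect adjacent layers."""
    scores = impact(activations)
    nodes = {(layer, int(i))
             for layer in range(scores.shape[0])
             for i in np.argsort(scores[layer])[-top_k:]}
    edges = {(u, v) for u in nodes for v in nodes if v[0] == u[0] + 1}
    return nodes, edges

def nag_similarity(nag_a, nag_b):
    """Node-set Jaccard overlap, one plausible graph similarity."""
    a, b = nag_a[0], nag_b[0]
    return len(a & b) / len(a | b)

# Synthetic activations stand in for forward passes through an off-the-shelf LLM.
target_acts = rng.normal(size=(N_LAYERS, N_TOKENS, N_NEURONS))
candidates = [rng.normal(size=(N_LAYERS, N_TOKENS, N_NEURONS)) for _ in range(5)]

target_nag = build_nag(target_acts)
ranking = sorted(range(len(candidates)),
                 key=lambda i: nag_similarity(build_nag(candidates[i]), target_nag),
                 reverse=True)
print("candidates ranked by NAG similarity (best first):", ranking)
```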
If this is right
- Target-specific pretraining becomes feasible without training additional models for data selection.
- Performance on downstream tasks such as commonsense reasoning improves measurably when data is chosen this way.
- The approach extends to settings with multiple simultaneous targets without losing effectiveness.
- Only a tiny fraction of neurons (around 0.12% of all) is needed to capture the essential target features.
Where Pith is reading between the lines
- If the sparse neuron set truly forms a functional backbone, similar graphs could help diagnose what knowledge a model has acquired during pretraining.
- Data selection based on neuron patterns might generalize beyond language to other domains where models have identifiable internal structures.
- Using off-the-shelf models for this purpose implies that the method can leverage existing large models without extra compute for characterization.
Load-bearing premise
A sparse set of high-impact neurons identified in any off-the-shelf LLM sufficiently characterizes target inputs so that NAG similarity reliably identifies useful pretraining data for improving downstream performance.
What would settle it
If randomly sampled pretraining data leads to equal or better downstream performance than NAG-selected data on the same target tasks, or if deactivating the NAG neurons causes no notable drop in model capability.
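A minimal sketch of the second falsification test, on a toy model rather than an LLM: zero out a chosen set of hidden units via a PyTorch forward hook and measure how much the outputs move. The unit indices and architecture below are placeholders; the paper applies the equivalent ablation to NAG-selected neurons and measures benchmark accuracy.

```python
# Toy deactivation check, not the paper's setup: ablate selected units via a
# forward hook and see whether the model's outputs change appreciably.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
x = torch.randn(8, 32)
baseline = model(x)

selected_units = [3, 17, 42]   # hypothetical "high-impact" units in the first layer

def deactivate(module, inputs, output):
    out = output.clone()
    out[:, selected_units] = 0.0   # zero the selected activations
    return out                     # returned tensor replaces the layer output

handle = model[0].register_forward_hook(deactivate)
ablated = model(x)
handle.remove()

# A large gap here would mirror the reported collapse when NAG neurons are
# deactivated; no gap would undercut the load-bearing premise.
print("mean |delta output|:", (baseline - ablated).abs().mean().item())
```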
read the original abstract
Everyday tasks come with a target, and pretraining models around this target is what turns them into experts. In this paper, we study target-oriented language model (LM) pretraining by introducing Neuron-Activated Graph Ranking (NAG-based Ranking), a training-free and interpretable framework for target pretraining data selection. Rather than using black-box representations, our approach directly characterizes each target input by a sparse set of high-impact neurons in any off-the-shelf LLMs. Concretely, we quantify neuron impact and select the most influential neurons across layers into a compact Neuron-Activated Graph (NAG), and rank candidate data by NAG similarity to target examples. We conduct experiments across six benchmarks, where our NAG-based Ranking improves target-oriented pretraining by 4.9% on average over random sampling, and also outperforms state-of-the-art baselines by 5.3% accuracy on HellaSwag. It also remains effective under a more applicable multi-target setting, where our best setup surpasses two baselines by 1.1% and 4.1%, respectively. Furthermore, we provide a comprehensive analysis on why and how our NAG works, e.g., deactivating NAG-selected neurons (only 0.12% of all) causes a 23.5% performance collapse, and restricting NAG to the final layer incurs a 4.1% average drop, indicating that NAG captures a sparse "functional backbone" for learning target features. We release the code at https://github.com/asillycat/NAG.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Neuron-Activated Graph (NAG) Ranking, a training-free method for target-oriented pretraining data selection. It identifies sparse high-impact neurons (0.12% of total) in off-the-shelf LLMs to build a compact NAG for each target input, then ranks candidate pretraining examples by NAG similarity. Experiments across six benchmarks report 4.9% average gains over random sampling and 5.3% accuracy improvement on HellaSwag versus state-of-the-art baselines, with further gains in a multi-target setting. Supporting analyses show that deactivating NAG neurons causes 23.5% performance collapse on targets and restricting NAG to the final layer drops performance by 4.1%.
Significance. If the central claims hold after addressing the gaps below, the work provides an interpretable, parameter-free alternative to black-box data selection for domain adaptation of LLMs. The neuron-level characterization and deactivation results add to mechanistic interpretability by highlighting a sparse functional backbone. The multi-target results and code release further increase potential impact for efficient, targeted pretraining.
major comments (3)
- [Experiments and Analysis] The performance claims (4.9% over random, 5.3% on HellaSwag) rest on end-to-end accuracy gains, yet no ablation compares NAG ranking against a non-neuron baseline that matches the selected data on other axes such as length, domain distribution, or lexical overlap. Without this control, the gains could arise from incidental correlations rather than neuron overlap causing better feature learning during pretraining.
- [Analysis] The neuron deactivation (23.5% collapse) and layer-restriction (4.1% drop) experiments demonstrate that the identified neurons are critical for the frozen model's current inference on target tasks. However, this does not establish that pretraining on NAG-similar data will cause the model to acquire or strengthen the corresponding capabilities, as opposed to other properties of the selected examples.
- [Experiments] The experimental results lack reported details on the number of runs, standard deviations or error bars, exact baseline implementations, and the precise criteria for selecting the sparse neurons (e.g., impact quantification thresholds). These omissions prevent verification of statistical reliability and reproducibility of the reported improvements.
minor comments (2)
- [Abstract] The abstract refers to 'six benchmarks' without naming them; listing the specific tasks (e.g., HellaSwag and the others) in the abstract or early introduction would improve clarity.
- [Method] Notation for neuron impact quantification and NAG construction could be formalized with equations in the method section to make the training-free procedure fully reproducible from the text alone.
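One way such a formalization could look, purely as an illustration of this request; the impact measure (mean absolute activation), top-k selection, adjacent-layer edges, and Jaccard similarity are assumptions, not the paper's actual definitions.

```latex
% Illustrative formalization only; symbols and choices below are assumptions.
% Impact of neuron i in layer \ell on a length-T input x, taken as mean absolute activation:
\[
  I_{\ell,i}(x) \;=\; \frac{1}{T} \sum_{t=1}^{T} \bigl| a_{\ell,i}(x_t) \bigr|
\]
% NAG nodes: the k highest-impact neurons in each layer; edges link adjacent layers:
\[
  V(x) \;=\; \bigcup_{\ell} \operatorname{top\text{-}k}_{i}\; I_{\ell,i}(x), \qquad
  E(x) \;=\; \bigl\{ (u, v) \in V(x)^{2} : \operatorname{layer}(v) = \operatorname{layer}(u) + 1 \bigr\}
\]
% Candidate c is ranked against target t by a graph similarity, e.g. node-set Jaccard:
\[
  s(c, t) \;=\; \frac{|V(c) \cap V(t)|}{|V(c) \cup V(t)|}
\]
```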
Simulated Author's Rebuttal
We thank the referee for their positive summary and recommendation for major revision. We have prepared detailed responses to each major comment, committing to revisions that address the concerns about ablations, interpretation of analyses, and missing experimental details. These changes will strengthen the manuscript's claims and reproducibility.
read point-by-point responses
-
Referee: [Experiments and Analysis] The performance claims (4.9% over random, 5.3% on HellaSwag) rest on end-to-end accuracy gains, yet no ablation compares NAG ranking against a non-neuron baseline that matches the selected data on other axes such as length, domain distribution, or lexical overlap. Without this control, the gains could arise from incidental correlations rather than neuron overlap causing better feature learning during pretraining.
Authors: We agree that such an ablation would provide stronger evidence that the improvements stem specifically from neuron similarity rather than other properties of the selected data. Our current comparisons are against random sampling and prior state-of-the-art data selection methods, which may implicitly capture some domain or lexical aspects. To rigorously address this concern, we will add a new ablation experiment in the revised paper. This will involve constructing a baseline that selects pretraining data to match the NAG-selected set on length and lexical overlap (using metrics like Jaccard similarity), and demonstrate that NAG ranking still yields superior target performance. This revision will help isolate the contribution of the Neuron-Activated Graph. revision: yes
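A minimal sketch of the kind of matched-control selection committed to above, assuming token-level Jaccard as the lexical-overlap metric and a simple greedy length-matched pairing; the function and tolerance are illustrative, not the planned implementation.

```python
# Sketch of a matched-control baseline: for each NAG-selected example, pick an
# unselected candidate with similar length and high token-level Jaccard overlap,
# so the control set matches the NAG set on these surface properties.
def jaccard(a_tokens, b_tokens):
    a, b = set(a_tokens), set(b_tokens)
    return len(a & b) / max(len(a | b), 1)

def matched_control(nag_selected, pool, length_tol=0.1):
    """Greedy matching; nag_selected and pool are lists of token lists."""
    remaining = list(range(len(pool)))
    control = []
    for sel in nag_selected:
        # keep pool items whose length is within +/- length_tol of the selected example
        close = [i for i in remaining
                 if abs(len(pool[i]) - len(sel)) <= length_tol * len(sel)]
        if not close:
            close = remaining
        best = max(close, key=lambda i: jaccard(pool[i], sel))
        control.append(best)
        remaining.remove(best)
    return control

# toy usage with placeholder token lists
nag_set = [["the", "cat", "sat"], ["dogs", "bark", "loudly", "today"]]
pool = [["a", "cat", "sat", "down"], ["birds", "sing"], ["dogs", "often", "bark"]]
print(matched_control(nag_set, pool))
```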
-
Referee: [Analysis] The neuron deactivation (23.5% collapse) and layer-restriction (4.1% drop) experiments demonstrate that the identified neurons are critical for the frozen model's current inference on target tasks. However, this does not establish that pretraining on NAG-similar data will cause the model to acquire or strengthen the corresponding capabilities, as opposed to other properties of the selected examples.
Authors: This is a valid observation. The deactivation and layer analyses are intended to characterize the functional importance of the selected neurons in the off-the-shelf model, supporting the interpretability of NAG as capturing a sparse backbone relevant to the target. While we do not claim direct causation from these experiments alone, the end-to-end pretraining results show consistent gains, suggesting that data selected via NAG similarity aids in learning target-relevant features. We will revise the discussion in Section 4 to more clearly frame these analyses as providing mechanistic insight and motivation for the selection method, rather than definitive proof of capability acquisition during pretraining. We believe this clarification will address the concern without overstating the results. revision: partial
-
Referee: [Experiments] The experimental results lack reported details on the number of runs, standard deviations or error bars, exact baseline implementations, and the precise criteria for selecting the sparse neurons (e.g., impact quantification thresholds). These omissions prevent verification of statistical reliability and reproducibility of the reported improvements.
Authors: We appreciate this feedback on reproducibility. In the revised manuscript, we will report: the number of runs with standard deviations and error bars; exact details on how baselines were implemented; and the precise criteria and thresholds used for neuron selection in the NAG construction. These additions will allow full verification and reproduction of our results. revision: yes
Circularity Check
No significant circularity: NAG ranking is a procedural similarity metric evaluated on external benchmarks
full rationale
The paper defines NAG-based Ranking as a training-free procedure that extracts sparse high-impact neurons from an off-the-shelf LLM, assembles them into a graph, and ranks candidate pretraining examples by graph similarity to target inputs. All reported gains (4.9% average over random sampling, 5.3% on HellaSwag) are measured on held-out external benchmarks rather than being derived from the selection rule itself. Neuron-impact quantification, graph construction, and similarity computation are defined independently of the downstream accuracy metric; deactivation and layer-ablation analyses supply separate empirical checks. No equations reduce the final performance numbers to fitted parameters or self-referential definitions, and no load-bearing uniqueness theorems are imported via self-citation. The derivation chain therefore remains self-contained against external evaluation.
Axiom & Free-Parameter Ledger
free parameters (1)
- Neuron impact quantification and selection criteria
axioms (1)
- Domain assumption: high-impact neurons across layers of off-the-shelf LLMs can be identified and aggregated into a graph that captures target-relevant features.
invented entities (1)
- Neuron-Activated Graph (NAG): no independent evidence
Forward citations
Cited by 1 Pith paper
- Let the Target Select for Itself: Data Selection via Target-Aligned Paths
Target-aligned data selection via normalized endpoint loss drop on a validation-induced reference path achieves competitive performance with reduced computational overhead.
Reference graph
Works this paper leans on
- High-Mean impact: neurons selected based on the average impact score across inputs.
- High-∆ impact: neurons selected based on the largest differences in mean impact scores between target examples and random inputs, computed over 10k HellaSwag samples and 10k random inputs during inference.
- Random: randomly select 28 neurons from all neurons, 1 neuron per layer.
Deactivating High-∆ neurons causes a pronounced performance drop of 17.8%, while deactivating neurons selected by High-Mean impact or random sampling yields negligible degradation. This observation provides additional mechanistic insight into why NAG is effective. Neurons with High-∆...
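A minimal sketch contrasting the three selection rules described above, using synthetic per-input impact scores; the actual impact metric and the 10k/10k input scale are not reproduced here.

```python
# Illustrative comparison of High-Mean, High-∆, and Random neuron selection.
# Impact is assumed to be a per-neuron scalar score per input; sizes are toy.
import numpy as np

rng = np.random.default_rng(1)
n_inputs, n_layers, n_neurons = 500, 28, 256
target_impact = rng.normal(size=(n_inputs, n_layers, n_neurons))   # target (e.g. HellaSwag) inputs
random_impact = rng.normal(size=(n_inputs, n_layers, n_neurons))   # random inputs

def one_per_layer(score):
    """Pick the highest-scoring neuron in each layer; score: (n_layers, n_neurons)."""
    return [(layer, int(score[layer].argmax())) for layer in range(score.shape[0])]

high_mean  = one_per_layer(target_impact.mean(axis=0))                                # High-Mean
high_delta = one_per_layer(target_impact.mean(axis=0) - random_impact.mean(axis=0))   # High-∆
random_sel = [(layer, int(rng.integers(n_neurons))) for layer in range(n_layers)]     # Random

print(high_delta[:3])   # 28 neurons total, one per layer, as in the described baselines
```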
discussion (0)