Knowledge Distillation of Noisy Force Labels for Improved Coarse-Grained Force Fields

Aleksandra Pachalieva; Emily Shinkle; Feranmi V. Olowookere; Nicholas Lubbers; Sakib Matin

arxiv: 2510.26650 · v2 · submitted 2025-10-30 · ⚛️ physics.chem-ph

Pith reviewed 2026-05-18 02:58 UTC · model grok-4.3

keywords: knowledge distillation · coarse-grained force fields · neural network potentials · force matching · deep eutectic solvent · molecular dynamics · mapped forces

The pith

Training student coarse-grained models on ensemble teacher-predicted forces and per-bead energies yields more stable and accurate CG force fields than direct training on mapped forces.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Coarse-grained models reduce computational cost by grouping atoms into beads but introduce mapping noise that makes force labels noisy and prevents direct fitting to all-atom energies. The paper trains a teacher neural network solely on these noisy mapped forces to produce denoised predictions, then distills both force and energy outputs from the teacher to train student models in single-model and ensemble setups. Validation on a deep eutectic solvent shows that the ensemble-teacher approach improves reproduction of two-, three-, and many-body properties while increasing simulation stability.

Core claim

A knowledge distillation framework first trains a teacher CG neural network potential on AA-to-CG mapped forces alone to denoise the labels; its force and per-bead energy predictions are then used to train refined student models, with ensemble averaging of teacher outputs providing the strongest gains in quality and stability on a deep eutectic solvent as measured by structural and dynamical observables.
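The denoising idea behind the teacher can be illustrated with a toy regression. This is a minimal sketch under stated assumptions, not the paper's model: the 1D harmonic mean force, the Gaussian noise, and the cubic polynomial "teacher" are all hypothetical stand-ins chosen so the effect is easy to see. Fitting a smooth model to noisy instantaneous labels recovers something close to the underlying mean force, which is why teacher predictions can serve as cleaner targets than the raw mapped forces.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1D toy: the "true" mean CG force is harmonic, F(x) = -k x.
k = 2.0
x = rng.uniform(-1.0, 1.0, size=2000)
true_force = -k * x

# Mapped labels = mean force + zero-mean noise standing in for mapping noise.
noise_sigma = 1.0
mapped_force = true_force + rng.normal(0.0, noise_sigma, size=x.shape)

# Toy "teacher": a cubic least-squares fit trained on the noisy labels only.
teacher_coeffs = np.polyfit(x, mapped_force, deg=3)
teacher_force = np.polyval(teacher_coeffs, x)


def rmse(a, b):
    """Root-mean-square error between two arrays."""
    return float(np.sqrt(np.mean((a - b) ** 2)))


label_error = rmse(mapped_force, true_force)     # noise level of the raw labels
teacher_error = rmse(teacher_force, true_force)  # error of the denoised targets
print(f"label RMSE vs true force:   {label_error:.3f}")
print(f"teacher RMSE vs true force: {teacher_error:.3f}")
```

In this toy setting the teacher's error against the true mean force is far below the label noise, because the zero-mean noise averages out over many samples. Whether a real teacher denoises without adding its own bias is exactly the load-bearing premise discussed further down.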

What carries the argument

Knowledge distillation pipeline in which a teacher neural network potential, trained only on noisy AA-to-CG mapped forces, supplies denoised force and per-bead energy targets for training student CG models.

If this is right

CG simulations using the distilled models remain stable for longer trajectories than those trained by direct force matching.
Structural and thermodynamic properties of complex fluids such as deep eutectic solvents are reproduced more accurately at the coarse-grained level.
Both force and energy targets from the teacher contribute measurable improvements, with ensemble averaging amplifying the gains.
The method works for both single-teacher and multi-teacher distillation setups without requiring changes to the underlying neural network architecture.
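Why ensemble averaging of teacher outputs might help can be sketched with the same kind of toy problem. The setup below is an assumption for illustration only: the teachers are polynomial fits that differ by bootstrap resampling, whereas the paper's ensemble members are neural networks whose training differences are not described here. The point is the general property that the error of an averaged prediction is never larger than the average error of the members.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical noisy training labels around a harmonic mean force.
x = rng.uniform(-1.0, 1.0, size=300)
noisy_force = -2.0 * x + rng.normal(0.0, 1.0, size=x.shape)

# Evaluation grid with the known toy mean force.
x_eval = np.linspace(-1.0, 1.0, 101)
true_eval = -2.0 * x_eval


def fit_teacher(seed):
    """Fit one toy teacher on a bootstrap resample of the noisy labels."""
    boot = np.random.default_rng(seed).integers(0, x.size, size=x.size)
    coeffs = np.polyfit(x[boot], noisy_force[boot], deg=5)
    return np.polyval(coeffs, x_eval)


def rmse(a, b):
    """Root-mean-square error between two arrays."""
    return float(np.sqrt(np.mean((a - b) ** 2)))


# Eight toy teachers; the ensemble target is their pointwise mean.
member_preds = np.stack([fit_teacher(seed) for seed in range(8)])
ensemble_pred = member_preds.mean(axis=0)

member_errors = [rmse(p, true_eval) for p in member_preds]
mean_member_error = float(np.mean(member_errors))
ensemble_error = rmse(ensemble_pred, true_eval)
print(f"mean member RMSE: {mean_member_error:.3f}")
print(f"ensemble RMSE:    {ensemble_error:.3f}")
```

Averaging removes the part of each member's error that is uncorrelated across members; any bias shared by all members survives the average.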

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same denoising step could be inserted into other force-field training pipelines that start from mapped or noisy reference data.
Because the student models are more stable, they may enable routine CG simulations at length and time scales that remain inaccessible even to current CG methods.
Extending the teacher-student split to include additional per-bead quantities such as virials might further reduce variance in thermodynamic predictions.

Load-bearing premise

The teacher model trained on mapped forces produces predictions that faithfully represent the effective coarse-grained physics rather than adding new biases specific to the distillation process.

What would settle it

Direct comparison on the same deep eutectic solvent in which a student model trained on ensemble teacher predictions matches all-atom reference properties no better than a baseline model trained directly on the original mapped forces.

Original abstract

Molecular dynamics simulations are an integral tool for studying the atomistic behavior of materials under diverse conditions. However, they can be computationally demanding in wall-clock time, especially for large systems, which limits the time and length scales accessible. Coarse-grained (CG) models reduce computational expense by grouping atoms into simplified representations commonly called beads, but sacrifice atomic detail and introduce mapping noise, complicating the training of machine-learned surrogates. Moreover, because CG models inherently include entropic contributions, they cannot be fit directly to all-atom energies, leaving instantaneous, noisy forces as the only state-specific quantities available for training. Here, we apply a knowledge distillation framework by first training an initial CG neural network potential (the teacher) solely on AA-to-CG mapped forces to denoise those labels, then distill its force and energy predictions to train refined CG models (the student) in both single- and ensemble-training setups while exploring different force and energy target combinations. We validate this framework on a complex molecular fluid, a deep eutectic solvent, by evaluating two-, three-, and many-body properties and compare the CG and all-atom results. Our findings demonstrate that training a student model on ensemble teacher-predicted forces and per-bead energies improves the quality and stability of CG force fields.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces a knowledge distillation framework for coarse-grained (CG) neural network force fields. A teacher model is first trained solely on noisy all-atom-to-CG mapped forces to denoise the labels. Its force and per-bead energy predictions are then used to train a student model, with both single-model and ensemble-training variants explored using different combinations of force and energy targets. The method is tested on a deep eutectic solvent, with validation performed by comparing two-, three-, and many-body structural properties against all-atom reference data.

Significance. If the central claim holds, the approach provides a practical route to mitigate mapping-induced noise in CG parameterization without requiring direct access to CG free energies. The inclusion of ensemble distillation and per-bead energy targets represents a concrete extension of standard force-matching workflows and could improve stability for complex molecular fluids.

major comments (2)

[Methods] Methods section (teacher training paragraph): the manuscript does not report an explicit consistency check verifying that the teacher’s predicted forces equal the negative gradient of its predicted per-bead energies. Because the student is trained on both quantities, any force-energy inconsistency introduced by the teacher would propagate directly into the final CG potential and undermine the claim that the distilled labels recover effective many-body physics.
[Results] Results section (validation on deep eutectic solvent): while two-, three-, and many-body properties are compared to all-atom data, the text provides no quantitative error metrics, standard deviations across independent runs, or ablation tables contrasting the distilled student against a direct force-matching baseline trained on the same mapped forces. Without these numbers the magnitude and statistical significance of the reported improvement cannot be assessed.
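The force-energy consistency check requested in the first major comment can be sketched on a toy potential. Everything here is a hypothetical stand-in: `per_bead_energies` and `predicted_forces` play the role of a model's energy and force outputs, and a central finite difference of the total energy supplies the reference gradient. The reported quantity is the mean absolute difference between the two force estimates.

```python
import numpy as np


def per_bead_energies(pos):
    """Toy per-bead energies: each harmonic pair energy is split equally."""
    diff = pos[:, None, :] - pos[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    pair = 0.5 * (dist - 1.0) ** 2
    np.fill_diagonal(pair, 0.0)
    return 0.5 * pair.sum(axis=1)


def total_energy(pos):
    """System energy as the sum of per-bead energies."""
    return float(per_bead_energies(pos).sum())


def predicted_forces(pos):
    """Stand-in for a model's force output (analytic for this toy)."""
    diff = pos[:, None, :] - pos[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, 1.0)  # avoid 0/0 on the diagonal; diff is zero there
    coeff = -(dist - 1.0) / dist
    return (coeff[:, :, None] * diff).sum(axis=1)


def finite_difference_forces(pos, h=1e-5):
    """Negative central-difference gradient of the total energy."""
    forces = np.zeros_like(pos)
    for i in range(pos.shape[0]):
        for d in range(pos.shape[1]):
            plus, minus = pos.copy(), pos.copy()
            plus[i, d] += h
            minus[i, d] -= h
            forces[i, d] = -(total_energy(plus) - total_energy(minus)) / (2 * h)
    return forces


rng = np.random.default_rng(2)
positions = rng.normal(size=(6, 3))  # six hypothetical beads in 3D
mean_abs_diff = float(
    np.mean(np.abs(predicted_forces(positions) - finite_difference_forces(positions)))
)
print(f"mean |F_pred - (-dE/dx)| = {mean_abs_diff:.2e}")
```

For a real model the same comparison would be run over held-out configurations, with the finite-difference step chosen so that truncation and round-off errors stay well below the tolerance being reported.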

minor comments (2)

[Abstract] The abstract states that 'ensemble teacher-predicted forces and per-bead energies' improve quality, yet the precise weighting or loss combination used in the ensemble student is not stated until the Methods; moving this detail to the abstract or adding a short equation reference would improve readability.
[Figures] Figure captions for the structural-property comparisons should explicitly state the number of independent trajectories and the block-averaging procedure used to generate error bars (if any).

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on our manuscript. We address each major comment below and will incorporate the suggested revisions to improve the clarity and rigor of our work.

Point-by-point responses

Referee: [Methods] Methods section (teacher training paragraph): the manuscript does not report an explicit consistency check verifying that the teacher’s predicted forces equal the negative gradient of its predicted per-bead energies. Because the student is trained on both quantities, any force-energy inconsistency introduced by the teacher would propagate directly into the final CG potential and undermine the claim that the distilled labels recover effective many-body physics.

Authors: We agree with the referee that ensuring force-energy consistency in the teacher model is crucial for the validity of the distillation approach. Although the teacher is trained exclusively on forces, its architecture allows for energy predictions, and we will add a consistency check in the revised Methods section. Specifically, we will report the mean absolute difference between the predicted forces and the negative gradients of the predicted per-bead energies on a held-out validation set, demonstrating that the teacher provides physically consistent targets.
Revision: yes
Referee: [Results] Results section (validation on deep eutectic solvent): while two-, three-, and many-body properties are compared to all-atom data, the text provides no quantitative error metrics, standard deviations across independent runs, or ablation tables contrasting the distilled student against a direct force-matching baseline trained on the same mapped forces. Without these numbers the magnitude and statistical significance of the reported improvement cannot be assessed.

Authors: We appreciate this suggestion for enhancing the quantitative assessment of our results. In the revised manuscript, we will include quantitative error metrics (e.g., root-mean-square deviations for radial distribution functions and other structural properties), standard deviations computed across multiple independent training runs, and an ablation study table that directly compares the performance of the distilled student models (single and ensemble variants) against a baseline direct force-matching model trained on the same mapped forces. This will provide a clear measure of the improvement and its statistical significance.
Revision: yes
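The kind of metric promised in this response could look like the following sketch. The function names are hypothetical, and the curves are synthetic placeholders that are not data from the paper; the sketch only shows how a root-mean-square deviation between a CG and an all-atom radial distribution function, plus its spread over independent runs, might be computed.

```python
import numpy as np


def rdf_rmsd(g_model, g_ref):
    """RMSD between two RDF curves sampled on the same r grid."""
    g_model = np.asarray(g_model, dtype=float)
    g_ref = np.asarray(g_ref, dtype=float)
    return float(np.sqrt(np.mean((g_model - g_ref) ** 2)))


def summarize_runs(g_runs, g_ref):
    """Mean and sample standard deviation of the RMSD over independent runs."""
    errors = np.array([rdf_rmsd(g, g_ref) for g in g_runs])
    return float(errors.mean()), float(errors.std(ddof=1))


# Synthetic placeholder curves, NOT results from the paper.
r = np.linspace(0.5, 3.0, 6)
g_ref = np.ones_like(r)                           # stand-in all-atom reference
g_runs = [g_ref + 0.1, g_ref - 0.1, g_ref + 0.3]  # stand-in CG runs

mean_err, std_err = summarize_runs(g_runs, g_ref)
print(f"RDF RMSD: {mean_err:.3f} +/- {std_err:.3f}")
```

Reporting the same summary for the direct force-matching baseline and for each student variant would give the ablation table the referee asks for.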

Circularity Check

0 steps flagged

Derivation chain is self-contained with external data and no circular reductions

Full rationale

The paper trains a teacher model exclusively on external all-atom to coarse-grained mapped force data to produce denoised labels, then distills those teacher predictions (forces and per-bead energies) into a student model under single- and ensemble-training regimes. This is a standard supervised distillation pipeline whose inputs are independent reference trajectories; no equation, loss term, or validation metric reduces the final student potential or reported improvements back to a parameter defined by the same outputs, nor does any load-bearing step rely on a self-citation whose content is itself unverified or tautological. Validation against two-, three-, and many-body structural properties supplies an external benchmark outside the training targets, confirming the workflow remains non-circular.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach rests on the standard domain assumption that CG forces contain the effective interactions after entropy is integrated out and that a neural network can learn a useful denoising mapping from noisy labels.

axioms (1)

Domain assumption: CG models inherently include entropic contributions and cannot be fit directly to all-atom energies.
Explicitly stated in the abstract as the reason only instantaneous, noisy forces are available for training.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

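We train the teacher networks on this mapped AA data... $L_{\text{teacher}} = w_F\, L_{\text{err}}(\hat{F}, F)$. ... student ... $L_{\text{student}} = L_{\text{teacher}} + w_f\, L_{\text{err}}(f_i^S, f_i^T) + w_E\, L_{\text{err}}(E^S, E^T) + w_\varepsilon\, L_{\text{err}}(\varepsilon_i^S, \varepsilon_i^T)$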
IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean J_uniquely_calibrated_via_higher_derivative unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

The HIP-NN-TS ... predicts energy contributions at each hierarchy which are summed to yield per-bead energies ε and the system energy E. Forces ... obtained by automatic differentiation
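The loss composition quoted in the passages above can be sketched in a few lines. Several choices here are assumptions: the quoted passage leaves the error metric unspecified, so mean squared error is used as a placeholder; the dictionary keys and function names are hypothetical; and the first student term is read as the student's force error against the mapped forces. The toy numbers also follow the quoted convention that per-bead energies sum to the system energy.

```python
import numpy as np


def l_err(pred, target):
    """Placeholder error metric (assumed MSE; the source does not specify it)."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.mean((pred - target) ** 2))


def teacher_loss(pred_force, mapped_force, w_F=1.0):
    """Force-only loss against the mapped AA-to-CG forces."""
    return w_F * l_err(pred_force, mapped_force)


def student_loss(student, teacher, mapped_force,
                 w_F=1.0, w_f=1.0, w_E=1.0, w_eps=1.0):
    """Mapped-force term plus distillation terms on teacher outputs."""
    return (
        teacher_loss(student["force"], mapped_force, w_F)
        + w_f * l_err(student["force"], teacher["force"])
        + w_E * l_err(student["energy"], teacher["energy"])
        + w_eps * l_err(student["bead_energy"], teacher["bead_energy"])
    )


# Toy values for three beads; system energy is the sum of per-bead energies.
teacher_bead = np.array([1.0, 2.0, 3.0])
student_bead = np.array([1.0, 2.0, 4.0])
teacher = {"force": np.zeros((3, 3)), "energy": teacher_bead.sum(),
           "bead_energy": teacher_bead}
student = {"force": np.ones((3, 3)), "energy": student_bead.sum(),
           "bead_energy": student_bead}
mapped_force = 2.0 * np.ones((3, 3))

loss = student_loss(student, teacher, mapped_force)
print(f"toy student loss: {loss:.4f}")
```

Setting any of the weights to zero switches off the corresponding target, which is one way to express the different force and energy target combinations mentioned in the abstract.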

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.