Beyond Scalar Objectives: Expert-Feedback-Driven Autonomous Experimentation for Scientific Discovery at the Nanoscale

Arpan Biswas; Dennis Meier; Hiroshi Funakubo; Jan Schultheiß; Jefferey Baxter; Ralph Bulanadi; Rama Vasudevan; Yongtao Liu

arxiv: 2605.21820 · v1 · pith:QI5KHIMPnew · submitted 2026-05-20 · 💻 cs.LG · cond-mat.mtrl-sci

Pith reviewed 2026-05-22 08:39 UTC · model grok-4.3

keywords: deep-kernel pairwise learning · autonomous microscopy · expert feedback · latent utility function · ferroelectric domain walls · self-driving laboratories · nanoscale structures · active learning

The pith

Expert pairwise judgments guide autonomous nanoscale microscopy by learning a latent utility function.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops deep-kernel pairwise learning (DKPL) to incorporate human expertise into autonomous experimentation for nanoscale microscopy. Experts directly compare which experimental output is more promising based on their interdisciplinary knowledge, rather than using predefined scalar descriptors. DKPL then learns a latent utility function from these judgments to guide subsequent measurements in an active learning loop. This matters because many scientifically important nanoscale features are difficult to express as single numbers yet clear to experts. The approach is demonstrated on a model dataset with known ground truth and applied to real ferroelectric materials to distinguish domain wall properties.

Core claim

DKPL learns a latent utility function from expert pairwise judgments on microscopy images to select high-value measurement locations and identify physically meaningful structures, such as characteristic domain-wall angles in bismuth ferrite and head-to-head versus tail-to-tail character in erbium manganite, without depending on scalar objectives that may overlook key phenomena.

What carries the argument

Deep-kernel pairwise learning (DKPL), which trains on expert comparisons of experimental outputs to infer a ranking utility that directs autonomous microscopy toward scientifically promising regions.
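To make the idea of a utility learned from comparisons concrete, here is a minimal sketch. It is not the paper's DKPL model. It assumes a Bradley-Terry preference likelihood, P(a preferred over b) = sigmoid(u(a) - u(b)), and swaps the deep kernel for a hypothetical linear utility u(x) = w · x over two made-up features. The simulated expert, the feature values, and every function name are illustrative assumptions.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def utility(w, x):
    # Hypothetical linear utility u(x) = w . x (stand-in for a deep-kernel model).
    return sum(wi * xi for wi, xi in zip(w, x))

def fit_pairwise(pairs, dim, lr=0.5, epochs=300, l2=1e-3):
    """Fit w by gradient ascent on the Bradley-Terry log-likelihood.

    pairs: list of (winner_features, loser_features) tuples.
    """
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for win, lose in pairs:
            # Probability the model assigns to the observed judgment.
            p = sigmoid(utility(w, win) - utility(w, lose))
            for j in range(dim):
                grad[j] += (1.0 - p) * (win[j] - lose[j])
        # Averaged gradient step with a small L2 penalty.
        w = [wj + lr * (gj / len(pairs) - l2 * wj) for wj, gj in zip(w, grad)]
    return w

# Toy data: a simulated "expert" prefers items with a larger first feature.
rng = random.Random(0)
items = [[rng.random(), rng.random()] for _ in range(30)]
pairs = []
for _ in range(200):
    a, b = rng.sample(items, 2)
    pairs.append((a, b) if a[0] > b[0] else (b, a))

w = fit_pairwise(pairs, dim=2)
# Rank all items by learned utility, most promising first.
ranked = sorted(items, key=lambda x: utility(w, x), reverse=True)
```

Only the ordering implied by the learned utility matters here; its absolute scale is arbitrary, which is why comparisons can stand in for a scalar descriptor.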

If this is right

The system can identify domain-wall angles and charge characters in ferroelectric materials that scalar methods overlook.
Autonomous experiments can focus on regions with high scientific information content using only human comparisons.
This provides a direct route to integrate interdisciplinary expert knowledge into self-driving laboratory workflows.
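The loop implied by these points can be illustrated with a toy sketch, under strong simplifying assumptions. Scan positions are points on a 1-D grid, the "expert" is a hypothetical function that prefers positions closer to a hidden target, and the utility model is a kernel-weighted average of comparison scores with a simple exploration bonus. None of these choices come from the paper; they only show how pairwise answers alone can steer where to measure next.

```python
import math
import random

def rbf(a, b, length=0.15):
    # Similarity between two scan positions (squared-exponential kernel).
    return math.exp(-((a - b) ** 2) / (2.0 * length ** 2))

def predict(x, scores):
    # Kernel-weighted mean of scores at measured positions, plus the total
    # kernel weight (a rough measure of how much nearby data exists).
    weight = sum(rbf(x, xm) for xm in scores)
    if weight < 1e-12:
        return 0.0, weight
    mean = sum(rbf(x, xm) * s for xm, s in scores.items()) / weight
    return mean, weight

def expert_prefers(a, b, target=0.68):
    # Hypothetical stand-in for the human expert: prefers the position
    # closer to a hidden point of interest.
    return abs(a - target) < abs(b - target)

def run_loop(candidates, n_steps=15, beta=0.5, seed=0):
    rng = random.Random(seed)
    scores = {rng.choice(candidates): 0.0}  # position -> relative score
    for _ in range(n_steps):
        unmeasured = [c for c in candidates if c not in scores]
        if not unmeasured:
            break

        def acquisition(c):
            # Predicted score plus a bonus that shrinks near existing data.
            mean, weight = predict(c, scores)
            return mean + beta / (1.0 + weight)

        nxt = max(unmeasured, key=acquisition)
        best = max(scores, key=scores.get)
        # One pairwise query: new measurement versus the current favourite.
        if expert_prefers(nxt, best):
            scores[nxt] = scores[best] + 1.0
        else:
            scores[nxt] = scores[best] - 1.0
    return scores

candidates = [i / 20 for i in range(21)]
scores = run_loop(candidates)
best = max(scores, key=scores.get)
```

Because each new position is compared only against the current favourite, the top-scoring position is always the one the simulated expert likes most among those measured so far. A real system would also have to cope with noisy or inconsistent answers.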

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same comparison-based learning could apply to other experimental domains where experts recognize important patterns faster than quantitative descriptors.
Periodic expert input during live experiments might allow the utility model to adapt as new structures emerge.
Combining DKPL with additional data types such as spectroscopy could strengthen the learned distinctions.

Load-bearing premise

Expert pairwise judgments are consistent enough across sessions and capture the physically relevant distinctions that scalar metrics miss.

What would settle it

If the system is run on the model dataset with known ground truth and fails to prioritize high-information regions or distinguish domain-wall types in the bismuth ferrite and erbium manganite experiments, the claim that expert judgments yield a superior guiding utility would not hold.
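One way such a test could be scored on a dataset with known ground truth is precision at k: the fraction of the k highest-utility locations that fall inside regions labelled as high-value. The sketch below assumes a flat list of predicted utilities and a matching boolean ground-truth mask. The numbers are invented for illustration and are not results from the paper.

```python
def precision_at_k(predicted_utility, ground_truth_mask, k):
    """Fraction of the k top-ranked locations that are truly high-value."""
    if not 0 < k <= len(predicted_utility):
        raise ValueError("k must be between 1 and the number of locations")
    order = sorted(range(len(predicted_utility)),
                   key=lambda i: predicted_utility[i], reverse=True)
    hits = sum(1 for i in order[:k] if ground_truth_mask[i])
    return hits / k

def chance_level(ground_truth_mask):
    # Expected precision of a selector that picks locations at random.
    return sum(1 for m in ground_truth_mask if m) / len(ground_truth_mask)

# Hypothetical values: six locations, two of which are truly high-value.
predicted = [0.9, 0.1, 0.8, 0.3, 0.7, 0.2]
truth = [True, False, True, False, False, False]

p3 = precision_at_k(predicted, truth, 3)   # top three are indices 0, 2, 4
baseline = chance_level(truth)
```

A precision that stays near the chance level as k varies would count against the claim that the learned utility prioritizes high-information regions.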

Figures

Figures reproduced from arXiv: 2605.21820 by Arpan Biswas, Dennis Meier, Hiroshi Funakubo, Jan Schultheiß, Jefferey Baxter, Ralph Bulanadi, Rama Vasudevan, Yongtao Liu.

**Figure 4.** DKPL sampling in Auto-3DPFM of PZT to examine characteristic domain-wall angle. a) i. Map of amplitude. The solid purple square shows (ii.) a domain wall without 180° character; the dotted orange square shows (iii.) a domain wall with 180° character. b) The results of DKPL, showing i. the highest-predicted-utility points overlaid on the total amplitude; ii. the “ground truth” domain-wall character establi…

**Figure 5.** DKPL sampling in Auto-3DPFM of a grain in an ErMnO3 polycrystal to examine the tail-to-tail character of domain walls. a) i. Map of amplitude with ferroelectric domains marked. The solid red square shows (ii.) strong tail-to-tail character; the dashed orange square shows (iii.) weak tail-to-tail character; and the dotted yellow square shows (iv.) head-to-head character. b) An overlay showing highest-predic…

Original abstract

Self-driving laboratories or autonomous experimentation are emerging as transformative platforms for accelerating scientific discovery. Bayesian optimization (BO) is among the most widely used machine learning frameworks for these purposes, but these BO-based frameworks rely on predefined scalar descriptors to guide experimentation. In many situations, the determination of an appropriate scalar descriptor can be challenging, and may fail to capture subtle yet scientifically important phenomena apparent to experts with interdisciplinary insight. To overcome this limitation, here we develop deep-kernel pairwise learning (DKPL), an approach for autonomous microscopy experiments which incorporates human expertise and interdisciplinary scientific knowledge into an active learning loop. Instead of relying on explicit scalar objectives, DKPL enables experts to directly evaluate which experimental output is more promising using interdisciplinary knowledge. DKPL then learns a latent utility function from these expert judgements to guide subsequent autonomous microscopy experiments. We demonstrate DKPL's performance in learning physically meaningful nanoscale structures while effectively prioritizing high-information measurement regions using an experimental model dataset with known ground truth. We further apply DKPL to analyze the character of ferroelectric domain walls, where we find DKPL capable of distinguishing between high and low characteristic domain-wall angles in bismuth ferrite, and able to discover both head-to-head and tail-to-tail domain-wall character in erbium manganite. This development establishes an approach to integrate expert knowledge into autonomous microscopy experiments and demonstrates a pathway toward expert-guided self-driving laboratories capable of addressing scientific problems beyond the limits of scalar-metrics-driven learning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

DKPL replaces scalar objectives with expert pairwise judgments in microscopy loops, but lacks the numbers needed to confirm real gains over standard Bayesian optimization.

The main takeaway is that this paper shows how to fold expert judgment into autonomous microscopy by having specialists compare pairs of images instead of optimizing a fixed number. The authors train a deep-kernel model on those preferences to pick the next scan location and call the whole approach DKPL. On a model dataset with known ground truth it recovers physically sensible structures, and on real bismuth ferrite and erbium manganite it flags domain walls with specific angles or head-to-head versus tail-to-tail polarity. That is a direct, practical response to the common complaint that scalar metrics miss subtle but important features in nanoscale materials work. The real-sample results give a concrete sense of what the method can surface when human insight is brought inside the loop.

The soft spot is the missing quantitative evidence. The description gives no accuracy figures, no error bars, no ablation that isolates the expert-feedback component, and no comparison showing how much faster or more reliably DKPL finds the interesting regions than ordinary Bayesian optimization. There is also no check on whether the same expert would make the same pairwise calls on different days, or whether different experts agree. Without those data points it is hard to tell whether the learned utility adds stable signal or simply reflects one person's current view.

The paper is aimed at groups already running active-learning microscopy or building self-driving labs who want to move past purely numerical objectives. A reader working on ferroelectric characterization or a similar problem would see usable examples of the approach in action. The work is coherent and sufficiently grounded in actual experiments to deserve a serious referee, even though the validation will need tightening. I would send it out for peer review with a request for performance metrics, baseline comparisons, and reliability checks on the expert judgments.

Referee Report

3 major / 2 minor

Summary. The paper introduces deep-kernel pairwise learning (DKPL) to incorporate expert feedback into autonomous microscopy experiments, replacing scalar objectives in Bayesian optimization with a latent utility function learned from expert pairwise judgments on experimental outputs. It demonstrates the approach on a model dataset with known ground truth and applies it to real ferroelectric materials (bismuth ferrite and erbium manganite) to distinguish domain-wall angles and head-to-head versus tail-to-tail character.

Significance. If validated, the work could meaningfully advance self-driving laboratories by enabling integration of interdisciplinary expert knowledge that scalar metrics often miss, opening pathways for expert-guided experimentation in complex nanoscale materials problems.

major comments (3)

The abstract and demonstration sections report successful application on the model dataset with known ground truth and on two real materials, but provide no quantitative performance metrics, error bars, ablation of the expert-feedback component, or direct comparisons to scalar-objective baselines. This leaves the central claim of outperformance only partially supported.
The learning of the latent utility from expert pairwise judgments (described in the method and results) implicitly assumes low noise and high physical relevance of the judgments (e.g., capturing domain-wall angles in BiFeO3 or head-to-head/tail-to-tail distinctions in ErMnO3). No inter-expert agreement, test-retest reliability, or correlation with independent physical observables is reported, which is load-bearing for the claim that DKPL captures distinctions missed by scalars.
In the real-materials application, the paper states DKPL is 'capable of distinguishing' the relevant domain-wall features, yet without quantitative validation that the learned utility correlates with physically meaningful observables where scalar baselines fail, the advantage over standard BO remains unproven.
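The agreement statistics requested in the second comment could be computed with something as simple as Cohen's kappa, which corrects raw agreement for agreement expected by chance. The sketch below assumes each judgment is recorded as a label naming which image of a pair was preferred ('a' or 'b'). The two label lists are hypothetical, and the same function would serve for test-retest reliability by passing one expert's answers from two sessions.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two sets of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed fraction of pairs on which the two raters agree.
    observed = sum(1 for a, b in zip(labels_a, labels_b) if a == b) / n
    # Agreement expected if each rater labelled independently at their
    # own base rates.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters always give the same single label
    return (observed - expected) / (1.0 - expected)

# Hypothetical answers from two experts on the same eight image pairs.
expert_1 = ['a', 'a', 'b', 'b', 'a', 'b', 'a', 'a']
expert_2 = ['a', 'b', 'b', 'b', 'a', 'b', 'a', 'b']

kappa = cohens_kappa(expert_1, expert_2)  # 6 of 8 raw agreements
```

A kappa near zero would mean the agreement is no better than chance, which would undercut the assumption that the judgments carry a stable signal.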

minor comments (2)

Clarify in the methods how the deep kernel is constructed from the pairwise preference model to avoid potential confusion with standard kernel definitions in BO.
Add a brief statement in the abstract or introduction on the specific form of expert input (e.g., number of pairwise comparisons per iteration) for reproducibility.

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for their thoughtful and constructive comments, which have helped us identify areas where the manuscript can be strengthened. We address each major comment below and describe the revisions we plan to incorporate.

Referee: The abstract and demonstration sections report successful application on the model dataset with known ground truth and on two real materials, but provide no quantitative performance metrics, error bars, ablation of the expert-feedback component, or direct comparisons to scalar-objective baselines. This leaves the central claim of outperformance only partially supported.

Authors: We agree that additional quantitative support would strengthen the central claims. The current demonstrations emphasize qualitative alignment with known ground truth on the model dataset and physically interpretable features in real materials. In the revised manuscript we will add quantitative metrics for the model dataset (e.g., cumulative regret or success rate in identifying high-value regions), error bars from repeated trials where feasible, an ablation removing the expert-feedback component, and direct comparisons against standard scalar-objective Bayesian optimization baselines.

revision: yes

Referee: The learning of the latent utility from expert pairwise judgments (described in the method and results) implicitly assumes low noise and high physical relevance of the judgments (e.g., capturing domain-wall angles in BiFeO3 or head-to-head/tail-to-tail distinctions in ErMnO3). No inter-expert agreement, test-retest reliability, or correlation with independent physical observables is reported, which is load-bearing for the claim that DKPL captures distinctions missed by scalars.

Authors: The referee correctly identifies that inter-expert agreement and test-retest statistics are not reported. Judgments in this study were obtained from a single domain expert; therefore we cannot retroactively compute multi-expert reliability metrics. We will revise the manuscript to (i) explicitly state this limitation, (ii) provide any available post-hoc correlations between the learned utility scores and independent physical observables (domain-wall angle distributions and polarization character) drawn from the experimental data and literature, and (iii) outline how the framework could be extended to aggregate judgments from multiple experts in future work.

revision: partial

Referee: In the real-materials application, the paper states DKPL is 'capable of distinguishing' the relevant domain-wall features, yet without quantitative validation that the learned utility correlates with physically meaningful observables where scalar baselines fail, the advantage over standard BO remains unproven.

Authors: We acknowledge that the current phrasing relies on qualitative demonstration. We will augment the real-materials section with quantitative evidence: correlation analysis between the learned latent utility and measured physical parameters (domain-wall angle histograms and head-to-head versus tail-to-tail polarization vectors), together with side-by-side comparisons showing regions prioritized by DKPL but missed by scalar baselines. These additions will be supported by the existing experimental data.

revision: yes
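Two of the quantities promised in these responses are easy to state precisely. Cumulative regret sums, over the measurement sequence, the gap between the best achievable utility and the utility of the location actually chosen. A Spearman rank correlation checks whether the learned utility orders locations the same way as an independent physical observable. The sketch below is one plausible way to compute both, with hypothetical numbers that are not taken from the paper.

```python
import math

def cumulative_regret(chosen_utilities, best_utility):
    # Running sum of (best achievable utility - utility of the chosen point).
    total, out = 0.0, []
    for u in chosen_utilities:
        total += best_utility - u
        out.append(total)
    return out

def average_ranks(values):
    # 1-based ranks; tied values share the mean of their positions.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    # Pearson correlation computed on the ranks of x and y.
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0.0 or vy == 0.0:
        raise ValueError("rank correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)

# Hypothetical ground-truth utilities of four successive measurements.
regret = cumulative_regret([0.2, 0.5, 0.9, 1.0], best_utility=1.0)

# Hypothetical learned utilities versus measured domain-wall angles (degrees).
learned = [0.1, 0.4, 0.35, 0.8]
angles = [10.0, 20.0, 30.0, 70.0]
rho = spearman(learned, angles)
```

A regret curve that flattens early indicates the loop found the best regions quickly. Plotting the same curve for a scalar-objective baseline gives the side-by-side comparison the referee asks for.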

Circularity Check

0 steps flagged

No significant circularity detected in derivation chain

The paper introduces DKPL as a method that learns a latent utility function from external expert pairwise judgments on experimental outputs, using these to guide active learning in microscopy without predefined scalar objectives. This chain relies on independent human inputs rather than internal equations or self-citations; no step reduces a claimed prediction to a fitted parameter by construction, nor imports uniqueness via author-overlapping citations. The approach remains self-contained, with performance claims tied to external expert evaluations and ground-truth datasets rather than tautological redefinitions of its own components.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on the premise that expert judgments provide a reliable training signal for a latent utility function that generalizes to new measurement regions; no free parameters, axioms, or invented entities are explicitly introduced in the abstract.
