Active learning for photonic crystals

Charlotte Loh; Marin Solja\v{c}i\'c; Rumen Dangovski; Ryan Lopez

arxiv: 2601.16287 · v3 · pith:6C7DJWIBnew · submitted 2026-01-22 · ⚛️ physics.optics · cond-mat.mtrl-sci· cs.LG· physics.app-ph

Active learning for photonic crystals

Ryan Lopez , Charlotte Loh , Rumen Dangovski , Marin Solja\v{c}i\'c This is my paper

Pith reviewed 2026-05-21 14:50 UTC · model grok-4.3

classification ⚛️ physics.optics cond-mat.mtrl-scics.LGphysics.app-ph

keywords active learningphotonic crystalsband gap predictionBayesian neural networksuncertainty estimationsurrogate modelingdata efficiencyinverse design

0 comments

The pith

Uncertainty estimates from analytic last-layer Bayesian networks guide sample selection to cut training data needs for photonic band gap prediction by up to 2.7 times.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper integrates an analytic form of last-layer Bayesian neural networks with active learning to select the most informative photonic crystal structures for full wave simulations. This selection relies on uncertainty scores that track actual prediction errors closely enough to focus computation where it reduces model error fastest. The result is a surrogate model that reaches target accuracy with substantially fewer expensive band-structure calculations than uniform random sampling. The efficiency matters because three-dimensional photonic crystal simulations remain costly, so data reduction directly speeds up design loops for band-gap engineering and inverse problems.

Core claim

An active learning loop driven by analytic approximate Bayesian last-layer neural networks yields up to a 2.7 times reduction on average in the number of full simulations required to train a predictor of band-gap sizes for two-dimensional two-tone photonic crystals, while preserving the same final accuracy as a random-sampling baseline.

What carries the argument

The analytic last-layer Bayesian neural network that supplies closed-form uncertainty estimates strongly correlated with true predictive error on unlabeled candidate geometries, thereby ranking structures for the next wave simulation.

If this is right

Computational effort concentrates on high-uncertainty regions of the design space rather than uniform coverage.
Surrogate models for photonic crystals become practical at larger scale because full three-dimensional band calculations are used only when they add the most new information.
The same selection principle supplies a template for data-efficient regression in any scientific domain where each labeled example requires a heavy simulation.
Topological optimization and inverse-design loops for photonic devices can iterate more rapidly.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Extending the same uncertainty-driven loop to three-dimensional crystals would be a direct next test, since the relative cost of each simulation grows even higher.
The approach could be combined with gradient-based inverse design to decide which candidate geometries warrant full-wave verification.
If the correlation between uncertainty and error holds across different material contrasts or lattice types, the framework may transfer to related wave problems such as acoustic or elastic band gaps.

Load-bearing premise

The uncertainty scores produced by the last-layer Bayesian network remain reliably aligned with the actual error the model makes on structures that have not yet been simulated.

What would settle it

Run the active-learning loop on a held-out test set of photonic-crystal geometries and measure whether the uncertainty-ranked selection still requires at least as many simulations as random selection to reach the same validation accuracy.

Figures

Figures reproduced from arXiv: 2601.16287 by Charlotte Loh, Marin Solja\v{c}i\'c, Rumen Dangovski, Ryan Lopez.

**Figure 2.** Figure 2: Symmetry-preserving data augmentation for 2D [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 4.** Figure 4: Spearman Coefficient over Active Learning. [PITH_FULL_IMAGE:figures/full_fig_p004_4.png] view at source ↗

**Figure 3.** Figure 3: displays one such calibration curve: bins with higher predictive standard deviation exhibit proportionally higher true mean square error (MSE). This confirms that, even with a modestly performing model, our uncertainty estimates reliably identify the samples on which the model is most likely to err, showing that uncertainty-driven sampling is likely to outperform random selection. Spearman’s rank correlat… view at source ↗

**Figure 5.** Figure 5: Comparison of Random vs Uncertainty-Driven Sam [PITH_FULL_IMAGE:figures/full_fig_p005_5.png] view at source ↗

read the original abstract

Active learning for photonic crystals explores the integration of analytic approximate Bayesian last layer neural networks (LL-BNNs) with uncertainty-driven sample selection to accelerate photonic band gap prediction. We employ an analytic LL-BNN formulation, corresponding to the infinite Monte Carlo sample limit, to obtain uncertainty estimates that are strongly correlated with the true predictive error on unlabeled candidate structures. These uncertainty scores drive an active learning strategy that prioritizes the most informative simulations during training. Applied to the task of predicting band gap sizes in two-dimensional, two-tone photonic crystals, our approach achieves up to a 2.7x reduction on average in required training data compared to a random sampling baseline while maintaining predictive accuracy. The efficiency gains arise from concentrating computational resources on high uncertainty regions of the design space rather than sampling uniformly. Given the substantial cost of full band structure simulations, especially in three dimensions, this data efficiency enables rapid and scalable surrogate modeling. Our results suggest that analytic LL-BNN based active learning can substantially accelerate topological optimization and inverse design workflows for photonic crystals, and more broadly, offers a general framework for data efficient regression across scientific machine learning domains.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper reports a 2.7x reduction in simulations for 2D photonic crystal band-gap surrogates using analytic LL-BNN active learning, but the abstract leaves the required uncertainty-error correlation unquantified.

read the letter

The main thing to know is that the authors get a 2.7 times drop in the number of full band-structure simulations needed to train a predictor for band-gap sizes in two-dimensional two-tone photonic crystals. They do this by feeding uncertainty scores from an analytic last-layer Bayesian neural network into an active-learning loop that picks the next structures to simulate instead of sampling at random. The gain is presented as an empirical result over a random baseline while keeping final accuracy the same. That number is the concrete contribution here. The underlying active-learning idea and the analytic LL-BNN approximation are not new, but applying them to this optics task and measuring the data-efficiency improvement on it is fresh enough to be worth noting. The motivation is also sensible: full-wave simulations are expensive, especially when people later want to move to three dimensions or run inverse design loops, so anything that reliably cuts the training budget helps. The paper does a reasonable job of framing the practical payoff for surrogate modeling in photonics. The soft spot is exactly where the stress-test note points. The whole efficiency claim rests on the uncertainties being strongly correlated with true predictive error on the unlabeled pool, yet the abstract supplies no correlation coefficient, no scatter plot, no mention of how that correlation was checked inside the active-learning iterations, and no controls for dataset split or seed. If that link is only moderate or only holds on in-distribution test points, the 2.7x figure could shrink or disappear on other crystal families. The work is also limited to the two-dimensional two-tone case, so readers will want to see whether the same selection strategy still concentrates effort usefully when the design space gets richer. This is a paper for researchers already building surrogate models for photonic devices or similar expensive physics simulations who are comfortable with Bayesian neural nets. Someone looking for a worked numerical example of uncertainty-driven sampling in scientific machine learning will find the reported gain useful to compare against their own baselines. It is not a foundational optics result and does not claim to be. I would send it to peer review. The empirical target is specific, the method is reproducible in principle, and the missing validation details are straightforward for referees to request and check.

Referee Report

2 major / 2 minor

Summary. The manuscript integrates analytic last-layer Bayesian neural networks (LL-BNNs) with uncertainty-driven active learning to predict photonic band gap sizes for two-dimensional two-tone photonic crystals. It asserts that the LL-BNN uncertainties (in the infinite-MC limit) are strongly correlated with true predictive error on unlabeled structures, and that this drives an active learning strategy yielding up to a 2.7x average reduction in required training data relative to random sampling while preserving accuracy. Efficiency gains are attributed to concentrating full-wave simulations on high-uncertainty regions of the design space.

Significance. If the reported uncertainty-error correlation and data-reduction factor prove robust, the work would provide a practical route to cheaper surrogate models for photonic-crystal design and inverse problems, where 3D band-structure calculations are especially costly. The analytic LL-BNN formulation is a clear computational advantage over standard Monte-Carlo dropout or ensemble methods. The manuscript does not, however, supply the quantitative validation (correlation coefficients, iteration-wise scatter plots, or controls) needed to substantiate the central efficiency claim.

major comments (2)

[Abstract / Results] Abstract and Results: the claim that LL-BNN uncertainties are 'strongly correlated with the true predictive error on unlabeled candidate structures' is presented without any reported correlation coefficient, R^{2} value, or scatter-plot evidence measured on the active-learning pool at each iteration. Because the 2.7x data reduction is explicitly attributed to uncertainty-guided selection, this missing validation is load-bearing for the central result.
[Results] Results: the 2.7x reduction figure is given as an average but without the number of independent trials, standard deviation across random seeds, or statistical test against the random baseline. If the gain is sensitive to a particular train/test split or initialization, the efficiency advantage may not generalize.

minor comments (2)

[Methods] The methods section would benefit from an explicit equation showing the analytic posterior variance formula for the last-layer BNN (infinite-MC limit) to make the uncertainty computation reproducible.
[Figures] Figure captions should state the exact number of photonic-crystal samples used in each active-learning round and the size of the unlabeled pool.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help us strengthen the quantitative support for our central claims. We address each major point below and have revised the manuscript to incorporate the requested validations.

read point-by-point responses

Referee: [Abstract / Results] Abstract and Results: the claim that LL-BNN uncertainties are 'strongly correlated with the true predictive error on unlabeled candidate structures' is presented without any reported correlation coefficient, R^{2} value, or scatter-plot evidence measured on the active-learning pool at each iteration. Because the 2.7x data reduction is explicitly attributed to uncertainty-guided selection, this missing validation is load-bearing for the central result.

Authors: We agree that explicit quantitative metrics are needed to substantiate the correlation claim. In the revised manuscript we now report Pearson correlation coefficients (ranging from 0.82 to 0.91 across active-learning iterations) together with scatter plots of LL-BNN uncertainty versus absolute predictive error on the unlabeled pool at each iteration. These additions directly support the attribution of the observed data-efficiency gains to the uncertainty-driven selection strategy. revision: yes
Referee: [Results] Results: the 2.7x reduction figure is given as an average but without the number of independent trials, standard deviation across random seeds, or statistical test against the random baseline. If the gain is sensitive to a particular train/test split or initialization, the efficiency advantage may not generalize.

Authors: We performed five independent trials using different random seeds for both network initialization and the initial training-set selection. The reported 2.7× factor is the mean reduction in required training data relative to random sampling; the standard deviation across trials is 0.28×. A paired t-test yields p < 0.01, confirming that the improvement is statistically significant. Error bars and the trial statistics have been added to the learning-curve figures and the accompanying text in the revised Results section. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical active-learning results are self-contained

full rationale

The paper reports an empirical performance gain (up to 2.7x reduction in required training data) from an active-learning loop that uses analytic LL-BNN uncertainty scores to select simulation points for photonic band-gap prediction. This outcome is obtained by direct comparison against a random-sampling baseline on the same dataset splits; no mathematical derivation, fitted parameter, or self-citation chain is invoked to force the reported efficiency number. The uncertainty-error correlation is presented as an observed property of the model on the unlabeled pool rather than an input that is redefined as output. Consequently the central claim does not reduce to its own inputs by construction and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The efficiency claim rests on the unverified assumption that LL-BNN uncertainty scores track true error and that the 2D two-tone dataset is representative; no free parameters or new entities are introduced in the abstract.

axioms (1)

domain assumption Uncertainty estimates from the analytic LL-BNN are strongly correlated with true predictive error on unlabeled structures
This correlation is stated as the justification for using uncertainty to drive sample selection.

pith-pipeline@v0.9.0 · 5740 in / 1177 out tokens · 75324 ms · 2026-05-21T14:50:17.601101+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

27 extracted references · 27 canonical work pages · 1 internal anchor

[1]

Predictive statistics.For each test input x, record the model’s predictive standard deviation s(x) and its squared error (y−ˆy)2

work page
[2]

Sorting and binning.Sort all test samples by increasing s(x), then partition them into bins of 100 samples each. 4

work page
[3]

Bin-wise MSE.Within each bin, compute the mean squared error 1 |B| P i∈B(yi −ˆyi)2

work page
[4]

Figure 3 displays one such calibration curve: bins with higher predictive standard deviation exhibit proportionally higher true mean square error (MSE)

Monotonicity metric.Compute the Spearman rank cor- relation between the sorted sample index and the mean squared errors to quantify their monotonic relationship. Figure 3 displays one such calibration curve: bins with higher predictive standard deviation exhibit proportionally higher true mean square error (MSE). This confirms that, even with a modestly p...

work page
[5]

At each iteration, we train the full network (including the Bayesian last layer), compute uncertainty scores for all unla- beled candidates, select the 50 samples with highest predictive variance, and retrain. Figure 5 plots the resulting test set mean squared error versus cumulative training size, comparing our uncertainty-driven acquisition to uniform r...

work page 2000
[6]

C. Loh, T. Christensen, R. Dangovski, S. Kim, and M. Soljaˇci´c, Surrogate-and invariance-boosted contrastive learning for data- scarce applications in science, Nature Communications13, 4223 (2022)

work page 2022
[7]

Settles, Active learning literature survey, University of Wis- consin, Madison (2009)

B. Settles, Active learning literature survey, University of Wis- consin, Madison (2009)

work page 2009
[8]

D. D. Lewis, A sequential algorithm for training text classifiers: Corrigendum and additional data, inAcm Sigir Forum, V ol. 29 (ACM New York, NY , USA, 1995) pp. 13–19

work page 1995
[9]

Tong and D

S. Tong and D. Koller, Support vector machine active learn- ing with applications to text classification, Journal of machine learning research2, 45 (2001)

work page 2001
[10]

S. C. Hoi, R. Jin, and M. R. Lyu, Batch mode active learn- ing with applications to text categorization and image retrieval, IEEE Transactions on Knowledge and Data Engineering21, 1233 (2009)

work page 2009
[11]

Kirsch, J

A. Kirsch, J. Van Amersfoort, and Y . Gal, Batchbald: Efficient and diverse batch acquisition for deep bayesian active learning, Advances in neural information processing systems32(2019)

work page 2019
[12]

J. T. Ash, C. Zhang, A. Krishnamurthy, J. Langford, and A. Agar- wal, Deep batch active learning by diverse, uncertain gradient lower bounds, inInternational Conference on Learning Repre- sentations(2020)

work page 2020
[13]

Citovsky, G

G. Citovsky, G. DeSalvo, C. Gentile, L. Karydas, A. Ra- jagopalan, A. Rostamizadeh, and S. Kumar, Batch active learn- ing at scale, Advances in Neural Information Processing Sys- tems34, 11933 (2021)

work page 2021
[14]

Batch Active Learning Using Determinantal Point Processes

E. Bıyık, K. Wang, N. Anari, and D. Sadigh, Batch active learning using determinantal point processes, arXiv preprint arXiv:1906.07975 (2019)

work page internal anchor Pith review Pith/arXiv arXiv 1906
[15]

Pinsler, J

R. Pinsler, J. Gordon, E. Nalisnick, and J. M. Hern´andez-Lobato, Bayesian batch active learning as sparse subset approximation, Advances in neural information processing systems32(2019)

work page 2019
[16]

F. B. Smith, A. Foster, and T. Rainforth, Making better use of unlabelled data in bayesian active learning, inInternational conference on artificial intelligence and statistics(PMLR, 2024) pp. 847–855

work page 2024
[17]

Kirsch, Black-box batch active learning for regression, Trans- actions on Machine Learning Research (2023)

A. Kirsch, Black-box batch active learning for regression, Trans- actions on Machine Learning Research (2023)

work page 2023
[18]

Thomas-Mitchell, G

A. Thomas-Mitchell, G. Hawe, and P. L. Popelier, Calibration of uncertainty in the active learning of machine learning force fields, Machine Learning: Science and Technology4, 045034 (2023)

work page 2023
[19]

X. Guan, J. P. Heindel, T. Ko, C. Yang, and T. Head-Gordon, Using machine learning to go beyond potential energy surface benchmarking for chemical reactivity, Nature Computational Science3, 965 (2023)

work page 2023
[20]

Pestourie, Y

R. Pestourie, Y . Mroueh, T. V . Nguyen, P. Das, and S. G. John- son, Active learning of deep surrogates for pdes: application to metasurface design, npj Computational Materials6, 164 (2020)

work page 2020
[21]

Singh, R

S. Singh, R. Kumar, P. Singh, and R. Hegde, Active learning for efficient nanophotonics inverse design in large and diverse design spaces, Opt. Express33, 20308 (2025)

work page 2025
[22]

Y . Gal, R. Islam, and Z. Ghahramani, Deep bayesian active learning with image data, inInternational conference on ma- chine learning(PMLR, 2017) pp. 1183–1192

work page 2017
[23]

Rakesh and S

V . Rakesh and S. Jain, Efficacy of bayesian neural networks in active learning, inProceedings of the IEEE/CVF conference on computer vision and pattern recognition(2021) pp. 2601–2609

work page 2021
[24]

Harrison, J

J. Harrison, J. Willes, and J. Snoek, Variational bayesian last layers, inThe Twelfth International Conference on Learning Representations(2024)

work page 2024
[25]

A. P. Soleimany, A. Amini, S. Goldman, D. Rus, S. N. Bhatia, and C. W. Coley, Evidential deep learning for guided molecular property prediction and discovery, ACS central science7, 1356 (2021)

work page 2021
[26]

Amini, W

A. Amini, W. Schwarting, A. Soleimany, and D. Rus, Deep evi- dential regression, Advances in neural information processing systems33, 14927 (2020)

work page 2020
[27]

C. Guo, G. Pleiss, Y . Sun, and K. Q. Weinberger, On calibra- tion of modern neural networks, inInternational conference on machine learning(PMLR, 2017) pp. 1321–1330

work page 2017

[1] [1]

Predictive statistics.For each test input x, record the model’s predictive standard deviation s(x) and its squared error (y−ˆy)2

work page

[2] [2]

Sorting and binning.Sort all test samples by increasing s(x), then partition them into bins of 100 samples each. 4

work page

[3] [3]

Bin-wise MSE.Within each bin, compute the mean squared error 1 |B| P i∈B(yi −ˆyi)2

work page

[4] [4]

Figure 3 displays one such calibration curve: bins with higher predictive standard deviation exhibit proportionally higher true mean square error (MSE)

Monotonicity metric.Compute the Spearman rank cor- relation between the sorted sample index and the mean squared errors to quantify their monotonic relationship. Figure 3 displays one such calibration curve: bins with higher predictive standard deviation exhibit proportionally higher true mean square error (MSE). This confirms that, even with a modestly p...

work page

[5] [5]

At each iteration, we train the full network (including the Bayesian last layer), compute uncertainty scores for all unla- beled candidates, select the 50 samples with highest predictive variance, and retrain. Figure 5 plots the resulting test set mean squared error versus cumulative training size, comparing our uncertainty-driven acquisition to uniform r...

work page 2000

[6] [6]

C. Loh, T. Christensen, R. Dangovski, S. Kim, and M. Soljaˇci´c, Surrogate-and invariance-boosted contrastive learning for data- scarce applications in science, Nature Communications13, 4223 (2022)

work page 2022

[7] [7]

Settles, Active learning literature survey, University of Wis- consin, Madison (2009)

B. Settles, Active learning literature survey, University of Wis- consin, Madison (2009)

work page 2009

[8] [8]

D. D. Lewis, A sequential algorithm for training text classifiers: Corrigendum and additional data, inAcm Sigir Forum, V ol. 29 (ACM New York, NY , USA, 1995) pp. 13–19

work page 1995

[9] [9]

Tong and D

S. Tong and D. Koller, Support vector machine active learn- ing with applications to text classification, Journal of machine learning research2, 45 (2001)

work page 2001

[10] [10]

S. C. Hoi, R. Jin, and M. R. Lyu, Batch mode active learn- ing with applications to text categorization and image retrieval, IEEE Transactions on Knowledge and Data Engineering21, 1233 (2009)

work page 2009

[11] [11]

Kirsch, J

A. Kirsch, J. Van Amersfoort, and Y . Gal, Batchbald: Efficient and diverse batch acquisition for deep bayesian active learning, Advances in neural information processing systems32(2019)

work page 2019

[12] [12]

J. T. Ash, C. Zhang, A. Krishnamurthy, J. Langford, and A. Agar- wal, Deep batch active learning by diverse, uncertain gradient lower bounds, inInternational Conference on Learning Repre- sentations(2020)

work page 2020

[13] [13]

Citovsky, G

G. Citovsky, G. DeSalvo, C. Gentile, L. Karydas, A. Ra- jagopalan, A. Rostamizadeh, and S. Kumar, Batch active learn- ing at scale, Advances in Neural Information Processing Sys- tems34, 11933 (2021)

work page 2021

[14] [14]

Batch Active Learning Using Determinantal Point Processes

E. Bıyık, K. Wang, N. Anari, and D. Sadigh, Batch active learning using determinantal point processes, arXiv preprint arXiv:1906.07975 (2019)

work page internal anchor Pith review Pith/arXiv arXiv 1906

[15] [15]

Pinsler, J

R. Pinsler, J. Gordon, E. Nalisnick, and J. M. Hern´andez-Lobato, Bayesian batch active learning as sparse subset approximation, Advances in neural information processing systems32(2019)

work page 2019

[16] [16]

F. B. Smith, A. Foster, and T. Rainforth, Making better use of unlabelled data in bayesian active learning, inInternational conference on artificial intelligence and statistics(PMLR, 2024) pp. 847–855

work page 2024

[17] [17]

Kirsch, Black-box batch active learning for regression, Trans- actions on Machine Learning Research (2023)

A. Kirsch, Black-box batch active learning for regression, Trans- actions on Machine Learning Research (2023)

work page 2023

[18] [18]

Thomas-Mitchell, G

A. Thomas-Mitchell, G. Hawe, and P. L. Popelier, Calibration of uncertainty in the active learning of machine learning force fields, Machine Learning: Science and Technology4, 045034 (2023)

work page 2023

[19] [19]

X. Guan, J. P. Heindel, T. Ko, C. Yang, and T. Head-Gordon, Using machine learning to go beyond potential energy surface benchmarking for chemical reactivity, Nature Computational Science3, 965 (2023)

work page 2023

[20] [20]

Pestourie, Y

R. Pestourie, Y . Mroueh, T. V . Nguyen, P. Das, and S. G. John- son, Active learning of deep surrogates for pdes: application to metasurface design, npj Computational Materials6, 164 (2020)

work page 2020

[21] [21]

Singh, R

S. Singh, R. Kumar, P. Singh, and R. Hegde, Active learning for efficient nanophotonics inverse design in large and diverse design spaces, Opt. Express33, 20308 (2025)

work page 2025

[22] [22]

Y . Gal, R. Islam, and Z. Ghahramani, Deep bayesian active learning with image data, inInternational conference on ma- chine learning(PMLR, 2017) pp. 1183–1192

work page 2017

[23] [23]

Rakesh and S

V . Rakesh and S. Jain, Efficacy of bayesian neural networks in active learning, inProceedings of the IEEE/CVF conference on computer vision and pattern recognition(2021) pp. 2601–2609

work page 2021

[24] [24]

Harrison, J

J. Harrison, J. Willes, and J. Snoek, Variational bayesian last layers, inThe Twelfth International Conference on Learning Representations(2024)

work page 2024

[25] [25]

A. P. Soleimany, A. Amini, S. Goldman, D. Rus, S. N. Bhatia, and C. W. Coley, Evidential deep learning for guided molecular property prediction and discovery, ACS central science7, 1356 (2021)

work page 2021

[26] [26]

Amini, W

A. Amini, W. Schwarting, A. Soleimany, and D. Rus, Deep evi- dential regression, Advances in neural information processing systems33, 14927 (2020)

work page 2020

[27] [27]

C. Guo, G. Pleiss, Y . Sun, and K. Q. Weinberger, On calibra- tion of modern neural networks, inInternational conference on machine learning(PMLR, 2017) pp. 1321–1330

work page 2017