
arxiv: 2604.02528 · v1 · submitted 2026-04-02 · 💻 cs.AI · cs.LG

Interpretable Deep Reinforcement Learning for Element-level Bridge Life-cycle Optimization

Pith reviewed 2026-05-13 20:56 UTC · model grok-4.3

classification 💻 cs.AI cs.LG
keywords interpretable reinforcement learning · bridge life-cycle optimization · element-level condition states · oblique decision trees · soft decision trees · temperature annealing · maintenance policy

The pith

Reinforcement learning with soft trees produces near-optimal, human-readable decision trees for element-level bridge maintenance policies.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses the challenge of finding optimal life-cycle policies for bridges when condition data shift from a single component rating to four-dimensional arrays of element condition-state proportions. It proposes an RL method that uses differentiable soft decision trees as the policy approximator, combined with temperature annealing during training and regularization with pruning to control complexity. If the approach works, the resulting policies take the form of simple oblique decision trees that engineers can inspect, audit, and code directly into existing bridge management systems. This matters because the new Specifications for the National Bridge Inventory (SNBI) emphasize risk-based management at the element level, yet most advanced optimization methods remain black boxes that cannot be implemented or trusted in practice. The benefits and trade-offs are demonstrated in both supervised and reinforcement learning settings, and the framework is illustrated on a life-cycle optimization problem for steel girder bridges.

Core claim

The central claim is that differentiable soft tree models, when used as actor approximators in reinforcement learning and trained with temperature annealing plus regularization and pruning, produce deterministic oblique decision trees that achieve near-optimal life-cycle policies for bridges under element-level condition state representations. These trees have limited numbers of nodes and depth, remain directly understandable and auditable by humans, and can be implemented in current bridge management systems without the performance loss typically seen when interpretability constraints are added to RL policies.

What carries the argument

Differentiable soft tree models serving as actor function approximators in the RL framework, with temperature annealing and regularization-plus-pruning rules to enforce policy simplicity.
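To make this machinery concrete, here is a minimal sketch of a soft tree with a single internal node, gated by the temperature-controlled sigmoid quoted in the Figure 3 excerpt. It is an illustration under stated assumptions, not the authors' implementation: mixing leaf class probabilities (rather than logits), the routing direction of the gate, and every parameter value below are hypothetical.

```python
import math

def sigmoid_t(z, temperature=1.0):
    """Temperature-controlled sigmoid: 1 / (1 + exp(-z / T))."""
    u = z / temperature
    # Branch on the sign of u so that exp() never overflows.
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)

def softmax(logits):
    """Convert leaf logits into a probability vector."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def soft_tree_probs(x, w, b, left_logits, right_logits, temperature=1.0):
    """One internal node: weight g goes to the right leaf, 1 - g to the left.

    Assumption: the output is a gate-weighted mixture of leaf probabilities.
    """
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    g = sigmoid_t(z, temperature)
    p_left = softmax(left_logits)
    p_right = softmax(right_logits)
    return [(1.0 - g) * pl + g * pr for pl, pr in zip(p_left, p_right)]

# Hypothetical values, chosen only for illustration.
x = [0.70, 0.20, 0.07, 0.03]           # CS1..CS4 proportions
w, b = [0.0, 0.0, 0.0, 1.0], -0.05     # gate compares the CS4 share with 5%
left, right = [2.0, 0.0, 0.0], [0.0, 0.0, 2.0]

soft = soft_tree_probs(x, w, b, left, right, temperature=1.0)
sharp = soft_tree_probs(x, w, b, left, right, temperature=1e-4)
```

At `temperature=1.0` the gate is close to 0.5 and both leaves contribute; at `temperature=1e-4` the gate is effectively 0 and the output collapses onto the left leaf, which is the hardening behaviour that annealing relies on.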

If this is right

  • Life-cycle policies take the form of deterministic oblique decision trees with limited nodes and depth.
  • Policies become directly understandable and auditable by bridge engineers.
  • The resulting rules can be implemented into existing bridge management systems.
  • Near-optimal performance is retained despite the interpretability constraint.
  • The framework applies to element-level optimization for steel girder bridges.
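As a rough illustration of why such a policy is easy to audit and to code into a management system, the sketch below evaluates a deterministic oblique decision tree on a vector of CS proportions. The root rule (rehabilitation, a = 3, when the CS4 share exceeds 5%) comes from the Figure 13 caption; the second split, its weights, and the dictionary layout are hypothetical and are not the tree reported in the paper.

```python
def oblique_tree_action(x, node):
    """Walk a deterministic oblique tree; leaves are plain integers (actions)."""
    while isinstance(node, dict):
        # Oblique split: a linear combination of features against a threshold.
        z = sum(wi * xi for wi, xi in zip(node["w"], x)) + node["b"]
        node = node["right"] if z > 0 else node["left"]
    return node

# Hypothetical two-level tree over [CS1, CS2, CS3, CS4] proportions.
tree = {
    "w": [0.0, 0.0, 0.0, 1.0], "b": -0.05,      # CS4 > 5% ?
    "right": 3,                                  # rehabilitation (a = 3)
    "left": {
        "w": [0.0, 0.0, 1.0, 1.0], "b": -0.25,   # CS3 + CS4 > 25% ? (assumed)
        "right": 1,                              # maintenance (a = 1)
        "left": 0,                               # do nothing (a = 0)
    },
}

good_state = [0.70, 0.20, 0.07, 0.03]
worn_state = [0.60, 0.10, 0.26, 0.04]
poor_state = [0.40, 0.30, 0.20, 0.10]
actions = [oblique_tree_action(s, tree) for s in (good_state, worn_state, poor_state)]
```

Each decision is a short chain of linear inequalities, so an engineer can trace by hand which rule fired for a given bridge state.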

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same soft-tree RL approach could be tested on other infrastructure assets that use multi-element condition data.
  • The extracted decision rules may reveal specific condition thresholds that drive cost-effective maintenance choices.
  • Bridge agencies could combine these trees with expert judgment to create hybrid policies that are both data-driven and locally adjustable.

Load-bearing premise

That the combination of soft trees, annealing, and pruning will keep policies near-optimal without the performance drop that normally occurs when RL policies are forced into simple, interpretable forms.
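The annealing part of this premise can be made concrete with the exponential schedule quoted in the Figure 4 excerpt, T(s) = T0 · (Tmin/T0)^(s/S). The sketch below only evaluates that formula; the starting temperature, minimum temperature, and number of stages are hypothetical values, not the paper's settings.

```python
def annealed_temperature(stage, total_stages, t0=1.0, t_min=0.01):
    """Exponential schedule: T(s) = T0 * (Tmin / T0) ** (s / S)."""
    return t0 * (t_min / t0) ** (stage / total_stages)

# Hypothetical run with S = 10 stages, from T0 = 1.0 down to Tmin = 0.01.
schedule = [annealed_temperature(s, 10) for s in range(11)]
```

The temperature shrinks by the same factor at every stage, so the sigmoid gates sharpen gradually and reach `t_min` exactly at the final stage.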

What would settle it

A direct numerical comparison on the steel girder bridge problem showing that the expected life-cycle cost or performance of the pruned oblique decision tree policy is substantially worse than a standard deep neural network RL policy on the same task.
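One simple way to express such a comparison is a relative cost gap between the two policies, computed from simulated episode costs. The function below is a sketch of that metric only; the episode costs are placeholder numbers for illustration and are not results from the paper.

```python
from statistics import mean

def relative_cost_gap(tree_costs, baseline_costs):
    """Relative gap in mean life-cycle cost of the tree policy vs. a baseline.

    Positive values mean the tree policy is more expensive on average.
    """
    baseline = mean(baseline_costs)
    if baseline <= 0:
        raise ValueError("mean baseline cost must be positive")
    return (mean(tree_costs) - baseline) / baseline

# Placeholder episode costs (hypothetical).
tree_costs = [104.0, 106.0, 105.0]
baseline_costs = [99.0, 101.0, 100.0]
gap = relative_cost_gap(tree_costs, baseline_costs)
```

A gap near zero would support the near-optimal claim, while a large positive gap on the same task would count against it.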

Figures

Figures reproduced from arXiv: 2604.02528 by David Y. Yang, Seyyed Amirhossein Moayyedi.

Figure 1: Tree-based interpretable decision models: (a) standard decision tree and (b) oblique decision tree.
Figure 2: A simple soft tree with one internal node for three-class classification.
Figure 3: Temperature-controlled sigmoid function.
    Adjacent text: Instead of relying on the standard sigmoid function, we leverage a so-called temperature-controlled sigmoid function given as follows: $\sigma(z; T) = \frac{1}{1 + \exp(-z/T)}$ (3), where $z$ = output of an internal node, e.g., $z = \mathbf{w}^\top \mathbf{x} + b$ in the previous example; $T$ = temperature, with $T = 1$ corresponding to the standard sigmoid function. As the temperature decreases, the sigmoid fun…
Figure 4: Pseudo code for pruning trivial nodes.
    Adjacent text: Anneal $T$: The temperature is gradually decreased following an exponential schedule: $T(s) = T_0 \cdot \left(\frac{T_{\min}}{T_0}\right)^{s/S}$ (4), where $s$ = stage of the annealing schedule (starting from 0), which may include several gradient descent steps; $S$ = total number of stages during the training process; $T_{\min}$ = minimum temperature at the end of the training. 2.4 Regularization and Pruning…
Figure 5: Pseudo code for pruning infeasible paths.
    Adjacent text: … node enforces the condition $x_1 + x_2 \le 0$. If at the same time a previous ancestor node has already enforced $x_1 + x_2 > 5$, the internal node and all of its offspring nodes (i.e., the subtree from the internal node) can never be reached, allowing the subtree to be pruned without ill effects. This process is implemented by analyzing the feasibility of a supplementary linear …
Figure 6: Example of pruning frozen tree: (a) original frozen tree and (b) oblique decision tree after pruning.
Figure 7: Implementation of a differentiable soft tree model (depth = 3): (a) neural network layers, (b) rearrangement to soft tree structure, and (c) output of class logits.
Figure 8: General architecture for supervised classification with soft tree models.
Figure 9: PPO architecture for reinforcement learning with soft tree based actors.
    Adjacent text: … other actor-critic algorithms, e.g., the soft actor-critic algorithm [Haarnoja et al., 2018], can also be adapted by replacing the standard actor network with the proposed soft tree model. The general architecture and its subsequent conversion to the oblique decision tree form a novel tree-based deep learning framework. Overall, the new…
Figure 10: Labeled synthetic data for supervised classification experiments.
    Adjacent text: 3 Validation via Supervised Learning Experiments. Although the ultimate goal of this study is to optimize life-cycle policies using RL, directly applying the methodology to RL tasks may obscure the benefits and limitations of the proposed soft tree model and its subsequent extraction of oblique decision trees. Because decisions in bridge life…
Figure 11: Effects of $L_1$ regularization on (a) number of pruned nodes and (b) model performance.
    Adjacent text: … low. On the other hand, the temperature parameter affects the performance of the baseline soft tree model. In particular, when the temperature is high ($T = 100$), the sigmoid gating function is close to a linear function with near-zero gradients, as demonstrated in …
Figure 12: Effects of pruning threshold on (a) model performance and (b) pruned nodes.
    Adjacent text: … larger threshold values result in a greater number of pruned nodes. However, even if no nodes are explicitly shut off (e.g., when using the pruning threshold of $10^{-8}$), a substantial number of nodes are still pruned by the routine in …
Figure 13: CS evolution under two policies: (a) do-nothing; (b) conduct rehabilitation ($a = 3$) if CS4 percentage is greater than 5%, do nothing otherwise.
    Adjacent text: … where $a = 1, \dots, 4$ = maintenance, repair, rehabilitation, and replacement, respectively. The costs per element for these life-cycle actions, including the “do-nothing” option, are assumed to be 0 ($a = 0$), 10 ($a = 1$), 100 ($a = 2$), 1,000 ($a = 3$), and 2,000 ($a = 4$) …
Figure 14: Learning curves associated with both types of actor models.
Figure 15: RL-derived oblique decision tree after pruning.
Figure 16: RL-derived oblique decision tree integrated with agency rules.
Original abstract

The new Specifications for the National Bridge Inventory (SNBI), in effect from 2022, emphasize the use of element-level condition states (CS) for risk-based bridge management. Instead of a general component rating, element-level condition data use an array of relative CS quantities (i.e., CS proportions) to represent the condition of a bridge. Although this greatly increases the granularity of bridge condition data, it introduces challenges to set up optimal life-cycle policies due to the expanded state space from one single categorical integer to four-dimensional probability arrays. This study proposes a new interpretable reinforcement learning (RL) approach to seek optimal life-cycle policies based on element-level state representations. Compared to existing RL methods, the proposed algorithm yields life-cycle policies in the form of oblique decision trees with reasonable amounts of nodes and depth, making them directly understandable and auditable by humans and easily implementable into current bridge management systems. To achieve near-optimal policies, the proposed approach introduces three major improvements to existing RL methods: (a) the use of differentiable soft tree models as actor function approximators, (b) a temperature annealing process during training, and (c) regularization paired with pruning rules to limit policy complexity. Collectively, these improvements can yield interpretable life-cycle policies in the form of deterministic oblique decision trees. The benefits and trade-offs from these techniques are demonstrated in both supervised and reinforcement learning settings. The resulting framework is illustrated in a life-cycle optimization problem for steel girder bridges.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes an interpretable reinforcement learning framework for element-level bridge life-cycle optimization under the new SNBI specifications. It replaces standard neural actors with differentiable soft oblique decision trees, applies temperature annealing during training, and uses regularization plus pruning rules to obtain compact deterministic oblique trees. These trees are claimed to yield near-optimal policies for steel girder bridges while remaining human-auditable and directly implementable in existing management systems.

Significance. If the central claim holds, the work would supply a practical route to deploy RL-derived policies in bridge management without sacrificing interpretability, directly addressing the expanded state space created by four-dimensional condition-state proportions. The combination of soft-tree actors, annealing, and explicit pruning rules is a targeted technical contribution that could be adopted by agencies already using element-level data.

major comments (3)
  1. [RL training procedure (description of annealing and pruning)] The manuscript provides no explicit bound or convergence argument showing that the expected life-cycle cost of the final pruned deterministic oblique tree remains close to that of the unconstrained soft policy once temperature reaches its minimum and pruning is applied. This gap is load-bearing for the 'near-optimal' claim.
  2. [Experimental results] No ablation isolating the pruning step is reported; therefore it is impossible to quantify how much of any observed performance degradation is attributable to the interpretability constraints versus other factors in the bridge MDP.
  3. [State representation and tree model] The geometry of the four-dimensional probability-vector state space is not analyzed with respect to the oblique splits; it remains unclear whether the hard tree can deviate sharply from the soft approximation on this particular manifold.
minor comments (2)
  1. [Abstract] The abstract would be strengthened by including at least one quantitative performance metric (e.g., relative cost gap to an unconstrained baseline) to support the 'near-optimal' assertion.
  2. [Problem formulation] Notation for the four condition-state proportions (CS1–CS4) should be introduced consistently in the problem formulation section before being used in the tree model.

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for their thoughtful and constructive comments on our manuscript. We address each of the major comments in detail below, indicating where revisions will be made to improve clarity and strengthen the claims.

point-by-point responses
  1. Referee: The manuscript provides no explicit bound or convergence argument showing that the expected life-cycle cost of the final pruned deterministic oblique tree remains close to that of the unconstrained soft policy once temperature reaches its minimum and pruning is applied. This gap is load-bearing for the 'near-optimal' claim.

    Authors: We agree that a formal convergence bound would provide stronger theoretical support for the 'near-optimal' claim. However, establishing such a bound for the composition of temperature annealing, soft-to-hard transition, and pruning in a continuous state MDP is non-trivial and would require significant additional theoretical development. Instead, we rely on extensive empirical evaluation. In the revised version, we will include additional plots and tables showing the life-cycle cost of the soft policy, the policy at minimum temperature, and the pruned tree across all experimental scenarios. These results demonstrate that the degradation is typically less than 3-7% depending on the regularization strength. We will also discuss the design of the pruning rules, which explicitly aim to preserve high-value actions. revision: partial

  2. Referee: No ablation isolating the pruning step is reported; therefore it is impossible to quantify how much of any observed performance degradation is attributable to the interpretability constraints versus other factors in the bridge MDP.

    Authors: We acknowledge this limitation in the current experimental design. To address it, we will add an ablation study in the revised manuscript. Specifically, we will report performance metrics for: (1) the unconstrained neural actor baseline, (2) the soft oblique tree actor without annealing or pruning, (3) the annealed soft tree, and (4) the final pruned deterministic tree. This will isolate the contribution of each component, including the pruning step, to any performance changes. revision: yes

  3. Referee: The geometry of the four-dimensional probability-vector state space is not analyzed with respect to the oblique splits; it remains unclear whether the hard tree can deviate sharply from the soft approximation on this particular manifold.

    Authors: The state space is the 3-simplex embedded in 4D (since proportions sum to 1). Oblique decision trees are particularly well-suited here because linear splits can effectively partition this low-dimensional manifold. The soft tree provides a differentiable approximation during training, and annealing gradually sharpens the decisions. To clarify this, we will add a paragraph in Section 3 explaining the geometry and why oblique splits align with the linear structure of condition state transitions. Additionally, we will report the average and maximum KL-divergence between the soft and hard policy distributions on a held-out set of states to quantify any sharp deviations. revision: yes
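The divergence check promised in the third response could look roughly like the sketch below. It rests on assumptions the rebuttal does not spell out: the hard policy is taken to be the one-hot argmax of the soft policy, and the divergence is computed as KL(hard || soft), which for a one-hot distribution reduces to the negative log of the soft probability of the chosen action. The held-out soft outputs are hypothetical.

```python
import math

def on_simplex(x, tol=1e-9):
    """Check that CS proportions are non-negative and sum to 1."""
    return all(xi >= -tol for xi in x) and abs(sum(x) - 1.0) <= tol

def kl_hard_vs_soft(soft_probs, eps=1e-12):
    """KL(hard || soft), with hard = one-hot argmax of soft_probs (assumed).

    For a one-hot hard policy this equals -log(soft_probs[a_star]).
    """
    a_star = max(range(len(soft_probs)), key=lambda a: soft_probs[a])
    return -math.log(max(soft_probs[a_star], eps))

def summarize_divergence(soft_outputs):
    """Average and maximum divergence over a set of held-out states."""
    values = [kl_hard_vs_soft(p) for p in soft_outputs]
    return sum(values) / len(values), max(values)

# Hypothetical soft-policy outputs on three held-out states.
held_out = [[1.0, 0.0, 0.0], [0.5, 0.25, 0.25], [0.1, 0.8, 0.1]]
avg_kl, max_kl = summarize_divergence(held_out)
```

A maximum well above the average would point to states where hardening changes the policy sharply, which is the case the referee asks about.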

Circularity Check

0 steps flagged

No circularity: method extends standard RL with explicit components

full rationale

The derivation introduces differentiable soft-tree actors, temperature annealing, and regularization-plus-pruning as three distinct, externally motivated improvements to existing RL methods. These are not defined in terms of the final deterministic oblique trees; the paper treats the transition from soft to pruned hard policies as an empirical outcome demonstrated in both supervised and RL settings rather than a tautological identity. No equation or claim reduces the reported near-optimality to a fitted parameter or self-citation chain. The central claim therefore remains independent of its inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Based on abstract only; no explicit free parameters, axioms, or invented entities are named. The approach assumes standard RL convergence properties and that soft trees can be annealed into deterministic trees without large optimality loss.

axioms (1)
  • domain assumption: Differentiable soft tree models can serve as effective actor approximators that anneal to near-optimal deterministic policies
    Invoked to justify the claim that the three improvements together achieve near-optimal policies.

pith-pipeline@v0.9.0 · 5568 in / 1217 out tokens · 39744 ms · 2026-05-13T20:56:06.750071+00:00 · methodology


