Interpretability-Guided Bi-objective Optimization: Aligning Accuracy and Explainability

Hamta Rahmani; Kasra Fouladi

arXiv: 2601.00655 · v3 · submitted 2026-01-02 · 💻 cs.LG · cs.AI

Pith reviewed 2026-05-16 17:29 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords: bi-objective optimization · model interpretability · feature importance DAG · gradient projection · Pareto stationarity · Temporal Integrated Gradients · domain knowledge · Relative Importance Score

The pith

Geometric projection merges task and interpretability gradients to reach Pareto-stationary points

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Interpretability-Guided Bi-objective Optimization (IGBO), which trains models against two simultaneous objectives: predictive accuracy and structured interpretability. It encodes domain knowledge about feature importance as a Directed Acyclic Graph (DAG) constructed from Central Limit Theorem statistics, and it quantifies attributions with Temporal Integrated Gradients (TIG) and a Relative Importance Score. A geometric projection mapping combines the two gradient signals, and the authors prove that the resulting procedure converges to Pareto-stationary points. The approach therefore lets practitioners enforce consistent feature hierarchies during training instead of checking them post hoc.

Core claim

By representing feature importance hierarchies as a DAG and applying a geometric projection operator P to the combined task and interpretability gradients, the bi-objective optimization is guaranteed to converge to Pareto-stationary points: points at which no direction decreases both objectives at once, which is a first-order necessary condition for Pareto optimality.
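
With only two objectives, Pareto stationarity has a simple first-order test: a point is Pareto-stationary when some convex combination of the two gradients vanishes, so no direction decreases both losses at once. The sketch below uses the closed-form two-objective min-norm step from MGDA (Desideri) as that test. It is an assumed, standard criterion rather than the paper's own proof machinery, and the names `g_task`, `g_int`, and the tolerance are hypothetical.

```python
import math


def dot(u: list[float], v: list[float]) -> float:
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def min_norm_combination(g_task: list[float], g_int: list[float]) -> tuple[float, list[float]]:
    """Return (alpha, d) with d = alpha * g_task + (1 - alpha) * g_int of minimal
    Euclidean norm over alpha in [0, 1] (closed-form two-objective MGDA step)."""
    diff = [a - b for a, b in zip(g_task, g_int)]
    denom = dot(diff, diff)
    if denom == 0.0:
        alpha = 0.5  # identical gradients: every alpha yields the same vector
    else:
        # Minimizer of ||g_int + alpha * (g_task - g_int)||^2, clipped to [0, 1].
        alpha = dot([b - a for a, b in zip(g_task, g_int)], g_int) / denom
        alpha = min(1.0, max(0.0, alpha))
    d = [alpha * a + (1.0 - alpha) * b for a, b in zip(g_task, g_int)]
    return alpha, d


def is_pareto_stationary(g_task: list[float], g_int: list[float], tol: float = 1e-8) -> bool:
    """True when the min-norm convex combination of the two gradients (numerically) vanishes."""
    _, d = min_norm_combination(g_task, g_int)
    return math.sqrt(dot(d, d)) <= tol


# Directly opposed gradients: a convex combination cancels, so the point is stationary.
print(is_pareto_stationary([1.0, 0.0], [-1.0, 0.0]))  # True
# Orthogonal gradients: the min-norm combination is nonzero, so the point is not stationary.
print(min_norm_combination([1.0, 0.0], [0.0, 1.0]))  # (0.5, [0.5, 0.5])
```

When the test fails, stepping along `-d` decreases both losses to first order, which is the usual MGDA update.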

What carries the argument

The geometric projection mapping P that combines task and interpretability gradients while respecting the DAG-encoded feature hierarchy
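
The page does not give the formula for P. As one concrete instance of projecting one gradient against another, the sketch below borrows the gradient-surgery rule of Yu et al. (PCGrad), applied one-sidedly: when the interpretability gradient conflicts with the task gradient (negative inner product), its component along the task gradient is removed before the two are summed. The one-sided variant, the weight `lam`, and all function names are assumptions for illustration. In this sketch the DAG would influence the update only through `g_int`.

```python
def dot(u: list[float], v: list[float]) -> float:
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def project_out_conflict(g_int: list[float], g_task: list[float]) -> list[float]:
    """If g_int points against g_task, drop its component along g_task (PCGrad-style)."""
    inner = dot(g_int, g_task)
    norm_sq = dot(g_task, g_task)
    if inner >= 0.0 or norm_sq == 0.0:
        return list(g_int)  # no conflict (or no task direction): keep g_int unchanged
    scale = inner / norm_sq
    return [a - scale * b for a, b in zip(g_int, g_task)]


def combined_direction(g_task: list[float], g_int: list[float], lam: float = 1.0) -> list[float]:
    """Hypothetical stand-in for P: task gradient plus the de-conflicted interpretability gradient."""
    g_proj = project_out_conflict(g_int, g_task)
    return [a + lam * b for a, b in zip(g_task, g_proj)]


g_task = [1.0, 0.0]
g_int = [-1.0, 1.0]  # conflicts with the task gradient
print(project_out_conflict(g_int, g_task))  # [0.0, 1.0]
print(combined_direction(g_task, g_int))    # [1.0, 1.0]
```

Because the projected term has a non-negative inner product with `g_task`, a small step along the negative combined direction does not increase the task loss to first order. This is one way a projection can prioritise accuracy; on its own it does not establish the paper's Pareto-stationarity result.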

If this is right

Models reach trade-off points at which, to first order, accuracy cannot be improved without reducing alignment with the encoded feature hierarchy.
The DAG supplies statistical guarantees of consistency for feature importance ordering.
Temporal Integrated Gradients supply time-resolved attributions that the projection respects during training.
An Optimal Path Oracle architecture is sketched to handle out-of-distribution cases in attribution computation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The projection technique may extend to other multi-objective settings that require gradient combination under structural constraints.
Domain-knowledge DAGs could be tested for robustness gains when the training distribution shifts.
Empirical verification of the convergence rate on high-dimensional tabular data would clarify practical scalability.

Load-bearing premise

Central Limit Theorem statistics applied to feature attributions produce a DAG that is acyclic and transitive with unconditional guarantees at the median threshold.
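
The page does not say how the CLT statistics become edges. A minimal sketch under one plausible reading, assumed here: for each ordered pair of features, compute the one-sample z-statistic of the per-sample differences H_i − H_j and add the edge i → j ("i is more important than j") when it exceeds a threshold z, with the median threshold taken to be z = 0 (one-sided confidence 0.5). Under this assumed construction any edge i → j implies that feature i has the larger sample mean, so the graph is acyclic for every z ≥ 0, and at z = 0 the edges simply follow the order of the sample means. This is not the paper's construction, and the function names and input layout are hypothetical.

```python
import math
from statistics import NormalDist, fmean, stdev


def z_for_confidence(p: float) -> float:
    """One-sided standard-normal threshold; p = 0.5 (the median) gives z = 0."""
    return NormalDist().inv_cdf(p)


def build_importance_dag(scores: list[list[float]], z: float) -> set[tuple[int, int]]:
    """scores[s][k] is the importance of feature k in sample s (e.g. a per-sample Hk).

    Adds edge (i, j), read as "feature i is more important than feature j", when the
    z-statistic of the paired differences scores[s][i] - scores[s][j] exceeds z.
    """
    n, num_features = len(scores), len(scores[0])
    if n < 2:
        raise ValueError("need at least two samples to estimate a standard error")
    edges: set[tuple[int, int]] = set()
    for i in range(num_features):
        for j in range(num_features):
            if i == j:
                continue
            diffs = [row[i] - row[j] for row in scores]
            mean, sd = fmean(diffs), stdev(diffs)
            if sd == 0.0:
                # Degenerate case: a constant difference is decided by its sign alone.
                stat = math.inf if mean > 0 else (-math.inf if mean < 0 else 0.0)
            else:
                stat = mean / (sd / math.sqrt(n))
            if stat > z:
                edges.add((i, j))
    return edges


scores = [
    [0.5, 0.3, 0.2],
    [0.6, 0.3, 0.1],
    [0.4, 0.35, 0.25],
]
print(sorted(build_importance_dag(scores, z_for_confidence(0.5))))  # [(0, 1), (0, 2), (1, 2)]
print(sorted(build_importance_dag(scores, 3.0)))                    # [(0, 2), (1, 2)]
```

The z = 3 graph omits the edge 0 → 1 even though feature 0 has the largest mean: raising the threshold trades coverage for confidence in each retained edge.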

What would settle it

A constructed DAG containing a cycle on any dataset with known transitive feature relations, or empirical training runs in which projected gradient steps fail to approach a stationary point for both objectives, would falsify the central claims.

Original abstract

This paper introduces Interpretability-Guided Bi-objective Optimization (IGBO), a framework that trains interpretable models by incorporating structured domain knowledge via a bi-objective formulation. IGBO encodes feature importance hierarchies as a Directed Acyclic Graph (DAG) via Central Limit Theorem-based construction and uses Temporal Integrated Gradients (TIG) to measure feature importance. The framework employs a novel Relative Importance Score Hk(X, θ) that quantifies the normalized cumulative attribution of each feature over time. We propose a geometric projection mapping P for combining task and interpretability gradients, and prove convergence to Pareto-stationary points. To address the Out-of-Distribution problem in TIG computation, we outline an Optimal Path Oracle architecture, which we leave for future work. Central Limit Theorem-based construction of the interpretability DAG provides statistical guarantees on acyclicity and transitivity, with an unconditional guarantee for the median threshold and conditional guarantees for higher confidence levels.
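
The abstract defines Hk only as the normalized cumulative attribution of each feature over time, and the exact TIG formula is not on this page. A minimal sketch under stated assumptions: TIG is taken to be standard Integrated Gradients (Sundararajan et al.) applied to every (time step, feature) entry of a sequence input with a zero baseline, approximated by a midpoint Riemann sum, and Hk is taken to be the sum of absolute attributions of feature k over time divided by the total over all features. The toy quadratic model has an analytic gradient so the example needs no autodiff library; all names are hypothetical.

```python
from collections.abc import Callable

Matrix = list[list[float]]  # indexed [time step][feature]


def toy_model_grad(x: Matrix, w: list[float]) -> Matrix:
    """Gradient of the hypothetical toy model F(x) = sum_t sum_k w[k] * x[t][k] ** 2."""
    return [[2.0 * w[k] * row[k] for k in range(len(w))] for row in x]


def integrated_gradients(x: Matrix, baseline: Matrix,
                         grad_fn: Callable[[Matrix], Matrix], steps: int = 32) -> Matrix:
    """Midpoint-rule Integrated Gradients for a (time, feature) input.

    attr[t][k] = (x[t][k] - baseline[t][k]) * average of dF/dx[t][k]
    along the straight path baseline + alpha * (x - baseline), alpha in [0, 1].
    """
    T, K = len(x), len(x[0])
    grad_sum = [[0.0] * K for _ in range(T)]
    for s in range(steps):
        alpha = (s + 0.5) / steps
        point = [[baseline[t][k] + alpha * (x[t][k] - baseline[t][k]) for k in range(K)]
                 for t in range(T)]
        g = grad_fn(point)
        for t in range(T):
            for k in range(K):
                grad_sum[t][k] += g[t][k]
    return [[(x[t][k] - baseline[t][k]) * grad_sum[t][k] / steps for k in range(K)]
            for t in range(T)]


def relative_importance(attr: Matrix) -> list[float]:
    """Assumed form of Hk: per-feature sum of |attribution| over time, normalized to sum to 1."""
    per_feature = [sum(abs(row[k]) for row in attr) for k in range(len(attr[0]))]
    total = sum(per_feature)
    return [v / total for v in per_feature] if total > 0 else [0.0] * len(per_feature)


x = [[1.0, 2.0], [3.0, 0.0]]  # two time steps, two features
baseline = [[0.0, 0.0], [0.0, 0.0]]
w = [1.0, 0.5]
attr = integrated_gradients(x, baseline, lambda p: toy_model_grad(p, w))
print(attr)                                              # [[1.0, 2.0], [9.0, 0.0]]
print([round(h, 3) for h in relative_importance(attr)])  # [0.833, 0.167]
```

Because the toy gradient is linear along the path, the midpoint rule is exact here and the attributions sum to F(x) − F(baseline) = 12, the completeness property of Integrated Gradients.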

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

IGBO sketches a gradient projection for accuracy-interpretability trade-offs using a CLT-derived DAG and TIG attributions, but the convergence proof is absent and the Optimal Path Oracle is deferred.

The core idea in this paper is a bi-objective optimization framework, IGBO, that aims to align model accuracy with explainability. It encodes feature importance as a DAG constructed with the Central Limit Theorem and uses a geometric projection to combine gradients from the task loss and from an interpretability term based on Temporal Integrated Gradients. The authors also define a Relative Importance Score Hk to quantify feature contributions over time. What stands out is the way they pair CLT-based acyclicity guarantees for the DAG with the projection mapping P for Pareto convergence. This could be useful in domains where domain knowledge provides clear feature hierarchies that should be respected during training. However, the paper does not include the proof steps for convergence or any error bounds, and the Optimal Path Oracle for handling out-of-distribution points in the TIG computation is left as future work. The CLT-based acyclicity is only asymptotic, which raises the question of whether it holds reliably at the sample sizes used in their TIG experiments. If cycles appear in finite data, the whole projection approach falls apart. The Hk score also seems defined in a way that could make the optimization somewhat tautological without independent checks. Overall, the paper is aimed at the intersection of multi-objective optimization and interpretable machine learning, particularly at practitioners who need structured constraints rather than post-hoc explanations. It shows some clear thinking on the problem, but the evidence is thin. I would not cite it yet and would not bring it to a reading group until the missing parts are filled in. I recommend against sending it to peer review in this state; it needs the derivations and experiments to stand on its own.

Referee Report

4 major / 0 minor

Summary. This paper introduces Interpretability-Guided Bi-objective Optimization (IGBO), a framework that trains interpretable models via a bi-objective formulation incorporating structured domain knowledge. It encodes feature importance hierarchies as a DAG using CLT-based construction, employs Temporal Integrated Gradients (TIG) for attributions, defines a Relative Importance Score Hk(X, θ) as normalized cumulative attribution, proposes a geometric projection mapping P to combine task and interpretability gradients, and asserts a proof of convergence to Pareto-stationary points. An Optimal Path Oracle is outlined for the OOD problem in TIG but deferred to future work, and the CLT construction is claimed to guarantee DAG acyclicity and transitivity (unconditionally at the median threshold).

Significance. If the convergence proof and DAG guarantees hold with supporting derivations and validation, the framework would provide a principled geometric method for balancing accuracy and interpretability constraints in gradient-based training, advancing bi-objective optimization in interpretable ML by enforcing feature hierarchies derived from attributions.

major comments (4)

[Abstract] The manuscript asserts a convergence proof to Pareto-stationary points via the geometric projection mapping P but supplies no derivation steps, stationarity conditions, error analysis, or empirical validation of the claim.
[Abstract] The CLT-based construction is claimed to yield statistical guarantees on DAG acyclicity and transitivity (unconditionally at the median threshold), yet the CLT is asymptotic and no finite-sample analysis or cycle-detection validation is provided; cycles would render P and the convergence argument inapplicable.
[Abstract] The Relative Importance Score Hk(X, θ) is defined directly from the TIG attributions it is meant to guide, which risks circularity in the optimization loop unless it is checked against independent external benchmarks or reported validation.
[Abstract] The Optimal Path Oracle is required to address the OOD problem in TIG computation but is explicitly deferred to future work, leaving a load-bearing component of the framework unsupported.

Simulated Authors' Rebuttal

4 responses · 1 unresolved

We thank the referee for the constructive and detailed review. The comments highlight important areas where the presentation and supporting analysis can be strengthened. We address each major comment below and indicate the revisions that will be incorporated in the next version of the manuscript.

Referee: [Abstract] The manuscript asserts a convergence proof to Pareto-stationary points via the geometric projection mapping P but supplies no derivation steps, stationarity conditions, error analysis, or empirical validation of the claim.

Authors: The full derivation of convergence to Pareto-stationary points, including the role of the geometric projection P, stationarity conditions, and supporting lemmas, appears in Section 5 and the supplementary material. We agree, however, that the abstract provides insufficient detail. In the revision we will add a concise outline of the proof to the abstract, expand the main-text derivation with explicit stationarity conditions and error bounds, and include additional empirical convergence plots in the experiments section. revision: yes
Referee: [Abstract] The CLT-based construction is claimed to yield statistical guarantees on DAG acyclicity and transitivity (unconditionally at the median threshold), yet the CLT is asymptotic and no finite-sample analysis or cycle-detection validation is provided; cycles would render P and the convergence argument inapplicable.

Authors: We concur that the Central Limit Theorem supplies only asymptotic guarantees and that the manuscript should qualify its claims accordingly. We will add a finite-sample analysis based on concentration inequalities to bound the probability of cycles for finite datasets, together with explicit cycle-detection experiments on the constructed DAGs across multiple datasets. These additions will directly support the applicability of the projection P and the convergence argument. revision: yes
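
The promised finite-sample analysis could take the following form; this is a sketch of a standard Hoeffding bound, not the authors' forthcoming derivation. Assume i.i.d. samples with per-sample scores in [0, 1], so each difference D = H_i − H_j lies in an interval of width 2, and assume the true mean gap E[D] is at least `gap` > 0 for every pair. Hoeffding's inequality then bounds the probability that the sample mean of D has the wrong sign (a reversed edge at a zero threshold) by exp(−2 n gap² / width²), and a union bound over pairs gives a sample size that keeps every pair correctly oriented with probability at least 1 − delta. The function names are hypothetical.

```python
import math


def misorder_bound(n: int, gap: float, width: float = 2.0) -> float:
    """Hoeffding bound on P(mean of n i.i.d. differences <= 0) when their true mean
    is gap > 0 and each difference lies in an interval of length `width`."""
    return math.exp(-2.0 * n * gap ** 2 / width ** 2)


def samples_needed(gap: float, delta: float, num_pairs: int = 1, width: float = 2.0) -> int:
    """Smallest n with num_pairs * misorder_bound(n, gap, width) <= delta (union bound)."""
    return math.ceil(width ** 2 * math.log(num_pairs / delta) / (2.0 * gap ** 2))


print(round(misorder_bound(200, 0.2), 4))      # 0.0183
print(samples_needed(0.2, 0.05))               # 150
print(samples_needed(0.2, 0.05, num_pairs=3))  # 205
```

With d features the union bound runs over d(d − 1)/2 unordered pairs, so the required n grows only logarithmically with the number of pairs. The bound addresses edge orientation, not cycle formation, so the cycle-detection experiments would still be needed.
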
Referee: [Abstract] The Relative Importance Score Hk(X, θ) is defined directly from the TIG attributions it is meant to guide, which risks circularity in the optimization loop unless it is checked against independent external benchmarks or reported validation.

Authors: Hk(X, θ) is computed from TIG attributions of the current model parameters and then used to shape the interpretability objective for the subsequent gradient step; the dependence is therefore sequential rather than circular. To strengthen the presentation we will add comparisons of Hk against independent external benchmarks (domain-expert rankings and SHAP values) in the experimental section, confirming that the score aligns with these references. revision: partial
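
The response describes Hk as an input to the interpretability objective for the next gradient step. One hypothetical form of that objective, assumed here because the page does not state it, is a hinge penalty over the DAG: for every edge i → j, penalise max(0, margin + H_j − H_i). In this form the target ordering comes from the DAG and only the measurement comes from Hk, which is the split the circularity question turns on. The function names and example values are illustrative.

```python
def dag_hinge_loss(h: list[float], edges: set[tuple[int, int]], margin: float = 0.0) -> float:
    """Sum over DAG edges (i, j), meaning "i should outrank j", of max(0, margin + h[j] - h[i])."""
    return sum(max(0.0, margin + h[j] - h[i]) for i, j in edges)


def violated_edges(h: list[float], edges: set[tuple[int, int]],
                   margin: float = 0.0) -> set[tuple[int, int]]:
    """Edges whose ordering the current scores fail to respect by at least `margin`."""
    return {(i, j) for i, j in edges if h[i] - h[j] < margin}


h_current = [0.2, 0.5, 0.3]  # Hk computed from the current parameters (illustrative values)
edges = {(0, 1), (1, 2)}     # domain-knowledge DAG: 0 outranks 1, 1 outranks 2
print(violated_edges(h_current, edges))            # {(0, 1)}
print(round(dag_hinge_loss(h_current, edges), 6))  # 0.3
```
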
Referee: [Abstract] The Optimal Path Oracle is required to address the OOD problem in TIG computation but is explicitly deferred to future work, leaving a load-bearing component of the framework unsupported.

Authors: We acknowledge that the Optimal Path Oracle is outlined only at a high level and deferred to future work. In the revision we will clarify the practical approximation currently employed for TIG computation (including how OOD effects are mitigated without the full oracle) and will explicitly discuss this component as an open direction, while demonstrating that the remainder of the framework functions with the present approximation. revision: partial

standing simulated objections not resolved

Complete implementation, theoretical analysis, and empirical validation of the Optimal Path Oracle, which the manuscript explicitly defers to future work.

Circularity Check

1 step flagged

Hk(X, θ) defined directly from TIG attributions creates self-referential interpretability objective

specific steps

self-definitional [Abstract]
"The framework employs a novel Relative Importance Score Hk(X, {θ}) that quantifies the normalized cumulative attribution of each feature over time."

Hk is defined as the normalized cumulative attribution drawn from TIG; the interpretability gradient fed into the geometric projection P is therefore built directly from the same attribution values that the optimization claims to align with the DAG hierarchy, reducing the interpretability objective to a re-expression of its own input measure.

full rationale

The paper's derivation chain for the geometric projection P and Pareto-stationary convergence relies on a well-defined interpretability gradient from the DAG and Hk score. The only load-bearing definitional step that reduces by construction is the Hk score itself, which is introduced as quantifying normalized cumulative attribution. This makes the bi-objective alignment partially self-referential, as the explainability term is a direct re-expression of the attribution mechanism being optimized. The CLT DAG construction supplies an independent (asymptotic) claim and does not create additional circularity. No self-citation chains or fitted predictions are exhibited in the provided text.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 3 invented entities

The central claims rest on the Central Limit Theorem for DAG construction and on the definition of new components whose properties are asserted rather than independently verified.

axioms (1)

standard math · Central Limit Theorem-based construction provides statistical guarantees on acyclicity and transitivity of the feature importance DAG
Invoked to encode feature importance hierarchies with claimed guarantees

invented entities (3)

Relative Importance Score Hk(X, θ) · no independent evidence
purpose: Quantifies normalized cumulative attribution of each feature over time
Newly defined score central to the interpretability objective
Geometric projection mapping P · no independent evidence
purpose: Combines task and interpretability gradients
Proposed operator for the bi-objective update rule
Optimal Path Oracle · no independent evidence
purpose: Addresses out-of-distribution issues in TIG computation
Outlined architecture left for future work

pith-pipeline@v0.9.0 · 5458 in / 1408 out tokens · 39733 ms · 2026-05-16T17:29:33.366592+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon; the source theorems are in a public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

We propose a geometric projection mapping P for combining task and interpretability gradients, and prove convergence to Pareto-stationary points.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · absolute_floor_iff_bare_distinguishability · unclear

Central Limit Theorem-based construction of the interpretability DAG provides statistical guarantees on acyclicity and transitivity

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

26 extracted references · 26 canonical work pages

[1] M. Sundararajan, A. Taly, and Q. Yan, “Axiomatic attribution for deep networks,” International Conference on Machine Learning, pp. 3319–3328, 2017
[2] M. T. Ribeiro, S. Singh, and C. Guestrin, “‘Why should I trust you?’: Explaining the predictions of any classifier,” Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1135–1144, 2016
[3] A. Mironicolau et al., “A comprehensive survey on explainable AI: Techniques, applications, and future directions,” ACM Computing Surveys, 2024
[4] S. M. Lundberg and S.-I. Lee, “A unified approach to interpreting model predictions,” Advances in Neural Information Processing Systems, vol. 30, 2017
[5] A. S. Ross, M. C. Hughes, and F. Doshi-Velez, “Right for the right reasons: Training differentiable models by constraining their explanations,” International Joint Conference on Artificial Intelligence, pp. 2662–2670, 2017
[6] P. W. Koh, T. Nguyen, Y. S. Tang, S. Mussmann, E. Pierson, B. Kim, and P. Liang, “Concept bottleneck models,” International Conference on Machine Learning, pp. 5338–5348, 2020
[7] G. Marra, F. Iorio, and L. Robaldo, “Lyra: A learning approach to context-aware driver alerts,” 2019 IEEE 31st International Conference on Tools with Artificial Intelligence (ICTAI), pp. 1396–1403, 2019
[8] C. Rudin, “Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead,” Nature Machine Intelligence, vol. 1, no. 5, pp. 206–215, 2019
[9] J.-A. Desideri, “Multiple-gradient descent algorithm (MGDA) for multiobjective optimization,” Comptes Rendus Mathematique, vol. 350, no. 5-6, pp. 313–318, 2012
[10] J. Fliege, L. M. G. Drummond, and B. F. Svaiter, “Steepest descent methods for multicriteria optimization,” Mathematical Methods of Operations Research, vol. 51, no. 3, pp. 479–494, 2000
[11] O. Sener and V. Koltun, “Multi-task learning as multi-objective optimization,” Advances in Neural Information Processing Systems, vol. 31, 2018
[12] X. Lin, H.-L. Zhen, Z. Li, Q.-F. Zhang, and S. Kwong, “Pareto multi-task learning,” Advances in Neural Information Processing Systems, vol. 33, pp. 18860–18870, 2020
[13] A. Navon, A. Shamsian, G. Chechik, and E. Fetaya, “Multi-objective optimization by learning space partitions,” International Conference on Learning Representations, 2022
[14] B. Liu, X. Liu, X. Jin, P. Stone, and Q. Liu, “Conflict-averse gradient descent for multi-task learning,” Advances in Neural Information Processing Systems, vol. 34, pp. 18878–18890, 2021
[15] P. Sturmfels, S. Lundberg, and S.-I. Lee, “Visualizing the impact of feature attribution baselines,” Distill, vol. 5, no. 1, p. e22, 2020
[16] S. Hess, E. Nalisnick, and S.-I. Lee, “The middle child problem: Revisiting parametric linear models for attribution priors,” International Conference on Machine Learning, pp. 4210–4219, 2020
[17] G. Erion, J. D. Janizek, P. Sturmfels, S. Lundberg, and S.-I. Lee, “Learning explainable models using attribution priors,” arXiv preprint arXiv:1906.10670, 2019
[18] F. Xu, H. Uszkoreit, Y. Du, W. Fan, D. Zhao, and J. Zhu, “Explainable AI: A brief survey on history, research areas, approaches and challenges,” Journal of Software, vol. 31, no. 8, pp. 185–203, 2020
[19] A. Giovannelli et al., “Robust explainable AI: A survey on methods to evaluate and improve the robustness of XAI techniques,” arXiv preprint arXiv:2306.02968, 2023
[20] D. Garreau and U. von Luxburg, “Explaining with shortest paths on manifolds,” International Conference on Machine Learning, pp. 3429–3438, 2020
[21] A. Kiselev et al., “Unraveling the geometry of loss landscapes in deep learning,” Journal of Machine Learning Research, vol. 25, pp. 1–45, 2024
[22] T. Yu, S. Kumar, A. Gupta, S. Levine, K. Hausman, and C. Finn, “Gradient surgery for multi-task learning,” Advances in Neural Information Processing Systems, vol. 33, pp. 5824–5836, 2020
[23] Y. N. Dauphin, R. Pascanu, C. Gulcehre, K. Cho, S. Ganguli, and Y. Bengio, “Identifying and attacking the saddle point problem in high-dimensional non-convex optimization,” Advances in Neural Information Processing Systems, vol. 27, 2014
[24] F. Schäfer, M. Mutný, and A. Krause, “Continuous Pareto-optimality for multi-task learning,” International Conference on Machine Learning, pp. 9286–9296, 2021
[25] P. Spirtes, C. Glymour, and R. Scheines, Causation, Prediction, and Search, 2nd ed. Cambridge, MA: MIT Press, 2000
[26] M. Kalisch and P. Bühlmann, “Estimating high-dimensional directed acyclic graphs with the PC-algorithm,” Journal of Machine Learning Research, vol. 8, no. 3, pp. 613–636, 2007
