Concolic Testing on Individual Fairness of Neural Network Models

Chih-Duo Hong; Fang Yu; Ming-I Huang

arxiv: 2509.06864 · v2 · submitted 2025-09-08 · 💻 cs.LG · cs.SE

Concolic Testing on Individual Fairness of Neural Network Models

Ming-I Huang , Chih-Duo Hong , Fang Yu This is my paper

Pith reviewed 2026-05-18 17:55 UTC · model grok-4.3

classification 💻 cs.LG cs.SE

keywords individual fairnessconcolic testingdeep neural networksfairness verificationbias detectionformal methodsPyFair

0 comments

The pith

PyFair adapts concolic testing with a dual network architecture to verify individual fairness in deep neural networks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces PyFair as a formal framework for evaluating and verifying individual fairness of deep neural networks by adapting the concolic testing tool PyCT. It generates fairness-specific path constraints to systematically explore DNN behaviors relevant to individual fairness. The central innovation is a dual network architecture that enables comprehensive fairness assessments and provides completeness guarantees for certain network types. Evaluation on 25 benchmark models shows it can detect discriminatory instances and verify fairness, though scalability remains a challenge for complex models. A reader would care because it offers a rigorous method for testing fairness in pre-trained models used in high-stakes decisions.

Core claim

By adapting PyCT to produce fairness-specific path constraints and introducing a dual network architecture, PyFair can comprehensively assess and verify individual fairness in DNNs, delivering completeness guarantees for specific network types while detecting discriminatory behaviors in practice.

What carries the argument

The dual network architecture that supports the generation of fairness path constraints and enables both detection of unfairness and verification with completeness guarantees.

If this is right

PyFair can detect discriminatory instances in a variety of pre-trained DNNs.
It can verify fairness properties with completeness guarantees for certain types of networks.
The framework works on models that have already undergone bias mitigation techniques.
Scalability challenges appear when applying the method to more complex models.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Extending the dual architecture to other verification tasks like adversarial robustness could broaden its utility.
Embedding PyFair-like checks into model training loops might prevent fairness issues before deployment.
Testing on real-world datasets from domains like credit scoring would reveal practical effectiveness beyond benchmarks.

Load-bearing premise

That adapting PyCT to generate fairness-specific path constraints will systematically explore all DNN behaviors relevant to individual fairness without missing critical discriminatory paths.

What would settle it

A concrete counterexample would be a DNN that PyFair verifies as fair but that actually produces different outputs for two similar individuals, or a failure to detect known discrimination in one of the benchmark models.

Figures

Figures reproduced from arXiv: 2509.06864 by Chih-Duo Hong, Fang Yu, Ming-I Huang.

**Figure 2.** Figure 2: The 2-DNN obtained from the DNN in Figure [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗

**Figure 3.** Figure 3: An overview of the PyFair framework conditions. In this example, the solver identifies a solution x = 6 for the perturbed input, leading to a new test case [6, 5]. Feeding this case into the model changes its output from 0 to 1. Consequently, φ is a discriminatory instance and we conclude that the DNN is unfair. 4. Fairness Verification with PyFair Model fairness checking aims to exhaustively explore the i… view at source ↗

**Figure 4.** Figure 4: Tree T (top left), Tree T ′ (top right), and Queue Q (below) The 2 nd iteration: We give the constraints dequeued from Q to the SMT solver, which provides a solution x = 1, y = 0, x′ = 2 as a new test input. We repeat this procedure to gather more branches, as illustrated by Tree T ′ in [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗

**Figure 5.** Figure 5: Tree T (left) and Tree T ′ (right) explored by this third test input is depicted by Tree T ′ in [PITH_FULL_IMAGE:figures/full_fig_p009_5.png] view at source ↗

read the original abstract

This paper introduces PyFair, a formal framework for evaluating and verifying individual fairness of Deep Neural Networks (DNNs). By adapting the concolic testing tool PyCT, we generate fairness-specific path constraints to systematically explore DNN behaviors. Our key innovation is a dual network architecture that enables comprehensive fairness assessments and provides completeness guarantees for certain network types. We evaluate PyFair on 25 benchmark models, including those enhanced by existing bias mitigation techniques. Results demonstrate PyFair's efficacy in detecting discriminatory instances and verifying fairness, while also revealing scalability challenges for complex models. This work advances algorithmic fairness in critical domains by offering a rigorous, systematic method for fairness testing and verification of pre-trained DNNs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

PyFair adapts concolic testing to individual fairness via a dual-network setup, but completeness claims rest on shaky coverage assumptions for non-linear DNNs.

read the letter

The main thing here is that the authors took the existing PyCT concolic tester and reworked it for checking individual fairness in neural nets. The dual-network architecture is the central new element; it lets them generate fairness-specific path constraints and claim completeness guarantees for certain network types. They ran the thing on 25 benchmark models, including some that already had bias mitigation applied, and report that it can surface discriminatory inputs while also exposing scalability limits on bigger models. That last part is a useful honest note. The systematic path exploration from concolic testing is a sensible direction because it can target specific behaviors more deliberately than random sampling or purely black-box checks. Applying it to pre-trained models is also practical for real auditing scenarios. The soft spot is the completeness story. DNNs have non-linear activations and continuous input spaces, while the original PyCT works on discrete Python programs. Any adaptation has to introduce approximations for floating-point arithmetic and ReLU-style functions. Nothing in the description shows that these approximations actually preserve the required coverage over all regions where fairness violations could occur. The stress-test concern lands: the guarantees look more assumed than demonstrated. The abstract also skips concrete numbers on detection rates, false positives, or runtimes across model sizes, so the efficacy claims stay high-level. This is aimed at people who build or audit fairness tools for deployed neural nets and want something more structured than post-training fixes. A reader interested in verification techniques for ML could extract some concrete ideas from the constraint generation and dual setup. I would send it to peer review. The adaptation is concrete enough to merit expert feedback even if the theoretical parts need more work.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces PyFair, a formal framework adapting the concolic testing tool PyCT to generate fairness-specific path constraints for evaluating and verifying individual fairness of DNNs. The central innovation is a dual-network architecture claimed to enable comprehensive assessments and to provide completeness guarantees for certain network types. The approach is evaluated on 25 benchmark models (including bias-mitigated variants), with results asserted to demonstrate detection of discriminatory instances and verification of fairness, while noting scalability challenges for complex models.

Significance. If the completeness guarantees and systematic path exploration hold without critical coverage gaps, the work would provide a rigorous, formal method for fairness testing of pre-trained DNNs that goes beyond empirical or heuristic approaches, with potential impact in high-stakes domains.

major comments (2)

[Abstract / dual-network section] Abstract and § on dual-network architecture: the claim of completeness guarantees for certain network types is load-bearing for the central contribution, yet the adaptation of PyCT (originally for discrete Python programs) to continuous DNN inputs with non-linear activations introduces approximations for floating-point and ReLU-like operations whose coverage properties are not shown to be preserved.
[Evaluation section] Evaluation section: the abstract asserts efficacy on 25 models and detection of discriminatory instances, but supplies no quantitative results, error rates, coverage metrics, or comparison baselines, which is required to substantiate the performance claims.

minor comments (2)

Clarify the precise definition of individual fairness used and how the dual-network construction encodes it as path constraints.
Specify which network types receive completeness guarantees and under what architectural restrictions.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments on our manuscript. We address each major comment below, clarifying our approach to the completeness guarantees and committing to strengthen the quantitative presentation of results. We believe these revisions will address the concerns while preserving the core contributions of PyFair.

read point-by-point responses

Referee: [Abstract / dual-network section] Abstract and § on dual-network architecture: the claim of completeness guarantees for certain network types is load-bearing for the central contribution, yet the adaptation of PyCT (originally for discrete Python programs) to continuous DNN inputs with non-linear activations introduces approximations for floating-point and ReLU-like operations whose coverage properties are not shown to be preserved.

Authors: We agree that the completeness guarantees are central and that the manuscript must explicitly demonstrate preservation of coverage properties under the necessary approximations. The dual-network architecture maps continuous DNN inputs to symbolic constraints by treating ReLU activations as piecewise-linear and employing conservative over-approximations for floating-point operations that are designed not to miss discriminatory paths for the targeted network classes (e.g., those with bounded inputs and ReLU or linear activations). However, we acknowledge that a self-contained argument showing these approximations preserve the original PyCT coverage guarantees is not fully elaborated. In the revised version we will add a dedicated subsection in the dual-network architecture description that provides the conditions under which completeness holds and sketches the argument that the approximations remain sound for individual fairness verification. revision: yes
Referee: [Evaluation section] Evaluation section: the abstract asserts efficacy on 25 models and detection of discriminatory instances, but supplies no quantitative results, error rates, coverage metrics, or comparison baselines, which is required to substantiate the performance claims.

Authors: We accept that the abstract and evaluation section require more concrete quantitative support. The manuscript already contains experimental results across the 25 benchmark models (including bias-mitigated variants), but we agree that key metrics—such as path coverage percentages, number and rate of detected discriminatory instances, any observed false-positive rates, and direct comparisons against baseline fairness-testing approaches—are not sufficiently summarized or tabulated. We will revise the abstract to include a concise summary of these quantitative outcomes and expand the evaluation section with additional tables and discussion that report coverage metrics, detection statistics, and baseline comparisons. revision: yes

Circularity Check

0 steps flagged

No circularity: framework adaptation and dual-network claims are independent of inputs

full rationale

The paper presents PyFair as an adaptation of PyCT to generate fairness path constraints plus a dual-network architecture for assessments and completeness guarantees on certain DNN types. No equations, fitted parameters, or self-definitional reductions appear in the abstract or description; the completeness claim is stated as a property of the new architecture for restricted network classes rather than derived by renaming or fitting prior results. The derivation chain therefore remains self-contained against external benchmarks and does not reduce any prediction to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no explicit free parameters, background axioms, or newly postulated entities; the dual network is described only at the architectural level without further decomposition.

pith-pipeline@v0.9.0 · 5641 in / 1025 out tokens · 44304 ms · 2026-05-18T17:55:54.274544+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Our key innovation is a dual network architecture that enables comprehensive fairness assessments and provides completeness guarantees for certain network types.
IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean absolute_floor_iff_bare_distinguishability unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Thanks to our employment of SMT solvers, this exploration is complete when the neural network can be faithfully encoded in an SMT theory.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

35 extracted references · 35 canonical work pages · 1 internal anchor

[1]

Fair preprocessing: Towards understanding compositional fairness of data trans- formers,

S. Biswas and H. Rajan, “Fair preprocessing: Towards understanding compositional fairness of data trans- formers,” inProc. 29th ACM ESEC/FSE, 2021, pp. 981–993

work page 2021
[2]

Fairea: Model behavior mutation for bias mitig

M. Hort, J. M. Zhang, F. Sarro, and M. Harman, “Fairea: Model behavior mutation for bias mitig.” inProc. 29th ACM ESEC/FSE, 2021, pp. 994–1006

work page 2021
[3]

Ignorance and prejudice in sw fairness,

J. M. Zhang and M. Harman, “Ignorance and prejudice in sw fairness,” inProc. IEEE/ACM 43rd Int. Conf. Softw. Eng., 2021, pp. 1436–1447

work page 2021
[4]

False positives, negatives, and analyses: A rejoinder to machine bias,

A. W. Flores, K. Bechtel, and C. T. Lowenkamp, “False positives, negatives, and analyses: A rejoinder to machine bias,”Fed. Probation, Vol. 80, 2016, p. 38

work page 2016
[5]

Dastin,Amazon scraps secret AI recruiting tool that showed bias against women

J. Dastin,Amazon scraps secret AI recruiting tool that showed bias against women. Auerbach Publications, 2022

work page 2022
[6]

Certifying and removing disparate impact,

M. Feldman, S. A. Friedler, J. Moeller, C. Scheidegger, and S. Venkatasubramanian, “Certifying and removing disparate impact,” inACM SIGKDD Int. Conf. Knowl. Discov. Data Min., 2015, pp. 259–268

work page 2015
[7]

Fairness through awareness,

C. Dwork, M. Hardt, T. Pitassi, O. Reingold, and R. Zemel, “Fairness through awareness,” inProc. 3rd Innov. Theor. Comput. Sci. Conf., 2012, pp. 214–226

work page 2012
[8]

Fairness testing: Testing software for discrimination,

S. Galhotra, Y. Brun, and A. Meliou, “Fairness testing: Testing software for discrimination,” inProc. 11th Joint Meeting Found. Softw. Eng., 2017, pp. 498–510

work page 2017
[9]

Aequitas: A Bias and Fairness Audit Toolkit

P. Saleiro, B. Kuester, L. Hinkson, J. London, A. Stevens, A. Anisfeld, K. T. Rodolfa, and R. Ghani, “Aequitas: Bias and fairness audit toolkit,”arXiv preprint arXiv:1811.05577, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[10]

White-box fairness testing via adversarial sampling,

P. Zhang, J. Wang, J. Sun, G. Dong, X. Wang, X. Wang, J. S. Dong, and T. Dai, “White-box fairness testing via adversarial sampling,” inProc. ACM/IEEE 42nd Int. Conf. Softw. Eng., 2020, pp. 949–960

work page 2020
[11]

Efficient fairness testing via gradient search,

L. Zhang, Y. Zhang, and M. Zhang, “Efficient fairness testing via gradient search,” inProc. 30th ACM SIGSOFT Int. Symp. Softw. Test. Anal., 2021, pp. 103–114

work page 2021
[12]

Neuronfair: Interp. white-box fairness testing w/ biased neuron ident

H. Zheng, Z. Chen, T. Du, X. Zhang, Y. Cheng, S. Ji, J. Wang, Y. Yu, and J. Chen, “Neuronfair: Interp. white-box fairness testing w/ biased neuron ident.” inProc. 44th Int. Conf. Softw. Eng., 2022, pp. 1–13

work page 2022
[13]

Pyct: A python concolic tester,

Y.-F. Chen, W.-L. Tsai, W.-C. Wu, D.-D. Yen, and F. Yu, “Pyct: A python concolic tester,” inProc. 19th Asian Symp. Prog. Lang. Syst.Springer, 2021, pp. 38–46

work page 2021
[14]

Constraint-based adversarial example synthesis,

F. Yu, Y.-Y. Chi, and Y.-F. Chen, “Constraint-based adversarial example synthesis,”arXiv preprint arXiv:2406.01219, 2024

work page arXiv 2024
[15]

cvc5: A versatile and industrial-strength smt solver,

H. Barbosa, C. Barrett, M. Brain, G. Kremer, H. Lachnitt, M. Mann, A. Mohamed, M. Mohamed, A. Niemetz, and A. N ¨otzli, “cvc5: A versatile and industrial-strength smt solver,” inInt. Conf. Tools Algorithms Constr. Anal. Syst.Springer, 2022, pp. 415–442

work page 2022
[16]

Fairify: Fairness verification of NNs,

S. Biswas and H. Rajan, “Fairify: Fairness verification of NNs,” in2023 IEEE/ACM 45th Int. Conf. Softw. Eng.IEEE, 2023, pp. 1546–1558

work page 2023
[17]

Black-box fairness testing of ML models,

A. Aggarwal, P. Lohia, S. Nagar, K. Dey, and D. Saha, “Black-box fairness testing of ML models,” inProc. 27th ACM ESEC/FSE, 2019, pp. 625–635

work page 2019
[18]

Adversarial sampling for fairness testing in DNNs,

T. Ige, W. Marfo, J. Tonkinson, S. Adewale, and B. H. Matti, “Adversarial sampling for fairness testing in DNNs,”arXiv preprint arXiv:2303.02874, 2023. Concolic Testing of Neural Network Fairness 13

work page arXiv 2023
[19]

Fairsquare: Probabilistic verification of program fairness,

A. Albarghouthi, L. D’ Antoni, S. Drews, and A. V. Nori, “Fairsquare: Probabilistic verification of program fairness,”Proc. ACM Prog. Lang., Vol. 1, 2017, pp. 1–30

work page 2017
[20]

Probabilistic verification of fairness via concentration,

O. Bastani, X. Zhang, and A. Solar-Lezama, “Probabilistic verification of fairness via concentration,”Proc. ACM Prog. Lang., Vol. 3, 2019, pp. 1–27

work page 2019
[21]

Verifying individual fairness in ML models,

P. G. John, D. Vijaykeerthy, and D. Saha, “Verifying individual fairness in ML models,” inConf. Uncertainty in AI, 2020, pp. 749–758

work page 2020
[22]

Learning individually fair reprs

A. Ruoss, M. Balunovic, M. Fischer, and M. Vechev, “Learning individually fair reprs.”Adv. Neural Inf. Process. Syst., Vol. 33, 2020, pp. 7584–7596

work page 2020
[23]

Perfectly parallel fairness cert. of neural nets,

C. Urban, M. Christakis, V. W¨ ustholz, and F. Zhang, “Perfectly parallel fairness cert. of neural nets,”Proc. ACM Program. Lang., Vol. 4, no. OOPSLA, 2020, pp. 1–30

work page 2020
[24]

Training individually fair ML w/ sensitive subspace robustness,

M. Yurochkin, A. Bower, and Y. Sun, “Training individually fair ML w/ sensitive subspace robustness,” arXiv preprint arXiv:1907.00020, 2019

work page arXiv 1907
[25]

Faire: Repairing fairness via neuron synthesis,

T. Li, X. Xie, J. Wang, Q. Guo, A. Liu, L. Ma, and Y. Liu, “Faire: Repairing fairness via neuron synthesis,” ACM Trans. Softw. Eng. Methodol., Vol. 33, 2023, pp. 1–24

work page 2023
[26]

Certifair: Certified global fairness of NNs,

H. Khedr and Y. Shoukry, “Certifair: Certified global fairness of NNs,” inProc. AAAI Conf. Artif. Intell., Vol. 37, 2023, pp. 8237–8245

work page 2023
[27]

Feta: Fairness enforced NN algorithms,

K. Mohammadi, A. Sivaraman, and G. Farnadi, “Feta: Fairness enforced NN algorithms,” inProc. 3rd ACM Conf. Equity Access Algorithms Mech. Optim., 2023, pp. 1–11

work page 2023
[28]

Concolic testing for DNNs,

Y. Sun, M. Wu, W. Ruan, X. Huang, M. Kwiatkowska, and D. Kroening, “Concolic testing for DNNs,” in Proc. 33rd ACM/IEEE Int. Conf. Autom. Softw. Eng., 2018, pp. 109–119

work page 2018
[29]

Deepxplore: Automated white-box testing of DNNs,

K. Pei, Y. Cao, J. Yang, and S. Jana, “Deepxplore: Automated white-box testing of DNNs,” inSymp. Oper. Syst. Principles, 2017, pp. 1–18

work page 2017
[30]

Deepgauge: Multi-granularity testing for DNN systems,

L. Ma, F. Juefei-Xu, F. Zhang, J. Sun, M. Xue, B. Li, C. Chen, T. Su, L. Li, and Y. Liu, “Deepgauge: Multi-granularity testing for DNN systems,” inInt. Conf. Autom. Softw. Eng., 2018, pp. 120–131

work page 2018
[31]

Deepcon: Contribution coverage testing for DNN systems,

Z. Zhou, W. Dou, J. Liu, C. Zhang, J. Wei, and D. Ye, “Deepcon: Contribution coverage testing for DNN systems,” inIEEE Int. Conf. Softw. Anal., Evol. and Reeng.IEEE, 2021, pp. 189–200

work page 2021
[32]

Deepconcolic: Testing and debugging DNNs,

Y. Sun, X. Huang, D. Kroening, J. Sharp, M. Hill, and R. Ashmore, “Deepconcolic: Testing and debugging DNNs,” inIEEE/ACM Int. Conf. Softw. Eng. Companion. IEEE, 2019, pp. 111–114

work page 2019
[33]

Z3: Efficient smt solver,

L. De Moura and N. Bjørner, “Z3: Efficient smt solver,” inInt. Conf. Tools and Algorithms for Constr. and Anal. of Syst.Springer, 2008, pp. 337–340

work page 2008
[34]

Automated directed fairness testing,

S. Udeshi, P. Arora, and S. Chattopadhyay, “Automated directed fairness testing,” inInt. Conf. Autom. Softw. Eng., 2018, pp. 98–108

work page 2018
[35]

Majority voting in pattern recognition,

L. Lam and S. Y. Suen, “Majority voting in pattern recognition,”IEEE Trans. Syst. Man Cybern. A: Syst. Hum., Vol. 27, no. 5, 1997, pp. 553–568. 14 Ming-I Huang, Chih-Duo Hong, and Fang Yu Algorithm 2:2-DNN Computation Input:Original modelMand a list of PA indicesPA Output:The weight matrixw 2 of the 2-DNN forM 1w←the weight matrix ofM;// Retrieve weights ...

work page 1997

[1] [1]

Fair preprocessing: Towards understanding compositional fairness of data trans- formers,

S. Biswas and H. Rajan, “Fair preprocessing: Towards understanding compositional fairness of data trans- formers,” inProc. 29th ACM ESEC/FSE, 2021, pp. 981–993

work page 2021

[2] [2]

Fairea: Model behavior mutation for bias mitig

M. Hort, J. M. Zhang, F. Sarro, and M. Harman, “Fairea: Model behavior mutation for bias mitig.” inProc. 29th ACM ESEC/FSE, 2021, pp. 994–1006

work page 2021

[3] [3]

Ignorance and prejudice in sw fairness,

J. M. Zhang and M. Harman, “Ignorance and prejudice in sw fairness,” inProc. IEEE/ACM 43rd Int. Conf. Softw. Eng., 2021, pp. 1436–1447

work page 2021

[4] [4]

False positives, negatives, and analyses: A rejoinder to machine bias,

A. W. Flores, K. Bechtel, and C. T. Lowenkamp, “False positives, negatives, and analyses: A rejoinder to machine bias,”Fed. Probation, Vol. 80, 2016, p. 38

work page 2016

[5] [5]

Dastin,Amazon scraps secret AI recruiting tool that showed bias against women

J. Dastin,Amazon scraps secret AI recruiting tool that showed bias against women. Auerbach Publications, 2022

work page 2022

[6] [6]

Certifying and removing disparate impact,

M. Feldman, S. A. Friedler, J. Moeller, C. Scheidegger, and S. Venkatasubramanian, “Certifying and removing disparate impact,” inACM SIGKDD Int. Conf. Knowl. Discov. Data Min., 2015, pp. 259–268

work page 2015

[7] [7]

Fairness through awareness,

C. Dwork, M. Hardt, T. Pitassi, O. Reingold, and R. Zemel, “Fairness through awareness,” inProc. 3rd Innov. Theor. Comput. Sci. Conf., 2012, pp. 214–226

work page 2012

[8] [8]

Fairness testing: Testing software for discrimination,

S. Galhotra, Y. Brun, and A. Meliou, “Fairness testing: Testing software for discrimination,” inProc. 11th Joint Meeting Found. Softw. Eng., 2017, pp. 498–510

work page 2017

[9] [9]

Aequitas: A Bias and Fairness Audit Toolkit

P. Saleiro, B. Kuester, L. Hinkson, J. London, A. Stevens, A. Anisfeld, K. T. Rodolfa, and R. Ghani, “Aequitas: Bias and fairness audit toolkit,”arXiv preprint arXiv:1811.05577, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[10] [10]

White-box fairness testing via adversarial sampling,

P. Zhang, J. Wang, J. Sun, G. Dong, X. Wang, X. Wang, J. S. Dong, and T. Dai, “White-box fairness testing via adversarial sampling,” inProc. ACM/IEEE 42nd Int. Conf. Softw. Eng., 2020, pp. 949–960

work page 2020

[11] [11]

Efficient fairness testing via gradient search,

L. Zhang, Y. Zhang, and M. Zhang, “Efficient fairness testing via gradient search,” inProc. 30th ACM SIGSOFT Int. Symp. Softw. Test. Anal., 2021, pp. 103–114

work page 2021

[12] [12]

Neuronfair: Interp. white-box fairness testing w/ biased neuron ident

H. Zheng, Z. Chen, T. Du, X. Zhang, Y. Cheng, S. Ji, J. Wang, Y. Yu, and J. Chen, “Neuronfair: Interp. white-box fairness testing w/ biased neuron ident.” inProc. 44th Int. Conf. Softw. Eng., 2022, pp. 1–13

work page 2022

[13] [13]

Pyct: A python concolic tester,

Y.-F. Chen, W.-L. Tsai, W.-C. Wu, D.-D. Yen, and F. Yu, “Pyct: A python concolic tester,” inProc. 19th Asian Symp. Prog. Lang. Syst.Springer, 2021, pp. 38–46

work page 2021

[14] [14]

Constraint-based adversarial example synthesis,

F. Yu, Y.-Y. Chi, and Y.-F. Chen, “Constraint-based adversarial example synthesis,”arXiv preprint arXiv:2406.01219, 2024

work page arXiv 2024

[15] [15]

cvc5: A versatile and industrial-strength smt solver,

H. Barbosa, C. Barrett, M. Brain, G. Kremer, H. Lachnitt, M. Mann, A. Mohamed, M. Mohamed, A. Niemetz, and A. N ¨otzli, “cvc5: A versatile and industrial-strength smt solver,” inInt. Conf. Tools Algorithms Constr. Anal. Syst.Springer, 2022, pp. 415–442

work page 2022

[16] [16]

Fairify: Fairness verification of NNs,

S. Biswas and H. Rajan, “Fairify: Fairness verification of NNs,” in2023 IEEE/ACM 45th Int. Conf. Softw. Eng.IEEE, 2023, pp. 1546–1558

work page 2023

[17] [17]

Black-box fairness testing of ML models,

A. Aggarwal, P. Lohia, S. Nagar, K. Dey, and D. Saha, “Black-box fairness testing of ML models,” inProc. 27th ACM ESEC/FSE, 2019, pp. 625–635

work page 2019

[18] [18]

Adversarial sampling for fairness testing in DNNs,

T. Ige, W. Marfo, J. Tonkinson, S. Adewale, and B. H. Matti, “Adversarial sampling for fairness testing in DNNs,”arXiv preprint arXiv:2303.02874, 2023. Concolic Testing of Neural Network Fairness 13

work page arXiv 2023

[19] [19]

Fairsquare: Probabilistic verification of program fairness,

A. Albarghouthi, L. D’ Antoni, S. Drews, and A. V. Nori, “Fairsquare: Probabilistic verification of program fairness,”Proc. ACM Prog. Lang., Vol. 1, 2017, pp. 1–30

work page 2017

[20] [20]

Probabilistic verification of fairness via concentration,

O. Bastani, X. Zhang, and A. Solar-Lezama, “Probabilistic verification of fairness via concentration,”Proc. ACM Prog. Lang., Vol. 3, 2019, pp. 1–27

work page 2019

[21] [21]

Verifying individual fairness in ML models,

P. G. John, D. Vijaykeerthy, and D. Saha, “Verifying individual fairness in ML models,” inConf. Uncertainty in AI, 2020, pp. 749–758

work page 2020

[22] [22]

Learning individually fair reprs

A. Ruoss, M. Balunovic, M. Fischer, and M. Vechev, “Learning individually fair reprs.”Adv. Neural Inf. Process. Syst., Vol. 33, 2020, pp. 7584–7596

work page 2020

[23] [23]

Perfectly parallel fairness cert. of neural nets,

C. Urban, M. Christakis, V. W¨ ustholz, and F. Zhang, “Perfectly parallel fairness cert. of neural nets,”Proc. ACM Program. Lang., Vol. 4, no. OOPSLA, 2020, pp. 1–30

work page 2020

[24] [24]

Training individually fair ML w/ sensitive subspace robustness,

M. Yurochkin, A. Bower, and Y. Sun, “Training individually fair ML w/ sensitive subspace robustness,” arXiv preprint arXiv:1907.00020, 2019

work page arXiv 1907

[25] [25]

Faire: Repairing fairness via neuron synthesis,

T. Li, X. Xie, J. Wang, Q. Guo, A. Liu, L. Ma, and Y. Liu, “Faire: Repairing fairness via neuron synthesis,” ACM Trans. Softw. Eng. Methodol., Vol. 33, 2023, pp. 1–24

work page 2023

[26] [26]

Certifair: Certified global fairness of NNs,

H. Khedr and Y. Shoukry, “Certifair: Certified global fairness of NNs,” inProc. AAAI Conf. Artif. Intell., Vol. 37, 2023, pp. 8237–8245

work page 2023

[27] [27]

Feta: Fairness enforced NN algorithms,

K. Mohammadi, A. Sivaraman, and G. Farnadi, “Feta: Fairness enforced NN algorithms,” inProc. 3rd ACM Conf. Equity Access Algorithms Mech. Optim., 2023, pp. 1–11

work page 2023

[28] [28]

Concolic testing for DNNs,

Y. Sun, M. Wu, W. Ruan, X. Huang, M. Kwiatkowska, and D. Kroening, “Concolic testing for DNNs,” in Proc. 33rd ACM/IEEE Int. Conf. Autom. Softw. Eng., 2018, pp. 109–119

work page 2018

[29] [29]

Deepxplore: Automated white-box testing of DNNs,

K. Pei, Y. Cao, J. Yang, and S. Jana, “Deepxplore: Automated white-box testing of DNNs,” inSymp. Oper. Syst. Principles, 2017, pp. 1–18

work page 2017

[30] [30]

Deepgauge: Multi-granularity testing for DNN systems,

L. Ma, F. Juefei-Xu, F. Zhang, J. Sun, M. Xue, B. Li, C. Chen, T. Su, L. Li, and Y. Liu, “Deepgauge: Multi-granularity testing for DNN systems,” inInt. Conf. Autom. Softw. Eng., 2018, pp. 120–131

work page 2018

[31] [31]

Deepcon: Contribution coverage testing for DNN systems,

Z. Zhou, W. Dou, J. Liu, C. Zhang, J. Wei, and D. Ye, “Deepcon: Contribution coverage testing for DNN systems,” inIEEE Int. Conf. Softw. Anal., Evol. and Reeng.IEEE, 2021, pp. 189–200

work page 2021

[32] [32]

Deepconcolic: Testing and debugging DNNs,

Y. Sun, X. Huang, D. Kroening, J. Sharp, M. Hill, and R. Ashmore, “Deepconcolic: Testing and debugging DNNs,” inIEEE/ACM Int. Conf. Softw. Eng. Companion. IEEE, 2019, pp. 111–114

work page 2019

[33] [33]

Z3: Efficient smt solver,

L. De Moura and N. Bjørner, “Z3: Efficient smt solver,” inInt. Conf. Tools and Algorithms for Constr. and Anal. of Syst.Springer, 2008, pp. 337–340

work page 2008

[34] [34]

Automated directed fairness testing,

S. Udeshi, P. Arora, and S. Chattopadhyay, “Automated directed fairness testing,” inInt. Conf. Autom. Softw. Eng., 2018, pp. 98–108

work page 2018

[35] [35]

Majority voting in pattern recognition,

L. Lam and S. Y. Suen, “Majority voting in pattern recognition,”IEEE Trans. Syst. Man Cybern. A: Syst. Hum., Vol. 27, no. 5, 1997, pp. 553–568. 14 Ming-I Huang, Chih-Duo Hong, and Fang Yu Algorithm 2:2-DNN Computation Input:Original modelMand a list of PA indicesPA Output:The weight matrixw 2 of the 2-DNN forM 1w←the weight matrix ofM;// Retrieve weights ...

work page 1997