
arxiv: 2605.04460 · v2 · submitted 2026-05-06 · 💻 cs.LG

Discovering Sparse Counterfactual Factors via Latent Adjustment for Survey-based Community Intervention

Pith reviewed 2026-05-11 01:54 UTC · model grok-4.3

classification 💻 cs.LG
keywords sparse counterfactuals, latent adjustment, optimal transport, Shapley attribution, survey interventions, policy feasibility, transportation surveys, distributional alignment

The pith

A fixed-basis nonnegative latent space plus Shapley selection and entropy-regularized optimal transport yields sparse, policy-feasible adjustments that shift survey respondent groups toward a reference distribution.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a method for turning transportation survey data into sparse counterfactual interventions that move a target group toward a desired reference group through minimal changes to controllable variables. Most survey work stays descriptive or predictive; this approach instead solves a distributional alignment problem so that the resulting adjustments are both effective at population level and practical for policy makers because they remain sparse and explicitly quantified. The method first embeds responses in a fixed nonnegative latent representation that keeps pre- and post-intervention data comparable, selects relevant factors with Shapley attribution, and then learns group-level adjustments by minimizing an entropy-regularized optimal-transport cost together with an l_{2,1} sparsity penalty.

Core claim

The authors formulate sparse counterfactual community intervention as a policy-feasible distributional alignment task. They embed survey responses in a fixed-basis nonnegative latent representation that preserves pre/post comparability and supplies a stable map back to original variables. Target-relevant latent factors are identified by Shapley-guided attribution; feasible adjustments are then obtained by minimizing an entropy-regularized optimal-transport discrepancy between the post-intervention target distribution and the reference distribution, subject to a weighted l_{2,1} penalty that enforces shared policy-lever sparsity. Experiments on real transportation survey datasets are reported to yield compact, interpretable interventions with explicit adjustment magnitudes that improve population-level conversion while preserving intervention sparsity.
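
One way to write the objective described above is sketched below. The notation is assumed here for illustration and is not taken from the paper: \Delta is the matrix of group-level adjustments with one row per policy lever, \mu_T(\Delta) and \mu_R are the adjusted target and reference distributions with weights a and b, C is a pairwise cost matrix, \varepsilon is the entropy regularization parameter, \lambda is the penalty weight, and w_j are per-lever weights. The entropic term follows the standard form in Peyré and Cuturi.

```latex
\[
\min_{\Delta}\;
\mathrm{OT}_{\varepsilon}\bigl(\mu_T(\Delta),\, \mu_R\bigr)
\;+\; \lambda \sum_{j} w_j \,\lVert \Delta_{j,:} \rVert_2
\]
\[
\mathrm{OT}_{\varepsilon}(\mu, \nu)
= \min_{P \in U(a,b)}
\langle P, C \rangle
+ \varepsilon \sum_{i,j} P_{ij}\,\bigl(\log P_{ij} - 1\bigr),
\qquad
U(a,b) = \{\, P \ge 0 : P\mathbf{1} = a,\; P^{\top}\mathbf{1} = b \,\}
\]
```

Under this reading, \varepsilon and \lambda are the two free parameters, and any feasibility constraints on \Delta would enter as additional constraints on the first problem.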

What carries the argument

The argument rests on the fixed-basis nonnegative latent representation, which preserves pre/post comparability while providing a stable map from latent factors to original survey variables. It is combined with Shapley-guided factor selection and entropy-regularized optimal transport minimization under an l_{2,1} sparsity penalty.

If this is right

  • The framework produces compact and interpretable policy-feasible interventions with explicit adjustment magnitudes.
  • Population-level conversion from the target group toward the reference group improves after the adjustments.
  • Intervention sparsity is preserved through the weighted l_{2,1} penalty, focusing changes on shared policy levers.
  • The same pipeline works on multiple real-world transportation survey datasets without requiring changes to the core formulation.
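
To make the sparsity mechanism concrete, here is a minimal sketch of a weighted l_{2,1} norm and its proximal map (row-wise group soft-thresholding), which is a common way such a penalty zeroes out whole rows at once. The layout of `delta` (rows as policy levers), the function names, and the toy numbers are assumptions for illustration; the paper's actual optimizer is not described on this page.

```python
import math

def weighted_l21(delta, weights):
    """Weighted l_{2,1} norm: sum over rows j of w_j * ||delta[j]||_2."""
    return sum(w * math.sqrt(sum(v * v for v in row))
               for row, w in zip(delta, weights))

def prox_weighted_l21(delta, weights, step):
    """Row-wise group soft-thresholding: the proximal map of step * weighted_l21."""
    out = []
    for row, w in zip(delta, weights):
        norm = math.sqrt(sum(v * v for v in row))
        thresh = step * w
        # Rows whose norm falls below the threshold are set to zero entirely.
        scale = max(0.0, 1.0 - thresh / norm) if norm > 0 else 0.0
        out.append([scale * v for v in row])
    return out

# Hypothetical adjustments: each row is one lever (assumed layout).
delta = [[3.0, 4.0], [0.1, 0.0]]
weights = [1.0, 1.0]
penalty = weighted_l21(delta, weights)
shrunk = prox_weighted_l21(delta, weights, step=1.0)
```

In this toy case the large row is shrunk but kept, while the small row is removed entirely, which is the "shared lever" behavior the penalty is meant to encourage.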

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same latent-adjustment pipeline could be tested on non-transportation surveys such as health-behavior or energy-consumption questionnaires to check whether the sparsity and comparability properties transfer.
  • Because adjustments are expressed as explicit magnitudes on original variables, policy makers could directly translate the output into pilot programs with measurable costs.
  • If the reference group is itself time-varying, the method could be extended by recomputing the target alignment at successive time steps to track how intervention priorities evolve.

Load-bearing premise

The fixed-basis nonnegative latent representation preserves pre/post comparability and supplies a stable map from latent factors to original variables so that Shapley-selected factors translate directly into controllable survey adjustments.

What would settle it

If applying the learned group-level adjustments fails to reduce the entropy-regularized optimal-transport distance between the adjusted target distribution and the reference distribution below the distance obtained with no intervention, the central claim is falsified.
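
The check described above can be sketched with a small Sinkhorn routine. This is an illustrative stand-in, not the authors' code: the point clouds, the squared-Euclidean cost, the uniform weights, and the `eps` and `n_iter` values are all assumptions, and the function returns the transport cost of the entropic plan rather than the full regularized objective.

```python
import math

def entropic_ot_cost(xs, ys, eps=0.5, n_iter=300):
    """Transport cost <P, C> of the Sinkhorn plan between two uniform point clouds."""
    n, m = len(xs), len(ys)
    # Squared Euclidean cost between every pair of points.
    cost = [[sum((p - q) ** 2 for p, q in zip(x, y)) for y in ys] for x in xs]
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        # Alternate scaling updates so the plan matches both marginals.
        u = [(1.0 / n) / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [(1.0 / m) / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return sum(u[i] * kernel[i][j] * v[j] * cost[i][j]
               for i in range(n) for j in range(m))

# Hypothetical target and reference groups in a 2-D latent space.
target = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
reference = [[2.0, 2.0], [2.0, 3.0], [3.0, 2.0]]
shift = [2.0, 2.0]  # assumed group-level adjustment
adjusted = [[a + s for a, s in zip(row, shift)] for row in target]

before = entropic_ot_cost(target, reference)
after = entropic_ot_cost(adjusted, reference)
intervention_helps = after < before
```

If `intervention_helps` came out false on real data, that would correspond to the falsifying outcome described above.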

Figures

Figures reproduced from arXiv: 2605.04460 by Fatima Ashraf, Junbiao Pang, Muhammad Ayub Sabir, Yan Shang, Yufang Zhou.

Figure 1: Overview of the conversion-by-alignment framework for survey-based
Figure 2: MNIST 3→8 conversion: representative samples and latent-space movement
Figure 3: Optimization trajectory showing monotonic objective decrease and
Original abstract

Transportation surveys are widely used to understand travel preferences and adoption barriers, yet most survey-based analyses remain descriptive or predictive and rarely provide sparse, policy-feasible intervention strategies. We study sparse counterfactual community intervention from survey responses, where the goal is to shift a target respondent group toward a desired reference group through controllable survey-variable adjustments. We formulate this task as a policy-feasible distributional alignment problem using a fixed-basis nonnegative latent representation that preserves pre/post comparability and provides a stable map from latent factors to original variables. To make latent movement actionable, target-relevant latent factors are identified through Shapley-guided attribution and transferred to controllable variables as intervention priorities. Feasible group-level adjustments are then learned by minimizing an entropy-regularized optimal-transport discrepancy between the post-intervention target distribution and the reference distribution, together with a weighted $\ell_{2,1}$ penalty that promotes shared policy-lever sparsity. Experiments on real-world transportation survey datasets show that the proposed framework produces compact and interpretable policy-feasible interventions with explicit adjustment magnitudes, improves population-level conversion, and preserves intervention sparsity. Code and datasets are publicly available at: https://github.com/pangjunbiao/latent-group-alignment.git

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a framework for discovering sparse counterfactual factors to enable policy-feasible community interventions from transportation survey data. It employs a fixed-basis nonnegative latent representation to maintain pre/post comparability, applies Shapley-guided attribution to identify target-relevant latent factors, and learns group-level adjustments by minimizing an entropy-regularized optimal transport discrepancy to a reference distribution combined with a weighted ℓ_{2,1} penalty for shared sparsity. Experiments on real-world survey datasets are reported to produce compact, interpretable interventions with explicit magnitudes that improve population-level conversion while preserving sparsity.

Significance. If the central assumptions hold, particularly the stability of the latent representation under adjustment, the work could offer a practical bridge from descriptive survey analysis to actionable, sparse policy interventions in transportation and related domains. The public availability of code and datasets supports reproducibility, which is a clear strength. The combination of standard components (latent factor models, Shapley values, entropy-regularized OT) is applied in a domain-specific way, though the primary advance appears to be in the integrated pipeline rather than new theoretical machinery.

major comments (2)
  1. [Abstract] The central empirical claims ('improves population-level conversion' and 'preserves intervention sparsity') are stated without quantitative metrics, baselines, ablation results, error bars, or statistical tests. Because these claims are load-bearing for the paper's conclusions, the strength of the experimental support is difficult to evaluate.
  2. [Method] Method (latent representation and adjustment steps): the fixed-basis nonnegative latent map is asserted to 'preserve pre/post comparability' and provide a 'stable map from latent factors to original variables' after the entropy-regularized OT step plus ℓ_{2,1} penalty, but no derivation, invariance proof, or post-adjustment empirical check (e.g., preserved nonnegativity or loading stability in original variable space) is supplied. This directly undermines the claim that Shapley-selected factors translate into controllable, policy-feasible survey adjustments.
minor comments (2)
  1. [Abstract] The abstract would be strengthened by including at least one key quantitative result (e.g., conversion improvement percentage or sparsity metric) to give readers an immediate sense of effect size.
  2. [Method] Notation for the entropy regularization parameter and ℓ_{2,1} penalty weight should be explicitly defined with equations, as these are the only free parameters listed.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which highlight opportunities to strengthen the presentation of empirical results and the justification of the latent representation's stability. We address each major comment below and will revise the manuscript to incorporate the suggested improvements.

Point-by-point responses
  1. Referee: [Abstract] The central empirical claims ('improves population-level conversion' and 'preserves intervention sparsity') are stated without quantitative metrics, baselines, ablation results, error bars, or statistical tests. Because these claims are load-bearing for the paper's conclusions, the strength of the experimental support is difficult to evaluate.

    Authors: We agree that the abstract would be strengthened by including key quantitative results to better substantiate the central claims. The main text reports specific metrics (e.g., conversion improvements and sparsity levels relative to baselines), but these are not summarized in the abstract. In the revision, we will add concise quantitative statements, such as average percentage improvements in population-level conversion and achieved sparsity ratios, while keeping the abstract within length limits. This directly addresses the concern about evaluability of the empirical support. revision: yes

  2. Referee: [Method] Method (latent representation and adjustment steps): the fixed-basis nonnegative latent map is asserted to 'preserve pre/post comparability' and provide a 'stable map from latent factors to original variables' after the entropy-regularized OT step plus ℓ_{2,1} penalty, but no derivation, invariance proof, or post-adjustment empirical check (e.g., preserved nonnegativity or loading stability in original variable space) is supplied. This directly undermines the claim that Shapley-selected factors translate into controllable, policy-feasible survey adjustments.

    Authors: The fixed-basis nonnegative latent representation is constructed to use the same basis matrix before and after adjustment, which by design preserves the linear mapping from latent factors to original variables and ensures nonnegativity is maintained under the adjustment constraints. However, we acknowledge that the current manuscript provides no explicit derivation of invariance properties or post-adjustment empirical validation (such as checks on loading stability or nonnegativity preservation in variable space). We will add a short derivation sketch in the method section explaining the stability due to the fixed basis and include an empirical verification subsection in the experiments demonstrating that post-adjustment loadings remain stable and nonnegative. This will better support the policy-feasibility claims. revision: yes
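
The fixed-basis argument in the response above can be illustrated with a minimal sketch. It assumes a linear decoding of nonnegative codes through a shared nonnegative basis and a simple clip-at-zero rule for adjusted codes; the basis, codes, shift, and function names are hypothetical, and the clipping rule is an assumption rather than the paper's stated constraint.

```python
def decode(codes, basis):
    """Map latent codes (n x k) to survey variables (n x d) via a fixed basis (k x d)."""
    return [[sum(h * basis[f][j] for f, h in enumerate(row))
             for j in range(len(basis[0]))] for row in codes]

def adjust(codes, shift):
    """Add one shared group-level shift to every respondent, then clip at zero."""
    return [[max(0.0, h + s) for h, s in zip(row, shift)] for row in codes]

# Hypothetical nonnegative basis: 2 latent factors x 3 survey variables.
basis = [[1.0, 0.5, 0.0], [0.0, 0.5, 1.0]]
codes = [[0.2, 0.4], [0.6, 0.1]]
shift = [0.3, -0.2]  # assumed adjustment on the two factors

before = decode(codes, basis)
after = decode(adjust(codes, shift), basis)
```

Because the same `basis` is used for both decodes, `before` and `after` live in the same variable space, and nonnegative codes with a nonnegative basis give nonnegative decoded values. Note that clipping makes the effective shift differ across respondents whenever a code would go negative, which is one of the things a post-adjustment empirical check would need to examine.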

Circularity Check

0 steps flagged

No circularity: derivation applies standard OT, Shapley, and latent-factor tools without reduction to fitted inputs or self-citations

full rationale

The paper formulates the intervention task using a fixed-basis nonnegative latent representation, Shapley-guided factor selection, entropy-regularized optimal transport, and an ℓ_{2,1} penalty. These components are presented as standard techniques applied to survey data; the abstract states that the representation 'preserves pre/post comparability' as a modeling choice rather than deriving it from the OT step itself. No equation or claim reduces the central output (sparse policy-feasible adjustments) to a quantity defined solely by the fitted parameters or by a self-citation chain. The derivation chain therefore remains independent of its own outputs.

Axiom & Free-Parameter Ledger

2 free parameters · 3 axioms · 0 invented entities

The central claim depends on standard mathematical properties of optimal transport and Shapley values plus domain assumptions about the stability of the chosen latent representation; no new entities are postulated.

free parameters (2)
  • entropy regularization parameter
    Controls smoothness of the transport plan in the OT objective and must be chosen or tuned.
  • ℓ_{2,1} sparsity penalty weight
    Balances the group-sparsity term against the alignment objective and is a tunable hyperparameter.
axioms (3)
  • standard math Entropy-regularized optimal transport yields a feasible and differentiable alignment between two distributions.
    Invoked as the core discrepancy minimized between post-intervention target and reference distributions.
  • domain assumption Shapley values provide reliable attribution for identifying target-relevant latent factors.
    Used to select which latent factors to adjust for the intervention.
  • domain assumption Fixed-basis nonnegative latent representation preserves pre/post comparability and gives a stable linear map back to original variables.
    Stated explicitly as enabling actionable translation from latent movement to survey-variable adjustments.
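
For the Shapley axiom listed above, the following is a minimal sketch of exact Shapley attribution over a handful of latent factors, computed by averaging marginal contributions over all orderings. The value function `toy_value` is a made-up stand-in; how the paper scores a coalition of latent factors is not stated on this page, and exact enumeration is only practical for a small number of factors.

```python
from itertools import permutations

def shapley_values(n_players, value):
    """Exact Shapley values by averaging marginal contributions over all orderings."""
    totals = [0.0] * n_players
    orderings = list(permutations(range(n_players)))
    for order in orderings:
        coalition = set()
        for player in order:
            before = value(frozenset(coalition))
            coalition.add(player)
            totals[player] += value(frozenset(coalition)) - before
    return [t / len(orderings) for t in totals]

def toy_value(coalition):
    """Hypothetical score: additive factor weights plus a bonus when 0 and 1 co-occur."""
    weights = [1.0, 2.0, 3.0]
    bonus = 4.0 if {0, 1} <= coalition else 0.0
    return sum(weights[i] for i in coalition) + bonus

phi = shapley_values(3, toy_value)
```

In this toy game the interaction bonus is split evenly between factors 0 and 1, and the attributions sum to the value of the full coalition, which is the efficiency property that makes Shapley values attractive for ranking factors.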



