arxiv: 2604.13256 · v1 · submitted 2026-04-14 · 💻 cs.LG · cs.GR

Counterfactual Peptide Editing for Causal TCR--pMHC Binding Inference

Pith reviewed 2026-05-10 15:55 UTC · model grok-4.3

keywords TCR-pMHC binding · counterfactual learning · shortcut learning · causal inference · invariant prediction · immunoinformatics · machine learning

The pith

Counterfactual peptide edits at anchor positions train TCR-pMHC models to respect causal binding rules instead of data shortcuts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Neural models for TCR-pMHC binding often learn spurious correlations such as peptide length or V-gene patterns rather than the actual physical interface. The paper introduces Counterfactual Invariant Prediction (CIP), a training framework that generates constrained peptide edits: conservative substitutions away from the anchors and disruptive ones at anchor sites. Training then adds an invariance penalty for the conservative edits and a contrastive term that amplifies prediction shifts for anchor edits. Under family-held-out testing, where shortcuts break, the paper reports higher AUROC and greater counterfactual consistency than standard training. A reader would care because such models could support more reliable immune specificity predictions for vaccine design or autoimmunity studies.

Core claim

The central claim is that augmenting a base TCR-pMHC classifier with two auxiliary objectives yields predictions that capture causal binding determinants. The first is an invariance loss that penalizes prediction changes under conservative non-anchor substitutions; the second is a contrastive loss that encourages large changes under anchor disruptions. The evidence offered is improved performance and reduced shortcut reliance on a VDJdb-IEDB benchmark under family-held-out and distance-aware splits.
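To make the shape of such an objective concrete, here is a minimal sketch of how a base classification loss might be combined with an invariance term and a contrastive term. The functional forms (squared difference, hinge with a margin), the weights lam_inv and lam_con, and the function names are all assumptions for illustration; this page does not reproduce the paper's actual loss definitions.

```python
import math

def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one predicted probability p and label y in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def cip_style_loss(p_orig, y, p_nonanchor, p_anchor,
                   lam_inv=1.0, lam_con=1.0, margin=0.5):
    """Hypothetical combined objective for a single training example.

    p_orig      : predicted binding probability for the original peptide
    p_nonanchor : predictions for conservative non-anchor edits
    p_anchor    : predictions for disruptive anchor edits
    lam_inv, lam_con, margin are illustrative values, not taken from the paper.
    """
    base = bce(p_orig, y)
    # Invariance term: penalize any prediction shift under conservative edits.
    inv = sum((p - p_orig) ** 2 for p in p_nonanchor) / max(len(p_nonanchor), 1)
    # Contrastive term: hinge that vanishes once the prediction drops by >= margin.
    con = sum(max(0.0, margin - (p_orig - p)) for p in p_anchor) / max(len(p_anchor), 1)
    return base + lam_inv * inv + lam_con * con
```

Under this sketch, a model whose output is stable under non-anchor edits and drops sharply under anchor edits pays only the base classification loss, while a model that ignores anchor edits is penalized by the hinge term.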

What carries the argument

Counterfactual Invariant Prediction (CIP), the training framework that generates biologically constrained peptide edits and applies invariance and contrastive losses to enforce differential sensitivity based on anchor versus non-anchor positions.
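The edit-generation step can be pictured with a small sketch. It assumes a simplified class I convention in which the anchors are position 2 and the C-terminal residue, and it uses hand-picked physicochemical groups to decide what counts as a conservative swap. Both choices, along with every function name, are illustrative assumptions rather than the paper's procedure; real anchor positions are allele-specific.

```python
import random

# Hypothetical physicochemical groups used to define "conservative" swaps.
CONSERVATIVE_GROUPS = [
    set("AVLIM"),  # aliphatic / hydrophobic
    set("FWY"),    # aromatic
    set("ST"),     # small, hydroxyl-bearing
    set("NQ"),     # amide
    set("DE"),     # acidic
    set("KRH"),    # basic
]
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def anchor_positions(peptide):
    """Assumed anchors: position 2 and the C-terminal residue (0-based indices)."""
    return {1, len(peptide) - 1}

def conservative_edit(peptide, rng):
    """Swap one non-anchor residue for another residue from the same group."""
    anchors = anchor_positions(peptide)
    candidates = []
    for i, aa in enumerate(peptide):
        if i in anchors:
            continue
        for group in CONSERVATIVE_GROUPS:
            if aa in group:
                candidates.append((i, sorted(group - {aa})))
    if not candidates:
        return None  # no residue with a same-group alternative
    i, options = rng.choice(candidates)
    return peptide[:i] + rng.choice(options) + peptide[i + 1:]

def disruptive_edit(peptide, rng):
    """Replace one anchor residue with a residue from a different group."""
    i = rng.choice(sorted(anchor_positions(peptide)))
    same = next((g for g in CONSERVATIVE_GROUPS if peptide[i] in g), {peptide[i]})
    options = [aa for aa in AMINO_ACIDS if aa not in same]
    return peptide[:i] + rng.choice(options) + peptide[i + 1:]
```

Each call changes exactly one residue, so the original and edited peptide differ only in the factor being tested; residues outside every group (G, C, P in this sketch) are simply never chosen for conservative edits.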

If this is right

  • Models exhibit lower shortcut index and higher counterfactual consistency on family-held-out data splits.
  • Anchor-aware edit generation accounts for the majority of gains in out-of-distribution robustness.
  • Predictions become less brittle to variations in peptide length or gene co-occurrence that do not affect the binding interface.
  • The approach supplies a concrete recipe for incorporating causal constraints into other binding-prediction tasks.
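One plausible way to score counterfactual consistency is sketched below: count an edit as consistent when a non-anchor edit barely moves the prediction and an anchor edit lowers it by a clear margin. This is an assumed reading of the metric, since the paper's exact CFC definition is not reproduced on this page; the thresholds tol and margin and the record layout are hypothetical.

```python
def counterfactual_consistency(records, tol=0.1, margin=0.3):
    """Fraction of edits whose prediction shift matches the expected behavior.

    records : iterable of (kind, p_orig, p_edit), kind in {"non_anchor", "anchor"}
    tol and margin are illustrative thresholds, not values from the paper.
    """
    hits = 0
    total = 0
    for kind, p_orig, p_edit in records:
        shift = p_orig - p_edit
        if kind == "non_anchor":
            ok = abs(shift) <= tol      # prediction should stay put
        elif kind == "anchor":
            ok = shift >= margin        # prediction should drop clearly
        else:
            raise ValueError(f"unknown edit kind: {kind}")
        hits += ok
        total += 1
    return hits / total if total else float("nan")
```

A score near 1 would mean the model treats the two edit types differently in the intended direction; it says nothing by itself about whether the edits match measured binding.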

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same edit-generation strategy could be tested on MHC class II or other receptor-ligand systems where anchor motifs are known.
  • If the generated edits can be validated against structural databases, the framework might serve as a prior for physics-informed models of binding.
  • Extending the contrastive term to include known non-binding peptide variants from literature could further sharpen causal separation.

Load-bearing premise

The generated counterfactual peptide edits must accurately reflect biological reality by remaining conservative at non-anchor positions and disruptive at anchors in a manner that isolates true binding causality.

What would settle it

If real experimental binding assays show that the model’s invariance to non-anchor edits does not align with actual binding outcomes for independently verified mutations, while its sensitivity to anchor edits also fails to match known motif disruptions, the claim that CIP captures causal structure would be refuted.

Original abstract

Neural models for TCR-pMHC binding prediction are susceptible to shortcut learning: they exploit spurious correlations in training data (such as peptide length bias or V-gene co-occurrence) rather than the physical binding interface. This renders predictions brittle under family-held-out and distance-aware evaluation, where such shortcuts do not transfer. We introduce Counterfactual Invariant Prediction (CIP), a training framework that generates biologically constrained counterfactual peptide edits and enforces invariance to edits at non-anchor positions while amplifying sensitivity at MHC anchor residues. CIP augments the base classifier with two auxiliary objectives: (1) an invariance loss penalizing prediction changes under conservative non-anchor substitutions, and (2) a contrastive loss encouraging large prediction changes under anchor-position disruptions. Evaluated on a curated VDJdb-IEDB benchmark under family-held-out, distance-aware, and random splits, CIP achieves AUROC 0.831 and counterfactual consistency (CFC) 0.724 under the challenging family-held-out protocol (a 39.7% reduction in shortcut index relative to the unconstrained baseline). Ablations confirm that anchor-aware edit generation is the dominant driver of OOD gains, providing a practical recipe for causally-grounded TCR specificity modeling.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 3 minor

Summary. The paper introduces Counterfactual Invariant Prediction (CIP), a training framework for TCR-pMHC binding models that generates anchor-aware counterfactual peptide edits. It augments the base classifier with an invariance loss (penalizing prediction changes under conservative non-anchor substitutions) and a contrastive loss (encouraging large changes under anchor disruptions) to reduce shortcut learning from spurious correlations such as peptide length or V-gene co-occurrence. On a VDJdb-IEDB benchmark under family-held-out, distance-aware, and random splits, CIP reports AUROC 0.831 and counterfactual consistency (CFC) 0.724 in the family-held-out setting, corresponding to a 39.7% reduction in shortcut index relative to the unconstrained baseline, with ablations attributing gains primarily to anchor-aware edit generation.

Significance. If the generated edits prove biologically valid and the auxiliary losses demonstrably extract causal binding structure rather than additional regularization, CIP offers a practical, biologically motivated recipe for improving OOD generalization in immunological sequence models. The reported metrics and ablation results are consistent with reduced reliance on known shortcuts, which could benefit downstream applications such as TCR specificity prediction for immunotherapy. However, the absence of independent experimental validation against measured affinity changes under mutation leaves the causal interpretation plausible but not yet strongly supported.

major comments (3)
  1. [§3.2] §3.2 (Counterfactual Edit Generation): The claim that non-anchor edits are 'conservative' and 'biologically constrained' while anchor edits are 'disruptive' rests on prior knowledge of anchor positions, but the manuscript provides no quantitative check (e.g., predicted ΔΔG or comparison to known mutation databases) that these edits actually reflect physical binding causality rather than the chosen perturbation distribution.
  2. [§4.3] §4.3 and Table 2 (Family-held-out results): The 39.7% shortcut-index reduction and CFC=0.724 are presented as evidence of causal grounding, yet no external validation against experimental affinity measurements under single-residue mutations is reported; without this, the gains could equally result from improved robustness to length/V-gene artifacts.
  3. [Table 3] Ablation study (Table 3): While anchor-aware generation is identified as the dominant driver, the manuscript does not report whether removing the invariance loss alone (while keeping anchor-aware edits) still yields comparable OOD gains, which would help isolate whether the auxiliary objectives are extracting causality or simply acting as regularizers.
minor comments (3)
  1. [Eq. 7] The definition of the shortcut index (Eq. 7) should be expanded with an explicit formula and a brief derivation showing how it quantifies reliance on spurious features.
  2. [Figure 2] Figure 2 (edit examples) would benefit from an additional column showing the base model's prediction change for each edit to allow direct visual comparison with the CIP model.
  3. [§4.1] The family-held-out protocol description should clarify whether the held-out families overlap with the edit-generation prior or are completely disjoint.
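The disjointness question raised in minor comment 3 can be made checkable with a group-level split. The sketch below assumes each example carries a family label (the family_of accessor and the notion of "family" used here are hypothetical, since this page does not define them) and assigns whole families to either train or test.

```python
import random

def family_held_out_split(examples, family_of, test_fraction=0.2, seed=0):
    """Split so that no family appears in both train and test.

    examples  : list of records
    family_of : function mapping a record to its family label (hypothetical)
    """
    families = sorted({family_of(x) for x in examples})
    rng = random.Random(seed)
    rng.shuffle(families)
    # Hold out whole families, at least one.
    n_test = max(1, round(test_fraction * len(families)))
    test_families = set(families[:n_test])
    train = [x for x in examples if family_of(x) not in test_families]
    test = [x for x in examples if family_of(x) in test_families]
    return train, test, test_families
```

Returning the held-out family set makes it straightforward to intersect it with whatever families informed the edit-generation prior and confirm that the intersection is empty.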

Simulated Authors' Rebuttal

3 responses · 1 unresolved

We thank the referee for their constructive and detailed feedback. We address each major comment below and have revised the manuscript accordingly where feasible to strengthen the presentation and analysis.

point-by-point responses
  1. Referee: [§3.2] §3.2 (Counterfactual Edit Generation): The claim that non-anchor edits are 'conservative' and 'biologically constrained' while anchor edits are 'disruptive' rests on prior knowledge of anchor positions, but the manuscript provides no quantitative check (e.g., predicted ΔΔG or comparison to known mutation databases) that these edits actually reflect physical binding causality rather than the chosen perturbation distribution.

    Authors: We appreciate this observation. Our edit generation strategy is explicitly based on well-established prior knowledge of MHC anchor positions from the immunology literature, which defines non-anchor positions as more tolerant to substitution. We have added a clarifying paragraph in the revised Section 3.2 that explicitly states this reliance on prior biological knowledge, discusses the chosen perturbation distribution, and acknowledges the absence of new ΔΔG or database-based validation. We agree such quantitative checks would be valuable but are outside the current computational scope. revision: partial

  2. Referee: [§4.3] §4.3 and Table 2 (Family-held-out results): The 39.7% shortcut-index reduction and CFC=0.724 are presented as evidence of causal grounding, yet no external validation against experimental affinity measurements under single-residue mutations is reported; without this, the gains could equally result from improved robustness to length/V-gene artifacts.

    Authors: We agree that direct experimental validation against measured affinity changes would provide stronger support for a causal interpretation. Our work is a computational study focused on reducing shortcut learning via counterfactual training and demonstrating improved OOD generalization on family-held-out and distance-aware splits. We have expanded the discussion section to more clearly articulate this limitation and the possibility that gains partly reflect robustness to length and V-gene artifacts, while emphasizing that the CFC metric and shortcut-index reduction are designed to quantify invariance properties. revision: no

  3. Referee: [Table 3] Ablation study (Table 3): While anchor-aware generation is identified as the dominant driver, the manuscript does not report whether removing the invariance loss alone (while keeping anchor-aware edits) still yields comparable OOD gains, which would help isolate whether the auxiliary objectives are extracting causality or simply acting as regularizers.

    Authors: We thank the referee for this helpful suggestion to better isolate the role of the invariance loss. We have performed the requested ablation (anchor-aware edits without the invariance loss) and added the corresponding results to the revised Table 3. The new row shows that anchor-aware generation remains the primary driver of OOD gains, while the invariance loss contributes an additional reduction in shortcut reliance, helping to distinguish its effect from pure regularization. revision: yes

standing simulated objections not resolved
  • Absence of independent experimental validation against measured affinity changes under single-residue mutations, as this would require wet-lab experiments beyond the scope of the current computational study.

Circularity Check

0 steps flagged

No circularity: empirical metrics and auxiliary objectives are independent of self-definition.

full rationale

The paper defines CIP via auxiliary invariance and contrastive losses applied to externally generated counterfactual edits (using biological priors on anchor positions). Reported AUROC, CFC, and shortcut-index reduction are measured on family-held-out splits, not derived by construction from the training objectives or fitted parameters. No self-citation chain, uniqueness theorem, or ansatz is invoked to justify the core claims; the method augments a base classifier with new objectives whose outputs are evaluated externally. The derivation chain remains self-contained against the benchmark data.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach rests on domain knowledge about MHC anchor residues and the feasibility of generating valid counterfactual edits; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (1)
  • domain assumption: Anchor residues primarily determine binding specificity while non-anchor positions admit conservative substitutions that preserve binding behavior.
    This premise directly motivates the edit generation strategy and the design of the invariance versus contrastive losses.


