pith. machine review for the scientific record. sign in

arxiv: 2604.11272 · v1 · submitted 2026-04-13 · 💻 cs.LG · cs.AI

Recognition: no theorem link

AbLWR:A Context-Aware Listwise Ranking Framework for Antibody-Antigen Binding Affinity Prediction via Positive-Unlabeled Learning

Authors on Pith no claims yet

Pith reviewed 2026-05-10 15:57 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords antibody-antigen affinitylistwise rankingpositive-unlabeled learningcontrastive learningself-attentionbinding predictiontherapeutic design
0
0 comments X

The pith

AbLWR reformulates antibody-antigen affinity prediction as listwise ranking to handle sparse labels and subtle variations better.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces AbLWR to improve prediction of how well antibodies bind to antigens, a key step in designing therapies. It changes the standard approach of directly predicting affinity numbers into a ranking task where candidates are ordered by predicted strength. Positive-unlabeled learning helps deal with the fact that many potential non-binders are unlabeled by using contrastive methods and smart label adjustment. Sampling similar antigens and using attention to compare them within lists allows the model to notice small differences that matter for distinguishing mutations. This results in more accurate top picks for lab testing, shown by better performance on real cases like flu and IL-33.

Core claim

AbLWR reformulates the conventional affinity regression task as a listwise ranking problem. To mitigate label sparsity, it incorporates a PU learning mechanism leveraging a dual-level contrastive objective and meta-optimized label refinement to learn robust representations. It addresses antigenic variation by employing a homologous antigen sampling strategy where Multi-Head Self-Attention explicitly models inter-sample relationships within training lists to capture subtle affinity nuances. Experiments demonstrate that AbLWR significantly outperforms state-of-the-art baselines, improving the Precision@1 by over 10% in randomized cross-validation experiments, with case studies validating its实用

What carries the argument

The listwise ranking reformulation with positive-unlabeled learning that uses dual-level contrastive objectives and meta-optimized label refinement, combined with homologous antigen sampling and multi-head self-attention to model inter-sample relationships.

If this is right

  • Improved precision in selecting top antibody candidates for laboratory screening.
  • More reliable distinction of subtle differences caused by viral mutations.
  • Reduced impact of missing labels on the quality of affinity predictions.
  • Better prioritization of candidates in therapeutic antibody design workflows.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The method could extend to other sparse-label ranking tasks in biology such as protein-protein interactions.
  • If the attention mechanism generalizes, it might apply to modeling relationships in other sequence-based predictions.
  • Success here suggests that listwise approaches may outperform pointwise regression in similar affinity or binding problems.

Load-bearing premise

The dual-level contrastive objective and meta-optimized label refinement in positive-unlabeled learning, together with attention on homologous antigens, can capture subtle affinity differences without introducing bias from the specific training data.

What would settle it

Running the model on a held-out set of antibody-antigen pairs with independently measured affinities and finding that the top-ranked predictions do not match the true highest-affinity ones better than existing methods.

Figures

Figures reproduced from arXiv: 2604.11272 by Dongxu Zhang, Fan Xu, Haohuai He, Kay Chen Tan, Wei Liu, Yao Hu, Yidong Song, Zhi-An Huang.

Figure 1
Figure 1. Figure 1: Key challenges in Ab-Ag binding affinity prediction. (a) Distinct affinity profiles of multiple antibodies binding to the same antigen. (b) The large reservoir of unlabeled data remains largely unexploited by existing supervised paradigms, compared with the immense volume of antibody and antigen sequences. (c) Analysis of existing binding affinity data from public datasets (refer to Appendix Section A.1 fo… view at source ↗
Figure 2
Figure 2. Figure 2: The overall framework of AbLWR. The architecture comprises three key components: (a) PU Pre-training: Dual GNN encoders learn robust representations from the combined dataset (D) by jointly optimizing a composite contrastive objective (LIns + LClus) and a cross entropy loss (LCE) with meta-reweighted pseudo-labels. (b) List Sampling: A homologous sampling strategy constructs informative training lists T ba… view at source ↗
Figure 4
Figure 4. Figure 4: Case studies on FNI17 antibody mechanism and can￾didates screening for IL-33 antigen. (a) Comparison of ground truth ranks and predicted ranks. Lines connect the same strain across the two rankings. (b) Dual-axis plot showing the ground truth values and predicted scores for the corresponding viral strains. (c) Hit rate curve for identifying the top-1 binder to the IL-33 antigen. (d) Recall curve showing th… view at source ↗
Figure 3
Figure 3. Figure 3: Visualization of ranking predictions on 18 sampled lists. The heatmap displays the ground truth ranks mapped to the model’s predicted positions. Each row represents a test list colored by their ground truth rank, and columns Pred1–Pred5 denote the predicted order. The adjacent bar chart shows the corresponding Kτ calculated for each list row. We further analyzed the influenza virus dataset from two complem… view at source ↗
Figure 5
Figure 5. Figure 5: Binding affinity statistics. (a) Affinity variance per antigen. (b) Affinity distribution for the top 10 most represented antigen groups. whether the model can effectively screen for binders that possess low sequence identity to the majority of the training data, a critical requirement for discovering diverse therapeutic candidates. To facilitate comparison with baseline methods that require explicit bindi… view at source ↗
Figure 6
Figure 6. Figure 6: Visualization of sequence clustering. (a) Antigen sequence clustering. Cluster 0 is identified as a divergent group distinct from the main distribution and is used to define the Ag-based split. (b) Antibody sequence clustering based on CDRs. Cluster 1 and Cluster 5 exhibit significant deviation from the main trend and are selected for the Ab-based split. In both scenarios, these distinct clusters serve as … view at source ↗
Figure 7
Figure 7. Figure 7: Detailed visualization of ranking performance on (a) influenza viruses and (b) human IL-33. The comparison includes (left) Confusion Matrices between predicted and ground truth ranks, (middle) Distribution of Ground Truth Ranks for candidates predicted as Top-1, and (right) Boxplots showing the true affinity value distribution across predicted rank positions [PITH_FULL_IMAGE:figures/full_fig_p021_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Visualization of ranking predictions on 18 sampled lists by four representative data driven baseline models. (a) AntiBERTy (sequence-based); (b) AbRank (structure-based pairwise ranking); (c) GraphDTA (structure-based); and (d) ANTIPASTI (complex-based). 22 [PITH_FULL_IMAGE:figures/full_fig_p022_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Analysis of FNI17 antibody predictions by four baselines. Models: (a) AntiBERTy (sequence-based); (b) AbRank (structure￾based pairwise ranking); (c) GraphDTA (structure-based); (d) ANTIPASTI (complex-based). 23 [PITH_FULL_IMAGE:figures/full_fig_p023_9.png] view at source ↗
read the original abstract

Accurate prediction of antibody-antigen binding affinity is fundamental to therapeutic design, yet remains constrained by severe label sparsity and the complexity of antigenic variations. In this paper, we propose AbLWR (Antibody-antigen binding affinity List-Wise Ranking), a novel framework that reformulates the conventional affinity regression task as a listwise ranking problem. To mitigate label sparsity, AbLWR incorporates a PU (Positive-Unlabeled) learning mechanism leveraging a dual-level contrastive objective and meta-optimized label refinement to learn robust representations. Furthermore, we address antigenic variation by employing a homologous antigen sampling strategy where Multi-Head Self-Attention (MHSA) explicitly models inter-sample relationships within training lists to capture subtle affinity nuances. Extensive experiments demonstrate that AbLWR significantly outperforms state-of-the-art baselines, improving the Precision@1 (P@1) by over 10$\%$ in randomized cross-validation experiments. Notably, case studies on Influenza and IL-33 validate its practical utility, demonstrating robust ranking consistency in distinguishing subtle viral mutations and efficiently prioritizing top-tier candidates for wet-lab screening.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 3 minor

Summary. The manuscript proposes AbLWR, a context-aware listwise ranking framework for antibody-antigen binding affinity prediction. It reformulates the conventional regression task as listwise ranking, incorporates positive-unlabeled (PU) learning via a dual-level contrastive objective and meta-optimized label refinement to address label sparsity, and uses homologous antigen sampling with multi-head self-attention (MHSA) to model inter-sample relationships and capture subtle affinity differences. The central claims are that AbLWR significantly outperforms state-of-the-art baselines (improving Precision@1 by over 10% in randomized cross-validation) and demonstrates practical utility in case studies on Influenza and IL-33 for distinguishing viral mutations and prioritizing candidates.

Significance. If the reported gains hold under homology-aware evaluation, the framework could meaningfully advance therapeutic antibody design by better handling severe label sparsity and antigenic variation through listwise ranking and attention-based modeling. The case studies provide concrete evidence of utility for wet-lab prioritization, which is a strength if the underlying performance claims are robust.

major comments (1)
  1. [§4] §4 (Experimental Setup and Results): The central claim of >10% P@1 improvement rests on randomized cross-validation. The description does not specify homology-aware splitting (e.g., sequence clustering or identity thresholds <30% to prevent leakage). Given that the method explicitly employs homologous antigen sampling within training lists and MHSA for inter-sample modeling, near-identical or evolutionarily related sequences may appear across folds. This risks the observed gains being artifacts of memorization rather than genuine capture of subtle affinity differences by the dual-level contrastive + meta-refinement PU mechanism. Please clarify the exact splitting protocol (with code or pseudocode) and, if needed, re-run experiments with cluster-based CV to validate the claim.
minor comments (3)
  1. [Abstract] Abstract and §1: The improvement is stated as 'over 10%' without exact value, standard deviation, or per-baseline breakdown; add these for precision.
  2. [§3] §3 (Method): The meta-optimized label refinement step lacks an explicit equation or algorithm box; providing one would improve clarity of the PU mechanism.
  3. [§4.3] §4.3 (Case Studies): More details on how 'robust ranking consistency' was quantified (e.g., specific metrics for mutation distinction) would strengthen the practical utility argument.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. The comment on the experimental validation protocol raises an important point about potential data leakage, and we address it directly below.

read point-by-point responses
  1. Referee: [§4] §4 (Experimental Setup and Results): The central claim of >10% P@1 improvement rests on randomized cross-validation. The description does not specify homology-aware splitting (e.g., sequence clustering or identity thresholds <30% to prevent leakage). Given that the method explicitly employs homologous antigen sampling within training lists and MHSA for inter-sample modeling, near-identical or evolutionarily related sequences may appear across folds. This risks the observed gains being artifacts of memorization rather than genuine capture of subtle affinity differences by the dual-level contrastive + meta-refinement PU mechanism. Please clarify the exact splitting protocol (with code or pseudocode) and, if needed, re-run experiments with cluster-based CV to validate the claim.

    Authors: We appreciate the referee's careful reading and agree that the current description of the splitting protocol is insufficiently detailed. The experiments in the manuscript use randomized partitioning of individual antibody-antigen pairs into folds at the sample level, with no explicit homology filtering applied across folds. Homologous antigen sampling and MHSA are used only to construct training lists within each fold. To address the concern, we will revise §4 to include: (1) an explicit statement of the randomized sample-level splitting procedure, (2) pseudocode for the full data partitioning and list construction pipeline, and (3) new results from homology-aware cross-validation. For the latter, we will cluster sequences using a standard tool (e.g., CD-HIT at 30% identity threshold) and ensure that no sequences from the same cluster appear in both training and test folds, then re-report Precision@1 and other metrics. This will allow direct assessment of whether the >10% gains persist when leakage from evolutionary relatedness is prevented. We believe these additions will strengthen the evidence that the improvements stem from the listwise ranking and PU-learning components rather than memorization. revision: yes

Circularity Check

0 steps flagged

No circularity: framework proposal with independent empirical claims

full rationale

The paper introduces AbLWR as a new listwise ranking reformulation of affinity prediction, augmented by a PU-learning mechanism (dual-level contrastive objective plus meta-optimized label refinement) and homologous antigen sampling with MHSA. No equations or steps are shown that define a target quantity in terms of itself, rename a fitted parameter as a prediction, or reduce the central performance claim to a self-citation chain. The reported >10% P@1 gains are presented as outcomes of randomized cross-validation experiments and case studies rather than tautological consequences of the method's own inputs. The derivation chain is therefore self-contained architectural design plus external validation.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available, so specific free parameters, axioms, or invented entities cannot be identified. The approach likely relies on standard deep-learning hyperparameters and domain assumptions about positive-unlabeled data distributions and homology-based sampling.

pith-pipeline@v0.9.0 · 5515 in / 1342 out tokens · 58623 ms · 2026-05-10T15:57:22.219709+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

14 extracted references · 5 canonical work pages · 1 internal anchor

  1. [1]

    Massively parallel singlecell b-cell receptor sequencing enables rapid discovery of diverse antigen- reactive antibodies

    Goldstein, L., Chen, Y ., Wu, J., Chaudhuri, S., Hsiao, Y ., Schneider, K., Hoi, K., Lin, Z., Guerrero, S., Jaiswal, B., et al. Massively parallel singlecell b-cell receptor sequencing enables rapid discovery of diverse antigen- reactive antibodies. commun biol. 2019; 2: 304,

  2. [2]

    Huang, Y ., Zhang, Z., and Zhou, Y

    doi: 10.1109/TPAMI.2023.3298332. Huang, Y ., Zhang, Z., and Zhou, Y . Abagintpre: A deep learning method for predicting antibody-antigen interac- tions based on sequence information.Frontiers in Im- munology, 13:1053617,

  3. [3]

    Semi-Supervised Classification with Graph Convolutional Networks

    Kipf, T. Semi-supervised classification with graph con- volutional networks.arXiv preprint arXiv:1609.02907,

  4. [4]

    C., Paige, B., and Mart ´ınez, M

    10 Liu, C., Pelissier, A., Shao, Y ., Denzler, L., Martin, A. C., Paige, B., and Mart ´ınez, M. R. Abrank: A benchmark dataset and metric-learning framework for antibody-antigen affinity ranking.arXiv preprint arXiv:2506.17857,

  5. [5]

    A., Gray, J

    Ruffolo, J. A., Gray, J. J., and Sulam, J. Decipher- ing antibody affinity maturation with language mod- els and weakly supervised learning.arXiv preprint arXiv:2112.07782,

  6. [6]

    M., Steiger, A

    Shanehsazzadeh, A., Bachas, S., McPartlon, M., Kasun, G., Sutton, J. M., Steiger, A. K., Shuai, R., Kohnert, C., Rakocevic, G., Gutierrez, J. M., et al. Unlocking de novo antibody design with generative artificial intelligence. BioRxiv, pp. 2023–01,

  7. [7]

    J., Li, X., et al

    Wan, H., Gao, J., Yang, H., Yang, S., Harvey, R., Chen, Y .-Q., Zheng, N.-Y ., Chang, J., Carney, P. J., Li, X., et al. The neuraminidase of a (h3n2) influenza viruses circulat- ing since 2016 is antigenically distinct from the a/hong kong/4801/2014 vaccine strain.Nature Microbiology, 4 (12):2216–2225,

  8. [8]

    Con- trastive learning with negative sampling correction.arXiv preprint arXiv:2401.08690,

    Wang, L., Du, C., Zhao, P., Luo, C., Zhu, Z., Qiao, B., Zhang, W., Lin, Q., Rajmohan, S., Zhang, D., et al. Con- trastive learning with negative sampling correction.arXiv preprint arXiv:2401.08690,

  9. [9]

    A generative foundation model for antibody design.BioRxiv, pp

    11 Wang, R., Wu, F., Shi, J., Song, Y ., Kong, Y ., Ma, J., He, B., Yan, Q., Ying, T., Zhao, P., et al. A generative foundation model for antibody design.BioRxiv, pp. 2025–09,

  10. [10]

    Boltz-1 democratizing biomolecular interaction modeling.BioRxiv, pp

    Wohlwend, J., Corso, G., Passaro, S., Getz, N., Reveiz, M., Leidal, K., Swiderski, W., Atkinson, L., Portnoi, T., Chinn, I., et al. Boltz-1 democratizing biomolecular interaction modeling.BioRxiv, pp. 2024–11,

  11. [11]

    Details for Data Collection and Analysis A.1

    12 A. Details for Data Collection and Analysis A.1. Data Collection The distribution of Ab-Ag pairs across different data sources is available from Liu et al. (Liu et al., 2025). Brief descriptions are provided below: • RBD-escape (Greaney et al., 2022):Derived from Deep Mutational Scanning (DMS) experiments, this dataset quantifies how mutations in the S...

  12. [12]

    GNN Encoders The graph encoders are implemented using the PyTorch Geometric library

    to encode the antigen sequence: H(0) Ag =ESM-2(S Ag)∈R LAg×1280.(9) B.2. GNN Encoders The graph encoders are implemented using the PyTorch Geometric library. As detailed in Equation (7), the adjacency matrix A is constructed based on a spatial distance threshold of 4.5 ˚A between residue centroids. We employ GCNConv layers with symmetric normalization. Le...

  13. [13]

    Details for Ranking Module The Ranking Module first projects the input matrix ET ∈R K×2dout to a hidden dimension dr, yielding the initial features Z(0) ∈R K×d r

    12:Updateθ ϕ, θrank ←Optimizer(∇L ListMLE) 13:end for 14:end for D. Details for Ranking Module The Ranking Module first projects the input matrix ET ∈R K×2dout to a hidden dimension dr, yielding the initial features Z(0) ∈R K×d r. Then, Nr layers of ISAB are applied with M learnable inducing points I∈R M×d r for robust context modeling. The update rule fo...

  14. [14]

    While AbRank (Figure 9 (b)) captures the general ranking trend, this global alignment masks a critical deficiency in practical screening scenarios

    Consistent with the results on the sampled lists, AntiBERTy, GraphDTA, and ANTIPASTI failed to capture the correct ranking relationship (lines crossing significantly in Figure 9 (a), (c), (d)). While AbRank (Figure 9 (b)) captures the general ranking trend, this global alignment masks a critical deficiency in practical screening scenarios. As shown in Fig...