pith.

arxiv: 2602.17330 · v4 · submitted 2026-02-19 · 💻 cs.LG · cs.AI

SubQuad: Near-Quadratic-Free Structure Inference with Distribution-Balanced Objectives in Adaptive Receptor framework

Pith reviewed 2026-05-15 21:09 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords: immune repertoire analysis · MinHash prefiltering · differentiable gating · fairness-constrained clustering · subquadratic retrieval · clonotype grouping · antigen-specific subgroups

The pith

SubQuad pairs MinHash prefiltering with fairness calibration to analyze large immune repertoires without paying the full quadratic cost of all-pairs comparison.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents SubQuad as a pipeline for population-scale immune repertoire comparison that tackles two bottlenecks: the near-quadratic expense of pairwise sequence affinity checks and the tendency of imbalanced datasets to bury clinically relevant minority clonotypes. It combines compact MinHash prefiltering to limit candidate pairs, a differentiable gating module that learns to weight alignment and embedding signals per pair, and an automated calibration step that enforces proportional representation of rare antigen-specific groups. A sympathetic reader would care because these steps together promise to let standard hardware handle much larger viral and tumor datasets while keeping or raising recall, cluster purity, and subgroup equity metrics.
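To make the prefiltering step concrete, here is a minimal Python sketch of MinHash over CDR3 k-mers with LSH banding, the textbook way such a prefilter replaces all-pairs comparison with bucket lookups. The paper's index settings are not given on this page, so the k-mer length (3), the 16 bands × 4 rows, the seeded BLAKE2b hashes standing in for random permutations, and the function names are all illustrative assumptions.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def kmers(seq, k=3):
    """Overlapping k-mers of a CDR3 string; short sequences fall back to the whole string."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)} or {seq}


def minhash_signature(shingles, num_perm):
    """One minimum per seeded hash function (seeded BLAKE2b stands in for random permutations)."""
    return [
        min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{s}".encode(), digest_size=8).digest(), "big"
            )
            for s in shingles
        )
        for seed in range(num_perm)
    ]


def candidate_pairs(seqs, k=3, bands=16, rows=4):
    """Index pairs (i, j), i < j, whose signatures agree on every row of at least one band."""
    buckets = defaultdict(list)
    for idx, seq in enumerate(seqs):
        sig = minhash_signature(kmers(seq, k), bands * rows)
        for b in range(bands):
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].append(idx)
    pairs = set()
    for members in buckets.values():
        # members were appended in increasing index order, so each pair is (i, j) with i < j
        pairs.update(combinations(members, 2))
    return pairs


seqs = ["CASSLGQGNTEAFF", "CASSLGQGNTEAFF", "CASRPGDTQYF"]
print(sorted(candidate_pairs(seqs)))
```

Only pairs that share a bucket in some band go on to the expensive affinity kernels, so cost tracks bucket occupancy rather than n². The trade-off is that a true neighbor with low k-mer Jaccard similarity can miss every band, which is the minority-recall risk named in the load-bearing premise.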

Core claim

SubQuad is an end-to-end system that combines antigen-aware near-subquadratic retrieval, GPU-accelerated affinity kernels, learned multimodal fusion through per-pair differentiable gating, and fairness-constrained clustering. It delivers measured improvements in throughput and peak memory on large repertoires while preserving or improving recall@k, cluster purity, and subgroup equity.
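Cluster purity, one of the three quality metrics in the core claim, has a standard definition: give each cluster its majority antigen label and count the fraction of sequences that match it. A minimal Python sketch follows; the function name and the toy CMV/EBV labels are illustrative, and the paper may use a different convention (for example, excluding singleton clusters).

```python
from collections import Counter, defaultdict


def cluster_purity(cluster_ids, antigen_labels):
    """Fraction of sequences whose antigen label equals their cluster's majority label."""
    if len(cluster_ids) != len(antigen_labels) or not antigen_labels:
        raise ValueError("need equal-length, non-empty cluster and label lists")
    counts = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, antigen_labels):
        counts[cluster][label] += 1
    # For each cluster, count only the members carrying its most common label.
    majority_hits = sum(c.most_common(1)[0][1] for c in counts.values())
    return majority_hits / len(antigen_labels)


clusters = [0, 0, 0, 1, 1]
labels = ["CMV", "CMV", "EBV", "EBV", "EBV"]
print(cluster_purity(clusters, labels))  # 0.8
```

Because purity is a size-weighted average, a large, clean majority cluster can mask a badly mixed minority one, which is why the claim pairs it with a separate subgroup-equity metric.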

What carries the argument

Compact MinHash prefiltering combined with a differentiable gating module for adaptive weighting of alignment and embedding channels, plus an automated routine that enforces proportional representation of rare subgroups.
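Per-pair gating can be illustrated with a scalar sigmoid gate that mixes an alignment score with an embedding cosine similarity. This is a minimal sketch under assumptions: the three gate features, the single linear layer, and the names are hypothetical, since this page does not describe the module's architecture. What it does show is why such a gate is trainable: the fused score e + g(a - e) is smooth in the gate parameters, with derivative g(1 - g)(a - e) with respect to the bias.

```python
import math


def stable_sigmoid(z):
    """Logistic function written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gated_affinity(align, embed, weights, bias):
    """Fuse two similarity channels with a learned per-pair gate.

    align, embed: similarities for one candidate pair, assumed scaled to [0, 1].
    Returns (fused score, gate value g, d fused / d bias).
    """
    features = (align, embed, abs(align - embed))  # hypothetical gate inputs
    z = sum(w * f for w, f in zip(weights, features)) + bias
    g = stable_sigmoid(z)
    fused = g * align + (1.0 - g) * embed  # g -> 1 trusts alignment, g -> 0 trusts embedding
    grad_bias = g * (1.0 - g) * (align - embed)  # chain rule through the sigmoid
    return fused, g, grad_bias


print(gated_affinity(0.8, 0.4, weights=(0.0, 0.0, 0.0), bias=0.0))
```

With all-zero parameters the gate sits at 0.5 and simply averages the two channels; training moves it per pair, which is the "adaptive weighting" the paper describes.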

If this is right

  • Vaccine target prioritization can run on larger patient cohorts without proportional increases in compute or memory.
  • Biomarker discovery pipelines gain the ability to surface signals from underrepresented antigen subgroups.
  • Clustering results become more stable across dataset imbalance ratios without post-hoc reweighting.
  • Downstream translational tasks such as subgroup-specific response prediction become feasible on standard GPU hardware.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same prefilter-plus-gating design could transfer to other large-scale sequence clustering domains where quadratic costs currently limit scale.
  • If the fairness calibration proves robust, it may reduce the need for separate rebalancing stages in related single-cell or metagenomic pipelines.
  • A natural next test would measure how the method behaves when the underlying embeddings are replaced by newer protein language models.

Load-bearing premise

The MinHash prefilter and gating module will not discard clinically relevant minority clonotypes, and the fairness calibration will not distort the underlying biological signals.

What would settle it

A benchmark set of repertoires containing known rare clonotypes in which SubQuad reports materially lower recall for those minorities or lower equity scores than an exhaustive pairwise baseline.
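This falsification test amounts to comparing, per subgroup, how much of the exhaustive top-k each query's retrieval recovers. A minimal Python sketch, assuming neighbor lists keyed by query ID, the exhaustive pairwise ranking as ground truth, and a known subgroup label per query; all names and the toy data are hypothetical.

```python
def recall_at_k(approx, exact, k):
    """Mean fraction of each query's exact top-k that the approximate top-k recovers."""
    scores = []
    for query, truth_list in exact.items():
        truth = set(truth_list[:k])
        if truth:
            found = set(approx.get(query, [])[:k])
            scores.append(len(truth & found) / len(truth))
    return sum(scores) / len(scores) if scores else float("nan")


def recall_by_subgroup(approx, exact, subgroup_of, k):
    """recall@k computed separately for each subgroup of queries."""
    groups = {}
    for query in exact:
        groups.setdefault(subgroup_of[query], []).append(query)
    return {
        group: recall_at_k(approx, {q: exact[q] for q in queries}, k)
        for group, queries in groups.items()
    }


exact = {"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]}   # exhaustive pairwise baseline
approx = {"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 0, 0]}  # prefiltered retrieval
subgroup_of = {"a": "majority", "b": "majority", "c": "rare"}
per_group = recall_by_subgroup(approx, exact, subgroup_of, k=3)
print(per_group, "gap:", max(per_group.values()) - min(per_group.values()))
```

The exhaustive baseline scores 1.0 in every subgroup by construction, so the quantity to watch is how far the rare-subgroup recall falls below the majority's under the prefilter.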

Figures

Figures reproduced from arXiv: 2602.17330 by Jiekai Wu, Kun Liu, Rong Fu, Simon Fong, Xianda Li, Zijian Zhang.

Figure 1: Overview of the SubQuad framework for near-quadratic-free, equity-aware repertoire inference. Scalable Preprocessing: Raw sequences S are processed via MinHash-based Indexing to generate a sparse candidate list CAND and optimized using hardware-aware batching B. Representation Learning: A Dual-Phase Meta-Encoder utilizes ImmunoBERT-style pretraining followed by MetaNet fine-tuning. The Meta-Controller dyn…
Figure 2: Community structure in immune receptor networks. Vertices denote unique CDR3…
Figure 3: Empirical median and p98 latencies for a 10⁷-sequence index under the efConstruction=200 and M=16 configuration used in our experiments.
Figure 4: UMAP projection of ImmunoBERT embeddings showing conserved antigen clusters.
Figure 5: F1 Score Heatmap for MinHash Parameter Selection.
Figure 6: Parameter optimization landscape for MinHash configurations.
Figure 7: Performance enhancement across computational domains.
Figure 8: Topological community organization in immune receptor network. Node size indicates TCR frequency, …
Figure 9: Feature distributions of immune receptor sequences across different antigens. Each subplot compares two…
Figure 10: Scale sensitivity of fairness metrics. Normalized disparity (…
Figure 11: Clinical decision support dashboard with human-AI collaboration.
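Figure 3 reports median and p98 query latencies. For readers reproducing that kind of plot, here is a minimal Python sketch of the measurement: it times each query and reports the median and a nearest-rank 98th percentile. The harness is an assumption, since the page does not describe the authors' timing method, and `query_fn` is a hypothetical stand-in for a lookup against an index such as an HNSW graph built with the efConstruction=200, M=16 setting named in the caption.

```python
import math
import statistics
import time


def nearest_rank_percentile(values, pct):
    """Smallest observed value with at least pct% of observations at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def latency_profile(query_fn, queries):
    """Wall-clock latency per query in milliseconds, summarized as median and p98."""
    latencies_ms = []
    for query in queries:
        start = time.perf_counter()
        query_fn(query)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return {
        "median_ms": statistics.median(latencies_ms),
        "p98_ms": nearest_rank_percentile(latencies_ms, 98),
    }


# Stand-in query: a linear scan, where a real run would hit the ANN index instead.
corpus = list(range(2_000))
print(latency_profile(lambda q: min(corpus, key=lambda x: abs(x - q)), range(0, 2_000, 20)))
```

Tail percentiles need many queries to be stable: with 100 queries, p98 is set by the third-slowest measurement, so a single scheduler hiccup can move it.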
Original abstract

Comparative analysis of adaptive immune repertoires at population scale is hampered by two practical bottlenecks: the near-quadratic cost of pairwise affinity evaluations and dataset imbalances that obscure clinically important minority clonotypes. We introduce SubQuad, an end-to-end pipeline that addresses these challenges by combining antigen-aware, near-subquadratic retrieval with GPU-accelerated affinity kernels, learned multimodal fusion, and fairness-constrained clustering. The system employs compact MinHash prefiltering to sharply reduce candidate comparisons, a differentiable gating module that adaptively weights complementary alignment and embedding channels on a per-pair basis, and an automated calibration routine that enforces proportional representation of rare antigen-specific subgroups. On large viral and tumor repertoires SubQuad achieves measured gains in throughput and peak memory usage while preserving or improving recall@k, cluster purity, and subgroup equity. By co-designing indexing, similarity fusion, and equity-aware objectives, SubQuad offers a scalable, bias-aware platform for repertoire mining and downstream translational tasks such as vaccine target prioritization and biomarker discovery.
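The abstract describes the calibration routine only as enforcing proportional representation of rare antigen-specific subgroups. One simple way to do that, shown here as a swapped-in illustration and not as the authors' routine, is quota-based re-ranking: split the k result slots across subgroups in proportion to their share of the repertoire (largest-remainder rounding), fill each quota with that subgroup's best-scoring candidates, and backfill unused slots by score. Names and toy data are hypothetical.

```python
import math
from collections import defaultdict


def proportional_quotas(group_sizes, k):
    """Split k slots across groups in proportion to size (largest-remainder rounding)."""
    total = sum(group_sizes.values())
    raw = {g: k * n / total for g, n in group_sizes.items()}
    quotas = {g: math.floor(r) for g, r in raw.items()}
    leftover = k - sum(quotas.values())
    # Remaining slots go to the groups with the largest fractional remainders.
    for g in sorted(raw, key=lambda g: raw[g] - quotas[g], reverse=True)[:leftover]:
        quotas[g] += 1
    return quotas


def calibrated_top_k(candidates, group_sizes, k):
    """candidates: (item_id, score, group) triples. Returns up to k ids honoring quotas."""
    by_group = defaultdict(list)
    for item, score, group in candidates:
        by_group[group].append((score, item))
    chosen = []
    for group, quota in proportional_quotas(group_sizes, k).items():
        chosen.extend(sorted(by_group[group], reverse=True)[:quota])
    # Backfill slots a group could not use because it had too few candidates.
    taken = {item for _, item in chosen}
    rest = sorted(((s, i) for i, s, _ in candidates if i not in taken), reverse=True)
    chosen.extend(rest[: k - len(chosen)])
    return [item for _, item in sorted(chosen, reverse=True)]


candidates = [(f"c{i}", 0.9 - 0.01 * i, "common") for i in range(12)]
candidates += [("r1", 0.5, "rare"), ("r2", 0.4, "rare")]
print(calibrated_top_k(candidates, {"common": 90, "rare": 10}, k=10))
```

A plain top-10 on this toy input would return only `common` items; the quota reserves one slot for `rare`. A strictly proportional quota can still round a very small subgroup down to zero, so a minimum-slot floor is a common variant when that matters.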

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper introduces SubQuad, an end-to-end pipeline for population-scale analysis of adaptive immune repertoires. It combines compact MinHash prefiltering for near-subquadratic candidate retrieval, a differentiable gating module that adaptively fuses alignment and embedding channels on a per-pair basis, GPU-accelerated affinity kernels, and an automated fairness calibration routine that enforces proportional representation of rare antigen-specific subgroups. The central claim is that this co-design yields measured gains in throughput and peak memory usage on large viral and tumor repertoires while preserving or improving recall@k, cluster purity, and subgroup equity.

Significance. If the performance and recall guarantees are rigorously validated, SubQuad would offer a practical, bias-aware platform for repertoire mining with direct relevance to vaccine target prioritization and biomarker discovery. The explicit integration of distribution-balanced objectives with subquadratic indexing is a constructive contribution to scalable, fairness-aware methods in computational immunology.

major comments (3)
  1. [MinHash prefiltering] MinHash prefiltering component: no analytic recall bound (e.g., via Jaccard-to-affinity mapping) or empirical recall@k curves on minority subgroups in imbalanced repertoires are supplied. This is load-bearing for the claim that downstream fairness calibration operates on the true distribution rather than a filtered subset.
  2. [Evaluation and results] Evaluation section: the abstract states 'measured gains in throughput and peak memory usage' but supplies no numerical values, error bars, baseline comparisons, or ablation results. Without these data the central performance assertions cannot be verified.
  3. [Differentiable gating module] Differentiable gating and calibration: the description of the learned gating module and automated calibration does not specify whether parameters are fit on held-out data or the same evaluation set, raising a circularity risk for the reported recall@k and equity metrics.
minor comments (1)
  1. [Abstract] Abstract: inclusion of at least one concrete quantitative result (e.g., 'X-fold throughput improvement at Y% recall') would make the claims more immediately assessable.
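For banded MinHash LSH there is a standard closed-form starting point for the recall bound requested in major comment 1: a pair with k-mer Jaccard similarity s becomes a candidate with probability P(s) = 1 - (1 - s^r)^b for b bands of r rows. The Python sketch below evaluates that curve and inverts it for the number of bands needed to reach a target recall. It assumes SubQuad uses a banded scheme, which this page does not confirm, and it bounds recall only in Jaccard space; the Jaccard-to-affinity mapping the referee mentions remains the missing link.

```python
import math


def band_collision_probability(jaccard, bands, rows):
    """P(a pair with this Jaccard similarity matches on at least one full band)."""
    return 1.0 - (1.0 - jaccard ** rows) ** bands


def min_bands_for_recall(jaccard, rows, target):
    """Smallest band count b with P(jaccard) >= target, for 0 < target < 1."""
    p_band = jaccard ** rows  # chance that a single band matches exactly
    if p_band <= 0.0:
        raise ValueError("pairs with zero Jaccard similarity are never retrieved")
    if p_band >= 1.0:
        return 1
    # 1 - (1 - p)^b >= t  <=>  b >= log(1 - t) / log(1 - p)
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p_band))


b = min_bands_for_recall(0.5, rows=4, target=0.95)
print(b, band_collision_probability(0.5, b, 4))
```

With r = 4 rows, guaranteeing 95% expected recall for pairs at Jaccard 0.5 takes 47 bands (188 hash functions), which shows how quickly the signature size grows when minority clonotypes sit at modest similarity.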

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their thorough and constructive review of our manuscript. We address each major comment below and commit to revisions that strengthen the clarity and rigor of the SubQuad pipeline description.

point-by-point responses
  1. Referee: [MinHash prefiltering] MinHash prefiltering component: no analytic recall bound (e.g., via Jaccard-to-affinity mapping) or empirical recall@k curves on minority subgroups in imbalanced repertoires are supplied. This is load-bearing for the claim that downstream fairness calibration operates on the true distribution rather than a filtered subset.

    Authors: We agree that an analytic recall bound would strengthen the theoretical claims. Deriving a tight closed-form Jaccard-to-affinity mapping under our learned multimodal fusion is non-trivial, but we will add extensive empirical recall@k curves with explicit breakdowns for minority antigen-specific subgroups across imbalanced viral and tumor repertoires. These results will be placed in a dedicated subsection of the evaluation to demonstrate that prefiltering preserves the underlying distribution for fairness calibration. revision: yes

  2. Referee: [Evaluation and results] Evaluation section: the abstract states 'measured gains in throughput and peak memory usage' but supplies no numerical values, error bars, baseline comparisons, or ablation results. Without these data the central performance assertions cannot be verified.

    Authors: We acknowledge that the abstract and evaluation section require explicit numerical support. We will revise the abstract to report concrete throughput and memory gains with error bars from repeated runs. The evaluation section will be expanded to include full baseline comparisons (standard MinHash, embedding-only, alignment-only, and fairness-unaware clustering) together with ablation studies on each component, reporting all metrics (recall@k, cluster purity, subgroup equity) with standard deviations. revision: yes

  3. Referee: [Differentiable gating module] Differentiable gating and calibration: the description of the learned gating module and automated calibration does not specify whether parameters are fit on held-out data or the same evaluation set, raising a circularity risk for the reported recall@k and equity metrics.

    Authors: We thank the referee for identifying this ambiguity. The gating module parameters and fairness calibration routine are fit exclusively on held-out validation sets; final recall@k and equity metrics are computed on completely disjoint test sets. We will add an explicit description of the train/validation/test splits and training protocol in the methods section to remove any risk of circularity. revision: yes
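The rebuttal's fix for comment 3 is a strict train/validation/test separation. A minimal Python sketch of one leakage-resistant way to build it follows: records are assigned to splits by hashing a grouping key, so every record sharing that key (here a donor ID) lands in the same split and the gate and calibration never see test donors. The donor as grouping unit, the 80/10/10 fractions, and the function names are assumptions; the rebuttal does not say how its splits are drawn.

```python
import hashlib


def assign_split(group_key, val_frac=0.1, test_frac=0.1):
    """Deterministically map a grouping key to 'train', 'val', or 'test'."""
    digest = hashlib.sha256(group_key.encode("utf-8")).digest()
    u = int.from_bytes(digest[:6], "big") / 2**48  # pseudo-uniform in [0, 1)
    if u < test_frac:
        return "test"
    if u < test_frac + val_frac:
        return "val"
    return "train"


def split_records(records, key_fn, **fracs):
    """Group-disjoint split: fit gate and calibration on train/val, report metrics on test."""
    splits = {"train": [], "val": [], "test": []}
    for record in records:
        splits[assign_split(key_fn(record), **fracs)].append(record)
    return splits


records = [{"donor": f"donor{i % 20}", "cdr3": f"CASS{i}EQYF"} for i in range(200)]
splits = split_records(records, key_fn=lambda r: r["donor"])
print({name: len(rows) for name, rows in splits.items()})
```

Hashing the key instead of shuffling keeps the assignment stable across reruns and dataset growth, so a donor never migrates from test into training when new repertoires are added.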

Circularity Check

0 steps flagged

No significant circularity; pipeline claims rest on empirical measurements rather than self-referential definitions

full rationale

The abstract and available description present SubQuad as an end-to-end pipeline combining MinHash prefiltering, a differentiable gating module, multimodal fusion, and fairness-constrained clustering. No equations, derivation steps, or self-citations are exhibited that reduce any claimed prediction or uniqueness result to a fitted parameter or prior author result by construction. The reported gains in throughput, memory, recall@k, purity, and equity are framed as measured outcomes on viral and tumor repertoires, with no indication that any core quantity is defined in terms of itself or renamed from a known result. The derivation chain therefore does not loop back on itself: its claims stand or fall on external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Central claim rests on domain assumptions about approximate retrieval preserving recall and fairness objectives not distorting biology; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (2)
  • domain assumption: MinHash prefiltering combined with learned gating preserves recall for antigen-specific minority clonotypes
    Invoked to justify near-subquadratic cost without loss of clinically relevant signals.
  • domain assumption: Automated calibration can enforce proportional subgroup representation without introducing new bias
    Used to claim equity gains while preserving cluster purity.

pith-pipeline@v0.9.0 · 5487 in / 1246 out tokens · 32589 ms · 2026-05-15T21:09:01.056103+00:00 · methodology

