AirLift: A Fast and Comprehensive Technique for Remapping Alignments between Reference Genomes
Pith reviewed 2026-05-24 14:55 UTC · model grok-4.3
The pith
AirLift remaps read sets to new reference genomes up to 27.4 times faster than complete remapping while preserving variant accuracy.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
AirLift is the first read remapping tool that enables users to quickly and comprehensively map a read set, that had been previously mapped to one reference genome, to another similar reference. Compared to the state-of-the-art method for remapping reads (i.e., full mapping), AirLift reduces the overall execution time to remap read sets between two reference genome versions by up to 27.4x. We validate our remapping results with GATK and find that AirLift provides high accuracy in identifying ground truth SNP/INDEL variants.
What carries the argument
The remapping technique that adjusts alignments only in regions where the two reference genomes differ, rather than re-aligning all reads from scratch.
If this is right
- Users can quickly run downstream analysis of read sets for each latest reference release.
- Remapping execution time is reduced by up to 27.4x compared to full mapping.
- High accuracy is maintained in identifying ground truth SNP/INDEL variants as validated by GATK.
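To make the speedup figure easier to reason about, here is a back-of-envelope sketch. It is not taken from the paper: it assumes a toy cost model in which full mapping costs one unit per read, while an incremental remapper pays a small relative cost `adjust_cost` on every read plus one full unit on the fraction `remap_fraction` of reads that must be re-aligned. Both parameter names and the model itself are illustrative assumptions.

```python
def estimated_speedup(remap_fraction: float, adjust_cost: float) -> float:
    """Toy estimate of speedup over full mapping (assumed model, not AirLift's).

    remap_fraction: share of reads that need full re-alignment (0..1).
    adjust_cost:    per-read cost of adjusting an existing alignment,
                    relative to fully aligning one read (>= 0).
    """
    if not 0.0 <= remap_fraction <= 1.0:
        raise ValueError("remap_fraction must be in [0, 1]")
    if adjust_cost < 0.0:
        raise ValueError("adjust_cost must be non-negative")
    relative_time = remap_fraction + adjust_cost  # incremental time / full time
    if relative_time == 0.0:
        raise ValueError("relative time is zero; speedup is unbounded")
    return 1.0 / relative_time


if __name__ == "__main__":
    # Hypothetical inputs: 3% of reads re-aligned, 2% relative adjustment cost.
    print(round(estimated_speedup(0.03, 0.02), 1))
```

Under this simplified model, a 27.4x speedup would correspond to a combined relative cost of roughly 0.036, which illustrates why the result depends on only a small share of reads needing re-alignment.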
Where Pith is reading between the lines
- Analyses on large genomic datasets could become more iterative, allowing frequent incorporation of updated references without prohibitive costs.
- Similar adjustment strategies might extend to remapping in other sequencing technologies or between assemblies if similarity holds.
- Laboratories with limited compute resources could perform more variant calling studies on updated genomes.
Load-bearing premise
The two reference genomes must be similar enough that most alignments can be adjusted by handling only the differing regions without missing or incorrectly remapping a substantial fraction of reads.
What would settle it
Running AirLift and full remapping on read sets between two dissimilar reference genomes and observing a large drop in variant calling accuracy or many unmapped reads in the AirLift output.
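A comparison of this kind could be scored along the following lines. This is a minimal sketch, assuming variants are represented as `(chrom, pos, ref, alt)` tuples and matched exactly; the function name `concordance` is hypothetical, and real comparisons typically also normalize variant representations, which this sketch does not do.

```python
from typing import Dict, Set, Tuple

# Assumed variant key: (chromosome, 1-based position, ref allele, alt allele)
Variant = Tuple[str, int, str, str]


def concordance(truth: Set[Variant], called: Set[Variant]) -> Dict[str, float]:
    """Precision, recall, and F1 of a call set against a truth set (exact match)."""
    tp = len(truth & called)   # called and present in truth
    fp = len(called - truth)   # called but absent from truth
    fn = len(truth - called)   # in truth but not called
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Made-up toy variants, for illustration only.
    truth = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"),
             ("chr2", 40, "G", "GA"), ("chr2", 900, "TC", "T")}
    called = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"),
              ("chr2", 40, "G", "GA"), ("chr3", 7, "A", "C")}
    print(concordance(truth, called))
```

Running the same scoring on the call sets produced after incremental remapping and after full remapping, for both similar and dissimilar reference pairs, would expose the kind of accuracy drop described above.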
Original abstract
AirLift is the first read remapping tool that enables users to quickly and comprehensively map a read set, that had been previously mapped to one reference genome, to another similar reference. Users can then quickly run a downstream analysis of read sets for each latest reference release. Compared to the state-of-the-art method for remapping reads (i.e., full mapping), AirLift reduces the overall execution time to remap read sets between two reference genome versions by up to 27.4x. We validate our remapping results with GATK and find that AirLift provides high accuracy in identifying ground truth SNP/INDEL variants. AirLift source code and a readme describing how to reproduce our results are available at https://github.com/CMU-SAFARI/AirLift.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents AirLift, a read remapping tool that adjusts existing alignments between two similar reference genomes by focusing on differing intervals rather than performing full re-alignment. It claims up to 27.4x reduction in wall-clock time versus full mapping on human references (hg19↔hg38) and high accuracy in recovering ground-truth SNP/INDEL calls when downstream analysis is performed with GATK.
Significance. If the performance and accuracy results hold, the work would be useful for genomics pipelines that must periodically re-analyze large read sets against updated references; the open-source release and reproduction instructions are a concrete strength that supports reproducibility.
major comments (2)
- [Abstract and method description] The central speedup (up to 27.4×) and GATK concordance claims rest on the unquantified assumption that reference differences are localized and small enough that the fraction of reads requiring de-novo placement or crossing unhandled structural events remains negligible. The manuscript reports results only on hg19↔hg38 pairs whose differences are mostly small indels/SNVs but supplies no bound on tolerable inversion size or structural variation fraction; this directly affects both the reported execution-time reduction and variant-calling fidelity.
- [Results / validation section] Validation experiments cite GATK concordance but do not report the precise read-set sizes, coverage depths, or the handling of reads whose correct placement spans difference boundaries; without these details the claim that accuracy remains “high” cannot be assessed for generalizability beyond the tested human pairs.
minor comments (2)
- [Introduction] Define the term “similar reference” quantitatively (e.g., maximum allowed structural-event size) in the introduction.
- [Methods] Add a short algorithmic outline or pseudocode for the interval-adjustment procedure to improve clarity.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We respond point-by-point to the major comments below.
- Referee: [Abstract and method description] The central speedup (up to 27.4×) and GATK concordance claims rest on the unquantified assumption that reference differences are localized and small enough that the fraction of reads requiring de-novo placement or crossing unhandled structural events remains negligible. The manuscript reports results only on hg19↔hg38 pairs whose differences are mostly small indels/SNVs but supplies no bound on tolerable inversion size or structural variation fraction; this directly affects both the reported execution-time reduction and variant-calling fidelity.
Authors: AirLift targets similar reference genomes whose differences are localized (primarily small indels and SNVs), as exemplified by the hg19–hg38 pair. The algorithm identifies differing intervals and only remaps reads overlapping those intervals or their immediate vicinity; reads outside differing intervals retain their original placements. We do not provide a quantitative bound on inversion size or SV fraction because the manuscript evaluates the specific case of human reference updates. We will add an explicit limitations paragraph stating the assumption of localized differences and noting that large structural events would require separate handling or full re-mapping, thereby clarifying the scope of the reported speedup and accuracy. revision: partial
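The selection step described in this response can be sketched as follows. This is an illustrative sketch only, not AirLift's actual implementation: it assumes reads and differing regions are half-open `(start, end)` intervals on a single chromosome, and the names `merge_intervals`, `partition_reads`, and the `margin` parameter (standing in for the "immediate vicinity") are hypothetical.

```python
from bisect import bisect_right
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome


def merge_intervals(intervals: List[Interval], margin: int = 0) -> List[Interval]:
    """Pad each interval by `margin`, then merge overlapping or touching ones."""
    padded = sorted((max(0, s - margin), e + margin) for s, e in intervals)
    merged: List[Interval] = []
    for start, end in padded:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def partition_reads(
    reads: List[Interval], differing: List[Interval], margin: int = 0
) -> Tuple[List[Interval], List[Interval]]:
    """Split reads into (keep, remap) by overlap with padded differing regions."""
    merged = merge_intervals(differing, margin)
    ends = [e for _, e in merged]  # sorted, because merged is sorted and disjoint
    keep: List[Interval] = []
    remap: List[Interval] = []
    for start, end in reads:
        i = bisect_right(ends, start)  # first region whose end is past read start
        if i < len(merged) and merged[i][0] < end:
            remap.append((start, end))  # touches a differing region: re-align
        else:
            keep.append((start, end))   # untouched: existing placement is reused
    return keep, remap


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    regions = [(100, 110), (300, 305)]
    reads = [(0, 50), (90, 96), (115, 150), (290, 300), (400, 450)]
    print(partition_reads(reads, regions, margin=5))
```

Merging the padded regions first keeps them sorted and disjoint, so each read needs only one binary search. In a real pipeline the reads in `remap` would be handed to an aligner, and the reads in `keep` would still need their coordinates translated to the new reference.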
- Referee: [Results / validation section] Validation experiments cite GATK concordance but do not report the precise read-set sizes, coverage depths, or the handling of reads whose correct placement spans difference boundaries; without these details the claim that accuracy remains “high” cannot be assessed for generalizability beyond the tested human pairs.
Authors: We will revise the results and methods sections to report the exact read-set sizes, sequencing coverage depths, and the precise rule used when a read’s correct placement spans a difference boundary (such reads are extracted and re-mapped de novo by the underlying aligner). These additions will make the experimental conditions fully reproducible and allow readers to judge generalizability. revision: yes
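For readers relating read-set size to coverage depth, the standard approximation is mean coverage = (number of reads × read length) / genome size. The sketch below is a generic illustration of that formula, not something taken from the paper; the function names are hypothetical and the 3 Gbp genome size in the example is a round illustrative figure.

```python
import math


def mean_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average sequencing depth: total sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * read_length / genome_size


def reads_for_coverage(target: float, read_length: int, genome_size: int) -> int:
    """Smallest number of reads whose total bases reach the target mean depth."""
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    return math.ceil(target * genome_size / read_length)


if __name__ == "__main__":
    # Illustrative numbers only: 150 bp reads on a ~3 Gbp genome.
    print(mean_coverage(1_000_000, 100, 10_000_000))
    print(reads_for_coverage(30, 150, 3_000_000_000))
```

Reporting both the read count and the resulting depth lets readers judge whether timing and accuracy results would carry over to data sets of a different size.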
Circularity Check
No circularity: empirical performance claims rest on direct measurements, not self-referential derivations
full rationale
The paper presents an engineering tool whose central claims (up to 27.4× wall-clock reduction versus full mapping, high GATK concordance) are obtained by running the implemented remapper on real read sets and comparing outputs to a baseline full-mapping run on identical hardware. No equations, fitted parameters, or predictions derived from the same data appear; the method description relies on explicit region detection and adjustment rather than any self-definitional or fitted-input construction. External validation via GATK supplies an independent benchmark. No self-citations are invoked as load-bearing uniqueness theorems. The similarity assumption noted by the skeptic is a scope limitation on applicability, not a circular reduction of the reported results to their inputs.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Reference genomes are sufficiently similar that remapping alignments is feasible without full re-alignment for the majority of reads.