StereoFactory: A Unified Merging Framework for Robust Stereo Matching

Mang Ye; Pinhan Fu; Qin Zou; Ruilin Wang; Wenke Huang; Xianda Guo

arxiv: 2606.17475 · v1 · pith:DPIIBCG2new · submitted 2026-06-16 · 💻 cs.CV

StereoFactory: A Unified Merging Framework for Robust Stereo Matching

Xianda Guo , Pinhan Fu , Ruilin Wang , Wenke Huang , Mang Ye , Qin Zou This is my paper

Pith reviewed 2026-06-27 01:58 UTC · model grok-4.3

classification 💻 cs.CV

keywords stereo matchingmodel merginggenetic algorithmCMA-ESevolutionary optimizationfoundation modelstask vectorscomputer vision

0 comments

The pith

StereoFactory merges stereo models with genetic search and CMA-ES routing to beat retraining on benchmarks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes that post-hoc model merging can overcome the scalability issues of foundation models in stereo matching by adaptively combining knowledge from multiple checkpoints. It introduces a two-stage process where a genetic algorithm first selects which models to include and CMA-ES then optimizes how to route their contributions at the module level. This approach achieves lower average errors across four benchmarks while using only a small fraction of the time required for joint retraining. A sympathetic reader would care because it offers a way to incorporate new data without expensive retraining. The results show consistent improvements over baselines that either use all models or greedy selection.

Core claim

StereoFactory is a coarse-to-fine evolutionary framework that first uses a genetic algorithm to search the combinatorial space of model subsets and then CMA-ES optimization of architecture-adaptive routing over the selected task vectors, with optional module-level scaling, achieving the best four-benchmark average under the same checkpoint pool while requiring only 2.7--3.7% of joint-retraining wall-clock time.

What carries the argument

The two-stage evolutionary process of genetic-algorithm subset selection followed by CMA-ES optimization of module-level routing over task vectors.

Load-bearing premise

The genetic algorithm search for model subsets and the subsequent CMA-ES optimization of module-level routing will generalize beyond the specific validation data used during the search without introducing selection bias or overfitting.

What would settle it

Running the merged models on a new stereo matching benchmark withheld from the search process and finding that the error reduction disappears or reverses relative to the strongest baseline would falsify the central performance claim.

Figures

Figures reproduced from arXiv: 2606.17475 by Mang Ye, Pinhan Fu, Qin Zou, Ruilin Wang, Wenke Huang, Xianda Guo.

**Figure 2.** Figure 2: Overview of the StereoFactory framework. Stage 1 searches for the optimal subset of fine-tuned checkpoints via evolutionary search [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗

**Figure 3.** Figure 3: Qualitative comparison across NMRF-Stereo and FoundationStereo. We compare uniform averaging (Mean) with StereoFactory (Ours) on representative samples from four target benchmarks. StereoFactory produces cleaner structures and fewer localized artifacts across different architectural families. in the table indicate per-metric best/second-best values; consequently, some individual metrics are led by other ba… view at source ↗

**Figure 4.** Figure 4: Selection statistics of the FoundationStereo training dataset. Across validation settings where the FoundationStereo training dataset is in the candidate pool, we compare its selection frequency and average fusion weight against the strongest competing source. The dataset is frequently selected with non-trivial weights, indicating that it provides a useful but not universally dominant source of transferabl… view at source ↗

**Figure 5.** Figure 5: Population-size ablation for Stage 1 evolutionary search. [PITH_FULL_IMAGE:figures/full_fig_p011_5.png] view at source ↗

read the original abstract

Stereo matching has advanced through foundation models trained on large-scale datasets, yet this paradigm suffers from a scalability bottleneck: incorporating new data requires costly joint retraining. Model merging offers a scalable post-hoc alternative by integrating knowledge from specialized models after source checkpoints are available. However, existing merging methods typically retain all available models or rely on greedy inclusion, which can preserve harmful task-vector interference. We propose StereoFactory, a coarse-to-fine evolutionary framework for adaptive model merging. Stage~1 employs a genetic algorithm to search the combinatorial space of model subsets, determining which models should participate. Stage~2 addresses module-level knowledge specialization (different functional modules exhibit distinct preferences for knowledge sources) through CMA-ES optimization of architecture-adaptive routing over the selected task vectors, with optional module-level scaling. Experiments across two architectures and four benchmarks demonstrate that StereoFactory consistently achieves the best four-benchmark average under the same checkpoint pool, reducing the average error from 3.80 to 3.30 on NMRF and from 2.88 to 2.19 on FoundationStereo relative to the strongest controlled baseline. The post-hoc search requires only 2.7--3.7\% of the corresponding joint-retraining wall-clock time. Analysis reveals that knowledge contributions are inherently module-specific, and selected subsets can transfer across architectures with minimal degradation. Code will be publicly released upon acceptance at: https://github.com/XiandaGuo/StereoFactory.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

StereoFactory's GA subset search plus CMA-ES routing beats the listed baselines on the four benchmarks but the optimization is performed on the same data, so the gains need checks for overfitting.

read the letter

The paper's main contribution is a two-stage post-hoc merging method: a genetic algorithm first picks which checkpoints to include, then CMA-ES tunes module-level routing and scaling on the selected task vectors. This produces lower average errors than the strongest controlled baselines (3.80 to 3.30 on NMRF, 2.88 to 2.19 on FoundationStereo) while using 2.7-3.7% of joint retraining time. The claim that subsets transfer across architectures with little drop is also new relative to the greedy or full-inclusion methods cited.

The work is useful because it directly targets the cost of adding data to stereo foundation models and supplies evidence that different modules prefer different sources. The module-specific analysis and the transfer result are the parts that feel like they could be cited.

The soft spot is the search itself. Both stages optimize on the validation portions of the same four benchmarks that supply the final numbers. The abstract gives no sign of a held-out split for the evolutionary procedure or of testing the discovered subsets and routes on completely separate datasets. If the full experiments include such controls, the performance edge is more convincing; without them the numbers risk reflecting benchmark-specific fitting rather than robust merging.

The paper is for people already working on model merging or efficient stereo pipelines who want a concrete evolutionary recipe. It is coherent on its own terms and shows clear thinking about the merging problem, so it deserves a serious referee to examine the experimental controls and the generalization tests.

Referee Report

2 major / 1 minor

Summary. The paper presents StereoFactory, a coarse-to-fine evolutionary model-merging framework for stereo matching. Stage 1 applies a genetic algorithm to search over subsets of available checkpoints; Stage 2 applies CMA-ES to optimize module-level routing (and optional scaling) over the selected task vectors. The central empirical claim is that the resulting merged models achieve the best four-benchmark average under a fixed checkpoint pool, reducing average error from 3.80 to 3.30 (NMRF) and from 2.88 to 2.19 (FoundationStereo) relative to the strongest controlled baseline, while requiring only 2.7–3.7 % of the wall-clock time of joint retraining. Additional claims include module-specific knowledge contributions and limited degradation when subsets transfer across architectures.

Significance. If the reported gains survive proper controls for search overfitting, the work supplies a practical post-hoc route to incorporating new stereo data without full retraining of foundation models. The planned public code release and the explicit analysis of module-level specialization are concrete strengths that would aid reproducibility and further research.

major comments (2)

[Experiments] Experiments section: the manuscript provides no description of held-out validation splits used during the genetic-algorithm subset search or the subsequent CMA-ES routing optimization. Because the four reported benchmarks supply both the checkpoint pool and the final metrics, the absence of such splits leaves open the possibility that the discovered configurations exploit benchmark-specific statistics rather than generalizing.
[Abstract and Experiments] Abstract and Experiments: the quantitative claims (e.g., error reductions to 3.30 and 2.19) are presented without reporting the number of independent search runs, standard deviations, or statistical significance tests, and without explicit controls for post-hoc selection bias in the evolutionary procedure. These omissions directly affect the load-bearing claim that StereoFactory “consistently achieves the best four-benchmark average.”

minor comments (1)

[Abstract] The abstract states that code “will be publicly released upon acceptance” yet already supplies a GitHub link; the repository should include the precise search seeds, validation splits (if any), and configuration files used to produce the reported numbers.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on experimental validation and statistical reporting. We address each major comment below and will incorporate revisions to strengthen these aspects of the manuscript.

read point-by-point responses

Referee: [Experiments] Experiments section: the manuscript provides no description of held-out validation splits used during the genetic-algorithm subset search or the subsequent CMA-ES routing optimization. Because the four reported benchmarks supply both the checkpoint pool and the final metrics, the absence of such splits leaves open the possibility that the discovered configurations exploit benchmark-specific statistics rather than generalizing.

Authors: We agree this is a valid concern: the manuscript does not describe held-out splits, and direct optimization on the reported benchmarks could in principle allow exploitation of benchmark-specific statistics. In the original experiments the searches operated on the standard benchmark evaluation protocols to identify merging configurations. For the revision we will add an explicit held-out validation protocol (reserving one benchmark or a data subset exclusively for search and CMA-ES, with final metrics reported on the remaining benchmarks) and update the Experiments section accordingly to demonstrate generalization. revision: yes
Referee: [Abstract and Experiments] Abstract and Experiments: the quantitative claims (e.g., error reductions to 3.30 and 2.19) are presented without reporting the number of independent search runs, standard deviations, or statistical significance tests, and without explicit controls for post-hoc selection bias in the evolutionary procedure. These omissions directly affect the load-bearing claim that StereoFactory “consistently achieves the best four-benchmark average.”

Authors: We acknowledge that the current presentation lacks multiple-run statistics, standard deviations, significance tests, and explicit bias controls, which weakens the “consistently” claim. The reported numbers reflect single-run searches performed under computational constraints. In the revised manuscript we will rerun the genetic algorithm and CMA-ES for multiple independent trials (minimum 5 runs), report means and standard deviations in the tables, add statistical significance tests (e.g., paired t-tests against baselines), and include additional controls such as random-subset and greedy baselines to quantify post-hoc selection effects. The abstract and Experiments section will be updated to reflect these results. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper describes an empirical post-hoc optimization procedure (genetic algorithm for subset selection followed by CMA-ES for module routing) applied to existing model checkpoints. Performance numbers are reported as direct outcomes of this optimization on the evaluation benchmarks, but the derivation chain does not reduce any claimed result to its inputs by construction, nor does it rely on self-citations, ansatzes smuggled via prior work, or renaming of known results. The method is self-contained as a standard evolutionary search whose outputs are compared against controlled baselines; no load-bearing step equates the final metric to the search objective by definition.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review prevents identification of specific free parameters or axioms; the approach relies on standard evolutionary computation techniques whose hyperparameters would typically be tuned but are not detailed here.

pith-pipeline@v0.9.1-grok · 5799 in / 1169 out tokens · 39768 ms · 2026-06-27T01:58:05.008003+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

66 extracted references · 5 linked inside Pith

[1]

Akiba, M

T. Akiba, M. Shing, Y. Tang, Q. Sun, and D. Ha. Evolution- ary optimization of model merging recipes.Nature Machine Intelligence, 2025

2025
[2]

T. H. W. B¨ ack, A. V. Kononova, B. van Stein, H. Wang, K. A. Antonov, R. T. Kalkreuth, J. de Nobel, D. Vermetten, R. de Winter, and F. Ye. Evolutionary algorithms for parameter optimization—thirty years later.Evolutionary Computation, 2023

2023
[3]

W. Bao, W. Wang, Y. Xu, Y. Guo, S. Hong, and X. Zhang. Instereo2k: a large real dataset for stereo matching in indoor scenes.Science China Information Sciences, 2020

2020
[4]

Y. Bi, B. Xue, P. Mesejo, S. Cagnoni, and M. Zhang. A survey on evolutionary computation for computer vision and image analysis: Past, present, and future trends.IEEE Transactions on Evolutionary Computation, 2023

2023
[5]

D. J. Butler, J. Wulff, G. B. Stanley, and M. J. Black. A naturalistic open source movie for optical flow evaluation. In ECCV, 2012

2012
[6]

Cabon, N

Y. Cabon, N. Murray, and M. Humenberger. Virtual kitti 2. arXiv preprint arXiv:2001.10773, 2020

Pith/arXiv arXiv 2001
[7]

Chang and Y.-S

J.-R. Chang and Y.-S. Chen. Pyramid stereo matching network. InCVPR, 2018

2018
[8]

Chang, X

T. Chang, X. Yang, T. Zhang, and M. Wang. Domain generalized stereo matching via hierarchical visual transformation. InCVPR, 2023

2023
[9]

Cheng, H

C. Cheng, H. Li, and L. Zhang. Two-branch deconvolutional network with application in stereo matching.TIP, 2022

2022
[10]

Z. Dai, Z. Tang, H. Zhang, and Y. Xie. Mgs-stereo: Multi-scale geometric-structure-enhanced stereo matching for complex real- world scenes.TIP, 2025

2025
[11]

Y. Deng, J. Xiao, S. Z. Zhou, and J. Feng. Detail preserving coarse-to-fine matching for stereo matching and optical flow. TIP, 2021

2021
[12]

T. G. Dietterich. Ensemble methods in machine learning. In Multiple Classifier Systems, 2000

2000
[13]

Z. Gao, Y. Li, J. Zhang, R. Zhao, T. Wu, H. Tang, Z. Yu, H. Dong, G. Chen, and T. Huang. Spikestereonet: A brain-inspired framework for stereo depth estimation from spike streams. In ICLR, 2026

2026
[14]

Geiger, P

A. Geiger, P. Lenz, and R. Urtasun. Are we ready for autonomous driving? the kitti vision benchmark suite. InCVPR, 2012

2012
[15]

J. L. Gray, A. T. Naman, and D. S. Taubman. Multi-view disparity estimation using the gradient consistency model.TIP, 2025

2025
[16]

T. Guan, J. Guo, T. Huang, J. Dong, C. Wang, and Y.-H. Liu. DispViT: Direct Stereo Disparity Regression with a Single- Stream Vision Transformer. InICLR, 2026

2026
[17]

T. Guan, C. Wang, and Y.-H. Liu. Neural markov random field for stereo matching. InCVPR, 2024

2024
[18]

X. Guo, J. Lu, C. Zhang, Y. Wang, Y. Duan, T. Yang, Z. Zhu, and L. Chen. Openstereo: A comprehensive benchmark for stereo matching and strong baseline.arXiv preprint arXiv:2312.00343, 2023

arXiv 2023
[19]

X. Guo, K. Yang, W. Yang, X. Wang, and H. Li. Group-wise correlation stereo network. InCVPR, 2019

2019
[20]

X. Guo, C. Zhang, R. Wang, Y. Zhang, W. Zheng, M. Poggi, H. Zhao, Q. Zou, and L. Chen. Stereocarla: A high-fidelity driving dataset for generalizable stereo.arXiv preprint arXiv:2509.12683, 2025

arXiv 2025
[21]

X. Guo, C. Zhang, Y. Zhang, R. Wang, D. Nie, W. Zheng, M. Poggi, H. Zhao, M. Ye, Q. Zou, et al. Stereo anything: Unifying zero-shot stereo matching with large-scale mixed data. arXiv preprint arXiv:2411.14053, 2024

arXiv 2024
[22]

Hansen and A

N. Hansen and A. Ostermeier. Completely derandomized self- adaptation in evolution strategies.Evolutionary Computation, 2001

2001
[23]

Ilharco, M

G. Ilharco, M. T. Ribeiro, M. Wortsman, L. Schmidt, H. Ha- jishirzi, and A. Farhadi. Editing models with task arithmetic. InProceedings of the International Conference on Learning Representations, 2023

2023
[24]

Karaev, I

N. Karaev, I. Rocco, B. Graham, N. Neverova, A. Vedaldi, and C. Rupprecht. Dynamicstereo: Consistent dynamic depth from stereo videos. InIROS, 2023

2023
[25]

Kendall, H

A. Kendall, H. Martirosyan, S. Dasgupta, P. Henry, R. Kennedy, A. Bachrach, and A. Bry. End-to-end learning of geometry and context for deep stereo regression. InICCV, 2017

2017
[26]

Khazatsky, K

A. Khazatsky, K. Pertsch, S. Nair, A. Balakrishna, S. Dasari, S. Karamcheti, S. Nasiriany, M. K. Srirama, L. Y. Chen, K. Ellis, et al. Droid: A large-scale in-the-wild robot manipulation dataset. arXiv preprint arXiv:2403.12945, 2024

Pith/arXiv arXiv 2024
[27]

J. Li, P. Wang, P. Xiong, T. Cai, Z. Yan, L. Yang, J. Liu, H. Fan, and S. Liu. Practical stereo matching via cascaded recurrent network with adaptive correlation. InCVPR, 2022

2022
[28]

N. Li, L. Ma, G. Yu, B. Xue, M. Zhang, and Y. Jin. Survey on evolutionary deep learning: Principles, algorithms, applications and open issues.ACM Computing Surveys, 2024

2024
[29]

Lipson, Z

L. Lipson, Z. Teed, and J. Deng. Raft-stereo: Multilevel recurrent field transforms for stereo matching. In3DV, 2021

2021
[30]

Y. Liu, Y. Sun, B. Xue, M. Zhang, G. G. Yen, and K. C. Tan. A survey on evolutionary neural architecture search.TNNLS, 2023

2023
[31]

Loshchilov and F

I. Loshchilov and F. Hutter. CMA-ES for hyperparame- ter optimization of deep neural networks.arXiv preprint arXiv:1604.07269, 2016

Pith/arXiv arXiv 2016
[32]

M. S. Matena and C. A. Raffel. Merging models with fisher- weighted averaging. InNeurIPS, 2022

2022
[33]

Mayer, E

N. Mayer, E. Ilg, P. Hausser, P. Fischer, D. Cremers, A. Doso- vitskiy, and T. Brox. A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. InCVPR, 2016

2016
[34]

L. Mehl, J. Schmalfuss, A. Jahedi, Y. Nalivayko, and A. Bruhn. Spring: A high-resolution high-detail dataset and benchmark for scene flow, optical flow and stereo. InCVPR, 2023

2023
[35]

Menze and A

M. Menze and A. Geiger. Object scene flow for autonomous vehicles. InCVPR, 2015

2015
[36]

Ranftl, K

R. Ranftl, K. Lasinger, D. Hafner, K. Schindler, and V. Koltun. Towards robust monocular depth estimation: Mixing datasets for zero-shot cross-dataset transfer.IEEE transactions on pattern analysis and machine intelligence, 2020

2020
[37]

Z. Rao, B. Xiong, M. He, Y. Dai, R. He, Z. Shen, and X. Li. Masked representation learning for domain generalized stereo matching. InCVPR, 2023

2023
[38]

Scharstein, H

D. Scharstein, H. Hirschm¨ uller, Y. Kitajima, G. Krathwohl, N. Neˇ si´ c, X. Wang, and P. Westling. High-resolution stereo 13 datasets with subpixel-accurate ground truth. InGCPR, 2014

2014
[39]

Sch¨ ops, J

T. Sch¨ ops, J. L. Sch¨ onberger, S. Galliani, T. Sattler, K. Schindler, M. Pollefeys, and A. Geiger. A multi-view stereo benchmark with high-resolution images and multi-camera videos. InCVPR, 2017

2017
[40]

F. Tosi, Y. Liao, C. Schmitt, and A. Geiger. Smd-nets: Stereo mixture density networks. InCVPR, 2021

2021
[41]

Tremblay, T

J. Tremblay, T. To, and S. Birchfield. Falling things: A synthetic dataset for 3d object detection and pose estimation. InCVPRW, 2018

2018
[42]

S. Wang, J. Sun, B. Fan, Q. Wang, B. Lu, and Y. Dai. Ec-mvsnet: Enhanced cascaded multi-view stereo with cross-scale relevance integration. InAAAI, 2026

2026
[43]

W. Wang, D. Zhu, X. Wang, Y. Hu, Y. Qiu, C. Wang, Y. Hu, A. Kapoor, and S. Scherer. Tartanair: A dataset to push the limits of visual slam. InIROS, 2020

2020
[44]

X. Wang, G. Xu, H. Jia, and X. Yang. Selective-stereo: Adaptive frequency information selection for stereo matching. InCVPR, 2024

2024
[45]

Y. Wang, K. Li, L. Wang, J. Hu, D. Oliver Wu, and Y. Guo. Adstereo: Efficient stereo matching with adaptive downsampling and disparity alignment.TIP, 2025

2025
[46]

Y. Wang, Z. Yang, J. Zheng, Z. Zhang, D. O. Wu, and Y. Guo. Smformer: Empowering self-supervised stereo matching via foundation models and data augmentation.TIP, 2026

2026
[47]

B. Wen, S. Dewan, and S. Birchfield. Fast-foundationstereo: Real-time zero-shot stereo matching. InCVPR, 2026

2026
[48]

B. Wen, M. Trepte, J. Aribido, J. Kautz, O. Gallo, and S. Birch- field. FoundationStereo: Zero-shot stereo matching. InCVPR, 2025

2025
[49]

Wortsman, G

M. Wortsman, G. Ilharco, S. Y. Gadre, R. Roelofs, R. Gontijo- Lopes, A. S. Morcos, H. Namkoong, A. Farhadi, Y. Carmon, S. Kornblith, and L. Schmidt. Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time. InICML, 2022

2022
[50]

G. Xu, X. Wang, X. Ding, and X. Yang. Iterative geometry encoding volume for stereo matching. InCVPR, 2023

2023
[51]

G. Xu, X. Wang, Z. Zhang, J. Cheng, C. Liao, and X. Yang. Igev++: iterative multi-range geometry encoding volumes for stereo matching.TPAMI, 2025

2025
[52]

Xu and J

H. Xu and J. Zhang. Aanet: Adaptive aggregation network for efficient stereo matching. InCVPR, 2020

2020
[53]

P. Xu, Z. Xiang, C. Qiao, J. Fu, and T. Pu. Adaptive multi-modal cross-entropy loss for stereo matching. InCVPR, 2024

2024
[54]

Yadav, C

P. Yadav, C. Shieh, D. Tam, M. Bansal, C. Raffel, and M. Muqeeth. A survey on model MoErging: Recycling and routing among specialized experts for collaborative learning. arXiv preprint arXiv:2408.07057, 2024

arXiv 2024
[55]

Yadav, D

P. Yadav, D. Tam, L. Choshen, C. Raffel, and M. Bansal. Ties-merging: Resolving interference when merging models. In NeurIPS, 2023

2023
[56]

E. Yang, L. Shen, G. Guo, X. Wang, X. Cao, J. Zhang, and D. Tao. Model merging in LLMs, MLLMs, and beyond: Methods, theories, applications and opportunities.arXiv preprint arXiv:2408.07666, 2024

Pith/arXiv arXiv 2024
[57]

E. Yang, Z. Wang, L. Shen, S. Liu, G. Guo, X. Wang, and D. Tao. Adamerging: Adaptive model merging for multi-task learning. InICLR, 2024

2024
[58]

L. Yang, B. Kang, Z. Huang, X. Xu, J. Feng, and H. Zhao. Depth anything: Unleashing the power of large-scale unlabeled data. In CVPR, 2024

2024
[59]

L. Yang, B. Kang, Z. Huang, Z. Zhao, X. Xu, J. Feng, and H. Zhao. Depth anything v2.arXiv preprint arXiv:2406.09414, 2024

Pith/arXiv arXiv 2024
[60]

M. Yang, F. Wu, and W. Li. Rlstereo: Real-time stereo matching based on reinforcement learning.TIP, 2021

2021
[61]

L. Yu, B. Yu, H. Yu, F. Huang, and Y. Li. Language models are super mario: Absorbing abilities from homologous models as a free lunch. InICML, 2024

2024
[62]

K. Zeng, Y. Wang, J. Mao, C. Liu, W. Peng, and Y. Yang. Deep stereo matching with hysteresis attention and supervised cost volume construction.TIP, 2022

2022
[63]

Zhang, X

F. Zhang, X. Qi, R. Yang, V. Prisacariu, B. Wah, and P. Torr. Domain-invariant stereo matching networks. InECCV, 2020

2020
[64]

Zhang, L

J. Zhang, L. Jing, Y. Wang, X. Li, G. Yang, S. Jin, and C. Qiu. Geometry-aware stereo matching via monocular disparity distribution prior and gradient enhancement. InAAAI, 2026

2026
[65]

Zhang, L

Y. Zhang, L. Wang, K. Li, Y. Wang, and Y. Guo. Learning representations from foundation models for domain generalized stereo matching. InECCV, 2024

2024
[66]

Zheng, Q

J. Zheng, Q. Liu, H. Xu, and Z. Chen. Pip-stereo: Progressive iterations pruner for iterative optimization based stereo matching. InCVPR, 2026. Xianda Guois currently a Ph.D. student at the School of Computer Science, Wuhan University. Before that, he received a mas- ter’s degree from the Dalian University of Technology in 2019. His work is mainly in comp...

2026

[1] [1]

Akiba, M

T. Akiba, M. Shing, Y. Tang, Q. Sun, and D. Ha. Evolution- ary optimization of model merging recipes.Nature Machine Intelligence, 2025

2025

[2] [2]

T. H. W. B¨ ack, A. V. Kononova, B. van Stein, H. Wang, K. A. Antonov, R. T. Kalkreuth, J. de Nobel, D. Vermetten, R. de Winter, and F. Ye. Evolutionary algorithms for parameter optimization—thirty years later.Evolutionary Computation, 2023

2023

[3] [3]

W. Bao, W. Wang, Y. Xu, Y. Guo, S. Hong, and X. Zhang. Instereo2k: a large real dataset for stereo matching in indoor scenes.Science China Information Sciences, 2020

2020

[4] [4]

Y. Bi, B. Xue, P. Mesejo, S. Cagnoni, and M. Zhang. A survey on evolutionary computation for computer vision and image analysis: Past, present, and future trends.IEEE Transactions on Evolutionary Computation, 2023

2023

[5] [5]

D. J. Butler, J. Wulff, G. B. Stanley, and M. J. Black. A naturalistic open source movie for optical flow evaluation. In ECCV, 2012

2012

[6] [6]

Cabon, N

Y. Cabon, N. Murray, and M. Humenberger. Virtual kitti 2. arXiv preprint arXiv:2001.10773, 2020

Pith/arXiv arXiv 2001

[7] [7]

Chang and Y.-S

J.-R. Chang and Y.-S. Chen. Pyramid stereo matching network. InCVPR, 2018

2018

[8] [8]

Chang, X

T. Chang, X. Yang, T. Zhang, and M. Wang. Domain generalized stereo matching via hierarchical visual transformation. InCVPR, 2023

2023

[9] [9]

Cheng, H

C. Cheng, H. Li, and L. Zhang. Two-branch deconvolutional network with application in stereo matching.TIP, 2022

2022

[10] [10]

Z. Dai, Z. Tang, H. Zhang, and Y. Xie. Mgs-stereo: Multi-scale geometric-structure-enhanced stereo matching for complex real- world scenes.TIP, 2025

2025

[11] [11]

Y. Deng, J. Xiao, S. Z. Zhou, and J. Feng. Detail preserving coarse-to-fine matching for stereo matching and optical flow. TIP, 2021

2021

[12] [12]

T. G. Dietterich. Ensemble methods in machine learning. In Multiple Classifier Systems, 2000

2000

[13] [13]

Z. Gao, Y. Li, J. Zhang, R. Zhao, T. Wu, H. Tang, Z. Yu, H. Dong, G. Chen, and T. Huang. Spikestereonet: A brain-inspired framework for stereo depth estimation from spike streams. In ICLR, 2026

2026

[14] [14]

Geiger, P

A. Geiger, P. Lenz, and R. Urtasun. Are we ready for autonomous driving? the kitti vision benchmark suite. InCVPR, 2012

2012

[15] [15]

J. L. Gray, A. T. Naman, and D. S. Taubman. Multi-view disparity estimation using the gradient consistency model.TIP, 2025

2025

[16] [16]

T. Guan, J. Guo, T. Huang, J. Dong, C. Wang, and Y.-H. Liu. DispViT: Direct Stereo Disparity Regression with a Single- Stream Vision Transformer. InICLR, 2026

2026

[17] [17]

T. Guan, C. Wang, and Y.-H. Liu. Neural markov random field for stereo matching. InCVPR, 2024

2024

[18] [18]

X. Guo, J. Lu, C. Zhang, Y. Wang, Y. Duan, T. Yang, Z. Zhu, and L. Chen. Openstereo: A comprehensive benchmark for stereo matching and strong baseline.arXiv preprint arXiv:2312.00343, 2023

arXiv 2023

[19] [19]

X. Guo, K. Yang, W. Yang, X. Wang, and H. Li. Group-wise correlation stereo network. InCVPR, 2019

2019

[20] [20]

X. Guo, C. Zhang, R. Wang, Y. Zhang, W. Zheng, M. Poggi, H. Zhao, Q. Zou, and L. Chen. Stereocarla: A high-fidelity driving dataset for generalizable stereo.arXiv preprint arXiv:2509.12683, 2025

arXiv 2025

[21] [21]

X. Guo, C. Zhang, Y. Zhang, R. Wang, D. Nie, W. Zheng, M. Poggi, H. Zhao, M. Ye, Q. Zou, et al. Stereo anything: Unifying zero-shot stereo matching with large-scale mixed data. arXiv preprint arXiv:2411.14053, 2024

arXiv 2024

[22] [22]

Hansen and A

N. Hansen and A. Ostermeier. Completely derandomized self- adaptation in evolution strategies.Evolutionary Computation, 2001

2001

[23] [23]

Ilharco, M

G. Ilharco, M. T. Ribeiro, M. Wortsman, L. Schmidt, H. Ha- jishirzi, and A. Farhadi. Editing models with task arithmetic. InProceedings of the International Conference on Learning Representations, 2023

2023

[24] [24]

Karaev, I

N. Karaev, I. Rocco, B. Graham, N. Neverova, A. Vedaldi, and C. Rupprecht. Dynamicstereo: Consistent dynamic depth from stereo videos. InIROS, 2023

2023

[25] [25]

Kendall, H

A. Kendall, H. Martirosyan, S. Dasgupta, P. Henry, R. Kennedy, A. Bachrach, and A. Bry. End-to-end learning of geometry and context for deep stereo regression. InICCV, 2017

2017

[26] [26]

Khazatsky, K

A. Khazatsky, K. Pertsch, S. Nair, A. Balakrishna, S. Dasari, S. Karamcheti, S. Nasiriany, M. K. Srirama, L. Y. Chen, K. Ellis, et al. Droid: A large-scale in-the-wild robot manipulation dataset. arXiv preprint arXiv:2403.12945, 2024

Pith/arXiv arXiv 2024

[27] [27]

J. Li, P. Wang, P. Xiong, T. Cai, Z. Yan, L. Yang, J. Liu, H. Fan, and S. Liu. Practical stereo matching via cascaded recurrent network with adaptive correlation. InCVPR, 2022

2022

[28] [28]

N. Li, L. Ma, G. Yu, B. Xue, M. Zhang, and Y. Jin. Survey on evolutionary deep learning: Principles, algorithms, applications and open issues.ACM Computing Surveys, 2024

2024

[29] [29]

Lipson, Z

L. Lipson, Z. Teed, and J. Deng. Raft-stereo: Multilevel recurrent field transforms for stereo matching. In3DV, 2021

2021

[30] [30]

Y. Liu, Y. Sun, B. Xue, M. Zhang, G. G. Yen, and K. C. Tan. A survey on evolutionary neural architecture search.TNNLS, 2023

2023

[31] [31]

Loshchilov and F

I. Loshchilov and F. Hutter. CMA-ES for hyperparame- ter optimization of deep neural networks.arXiv preprint arXiv:1604.07269, 2016

Pith/arXiv arXiv 2016

[32] [32]

M. S. Matena and C. A. Raffel. Merging models with fisher- weighted averaging. InNeurIPS, 2022

2022

[33] [33]

Mayer, E

N. Mayer, E. Ilg, P. Hausser, P. Fischer, D. Cremers, A. Doso- vitskiy, and T. Brox. A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. InCVPR, 2016

2016

[34] [34]

L. Mehl, J. Schmalfuss, A. Jahedi, Y. Nalivayko, and A. Bruhn. Spring: A high-resolution high-detail dataset and benchmark for scene flow, optical flow and stereo. InCVPR, 2023

2023

[35] [35]

Menze and A

M. Menze and A. Geiger. Object scene flow for autonomous vehicles. InCVPR, 2015

2015

[36] [36]

Ranftl, K

R. Ranftl, K. Lasinger, D. Hafner, K. Schindler, and V. Koltun. Towards robust monocular depth estimation: Mixing datasets for zero-shot cross-dataset transfer.IEEE transactions on pattern analysis and machine intelligence, 2020

2020

[37] [37]

Z. Rao, B. Xiong, M. He, Y. Dai, R. He, Z. Shen, and X. Li. Masked representation learning for domain generalized stereo matching. InCVPR, 2023

2023

[38] [38]

Scharstein, H

D. Scharstein, H. Hirschm¨ uller, Y. Kitajima, G. Krathwohl, N. Neˇ si´ c, X. Wang, and P. Westling. High-resolution stereo 13 datasets with subpixel-accurate ground truth. InGCPR, 2014

2014

[39] [39]

Sch¨ ops, J

T. Sch¨ ops, J. L. Sch¨ onberger, S. Galliani, T. Sattler, K. Schindler, M. Pollefeys, and A. Geiger. A multi-view stereo benchmark with high-resolution images and multi-camera videos. InCVPR, 2017

2017

[40] [40]

F. Tosi, Y. Liao, C. Schmitt, and A. Geiger. Smd-nets: Stereo mixture density networks. InCVPR, 2021

2021

[41] [41]

Tremblay, T

J. Tremblay, T. To, and S. Birchfield. Falling things: A synthetic dataset for 3d object detection and pose estimation. InCVPRW, 2018

2018

[42] [42]

S. Wang, J. Sun, B. Fan, Q. Wang, B. Lu, and Y. Dai. Ec-mvsnet: Enhanced cascaded multi-view stereo with cross-scale relevance integration. InAAAI, 2026

2026

[43] [43]

W. Wang, D. Zhu, X. Wang, Y. Hu, Y. Qiu, C. Wang, Y. Hu, A. Kapoor, and S. Scherer. Tartanair: A dataset to push the limits of visual slam. InIROS, 2020

2020

[44] [44]

X. Wang, G. Xu, H. Jia, and X. Yang. Selective-stereo: Adaptive frequency information selection for stereo matching. InCVPR, 2024

2024

[45] [45]

Y. Wang, K. Li, L. Wang, J. Hu, D. Oliver Wu, and Y. Guo. Adstereo: Efficient stereo matching with adaptive downsampling and disparity alignment.TIP, 2025

2025

[46] [46]

Y. Wang, Z. Yang, J. Zheng, Z. Zhang, D. O. Wu, and Y. Guo. Smformer: Empowering self-supervised stereo matching via foundation models and data augmentation.TIP, 2026

2026

[47] [47]

B. Wen, S. Dewan, and S. Birchfield. Fast-foundationstereo: Real-time zero-shot stereo matching. InCVPR, 2026

2026

[48] [48]

B. Wen, M. Trepte, J. Aribido, J. Kautz, O. Gallo, and S. Birch- field. FoundationStereo: Zero-shot stereo matching. InCVPR, 2025

2025

[49] [49]

Wortsman, G

M. Wortsman, G. Ilharco, S. Y. Gadre, R. Roelofs, R. Gontijo- Lopes, A. S. Morcos, H. Namkoong, A. Farhadi, Y. Carmon, S. Kornblith, and L. Schmidt. Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time. InICML, 2022

2022

[50] [50]

G. Xu, X. Wang, X. Ding, and X. Yang. Iterative geometry encoding volume for stereo matching. InCVPR, 2023

2023

[51] [51]

G. Xu, X. Wang, Z. Zhang, J. Cheng, C. Liao, and X. Yang. Igev++: iterative multi-range geometry encoding volumes for stereo matching.TPAMI, 2025

2025

[52] [52]

Xu and J

H. Xu and J. Zhang. Aanet: Adaptive aggregation network for efficient stereo matching. InCVPR, 2020

2020

[53] [53]

P. Xu, Z. Xiang, C. Qiao, J. Fu, and T. Pu. Adaptive multi-modal cross-entropy loss for stereo matching. InCVPR, 2024

2024

[54] [54]

Yadav, C

P. Yadav, C. Shieh, D. Tam, M. Bansal, C. Raffel, and M. Muqeeth. A survey on model MoErging: Recycling and routing among specialized experts for collaborative learning. arXiv preprint arXiv:2408.07057, 2024

arXiv 2024

[55] [55]

Yadav, D

P. Yadav, D. Tam, L. Choshen, C. Raffel, and M. Bansal. Ties-merging: Resolving interference when merging models. In NeurIPS, 2023

2023

[56] [56]

E. Yang, L. Shen, G. Guo, X. Wang, X. Cao, J. Zhang, and D. Tao. Model merging in LLMs, MLLMs, and beyond: Methods, theories, applications and opportunities.arXiv preprint arXiv:2408.07666, 2024

Pith/arXiv arXiv 2024

[57] [57]

E. Yang, Z. Wang, L. Shen, S. Liu, G. Guo, X. Wang, and D. Tao. Adamerging: Adaptive model merging for multi-task learning. InICLR, 2024

2024

[58] [58]

L. Yang, B. Kang, Z. Huang, X. Xu, J. Feng, and H. Zhao. Depth anything: Unleashing the power of large-scale unlabeled data. In CVPR, 2024

2024

[59] [59]

L. Yang, B. Kang, Z. Huang, Z. Zhao, X. Xu, J. Feng, and H. Zhao. Depth anything v2.arXiv preprint arXiv:2406.09414, 2024

Pith/arXiv arXiv 2024

[60] [60]

M. Yang, F. Wu, and W. Li. Rlstereo: Real-time stereo matching based on reinforcement learning.TIP, 2021

2021

[61] [61]

L. Yu, B. Yu, H. Yu, F. Huang, and Y. Li. Language models are super mario: Absorbing abilities from homologous models as a free lunch. InICML, 2024

2024

[62] [62]

K. Zeng, Y. Wang, J. Mao, C. Liu, W. Peng, and Y. Yang. Deep stereo matching with hysteresis attention and supervised cost volume construction.TIP, 2022

2022

[63] [63]

Zhang, X

F. Zhang, X. Qi, R. Yang, V. Prisacariu, B. Wah, and P. Torr. Domain-invariant stereo matching networks. InECCV, 2020

2020

[64] [64]

Zhang, L

J. Zhang, L. Jing, Y. Wang, X. Li, G. Yang, S. Jin, and C. Qiu. Geometry-aware stereo matching via monocular disparity distribution prior and gradient enhancement. InAAAI, 2026

2026

[65] [65]

Zhang, L

Y. Zhang, L. Wang, K. Li, Y. Wang, and Y. Guo. Learning representations from foundation models for domain generalized stereo matching. InECCV, 2024

2024

[66] [66]

Zheng, Q

J. Zheng, Q. Liu, H. Xu, and Z. Chen. Pip-stereo: Progressive iterations pruner for iterative optimization based stereo matching. InCVPR, 2026. Xianda Guois currently a Ph.D. student at the School of Computer Science, Wuhan University. Before that, he received a mas- ter’s degree from the Dalian University of Technology in 2019. His work is mainly in comp...

2026