A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation

Nikolaus Mayer, Eddy Ilg, Philip Häusser, Philipp Fischer, Daniel Cremers, Alexey Dosovitskiy, Thomas Brox

classification cs.CV cs.LG stat.ML

keywords flow, estimation, convolutional, datasets, disparity, networks, scene, large

Recent work has shown that optical flow estimation can be formulated as a supervised learning task and can be successfully solved with convolutional networks. Training of the so-called FlowNet was enabled by a large synthetically generated dataset. The present paper extends the concept of optical flow estimation via convolutional networks to disparity and scene flow estimation. To this end, we propose three synthetic stereo video datasets with sufficient realism, variation, and size to successfully train large networks. Our datasets are the first large-scale datasets to enable training and evaluating scene flow methods. Besides the datasets, we present a convolutional network for real-time disparity estimation that provides state-of-the-art results. By combining a flow and disparity estimation network and training it jointly, we demonstrate the first scene flow estimation with a convolutional network.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Attention-Guided Dual-Stream Learning for Group Engagement Recognition: Fusing Transformer-Encoded Motion Dynamics with Scene Context via Adaptive Gating
cs.CV 2026-04 unverdicted novelty 6.0

DualEngage fuses transformer-encoded student motion dynamics with 3D scene features via softmax-gated fusion to recognize group engagement in classroom videos, reporting 96.21% average accuracy on a university dataset.

SynFlow: Scaling Up LiDAR Scene Flow Estimation with Synthetic Data
cs.CV 2026-04 conditional novelty 6.0

SynFlow creates a 34-times larger synthetic LiDAR scene flow dataset that lets models trained only on simulation match or beat supervised real-data baselines on multiple benchmarks.