Unsupervised Learning for Optical Flow Estimation Using Pyramid Convolution LSTM

Haoxin Li; Shuosen Guan; Wei-Shi Zheng

arxiv: 1907.11628 · v1 · pith:RSVXZBOZnew · submitted 2019-07-26 · 💻 cs.CV

Shuosen Guan, Haoxin Li, Wei-Shi Zheng

Pith reviewed 2026-05-24 15:45 UTC · model grok-4.3

classification 💻 cs.CV

keywords unsupervised optical flow · ConvLSTM · pyramid network · frame reconstruction · action recognition · video motion · CNN embedding

The pith

Pyramid ConvLSTM estimates optical flow from video by reconstructing adjacent frames without ground truth labels.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces PCLNet, an unsupervised framework that trains a pyramid Convolution LSTM to estimate optical flow by enforcing reconstruction of neighboring frames in a video clip. This constraint supplies the only supervision, enabling multi-frame flow output from arbitrary real videos rather than synthetic data with labels. The architecture decouples motion feature learning from the final flow representation, removing the need for complex skip connections found in other unsupervised models. Because it operates on features from standard CNN backbones, the flow estimator can be inserted into pipelines for downstream tasks such as action recognition while preserving competitive accuracy.
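To make the "pyramid" idea concrete, here is a minimal sketch of a coarse-to-fine pyramid built by repeated 2x average pooling. It is an illustration under stated assumptions, not PCLNet's code: the paper builds its pyramid over CNN features, whereas this example pools a raw intensity grid, and the helper names (`downsample2x`, `build_pyramid`, `flow_at_level`) are hypothetical. The one general fact it relies on is that a displacement measured in full-resolution pixels shrinks by the downsampling factor at each coarser level.

```python
def downsample2x(img):
    """Halve resolution by averaging non-overlapping 2x2 blocks (even sizes assumed)."""
    h, w = len(img), len(img[0])
    return [
        [(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4.0
         for x in range(0, w - 1, 2)]
        for y in range(0, h - 1, 2)
    ]


def build_pyramid(img, levels):
    """Return [finest, ..., coarsest]; each level is half the size of the previous one."""
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(downsample2x(pyramid[-1]))
    return pyramid


def flow_at_level(u, v, level):
    """Rescale a full-resolution displacement (u, v) to pyramid level `level`."""
    scale = 2 ** level
    return u / scale, v / scale


# Hypothetical 8x8 intensity grid used only for illustration.
frame = [[float(x + 8 * y) for x in range(8)] for y in range(8)]
pyr = build_pyramid(frame, levels=3)
sizes = [(len(p), len(p[0])) for p in pyr]
coarse_flow = flow_at_level(8.0, -4.0, level=2)
```

Large motions become small at the coarse levels, which is the usual motivation for estimating flow across several scales.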

Core claim

The authors show that a pyramid ConvLSTM trained solely under an adjacent-frame reconstruction loss produces accurate optical flow estimates. Decoupling motion feature extraction from flow decoding removes shortcut connections, improves accuracy, supports flexible multi-frame inference from any clip, and allows the module to attach directly to generic CNN features for other vision tasks.

What carries the argument

Pyramid Convolution LSTM with adjacent-frame reconstruction constraint; it performs multi-frame flow estimation while separating motion feature learning from flow representation.
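The adjacent-frame reconstruction constraint can be sketched as follows: warp frame t+1 back toward frame t using the estimated flow, then penalize the difference from the real frame t. This is a minimal sketch, not the paper's implementation. The plain L1 penalty, the border clamping, and all function names are assumptions made for illustration; this page does not state the exact form of PCLNet's loss.

```python
import math


def bilinear_sample(img, x, y):
    """Sample img at real-valued (x, y), clamping coordinates to the image border."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * img[y0][x0] + ax * img[y0][x1]
    bot = (1 - ax) * img[y1][x0] + ax * img[y1][x1]
    return (1 - ay) * top + ay * bot


def inverse_warp(next_frame, flow):
    """Reconstruct frame t by sampling frame t+1 at (x + u, y + v) for each pixel."""
    h, w = len(next_frame), len(next_frame[0])
    return [[bilinear_sample(next_frame, x + flow[y][x][0], y + flow[y][x][1])
             for x in range(w)] for y in range(h)]


def photometric_l1(frame, reconstruction):
    """Mean absolute intensity difference between a frame and its reconstruction."""
    h, w = len(frame), len(frame[0])
    total = sum(abs(frame[y][x] - reconstruction[y][x])
                for y in range(h) for x in range(w))
    return total / (h * w)


# Toy pair: the pattern x**2 moves one pixel to the right between t and t+1.
frame_t = [[float(x * x) for x in range(6)] for _ in range(4)]
frame_t1 = [[float((x - 1) ** 2) for x in range(6)] for _ in range(4)]

true_flow = [[(1.0, 0.0)] * 6 for _ in range(4)]
zero_flow = [[(0.0, 0.0)] * 6 for _ in range(4)]

recon = inverse_warp(frame_t1, true_flow)
loss_true = photometric_l1(frame_t, recon)
loss_zero = photometric_l1(frame_t, inverse_warp(frame_t1, zero_flow))
```

On this toy pair the true flow gives a lower loss than the zero flow. The residual that remains under the true flow comes from the last column, where the sampling location falls outside the frame and is clamped to the border.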

If this is right

Optical flow can be learned directly from unlabeled real-world video clips.
The same flow module can be inserted into any CNN backbone for tasks beyond flow estimation.
Action recognition performance remains comparable when the estimated flow is used as input.
Multi-frame flows are produced from a single forward pass on a video clip of any length.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Large collections of unlabeled video could now serve as training sources for dense motion models.
The reconstruction-based objective might extend to related dense prediction problems such as depth or segmentation from video.
Embedding the flow head inside existing action models could reduce the need for separate optical-flow pre-processing steps.

Load-bearing premise

Adjacent frame reconstruction alone supplies sufficient and unbiased supervision to learn accurate optical flow without ground-truth data or extra regularization terms.
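A small 1-D example shows why this premise carries so much weight. In a textureless region every candidate flow reconstructs the frame perfectly, so the reconstruction loss cannot distinguish between them; only textured content pins the flow down. The sketch below uses integer shifts and hypothetical helper names (`warp_1d`, `l1`); it illustrates the general ambiguity, not any experiment from the paper.

```python
def warp_1d(next_frame, flow):
    """Reconstruct frame t by reading frame t+1 at index i + flow[i], clamped to the ends."""
    n = len(next_frame)
    return [next_frame[min(max(i + flow[i], 0), n - 1)] for i in range(n)]


def l1(a, b):
    """Mean absolute difference between two equal-length signals."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


candidates = {
    "zero": [0] * 8,
    "right_1": [1] * 8,
    "right_2": [2] * 8,
    "left_3": [-3] * 8,
}

# Textureless pair: both frames are constant, so the true motion is unobservable.
flat_t = [5.0] * 8
flat_t1 = [5.0] * 8
flat_losses = {name: l1(flat_t, warp_1d(flat_t1, f)) for name, f in candidates.items()}

# Textured pair: the pattern i**2 moves one sample to the right.
textured_t = [float(i * i) for i in range(8)]
textured_t1 = [0.0] + textured_t[:-1]
textured_losses = {name: l1(textured_t, warp_1d(textured_t1, f))
                   for name, f in candidates.items()}

best_textured = min(textured_losses, key=textured_losses.get)
```

Every candidate scores zero on the flat pair, while on the textured pair the one-sample rightward shift has the lowest loss.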

What would settle it

Running the trained model on a standard optical flow benchmark that supplies ground truth and observing endpoint error higher than current supervised methods would falsify the accuracy claim.
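The endpoint error referred to here is the Euclidean distance between predicted and ground-truth flow vectors, usually averaged over all pixels. A minimal sketch follows; the function name and the tiny 2x2 flow fields are made up for illustration.

```python
import math


def average_endpoint_error(pred, gt):
    """Mean Euclidean distance between predicted and ground-truth flow vectors."""
    total, count = 0.0, 0
    for row_p, row_g in zip(pred, gt):
        for (up, vp), (ug, vg) in zip(row_p, row_g):
            total += math.hypot(up - ug, vp - vg)
            count += 1
    return total / count


# Hypothetical 2x2 flow fields; each entry is a (u, v) displacement in pixels.
gt = [[(1.0, 0.0), (0.0, 2.0)], [(3.0, 4.0), (0.0, 0.0)]]
pred = [[(1.0, 0.0), (0.0, 0.0)], [(0.0, 0.0), (0.0, 0.0)]]

epe = average_endpoint_error(pred, gt)  # per-pixel errors are 0, 2, 5, 0
```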

Original abstract

Most of current Convolution Neural Network (CNN) based methods for optical flow estimation focus on learning optical flow on synthetic datasets with groundtruth, which is not practical. In this paper, we propose an unsupervised optical flow estimation framework named PCLNet. It uses pyramid Convolution LSTM (ConvLSTM) with the constraint of adjacent frame reconstruction, which allows flexibly estimating multi-frame optical flows from any video clip. Besides, by decoupling motion feature learning and optical flow representation, our method avoids complex short-cut connections used in existing frameworks while improving accuracy of optical flow estimation. Moreover, different from those methods using specialized CNN architectures for capturing motion, our framework directly learns optical flow from the features of generic CNNs and thus can be easily embedded in any CNN based frameworks for other tasks. Extensive experiments have verified that our method not only estimates optical flow effectively and accurately, but also obtains comparable performance on action recognition.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

PCLNet combines pyramid ConvLSTM with feature decoupling for unsupervised multi-frame flow that slots into other CNNs, but the photometric reconstruction loss alone leaves accuracy claims vulnerable in non-ideal regions.

The paper's core move is a pyramid ConvLSTM setup that learns motion features separately from the final flow output and trains unsupervised on arbitrary video clips via adjacent-frame reconstruction. This decoupling is the practical bit: it skips the usual shortcut wiring and lets the flow module drop into generic CNN backbones for tasks like action recognition. That flexibility is the main thing worth noting if you're building video pipelines that need motion without labeled flow data.

Experiments apparently show usable flow plus downstream numbers that match some supervised baselines on action recognition, which suggests the architecture at least trains stably. The multi-frame capability from any clip length is also a small plus over single-pair methods.

The soft spot is exactly the one the stress test flags. Photometric reconstruction by itself is known to permit many wrong flows when brightness constancy breaks (occlusions, lighting shifts, non-Lambertian surfaces), and the abstract gives no sign of forward-backward checks, occlusion masks, or explicit smoothness terms. If those are missing or weak in the full loss, the reported accuracy could be inflated by fitting to the loss rather than true motion. The action-recognition transfer might still work even with noisy flow, but that would need separate verification.

This is aimed at practitioners who want an embeddable unsupervised flow block rather than pure flow researchers chasing SOTA numbers. The idea is concrete enough and the unsupervised angle is honest, so it deserves a serious referee even if the loss design needs tightening and more ablation on the decoupling benefit.
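For readers unfamiliar with the "explicit smoothness terms" mentioned in the letter, the sketch below shows one common form: a first-order penalty on differences between neighboring flow vectors, typically added to the photometric loss with a weight. This is an illustrative assumption, not something the abstract says PCLNet uses; the function name and the weight `lam` are hypothetical.

```python
def first_order_smoothness(flow):
    """Mean absolute difference between horizontally and vertically adjacent flow vectors."""
    h, w = len(flow), len(flow[0])
    total, count = 0.0, 0
    for y in range(h):
        for x in range(w):
            u, v = flow[y][x]
            if x + 1 < w:  # right neighbor
                un, vn = flow[y][x + 1]
                total += abs(un - u) + abs(vn - v)
                count += 1
            if y + 1 < h:  # lower neighbor
                un, vn = flow[y + 1][x]
                total += abs(un - u) + abs(vn - v)
                count += 1
    return total / count if count else 0.0


def total_loss(photometric, flow, lam=0.1):
    """Combine a photometric term with the smoothness penalty (lam is a made-up weight)."""
    return photometric + lam * first_order_smoothness(flow)


constant = [[(1.0, 0.0)] * 4 for _ in range(4)]
checker = [[(1.0 if (x + y) % 2 == 0 else -1.0, 0.0) for x in range(4)] for y in range(4)]

smooth_constant = first_order_smoothness(constant)
smooth_checker = first_order_smoothness(checker)
```

A constant flow field incurs no penalty, while a checkerboard of opposing vectors is penalized at every neighboring pair, which is how such a term discourages the noisy solutions a photometric loss alone can admit.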

Referee Report

2 major / 2 minor

Summary. The manuscript proposes PCLNet, an unsupervised optical flow framework that employs a pyramid Convolution LSTM (ConvLSTM) architecture trained solely under an adjacent-frame photometric reconstruction constraint. It claims this enables flexible multi-frame flow estimation from arbitrary video clips, decouples motion feature learning from flow representation to avoid shortcut connections, allows direct use of generic CNN features, and yields effective optical flow estimates plus comparable action-recognition performance.

Significance. If the reconstruction constraint proves sufficient to recover accurate motion without ground truth or explicit regularizers, the decoupling mechanism and generic-CNN compatibility would constitute a practical advance for embedding flow estimation into downstream video tasks. The unsupervised multi-frame capability is a potential strength relative to synthetic-supervised baselines.

major comments (2)

[§3] §3 (method): the loss is described as relying on adjacent-frame reconstruction alone; no forward-backward consistency, explicit occlusion mask, or smoothness term is referenced. Standard photometric losses are known to admit degenerate solutions under brightness-constancy violations, so the central claim that this constraint supplies unbiased supervision for accurate flow requires explicit justification or ablation.
[§4] §4 (experiments): the reported optical-flow and action-recognition numbers are presented without ablations that isolate the contribution of the pyramid ConvLSTM versus the reconstruction objective itself; if the network can minimize reconstruction error via non-motion solutions, the transfer performance claim is undermined.
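As an illustration of the forward-backward consistency check named in the first major comment, the 1-D sketch below marks a pixel as occluded when following the forward flow and then the backward flow does not return close to the starting point. Occluded pixels are then excluded from the reconstruction error. The threshold form and the default values of `alpha1` and `alpha2` are a commonly used heuristic and should be treated as assumptions; nothing here is taken from PCLNet.

```python
def occlusion_mask_1d(fwd, bwd, alpha1=0.01, alpha2=0.5):
    """Return True where forward and backward flows disagree (likely occlusion)."""
    n = len(fwd)
    mask = []
    for i in range(n):
        j = i + fwd[i]
        if j < 0 or j >= n:
            mask.append(True)  # the target falls outside the frame
            continue
        residual = (fwd[i] + bwd[j]) ** 2
        bound = alpha1 * (fwd[i] ** 2 + bwd[j] ** 2) + alpha2
        mask.append(residual > bound)
    return mask


def masked_mean(errors, occluded):
    """Average per-pixel errors over non-occluded pixels only."""
    kept = [e for e, o in zip(errors, occluded) if not o]
    return sum(kept) / len(kept) if kept else 0.0


# Hypothetical integer flows on a 6-sample signal.
fwd = [1, 1, 1, 1, 1, 1]
bwd = [-1, -1, -1, -1, 0, -1]  # the backward flow at index 4 is inconsistent

mask = occlusion_mask_1d(fwd, bwd)
errors = [0.1, 0.1, 0.1, 5.0, 0.1, 7.0]
loss = masked_mean(errors, mask)
```

In this toy case the two large errors sit exactly at the masked positions, so the masked loss reflects only pixels where reconstruction is meaningful.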

minor comments (2)

Notation for the pyramid levels and ConvLSTM hidden states should be defined once in a single table or equation block rather than re-introduced inline.
[Abstract] The abstract states 'comparable performance on action recognition' without naming the baseline methods or datasets; this should be expanded with concrete numbers in the introduction.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the two major comments below and will revise the manuscript accordingly to strengthen the justification and experimental analysis.

Referee: [§3] §3 (method): the loss is described as relying on adjacent-frame reconstruction alone; no forward-backward consistency, explicit occlusion mask, or smoothness term is referenced. Standard photometric losses are known to admit degenerate solutions under brightness-constancy violations, so the central claim that this constraint supplies unbiased supervision for accurate flow requires explicit justification or ablation.

Authors: We agree that the current §3 description would benefit from expanded justification. The pyramid ConvLSTM structure and explicit decoupling of motion feature learning from flow representation are intended to prevent shortcut solutions by forcing the network to capture temporal motion dynamics rather than static appearance cues. In the revision we will add a paragraph in §3 discussing this mechanism, citing related unsupervised flow works that rely primarily on photometric reconstruction, and include a targeted ablation comparing performance with and without the ConvLSTM component under the same reconstruction loss.

revision: yes

Referee: [§4] §4 (experiments): the reported optical-flow and action-recognition numbers are presented without ablations that isolate the contribution of the pyramid ConvLSTM versus the reconstruction objective itself; if the network can minimize reconstruction error via non-motion solutions, the transfer performance claim is undermined.

Authors: We acknowledge the absence of isolating ablations in the current experiments. To directly address the concern that reconstruction error could be minimized without learning motion, the revised manuscript will add ablation studies in §4 that compare (i) the full PCLNet, (ii) a version without the pyramid ConvLSTM, and (iii) variants using only generic CNN features without temporal modeling, all under the identical reconstruction objective. These results will be used to support the claim that the observed flow accuracy and downstream action-recognition performance stem from the motion-feature decoupling rather than non-motion shortcuts.

revision: yes

Circularity Check

0 steps flagged

Derivation self-contained with independent architecture and loss; no circular reductions

The paper proposes PCLNet as a new unsupervised framework that applies pyramid ConvLSTM to multi-frame optical flow estimation under an adjacent-frame reconstruction constraint, while decoupling motion features from flow representation to avoid shortcut connections. No equations, training procedures, or claims in the provided text reduce a prediction or central result to a fitted parameter, self-citation chain, or definitional tautology. The reconstruction constraint is presented as the supervision source without any indication that performance metrics are forced by construction from the same inputs. Self-citations, if present in the full text, are not load-bearing for the core novelty. This matches the default expectation of an independent proposal.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Review based solely on abstract; full paper text unavailable so ledger is necessarily incomplete and conservative.

axioms (1)

domain assumption: Adjacent-frame reconstruction error supplies an adequate supervision signal for learning accurate optical flow
Central training constraint stated in the abstract as the basis for unsupervised learning.

Reference graph

Works this paper leans on

24 extracted references · 24 canonical work pages · 2 internal anchors

[1]

coarse-to-fine

INTRODUCTION As a key problem in video analysis, optical flow estimation is widely used in lots of fields such as visual SLAM, autonomous driving, action recognition, etc. Traditional optical flow estimation methods (e.g. TVL-1 [1]) treat the estimation problem as an energy function minimization problem, making strong assumptions on the pixel-level inform...

[2]

Correlation

RELATED WORK Traditional methods for flow estimation are mainly based on variational approach. The most representative one is the method proposed by Horn and Schunck [2]. It estimates optical flow by minimizing an energy function with some photometry assumptions such as brightness consistency and spatial smoothness. However, these assumptions could no...

[3]

“W” denotes “inverse warp”

APPROACH Our framework mainly consists of three modules: the generic CNN that is used for appearance feature extraction, the motion concentration module that learns multi-scale motion representation and the optical flow reconstruction module that estimates optical flows from the motion features (see Figure 1). We use the generic ResNet18 [11] as our featur...

[4]

couple connection

EXPERIMENTS 4.1. Datasets Datasets without groundtruth. We investigate the performance of optical flow estimation on two real-world action recognition datasets: UCF101 [15] and HMDB51 [16]. UCF101 consists of 101 action categories and 13,320 videos. HMDB51 contains 6766 video clips from 51 action classes. Datasets with groundtruth. We perform experimen...

[5]

By utilizing reconstruction constraint as supervision, our framework is able to efficiently learn optical flow on real-world videos without groundtruth

CONCLUSIONS In this paper, we present a novel end-to-end trainable framework for optical flow estimation. By utilizing reconstruction constraint as supervision, our framework is able to efficiently learn optical flow on real-world videos without groundtruth. In addition, we decouple motion feature learning and optical flow reconstruction by applying ConvLST...
[6]

A duality based approach for realtime tv-l1 optical flow,

C. Zach, T. Pock, and H. Bischof, “A duality based approach for realtime tv-l1 optical flow,” in Pattern Recognition. 2007, pp. 214–223, Springer Berlin Heidelberg

[7]

Determining optical flow,

B.K. Horn and B.G. Schunck, “Determining optical flow,” Artificial intelligence, pp. 185–203, 1981

[8]

Flownet: Learning optical flow with convolutional networks,

A. Dosovitskiy, P. Fischer, E. Ilg, P. Hausser, C. Hazirbas, V. Golkov, P. Van Der Smagt, D. Cremers, and T. Brox, “Flownet: Learning optical flow with convolutional networks,” in Proceedings of the IEEE international conference on computer vision, 2015

[9]

Flownet 2.0: Evolution of optical flow estimation with deep networks,

E. Ilg, N. Mayer, T. Saikia, M. Keuper, A. Dosovitskiy, and T. Brox, “Flownet 2.0: Evolution of optical flow estimation with deep networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017

[10]

PWC-Net: CNNs for optical flow using pyramid, warping, and cost volume,

D. Sun, X. Yang, M.Y. Liu, and J. Kautz, “PWC-Net: CNNs for optical flow using pyramid, warping, and cost volume,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018

[11]

Back to basics: Unsupervised learning of optical flow via brightness constancy and motion smoothness,

J.Y. Jason, A.W. Harley, and K.G. Derpanis, “Back to basics: Unsupervised learning of optical flow via brightness constancy and motion smoothness,” in ECCV 2016 Workshops, Part 3, 2016

[12]

Hidden Two-Stream Convolutional Networks for Action Recognition

Y. Zhu, Z. Lan, S. Newsam, and A.G. Hauptmann, “Hidden Two-Stream Convolutional Networks for Action Recognition,” arXiv preprint arXiv:1704.00389, 2017

[13]

The robust estimation of multiple motions: Parametric and piecewise-smooth flow fields,

M.J. Black and P. Anandan, “The robust estimation of multiple motions: Parametric and piecewise-smooth flow fields,” Computer vision and image understanding, vol. 63, no. 1, pp. 75–104, 1996

[14]

U-net: Convolutional networks for biomedical image segmentation,

O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in MICCAI. Springer, 2015, pp. 234–241

[15]

Optical flow estimation using a spatial pyramid network,

A. Ranjan and M.J. Black, “Optical flow estimation using a spatial pyramid network,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017

[16]

Deep residual learning for image recognition,

K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016

[17]

Spatial pyramid pooling in deep convolutional networks for visual recognition,

K. He, X. Zhang, S. Ren, and J. Sun, “Spatial pyramid pooling in deep convolutional networks for visual recognition,” in IEEE transactions on pattern analysis and machine intelligence, 2014

[18]

Spatial transformer networks,

M. Jaderberg, K. Simonyan, and A. Zisserman, “Spatial transformer networks,” in Advances in neural information processing systems, 2015, pp. 2017–2025

[19]

Image quality assessment: from error visibility to structural similarity,

Zhou Wang, Alan C Bovik, Hamid R Sheikh, and Eero P Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE transactions on image processing, vol. 13, no. 4, pp. 600–612, 2004

[20]

UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild

K. Soomro, A.R. Zamir, and M. Shah, “Ucf101: A dataset of 101 human actions classes from videos in the wild,” arXiv preprint arXiv:1212.0402, 2012

[21]

Hmdb: a large video database for human motion recognition,

H. Kuehne, H. Jhuang, E. Garrote, T. Poggio, and T. Serre, “Hmdb: a large video database for human motion recognition,” in Proceedings of the IEEE international conference on computer vision, 2011

[22]

A naturalistic open source movie for optical flow evaluation,

D.J. Butler, J. Wulff, G.B. Stanley, and M.J. Black, “A naturalistic open source movie for optical flow evaluation,” in European Conf. on Computer Vision, 2012

[23]

Fast optical flow using dense inverse search,

T. Kroeger, R. Timofte, D. Dai, and L. Van Gool, “Fast optical flow using dense inverse search,” in European Conference on Computer Vision, 2016

[24]

Deepflow: Large displacement optical flow with deep matching,

P. Weinzaepfel, J. Revaud, Z. Harchaoui, and C. Schmid, “Deepflow: Large displacement optical flow with deep matching,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2013
