DIMOS: Disentangling Instance-level Moving Object Segmentation
Pith reviewed 2026-06-27 07:50 UTC · model grok-4.3
The pith
A dual-disentangling framework separates appearance and motion in image and event modalities to improve segmentation of small moving instances.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Separating appearance and motion information within both image and event modalities, followed by distributionally and semantically consistent cross-modal alignment at multiple granularities, yields fused features that enable state-of-the-art moving instance segmentation performance, especially for small instances under fast motion and low-light conditions.
What carries the argument
The dual-disentangling feature extraction framework that isolates appearance from motion in each modality, combined with multi-granularity cross-modal alignment for fusion.
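To make the idea of isolating appearance from motion concrete, here is a minimal toy sketch. It is not the paper's extraction module, which the abstract does not specify. It assumes, purely for illustration, that "appearance" can be approximated by the per-pixel temporal mean of a short frame stack and "motion" by the residual of each frame from that mean. The function name `split_appearance_motion` is hypothetical.

```python
from typing import List, Tuple

Frame = List[List[float]]  # H x W grid of intensities


def split_appearance_motion(frames: List[Frame]) -> Tuple[Frame, List[Frame]]:
    """Toy split (an assumption, not the paper's operator):
    appearance = per-pixel temporal mean, motion = per-frame residual."""
    n = len(frames)
    h, w = len(frames[0]), len(frames[0][0])
    appearance = [
        [sum(f[y][x] for f in frames) / n for x in range(w)] for y in range(h)
    ]
    motion = [
        [[f[y][x] - appearance[y][x] for x in range(w)] for y in range(h)]
        for f in frames
    ]
    return appearance, motion


# A 1x3 scene: two static pixels and one pixel that brightens between frames.
frames = [[[0.5, 0.2, 0.0]], [[0.5, 0.2, 1.0]]]
appearance, motion = split_appearance_motion(frames)
# Static pixels end up with zero motion residual; only the third pixel moves.
```

In this toy, appearance plus motion reconstructs every frame exactly, so nothing is lost by the split. Whether a learned separation keeps that property is the question raised under the load-bearing premise.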
If this is right
- Small moving objects become detectable in sparse event streams when motion cues are isolated from appearance.
- Cross-modal fusion gains reliability once features are aligned both statistically and semantically at multiple scales.
- Performance holds under fast motion and low light because motion information is no longer diluted by appearance entanglement.
- The approach extends directly to traffic surveillance and autonomous driving where small distant objects matter.
Where Pith is reading between the lines
- The same separation step could be tested on other event-plus-image tasks such as optical flow or depth estimation.
- If the disentangled features prove more compact, downstream models might run at lower compute cost without accuracy loss.
- Extending the alignment to additional modalities like lidar would check whether the framework generalizes beyond two sensors.
Load-bearing premise
Separating appearance and motion inside each modality produces denser features that remain complete enough for effective cross-modal fusion.
What would settle it
A controlled comparison in which entangled event and image features achieve equal or higher segmentation accuracy on small instances than the disentangled versions.
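One way such a comparison could be scored is sketched below: mean mask IoU restricted to small ground-truth instances, computed for two model variants. The masks are made-up toy data, not results from the paper. The names `mask_iou`, `mean_iou_small`, and the `max_area` cutoff are hypothetical. COCO-style evaluation treats instances under 32×32 pixels as small, but the toy below uses a much smaller cutoff.

```python
from typing import List, Set, Tuple

Pixel = Tuple[int, int]
Mask = Set[Pixel]


def mask_iou(pred: Mask, gt: Mask) -> float:
    """Intersection over union of two pixel sets."""
    union = len(pred | gt)
    return len(pred & gt) / union if union else 0.0


def mean_iou_small(pairs: List[Tuple[Mask, Mask]], max_area: int) -> float:
    """Mean IoU over (prediction, ground truth) pairs whose ground-truth
    area is below max_area pixels; NaN if no instance qualifies."""
    ious = [mask_iou(p, g) for p, g in pairs if len(g) < max_area]
    return sum(ious) / len(ious) if ious else float("nan")


# Toy data: one 4-pixel instance and one 100-pixel instance.
gt_small = {(0, 0), (0, 1), (1, 0), (1, 1)}
gt_large = {(x, y) for x in range(10) for y in range(10)}

variant_a = [({(0, 0), (0, 1)}, gt_small), (gt_large, gt_large)]
variant_b = [({(0, 0), (0, 1), (1, 0)}, gt_small), (gt_large, gt_large)]

score_a = mean_iou_small(variant_a, max_area=10)  # 0.5
score_b = mean_iou_small(variant_b, max_area=10)  # 0.75
```

Restricting the average to small instances matters because a perfect score on the large instance would otherwise mask the difference between the two variants.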
Original abstract
Moving instance segmentation (MIS) attracts increasing attention due to its broad applications in traffic surveillance, autonomous driving, and animal tracking. Event cameras record asynchronous brightness changes, providing high temporal resolution and dynamic range, which makes them highly sensitive to motion information. By fusing event and image features, motion cues from events can complement spatial details from images, enhancing the performance of MIS. However, current multimodal MIS methods still struggle to segment small moving instances, as event cameras often yield sparse features under limited resolution. Moreover, event features entangle appearance attributes with motion cues, which further restricts effective cross-modal fusion. To address these challenges, we first propose a dual-disentangling feature extraction framework that separates and extracts appearance and motion information within both image and event modalities, thereby improving feature density. Subsequently, a multi-granularity cross-modal alignment is introduced to align distributionally and semantically consistent features across modalities, enabling more effective fusion with rich spatial and temporal details. The experiment results demonstrate that our method achieves state-of-the-art performance in multimodal MIS, especially for small instances under challenging conditions such as fast motion and low-light settings.
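The sparsity problem described in the abstract can be illustrated with a common event representation: accumulating signed events into a frame over a time window. This is a generic sketch, not the paper's input encoding. The `(x, y, t, polarity)` tuple layout and the helper names `accumulate_events` and `density` are assumptions made for illustration.

```python
from typing import Iterable, List, Tuple

# (x, y, timestamp in seconds, polarity in {+1, -1}); layout is an assumption.
Event = Tuple[int, int, float, int]


def accumulate_events(
    events: Iterable[Event], width: int, height: int, t_start: float, t_end: float
) -> List[List[int]]:
    """Sum event polarities per pixel for events with t_start <= t < t_end."""
    frame = [[0] * width for _ in range(height)]
    for x, y, t, p in events:
        if t_start <= t < t_end and 0 <= x < width and 0 <= y < height:
            frame[y][x] += p
    return frame


def density(frame: List[List[int]]) -> float:
    """Fraction of pixels that received a non-zero net event count."""
    nonzero = sum(1 for row in frame for v in row if v != 0)
    return nonzero / (len(frame) * len(frame[0]))


events = [(0, 0, 0.001, 1), (0, 0, 0.002, 1), (2, 1, 0.003, -1), (1, 1, 0.020, 1)]
frame = accumulate_events(events, width=3, height=2, t_start=0.0, t_end=0.010)
# frame is [[2, 0, 0], [0, 0, -1]]; the event at t = 0.020 falls outside the window.
fill = density(frame)  # 2 of 6 pixels are non-zero
```

A small moving object touches only a handful of pixels in such a frame, which is the low feature density the abstract refers to.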
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes DIMOS, a dual-disentangling feature extraction framework that separates appearance and motion cues within both image and event modalities to improve feature density for moving instance segmentation (MIS), followed by a multi-granularity cross-modal alignment module to enable effective fusion. It claims state-of-the-art performance on multimodal MIS, with particular gains for small instances under fast motion and low-light conditions.
Significance. If the performance claims hold after proper validation, the dual-disentangling strategy could meaningfully advance multimodal MIS by addressing sparsity and entanglement issues in event data, offering denser features for small-object cases that are critical in applications such as autonomous driving and surveillance.
Major comments (2)
- [Abstract] Abstract: The central claim that 'the experiment results demonstrate that our method achieves state-of-the-art performance' is unsupported; the manuscript supplies no experimental protocol, metrics, baselines, datasets, ablation studies, or quantitative results, making the SOTA assertion unverifiable and load-bearing for the paper's contribution.
- [Abstract] Abstract (paragraph on dual-disentangling framework): No explicit argument, mathematical definition, or empirical verification is provided that the appearance/motion separation operators preserve low-amplitude motion signals and fine appearance gradients required for small-instance segmentation; if the disentangling implicitly thresholds or projects away such cues, downstream fusion cannot recover them, directly undermining the headline gains under challenging conditions.
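The second comment's concern can be shown with a few lines of arithmetic. The sketch below assumes, hypothetically, that a separation step applies a hard magnitude threshold to a motion signal. The abstract does not say that it does, and `hard_threshold` and the cutoff `tau` are illustrative names only. Once weak responses are set to zero, no later gain or fusion step can bring them back.

```python
from typing import List


def hard_threshold(values: List[float], tau: float) -> List[float]:
    """Keep values whose magnitude is at least tau; zero out the rest."""
    return [v if abs(v) >= tau else 0.0 for v in values]


# Toy motion responses: two strong, two weak (e.g. from a small, distant object).
motion = [0.9, 0.05, -0.03, 0.6]
kept = hard_threshold(motion, tau=0.1)  # [0.9, 0.0, 0.0, 0.6]

# A downstream gain cannot recover what was zeroed.
amplified = [10 * v for v in kept]
discarded = sum(1 for m, k in zip(motion, kept) if m != 0.0 and k == 0.0)  # 2
```

An ablation of the kind the referee asks for would check whether the learned operators behave more like this threshold or more like a lossless split.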
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on the abstract. We address the two major comments point by point below, proposing revisions to strengthen verifiability and clarity where the current text falls short.
- Referee: [Abstract] Abstract: The central claim that 'the experiment results demonstrate that our method achieves state-of-the-art performance' is unsupported; the manuscript supplies no experimental protocol, metrics, baselines, datasets, ablation studies, or quantitative results, making the SOTA assertion unverifiable and load-bearing for the paper's contribution.
  Authors: The provided manuscript text consists solely of the abstract, which indeed contains no experimental protocol, metrics, baselines, datasets, ablation studies, or quantitative results. The SOTA claim therefore cannot be verified from the given text. We will revise the abstract to remove the unsupported claim or qualify it pending addition of a concise results summary (e.g., key mIoU/AP numbers and dataset names) in a revised version.
  Revision: yes
- Referee: [Abstract] Abstract (paragraph on dual-disentangling framework): No explicit argument, mathematical definition, or empirical verification is provided that the appearance/motion separation operators preserve low-amplitude motion signals and fine appearance gradients required for small-instance segmentation; if the disentangling implicitly thresholds or projects away such cues, downstream fusion cannot recover them, directly undermining the headline gains under challenging conditions.
  Authors: The provided manuscript text supplies no mathematical definitions, arguments, or empirical verification that the separation operators preserve low-amplitude signals and fine gradients. This is a substantive gap in the abstract. We will revise the abstract to include a brief description of the operators' design intent and will add an ablation study in the full revision to empirically demonstrate preservation of these cues for small instances.
  Revision: yes
Circularity Check
No circularity detected in derivation chain
The provided abstract and method outline describe an architectural proposal (dual-disentangling feature extraction plus multi-granularity alignment) whose core steps are presented as design choices rather than derived quantities. No equations, fitted parameters, or predictions are shown that reduce by construction to their own inputs. No self-citation chains or uniqueness theorems are invoked to justify the framework. The SOTA performance claim rests on experimental results, which remain externally falsifiable and independent of the framework definition. This is the normal case of a self-contained empirical method paper.