BIAS: A Biologically Inspired Algorithm for Video Saliency Detection
Recognition: 2 Lean theorem links
Pith reviewed 2026-05-10 18:19 UTC · model grok-4.3
The pith
BIAS adds a retina-inspired motion detector to the Itti-Koch framework and uses greedy multi-Gaussian fitting to produce fast saliency maps for video.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
BIAS detects salient regions with millisecond-scale latency and outperforms heuristic-based approaches and several deep-learning models on the DHF1K dataset, particularly in videos dominated by bottom-up attention. Applied to traffic accident analysis, BIAS demonstrates strong real-world utility, achieving state-of-the-art performance in cause-effect recognition and anticipating accidents up to 0.72 seconds before manual annotation with reliable accuracy.
What carries the argument
Retina-inspired motion detector that extracts temporal features, paired with a greedy multi-Gaussian peak-fitting algorithm that identifies foci of attention while balancing winner-take-all competition and information maximization.
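The greedy peak-fitting step admits a compact sketch: take the winner-take-all maximum of the saliency map, record it as an FOA, then subtract a Gaussian fitted at that peak so later picks are driven toward still-unexplained salience. This is an illustrative reconstruction, not the paper's implementation; `n_foas` and `sigma` are placeholder parameters.

```python
import numpy as np

def greedy_gaussian_foas(saliency, n_foas=3, sigma=5.0):
    """Greedily extract foci of attention (FOAs) from a saliency map.

    Each step the global maximum wins (winner-take-all); a Gaussian
    centred there is subtracted from the residual, pushing the next
    pick toward unexplained salience (information-maximization flavour).
    """
    residual = saliency.astype(float).copy()
    h, w = residual.shape
    yy, xx = np.mgrid[0:h, 0:w]
    foas = []
    for _ in range(n_foas):
        y, x = np.unravel_index(np.argmax(residual), residual.shape)
        amp = residual[y, x]
        if amp <= 0:
            break
        foas.append((int(y), int(x), float(amp)))
        # Subtract the fitted Gaussian peak, clipping at zero.
        g = amp * np.exp(-((yy - y) ** 2 + (xx - x) ** 2) / (2 * sigma ** 2))
        residual = np.maximum(residual - g, 0.0)
    return foas
```

On a map with two bumps, the first FOA lands on the stronger bump and the subtraction exposes the weaker one on the next iteration.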
Load-bearing premise
The retina-inspired motion detector and greedy peak-fitting step together produce saliency maps that capture human attention patterns and generalize beyond the DHF1K and traffic-video test sets.
What would settle it
Evaluating BIAS on a fresh collection of videos dominated by top-down attention cues and finding that its saliency maps or accident-anticipation accuracy fall below the deep-learning baselines it beat on DHF1K.
Original abstract
We present BIAS, a fast, biologically inspired model for dynamic visual saliency detection in continuous video streams. Building on the Itti--Koch framework, BIAS incorporates a retina-inspired motion detector to extract temporal features, enabling the generation of saliency maps that integrate both static and motion information. Foci of attention (FOAs) are identified using a greedy multi-Gaussian peak-fitting algorithm that balances winner-take-all competition with information maximization. BIAS detects salient regions with millisecond-scale latency and outperforms heuristic-based approaches and several deep-learning models on the DHF1K dataset, particularly in videos dominated by bottom-up attention. Applied to traffic accident analysis, BIAS demonstrates strong real-world utility, achieving state-of-the-art performance in cause-effect recognition and anticipating accidents up to 0.72 seconds before manual annotation with reliable accuracy. Overall, BIAS bridges biological plausibility and computational efficiency to achieve interpretable, high-speed dynamic saliency detection.
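The retina-inspired motion detector is, per the paper's own notation elsewhere (a Hassenstein–Reichardt detector with M(σ, t, D, τ) = ReLU(...)), a delay-and-correlate circuit. A minimal 1-D sketch follows; the spatial offset `d` and delay `tau` are assumed placeholders, not the paper's actual parameters.

```python
import numpy as np

def hr_motion_1d(frames, d=1, tau=1):
    """Minimal 1-D Hassenstein-Reichardt correlator.

    frames: array of shape (T, N), intensity over time at N positions.
    Correlates each position with its neighbour d pixels away delayed
    by tau frames, subtracts the mirror-symmetric term, and rectifies,
    so output is positive only for motion in the preferred direction.
    """
    frames = np.asarray(frames, dtype=float)
    a_now = frames[tau:, :-d]   # signal at position x, time t
    b_now = frames[tau:, d:]    # signal at position x+d, time t
    a_del = frames[:-tau, :-d]  # delayed signal at x
    b_del = frames[:-tau, d:]   # delayed signal at x+d
    response = a_del * b_now - b_del * a_now
    return np.maximum(response, 0.0)  # ReLU rectification
```

A bright pixel sweeping rightward excites the detector; the same sweep leftward cancels and is clipped to zero.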
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents BIAS, a biologically inspired algorithm for dynamic visual saliency detection in continuous video streams. It extends the Itti-Koch framework with a retina-inspired motion detector for temporal features and employs a greedy multi-Gaussian peak-fitting algorithm to identify foci of attention (FOAs), balancing winner-take-all competition with information maximization. The model is claimed to achieve millisecond-scale latency, outperform heuristic-based and several deep-learning approaches on the DHF1K dataset (especially bottom-up attention videos), and deliver state-of-the-art results in traffic accident cause-effect recognition while anticipating accidents up to 0.72 seconds before manual annotation.
Significance. If the performance claims hold with proper validation, BIAS could provide an efficient, interpretable, and low-latency alternative to deep models for video saliency, potentially useful for real-time applications such as traffic monitoring. The hybrid biological-computational approach is a strength if the retina-inspired components are shown to contribute measurably beyond standard motion detectors.
Major comments (4)
- Methods section on retina-inspired motion detector: no quantitative validation against retinal recordings or physiological data is provided to establish the fidelity of the temporal feature extraction; without this, the biological inspiration claim cannot be assessed as load-bearing for the reported performance gains.
- Results section on DHF1K dataset: outperformance is asserted over heuristics and deep models but no ablation studies removing the motion detector or biological components are described, leaving it unclear whether the gains derive from the retina-inspired elements rather than the peak-fitting or other factors.
- Results section on traffic accident analysis: the 0.72-second anticipation claim and SOTA cause-effect recognition require explicit dataset details, exact evaluation metrics, baseline comparisons, and statistical significance tests; absent these, the real-world utility assertion lacks direct support.
- Evaluation methodology: no held-out video domains or cross-dataset tests beyond DHF1K and traffic videos are mentioned, raising the risk that the greedy multi-Gaussian fitting overfits to dataset-specific motion statistics rather than generalizing.
Minor comments (2)
- Abstract: specific performance metrics (e.g., AUC, NSS, sAUC) and the exact deep-learning models compared are not listed, reducing clarity of the outperformance claim.
- Figure captions and pseudocode: the peak-fitting algorithm would benefit from an explicit equation or algorithm box to clarify the information-maximization term.
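Of the metrics the first minor comment names, NSS (Normalized Scanpath Saliency) is the simplest to state: z-score the predicted map, then average it at human fixation points. A minimal sketch, with fixations assumed as (row, col) pairs and no tie to the paper's evaluation code:

```python
import numpy as np

def nss(saliency, fixations):
    """Normalized Scanpath Saliency: z-score the map, then average it
    at human fixation locations. Higher is better; 0 is chance level."""
    s = np.asarray(saliency, dtype=float)
    s = (s - s.mean()) / (s.std() + 1e-12)
    ys, xs = zip(*fixations)
    return float(s[list(ys), list(xs)].mean())
```

A map that is high exactly at the fixated pixel scores well above zero; fixating a flat region scores below zero.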
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment point by point below, providing clarifications from the manuscript and indicating where revisions will be made to strengthen the paper.
Point-by-point responses
-
Referee: Methods section on retina-inspired motion detector: no quantitative validation against retinal recordings or physiological data is provided to establish the fidelity of the temporal feature extraction; without this, the biological inspiration claim cannot be assessed as load-bearing for the reported performance gains.
Authors: The retina-inspired motion detector is constructed from established models of retinal ganglion cell responses and direction selectivity, as detailed in the methods with supporting citations to physiological literature. We do not claim an exact replica of biological recordings but rather a computationally efficient abstraction that captures key temporal dynamics. To address the concern, we will revise the methods section to include an expanded discussion mapping each component to specific retinal mechanisms and known physiological properties. New direct quantitative validation against raw retinal data would require dedicated physiological experiments outside the scope of this computational modeling paper. revision: partial
-
Referee: Results section on DHF1K dataset: outperformance is asserted over heuristics and deep models but no ablation studies removing the motion detector or biological components are described, leaving it unclear whether the gains derive from the retina-inspired elements rather than the peak-fitting or other factors.
Authors: We agree that ablation studies are necessary to isolate contributions. The revised manuscript will include new ablation experiments on DHF1K: one removing the retina-inspired motion detector (replaced by standard frame differencing), one replacing it with conventional optical flow, and one ablating the multi-Gaussian fitting in favor of standard WTA. These results will quantify the incremental benefit of the biological components, particularly on bottom-up attention videos. revision: yes
-
Referee: Results section on traffic accident analysis: the 0.72-second anticipation claim and SOTA cause-effect recognition require explicit dataset details, exact evaluation metrics, baseline comparisons, and statistical significance tests; absent these, the real-world utility assertion lacks direct support.
Authors: The traffic accident experiments use a standard annotated traffic video dataset with cause-effect labels. The 0.72 s figure is the mean anticipation interval at which saliency-based prediction accuracy remains above a fixed threshold prior to annotated accident onset. In revision we will expand the section with: complete dataset statistics and source, precise metric definitions (anticipation time, cause-effect accuracy), full list of baselines with their scores, and statistical significance results (e.g., paired t-tests and confidence intervals). These additions will directly support the reported utility. revision: yes
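Read this way, the anticipation interval has a direct operational sketch: the earliest pre-onset frame from which the per-frame accident score stays above the threshold, converted to seconds. The function below is a hypothetical reading of that definition; `scores`, `threshold`, and `fps` are illustrative, not the paper's.

```python
def anticipation_time(scores, onset, threshold, fps):
    """Earliest time before the annotated onset from which the per-frame
    accident score stays above `threshold` through the onset, in seconds.
    Returns 0.0 if the score is below threshold at the onset itself."""
    first = onset
    for t in range(onset, -1, -1):  # walk backward from onset
        if scores[t] >= threshold:
            first = t
        else:
            break  # streak broken; earlier frames do not count
    return (onset - first) / fps
```

For example, a score trace that crosses the threshold two frames before a 10 fps onset yields 0.2 s of anticipation.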
-
Referee: Evaluation methodology: no held-out video domains or cross-dataset tests beyond DHF1K and traffic videos are mentioned, raising the risk that the greedy multi-Gaussian fitting overfits to dataset-specific motion statistics rather than generalizing.
Authors: DHF1K already spans diverse motion and scene statistics, and the traffic videos constitute an independent real-world domain. To further demonstrate generalization, the revision will add an internal cross-validation protocol on DHF1K (holding out video subsets stratified by motion intensity and scene type) and report the resulting variance. We will also discuss the low-parameter nature of the greedy fitting procedure, which reduces overfitting risk compared with learned models. revision: partial
Circularity Check
No circularity: BIAS is an independently defined algorithm
Full rationale
The paper constructs BIAS by extending the standard Itti-Koch saliency framework with an explicitly described retina-inspired motion detector and a greedy multi-Gaussian peak-fitting procedure for FOAs. These components are introduced as design choices motivated by biology and information-maximization principles rather than derived from or fitted to the DHF1K or traffic-accident evaluation data. No equation reduces to a parameter estimated on the same test sets, no self-citation supplies a uniqueness theorem or ansatz, and performance results are presented as empirical outcomes of the algorithm rather than predictions forced by construction. The derivation chain therefore remains self-contained.
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Unclear: the relation between the paper passage and the cited Recognition theorem.
"BIAS incorporates a retina-inspired motion detector to extract temporal features... Foci of attention (FOAs) are identified using a greedy multi-Gaussian peak-fitting algorithm"
-
IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Unclear: the relation between the paper passage and the cited Recognition theorem.
"Motion saliency detection... Hassenstein–Reichardt detector... M(σ, t, D, τ) = ReLU(...)"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.