pith. machine review for the scientific record.

arxiv: 2604.08858 · v1 · submitted 2026-04-10 · 💻 cs.CV

Recognition: 2 theorem links


BIAS: A Biologically Inspired Algorithm for Video Saliency Detection

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 18:19 UTC · model grok-4.3

classification 💻 cs.CV
keywords video saliency detection · biologically inspired · motion detector · Itti-Koch framework · saliency maps · traffic accident anticipation · dynamic attention · foci of attention

The pith

BIAS adds a retina-inspired motion detector to the Itti-Koch framework and uses greedy multi-Gaussian fitting to produce fast saliency maps for video.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper builds a dynamic saliency detector called BIAS that adds temporal motion features drawn from retinal processing to a classic attention model. It locates attention foci by fitting multiple Gaussian peaks in a greedy way that trades off competition and coverage. This combination runs at millisecond latency and produces saliency maps that beat several deep-learning systems on the DHF1K benchmark when attention is driven mainly by motion. The same maps are then shown to support early recognition of traffic accidents, reaching state-of-the-art cause-effect labeling and flagging incidents up to 0.72 seconds before human annotators mark them.
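
As a rough illustration of the combination step described above, the sketch below folds a motion conspicuity map into an Itti-Koch-style master saliency map by normalizing and summing channels. The normalization operator, the equal channel weighting, and the light smoothing are assumptions made for illustration; the paper's actual channels and weights are not specified on this page.

```python
# Minimal sketch, assuming simple max-normalization and equal channel weights.
import numpy as np
from scipy.ndimage import gaussian_filter

def normalize_map(m, eps=1e-8):
    """Rescale a conspicuity map to [0, 1]; a stand-in for Itti-Koch's N(.) operator."""
    m = m - m.min()
    return m / (m.max() + eps)

def master_saliency(static_maps, motion_map, motion_weight=1.0, sigma=2.0):
    """Average the normalized static channels, add the motion channel, and smooth lightly."""
    static = np.mean([normalize_map(m) for m in static_maps], axis=0)
    combined = static + motion_weight * normalize_map(motion_map)
    return normalize_map(gaussian_filter(combined, sigma))
```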

Core claim

BIAS detects salient regions with millisecond-scale latency and outperforms heuristic-based approaches and several deep-learning models on the DHF1K dataset, particularly in videos dominated by bottom-up attention. Applied to traffic accident analysis, BIAS demonstrates strong real-world utility, achieving state-of-the-art performance in cause-effect recognition and anticipating accidents up to 0.72 seconds before manual annotation with reliable accuracy.

What carries the argument

Retina-inspired motion detector that extracts temporal features, paired with a greedy multi-Gaussian peak-fitting algorithm that identifies foci of attention while balancing winner-take-all competition and information maximization.
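
The greedy peak-fitting step is not spelled out on this page, so the following is one plausible reading rather than the paper's algorithm: repeatedly take the winner-take-all maximum of a residual map, explain it with a Gaussian component, and stop once a new component accounts for too little of the map's total mass (a crude stand-in for the information-maximization side of the trade-off). The fixed sigma, the component cap, and the stopping threshold are assumptions.

```python
import numpy as np

def greedy_gaussian_foci(saliency, sigma=12.0, max_foci=5, min_gain=0.05):
    """Greedy multi-Gaussian peak fitting (sketch).

    Pick the residual maximum (winner-take-all), subtract a Gaussian bump fitted
    to its amplitude, and stop when the explained fraction of the map's total
    mass falls below `min_gain`.
    """
    h, w = saliency.shape
    yy, xx = np.mgrid[0:h, 0:w]
    residual = saliency.astype(float)
    total = residual.sum() + 1e-8
    foci = []
    for _ in range(max_foci):
        y, x = np.unravel_index(np.argmax(residual), residual.shape)
        amp = residual[y, x]
        bump = amp * np.exp(-((yy - y) ** 2 + (xx - x) ** 2) / (2 * sigma ** 2))
        bump = np.minimum(bump, residual)   # never over-explain the residual
        gain = bump.sum() / total
        if gain < min_gain:
            break
        foci.append((x, y, amp, gain))
        residual = residual - bump
    return foci
```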

Load-bearing premise

The retina-inspired motion detector and greedy peak-fitting step together produce saliency maps that capture human attention patterns and generalize beyond the DHF1K and traffic-video test sets.

What would settle it

Evaluating BIAS on a fresh collection of videos dominated by top-down attention cues and finding that its saliency maps or accident-anticipation accuracy fall below the deep-learning baselines it beat on DHF1K.
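
Settling it would come down to per-video saliency scores. A minimal sketch of two standard metrics (linear correlation coefficient and normalized scanpath saliency) follows, assuming continuous ground-truth maps for CC and binary fixation maps for NSS; the exact metrics and baselines the paper reports are not listed on this page.

```python
import numpy as np

def cc(pred, gt):
    """Linear correlation coefficient between predicted and ground-truth saliency maps."""
    p = (pred - pred.mean()) / (pred.std() + 1e-8)
    g = (gt - gt.mean()) / (gt.std() + 1e-8)
    return float((p * g).mean())

def nss(pred, fixations):
    """Normalized scanpath saliency: mean z-scored prediction at fixated pixels."""
    z = (pred - pred.mean()) / (pred.std() + 1e-8)
    return float(z[fixations > 0].mean())

def video_score(pred_maps, gt_maps, metric=cc):
    """Average a metric over the frames of one video (names here are assumptions)."""
    return float(np.mean([metric(p, g) for p, g in zip(pred_maps, gt_maps)]))
```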

Figures

Figures reproduced from arXiv: 2604.08858 by Ya-tang Li, Zhao-ji Zhang.

Figure 1: (a) General architecture of BIAS. (b) Comparison of center–surround …
Figure 2: Predicted saliency maps on an example video clip from the …
Figure 3: Comparison of performance and runtime between BIAS and other …
Figure 4: (a) Correlation coefficients for different center–delta pairs. (b) Per …
Figure 5: (a) From left to right: original frames, human fixation ground truth, …
Figure 6: The input is a sequence of predicted saliency maps. SPARK-ResNet …
Figure 7: Comparison of predicted times across different models. (a) Predicted …
read the original abstract

We present BIAS, a fast, biologically inspired model for dynamic visual saliency detection in continuous video streams. Building on the Itti--Koch framework, BIAS incorporates a retina-inspired motion detector to extract temporal features, enabling the generation of saliency maps that integrate both static and motion information. Foci of attention (FOAs) are identified using a greedy multi-Gaussian peak-fitting algorithm that balances winner-take-all competition with information maximization. BIAS detects salient regions with millisecond-scale latency and outperforms heuristic-based approaches and several deep-learning models on the DHF1K dataset, particularly in videos dominated by bottom-up attention. Applied to traffic accident analysis, BIAS demonstrates strong real-world utility, achieving state-of-the-art performance in cause-effect recognition and anticipating accidents up to 0.72 seconds before manual annotation with reliable accuracy. Overall, BIAS bridges biological plausibility and computational efficiency to achieve interpretable, high-speed dynamic saliency detection.
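
The abstract names a retina-inspired motion detector but gives no equations. A common retina-flavored abstraction, shown here only as a sketch, is a biphasic temporal filter applied per pixel (a transient, ganglion-cell-like response) followed by a center-surround difference of Gaussians; the kernel shape and all parameters below are illustrative assumptions, not the paper's detector.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def biphasic_kernel(length=9, tau_fast=1.5, tau_slow=3.0):
    """Difference of two exponentials: a crude biphasic (transient) impulse response."""
    t = np.arange(length, dtype=float)
    k = np.exp(-t / tau_fast) / tau_fast - np.exp(-t / tau_slow) / tau_slow
    return k / (np.abs(k).sum() + 1e-8)

def motion_map(frames, sigma_c=1.0, sigma_s=4.0):
    """Per-pixel temporal filtering, rectification, then a center-surround stage."""
    stack = np.stack(frames).astype(float)          # (T, H, W) grayscale frames
    k = biphasic_kernel()
    temporal = np.abs(np.apply_along_axis(
        lambda v: np.convolve(v, k, mode="same"), 0, stack))[-1]   # latest frame
    return np.abs(gaussian_filter(temporal, sigma_c) - gaussian_filter(temporal, sigma_s))
```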

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

4 major / 2 minor

Summary. The paper presents BIAS, a biologically inspired algorithm for dynamic visual saliency detection in continuous video streams. It extends the Itti-Koch framework with a retina-inspired motion detector for temporal features and employs a greedy multi-Gaussian peak-fitting algorithm to identify foci of attention (FOAs), balancing winner-take-all competition with information maximization. The model is claimed to achieve millisecond-scale latency, outperform heuristic-based and several deep-learning approaches on the DHF1K dataset (especially bottom-up attention videos), and deliver state-of-the-art results in traffic accident cause-effect recognition while anticipating accidents up to 0.72 seconds before manual annotation.

Significance. If the performance claims hold with proper validation, BIAS could provide an efficient, interpretable, and low-latency alternative to deep models for video saliency, potentially useful for real-time applications such as traffic monitoring. The hybrid biological-computational approach is a strength if the retina-inspired components are shown to contribute measurably beyond standard motion detectors.

major comments (4)
  1. Methods section on retina-inspired motion detector: no quantitative validation against retinal recordings or physiological data is provided to establish the fidelity of the temporal feature extraction; without this, the biological inspiration claim cannot be assessed as load-bearing for the reported performance gains.
  2. Results section on DHF1K dataset: outperformance is asserted over heuristics and deep models but no ablation studies removing the motion detector or biological components are described, leaving it unclear whether the gains derive from the retina-inspired elements rather than the peak-fitting or other factors.
  3. Results section on traffic accident analysis: the 0.72-second anticipation claim and SOTA cause-effect recognition require explicit dataset details, exact evaluation metrics, baseline comparisons, and statistical significance tests; absent these, the real-world utility assertion lacks direct support.
  4. Evaluation methodology: no held-out video domains or cross-dataset tests beyond DHF1K and traffic videos are mentioned, raising the risk that the greedy multi-Gaussian fitting overfits to dataset-specific motion statistics rather than generalizing.
minor comments (2)
  1. Abstract: specific performance metrics (e.g., AUC, NSS, sAUC) and the exact deep-learning models compared are not listed, reducing clarity of the outperformance claim.
  2. Figure captions and pseudocode: the peak-fitting algorithm would benefit from an explicit equation or algorithm box to clarify the information-maximization term.
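
Major comment 2 asks for ablations that isolate the motion channel. A minimal harness of that shape is sketched below: plain frame differencing and Farneback optical-flow magnitude (via OpenCV) as interchangeable motion-map sources, with the retina-style sketch from earlier as the third variant. The dictionary keys and the assumption that frames arrive as 8-bit grayscale arrays are illustrative; the paper's planned ablations may differ.

```python
import numpy as np
import cv2  # only needed for the optical-flow variant

def frame_diff_motion(frames):
    """Ablation baseline: absolute difference of the last two grayscale frames."""
    return np.abs(frames[-1].astype(float) - frames[-2].astype(float))

def optical_flow_motion(frames):
    """Ablation baseline: Farneback flow magnitude between the last two frames
    (frames assumed to be 8-bit single-channel arrays)."""
    flow = cv2.calcOpticalFlowFarneback(frames[-2], frames[-1], None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    return np.linalg.norm(flow, axis=2)

# Hypothetical ablation table: swap the motion channel, keep everything else fixed.
MOTION_CHANNELS = {
    # "retina_filter": motion_map,   # plug in the retina-style sketch shown earlier
    "frame_diff": frame_diff_motion,
    "farneback_flow": optical_flow_motion,
}
```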

Simulated Author's Rebuttal

4 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment point by point below, providing clarifications from the manuscript and indicating where revisions will be made to strengthen the paper.

read point-by-point responses
  1. Referee: Methods section on retina-inspired motion detector: no quantitative validation against retinal recordings or physiological data is provided to establish the fidelity of the temporal feature extraction; without this, the biological inspiration claim cannot be assessed as load-bearing for the reported performance gains.

    Authors: The retina-inspired motion detector is constructed from established models of retinal ganglion cell responses and direction selectivity, as detailed in the methods with supporting citations to physiological literature. We do not claim an exact replica of biological recordings but rather a computationally efficient abstraction that captures key temporal dynamics. To address the concern, we will revise the methods section to include an expanded discussion mapping each component to specific retinal mechanisms and known physiological properties. New direct quantitative validation against raw retinal data would require dedicated physiological experiments outside the scope of this computational modeling paper. revision: partial

  2. Referee: Results section on DHF1K dataset: outperformance is asserted over heuristics and deep models but no ablation studies removing the motion detector or biological components are described, leaving it unclear whether the gains derive from the retina-inspired elements rather than the peak-fitting or other factors.

    Authors: We agree that ablation studies are necessary to isolate contributions. The revised manuscript will include new ablation experiments on DHF1K: one removing the retina-inspired motion detector (replaced by standard frame differencing), one replacing it with conventional optical flow, and one ablating the multi-Gaussian fitting in favor of standard WTA. These results will quantify the incremental benefit of the biological components, particularly on bottom-up attention videos. revision: yes

  3. Referee: Results section on traffic accident analysis: the 0.72-second anticipation claim and SOTA cause-effect recognition require explicit dataset details, exact evaluation metrics, baseline comparisons, and statistical significance tests; absent these, the real-world utility assertion lacks direct support.

    Authors: The traffic accident experiments use a standard annotated traffic video dataset with cause-effect labels. The 0.72 s figure is the mean anticipation interval at which saliency-based prediction accuracy remains above a fixed threshold prior to annotated accident onset. In revision we will expand the section with: complete dataset statistics and source, precise metric definitions (anticipation time, cause-effect accuracy), full list of baselines with their scores, and statistical significance results (e.g., paired t-tests and confidence intervals). These additions will directly support the reported utility. revision: yes

  4. Referee: Evaluation methodology: no held-out video domains or cross-dataset tests beyond DHF1K and traffic videos are mentioned, raising the risk that the greedy multi-Gaussian fitting overfits to dataset-specific motion statistics rather than generalizing.

    Authors: DHF1K already spans diverse motion and scene statistics, and the traffic videos constitute an independent real-world domain. To further demonstrate generalization, the revision will add an internal cross-validation protocol on DHF1K (holding out video subsets stratified by motion intensity and scene type) and report the resulting variance. We will also discuss the low-parameter nature of the greedy fitting procedure, which reduces overfitting risk compared with learned models. revision: partial
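
Response 3 defines the 0.72 s figure as a mean anticipation interval relative to the annotated accident onset. Under that stated definition, the computation could look like the sketch below; the per-frame accident score, the fixed threshold, and the frame rate are all assumptions.

```python
import numpy as np

def anticipation_time(scores, onset_frame, fps=30.0, threshold=0.5):
    """Seconds before the annotated onset at which the per-frame accident score
    first rises above `threshold` and stays there through the onset (sketch)."""
    scores = np.asarray(scores, dtype=float)
    above = scores[:onset_frame + 1] >= threshold
    if not above[-1]:
        return 0.0                      # not flagged by the onset at all
    first = onset_frame
    while first > 0 and above[first - 1]:   # start of the final above-threshold run
        first -= 1
    return (onset_frame - first) / fps

def mean_anticipation(per_video):
    """per_video: list of (scores, onset_frame) pairs; returns mean seconds."""
    return float(np.mean([anticipation_time(s, o) for s, o in per_video]))
```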

Circularity Check

0 steps flagged

No circularity: BIAS is an independently defined algorithm

full rationale

The paper constructs BIAS by extending the standard Itti-Koch saliency framework with an explicitly described retina-inspired motion detector and a greedy multi-Gaussian peak-fitting procedure for FOAs. These components are introduced as design choices motivated by biology and information-maximization principles rather than derived from or fitted to the DHF1K or traffic-accident evaluation data. No equation reduces to a parameter estimated on the same test sets, no self-citation supplies a uniqueness theorem or ansatz, and performance results are presented as empirical outcomes of the algorithm rather than predictions forced by construction. The derivation chain therefore remains self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract, no explicit free parameters, axioms, or invented entities are detailed; the motion detector and peak-fitting algorithm are presented as constructed components without specified fitting values or unproven assumptions.

pith-pipeline@v0.9.0 · 5460 in / 1275 out tokens · 67483 ms · 2026-05-10T18:19:53.835398+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

91 extracted references · 91 canonical work pages

  1. [1]

    Information capacity of a single retinal channel,

    D. Kelly, “Information capacity of a single retinal channel,” IRE Trans. Inf. Theory, vol. 8, no. 3, pp. 221–226, Apr. 1962

  2. [2]

    A new framework for understanding vision from the perspective of the primary visual cortex,

    L. Zhaoping, “A new framework for understanding vision from the perspective of the primary visual cortex,” Curr. Opin. in Neurobio., vol. 58, pp. 1–10, 2019. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0959438819300042

  3. [3]

    How much the eye tells the brain,

    K. Koch, J. McLean, R. Segev, M. A. Freed, M. J. Berry, V. Balasubramanian, and P. Sterling, “How much the eye tells the brain,” Curr. Biol., vol. 16, no. 14, pp. 1428–1434, Jul. 2006. [Online]. Available: https://www.cell.com/current-biology/abstract/S0960-9822(06)01639-3

  4. [4]

    The unbearable slowness of being: Why do we live at 10 bits/s?

    J. Zheng and M. Meister, “The unbearable slowness of being: Why do we live at 10 bits/s?” Neuron, vol. 113, no. 2, pp. 192–204, 2025. [Online]. Available: https://www.cell.com/neuron/abstract/S0896-6273(24)00808-0

  5. [5]

    The attention system of the human brain: 20 years after,

    S. E. Petersen and M. I. Posner, “The attention system of the human brain: 20 years after,” Annu. Rev. of Neurosci., vol. 35, no. 1, pp. 73–89, Jun. 2012. [Online]. Available: https://www.annualreviews.org/doi/10.1146/annurev-neuro-062111-150525

  6. [6]

    A feature-integration theory of attention,

    A. M. Treisman and G. Gelade, “A feature-integration theory of attention,” Cogn. Psychol., vol. 12, no. 1, pp. 97–136, 1980. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0010028580900055

  7. [7]

    Shifts in selective visual attention: Towards the underlying neural circuitry,

    C. Koch and S. Ullman, “Shifts in selective visual attention: Towards the underlying neural circuitry,” Human Neurobiology, vol. 4, no. 4, pp. 219–227, 1985

  8. [8]

    A model of saliency-based visual attention for rapid scene analysis,

    L. Itti, C. Koch, and E. Niebur, “A model of saliency-based visual attention for rapid scene analysis,” IEEE TPAMI, vol. 20, no. 11, pp. 1254–1259, 1998

  9. [9]

    State-of-the-art in visual attention modeling,

    A. Borji and L. Itti, “State-of-the-art in visual attention modeling,” IEEE TPAMI, vol. 35, no. 1, pp. 185–207, Jan. 2013

  10. [10]

    Computational modelling of visual attention,

    L. Itti and C. Koch, “Computational modelling of visual attention,” Nat. Rev. Neurosci., vol. 2, no. 3, pp. 194–203, Mar. 2001. [Online]. Available: https://www.nature.com/articles/35058500

  11. [11]

    Revisiting video saliency prediction in the deep learning era,

    W. Wang, J. Shen, J. Xie, M.-M. Cheng, H. Ling, and A. Borji, “Revisiting video saliency prediction in the deep learning era,” IEEE TPAMI, vol. 43, no. 1, pp. 220–237, Jan. 2021. [Online]. Available: https://ieeexplore.ieee.org/document/8744328

  12. [12]

    Global Status Report on Road Safety 2023,

    Global Status Report on Road Safety 2023, 1st ed. Geneva: World Health Organization, 2023

  13. [13]

    Anticipating accidents in dashcam videos,

    F.-H. Chan, Y.-T. Chen, Y. Xiang, and M. Sun, “Anticipating accidents in dashcam videos,” in ACCV, S.-H. Lai, V. Lepetit, K. Nishino, and Y. Sato, Eds. Cham: Springer Int. Publishing, 2017, pp. 136–153

  14. [14]

    DoTA: Unsupervised detection of traffic anomaly in driving videos,

    Y. Yao, X. Wang, M. Xu, Z. Pu, Y. Wang, E. Atkins, and D. J. Crandall, “DoTA: Unsupervised detection of traffic anomaly in driving videos,” IEEE TPAMI, vol. 45, no. 1, pp. 444–459, Jan. 2023

  15. [15]

    Revisiting video saliency: A large-scale benchmark and a new model,

    W. Wang, J. Shen, F. Guo, M.-M. Cheng, and A. Borji, “Revisiting video saliency: A large-scale benchmark and a new model,” in CVPR, 2018, pp. 4894–4903

  16. [16]

    Driver anomaly detection: A dataset and contrastive learning approach,

    O. Kopuklu, J. Zheng, H. Xu, and G. Rigoll, “Driver anomaly detection: A dataset and contrastive learning approach,” in 2021 IEEE Winter Conf. on Appl. of Comput. Vis. (WACV). Waikoloa, HI, USA: IEEE, Jan. 2021, pp. 91–100. [Online]. Available: https://ieeexplore.ieee.org/document/9423242/

  17. [17]

    Traffic accident benchmark for causality recognition,

    T. You and B. Han, “Traffic accident benchmark for causality recognition,” in ECCV, A. Vedaldi, H. Bischof, T. Brox, and J.-M. Frahm, Eds. Cham: Springer Int. Publishing, 2020, pp. 540–556

  18. [18]

    DADA-2000: Can driving accident be predicted by driver attention? Analyzed by a benchmark,

    J. Fang, D. Yan, J. Qiao, J. Xue, H. Wang, and S. Li, “DADA-2000: Can driving accident be predicted by driver attention? Analyzed by a benchmark,” in 2019 IEEE Intell. Transp. Syst. Conf. (ITSC). Auckland, New Zealand: IEEE Press, 2019, pp. 4303–4309. [Online]. Available: https://doi.org/10.1109/ITSC.2019.8917218

  19. [19]

    A measure of motion salience for surveillance applications,

    R. Wildes, “A measure of motion salience for surveillance applications,” in ICIP, Oct. 1998, pp. 183–187 vol. 3. [Online]. Available: https://ieeexplore.ieee.org/document/727163

  20. [20]

    Detecting salient motion by accumulating directionally-consistent flow,

    L. Wixson, “Detecting salient motion by accumulating directionally-consistent flow,” IEEE TPAMI, vol. 22, no. 8, pp. 774–780, Aug. 2000. [Online]. Available: https://ieeexplore.ieee.org/document/868680

  21. [21]

    The discriminant center-surround hypothesis for bottom-up saliency,

    D. Gao, V. Mahadevan, and N. Vasconcelos, “The discriminant center-surround hypothesis for bottom-up saliency,” in NeurIPS, vol. 20. Curran Associates, Inc., 2007. [Online]. Available: https://papers.nips.cc/paper_files/paper/2007/hash/51ef186e18dc00c2d31982567235c559-Abstract.html

  22. [22]

    A model of motion attention for video skimming,

    Y.-F. Ma and H.-J. Zhang, “A model of motion attention for video skimming,” in ICIP, vol. 1, Sep. 2002, pp. I–I. [Online]. Available: https://ieeexplore.ieee.org/document/1037976

  23. [23]

    Static and space-time visual saliency detection by self-resemblance,

    H. J. Seo and P. Milanfar, “Static and space-time visual saliency detection by self-resemblance,” J. Vis., vol. 9, no. 12, p. 15, Nov. 2009. [Online]. Available: https://doi.org/10.1167/9.12.15

  24. [24]

    Spatiotemporal saliency detection and its applications in static and dynamic scenes,

    W. Kim, C. Jung, and C. Kim, “Spatiotemporal saliency detection and its applications in static and dynamic scenes,” IEEE TCSVT, vol. 21, no. 4, pp. 446–456, Apr. 2011. [Online]. Available: https://ieeexplore.ieee.org/document/5728853

  25. [25]

    Spatiotemporal saliency in dynamic scenes,

    V. Mahadevan and N. Vasconcelos, “Spatiotemporal saliency in dynamic scenes,” IEEE TPAMI, vol. 32, no. 1, pp. 171–177, Jan. 2010. [Online]. Available: https://ieeexplore.ieee.org/document/4967608

  26. [26]

    Video saliency incorporating spatiotemporal cues and uncertainty weighting,

    Y. Fang, Z. Wang, W. Lin, and Z. Fang, “Video saliency incorporating spatiotemporal cues and uncertainty weighting,” IEEE TIP, vol. 23, no. 9, pp. 3910–3921, Sep. 2014. [Online]. Available: https://ieeexplore.ieee.org/document/6857361

  27. [27]

    A generic framework of user attention model and its application in video summarization,

    Y.-F. Ma, X.-S. Hua, L. Lu, and H.-J. Zhang, “A generic framework of user attention model and its application in video summarization,” IEEE TMM, vol. 7, no. 5, pp. 907–919, Oct. 2005. [Online]. Available: https://ieeexplore.ieee.org/document/1510638

  28. [28]

    Visual attention detection in video sequences using spatiotemporal cues,

    Y. Zhai and M. Shah, “Visual attention detection in video sequences using spatiotemporal cues,” in ACM MM, ser. MM ’06. New York, NY, USA: Association for Computing Machinery, Oct. 2006, pp. 815–824. [Online]. Available: https://doi.org/10.1145/1180639.1180824

  29. [29]

    Predicting visual fixations on video based on low-level visual features,

    O. Le Meur, P. Le Callet, and D. Barba, “Predicting visual fixations on video based on low-level visual features,” Vis. Res., vol. 47, no. 19, pp. 2483–2498, Sep. 2007. [Online]. Available: https://linkinghub.elsevier.com/retrieve/pii/S0042698907002593

  30. [30]

    How many bits does it take for a stimulus to be salient?

    S. H. Khatoonabadi, N. Vasconcelos, I. V. Bajić, and Y. Shan, “How many bits does it take for a stimulus to be salient?” in CVPR, Jun. 2015, pp. 5501–5510. [Online]. Available: https://ieeexplore.ieee.org/document/7299189

  31. [31]

    Salient motion detection in compressed domain,

    K. Muthuswamy and D. Rajan, “Salient motion detection in compressed domain,” IEEE Sign. Process. Letters, vol. 20, no. 10, pp. 996–999, 2013

  32. [32]

    Region-of-interest based compressed domain video transcoding scheme,

    A. Sinha, G. Agarwal, and A. Anbu, “Region-of-interest based compressed domain video transcoding scheme,” in ICASSP, vol. 3. Montreal, Que., Canada: IEEE, 2004, pp. iii–161–4. [Online]. Available: http://ieeexplore.ieee.org/document/1326506/

  33. [33]

    Bayesian surprise attracts human attention,

    L. Itti and P. Baldi, “Bayesian surprise attracts human attention,” Vis. Res., vol. 49, no. 10, pp. 1295–1306, Jun. 2009. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0042698908004380

  34. [34]

    SUN: A bayesian framework for saliency using natural statistics,

    L. Zhang, M. H. Tong, T. K. Marks, H. Shan, and G. W. Cottrell, “SUN: A bayesian framework for saliency using natural statistics,” J. Vis., vol. 8, no. 7, p. 32, Dec. 2008. [Online]. Available: http://jov.arvojournals.org/article.aspx?doi=10.1167/8.7.32

  35. [35]

    A new perceived motion based shot content representation,

    Y.-F. Ma and H.-J. Zhang, “A new perceived motion based shot content representation,” in ICIP, vol. 3, Oct. 2001, pp. 426–429. [Online]. Available: https://ieeexplore.ieee.org/document/958142

  36. [36]

    Dynamic visual attention: Searching for coding length increments,

    X. Hou and L. Zhang, “Dynamic visual attention: Searching for coding length increments,” in NeurIPS, vol. 21. Curran Associates, Inc., 2008. [Online]. Available: https://papers.nips.cc/paper_files/paper/2008/hash/a8baa56554f96369ab93e4f3bb068c22-Abstract.html

  38. [38]

    Spatio-temporal saliency detection using phase spectrum of quaternion fourier transform,

    C. Guo, Q. Ma, and L. Zhang, “Spatio-temporal saliency detection using phase spectrum of quaternion fourier transform,” in CVPR, Jun. 2008, pp. 1–8. [Online]. Available: https://ieeexplore.ieee.org/document/4587715

  39. [39]

    A novel multiresolution spatiotemporal saliency detection model and its applications in image and video compression,

    C. Guo and L. Zhang, “A novel multiresolution spatiotemporal saliency detection model and its applications in image and video compression,” IEEE TIP, vol. 19, no. 1, pp. 185–198, 2010. [Online]. Available: https://ieeexplore.ieee.org/document/5223506

  40. [40]

    Dynamic whitening saliency,

    V. Leborán, A. García-Díaz, X. R. Fdez-Vidal, and X. M. Pardo, “Dynamic whitening saliency,” IEEE TPAMI, vol. 39, no. 5, pp. 893–907, May 2017. [Online]. Available: https://ieeexplore.ieee.org/document/7469361

  41. [41]

    Clustering of gaze during dynamic scene viewing is predicted by motion,

    P. K. Mital, T. J. Smith, R. L. Hill, and J. M. Henderson, “Clustering of gaze during dynamic scene viewing is predicted by motion,” Cogn. Comput., vol. 3, no. 1, pp. 5–24, Mar. 2011. [Online]. Available: https://doi.org/10.1007/s12559-010-9074-z

  42. [42]

    Actions in the eye: Dynamic gaze datasets and learnt saliency models for visual recognition,

    S. Mathe and C. Sminchisescu, “Actions in the eye: Dynamic gaze datasets and learnt saliency models for visual recognition,” IEEE TPAMI, vol. 37, no. 7, pp. 1408–1424, Jul. 2015. [Online]. Available: https://ieeexplore.ieee.org/document/6942210

  43. [43]

    Eye-tracking database for a set of standard video sequences,

    H. Hadizadeh, M. J. Enriquez, and I. V. Bajić, “Eye-tracking database for a set of standard video sequences,” IEEE TIP, vol. 21, no. 2, pp. 898–903, Feb. 2012. [Online]. Available: https://ieeexplore.ieee.org/document/5986709

  44. [44]

    Automatic foveation for video compression using a neurobiological model of visual attention,

    L. Itti, “Automatic foveation for video compression using a neurobiological model of visual attention,” IEEE TIP, vol. 13, no. 10, pp. 1304–1318, Oct. 2004

  45. [45]

    Two-stream convolutional networks for action recognition in videos,

    K. Simonyan and A. Zisserman, “Two-stream convolutional networks for action recognition in videos,” in NeurIPS, ser. NIPS’14, vol. 1. Cambridge, MA, USA: MIT Press, Dec. 2014, pp. 568–576

  46. [46]

    Deepvs: A deep learning based video saliency prediction approach,

    L. Jiang, M. Xu, T. Liu, M. Qiao, and Z. Wang, “Deepvs: A deep learning based video saliency prediction approach,” in ECCV, 2018, pp. 602–617

  47. [47]

    Spatio-temporal saliency networks for dynamic saliency prediction,

    C. Bak, A. Kocak, E. Erdem, and A. Erdem, “Spatio-temporal saliency networks for dynamic saliency prediction,” IEEE TMM, vol. 20, no. 7, pp. 1688–1698, Jul. 2018. [Online]. Available: https://ieeexplore.ieee.org/document/8119879

  48. [48]

    Video saliency prediction based on spatial-temporal two-stream network,

    K. Zhang and Z. Chen, “Video saliency prediction based on spatial-temporal two-stream network,” IEEE TCSVT, vol. 29, no. 12, pp. 3544–3557, Dec. 2019. [Online]. Available: https://ieeexplore.ieee.org/document/8543830

  49. [49]

    Salsac: A video saliency prediction model with shuffled attentions and correlation-based convlstm,

    X. Wu, Z. Wu, J. Zhang, L. Ju, and S. Wang, “Salsac: A video saliency prediction model with shuffled attentions and correlation-based convlstm,” in AAAI, vol. 34, 2020, pp. 12 410–12 417

  50. [50]

    Simple vs complex temporal recurrences for video saliency prediction

    P. Linardos, E. Mohedano, J. J. Nieto, N. E. O’Connor, X. Giró-i-Nieto, and K. McGuinness, “Simple vs complex temporal recurrences for video saliency prediction,” arXiv preprint arXiv:1907.01869, 2019

  51. [51]

    Going from image to video saliency: Augmenting image salience with dynamic attentional push,

    S. Gorji and J. J. Clark, “Going from image to video saliency: Augmenting image salience with dynamic attentional push,” in CVPR, Jun. 2018, pp. 7501–7511. [Online]. Available: https://ieeexplore.ieee.org/document/8578881

  52. [52]

    Unified image and video saliency modeling,

    R. Droste, J. Jiao, and J. A. Noble, “Unified image and video saliency modeling,” in ECCV, A. Vedaldi, H. Bischof, T. Brox, and J.-M. Frahm, Eds., vol. 12350. Cham: Springer Int. Publishing, 2020, pp. 419–435. [Online]. Available: https://link.springer.com/10.1007/978-3-030-58558-7_25

  53. [53]

    Predicting human eye fixations via an LSTM-based saliency attentive model,

    M. Cornia, L. Baraldi, G. Serra, and R. Cucchiara, “Predicting human eye fixations via an LSTM-based saliency attentive model,” IEEE TIP, vol. 27, no. 10, pp. 5142–5154, Oct. 2018. [Online]. Available: https://ieeexplore.ieee.org/document/8400593

  54. [54]

    A spatial-temporal recurrent neural network for video saliency prediction,

    K. Zhang, Z. Chen, and S. Liu, “A spatial-temporal recurrent neural network for video saliency prediction,” IEEE TIP, vol. 30, pp. 572–587, 2021. [Online]. Available: https://ieeexplore.ieee.org/document/9263359

  56. [56]

    Deep3DSaliency: Deep stereoscopic video saliency detection model by 3d convolutional networks,

    Y. Fang, G. Ding, J. Li, and Z. Fang, “Deep3DSaliency: Deep stereoscopic video saliency detection model by 3d convolutional networks,” IEEE TIP, Dec. 2018

  57. [57]

    Tased-net: Temporally-aggregating spatial encoder-decoder network for video saliency detection,

    K. Min and J. J. Corso, “Tased-net: Temporally-aggregating spatial encoder-decoder network for video saliency detection,” in CVPR, 2019, pp. 2394–2403

  58. [58]

    Spatio-temporal self-attention network for video saliency prediction,

    Z. Wang, Z. Liu, G. Li, Y. Wang, T. Zhang, L. Xu, and J. Wang, “Spatio-temporal self-attention network for video saliency prediction,” IEEE TMM, vol. 25, pp. 1161–1174, 2023. [Online]. Available: https://ieeexplore.ieee.org/document/9667292/

  59. [59]

    Hierarchical domain-adapted feature learning for video saliency prediction,

    G. Bellitto, F. Proietto Salanitri, S. Palazzo, F. Rundo, D. Giordano, and C. Spampinato, “Hierarchical domain-adapted feature learning for video saliency prediction,” IJCV, vol. 129, no. 12, pp. 3216–3232, Dec. 2021. [Online]. Available: https://link.springer.com/10.1007/s11263-021-01519-y

  60. [60]

    Video saliency prediction using spatiotemporal residual attentive networks,

    Q. Lai, W. Wang, H. Sun, and J. Shen, “Video saliency prediction using spatiotemporal residual attentive networks,” IEEE TIP, vol. 29, pp. 1113–1126, 2019

  61. [61]

    Temporal-spatial feature pyramid for video saliency detection,

    Q. Chang and S. Zhu, “Temporal-spatial feature pyramid for video saliency detection,” Sep. 2021, arXiv:2105.04213. [Online]. Available: http://arxiv.org/abs/2105.04213

  62. [62]

    ViNet: Pushing the limits of visual modality for audio-visual saliency prediction,

    S. Jain, P. Yarlagadda, S. Jyoti, S. Karthik, R. Subramanian, and V. Gandhi, “ViNet: Pushing the limits of visual modality for audio-visual saliency prediction,” in 2021 IEEE/RSJ Int. Conf. Intell. Robots Syst. (IROS). Prague, Czech Republic: IEEE, Sep. 2021, pp. 3520–3527. [Online]. Available: https://ieeexplore.ieee.org/document/9635989/

  63. [63]

    Transformer-based multi-scale feature integration network for video saliency prediction,

    X. Zhou, S. Wu, R. Shi, B. Zheng, S. Wang, H. Yin, J. Zhang, and C. Yan, “Transformer-based multi-scale feature integration network for video saliency prediction,” IEEE TCSVT, vol. 33, no. 12, pp. 7696–7707, Dec. 2023. [Online]. Available: https://ieeexplore.ieee.org/document/10130326/authors#authors

  64. [64]

    SalFoM: Dynamic saliency prediction with video foundation models,

    M. Moradi, M. Moradi, F. Rundo, C. Spampinato, A. Borji, and S. Palazzo, “SalFoM: Dynamic saliency prediction with video foundation models,” in Pattern Recognition, A. Antonacopoulos, S. Chaudhuri, R. Chellappa, C.-L. Liu, S. Bhattacharya, and U. Pal, Eds. Cham: Springer Nature Switzerland, 2025, pp. 33–48

  65. [65]

    TM2SP: A transformer-based multi-level spatiotemporal feature pyramid network for video saliency prediction,

    C. Li and S. Liu, “TM2SP: A transformer-based multi-level spatiotemporal feature pyramid network for video saliency prediction,” IEEE TCSVT, vol. 35, no. 6, pp. 5236–5250, Jun. 2025. [Online]. Available: https://ieeexplore.ieee.org/abstract/document/10841372/authors

  66. [66]

    Video saliency forecasting transformer,

    C. Ma, H. Sun, Y. Rao, J. Zhou, and J. Lu, “Video saliency forecasting transformer,” IEEE TCSVT, vol. 32, no. 10, pp. 6850–6862, Oct. 2022. [Online]. Available: https://ieeexplore.ieee.org/document/9770033/

  67. [67]

    Transformer-based video saliency prediction with high temporal dimension decoding,

    M. Moradi, S. Palazzo, and C. Spampinato, “Transformer-based video saliency prediction with high temporal dimension decoding,” Jan. 2024, arXiv:2401.07942. [Online]. Available: http://arxiv.org/abs/2401.07942

  68. [68]

    Scalability in perception for autonomous driving: Waymo open dataset,

    P. Sun, H. Kretzschmar, X. Dotiwalla, A. Chouard, V. Patnaik, P. Tsui, J. Guo, Y. Zhou, Y. Chai, B. Caine, V. Vasudevan, W. Han, J. Ngiam, H. Zhao, A. Timofeev, S. Ettinger, M. Krivokon, A. Gao, A. Joshi, Y. Zhang, J. Shlens, Z. Chen, and D. Anguelov, “Scalability in perception for autonomous driving: Waymo open dataset,” in CVPR, Jun. 2020, pp. 2443...

  69. [69]

    Large scale interactive motion forecasting for autonomous driving : The waymo open motion dataset,

    S. Ettinger, S. Cheng, B. Caine, C. Liu, H. Zhao, S. Pradhan, Y. Chai, B. Sapp, C. Qi, Y. Zhou, Z. Yang, A. Chouard, P. Sun, J. Ngiam, V. Vasudevan, A. McCauley, J. Shlens, and D. Anguelov, “Large scale interactive motion forecasting for autonomous driving: The waymo open motion dataset,” in ICCV, Oct. 2021, pp. 9690–9699. [Online]. Available: https:/...

  70. [70]

    Exploring the limitations of behavior cloning for autonomous driving,

    F. Codevilla, E. Santana, A. Lopez, and A. Gaidon, “Exploring the limitations of behavior cloning for autonomous driving,” in ICCV, Oct. 2019, pp. 9328–9337. [Online]. Available: https://ieeexplore.ieee.org/document/9009463

  71. [71]

    Safety-critical learning for long-tail events: The TUM traffic accident dataset,

    W. Zimmer, R. Greer, X. Zhou, R. Song, M. Pavel, D. Lehmberg, A. Ghita, A. Gopalkrishnan, M. Trivedi, and A. Knoll, “Safety-critical learning for long-tail events: The TUM traffic accident dataset,” Aug. 2025, arXiv:2508.14567. [Online]. Available: http://arxiv.org/abs/2508.14567

  73. [73]

    Uncertainty-based traffic accident anticipation with spatio-temporal relational learning,

    W. Bao, Q. Yu, and Y. Kong, “Uncertainty-based traffic accident anticipation with spatio-temporal relational learning,” in ACM MM, ser. MM ’20. New York, NY, USA: Association for Computing Machinery, Oct. 2020, pp. 2682–2690. [Online]. Available: https://dl.acm.org/doi/10.1145/3394171.3413827

  74. [74]

    Advances, challenges, and future research needs in machine learning-based crash prediction models: A systematic review,

    Y. Ali, F. Hussain, and M. M. Haque, “Advances, challenges, and future research needs in machine learning-based crash prediction models: A systematic review,” Accident Analysis & Prevention, vol. 194, p. 107378, Jan. 2024. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0001457523004256

  75. [75]

    Traffic accident risk prediction based on deep learning and spatiotemporal features of vehicle trajectories,

    H. Li and L. Chen, “Traffic accident risk prediction based on deep learning and spatiotemporal features of vehicle trajectories,” PLOS ONE, vol. 20, no. 5, p. e0320656, May 2025. [Online]. Available: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0320656

  76. [76]

    Prediction of traffic accident risk based on vehicle trajectory data,

    H. Li and L. Yu, “Prediction of traffic accident risk based on vehicle trajectory data,” Traffic Injury Prevention, vol. 26, no. 2, pp. 164–171, 2025

  77. [77]

    A dynamic spatial-temporal attention network for early anticipation of traffic accidents,

    M. M. Karim, Y . Li, R. Qin, and Z. Yin, “A dynamic spatial-temporal attention network for early anticipation of traffic accidents,” IEEE Trans. on Intell. Transp Syst., vol. 23, no. 7, pp. 9590–9600, Jul. 2022. [Online]. Available: https://doi.org/10.1109/TITS.2022.3155613

  78. [78]

    Applying computational tools to predict gaze direction in interactive visual environments,

    R. J. Peters and L. Itti, “Applying computational tools to predict gaze direction in interactive visual environments,” ACM Trans. Appl. Percept., vol. 5, no. 2, pp. 9:1–9:19, May 2008. [Online]. Available: https://doi.org/10.1145/1279920.1279923

  79. [79]

    Studio encoding parameters of digital television for standard 4:3 and wide-screen 16:9 aspect ratios,

    Recommendation ITU-R BT.601, “Studio encoding parameters of digital television for standard 4:3 and wide-screen 16:9 aspect ratios,” Int. Radio Consultative Committee, Int. Telecommunication Union, Switzerland, CCIR Rep., 2011

  80. [80]

    Fast 2d complex gabor filter with kernel decomposition,

    J. Kim, S. Um, and D. Min, “Fast 2d complex gabor filter with kernel decomposition,” IEEE TIP, vol. 27, no. 4, pp. 1713–1722, Apr. 2018. [Online]. Available: http://ieeexplore.ieee.org/document/8207611/

Showing first 80 references.