pith. sign in

arxiv: 2408.01859 · v2 · pith:SSGV7KYInew · submitted 2024-08-03 · 💻 cs.CV · eess.IV· eess.SP

Graph Unfolding and Sampling for Transitory Video Keyframe Selection via Gershgorin Disc Alignment

classification 💻 cs.CV eess.IVeess.SP
keywords graphgershgorinsamplingframeskeyframeselectionalgorithmalignment
0
0 comments X
read the original abstract

User-generated videos (UGVs) uploaded from mobile phones to social media sites like YouTube and TikTok are short and non-repetitive. We summarize a transitory UGV into several keyframes in linear-time via fast graph sampling based on Gershgorin disc alignment (GDA). Specifically, we first model a sequence of $N$ frames in a UGV as an $M$-hop path graph $\cG^o$ for $M \ll N$, where the similarity between two frames within $M$ time instants is encoded as a positive edge based on feature similarity. Towards efficient sampling, we then ``unfold'' $\cG^o$ to a $1$-hop path graph $\cG$, specified by a generalized graph Laplacian matrix $\cL$, via one of two graph unfolding procedures with provable performance bounds. We show that maximizing the smallest eigenvalue $\lambda_{\min}(\B)$ of a coefficient matrix $\B = \diag{\h} + \mu \cL$, where $\h$ is the binary keyframe selection vector, is equivalent to minimizing a worst-case signal reconstruction error. We maximize instead the Gershgorin circle theorem (GCT) lower bound $\lambda^-_{\min}(\B)$ by choosing $\h$ via a new fast graph sampling algorithm that iteratively aligns left-ends of Gershgorin discs for all graph nodes (frames). Experiments on multiple short video datasets show that our algorithm achieves comparable or better keyframe selection performance compared to state-of-the-art methods, at a substantially reduced complexity.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.