Graph Unfolding and Sampling for Transitory Video Keyframe Selection via Gershgorin Disc Alignment
read the original abstract
User-generated videos (UGVs) uploaded from mobile phones to social media sites like YouTube and TikTok are short and non-repetitive. We summarize a transitory UGV into several keyframes in linear-time via fast graph sampling based on Gershgorin disc alignment (GDA). Specifically, we first model a sequence of $N$ frames in a UGV as an $M$-hop path graph $\cG^o$ for $M \ll N$, where the similarity between two frames within $M$ time instants is encoded as a positive edge based on feature similarity. Towards efficient sampling, we then ``unfold'' $\cG^o$ to a $1$-hop path graph $\cG$, specified by a generalized graph Laplacian matrix $\cL$, via one of two graph unfolding procedures with provable performance bounds. We show that maximizing the smallest eigenvalue $\lambda_{\min}(\B)$ of a coefficient matrix $\B = \diag{\h} + \mu \cL$, where $\h$ is the binary keyframe selection vector, is equivalent to minimizing a worst-case signal reconstruction error. We maximize instead the Gershgorin circle theorem (GCT) lower bound $\lambda^-_{\min}(\B)$ by choosing $\h$ via a new fast graph sampling algorithm that iteratively aligns left-ends of Gershgorin discs for all graph nodes (frames). Experiments on multiple short video datasets show that our algorithm achieves comparable or better keyframe selection performance compared to state-of-the-art methods, at a substantially reduced complexity.
This paper has not been read by Pith yet.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.