Efficient audio captioning with encoder-level knowledge distillation
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CR (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
T2T: Captioning Smartphone Activities Using Mobile Traffic
T2T generates textual captions of smartphone activities from encrypted mobile traffic via a flow feature encoder, caption decoder, and automatic annotations produced by the Qwen-VL-Max vision-language model on synchronized screen captures.