SAGA: Source Attribution of Generative AI Videos
Pith reviewed 2026-05-17 21:28 UTC · model grok-4.3
The pith
SAGA attributes generative AI videos to their exact source model using only 0.5 percent labeled data per class.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SAGA is the first framework for multi-granular source attribution of generative AI videos across authenticity, generation task such as text-to-video or image-to-video, model version, development team, and the exact generator. Its video transformer architecture extracts distinguishing spatio-temporal artifacts from features of a robust vision foundation model, while a data-efficient pretrain-and-attribute strategy achieves state-of-the-art performance with only 0.5 percent source-labeled data per class and matches fully supervised results. Temporal Attention Signatures provide the first visual explanation of why different video generators remain distinguishable by highlighting learned timing,
What carries the argument
The data-efficient pretrain-and-attribute strategy combined with Temporal Attention Signatures inside a video transformer that processes features from a robust vision foundation model to isolate stable spatio-temporal artifacts.
Load-bearing premise
Spatio-temporal artifacts extracted from a robust vision foundation model stay unique, stable, and transferable enough across generators and domains to support accurate attribution even when labeled data is reduced to 0.5 percent per class.
What would settle it
Apply SAGA to videos produced by a new generator unseen during training and measure whether attribution accuracy drops well below the fully supervised baseline.
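The test described above amounts to a leave-one-generator-out evaluation. A minimal sketch of such a split follows; the function names, the `(video_id, generator_id)` tuple layout, and the choice to score accuracy on whichever attribution level stays defined for an unseen generator are all assumptions for illustration, not the paper's protocol.

```python
def leave_one_generator_out(samples, held_out):
    """Split (video_id, generator_id) pairs so that `held_out` never appears in training."""
    train = [s for s in samples if s[1] != held_out]
    test = [s for s in samples if s[1] == held_out]
    if not test:
        raise ValueError(f"no samples found for generator {held_out!r}")
    return train, test


def accuracy(predictions, targets):
    """Fraction of predictions that equal their targets."""
    if len(predictions) != len(targets) or not targets:
        raise ValueError("predictions and targets must be non-empty and equal in length")
    return sum(p == t for p, t in zip(predictions, targets)) / len(targets)


def accuracy_drop(supervised_acc, held_out_acc):
    """Positive values mean the unseen generator is attributed less accurately."""
    return supervised_acc - held_out_acc
```

A large positive `accuracy_drop` on held-out generators would count against the load-bearing premise; a small one would support it.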
Original abstract
The proliferation of generative AI has led to hyper-realistic synthetic videos, escalating misuse risks and outstripping binary real/fake detectors. We introduce SAGA (Source Attribution of Generative AI videos), the first comprehensive framework to address the urgent need for AI-generated video source attribution at a large scale. Unlike traditional detection, SAGA identifies the specific generative model used. It uniquely provides multi-granular attribution across five levels: authenticity, generation task (e.g., T2V/I2V), model version, development team, and the precise generator, offering far richer forensic insights. Our novel video transformer architecture, leveraging features from a robust vision foundation model, effectively captures spatio-temporal artifacts. Critically, we introduce a data-efficient pretrain-and-attribute strategy, enabling SAGA to achieve state-of-the-art attribution using only 0.5% of source-labeled data per class, matching fully supervised performance. Furthermore, we propose Temporal Attention Signatures (T-Sigs), a novel interpretability method that visualizes learned temporal differences, offering the first explanation for why different video generators are distinguishable. Extensive experiments on public datasets, including cross-domain scenarios, demonstrate that SAGA sets a new benchmark for synthetic video provenance, providing crucial, interpretable insights for forensic and regulatory applications.
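The five attribution levels named in the abstract can be read as coarser views of one exact-generator label. The sketch below illustrates that reading only: the registry entries (`gen_a_v1`, `team_a`, and so on) are hypothetical placeholders, and the paper may define or encode the levels differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceLabel:
    authenticity: str  # "real" or "synthetic"
    task: str          # e.g. "T2V" or "I2V"; "none" for real videos
    version: str       # model version
    team: str          # development team
    generator: str     # exact generator


# Hypothetical registry: every name here is a placeholder, not from the paper.
REGISTRY = {
    "gen_a_v1": SourceLabel("synthetic", "T2V", "v1", "team_a", "gen_a_v1"),
    "gen_a_v2": SourceLabel("synthetic", "T2V", "v2", "team_a", "gen_a_v2"),
    "gen_b_v1": SourceLabel("synthetic", "I2V", "v1", "team_b", "gen_b_v1"),
    "real": SourceLabel("real", "none", "none", "none", "real"),
}

LEVELS = ("authenticity", "task", "version", "team", "generator")


def labels_at_level(generator_ids, level):
    """Project exact-generator ids onto one of the five attribution levels."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level!r}")
    return [getattr(REGISTRY[g], level) for g in generator_ids]
```

Under this reading, a single source label per video is enough to produce targets for all five levels.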
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces SAGA, a framework for multi-granular source attribution of generative AI videos across five levels (authenticity, generation task such as T2V/I2V, model version, development team, and precise generator). It proposes a video transformer architecture leveraging features from a robust vision foundation model to capture spatio-temporal artifacts, combined with a data-efficient pretrain-and-attribute strategy. The central claims are that this achieves state-of-the-art attribution performance using only 0.5% of source-labeled data per class while matching fully supervised results, and that the novel Temporal Attention Signatures (T-Sigs) provide the first explanation for generator distinguishability. Experiments on public datasets including cross-domain scenarios are said to support these results.
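To make the 0.5%-per-class labeling budget concrete, here is a minimal sketch of drawing a per-class labeled subset. The function name, the rounding rule (round up, at least one sample per class), and the seeding are assumptions; the manuscript's actual sampling procedure is not described in this review.

```python
import math
import random
from collections import defaultdict


def sample_labeled_subset(labels, fraction=0.005, seed=0):
    """Return sorted indices of a labeled subset drawn independently per class.

    Keeps ceil(fraction * class_size) items from each class, and never fewer than one.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    chosen = []
    for label in sorted(by_class):  # fixed class order keeps the draw reproducible
        indices = by_class[label]
        k = max(1, math.ceil(fraction * len(indices)))
        chosen.extend(rng.sample(indices, k))
    return sorted(chosen)
```

Reporting results over several seeds of such a draw is one way to obtain the error bars the referee asks for below.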
Significance. If the performance and interpretability claims hold after addressing controls for confounds, this would advance AI-generated video forensics beyond binary detection by enabling precise provenance tracking with minimal supervision and offering visual explanations of model-specific artifacts. Such capabilities could support regulatory and forensic applications in a domain where generative video misuse is growing rapidly.
major comments (2)
- [Abstract] Abstract: The claim that SAGA achieves state-of-the-art attribution matching fully supervised performance with only 0.5% source-labeled data per class supplies no quantitative details on baselines, error bars, data splits, or ablation studies. This information is load-bearing for evaluating whether the empirical results support the central data-efficiency claim.
- [Abstract] Abstract (cross-domain scenarios): The reported cross-domain results do not include explicit controls to demonstrate that the spatio-temporal artifacts extracted from the vision foundation model are dominated by stable, generator-specific temporal signatures rather than content statistics, prompt distributions, or video length/resolution cues. Without such controls, the multi-granular attribution performance (including version/team-level) could be undermined by distribution shift, directly affecting the weakest assumption underlying both the pretrain-and-attribute pipeline and T-Sigs.
minor comments (1)
- The five attribution granularity levels are listed in the abstract but would benefit from an early table or diagram defining each level with examples to improve clarity for readers.
Simulated Author's Rebuttal
We thank the referee for their insightful comments, which help us improve the clarity and robustness of our work. We address the major comments point by point below, proposing revisions to the manuscript where necessary.
-
Referee: [Abstract] Abstract: The claim that SAGA achieves state-of-the-art attribution matching fully supervised performance with only 0.5% source-labeled data per class supplies no quantitative details on baselines, error bars, data splits, or ablation studies. This information is load-bearing for evaluating whether the empirical results support the central data-efficiency claim.
Authors: We agree that incorporating quantitative details into the abstract would strengthen the presentation of our central claim. In the revised manuscript, we will update the abstract to include specific metrics, such as the top-1 attribution accuracy with 0.5% labeled data compared to fully supervised baselines, mention the use of standard data splits, and note that error bars and ablation studies are detailed in the experimental sections. This revision will provide the necessary context without exceeding abstract length constraints.
Revision: yes
-
Referee: [Abstract] Abstract (cross-domain scenarios): The reported cross-domain results do not include explicit controls to demonstrate that the spatio-temporal artifacts extracted from the vision foundation model are dominated by stable, generator-specific temporal signatures rather than content statistics, prompt distributions, or video length/resolution cues. Without such controls, the multi-granular attribution performance (including version/team-level) could be undermined by distribution shift, directly affecting the weakest assumption underlying both the pretrain-and-attribute pipeline and T-Sigs.
Authors: We appreciate this important point on potential confounds. Our experiments across public datasets already incorporate variations in content, prompts, and video properties to test generalization. The Temporal Attention Signatures (T-Sigs) are introduced precisely to highlight generator-specific temporal patterns independent of content. To directly address the referee's concern, we will add explicit control experiments in the revision, such as evaluations on content-matched video pairs or ablations removing temporal components, to confirm that the attribution relies on stable generator signatures rather than spurious cues.
Revision: yes
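One simple form of the "ablations removing temporal components" mentioned in the response is a frame-shuffle control: permute the frame order of each clip and see how much accuracy is lost. The sketch below is an illustration under assumptions, not the authors' planned experiment; `predict` stands for any hypothetical callable that maps a clip (a sequence of per-frame features) to a label.

```python
import random


def shuffle_frames(clip, seed=0):
    """Return a copy of `clip` with its frames in a random temporal order."""
    rng = random.Random(seed)
    shuffled = list(clip)
    rng.shuffle(shuffled)
    return shuffled


def temporal_reliance(predict, clips, labels, seed=0):
    """Accuracy on the original clips minus accuracy on frame-shuffled copies.

    A value near zero suggests the predictor does not depend on frame order.
    """
    if len(clips) != len(labels) or not labels:
        raise ValueError("clips and labels must be non-empty and equal in length")

    def acc(batch):
        return sum(predict(c) == y for c, y in zip(batch, labels)) / len(labels)

    shuffled = [shuffle_frames(c, seed + i) for i, c in enumerate(clips)]
    return acc(clips) - acc(shuffled)
```

Shuffling preserves per-frame content, so a large gap points to temporal cues, while a near-zero gap would suggest attribution rests on frame-level statistics of the kind the referee lists as possible confounds.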
Circularity Check
No circularity: empirical architecture and experimental validation
The paper introduces an empirical video transformer architecture that extracts spatio-temporal features from a vision foundation model, combined with a pretrain-and-attribute training strategy. Central performance claims (SOTA attribution at 0.5% labeled data per class, multi-granular results, and cross-domain generalization) are presented as outcomes of extensive experiments on public datasets rather than as quantities derived by construction from the paper's own equations or definitions. Temporal Attention Signatures are proposed as a post-hoc interpretability visualization of learned temporal differences, with no indication that they reduce to fitted parameters or self-referential inputs. No load-bearing self-citations, uniqueness theorems, or ansatzes smuggled via prior author work are invoked to force the results; the derivation chain remains self-contained through empirical evaluation.
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
novel video transformer architecture... Temporal Attention Signatures (T-Sigs)... data-efficient pretrain-and-attribute strategy... Hard Negative Mining (HNM) objective
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
T-Sigs... unique fingerprints for Real, Seen, and even Unseen generators
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 2 Pith papers
-
Who Generated This 3D Asset? Learning Source Attribution for Generative 3D Models
Introduces the first passive source attribution benchmark for 22 generative 3D models and a Transformer achieving 97.22% accuracy under full supervision and 77.17% with 1% training data.
-
Video as Natural Augmentation: Towards Unified AI-Generated Image and Video Detection
VINA trains a single detector on images plus video frames using a cross-modal supervised contrastive objective, yielding bidirectional gains and SOTA results on 14 image, video, and in-the-wild benchmarks.