Saining Xie

Identifiers

  • name variant Saining Xie 0.60 · backfill

Papers (26)

  1. Benchmarking Visual State Tracking in Multimodal Video Understanding cs.CV · 2026 · author #11
  2. PaintBench: Deterministic Evaluation of Precise Visual Editing cs.GR · 2026 · author #6
  3. Cambrian-P: Pose-Grounded Video Understanding cs.CV · 2026 · author #8
  4. Improved Baselines with Representation Autoencoders cs.CV · 2026 · author #6
  5. Image Generators are Generalist Vision Learners cs.CV · 2026 · author #20
  6. Self-Refining Video Sampling cs.CV · 2026 · author #4
  7. Cambrian-S: Towards Spatial Supersensing in Video cs.CV · 2025 · author #15
  8. Diffusion Transformers with Representation Autoencoders cs.CV · 2025 · author #4
  9. BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset cs.CV · 2025 · author #9
  10. Transfer between Modalities with MetaQueries cs.CV · 2025 · author #12
  11. SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training cs.AI · 2025 · author #5
  12. Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps cs.CV · 2025 · author #11
  13. Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces cs.CV · 2024 · author #6
  14. MetaMorph: Multimodal Understanding and Generation via Instruction Tuning cs.CV · 2024 · author #9
  15. Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think cs.CV · 2024 · author #7
  16. Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs cs.CV · 2024 · author #14
  17. Demystifying CLIP Data cs.CV · 2023 · author #2
  18. Scalable Diffusion Models with Transformers cs.CV · 2022 · author #2
  19. Masked Autoencoders Are Scalable Vision Learners cs.CV · 2021 · author #3
  20. On Network Design Spaces for Visual Recognition cs.CV · 2019 · author #3
  21. Exploring Randomly Wired Neural Networks for Image Recognition cs.CV · 2019 · author #1
  22. Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification cs.CV · 2017 · author #1
  23. Aggregated Residual Transformations for Deep Neural Networks cs.CV · 2016 · author #1
  24. Top-Down Learning for Structured Labeling with Convolutional Pseudoprior cs.CV · 2015 · author #1
  25. Holistically-Nested Edge Detection cs.CV · 2015 · author #1
  26. Deeply-Supervised Nets stat.ML · 2014 · author #2

Mentions

  • 1511.07409 #1 · backfill · confidence 0.70 Saining Xie
  • 2604.20329 #20 · arxiv_oai · confidence 0.70 Saining Xie
  • 1504.06375 #1 · backfill · confidence 0.70 Saining Xie
  • 2606.03920 #11 · arxiv_oai · confidence 0.70 Saining Xie
  • 2606.00188 #6 · arxiv_oai · confidence 0.70 Saining Xie
  • 1409.5185 #2 · backfill · confidence 0.70 Saining Xie
  • 2412.14171 #6 · arxiv_oai · confidence 0.70 Saining Xie
  • 2605.22819 #8 · arxiv_oai · confidence 0.70 Saining Xie
  • 2601.18577 #4 · arxiv_oai · confidence 0.70 Saining Xie
  • 2501.09732 #11 · arxiv_oai · confidence 0.70 Saining Xie
  • 2605.18324 #6 · arxiv_oai · confidence 0.70 Saining Xie
  • 2511.04670 #15 · arxiv_oai · confidence 0.70 Saining Xie
  • 2412.14164 #9 · arxiv_oai · confidence 0.70 Saining Xie
  • 2406.16860 #14 · arxiv_oai · confidence 0.70 Saining Xie
  • 2309.16671 #2 · arxiv_oai · confidence 0.70 Saining Xie
  • 2111.06377 #3 · arxiv_oai · confidence 0.70 Saining Xie

Frequent Coauthors