Xiangyu Zhang

Identifiers

  • name variant Xiangyu Zhang 0.60 · backfill

Papers (53)

  1. MemoryVLA++: Temporal Modeling via Memory and Imagination in Vision-Language-Action Models cs.RO · 2026 · author #7
  2. The WER Trap: Shattering the Illusion of Unified Tokens in Speech Language Models eess.AS · 2026 · author #1
  3. Diagnosing Live Within-Policy Instruction Conflicts in LLM Agents with Witnessed Resolution Profiles cs.AI · 2026 · author #3
  4. AndroidDaily: A Verifiable Benchmark for Mobile GUI Agents on Real-World Closed-Source Applications cs.CV · 2026 · author #15
  5. StepAudio 2.5 Technical Report eess.AS · 2026 · author #100
  6. Vision Foundation Models as Generalist Tokenizers for Image Generation cs.CV · 2026 · author #7
  7. Step-Audio-R1.5 Technical Report eess.AS · 2026 · author #18
  8. Spike-NVPT: Learning Robust Visual Prompts via Bio-Inspired Temporal Filtering and Discretization cs.CV · 2026 · author #5
  9. Breaking the Training Barrier of Billion-Parameter Universal Machine Learning Interatomic Potentials cs.DC · 2026 · author #7
  10. SpatialEvo: Self-Evolving Spatial Intelligence via Deterministic Geometric Environments cs.CV · 2026 · author #15
  11. Why Your Tokenizer Fails in Information Fusion: A Timing-Aware Pre-Quantization Fusion for Video-Enhanced Audio Tokenization eess.AS · 2026 · author #1
  12. MemoPhishAgent: Memory-Augmented Multi-Modal LLM Agent for Phishing URL Detection cs.CR · 2026 · author #6
  13. DockSmith: Scaling Reliable Coding Environments via an Agentic Docker Builder cs.AI · 2026 · author #13
  14. Mind-Paced Speaking: A Dual-Brain Approach to Real-Time Reasoning in Spoken Language Models cs.CL · 2025 · author #9
  15. MemoryVLA: Perceptual-Cognitive Memory in Vision-Language-Action Models for Robotic Manipulation cs.RO · 2025 · author #9
  16. WISCA: A Lightweight Model Transition Method to Improve LLM Training via Weight Scaling cs.LG · 2025 · author #7
  17. A New Class of Asymptotically Distribution-Free Smooth Tests math.ST · 2025 · author #1
  18. Step-Audio 2 Technical Report cs.CL · 2025 · author #108
  19. BugScope: Learn to Find Bugs Like Human cs.SE · 2025 · author #6
  20. VERA: Variational Inference Framework for Jailbreaking Large Language Models cs.CR · 2025 · author #4
  21. Mixture-of-Experts Can Surpass Dense LLMs Under Strictly Equal Resource cs.CL · 2025 · author #9
  22. Raw Pointer Rewriting with LLMs for Translating C to Safer Rust cs.SE · 2025 · author #6
  23. Step1X-Edit: A Practical Framework for General Image Editing cs.CV · 2025 · author #22
  24. Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model cs.LG · 2025 · author #5
  25. Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction cs.CL · 2025 · author #143
  26. Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model cs.CV · 2025 · author #111
  27. Safety at Scale: A Comprehensive Survey of Large Model and Agent Safety cs.CR · 2025 · author #39
  28. NESA: Relational Neuro-Symbolic Static Program Analysis cs.PL · 2024 · author #8
  29. General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model cs.CV · 2024 · author #12
  30. Poisoning with A Pill: Circumventing Detection in Federated Learning cs.LG · 2024 · author #7
  31. Arbitrage of Energy Storage in Electricity Markets with Deep Reinforcement Learning cs.LG · 2019 · author #3
  32. Meta-SR: A Magnification-Arbitrary Network for Super-Resolution cs.CV · 2019 · author #3
  33. Attacks Meet Interpretability: Attribute-steered Detection of Adversarial Samples cs.LG · 2018 · author #4
  34. Bounding Box Regression with Uncertainty for Accurate Object Detection cs.CV · 2018 · author #5
  35. ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design cs.CV · 2018 · author #2
  36. MetaAnchor: Learning to Detect Objects with Customized Anchors cs.CV · 2018 · author #2
  37. CrowdHuman: A Benchmark for Detecting Human in a Crowd cs.CV · 2018 · author #6
  38. DetNet: A Backbone network for Object Detection cs.CV · 2018 · author #4
  39. ExFuse: Enhancing Feature Fusion for Semantic Segmentation cs.CV · 2018 · author #2
  40. Light-Head R-CNN: In Defense of Two-Stage Object Detector cs.CV · 2017 · author #4
  41. MegDet: A Large Mini-Batch Object Detector cs.CV · 2017 · author #5
  42. Channel Pruning for Accelerating Very Deep Neural Networks cs.CV · 2017 · author #2
  43. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices cs.CV · 2017 · author #1
  44. Large Kernel Matters -- Improve Semantic Segmentation by Global Convolutional Network cs.CV · 2017 · author #2
  45. Identity Mappings in Deep Residual Networks cs.CV · 2016 · author #2
  46. Deep Residual Learning for Image Recognition cs.CV · 2015 · author #2
  47. Accelerating Very Deep Convolutional Networks for Classification and Detection cs.CV · 2015 · author #1
  48. Discrete solitons in self-defocusing systems with $\mathcal{PT}$-symmetric defects nlin.PS · 2015 · author #4
  49. Object Detection Networks on Convolutional Feature Maps cs.CV · 2015 · author #4
  50. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification cs.CV · 2015 · author #2
  51. Efficient and Accurate Approximations of Nonlinear Convolutional Networks cs.CV · 2014 · author #1
  52. Discrete solitons and scattering of lattice waves in guiding arrays with a nonlinear $\mathcal{PT}$-symmetric defect physics.optics · 2014 · author #1
  53. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition cs.CV · 2014 · author #2

Mentions

  • 2606.09827 #7 · arxiv_oai · confidence 0.70 Xiangyu Zhang
  • 1505.06798 #1 · backfill · confidence 0.70 Xiangyu Zhang
  • 1504.06191 #4 · backfill · confidence 0.70 Xiangyu Zhang
  • 1504.06066 #4 · backfill · confidence 0.70 Xiangyu Zhang
  • 1502.01852 #2 · backfill · confidence 0.70 Xiangyu Zhang
  • 2604.25719 #18 · arxiv_oai · confidence 0.70 Xiangyu Zhang
  • 2508.01973 #1 · arxiv_oai · confidence 0.70 Xiangyu Zhang
  • 2506.22666 #4 · arxiv_oai · confidence 0.70 Xiangyu Zhang
  • 1411.4229 #1 · backfill · confidence 0.70 Xiangyu Zhang
  • 1411.3944 #1 · backfill · confidence 0.70 Xiangyu Zhang
  • 1406.4729 #2 · backfill · confidence 0.70 Xiangyu Zhang
  • 2605.29209 #1 · arxiv_oai · confidence 0.70 Xiangyu Zhang
  • 2605.27784 #3 · arxiv_oai · confidence 0.70 Xiangyu Zhang
  • 2605.27761 #15 · arxiv_oai · confidence 0.70 Xiangyu Zhang
  • 2605.23463 #100 · arxiv_oai · confidence 0.70 Xiangyu Zhang
  • 2605.18390 #7 · arxiv_oai · confidence 0.70 Xiangyu Zhang
  • 2506.12119 #9 · arxiv_oai · confidence 0.70 Xiangyu Zhang
  • 2502.10248 #111 · arxiv_oai · confidence 0.70 Xiangyu Zhang
  • 2502.11946 #143 · arxiv_oai · confidence 0.70 Xiangyu Zhang
  • 2409.01704 #12 · arxiv_oai · confidence 0.70 Xiangyu Zhang
  • 2507.16632 #108 · arxiv_oai · confidence 0.70 Xiangyu Zhang
  • 2508.19236 #9 · arxiv_oai · confidence 0.70 Xiangyu Zhang

Frequent Coauthors