Jian Luan

Identifiers

  • name variant Jian Luan 0.60 · backfill

Papers (35)

  1. UniTranslator: A Unified Multi-modal Framework for End-to-end In-Image Machine Translation cs.CV · 2026 · author #9
  2. RS-Gen: A Multi-Stage Agentic Framework for Reasoning and Search-Augmented Image Generation cs.CV · 2026 · author #5
  3. ELVA: Exploring Ranking-Driven Universal Multimodal Retrieval cs.IR · 2026 · author #10
  4. STAR: SpatioTemporal Adaptive Reward Allocation for Text-to-Image RL Post-Training cs.AI · 2026 · author #5
  5. Teaching the Way, Not the Answer: Privileged Tutoring Distillation for Multimodal Policy Optimization cs.AI · 2026 · author #5
  6. SpeakerCard-1M: An Evidence-Grounded Corpus for In-the-Wild Speaker Verification eess.AS · 2026 · author #11
  7. Restoring Initial Noise Sensitivity in Text-to-Image Distillation via Geometric Alignment cs.CV · 2026 · author #6
  8. Dasheng AudioGen: A Unified Model for Generating Coherent Audio Scenes from Text cs.SD · 2026 · author #9
  9. Scaling, Benchmarking, and Reasoning of Vision-Language Agents for Mobile GUI Navigation cs.AI · 2026 · author #7
  10. PixelWizard: Towards Efficient High-Fidelity Video Generation at Ultra-Large Spatial Resolution cs.CV · 2026 · author #6
  11. ScaleWoB: Guiding GUI Agents with Coding Agents via Large-Scale Environmental Synthesis cs.AI · 2026 · author #5
  12. PROVE: A Perceptual RemOVal cohErence Benchmark for Visual Media cs.CV · 2026 · author #9
  13. Beyond Binary: Reframing GUI Critique as Continuous Semantic Alignment cs.LG · 2026 · author #8
  14. How Mobile World Model Guides GUI Agents? cs.AI · 2026 · author #11
  15. StreamPro: From Reactive Perception to Proactive Decision-Making in Streaming Video cs.CV · 2026 · author #9
  16. Reducing Linguistic Hallucination in LM-Based Speech Enhancement via Noise-Invariant Acoustic-Semantic Distillation eess.AS · 2026 · author #8
  17. Listening with Time: Precise Temporal Awareness for Long-Form Audio Understanding eess.AS · 2026 · author #8
  18. TTS-PRISM: A Perceptual Reasoning and Interpretable Speech Model for Fine-Grained Diagnosis cs.CL · 2026 · author #10
  19. ControlFoley: Unified and Controllable Video-to-Audio Generation with Cross-Modal Conflict Handling cs.MM · 2026 · author #13
  20. Doc-V*: Coarse-to-Fine Interactive Visual Reasoning for Multi-Page Document VQA cs.CL · 2026 · author #10
  21. Q-Mask: Query-driven Causal Masks for Text Anchoring in OCR-Oriented Vision-Language Models cs.CV · 2026 · author #10
  22. Borderless Long Speech Synthesis cs.SD · 2026 · author #15
  23. From Ideal to Real: Stable Video Object Removal under Imperfect Conditions cs.CV · 2026 · author #7
  24. Visual Para-Thinker: Divide-and-Conquer Reasoning for Visual Comprehension cs.CV · 2026 · author #8
  25. Video-OPD: Efficient Post-Training of Multimodal Large Language Models for Temporal Video Grounding via On-Policy Distillation cs.CV · 2026 · author #9
  26. Restoring Exploration after Post-Training: Latent Exploration Decoding for Large Reasoning Models cs.CL · 2026 · author #9
  27. GAIA: A Data Flywheel System for Training GUI Test-Time Scaling Critic Models cs.AI · 2026 · author #10
  28. REVISOR: Beyond Textual Reflection, Towards Multimodal Introspective Reasoning in Long-Form Video Understanding cs.CV · 2025 · author #10
  29. Revisiting Entropy in Reinforcement Learning for Large Reasoning Models cs.CL · 2025 · author #8
  30. Enhancing Trustworthy GUI Grounding via Self-Critiqued Reinforcement Learning cs.CV · 2025 · author #11
  31. MECAT: A Multi-Experts Constructed Benchmark for Fine-Grained Audio Understanding Tasks eess.AS · 2025 · author #10
  32. Mobile GUI Agents under Real-world Threats: Are We There Yet? cs.CR · 2025 · author #7
  33. End-to-End Optimization of LLM-Driven Multi-Agent Search Systems via Heterogeneous-Group-Based Reinforcement Learning cs.LG · 2025 · author #5
  34. Time-R1: Post-Training Large Vision Language Model for Temporal Video Grounding cs.CV · 2025 · author #16
  35. Personal LLM Agents: Insights and Survey about the Capability, Efficiency and Security cs.HC · 2024 · author #14

Mentions

  • 2601.18197 #10 · arxiv_oai · confidence 0.70 Jian Luan
  • 2606.24333 #9 · arxiv_oai · confidence 0.70 Jian Luan
  • 2606.23221 #5 · arxiv_oai · confidence 0.70 Jian Luan
  • 2606.20280 #10 · arxiv_oai · confidence 0.70 Jian Luan
  • 2606.17979 #5 · arxiv_oai · confidence 0.70 Jian Luan
  • 2606.07000 #5 · arxiv_oai · confidence 0.70 Jian Luan
  • 2602.02994 #9 · arxiv_oai · confidence 0.70 Jian Luan
  • 2606.03283 #11 · arxiv_oai · confidence 0.70 Jian Luan
  • 2606.01651 #6 · arxiv_oai · confidence 0.70 Jian Luan
  • 2605.27838 #9 · arxiv_oai · confidence 0.70 Jian Luan
  • 2510.27266 #11 · arxiv_oai · confidence 0.70 Jian Luan
  • 2605.27134 #7 · arxiv_oai · confidence 0.70 Jian Luan
  • 2605.25801 #6 · arxiv_oai · confidence 0.70 Jian Luan
  • 2605.25160 #5 · arxiv_oai · confidence 0.70 Jian Luan
  • 2605.10347 #11 · arxiv_oai · confidence 0.70 Jian Luan
  • 2605.16381 #9 · arxiv_oai · confidence 0.70 Jian Luan
  • 2605.14311 #8 · arxiv_oai · confidence 0.70 Jian Luan
  • 2503.13377 #16 · arxiv_oai · confidence 0.70 Jian Luan
  • 2401.05459 #14 · arxiv_oai · confidence 0.70 Jian Luan

Frequent Coauthors