Mixed citations

JanusVLN: Decoupling Semantics and Spatiality with Dual Implicit Memory for Vision-Language Navigation

2025 · arXiv 2509.22548

Mixed citation behavior. The most common role is background (67% of classified citations).

18 Pith papers cite it.


citation-role summary

background: 5 · baseline: 1

citation-polarity summary

background: 4 · baseline: 1 · unclear: 1

representative citing papers

ReAlign: Generalizable Image Forgery Detection via Reasoning-Aligned Representation

cs.CV · 2026-05-15 · unverdicted · novelty 7.0

ReAlign distills LLM-generated reasoning texts into a lightweight AIGI forgery detector via contrastive image-text alignment to improve generalization on complex forgeries.

One Token Per Frame: Reconsidering Visual Bandwidth in World Models for VLA Policy

cs.CV · 2026-05-08 · conditional · novelty 7.0 · 3 refs

Reducing visual input to one token per frame in VLA world models maintains or improves long-horizon performance on MetaWorld, LIBERO, and real-robot tasks.

ConeSep: Cone-based Robust Noise-Unlearning Compositional Network for Composed Image Retrieval

cs.CV · 2026-04-22 · unverdicted · novelty 7.0

ConeSep tackles noisy triplet correspondences in composed image retrieval by introducing geometric fidelity quantization to locate noise, negative boundary learning for semantic opposites, and targeted unlearning via optimal transport, outperforming prior methods on FashionIQ and CIRR.

Demystifying the Optimal Fair Classifier in Multi-Class Classification

cs.LG · 2026-05-30 · unverdicted · novelty 6.0

Derives a tractable optimal fair multi-class classifier and supplies in-processing and post-processing algorithms that converge to the accuracy-fairness Pareto frontier.

Uni-LaViRA: Language-Vision-Robot Actions Translation for Unified Embodied Navigation

cs.RO · 2026-05-26 · unverdicted · novelty 6.0

A zero-shot unified agent for VLN-CE, ObjectNav, EQA and Aerial-VLN on wheeled, quadruped, humanoid and UAV platforms that translates language and vision inputs into actions via MLLMs plus TDM and SCB mechanisms, matching trained foundation models on multiple benchmarks.

MindVLA-U1: VLA Beats VA with Unified Streaming Architecture for Autonomous Driving

cs.RO · 2026-05-12 · unverdicted · novelty 6.0 · 2 refs

MindVLA-U1 is the first unified streaming VLA architecture that surpasses human drivers on WOD-E2E planning metrics while matching VA latency and preserving language interfaces.

SpaAct: Spatially-Activated Transition Learning with Curriculum Adaptation for Vision-Language Navigation

cs.CV · 2026-04-30 · unverdicted · novelty 6.0

SpaAct activates spatial awareness in VLMs using action retrospection, future frame prediction, and progressive curriculum learning to reach state-of-the-art results on VLN-CE benchmarks.

Air-Know: Arbiter-Calibrated Knowledge-Internalizing Robust Network for Composed Image Retrieval

cs.CV · 2026-04-21 · unverdicted · novelty 6.0

Air-Know decouples MLLM-based external arbitration from proxy learning via knowledge internalization and dual-stream training to overcome noisy triplet correspondence in composed image retrieval.

FineCog-Nav: Integrating Fine-grained Cognitive Modules for Zero-shot Multimodal UAV Navigation

cs.CV · 2026-04-17 · unverdicted · novelty 6.0

FineCog-Nav uses fine-grained cognitive modules driven by foundation models to outperform zero-shot baselines in UAV navigation and introduces the AerialVLN-Fine benchmark with refined instructions.

MerNav: A Highly Generalizable Memory-Execute-Review Framework for Zero-Shot Object Goal Navigation

cs.CV · 2026-02-05 · unverdicted · novelty 6.0

MerNav's Memory-Execute-Review framework improves success rates in zero-shot object goal navigation by 5-8% over baselines on four datasets while outperforming both training-free and supervised methods on key benchmarks.

FutureSightDrive: Thinking Visually with Spatio-Temporal CoT for Autonomous Driving

cs.CV · 2025-05-23 · conditional · novelty 6.0

FSDrive uses a generated future scene frame as a visual spatio-temporal chain of thought (CoT) to improve VLA models for safer autonomous-driving trajectory prediction.

LCGNav: Local Candidate-Aware Geometric Enhancement for General Topological Planning in Vision-Language Navigation

cs.CV · 2026-05-09 · conditional · novelty 5.0

LCGNav improves online topological VLN-CE by converting local depth views to physically truncated 3D point clouds and applying selective dimension-preserving fusion, yielding consistent gains on R2R-CE and RxR-CE benchmarks with open code.

LiveVLN: Breaking the Stop-and-Go Loop in Vision-Language Navigation

cs.RO · 2026-04-21 · unverdicted · novelty 5.0

LiveVLN enables smoother vision-language navigation by overlapping action execution with ongoing observation processing, preserving benchmark scores while cutting real-world waiting time by up to 77.7 percent.

ROSClaw: A Hierarchical Semantic-Physical Framework for Heterogeneous Multi-Agent Collaboration

cs.RO · 2026-04-06 · unverdicted · novelty 5.0

ROSClaw is a hierarchical framework that unifies vision-language model control with e-URDF-based sim-to-real mapping and closed-loop data collection to enable semantic-physical collaboration among heterogeneous multi-agent robots.

Zero-Shot Vulnerability Detection in Low-Resource Smart Contracts Through Solidity-Only Training

cs.CR · 2026-03-22 · unverdicted · novelty 5.0

Sol2Vy transfers vulnerability detection from Solidity to Vyper in zero-shot fashion, outperforming prior methods on reentrancy, weak randomness, and unchecked transfers.

FAST: A Synergistic Framework of Attention and State-space Models for Spatiotemporal Traffic Prediction

cs.LG · 2026-04-15 · unverdicted · novelty 4.0

FAST uses a Temporal-Spatial-Temporal structure with attention and Mamba modules plus learnable embeddings to achieve higher accuracy than previous models on traffic prediction tasks.

ABot-Claw: A Foundation for Persistent, Cooperative, and Self-Evolving Robotic Agents

cs.CV · 2026-04-11 · unverdicted · novelty 4.0

ABot-Claw is an embodied software layer that adds unified robot scheduling, cross-embodiment visual memory, and critic-driven replanning on top of OpenClaw to support persistent multi-robot execution from natural-language goals.

SEDualVLN: A Spatially-Enhanced Dual-System for Vision-Language Navigation

cs.RO · 2026-05-17
