SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory. arXiv preprint arXiv:2411.11922
Canonical reference. 80% of citing Pith papers cite this work as background.
citing papers explorer
- Mitigating Error Accumulation in Continuous Navigation via Memory-Augmented Kalman Filtering
  NeuroKalman mitigates state drift in vision-language UAV navigation by using memory-augmented Kalman filtering where attention retrieves historical anchors to correct predictions without gradient updates.
- 3AM: 3egment Anything with Geometric Consistency in Videos
  3AM integrates MUSt3R 3D features into SAM2 via a Feature Merger and FOV-aware sampling to deliver geometry-consistent video object segmentation from RGB alone, with large gains on wide-baseline datasets.
- SAM 3: Segment Anything with Concepts
  SAM 3 introduces promptable concept segmentation that doubles accuracy of prior systems on images and videos while improving standard SAM segmentation performance.
- ViewSAM: Learning View-aware Cross-modal Semantics for Weakly Supervised Cross-view Referring Multi-Object Tracking
  ViewSAM achieves state-of-the-art weakly supervised performance on cross-view referring multi-object tracking by refining SAM tracklets via affinity-guided re-prompting and modeling view-induced variations as learnable conditions on SAM2.
- HOIGS: Human-Object Interaction Gaussian Splatting
  HOIGS adds a cross-attention HOI module to Gaussian Splatting that combines HexPlane human features with Cubic Hermite Spline object features to model interaction-induced deformations.
- HVG-3D: Bridging Real and Simulation Domains for 3D-Conditional Hand-Object Interaction Video Synthesis
  HVG-3D uses a 3D-aware diffusion architecture with ControlNet to synthesize high-fidelity hand-object interaction videos from 3D control signals, achieving state-of-the-art spatial fidelity and temporal coherence on the TASTE-Rob dataset.
- Segment Anything with Motion, Geometry, and Semantic Adaptation for Complex Nonlinear Visual Object Tracking
  SAMOSA adapts SAM 2 for complex visual object tracking by integrating explicit nonlinear motion prediction, semantic cues for failure recovery, and geometric constraints for stability, outperforming prior SAM 2-based and supervised methods on benchmarks including anti-UAV datasets.
- 4D Vessel Reconstruction for Benchtop Thrombectomy Analysis
  A nine-camera multi-view workflow with 4D Gaussian Splatting reconstructs dynamic vessel surfaces in thrombectomy phantoms to enable standardized comparative displacement and stress-proxy tracking.
- A Survey on Vision-Language-Action Models: An Action Tokenization Perspective
  The survey frames VLA models as pipelines that generate progressively grounded action tokens and classifies those tokens into eight types to guide future development.
- Cosmos World Foundation Model Platform for Physical AI
  The Cosmos platform supplies open-source pre-trained world models and supporting tools for building fine-tunable digital world simulations to train Physical AI.