arXiv preprint arXiv:2001.02908 (2020)
Tool reference. 80% of classified Pith citations use this work as a method, library, or software dependency, not as a substantive claim.
citing papers explorer
- RotVLA: Rotational Latent Action for Vision-Language-Action Model
  RotVLA models latent actions as continuous SO(n) rotations with triplet-frame supervision and flow-matching to reach 98.2% success on LIBERO and 89.6%/88.5% on RoboTwin2.0 using a 1.7B-parameter model.
- AirQualityBench: A Realistic Evaluation Benchmark for Global Air Quality Forecasting
  AirQualityBench is a realistic global benchmark using hourly data from 3720 stations across 2021-2025 for six pollutants, preserving native missingness masks and evaluating on inverse-transformed physical scales.
- Incident-Guided Spatiotemporal Traffic Forecasting
  IGSTGNN adds incident-context spatial fusion and temporal impact decay modules to model how events alter traffic patterns, achieving state-of-the-art results on a new time-aligned incident-traffic dataset.
- Hierarchical Flow Decomposition for Turning Movement Prediction at Signalized Intersections
  HFD-TM predicts turning movements with 2.49 MAE by hierarchically decomposing corridor flows and enforcing conservation, outperforming Transformer and GRU baselines on six months of Nashville LiDAR data.
- UNICA: A Unified Neural Framework for Controllable 3D Avatars
  UNICA unifies motion planning, rigging, physical simulation, and rendering into a single skeleton-free neural framework that produces next-frame 3D avatar geometry from action inputs and renders it with Gaussian splatting.
- Co-Evolving Latent Action World Models
  CoLA-World jointly trains latent action models and world models with a warm-up phase to achieve co-evolution, matching or exceeding prior two-stage methods in video simulation quality and visual planning performance.
- villa-X: Enhancing Latent Action Modeling in Vision-Language-Action Models
  villa-X enhances latent action modeling in VLA models to support zero-shot action planning for unseen robot embodiments and open-vocabulary instructions, yielding better manipulation results in simulation and real-world tests.
- UniVLA: Learning to Act Anywhere with Task-centric Latent Actions
  UniVLA trains cross-embodiment vision-language-action policies from unlabeled videos via a latent action model in DINO space, beating OpenVLA on benchmarks with 1/20th pretraining compute and 1/10th downstream data.
- AgiBot World Colosseo: A Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems
  AgiBot World supplies over 1 million trajectories enabling GO-1 to deliver 30% average gains over Open X-Embodiment and over 60% success on complex dexterous tasks while open-sourcing everything.
- Efficient Prompt Learning for Traffic Forecasting
  SimpleST is a model-agnostic prompt tuning framework that lets pre-trained spatio-temporal GNNs adapt to distribution shifts in traffic data while keeping all original model weights fixed.
- Dimensional Balance Improves Large Scale Spatiotemporal Prediction Performance
  A scalable framework harmonizes spatial and temporal representations via low-rank spatial compression and extended temporal horizons to reduce prediction uncertainty in large-scale spatiotemporal tasks.
- Deep Spatio-Temporal Neural Network for Air Quality Reanalysis
  AQ-Net combines LSTM-attention for time and neural kNN for space to reanalyze PM2.5 at monitored and unmonitored stations using 2013-2017 northern China data.