IDD-X: A multi-view dataset for ego-relative important object localization and explanation in dense and unstructured traffic
Canonical reference. 79% of citing Pith papers cite this work as background.
citing papers explorer
-
FactoryNet: A Large-Scale Dataset toward Industrial Time-Series Foundation Models
FactoryNet is the first universal pretraining corpus for industrial time-series data with a shared S-E-F-C schema that supports cross-embodiment transfer and competitive anomaly detection.
-
Constrained MPC-Based Motion Planning for Morphing Quadrotors in Ultra-Narrow Passages under Limited Perception
A smooth exponential obstacle cost with reduction factor in nonlinear MPC allows morphing quadrotors to traverse narrow gaps under limited 2D LiDAR perception.
-
Galilean State Estimation for Inertial Navigation Systems with Unknown Time Delay
A Galilean-equivariant filter jointly estimates INS navigation states and unknown GNSS time delays, preserving accuracy and consistency better than EKF in UAV flights and simulations with delays up to 500 ms.
-
Distance-Constrained Unlabeled Multi-Agent Pathfinding
Distance-r Independent Unlabeled Multi-Agent Pathfinding is PSPACE-complete, with reduction-based and configuration-generator algorithms that solve instances with hundreds of agents.
-
Shields to Guarantee Probabilistic Safety in MDPs
New framework for probabilistic safety shields in MDPs showing impossibility of strong classical guarantees and providing weaker but usable alternatives with offline and online constructions.
-
Beyond Self-Play and Scale: A Behavior Benchmark for Generalization in Autonomous Driving
BehaviorBench reveals that self-play RL policies for autonomous driving overfit to their training traffic agents and do not generalize to other behaviors, motivating a hybrid rule-based plus learned planner.
-
The Moltbook Files: A Harmless Slopocalypse or Humanity's Last Experiment
An AI-agent social platform generated mostly neutral content whose use in fine-tuning reduced model truthfulness comparably to human Reddit data, suggesting limited unique harm but flagging tail risks like secret leaks.
-
LLM-Foraging: Large Language Models for Decentralized Swarm Robot Foraging
LLM-Foraging uses off-the-shelf LLMs for decentralized tactical decisions in CPFA-based swarm foraging, collecting more resources than GA-tuned baselines across 36 varied configurations while showing greater consistency.
-
ESARBench: A Benchmark for Agentic UAV Embodied Search and Rescue
ESARBench is the first unified benchmark for MLLM-driven UAV agents that must explore, locate clues, and decide on victim positions in photorealistic simulated SAR environments.
-
Towards Multi-Object Nonprehensile Transportation via Shared Teleoperation: A Framework Based on Virtual Object Model Predictive Control
The virtual object MPC framework enables stable shared teleoperation for transporting up to nine objects, cutting sliding distance by 72.45% and eliminating tip-overs compared to baseline.
-
Referring-Aware Visuomotor Policy Learning for Closed-Loop Manipulation
ReV is a referring-aware visuomotor policy using coupled diffusion heads for real-time trajectory replanning in robotic manipulation, trained solely via targeted perturbations to expert demonstrations and achieving higher success rates in simulated and real tasks.
-
A global dataset of continuous urban dashcam driving
CROWD is a new global dataset of 51,753 continuous urban dashcam segments spanning over 20,000 hours from 238 countries, with manual labels and automated object detections for routine driving analysis.
-
AID: Agent Intent from Diffusion for Multi-Agent Informative Path Planning
AID trains diffusion policies via behavior cloning on existing MAIPP planners followed by RL fine-tuning to achieve faster execution and higher information gain in multi-agent coordination.
-
BEVCALIB: LiDAR-Camera Calibration via Geometry-Guided Bird's-Eye View Representations
BEVCALIB performs LiDAR-camera calibration from raw data by fusing camera and LiDAR bird's-eye view features with a novel feature selector and reports state-of-the-art accuracy on KITTI and NuScenes.
-
Seeing Isn't Orienting: A Cognitively Grounded Benchmark Reveals Systematic Orientation Failures in MLLMs
DORI benchmark shows top vision-language models reach only 54.2% accuracy on coarse orientation tasks and 33% on granular judgments, with sharp drops on reference-frame shifts and compound rotations.
-
Closed-Loop Vision-Language Planning for Multi-Agent Coordination
COMPASS uses VLMs to generate and refine code-based strategies with structured communication, achieving 57% win rate on SMACv2 Protoss 5v5 versus 27% for QMIX.
-
Scalable Multi-robot Motion Planning via Hierarchical Subproblem Expansion and Workspace Decomposition Refinement
A hierarchical multi-robot motion planner that refines workspace decompositions to enable scalable coordination through discrete search over smaller decoupled subproblems.
-
LACE: Latent Visual Representation for Cross-Embodiment Learning
LACE aligns human-robot visual features via semantic distribution matching on corresponding body parts plus Gram loss, yielding 65% better zero-shot policy transfer than baseline DINO.
-
Global Convergence of Sampling-Based Nonconvex Optimization through Diffusion-Style Smoothing
Recasts sampling-based nonconvex optimization as smoothed gradient descent to obtain non-asymptotic convergence guarantees and introduces the DIDA annealed algorithm that converges to the global optimum.
-
VRA: Grounding Discrete-Time Joint Acceleration in Voltage-Constrained Actuation
VRA grounds discrete-time joint acceleration commands in voltage-constrained actuator physics to eliminate unrealizable accelerations and reduce oscillations in electric motor systems.
-
VISOR: A Vision-Language Model-based Test Oracle for Testing Robots
VISOR is a VLM-based automated test oracle that evaluates robot task correctness and quality from videos while reporting its own uncertainty, tested on GPT and Gemini across four tasks and over 1000 videos with Gemini showing higher recall and GPT higher precision but low uncertainty-correctness tie
-
MAG-VLAQ: Multi-modal Aerial-Ground Query Aggregation for Cross-View Place Recognition
MAG-VLAQ fuses multi-modal ground and aerial data via ODE-conditioned vector-of-locally-aggregated-queries to nearly double recall@1 on aerial-ground place recognition benchmarks.
-
Ray-Aware Pointer Memory with Adaptive Updates for Streaming 3D Reconstruction
The paper proposes ray-aware pointer memory with adaptive retain-or-replace updates to improve long-term stability and pose accuracy in streaming 3D reconstruction.
-
Navigating the Clutter: Waypoint-Based Bi-Level Planning for Multi-Robot Systems
Waypoint-based bi-level planning with curriculum RLVR improves multi-robot task success rates in dense-obstacle benchmarks over motion-agnostic and VLA baselines.
-
SafetyALFRED: Evaluating Safety-Conscious Planning of Multimodal Large Language Models
SafetyALFRED shows multimodal LLMs recognize kitchen hazards accurately in QA tests but achieve low success rates when required to mitigate those hazards through embodied planning.
-
WARPED: Wrist-Aligned Rendering for Robot Policy Learning from Egocentric Human Demonstrations
WARPED synthesizes realistic wrist-view observations from monocular egocentric human videos via foundation models, hand-object tracking, retargeting, and Gaussian Splatting to train visuomotor policies that match teleoperation success rates on five tabletop tasks with 5-8x less collection effort.
-
A Coordinate-Invariant Local Representation of Motion and Force Trajectories for Identification and Generalization Across Coordinate Systems
Introduces the Dual-Upper-Triangular Invariant Representation (DUTIR) as a coordinate-invariant local representation for motion and force trajectories with improved robustness to singularities and a supporting computational algorithm.
-
SynFlow: Scaling Up LiDAR Scene Flow Estimation with Synthetic Data
SynFlow creates a 34-times larger synthetic LiDAR scene flow dataset that lets models trained only on simulation match or beat supervised real-data baselines on multiple benchmarks.
-
Model Space Reasoning as Search in Feedback Space for Planning Domain Generation
An agentic LLM framework augmented with symbolic feedback and heuristic search over model space generates improved planning domains from natural language descriptions.
-
Sumo: Dynamic and Generalizable Whole-Body Loco-Manipulation
Test-time steering of pre-trained whole-body policies via sample-based planning lets legged robots generalize dynamic loco-manipulation to varied heavy objects and tasks without additional training or tuning.
-
frax: Fast Robot Kinematics and Dynamics in JAX
frax is a new open-source JAX library delivering low-microsecond CPU dynamics and over 100 million GPU evaluations per second for robot kinematics and dynamics with autodiff support.
-
SutureFormer: Learning Surgical Trajectories via Goal-conditioned Offline RL in Pixel Space
SutureFormer models needle tip movement in video as sequential pixel-space actions via goal-conditioned offline RL with spline-based reward densification, cutting average displacement error by 58.6% on a new 1,158-trajectory kidney suturing dataset.
-
Acoustic Feedback for Closed-Loop Force Control in Robotic Grinding
AFRG estimates grinding force from contact microphone audio for closed-loop robotic control, delivering 4-fold better consistency across disc conditions at roughly 200 times lower cost than force sensors.
-
SERNF: Sample-Efficient Real-World Dexterous Policy Fine-Tuning via Action-Chunked Critics and Normalizing Flows
SERNF achieves sample-efficient real-world fine-tuning of multimodal dexterous policies by pairing exact-likelihood normalizing flow policies with action-chunked value critics.
-
Genie Sim 3.0 : A High-Fidelity Comprehensive Simulation Platform for Humanoid Robot
Genie Sim 3.0 introduces an LLM-powered scene generator, the first LLM-based automated evaluation benchmark, and a large open synthetic dataset that demonstrates zero-shot sim-to-real transfer for robotic manipulation policies.
-
$\pi^{*}_{0.6}$: a VLA That Learns From Experience
RECAP enables a generalist VLA to self-improve via advantage-conditioned RL on mixed real-world data, more than doubling throughput and halving failure rates on hard manipulation tasks.
-
ViTacFormer: Learning Cross-Modal Representation for Visuo-Tactile Dexterous Manipulation
ViTacFormer learns a cross-modal visuo-tactile latent space with autoregressive tactile prediction and an easy-to-hard curriculum, then uses the representation for imitation learning that yields ~50% higher success and the first reported 11-stage, 2.5-minute autonomous dexterous tasks.
-
Bimodal Synchronization Performance: Why Noise and Sparse Connectivity Can Improve Collective Timing
In a discrete pulse-coupled oscillator model, synchronization is bimodal near a critical quorum-pulse balance, with noise and sparse connectivity suppressing multi-cluster states to favor global timing.
-
ORICF -- Open Robotics Inference and Control Framework
ORICF is a declarative, model-agnostic robotics framework with YAML specs and edge offloading that reduces robot compute utilization by up to 83% and energy by 66% in a ROS2 demo combining ASR, LLM, and CNN.
-
What Will Happen Next: Large Models-Driven Deduction for Emergency Instances
WLDS applies large models with factual and logical calibration to produce diverse text-and-image deductions of emergency scenarios beyond what traditional fixed simulations can generate.
-
E$^2$DT: Efficient and Effective Decision Transformer with Experience-Aware Sampling for Robotic Manipulation
E²DT couples a Decision Transformer with a k-Determinantal Point Process that scores trajectories on return-to-go quantiles, predictive uncertainty, and stage coverage to improve sample efficiency and policy quality in robotic manipulation.
-
Reliability-Guided Depth Fusion for Glare-Resilient Navigation Costmaps
Reliability modeling of depth measurements enables glare-resilient occupancy grid costmaps for mobile robots.
-
Robotic Nanoparticle Synthesis via Solution-based Processes
Screw-based motion planning extracted from single demonstrations enables robots to autonomously execute long-horizon nanoparticle synthesis protocols.
-
Event-Triggered Adaptive Consensus for Multi-Robot Task Allocation
An event-triggered consensus framework for heterogeneous robot swarms reduces communication overhead while preserving high task completion rates and resilience to failures in simulations.
-
SLOPE: Optimistic Potential Landscape Shaping for Model-based Reinforcement Learning
SLOPE improves MBRL in sparse reward settings by using optimistic distributional regression to build informative potential landscapes that provide better exploration gradients, outperforming baselines across 30+ tasks and real robotic deployments.
-
Learning to Control PDEs with Differentiable Predictive Control and Time-Integrated Neural Operators
A framework using time-integrated DeepONets inside differentiable predictive control learns fast neural policies that track targets and satisfy constraints on PDEs like heat, Burgers', and reaction-diffusion equations.
-
A Survey on Vision-Language-Action Models: An Action Tokenization Perspective
The survey frames VLA models as pipelines that generate progressively grounded action tokens and classifies those tokens into eight types to guide future development.
-
Self-Predictive Representations for Combinatorial Generalization in Behavioral Cloning
BYOL-γ uses self-predictive representations to approximate successor representations, improving zero-shot combinatorial generalization in goal-conditioned behavioral cloning.
-
Geometry-aided Vision-based Localization of Future Mars Helicopters in Challenging Illumination Conditions
Geo-LoFTR is a geometry-aided deep learning model for map-based localization that outperforms prior methods under large illumination and scale variations on simulated and real Mars imagery.
-
Learning-Accelerated Optimization-based Trajectory Planning for Cooperative Aerial-Ground Handover Missions
LSTM-based neural predictions accelerate centralized optimization for aerial-ground handover trajectories, reporting over 3x speedup and 100% success rate versus cold starts.