arXiv Robotics Digest

Curated Papers for April 9, 2026

30 papers ranked by maximum author h-index

Research Landscape

The April 9 papers span a wide range of hardware, learning paradigms, and application domains. Foundation-model approaches to manipulation (HEX, ViVa, ActiveGlasses, BLaDA) appear alongside more classical learning pipelines (SIM1's physics-aligned data engine, PriPG-RL's privileged planning), which suggests the community has moved past winner-take-all debates toward pragmatic engineering. Systematic sim-to-real tooling (SIM1's 90% zero-shot success at a 1:15 ratio, Sumo's whole-body loco-manipulation) points to a maturing field in which repeatability and scale matter more than novelty.

Autonomous driving and aerial systems remain the main proving grounds. CrashSight and Fail2Drive raise benchmarking rigor with hard failure cases (a 22.8% average success-rate drop under distribution shift), while RAGE-XY estimates tire forces in real time on racing platforms by fusing RADAR and IMU data. In parallel, multi-robot, UAV, and maritime papers address decentralized coordination and navigation (Karma mechanisms for multi-agent path finding), and a survey of aerial vision-and-language navigation (VLN) maps a new frontier for embodied language understanding. Infrastructure contributions (the AgiPIX platform with digital twins, acoustic slip sensing via A-SLIP) show that robotics in 2026 values system-level repeatability.

Bio-inspired and hardware-centric contributions round out the digest: soft-robot co-design (EvoGymCM's continuous material stiffness), a bird-wing mechanism driven by a single actuator, and a chick-robot affective interface show that the field still invests in morphological innovation. Together with perception advances (GEAR's Gaussian splatting for articulated objects, attitude estimation via SO(3) filtering), these papers sketch a field that balances foundation models with domain specificity; neither pure learning nor pure engineering dominates in 2026.

  • VLA & Foundation Models for Manipulation (5 papers): vision-language approaches, policy learning, embodied LLMs
  • Autonomous Driving & Vehicle Intelligence (5 papers): benchmarks, planning, trajectory prediction, force control
  • Robot Learning & Sim-to-Real Transfer (6 papers): data generation, privileged learning, policy distillation, temporal modeling
  • Sensing, Estimation & Perception (5 papers): state estimation, tactile sensing, SO(3) filtering, 3D reconstruction
  • Aerial, Multi-Robot & Maritime Systems (6 papers): UAV autonomy, multi-agent coordination, world models, maritime navigation
  • Hardware Design & Community (3 papers): soft robotics, bio-inspired mechanisms, affective interfaces, sustainability

VLA & Foundation Models for Manipulation

Shuanghao Bai, Meng Li, Xinyuan Lv, Jiawei Wang, Xinhua Wang
Core Contributions
  • State-centric architecture with Mixture-of-Experts and flow-matching action head enables cross-embodiment transfer without task-specific retraining, advancing VLA generalization beyond single morphologies
  • Humanoid-aligned expert specialization directly optimizes for anthropomorphic kinematic structures, improving performance versus generic expert pooling
  • Demonstrates scalable whole-body manipulation framework applicable to biped platforms with different hardware properties
State-centric framework for humanoid manipulation using Mixture-of-Experts and flow-matching action head for cross-embodiment whole-body control.
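To make the "flow-matching action head" above more concrete, here is a minimal sketch of how such a head typically produces an action: start from noise and integrate a velocity field from t = 0 to t = 1. Everything in it is an assumption for illustration. `toy_velocity` is a hypothetical analytic stand-in for the learned network, and the straight-line path to a single known target is chosen only so the result can be checked by hand; it is not the paper's architecture.

```python
import random


def toy_velocity(x, t, target):
    """Hypothetical stand-in for a learned velocity network v(x, t).

    For a straight-line path from noise to one target action, the velocity
    that arrives at the target at t = 1 is (target - x) / (1 - t).
    """
    return [(a - xi) / (1.0 - t) for xi, a in zip(x, target)]


def sample_action(target, num_steps=10, seed=0):
    """Euler-integrate the velocity field from Gaussian noise (t=0) to t=1."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays strictly below 1, so the division is safe
        v = toy_velocity(x, t, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Illustrative 3-D "action"; the values are made up.
action = sample_action(target=[0.3, -0.1, 0.8])
```

In a real policy the velocity would come from a network conditioned on the robot state and observations, and the number of integration steps trades inference speed against accuracy.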
Jindi Lv, Hao Li, Jie Li, Yifei Nie, Fankun Kong
Core Contributions
  • Repurposes pretrained video generators (diffusion models) as value function estimators, reducing dependency on reward signal design and enabling grounding in embodiment dynamics
  • Novel architecture leverages visual imagination to guide RL policy optimization, bridging perceptual uncertainty and value estimation in continuous control
  • Demonstrates video-based value estimation improves sample efficiency versus scalar reward baselines, opening video generators as underutilized assets for embodied learning
Video generator repurposed for value estimation grounding RL in anticipated embodiment dynamics.
Yanwen Zou, Chenyang Shi, Wenye Yu, Han Xue, Jun Lv
Core Contributions
  • Smart glasses capture egocentric human manipulation with active gaze tracking, providing a richer demonstration signal than passive vision and enabling zero-shot robot policy transfer through viewpoint alignment
  • Active vision (head motion + gaze) conveys task-relevant spatial attention, reducing the domain gap between human demos and robot execution
  • First study showing smart glasses as manipulation learning interface, opening consumer AR hardware for robotics data collection at scale
Smart glasses capture human demos with active vision for zero-shot robot manipulation transfer.
Peiran Xu, Jiaqi Zheng, Yadong Mu
Core Contributions
  • Capability-driven VLM pipeline decomposes embodied planning into atomic sub-capabilities (grasp, move, place) with multi-stage training, improving compositional generalization
  • Explicit factorization of task planning and skill execution separates concerns better than end-to-end models, improving interpretability and scalability
  • Demonstrates modular skill composition outperforms monolithic policies on unseen task combinations, supporting hierarchical task learning
Capability-driven VLM pipeline decomposes embodied planning into sub-capabilities with multi-stage training.
Fan Yang, Wenrui Chen, Guorun Yan, Ruize Liao, Wanjun Jia
Core Contributions
  • Zero-shot language-to-dexterous-grasp translation via 3D Gaussian Splatting grounds language in scene geometry, eliminating per-task fine-tuning
  • Triangular functional point localization enables precise contact prediction from scene representations, advancing from coarse grasp heuristics to spatially grounded policies
  • Demonstrates vision language models can emit dexterous grasps directly from language+geometry fusion, expanding VLM applicability beyond navigation and manipulation planning
Zero-shot language-to-dexterous-grasp via 3D Gaussian Splatting with triangular functional point localization.

Autonomous Driving & Vehicle Intelligence

Jiawei Liu, Xun Gong, Fen Fang, Muli Yang, Bohao Qu
Core Contributions
  • LLM translates open-ended passenger instructions into executable multi-modal MPC planner scripts, bridging natural language and structured planning; presented as the first system enabling end-to-end language-conditioned autonomous driving
  • Multi-planner scheduling intelligently routes instructions to appropriate controllers (trajectory, behavior, safety), enabling richer interaction vocabulary than single-task baselines
  • Demonstrates compositional instruction decomposition improves handling of complex user requests, advancing autonomous vehicle human-AI interaction
LLM translates passenger instructions into executable MPC planner scripts for autonomous driving.
Rui Gan, Junyi Ma, Pei Li, Xingyou Yang, Kai Chen
Core Contributions
  • 250-crash-video benchmark with 13K QA pairs from a roadside camera perspective; first systematic dataset for infrastructure-centric safety assessment, as opposed to ego-vehicle-centric approaches
  • Phase-aware annotation (pre-crash, crash, post-crash) enables temporal understanding of traffic incidents, improving VLM reasoning about causality and blame
  • Demonstrates significant VLM performance gap on safety-critical infrastructure tasks, establishing benchmark for future vision-language models in autonomous systems
250-crash-video benchmark with 13K QA pairs for VLM evaluation from roadside camera perspective.
Simon Gerstenecker, Andreas Geiger, Katrin Renz
Core Contributions
  • Paired-route benchmark with 200 routes and 17 distribution shift categories reveals a 22.8% average success-rate drop, demonstrating a critical gap between development and deployment robustness
  • Closed-loop evaluation (real-time planner in action) versus open-loop metrics better captures actual autonomous vehicle failure modes and recovery strategies
  • Systematic taxonomy of distribution shifts (weather, traffic density, road types) guides future robustness research, establishing new standard for AV benchmarking
Paired-route benchmark with 200 routes and 17 shift classes showing 22.8% average success-rate drop.
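As a rough illustration of how a paired-route success-rate drop can be computed, the sketch below compares outcomes on each base route with outcomes on its shifted twin. The digest does not say whether the 22.8% figure is an absolute or a relative drop, so this sketch assumes absolute percentage points; the function names and the outcome data are made up for illustration.

```python
def success_rate(outcomes):
    """Fraction of successful runs; outcomes is a list of booleans."""
    return sum(outcomes) / len(outcomes)


def paired_drop(pairs):
    """Absolute success-rate drop, in percentage points, over paired routes.

    Each pair is (base_route_ok, shifted_route_ok) for one route and its
    distribution-shifted counterpart.
    """
    base = success_rate([b for b, _ in pairs])
    shifted = success_rate([s for _, s in pairs])
    return 100.0 * (base - shifted)


# Made-up outcomes for five route pairs (illustration only).
pairs = [(True, True), (True, False), (True, True), (False, False), (True, False)]
drop = paired_drop(pairs)
```

Pairing routes this way keeps everything except the shift constant, so the drop can be attributed to the shift rather than to route difficulty.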
Amirhossein Afsharrad, Amirhesam Abedsoltan, Ahmadreza Moradipari, Sanjay Lall
Core Contributions
  • Generalized Knowledge Distillation (GKD) trains 5x smaller student models from a GPT-Driver teacher, approaching teacher-level nuScenes performance while enabling edge deployment
  • On-policy distillation preserves teacher's decision-making under deployment conditions, versus offline approaches that may accumulate distribution mismatch
  • Demonstrates LLM-based planning can be efficiently compressed for resource-constrained autonomous vehicles, bridging foundation models and embedded systems
GKD distills 5x smaller student from GPT-Driver teacher approaching teacher-level nuScenes performance.
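The on-policy distillation idea in the bullets above can be sketched as follows: the student generates its own continuation, and the loss compares teacher and student token distributions on that student-generated context. This is a minimal sketch under loud assumptions: a toy 3-token vocabulary, hypothetical `teacher_policy` and `student_policy` functions in place of real language models, and a plain forward KL divergence. The paper's actual divergence and models are not given in this digest.

```python
import math
import random


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def teacher_policy(context):
    """Hypothetical teacher: next-token distribution given a token tuple."""
    return [0.7, 0.2, 0.1] if context[-1] == 0 else [0.2, 0.6, 0.2]


def student_policy(context):
    """Hypothetical student: a less sharp distribution than the teacher."""
    return [0.5, 0.3, 0.2] if context[-1] == 0 else [0.3, 0.4, 0.3]


def on_policy_distill_loss(student, teacher, prompts, rng):
    """Average teacher-vs-student KL on contexts sampled from the student."""
    total = 0.0
    for prompt in prompts:
        # The student samples its own next token (the on-policy step) ...
        s_probs = student(prompt)
        token = rng.choices(range(len(s_probs)), weights=s_probs)[0]
        context = prompt + (token,)
        # ... and the teacher is queried on that student-generated context.
        total += kl_divergence(teacher(context), student(context))
    return total / len(prompts)


rng = random.Random(0)
loss = on_policy_distill_loss(student_policy, teacher_policy, [(0,), (1, 2)], rng)
```

An offline variant would instead evaluate the divergence on a fixed dataset of teacher outputs, which is where the distribution mismatch mentioned above can creep in.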
Davide Malvezzi, Nicola Musiu, Eugenio Mascaro, Francesco Iacovacci, Marko Bertogna
Core Contributions
  • RADAR+IMU framework estimates tire lateral and longitudinal forces in real-time on autonomous race cars, enabling closed-loop force control without strain gauges
  • Online calibration adapts to track-specific tire properties and wear, improving practical deployment robustness versus offline calibration approaches
  • Demonstrates indirect force estimation via sensor fusion viable for high-speed autonomous platforms, reducing instrumentation complexity for racing and performance driving
RADAR+IMU framework for real-time tire force estimation on autonomous race car with online calibration.

Robot Learning & Sim-to-Real Transfer

Zi-Qi Yang, Mehrdad R. Kermani
Core Contributions
  • Layered control framework for learning from demonstration robustly handles imperfect human trajectories via variable impedance learning and null-space safety injection
  • Explicit impedance layer adapts compliance to contact forces, improving task success on compliant manipulation versus stiff tracking approaches
  • Null-space safety module prevents self-collisions and joint limits during learning, enabling safe LfD for collaborative robots without manual trajectory filtering
Layered control framework for compliant LfD with variable impedance learning and null-space safety.
Yunsong Zhou, Hangxu Liu, Xuekun Jiang, Xing Shen, Yuanzhen Zhou
Core Contributions
  • Real-to-sim-to-real pipeline achieves 90% zero-shot manipulation success at 1:15 sim-to-real scale ratio, substantially outperforming prior sim2real approaches in deformable object handling
  • Physics-aligned simulator design prioritizes accurate contact dynamics and material property modeling over photorealism, improving transfer versus appearance-focused engines
  • Demonstrates data scalability without additional real robot experiments, enabling scalable sim-to-real paradigm for deformable manipulation tasks
Real-to-sim-to-real data engine for deformable manipulation achieving 90% zero-shot success at 1:15 ratio.
Mohsen Amiri, Ali Beikmohammadi, Sindri Magnusson, Mehdi Hosseinzadeh
Core Contributions
  • Privileged MPC planner (with full state access) distills knowledge to RL policy under partial observability, enabling POMDP learning without complete state reconstruction
  • Teacher-student framework leverages classical planning where it is available (e.g., in simulation) and transitions to learned policies for deployment, a hybrid approach that balances robustness and adaptability
  • Deployed on Unitree Go2 quadruped with real-time constraints, demonstrating practical applicability versus simulation-only studies
Privileged MPC planner distills knowledge to RL policy under partial observability; deployed on Unitree Go2.
Marco Gabriele Fedozzi, Yukie Nagai, Francesco Rea, Alessandra Sciutti
Core Contributions
  • Mirror neuron-inspired DMBN-PTE improves temporal encoding for action prediction, bridging neuroscience and robot learning via temporal attention mechanisms
  • Multimodal fusion of vision and proprioception enables richer state representation for predicting human-robot collaboration actions
  • Demonstrates biologically-grounded architectures improve prediction accuracy versus generic temporal models, suggesting embodied learning benefits from neuroscience insights
Mirror neuron-inspired DMBN-PTE improves temporal encoding for visuo-motor action prediction.
John Z. Zhang, Maks Sorokin, Jan Brüdigam, Brandon Hung, Stephen Phillips
Core Contributions
  • Sim-to-real whole-body loco-manipulation with test-time steering enables pre-trained policies to adapt to novel heavy-object tasks without retraining, improving generalization
  • Unified framework combines locomotion and manipulation control, addressing challenge of coordinating base motion and arm trajectories simultaneously
  • Demonstrates real robot dynamics adaptation enables successful object transport, validating sim-to-real transfer for complex multi-task behaviors
Sim-to-real whole-body loco-manipulation with test-time steering of pre-trained policy for heavy objects.
6
h-index: 5 cs.CV, cs.RO
Hang Ye, Xiaoxuan Ma, Fan Lu, Wayne Wu, Kwan-Yee Lin
Core Contributions
  • Two-layer paradigm enables autonomous digital humans to act in first-person perspective within reconstructed 3D environments, advancing embodied AI beyond third-person avatars
  • Vision-grounded control integrates perception and action tightly, enabling natural embodiment in synthetic scenes with realistic spatial constraints
  • Framework applicable to both simulation and photorealistic environments, suggesting scalability toward real-robot humanoid control
Two-layer paradigm for autonomous digital humans with first-person perception in reconstructed 3D scenes.

Sensing, Estimation & Perception

Jue Chen, Alexander Mielke, Kaspar Althoefer, Elisabetta Versace
Core Contributions
  • Soft robotic interface with warmth, breathing, and face-like stimuli designed for animal-robot interaction; first study of affective soft robotics in inter-species contexts
  • Biologically-inspired morphology (soft materials, thermal properties) proves more engaging to chicks than conventional rigid interfaces, validating bio-inspired design for animal systems
  • Opens robotics applications to behavioral biology, enabling controlled interaction studies that were previously limited to human subjects or ethically constrained conditions
Soft robotic affective interface for chicks with warmth, breathing, face-like stimuli for animal-robot interaction.
Edgar Granados, Patrick Meng, Charles Tang, Shrimed Sangani, William R. Johnson
Core Contributions
  • Factor graph approach fuses RGB-D camera observations with cable length sensors for tensegrity robot state estimation, improving observability of cable-driven structures
  • Chebyshev polynomial trajectory basis enables efficient parameterization of complex tensegrity dynamics, reducing state space dimensionality versus raw trajectory recording
  • Demonstrates hybrid sensing (vision + proprioception) critical for soft robots where traditional rigid-body assumptions fail, advancing state estimation theory for underactuated systems
Factor graph approach for tensegrity robot state estimation fusing RGB-D camera with cable length sensors.
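The Chebyshev trajectory basis mentioned above replaces a dense list of poses with a few coefficients per coordinate. A minimal sketch of evaluating such a parameterization is shown below, using the standard three-term recurrence for Chebyshev polynomials of the first kind. The function names and coefficient values are hypothetical; how the paper fits the coefficients inside its factor graph is not covered here.

```python
def chebyshev_eval(coeffs, t):
    """Evaluate sum_k coeffs[k] * T_k(t) for t in [-1, 1].

    Uses T_0 = 1, T_1 = t, T_{k+1} = 2 t T_k - T_{k-1}.
    """
    t_prev, t_curr = 1.0, t  # T_0(t), T_1(t)
    total = coeffs[0] * t_prev
    if len(coeffs) > 1:
        total += coeffs[1] * t_curr
    for c in coeffs[2:]:
        t_prev, t_curr = t_curr, 2.0 * t * t_curr - t_prev
        total += c * t_curr
    return total


def trajectory_at(coeffs_per_dim, time, t_start, t_end):
    """Map time in [t_start, t_end] onto [-1, 1] and evaluate each coordinate."""
    tau = 2.0 * (time - t_start) / (t_end - t_start) - 1.0
    return [chebyshev_eval(c, tau) for c in coeffs_per_dim]


# Two coordinates described by five numbers in total (values are made up).
coeffs_xy = [[1.0, 0.5], [0.0, 0.0, 1.0]]
midpoint = trajectory_at(coeffs_xy, time=5.0, t_start=0.0, t_end=10.0)
```

Because the trajectory is a smooth function of a handful of coefficients, an estimator can optimize those coefficients instead of one state per timestep.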
Alessandro Melis, Soulaimane Berkane, Tarek Hamel
Core Contributions
  • SO(3) observer design for attitude estimation from scalar measurements achieves almost-global stability, enabling robust attitude control from limited sensor suites
  • Complementary filtering framework integrates inertial measurements with gravity/magnetic field constraints, improving accuracy versus gyro-only integration
  • Theoretical stability analysis grounds the approach in control theory, advancing robustness certification for robot attitude estimation systems
SO(3) observer for attitude estimation from scalar measurements with almost-global stability.
Uksang Yoo, Yuemin Mao, Jean Oh, Jeffrey Ichnowski
Core Contributions
  • Piezoelectric microphone system achieves 14.1 degree directional slip error with 64% improvement via multi-channel acoustic fusion, enabling low-cost slip detection
  • Acoustic sensing complements force/vision approaches, providing high-frequency slip signals without visual occlusion or complex tactile fabrication
  • Demonstrates passive acoustic sensing viable for robotic grasping, potentially scalable to multi-finger hands without per-finger instrumentation
Piezoelectric microphone system for slip estimation: 14.1 degree directional error, 64% improvement with multi-channel.
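A directional slip error such as the 14.1 degree figure above has to account for angle wraparound, since 350 degrees and 10 degrees are only 20 degrees apart. The sketch below shows one common way to compute it. It assumes the metric is a mean absolute angular error in degrees, which the digest does not confirm, and the sample angles are made up.

```python
def angular_error_deg(predicted, actual):
    """Smallest absolute difference between two directions, in [0, 180] degrees."""
    diff = (predicted - actual + 180.0) % 360.0 - 180.0
    return abs(diff)


def mean_directional_error(predictions, ground_truth):
    """Mean absolute angular error over paired direction estimates."""
    errors = [angular_error_deg(p, g) for p, g in zip(predictions, ground_truth)]
    return sum(errors) / len(errors)


# Made-up slip directions in degrees (illustration only).
predicted_dirs = [350.0, 90.0]
true_dirs = [10.0, 100.0]
error = mean_directional_error(predicted_dirs, true_dirs)
```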
Jialin Li, Bin Fu, Ruiping Wang, Xilin Chen
Core Contributions
  • EM-style alternating refinement jointly models articulated object geometry and motion in Gaussian Splatting framework, enabling accurate 3D reconstruction of dynamic scenes
  • Disentangled representation of static geometry and articulated motion improves over monolithic approaches, facilitating reuse of object models across scenes
  • Enables downstream robot tasks (grasp planning, trajectory prediction) via richer 3D scene understanding, bridging perception and manipulation planning
EM-style Gaussian Splatting framework for articulated object geometry and motion joint modeling.

Aerial, Multi-Robot & Maritime Systems

Sasanka Kuruppu Arachchige, Juan Jose Garcia, Changda Tian, Lauri Suomela, Panos Trahanias
Core Contributions
  • Open-source platform for indoor aerial autonomy with integrated digital twin and containerized ROS 2 stack enables rapid development and validation without custom infrastructure
  • Digital twin synchronization enables sim-to-real transfer for UAV planning, reducing gap between simulation experiments and field deployment
  • Containerized middleware abstracts hardware details, enabling portability across drone platforms and lowering barrier to drone research adoption
Open-source platform for indoor aerial autonomy with digital twin and containerized ROS 2 stack.
Kevin Riehl, Julius Schlapbach, Anastasios Kouvelas, Michail A. Makridis
Core Contributions
  • Non-tradeable Karma credits enable decentralized, fair MAPF in warehouse scenarios without centralized coordinator, improving scalability for large swarms
  • Mechanism design prevents credit exploitation while encouraging cooperation, providing game-theoretic fairness guarantees for multi-agent systems
  • Demonstrated on warehouse logistics, showing practical applicability of mechanism design to embodied multi-agent coordination
Non-tradeable Karma credits for decentralized MAPF fairness in warehouse scenarios.
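To illustrate how Karma credits can settle a path conflict without a central coordinator, here is a minimal sketch for two agents contesting one cell. The rule used here (higher bid wins, and the winner's bid is handed to the agent that yields) is an assumption chosen for illustration; the paper's actual bidding and redistribution rules may differ. Agent names and balances are made up.

```python
def resolve_conflict(balances, bids):
    """Resolve one contested cell between two agents using Karma bids.

    balances: dict agent -> Karma credits. Credits are non-tradeable in the
              sense that they only move through this rule.
    bids:     dict agent -> requested bid, capped at the agent's balance.
    Returns the winner; the winner's bid is transferred to the yielding agent.
    """
    a, b = sorted(bids)  # deterministic order; ties go to the first name
    bid_a = min(bids[a], balances[a])
    bid_b = min(bids[b], balances[b])
    if bid_a >= bid_b:
        winner, loser, payment = a, b, bid_a
    else:
        winner, loser, payment = b, a, bid_b
    balances[winner] -= payment
    balances[loser] += payment
    return winner


balances = {"r1": 5, "r2": 3}
first = resolve_conflict(balances, {"r1": 2, "r2": 3})   # r2 outbids r1
second = resolve_conflict(balances, {"r1": 1, "r2": 4})  # r2 has no credits left
```

An agent that wins now has less Karma for later conflicts, while an agent that yields gains priority, which is the intuition behind the fairness claim; total Karma stays constant throughout.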
Assane Sankara, Daniel Bonilla Licea, Hajar El Hammouti
Core Contributions
  • DDQN-based UAV flight policy prioritizes semantically-relevant IoT image data collection, improving information utility per mission time versus uniform coverage approaches
  • Semantic awareness (object detection, scene understanding) guides trajectory planning, enabling task-specific data collection without manual route specification
  • Demonstrates RL-based semantic planning outperforms scripted IoT missions, suggesting embodied agents can be more intelligent data collectors than fixed-path systems
DDQN-based UAV flight policy for semantic-aware IoT image data collection.
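Assuming DDQN here refers to Double DQN, its defining step is the learning target: the online network chooses the next action and the target network scores it, which reduces the overestimation of a plain max. The sketch below computes that target for one transition; the Q-values, reward, and discount are made-up numbers, and the UAV-specific state and reward design from the paper is not represented.

```python
def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double DQN target: the online net picks the action, the target net evaluates it."""
    if done:
        return reward  # no bootstrapping past a terminal state
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


# Made-up Q-values for three candidate flight actions in the next state.
q_online = [0.2, 0.9, 0.5]   # online network prefers action 1
q_target = [0.4, 0.6, 1.5]   # target network would prefer action 2
y = double_dqn_target(reward=1.0, next_q_online=q_online,
                      next_q_target=q_target, gamma=0.9)
```

With these numbers a plain DQN target would bootstrap from the target network's own maximum (1.5), while the Double DQN target uses the target network's value for the action the online network selected (0.6).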
Xingyu Xia, Lekai Zhou, Yujie Tang, Xiaozhou Zhu, Hai Zhu
Core Contributions
  • Survey of aerial VLN with systematic taxonomy of 5 architectural categories (end-to-end, two-stage, modular, hierarchical, LLM-based) guides future research directions
  • Identifies 7 open problems (including generalization, efficiency, real-world deployment, and multimodal fusion) critical for advancing VLN beyond simulators to field robotics
  • Establishes aerial VLN as emerging frontier, positioning UAVs as next domain for embodied language understanding after ground navigation
Survey of aerial VLN with taxonomy of 5 architectural categories and 7 open problems.
Joel Jose, Andreas Madsen, Andreas Brandsæter, Tor A. Johansen, Erlend M. Coates
Core Contributions
  • Contrastive explanations for maritime collision avoidance provide human-interpretable justifications for autonomous decisions, improving marine officer trust and compliance
  • User study with 4 marine officers validates effectiveness of explanations versus black-box autonomous systems, establishing human factors importance for maritime automation
  • Demonstrates explainability critical for high-stakes autonomous systems where human supervision remains legally and operationally required
Contrastive explanations for maritime collision avoidance with user study of 4 marine officers.
Hongjin Chen, Shangyun Jiang, Tonghua Su, Chen Gao, Xinlei Chen
Core Contributions
  • World model teacher generates structured supervision for VLN trajectory prediction, achieving 18% absolute ADE reduction versus direct imitation learning
  • Two-stage learning (world model pre-training + student distillation) improves generalization, suggesting intermediate representations help embodied understanding
  • Demonstrates generative models can serve as privileged teachers for navigation, opening new paradigm for leveraging pre-trained models in embodied tasks
World model teacher generates structured supervision for student VLN trajectory predictor; 18% ADE reduction.
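For readers unfamiliar with the metric, ADE is assumed here to be the standard average displacement error: the mean Euclidean distance between predicted and ground-truth waypoints at matching timesteps. The sketch below computes it for one trajectory; the waypoints are made up and the function name is illustrative.

```python
import math


def average_displacement_error(predicted, ground_truth):
    """Mean Euclidean distance between matching waypoints of two trajectories."""
    if len(predicted) != len(ground_truth):
        raise ValueError("trajectories must have the same number of waypoints")
    dists = [math.dist(p, g) for p, g in zip(predicted, ground_truth)]
    return sum(dists) / len(dists)


# Made-up 2-D waypoints (illustration only).
predicted_path = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
true_path = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
ade = average_displacement_error(predicted_path, true_path)
```

Lower is better, so a reduction in ADE means the student's predicted waypoints sit closer to the reference trajectory on average.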

Hardware Design & Community

Antun Skuric, Leandro Von Werra, Thomas Wolf
Core Contributions
  • Large-scale survey of approximately 50,000 arXiv cs.RO papers finds that fewer than 5% state a sustainability motivation, identifying a critical gap between robotics research and planetary imperatives
  • Quantitative analysis reveals systematic bias: robotics community underweights environmental considerations versus medical, energy, materials fields
  • Calls for integration of sustainability as first-class research objective, reshaping field values and funding priorities toward climate-aware robotics
Survey of approximately 50,000 arXiv cs.RO papers showing sustainability motivation below 5%.
Daniel Huczala, Sun-Pill Jung, Frank C. Park
Core Contributions
  • Two coupled spatial four-bar linkages realize bird-like sweep-and-fold wing motion with single motor, reducing actuation complexity versus multi-DOF designs
  • Bio-inspired mechanical design enables efficient flapping without explicit control algorithms, leveraging passive mechanics for flight stability
  • Demonstrates mechanical advantage of biomorphic structure, suggesting nature's morphologies embed solutions to control challenges
Two coupled spatial four-bar linkages realize sweep-and-fold wing motion with single motor.
Le Shen, Kangyao Huang, Wentao Zhao, Huaping Liu
Core Contributions
  • Benchmark for continuous material stiffness optimization in soft robot morphology-material-control co-design enables exploration of material properties as design variables
  • Allows joint optimization of body structure, material properties, and control policies rather than sequential design, improving overall robot performance
  • Demonstrates computational co-design framework applicable beyond soft robotics to modular and reconfigurable systems
Benchmark for continuous material stiffness optimization in soft robot morphology-material-control co-design.