Bridging the Sim-to-Real Gap for Athletic Loco-Manipulation

Nolan Fey    Gabriel B. Margolis    Martin Peticco    Pulkit Agrawal

Abstract


Achieving athletic loco-manipulation on robots requires moving beyond traditional tracking rewards, which simply guide the robot along a reference trajectory, to task rewards that drive truly dynamic, goal-oriented behaviors. Commands such as “throw the ball as far as you can” or “lift the weight as quickly as possible” compel the robot to exhibit the agility and power inherent in athletic performance.

However, training solely with task rewards introduces two major challenges: 1) these rewards are prone to exploitation (reward hacking), and 2) the exploration process can lack sufficient direction.

To address these issues, we propose a two‑stage training pipeline.

  1. We introduce the Unsupervised Actuator Net (UAN), which leverages real‑world data to bridge the sim-to-real gap for complex actuation mechanisms without requiring access to torque sensing. UAN mitigates reward hacking by ensuring that the learned behaviors remain robust and transferable.
  2. We use a pre‑training and fine‑tuning strategy that leverages reference trajectories as initial hints to guide exploration.
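The two-stage idea above can be sketched as a staged reward function: pre-training optimizes closeness to a reference trajectory, while fine-tuning optimizes the task objective directly. This is a minimal illustration, not the paper's implementation; the function names and the exponential tracking kernel are our own assumptions.

```python
import numpy as np

def tracking_reward(q, q_ref):
    # Pre-training signal: stay near the reference trajectory (the "hint").
    # Exponential kernel is a common choice; assumed here for illustration.
    return float(np.exp(-np.sum((np.asarray(q) - np.asarray(q_ref)) ** 2)))

def task_reward(release_speed):
    # Fine-tuning signal: directly reward the athletic objective,
    # e.g. ball release speed for a throwing task.
    return float(release_speed)

def staged_reward(stage, q, q_ref, release_speed):
    # Stage 1 shapes exploration with the reference; stage 2 optimizes the task.
    if stage == "pretrain":
        return tracking_reward(q, q_ref)
    elif stage == "finetune":
        return task_reward(release_speed)
    raise ValueError(f"unknown stage: {stage}")
```

Because the pre-training stage only serves to direct exploration, the reference trajectory does not constrain the final behavior once fine-tuning switches to the task reward.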

With these innovations, our robot athlete learns to lift, throw, and drag with remarkable fidelity from simulation to reality.

Unsupervised Actuator Net (UAN)


We leverage real‑world data to train UAN, which not only corrects actuator errors in simulation but also mitigates reward hacking by ensuring that task rewards produce realistic and robust behaviors. The following sections illustrate key comparisons and failure cases.

Actuator Friction: Harmonic Drive vs. QDD Motor

UAN learns to correct for non‑linear actuator behavior, compensating for friction and other unmodeled dynamics.

Data Collection for UAN Training

Real‑world data from diverse motions trains UAN to output corrective torques, ensuring that task rewards drive authentic athletic behaviors.
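The "unsupervised" aspect can be illustrated on a toy 1-DOF system: no torque labels are used; instead, a corrective friction term is fit so that the simulated joint-position trajectory matches a logged real trajectory. This is a drastically simplified sketch (a scalar Coulomb friction coefficient fit by grid search, standing in for a learned network), with all names and constants chosen for illustration.

```python
import numpy as np

def rollout(tau_cmd, coulomb, dt=0.01):
    """1-DOF unit-mass joint with Coulomb friction torque -coulomb * sign(qd)."""
    q = qd = 0.0
    traj = np.empty(len(tau_cmd))
    for t, tau in enumerate(tau_cmd):
        qdd = tau - coulomb * np.sign(qd)   # commanded torque plus friction
        qd += qdd * dt
        q += qd * dt
        traj[t] = q
    return traj

tau_cmd = np.sin(np.linspace(0, 4 * np.pi, 400))
real = rollout(tau_cmd, coulomb=0.3)        # stands in for logged real joint positions

# Unsupervised fit: choose the corrective term that best reproduces the real
# positions under the same commands. Only position logs are needed, no torque sensor.
candidates = np.linspace(0.0, 1.0, 101)
errs = [np.mean((rollout(tau_cmd, c) - real) ** 2) for c in candidates]
best = candidates[int(np.argmin(errs))]      # recovers approximately 0.3
```

The actual UAN replaces the scalar coefficient with a neural network over joint-state history, but the supervision signal is the same: trajectory mismatch, not torque labels.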

Experiments


Sim-to-Real Chart

UAN calibration reduces the gap between simulated and real throwing performance.

Training Pipeline Performance

End-to-end athletic task policies outperform policies trained only with tracking rewards. However, we find that a pretraining stage based on tracking rewards benefits the final performance by assisting exploration.

This video shows simulated and real robot behaviors side by side, highlighting the close correspondence between simulation and reality.

WBC Architecture

Architecture: UAN training, whole-body controller (WBC) pre-training, fine-tuning, then deployment.

Task Demonstrations


In ball throwing, full‑body coordination yields high release speeds.

The sled pulling task tests sustained force generation and stability.

In dumbbell snatch, task fine-tuning enables the robot to lift and stabilize heavy weights.

Attempting to muscle through the same lift triggers the robot's built-in power limit.

This video shows a failure case where the robot’s arm broke during a throw due to unmodeled link strength.

Baselines


Default: The unmodified baseline simulator. Although the open-loop throw trajectory (recorded in sim) appears reasonable when replayed on hardware, small deviations accumulate during closed-loop execution and the sim-to-real transfer fails.

CEM: Optimizes physical parameters (such as friction, frictional damping, and armature characteristics) using the cross-entropy method. While the policy achieves a successful ball throw, the resulting motion is noticeably jittery.
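For context, the cross-entropy method iterates between sampling candidate parameters from a Gaussian, scoring them against real data, and refitting the Gaussian to the best samples. The sketch below is illustrative only: the "simulator mismatch" is replaced by a toy quadratic, and the target values, population size, and elite count are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
TARGET = np.array([0.30, 0.05])   # hypothetical "true" friction and damping

def sim_mismatch(params):
    # Stand-in for: run the simulator with these physical parameters and
    # measure trajectory error against real-robot data.
    return float(np.sum((params - TARGET) ** 2))

mean, std = np.zeros(2), np.ones(2)
for _ in range(30):                                    # CEM iterations
    samples = rng.normal(mean, std, size=(64, 2))      # sample candidate params
    scores = np.array([sim_mismatch(s) for s in samples])
    elites = samples[np.argsort(scores)[:8]]           # keep the 8 best
    mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-3
# mean converges toward the target parameters
```

Note that CEM can only adjust parameters the physics model already exposes, which is why it cannot capture unmodeled actuator effects.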

DR: The simulator augmented with domain randomization (randomizing PD gains, friction, and armature parameters). The closed-loop behavior on hardware deviates substantially from the simulation.

ROA: Combines domain randomization with an online system identification module via Regularized Online Adaptation. Although ROA improves performance relative to DR, it remains limited because the domain randomization does not fully capture the complex dynamics of the actuators.

Actuator Net: Utilizes a supervised learning approach where an actuator network predicts corrective torques based on motor current measurements. The sim-to-real gap persists due to nonlinear effects introduced by the harmonic reducers.
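The supervised variant above can be contrasted with UAN by its data requirement: it regresses torque from current measurements using labeled torque data. A minimal sketch of such a regression (a linear torque-constant fit; the constants and noise model are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
current = rng.uniform(-10.0, 10.0, size=200)   # measured motor current (A)
k_t = 0.09                                     # hypothetical torque constant (Nm/A)
# Supervised label: measured torque, which requires torque sensing (or a proxy).
torque_meas = k_t * current + rng.normal(0.0, 0.01, size=200)

# Least-squares fit of torque from current: the "supervised" part.
A = np.stack([current, np.ones_like(current)], axis=1)
(w, b), *_ = np.linalg.lstsq(A, torque_meas, rcond=None)
# w recovers approximately k_t; b is approximately zero
```

A real actuator net would use a neural network over a history of commands and states, but the key distinction from UAN remains: it needs torque-like labels, and a current-to-torque mapping degrades when harmonic reducers introduce nonlinearities.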

UAN: Our proposed unsupervised actuator network learns to compute corrective torques without requiring explicit torque labels. This approach effectively captures both actuator lag and the nonlinearities arising from harmonic reduction. Successful sim-to-real transfer!

Paper


Bridging the Sim-to-Real Gap for Athletic Loco-Manipulation
Nolan Fey, Gabriel B. Margolis, Martin Peticco, Pulkit Agrawal
arXiv Preprint, 2025
PDF  /  arXiv  /  project page  /  bibtex