Play2Perfect
What Matters in Dexterous Play Pretraining for Precise Assembly?

Tyler Ga Wei Lum^*, Kushal Kedia^*, C. Karen Liu^†, Jeannette Bohg^†

^* Equal contribution ^† Equal advising


All videos are 1× speed (60 Hz control).

Goal: Contact-Rich, Precise Assembly

Turn Sound On 🔊


Key Idea: Learn to Play before Perfecting

Play2Perfect is a two-stage RL pipeline. The policy first plays with diverse objects in free space to acquire reusable manipulation priors; it is then finetuned on contact-rich, precise-assembly tasks and transferred zero-shot from simulation to the real world.

Play2Perfect Enables Rapid Learning of Assembly

Starting from a play-pretrained prior, Play2Perfect learns precise assembly with only sparse rewards, and does so far faster than training from scratch, which stalls near zero success even with the hand-crafted dense reward shaping we designed for it.

Success rate vs. training time on four assembly tasks. Play2Perfect (sparse reward) rapidly reaches high success, while training from scratch with dense or sparse rewards stays near zero.
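To make the sparse versus dense distinction concrete, here is a minimal sketch of the two reward styles. It is illustrative only: this page does not specify the reward terms used by Play2Perfect or by the from-scratch baselines, so the function names, the distance term, and the weights `distance_weight` and `success_bonus` are all assumptions.

```python
import math


def sparse_reward(success: bool) -> float:
    """Sparse reward: signal only when the assembly is complete."""
    return 1.0 if success else 0.0


def dense_reward(part_pos, goal_pos, success,
                 distance_weight=1.0, success_bonus=10.0):
    """Hypothetical dense shaping: penalize distance to the goal every step.

    The actual shaping terms are not given on this page; a distance
    penalty plus a success bonus is one common hand-crafted choice.
    """
    distance = math.dist(part_pos, goal_pos)  # Euclidean distance in meters
    bonus = success_bonus if success else 0.0
    return -distance_weight * distance + bonus
```

A sparse reward gives no gradient of progress until the first success, which is why a good prior matters: a policy that already manipulates objects competently reaches that first success much sooner than a randomly initialized one.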

What Matters in Dexterous Play Pretraining?

We systematically study four key design choices in play pretraining (object diversity, training objective, trajectory diversity, and goal precision) and measure how each affects downstream RL finetuning across four precise-assembly tasks. Consistently, we find that pretraining transfers best when it forces the robot to manipulate objects in-hand with its fingers, rather than simply moving them with a fixed grasp.

Four pretraining ablations, each plotting downstream success rate vs. training time. Object diversity: 1000 objects (ours) beats 100 and 10. Training objective: 6D pose (ours) beats rotation-only and translation-only. Trajectory diversity: random trajectories (ours) beat 100 and 10. Goal precision: 1 cm (ours) beats 5 cm and 10 cm.
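The four ablation axes can be read as a small pretraining configuration plus a goal-reached test. The sketch below is one hedged way to express that: the defaults mirror the settings marked "ours" in the caption above, but every field and function name is hypothetical, the 1 cm goal precision is assumed to be a position tolerance, and the angular tolerance `rotation_tol_rad` is a placeholder, since the page gives no rotation threshold.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PlayPretrainConfig:
    # Defaults follow the "ours" settings in the caption; names are hypothetical.
    num_objects: int = 1000        # object diversity (ablations: 100, 10)
    objective: str = "pose_6d"     # or "rotation_only", "translation_only"
    trajectories: str = "random"   # trajectory diversity (ablations: 100, 10)
    position_tol_m: float = 0.01   # goal precision: 1 cm (ablations: 5 cm, 10 cm)
    rotation_tol_rad: float = 0.1  # assumption: not stated on the page


def rotation_error(q1, q2):
    """Geodesic angle in radians between unit quaternions (w, x, y, z)."""
    dot = abs(sum(a * b for a, b in zip(q1, q2)))  # abs handles q and -q
    return 2.0 * math.acos(min(1.0, dot))          # clamp against round-off


def goal_reached(cfg, pos, quat, goal_pos, goal_quat):
    """Check the goal under the configured training objective."""
    pos_ok = math.dist(pos, goal_pos) <= cfg.position_tol_m
    rot_ok = rotation_error(quat, goal_quat) <= cfg.rotation_tol_rad
    if cfg.objective == "translation_only":
        return pos_ok
    if cfg.objective == "rotation_only":
        return rot_ok
    return pos_ok and rot_ok  # full 6D pose: both must hold
```

Under this reading, a tight tolerance combined with a full 6D pose objective cannot be satisfied by carrying an object in a fixed grasp, which is consistent with the finding that pretraining transfers best when it forces in-hand manipulation.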

Watch Play2Perfect in Slow Motion

Our policy runs so fast that it is easy to miss its reactivity. Here, we slow down each rollout to highlight the micro-recoveries and corrections the policy makes to complete each assembly task.

Tight Insertion

Multi-Part Assembly

Screwing

Recovery Behavior (Sound On 🔊)

Even after an initial failure, the policy keeps acting in closed loop, retrying until it completes the task.

More Rollouts