# Fast Visual Reinforcement Learning for Sim-to-Real Robotics
Squint is a visual Soft Actor-Critic method that, through careful image preprocessing, architectural design choices, and hyperparameter selection, leverages parallel environments and experience reuse effectively, achieving faster wall-clock training time than prior visual off-policy and on-policy methods and solving visual tasks in minutes.
PyTorch implementation of [Squint: Fast Visual Reinforcement Learning for Sim-to-Real Robotics] by
Abdulaziz Almuzairee and Henrik I. Christensen (UC San Diego)
If you use this code in your research, kindly cite:

```bibtex
@article{almuzairee2026squint,
  title={Squint: Fast Visual Reinforcement Learning for Sim-to-Real Robotics},
  author={Almuzairee, Abdulaziz and Christensen, Henrik I.},
  journal={arXiv preprint arXiv:2602.21203},
  year={2026}
}
```

## Requirements

- GPU: NVIDIA RTX 3080 or better (at least 10 GB of GPU RAM)
- Robot: SO-101 robot arm and wrist camera (our robot and wrist camera are from WowRobo)
## Installation

```shell
conda env create -f environment.yaml
conda activate squint
```

## Training

Train an agent on the LiftCube task:

```shell
python train_squint.py --env_id=SO101LiftCube-v1
```

We use wandb (Weights & Biases) for logging, uploading saved models, and downloading them.
We recommend creating a wandb account, then enabling the --track flag and filling in the --wandb_entity flag in train_squint.py. Alternatively, you can override them on the command line:
```shell
python train_squint.py \
    --env_id=SO101LiftCube-v1 \
    --track \
    --wandb_entity=YOUR_WANDB_USERNAME
```

At the end of training, the last saved checkpoint will be uploaded to wandb. You can download the last uploaded checkpoint and continue training from it by setting the --checkpoint=wandb flag.
You can visualize all eight available environments by running:

```shell
python examples/visualize_sim.py
```

| Environment | Description | Time to Training Convergence |
|---|---|---|
| SO101ReachCube-v1 | Reach to a target cube position | 2 minutes |
| SO101ReachCan-v1 | Reach to a target can position | 2 minutes |
| SO101LiftCube-v1 | Pick up and lift a cube | 3 minutes |
| SO101LiftCan-v1 | Pick up and lift a can | 4 minutes |
| SO101PlaceCube-v1 | Pick up a cube and place it in the bin | 5 minutes |
| SO101PlaceCan-v1 | Pick up a can and place it in the bin | 6 minutes |
| SO101StackCube-v1 | Stack the smaller cube on the larger one | 6 minutes |
| SO101StackCan-v1 | Stack the cube on the can | 9 minutes |
For all our experiments we train with --total_timesteps=1_500_000, which takes approximately 15 minutes. You can reduce the total timesteps depending on the task; for example, the Reach tasks can be run with --total_timesteps=200_000, which takes about 2 minutes. Make sure your Squint agent achieves a high success rate in simulation before deploying to your real SO-101 robot arm.
All environments have domain randomization implemented to help sim-to-real transfer. There are shared domain randomization parameters for all environments in envs/base_random_env.py, and per-task domain randomization parameters in each environment file in envs/. Feel free to tune these parameters to match your real-world robot setup.
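As a rough illustration of the idea (not the repo's actual API — the parameter names and ranges below are hypothetical), per-episode domain randomization typically draws visual and physical parameters uniformly from configured ranges:

```python
import random

# Hypothetical ranges, analogous in spirit to those in envs/base_random_env.py
RANDOMIZATION_RANGES = {
    "light_intensity": (0.6, 1.4),       # scale factor on nominal lighting
    "camera_pos_jitter": (-0.01, 0.01),  # meters of offset per axis
    "object_hue_shift": (-0.05, 0.05),   # fraction of the hue wheel
}

def sample_randomization(rng: random.Random) -> dict:
    """Draw one set of domain-randomization parameters for an episode."""
    return {name: rng.uniform(lo, hi)
            for name, (lo, hi) in RANDOMIZATION_RANGES.items()}

# Resample at the start of every episode so the agent never overfits
# to one specific appearance of the scene.
params = sample_randomization(random.Random(0))
```

Widening a range makes the trained policy more robust to that factor at the cost of a harder learning problem, which is why tuning the ranges toward your actual real-world setup matters.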
As a reference for expected results, we show training plots for Squint agents below:
## Real Robot Deployment

Before deploying, make sure that:

- The SO-101 robot arm is functional
- The wrist camera is mounted appropriately
- The motors are calibrated using LeRobot calibration
We provide the STL files for all 3D objects used in our tasks in deploy_utils/blender_stls.
If you have access to a 3D printer, you should be able to print them, preferably in the following PLA colors:

- bin.stl: white
- can.stl: blue
- cube.stl: red
- large_cube.stl: blue

If you have these objects in different colors, you can alter the object colors in simulation in each of the tasks to match your real-world objects.
Edit deploy_utils/robot_config.py with your hardware settings.
Visual reinforcement learning agents are sensitive to slight visual changes, so the more you reduce the difference between simulation and the real world, the better your agent will transfer.
We use a table with a black background. In ManiSkill3 simulation, we segment the objects of interest and replace the background with the image
provided in envs/black_overlay.png. Below, we show a visual of the Simulation Env, the Overlay Image (black_overlay.png), the Simulation Env with the Overlay in the background, and the Real World Input Image:
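The overlay step described above amounts to masked compositing: pixels belonging to the segmented objects of interest are kept from the render, and every other pixel is taken from the background image. A minimal NumPy sketch of this idea (the actual implementation lives in the environment code, and the function name here is illustrative):

```python
import numpy as np

def composite_background(render: np.ndarray, seg_mask: np.ndarray,
                         background: np.ndarray) -> np.ndarray:
    """Replace the background pixels of a rendered frame.

    render:     (H, W, 3) uint8 simulation render
    seg_mask:   (H, W) bool mask, True where objects of interest are
    background: (H, W, 3) uint8 overlay image (e.g. black_overlay.png)
    """
    # Broadcast the mask over the channel axis and pick per-pixel.
    return np.where(seg_mask[..., None], render, background)

# Toy example: a 2x2 frame where only the top-left pixel is an object.
render = np.full((2, 2, 3), 200, dtype=np.uint8)
mask = np.array([[True, False], [False, False]])
black = np.zeros((2, 2, 3), dtype=np.uint8)
out = composite_background(render, mask, black)
```

Because the same compositing is applied during training, the simulated observations end up visually close to the real camera feed over your table background.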
If your table has a different background or color, take a photo, save it, and then edit the Randomization Config in envs/base_random_env.py to point to your image. Once you have your image in the background, align your real camera view with the simulation:
```shell
python deploy_utils/tune_camera.py
```

Adjust the trackbars so that the gripper and base positions (outlined in the blue square) in the simulation and the real world are as close as possible. Once they appear to match, press p to print the wrist camera parameters, then copy these parameters directly into the wrist camera parameters in envs/base_random_env.py.
Run your trained agent on the real robot:

```shell
python deploy.py \
    --checkpoint=path/to/ckpt.pt \
    --env_id=SO101LiftCube-v1
```

If you trained with wandb, your last checkpoint should have been uploaded to wandb. You can deploy it by running:
```shell
python deploy.py \
    --checkpoint=wandb \
    --env_id=SO101LiftCube-v1 \
    --wandb_entity=YOUR_WANDB_USERNAME
```

Keyboard controls during deployment:

- `s` - Skip current episode
- `q` - Quit evaluation
- For safety, do the first run with `--no-continuous_eval`, which will query you for input before each step. If the robot moves reasonably, you can then run without it.
- Test with `--env_id=SO101ReachCube-v1` or `--env_id=SO101ReachCan-v1` before manipulation tasks.
- If at any time during deployment you need to stop, press `q` or `ctrl+c`.
- For best performance, run the robot in a well-lit room with no sunlight.
- For better sim-to-real transfer, make sure the robot motor calibration is good and the visual alignment between sim and real is good.
```
squint/
├── train_squint.py        # Main training script
├── deploy.py              # Real robot deployment
├── utils.py               # Training utilities
├── environment.yaml       # Conda environment
├── envs/                  # Custom ManiSkill environments
│   ├── base_random_env.py # Base env with domain randomization
│   ├── black_overlay.png  # Background overlay for sim-to-real
│   ├── reach.py
│   ├── lift.py
│   ├── place.py
│   ├── stack.py
│   └── robot/             # Robot URDF and meshes
├── results/               # Training results (CSV files per task)
├── examples/
│   └── visualize_sim.py   # Visualize all environments
└── deploy_utils/
    ├── robot_config.py    # Robot hardware config
    ├── manipulator.py     # Real robot interface
    └── tune_camera.py     # Camera alignment tool
```
This work would not have been possible without the awesome open source community below:
We would also like to thank @jackvial for setting up initial support for the SO-101 robot arm in ManiSkill3.
This project is MIT Licensed. Dependencies are subject to their own licenses.