DexVLA (Vision-Language Model with Plug-In Diffusion Expert for Visuomotor Policy Learning)¶
Contributed by Midea Group
Install¶
To guarantee clean isolation between training and evaluation environments, we provide two distinct, self-contained setups. Both environments can be shared by DexVLA and TinyVLA.
Training Environment:
cd policy/DexVLA
conda env create -f Train_Tiny_DexVLA_train.yml
conda activate dexvla-robo
cd policy_heads
pip install -e .
Evaluation Environment:
If you already have RoboTwin 2.0 installed, activate that conda environment and add the evaluation dependencies:
conda activate your_RoboTwin_env
pip install -r Eval_Tiny_DexVLA_requirements.txt
Prepare Training Data¶
This step performs data preprocessing, converting the original RoboTwin 2.0 data into the format required for DexVLA training. The expert_data_num parameter specifies the number of trajectory pairs to be used as training data.
python process_data.py ${task_name} ${task_config} ${expert_data_num}
# python process_data.py beat_block_hammer demo_clean 50
The processed data will be saved to the policy/DexVLA/data/sim_${task_name}/${setting}_${expert_data_num} folder.
Train Policy¶
This step launches the training process.
Download official Qwen2_VL weights¶
We construct the VLM backbone by integrating Qwen2-VL-2B. You can download the official weights from the link below:
| Model | Link |
|---|---|
| Qwen2-VL (~2B) | huggingface |
❗❗ After downloading the standard weights, you must modify the official config.json file in the downloaded folder. Please update the 'architectures' field from "Qwen2VLForConditionalGenerationForVLA" to "DexVLA", and change the 'model_type' field from "qwen2_vla" to "dex_vla".
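The config.json edit above can be scripted. The sketch below is a self-contained illustration: the folder name `Qwen2-VL-2B-Instruct` is a placeholder, and the stub config only exists so the demo runs without the real download (in practice the file already ships with the weights).

```python
import json
from pathlib import Path

# Placeholder name for the downloaded Qwen2-VL weights folder.
weights_dir = Path("Qwen2-VL-2B-Instruct")
weights_dir.mkdir(exist_ok=True)
config_path = weights_dir / "config.json"

# Stub of the relevant fields so this demo is self-contained;
# the real config.json comes with the downloaded weights.
if not config_path.exists():
    config_path.write_text(json.dumps({
        "architectures": ["Qwen2VLForConditionalGenerationForVLA"],
        "model_type": "qwen2_vla",
    }))

config = json.loads(config_path.read_text())
config["architectures"] = ["DexVLA"]  # was "Qwen2VLForConditionalGenerationForVLA"
config["model_type"] = "dex_vla"      # was "qwen2_vla"
config_path.write_text(json.dumps(config, indent=2))
```

Note that `architectures` is a list in Hugging Face config files, so the new value is written as a one-element list.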
Download our pretrained ScaleDP-H weights¶
We have released the pretrained weights of ScaleDP-H, trained through Stage 1. You can download these weights and directly fine-tune on your own data in Stage 2.
| Model | Link |
|---|---|
| ScaleDP-H (~1B) | huggingface |
| ScaleDP-L (~400M) | huggingface |
Train¶
The training script is "scripts/aloha/vla_stage2_train.sh". You need to change the following parameters:
1. OUTPUT: the save directory for training, which must include the keyword "qwen2" (and optionally "lora"). If LoRA training is used, the name must include "lora" (e.g., "qwen2_lora").
2. TASKNAME: the tasks used for training, which should correspond to "your_task_name" in aloha_scripts/constant.py.
3. mnop: path to the pretrained VLM weights.
4. load_pretrain_dit: True.
5. DIT_PRETRAIN: path to the pretrained policy head (ScaleDP).
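The OUTPUT naming rule (item 1 above) can be expressed as a small check. This is an illustrative helper, not part of the repository:

```python
def check_output_name(output: str, use_lora: bool) -> None:
    """Validate an OUTPUT directory name against the naming rules above:
    it must contain 'qwen2', and 'lora' as well when LoRA training is used."""
    name = output.rstrip("/").rsplit("/", 1)[-1]
    if "qwen2" not in name:
        raise ValueError(f"OUTPUT '{name}' must contain the keyword 'qwen2'")
    if use_lora and "lora" not in name:
        raise ValueError(f"LoRA training requires 'lora' in OUTPUT, got '{name}'")

# Both of these pass (directory names are placeholders):
check_output_name("checkpoints/qwen2_dexvla_stage2", use_lora=False)
check_output_name("checkpoints/qwen2_lora_dexvla", use_lora=True)
```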
Other hyperparameters such as "batch_size" and "save_steps" can be customized according to your computational resources.
Start training with the following command:
bash ./scripts/aloha/vla_stage2_train.sh
Eval Policy¶
You need to modify the corresponding paths in the deploy_policy.yml file:
1. model_path: path to the trained model, inside the OUTPUT directory.
2. state_path: path to dataset_stats.pkl, inside the OUTPUT directory.
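As an illustration, those two keys could be filled in with a short script. The paths are placeholders, and for the sake of a self-contained demo a stub deploy_policy.yml is created first; in practice you would edit the file that ships with the repository (which contains more keys than shown here).

```python
from pathlib import Path

# Placeholder paths; in practice both live inside your training OUTPUT directory.
model_path = "checkpoints/qwen2_dexvla_stage2"
state_path = "checkpoints/qwen2_dexvla_stage2/dataset_stats.pkl"

yml = Path("deploy_policy.yml")
# Stub so this demo runs standalone; the real file already exists in the repo.
if not yml.exists():
    yml.write_text("model_path: null\nstate_path: null\n")

# Rewrite only the two keys, leaving any other lines untouched.
lines = []
for line in yml.read_text().splitlines():
    if line.startswith("model_path:"):
        line = f"model_path: {model_path}"
    elif line.startswith("state_path:"):
        line = f"state_path: {state_path}"
    lines.append(line)
yml.write_text("\n".join(lines) + "\n")
```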
Then execute:
bash eval.sh ${task_name} ${task_config} ${ckpt_setting} ${expert_data_num} ${seed} ${gpu_id}
# bash eval.sh beat_block_hammer demo_clean demo_clean 50 0 0
# This command evaluates a policy trained on the `demo_clean` setting ($ckpt_setting)
# under the same `demo_clean` setting ($task_config).
#
# To evaluate a policy trained on the `demo_clean` setting under the `demo_randomized` setting, run:
# bash eval.sh beat_block_hammer demo_randomized demo_clean 50 0 0
Citation¶
If you find our work useful for your research and applications, please cite it using the following BibTeX entries:
DexVLA¶
@article{wen2025dexvla,
title={DexVLA: Vision-Language Model with Plug-In Diffusion Expert for General Robot Control},
author={Wen, Junjie and Zhu, Yichen and Li, Jinming and Tang, Zhibin and Shen, Chaomin and Feng, Feifei},
journal={arXiv preprint arXiv:2502.05855},
year={2025}
}
DiffusionVLA¶
@article{wen2024diffusion,
title={Diffusion-VLA: Scaling Robot Foundation Models via Unified Diffusion and Autoregression},
author={Wen, Junjie and Zhu, Minjie and Zhu, Yichen and Tang, Zhibin and Li, Jinming and Zhou, Zhongyi and Li, Chengmeng and Liu, Xiaoyu and Peng, Yaxin and Shen, Chaomin and others},
journal={arXiv preprint arXiv:2412.03293},
year={2024}
}
ScaleDP¶
@article{zhu2024scaling,
title={Scaling diffusion policy in transformer to 1 billion parameters for robotic manipulation},
author={Zhu, Minjie and Zhu, Yichen and Li, Jinming and Wen, Junjie and Xu, Zhiyuan and Liu, Ning and Cheng, Ran and Shen, Chaomin and Peng, Yaxin and Feng, Feifei and others},
journal={arXiv preprint arXiv:2409.14411},
year={2024}
}