DexVLA (Vision-Language Model with Plug-In Diffusion Expert for Visuomotor Policy Learning)

Contributed by Midea Group

Install

To keep training and evaluation cleanly isolated, we provide two separate, self-contained environments. Both environments can be used for DexVLA and TinyVLA.

Training Environment:

cd policy/DexVLA
conda env create -f Train_Tiny_DexVLA_train.yml
conda activate dexvla-robo
cd policy_heads
pip install -e .

Evaluation Environment:

If you already have RoboTwin 2.0 installed, activate its conda environment and add the evaluation dependencies:

conda activate your_RoboTwin_env
pip install -r Eval_Tiny_DexVLA_requirements.txt 

Prepare Training Data

This step preprocesses the original RoboTwin 2.0 data into the format required for DexVLA training. The expert_data_num parameter specifies the number of trajectory pairs to use as training data.

python process_data.py ${task_name} ${task_config} ${expert_data_num}
# python process_data.py beat_block_hammer demo_clean 50

If successful, you will find the data in the policy/DexVLA/data/sim_${task_name}/${setting}_${expert_data_num} folder.
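
The output location described above can be checked before training with a small helper. This is a minimal sketch, not part of the repository: the function names are hypothetical, the `root` default assumes the `policy/DexVLA` directory from the install step, and `setting` is assumed to be whatever value appears in the folder name.

```python
from pathlib import Path


def expected_data_dir(task_name: str, setting: str, expert_data_num: int,
                      root: str = "policy/DexVLA/data") -> Path:
    """Build <root>/sim_<task_name>/<setting>_<expert_data_num> (hypothetical helper)."""
    return Path(root) / f"sim_{task_name}" / f"{setting}_{expert_data_num}"


def data_is_ready(data_dir: Path) -> bool:
    """Return True if the folder exists and contains at least one entry."""
    return data_dir.is_dir() and any(data_dir.iterdir())
```

For example, `expected_data_dir("beat_block_hammer", "demo_clean", 50)` yields the folder that the sample command should populate, and `data_is_ready(...)` on that path reports whether anything was written there.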

Train Policy

This step launches the training process.

Download the official Qwen2-VL weights

We build the VLM backbone on Qwen2-VL-2B. You can download the official weights from this link:

| Model | Link |
|---|---|
| Qwen2-VL (~2B) | huggingface |

❗❗ After downloading the standard weights, you must edit the official config.json file in the downloaded folder: update the 'architectures' field from "Qwen2VLForConditionalGenerationForVLA" to "DexVLA", and change the 'model_type' field from "qwen2_vla" to "dex_vla".
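
The config change can also be scripted instead of edited by hand. The following is a minimal sketch under stated assumptions: `patch_vlm_config` is a hypothetical helper, not part of the repository, and it sets both fields to the target values regardless of what they held before. It relies on `architectures` being stored as a list, as is usual in Hugging Face config files, and keeps a backup copy of the original file.

```python
import json
import shutil
from pathlib import Path


def patch_vlm_config(weights_dir: str) -> dict:
    """Set 'architectures' and 'model_type' in config.json to the DexVLA values."""
    config_path = Path(weights_dir) / "config.json"
    backup_path = config_path.with_name("config.json.bak")
    if not backup_path.exists():
        # Keep the untouched official config so the change can be reverted.
        shutil.copy2(config_path, backup_path)

    config = json.loads(config_path.read_text(encoding="utf-8"))
    config["architectures"] = ["DexVLA"]  # stored as a list in the config file
    config["model_type"] = "dex_vla"
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Call it once with the folder that holds the downloaded weights, then confirm the two fields in config.json before starting training.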

Download our pretrained ScaleDP-H weights

We have released the pretrained ScaleDP-H weights obtained after Stage 1 training. You can download them and fine-tune directly on your own data in Stage 2.

| Model | Link |
|---|---|
| ScaleDP-H (~1B) | huggingface |
| ScaleDP-L (~400M) | huggingface |

Train

The training script is "scripts/aloha/vla_stage2_train.sh". You need to change the following parameters:
1. OUTPUT : the save directory for training. Its name must include the keyword "qwen2"; if LoRA training is used, it must also include "lora" (e.g., "qwen2_lora").
2. TASKNAME : the task used for training, which should correspond to "your_task_name" in aloha_scripts/constant.py.
3. mnop : path to the pretrained VLM weights.
4. load_pretrain_dit : True
5. DIT_PRETRAIN : path to the pretrained policy head (ScaleDP).

Other hyperparameters such as "batch_size" and "save_steps" can be customized according to your compute resources.
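
The naming rule for OUTPUT is easy to get wrong, so it can help to check it before launching a long run. This is a minimal sketch of the rule as stated above; `check_output_name` is a hypothetical helper, and the case-sensitive match is an assumption about how the keywords are detected.

```python
from typing import List


def check_output_name(output_dir: str, use_lora: bool) -> List[str]:
    """Return a list of problems with the OUTPUT name (empty list means OK)."""
    problems = []
    if "qwen2" not in output_dir:
        problems.append('OUTPUT must include the keyword "qwen2"')
    if use_lora and "lora" not in output_dir:
        problems.append('OUTPUT must include "lora" when LoRA training is used')
    return problems
```

For instance, `check_output_name("ckpt/qwen2_lora_run", use_lora=True)` returns an empty list, while a name without "qwen2" returns a message describing the missing keyword.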

Start training with the following command:

bash ./scripts/aloha/vla_stage2_train.sh

Eval Policy

Modify the corresponding paths in the deploy_policy.yml file:
1. model_path : Path to the trained model, inside the OUTPUT directory.
2. state_path : Path to dataset_stats.pkl, inside the OUTPUT directory.
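
Before editing deploy_policy.yml, the two paths can be derived from the OUTPUT directory and checked for existence. This is a minimal sketch, not the repository's code: `resolve_deploy_paths` is a hypothetical helper, and the optional `checkpoint` subfolder is an assumption, since the text only says the trained model lives somewhere inside OUTPUT.

```python
from pathlib import Path


def resolve_deploy_paths(output_dir: str, checkpoint: str = "") -> dict:
    """Return the model_path and state_path values, failing early if either is missing."""
    out = Path(output_dir)
    # Assumption: the model is either OUTPUT itself or a named subfolder of it.
    model_path = out / checkpoint if checkpoint else out
    state_path = out / "dataset_stats.pkl"

    missing = [str(p) for p in (model_path, state_path) if not p.exists()]
    if missing:
        raise FileNotFoundError("missing: " + ", ".join(missing))
    return {"model_path": str(model_path), "state_path": str(state_path)}
```

The returned values can then be copied into the model_path and state_path entries of deploy_policy.yml.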

Then execute:

bash eval.sh ${task_name} ${task_config} ${ckpt_setting} ${expert_data_num} ${seed} ${gpu_id}
# bash eval.sh beat_block_hammer demo_clean demo_clean 50 0 0
# This command evaluates the policy trained with the `demo_clean` setting ($ckpt_setting)
# under the same `demo_clean` setting ($task_config).
#
# To evaluate a policy trained on the `demo_clean` setting under the `demo_randomized` setting, run:
# bash eval.sh beat_block_hammer demo_randomized demo_clean 50 0 0
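
Because eval.sh takes six positional arguments, it is easy to swap the evaluation setting and the training setting. The sketch below builds the command from named parameters in the order given by the template line above; `build_eval_command` is a hypothetical helper, and the values in the usage note are illustrative only.

```python
import shlex
from typing import List


def build_eval_command(task_name: str, task_config: str, ckpt_setting: str,
                       expert_data_num: int, seed: int, gpu_id: int) -> List[str]:
    """Assemble the eval.sh call in the positional order of the template line."""
    return [
        "bash", "eval.sh",
        task_name,             # task to evaluate
        task_config,           # setting used for evaluation
        ckpt_setting,          # setting the checkpoint was trained on
        str(expert_data_num),  # number of demonstrations used for training
        str(seed),
        str(gpu_id),
    ]


def format_eval_command(args: List[str]) -> str:
    """Render the argument list as a shell-safe command string."""
    return shlex.join(args)
```

Passing `task_config="demo_randomized"` with `ckpt_setting="demo_clean"` produces a command for the cross-setting case; the resulting list can be printed with `format_eval_command` or handed to `subprocess.run`.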

Citation

If you find our work useful for your research and applications, please cite it using these BibTeX entries:

DexVLA

@article{wen2025dexvla,
  title={DexVLA: Vision-Language Model with Plug-In Diffusion Expert for General Robot Control},
  author={Wen, Junjie and Zhu, Yichen and Li, Jinming and Tang, Zhibin and Shen, Chaomin and Feng, Feifei},
  journal={arXiv preprint arXiv:2502.05855},
  year={2025}
}

DiffusionVLA

@article{wen2024diffusion,
  title={Diffusion-VLA: Scaling Robot Foundation Models via Unified Diffusion and Autoregression},
  author={Wen, Junjie and Zhu, Minjie and Zhu, Yichen and Tang, Zhibin and Li, Jinming and Zhou, Zhongyi and Li, Chengmeng and Liu, Xiaoyu and Peng, Yaxin and Shen, Chaomin and others},
  journal={arXiv preprint arXiv:2412.03293},
  year={2024}
}

ScaleDP

@article{zhu2024scaling,
  title={Scaling diffusion policy in transformer to 1 billion parameters for robotic manipulation},
  author={Zhu, Minjie and Zhu, Yichen and Li, Jinming and Wen, Junjie and Xu, Zhiyuan and Liu, Ning and Cheng, Ran and Shen, Chaomin and Peng, Yaxin and Feng, Feifei and others},
  journal={arXiv preprint arXiv:2409.14411},
  year={2024}
}