Transformers documentation

Accelerator selection

You can control which accelerators (CUDA, XPU, MPS, HPU, etc.) PyTorch sees, and in what order, during distributed training. This lets you prioritize faster devices or limit training to a subset of the available hardware. Accelerator selection works with both DistributedDataParallel and DataParallel, and doesn’t require Accelerate or the DeepSpeed integration.

Order of accelerators

Use the hardware-specific environment variable (for example, CUDA_VISIBLE_DEVICES for CUDA devices) to select accelerators and set their order. You can set it on the command line for each run, or in ~/.bashrc or another startup config file.

Prefer setting the environment variable on the same command line as the training run rather than exporting it. If you forget that an exported variable is still set, you may silently train on the wrong accelerators.

For example, to select accelerators 0 and 2 out of four:

CUDA
CUDA_VISIBLE_DEVICES=0,2 torchrun trainer-program.py ...

PyTorch sees only GPUs 0 and 2, which are mapped to cuda:0 and cuda:1, respectively. To reverse the order (use GPU 2 as cuda:0 and GPU 0 as cuda:1):

CUDA_VISIBLE_DEVICES=2,0 torchrun trainer-program.py ...
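
The renumbering described above can be illustrated with a small, pure-Python sketch. The helper name `logical_device_map` is hypothetical and is not part of PyTorch or Transformers; it only models the mapping for a well-formed, comma-separated list of integer indices (GPU UUIDs and malformed entries are not handled).

```python
def logical_device_map(visible_devices: str) -> dict[str, int]:
    """Return {logical device name: physical GPU index} for a CUDA_VISIBLE_DEVICES value.

    Hypothetical helper for illustration only. Assumes every entry is a plain
    integer index; an empty string means no devices are visible.
    """
    # Skip empty parts so that "" yields an empty mapping.
    indices = [int(part) for part in visible_devices.split(",") if part.strip()]
    # The position in the list becomes the logical index PyTorch exposes.
    return {f"cuda:{logical}": physical for logical, physical in enumerate(indices)}


print(logical_device_map("0,2"))  # {'cuda:0': 0, 'cuda:1': 2}
print(logical_device_map("2,0"))  # {'cuda:0': 2, 'cuda:1': 0}
print(logical_device_map(""))     # {}
```

The position of an index in the list, not its value, determines the logical device name, which is why swapping the two entries swaps cuda:0 and cuda:1.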

To hide all GPUs from PyTorch, set the variable to an empty value:

CUDA_VISIBLE_DEVICES= python trainer-program.py ...
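
If runs are started from a Python launcher rather than a shell, the same per-run scoping can be approximated by passing the variable only to the child process. The following is a minimal sketch under that assumption; `run_with_visible_devices` is a hypothetical helper, and the `probe` command is a stand-in for a real training command such as `torchrun trainer-program.py ...`.

```python
import os
import subprocess
import sys


def run_with_visible_devices(command: list[str], visible_devices: str) -> subprocess.CompletedProcess:
    """Run `command` with CUDA_VISIBLE_DEVICES set for that child process only."""
    # Copy the parent's environment and override one variable; the parent's
    # own environment is left untouched, so nothing leaks into later runs.
    env = {**os.environ, "CUDA_VISIBLE_DEVICES": visible_devices}
    # Output is captured here so the sketch can inspect it; for a real
    # training run you would likely drop capture_output to see logs live.
    return subprocess.run(command, env=env, check=True, capture_output=True, text=True)


# Stand-in for the training script: a child process that reports what it was given.
probe = [sys.executable, "-c", "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"]
result = run_with_visible_devices(probe, "0,2")
print(result.stdout.strip())  # 0,2
```

Because the variable is attached to a single child process, it has the same effect as prefixing the command on the shell command line and avoids a forgotten export.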

Control how CUDA enumerates devices, and therefore which physical GPU each index in CUDA_VISIBLE_DEVICES refers to, with CUDA_DEVICE_ORDER.

  • Order by PCIe bus ID (matches nvidia-smi):

    export CUDA_DEVICE_ORDER=PCI_BUS_ID

  • Order by compute capability, fastest first (the CUDA default):

    export CUDA_DEVICE_ORDER=FASTEST_FIRST
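
These variables can also be set from inside a Python script, as long as that happens before CUDA is initialized. The sketch below assumes the safest placement, before `torch` is imported. `describe_visible_devices` is a hypothetical helper, and `setdefault` is a design choice made here so that a value given on the command line still takes precedence.

```python
import os

# Set the ordering before torch is imported, since CUDA reads these variables
# when it initializes. setdefault keeps any value already set on the command line.
os.environ.setdefault("CUDA_DEVICE_ORDER", "PCI_BUS_ID")

try:
    import torch
except ImportError:  # Allows the sketch to run on machines without PyTorch.
    torch = None


def describe_visible_devices() -> list[str]:
    """List the logical CUDA devices PyTorch sees, in enumeration order."""
    if torch is None or not torch.cuda.is_available():
        return []
    return [
        f"cuda:{i} -> {torch.cuda.get_device_name(i)}"
        for i in range(torch.cuda.device_count())
    ]


for line in describe_visible_devices():
    print(line)
```

With PCI_BUS_ID ordering, the printed order should match the order reported by nvidia-smi, which makes it easier to check that an index in CUDA_VISIBLE_DEVICES points at the intended GPU.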