Accelerate
Accelerate provides a unified interface for distributed training backends like FSDP or DeepSpeed. It detects your environment (number of GPUs, distributed backend, mixed precision, etc.) and automatically configures training, whether you’re on 1 GPU with DDP or 8 GPUs with FSDP.
Accelerate wraps the model in the appropriate distributed wrapper, moves it to the correct device, and creates a compatible optimizer. During training, Accelerate uses its own backward method to handle gradient scaling for mixed precision. Trainer calls the appropriate Accelerate APIs and delegates all distributed mechanics to Accelerate.
Configure Accelerate for Trainer with either an Accelerate config file or TrainingArguments.
Accelerate config file
Run the accelerate config command and answer questions about your hardware and training setup. This creates a default_config.yaml file in your cache. The example below is for FSDP.
compute_environment: LOCAL_MACHINE
distributed_type: FSDP
fsdp_config:
  fsdp_version: 2
  fsdp_reshard_after_forward: true
  fsdp_cpu_offload: false
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_cpu_ram_efficient_loading: true
  fsdp_activation_checkpointing: false
  fsdp_state_dict_type: SHARDED_STATE_DICT
  fsdp_transformer_layer_cls_to_wrap: LlamaDecoderLayer
mixed_precision: bf16
num_machines: 1
num_processes: 4

Run accelerate launch with a Trainer-based script, and Accelerate reads the config file to set up training. The fsdp_config and deepspeed args are unnecessary because the Accelerate config file covers the same settings.
accelerate launch train.py

The accelerator_config argument accepts settings that don’t have dedicated top-level arguments. For example, set non_blocking=True together with dataloader_pin_memory=True to overlap data transfer with compute for higher GPU throughput.
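The overlap described above comes from two PyTorch mechanisms: batches are loaded into pinned (page-locked) host memory, and the host-to-device copy is issued with non_blocking=True. The sketch below shows those two pieces in plain PyTorch. It is an illustration under assumptions, not what Trainer runs internally: the helper name move_batches and the random dataset are made up for the example, and pinning is enabled only when a CUDA device is present.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def move_batches(num_samples=16, batch_size=4):
    """Hypothetical helper: copy every batch to the device, return the sample count."""
    use_cuda = torch.cuda.is_available()
    device = torch.device("cuda" if use_cuda else "cpu")

    dataset = TensorDataset(torch.randn(num_samples, 3))
    # pin_memory places batches in page-locked host memory, which is what
    # allows an asynchronous copy to the GPU. It is skipped on CPU-only hosts.
    loader = DataLoader(dataset, batch_size=batch_size, pin_memory=use_cuda)

    total = 0
    for (batch,) in loader:
        # non_blocking=True lets the copy overlap with other work when the
        # source tensor is pinned; on CPU it has no effect.
        batch = batch.to(device, non_blocking=True)
        total += batch.shape[0]
    return total


if __name__ == "__main__":
    print(move_batches())
```

With Trainer, the same behavior is requested through TrainingArguments rather than written by hand.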
from transformers import TrainingArguments

TrainingArguments(
    ...,
    dataloader_pin_memory=True,
    accelerator_config={
        "non_blocking": True,
    },
)

TrainingArguments
Pass a backend-specific config to TrainingArguments. Trainer's create_accelerator_and_postprocess() method reads the settings and configures training.
Pass a JSON config file or a dict to the fsdp_config argument of TrainingArguments. See FSDP for a full guide and config reference.
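A JSON config file of this kind could be generated with a few lines of standard-library Python. This is a sketch under assumptions: the helper name write_fsdp_config and the file name fsdp.json are hypothetical, and the key names mirror the fsdp_config entries from the YAML example with the fsdp_ prefix dropped, so check the FSDP guide for the keys your Transformers version supports.

```python
import json
import tempfile
from pathlib import Path


def write_fsdp_config(directory):
    """Hypothetical helper: write an FSDP config as JSON and return its path."""
    # Assumed key names; verify them against the FSDP config reference.
    config = {
        "transformer_layer_cls_to_wrap": ["LlamaDecoderLayer"],
        "activation_checkpointing": False,
        "cpu_ram_efficient_loading": True,
    }
    path = Path(directory) / "fsdp.json"
    path.write_text(json.dumps(config, indent=2))
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        config_path = write_fsdp_config(tmp)
        # The resulting path is what you would pass as fsdp_config.
        print(config_path.read_text())
```

The same dict can also be passed directly as fsdp_config instead of being written to disk first.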
from transformers import TrainingArguments

TrainingArguments(
    ...,
    fsdp=True,
    fsdp_config="path/to/fsdp.json",
)

Next steps
- See DDP for data-parallel training when your model fits on one GPU.
- See FSDP for sharding parameters, gradients, and optimizer states across GPUs.
- See DeepSpeed for ZeRO optimization and offloading.