Description
Describe the bug
Hi, I tried GRPO training with sequence parallelism on 4 GPUs. During training, the process terminates unexpectedly with the following warning:
WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
This issue only occurs when using sequence parallelism. When running multi-GPU training without sequence parallelism, the training completes without any such problem.
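For context, the warning is PyTorch's reminder that the default process group should be torn down explicitly before the process exits. Below is a minimal sketch of that pattern in a plain torch.distributed script; it is not ms-swift's actual shutdown path, just an illustration of the cleanup the warning expects:

```python
import torch.distributed as dist

def main():
    # Launchers such as torchrun set RANK / WORLD_SIZE / LOCAL_RANK for us.
    dist.init_process_group(backend="nccl")
    try:
        ...  # training / rollout work
    finally:
        # Explicit teardown; omitting this is what produces the
        # "destroy_process_group() was not called before program exit" warning.
        dist.destroy_process_group()

if __name__ == "__main__":
    main()
```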
Additionally, a warning message that did not appear before now shows up when sequence parallelism is enabled:
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
This deprecation warning also does not appear when sequence parallelism is disabled.
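For context, this deprecation message comes from the transformers Trainer, whose tokenizer attribute was renamed to processing_class. A minimal sketch of the two accessors, assuming a standard Trainer instance (the model/tokenizer choice below is only illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer

# Illustrative setup only; any causal LM / tokenizer pair works.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B")
trainer = Trainer(model=model, processing_class=tokenizer)

tok = trainer.tokenizer           # deprecated accessor, triggers the warning above
tok = trainer.processing_class    # current accessor
```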

Your hardware and system info
4 × A100 GPUs
ms-swift == 3.6.0
transformers == 4.51.3
vllm == 0.8.5.post1
trl == 0.18.1
torch == 2.6.9
Additional context
Command used:
NPROC_PER_NODE=4 \
swift rlhf \
--rlhf_type grpo \
--model Qwen/Qwen2.5-7B \
--external_plugins MY_PATH \
--reward_funcs MY_REWARD \
--train_type full \
--loss_type bnpo \
--torch_dtype bfloat16 \
--dataset MY_PATH \
--max_length 512 \
--max_completion_length 512 \
--num_train_epochs 1 \
--seed 42 \
--per_device_train_batch_size 2 \
--per_device_eval_batch_size 8 \
--gradient_accumulation_steps 4 \
--learning_rate 1e-6 \
--temperature 0.9 \
--warmup_ratio 0.05 \
--max_grad_norm 0.2 \
--save_strategy steps \
--save_steps 250 \
--save_total_limit 20 \
--logging_steps 1 \
--dataloader_num_workers 4 \
--num_generations 8 \
--system 'You are a helpful assistant.' \
--deepspeed zero3_offload \
--log_completions true \
--report_to wandb \
--num_iterations 1 \
--use_hf 1 \
--split_dataset_ratio 0 \
--use_vllm true \
--vllm_mode colocate \
--vllm_gpu_memory_utilization 0.5 \
--vllm_max_model_len 512 \
--vllm_tensor_parallel_size 4 \
--attn_impl flash_attn \
--offload_optimizer true \
--offload_model true \
--sequence_parallel_size 4 \
--gc_collect_after_offload true \
--dataloader_drop_last true \
--sleep_level 1