Description
python -m torch.distributed.launch --nproc_per_node=6 train.py --checkpoint ./pretrained_checkpoint/sam_vit_l_0b3195.pth --model-type vit_l --output work_dirs/hq_sam_l
/home/azuryl/anaconda3/envs/sam_hq2/lib/python3.10/site-packages/torch/distributed/launch.py:208: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use-env is set by default in torchrun.
If your script expects --local-rank argument to be set, please
change it to read from os.environ['LOCAL_RANK'] instead. See
https://pytorch.org/docs/stable/distributed.html#launch-utility for
further instructions
main()
W0306 18:12:40.932000 150874 site-packages/torch/distributed/run.py:792]
W0306 18:12:40.932000 150874 site-packages/torch/distributed/run.py:792] *****************************************
W0306 18:12:40.932000 150874 site-packages/torch/distributed/run.py:792] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W0306 18:12:40.932000 150874 site-packages/torch/distributed/run.py:792] *****************************************
usage: HQ-SAM --output OUTPUT [--model-type MODEL_TYPE] --checkpoint CHECKPOINT [--device DEVICE] [--seed SEED] [--learning_rate LEARNING_RATE] [--start_epoch START_EPOCH]
[--lr_drop_epoch LR_DROP_EPOCH] [--max_epoch_num MAX_EPOCH_NUM] [--input_size INPUT_SIZE] [--batch_size_train BATCH_SIZE_TRAIN] [--batch_size_valid BATCH_SIZE_VALID]
[--model_save_fre MODEL_SAVE_FRE] [--world_size WORLD_SIZE] [--dist_url DIST_URL] [--rank RANK] [--local_rank LOCAL_RANK] [--find_unused_params] [--eval] [--visualize]
[--restore-model RESTORE_MODEL]
HQ-SAM: error: unrecognized arguments: --local-rank=5
The command should be changed to use torchrun (with --nproc_per_node set to the number of available GPUs), e.g.: torchrun --nproc_per_node=8 train.py --checkpoint ./pretrained_checkpoint/sam_vit_l_0b3195.pth --model-type vit_l --output work_dirs/hq_sam_l
The error occurs because recent versions of torch.distributed.launch pass the rank as --local-rank=N (hyphenated), while the script's parser only accepts --local_rank (underscore), so argparse rejects the flag. torchrun instead exports the rank through the LOCAL_RANK environment variable (--use-env is on by default), so no unrecognized argument reaches train.py.
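Alternatively, if the deprecated launcher must be kept, the training script can be made tolerant of both flag spellings and of the environment variable. A minimal sketch, assuming an argparse flag named --local_rank as shown in the usage message above (the actual parser in train.py may differ):

```python
import argparse
import os

# Minimal sketch: accept both --local_rank and --local-rank, and prefer the
# LOCAL_RANK environment variable set by torchrun / --use-env.
# Flag names are illustrative; train.py's real parser may differ.
parser = argparse.ArgumentParser("HQ-SAM")
parser.add_argument("--local_rank", "--local-rank", type=int, default=0,
                    dest="local_rank")
args, _ = parser.parse_known_args()

local_rank = int(os.environ.get("LOCAL_RANK", args.local_rank))
print(f"using local rank {local_rank}")
```

With torchrun, LOCAL_RANK is always set in the environment, so the fallback to the CLI flag only matters when launching via the deprecated torch.distributed.launch.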