
Question for the experts: I am training on Windows, and both the script and the direct command fail with the error "USE_LIBUV=0 to disable it." I don't know why. #2545

Open
@MrWangChong

Description


Training command:
torchrun --nnodes 1 --nproc_per_node 1 ../../../funasr/bin/train_ds.py ++model="iic/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch" ++train_data_set_list="../../../data/list/train.jsonl" ++valid_data_set_list="../../../data/list/val.jsonl" ++dataset="AudioDataset" ++dataset_conf.index_ds="IndexDSJsonl" ++dataset_conf.data_split_num=1 ++dataset_conf.batch_sampler="BatchSampler" ++dataset_conf.batch_size=6000 ++dataset_conf.sort_size=1024 ++dataset_conf.batch_type="token" ++dataset_conf.num_workers=4 ++train_conf.max_epoch=50 ++train_conf.log_interval=1 ++train_conf.resume=true ++train_conf.validate_interval=2000 ++train_conf.save_checkpoint_interval=2000 ++train_conf.keep_nbest_models=20 ++train_conf.avg_nbest_model=10 ++train_conf.use_deepspeed=false ++optim_conf.lr=0.0002 ++output_dir="./outputs"

The output is as follows:
W0613 17:19:06.219000 1168 Lib\site-packages\torch\distributed\elastic\multiprocessing\redirects.py:29] NOTE: Redirects are currently not supported in Windows or MacOs.
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "E:\FunASR.venv\Scripts\torchrun.exe\__main__.py", line 7, in <module>
  File "E:\FunASR.venv\Lib\site-packages\torch\distributed\elastic\multiprocessing\errors\__init__.py", line 357, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "E:\FunASR.venv\Lib\site-packages\torch\distributed\run.py", line 892, in main
    run(args)
  File "E:\FunASR.venv\Lib\site-packages\torch\distributed\run.py", line 883, in run
    elastic_launch(
  File "E:\FunASR.venv\Lib\site-packages\torch\distributed\launcher\api.py", line 139, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\FunASR.venv\Lib\site-packages\torch\distributed\launcher\api.py", line 261, in launch_agent
    result = agent.run()
             ^^^^^^^^^^^
  File "E:\FunASR.venv\Lib\site-packages\torch\distributed\elastic\metrics\api.py", line 138, in wrapper
    result = f(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^
  File "E:\FunASR.venv\Lib\site-packages\torch\distributed\elastic\agent\server\api.py", line 712, in run
    result = self._invoke_run(role)
             ^^^^^^^^^^^^^^^^^^^^^^
  File "E:\FunASR.venv\Lib\site-packages\torch\distributed\elastic\agent\server\api.py", line 866, in _invoke_run
    self._initialize_workers(self._worker_group)
  File "E:\FunASR.venv\Lib\site-packages\torch\distributed\elastic\metrics\api.py", line 138, in wrapper
    result = f(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^
  File "E:\FunASR.venv\Lib\site-packages\torch\distributed\elastic\agent\server\api.py", line 683, in _initialize_workers
    self._rendezvous(worker_group)
  File "E:\FunASR.venv\Lib\site-packages\torch\distributed\elastic\metrics\api.py", line 138, in wrapper
    result = f(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^
  File "E:\FunASR.venv\Lib\site-packages\torch\distributed\elastic\agent\server\api.py", line 500, in _rendezvous
    rdzv_info = spec.rdzv_handler.next_rendezvous()
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\FunASR.venv\Lib\site-packages\torch\distributed\elastic\rendezvous\static_tcp_rendezvous.py", line 67, in next_rendezvous
    self._store = TCPStore(  # type: ignore[call-arg]
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.distributed.DistStoreError: use_libuv was requested but PyTorch was built without libuv support, run with USE_LIBUV=0 to disable it.
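
The last line of the error suggests running with USE_LIBUV=0. Below is a minimal sketch of one way to follow that hint: a small Python wrapper that starts the training command in a child process with that environment variable set. The helper names `env_without_libuv` and `run_with_libuv_disabled` are hypothetical, not part of FunASR or PyTorch, and whether this resolves the failure on a particular PyTorch build is an assumption, not something confirmed here. Setting the variable in the shell before calling torchrun (for example `set USE_LIBUV=0` in cmd) should have the same effect.

```python
import os
import subprocess
import sys


def env_without_libuv(base=None):
    """Return a copy of the environment with USE_LIBUV set to "0".

    `base` defaults to the current process environment.
    """
    env = dict(os.environ if base is None else base)
    env["USE_LIBUV"] = "0"  # value taken from the hint in the error message
    return env


def run_with_libuv_disabled(cmd):
    """Run `cmd` (a list of arguments) in a child process with USE_LIBUV=0.

    Returns the subprocess.CompletedProcess so the caller can inspect
    the return code.
    """
    return subprocess.run(cmd, env=env_without_libuv(), check=False)


# Hypothetical usage with the command from this issue (arguments shortened):
# run_with_libuv_disabled(
#     ["torchrun", "--nnodes", "1", "--nproc_per_node", "1",
#      "../../../funasr/bin/train_ds.py", '++output_dir=./outputs']
# )
```

The wrapper only changes the child's environment, so the parent shell is left untouched and the original command line can be passed through unchanged.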
