Description
Training command:
torchrun --nnodes 1 --nproc_per_node 1 ../../../funasr/bin/train_ds.py ++model="iic/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch" ++train_data_set_list="../../../data/list/train.jsonl" ++valid_data_set_list="../../../data/list/val.jsonl" ++dataset="AudioDataset" ++dataset_conf.index_ds="IndexDSJsonl" ++dataset_conf.data_split_num=1 ++dataset_conf.batch_sampler="BatchSampler" ++dataset_conf.batch_size=6000 ++dataset_conf.sort_size=1024 ++dataset_conf.batch_type="token" ++dataset_conf.num_workers=4 ++train_conf.max_epoch=50 ++train_conf.log_interval=1 ++train_conf.resume=true ++train_conf.validate_interval=2000 ++train_conf.save_checkpoint_interval=2000 ++train_conf.keep_nbest_models=20 ++train_conf.avg_nbest_model=10 ++train_conf.use_deepspeed=false ++optim_conf.lr=0.0002 ++output_dir="./outputs"
The output is as follows:
W0613 17:19:06.219000 1168 Lib\site-packages\torch\distributed\elastic\multiprocessing\redirects.py:29] NOTE: Redirects are currently not supported in Windows or MacOs.
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "E:\FunASR.venv\Scripts\torchrun.exe\__main__.py", line 7, in <module>
File "E:\FunASR.venv\Lib\site-packages\torch\distributed\elastic\multiprocessing\errors\__init__.py", line 357, in wrapper
return f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "E:\FunASR.venv\Lib\site-packages\torch\distributed\run.py", line 892, in main
run(args)
File "E:\FunASR.venv\Lib\site-packages\torch\distributed\run.py", line 883, in run
elastic_launch(
File "E:\FunASR.venv\Lib\site-packages\torch\distributed\launcher\api.py", line 139, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\FunASR.venv\Lib\site-packages\torch\distributed\launcher\api.py", line 261, in launch_agent
result = agent.run()
^^^^^^^^^^^
File "E:\FunASR.venv\Lib\site-packages\torch\distributed\elastic\metrics\api.py", line 138, in wrapper
result = f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "E:\FunASR.venv\Lib\site-packages\torch\distributed\elastic\agent\server\api.py", line 712, in run
result = self._invoke_run(role)
^^^^^^^^^^^^^^^^^^^^^^
File "E:\FunASR.venv\Lib\site-packages\torch\distributed\elastic\agent\server\api.py", line 866, in _invoke_run
self._initialize_workers(self._worker_group)
File "E:\FunASR.venv\Lib\site-packages\torch\distributed\elastic\metrics\api.py", line 138, in wrapper
result = f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "E:\FunASR.venv\Lib\site-packages\torch\distributed\elastic\agent\server\api.py", line 683, in _initialize_workers
self._rendezvous(worker_group)
File "E:\FunASR.venv\Lib\site-packages\torch\distributed\elastic\metrics\api.py", line 138, in wrapper
result = f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "E:\FunASR.venv\Lib\site-packages\torch\distributed\elastic\agent\server\api.py", line 500, in _rendezvous
rdzv_info = spec.rdzv_handler.next_rendezvous()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\FunASR.venv\Lib\site-packages\torch\distributed\elastic\rendezvous\static_tcp_rendezvous.py", line 67, in next_rendezvous
self._store = TCPStore( # type: ignore[call-arg]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.distributed.DistStoreError: use_libuv was requested but PyTorch was built without libuv support, run with USE_LIBUV=0 to disable it.
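The error message itself suggests running with `USE_LIBUV=0`. A minimal sketch of one way to try that from Python is shown below: it copies the current environment, sets `USE_LIBUV=0`, and launches a command in a child process. The helper names `env_without_libuv` and `run_with_libuv_disabled` are hypothetical, not part of FunASR or PyTorch, and whether this resolves the failure on this particular setup has not been verified.

```python
import os
import subprocess
import sys


def env_without_libuv(base=None):
    """Return a copy of the environment with USE_LIBUV=0 set.

    Hypothetical helper: `base` defaults to the current process environment.
    """
    env = dict(os.environ if base is None else base)
    env["USE_LIBUV"] = "0"  # the value named in the DistStoreError message
    return env


def run_with_libuv_disabled(cmd):
    """Run `cmd` (a list of arguments) in a child process that sees USE_LIBUV=0.

    Returns the child's exit code; it does not raise on a non-zero exit.
    """
    completed = subprocess.run(cmd, env=env_without_libuv(), check=False)
    return completed.returncode
```

For example, `run_with_libuv_disabled(["torchrun", "--nnodes", "1", "--nproc_per_node", "1", "../../../funasr/bin/train_ds.py", ...])` with the remaining `++` arguments from the training command above. Setting the variable in the shell before invoking `torchrun` (for instance `set USE_LIBUV=0` in a Windows command prompt) should have the same effect.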