Commit 3a704ea

[Auto Parallel] Support semi-auto trainer and fit Llama2 training (#7885)

* support semi-auto trainer and fit Llama2 training
* support shard_dataloader in dynamic semi-auto
* rewrite training loop
* refactor training loop
* refine args of auto trainer
* broadcast loss
* add auto CI cases

1 parent 44bfeb0 commit 3a704ea

File tree

12 files changed: +1139 −369 lines


llm/llama/auto_parallel/run_auto.sh

Lines changed: 1 addition & 1 deletion

@@ -68,6 +68,6 @@ python -u -m paddle.distributed.launch \
     --do_eval \
     --device "gpu" \
     --data_impl "mmap" \
-    --parallel_mode "auto"
+    --enable_auto_parallel 1

 # --resume_from_checkpoint "output/llama_auto_serial/checkpoint-2" \

llm/llama/auto_parallel/run_auto_sp.sh

Lines changed: 1 addition & 1 deletion

@@ -68,7 +68,7 @@ python -u -m paddle.distributed.launch \
     --do_eval \
     --device "gpu" \
     --data_impl "mmap" \
-    --parallel_mode "auto" \
+    --enable_auto_parallel 1 \
     --sequence_parallel true \

 # --resume_from_checkpoint "output/llama_auto_serial/checkpoint-2" \
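Both scripts make the same one-line change: the string-valued `--parallel_mode "auto"` option is replaced by an integer switch, `--enable_auto_parallel 1`. As a minimal sketch of the new calling convention (hypothetical stand-in for PaddleNLP's actual trainer argument parsing, which is far more involved), the flag can be modeled as a 0/1 integer that is treated as a boolean:

```python
import argparse

# Hypothetical sketch: model the renamed flag from the diffs above.
# Old style (removed): --parallel_mode "auto" carried a string mode name.
# New style: --enable_auto_parallel takes 0/1 and acts as an on/off switch.
parser = argparse.ArgumentParser()
parser.add_argument("--enable_auto_parallel", type=int, default=0)

args = parser.parse_args(["--enable_auto_parallel", "1"])
print(bool(args.enable_auto_parallel))  # prints True when the switch is set
```

With this convention, omitting the flag leaves auto-parallel disabled (default 0), so existing non-auto launch scripts keep working unchanged.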
