Add Pipeline Parallel for PPO training and support generation with InferenceModel (#7953)
* Add Pipeline Parallel for PPO training.
* Move new_ppo_trainer.py to ppo_trainer.py
* Fix padding across the batches of accumulation steps in _prepare_pipeline_inputs_func.
* Fix hcg usage in TP generation.
* Try to support generation in PP, and allow extra training args to be passed from the main from_pretrained.
* Support PP generation.
* Fix PP eval by unifying prediction_step.
* Fix reward value display error caused by BF16 dtype during eval.
* Fix all remaining issues.
* Make non-PipelineParallel models use the same loss layer as PipeModel to unify the two paths.
* Add offload.
* Use create_loss to unify Pipe and non-Pipe usage.
* Add eval mode and offload level.
* Merge.
* Support TP + PP.
* Fix data split.
* Fix position_ids in generation/eval/train.
* Fix data group.
* Add TP rank guard.
* Support rollout label data with either target length or source+target length.
* Move metric calculation to rl_step to avoid extra communication.
* Fix padding.
* Fix group creation.
* Remove debug prints.
* Support inference model generation.
* Fix compatibility when no eval model is provided.
* Fix PP sync.
* Remove debug info.
* Refactor PPO training using StepTrainer.
* Expose PolicyTrainer loss logging postprocess; add more StepTrainer docs.
* Add more timers.
* Fix bugs.
* Add EMA and PPOMetric.
* Add tests.
* Add unit test for rank guard.
* Fix zero3 reshard and inference reshard.
* Revert #7818 for llama and remove position_ids for gen/train/eval to align.
* Move reload/clean/data_group to comm_utils and use guard to decorate them.
* Fix offload sync and other data reuse issues.
* Clean up code.
* Update README
* Update ppo_trainer
* Format code.
* Fix make_position_ids for the 4D causal mask (see the sketch after this list).
* Fix nested_broadcast_tensor_with_empty import
* Update eval with make_attention_mask
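
Several of the commits above (make_position_ids, removing and realigning position_ids for gen/train/eval) revolve around deriving position_ids from the attention mask. Below is a minimal sketch of that idea; the function name mirrors make_position_ids from the commit messages, but the body, the assumed boolean mask layout, and the padding handling are illustrative assumptions rather than this PR's actual implementation.

```python
import paddle


def make_position_ids(attention_mask):
    """Sketch: derive position_ids from an attention mask.

    Assumes a boolean mask where True marks positions that may be attended.
    A 4D causal mask of shape (batch, 1, tgt_len, src_len) is first reduced to
    a 2D padding mask of shape (batch, src_len): a key position counts as a
    real token if at least one query is allowed to attend to it.
    """
    mask = attention_mask.astype("bool")
    if mask.ndim == 4:
        mask = mask.any(axis=1).any(axis=1)
    # Running count of real tokens gives 0-based positions for real tokens.
    position_ids = paddle.cumsum(mask.astype("int64"), axis=-1) - 1
    # Leading pads would land on -1 and trailing pads would repeat the last
    # position, so pin every pad slot to 0 to keep the ids well formed.
    return paddle.where(mask, position_ids, paddle.zeros_like(position_ids))
```

For example, two right-padded sequences of lengths 3 and 5 in a batch of width 5 would yield position_ids [[0, 1, 2, 0, 0], [0, 1, 2, 3, 4]].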
---------
Co-authored-by: Zhong Hui <zhonghui.net@gmail.com>
Co-authored-by: gongenlei <gongenlei@baidu.com>