README.md (+2 −2)

@@ -75,7 +75,7 @@ You can contact us and communicate with us by adding our group:
 
 ## 🎉 News
 - 🎁 2025.06.18: Support for accelerating the ms-swift [inference](https://github.com/modelscope/ms-swift/blob/main/examples/infer/sglang), deployment, evaluation, and UI modules using the [sglang](https://github.com/sgl-project/sglang) inference acceleration engine. Simply set `--infer_backend sglang` to enable it.
-- 🎁 2025.06.15: Support for GKD training on both pure text large models and multimodal models. Training scripts can be found here: [Pure Text](https://github.com/modelscope/ms-swift/blob/main/examples/train/rlhf/gkd.sh), [Multimodal](https://github.com/modelscope/ms-swift/blob/main/examples/train/multimodal/rlhf/gkd.sh).
+- 🎁 2025.06.15: Support for GKD training on both pure text large models and multimodal models. Training scripts can be found here: [Pure Text](https://github.com/modelscope/ms-swift/blob/main/examples/train/rlhf/gkd), [Multimodal](https://github.com/modelscope/ms-swift/blob/main/examples/train/multimodal/rlhf/gkd).
 - 🎁 2025.06.11: Support for using Megatron parallelism techniques for RLHF training. The training script can be found [here](https://github.com/modelscope/ms-swift/tree/main/examples/train/megatron/rlhf).
 - 🎁 2025.05.29: Support sequence parallel in pt, sft, dpo and grpo, check script [here](https://github.com/modelscope/ms-swift/tree/main/examples/train/long_text).
 - 🎁 2025.05.11: GRPO now supports custom processing logic for reward models. See the GenRM example [here](./docs/source_en/Instruction/GRPO.md#customized-reward-models).

@@ -291,2 +291,2 @@
-| GKD Training | ✅ |[✅](https://github.com/modelscope/ms-swift/blob/main/examples/train/rlhf/gkd.sh)| ✅ |[✅](https://github.com/modelscope/ms-swift/blob/main/examples/train/rlhf/gkd.sh)| ✅ |[✅](https://github.com/modelscope/ms-swift/blob/main/examples/train/multimodal/rlhf/gkd.sh)|
+| GKD Training | ✅ |[✅](https://github.com/modelscope/ms-swift/blob/main/examples/train/rlhf/gkd)| ✅ |[✅](https://github.com/modelscope/ms-swift/blob/main/examples/train/rlhf/gkd)| ✅ |[✅](https://github.com/modelscope/ms-swift/blob/main/examples/train/multimodal/rlhf/gkd)|
 | KTO Training | ✅ |[✅](https://github.com/modelscope/ms-swift/blob/main/examples/train/rlhf/kto.sh)| ✅ |[✅](https://github.com/modelscope/ms-swift/blob/main/examples/train/rlhf/kto.sh)| ✅ |[✅](https://github.com/modelscope/ms-swift/blob/main/examples/train/multimodal/rlhf/kto.sh)|
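
The 2025.06.18 news entry above amounts to a one-flag switch. Below is a minimal sketch of enabling the sglang backend for ms-swift inference; only `--infer_backend sglang` comes from the news entry itself, while the model name and the `--stream` flag are illustrative assumptions, not part of this PR:

```shell
# Sketch: run ms-swift inference on the sglang engine.
# Only --infer_backend sglang is confirmed by the news entry above;
# the model name is a placeholder.
swift infer \
    --model Qwen/Qwen2.5-7B-Instruct \
    --infer_backend sglang \
    --stream true
```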

docs/source_en/Instruction/Command-line-parameters.md (+2 −1)

@@ -417,7 +417,8 @@ RLHF arguments inherit from the [training arguments](#training-arguments).
 - undesirable_weight: Loss weight $\lambda_U$ for undesirable response in the KTO algorithm, default is `1.`.
 - loss_scale: Override template arguments, default is 'last_round'.
 - temperature: Default is 0.9; this parameter will be used in PPO, GRPO and GKD.
-- lmbda: Default is 0.5. This parameter is used in GKD. It is the lambda parameter that controls the student data fraction (i.e., the proportion of on-policy student-generated outputs).
+- lmbda: Default is 0.5. This parameter is used in GKD. It controls the lambda parameter for the proportion of student data (i.e., the proportion of on-policy student-generated outputs). If lmbda is 0, student-generated data is not used.
+- sft_alpha: Default is 0. It controls the weight of the sft_loss added in GKD. The final loss is `gkd_loss + sft_alpha * sft_loss`.
 - seq_kd: Default is False. This parameter is used in GKD. It is the `seq_kd` parameter that controls whether to perform Sequence-Level KD (which can be viewed as supervised fine-tuning on teacher-generated outputs).
 - Note: You can run inference on the dataset with the teacher model in advance (accelerated by an inference engine such as vLLM, SGLang, or lmdeploy) and set `seq_kd` to False during training. Alternatively, you can set `seq_kd` to True, which uses the teacher model to generate sequences during training (ensuring different generated data across epochs, but at lower efficiency).
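
Read together, `lmbda`, `sft_alpha`, and `seq_kd` can all be set from a single launch command. Here is a minimal sketch modeled on the linked `examples/train/rlhf/gkd` scripts, assuming `swift rlhf --rlhf_type gkd` as the entry point; the model, teacher, and dataset names are placeholders:

```shell
# Sketch of a GKD run wiring together the parameters documented above.
# lmbda 0.5     -> about half of training uses on-policy student generations
# sft_alpha 0.1 -> total loss becomes gkd_loss + 0.1 * sft_loss
# seq_kd false  -> skip sequence-level KD on teacher-generated outputs
# Model, teacher_model, and dataset values are placeholders.
swift rlhf \
    --rlhf_type gkd \
    --model Qwen/Qwen2.5-0.5B-Instruct \
    --teacher_model Qwen/Qwen2.5-7B-Instruct \
    --dataset AI-ModelScope/alpaca-gpt4-data-en \
    --lmbda 0.5 \
    --sft_alpha 0.1 \
    --seq_kd false \
    --temperature 0.9
```

For the pre-generation route described in the Note above, the teacher's outputs could instead be materialized offline first (for example, via `swift infer` on a vLLM or SGLang backend) and `--seq_kd false` kept at train time.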

@@ -12,2 +12,2 @@
-| GKD Training | ✅ |[✅](https://github.com/modelscope/ms-swift/blob/main/examples/train/rlhf/gkd.sh)| ✅ |[✅](https://github.com/modelscope/ms-swift/blob/main/examples/train/rlhf/gkd.sh)| ✅ |[✅](https://github.com/modelscope/ms-swift/blob/main/examples/train/multimodal/rlhf/gkd.sh)|
+| GKD Training | ✅ |[✅](https://github.com/modelscope/ms-swift/blob/main/examples/train/rlhf/gkd)| ✅ |[✅](https://github.com/modelscope/ms-swift/blob/main/examples/train/rlhf/gkd)| ✅ |[✅](https://github.com/modelscope/ms-swift/blob/main/examples/train/multimodal/rlhf/gkd)|
 | KTO Training | ✅ |[✅](https://github.com/modelscope/ms-swift/blob/main/examples/train/rlhf/kto.sh)| ✅ |[✅](https://github.com/modelscope/ms-swift/blob/main/examples/train/rlhf/kto.sh)| ✅ |[✅](https://github.com/modelscope/ms-swift/blob/main/examples/train/multimodal/rlhf/kto.sh)|