Commit 871278c

[docs] update gkd (#4657)
1 parent cdf2334 commit 871278c

File tree

6 files changed

+7
-5
lines changed


docs/source/Customization/自定义数据集.md

Lines changed: 2 additions & 2 deletions

@@ -80,13 +80,13 @@ query-response format:

  - Note: GRPO passes all extra dataset fields through to the ORM, unlike other training methods, which drop extra fields by default. For example, you can additionally pass in 'solution'. A custom ORM must take one positional argument, completions; the others are keyword arguments passed through from the dataset's extra fields.

  #### GKD
- If `seq_kd` is not enabled (i.e., the parameter is False), the dataset format is as follows:
+ If `seq_kd` is not enabled (i.e., the parameter is False), the dataset format is as follows (you can pre-distill the data with the teacher model in advance):

  ```jsonl
  {"messages": [{"role": "system", "content": "You are a helpful and harmless assistant"}, {"role": "user", "content": "Tell me tomorrow's weather"}, {"role": "assistant", "content": "Tomorrow will be sunny"}]}
  {"messages": [{"role": "system", "content": "You are a helpful and harmless math calculator"}, {"role": "user", "content": "What is 1+1?"}, {"role": "assistant", "content": "It equals 2"}, {"role": "user", "content": "What about adding 1 more?"}, {"role": "assistant", "content": "It equals 3"}]}
  ```

- If `seq_kd` is enabled, the final 'assistant' turn is not required:
+ If `seq_kd` is enabled, the final 'assistant' turn is not required (the teacher model generates the data during training):

  ```jsonl
  {"messages": [{"role": "system", "content": "You are a helpful and harmless assistant"}, {"role": "user", "content": "Tell me tomorrow's weather"}]}
  {"messages": [{"role": "system", "content": "You are a helpful and harmless math calculator"}, {"role": "user", "content": "What is 1+1?"}, {"role": "assistant", "content": "It equals 2"}, {"role": "user", "content": "What about adding 1 more?"}]}
  ```
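The two formats above differ only in whether the final turn is a pre-filled assistant reply. A minimal sketch of a validator for this rule (`check_gkd_sample` is a hypothetical helper for illustration, not part of ms-swift):

```python
import json

def check_gkd_sample(line: str, seq_kd: bool) -> bool:
    """Check one jsonl line against the expected GKD format.

    With seq_kd=False the last turn must be a pre-filled assistant reply;
    with seq_kd=True the last turn should be a user message, since the
    teacher model generates the response during training.
    """
    messages = json.loads(line)["messages"]
    last_role = messages[-1]["role"]
    return last_role == ("user" if seq_kd else "assistant")

sample = ('{"messages": [{"role": "user", "content": "Tell me tomorrow' + "'" + 's weather"}, '
          '{"role": "assistant", "content": "Sunny"}]}')
print(check_gkd_sample(sample, seq_kd=False))  # True: the sample ends with an assistant turn
```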

docs/source/Instruction/命令行参数.md

Lines changed: 1 addition & 0 deletions

@@ -409,6 +409,7 @@ RLHF parameters inherit from the [training parameters](#训练参数).

  - temperature: Default is 0.9. This parameter is used in PPO, GRPO, and GKD.
  - lmbda: Default is 0.5. Used in GKD. The lambda parameter that controls the student data fraction (i.e., the proportion of on-policy student-generated outputs).
  - seq_kd: Default is False. Used in GKD. Controls whether to perform sequence-level knowledge distillation (Sequence-Level KD), which can be viewed as supervised fine-tuning on teacher-generated outputs.
+ - Note: You can run inference on the dataset with the teacher model in advance (accelerated with an inference engine such as vllm/sglang/lmdeploy) and set `seq_kd` to False during training. Alternatively, set `seq_kd` to True to generate sequences with the teacher model during training (this guarantees different generated data across epochs, but is slower).

  #### Reward/Teacher model parameters
  The reward model parameters are used in PPO and GRPO.

docs/source_en/Customization/Custom-dataset.md

Lines changed: 2 additions & 2 deletions

@@ -82,14 +82,14 @@ The following outlines the standard dataset format for ms-swift, where the "syst

  #### GKD

- If `seq_kd` is not enabled (i.e., the parameter is set to `False`), the dataset format should be as follows:
+ If `seq_kd` is not enabled (i.e., the parameter is set to `False`), the dataset format is as follows (you can use a teacher model to pre-distill the data in advance):

  ```jsonl
  {"messages": [{"role": "system", "content": "You are a useful and harmless assistant"}, {"role": "user", "content": "Tell me tomorrow's weather"}, {"role": "assistant", "content": "Tomorrow's weather will be sunny"}]}
  {"messages": [{"role": "system", "content": "You are a useful and harmless math calculator"}, {"role": "user", "content": "What is 1 + 1?"}, {"role": "assistant", "content": "It equals 2"}, {"role": "user", "content": "What about adding 1?"}, {"role": "assistant", "content": "It equals 3"}]}
  ```

- If `seq_kd` is enabled, the final `assistant` turn is not required in the dataset. The format should be:
+ If `seq_kd` is enabled, the final `assistant` turn is not required (the teacher model generates the data during training):

  ```jsonl
  {"messages": [{"role": "system", "content": "You are a useful and harmless assistant"}, {"role": "user", "content": "Tell me tomorrow's weather"}]}
  ```

docs/source_en/Instruction/Command-line-parameters.md

Lines changed: 1 addition & 0 deletions

@@ -419,6 +419,7 @@ RLHF arguments inherit from the [training arguments](#training-arguments).

  - temperature: Default is 0.9; this parameter is used in PPO, GRPO, and GKD.
  - lmbda: Default is 0.5. This parameter is used in GKD. It is the lambda parameter that controls the student data fraction (i.e., the proportion of on-policy student-generated outputs).
  - seq_kd: Default is False. This parameter is used in GKD. It controls whether to perform Sequence-Level KD (which can be viewed as supervised fine-tuning on teacher-generated outputs).
+ - Note: You can run inference on the dataset with the teacher model in advance (accelerated by an inference engine such as vLLM, SGLang, or LMDeploy) and set `seq_kd` to False during training. Alternatively, set `seq_kd` to True to generate sequences with the teacher model during training (this ensures different generated data across epochs, but is slower).
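The offline route in the note above can be sketched as follows. `teacher_generate` is a hypothetical callable standing in for whatever inference engine you use (vLLM, SGLang, LMDeploy); this is an illustrative sketch, not swift's implementation:

```python
import json
from typing import Callable

def pre_distill(in_path: str, out_path: str,
                teacher_generate: Callable[[list], str]) -> None:
    """Offline sequence-KD: complete each sample with a teacher-generated reply.

    Reads jsonl samples whose last turn is a user message, appends the
    teacher's response as the final assistant turn, and writes the result,
    so training can then run with seq_kd=False.
    """
    with open(in_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            messages = json.loads(line)["messages"]
            reply = teacher_generate(messages)  # one teacher inference call per sample
            messages.append({"role": "assistant", "content": reply})
            fout.write(json.dumps({"messages": messages}, ensure_ascii=False) + "\n")
```

Distilling once up front trades per-epoch diversity for speed: every epoch then reuses the same teacher outputs.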

#### Reward/Teacher Model Parameters

swift/llm/dataset/utils.py

Lines changed: 0 additions & 1 deletion

@@ -301,7 +301,6 @@ def __init__(
          'template': template,
          'packing_interval': packing_interval,
          'strict': strict,
-         'version': 'v1',
      })
      self.dataset_name = f'packing-cache-{fingerprint}'
      with safe_ddp_context(None, True):

swift/llm/template/base.py

Lines changed: 1 addition & 0 deletions

@@ -85,6 +85,7 @@ def __init__(
      from .template_meta import TemplateMeta
      from swift.plugin import agent_templates, loss_scale_map
      self._processor_inited = False
+     self._version = 'v1'  # Avoid compatibility issues caused by load_from_cache_file caching.
      self.max_length = max_length

      if not use_chat_template:
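The caching idea behind this change can be illustrated with a small sketch (a hypothetical `fingerprint` helper, not swift's actual implementation): any field folded into the hash, such as a version string, forces a new cache entry whenever it changes, so stale cached preprocessing is never reused.

```python
import hashlib
import json

def fingerprint(config: dict) -> str:
    """Stable hash of a config dict, usable as part of a cache name."""
    payload = json.dumps(config, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode()).hexdigest()[:16]

v1 = fingerprint({"packing_interval": 128, "strict": True, "version": "v1"})
v2 = fingerprint({"packing_interval": 128, "strict": True, "version": "v2"})
print(v1 != v2)  # True: bumping the version yields a different cache name
```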

0 commit comments

Comments
 (0)