Commit 49eada3

fix
1 parent 7458d95 commit 49eada3

1 file changed, +3 -2 lines changed

llm/docs/inference.md

Lines changed: 3 additions & 2 deletions
@@ -232,8 +232,9 @@ python ./predict/predictor.py --model_name_or_path ./inference --inference_mode
 python ./predict/predictor.py --model_name_or_path ./inference --inference_model --dtype "float16" --mode "static" --cachekv_int8_type dynamic --block_attn
 ```
 **Note**
-1. Weight Only Int8 inference requires additionally passing `quant_type`.
-2. For A8W8 inference, the `model_name_or_path` passed in is the quantized model produced by PTQ calibration.
+1. Valid values for `quant_type` are `weight_only_int8`, `weight_only_int4`, `a8w8`, and `a8w8c8`.
+2. For `a8w8` inference, the `model_name_or_path` passed in is the quantized model produced by PTQ calibration, which additionally requires scale calibration tables for the activations and weights.
+3. `cachekv_int8_type` accepts either `dynamic` or `static`; `static` additionally requires a scale calibration table for the cache KV.
 
 
 ## 3. Inference Parameters
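
As a reading aid for the notes touched by this hunk, the following is a minimal sketch of how the documented option values and their calibration-table requirements could be checked before launching inference. The helper `required_scale_tables` and the table labels are hypothetical and are not part of `predictor.py`. Only the option values and the two requirements stated in the notes are encoded; the notes do not say what `a8w8c8` requires, so the sketch makes no claim about it.

```python
from typing import List, Optional

# Option values copied from the notes in this diff.
QUANT_TYPES = ("weight_only_int8", "weight_only_int4", "a8w8", "a8w8c8")
CACHEKV_INT8_TYPES = ("dynamic", "static")


def required_scale_tables(quant_type: Optional[str] = None,
                          cachekv_int8_type: Optional[str] = None) -> List[str]:
    """Hypothetical helper: list the calibration tables the notes call for."""
    if quant_type is not None and quant_type not in QUANT_TYPES:
        raise ValueError(f"unknown quant_type: {quant_type!r}")
    if cachekv_int8_type is not None and cachekv_int8_type not in CACHEKV_INT8_TYPES:
        raise ValueError(f"unknown cachekv_int8_type: {cachekv_int8_type!r}")

    tables: List[str] = []
    # Note 2: a8w8 needs act and weight scale tables from PTQ calibration.
    if quant_type == "a8w8":
        tables += ["act scale", "weight scale"]
    # Note 3: static cache KV int8 needs a cache kv scale table.
    if cachekv_int8_type == "static":
        tables.append("cache kv scale")
    return tables
```

For the command shown in the hunk context, which passes `--cachekv_int8_type dynamic` and no `quant_type`, this helper returns an empty list, meaning no extra calibration table is implied by the notes.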
