llm inference docs #8976
Conversation
Thanks for your contribution!
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

```
@@            Coverage Diff             @@
##           develop    #8976      +/-   ##
===========================================
- Coverage    54.05%   53.98%    -0.08%
===========================================
  Files          650      650
  Lines       103883   104167     +284
===========================================
+ Hits         56157    56235      +78
- Misses       47726    47932     +206
```

View full report in Codecov by Sentry.
llm/docs/predict/inference.md (outdated)
- [昇腾NPU](../../npu/llama/README.md)
- [海光K100](../dcu_install.md)
- [燧原GCU](../../gcu/llama/README.md)
- [X86 CPU](../../../csrc/cpu/README.md)
Same as above.
llm/docs/predict/inference.md (outdated)
### 4.1 Environment Setup

Clone the code locally:

```shell
git clone https://github.com/PaddlePaddle/PaddleNLP.git
export PYTHONPATH=/path/to/PaddleNLP:$PYTHONPATH
```

PaddleNLP provides high-performance custom operators for the Transformer family to speed up inference and decoding. Install the custom operator library before use:

```shell
git clone https://github.com/PaddlePaddle/PaddleNLP
# Install custom operators for GPU devices
cd ./paddlenlp/csrc && python setup_cuda.py install
# Install custom operators for XPU devices
cd ./paddlenlp/csrc/xpu/src && sh cmake_build.sh
# Install custom operators for DCU devices
cd ./paddlenlp/csrc && python setup_hip.py install
```

Go to the run directory to get started:
The Environment Setup section is duplicated; it can be removed here.
Link to llm/docs/predict/installation.md instead.
README.md (outdated)
@@ -127,6 +127,18 @@ Unified Checkpoint 大模型存储格式在模型参数分布上支持动态扩
| Yuan2 | ✅ | ✅ | ✅ | 🚧 | 🚧 | 🚧 | 🚧 | ✅ |
------------------------------------------------------------------------------------------

* LLM inference now supports the LLaMA, Qwen, Mistral, ChatGLM, Bloom, and Baichuan series, with Weight Only INT8 and INT4 inference as well as WAC (weight, activation, cache KV) INT8/FP8 quantized inference. The LLM inference support matrix is as follows:
Link "大模型推理" (LLM inference) to llm/docs/predict/inference.md.
llm/docs/predict/inference.md (outdated)
1. `quant_type` accepts `weight_only_int8`, `weight_only_int4`, `a8w8`, and `a8w8_fp8`.
2. `a8w8` and `a8w8_fp8` require additional act and weight scale calibration tables; the `model_name_or_path` passed at inference should be a quantized model produced by PTQ calibration. See the [LLM quantization tutorial](../quantization.md) for exporting quantized models.
3. `cachekv_int8_type` accepts `dynamic` and `static`; `static` requires an additional cache KV scale calibration table, and the `model_name_or_path` passed should be a PTQ-calibrated quantized model. See the [LLM quantization tutorial](../quantization.md).
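As a minimal sketch of how these option lists can be enforced, the snippet below validates a `quant_type` value against the four choices the notes above name. The validation helper itself is illustrative, not part of the PaddleNLP CLI:

```shell
# Validate a quant_type value against the options listed in the docs.
# The set of accepted values comes from the notes above; the check itself
# is a hypothetical helper, not an existing PaddleNLP script.
quant_type="weight_only_int8"
case "$quant_type" in
  weight_only_int8|weight_only_int4|a8w8|a8w8_fp8)
    echo "quant_type '$quant_type' is supported" ;;
  *)
    echo "unknown quant_type '$quant_type'" >&2
    exit 1 ;;
esac
```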
Please note here that `dynamic` is no longer maintained and is not recommended.
llm/docs/predict/inference.md (outdated)
PaddleNLP supports multiple hardware platforms and precisions, including:

| Precision | Ada | Ampere | Turing | Volta | 昆仑XPU | 昇腾NPU | 海光K100 | 燧原GCU | x86 CPU |
Recommend centering the table.
llm/docs/predict/inference.md (outdated)
PaddleNLP already includes high-performance inference implementations, supporting:
| Models | Example Models |
|--------|----------------|
|Llama 3.1, Llama 3, Llama 2|`meta-llama/Meta-Llama-3-8B`, `meta-llama/Meta-Llama-3.1-8B`, `meta-llama/Meta-Llama-3.1-405B`, etc.|
Recommend listing all model sizes here to make lookup easier.
llm/docs/predict/inference.md (outdated)
- `model_name_or_path`: Required. Pretrained model name or local model path, used to warm-start the model and tokenizer. Defaults to None.

- `dtype`: Required. dtype of the model parameters, defaults to None. Must be provided if neither `lora_path` nor `prefix_path` is passed.
If `lora_path` or `prefix_path` is not passed, the `dtype` parameter must be provided.
llm/docs/predict/inference.md (outdated)
- `batch_size`: Batch size, defaults to 1. Larger values use more GPU memory; smaller values use less.

- `data_file`: JSON file to run inference on, defaults to None.
Suggest documenting the file content format here.
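To illustrate the reviewer's point, here is a hedged sketch of what such a `data_file` might contain. The one-JSON-object-per-line layout and the `src`/`tgt` field names are assumptions for illustration; they are not confirmed anywhere in this PR:

```shell
# Write a hypothetical inference input file: one JSON object per line,
# with an assumed "src" prompt field and an empty "tgt" field.
cat > sample_input.json <<'EOF'
{"src": "Explain the difference between a CPU and a GPU.", "tgt": ""}
{"src": "Write a short poem about the sea.", "tgt": ""}
EOF

# Sanity-check that every line parses as JSON and carries a "src" key.
python3 -c "import json; assert all('src' in json.loads(l) for l in open('sample_input.json'))" \
  && echo "valid input file"
```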
llm/docs/predict/inference.md (outdated)
- `output_file`: File to save inference results, defaults to output.json.

- `device`: Runtime device, defaults to gpu. Options are `cpu`, `gpu`, `xpu`.
Are these the only three `device` options? This is inconsistent with the hardware support table above.
llm/docs/predict/inference.md (outdated)
- `mode`: Run inference with the dynamic or static graph; options are `dynamic` and `static`, defaults to `dynamic`.

- `avx_model`: Whether to use AvxModel when running inference on CPU, defaults to False. See the [CPU inference tutorial]().
The link is missing here.
llm/docs/predict/installation.md (outdated)
PaddleNLP provides high-performance custom operators for the Transformer family to speed up inference and decoding. Install the custom operator library before use:

```shell
git clone https://github.com/PaddlePaddle/PaddleNLP
```
The repo is cloned twice here.
llm/docs/predict/installation.md (outdated)
```shell
cd PaddleNLP/llm
```
Suggest linking to the best-practices page or other applications here.
```shell
# Dynamic graph inference
export DEVICES=0,1
python -m paddle.distributed.launch \
```
Some introductory explanation is missing here, e.g. which parameters are used and what they mean; a brief note would help.
The parameter meanings are already covered on the inference.md page, so they are not repeated here.
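For orientation, the truncated launch command above might be completed along the following lines. The entry-point path `./predict/predictor.py` and the concrete flag values are assumptions based on the parameter list discussed earlier in this thread, not confirmed by the PR; the snippet only assembles and prints the command rather than executing it:

```shell
# Assemble a hypothetical multi-GPU dynamic-graph inference command.
# paddle.distributed.launch's --gpus option is real; the predictor script
# path and flag values below are illustrative assumptions.
export DEVICES=0,1
CMD="python -m paddle.distributed.launch --gpus ${DEVICES} \
./predict/predictor.py \
--model_name_or_path meta-llama/Meta-Llama-3-8B \
--dtype float16 \
--mode dynamic"
echo "$CMD"
```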
llm/docs/predict/inference.md (outdated)
- Supports LLM inference on multiple hardware platforms, including [昆仑XPU](../../xpu/llama/README.md), [昇腾NPU](../../npu/llama/README.md), [海光K100](../dcu_install.md), [燧原GCU](../../gcu/llama/README.md), [X86 CPU](../cpu_install.md), and more

- Provides server-oriented deployment with continuous batching, streaming output, and other features, supporting HTTP, RPC, and RESTful client forms
"Supporting HTTP, RPC, and RESTful client forms" -> "Supports gRPC and HTTP protocol service interfaces"
Fixed.
llm/docs/predict/inference.md (outdated)
- Transformer-based LLMs (e.g. Llama, Qwen)

- Mixture-of-experts LLMs (e.g. Mixtral)
Lines 25-29 can be removed; the model support list in section 1 below already shows the supported models.
Deleted.
LGTM
* update inference docs
* update
* update
* update
* update
* fix comments
* fix comments
* fix comments
* update inference.md
PR types
Others
PR changes
Docs
Description
update llm inference docs