
Commit ac28f8f

Author: 395822456@qq.com committed "update"
1 parent d96f9bb commit ac28f8f

File tree

5 files changed: +5 −7 lines changed

csrc/cpu/README.md

Lines changed: 0 additions & 1 deletion

````diff
@@ -4,6 +4,5 @@
 # Build the cpu custom operator library
 ```
 $ Prerequisite: the machine must support AVX instructions
-$ cd src
 $ bash setup.sh
 ```
````

csrc/cpu/setup.sh

Lines changed: 2 additions & 3 deletions

```diff
@@ -43,18 +43,17 @@ fi
 cd xFasterTransformer
 git apply paddle.patch
 
-#4. build xFasterTransformer
+# #4. build xFasterTransformer
 sh ./3rdparty/prepare_oneccl.sh
 source ./3rdparty/oneccl/build/_install/env/setvars.sh
-source /workspace/cpu_repo/xFasterTransformer/3rdparty/oneccl/build/_install/env/setvars.sh
 
 rm -rf build
 mkdir build && cd build
 cmake ..
 make -j
 
 #xft
-export XFT_HEADER_DIR=$PWD
+export XFT_HEADER_DIR=$PWD
 export XFT_LIB_DIR=$XFT_HEADER_DIR/build
 export LD_LIBRARY_PATH=$XFT_LIB_DIR:$LD_LIBRARY_PATH
 
```
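The setup.sh above leaves three environment variables behind (XFT_HEADER_DIR, XFT_LIB_DIR, LD_LIBRARY_PATH). A minimal sketch of that environment as a preflight check, with `os.environ` standing in for the shell exports; the path layout is an assumption mirroring the script, not part of this commit:

```python
import os

def xft_env(header_dir: str) -> dict:
    """Derive the variables setup.sh exports from the checkout directory."""
    lib_dir = os.path.join(header_dir, "build")
    old = os.environ.get("LD_LIBRARY_PATH", "")
    return {
        "XFT_HEADER_DIR": header_dir,
        "XFT_LIB_DIR": lib_dir,
        # lib dir must be first on the search path so the freshly built
        # libraries shadow any system-wide copies
        "LD_LIBRARY_PATH": lib_dir + (":" + old if old else ""),
    }
```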

llm/docs/inference.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -121,7 +121,7 @@ python ./predict/predictor.py --model_name_or_path checkpoints/llama_ptq_ckpts -
 python ./predict/export_model.py --model_name_or_path meta-llama/Llama-2-7b-chat --inference_model --output_path ./inference --dtype float16
 
 # CPU dynamic-to-static export with AVX instructions, for reference
-python ./predict/export_model.py --model_name_or_path meta-llama/Llama-2-7b-chat --inference_model --output_path ./inference --dtype float16 --avx_mode --avx_type "fp16" --device "cpu"
+python ./predict/export_model.py --model_name_or_path meta-llama/Llama-2-7b-chat --inference_model --output_path ./inference --dtype float32 --avx_mode --avx_type "fp16" --device "cpu"
 
 # PrefixTuning dynamic-to-static export command, for reference
 python ./predict/export_model.py --model_name_or_path meta-llama/Llama-2-7b-chat --inference_model --output_path ./inference --dtype float16 --export_precache true
```

llm/predict/predictor.py

Lines changed: 1 addition & 1 deletion

```diff
@@ -398,7 +398,6 @@ def __init__(self, config: PredictorArgument, tokenizer: PretrainedTokenizer):
 
         self.dtype = config.dtype or self.model_config
         self.pre_ids = paddle.full([config.batch_size, config.total_max_length], -1, dtype="int64")
-        self.arange_tensor_encoder = paddle.arange(config.total_max_length, dtype=self.dtype)
 
         if config.device == "cpu" and config.avx_model:
             assert (
@@ -409,6 +408,7 @@ def __init__(self, config: PredictorArgument, tokenizer: PretrainedTokenizer):
             self.tgt_generation_mask = None
             self.tgt_pos = None
         else:
+            self.arange_tensor_encoder = paddle.arange(config.total_max_length, dtype=self.dtype)
             self.cache_kvs = [paddle.zeros(shape, dtype=self.dtype) for shape in self.cache_kvs_shape]
             self.num_layers, self.num_attention_heads, self.head_dim = (
                 len(self.cache_kvs),
```
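The predictor.py change moves the `arange_tensor_encoder` allocation into the non-AVX branch, so the AVX CPU path no longer builds a buffer it never reads. A minimal sketch of that conditional-allocation pattern (not the project's code; plain lists stand in for paddle tensors, and `init_buffers` is a hypothetical helper):

```python
def init_buffers(device: str, avx_model: bool, total_max_length: int) -> dict:
    """Allocate only the buffers the chosen execution path actually uses."""
    buffers = {}
    if device == "cpu" and avx_model:
        # AVX path: mask/position buffers are unused, so leave them unset
        buffers["tgt_generation_mask"] = None
        buffers["tgt_pos"] = None
    else:
        # non-AVX path: build the encoder position range up front
        buffers["arange_tensor_encoder"] = list(range(total_max_length))
    return buffers
```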

paddlenlp/experimental/transformers/llama/modeling.py

Lines changed: 1 addition & 1 deletion

```diff
@@ -143,7 +143,7 @@ def __init__(self, config: LlamaConfig):
             self.hidden_size,
             self.num_attention_heads,
             self.intermediate_size,
-            activation="swiglu",
+            activation="silu",
             num_layers=config.num_hidden_layers,
             ln_scale_attrs=ln_scale_attrs,
             qkv_weight_attrs=qkv_weight_attrs,
```
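For context on the `"swiglu"` → `"silu"` rename: SiLU is the scalar activation x·sigmoid(x), while SwiGLU is the gated variant that applies SiLU to a gate projection and multiplies it into the value path. A plausible reading of the change is that the fused layer expects the base activation name and applies the gating itself, but that is an assumption; the scalar definitions below are standard:

```python
import math

def silu(x: float) -> float:
    # SiLU (a.k.a. swish): x * sigmoid(x)
    return x * (1.0 / (1.0 + math.exp(-x)))

def swiglu(x: float, gate: float) -> float:
    # SwiGLU: SiLU of the gate, multiplied elementwise into the value path
    return silu(gate) * x
```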
