Software Environment
- paddlepaddle: 3.0.0
- paddlepaddle-gpu: 3.0.0
- paddlenlp: 3.0.b4
Duplicate Issues
- I have searched the existing issues
Error Description
My GPU information:
Tue May 27 20:30:25 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 573.24                 Driver Version: 573.24         CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                  Driver-Model | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 5060 ...  WDDM  |   00000000:01:00.0 Off |                  N/A |
| N/A   45C    P8              3W /  80W  |    4570MiB /  8151MiB  |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A           21700      C   ...\envs\my_paddlenlp\python.exe          N/A  |
+-----------------------------------------------------------------------------------------+
Example code:
from paddlenlp.transformers import AutoTokenizer, AutoModelForCausalLM

# Load Qwen2-0.5B in float16 and generate a reply to a short Chinese prompt.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-0.5B")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B", dtype="float16")
input_features = tokenizer("你好!请自我介绍一下。", return_tensors="pd")
outputs = model.generate(**input_features, max_length=128)
# generate() returns a (token_ids, scores) tuple, so outputs[0] holds the ids.
print(tokenizer.batch_decode(outputs[0], skip_special_tokens=True))
Error output:
.conda\envs\my_paddlenlp\Lib\site-packages\paddle\utils\cpp_extension\extension_utils.py:711: UserWarning: No ccache found. Please be aware that recompiling all source files may be required. You can download and install ccache from: https://github.com/ccache/ccache/blob/master/doc/INSTALL.md
warnings.warn(warning_message)
[2025-05-27 20:24:52,511] [ INFO] - The `unk_token` parameter needs to be defined: we use `eos_token` by default.
[2025-05-27 20:24:52,655] [ INFO] - We are using <class 'paddlenlp.transformers.qwen2.modeling.Qwen2ForCausalLM'> to load 'Qwen/Qwen2-0.5B'.
[2025-05-27 20:24:52,655] [ INFO] - Loading configuration file C:\Users\panchonglin\.paddlenlp\models\Qwen/Qwen2-0.5B\config.json
[2025-05-27 20:24:52,658] [ INFO] - Loading weights file from cache at C:\Users\panchonglin\.paddlenlp\models\Qwen/Qwen2-0.5B\model.safetensors
[2025-05-27 20:24:53,358] [ INFO] - Loaded weights file from disk, setting weights to model.
W0527 20:24:53.449043 8752 gpu_resources.cc:106] The GPU compute capability in your current machine is 120, which is not supported by Paddle, it is recommended to install the corresponding wheel package according to the installation information on the official Paddle website.
W0527 20:24:53.789641 8752 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 12.0, Driver API Version: 12.8, Runtime API Version: 12.6
W0527 20:24:53.789641 8752 gpu_resources.cc:164] device: 0, cuDNN Version: 9.5.
[2025-05-27 20:25:02,544] [ INFO] - All model checkpoint weights were used when initializing Qwen2ForCausalLM.
[2025-05-27 20:25:02,544] [ WARNING] - Some weights of Qwen2ForCausalLM were not initialized from the model checkpoint at Qwen/Qwen2-0.5B and are newly initialized: ['lm_head.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[2025-05-27 20:25:02,545] [ INFO] - Loading configuration file C:\Users\panchonglin\.paddlenlp\models\Qwen/Qwen2-0.5B\generation_config.json
W0527 20:25:02.624600 8752 multiply_fwd_func.cc:76] got different data type, run type promotion automatically, this may cause data type been changed.
['!']
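The model only outputs ['!'] instead of a real reply. My reading (not confirmed): the W0527 warnings above suggest the official paddlepaddle-gpu 3.0.0 wheel was not built for compute capability 12.0 (sm_120, the RTX 5060), and the multiply_fwd_func type-promotion warning fires during the float16 forward pass. As a minimal sketch for checking what the local build actually supports (all of these are standard Paddle APIs):

import paddle

# Verify the installation and report what the installed wheel was built against.
paddle.utils.run_check()                           # basic install / kernel sanity check
print(paddle.version.cuda())                       # CUDA version the wheel was compiled with
print(paddle.version.cudnn())                      # cuDNN version the wheel was compiled with
print(paddle.device.cuda.get_device_name())        # should report the RTX 5060
print(paddle.device.cuda.get_device_capability())  # (12, 0) on this card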
Stable Reproduction Steps & Code
from paddlenlp.transformers import AutoTokenizer, AutoModelForCausalLM

# Load Qwen2-0.5B in float16 and generate a reply to a short Chinese prompt.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-0.5B")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B", dtype="float16")
input_features = tokenizer("你好!请自我介绍一下。", return_tensors="pd")
outputs = model.generate(**input_features, max_length=128)
# generate() returns a (token_ids, scores) tuple, so outputs[0] holds the ids.
print(tokenizer.batch_decode(outputs[0], skip_special_tokens=True))
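As an extra data point (my assumption that the problem is fp16-specific, not verified): running the identical generation with dtype="float32" should show whether the type-promotion warning and the garbage output go away once float16 is taken out of the picture.

from paddlenlp.transformers import AutoTokenizer, AutoModelForCausalLM

# Control run in float32 to check whether the bad output is fp16-specific.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-0.5B")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B", dtype="float32")
input_features = tokenizer("你好!请自我介绍一下。", return_tensors="pd")
outputs = model.generate(**input_features, max_length=128)
print(tokenizer.batch_decode(outputs[0], skip_special_tokens=True))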