Software Environment
- paddlepaddle: 3.0.0
- paddlepaddle-gpu: 3.0.0
- paddlenlp: 3.0.b4
Duplicate Issues
- I have searched the existing issues
Error Description
My GPU information:
Tue May 27 20:30:25 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 573.24                 Driver Version: 573.24         CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                  Driver-Model | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 5060 ...  WDDM  |   00000000:01:00.0 Off |                  N/A |
| N/A   45C    P8              3W /  80W  |    4570MiB /  8151MiB  |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A           21700      C   ...\envs\my_paddlenlp\python.exe          N/A  |
+-----------------------------------------------------------------------------------------+
Example code:
from paddlenlp.transformers import AutoTokenizer, AutoModelForCausalLM

# Load Qwen2-0.5B in float16 and generate a reply to a short Chinese prompt.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-0.5B")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B", dtype="float16")
input_features = tokenizer("你好!请自我介绍一下。", return_tensors="pd")
outputs = model.generate(**input_features, max_length=128)
# generate() returns a (token_ids, scores) tuple, so outputs[0] holds the ids.
print(tokenizer.batch_decode(outputs[0], skip_special_tokens=True))
Error output:
.conda\envs\my_paddlenlp\Lib\site-packages\paddle\utils\cpp_extension\extension_utils.py:711: UserWarning: No ccache found. Please be aware that recompiling all source files may be required. You can download and install ccache from: https://github.com/ccache/ccache/blob/master/doc/INSTALL.md
warnings.warn(warning_message)
[2025-05-27 20:24:52,511] [ INFO] - The `unk_token` parameter needs to be defined: we use `eos_token` by default.
[2025-05-27 20:24:52,655] [ INFO] - We are using <class 'paddlenlp.transformers.qwen2.modeling.Qwen2ForCausalLM'> to load 'Qwen/Qwen2-0.5B'.
[2025-05-27 20:24:52,655] [ INFO] - Loading configuration file C:\Users\panchonglin\.paddlenlp\models\Qwen/Qwen2-0.5B\config.json
[2025-05-27 20:24:52,658] [ INFO] - Loading weights file from cache at C:\Users\panchonglin\.paddlenlp\models\Qwen/Qwen2-0.5B\model.safetensors
[2025-05-27 20:24:53,358] [ INFO] - Loaded weights file from disk, setting weights to model.
W0527 20:24:53.449043 8752 gpu_resources.cc:106] The GPU compute capability in your current machine is 120, which is not supported by Paddle, it is recommended to install the corresponding wheel package according to the installation information on the official Paddle website.
W0527 20:24:53.789641 8752 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 12.0, Driver API Version: 12.8, Runtime API Version: 12.6
W0527 20:24:53.789641 8752 gpu_resources.cc:164] device: 0, cuDNN Version: 9.5.
[2025-05-27 20:25:02,544] [ INFO] - All model checkpoint weights were used when initializing Qwen2ForCausalLM.
[2025-05-27 20:25:02,544] [ WARNING] - Some weights of Qwen2ForCausalLM were not initialized from the model checkpoint at Qwen/Qwen2-0.5B and are newly initialized: ['lm_head.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[2025-05-27 20:25:02,545] [ INFO] - Loading configuration file C:\Users\panchonglin\.paddlenlp\models\Qwen/Qwen2-0.5B\generation_config.json
W0527 20:25:02.624600 8752 multiply_fwd_func.cc:76] got different data type, run type promotion automatically, this may cause data type been changed.
['!']
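The model only outputs ['!'] instead of a real reply. My reading (not confirmed): the W0527 warnings above suggest the official paddlepaddle-gpu 3.0.0 wheel was not built for compute capability 12.0 (sm_120, the RTX 5060), and the multiply_fwd_func type-promotion warning fires during the float16 forward pass. As a minimal sketch for checking what the local build actually supports (all of these are standard Paddle APIs):

import paddle

# Verify the installation and report what the installed wheel was built against.
paddle.utils.run_check()                           # basic install / kernel sanity check
print(paddle.version.cuda())                       # CUDA version the wheel was compiled with
print(paddle.version.cudnn())                      # cuDNN version the wheel was compiled with
print(paddle.device.cuda.get_device_name())        # should report the RTX 5060
print(paddle.device.cuda.get_device_capability())  # (12, 0) on this card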
Stable Reproduction Steps & Code
from paddlenlp.transformers import AutoTokenizer, AutoModelForCausalLM

# Load Qwen2-0.5B in float16 and generate a reply to a short Chinese prompt.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-0.5B")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B", dtype="float16")
input_features = tokenizer("你好!请自我介绍一下。", return_tensors="pd")
outputs = model.generate(**input_features, max_length=128)
# generate() returns a (token_ids, scores) tuple, so outputs[0] holds the ids.
print(tokenizer.batch_decode(outputs[0], skip_special_tokens=True))
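As an extra data point (my assumption that the problem is fp16-specific, not verified): running the identical generation with dtype="float32" should show whether the type-promotion warning and the garbage output go away once float16 is taken out of the picture.

from paddlenlp.transformers import AutoTokenizer, AutoModelForCausalLM

# Control run in float32 to check whether the bad output is fp16-specific.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-0.5B")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B", dtype="float32")
input_features = tokenizer("你好!请自我介绍一下。", return_tensors="pd")
outputs = model.generate(**input_features, max_length=128)
print(tokenizer.batch_decode(outputs[0], skip_special_tokens=True))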