[Cherry-Pick][NPU] Refine doc (#774) (#775)

Birdylx · web-flow · commit 35d4b8c68113 · 2024-10-22T15:31:15.000+08:00
1. Refine the NPU documentation.
2. Make type hints compatible with Python 3.9.
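
For context on item 2: the `X | Y` union syntax in annotations (PEP 604) is evaluated when the function is defined and raises `TypeError` on Python 3.9, whereas `typing.Union` and `typing.Optional` work on 3.9 and later. A minimal sketch of the pattern follows. The functions `frames_for` and `split_inputs` are hypothetical stand-ins written for illustration only; they are not part of `qwen2_vl_processing.py`.

```python
from typing import Optional, Union, get_args, get_type_hints


# Hypothetical stand-in for a parameter typed like `video_fps`.
# On Python 3.9, writing `int | float` here would fail at definition time.
def frames_for(duration_s: float, video_fps: Union[int, float]) -> int:
    """Return the number of whole frames in `duration_s` seconds."""
    return int(duration_s * video_fps)


# Hypothetical stand-in for a function that returns two optional lists.
# Each tuple slot gets its own Optional[...], so the tuple keeps two elements.
def split_inputs(paths: list[str]) -> tuple[Optional[list[str]], Optional[list[str]]]:
    """Split paths into (images, videos); an empty group becomes None."""
    images = [p for p in paths if p.endswith(".jpg")]
    videos = [p for p in paths if p.endswith(".mp4")]
    return (images or None, videos or None)


if __name__ == "__main__":
    print(frames_for(2.0, 30))
    print(split_inputs(["a.jpg", "b.mp4"]))
    # Inspect the resolved annotations at runtime.
    print(get_type_hints(split_inputs)["return"])
```

Built-in generics such as `list[str]` and `tuple[...]` are already valid on Python 3.9 (PEP 585), so only the `|` unions need rewriting.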
diff --git a/paddlemix/examples/README.md b/paddlemix/examples/README.md
@@ -17,7 +17,7 @@ paddlemix `examples` 目录下提供模型的一站式体验，包括模型推
 | [groundingdino](./groundingdino/) | ✅ | ❌  | 🚧   | ❌  | ✅  | ❌ |
 | [imagebind](./imagebind/) |   ✅  |  ❌   |  ❌  | ❌ | ❌ | ❌ |
 | [InternLM-XComposer2](./internlm_xcomposer2/) | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ |
-| [Internvl2](./internvl2/) | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ |
+| [Internvl2](./internvl2/) | ✅ | ❌ | ✅ | ❌ | ❌ | ✅ |
 | [llava](./llava/) | ✅  | ✅  | ✅  | ✅  | 🚧  | ✅ |
 | [llava-next](./llava_next_interleave/) | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
 | [minigpt4](./minigpt4) | ✅ | ✅ | ✅   |  ❌  | ✅  | ❌ |
diff --git a/paddlemix/examples/llava/README.md b/paddlemix/examples/llava/README.md
@@ -104,8 +104,35 @@ python paddlemix/tools/supervised_finetune.py paddlemix/config/llava/v1_5/lora_s
 python paddlemix/tools/supervised_finetune.py paddlemix/config/llava/v1_5/sft_argument.json
 ```
 
-## 5 NPU硬件训练
-请参照[tools](../../tools/README.md)进行NPU硬件Paddle安装和环境变量设置，配置完成后可直接执行微调命令进行训练或预测。
+## 6 NPU硬件训练
+请参照[tools](../../tools/README.md)进行NPU硬件Paddle安装和环境变量设置。执行预测和训练前需要设置如下环境变量：
+```shell
+export ASCEND_RT_VISIBLE_DEVICES=8
+export FLAGS_npu_storage_format=0
+export FLAGS_use_stride_kernel=0
+export FLAGS_npu_jit_compile=0
+export FLAGS_npu_scale_aclnn=True
+export FLAGS_npu_split_aclnn=True
+export FLAGS_allocator_strategy=auto_growth
+export CUSTOM_DEVICE_BLACK_LIST=set_value,set_value_with_tensor
+```
+
+预测:
+```shell
+python paddlemix/examples/llava/run_predict_multiround.py \
+    --model-path "paddlemix/llava/llava-v1.6-7b" \
+    --image-file "https://bj.bcebos.com/v1/paddlenlp/models/community/GroundingDino/000000004505.jpg" \
+    --fp16
+```
+微调:
+```shell
+# llava lora微调
+python paddlemix/tools/supervised_finetune.py paddlemix/config/llava/v1_5/lora_sft_argument.json
+
+# llava full参数微调
+python paddlemix/tools/supervised_finetune.py paddlemix/config/llava/v1_5/sft_argument.json
+```
+
 
 ### 参考文献
 ```BibTeX
diff --git a/paddlemix/processors/qwen2_vl_processing.py b/paddlemix/processors/qwen2_vl_processing.py
@@ -667,7 +667,7 @@ def smart_resize(
     return h_bar, w_bar
 
 
-def fetch_image(ele: dict[str, str | Image.Image], size_factor: int = IMAGE_FACTOR) -> Image.Image:
+def fetch_image(ele: dict[str, Union[str, Image.Image]], size_factor: int = IMAGE_FACTOR) -> Image.Image:
     if "image" in ele:
         image = ele["image"]
     else:
@@ -715,7 +715,7 @@ def fetch_image(ele: dict[str, str | Image.Image], size_factor: int = IMAGE_FACT
 def smart_nframes(
     ele: dict,
     total_frames: int,
-    video_fps: int | float,
+    video_fps: Union[int, float],
 ) -> int:
     """calculate the number of frames for video used for model inputs.
 
@@ -850,7 +850,7 @@ def gaussian_kernel_1d(size, sigma):
     kernel = np.exp(-x**2 / (2 * sigma**2))
     return kernel / kernel.sum()
 
-def fetch_video(ele: dict, image_factor: int = IMAGE_FACTOR) -> paddle.Tensor | list[Image.Image]:
+def fetch_video(ele: dict, image_factor: int = IMAGE_FACTOR) -> Union[paddle.Tensor, list[Image.Image]]:
     if isinstance(ele["video"], str):
         video_reader_backend = get_video_reader_backend()
 
@@ -902,7 +902,7 @@ def fetch_video(ele: dict, image_factor: int = IMAGE_FACTOR) -> paddle.Tensor |
         return images
 
 
-def extract_vision_info(conversations: list[dict] | list[list[dict]]) -> list[dict]:
+def extract_vision_info(conversations: Union[list[dict], list[list[dict]]]) -> list[dict]:
     vision_infos = []
     if isinstance(conversations[0], dict):
         conversations = [conversations]
@@ -921,8 +921,8 @@ def extract_vision_info(conversations: list[dict] | list[list[dict]]) -> list[di
 
 
 def process_vision_info(
-    conversations: list[dict] | list[list[dict]],
-) -> tuple[list[Image.Image] | None, list[paddle.Tensor | list[Image.Image]] | None]:
+    conversations: Union[list[dict], list[list[dict]]],
+) -> tuple[Union[list[Image.Image], None], Union[list[Union[paddle.Tensor, list[Image.Image]]], None]]:
     vision_infos = extract_vision_info(conversations)
     image_inputs = []
     video_inputs = []
diff --git a/paddlemix/tools/README.md b/paddlemix/tools/README.md
@@ -18,7 +18,7 @@ PaddleMIX工具箱秉承了飞桨套件一站式体验、性能极致、生态
 | [groundingdino](./groundingdino/) |  🚧   | ❌  | ✅  | ❌ |
 | [imagebind](./imagebind/) |  ❌  | ❌ | ❌ | ❌ |
 | [InternLM-XComposer2](./internlm_xcomposer2/) | ✅ | ❌ | ❌ | ❌ |
-| [Internvl2](./internvl2/)| ✅ | ❌ | ❌ | ❌ |
+| [Internvl2](./internvl2/)| ✅ | ❌ | ❌ | ✅ |
 | [llava](./llava/)  | ✅  | ✅  | 🚧  | ✅ |
 | [llava-next](./llava_next_interleave/) | ❌ | ❌ | ❌ | ❌ |
 | [minigpt4](./minigpt4) | ✅   |  ❌  | ✅  | ❌ |