Add unified sentiment analysis based on UIE. #3694

Merged
102 commits merged on Dec 20, 2022

1649759610
Contributor

PR types

New features

PR changes

Others

Description

Add unified sentiment analysis based on UIE.


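This feature requires the path to a test-set file at prediction time. You can name the test-set file `test.txt` and place it in the `./data` directory. Note that each line of the test-set file is one sentence to be predicted, as shown below.
#### 2.2.1 Aspect/Opinion Analysis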

Collaborator

Does "aspect" here refer to the sentiment dimension? Consider unifying the terminology throughout.

Contributor Author

The sentiment dimension here is the aspect; "aspect" is the more commonly used term.


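If you want to train the aspect-level sentiment classification model yourself, you can retrain it with the `cls_data` demo data provided in Section 4.1 or with your own annotated business data. The training and testing code for the aspect-level sentiment classification model is in the `classification` directory; run model training from that directory. For more implementation details and usage, see [here](classification/README.md).
#### 2.2.3 Aspect + Sentiment Analysis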

Collaborator

Aspect + sentiment polarity analysis?

Contributor Author

Changed to: Aspect + Sentiment Polarity Analysis


## 6. References
- 👉 [General-purpose sentiment analysis extraction](./unified_sentiment_extraction/README)

Collaborator

I suggest adding more guidance here, because the solution overview above reads more like an introduction to unified information extraction.

Contributor Author

We will keep improving this part once the whole thing is ready.

<a name="2.1"></a>

### 2.1 Runtime Environment
- python >= 3.6

Collaborator

I recommend raising the Python version to 3.7+. Going forward, the Paddle framework and toolkits will no longer support Python 3.6.

Contributor Author

Changed to: python >= 3.7


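**Install PaddlePaddle**:

The paddlepaddle-gpu or paddlepaddle version in the environment should be 2.3 or later. See [PaddlePaddle Quick Install](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html) and choose the PaddlePaddle install command that fits your needs.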

Collaborator

Consider also adding a conda installation method here.

Contributor Author

Changed to:
The paddlepaddle-gpu or paddlepaddle version in the environment should be 2.3 or later. See PaddlePaddle Quick Install and choose the PaddlePaddle install command that fits your needs. The following command installs paddlepaddle on Linux with CUDA 10.2, specifically the GPU-enabled 2.3.2 release.

conda install paddlepaddle-gpu==2.3.2 cudatoolkit=10.2 --channel https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/
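
As a rough way to confirm the requirements discussed in this thread (python >= 3.7 and paddlepaddle or paddlepaddle-gpu >= 2.3), a check like the one below could be run after installation. This is a minimal sketch and not part of the PR: the helper names `parse_version`, `meets_minimum`, `installed_paddle_version`, and `check_environment` are hypothetical, and it assumes the installed package exposes its version as `paddle.__version__`.

```python
import sys


def parse_version(text):
    """Turn a version string such as '2.3.2' or '2.4.0rc0' into a tuple of ints."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            # Stop at the first component with no leading digits (e.g. 'post101').
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(version_text, minimum):
    """Return True if version_text is at least the given minimum tuple."""
    return parse_version(version_text) >= tuple(minimum)


def installed_paddle_version():
    """Return the installed PaddlePaddle version string, or None if it is missing."""
    try:
        import paddle  # provided by both paddlepaddle and paddlepaddle-gpu
    except ImportError:
        return None
    return paddle.__version__


def check_environment():
    """Collect human-readable problems; an empty list means the checks passed."""
    problems = []
    if sys.version_info < (3, 7):
        problems.append("python >= 3.7 is required")
    paddle_version = installed_paddle_version()
    if paddle_version is None:
        problems.append("paddlepaddle or paddlepaddle-gpu is not installed")
    elif not meets_minimum(paddle_version, (2, 3)):
        problems.append("paddlepaddle %s is older than 2.3" % paddle_version)
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

The numeric comparison ignores pre-release and post-release suffixes, so it is only a coarse check; development builds that report an unusual version string may need to be handled separately.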


# visualization for aspect
save_path = os.path.join(args.save_dir, "aspect_wc.png")
vs.plot_aspect_with_frequency(sr.aspect_frequency, save_path, image_type="wordcloud")

Collaborator

The code that defines `sr` is not shown here.

Contributor Author

Added:

# define SentimentResult to process the result of sentiment result.
sr = SentimentResult(args.file_path, sentiment_name=args.sentiment_name)
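
For readers unfamiliar with what a word-cloud input such as `sr.aspect_frequency` might contain, the sketch below counts how often each aspect appears across prediction results. It is an illustration under assumptions, not the PR's `SentimentResult` implementation: the function `count_aspects`, the `"aspect"` key, and the record layout are all hypothetical.

```python
from collections import Counter


def count_aspects(results, aspect_key="aspect"):
    """Count how often each aspect text occurs across prediction results.

    `results` is assumed to be a list with one dict per input sentence, where
    `aspect_key` maps to a list of {"text": ...} entries (hypothetical layout).
    """
    counter = Counter()
    for record in results:
        for aspect in record.get(aspect_key, []):
            counter[aspect["text"]] += 1
    return counter


# Hypothetical predictions: two sentences mention aspects, one mentions none.
demo_results = [
    {"aspect": [{"text": "taste"}, {"text": "service"}]},
    {"aspect": [{"text": "taste"}]},
    {},
]

aspect_frequency = count_aspects(demo_results)
print(aspect_frequency.most_common())
```

A mapping of this shape (aspect text to count) is the kind of input a frequency-based word cloud typically needs.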

```

<div align="center">
<img src="https://user-images.githubusercontent.com/35913314/200213998-e646c422-7ab5-48ae-9e28-d6068cdf7b8f.png"/>

Collaborator

Later we could look into integrating this with pipelines.

Contributor Author

Pipeline is already supported.

@@ -0,0 +1,41 @@
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.

Collaborator

Linjie will move UIE into transformers/ernie/modeling.py this week, so it can be called directly afterwards.

We may also need to look at supporting the UIE-M model, whose architecture differs slightly.

Contributor Author

Updated.

from utils import load_txt, write_json_file


def parse_args():

Collaborator

Please disable yapf for this part.

Contributor Author

This code will be removed later; everything will go through Taskflow.

@@ -0,0 +1,821 @@
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.

Collaborator

One question here: going forward, will we provide fine-tuning for uie-base only, or will we also offer customized base, medium, nano, and mini versions?

Contributor Author

The plan here is to provide fine-tuning for uie-base.

1649759610 reopened this on Dec 16, 2022

codecov bot commented on Dec 16, 2022

Codecov Report

Merging #3694 (87782d3) into develop (d6ea3b0) will decrease coverage by 0.12%.
The diff coverage is 10.43%.

@@             Coverage Diff             @@
##           develop    #3694      +/-   ##
===========================================
- Coverage    33.11%   32.98%   -0.13%     
===========================================
  Files          400      400              
  Lines        56139    56396     +257     
===========================================
+ Hits         18588    18604      +16     
- Misses       37551    37792     +241     

| Impacted Files | Coverage Δ |
|---|---|
| paddlenlp/transformers/ernie/configuration.py | 100.00% <ø> (ø) |
| paddlenlp/transformers/ernie/tokenizer.py | 41.93% <ø> (ø) |
| paddlenlp/taskflow/sentiment_analysis.py | 12.23% <10.10%> (-9.20%) ⬇️ |
| paddlenlp/taskflow/taskflow.py | 75.29% <100.00%> (ø) |


Collaborator

wawltor left a comment

LGTM

3 participants