Commit eb05f40

add_dataset_link (#3286)
1 parent 7fdfbf2 commit eb05f40

1 file changed: +57 -0 lines changed

applications/text_classification/README.md

Lines changed: 57 additions & 0 deletions
@@ -8,6 +8,7 @@
 - [2.3 Efficient Model Tuning Solution](#高效模型调优方案)
 - [2.4 Industrial-Grade End-to-End Solution](#产业级全流程方案)
 - [3. Quick Start](#快速开始)
+- [4. Common Chinese Classification Datasets](#常用中文分类数据集)
 
 <a name="文本分类应用简介"></a>
 
@@ -233,3 +234,59 @@
 - Get started with multi-label classification 👉 [Multi-label guide](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/applications/text_classification/multi_label#readme)
 
 - Get started with hierarchical classification 👉 [Hierarchical classification guide](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/applications/text_classification/hierarchical#readme)
+
+<a name="常用中文分类数据集"></a>
+
+## 4. Common Chinese Classification Datasets
+
+**Multi-class datasets:**
+
+- [THUCNews news classification dataset](http://thuctc.thunlp.org/)
+
+- [Baike Q&A classification dataset](https://github.com/brightmart/nlp_chinese_corpus#3%E7%99%BE%E7%A7%91%E7%B1%BB%E9%97%AE%E7%AD%94json%E7%89%88baike2018qa)
+
+- [Toutiao news headline dataset (TNEWS)](https://github.com/aceimnorstuvwxz/toutiao-text-classfication-dataset)
+
+- [Fudan news text dataset](https://www.heywhale.com/mw/dataset/5d3a9c86cf76a600360edd04)
+
+- [IFLYTEK app description classification dataset](https://storage.googleapis.com/cluebenchmark/tasks/iflytek_public.zip)
+
+- [CAIL 2022 event detection](https://cloud.tsinghua.edu.cn/d/6e911ff1286d47db8016/)
+
+**Sentiment classification datasets (multi-class):**
+
+- [Amazon product review sentiment dataset](https://github.com/SophonPlus/ChineseNlpCorpus/blob/master/datasets/yf_amazon/intro.ipynb)
+
+- [Financial news sentiment classification dataset](https://github.com/wwwxmu/Dataset-of-financial-news-sentiment-classification)
+
+- [ChnSentiCorp hotel review sentiment classification dataset](https://github.com/SophonPlus/ChineseNlpCorpus/tree/master/datasets/ChnSentiCorp_htl_all)
+
+- [Food delivery review sentiment classification dataset](https://github.com/SophonPlus/ChineseNlpCorpus/blob/master/datasets/waimai_10k/intro.ipynb)
+
+- [Weibo binary sentiment classification dataset](https://github.com/SophonPlus/ChineseNlpCorpus/blob/master/datasets/weibo_senti_100k/intro.ipynb)
+
+- [Weibo four-class sentiment classification dataset](https://github.com/SophonPlus/ChineseNlpCorpus/blob/master/datasets/simplifyweibo_4_moods/intro.ipynb)
+
+- [Product review sentiment classification dataset](https://github.com/SophonPlus/ChineseNlpCorpus/blob/master/datasets/online_shopping_10_cats/intro.ipynb)
+
+- [Movie review sentiment classification dataset](https://github.com/SophonPlus/ChineseNlpCorpus/blob/master/datasets/dmsc_v2/intro.ipynb)
+
+- [Dianping classification dataset](https://github.com/SophonPlus/ChineseNlpCorpus/blob/master/datasets/yf_dianping/intro.ipynb)
+
+**Multi-label datasets:**
+
+- [Student evaluation comment classification dataset](https://github.com/FBI1314/textClassification/tree/master/multilabel_text_classfication/data)
+
+- [CAIL2019 marriage element recognition](https://aistudio.baidu.com/aistudio/projectdetail/3996601)
+
+- [CAIL2018 prison term prediction, law article prediction, and charge prediction](https://cail.oss-cn-qingdao.aliyuncs.com/CAIL2018_ALL_DATA.zip)
+
+**Hierarchical classification datasets:**
+
+- [Toutiao news headline classification (upgraded version of TNEWS)](https://github.com/aceimnorstuvwxz/toutiao-multilevel-text-classfication-dataset)
+
+- [Web page hierarchical classification dataset](https://csri.scu.edu.cn/info/1012/2827.htm)
+
+- [Medical intent dataset (CMID)](https://github.com/liutongyang/CMID)
+
+- [2020 Language and Intelligence Challenge event classification](https://github.com/percent4/keras_bert_multi_label_cls/tree/master/data)
