Merge branch 'develop' into t5

FrostML · web-flow · commit 413bc3ae5512 · 2022-12-21T10:02:01.000+08:00
diff --git a/.github/workflows/stale.yml b/.github/workflows/stale.yml
@@ -23,10 +23,10 @@ jobs:
       - uses: actions/stale@v6.0.1
         with:
           days-before-issue-stale: 60
-          days-before-issue-close: 30
+          days-before-issue-close: 14
           stale-issue-label: "stale"
           stale-issue-message: "This issue is stale because it has been open for 60 days with no activity. 当前issue 60天内无活动，被标记为stale。"
-          close-issue-message: "This issue was closed because it has been inactive for 30 days since being marked as stale. 当前issue 被标记为stale已有30天，即将关闭。"
+          close-issue-message: "This issue was closed because it has been inactive for 14 days since being marked as stale. 当前issue 被标记为stale已有14天，即将关闭。"
           exempt-issue-labels: 'triage,keep'
           days-before-pr-stale: 60
           days-before-pr-close: -1
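
The effect of this hunk on an issue's lifetime can be sketched with a little date arithmetic. The sketch below is only an illustration: it takes the two values from the workflow above (`days-before-issue-stale: 60`, `days-before-issue-close: 14`), uses a hypothetical helper name `stale_timeline`, and assumes the issue sees no activity at all. It ignores when the workflow is actually scheduled to run, so real dates may be somewhat later.

```python
from datetime import date, timedelta
from typing import Tuple

# Values copied from the workflow hunk above.
DAYS_BEFORE_ISSUE_STALE = 60
DAYS_BEFORE_ISSUE_CLOSE = 14  # 30 before this change


def stale_timeline(last_activity: date) -> Tuple[date, date]:
    """Return (stale_on, close_on) for an issue with no further activity.

    Hypothetical helper for illustration; not part of actions/stale.
    """
    stale_on = last_activity + timedelta(days=DAYS_BEFORE_ISSUE_STALE)
    close_on = stale_on + timedelta(days=DAYS_BEFORE_ISSUE_CLOSE)
    return stale_on, close_on


stale_on, close_on = stale_timeline(date(2022, 12, 21))
print("marked stale on:", stale_on)
print("closed on:", close_on)
```

Under these assumptions, an untouched issue is closed 74 days after its last activity, rather than 90 days as before.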
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -0,0 +1,150 @@
+# Contributing to PaddleNLP
+
+We highly welcome and value your contributions to `PaddleNLP`. The first step to start your contribution is to sign the [PaddlePaddle Contributor License Agreement](https://cla-assistant.io/PaddlePaddle/PaddleNLP).
+
+This document explains our workflow and work style:
+
+## Workflow
+
+PaddlePaddle uses the [Git branching model](http://nvie.com/posts/a-successful-git-branching-model/).  The following steps guide usual contributions.
+
+#### 1. Fork
+
+   Our development community has been growing fast, so it doesn't make sense for everyone to write to the official repo. Please file Pull Requests from your fork instead. To make a fork, head over to the GitHub page and click the ["Fork" button](https://help.github.com/articles/fork-a-repo/).
+
+#### 2. Clone
+
+   To make a copy of your fork on your local computer, run
+
+   ```bash
+   git clone https://github.com/<your-github-account>/PaddleNLP
+   cd PaddleNLP
+   ```
+
+#### 3. Create the local feature branch
+
+   For daily work like adding a new feature or fixing a bug, create a feature branch before coding:
+
+   ```bash
+   git checkout -b my-cool-feature
+   ```
+
+#### 4. Set up the development environment
+
+   Before you start coding, you need to set up the development environment. We highly recommend doing all your development in a virtual environment such as
+   [venv](https://docs.python.org/3/library/venv.html) or [conda](https://docs.conda.io/en/latest/). After you have set up and activated your virtual environment,
+   run the following command:
+
+   ```bash
+   make install
+   ```
+
+   This will set up all the dependencies of `PaddleNLP` as well as the [`pre-commit`](http://pre-commit.com/) tool.
+
+   If you are working on the `examples` or `applications` module and require importing from `PaddleNLP`, make sure you install `PaddleNLP` in editable mode.
+   If `PaddleNLP` is already installed in the virtual environment, remove it with `pip uninstall paddlenlp` before reinstalling it in editable mode with
+   `pip install -e .`
+
+#### 5. Develop
+
+   As you develop your new exciting feature, keep in mind that it should be covered by unit tests. All of our unit tests can be found under the `tests` directory.
+   You can either modify an existing unit test to cover the new feature, or create a new test from scratch.
+   As you finish up your code, make sure the test suite passes. You can run the tests impacted by your changes like this:
+
+   ```bash
+   pytest tests/<test_to_run>.py
+   ```
+
+#### 6. Commit
+
+   We use [`pre-commit`](http://pre-commit.com/) (with [black](https://black.readthedocs.io/en/stable/), [isort](https://pycqa.github.io/isort/) and
+   [flake8](https://flake8.pycqa.org/en/latest/) under the hood) to check the style of code and documentation in every commit. When you run `git commit`, you will see
+   something like the following:
+
+   ```
+    ➜  (my-virtual-env) git commit -m "committing my cool feature"
+    black....................................................................Passed
+    isort....................................................................Passed
+    flake8...................................................................Passed
+    check for merge conflicts................................................Passed
+    check for broken symlinks............................(no files to check)Skipped
+    detect private key.......................................................Passed
+    fix end of files.....................................(no files to check)Skipped
+    trim trailing whitespace.............................(no files to check)Skipped
+    CRLF end-lines checker...............................(no files to check)Skipped
+    CRLF end-lines remover...............................(no files to check)Skipped
+    No-tabs checker......................................(no files to check)Skipped
+    Tabs remover.........................................(no files to check)Skipped
+    copyright_checker........................................................Passed
+   ```
+
+   But most of the time things don't go so smoothly. When your code or documentation doesn't meet the standard, the `pre-commit` check will fail.
+   ```
+    ➜  (my-virtual-env) git commit -m "committing my cool feature"
+    black....................................................................Passed
+    isort....................................................................Failed
+    - hook id: isort
+    - files were modified by this hook
+
+    Fixing examples/information_extraction/waybill_ie/run_ernie_crf.py
+
+    flake8...................................................................Passed
+    check for merge conflicts................................................Passed
+    check for broken symlinks............................(no files to check)Skipped
+    detect private key.......................................................Passed
+    fix end of files.....................................(no files to check)Skipped
+    trim trailing whitespace.............................(no files to check)Skipped
+    CRLF end-lines checker...............................(no files to check)Skipped
+    CRLF end-lines remover...............................(no files to check)Skipped
+    No-tabs checker......................................(no files to check)Skipped
+    Tabs remover.........................................(no files to check)Skipped
+    copyright_checker........................................................Passed
+   ```
+
+   But **don't panic**!
+   Our tooling will fix most of the style errors automatically. Some errors will need to be addressed manually. Fortunately, the error messages are straightforward and
+   the errors are usually simple to fix. After addressing the errors, you can run `git add <files>` and `git commit` again, which will trigger `pre-commit` again.
+   Once the `pre-commit` checks pass, you are ready to push the code.
+
+   [Google](http://google.com/) and [StackOverflow](https://stackoverflow.com/) are great tools to help you understand the code style errors.
+   Don't worry if you still can't figure it out. You can commit with `git commit -m "style error" --no-verify` and we are happy to help you once you create a Pull Request.
+
+#### 7. Keep pulling
+
+   An experienced Git user pulls from the official repo often, daily or even hourly, so they notice conflicts with others' work early, when the conflicts are smaller and easier to resolve.
+
+   ```bash
+   git remote add upstream https://github.com/PaddlePaddle/PaddleNLP
+   git pull upstream develop
+   ```
+
+#### 8. Push and file a pull request
+
+   You can "push" your local work into your forked repo:
+
+   ```bash
+   git push origin my-cool-feature
+   ```
+
+   The push allows you to create a pull request, requesting owners of this [official repo](https://github.com/PaddlePaddle/PaddleNLP) to pull your change into the official one.
+
+   To create a pull request, please follow [these steps](https://help.github.com/articles/creating-a-pull-request/).
+
+#### 9. Delete local and remote branches
+
+   To keep your local workspace and your fork clean, you might want to remove merged branches:
+
+   ```bash
+   git push origin --delete my-cool-feature
+   git checkout develop
+   git pull upstream develop
+   git branch -d my-cool-feature
+   ```
+
+### Code Review
+
+- Feel free to ping your reviewers by sending them the URL of your pull request via IM or email, but please do this only after your pull request passes the CI.
+
+- Please respond to every reviewer comment. If you adopt the suggestion, reply "Done"; otherwise, start a discussion under the comment.
+
+- If you don't want your reviewers to get overwhelmed by email notifications, you can reply to their comments [in a batch](https://help.github.com/articles/reviewing-proposed-changes-in-a-pull-request/).
diff --git a/applications/text_summarization/unimo-text/export_model.py b/applications/text_summarization/unimo-text/export_model.py
@@ -11,16 +11,14 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
-import os
 import argparse
+import os
+from pprint import pprint
 
 import paddle
 
-from pprint import pprint
-
-from paddlenlp.transformers import UNIMOLMHeadModel, UNIMOTokenizer
 from paddlenlp.ops import FasterUNIMOText
-
+from paddlenlp.transformers import UNIMOLMHeadModel, UNIMOTokenizer
 from paddlenlp.utils.log import logger
 
 
@@ -82,13 +80,13 @@ def do_predict(args):
         unimo_text,
         input_spec=[
             # input_ids
-            paddle.static.InputSpec(shape=[None, None], dtype="int32"),
+            paddle.static.InputSpec(shape=[None, None], dtype="int64"),
             # token_type_ids
-            paddle.static.InputSpec(shape=[None, None], dtype="int32"),
+            paddle.static.InputSpec(shape=[None, None], dtype="int64"),
             # attention_mask
             paddle.static.InputSpec(shape=[None, 1, None, None], dtype="float32"),
             # seq_len
-            paddle.static.InputSpec(shape=[None], dtype="int32"),
+            paddle.static.InputSpec(shape=[None], dtype="int64"),
             args.max_out_len,
             args.min_out_len,
             args.topk,
diff --git a/examples/question_generation/unimo-text/export_model.py b/examples/question_generation/unimo-text/export_model.py
@@ -12,16 +12,14 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-import os
 import argparse
+import os
+from pprint import pprint
 
 import paddle
 
-from pprint import pprint
-
-from paddlenlp.transformers import UNIMOLMHeadModel, UNIMOTokenizer
 from paddlenlp.ops import FasterUNIMOText
-
+from paddlenlp.transformers import UNIMOLMHeadModel, UNIMOTokenizer
 from paddlenlp.utils.log import logger
 
 
@@ -70,19 +68,19 @@ def do_predict(args):
             paddle.static.InputSpec(shape=[None, None], dtype="int64"),
             # attention_mask
             paddle.static.InputSpec(shape=[None, 1, None, None],
-                                    dtype="float64"),
+                                    dtype="float32"),
             # seq_len
             paddle.static.InputSpec(shape=[None], dtype="int64"),
             args.max_dec_len,
             args.min_dec_len,
             args.topk,
             args.topp,
-            args.num_beams,  # num_beams. Used for beam_search. 
+            args.num_beams,  # num_beams. Used for beam_search.
             args.decoding_strategy,
             tokenizer.cls_token_id,  # cls/bos
             tokenizer.mask_token_id,  # mask/eos
             tokenizer.pad_token_id,  # pad
-            args.diversity_rate,  # diversity rate. Used for beam search. 
+            args.diversity_rate,  # diversity rate. Used for beam search.
             args.temperature,
             args.num_return_sequences,
             args.length_penalty,