From 275f2497099122dc293f3131367de4d5e91bb925 Mon Sep 17 00:00:00 2001
From: HongCheng
Date: Mon, 18 Mar 2024 23:39:49 +0900
Subject: [PATCH] small update

---
 generate_data/tutorial.md    | 2 +-
 generate_data/tutorial_EN.md | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/generate_data/tutorial.md b/generate_data/tutorial.md
index f7af989..454895a 100644
--- a/generate_data/tutorial.md
+++ b/generate_data/tutorial.md
@@ -118,7 +118,7 @@
 
 * 首先使用`check.py`进行数据检查。在进行数据集整合之前,我们要检查生成的数据是否存在格式错误,类型不符合等情况。
 * 然后使用`merge_json.py`将所有的json(或者使用`merge_jsonl.py`将所有的jsonl)文件整合为一个总的json文件。
 
-#### Case 2: 使用`python qwen_gen_data_NoBash.py`或者`python zhipuai_gen_data.py`
+#### Case 2: 使用改进的生成保存方法:`python qwen_gen_data_NoBash.py`或者`python zhipuai_gen_data.py`
 
 在这种情况下,我们需要在使用两种改进的生成方法生成多轮对话后,将`{data_ai}`文件夹下所有`{area}`子文件夹中的所有`{emotion}.jsonl`文件合并为`{data_ai}_final_merge.json`文件。
 
diff --git a/generate_data/tutorial_EN.md b/generate_data/tutorial_EN.md
index 85acf33..fdd5d69 100644
--- a/generate_data/tutorial_EN.md
+++ b/generate_data/tutorial_EN.md
@@ -119,7 +119,7 @@ Then, all `area` values are traversed, followed by different `emotion` values fo
 
 * First, use `check.py` to check the data. Before integrating the dataset, we need to check whether the generated data has format errors or type mismatches.
 * Then, use `merge_json.py` to consolidate all json files (or use `merge_jsonl.py` to consolidate all jsonl files) into one overall json file.
 
-#### **Case 2**: Using `python qwen_gen_data_NoBash.py` or `python zhipuai_gen_data.py`
+#### **Case 2**: Using improved generation method: `python qwen_gen_data_NoBash.py` or `python zhipuai_gen_data.py`
 
 In this case, we need to merge all `{emotion}.jsonl` files in all `{area}` subfolders under the `{data_ai}` folder into `{data_ai}_final_merge.json` after we use two improved generation methods to generate multi-round conversations.