small update

2024-03-18 23:39:49 +09:00 · 275f249709
commit 275f249709
parent c16761e289
2 changed files with 2 additions and 2 deletions
--- a/generate_data/tutorial.md
+++ b/generate_data/tutorial.md
@@ -118,7 +118,7 @@
 * 首先使用`check.py`进行数据检查。在进行数据集整合之前，我们要检查生成的数据是否存在格式错误，类型不符合等情况。
 * 然后使用`merge_json.py`将所有的json（或者使用`merge_jsonl.py`将所有的jsonl）文件整合为一个总的json文件。

-#### Case 2: 使用`python qwen_gen_data_NoBash.py`或者`python zhipuai_gen_data.py`
+#### Case 2: 使用改进的生成保存方法：`python qwen_gen_data_NoBash.py`或者`python zhipuai_gen_data.py`

 在这种情况下，我们需要在使用两种改进的生成方法生成多轮对话后，将`{data_ai}`文件夹下所有`{area}`子文件夹中的所有`{emotion}.jsonl`文件合并为`{data_ai}_final_merge.json`文件。

--- a/generate_data/tutorial_EN.md
+++ b/generate_data/tutorial_EN.md
@@ -119,7 +119,7 @@ Then, all `area` values are traversed, followed by different `emotion` values fo
 * First, use `check.py` to check the data. Before integrating the dataset, we need to check whether the generated data has format errors or type mismatches.
 * Then, use `merge_json.py` to consolidate all json files (or use `merge_jsonl.py` to consolidate all jsonl files) into one overall json file.

-#### **Case 2**: Using `python qwen_gen_data_NoBash.py` or `python zhipuai_gen_data.py`
+#### **Case 2**: Using improved generation method: `python qwen_gen_data_NoBash.py` or `python zhipuai_gen_data.py` 

 In this case, we need to merge all `{emotion}.jsonl` files in all `{area}` subfolders under the `{data_ai}` folder into `{data_ai}_final_merge.json` after we use two improved generation methods to generate multi-round conversations.
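
Note on the Case 2 merge step: the context line above describes collecting every `{emotion}.jsonl` file from every `{area}` subfolder of `{data_ai}` into a single `{data_ai}_final_merge.json`. A minimal sketch of that step might look like the following. The function name `merge_jsonl_tree` is hypothetical and not part of this repository. The sketch also assumes that each non-empty line holds one JSON object, that the merged file is a JSON list, and that it is written next to the data folder; the repository's own merge script may differ on any of these points.

```python
import json
from pathlib import Path


def merge_jsonl_tree(data_ai_dir):
    """Merge every {area}/{emotion}.jsonl under data_ai_dir into one JSON list.

    Hypothetical helper for illustration; not the repository's merge script.
    """
    root = Path(data_ai_dir)
    merged = []
    # Sort the paths so the output order is the same on every run.
    for jsonl_path in sorted(root.glob("*/*.jsonl")):
        with jsonl_path.open(encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if line:  # skip blank lines
                    merged.append(json.loads(line))
    # Assumed output location: alongside the data folder itself.
    out_path = root.parent / f"{root.name}_final_merge.json"
    with out_path.open("w", encoding="utf-8") as fh:
        # ensure_ascii=False keeps Chinese text readable in the output file.
        json.dump(merged, fh, ensure_ascii=False, indent=4)
    return out_path, len(merged)
```

Calling `merge_jsonl_tree("data_ai")` would return the output path and the number of merged records. A malformed line raises `json.JSONDecodeError`, so running a format check first (as in Case 1) is still advisable.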