small update

HongCheng 2024-03-18 23:39:49 +09:00
parent c16761e289
commit 275f249709
2 changed files with 2 additions and 2 deletions

@@ -118,7 +118,7 @@
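* First, use `check.py` to check the data. Before integrating the dataset, we need to check whether the generated data has format errors or type mismatches.
* Then, use `merge_json.py` to consolidate all json files (or use `merge_jsonl.py` to consolidate all jsonl files) into one overall json file.
-#### Case 2: Using `python qwen_gen_data_NoBash.py` or `python zhipuai_gen_data.py`
+#### Case 2: Using the improved generate-and-save method: `python qwen_gen_data_NoBash.py` or `python zhipuai_gen_data.py`
In this case, after generating multi-turn conversations with the two improved generation methods, we need to merge all `{emotion}.jsonl` files in all `{area}` subfolders under the `{data_ai}` folder into a `{data_ai}_final_merge.json` file.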

@@ -119,7 +119,7 @@ Then, all `area` values are traversed, followed by different `emotion` values fo
* First, use `check.py` to check the data. Before integrating the dataset, we need to check whether the generated data has format errors or type mismatches.
* Then, use `merge_json.py` to consolidate all json files (or use `merge_jsonl.py` to consolidate all jsonl files) into one overall json file.
-#### **Case 2**: Using `python qwen_gen_data_NoBash.py` or `python zhipuai_gen_data.py`
+#### **Case 2**: Using improved generation method: `python qwen_gen_data_NoBash.py` or `python zhipuai_gen_data.py`
In this case, we need to merge all `{emotion}.jsonl` files in all `{area}` subfolders under the `{data_ai}` folder into `{data_ai}_final_merge.json` after we use two improved generation methods to generate multi-round conversations.
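
The Case 2 merge described above can be sketched roughly as follows. This is a minimal illustration, not the repository's own merge script: the function name `merge_jsonl_tree` is hypothetical, and it assumes a `{data_ai}/{area}/{emotion}.jsonl` layout with one JSON object per line and an output file written next to the `{data_ai}` folder.

```python
import json
from pathlib import Path


def merge_jsonl_tree(data_ai_dir, output_path=None):
    """Merge every {area}/{emotion}.jsonl under data_ai_dir into one JSON list.

    Hypothetical helper: the default output location (next to the
    data_ai folder) is an assumption, not taken from the README.
    """
    root = Path(data_ai_dir)
    if output_path is None:
        output_path = root.parent / f"{root.name}_final_merge.json"

    merged = []
    # sorted() keeps the merge order deterministic across runs.
    for jsonl_file in sorted(root.glob("*/*.jsonl")):
        with jsonl_file.open(encoding="utf-8") as fh:
            for line_no, line in enumerate(fh, start=1):
                line = line.strip()
                if not line:
                    continue  # tolerate blank lines between records
                try:
                    merged.append(json.loads(line))
                except json.JSONDecodeError as exc:
                    # Report the exact file and line of the malformed record.
                    raise ValueError(f"{jsonl_file}:{line_no}: invalid JSON") from exc

    # ensure_ascii=False keeps Chinese dialogue text readable in the output.
    with Path(output_path).open("w", encoding="utf-8") as fh:
        json.dump(merged, fh, ensure_ascii=False, indent=2)
    return merged
```

Calling `merge_jsonl_tree("qwen")`, for example, would read `qwen/*/*.jsonl` and write `qwen_final_merge.json` in the parent directory; raising on the first malformed line is a design choice here, so that format errors surface before the merged file is used.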