From 89dea4826a10f969c6cd91aeec62cabaee43b255 Mon Sep 17 00:00:00 2001
From: HongCheng
Date: Sat, 20 Apr 2024 13:44:46 +0900
Subject: [PATCH] Correct example dataset multi_turn_dataset_2, add more description, move processing file
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 datasets/{ => processed}/deduplicate.py | 0
 xtuner_config/README_llama3_8b_instruct_qlora_alpaca_e3_M.md | 4 ++--
 2 files changed, 2 insertions(+), 2 deletions(-)
 rename datasets/{ => processed}/deduplicate.py (100%)

diff --git a/datasets/deduplicate.py b/datasets/processed/deduplicate.py
similarity index 100%
rename from datasets/deduplicate.py
rename to datasets/processed/deduplicate.py
diff --git a/xtuner_config/README_llama3_8b_instruct_qlora_alpaca_e3_M.md b/xtuner_config/README_llama3_8b_instruct_qlora_alpaca_e3_M.md
index 7f986fc..c532ec6 100644
--- a/xtuner_config/README_llama3_8b_instruct_qlora_alpaca_e3_M.md
+++ b/xtuner_config/README_llama3_8b_instruct_qlora_alpaca_e3_M.md
@@ -84,7 +84,7 @@ pip install -e '.[all]'
 
 ### Modify the configuration file
 
-Here we can follow [README_internlm2_7b_base_qlora.md](https://link.zhihu.com/?target=https%3A//github.com/SmartFlowAI/EmoLLM/blob/main/xtuner_config/README_internlm2_7b_base_qlora.md) from [EmoLLM](https://link.zhihu.com/?target=https%3A//github.com/SmartFlowAI/EmoLLM) to make the changes.
+Here we can follow [README_internlm2_7b_base_qlora.md](xtuner_config/README_internlm2_7b_base_qlora.md) from [EmoLLM](https://link.zhihu.com/?target=https%3A//github.com/SmartFlowAI/EmoLLM) to make the changes.
 
 The main changes are the model path `pretrained_model_name_or_path` and the chat template `prompt_template`: set them to the path of the downloaded Llama model, `Meta-Llama-3-8B-Instruct`, and to the modified chat template, `llama3_chatM`, respectively.
 ```python
@@ -164,7 +164,7 @@ SYSTEM = "你由EmoLLM团队打造的中文领域心理健康助手, 是一个
 
 For an introduction to the datasets, see [README_internlm2_7b_base_qlora.md](https://link.zhihu.com/?target=https%3A//github.com/SmartFlowAI/EmoLLM/blob/main/xtuner_config/README_internlm2_7b_base_qlora.md) and [datasets](https://link.zhihu.com/?target=https%3A//github.com/SmartFlowAI/EmoLLM/tree/main/datasets) in [EmoLLM](https://link.zhihu.com/?target=https%3A//github.com/SmartFlowAI/EmoLLM/).
 
-The training data comes from [single_turn_dataset_2.json](datasets/single_turn_dataset_2.json) and is processed into multi-turn conversation form, for example:
+The training data is exactly the same as the dataset used in [README_internlm2_7b_base_qlora.md](xtuner_config/README_internlm2_7b_base_qlora.md). Users can try training with only [multi_turn_dataset_2.json](datasets/multi_turn_dataset_2.json) plus the self-cognition dataset (to be updated), or apply the processing functions in the [processed](datasets\processed) folder for additional processing. The final training data is in conversation form (either multi-turn or single-turn), for example:
 ```python
 [