Dev (#239)

2024-05-09 02:23:23 +08:00 · ad3a1ce58b
commit ad3a1ce58b
parent 6c870d350d 91ec1b57d6
12 changed files with 34 additions and 47 deletions
--- a/README.md
+++ b/README.md
@ -172,12 +172,12 @@
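  - [🔗Framework Diagram](#框架图)
  - [Contents](#目录)
          - [Pre-development Configuration Requirements](#开发前的配置要求)
-          - [**User Guide**](#使用指南)
+          - [User Guide](#使用指南)
    - [🍪Quick Start](#快速体验)
    - [📌Data Construction](#数据构建)
    - [🎨Fine-tuning Guide](#微调指南)
    - [🔧Deployment Guide](#部署指南)
-    - [⚙RAG (Retrieval Augmented Generation) Pipeline](#rag检索增强生成pipeline)
+    - [⚙RAG (Retrieval Augmented Generation)](#rag检索增强生成)
    - [Frameworks Used](#使用到的框架)
      - [How to Participate in This Project](#如何参与本项目)
    - [Authors (in no particular order)](#作者排名不分先后)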
@ -192,7 +192,7 @@
 - Hardware: A100 40G (only for InternLM2_7B_chat + qlora fine-tuning + deepspeed zero2 optimization)
-###### **User Guide**
+###### User Guide
 1. Clone the repo
@ -211,7 +211,8 @@ git clone https://github.com/SmartFlowAI/EmoLLM.git
 ### 🍪Quick Start
- See [Quick Start](docs/quick_start.md)
+- See [Quick Start](quick_start/quick_start.md)
 - Hands-on introduction: [Baby EmoLLM](quick_start/Baby_EmoLLM.ipynb)
 ### 📌Data Construction
@ -229,9 +230,9 @@ git clone https://github.com/SmartFlowAI/EmoLLM.git
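 - Demo deployment: see the [deployment guide](demo/README.md)
 - Quantized deployment based on [LMDeploy](https://github.com/InternLM/lmdeploy/): see [deploy](./deploy/lmdeploy.md)
-### ⚙RAG (Retrieval Augmented Generation) Pipeline
+### ⚙RAG (Retrieval Augmented Generation)
- See [RAG](./rag/)
+- See [RAG](rag/README.md)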
 <details>
 <summary>More details</summary>
@ -307,11 +308,10 @@ git clone https://github.com/SmartFlowAI/EmoLLM.git
 ### Special Thanks
 - [Sanbu](https://github.com/sanbuphy)
 - [Shanghai Artificial Intelligence Laboratory](https://www.shlab.org.cn/)
- [闻星 (guru, assistant)](https://github.com/vansin)
+- [闻星 (Puyu assistant)](https://github.com/vansin)
 - [扫地升 (WeChat official account promotion)](https://mp.weixin.qq.com/s/78lrRl2tlXEKUfElnkVx4A)
 - 阿布 (M.A. in Psychology, Peking University)
 - [Sanbu](https://github.com/sanbuphy)
 - [HatBoy](https://github.com/hatboy)
 <!-- links -->
--- a/README_EN.md
+++ b/README_EN.md
@ -173,12 +173,12 @@ The Model aims to fully understand and promote the mental health of individuals,
  - [Roadmap](#roadmap)
  - [Contents](#contents)
          - [Pre-development Configuration Requirements.](#pre-development-configuration-requirements)
-          - [**User Guide**](#user-guide)
+          - [User Guide](#user-guide)
    - [🍪Quick start](#quick-start)
    - [📌Data Construction](#data-construction)
    - [🎨Fine-tuning Guide](#fine-tuning-guide)
    - [🔧Deployment Guide](#deployment-guide)
-    - [⚙RAG (Retrieval Augmented Generation) Pipeline](#rag-retrieval-augmented-generation-pipeline)
+    - [⚙RAG (Retrieval Augmented Generation)](#rag-retrieval-augmented-generation)
    - [Frameworks Used](#frameworks-used)
      - [How to participate in this project](#how-to-participate-in-this-project)
    - [Version control](#version-control)
@ -193,7 +193,7 @@ The Model aims to fully understand and promote the mental health of individuals,
 - A100 40G (specifically for InternLM2_7B_chat + qlora fine-tuning + deepspeed zero2 optimization)
-###### **User Guide**
+###### User Guide
 1. Clone the repo
@ -211,7 +211,8 @@ git clone https://github.com/SmartFlowAI/EmoLLM.git
 ### 🍪Quick start
- Please read [Quick Start](docs/quick_start_EN.md) to see.
+- Please read [Quick Start](quick_start/quick_start_EN.md) to see.
 - Quick coding: [Baby EmoLLM](quick_start/Baby_EmoLLM.ipynb)
 ### 📌Data Construction
@ -228,9 +229,9 @@ For details, see the [fine-tuning guide](xtuner_config/README_EN.md)
 - Demo deployment: see [deployment guide](./demo/README_EN.md) for details.
 - Quantitative deployment based on [LMDeploy](https://github.com/InternLM/lmdeploy/): see [deploy](./deploy/lmdeploy_EN.md)
-### ⚙RAG (Retrieval Augmented Generation) Pipeline
+### ⚙RAG (Retrieval Augmented Generation)
- See [RAG](./rag/)
+- See [RAG](rag/README_EN.md)
 <details>
 <summary>Additional Details</summary>
@ -297,11 +298,10 @@ The project is licensed under the MIT License. Please refer to the details
 ### Acknowledgments
 - [Sanbu](https://github.com/sanbuphy)
 - [Shanghai Artificial Intelligence Laboratory](https://www.shlab.org.cn/)
- [Vanin](https://github.com/vansin)
+- [Vansin](https://github.com/vansin)
- [Bloom up (WeChat Official Account Promotion)](https://mp.weixin.qq.com/s/78lrRl2tlXEKUfElnkVx4A)
+- A.bu (M.A. in Psychology, Peking University)
- Abu (M.A. in Psychology, Peking University)
+- [Sanbuphy](https://github.com/sanbuphy)
 - [HatBoy](https://github.com/hatboy)
 <!-- links -->
--- a/datasets/LICENSE
+++ b/datasets/LICENSE
@ -1,21 +0,0 @@
-MIT License
-Copyright (c) 2024 SmartFlowAI
-Permission is hereby granted, free of charge, to any person obtaining a copy
-of this software and associated documentation files (the "Software"), to deal
-in the Software without restriction, including without limitation the rights
-to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
-copies of the Software, and to permit persons to whom the Software is
-furnished to do so, subject to the following conditions:
-The above copyright notice and this permission notice shall be included in all
-copies or substantial portions of the Software.
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
-AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
-LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
-OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
-SOFTWARE.
--- a/datasets/README.md
+++ b/datasets/README.md
@ -2,7 +2,7 @@
 * Datasets fall into two categories by purpose: **General** and **Role-play**
 * Data comes in two formats: **QA** and **Conversation**
-* Summary: General (**6 datasets**); Role-play (**5 datasets**)
+* Summary: General (**8 datasets**); Role-play (**5 datasets**)
 ## Dataset categories
--- a/datasets/README_EN.md
+++ b/datasets/README_EN.md
@ -2,7 +2,7 @@
 * Category of dataset: **General** and **Role-play**
 * Type of data: **QA** and **Conversation**
-* Summary: General(**6 datasets**), Role-play(**5 datasets**)
+* Summary: General(**8 datasets**), Role-play(**5 datasets**)
 ## Category
 * **General**: generic dataset, including psychological Knowledge, counseling technology, etc.
--- a/datasets/processed/Book_QA_Process.md
+++ b/datasets/processed/Book_QA_Process.md
@ -1,8 +1,15 @@
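-## There are two .py files in total: Book_QA_process_Step_1.py and Book_QA_process_Step_2.py
+# Book_QA_process
 There are two Python files: Book_QA_process_Step_1.py and Book_QA_process_Step_2.py
 ### Book_QA_process_Step_1.py
-    This script converts the QA-pair data we generated from JSONL to JSON format
+
 * This script converts the QA-pair data we generated from JSONL to JSON format
 ### Book_QA_process_Step_2.py
-    This script converts the JSON data produced in step 1 into a format usable for instruction fine-tuning and adds a system field, i.e.:
+* This script converts the JSON data produced in step 1 into a format usable for instruction fine-tuning and adds a system field, i.e.: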
  ```json
    {
        "conversation": [
            {
@ -12,3 +19,4 @@
            }
        ]
    }
 ```
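
For reviewers who want a feel for the two steps described in Book_QA_Process.md, here is a minimal sketch. It is not the code of Book_QA_process_Step_1.py or Book_QA_process_Step_2.py, which this diff does not show. The record keys `question` and `answer`, the output keys `input` and `output`, and the placeholder system prompt are all assumptions made for illustration; only the outer `conversation` list and the added `system` field come from the README text.

```python
import json

# Hypothetical placeholder; the project's real system prompt is not shown in this diff.
SYSTEM_PROMPT = "You are a helpful assistant."


def parse_jsonl_lines(lines):
    """Step 1 (sketch): turn JSONL lines (one JSON object per line) into a list."""
    records = []
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            records.append(json.loads(line))
    return records


def to_conversations(records, system=SYSTEM_PROMPT):
    """Step 2 (sketch): wrap each QA pair in a single-turn conversation with a system field.

    The keys "question", "answer", "input" and "output" are assumed names.
    """
    return [
        {
            "conversation": [
                {"system": system, "input": r["question"], "output": r["answer"]}
            ]
        }
        for r in records
    ]


if __name__ == "__main__":
    sample = ['{"question": "What is CBT?", "answer": "A form of psychotherapy."}']
    converted = to_conversations(parse_jsonl_lines(sample))
    print(json.dumps(converted, ensure_ascii=False, indent=4))
```

Keeping the parsing and the reshaping in separate functions mirrors the two-script split described above, so either half can be swapped for the project's real logic.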
--- a/datasets/processed/ruozhiba_raw_data_process.py
+++ b/datasets/processed/ruozhiba_raw_data_process.py
@ -2,7 +2,7 @@ import json
 # Open the JSON file and read its contents
-file_name = 'ruozhiba_raw.jsonl' 
+file_name = '../ruozhiba_raw.jsonl' 
 # with open(f'data/{file_name}', 'r', encoding='utf-8') as file:
 #     data = json.load(file)
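
Review note: the new value `'../ruozhiba_raw.jsonl'` is a relative path, so it resolves against the current working directory, not against the script's own folder. The sketch below shows one possible way to make the lookup independent of where the script is launched, and to read the file line by line (a `.jsonl` file with several objects cannot be parsed by a single `json.load`). The helper names `resolve_data_path` and `load_jsonl` are hypothetical and do not appear in the repository.

```python
import json
from pathlib import Path


def resolve_data_path(script_file, relative_name):
    """Resolve relative_name against the folder containing script_file (hypothetical helper)."""
    return (Path(script_file).resolve().parent / relative_name).resolve()


def load_jsonl(path):
    """Read a JSONL file: one JSON object per non-empty line (hypothetical helper)."""
    records = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # ignore blank lines
                records.append(json.loads(line))
    return records


# Possible usage inside the script (assumes the data file sits one level above it):
# data = load_jsonl(resolve_data_path(__file__, "../ruozhiba_raw.jsonl"))
```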
--- a/datasets/processed/split_dataset.py
+++ b/datasets/processed/split_dataset.py
--- a/datasets/processed/split_shuffle.py
+++ b/datasets/processed/split_shuffle.py
--- a/quick_start/Baby_EmoLLM.ipynb
+++ b/quick_start/Baby_EmoLLM.ipynb
--- a/quick_start/quick_start.md
+++ b/quick_start/quick_start.md
--- a/quick_start/quick_start_EN.md
+++ b/quick_start/quick_start_EN.md