Update README.md

This commit is contained in:
MING_X 2024-05-09 02:18:07 +08:00
parent 6c870d350d
commit bee025ca0f
12 changed files with 34 additions and 47 deletions


@ -172,12 +172,12 @@
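- [🔗Framework Diagram](#框架图)
- [Contents](#目录)
- [Pre-development Configuration Requirements](#开发前的配置要求)
- [**User Guide**](#使用指南)
- [User Guide](#使用指南)
- [🍪Quick start](#快速体验)
- [📌Data Construction](#数据构建)
- [🎨Fine-tuning Guide](#微调指南)
- [🔧Deployment Guide](#部署指南)
- [⚙RAG (Retrieval Augmented Generation) Pipeline](#rag检索增强生成pipeline)
- [⚙RAG (Retrieval Augmented Generation)](#rag检索增强生成)
- [Frameworks Used](#使用到的框架)
- [How to participate in this project](#如何参与本项目)
- [Authors (in no particular order)](#作者排名不分先后)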
@ -192,7 +192,7 @@
- Hardware: A100 40G (only for InternLM2_7B_chat + qlora fine-tuning + deepspeed zero2 optimization)
###### **User Guide**
###### User Guide
1. Clone the repo
@ -211,7 +211,8 @@ git clone https://github.com/SmartFlowAI/EmoLLM.git
### 🍪Quick start
- See [Quick Start](docs/quick_start.md)
- See [Quick Start](quick_start/quick_start.md)
- Hands-on start: [Baby EmoLLM](quick_start/Baby_EmoLLM.ipynb)
### 📌Data Construction
@ -229,9 +230,9 @@ git clone https://github.com/SmartFlowAI/EmoLLM.git
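- Demo deployment: see the [deployment guide](demo/README.md)
- Quantized deployment based on [LMDeploy](https://github.com/InternLM/lmdeploy/): see [deploy](./deploy/lmdeploy.md)
### ⚙RAG (Retrieval Augmented Generation) Pipeline
### ⚙RAG (Retrieval Augmented Generation)
- See [RAG](./rag/)
- See [RAG](rag/README.md)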
<details>
<summary>Additional Details</summary>
@ -307,11 +308,10 @@ git clone https://github.com/SmartFlowAI/EmoLLM.git
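### Special Thanks
- [Sanbu](https://github.com/sanbuphy)
- [Shanghai Artificial Intelligence Laboratory](https://www.shlab.org.cn/)
- [Wenxing (assistant)](https://github.com/vansin)
- [Bloom up (WeChat Official Account promotion)](https://mp.weixin.qq.com/s/78lrRl2tlXEKUfElnkVx4A)
- [Wenxing (InternLM assistant)](https://github.com/vansin)
- Abu (M.A. in Psychology, Peking University)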
- [Sanbu](https://github.com/sanbuphy)
- [HatBoy](https://github.com/hatboy)
<!-- links -->


@ -173,12 +173,12 @@ The Model aims to fully understand and promote the mental health of individuals,
- [Roadmap](#roadmap)
- [Contents](#contents)
- [Pre-development Configuration Requirements.](#pre-development-configuration-requirements)
- [**User Guide**](#user-guide)
- [User Guide](#user-guide)
- [🍪Quick start](#quick-start)
- [📌Data Construction](#data-construction)
- [🎨Fine-tuning Guide](#fine-tuning-guide)
- [🔧Deployment Guide](#deployment-guide)
- [⚙RAG (Retrieval Augmented Generation) Pipeline](#rag-retrieval-augmented-generation-pipeline)
- [⚙RAG (Retrieval Augmented Generation)](#rag-retrieval-augmented-generation)
- [Frameworks Used](#frameworks-used)
- [How to participate in this project](#how-to-participate-in-this-project)
- [Version control](#version-control)
@ -193,7 +193,7 @@ The Model aims to fully understand and promote the mental health of individuals,
- A100 40G (specifically for InternLM2_7B_chat + qlora fine-tuning + deepspeed zero2 optimization)
###### **User Guide**
###### User Guide
1. Clone the repo
@ -211,7 +211,8 @@ git clone https://github.com/SmartFlowAI/EmoLLM.git
### 🍪Quick start
- Please read [Quick Start](docs/quick_start_EN.md) to see.
- Please read [Quick Start](quick_start/quick_start_EN.md) to see.
- Quick coding: [Baby EmoLLM](quick_start/Baby_EmoLLM.ipynb)
### 📌Data Construction
@ -228,9 +229,9 @@ For details, see the [fine-tuning guide](xtuner_config/README_EN.md)
- Demo deployment: see [deployment guide](./demo/README_EN.md) for details.
- Quantitative deployment based on [LMDeploy](https://github.com/InternLM/lmdeploy/): see [deploy](./deploy/lmdeploy_EN.md)
### ⚙RAG (Retrieval Augmented Generation) Pipeline
### ⚙RAG (Retrieval Augmented Generation)
- See [RAG](./rag/)
- See [RAG](rag/README_EN.md)
<details>
<summary>Additional Details</summary>
@ -297,11 +298,10 @@ The project is licensed under the MIT License. Please refer to the details
### Acknowledgments
- [Sanbu](https://github.com/sanbuphy)
- [Shanghai Artificial Intelligence Laboratory](https://www.shlab.org.cn/)
- [Vanin](https://github.com/vansin)
- [Bloom up (WeChat Official Account Promotion)](https://mp.weixin.qq.com/s/78lrRl2tlXEKUfElnkVx4A)
- Abu (M.A. in Psychology, Peking University)
- [Vansin](https://github.com/vansin)
- A.bu (M.A. in Psychology, Peking University)
- [Sanbuphy](https://github.com/sanbuphy)
- [HatBoy](https://github.com/hatboy)
<!-- links -->


@ -1,21 +0,0 @@
MIT License
Copyright (c) 2024 SmartFlowAI
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


@ -2,7 +2,7 @@
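* By purpose, the datasets fall into two types: **General** and **Role-play**
* By format, the data falls into two types: **QA** and **Conversation**
* Summary: General (**6 datasets**), Role-play (**5 datasets**)
* Summary: General (**8 datasets**), Role-play (**5 datasets**)
## Dataset types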


@ -2,7 +2,7 @@
* Category of dataset: **General** and **Role-play**
* Type of data: **QA** and **Conversation**
* Summary: General(**6 datasets**), Role-play(**5 datasets**)
* Summary: General(**8 datasets**), Role-play(**5 datasets**)
## Category
* **General**: generic dataset, including psychological Knowledge, counseling technology, etc.


@ -1,8 +1,15 @@
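## There are two .py files: Book_QA_process_Step_1.py and Book_QA_process_Step_2.py
# Book_QA_process
There are two Python files: Book_QA_process_Step_1.py and Book_QA_process_Step_2.py
### Book_QA_process_Step_1.py
This script converts the QA-pair jsonl data we generated into json format
* This script converts the QA-pair jsonl data we generated into json format
### Book_QA_process_Step_2.py
This script converts the json data produced in step 1 into a format usable for instruction fine-tuning and adds the system prompt
* This script converts the json data produced in step 1 into a format usable for instruction fine-tuning and adds the system prompt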
```json
{
"conversation": [
{
@ -11,4 +18,5 @@
"output": "Answer"
}
]
}
}
```
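
The two steps described above can be sketched roughly as follows. This is a minimal illustration, not the code in Book_QA_process_Step_1.py or Book_QA_process_Step_2.py: the function names, the `question`/`answer` keys of the input records, and the `system`/`input` field names are assumptions; only `conversation` and `output` appear in the format shown above.

```python
import json


def read_jsonl(path):
    """Step 1 (sketch): read one JSON object per non-empty line into a list."""
    records = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records


def to_conversations(qa_pairs, system_prompt):
    """Step 2 (sketch): wrap each QA pair in a single-turn conversation.

    The "question"/"answer" input keys and the "system"/"input" output
    keys are hypothetical; adjust them to the real data.
    """
    return [
        {
            "conversation": [
                {
                    "system": system_prompt,
                    "input": pair["question"],
                    "output": pair["answer"],
                }
            ]
        }
        for pair in qa_pairs
    ]


def save_json(data, path):
    """Write the converted records as a single JSON array, keeping non-ASCII text readable."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=4)
```

A typical call would chain the three helpers, for example `save_json(to_conversations(read_jsonl("qa.jsonl"), "..."), "qa_sft.json")`, where both file names and the system prompt text are placeholders.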


@ -2,7 +2,7 @@ import json
# Open the JSON file and read its contents
file_name = 'ruozhiba_raw.jsonl'
file_name = '../ruozhiba_raw.jsonl'
# with open(f'data/{file_name}', 'r', encoding='utf-8') as file:
# data = json.load(file)