commit edaac7e256 (Yicong, 2024-04-11 10:29:27 +08:00)
20 changed files with 76356 additions and 811 deletions
@@ -48,7 +48,7 @@
| :-------------------: | :------: | :---: |
| InternLM2_7B_chat | QLORA | |
| InternLM2_7B_chat | full fine-tuning | |
| InternLM2_7B_base | QLORA | [internlm2_7b_base_qlora_e10_M_1e4_32_64.py](./xtuner_config/internlm2_7b_base_qlora_e10_M_1e4_32_64.py) |
| InternLM2_1_8B_chat | full fine-tuning | |
| InternLM2_20B_chat | LORA | |
| Qwen_7b_chat | QLORA | |
@@ -104,9 +104,10 @@
</table>

### 🎇Recent Updates

- 【2024.4.2】Released [Mother-like Therapist](https://huggingface.co/brycewang2018/EmoLLM-mother/tree/main) on Huggingface
- 【2024.3.25】Released [Daddy-like Boyfriend Counselor](https://aistudio.baidu.com/community/app/68787) on the Baidu PaddlePaddle AI Studio platform
- 【2024.3.24】Released the **InternLM2-Base-7B QLoRA fine-tuned model** on the **OpenXLab** and **ModelScope** platforms; see [**InternLM2-Base-7B QLoRA**](./xtuner_config/README_internlm2_7b_base_qlora.md) for details
- 【2024.3.12】Released [aiwei](https://aistudio.baidu.com/community/app/63335) on the Baidu PaddlePaddle AI Studio platform
- 【2024.3.11】**EmoLLM V2.0 improves on EmoLLM V1.0 across the board and surpasses role-playing ChatGPT on counseling tasks!** [Try EmoLLM V2.0](https://openxlab.org.cn/apps/detail/Farewell1/EmoLLMV2.0); updated the [dataset statistics and details](./datasets/) and the [roadmap](./assets/Roadmap_ZH.png)
- 【2024.3.9】Added concurrency to speed up [QA pair generation](./scripts/qa_generation/) and the [RAG pipeline](./rag/)
@@ -156,7 +157,7 @@
<img src="assets/Roadmap_ZH.png" alt="Roadmap_ZH">
</a>

### 🔗Framework Diagram

<p align="center">
<a href="https://github.com/SmartFlowAI/EmoLLM/">
@@ -169,14 +170,15 @@
- [🎇Recent Updates](#最近更新)
- [🏆Honors](#荣誉栏)
- [🎯Roadmap](#路线图)
- [🔗Framework Diagram](#框架图)
- [Contents](#目录)
- [Pre-development Configuration Requirements](#开发前的配置要求)
- [**User Guide**](#使用指南)
  - [🍪Quick Start](#快速体验)
  - [📌Data Construction](#数据构建)
  - [🎨Fine-tuning Guide](#微调指南)
  - [🔧Deployment Guide](#部署指南)
  - [⚙RAG (Retrieval-Augmented Generation) Pipeline](#rag检索增强生成pipeline)
- [Frameworks Used](#使用到的框架)
- [How to Participate in This Project](#如何参与本项目)
- [Authors (in no particular order)](#作者排名不分先后)
@@ -200,28 +202,35 @@ git clone https://github.com/SmartFlowAI/EmoLLM.git
```
2. Read in order, or jump to the sections you are interested in:
   - [Quick Start](#快速体验)
   - [Data Construction](#数据构建)
   - [Fine-tuning Guide](#微调指南)
   - [Deployment Guide](#部署指南)
   - [RAG](#rag检索增强生成pipeline)
   - See more details

### 🍪Quick Start

- Please read [Quick Start](docs/quick_start.md)

### 📌Data Construction

- Please read the [Data Construction Guide](generate_data/tutorial.md)
- The dataset used for fine-tuning is available at [datasets](datasets/data.json)

### 🎨Fine-tuning Guide

See the [Fine-tuning Guide](xtuner_config/README.md) for details

### 🔧Deployment Guide

- For Demo deployment, see the [Deployment Guide](demo/README.md)
- For quantized deployment based on [LMDeploy](https://github.com/InternLM/lmdeploy/), see [deploy](./deploy/lmdeploy.md)

### RAG (Retrieval-Augmented Generation) Pipeline

- See [RAG](./rag/)
@@ -304,6 +313,7 @@ git clone https://github.com/SmartFlowAI/EmoLLM.git
- [Wenxing (assistant)](https://github.com/vansin)
- [Saodisheng (WeChat Official Account promotion)](https://mp.weixin.qq.com/s/78lrRl2tlXEKUfElnkVx4A)
- Abu (M.A. in Psychology, Peking University)
- [HatBoy](https://github.com/hatboy)

<!-- links -->
@@ -50,7 +50,7 @@
| :-------------------: | :--------------: | :---: |
| InternLM2_7B_chat | QLORA | |
| InternLM2_7B_chat | full fine-tuning | |
| InternLM2_7B_base | QLORA | [internlm2_7b_base_qlora_e10_M_1e4_32_64.py](./xtuner_config/internlm2_7b_base_qlora_e10_M_1e4_32_64.py) |
| InternLM2_1_8B_chat | full fine-tuning | |
| InternLM2_20B_chat | LORA | |
| Qwen_7b_chat | QLORA | |
@@ -109,7 +109,7 @@ The Model aims to fully understand and promote the mental health of individuals,
### Recent Updates

- 【2024.3.25】 [Mother-like Therapist](https://huggingface.co/brycewang2018/EmoLLM-mother/tree/main) is released on Huggingface
- 【2024.3.25】 [Daddy-like Boy-Friend](https://aistudio.baidu.com/community/app/68787) is released on the Baidu PaddlePaddle AI Studio platform
- 【2024.3.24】 The **InternLM2-Base-7B QLoRA fine-tuned model** has been released on the **OpenXLab** and **ModelScope** platforms. For more details, please refer to [**InternLM2-Base-7B QLoRA**](./xtuner_config/README_internlm2_7b_base_qlora.md).
- 【2024.3.12】 [aiwei](https://aistudio.baidu.com/community/app/63335) is released on the Baidu PaddlePaddle AI Studio platform
- 【2024.3.11】 **EmoLLM V2.0 is greatly improved in all scores compared to EmoLLM V1.0, and surpasses the performance of role-playing ChatGPT on counseling tasks!** [Click to experience EmoLLM V2.0](https://openxlab.org.cn/apps/detail/Farewell1/EmoLLMV2.0); updated [dataset statistics and details](./datasets/) and the [Roadmap](./assets/Roadmap_ZH.png)
- 【2024.3.9】 Added concurrency acceleration for [QA pair generation](./scripts/qa_generation/) and the [RAG pipeline](./rag/)
@@ -171,11 +171,11 @@ The Model aims to fully understand and promote the mental health of individuals,
- [Contents](#contents)
- [Pre-development Configuration Requirements](#pre-development-configuration-requirements)
- [**User Guide**](#user-guide)
  - [🍪Quick Start](#quick-start)
  - [📌Data Construction](#data-construction)
  - [🎨Fine-tuning Guide](#fine-tuning-guide)
  - [🔧Deployment Guide](#deployment-guide)
  - [RAG (Retrieval Augmented Generation) Pipeline](#rag-retrieval-augmented-generation-pipeline)
- [Frameworks Used](#frameworks-used)
- [How to participate in this project](#how-to-participate-in-this-project)
- [Version control](#version-control)
@@ -199,41 +199,33 @@ git clone https://github.com/SmartFlowAI/EmoLLM.git
```
2. Read in sequence, or read the sections you're interested in:
   - [Quick Start](#quick-start)
   - [Data Construction](#data-construction)
   - [Fine-tuning Guide](#fine-tuning-guide)
   - [Deployment Guide](#deployment-guide)
   - [RAG](#rag-retrieval-augmented-generation-pipeline)
   - View more details

### 🍪Quick Start

- Please read [Quick Start](docs/quick_start_EN.md).

### 📌Data Construction

- Please read the [Data Construction Guide](generate_data/tutorial_EN.md) for reference.
- The dataset used for this fine-tuning can be found at [datasets](datasets/data.json)

### 🎨Fine-tuning Guide

For details, see the [fine-tuning guide](xtuner_config/README_EN.md)

### 🔧Deployment Guide

- Demo deployment: see the [deployment guide](./demo/README_EN.md) for details.
- Quantized deployment based on [LMDeploy](https://github.com/InternLM/lmdeploy/): see [deploy](./deploy/lmdeploy_EN.md)

### RAG (Retrieval Augmented Generation) Pipeline

- See [RAG](./rag/)
@@ -307,6 +299,7 @@ The project is licensed under the MIT License. Please refer to the details
- [Vanin](https://github.com/vansin)
- [Bloom up (WeChat Official Account Promotion)](https://mp.weixin.qq.com/s/78lrRl2tlXEKUfElnkVx4A)
- Abu (M.A. in Psychology, Peking University)
- [HatBoy](https://github.com/hatboy)

<!-- links -->
app.py
@@ -1,3 +1,3 @@
import os
os.system('streamlit run web_internlm2.py --server.address=0.0.0.0 --server.port 7860')
# os.system('streamlit run web_demo-aiwei.py --server.address=0.0.0.0 --server.port 7860')
assets/model.png (new binary file, 296 KiB; not shown)
@@ -2,7 +2,7 @@
* Datasets fall into two categories by purpose: **General** and **Role-play**
* Data falls into two types by format: **QA** and **Conversation**
* Summary: General (**6 datasets**), Role-play (**5 datasets**)

## Dataset Categories
@@ -19,32 +19,36 @@
| Category | Dataset | Type | Total |
| :---------: | :-------------------: | :----------: | :-----: |
| *General* | data | Conversation | 5,600+ |
| *General* | data_pro | Conversation | 36,500+ |
| *General* | multi_turn_dataset_1 | Conversation | 36,000+ |
| *General* | multi_turn_dataset_2 | Conversation | 27,000+ |
| *General* | single_turn_dataset_1 | QA | 14,000+ |
| *General* | single_turn_dataset_2 | QA | 18,300+ |
| *Role-play* | aiwei | Conversation | 4,000+ |
| *Role-play* | SoulStar | QA | 11,200+ |
| *Role-play* | tiangou | Conversation | 3,900+ |
| *Role-play* | mother | Conversation | 40,300+ |
| *Role-play* | scientist | Conversation | 28,400+ |
| …… | …… | …… | …… |
## Dataset Sources

### **General**

* Dataset `data` is from this project
* Dataset `data_pro` is from this project
* Dataset `multi_turn_dataset_1` is from [Smile](https://github.com/qiuhuachuan/smile)
* Dataset `multi_turn_dataset_2` is from [CPsyCounD](https://github.com/CAS-SIAT-XinHai/CPsyCoun)
* Dataset `single_turn_dataset_1` is from this project
* Dataset `single_turn_dataset_2` is from this project

### **Role-play**

* Dataset `aiwei` is from this project
* Dataset `tiangou` is from this project
* Dataset `SoulStar` is from [SoulStar](https://github.com/Nobody-ML/SoulStar)
* Dataset `mother` is from this project
* Dataset `scientist` is from this project

## Dataset Deduplication
@@ -2,7 +2,7 @@
* Category of dataset: **General** and **Role-play**
* Type of data: **QA** and **Conversation**
* Summary: General (**6 datasets**), Role-play (**5 datasets**)

## Category

* **General**: generic dataset, including psychological knowledge, counseling techniques, etc.
@@ -17,14 +17,16 @@
| Category | Dataset | Type | Total |
| :---------: | :-------------------: | :----------: | :-----: |
| *General* | data | Conversation | 5,600+ |
| *General* | data_pro | Conversation | 36,500+ |
| *General* | multi_turn_dataset_1 | Conversation | 36,000+ |
| *General* | multi_turn_dataset_2 | Conversation | 27,000+ |
| *General* | single_turn_dataset_1 | QA | 14,000+ |
| *General* | single_turn_dataset_2 | QA | 18,300+ |
| *Role-play* | aiwei | Conversation | 4,000+ |
| *Role-play* | SoulStar | QA | 11,200+ |
| *Role-play* | tiangou | Conversation | 3,900+ |
| *Role-play* | mother | Conversation | 40,300+ |
| *Role-play* | scientist | Conversation | 28,400+ |
| …… | …… | …… | …… |
@@ -41,6 +43,8 @@
* dataset `aiwei` from this repo
* dataset `tiangou` from this repo
* dataset `SoulStar` from [SoulStar](https://github.com/Nobody-ML/SoulStar)
* dataset `mother` from this repo
* dataset `scientist` from this repo
**Dataset Deduplication**

Combining absolute matching with fuzzy matching (Simhash) deduplicates the dataset and thereby improves the effectiveness of fine-tuning. While keeping dataset quality high, the risk of losing important data to incorrect matches can be reduced by adjusting the matching threshold.
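The exact-plus-fuzzy check described above can be sketched as follows. This is a minimal illustration, not the project's actual implementation: the MD5-based token hashing, the 64-bit fingerprint width, and the Hamming-distance threshold of 3 are all assumed values for demonstration.

```python
import hashlib


def simhash(text: str, bits: int = 64) -> int:
    """Compute a Simhash fingerprint: each token votes on every bit."""
    weights = [0] * bits
    for token in text.split():
        # Hash each token to a 64-bit integer (MD5 prefix, an illustrative choice).
        h = int.from_bytes(hashlib.md5(token.encode("utf-8")).digest()[:8], "big")
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1
    # A bit is set if the weighted vote for it is positive.
    return sum(1 << i for i in range(bits) if weights[i] > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def is_duplicate(a: str, b: str, threshold: int = 3) -> bool:
    """Absolute match, or fuzzy match when fingerprints are within the threshold."""
    return a == b or hamming(simhash(a), simhash(b)) <= threshold
```

Raising the threshold catches more near-duplicates but increases the chance of discarding distinct samples, which is the trade-off the paragraph above refers to.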
datasets/mother_v1.json (new file, 75,451 lines; diff suppressed because it is too large)
@@ -1,44 +1,7 @@
# EmoLLM Deployment Guide

## Local Deployment

- See [Quick Start](../docs/quick_start.md) for details

## Deploy on OpenXLab
@@ -1,44 +1,7 @@
# Deploying Guide for EmoLLM

## Local Deployment

- Please read [Quick Start](../docs/quick_start_EN.md).

## Deploy on OpenXLab
docs/quick_start.md (new file, 37 lines)
@@ -0,0 +1,37 @@
### 1. Deployment Environment

- Operating system: Ubuntu 22.04.4 LTS
- CPU: Intel(R) Xeon(R) CPU E5-2650, 32 GB RAM (online GPU server)
- GPU: NVIDIA RTX 4060 Ti 16 GB (NVIDIA-SMI 535.104.05, Driver Version: 535.104.05, CUDA Version: 12.2)
- Python 3.11.5

### 2. Default Deployment Steps

- 1. Clone the code, or download it manually and place it on the server:
```
git clone https://github.com/SmartFlowAI/EmoLLM.git
```
- 2. Install the Python dependencies:
```
cd EmoLLM
pip install -r requirements.txt
```
- 3. Download the model files, either manually or by running the download_model.py script.
  - 3.1. To download automatically, run the script:
```
python download_model.py <model_repo>
# For web_demo-aiwei.py, the corresponding model repo is ajupyter/EmoLLM_aiwei:
python download_model.py ajupyter/EmoLLM_aiwei
# For web_internlm2.py, the corresponding model repo is jujimeizuo/EmoLLM_Model:
python download_model.py jujimeizuo/EmoLLM_Model
# The script can also download other models automatically. It currently only supports
# models hosted on OpenXLab; models on other platforms must be downloaded manually.
# After a successful download, a new model directory appears under EmoLLM,
# i.e. the model file directory.
```
  - 3.2. To download manually, fetch the complete model directory from OpenXLab, Huggingface, or another platform and put all the files under the `EmoLLM/model` directory. Note that a packaged download of the model directory does not include the LFS files (e.g. pytorch_model-00001-of-00008.bin); each complete LFS file must be downloaded individually.
![model](../assets/model.png)
- 4. Run the app. app.py is only used to call web_demo-aiwei.py or web_internlm2.py; download the model files for whichever script you want to run, comment out the other call in app.py, and then run:
```
python app.py
```
- 5. After app.py starts, open http://0.0.0.0:7860 in a browser to access the model's web page. You can edit app.py to change the web page's port. If deploying on a server, configure local port mapping.
- 6. Switching models: EmoLLM provides several open-source models, uploaded to the OpenXLab and Huggingface platforms, with roles such as the [Daddy-like Boyfriend Counselor](https://openxlab.org.cn/models/detail/chg0901/EmoLLM_Daddy-like_BF), [Mother-like Therapist](https://huggingface.co/brycewang2018/EmoLLM-mother/tree/main), and [Gentle Psychiatrist aiwei](https://openxlab.org.cn/models/detail/ajupyter/EmoLLM_aiwei), and multiple checkpoints such as EmoLLM_internlm2_7b_full and EmoLLM-InternLM7B-base-10e. Repeat steps 3 and 4 to download the chosen model (manually or automatically) into `EmoLLM/model`, then run and try it.
docs/quick_start_EN.md (new file, 37 lines)
@ -0,0 +1,37 @@
### 1. Deployment Environment
- Operating system: Ubuntu 22.04.4 LTS
- CPU: Intel(R) Xeon(R) CPU E5-2650, 32 GB RAM
- Graphics card: NVIDIA RTX 4060 Ti 16 GB (NVIDIA-SMI 535.104.05, Driver Version: 535.104.05, CUDA Version: 12.2)
- Python 3.11.5
### 2. Default Deployment Steps
- 1. Clone the code or manually download the code and place it on the server:
```
git clone https://github.com/SmartFlowAI/EmoLLM.git
```
- 2. Install Python dependencies:
```
cd EmoLLM
pip install -r requirements.txt
```
- 3. Download the model files, either manually or by running the download_model.py script.
- 3.1. To download the model files automatically, run the script:
```
python download_model.py <model_repo>
# For web_demo-aiwei.py, the corresponding model repo is ajupyter/EmoLLM_aiwei:
python download_model.py ajupyter/EmoLLM_aiwei
# For web_internlm2.py, the corresponding model repo is jujimeizuo/EmoLLM_Model:
python download_model.py jujimeizuo/EmoLLM_Model
# The script can also be used to download other models automatically. It currently only
# supports models hosted on OpenXLab; models from other platforms need to be downloaded
# manually. After a successful download, a new model directory appears under the EmoLLM
# directory, i.e. the model file directory.
```
- 3.2. To download the model file directory manually, go to OpenXLab, Huggingface, or another platform, download the complete model directory, and put all the files in the `EmoLLM/model` directory. Note that the LFS files (e.g. pytorch_model-00001-of-00008.bin) are not included when the model directory is packaged for download, so you need to download each full LFS file individually.
![model](../assets/model.png)
- 4. Run the app. app.py is only used to call web_demo-aiwei.py or web_internlm2.py; download the model files for whichever script you want to run, comment out the other call in app.py, and then run the script:
```
python app.py
```
- 5. After running app.py, access the model's web page in a browser at http://0.0.0.0:7860. You can modify app.py to change the web page's port and experience the model normally. If you are deploying on a server, you need to configure local port mapping.
- 6. Using other models: EmoLLM offers several open-source models, uploaded to OpenXLab, Huggingface, and other platforms. There are roles such as the [Daddy-like Boyfriend Counselor](https://openxlab.org.cn/models/detail/chg0901/EmoLLM_Daddy-like_BF), [Mother-like Therapist](https://huggingface.co/brycewang2018/EmoLLM-mother/tree/main), and [Gentle Psychiatrist aiwei](https://openxlab.org.cn/models/detail/ajupyter/EmoLLM_aiwei), and several checkpoints to choose from, such as EmoLLM_internlm2_7b_full and EmoLLM-InternLM7B-base-10e. Repeat steps 3 and 4 to download the chosen model (manually or automatically) into the `EmoLLM/model` directory, then run and try it.
download_model.py (new file, 63 lines)
@@ -0,0 +1,63 @@
"""
Automatically download model files from OpenXLab.
Currently only OpenXLab is supported; model files on other platforms
need to be downloaded manually.
"""
import os
import shutil
import sys
import zipfile

import requests
from openxlab.model import download

if len(sys.argv) == 2:
    model_repo = sys.argv[1]
else:
    print("Usage: python download_model.py <model_repo>")
    print("Example: python download_model.py jujimeizuo/EmoLLM_Model")
    sys.exit(1)

dir_name = "model"
if os.path.isdir(dir_name):
    print("model directory already exists")
    sys.exit(0)

download_url = "https://code.openxlab.org.cn/api/v1/repos/{}/archive/main.zip".format(model_repo)
output_filename = "model_main.zip"

# Download the repository archive.
response = requests.get(download_url, stream=True)
if response.status_code == 200:
    with open(output_filename, "wb") as f:
        for chunk in response.iter_content(chunk_size=1024):
            if chunk:  # filter out keep-alive chunks
                f.write(chunk)
    print("Successfully downloaded model archive")
else:
    print(f"Failed to download the model file. HTTP status code: {response.status_code}")
    sys.exit(1)

if not os.path.isfile(output_filename):
    raise FileNotFoundError(f"ZIP file '{output_filename}' not found in the current directory.")

# Extract the archive and move its top-level directory to ./model.
temp_dir = f".{os.sep}temp_{os.path.splitext(os.path.basename(output_filename))[0]}"
os.makedirs(temp_dir, exist_ok=True)
with zipfile.ZipFile(output_filename, "r") as zip_ref:
    zip_ref.extractall(temp_dir)
top_level_dir = next(os.walk(temp_dir))[1][0]
source_dir = os.path.join(temp_dir, top_level_dir)
destination_dir = os.path.join(os.getcwd(), dir_name)
shutil.move(source_dir, destination_dir)
os.rmdir(temp_dir)
os.remove(output_filename)

# Fetch the large weight files, which the archive download does not include.
download(model_repo=model_repo, output=dir_name)
print("Model bin file download complete")
@@ -20,4 +20,4 @@
|-------------------|-----------------------|-------------------|-----------------|---------|
| InternLM2_7B_chat_qlora | 1.32 | 2.20 | 2.10 | 1.00 |
| InternLM2_7B_chat_full | 1.40 | 2.45 | 2.24 | 1.00 |
| InternLM2_20B_chat_lora | 1.42 | 2.39 | 2.22 | 1.00 |
@@ -19,3 +19,5 @@
| Model | Comprehensiveness | Professionalism | Authenticity | Safety |
|-------------------|-----------------------|-------------------|-----------------|---------|
| InternLM2_7B_chat_qlora | 1.32 | 2.20 | 2.10 | 1.00 |
| InternLM2_7B_chat_full | 1.40 | 2.45 | 2.24 | 1.00 |
| InternLM2_20B_chat_lora | 1.42 | 2.39 | 2.22 | 1.00 |
@@ -8,3 +8,5 @@ transformers_stream_generator==0.0.4
openxlab
tiktoken
einops
oss2
requests
@@ -9,6 +9,7 @@ Please run with the command `streamlit run path/to/web_demo.py --server.address=
Using `python path/to/web_demo.py` may cause unknown problems.
"""
import copy
import os
import warnings
from dataclasses import asdict, dataclass
from typing import Callable, List, Optional

@@ -24,8 +25,10 @@ from openxlab.model import download

logger = logging.get_logger(__name__)

if not os.path.isdir("model"):
    print("[ERROR] model directory not found")
    exit(0)

@dataclass
class GenerationConfig:
@@ -9,6 +9,7 @@ Please run with the command `streamlit run path/to/web_demo.py --server.address=
Using `python path/to/web_demo.py` may cause unknown problems.
"""
import copy
import os
import warnings
from dataclasses import asdict, dataclass
from typing import Callable, List, Optional

@@ -24,8 +25,9 @@ from openxlab.model import download

logger = logging.get_logger(__name__)

if not os.path.isdir("model"):
    print("[ERROR] model directory not found")
    exit(0)

@dataclass
class GenerationConfig:
@@ -2,25 +2,37 @@
## Base Model and Configuration Files

- Building on the [**internlm2_7b_chat_qlora_e3** model configuration file](./internlm2_7b_chat_qlora_e3.py) provided by the XTuner project and on the [EmoLLM fine-tuning guide](./README.md), this project created and updated a configuration for QLoRA fine-tuning of the **InternLM2_7B_base model** on the [EmoLLM general dataset](../datasets/README.md); for details, see [**internlm2_7b_base_qlora_e10_M_1e4_32_64.py**](./internlm2_7b_base_qlora_e10_M_1e4_32_64.py).
- So that users can reproduce and fine-tune on their own hardware, EmoLLM also provides other configuration files for different setups:
  - [internlm2_7b_base_qlora_e10_b8_16_32.py](./internlm2_7b_base_qlora_e10_b8_16_32.py)
  - [internlm2_7b_base_qlora_e3_M_1e4_32_64.py](./internlm2_7b_base_qlora_e3_M_1e4_32_64.py)

## Released Models and Training Epochs

- Because a merged dataset was used, we trained the selected InternLM2_7B_base model for **10 epochs**. Readers can stop training and select checkpoints based on the training output and loss curve, or evaluate the model with more rigorous methods.
- When releasing the InternLM2_7B_base QLoRA fine-tuned model, we provided two different weight versions on both OpenXLab and ModelScope for users to try and test. More professional evaluation results will be published soon; stay tuned.
  - **OpenXLab**
    - [5 epoch model](https://openxlab.org.cn/models/detail/chg0901/EmoLLM-InternLM7B-base)
    - [10 epoch model](https://openxlab.org.cn/models/detail/chg0901/EmoLLM-InternLM7B-base-10e)
  - **ModelScope**
    - [5 epoch model](https://www.modelscope.cn/models/chg0901/EmoLLM-InternLM7B-base/files)
    - [10 epoch model](https://www.modelscope.cn/models/chg0901/EmoLLM-InternLM7B-base-10e/files)
- The EmoLLM team has evaluated the QLoRA fine-tuned InternLM2_7B_base models (the 5 epoch and 10 epoch versions) on **general metrics**. As the table below shows, the 10 epoch QLoRA model already surpasses the other models on general metrics; results on professional counseling metrics will be updated soon. For more evaluation details, see [General_evaluation.md](../evaluate/General_evaluation.md) and the [evaluation README](../evaluate/README.md).
| Model | ROUGE-1 | ROUGE-2 | ROUGE-L | BLEU-1 | BLEU-2 | BLEU-3 | BLEU-4 |
|----------|---------|---------|---------|---------|---------|---------|---------|
| Qwen1_5-0_5B-chat | 27.23% | 8.55% | 17.05% | 26.65% | 13.11% | 7.19% | 4.05% |
| InternLM2_7B_chat_qlora | 37.86% | 15.23% | 24.34% | 39.71% | 22.66% | 14.26% | 9.21% |
| InternLM2_7B_chat_full | 32.45% | 10.82% | 20.17% | 30.48% | 15.67% | 8.84% | 5.02% |
| InternLM2_7B_base_qlora_5epoch | 41.94% | 20.21% | 29.67% | 42.98% | 27.07% | 19.33% | 14.62% |
| **InternLM2_7B_base_qlora_10epoch** | **43.47%** | **22.06%** | **31.4%** | **44.81%** | **29.15%** | **21.44%** | **16.72%** |
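The ROUGE-1 scores in the table measure unigram overlap between a model's response and a reference response. A minimal sketch of a ROUGE-1 F1 computation is shown below; the whitespace tokenization is an illustrative assumption (the actual evaluation presumably uses a tokenizer suited to Chinese text).

```python
from collections import Counter


def rouge_1_f(candidate: str, reference: str) -> float:
    """ROUGE-1 F1: harmonic mean of unigram precision and recall."""
    cand = Counter(candidate.split())
    ref = Counter(reference.split())
    # Clipped overlap: each unigram counts at most as often as it appears in both.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

ROUGE-2 and ROUGE-L follow the same precision/recall structure over bigrams and the longest common subsequence, respectively.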
### Hyperparameter Settings

For the full training config, see the [**`internlm2_7b_base_qlora_e10_M_1e4_32_64.py`** configuration file](./internlm2_7b_base_qlora_e10_M_1e4_32_64.py); only the key hyperparameters, or those we adjusted, are listed here.

```python
prompt_template = PROMPT_TEMPLATE.internlm2_chat