commit
4a36ff428a
6
.gitignore
vendored
@ -6,6 +6,11 @@ data/
|
||||
pdf/
|
||||
.idea/
|
||||
|
||||
*.jsonl
|
||||
*.json
|
||||
# ./generate_data/*.josnl
|
||||
# ./generate_data/*/*/*.josnl
|
||||
|
||||
# Byte-compiled / optimized / DLL files
|
||||
__pycache__/
|
||||
*.py[cod]
|
||||
@ -169,3 +174,4 @@ cython_debug/
|
||||
# and can be added to the global gitignore or merged into this file. For a more nuclear
|
||||
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
|
||||
#.idea/
|
||||
|
||||
|
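The `*.jsonl` and `*.json` patterns in this hunk contain no slash, so Git applies them at every directory depth. A minimal sketch for checking that locally, assuming `git` is installed; the temporary repository and the file names (`sample.jsonl`, `config.json`, `notes.md`) are made up for illustration:

```sh
# Sketch: see which paths the two slash-free patterns ignore.
# All paths below are hypothetical and live in a throwaway repository.
set -eu
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q .
printf '%s\n' '*.jsonl' '*.json' > .gitignore
mkdir -p generate_data/a/b
touch generate_data/a/b/sample.jsonl config.json notes.md

# git check-ignore prints only those of the given paths that match an ignore rule.
ignored="$(git check-ignore generate_data/a/b/sample.jsonl config.json notes.md || true)"
printf '%s\n' "$ignored"
```

The nested `.jsonl` file and the top-level `.json` file are reported as ignored, while `notes.md` is not.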
583
README.md
@ -1,287 +1,296 @@
|
||||
<div align="center">
|
||||
|
||||
# EmoLLM - Large Language Model for Mental Health
|
||||
|
||||
</div>
|
||||
|
||||
<p align="center">
|
||||
<a href="https://github.com/aJupyter/EmoLLM/">
|
||||
<img src="assets/logo.jpeg" alt="Logo" width="30%">
|
||||
</a>
|
||||
|
||||
<div align="center">
|
||||
|
||||
<!-- PROJECT SHIELDS -->
|
||||
[![Contributors][contributors-shield]][contributors-url]
|
||||
[![Forks][forks-shield]][forks-url]
|
||||
[![Issues][issues-shield]][issues-url]
|
||||
[![OpenXLab_App][OpenXLab_App-image]][OpenXLab_App-url]
|
||||
[![OpenXLab_Model][OpenXLab_Model-image]][OpenXLab_Model-url]
|
||||
[![MIT License][license-shield]][license-url]
|
||||
[![Stargazers][stars-shield]][stars-url]
|
||||
|
||||
</div>
|
||||
|
||||
<h3 align="center">EmoLLM</h3>
|
||||
|
||||
<div align="center">
|
||||
简体中文| <a href="README_EN.md" >English</a>
|
||||
<br />
|
||||
<br />
|
||||
<a href="https://github.com/aJupyter/EmoLLM"><strong>Explore the documentation of this project »</strong></a>
<br />
<br />
<a href="https://openxlab.org.cn/apps/detail/Farewell1/EmoLLMV2.0">EmoLLM 2.0 Demo</a>
·
<a href="https://github.com/aJupyter/EmoLLM/issues">Report a Bug</a>
·
<a href="https://github.com/aJupyter/EmoLLM/issues">Propose a New Feature</a>
|
||||
</div>
|
||||
|
||||
|
||||
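<!-- This README.md is intended for developers -->

**EmoLLM** is a series of large language models for mental health that support the counseling pipeline of **understanding users, supporting users, and helping users**. The models are obtained by instruction fine-tuning of `LLM`s. Stars are welcome~⭐⭐. The `LLM` fine-tuning configurations open-sourced so far are: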
|
||||
|
||||
<div align="center">
|
||||
|
||||
| Model                 | Type             |
| :-------------------: | :--------------: |
| InternLM2_7B_chat     | QLORA            |
| InternLM2_7B_chat     | Full fine-tuning |
| InternLM2_1_8B_chat   | Full fine-tuning |
| InternLM2_20B_chat    | LORA             |
| Qwen_7b_chat          | QLORA            |
| Qwen1_5-0_5B-Chat     | Full fine-tuning |
| Baichuan2_13B_chat    | QLORA            |
| ChatGLM3_6B           | LORA             |
| DeepSeek MoE_16B_chat | QLORA            |
| Mixtral 8x7B_instruct | QLORA            |
| ……                    | ……               |
|
||||
|
||||
</div>
|
||||
|
||||
Everyone is welcome to contribute to this project~
|
||||
|
||||
---
|
||||
|
||||
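The Mental Health Grand Model is a comprehensive concept that aims to fully understand and promote the mental health of individuals, groups, and society as a whole. The model typically includes the following key components:

- Cognitive factors: Involve an individual's thought patterns, belief systems, cognitive biases, and problem-solving abilities. Cognitive factors significantly affect mental health because they shape how individuals interpret and respond to life events.
- Emotional factors: Include emotion regulation, emotional expression, and emotional experience. Emotional health is a crucial part of mental health; it covers how individuals manage and express their emotions and how they recover from negative emotions.
- Behavioral factors: Concern an individual's behavior patterns, habits, and coping strategies, including stress-management techniques, social skills, and self-efficacy, that is, confidence in one's own abilities.
- Social environment: Comprises external factors such as family, work, community, and cultural background, which affect an individual's mental health both directly and indirectly.
- Physical health: Physical and mental health are closely related. Good physical health can promote mental health, and vice versa.
- Psychological resilience: An individual's capacity to recover and adapt in the face of adversity. People with strong psychological resilience recover from challenges more readily and learn and grow from them.
- Prevention and intervention measures: The Mental Health Grand Model also includes strategies for preventing psychological problems and promoting mental health, such as psychological education, counseling, psychotherapy, and social support systems.
- Assessment and diagnostic tools: Promoting mental health effectively requires scientific tools to assess an individual's psychological state and to diagnose possible psychological problems.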
|
||||
|
||||
### 🎇Recent Updates

- 【2024.3.12】 Released [Ai Wei](https://aistudio.baidu.com/community/app/63335) on the Baidu PaddlePaddle platform
- 【2024.3.11】 **EmoLLM V2.0 improves on EmoLLM V1.0 across the board and surpasses Role-playing ChatGPT on psychological counseling tasks!** [Click to try EmoLLM V2.0](https://openxlab.org.cn/apps/detail/Farewell1/EmoLLMV2.0). Updated the [dataset statistics and details](./datasets/) and the [roadmap](./assets/Roadmap_ZH.png)
- 【2024.3.9】 Added concurrency acceleration for [QA pair generation](./scripts/qa_generation/), [RAG pipeline](./rag/)
- 【2024.3.3】 [Open-sourced EmoLLM V2.0, a full fine-tuned version based on InternLM2-7B-chat](https://openxlab.org.cn/models/detail/ajupyter/EmoLLM_internlm2_7b_full), which requires two A100 80G GPUs; updated the professional evaluation, see [evaluate](./evaluate/); updated the PaddleOCR-based PDF-to-txt tool scripts, see [scripts](./scripts/)
- 【2024.2.29】 Updated the objective evaluation calculations, see [evaluate](./evaluate/); updated a series of datasets, see [datasets](./datasets/)
- 【2024.2.27】 Updated the English README and a series of datasets ("licking dog" (舔狗) and single-turn dialogue)
- 【2024.2.23】 Launched `Gentle Lady Psychologist Ai Wei`, based on InternLM2_7B_chat_qlora: [model weights](https://openxlab.org.cn/models/detail/ajupyter/EmoLLM_aiwei), [configuration file](xtuner_config/aiwei-internlm2_chat_7b_qlora.py), [online demo](https://openxlab.org.cn/apps/detail/ajupyter/EmoLLM-aiwei)
- 【2024.2.23】 Updated [several fine-tuning configurations](/xtuner_config/); added [data_pro.json](/datasets/data_pro.json) (larger, covering more scenarios, richer content) and [aiwei.json](/datasets/aiwei.json) (dedicated to the gentle-lady role-play, with emoji); `Gentle Lady Psychologist Ai Wei` is coming soon
- 【2024.2.18】 [Open-sourced a full fine-tuned version based on Qwen1_5-0_5B-Chat](https://www.modelscope.cn/models/aJupyter/EmoLLM_Qwen1_5-0_5B-Chat_full_sft/summary); friends with limited compute can start playing with it~
|
||||
|
||||
<details>
|
||||
<summary>View More</summary>

- 【2024.2.6】 EmoLLM has reached 18.7k downloads on the [**OpenXLab**](https://openxlab.org.cn/models/detail/jujimeizuo/EmoLLM_Model) platform. Everyone is welcome to try it!

<p align="center">
<img src="https://github.com/aJupyter/EmoLLM/assets/62385492/7e931682-c54d-4ded-bc67-79130c68d744" alt="Model downloads">
</p>

- 【2024.2.5】 The project was featured in an article by the WeChat official account **NLP工程化**: [article link](https://mp.weixin.qq.com/s/78lrRl2tlXEKUfElnkVx4A). A shout-out to the author; everyone is welcome to follow!! 🥳🥳

<p align="center">
<img src="https://github.com/aJupyter/EmoLLM/assets/62385492/47868d6a-2e91-4aa9-a630-e594c14295b4" alt="WeChat official account QR code">
</p>

- 【2024.2.3】 [Project promotional video](https://www.bilibili.com/video/BV1N7421N76X/) completed 😊
- 【2024.1.27】 Improved the data construction documentation, fine-tuning guide, deployment guide, README, and other related documents 👏
- 【2024.1.25】 EmoLLM V1.0 has been deployed online https://openxlab.org.cn/apps/detail/jujimeizuo/EmoLLM 😀
|
||||
|
||||
</details>
|
||||
|
||||
### 🎯Roadmap
|
||||
|
||||
<p align="center">
|
||||
<a href="https://github.com/aJupyter/EmoLLM/">
|
||||
<img src="assets/Roadmap_ZH.png" alt="Roadmap_ZH">
|
||||
</a>
|
||||
|
||||
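## Contents

- [EmoLLM - Large Language Model for Mental Health](#emollm---large-language-model-for-mental-health)
- [🎇Recent Updates](#recent-updates)
- [🎯Roadmap](#roadmap)
- [Contents](#contents)
- [Pre-development Configuration Requirements](#pre-development-configuration-requirements)
- [**User Guide**](#user-guide)
- [Data Construction](#data-construction)
- [Fine-tuning Guide](#fine-tuning-guide)
- [Deployment Guide](#deployment-guide)
- [RAG (Retrieval Augmented Generation) Pipeline](#rag-retrieval-augmented-generation-pipeline)
- [Frameworks Used](#frameworks-used)
- [How to participate in this project](#how-to-participate-in-this-project)
- [Authors (in no particular order)](#authors-in-no-particular-order)
- [Copyright Notice](#copyright-notice)
- [Acknowledgments](#acknowledgments)
- [Star History](#star-history)
- [🌟 Contributors](#-contributors)
- [Communication group](#communication-group)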
|
||||
|
||||
###### Pre-development Configuration Requirements

- Hardware: A100 40G (only for InternLM2_7B_chat + QLoRA fine-tuning + DeepSpeed ZeRO-2 optimization)
|
||||
|
||||
###### **User Guide**
|
||||
|
||||
1. Clone the repo
|
||||
|
||||
```sh
|
||||
git clone https://github.com/SmartFlowAI/EmoLLM.git
|
||||
```
|
||||
|
||||
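2. Read the sections in order, or jump to the ones that interest you:
   - [Data Construction](#data-construction)
   - [Fine-tuning Guide](#fine-tuning-guide)
   - [Deployment Guide](#deployment-guide)
   - [RAG](#rag-retrieval-augmented-generation-pipeline)
   - View More Details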
|
||||
|
||||
### Data Construction

- Please read the [Data Construction Guide](generate_data/tutorial.md)

- The dataset used for fine-tuning is at [datasets](datasets/data.json)

### Fine-tuning Guide

For details, see the [Fine-tuning Guide](xtuner_config/README.md)

### Deployment Guide

- Demo deployment: see the [Deployment Guide](demo/README.md)
- Quantized deployment based on [LMDeploy](https://github.com/InternLM/lmdeploy/): see [deploy](./deploy/lmdeploy.md)

### RAG (Retrieval Augmented Generation) Pipeline

- See [RAG](./rag/)
|
||||
|
||||
<details>
|
||||
<summary>More Details</summary>

### Frameworks Used

- [XTuner](https://github.com/InternLM/xtuner): for fine-tuning
- [Transformers](https://github.com/huggingface/transformers)
- [PyTorch](https://pytorch.org/)
- [LMDeploy](https://github.com/InternLM/lmdeploy/): for quantized deployment
- [Streamlit](https://streamlit.io/): for building the demo
- [DeepSpeed](https://github.com/microsoft/DeepSpeed): for parallel training
- …

#### How to participate in this project

Contributions make the open-source community an excellent place to learn, inspire, and create. Any contribution you make is **greatly appreciated**.
|
||||
|
||||
1. Fork the Project
|
||||
2. Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
|
||||
3. Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
|
||||
4. Push to the Branch (`git push origin feature/AmazingFeature`)
|
||||
5. Open a Pull Request
|
||||
|
||||
</details>
|
||||
|
||||
### Authors (in no particular order)

| Username | School/Organization | Remarks | Contributions |
| :----------: | :--------------------: | :-------------------: | :----------: |
| [aJupyter](https://github.com/aJupyter) | Master's student at Nankai University | DataWhale member | Project initiator |
| [jujimeizuo](https://github.com/jujimeizuo) | Master's student at Jiangnan University | | |
| [Smiling-Weeping-zhr](https://github.com/Smiling-Weeping-zhr) | Undergraduate student at Harbin Institute of Technology (Weihai) | | |
| [8baby8](https://github.com/8baby8) | PaddlePaddle Pilot Team regional director | Core developer of the Wenxin large model | |
| [zxazys](https://github.com/zxazys) | Master's student at Nankai University | | |
| [MING-ZCH](https://github.com/MING-ZCH) | Undergraduate student at Huazhong University of Science and Technology | | |
| [JasonLLLLLLLLLLL](https://github.com/JasonLLLLLLLLLLL) | swufe | | |
| [MrCatAI](https://github.com/MrCatAI) | AI mover | | |
| [ZeyuBa](https://github.com/ZeyuBa) | Master's student at the Institute of Automation | | |
| [aiyinyuedejustin](https://github.com/aiyinyuedejustin) | Master's student at the University of Pennsylvania | | |
| [Nobody-ML](https://github.com/Nobody-ML) | Undergraduate student at China University of Petroleum (East China) | | |
| [chg0901](https://github.com/chg0901) | [MiniSora](https://github.com/mini-sora/minisora/) | Main maintainer of MiniSora | Data cleaning, document translation |
| [Mxoder](https://github.com/Mxoder) | Undergraduate student at Beihang University | | |
| [Anooyman](https://github.com/Anooyman) | Master's degree, Nanjing University of Science and Technology | | |
| [Vicky-3021](https://github.com/Vicky-3021) | Incoming master's student at Xidian University | | |
| [SantiagoTOP](https://github.com/santiagoTOP) | Master's student at Taiyuan University of Technology | | |
|
||||
|
||||
### Copyright Notice

This project is licensed under the MIT License. For details, see [LICENSE](https://github.com/SmartFlowAI/EmoLLM/blob/main/LICENSE)

### Citation

If this project has been helpful to your work, please cite it using the following format:
|
||||
|
||||
```bibtex
|
||||
@misc{EmoLLM,
|
||||
title={EmoLLM},
|
||||
author={EmoLLM},
|
||||
url={https://github.com/SmartFlowAI/EmoLLM/},
|
||||
year={2024}
|
||||
}
|
||||
```
|
||||
|
||||
### Acknowledgments

- [Sanbu](https://github.com/sanbuphy)
- [Shanghai Artificial Intelligence Laboratory](https://www.shlab.org.cn/)
- [闻星 (assistant)](https://github.com/vansin)
- [扫地升 (WeChat official account promotion)](https://mp.weixin.qq.com/s/78lrRl2tlXEKUfElnkVx4A)
- 阿布 (Master's in Psychology, Peking University)
|
||||
|
||||
<!-- links -->
|
||||
|
||||
<!-- [linkedin-shield]: https://img.shields.io/badge/-LinkedIn-black.svg?style=flat-square&logo=linkedin&colorB=555 -->
|
||||
|
||||
<!-- [linkedin-url]: https://linkedin.com/in/aJupyter -->
|
||||
|
||||
## Star History
|
||||
|
||||
[](https://star-history.com/#SmartFlowAI/EmoLLM&Date)
|
||||
|
||||
## 🌟 Contributors
|
||||
|
||||
[](https://github.com/SmartFlowAI/EmoLLM/graphs/contributors)
|
||||
|
||||
[your-project-path]: SmartflowAI/EmoLLM
|
||||
[contributors-shield]: https://img.shields.io/github/contributors/SmartflowAI/EmoLLM.svg?style=flat-square
|
||||
[contributors-url]: https://github.com/SmartflowAI/EmoLLM/graphs/contributors
|
||||
[forks-shield]: https://img.shields.io/github/forks/SmartflowAI/EmoLLM.svg?style=flat-square
|
||||
[forks-url]: https://github.com/SmartflowAI/EmoLLM/network/members
|
||||
[stars-shield]: https://img.shields.io/github/stars/SmartflowAI/EmoLLM.svg?style=flat-square
|
||||
[stars-url]: https://github.com/SmartflowAI/EmoLLM/stargazers
|
||||
[issues-shield]: https://img.shields.io/github/issues/SmartflowAI/EmoLLM.svg?style=flat-square
|
||||
[issues-url]: https://github.com/SmartflowAI/EmoLLM/issues
|
||||
[license-shield]: https://img.shields.io/github/license/SmartflowAI/EmoLLM.svg?style=flat-square
|
||||
[license-url]: https://github.com/SmartFlowAI/EmoLLM/blob/main/LICENSE
|
||||
|
||||
[OpenXLab_App-image]: https://cdn-static.openxlab.org.cn/app-center/openxlab_app.svg
|
||||
[OpenXLab_Model-image]: https://cdn-static.openxlab.org.cn/header/openxlab_models.svg
|
||||
[OpenXLab_App-url]: https://openxlab.org.cn/apps/detail/Farewell1/EmoLLMV2.0
|
||||
[OpenXLab_Model-url]: https://openxlab.org.cn/models/detail/ajupyter/EmoLLM_internlm2_7b_full
|
||||
|
||||
|
||||
## Communication group

- If the group invitation has expired, please go to the Issues section
|
||||
|
||||
<p align="center">
|
||||
<img width="30%" src="https://github.com/SmartFlowAI/EmoLLM/assets/62385492/55ecd0aa-4832-4269-ad57-4c26f9aa286b" alt="EmoLLM official communication group">
|
||||
</p>
|
||||
<div align="center">
|
||||
|
||||
# EmoLLM - Large Language Model for Mental Health
|
||||
|
||||
</div>
|
||||
|
||||
<p align="center">
|
||||
<a href="https://github.com/aJupyter/EmoLLM/">
|
||||
<img src="assets/logo.jpeg" alt="Logo" width="30%">
|
||||
</a>
|
||||
|
||||
<div align="center">
|
||||
|
||||
<!-- PROJECT SHIELDS -->
|
||||
[![Contributors][contributors-shield]][contributors-url]
|
||||
[![Forks][forks-shield]][forks-url]
|
||||
[![Issues][issues-shield]][issues-url]
|
||||
[![OpenXLab_App][OpenXLab_App-image]][OpenXLab_App-url]
|
||||
[![OpenXLab_Model][OpenXLab_Model-image]][OpenXLab_Model-url]
|
||||
[![MIT License][license-shield]][license-url]
|
||||
[![Stargazers][stars-shield]][stars-url]
|
||||
|
||||
</div>
|
||||
|
||||
<h3 align="center">EmoLLM</h3>
|
||||
|
||||
<div align="center">
|
||||
简体中文| <a href="README_EN.md" >English</a>
|
||||
<br />
|
||||
<br />
|
||||
<a href="https://github.com/aJupyter/EmoLLM"><strong>Explore the documentation of this project »</strong></a>
<br />
<br />
<a href="https://openxlab.org.cn/apps/detail/Farewell1/EmoLLMV2.0">EmoLLM 2.0 Demo</a>
·
<a href="https://github.com/aJupyter/EmoLLM/issues">Report a Bug</a>
·
<a href="https://github.com/aJupyter/EmoLLM/issues">Propose a New Feature</a>
|
||||
</div>
|
||||
|
||||
|
||||
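<!-- This README.md is intended for developers -->

**EmoLLM** is a series of large language models for mental health that support the counseling pipeline of **understanding users, supporting users, and helping users**. The models are obtained by instruction fine-tuning of `LLM`s. Stars are welcome~⭐⭐. The `LLM` fine-tuning configurations open-sourced so far are: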
|
||||
|
||||
<div align="center">
|
||||
|
||||
| Model                 | Type             |
| :-------------------: | :--------------: |
| InternLM2_7B_chat     | QLORA            |
| InternLM2_7B_chat     | Full fine-tuning |
| InternLM2_1_8B_chat   | Full fine-tuning |
| InternLM2_20B_chat    | LORA             |
| Qwen_7b_chat          | QLORA            |
| Qwen1_5-0_5B-Chat     | Full fine-tuning |
| Baichuan2_13B_chat    | QLORA            |
| ChatGLM3_6B           | LORA             |
| DeepSeek MoE_16B_chat | QLORA            |
| Mixtral 8x7B_instruct | QLORA            |
| ……                    | ……               |
|
||||
|
||||
</div>
|
||||
|
||||
Everyone is welcome to contribute to this project~
|
||||
|
||||
---
|
||||
|
||||
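The Mental Health Grand Model is a comprehensive concept that aims to fully understand and promote the mental health of individuals, groups, and society as a whole. The model typically includes the following key components:

- Cognitive factors: Involve an individual's thought patterns, belief systems, cognitive biases, and problem-solving abilities. Cognitive factors significantly affect mental health because they shape how individuals interpret and respond to life events.
- Emotional factors: Include emotion regulation, emotional expression, and emotional experience. Emotional health is a crucial part of mental health; it covers how individuals manage and express their emotions and how they recover from negative emotions.
- Behavioral factors: Concern an individual's behavior patterns, habits, and coping strategies, including stress-management techniques, social skills, and self-efficacy, that is, confidence in one's own abilities.
- Social environment: Comprises external factors such as family, work, community, and cultural background, which affect an individual's mental health both directly and indirectly.
- Physical health: Physical and mental health are closely related. Good physical health can promote mental health, and vice versa.
- Psychological resilience: An individual's capacity to recover and adapt in the face of adversity. People with strong psychological resilience recover from challenges more readily and learn and grow from them.
- Prevention and intervention measures: The Mental Health Grand Model also includes strategies for preventing psychological problems and promoting mental health, such as psychological education, counseling, psychotherapy, and social support systems.
- Assessment and diagnostic tools: Promoting mental health effectively requires scientific tools to assess an individual's psychological state and to diagnose possible psychological problems.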
|
||||
|
||||
### 🎇Recent Updates

- 【2024.3.12】 Released [Ai Wei](https://aistudio.baidu.com/community/app/63335) on the Baidu PaddlePaddle platform
- 【2024.3.11】 **EmoLLM V2.0 improves on EmoLLM V1.0 across the board and surpasses Role-playing ChatGPT on psychological counseling tasks!** [Click to try EmoLLM V2.0](https://openxlab.org.cn/apps/detail/Farewell1/EmoLLMV2.0). Updated the [dataset statistics and details](./datasets/) and the [roadmap](./assets/Roadmap_ZH.png)
- 【2024.3.9】 Added concurrency acceleration for [QA pair generation](./scripts/qa_generation/), [RAG pipeline](./rag/)
- 【2024.3.3】 [Open-sourced EmoLLM V2.0, a full fine-tuned version based on InternLM2-7B-chat](https://openxlab.org.cn/models/detail/ajupyter/EmoLLM_internlm2_7b_full), which requires two A100 80G GPUs; updated the professional evaluation, see [evaluate](./evaluate/); updated the PaddleOCR-based PDF-to-txt tool scripts, see [scripts](./scripts/)
- 【2024.2.29】 Updated the objective evaluation calculations, see [evaluate](./evaluate/); updated a series of datasets, see [datasets](./datasets/)
- 【2024.2.27】 Updated the English README and a series of datasets ("licking dog" (舔狗) and single-turn dialogue)
- 【2024.2.23】 Launched `Gentle Lady Psychologist Ai Wei`, based on InternLM2_7B_chat_qlora: [model weights](https://openxlab.org.cn/models/detail/ajupyter/EmoLLM_aiwei), [configuration file](xtuner_config/aiwei-internlm2_chat_7b_qlora.py), [online demo](https://openxlab.org.cn/apps/detail/ajupyter/EmoLLM-aiwei)
- 【2024.2.23】 Updated [several fine-tuning configurations](/xtuner_config/); added [data_pro.json](/datasets/data_pro.json) (larger, covering more scenarios, richer content) and [aiwei.json](/datasets/aiwei.json) (dedicated to the gentle-lady role-play, with emoji); `Gentle Lady Psychologist Ai Wei` is coming soon
- 【2024.2.18】 [Open-sourced a full fine-tuned version based on Qwen1_5-0_5B-Chat](https://www.modelscope.cn/models/aJupyter/EmoLLM_Qwen1_5-0_5B-Chat_full_sft/summary); friends with limited compute can start playing with it~
|
||||
|
||||
<details>
|
||||
<summary>View More</summary>

- 【2024.2.6】 EmoLLM has reached 18.7k downloads on the [**OpenXLab**](https://openxlab.org.cn/models/detail/jujimeizuo/EmoLLM_Model) platform. Everyone is welcome to try it!

<p align="center">
<img src="https://github.com/aJupyter/EmoLLM/assets/62385492/7e931682-c54d-4ded-bc67-79130c68d744" alt="Model downloads">
</p>

- 【2024.2.5】 The project was featured in an article by the WeChat official account **NLP工程化**: [article link](https://mp.weixin.qq.com/s/78lrRl2tlXEKUfElnkVx4A). A shout-out to the author; everyone is welcome to follow!! 🥳🥳

<p align="center">
<img src="https://github.com/aJupyter/EmoLLM/assets/62385492/47868d6a-2e91-4aa9-a630-e594c14295b4" alt="WeChat official account QR code">
</p>

- 【2024.2.3】 [Project promotional video](https://www.bilibili.com/video/BV1N7421N76X/) completed 😊
- 【2024.1.27】 Improved the data construction documentation, fine-tuning guide, deployment guide, README, and other related documents 👏
- 【2024.1.25】 EmoLLM V1.0 has been deployed online https://openxlab.org.cn/apps/detail/jujimeizuo/EmoLLM 😀
|
||||
|
||||
</details>
|
||||
|
||||
### 🎯Roadmap
|
||||
|
||||
<p align="center">
|
||||
<a href="https://github.com/aJupyter/EmoLLM/">
|
||||
<img src="assets/Roadmap_ZH.png" alt="Roadmap_ZH">
|
||||
</a>
|
||||
|
||||
### 🎯Framework Diagram
|
||||
|
||||
<p align="center">
|
||||
<a href="https://github.com/aJupyter/EmoLLM/">
|
||||
<img src="assets/框架图.png" alt="Framework diagram">
|
||||
</a>
|
||||
|
||||
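## Contents

- [EmoLLM - Large Language Model for Mental Health](#emollm---large-language-model-for-mental-health)
- [🎇Recent Updates](#recent-updates)
- [🎯Roadmap](#roadmap)
- [🎯Framework Diagram](#framework-diagram)
- [Contents](#contents)
- [Pre-development Configuration Requirements](#pre-development-configuration-requirements)
- [**User Guide**](#user-guide)
- [Data Construction](#data-construction)
- [Fine-tuning Guide](#fine-tuning-guide)
- [Deployment Guide](#deployment-guide)
- [RAG (Retrieval Augmented Generation) Pipeline](#rag-retrieval-augmented-generation-pipeline)
- [Frameworks Used](#frameworks-used)
- [How to participate in this project](#how-to-participate-in-this-project)
- [Authors (in no particular order)](#authors-in-no-particular-order)
- [Copyright Notice](#copyright-notice)
- [Acknowledgments](#acknowledgments)
- [Star History](#star-history)
- [🌟 Contributors](#-contributors)
- [Communication group](#communication-group)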
|
||||
|
||||
###### Pre-development Configuration Requirements

- Hardware: A100 40G (only for InternLM2_7B_chat + QLoRA fine-tuning + DeepSpeed ZeRO-2 optimization)
|
||||
|
||||
###### **User Guide**
|
||||
|
||||
1. Clone the repo
|
||||
|
||||
```sh
|
||||
git clone https://github.com/SmartFlowAI/EmoLLM.git
|
||||
```
|
||||
|
||||
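2. Read the sections in order, or jump to the ones that interest you:
   - [Data Construction](#data-construction)
   - [Fine-tuning Guide](#fine-tuning-guide)
   - [Deployment Guide](#deployment-guide)
   - [RAG](#rag-retrieval-augmented-generation-pipeline)
   - View More Details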
|
||||
|
||||
### Data Construction

- Please read the [Data Construction Guide](generate_data/tutorial.md)

- The dataset used for fine-tuning is at [datasets](datasets/data.json)

### Fine-tuning Guide

For details, see the [Fine-tuning Guide](xtuner_config/README.md)

### Deployment Guide

- Demo deployment: see the [Deployment Guide](demo/README.md)
- Quantized deployment based on [LMDeploy](https://github.com/InternLM/lmdeploy/): see [deploy](./deploy/lmdeploy.md)

### RAG (Retrieval Augmented Generation) Pipeline

- See [RAG](./rag/)
|
||||
|
||||
<details>
|
||||
<summary>More Details</summary>

### Frameworks Used

- [XTuner](https://github.com/InternLM/xtuner): for fine-tuning
- [Transformers](https://github.com/huggingface/transformers)
- [PyTorch](https://pytorch.org/)
- [LMDeploy](https://github.com/InternLM/lmdeploy/): for quantized deployment
- [Streamlit](https://streamlit.io/): for building the demo
- [DeepSpeed](https://github.com/microsoft/DeepSpeed): for parallel training
- …

#### How to participate in this project

Contributions make the open-source community an excellent place to learn, inspire, and create. Any contribution you make is **greatly appreciated**.
|
||||
|
||||
1. Fork the Project
|
||||
2. Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
|
||||
3. Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
|
||||
4. Push to the Branch (`git push origin feature/AmazingFeature`)
|
||||
5. Open a Pull Request
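
Steps 2 to 4 can be rehearsed offline before touching the real repository. The sketch below assumes `git` is installed and uses a local bare repository as a stand-in for your fork; the contributor name, email, and `CHANGES.txt` are placeholders. Forking (step 1) and opening the pull request (step 5) happen in the GitHub web interface and are not shown.

```sh
# Sketch of the branch / commit / push steps against a throwaway remote.
set -eu
work="$(mktemp -d)"
git init -q --bare "$work/origin.git"          # stands in for your fork
git clone -q "$work/origin.git" "$work/EmoLLM"
cd "$work/EmoLLM"
git config user.name "Example Contributor"     # placeholder identity
git config user.email "contributor@example.com"

git checkout -q -b feature/AmazingFeature      # step 2: create the feature branch
echo "demo change" > CHANGES.txt               # placeholder edit
git add CHANGES.txt
git commit -q -m 'Add some AmazingFeature'     # step 3: commit the change
git push -q origin feature/AmazingFeature      # step 4: push the branch

# Confirm that the branch now exists on the remote.
pushed_branch="$(git ls-remote --heads origin feature/AmazingFeature)"
printf '%s\n' "$pushed_branch"
```

Against the real project, `origin` would point at your fork's URL instead of the local bare repository.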
|
||||
|
||||
</details>
|
||||
|
||||
### Authors (in no particular order)

| Username | School/Organization | Remarks | Contributions |
| :----------: | :--------------------: | :-------------------: | :----------: |
| [aJupyter](https://github.com/aJupyter) | Master's student at Nankai University | DataWhale member | Project initiator |
| [jujimeizuo](https://github.com/jujimeizuo) | Master's student at Jiangnan University | | |
| [Smiling-Weeping-zhr](https://github.com/Smiling-Weeping-zhr) | Undergraduate student at Harbin Institute of Technology (Weihai) | | |
| [8baby8](https://github.com/8baby8) | PaddlePaddle Pilot Team regional director | Core developer of the Wenxin large model | |
| [zxazys](https://github.com/zxazys) | Master's student at Nankai University | | |
| [MING-ZCH](https://github.com/MING-ZCH) | Undergraduate student at Huazhong University of Science and Technology | | |
| [JasonLLLLLLLLLLL](https://github.com/JasonLLLLLLLLLLL) | swufe | | |
| [MrCatAI](https://github.com/MrCatAI) | AI mover | | |
| [ZeyuBa](https://github.com/ZeyuBa) | Master's student at the Institute of Automation | | |
| [aiyinyuedejustin](https://github.com/aiyinyuedejustin) | Master's student at the University of Pennsylvania | | |
| [Nobody-ML](https://github.com/Nobody-ML) | Undergraduate student at China University of Petroleum (East China) | | |
| [chg0901](https://github.com/chg0901) | [MiniSora](https://github.com/mini-sora/minisora/) | Main maintainer of MiniSora | Data cleaning, document translation |
| [Mxoder](https://github.com/Mxoder) | Undergraduate student at Beihang University | | |
| [Anooyman](https://github.com/Anooyman) | Master's degree, Nanjing University of Science and Technology | | |
| [Vicky-3021](https://github.com/Vicky-3021) | Incoming master's student at Xidian University | | |
| [SantiagoTOP](https://github.com/santiagoTOP) | Master's student at Taiyuan University of Technology | | |
| [zealot52099](https://github.com/zealot52099) | AI mover | | Data cleaning, RAG |
|
||||
|
||||
### Copyright Notice

This project is licensed under the MIT License. For details, see [LICENSE](https://github.com/SmartFlowAI/EmoLLM/blob/main/LICENSE)

### Citation

If this project has been helpful to your work, please cite it using the following format:
|
||||
|
||||
```bibtex
|
||||
@misc{EmoLLM,
|
||||
title={EmoLLM},
|
||||
author={EmoLLM},
|
||||
url={https://github.com/SmartFlowAI/EmoLLM/},
|
||||
year={2024}
|
||||
}
|
||||
```
|
||||
|
||||
### Acknowledgments

- [Sanbu](https://github.com/sanbuphy)
- [Shanghai Artificial Intelligence Laboratory](https://www.shlab.org.cn/)
- [闻星 (assistant)](https://github.com/vansin)
- [扫地升 (WeChat official account promotion)](https://mp.weixin.qq.com/s/78lrRl2tlXEKUfElnkVx4A)
- 阿布 (Master's in Psychology, Peking University)
|
||||
|
||||
<!-- links -->
|
||||
|
||||
<!-- [linkedin-shield]: https://img.shields.io/badge/-LinkedIn-black.svg?style=flat-square&logo=linkedin&colorB=555 -->
|
||||
|
||||
<!-- [linkedin-url]: https://linkedin.com/in/aJupyter -->
|
||||
|
||||
## Star History
|
||||
|
||||
[](https://star-history.com/#SmartFlowAI/EmoLLM&Date)
|
||||
|
||||
## 🌟 Contributors
|
||||
|
||||
[](https://github.com/SmartFlowAI/EmoLLM/graphs/contributors)
|
||||
|
||||
[your-project-path]: SmartflowAI/EmoLLM
|
||||
[contributors-shield]: https://img.shields.io/github/contributors/SmartflowAI/EmoLLM.svg?style=flat-square
|
||||
[contributors-url]: https://github.com/SmartflowAI/EmoLLM/graphs/contributors
|
||||
[forks-shield]: https://img.shields.io/github/forks/SmartflowAI/EmoLLM.svg?style=flat-square
|
||||
[forks-url]: https://github.com/SmartflowAI/EmoLLM/network/members
|
||||
[stars-shield]: https://img.shields.io/github/stars/SmartflowAI/EmoLLM.svg?style=flat-square
|
||||
[stars-url]: https://github.com/SmartflowAI/EmoLLM/stargazers
|
||||
[issues-shield]: https://img.shields.io/github/issues/SmartflowAI/EmoLLM.svg?style=flat-square
|
||||
[issues-url]: https://github.com/SmartflowAI/EmoLLM/issues
|
||||
[license-shield]: https://img.shields.io/github/license/SmartflowAI/EmoLLM.svg?style=flat-square
|
||||
[license-url]: https://github.com/SmartFlowAI/EmoLLM/blob/main/LICENSE
|
||||
|
||||
[OpenXLab_App-image]: https://cdn-static.openxlab.org.cn/app-center/openxlab_app.svg
|
||||
[OpenXLab_Model-image]: https://cdn-static.openxlab.org.cn/header/openxlab_models.svg
|
||||
[OpenXLab_App-url]: https://openxlab.org.cn/apps/detail/Farewell1/EmoLLMV2.0
|
||||
[OpenXLab_Model-url]: https://openxlab.org.cn/models/detail/ajupyter/EmoLLM_internlm2_7b_full
|
||||
|
||||
|
||||
## Communication group

- If the group invitation has expired, please go to the Issues section
|
||||
|
||||
<p align="center">
|
||||
<img width="30%" src="https://github.com/SmartFlowAI/EmoLLM/assets/62385492/55ecd0aa-4832-4269-ad57-4c26f9aa286b" alt="EmoLLM official communication group">
|
||||
</p>
|
||||
|
600
README_EN.md
@ -1,300 +1,300 @@
|
||||
<div align="center">
|
||||
|
||||
# EmoLLM - Large Language Model for Mental Health
|
||||
|
||||
</div>
|
||||
|
||||
<p align="center">
|
||||
<a href="https://github.com/aJupyter/EmoLLM/">
|
||||
<img src="assets/logo.jpeg" alt="Logo" width="30%">
|
||||
</a>
|
||||
|
||||
<div align="center">
|
||||
|
||||
<!-- PROJECT SHIELDS -->
|
||||
[![Contributors][contributors-shield]][contributors-url]
|
||||
[![Forks][forks-shield]][forks-url]
|
||||
[![Issues][issues-shield]][issues-url]
|
||||
[![OpenXLab_App][OpenXLab_App-image]][OpenXLab_App-url]
|
||||
[![OpenXLab_Model][OpenXLab_Model-image]][OpenXLab_Model-url]
|
||||
[![MIT License][license-shield]][license-url]
|
||||
[![Stargazers][stars-shield]][stars-url]
|
||||
|
||||
</div>
|
||||
|
||||
<h3 align="center">EmoLLM</h3>
|
||||
|
||||
<p align="center">
|
||||
<a href="README.md">简体中文</a> | English
|
||||
<br />
|
||||
<br />
|
||||
<a href="https://github.com/aJupyter/EmoLLM"><strong>Explore the documentation of this project »</strong></a>
|
||||
<br />
|
||||
<br />
|
||||
<a href="https://openxlab.org.cn/apps/detail/Farewell1/EmoLLMV2.0">EmoLLM 2.0 Demo</a>
|
||||
·
|
||||
<a href="https://github.com/aJupyter/EmoLLM/issues">Report a Bug</a>
|
||||
·
|
||||
<a href="https://github.com/aJupyter/EmoLLM/issues">Propose a New Feature</a>
|
||||
</p>
|
||||
|
||||
</p>
|
||||
|
||||
<!-- This README.md is intended for developers -->
|
||||
|
||||
|
||||
**EmoLLM** is a series of large language models designed to understand, support, and help users throughout mental health counseling. The models are obtained by instruction fine-tuning of LLMs. We would really appreciate a star~⭐⭐. The fine-tuning configurations open-sourced so far are as follows:
|
||||
|
||||
<div align="center">
|
||||
|
||||
| Model | Type |
|
||||
| :-------------------: | :------: |
|
||||
| InternLM2_7B_chat | QLORA |
|
||||
| InternLM2_7B_chat | full fine-tuning |
|
||||
| InternLM2_1_8B_chat | full fine-tuning |
|
||||
| InternLM2_20B_chat | LORA |
|
||||
| Qwen_7b_chat | QLORA |
|
||||
| Qwen1_5-0_5B-Chat | full fine-tuning |
|
||||
| Baichuan2_13B_chat | QLORA |
|
||||
| ChatGLM3_6B | LORA |
|
||||
| DeepSeek MoE_16B_chat | QLORA |
|
||||
| Mixtral 8x7B_instruct | QLORA |
|
||||
| …… | …… |
|
||||
|
||||
</div>
|
||||
|
||||
Everyone is welcome to contribute to this project ~
|
||||
|
||||
---
|
||||
|
||||
The Mental Health Grand Model is a comprehensive concept that aims to fully understand and promote the mental health of individuals, groups, and society. This model typically includes the following key components:
|
||||
|
||||
- Cognitive factors: Involving an individual's thought patterns, belief systems, cognitive biases, and problem-solving abilities. Cognitive factors significantly impact mental health as they affect how individuals interpret and respond to life events.
|
||||
- Emotional factors: Including emotion regulation, emotional expression, and emotional experiences. Emotional health is a crucial part of mental health, involving how individuals manage and express their emotions and how they recover from negative emotions.
|
||||
- Behavioral factors: Concerning an individual's behavior patterns, habits, and coping strategies. This includes stress management skills, social skills, and self-efficacy, which is the confidence in one's abilities.
|
||||
- Social environment: Comprising external factors such as family, work, community, and cultural background, which have direct and indirect impacts on an individual's mental health.
|
||||
- Physical health: There is a close relationship between physical and mental health. Good physical health can promote mental health and vice versa.
|
||||
- Psychological resilience: Refers to an individual's ability to recover from adversity and adapt. Those with strong psychological resilience can bounce back from challenges and learn and grow from them.
|
||||
- Prevention and intervention measures: The Mental Health Grand Model also includes strategies for preventing psychological issues and promoting mental health, such as psychological education, counseling, therapy, and social support systems.
|
||||
- Assessment and diagnostic tools: Effective promotion of mental health requires scientific tools to assess individuals' psychological states and diagnose potential psychological issues.
|
||||
### Recent Updates
|
||||
- 【2024.3.12】 Released [aiwei](https://aistudio.baidu.com/community/app/63335) on the Baidu PaddlePaddle platform
|
||||
- 【2024.3.11】 **EmoLLM V2.0 is greatly improved in all scores compared to EmoLLM V1.0. Surpasses the performance of Role-playing ChatGPT on counseling tasks!** [Click to experience EmoLLM V2.0](https://openxlab.org.cn/apps/detail/Farewell1/EmoLLMV2.0), update [dataset statistics and details](./datasets/), [Roadmap](./assets/Roadmap_ZH.png)
|
||||
- 【2024.3.9】 Added concurrency acceleration for [QA pair generation](./scripts/qa_generation/), [RAG pipeline](./rag/)
|
||||
- 【2024.3.3】 [Open-sourced EmoLLM V2.0, a full fine-tuned version based on InternLM2-7B-chat](https://openxlab.org.cn/models/detail/ajupyter/EmoLLM_internlm2_7b_full), which requires two A100 80G GPUs; updated the professional evaluation, see [evaluate](./evaluate/); updated the PaddleOCR-based PDF-to-txt tool scripts, see [scripts](./scripts/).
|
||||
- 【2024.2.29】 Updated objective assessment calculations, see [evaluate](./evaluate/) for details. A series of datasets have also been updated, see [datasets](./datasets/) for details.
|
||||
- 【2024.2.27】 Updated the English README and a series of datasets ("licking dog" (舔狗) and single-turn dialogue)
|
||||
- 【2024.2.23】The "Gentle Lady Psychologist Ai Wei" based on InternLM2_7B_chat_qlora was launched. [Click here to obtain the model weights](https://openxlab.org.cn/models/detail/ajupyter/EmoLLM_aiwei), [configuration file](xtuner_config/aiwei-internlm2_chat_7b_qlora.py), [online experience link](https://openxlab.org.cn/apps/detail/ajupyter/EmoLLM-aiwei)
|
||||
|
||||
- 【2024.2.23】Updated [several fine-tuning configurations](/xtuner_config/), added [data_pro.json](/datasets/data_pro.json) (more quantity, more comprehensive scenarios, richer content) and [aiwei.json](/datasets/aiwei.json) (dedicated to the gentle lady role-play, featuring Emoji expressions), the "Gentle Lady Psychologist Ai Wei" is coming soon.
|
||||
|
||||
- 【2024.2.18】 The full fine-tuned version based on Qwen1_5-0_5B-Chat has been [open-sourced](https://www.modelscope.cn/models/aJupyter/EmoLLM_Qwen1_5-0_5B-Chat_full_sft/summary). Friends with limited computational resources can now dive in and explore it.
|
||||
|
||||
|
||||
<details>
|
||||
<summary>View More</summary>
|
||||
|
||||
- 【2024.2.6】 EmoLLM has reached 18.7k downloads on the [**OpenXLab**](https://openxlab.org.cn/models/detail/jujimeizuo/EmoLLM_Model) platform. Everyone is welcome to try it!
|
||||
|
||||
<p align="center">
|
||||
<img src="https://github.com/aJupyter/EmoLLM/assets/62385492/7e931682-c54d-4ded-bc67-79130c68d744" alt="Model downloads">
|
||||
</p>
|
||||
|
||||
- 【2024.2.5】 The project has been promoted by the official WeChat account NLP Engineering. Here's the [link](https://mp.weixin.qq.com/s/78lrRl2tlXEKUfElnkVx4A) to the article. Welcome everyone to follow!! 🥳🥳
|
||||
|
||||
<p align="center">
|
||||
<img src="https://github.com/aJupyter/EmoLLM/assets/62385492/47868d6a-2e91-4aa9-a630-e594c14295b4" alt="WeChat official account QR code">
|
||||
</p>
|
||||
|
||||
- 【2024.2.3】 [Project Video](https://www.bilibili.com/video/BV1N7421N76X/) at bilibili 😊
|
||||
- 【2024.1.27】 Complete data construction documentation, fine-tuning guide, deployment guide, Readme, and other related documents 👏
|
||||
- 【2024.1.25】 EmoLLM V1.0 has been deployed online https://openxlab.org.cn/apps/detail/jujimeizuo/EmoLLM 😀
|
||||
|
||||
</details>
|
||||
|
||||
### Roadmap
|
||||
|
||||
<p align="center">
|
||||
<a href="https://github.com/aJupyter/EmoLLM/">
|
||||
<img src="assets/Roadmap_EN.png" alt="Roadmap_EN">
|
||||
</a>
|
||||
|
||||
## Contents
|
||||
|
||||
- [EmoLLM - Large Language Model for Mental Health](#emollm---large-language-model-for-mental-health)
|
||||
- [Recent Updates](#recent-updates)
|
||||
- [Roadmap](#roadmap)
|
||||
- [Contents](#contents)
|
||||
- [Pre-development Configuration Requirements.](#pre-development-configuration-requirements)
|
||||
- [**User Guide**](#user-guide)
|
||||
- [File Directory Explanation](#file-directory-explanation)
|
||||
- [Data Construction](#data-construction)
|
||||
- [Fine-tuning Guide](#fine-tuning-guide)
|
||||
- [Deployment Guide](#deployment-guide)
|
||||
- [RAG (Retrieval Augmented Generation) Pipeline](#rag-retrieval-augmented-generation-pipeline)
|
||||
- [Frameworks Used](#frameworks-used)
|
||||
- [How to participate in this project](#how-to-participate-in-this-project)
|
||||
- [Version control](#version-control)
|
||||
- [Authors (in no particular order)](#authors-in-no-particular-order)
|
||||
- [Copyright Notice](#copyright-notice)
|
||||
- [Acknowledgments](#acknowledgments)
|
||||
- [Star History](#star-history)
|
||||
- [🌟 Contributors](#-contributors)
|
||||
- [Communication group](#communication-group)
|
||||
|
||||
###### Pre-development Configuration Requirements.
|
||||
|
||||
- A100 40G (specifically for InternLM2_7B_chat + qlora fine-tuning + deepspeed zero2 optimization)
|
||||
|
||||
###### **User Guide**
|
||||
|
||||
1. Clone the repo
|
||||
|
||||
```sh
|
||||
git clone https://github.com/SmartFlowAI/EmoLLM.git
|
||||
```
|
||||
|
||||
1. Read in sequence or read sections you're interested in:
|
||||
- [File Directory Explanation](#file-directory-explanation)
|
||||
- [Data Construction](#data-construction)
|
||||
- [Fine-tuning Guide](#fine-tuning-guide)
|
||||
- [Deployment Guide](#deployment-guide)
|
||||
- View More Details
|
||||
|
||||
|
||||
|
||||
### File Directory Explanation
|
||||
|
||||
```
|
||||
├─assets: Image Resources
|
||||
├─datasets: Dataset
|
||||
├─demo: demo scripts
|
||||
├─generate_data: Data Generation Guide
|
||||
│ └─xinghuo
|
||||
├─scripts: Some Available Tools
|
||||
└─xtuner_config: Fine-tuning Guide
|
||||
└─images
|
||||
```
|
||||
|
||||
### Data Construction
|
||||
|
||||
- Please read the [Data Construction Guide](generate_data/tutorial.md) for reference.
|
||||
|
||||
- The dataset used for this fine-tuning can be found at [datasets](datasets/data.json)
|
||||
|
||||
### Fine-tuning Guide
|
||||
|
||||
For details, see the [fine-tuning guide](xtuner_config/README.md)
|
||||
|
||||
### Deployment Guide
|
||||
|
||||
- Demo deployment: see [deployment guide](./demo/README.md) for details.
|
||||
- Quantized deployment based on [LMDeploy](https://github.com/InternLM/lmdeploy/): see [deploy](./deploy/lmdeploy.md)
|
||||
|
||||
|
||||
### RAG (Retrieval Augmented Generation) Pipeline
|
||||
- See [RAG](./rag/)
|
||||
|
||||
<details>
|
||||
<summary>Additional Details</summary>
|
||||
|
||||
### Frameworks Used
|
||||
|
||||
- [Xtuner](https://github.com/InternLM/xtuner)
|
||||
- [Transformers](https://github.com/huggingface/transformers)
|
||||
- [PyTorch](https://pytorch.org/)
- [LMDeploy](https://github.com/InternLM/lmdeploy/): for quantized deployment
- [Streamlit](https://streamlit.io/): for building demos
|
||||
- [DeepSpeed](https://github.com/microsoft/DeepSpeed): for parallel training
|
||||
- …
|
||||
|
||||
#### How to participate in this project
|
||||
|
||||
Contributions make the open-source community an excellent place for learning, inspiration, and creation. Any contribution you make is greatly appreciated.
|
||||
|
||||
1. Fork the Project
|
||||
2. Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
|
||||
3. Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
|
||||
4. Push to the Branch (`git push origin feature/AmazingFeature`)
|
||||
5. Open a Pull Request
|
||||
|
||||
### Version control
|
||||
|
||||
This project uses Git for version control. You can see the currently available versions in the repository.
|
||||
|
||||
</details>
|
||||
|
||||
### Authors (in no particular order)
|
||||
|
||||
| Username | School/Organization | Remarks | Contributions |
|
||||
| :-------: | :-------------------: | :------------------: | :--------: |
|
||||
| [aJupyter](https://github.com/aJupyter) | Nankai University, Master's student | DataWhale member | Project initiator |
|
||||
| [jujimeizuo](https://github.com/jujimeizuo) | Jiangnan University, Master's student | | |
|
||||
| [Smiling-Weeping-zhr](https://github.com/Smiling-Weeping-zhr) | Harbin Institute of Technology (Weihai), Undergraduate student | | |
|
||||
| [8baby8](https://github.com/8baby8) | PaddlePaddle Pilot Team Regional Director | Wenxin Large Model core developer | |
|
||||
| [zxazys](https://github.com/zxazys) | Nankai University, Master's student | | |
|
||||
| [MING-ZCH](https://github.com/MING-ZCH) | Huazhong University of Science and Technology, Undergraduate student | | |
|
||||
| [JasonLLLLLLLLLLL](https://github.com/JasonLLLLLLLLLLL) | SWUFE (Southwestern University of Finance and Economics) | | |
|
||||
| [MrCatAI](https://github.com/MrCatAI) | AI Mover | | |
|
||||
| [ZeyuBa](https://github.com/ZeyuBa) | Institute of Automation, Master's student | | |
|
||||
| [aiyinyuedejustin](https://github.com/aiyinyuedejustin) | University of Pennsylvania, Master's student | | |
|
||||
| [Nobody-ML](https://github.com/Nobody-ML) | China University of Petroleum (East China), Undergraduate student | | |
|
||||
| [chg0901](https://github.com/chg0901) | [MiniSora](https://github.com/mini-sora/minisora) |Maintainer and Admin|Data Cleaning and Docs Translation|
|
||||
| [Mxoder](https://github.com/Mxoder) | Beihang University, Undergraduate student | | |
|
||||
| [Anooyman](https://github.com/Anooyman) | Nanjing University of Science and Technology, Master's student | | |
|
||||
| [Vicky-3021](https://github.com/Vicky-3021) | Xidian University, Master's student (Research Year 0) | | |
|
||||
| [SantiagoTOP](https://github.com/santiagoTOP) | Taiyuan University of Technology, Master's student | | |
|
||||
|
||||
|
||||
### Copyright Notice
|
||||
|
||||
The project is licensed under the MIT License. See [LICENSE](https://github.com/aJupyter/EmoLLM/blob/master/LICENSE) for details.
|
||||
|
||||
### Acknowledgments
|
||||
|
||||
- [Sanbu](https://github.com/sanbuphy)
|
||||
- [Shanghai Artificial Intelligence Laboratory](https://www.shlab.org.cn/)
|
||||
- [Vanin](https://github.com/vansin)
|
||||
- [Bloom up (WeChat Official Account Promotion)](https://mp.weixin.qq.com/s/78lrRl2tlXEKUfElnkVx4A)
|
||||
- Abu (M.A. in Psychology, Peking University)
|
||||
|
||||
<!-- links -->
|
||||
|
||||
<!-- [linkedin-shield]: https://img.shields.io/badge/-LinkedIn-black.svg?style=flat-square&logo=linkedin&colorB=555 -->
|
||||
|
||||
<!-- [linkedin-url]: https://linkedin.com/in/aJupyter -->
|
||||
|
||||
<!-- Too few to be worth showing -->
|
||||
|
||||
## Star History
|
||||
|
||||
[](https://star-history.com/#SmartFlowAI/EmoLLM&Date)
|
||||
|
||||
## 🌟 Contributors
|
||||
|
||||
[](https://github.com/SmartFlowAI/EmoLLM/graphs/contributors)
|
||||
|
||||
[your-project-path]: SmartflowAI/EmoLLM
|
||||
[contributors-shield]: https://img.shields.io/github/contributors/SmartflowAI/EmoLLM.svg?style=flat-square
|
||||
[contributors-url]: https://github.com/SmartflowAI/EmoLLM/graphs/contributors
|
||||
[forks-shield]: https://img.shields.io/github/forks/SmartflowAI/EmoLLM.svg?style=flat-square
|
||||
[forks-url]: https://github.com/SmartflowAI/EmoLLM/network/members
|
||||
[stars-shield]: https://img.shields.io/github/stars/SmartflowAI/EmoLLM.svg?style=flat-square
|
||||
[stars-url]: https://github.com/SmartflowAI/EmoLLM/stargazers
|
||||
[issues-shield]: https://img.shields.io/github/issues/SmartflowAI/EmoLLM.svg?style=flat-square
|
||||
[issues-url]: https://github.com/SmartflowAI/EmoLLM/issues
|
||||
[license-shield]: https://img.shields.io/github/license/SmartflowAI/EmoLLM.svg?style=flat-square
|
||||
[license-url]: https://github.com/SmartflowAI/EmoLLM/blob/main/LICENSE
|
||||
|
||||
[OpenXLab_App-image]: https://cdn-static.openxlab.org.cn/app-center/openxlab_app.svg
|
||||
[OpenXLab_Model-image]: https://cdn-static.openxlab.org.cn/header/openxlab_models.svg
|
||||
[OpenXLab_App-url]: https://openxlab.org.cn/apps/detail/Farewell1/EmoLLMV2.0
|
||||
[OpenXLab_Model-url]: https://openxlab.org.cn/models/detail/ajupyter/EmoLLM_internlm2_7b_full
|
||||
|
||||
## Communication group
|
||||
- If the QR code below has expired, please go to the Issues section.
|
||||
|
||||
<p align="center">
|
||||
<img width="30%" src="https://github.com/SmartFlowAI/EmoLLM/assets/62385492/55ecd0aa-4832-4269-ad57-4c26f9aa286b" alt="EmoLLM official communication group">
|
||||
</p>
|
||||
<div align="center">
|
||||
|
||||
# EmoLLM - Large Language Model for Mental Health
|
||||
|
||||
</div>
|
||||
|
||||
<p align="center">
|
||||
<a href="https://github.com/aJupyter/EmoLLM/">
|
||||
<img src="assets/logo.jpeg" alt="Logo" width="30%">
|
||||
</a>
|
||||
|
||||
<div align="center">
|
||||
|
||||
<!-- PROJECT SHIELDS -->
|
||||
[![Contributors][contributors-shield]][contributors-url]
|
||||
[![Forks][forks-shield]][forks-url]
|
||||
[![Issues][issues-shield]][issues-url]
|
||||
[![OpenXLab_App][OpenXLab_App-image]][OpenXLab_App-url]
|
||||
[![OpenXLab_Model][OpenXLab_Model-image]][OpenXLab_Model-url]
|
||||
[![MIT License][license-shield]][license-url]
|
||||
[![Stargazers][stars-shield]][stars-url]
|
||||
|
||||
</div>
|
||||
|
||||
<h3 align="center">EmoLLM</h3>
|
||||
|
||||
<p align="center">
|
||||
<a href="README.md">简体中文</a> | English
|
||||
<br />
|
||||
<br />
|
||||
<a href="https://github.com/aJupyter/EmoLLM"><strong>Explore the documentation of this project »</strong></a>
|
||||
<br />
|
||||
<br />
|
||||
<a href="https://openxlab.org.cn/apps/detail/Farewell1/EmoLLMV2.0">EmoLLM 2.0 Demo</a>
|
||||
·
|
||||
<a href="https://github.com/aJupyter/EmoLLM/issues">Report a Bug</a>
|
||||
·
|
||||
<a href="https://github.com/aJupyter/EmoLLM/issues">Propose a New Feature</a>
|
||||
</p>
|
||||
|
||||
</p>
|
||||
|
||||
<!-- This README.md is intended for developers -->
|
||||
|
||||
|
||||
**EmoLLM** is a series of large language models for mental health that support the full counseling chain of **understanding users, supporting users, and helping users**. The models are instruction fine-tuned from `LLM`s. We would really appreciate a star~⭐⭐. The `LLM` fine-tuning configurations open-sourced so far are as follows:
|
||||
|
||||
<div align="center">
|
||||
|
||||
| Model | Type |
|
||||
| :-------------------: | :------: |
|
||||
| InternLM2_7B_chat | QLORA |
|
||||
| InternLM2_7B_chat | full fine-tuning |
|
||||
| InternLM2_1_8B_chat | full fine-tuning |
|
||||
| InternLM2_20B_chat | LORA |
|
||||
| Qwen_7b_chat | QLORA |
|
||||
| Qwen1_5-0_5B-Chat | full fine-tuning |
|
||||
| Baichuan2_13B_chat | QLORA |
|
||||
| ChatGLM3_6B | LORA |
|
||||
| DeepSeek MoE_16B_chat | QLORA |
|
||||
| Mixtral 8x7B_instruct | QLORA |
|
||||
| …… | …… |
|
||||
|
||||
</div>
|
||||
|
||||
Everyone is welcome to contribute to this project ~
|
||||
|
||||
---
|
||||
|
||||
The Mental Health Grand Model aims to fully understand and promote the mental health of individuals, groups, and society. It typically includes the following key components:
|
||||
|
||||
- Cognitive factors: Involving an individual's thought patterns, belief systems, cognitive biases, and problem-solving abilities. Cognitive factors significantly impact mental health as they affect how individuals interpret and respond to life events.
|
||||
- Emotional factors: Including emotion regulation, emotional expression, and emotional experiences. Emotional health is a crucial part of mental health, involving how individuals manage and express their emotions and how they recover from negative emotions.
|
||||
- Behavioral factors: Concerning an individual's behavior patterns, habits, and coping strategies. This includes stress management skills, social skills, and self-efficacy, which is the confidence in one's abilities.
|
||||
- Social environment: Comprising external factors such as family, work, community, and cultural background, which have direct and indirect impacts on an individual's mental health.
|
||||
- Physical health: There is a close relationship between physical and mental health. Good physical health can promote mental health and vice versa.
|
||||
- Psychological resilience: Refers to an individual's ability to recover from adversity and adapt. Those with strong psychological resilience can bounce back from challenges and learn and grow from them.
|
||||
- Prevention and intervention measures: The Mental Health Grand Model also includes strategies for preventing psychological issues and promoting mental health, such as psychological education, counseling, therapy, and social support systems.
|
||||
- Assessment and diagnostic tools: Effective promotion of mental health requires scientific tools to assess individuals' psychological states and diagnose potential psychological issues.
|
||||
### Recent Updates
|
||||
- 【2024.3.12】 Released [aiwei](https://aistudio.baidu.com/community/app/63335) on the Baidu PaddlePaddle platform
|
||||
- 【2024.3.11】 **EmoLLM V2.0 improves substantially on EmoLLM V1.0 in all scores and surpasses the performance of role-playing ChatGPT on counseling tasks!** [Click to experience EmoLLM V2.0](https://openxlab.org.cn/apps/detail/Farewell1/EmoLLMV2.0). Also updated the [dataset statistics and details](./datasets/) and the [Roadmap](./assets/Roadmap_ZH.png)
|
||||
- 【2024.3.9】 Added concurrency-accelerated [QA pair generation](./scripts/qa_generation/) and the [RAG pipeline](./rag/)
|
||||
- 【2024.3.3】 The [full fine-tuned EmoLLM V2.0 based on InternLM2-7B-chat](https://openxlab.org.cn/models/detail/ajupyter/EmoLLM_internlm2_7b_full) has been open-sourced (requires two A100 80G GPUs). Updated the professional evaluation, see [evaluate](./evaluate/), and the PaddleOCR-based PDF-to-txt tool scripts, see [scripts](./scripts/).
|
||||
- 【2024.2.29】 Updated objective assessment calculations, see [evaluate](./evaluate/) for details. A series of datasets have also been updated, see [datasets](./datasets/) for details.
|
||||
- 【2024.2.27】 Updated the English README and a series of datasets (licking-dog and single-turn dialogue)
|
||||
- 【2024.2.23】The "Gentle Lady Psychologist Ai Wei" based on InternLM2_7B_chat_qlora was launched. [Click here to obtain the model weights](https://openxlab.org.cn/models/detail/ajupyter/EmoLLM_aiwei), [configuration file](xtuner_config/aiwei-internlm2_chat_7b_qlora.py), [online experience link](https://openxlab.org.cn/apps/detail/ajupyter/EmoLLM-aiwei)
|
||||
|
||||
- 【2024.2.23】Updated [several fine-tuning configurations](/xtuner_config/), added [data_pro.json](/datasets/data_pro.json) (more quantity, more comprehensive scenarios, richer content) and [aiwei.json](/datasets/aiwei.json) (dedicated to the gentle lady role-play, featuring Emoji expressions), the "Gentle Lady Psychologist Ai Wei" is coming soon.
|
||||
|
||||
- 【2024.2.18】 The full fine-tuned version based on Qwen1_5-0_5B-Chat has been [open-sourced](https://www.modelscope.cn/models/aJupyter/EmoLLM_Qwen1_5-0_5B-Chat_full_sft/summary). Friends with limited computational resources can now dive in and explore it.
|
||||
|
||||
|
||||
<details>
|
||||
<summary>View More</summary>
|
||||
|
||||
- 【2024.2.6】 The [full fine-tuned version based on Qwen1_5-0_5B-Chat](https://www.modelscope.cn/models/aJupyter/EmoLLM_Qwen1_5-0_5B-Chat_full_sft/summary) has been open-sourced; friends with limited computing power can start experimenting~
|
||||
|
||||
<p align="center">
|
||||
<img src="https://github.com/aJupyter/EmoLLM/assets/62385492/7e931682-c54d-4ded-bc67-79130c68d744" alt="Model download count">
|
||||
</p>
|
||||
|
||||
- 【2024.2.5】 The project has been promoted by the official WeChat account NLP Engineering. Here's the [link](https://mp.weixin.qq.com/s/78lrRl2tlXEKUfElnkVx4A) to the article. Welcome everyone to follow!! 🥳🥳
|
||||
|
||||
<p align="center">
|
||||
<img src="https://github.com/aJupyter/EmoLLM/assets/62385492/47868d6a-2e91-4aa9-a630-e594c14295b4" alt="WeChat official account QR code">
|
||||
</p>
|
||||
|
||||
- 【2024.2.3】 [Project Video](https://www.bilibili.com/video/BV1N7421N76X/) on bilibili 😊
|
||||
- 【2024.1.27】 Complete data construction documentation, fine-tuning guide, deployment guide, Readme, and other related documents 👏
|
||||
- 【2024.1.25】 EmoLLM V1.0 has been deployed online at https://openxlab.org.cn/apps/detail/jujimeizuo/EmoLLM 😀
|
||||
|
||||
</details>
|
||||
|
||||
### Roadmap
|
||||
|
||||
<p align="center">
|
||||
<a href="https://github.com/aJupyter/EmoLLM/">
|
||||
<img src="assets/Roadmap_EN.png" alt="Roadmap_EN">
|
||||
</a>
|
||||
|
||||
## Contents
|
||||
|
||||
- [EmoLLM - Large Language Model for Mental Health](#emollm---large-language-model-for-mental-health)
|
||||
- [Recent Updates](#recent-updates)
|
||||
- [Roadmap](#roadmap)
|
||||
- [Contents](#contents)
|
||||
- [Pre-development Configuration Requirements.](#pre-development-configuration-requirements)
|
||||
- [**User Guide**](#user-guide)
|
||||
- [File Directory Explanation](#file-directory-explanation)
|
||||
- [Data Construction](#data-construction)
|
||||
- [Fine-tuning Guide](#fine-tuning-guide)
|
||||
- [Deployment Guide](#deployment-guide)
|
||||
- [RAG (Retrieval Augmented Generation) Pipeline](#rag-retrieval-augmented-generation-pipeline)
|
||||
- [Frameworks Used](#frameworks-used)
|
||||
- [How to participate in this project](#how-to-participate-in-this-project)
|
||||
- [Version control](#version-control)
|
||||
- [Authors (in no particular order)](#authors-in-no-particular-order)
|
||||
- [Copyright Notice](#copyright-notice)
|
||||
- [Acknowledgments](#acknowledgments)
|
||||
- [Star History](#star-history)
|
||||
- [🌟 Contributors](#-contributors)
|
||||
- [Communication group](#communication-group)
|
||||
|
||||
###### Pre-development Configuration Requirements.
|
||||
|
||||
- A100 40G (specifically for InternLM2_7B_chat + qlora fine-tuning + deepspeed zero2 optimization)
|
||||
|
||||
###### **User Guide**
|
||||
|
||||
1. Clone the repo
|
||||
|
||||
```sh
|
||||
git clone https://github.com/SmartFlowAI/EmoLLM.git
|
||||
```
|
||||
|
||||
1. Read in sequence or read sections you're interested in:
|
||||
- [File Directory Explanation](#file-directory-explanation)
|
||||
- [Data Construction](#data-construction)
|
||||
- [Fine-tuning Guide](#fine-tuning-guide)
|
||||
- [Deployment Guide](#deployment-guide)
|
||||
- View More Details
|
||||
|
||||
|
||||
|
||||
### File Directory Explanation
|
||||
|
||||
```
|
||||
├─assets: Image Resources
|
||||
├─datasets: Dataset
|
||||
├─demo: demo scripts
|
||||
├─generate_data: Data Generation Guide
|
||||
│ └─xinghuo
|
||||
├─scripts: Some Available Tools
|
||||
└─xtuner_config: Fine-tuning Guide
|
||||
└─images
|
||||
```
|
||||
|
||||
### Data Construction
|
||||
|
||||
- Please read the [Data Construction Guide](generate_data/tutorial.md) for reference.
|
||||
|
||||
- The dataset used for this fine-tuning can be found at [datasets](datasets/data.json)
|
||||
|
||||
### Fine-tuning Guide
|
||||
|
||||
For details, see the [fine-tuning guide](xtuner_config/README.md)
|
||||
|
||||
### Deployment Guide
|
||||
|
||||
- Demo deployment: see [deployment guide](./demo/README.md) for details.
|
||||
- Quantized deployment based on [LMDeploy](https://github.com/InternLM/lmdeploy/): see [deploy](./deploy/lmdeploy.md)
|
||||
|
||||
|
||||
### RAG (Retrieval Augmented Generation) Pipeline
|
||||
- See [RAG](./rag/)
|
||||
|
||||
<details>
|
||||
<summary>Additional Details</summary>
|
||||
|
||||
### Frameworks Used
|
||||
|
||||
- [Xtuner](https://github.com/InternLM/xtuner)
|
||||
- [Transformers](https://github.com/huggingface/transformers)
|
||||
- [PyTorch](https://pytorch.org/)
- [LMDeploy](https://github.com/InternLM/lmdeploy/): for quantized deployment
- [Streamlit](https://streamlit.io/): for building demos
|
||||
- [DeepSpeed](https://github.com/microsoft/DeepSpeed): for parallel training
|
||||
- …
|
||||
|
||||
#### How to participate in this project
|
||||
|
||||
Contributions make the open-source community an excellent place for learning, inspiration, and creation. Any contribution you make is greatly appreciated.
|
||||
|
||||
1. Fork the Project
|
||||
2. Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
|
||||
3. Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
|
||||
4. Push to the Branch (`git push origin feature/AmazingFeature`)
|
||||
5. Open a Pull Request
|
||||
|
||||
### Version control
|
||||
|
||||
This project uses Git for version control. You can see the currently available versions in the repository.
|
||||
|
||||
</details>
|
||||
|
||||
### Authors (in no particular order)
|
||||
|
||||
| Username | School/Organization | Remarks | Contributions |
|
||||
| :-------: | :-------------------: | :------------------: | :--------: |
|
||||
| [aJupyter](https://github.com/aJupyter) | Nankai University, Master's student | DataWhale member | Project initiator |
|
||||
| [jujimeizuo](https://github.com/jujimeizuo) | Jiangnan University, Master's student | | |
|
||||
| [Smiling-Weeping-zhr](https://github.com/Smiling-Weeping-zhr) | Harbin Institute of Technology (Weihai), Undergraduate student | | |
|
||||
| [8baby8](https://github.com/8baby8) | PaddlePaddle Pilot Team Regional Director | Wenxin Large Model core developer | |
|
||||
| [zxazys](https://github.com/zxazys) | Nankai University, Master's student | | |
|
||||
| [MING-ZCH](https://github.com/MING-ZCH) | Huazhong University of Science and Technology, Undergraduate student | | |
|
||||
| [JasonLLLLLLLLLLL](https://github.com/JasonLLLLLLLLLLL) | SWUFE (Southwestern University of Finance and Economics) | | |
|
||||
| [MrCatAI](https://github.com/MrCatAI) | AI Mover | | |
|
||||
| [ZeyuBa](https://github.com/ZeyuBa) | Institute of Automation, Master's student | | |
|
||||
| [aiyinyuedejustin](https://github.com/aiyinyuedejustin) | University of Pennsylvania, Master's student | | |
|
||||
| [Nobody-ML](https://github.com/Nobody-ML) | China University of Petroleum (East China), Undergraduate student | | |
|
||||
| [chg0901](https://github.com/chg0901) | [MiniSora](https://github.com/mini-sora/minisora) |Maintainer and Admin|Data Cleaning and Docs Translation|
|
||||
| [Mxoder](https://github.com/Mxoder) | Beihang University, Undergraduate student | | |
|
||||
| [Anooyman](https://github.com/Anooyman) | Nanjing University of Science and Technology, Master's student | | |
|
||||
| [Vicky-3021](https://github.com/Vicky-3021) | Xidian University, Master's student (Research Year 0) | | |
|
||||
| [SantiagoTOP](https://github.com/santiagoTOP) | Taiyuan University of Technology, Master's student | | |
|
||||
| [zealot52099](https://github.com/zealot52099) | AI Mover | |Data Processing and RAG|
|
||||
|
||||
### Copyright Notice
|
||||
|
||||
The project is licensed under the MIT License. See [LICENSE](https://github.com/aJupyter/EmoLLM/blob/master/LICENSE) for details.
|
||||
|
||||
### Acknowledgments
|
||||
|
||||
- [Sanbu](https://github.com/sanbuphy)
|
||||
- [Shanghai Artificial Intelligence Laboratory](https://www.shlab.org.cn/)
|
||||
- [Vanin](https://github.com/vansin)
|
||||
- [Bloom up (WeChat Official Account Promotion)](https://mp.weixin.qq.com/s/78lrRl2tlXEKUfElnkVx4A)
|
||||
- Abu (M.A. in Psychology, Peking University)
|
||||
|
||||
<!-- links -->
|
||||
|
||||
<!-- [linkedin-shield]: https://img.shields.io/badge/-LinkedIn-black.svg?style=flat-square&logo=linkedin&colorB=555 -->
|
||||
|
||||
<!-- [linkedin-url]: https://linkedin.com/in/aJupyter -->
|
||||
|
||||
<!-- Too few to be worth showing -->
|
||||
|
||||
## Star History
|
||||
|
||||
[](https://star-history.com/#SmartFlowAI/EmoLLM&Date)
|
||||
|
||||
## 🌟 Contributors
|
||||
|
||||
[](https://github.com/SmartFlowAI/EmoLLM/graphs/contributors)
|
||||
|
||||
[your-project-path]: SmartflowAI/EmoLLM
|
||||
[contributors-shield]: https://img.shields.io/github/contributors/SmartflowAI/EmoLLM.svg?style=flat-square
|
||||
[contributors-url]: https://github.com/SmartflowAI/EmoLLM/graphs/contributors
|
||||
[forks-shield]: https://img.shields.io/github/forks/SmartflowAI/EmoLLM.svg?style=flat-square
|
||||
[forks-url]: https://github.com/SmartflowAI/EmoLLM/network/members
|
||||
[stars-shield]: https://img.shields.io/github/stars/SmartflowAI/EmoLLM.svg?style=flat-square
|
||||
[stars-url]: https://github.com/SmartflowAI/EmoLLM/stargazers
|
||||
[issues-shield]: https://img.shields.io/github/issues/SmartflowAI/EmoLLM.svg?style=flat-square
|
||||
[issues-url]: https://github.com/SmartflowAI/EmoLLM/issues
|
||||
[license-shield]: https://img.shields.io/github/license/SmartflowAI/EmoLLM.svg?style=flat-square
|
||||
[license-url]: https://github.com/SmartflowAI/EmoLLM/blob/main/LICENSE
|
||||
|
||||
[OpenXLab_App-image]: https://cdn-static.openxlab.org.cn/app-center/openxlab_app.svg
|
||||
[OpenXLab_Model-image]: https://cdn-static.openxlab.org.cn/header/openxlab_models.svg
|
||||
[OpenXLab_App-url]: https://openxlab.org.cn/apps/detail/Farewell1/EmoLLMV2.0
|
||||
[OpenXLab_Model-url]: https://openxlab.org.cn/models/detail/ajupyter/EmoLLM_internlm2_7b_full
|
||||
|
||||
## Communication group
|
||||
- If the QR code below has expired, please go to the Issues section.
|
||||
|
||||
<p align="center">
|
||||
<img width="30%" src="https://github.com/SmartFlowAI/EmoLLM/assets/62385492/55ecd0aa-4832-4269-ad57-4c26f9aa286b" alt="EmoLLM official communication group">
|
||||
</p>
|
||||
|
BIN
assets/框架图.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 204 KiB |
60
generate_data/final_data/merge_jsonl.py
Normal file
@ -0,0 +1,60 @@
import json
import os


def save_merge_json(data_lis, file_path):
    # Write the merged records as one JSON array; ',\n' puts a line break after every item separator
    with open(file_path, 'wt', encoding='utf-8') as file:
        json.dump(data_lis, file, ensure_ascii=False, separators=(',\n', ':'))


def get_all_file_paths(folder_path, file_type='.jsonl'):
    # Make sure the argument is a directory
    if not os.path.isdir(folder_path):
        raise ValueError(f"{folder_path} is not a valid directory")

    # Collect the paths of all files in the folder that end with file_type
    file_paths = [os.path.join(folder_path, file) for file in os.listdir(folder_path)
                  if os.path.isfile(os.path.join(folder_path, file)) and file.endswith(file_type)]
    return file_paths


if __name__ == '__main__':
    conversion_lis = []

    folder_path = r'./'

    # Derive the output name from the last two components of folder_path
    merge_path = folder_path.split('/')[-1]
    try:
        merge_last_path = folder_path.split('/')[-2] if folder_path.split('/')[-2] != '.' else ''
    except IndexError:
        merge_last_path = ''
    print(f'merge_path={merge_path},merge_last_path={merge_last_path}')

    for path in get_all_file_paths(folder_path):
        print(path)

        with open(path, 'rt', encoding='utf-8') as file:
            for line in file:
                # Strip the trailing newline
                line = line.rstrip('\n')
                # Parse the JSON record; report and skip lines that fail to parse
                try:
                    data = json.loads(line)
                    conversion_lis.append(data)
                except json.JSONDecodeError as e:
                    print(f"Error decoding JSON: {e}")

    if merge_last_path != '':
        save_merge_json_path = rf'./{merge_last_path}/{merge_path}_merge.json'
    elif merge_path != '':
        save_merge_json_path = rf'./{merge_path}_merge.json'
    else:
        save_merge_json_path = './curr_merge.json'

    save_merge_json(data_lis=conversion_lis,
                    file_path=save_merge_json_path)
    print(len(conversion_lis), save_merge_json_path)
|
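For reference, the merge step performed by this script can be sketched as a small standalone function. This is a minimal sketch rather than the script itself: the function name `merge_jsonl_files` and the demo file names are hypothetical, and it assumes one JSON object per line and only looks at files directly inside the folder.

```python
import json
import tempfile
from pathlib import Path


def merge_jsonl_files(folder, suffix=".jsonl"):
    """Collect every JSON record from the *.jsonl files directly inside `folder`."""
    records = []
    for path in sorted(Path(folder).iterdir()):
        if not (path.is_file() and path.suffix == suffix):
            continue
        with path.open("rt", encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if not line:
                    continue  # skip blank lines
                try:
                    records.append(json.loads(line))
                except json.JSONDecodeError as err:
                    # report and skip lines that are not valid JSON
                    print(f"Skipping malformed line in {path.name}: {err}")
    return records


# Demo on throwaway files (hypothetical names and contents).
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.jsonl").write_text('{"id": 1}\n\n{"id": 2}\n', encoding="utf-8")
    Path(tmp, "b.jsonl").write_text('{"id": 3}\nnot json\n', encoding="utf-8")
    merged = merge_jsonl_files(tmp)
    out_path = Path(tmp, "demo_merge.json")
    out_path.write_text(json.dumps(merged, ensure_ascii=False), encoding="utf-8")
    reloaded = json.loads(out_path.read_text(encoding="utf-8"))
```

Matching on the file suffix, instead of testing whether `.jsonl` appears anywhere in the name, keeps files such as backups with a longer extension out of the merge.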
75
generate_data/final_data/merge_jsonl_r.py
Normal file
@ -0,0 +1,75 @@
import json
import os


def save_merge_json(data_lis, file_path):
    # Write the merged records as one JSON array; ',\n' puts a line break after every item separator
    with open(file_path, 'wt', encoding='utf-8') as file:
        json.dump(data_lis, file, ensure_ascii=False, separators=(',\n', ':'))


def get_all_file_paths(folder_path, file_type='.jsonl'):
    # Make sure the argument is a directory
    if not os.path.isdir(folder_path):
        raise ValueError(f"{folder_path} is not a valid directory")

    # Collect the paths of all files in the folder that end with file_type
    file_paths = [os.path.join(folder_path, file) for file in os.listdir(folder_path)
                  if os.path.isfile(os.path.join(folder_path, file)) and file.endswith(file_type)]
    return file_paths


if __name__ == '__main__':

    data_ai = 'qwen'  # python merge_jsonl_r.py > qwen.txt
    # data_ai = 'zhipuai'  # python merge_jsonl_r.py > zhipuai.txt
    root_dir = rf'./{data_ai}/'

    save_final_merge_json_path = f'{data_ai}_final_merge.json'

    # Every immediate subfolder of root_dir is merged separately
    subfolders = [os.path.join(root_dir, d) for d in os.listdir(root_dir) if os.path.isdir(os.path.join(root_dir, d))]

    final_list = []
    for folder_path in subfolders:
        conversion_lis = []
        merge_path = folder_path.split('/')[-1]
        try:
            merge_last_path = folder_path.split('/')[-2] if folder_path.split('/')[-2] != '.' else ''
        except IndexError:
            merge_last_path = ''
        print(f'merge_path={merge_path},merge_last_path={merge_last_path}')

        for path in get_all_file_paths(folder_path):
            print(path)

            with open(path, 'rt', encoding='utf-8') as file:
                for line in file:
                    # Strip the trailing newline
                    line = line.rstrip('\n')
                    # Parse the JSON record; report and skip lines that fail to parse
                    try:
                        data = json.loads(line)
                        conversion_lis.append(data)
                    except json.JSONDecodeError as e:
                        print(f"Error decoding JSON: {e}")

        if merge_last_path != '':
            save_merge_json_path = rf'./{merge_last_path}/{merge_path}_merge.json'
        elif merge_path != '':
            save_merge_json_path = rf'./{merge_path}_merge.json'
        else:
            save_merge_json_path = './curr_merge.json'

        # Save the per-subfolder merge, then accumulate it into the final list
        save_merge_json(data_lis=conversion_lis,
                        file_path=save_merge_json_path)

        final_list = final_list + conversion_lis
        print(len(conversion_lis), len(final_list), save_merge_json_path)

    save_merge_json(data_lis=final_list, file_path=save_final_merge_json_path)
    print(save_final_merge_json_path)
@ -100,7 +100,10 @@
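5. **Dataset integration**

Before integrating the dataset, we need to check the generated data for formatting errors, type mismatches, and similar problems. We use check.py to check the data, and finally use merge_json.py to merge all the JSON files into one overall JSON file.
Before integrating the dataset, we need to check the generated data for formatting errors, type mismatches, and similar problems.

* First, use `check.py` to check the data.
* Then, use `merge_json.py` to merge all the JSON files into one overall JSON file.

6. **Evaluation and optimization**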
@ -34,11 +34,21 @@ def zhipu_api(data, emo):
    top_p = round(random.uniform(0.1, 0.9), 2)
    messages = getText('user', prompt)
    response = client.chat.completions.create(
        model='glm-4',
        messages=messages,
        top_p=top_p,
    )

    # Error code: 400, with error text {"error":{"code":"1301","message":
    # "系统检测到输入或生成内容可能包含不安全或敏感内容,请您避免输入易产生敏感内容的提示语,感谢您的配合。"}}
    # (The message says the system detected possibly unsafe or sensitive input or output.)
    try:
        response = client.chat.completions.create(
            model='glm-4',
            messages=messages,
            top_p=top_p,
        )
    except Exception:
        # Retry once with the same arguments; a second failure propagates to the caller
        response = client.chat.completions.create(
            model='glm-4',
            messages=messages,
            top_p=top_p,
        )

    return response.choices[0].message.content
|
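The retry in this hunk makes exactly one extra attempt and lets a second failure propagate. A bounded retry loop is one alternative. The sketch below is an illustration under stated assumptions, not the project's code: `call_with_retry`, `max_attempts`, `delay_seconds`, and `flaky` are hypothetical names, and the wrapped callable stands in for the `client.chat.completions.create(...)` call.

```python
import time


def call_with_retry(func, max_attempts=3, delay_seconds=0.0):
    """Call `func()` until it succeeds or `max_attempts` is exhausted."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception as err:  # narrow this to the client's error type if it is known
            last_error = err
            if attempt < max_attempts:
                time.sleep(delay_seconds)
    # every attempt failed: surface the last error to the caller
    raise last_error


# Hypothetical stand-in for an API call that fails twice, then succeeds.
calls = {"count": 0}


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("simulated 400 error")
    return "ok"


result = call_with_retry(flaky)
```

In the hunk's setting the callable would wrap the request, for example `lambda: client.chat.completions.create(model='glm-4', messages=messages, top_p=top_p)`. Whether retrying helps depends on the failure: a rejection triggered by the prompt content itself may simply repeat.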
@ -1,11 +0,0 @@
# Cleaning QA pairs
Call qwen to judge whether each QA pair belongs to the field of psychology, and remove the QA pairs that do not.

## Step 1
1. Prepare the QA pair data to be cleaned.
2. Put the data in the data folder at the same level as model.
3. Modify judge_dir in config/config.py according to the folder name. I did not rename the folder myself, so my judge_dir is judge_dir = os.path.join(data_dir, '数据整合')

## Step 2
1. Run QA_clean.py.
2. The cleaned QA pairs are saved in jsonl format under data/cleaned.
@ -93,3 +93,34 @@
|
||||
## **步骤四:清洗QA对**
|
||||
|
||||
- 清洗目的
|
||||
|
||||
- 提高提取的QA数据质量,清理掉与心理学无关的QA对
|
||||
|
||||
- 清洗方法
|
||||
|
||||
- 使用Prompt方法,驱动LLM对给出的QA对进行判断
|
||||
|
||||
- **参考Prompt**
|
||||
|
||||
- ```markdown
|
||||
你是一名经验丰富的心理咨询师,熟悉心理学相关知识。根据我提供的 QA 对,来判断这个 QA 对是否属于心理学范畴。
|
||||
|
||||
标准如下:
|
||||
|
||||
- If the current QA pair falls within the scope of psychology, return 1
|
||||
- If the current QA pair does not fall within the scope of psychology, return 0
|
||||
|
||||
|
||||
The following is the given psychology QA pair:
|
||||
```
|
||||
|
||||
- Cleaning tools
|
||||
- Configure `DASHSCOPE_API_KEY` in `config/config.py`; see Step 3 for how to obtain the `API_KEY`
|
||||
- Use the provided cleaning script [QA_clean.py](https://github.com/SmartFlowAI/EmoLLM/blob/main/scripts/qa_generation/QA_clean.py)
|
||||
|
||||
- How to use
|
||||
- Prepare the QA pair data to be cleaned
|
||||
- Put the data into the data folder at the same level as model
|
||||
- Modify `judge_dir` in `config/config.py` according to the folder name.
|
||||
- If the folder holding the data is named `xxx`, set `judge_dir = os.path.join(data_dir, 'xxx')`
|
||||
- The cleaned QA pairs are saved in `jsonl` format under `data/cleaned`
|
||||
|
@ -93,3 +93,40 @@ Using books specialized in psychology to build QA knowledge pairs for RAG to pro
|
||||
## **Step 4: Cleaning of QA pairs**
|
||||
|
||||
- Purpose of cleaning
|
||||
- Improve the quality of extracted QA data and clean out QA pairs that are not relevant to psychology
|
||||
|
||||
- Cleaning Methods
|
||||
|
||||
- Use prompting to have an LLM judge each given QA pair
|
||||
|
||||
- **Reference prompt**
|
||||
|
||||
- ```markdown
|
||||
You are an experienced counselor and are familiar with psychology. Based on the QA pair I have provided, determine if this QA pair is psychological in nature.
|
||||
|
||||
The criteria are as follows:
|
||||
|
||||
- If the current QA pair belongs to the category of psychology, then return 1
|
||||
- If the current QA pair does not belong to the category of psychology, then return 0.
|
||||
|
||||
|
||||
The following is the content of the given psychology QA pair:
|
||||
```
|
||||
|
||||
- Cleaning Tools
|
||||
|
||||
- Configure `DASHSCOPE_API_KEY` in `config/config.py`, see step 3 for how to get `API_KEY`.
|
||||
|
||||
- Use the provided cleaning script [QA_clean.py](https://github.com/SmartFlowAI/EmoLLM/blob/main/scripts/qa_generation/QA_clean.py)
|
||||
|
||||
- How to use
|
||||
|
||||
- Prepare the QA pair data to be cleaned
|
||||
|
||||
- Put the data into the `data` folder at the same level as `model`.
|
||||
|
||||
- Modify `judge_dir` in `config/config.py` according to the folder name.
|
||||
|
||||
- If the folder holding the data is named `xxx`, set `judge_dir = os.path.join(data_dir, 'xxx')`.
|
||||
|
||||
- The cleaned QA pairs are stored as `jsonl` under `data/cleaned`.
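The cleaning flow described above (ask a model for 1 or 0 per QA pair, keep the 1s, write `jsonl`) can be sketched as follows. This is an illustration under stated assumptions and not the code in `QA_clean.py`: `clean_qa_pairs` and `keyword_judge` are hypothetical names, the `question`/`answer` keys are an assumed record shape, and `keyword_judge` is an offline stand-in for the real LLM call so that the example runs without an API key.

```python
import json
from pathlib import Path


def clean_qa_pairs(qa_pairs, judge, output_path):
    """Write the pairs that judge() marks with "1" to output_path as JSON Lines.

    judge is any callable that takes one QA pair (a dict) and returns the
    model's raw reply. Returns the number of pairs kept.
    """
    output_path = Path(output_path)
    output_path.parent.mkdir(parents=True, exist_ok=True)
    kept = 0
    with output_path.open("w", encoding="utf-8") as f:
        for pair in qa_pairs:
            # Any reply other than an exact "1" is treated as "not psychology".
            if str(judge(pair)).strip() == "1":
                f.write(json.dumps(pair, ensure_ascii=False) + "\n")
                kept += 1
    return kept


def keyword_judge(pair):
    """Offline stand-in for the LLM judgment (hypothetical, demo only)."""
    text = pair.get("question", "") + pair.get("answer", "")
    return "1" if "anxiety" in text.lower() else "0"
```

Treating every reply other than an exact `1` as a rejection is a conservative choice: if the model answers with extra text, the pair is dropped rather than kept by accident.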
|
||||
|