diff --git a/.gitignore b/.gitignore index 8713d9b..a4d6434 100644 --- a/.gitignore +++ b/.gitignore @@ -6,6 +6,11 @@ data/ pdf/ .idea/ +*.jsonl +*.json +# ./generate_data/*.josnl +# ./generate_data/*/*/*.josnl + # Byte-compiled / optimized / DLL files __pycache__/ *.py[cod] @@ -169,3 +174,4 @@ cython_debug/ # and can be added to the global gitignore or merged into this file. For a more nuclear # option (not recommended) you can uncomment the following to ignore the entire idea folder. #.idea/ + diff --git a/README.md b/README.md index fdc606b..b2abf79 100644 --- a/README.md +++ b/README.md @@ -1,287 +1,296 @@ -
- -# EmoLLM-心理健康大模型 - -
- -

- - Logo - - -

- - -[![Contributors][contributors-shield]][contributors-url] -[![Forks][forks-shield]][forks-url] -[![Issues][issues-shield]][issues-url] -[![OpenXLab_App][OpenXLab_App-image]][OpenXLab_App-url] -[![OpenXLab_Model][OpenXLab_Model-image]][OpenXLab_Model-url] -[![MIT License][license-shield]][license-url] -[![Stargazers][stars-shield]][stars-url] - -
- -

EmoLLM

- -
- 简体中文| English -
-
- 探索本项目的文档 » -
-
- 体验EmoLLM 2.0 - · - 报告Bug - · - 提出新特性 -
- - - - -**EmoLLM** 是一系列能够支持 **理解用户-支持用户-帮助用户** 心理健康辅导链路的心理健康大模型,由 `LLM`指令微调而来,欢迎大家star~⭐⭐。目前已经开源的 `LLM` 微调配置如下: - -
- -| 模型 | 类型 | -| :-------------------: | :------: | -| InternLM2_7B_chat | QLORA | -| InternLM2_7B_chat | 全量微调 | -| InternLM2_1_8B_chat | 全量微调 | -| InternLM2_20B_chat | LORA | -| Qwen_7b_chat | QLORA | -| Qwen1_5-0_5B-Chat | 全量微调 | -| Baichuan2_13B_chat | QLORA | -| ChatGLM3_6B | LORA | -| DeepSeek MoE_16B_chat | QLORA | -| Mixtral 8x7B_instruct | QLORA | -| …… | …… | - -
- -欢迎大家为本项目做出贡献~ - ---- - -心理健康大模型(Mental Health Grand Model)是一个综合性的概念,它旨在全面理解和促进个体、群体乃至整个社会的心理健康状态。这个模型通常包含以下几个关键组成部分: - -- 认知因素:涉及个体的思维模式、信念系统、认知偏差以及解决问题的能力。认知因素对心理健康有重要影响,因为它们影响个体如何解释和应对生活中的事件。 -- 情感因素:包括情绪调节、情感表达和情感体验。情感健康是心理健康的重要组成部分,涉及个体如何管理和表达自己的情感,以及如何从负面情绪中恢复。 -- 行为因素:涉及个体的行为模式、习惯和应对策略。这包括应对压力的技巧、社交技能以及自我效能感,即个体对自己能力的信心。 -- 社会环境:包括家庭、工作、社区和文化背景等外部因素,这些因素对个体的心理健康有着直接和间接的影响。 -- 生理健康:身体健康与心理健康紧密相关。良好的身体健康可以促进心理健康,反之亦然。 -- 心理韧性:指个体在面对逆境时的恢复力和适应能力。心理韧性强的人更能够从挑战中恢复,并从中学习和成长。 -- 预防和干预措施:心理健康大模型还包括预防心理问题和促进心理健康的策略,如心理教育、心理咨询、心理治疗和社会支持系统。 -- 评估和诊断工具:为了有效促进心理健康,需要有科学的工具来评估个体的心理状态,以及诊断可能存在的心理问题。 - -### 🎇最近更新 - -- 【2024.3.12】在百度飞浆平台发布[艾薇](https://aistudio.baidu.com/community/app/63335) -- 【2024.3.11】 **EmoLLM V2.0 相比 EmoLLM V1.0 全面提升,已超越 Role-playing ChatGPT 在心理咨询任务上的能力!**[点击体验EmoLLM V2.0](https://openxlab.org.cn/apps/detail/Farewell1/EmoLLMV2.0),更新[数据集统计及详细信息](./datasets/)、[路线图](./assets/Roadmap_ZH.png) -- 【2024.3.9】 新增并发功能加速 [QA 对生成](./scripts/qa_generation/)、[RAG pipeline](./rag/) -- 【2024.3.3】 [基于InternLM2-7B-chat全量微调版本EmoLLM V2.0开源](https://openxlab.org.cn/models/detail/ajupyter/EmoLLM_internlm2_7b_full),需要两块A100*80G,更新专业评估,详见[evaluate](./evaluate/),更新基于PaddleOCR的PDF转txt工具脚本,详见[scripts](./scripts/) -- 【2024.2.29】更新客观评估计算,详见[evaluate](./evaluate/),更新一系列数据集,详见[datasets](./datasets/) -- 【2024.2.27】更新英文readme和一系列数据集(舔狗和单轮对话) -- 【2024.2.23】推出基于InternLM2_7B_chat_qlora的 `温柔御姐心理医生艾薇`,[点击获取模型权重](https://openxlab.org.cn/models/detail/ajupyter/EmoLLM_aiwei),[配置文件](xtuner_config/aiwei-internlm2_chat_7b_qlora.py),[在线体验链接](https://openxlab.org.cn/apps/detail/ajupyter/EmoLLM-aiwei) -- 【2024.2.23】更新[若干微调配置](/xtuner_config/),新增 [data_pro.json](/datasets/data_pro.json)(数量更多、场景更全、更丰富)和 [aiwei.json](/datasets/aiwei.json)(温柔御姐角色扮演专用,带有Emoji表情),即将推出 `温柔御姐心理医生艾薇` -- 【2024.2.18】 [基于Qwen1_5-0_5B-Chat全量微调版本开源](https://www.modelscope.cn/models/aJupyter/EmoLLM_Qwen1_5-0_5B-Chat_full_sft/summary),算力有限的道友可以玩起来~ - -
-查看更多 - -- 【2024.2.6】 EmoLLM在[**Openxlab** ](https://openxlab.org.cn/models/detail/jujimeizuo/EmoLLM_Model) 平台下载量高达18.7k,欢迎大家体验! - -

- 模型下载量 -

- -- 【2024.2.5】 项目荣获公众号**NLP工程化**推文宣传[推文链接](https://mp.weixin.qq.com/s/78lrRl2tlXEKUfElnkVx4A),为博主推广一波,欢迎大家关注!!🥳🥳 - -

- 公众号二维码 -

- -- 【2024.2.3】 [项目宣传视频](https://www.bilibili.com/video/BV1N7421N76X/)完成 😊 -- 【2024.1.27】 完善数据构建文档、微调指南、部署指南、Readme等相关文档 👏 -- 【2024.1.25】 EmoLLM V1.0 已部署上线 https://openxlab.org.cn/apps/detail/jujimeizuo/EmoLLM 😀 - -
- -### 🎯路线图 - -

- - Roadmap_ZH - - -## 目录 - -- [EmoLLM-心理健康大模型](#emollm-心理健康大模型) - - [🎇最近更新](#最近更新) - - [🎯路线图](#路线图) - - [目录](#目录) - - [开发前的配置要求](#开发前的配置要求) - - [**使用指南**](#使用指南) - - [数据构建](#数据构建) - - [微调指南](#微调指南) - - [部署指南](#部署指南) - - [RAG(检索增强生成)Pipeline](#rag检索增强生成pipeline) - - [使用到的框架](#使用到的框架) - - [如何参与本项目](#如何参与本项目) - - [作者(排名不分先后)](#作者排名不分先后) - - [版权说明](#版权说明) - - [特别鸣谢](#特别鸣谢) - - [Star History](#star-history) - - [🌟 Contributors](#-contributors) - - [交流群](#交流群) - -###### 开发前的配置要求 - -- 硬件:A100 40G(仅针对InternLM2_7B_chat+qlora微调+deepspeed zero2优化) - -###### **使用指南** - -1. Clone the repo - -```sh -git clone https://github.com/SmartFlowAI/EmoLLM.git -``` - -2. 依次阅读或者选择感兴趣的部分阅读: - - [数据构建](#数据构建) - - [微调指南](#微调指南) - - [部署指南](#部署指南) - - [RAG](#rag检索增强生成pipeline) - - 查看更多详情 - -### 数据构建 - -- 请阅读[数据构建指南](generate_data/tutorial.md)查阅 - -- 微调用到的数据集见[datasets](datasets/data.json) - -### 微调指南 - -详见[微调指南](xtuner_config/README.md) - -### 部署指南 - -- Demo部署:详见[部署指南](demo/README.md) -- 基于[LMDeploy](https://github.com/InternLM/lmdeploy/)的量化部署:详见[deploy](./deploy/lmdeploy.md) - -### RAG(检索增强生成)Pipeline - -- 详见[RAG](./rag/) - -

-更多详情 - -### 使用到的框架 - -- [Xtuner](https://github.com/InternLM/xtuner):用于微调 -- [Transformers](https://github.com/huggingface/transformers) -- [Pytorch](https://pytorch.org/) -- [LMDeploy](https://github.com/InternLM/lmdeploy/):用于量化部署 -- [Stremlit](https://streamlit.io/):用于构建Demo -- [DeepSpeed](https://github.com/microsoft/DeepSpeed):并行训练 -- … - -#### 如何参与本项目 - -贡献使开源社区成为一个学习、激励和创造的绝佳场所。你所作的任何贡献都是**非常感谢**的。 - -1. Fork the Project -2. Create your Feature Branch (`git checkout -b feature/AmazingFeature`) -3. Commit your Changes (`git commit -m 'Add some AmazingFeature'`) -4. Push to the Branch (`git push origin feature/AmazingFeature`) -5. Open a Pull Request - -
- -### 作者(排名不分先后) - -| 用户名 | 学校/组织 | 备注 | 贡献 | -| :----------: | :--------------------: | :-------------------: | :----------: | -| [aJupyter](https://github.com/aJupyter) | 南开大学在读硕士 | DataWhale成员 | 项目发起人 | -| [jujimeizuo](https://github.com/jujimeizuo) | 江南大学在读硕士 | | | -| [Smiling-Weeping-zhr](https://github.com/Smiling-Weeping-zhr) | 哈尔滨工业大学(威海)在读本科生 | | | -| [8baby8](https://github.com/8baby8) | 飞桨领航团区域主管 | 文心大模型核心开发者 | | -| [zxazys](https://github.com/zxazys) | 南开大学在读硕士 | | | -| [MING-ZCH](https://github.com/MING-ZCH) | 华中科技大学在读本科生 | | | -| [JasonLLLLLLLLLLL](https://github.com/JasonLLLLLLLLLLL) | swufe | | | -| [MrCatAI](https://github.com/MrCatAI) | AI搬用工 | | | -| [ZeyuBa](https://github.com/ZeyuBa) | 自动化所在读硕士 | | | -| [aiyinyuedejustin](https://github.com/aiyinyuedejustin) | 宾夕法尼亚大学在读硕士 | | | -| [Nobody-ML](https://github.com/Nobody-ML) | 中国石油大学(华东)在读本科生 | | | -| [chg0901](https://github.com/chg0901) | [MiniSora](https://github.com/mini-sora/minisora/) |MiniSora主要维护|数据清洗、文档翻译| -| [Mxoder](https://github.com/Mxoder) | 北京航空航天大学在读本科生 | | | -| [Anooyman](https://github.com/Anooyman) | 南京理工大学硕士 | | | -| [Vicky-3021](https://github.com/Vicky-3021) | 西安电子科技大学硕士(研0) | | | -| [SantiagoTOP](https://github.com/santiagoTOP) | 太原理工大学在读硕士 | | | - -### 版权说明 - -该项目签署了 MIT 授权许可,详情请参阅 [LICENSE](https://github.com/SmartFlowAI/EmoLLM/blob/main/LICENSE) - - -### 引用 -如果本项目对您的工作有所帮助,请使用以下格式引用: - -```bibtex -@misc{EmoLLM, - title={EmoLLM}, - author={EmoLLM}, - url={https://github.com/SmartFlowAI/EmoLLM/}, - year={2024} -} -``` - -### 特别鸣谢 - -- [Sanbu](https://github.com/sanbuphy) -- [上海人工智能实验室](https://www.shlab.org.cn/) -- [闻星大佬(小助手)](https://github.com/vansin) -- [扫地升(公众号宣传)](https://mp.weixin.qq.com/s/78lrRl2tlXEKUfElnkVx4A) -- 阿布(北大心理学硕士) - - - - - - - -## Star History - -[![Star History Chart](https://api.star-history.com/svg?repos=SmartFlowAI/EmoLLM&type=Date)](https://star-history.com/#SmartFlowAI/EmoLLM&Date) - -## 🌟 Contributors - -[![EmoLLM contributors](https://contrib.rocks/image?repo=SmartFlowAI/EmoLLM&max=50)](https://github.com/SmartFlowAI/EmoLLM/graphs/contributors) - -[your-project-path]: SmartflowAI/EmoLLM -[contributors-shield]: https://img.shields.io/github/contributors/SmartflowAI/EmoLLM.svg?style=flat-square -[contributors-url]: https://github.com/SmartflowAI/EmoLLM/graphs/contributors -[forks-shield]: https://img.shields.io/github/forks/SmartflowAI/EmoLLM.svg?style=flat-square -[forks-url]: https://github.com/SmartflowAI/EmoLLM/network/members -[stars-shield]: https://img.shields.io/github/stars/SmartflowAI/EmoLLM.svg?style=flat-square -[stars-url]: https://github.com/SmartflowAI/EmoLLM/stargazers -[issues-shield]: https://img.shields.io/github/issues/SmartflowAI/EmoLLM.svg?style=flat-square -[issues-url]: https://img.shields.io/github/issues/SmartflowAI/EmoLLM.svg -[license-shield]: https://img.shields.io/github/license/SmartflowAI/EmoLLM.svg?style=flat-square -[license-url]: https://github.com/SmartFlowAI/EmoLLM/blob/main/LICENSE - -[OpenXLab_App-image]: https://cdn-static.openxlab.org.cn/app-center/openxlab_app.svg -[OpenXLab_Model-image]: https://cdn-static.openxlab.org.cn/header/openxlab_models.svg -[OpenXLab_App-url]: https://openxlab.org.cn/apps/detail/Farewell1/EmoLLMV2.0 -[OpenXLab_Model-url]: https://openxlab.org.cn/models/detail/ajupyter/EmoLLM_internlm2_7b_full - - -## 交流群 - -- 如果失效,请移步Issue区 - -

- EmoLLM官方交流群 -

+
+ +# EmoLLM-心理健康大模型 + +
+ +

+ + Logo + + +

+ + +[![Contributors][contributors-shield]][contributors-url] +[![Forks][forks-shield]][forks-url] +[![Issues][issues-shield]][issues-url] +[![OpenXLab_App][OpenXLab_App-image]][OpenXLab_App-url] +[![OpenXLab_Model][OpenXLab_Model-image]][OpenXLab_Model-url] +[![MIT License][license-shield]][license-url] +[![Stargazers][stars-shield]][stars-url] + +
+ +

EmoLLM

+ +
+ 简体中文| English +
+
+ 探索本项目的文档 » +
+
+ 体验EmoLLM 2.0 + · + 报告Bug + · + 提出新特性 +
+ + + + +**EmoLLM** 是一系列能够支持 **理解用户-支持用户-帮助用户** 心理健康辅导链路的心理健康大模型,由 `LLM`指令微调而来,欢迎大家star~⭐⭐。目前已经开源的 `LLM` 微调配置如下: + +
+ +| 模型 | 类型 | +| :-------------------: | :------: | +| InternLM2_7B_chat | QLORA | +| InternLM2_7B_chat | 全量微调 | +| InternLM2_1_8B_chat | 全量微调 | +| InternLM2_20B_chat | LORA | +| Qwen_7b_chat | QLORA | +| Qwen1_5-0_5B-Chat | 全量微调 | +| Baichuan2_13B_chat | QLORA | +| ChatGLM3_6B | LORA | +| DeepSeek MoE_16B_chat | QLORA | +| Mixtral 8x7B_instruct | QLORA | +| …… | …… | + +
+ +欢迎大家为本项目做出贡献~ + +--- + +心理健康大模型(Mental Health Grand Model)是一个综合性的概念,它旨在全面理解和促进个体、群体乃至整个社会的心理健康状态。这个模型通常包含以下几个关键组成部分: + +- 认知因素:涉及个体的思维模式、信念系统、认知偏差以及解决问题的能力。认知因素对心理健康有重要影响,因为它们影响个体如何解释和应对生活中的事件。 +- 情感因素:包括情绪调节、情感表达和情感体验。情感健康是心理健康的重要组成部分,涉及个体如何管理和表达自己的情感,以及如何从负面情绪中恢复。 +- 行为因素:涉及个体的行为模式、习惯和应对策略。这包括应对压力的技巧、社交技能以及自我效能感,即个体对自己能力的信心。 +- 社会环境:包括家庭、工作、社区和文化背景等外部因素,这些因素对个体的心理健康有着直接和间接的影响。 +- 生理健康:身体健康与心理健康紧密相关。良好的身体健康可以促进心理健康,反之亦然。 +- 心理韧性:指个体在面对逆境时的恢复力和适应能力。心理韧性强的人更能够从挑战中恢复,并从中学习和成长。 +- 预防和干预措施:心理健康大模型还包括预防心理问题和促进心理健康的策略,如心理教育、心理咨询、心理治疗和社会支持系统。 +- 评估和诊断工具:为了有效促进心理健康,需要有科学的工具来评估个体的心理状态,以及诊断可能存在的心理问题。 + +### 🎇最近更新 + +- 【2024.3.12】在百度飞浆平台发布[艾薇](https://aistudio.baidu.com/community/app/63335) +- 【2024.3.11】 **EmoLLM V2.0 相比 EmoLLM V1.0 全面提升,已超越 Role-playing ChatGPT 在心理咨询任务上的能力!**[点击体验EmoLLM V2.0](https://openxlab.org.cn/apps/detail/Farewell1/EmoLLMV2.0),更新[数据集统计及详细信息](./datasets/)、[路线图](./assets/Roadmap_ZH.png) +- 【2024.3.9】 新增并发功能加速 [QA 对生成](./scripts/qa_generation/)、[RAG pipeline](./rag/) +- 【2024.3.3】 [基于InternLM2-7B-chat全量微调版本EmoLLM V2.0开源](https://openxlab.org.cn/models/detail/ajupyter/EmoLLM_internlm2_7b_full),需要两块A100*80G,更新专业评估,详见[evaluate](./evaluate/),更新基于PaddleOCR的PDF转txt工具脚本,详见[scripts](./scripts/) +- 【2024.2.29】更新客观评估计算,详见[evaluate](./evaluate/),更新一系列数据集,详见[datasets](./datasets/) +- 【2024.2.27】更新英文readme和一系列数据集(舔狗和单轮对话) +- 【2024.2.23】推出基于InternLM2_7B_chat_qlora的 `温柔御姐心理医生艾薇`,[点击获取模型权重](https://openxlab.org.cn/models/detail/ajupyter/EmoLLM_aiwei),[配置文件](xtuner_config/aiwei-internlm2_chat_7b_qlora.py),[在线体验链接](https://openxlab.org.cn/apps/detail/ajupyter/EmoLLM-aiwei) +- 【2024.2.23】更新[若干微调配置](/xtuner_config/),新增 [data_pro.json](/datasets/data_pro.json)(数量更多、场景更全、更丰富)和 [aiwei.json](/datasets/aiwei.json)(温柔御姐角色扮演专用,带有Emoji表情),即将推出 `温柔御姐心理医生艾薇` +- 【2024.2.18】 [基于Qwen1_5-0_5B-Chat全量微调版本开源](https://www.modelscope.cn/models/aJupyter/EmoLLM_Qwen1_5-0_5B-Chat_full_sft/summary),算力有限的道友可以玩起来~ + +
+查看更多
+
+- 【2024.2.6】 EmoLLM在[**OpenXLab**](https://openxlab.org.cn/models/detail/jujimeizuo/EmoLLM_Model) 平台下载量高达18.7k,欢迎大家体验!
+

+ 模型下载量 +

+ +- 【2024.2.5】 项目荣获公众号**NLP工程化**推文宣传[推文链接](https://mp.weixin.qq.com/s/78lrRl2tlXEKUfElnkVx4A),为博主推广一波,欢迎大家关注!!🥳🥳 + +

+ 公众号二维码 +

+ +- 【2024.2.3】 [项目宣传视频](https://www.bilibili.com/video/BV1N7421N76X/)完成 😊 +- 【2024.1.27】 完善数据构建文档、微调指南、部署指南、Readme等相关文档 👏 +- 【2024.1.25】 EmoLLM V1.0 已部署上线 https://openxlab.org.cn/apps/detail/jujimeizuo/EmoLLM 😀 + +
+ +### 🎯路线图 + +

+ + Roadmap_ZH + + +### 🎯框架图 + +

+ + Roadmap_ZH + + +## 目录 + +- [EmoLLM-心理健康大模型](#emollm-心理健康大模型) + - [🎇最近更新](#最近更新) + - [🎯路线图](#路线图) + - [🎯框架图](#框架图) + - [目录](#目录) + - [开发前的配置要求](#开发前的配置要求) + - [**使用指南**](#使用指南) + - [数据构建](#数据构建) + - [微调指南](#微调指南) + - [部署指南](#部署指南) + - [RAG(检索增强生成)Pipeline](#rag检索增强生成pipeline) + - [使用到的框架](#使用到的框架) + - [如何参与本项目](#如何参与本项目) + - [作者(排名不分先后)](#作者排名不分先后) + - [版权说明](#版权说明) + - [特别鸣谢](#特别鸣谢) + - [Star History](#star-history) + - [🌟 Contributors](#-contributors) + - [交流群](#交流群) + +###### 开发前的配置要求 + +- 硬件:A100 40G(仅针对InternLM2_7B_chat+qlora微调+deepspeed zero2优化) + +###### **使用指南** + +1. Clone the repo + +```sh +git clone https://github.com/SmartFlowAI/EmoLLM.git +``` + +2. 依次阅读或者选择感兴趣的部分阅读: + - [数据构建](#数据构建) + - [微调指南](#微调指南) + - [部署指南](#部署指南) + - [RAG](#rag检索增强生成pipeline) + - 查看更多详情 + +### 数据构建 + +- 请阅读[数据构建指南](generate_data/tutorial.md)查阅 + +- 微调用到的数据集见[datasets](datasets/data.json) + +### 微调指南 + +详见[微调指南](xtuner_config/README.md) + +### 部署指南 + +- Demo部署:详见[部署指南](demo/README.md) +- 基于[LMDeploy](https://github.com/InternLM/lmdeploy/)的量化部署:详见[deploy](./deploy/lmdeploy.md) + +### RAG(检索增强生成)Pipeline + +- 详见[RAG](./rag/) + +

+更多详情
+
+### 使用到的框架
+
+- [Xtuner](https://github.com/InternLM/xtuner):用于微调
+- [Transformers](https://github.com/huggingface/transformers)
+- [PyTorch](https://pytorch.org/)
+- [LMDeploy](https://github.com/InternLM/lmdeploy/):用于量化部署
+- [Streamlit](https://streamlit.io/):用于构建Demo
+- [DeepSpeed](https://github.com/microsoft/DeepSpeed):并行训练
+- …
+
+#### 如何参与本项目
+
+贡献使开源社区成为一个学习、激励和创造的绝佳场所。你所作的任何贡献都是**非常感谢**的。
+
+1. Fork the Project
+2. Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
+3. Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
+4. Push to the Branch (`git push origin feature/AmazingFeature`)
+5. Open a Pull Request
+
+
+ +### 作者(排名不分先后) + +| 用户名 | 学校/组织 | 备注 | 贡献 | +| :----------: | :--------------------: | :-------------------: | :----------: | +| [aJupyter](https://github.com/aJupyter) | 南开大学在读硕士 | DataWhale成员 | 项目发起人 | +| [jujimeizuo](https://github.com/jujimeizuo) | 江南大学在读硕士 | | | +| [Smiling-Weeping-zhr](https://github.com/Smiling-Weeping-zhr) | 哈尔滨工业大学(威海)在读本科生 | | | +| [8baby8](https://github.com/8baby8) | 飞桨领航团区域主管 | 文心大模型核心开发者 | | +| [zxazys](https://github.com/zxazys) | 南开大学在读硕士 | | | +| [MING-ZCH](https://github.com/MING-ZCH) | 华中科技大学在读本科生 | | | +| [JasonLLLLLLLLLLL](https://github.com/JasonLLLLLLLLLLL) | swufe | | | +| [MrCatAI](https://github.com/MrCatAI) | AI搬用工 | | | +| [ZeyuBa](https://github.com/ZeyuBa) | 自动化所在读硕士 | | | +| [aiyinyuedejustin](https://github.com/aiyinyuedejustin) | 宾夕法尼亚大学在读硕士 | | | +| [Nobody-ML](https://github.com/Nobody-ML) | 中国石油大学(华东)在读本科生 | | | +| [chg0901](https://github.com/chg0901) | [MiniSora](https://github.com/mini-sora/minisora/) |MiniSora主要维护|数据清洗、文档翻译| +| [Mxoder](https://github.com/Mxoder) | 北京航空航天大学在读本科生 | | | +| [Anooyman](https://github.com/Anooyman) | 南京理工大学硕士 | | | +| [Vicky-3021](https://github.com/Vicky-3021) | 西安电子科技大学硕士(研0) | | | +| [SantiagoTOP](https://github.com/santiagoTOP) | 太原理工大学在读硕士 | | | +| [zealot52099](https://github.com/zealot52099) | AI搬用工 | |清洗数据、RAG| + +### 版权说明 + +该项目签署了 MIT 授权许可,详情请参阅 [LICENSE](https://github.com/SmartFlowAI/EmoLLM/blob/main/LICENSE) + + +### 引用 +如果本项目对您的工作有所帮助,请使用以下格式引用: + +```bibtex +@misc{EmoLLM, + title={EmoLLM}, + author={EmoLLM}, + url={https://github.com/SmartFlowAI/EmoLLM/}, + year={2024} +} +``` + +### 特别鸣谢 + +- [Sanbu](https://github.com/sanbuphy) +- [上海人工智能实验室](https://www.shlab.org.cn/) +- [闻星大佬(小助手)](https://github.com/vansin) +- [扫地升(公众号宣传)](https://mp.weixin.qq.com/s/78lrRl2tlXEKUfElnkVx4A) +- 阿布(北大心理学硕士) + + + + + + + +## Star History + +[![Star History Chart](https://api.star-history.com/svg?repos=SmartFlowAI/EmoLLM&type=Date)](https://star-history.com/#SmartFlowAI/EmoLLM&Date) + +## 🌟 Contributors + +[![EmoLLM contributors](https://contrib.rocks/image?repo=SmartFlowAI/EmoLLM&max=50)](https://github.com/SmartFlowAI/EmoLLM/graphs/contributors) + +[your-project-path]: SmartflowAI/EmoLLM +[contributors-shield]: https://img.shields.io/github/contributors/SmartflowAI/EmoLLM.svg?style=flat-square +[contributors-url]: https://github.com/SmartflowAI/EmoLLM/graphs/contributors +[forks-shield]: https://img.shields.io/github/forks/SmartflowAI/EmoLLM.svg?style=flat-square +[forks-url]: https://github.com/SmartflowAI/EmoLLM/network/members +[stars-shield]: https://img.shields.io/github/stars/SmartflowAI/EmoLLM.svg?style=flat-square +[stars-url]: https://github.com/SmartflowAI/EmoLLM/stargazers +[issues-shield]: https://img.shields.io/github/issues/SmartflowAI/EmoLLM.svg?style=flat-square +[issues-url]: https://img.shields.io/github/issues/SmartflowAI/EmoLLM.svg +[license-shield]: https://img.shields.io/github/license/SmartflowAI/EmoLLM.svg?style=flat-square +[license-url]: https://github.com/SmartFlowAI/EmoLLM/blob/main/LICENSE + +[OpenXLab_App-image]: https://cdn-static.openxlab.org.cn/app-center/openxlab_app.svg +[OpenXLab_Model-image]: https://cdn-static.openxlab.org.cn/header/openxlab_models.svg +[OpenXLab_App-url]: https://openxlab.org.cn/apps/detail/Farewell1/EmoLLMV2.0 +[OpenXLab_Model-url]: https://openxlab.org.cn/models/detail/ajupyter/EmoLLM_internlm2_7b_full + + +## 交流群 + +- 如果失效,请移步Issue区 + +

+ EmoLLM官方交流群 +

diff --git a/README_EN.md b/README_EN.md index 7464ade..a8a5a3e 100644 --- a/README_EN.md +++ b/README_EN.md @@ -1,300 +1,300 @@ -
- -# EmoLLM - Large Language Model for Mental Health - -
- -

- - Logo - - -

- - -[![Contributors][contributors-shield]][contributors-url] -[![Forks][forks-shield]][forks-url] -[![Issues][issues-shield]][issues-url] -[![OpenXLab_App][OpenXLab_App-image]][OpenXLab_App-url] -[![OpenXLab_Model][OpenXLab_Model-image]][OpenXLab_Model-url] -[![MIT License][license-shield]][license-url] -[![Stargazers][stars-shield]][stars-url] - -
- -

EmoLLM

- -

- 简体中文 | English -
-
- Explore the documentation of this project » -
-
- EmoLLM 2.0 Demo - · - Report a Bug - · - Propose a New Feature -

- -

- - - - -**EmoLLM** is a series of large language models designed to understand, support and help customers in mental health counseling. It is fine-tuned from the LLM instructions. We really appreciate it if you could give it a star~⭐⭐. The open-sourced configuration is as follows: - -
- -| Model | Type | -| :-------------------: | :------: | -| InternLM2_7B_chat | QLORA | -| InternLM2_7B_chat | full fine-tuning | -| InternLM2_1_8B_chat | full fine-tuning | -| InternLM2_20B_chat | LORA | -| Qwen_7b_chat | QLORA | -| Qwen1_5-0_5B-Chat | full fine-tuning | -| Baichuan2_13B_chat | QLORA | -| ChatGLM3_6B | LORA | -| DeepSeek MoE_16B_chat | QLORA | -| Mixtral 8x7B_instruct | QLORA | -| …… | …… | - -
- -Everyone is welcome to contribute to this project ~ - ---- - -The Model aims to fully understand and promote the mental health of individuals, groups, and society. This model typically includes the following key components: - -- Cognitive factors: Involving an individual's thought patterns, belief systems, cognitive biases, and problem-solving abilities. Cognitive factors significantly impact mental health as they affect how individuals interpret and respond to life events. -- Emotional factors: Including emotion regulation, emotional expression, and emotional experiences. Emotional health is a crucial part of mental health, involving how individuals manage and express their emotions and how they recover from negative emotions. -- Behavioral factors: Concerning an individual's behavior patterns, habits, and coping strategies. This includes stress management skills, social skills, and self-efficacy, which is the confidence in one's abilities. -- Social environment: Comprising external factors such as family, work, community, and cultural background, which have direct and indirect impacts on an individual's mental health. -- Physical health: There is a close relationship between physical and mental health. Good physical health can promote mental health and vice versa. -- Psychological resilience: Refers to an individual's ability to recover from adversity and adapt. Those with strong psychological resilience can bounce back from challenges and learn and grow from them. -- Prevention and intervention measures: The Mental Health Grand Model also includes strategies for preventing psychological issues and promoting mental health, such as psychological education, counseling, therapy, and social support systems. -- Assessment and diagnostic tools: Effective promotion of mental health requires scientific tools to assess individuals' psychological states and diagnose potential psychological issues. -### Recent Updates -- 【2024.3.12】 Released on Baidu Flying Pulp Platform [aiwei](https://aistudio.baidu.com/community/app/63335) -- 【2024.3.11】 **EmoLLM V2.0 is greatly improved in all scores compared to EmoLLM V1.0. Surpasses the performance of Role-playing ChatGPT on counseling tasks!** [Click to experience EmoLLM V2.0](https://openxlab.org.cn/apps/detail/Farewell1/EmoLLMV2.0), update [dataset statistics and details](./datasets/), [Roadmap](./assets/Roadmap_ZH.png) -- 【2024.3.9】 Add concurrency acceleration [QA pair generation](./scripts/qa_generation/), [RAG pipeline](./rag/) -- 【2024.3.3】 [Based on InternLM2-7B-chat full fine-tuned version EmoLLM V2.0 open sourced](https://openxlab.org.cn/models/detail/ajupyter/EmoLLM_internlm2_7b_full), need two A100*80G, update professional evaluation, see [evaluate](./evaluate/), update PaddleOCR-based PDF to txt tool scripts, see [scripts](./scripts/). -- 【2024.2.29】 Updated objective assessment calculations, see [evaluate](./evaluate/) for details. A series of datasets have also been updated, see [datasets](./datasets/) for details. -- 【2024.2.27】 Updated English README and a series of datasets (licking dogs and one-round dialogue) -- 【2024.2.23】The "Gentle Lady Psychologist Ai Wei" based on InternLM2_7B_chat_qlora was launched. 
[Click here to obtain the model weights](https://openxlab.org.cn/models/detail/ajupyter/EmoLLM_aiwei), [configuration file](xtuner_config/aiwei-internlm2_chat_7b_qlora.py), [online experience link](https://openxlab.org.cn/apps/detail/ajupyter/EmoLLM-aiwei) - -- 【2024.2.23】Updated [several fine-tuning configurations](/xtuner_config/), added [data_pro.json](/datasets/data_pro.json) (more quantity, more comprehensive scenarios, richer content) and [aiwei.json](/datasets/aiwei.json) (dedicated to the gentle lady role-play, featuring Emoji expressions), the "Gentle Lady Psychologist Ai Wei" is coming soon. - -- 【2024.2.18】 The full fine-tuned version based on Qwen1_5-0_5B-Chat has been [open-sourced](https://www.modelscope.cn/models/aJupyter/EmoLLM_Qwen1_5-0_5B-Chat_full_sft/summary). Friends with limited computational resources can now dive in and explore it. - - -
-View More - -- 【2024.2.6】 [Open-sourced based on the Qwen1_5-0_5B-Chat full-scale fine-tuned version](https://www.modelscope.cn/models/aJupyter/EmoLLM_Qwen1_5-0_5B-Chat_full_sft/summary), friends with limited computing power can start experimenting~ - -

- 模型下载量 -

- -- 【2024.2.5】 The project has been promoted by the official WeChat account NLP Engineering. Here's the [link](https://mp.weixin.qq.com/s/78lrRl2tlXEKUfElnkVx4A) to the article. Welcome everyone to follow!! 🥳🥳 - -

- 公众号二维码 -

- -- 【2024.2.3】 [Project Vedio](https://www.bilibili.com/video/BV1N7421N76X/) at bilibili 😊 -- 【2024.1.27】 Complete data construction documentation, fine-tuning guide, deployment guide, Readme, and other related documents 👏 -- 【2024.1.25】 EmoLLM V1.0 has deployed online https://openxlab.org.cn/apps/detail/jujimeizuo/EmoLLM 😀 - -
- -### Roadmap - -

- - Roadmap_EN - - -## Contents - -- [EmoLLM - Large Language Model for Mental Health](#emollm---large-language-model-for-mental-health) - - [Recent Updates](#recent-updates) - - [Roadmap](#roadmap) - - [Contents](#contents) - - [Pre-development Configuration Requirements.](#pre-development-configuration-requirements) - - [**User Guide**](#user-guide) - - [File Directory Explanation](#file-directory-explanation) - - [Data Construction](#data-construction) - - [Fine-tuning Guide](#fine-tuning-guide) - - [Deployment Guide](#deployment-guide) - - [RAG (Retrieval Augmented Generation) Pipeline](#rag-retrieval-augmented-generation-pipeline) - - [Frameworks Used](#frameworks-used) - - [How to participate in this project](#how-to-participate-in-this-project) - - [Version control](#version-control) - - [Authors (in no particular order)](#authors-in-no-particular-order) - - [Copyright Notice](#copyright-notice) - - [Acknowledgments](#acknowledgments) - - [Star History](#star-history) - - [🌟 Contributors](#-contributors) - - [Communication group](#communication-group) - -###### Pre-development Configuration Requirements. - -- A100 40G (specifically for InternLM2_7B_chat + qlora fine-tuning + deepspeed zero2 optimization) - -###### **User Guide** - -1. Clone the repo - -```sh -git clone https://github.com/SmartFlowAI/EmoLLM.git -``` - -1. Read in sequence or read sections you're interested in: - - [File Directory Explanation](#file-directory-explanation) - - [Data Construction](#data-construction) - - [Fine-tuning Guide](#fine-tuning-guide) - - [Deployment Guide](#deployment-guide) - - View More Details - - - -### File Directory Explanation - -``` -├─assets: Image Resources -├─datasets: Dataset -├─demo: demo scripts -├─generate_data: Data Generation Guide -│ └─xinghuo -├─scripts: Some Available Tools -└─xtuner_config:Fine-tuning Guide - └─images -``` - -### Data Construction - -- Please read the [Data Construction Guide ](generate_data/tutorial.md)for reference. - -- The dataset used for this fine-tuning can be found at [datasets](datasets/data.json) - -### Fine-tuning Guide - -For details, see the [fine-tuning guide](xtuner_config/README.md) - -### Deployment Guide - -- Demo deployment: see [deployment guide](./demo/README.md) for details. -- Quantitative deployment based on [LMDeploy](https://github.com/InternLM/lmdeploy/): see [deploy](./deploy/lmdeploy.md) - - -### RAG (Retrieval Augmented Generation) Pipeline -- See [RAG](./rag/) - -

-Additional Details - -### Frameworks Used - -- [Xtuner](https://github.com/InternLM/xtuner) -- [Transformers](https://github.com/huggingface/transformers) -- [Pytorch](https://pytorch.org/) -- [LMDeploy](https://github.com/InternLM/lmdeploy/): for quantitative deployment -- [Stremlit](https://streamlit.io/): for building demos -- [DeepSpeed](https://github.com/microsoft/DeepSpeed): for parallel training -- … - -#### How to participate in this project - -Contributions make the open-source community an excellent place for learning, inspiration, and creation. Any contribution you make is greatly appreciated. - -1. Fork the Project -2. Create your Feature Branch (`git checkout -b feature/AmazingFeature`) -3. Commit your Changes (`git commit -m 'Add some AmazingFeature'`) -4. Push to the Branch (`git push origin feature/AmazingFeature`) -5. Open a Pull Request - -### Version control - -This project uses Git for version control. You can see the currently available versions in the repository. - -
- -### Authors (in no particular order) - -| Username | School/Organization | Remarks | Contributions | -| :-------: | :-------------------: | :------------------: | :--------: | -| [aJupyter](https://github.com/aJupyter) | Nankai University, Master's student | DataWhale member | Project initiator | -| [jujimeizuo](https://github.com/jujimeizuo) | Jiangnan University, Master's student | | | -| [Smiling-Weeping-zhr](https://github.com/Smiling-Weeping-zhr) | Harbin Institute of Technology (Weihai), Undergraduate student | | | -| [8baby8](https://github.com/8baby8) | PaddlePaddle Pilot Team Regional Director | Wenxin Large Model core developer | | -| [zxazys](https://github.com/zxazys) | Nankai University, Master's student | | | -| [MING-ZCH](https://github.com/MING-ZCH) | Huazhong University of Science and Technology, Undergraduate student | | | -| [JasonLLLLLLLLLLL](https://github.com/JasonLLLLLLLLLLL) | SWUFE (Southwestern University of Finance and Economics) | | | -| [MrCatAI](https://github.com/MrCatAI) | AI Mover | | | -| [ZeyuBa](https://github.com/ZeyuBa) | Institute of Automation, Master's student | | | -| [aiyinyuedejustin](https://github.com/aiyinyuedejustin) | University of Pennsylvania, Master's student | | | -| [Nobody-ML](https://github.com/Nobody-ML) | China University of Petroleum (East China), Undergraduate student | | | -| [chg0901](https://github.com/chg0901) | [MiniSora](https://github.com/mini-sora/minisora) |Maintainer and Admin|Data Cleaning and Docs Translation| -| [Mxoder](https://github.com/Mxoder) | Beihang University, Undergraduate student | | | -| [Anooyman](https://github.com/Anooyman) | Nanjing University of Science and Technology, Master's student | | | -| [Vicky-3021](https://github.com/Vicky-3021) | Xidian University, Master's student (Research Year 0) | | | -| [SantiagoTOP](https://github.com/santiagoTOP) | Taiyuan University of Technology, Master's student | | | - - -### Copyright Notice - -The project is licensed under the MIT License. Please refer to the details - [LICENSE](https://github.com/aJupyter/EmoLLM/blob/master/LICENSE) - -### Acknowledgments - -- [Sanbu](https://github.com/sanbuphy) -- [Shanghai Artificial Intelligence Laboratory](https://www.shlab.org.cn/) -- [Vanin](https://github.com/vansin) -- [Bloom up (WeChat Official Account Promotion)](https://mp.weixin.qq.com/s/78lrRl2tlXEKUfElnkVx4A) -- Abu (M.A. 
in Psychology, Peking University) - - - - - - - - - -## Star History - -[![Star History Chart](https://api.star-history.com/svg?repos=SmartFlowAI/EmoLLM&type=Date)](https://star-history.com/#SmartFlowAI/EmoLLM&Date) - -## 🌟 Contributors - -[![EmoLLM contributors](https://contrib.rocks/image?repo=SmartFlowAI/EmoLLM&max=50)](https://github.com/SmartFlowAI/EmoLLM/graphs/contributors) - -[your-project-path]: SmartflowAI/EmoLLM -[contributors-shield]: https://img.shields.io/github/contributors/SmartflowAI/EmoLLM.svg?style=flat-square -[contributors-url]: https://github.com/SmartflowAI/EmoLLM/graphs/contributors -[forks-shield]: https://img.shields.io/github/forks/SmartflowAI/EmoLLM.svg?style=flat-square -[forks-url]: https://github.com/SmartflowAI/EmoLLM/network/members -[stars-shield]: https://img.shields.io/github/stars/SmartflowAI/EmoLLM.svg?style=flat-square -[stars-url]: https://github.com/SmartflowAI/EmoLLM/stargazers -[issues-shield]: https://img.shields.io/github/issues/SmartflowAI/EmoLLM.svg?style=flat-square -[issues-url]: https://img.shields.io/github/issues/SmartflowAI/EmoLLM.svg -[license-shield]: https://img.shields.io/github/license/SmartflowAI/EmoLLM.svg?style=flat-square -[license-url]: https://github.com/SmartflowAI/EmoLLM/blob/main/LICENSE - -[OpenXLab_App-image]: https://cdn-static.openxlab.org.cn/app-center/openxlab_app.svg -[OpenXLab_Model-image]: https://cdn-static.openxlab.org.cn/header/openxlab_models.svg -[OpenXLab_App-url]: https://openxlab.org.cn/apps/detail/Farewell1/EmoLLMV2.0 -[OpenXLab_Model-url]: https://openxlab.org.cn/models/detail/ajupyter/EmoLLM_internlm2_7b_full - -## Communication group -- If it fails, go to the Issue section. - -

- EmoLLM official communication group -

+
+ +# EmoLLM - Large Language Model for Mental Health + +
+ +

+ + Logo + + +

+ + +[![Contributors][contributors-shield]][contributors-url] +[![Forks][forks-shield]][forks-url] +[![Issues][issues-shield]][issues-url] +[![OpenXLab_App][OpenXLab_App-image]][OpenXLab_App-url] +[![OpenXLab_Model][OpenXLab_Model-image]][OpenXLab_Model-url] +[![MIT License][license-shield]][license-url] +[![Stargazers][stars-shield]][stars-url] + +
+ +

EmoLLM

+ +

+ 简体中文 | English +
+
+ Explore the documentation of this project » +
+
+ EmoLLM 2.0 Demo + · + Report a Bug + · + Propose a New Feature +

+ +

+ + + + +**EmoLLM** is a series of large language models designed to understand, support and help customers in mental health counseling. It is fine-tuned from the LLM instructions. We really appreciate it if you could give it a star~⭐⭐. The open-sourced configuration is as follows: + +
+ +| Model | Type | +| :-------------------: | :------: | +| InternLM2_7B_chat | QLORA | +| InternLM2_7B_chat | full fine-tuning | +| InternLM2_1_8B_chat | full fine-tuning | +| InternLM2_20B_chat | LORA | +| Qwen_7b_chat | QLORA | +| Qwen1_5-0_5B-Chat | full fine-tuning | +| Baichuan2_13B_chat | QLORA | +| ChatGLM3_6B | LORA | +| DeepSeek MoE_16B_chat | QLORA | +| Mixtral 8x7B_instruct | QLORA | +| …… | …… | + +
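The table above only records which base model and tuning recipe each release uses; a finished checkpoint is consumed exactly like its base model. As a rough, unofficial sketch (not the project's own loading script), the snippet below shows how the full fine-tuned InternLM2-7B weights linked in this README (OpenXLab: `ajupyter/EmoLLM_internlm2_7b_full`) might be loaded with Transformers. The local path is a placeholder, and the `chat()` helper is assumed to come from InternLM2's `trust_remote_code` modeling code.

```python
# Hedged sketch: loading a released EmoLLM checkpoint with Hugging Face Transformers.
# "./EmoLLM_internlm2_7b_full" is an assumed local download path, not an official identifier.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "./EmoLLM_internlm2_7b_full"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, trust_remote_code=True
).cuda().eval()

# InternLM2-chat style interface, provided by the model's remote code.
response, history = model.chat(tokenizer, "I have been feeling anxious lately. Can we talk?", history=[])
print(response)
```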
+ +Everyone is welcome to contribute to this project ~ + +--- + +The Model aims to fully understand and promote the mental health of individuals, groups, and society. This model typically includes the following key components: + +- Cognitive factors: Involving an individual's thought patterns, belief systems, cognitive biases, and problem-solving abilities. Cognitive factors significantly impact mental health as they affect how individuals interpret and respond to life events. +- Emotional factors: Including emotion regulation, emotional expression, and emotional experiences. Emotional health is a crucial part of mental health, involving how individuals manage and express their emotions and how they recover from negative emotions. +- Behavioral factors: Concerning an individual's behavior patterns, habits, and coping strategies. This includes stress management skills, social skills, and self-efficacy, which is the confidence in one's abilities. +- Social environment: Comprising external factors such as family, work, community, and cultural background, which have direct and indirect impacts on an individual's mental health. +- Physical health: There is a close relationship between physical and mental health. Good physical health can promote mental health and vice versa. +- Psychological resilience: Refers to an individual's ability to recover from adversity and adapt. Those with strong psychological resilience can bounce back from challenges and learn and grow from them. +- Prevention and intervention measures: The Mental Health Grand Model also includes strategies for preventing psychological issues and promoting mental health, such as psychological education, counseling, therapy, and social support systems. +- Assessment and diagnostic tools: Effective promotion of mental health requires scientific tools to assess individuals' psychological states and diagnose potential psychological issues. +### Recent Updates +- 【2024.3.12】 Released on Baidu Flying Pulp Platform [aiwei](https://aistudio.baidu.com/community/app/63335) +- 【2024.3.11】 **EmoLLM V2.0 is greatly improved in all scores compared to EmoLLM V1.0. Surpasses the performance of Role-playing ChatGPT on counseling tasks!** [Click to experience EmoLLM V2.0](https://openxlab.org.cn/apps/detail/Farewell1/EmoLLMV2.0), update [dataset statistics and details](./datasets/), [Roadmap](./assets/Roadmap_ZH.png) +- 【2024.3.9】 Add concurrency acceleration [QA pair generation](./scripts/qa_generation/), [RAG pipeline](./rag/) +- 【2024.3.3】 [Based on InternLM2-7B-chat full fine-tuned version EmoLLM V2.0 open sourced](https://openxlab.org.cn/models/detail/ajupyter/EmoLLM_internlm2_7b_full), need two A100*80G, update professional evaluation, see [evaluate](./evaluate/), update PaddleOCR-based PDF to txt tool scripts, see [scripts](./scripts/). +- 【2024.2.29】 Updated objective assessment calculations, see [evaluate](./evaluate/) for details. A series of datasets have also been updated, see [datasets](./datasets/) for details. +- 【2024.2.27】 Updated English README and a series of datasets (licking dogs and one-round dialogue) +- 【2024.2.23】The "Gentle Lady Psychologist Ai Wei" based on InternLM2_7B_chat_qlora was launched. 
[Click here to obtain the model weights](https://openxlab.org.cn/models/detail/ajupyter/EmoLLM_aiwei), [configuration file](xtuner_config/aiwei-internlm2_chat_7b_qlora.py), [online experience link](https://openxlab.org.cn/apps/detail/ajupyter/EmoLLM-aiwei) + +- 【2024.2.23】Updated [several fine-tuning configurations](/xtuner_config/), added [data_pro.json](/datasets/data_pro.json) (more quantity, more comprehensive scenarios, richer content) and [aiwei.json](/datasets/aiwei.json) (dedicated to the gentle lady role-play, featuring Emoji expressions), the "Gentle Lady Psychologist Ai Wei" is coming soon. + +- 【2024.2.18】 The full fine-tuned version based on Qwen1_5-0_5B-Chat has been [open-sourced](https://www.modelscope.cn/models/aJupyter/EmoLLM_Qwen1_5-0_5B-Chat_full_sft/summary). Friends with limited computational resources can now dive in and explore it. + + +
+View More
+
+- 【2024.2.6】 EmoLLM has reached 18.7k downloads on the [**OpenXLab**](https://openxlab.org.cn/models/detail/jujimeizuo/EmoLLM_Model) platform. Everyone is welcome to try it!
+

+ 模型下载量 +

+ +- 【2024.2.5】 The project has been promoted by the official WeChat account NLP Engineering. Here's the [link](https://mp.weixin.qq.com/s/78lrRl2tlXEKUfElnkVx4A) to the article. Welcome everyone to follow!! 🥳🥳 + +

+ 公众号二维码 +

+
+- 【2024.2.3】 [Project video](https://www.bilibili.com/video/BV1N7421N76X/) released on bilibili 😊
+- 【2024.1.27】 Improved the data construction documentation, fine-tuning guide, deployment guide, README, and other related documents 👏
+- 【2024.1.25】 EmoLLM V1.0 has been deployed online at https://openxlab.org.cn/apps/detail/jujimeizuo/EmoLLM 😀
+
+ +### Roadmap + +

+ + Roadmap_EN + + +## Contents + +- [EmoLLM - Large Language Model for Mental Health](#emollm---large-language-model-for-mental-health) + - [Recent Updates](#recent-updates) + - [Roadmap](#roadmap) + - [Contents](#contents) + - [Pre-development Configuration Requirements.](#pre-development-configuration-requirements) + - [**User Guide**](#user-guide) + - [File Directory Explanation](#file-directory-explanation) + - [Data Construction](#data-construction) + - [Fine-tuning Guide](#fine-tuning-guide) + - [Deployment Guide](#deployment-guide) + - [RAG (Retrieval Augmented Generation) Pipeline](#rag-retrieval-augmented-generation-pipeline) + - [Frameworks Used](#frameworks-used) + - [How to participate in this project](#how-to-participate-in-this-project) + - [Version control](#version-control) + - [Authors (in no particular order)](#authors-in-no-particular-order) + - [Copyright Notice](#copyright-notice) + - [Acknowledgments](#acknowledgments) + - [Star History](#star-history) + - [🌟 Contributors](#-contributors) + - [Communication group](#communication-group) + +###### Pre-development Configuration Requirements. + +- A100 40G (specifically for InternLM2_7B_chat + qlora fine-tuning + deepspeed zero2 optimization) + +###### **User Guide** + +1. Clone the repo + +```sh +git clone https://github.com/SmartFlowAI/EmoLLM.git +``` + +1. Read in sequence or read sections you're interested in: + - [File Directory Explanation](#file-directory-explanation) + - [Data Construction](#data-construction) + - [Fine-tuning Guide](#fine-tuning-guide) + - [Deployment Guide](#deployment-guide) + - View More Details + + + +### File Directory Explanation + +``` +├─assets: Image Resources +├─datasets: Dataset +├─demo: demo scripts +├─generate_data: Data Generation Guide +│ └─xinghuo +├─scripts: Some Available Tools +└─xtuner_config:Fine-tuning Guide + └─images +``` + +### Data Construction + +- Please read the [Data Construction Guide ](generate_data/tutorial.md)for reference. + +- The dataset used for this fine-tuning can be found at [datasets](datasets/data.json) + +### Fine-tuning Guide + +For details, see the [fine-tuning guide](xtuner_config/README.md) + +### Deployment Guide + +- Demo deployment: see [deployment guide](./demo/README.md) for details. +- Quantitative deployment based on [LMDeploy](https://github.com/InternLM/lmdeploy/): see [deploy](./deploy/lmdeploy.md) + + +### RAG (Retrieval Augmented Generation) Pipeline +- See [RAG](./rag/) + +
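To complement the deployment guide above (the authoritative steps are in [deploy](./deploy/lmdeploy.md)), here is a minimal, hedged sketch of serving a fine-tuned checkpoint through LMDeploy's Python `pipeline` API. The model path and generation settings are placeholders, and a recent LMDeploy release that exposes `pipeline` and `GenerationConfig` is assumed.

```python
# Hedged sketch: serving an EmoLLM checkpoint with LMDeploy's pipeline API.
# "./EmoLLM_internlm2_7b_full" is an assumed local path; see ./deploy/lmdeploy.md for the
# project's actual (quantized) deployment procedure.
from lmdeploy import GenerationConfig, pipeline

pipe = pipeline("./EmoLLM_internlm2_7b_full")
gen_cfg = GenerationConfig(max_new_tokens=512, top_p=0.8, temperature=0.7)

prompts = ["I have been under a lot of stress lately and cannot sleep. What can I do?"]
responses = pipe(prompts, gen_config=gen_cfg)
print(responses[0].text)
```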

+Additional Details
+
+### Frameworks Used
+
+- [Xtuner](https://github.com/InternLM/xtuner): for fine-tuning
+- [Transformers](https://github.com/huggingface/transformers)
+- [PyTorch](https://pytorch.org/)
+- [LMDeploy](https://github.com/InternLM/lmdeploy/): for quantized deployment
+- [Streamlit](https://streamlit.io/): for building demos
+- [DeepSpeed](https://github.com/microsoft/DeepSpeed): for parallel training
+- …
+
+#### How to participate in this project
+
+Contributions make the open-source community an excellent place for learning, inspiration, and creation. Any contribution you make is greatly appreciated.
+
+1. Fork the Project
+2. Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
+3. Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
+4. Push to the Branch (`git push origin feature/AmazingFeature`)
+5. Open a Pull Request
+
+### Version control
+
+This project uses Git for version control. You can see the currently available versions in the repository.
+
+ +### Authors (in no particular order) + +| Username | School/Organization | Remarks | Contributions | +| :-------: | :-------------------: | :------------------: | :--------: | +| [aJupyter](https://github.com/aJupyter) | Nankai University, Master's student | DataWhale member | Project initiator | +| [jujimeizuo](https://github.com/jujimeizuo) | Jiangnan University, Master's student | | | +| [Smiling-Weeping-zhr](https://github.com/Smiling-Weeping-zhr) | Harbin Institute of Technology (Weihai), Undergraduate student | | | +| [8baby8](https://github.com/8baby8) | PaddlePaddle Pilot Team Regional Director | Wenxin Large Model core developer | | +| [zxazys](https://github.com/zxazys) | Nankai University, Master's student | | | +| [MING-ZCH](https://github.com/MING-ZCH) | Huazhong University of Science and Technology, Undergraduate student | | | +| [JasonLLLLLLLLLLL](https://github.com/JasonLLLLLLLLLLL) | SWUFE (Southwestern University of Finance and Economics) | | | +| [MrCatAI](https://github.com/MrCatAI) | AI Mover | | | +| [ZeyuBa](https://github.com/ZeyuBa) | Institute of Automation, Master's student | | | +| [aiyinyuedejustin](https://github.com/aiyinyuedejustin) | University of Pennsylvania, Master's student | | | +| [Nobody-ML](https://github.com/Nobody-ML) | China University of Petroleum (East China), Undergraduate student | | | +| [chg0901](https://github.com/chg0901) | [MiniSora](https://github.com/mini-sora/minisora) |Maintainer and Admin|Data Cleaning and Docs Translation| +| [Mxoder](https://github.com/Mxoder) | Beihang University, Undergraduate student | | | +| [Anooyman](https://github.com/Anooyman) | Nanjing University of Science and Technology, Master's student | | | +| [Vicky-3021](https://github.com/Vicky-3021) | Xidian University, Master's student (Research Year 0) | | | +| [SantiagoTOP](https://github.com/santiagoTOP) | Taiyuan University of Technology, Master's student | | | +| [zealot52099](https://github.com/zealot52099) | AI Mover | |Data Processing and RAG| + +### Copyright Notice + +The project is licensed under the MIT License. Please refer to the details + [LICENSE](https://github.com/aJupyter/EmoLLM/blob/master/LICENSE) + +### Acknowledgments + +- [Sanbu](https://github.com/sanbuphy) +- [Shanghai Artificial Intelligence Laboratory](https://www.shlab.org.cn/) +- [Vanin](https://github.com/vansin) +- [Bloom up (WeChat Official Account Promotion)](https://mp.weixin.qq.com/s/78lrRl2tlXEKUfElnkVx4A) +- Abu (M.A. 
in Psychology, Peking University) + + + + + + + + + +## Star History + +[![Star History Chart](https://api.star-history.com/svg?repos=SmartFlowAI/EmoLLM&type=Date)](https://star-history.com/#SmartFlowAI/EmoLLM&Date) + +## 🌟 Contributors + +[![EmoLLM contributors](https://contrib.rocks/image?repo=SmartFlowAI/EmoLLM&max=50)](https://github.com/SmartFlowAI/EmoLLM/graphs/contributors) + +[your-project-path]: SmartflowAI/EmoLLM +[contributors-shield]: https://img.shields.io/github/contributors/SmartflowAI/EmoLLM.svg?style=flat-square +[contributors-url]: https://github.com/SmartflowAI/EmoLLM/graphs/contributors +[forks-shield]: https://img.shields.io/github/forks/SmartflowAI/EmoLLM.svg?style=flat-square +[forks-url]: https://github.com/SmartflowAI/EmoLLM/network/members +[stars-shield]: https://img.shields.io/github/stars/SmartflowAI/EmoLLM.svg?style=flat-square +[stars-url]: https://github.com/SmartflowAI/EmoLLM/stargazers +[issues-shield]: https://img.shields.io/github/issues/SmartflowAI/EmoLLM.svg?style=flat-square +[issues-url]: https://img.shields.io/github/issues/SmartflowAI/EmoLLM.svg +[license-shield]: https://img.shields.io/github/license/SmartflowAI/EmoLLM.svg?style=flat-square +[license-url]: https://github.com/SmartflowAI/EmoLLM/blob/main/LICENSE + +[OpenXLab_App-image]: https://cdn-static.openxlab.org.cn/app-center/openxlab_app.svg +[OpenXLab_Model-image]: https://cdn-static.openxlab.org.cn/header/openxlab_models.svg +[OpenXLab_App-url]: https://openxlab.org.cn/apps/detail/Farewell1/EmoLLMV2.0 +[OpenXLab_Model-url]: https://openxlab.org.cn/models/detail/ajupyter/EmoLLM_internlm2_7b_full + +## Communication group +- If it fails, go to the Issue section. + +

+ EmoLLM official communication group +

diff --git a/assets/框架图.png b/assets/框架图.png new file mode 100644 index 0000000..9985c9a Binary files /dev/null and b/assets/框架图.png differ diff --git a/generate_data/final_data/merge_jsonl.py b/generate_data/final_data/merge_jsonl.py new file mode 100644 index 0000000..b8edd10 --- /dev/null +++ b/generate_data/final_data/merge_jsonl.py @@ -0,0 +1,60 @@ +import json +import os + + +def save_merge_json(data_lis, file_path): + with open(file_path, 'wt', encoding='utf-8') as file: + json.dump(data_lis, file, ensure_ascii=False, separators=(',\n',':')) + + +def get_all_file_paths(folder_path, file_type='.jsonl'): + # 确保传入的是一个目录 + if not os.path.isdir(folder_path): + raise ValueError(f"{folder_path} is not a valid directory") + + # 获取文件夹下所有文件的路径 + file_paths = [os.path.join(folder_path, file) for file in os.listdir( + folder_path) if os.path.isfile(os.path.join(folder_path, file)) and (file_type in file)] + return file_paths + + +if __name__ == '__main__': + conversion_lis = [] + + folder_path = r'./' + + merge_path = folder_path.split('/')[-1] + try: + merge_last_path = folder_path.split('/')[-2] if folder_path.split('/')[-2]!='.' else '' + except: + merge_last_path = '' + print(f'merge_path={merge_path},merge_last_path={merge_last_path}') + + + for path in get_all_file_paths(folder_path): + print(path) + + with open(path, 'rt', encoding='utf-8') as file: + for line in file: + # # 移除行尾的换行符 + # if line == '\n': + # line = line.rstrip('\n') + line = line.rstrip('\n') + # 解析JSON + try: + data = json.loads(line) + conversion_lis.append(data) + # conversion_lis.append('\n') + except json.JSONDecodeError as e: + print(f"Error decoding JSON: {e}") + + if merge_last_path!='': + save_merge_json_path = rf'./{merge_last_path}/{merge_path}_merge.json' + elif merge_path!='': + save_merge_json_path = rf'./{merge_path}_merge.json' + else: + save_merge_json_path = rf'./curr_merge.json' + + save_merge_json(data_lis=conversion_lis, + file_path=save_merge_json_path) + print(len(conversion_lis),save_merge_json_path) diff --git a/generate_data/final_data/merge_jsonl_r.py b/generate_data/final_data/merge_jsonl_r.py new file mode 100644 index 0000000..a29c951 --- /dev/null +++ b/generate_data/final_data/merge_jsonl_r.py @@ -0,0 +1,75 @@ +import json +import os + + +def save_merge_json(data_lis, file_path): + with open(file_path, 'wt', encoding='utf-8') as file: + json.dump(data_lis, file, ensure_ascii=False, separators=(',\n',':')) + + +def get_all_file_paths(folder_path, file_type='.jsonl'): + # 确保传入的是一个目录 + if not os.path.isdir(folder_path): + raise ValueError(f"{folder_path} is not a valid directory") + + # 获取文件夹下所有文件的路径 + file_paths = [os.path.join(folder_path, file) for file in os.listdir( + folder_path) if os.path.isfile(os.path.join(folder_path, file)) and (file_type in file)] + return file_paths + + +if __name__ == '__main__': + + data_ai = 'qwen' # python merge_jsonl_r.py > qwen.txt + # data_ai = 'zhipuai' # python merge_jsonl_r.py > zhipuai.txt + root_dir = rf'./{data_ai}/' + + save_final_merge_json_path = f'{data_ai}_final_merge.json' + + subfolders = [os.path.join(root_dir, d) for d in os.listdir(root_dir) if os.path.isdir(os.path.join(root_dir, d))] + + final_list = [] + for folder_path in subfolders: + conversion_lis = [] + merge_path = folder_path.split('/')[-1] + try: + merge_last_path = folder_path.split('/')[-2] if folder_path.split('/')[-2]!='.' 
else '' + except: + merge_last_path = '' + print(f'merge_path={merge_path},merge_last_path={merge_last_path}') + + + for path in get_all_file_paths(folder_path): + print(path) + + with open(path, 'rt', encoding='utf-8') as file: + for line in file: + # # 移除行尾的换行符 + # if line == '\n': + # line = line.rstrip('\n') + line = line.rstrip('\n') + # 解析JSON + try: + data = json.loads(line) + conversion_lis.append(data) + # conversion_lis.append('\n') + except json.JSONDecodeError as e: + print(f"Error decoding JSON: {e}") + + if merge_last_path!='': + save_merge_json_path = rf'./{merge_last_path}/{merge_path}_merge.json' + elif merge_path!='': + save_merge_json_path = rf'./{merge_path}_merge.json' + else: + save_merge_json_path = rf'./curr_merge.json' + + save_merge_json(data_lis=conversion_lis, + file_path=save_merge_json_path) + + final_list = final_list+conversion_lis + print(len(conversion_lis),len(final_list),save_merge_json_path) + + save_merge_json(data_lis=final_list,file_path=save_final_merge_json_path) + print(save_final_merge_json_path) + + diff --git a/generate_data/tutorial.md b/generate_data/tutorial.md index 996a823..80426b4 100644 --- a/generate_data/tutorial.md +++ b/generate_data/tutorial.md @@ -100,7 +100,10 @@ 5. **数据集整合** - 在进行数据集整合之前,我们要检查生成的数据是否存在格式错误,类型不符合等情况。我们需要check.py进行检查数据。最后再使用merge_json.py将所有的json整合为一个总的json文件。 + 在进行数据集整合之前,我们要检查生成的数据是否存在格式错误,类型不符合等情况。 + +* 首先使用`check.py`进行数据检查。 +* 然后使用`merge_json.py`将所有的json整合为一个总的json文件。 6. **评估与优化** diff --git a/generate_data/zhipuai_gen_data.py b/generate_data/zhipuai_gen_data.py index 4370f1a..d959b03 100644 --- a/generate_data/zhipuai_gen_data.py +++ b/generate_data/zhipuai_gen_data.py @@ -34,11 +34,21 @@ def zhipu_api(data, emo): top_p = round(random.uniform(0.1, 0.9), 2) messages = getText('user', prompt) - response = client.chat.completions.create( - model='glm-4', - messages=messages, - top_p=top_p, - ) + + # Error code: 400, with error text {"error":{"code":"1301","message": + # "系统检测到输入或生成内容可能包含不安全或敏感内容,请您避免输入易产生敏感内容的提示语,感谢您的配合。"}} + try: + response = client.chat.completions.create( + model='glm-4', + messages=messages, + top_p=top_p, + ) + except: + response = client.chat.completions.create( + model='glm-4', + messages=messages, + top_p=top_p, + ) return response.choices[0].message.content diff --git a/scripts/qa_generation/Clean_QA.md b/scripts/qa_generation/Clean_QA.md deleted file mode 100644 index 9e0b6ec..0000000 --- a/scripts/qa_generation/Clean_QA.md +++ /dev/null @@ -1,11 +0,0 @@ -# 清洗 QA 对 -调用qwen去判断当前QA对是否属于心理学范畴,去除非心理学范畴的 QA 对 - -## Step 1 -1. 准备好需要清洗的 QA 对数据 -2. 将该数据放进 model 同级 data 文件夹下 -3. 根据文件夹名去修改 config/config.py 中的 judge_dir。我个人没有对文件名进行更改,所以我的judge_dir是 judge_dir = os.path.join(data_dir, '数据整合') - -## Step 2 -1. 运行QA_clean.py即可 -2. 
清洗完的 QA 对会以 jsonl 的格式存在 data/cleaned 下 \ No newline at end of file diff --git a/scripts/qa_generation/README.md b/scripts/qa_generation/README.md index 874427a..b0339a7 100644 --- a/scripts/qa_generation/README.md +++ b/scripts/qa_generation/README.md @@ -93,3 +93,34 @@ ## **步骤四:清洗QA对** - 清洗目的 + + - 提高提取的QA数据质量,清理掉与心理学无关的QA对 + +- 清洗方法 + + - 使用Prompt方法,驱动LLM对给出的QA对进行判断 + + - **参考Prompt** + + - ```markdown + 你是一名经验丰富的心理咨询师,熟悉心理学相关知识。根据我提供的 QA 对,来判断这个 QA 对是否属于心理学范畴。 + + 标准如下: + + - 若当前 QA 对属于心理学范畴,则返回1 + - 若当前 QA 对不属于心理学范畴,则返回0 + + + 以下是给定的心理学 QA 对内容: + ``` + +- 清洗工具 + - 配置`config/config.py` 中的 `DASHSCOPE_API_KEY`,`API_KEY`获取方法见步骤三 + - 使用提供的清洗脚本[QA_Clear](https://github.com/SmartFlowAI/EmoLLM/blob/main/scripts/qa_generation/QA_clean.py) + +- 使用方法 + - 准备好需要清洗的 QA 对数据 + - 将该数据放进 model 同级 data 文件夹下 + - 根据文件夹名去修改 `config/config.py` 中的 `judge_dir`。 + - 如存储数据的文件名为`xxx`,则`judge_dir`是 `judge_dir = os.path.join(data_dir, 'xxx')` + - 清洗完的 QA 对会以 `jsonl` 的格式存在 `data/cleaned` 下 diff --git a/scripts/qa_generation/README_EN.md b/scripts/qa_generation/README_EN.md index b2768df..112b07f 100644 --- a/scripts/qa_generation/README_EN.md +++ b/scripts/qa_generation/README_EN.md @@ -93,3 +93,40 @@ Using books specialized in psychology to build QA knowledge pairs for RAG to pro ## **Step 4: Cleaning of QA pairs** - Purpose of cleaning + - Improve the quality of extracted QA data and clean out QA pairs that are not relevant to psychology + +- Cleaning Methods + + - Use the Prompt method to drive the LLM to make a judgment on the given QA pairs + + - **Reference to Prompt** + + - ```markdown + You are an experienced counselor and are familiar with psychology. Based on the QA pair I have provided, determine if this QA pair is psychological in nature. + + The criteria are as follows: + + - If the current QA pair belongs to the category of psychology, then return 1 + - If the current QA pair does not belong to the category of psychology, then return 0. + + + The following is the content of the given psychology QA pair: + ``` + +- Cleaning Tools + + - Configure `DASHSCOPE_API_KEY` in `config/config.py`, see step 3 for how to get `API_KEY`. + + - Use the provided cleaning script [QA_Clear](https://github.com/SmartFlowAI/EmoLLM/blob/main/scripts/qa_generation/QA_clean.py) + +- How to use + + - Prepare the QA pair data to be cleaned + + - Put the data into the data folder of the same level as the model. + + - Modify `judge_dir` in `config/config.py` according to the folder name. + + - If the file name of the stored data is `xxx`, then `judge_dir` is `judge_dir = os.path.join(data_dir, 'xxx')`. + + - The cleaned QA pairs are stored as `jsonl` under `data/cleaned`.
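To make the cleaning workflow above concrete, the sketch below shows the general shape of the filtering loop: read QA pairs from a `.jsonl` file, ask the LLM judge for the 0/1 verdict defined by the reference prompt, and write only the pairs judged as psychology-related to `data/cleaned`. It is illustrative only; the field names and the `judge_is_psychology` helper are placeholders rather than the repository's actual `QA_clean.py`.

```python
# Illustrative sketch of the QA-cleaning loop (not the repository's QA_clean.py).
# judge_is_psychology() stands in for the DashScope/Qwen judging call configured via
# DASHSCOPE_API_KEY in config/config.py; the "question"/"answer" keys are assumed field names.
import json
from pathlib import Path


def judge_is_psychology(question: str, answer: str) -> bool:
    """Placeholder: send the reference judging prompt plus this QA pair to the LLM judge
    and return True only if the model answers 1."""
    raise NotImplementedError


def clean_qa_file(src: Path, dst: Path) -> None:
    kept = []
    with src.open(encoding="utf-8") as fin:
        for line in fin:
            line = line.strip()
            if not line:
                continue
            qa = json.loads(line)  # one QA pair per jsonl line
            if judge_is_psychology(qa.get("question", ""), qa.get("answer", "")):
                kept.append(qa)
    dst.parent.mkdir(parents=True, exist_ok=True)
    with dst.open("w", encoding="utf-8") as fout:
        for qa in kept:
            fout.write(json.dumps(qa, ensure_ascii=False) + "\n")


# Example: clean_qa_file(Path("data/xxx/qa_pairs.jsonl"), Path("data/cleaned/qa_pairs_cleaned.jsonl"))
```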