diff --git a/.vscode/settings.json b/.vscode/settings.json new file mode 100644 index 0000000..1f6e803 --- /dev/null +++ b/.vscode/settings.json @@ -0,0 +1,5 @@ +{ + "[python]": { + "editor.defaultFormatter": null + } +} \ No newline at end of file diff --git a/README.md b/README.md index c6625b5..76128f4 100644 --- a/README.md +++ b/README.md @@ -58,7 +58,7 @@ | DeepSeek MoE_16B_chat | QLORA | [deepseek_moe_16b_chat_qlora_oasst1_e3.py](./xtuner_config/deepseek_moe_16b_chat_qlora_oasst1_e3.py) | | | Mixtral 8x7B_instruct | QLORA | [mixtral_8x7b_instruct_qlora_oasst1_e3.py](./xtuner_config/mixtral_8x7b_instruct_qlora_oasst1_e3.py) | | | LLaMA3_8b_instruct | QLORA | [aiwei_llama3_8b_instruct_qlora_e3.py](./xtuner_config/aiwei_llama3_8b_instruct_qlora_e3.py) | | -| LLaMA3_8b_instruct | QLORA | [llama3_8b_instruct_qlora_alpaca_e3_M.py](./xtuner_config/llama3_8b_instruct_qlora_alpaca_e3_M.py) |[OpenXLab](https://openxlab.org.cn/models/detail/chg0901/EmoLLM-Llama3-8B-Instruct2.0), [ModelScope](https://modelscope.cn/models/chg0901/EmoLLM-Llama3-8B-Instruct2.0/summary) | +| LLaMA3_8b_instruct | QLORA | [llama3_8b_instruct_qlora_alpaca_e3_M_ruozhi_scM.py](./xtuner_config/llama3_8b_instruct_qlora_alpaca_e3_M_ruozhi_scM.py) |[OpenXLab](https://openxlab.org.cn/models/detail/chg0901/EmoLLM-Llama3-8B-Instruct3.0), [ModelScope](https://modelscope.cn/models/chg0901/EmoLLM-Llama3-8B-Instruct3.0/summary) | | …… | …… | …… | …… | @@ -97,44 +97,46 @@ -### 🎇最近更新 -- 【2024.4.20】[LLAMA3微调指南](xtuner_config/README_llama3_8b_instruct_qlora_alpaca_e3_M.md)及基于[LLaMA3_8b_instruct的艾薇](https://openxlab.org.cn/models/detail/ajupyter/EmoLLM-LLaMA3_8b_instruct_aiwei)开源 -- 【2023.4.14】新增[快速开始](docs/quick_start.md)和保姆级教程[BabyEmoLLM](Baby_EmoLLM.ipynb) -- 【2024.4.2】在 Huggingface 上传[老母亲心理咨询师](https://huggingface.co/brycewang2018/EmoLLM-mother/tree/main) -- 【2024.3.25】在百度飞桨平台发布[爹系男友心理咨询师](https://aistudio.baidu.com/community/app/68787) -- 【2024.3.24】在**OpenXLab**和**ModelScope**平台发布**InternLM2-Base-7B QLoRA微调模型**, 具体请查看[**InternLM2-Base-7B QLoRA**](./xtuner_config/README_internlm2_7b_base_qlora.md) -- 【2024.3.12】在百度飞桨平台发布[艾薇](https://aistudio.baidu.com/community/app/63335) -- 【2024.3.11】 **EmoLLM V2.0 相比 EmoLLM V1.0 全面提升,已超越 Role-playing ChatGPT 在心理咨询任务上的能力!**[点击体验EmoLLM V2.0](https://openxlab.org.cn/apps/detail/Farewell1/EmoLLMV2.0),更新[数据集统计及详细信息](./datasets/)、[路线图](./assets/Roadmap_ZH.png) -- 【2024.3.9】 新增并发功能加速 [QA 对生成](./scripts/qa_generation/)、[RAG pipeline](./rag/) -- 【2024.3.3】 [基于InternLM2-7B-chat全量微调版本EmoLLM V2.0开源](https://openxlab.org.cn/models/detail/ajupyter/EmoLLM_internlm2_7b_full),需要两块A100*80G,更新专业评估,详见[evaluate](./evaluate/),更新基于PaddleOCR的PDF转txt工具脚本,详见[scripts](./scripts/) -- 【2024.2.29】更新客观评估计算,详见[evaluate](./evaluate/),更新一系列数据集,详见[datasets](./datasets/) -- 【2024.2.27】更新英文readme和一系列数据集(舔狗和单轮对话) -- 【2024.2.23】推出基于InternLM2_7B_chat_qlora的 `温柔御姐心理医生艾薇`,[点击获取模型权重](https://openxlab.org.cn/models/detail/ajupyter/EmoLLM_aiwei),[配置文件](xtuner_config/aiwei-internlm2_chat_7b_qlora.py),[在线体验链接](https://openxlab.org.cn/apps/detail/ajupyter/EmoLLM-aiwei) -- 【2024.2.23】更新[若干微调配置](/xtuner_config/),新增 [data_pro.json](/datasets/data_pro.json)(数量更多、场景更全、更丰富)和 [aiwei.json](/datasets/aiwei.json)(温柔御姐角色扮演专用,带有Emoji表情),即将推出 `温柔御姐心理医生艾薇` -- 【2024.2.18】 [基于Qwen1_5-0_5B-Chat全量微调版本开源](https://www.modelscope.cn/models/aJupyter/EmoLLM_Qwen1_5-0_5B-Chat_full_sft/summary),算力有限的道友可以玩起来~ +## 🎇最近更新 + +- 【2024.05.04】基于LLaMA3_8b_instruct的[EmoLLM3.0 OpenXLab 
Demo](https://st-app-center-006861-9746-jlroxvg.openxlab.space/)上线([重启链接](https://openxlab.org.cn/apps/detail/chg0901/EmoLLM-Llama3-8B-Instruct3.0)), [**LLAMA3微调指南**](xtuner_config/README_llama3_8b_instruct_qlora_alpaca_e3_M.md)**更新**,在[**OpenXLab**](https://openxlab.org.cn/models/detail/chg0901/EmoLLM-Llama3-8B-Instruct3.0)和[**ModelScope**](https://modelscope.cn/models/chg0901/EmoLLM-Llama3-8B-Instruct3.0/summary)平台发布**LLaMA3_8b_instruct-8B QLoRA微调模型 EmoLLM3.0权重** +- 【2024.04.20】[LLAMA3微调指南](xtuner_config/README_llama3_8b_instruct_qlora_alpaca_e3_M.md)及基于[LLaMA3_8b_instruct的艾薇](https://openxlab.org.cn/models/detail/ajupyter/EmoLLM-LLaMA3_8b_instruct_aiwei)开源 +- 【2023.04.14】新增[快速开始](docs/quick_start.md)和保姆级教程[BabyEmoLLM](Baby_EmoLLM.ipynb) +- 【2024.04.02】在 Huggingface 上传[老母亲心理咨询师](https://huggingface.co/brycewang2018/EmoLLM-mother/tree/main) +- 【2024.03.25】在百度飞桨平台发布[爹系男友心理咨询师](https://aistudio.baidu.com/community/app/68787) +- 【2024.03.24】在**OpenXLab**和**ModelScope**平台发布**InternLM2-Base-7B QLoRA微调模型**, 具体请查看[**InternLM2-Base-7B QLoRA**](./xtuner_config/README_internlm2_7b_base_qlora.md) +- 【2024.03.12】在百度飞桨平台发布[艾薇](https://aistudio.baidu.com/community/app/63335) +- 【2024.03.11】 **EmoLLM V2.0 相比 EmoLLM V1.0 全面提升,已超越 Role-playing ChatGPT 在心理咨询任务上的能力!**[点击体验EmoLLM V2.0](https://openxlab.org.cn/apps/detail/Farewell1/EmoLLMV2.0),更新[数据集统计及详细信息](./datasets/)、[路线图](./assets/Roadmap_ZH.png) +- 【2024.03.09】 新增并发功能加速 [QA 对生成](./scripts/qa_generation/)、[RAG pipeline](./rag/) +- 【2024.03.03】 [基于InternLM2-7B-chat全量微调版本EmoLLM V2.0开源](https://openxlab.org.cn/models/detail/ajupyter/EmoLLM_internlm2_7b_full),需要两块A100*80G,更新专业评估,详见[evaluate](./evaluate/),更新基于PaddleOCR的PDF转txt工具脚本,详见[scripts](./scripts/) +- 【2024.02.29】更新客观评估计算,详见[evaluate](./evaluate/),更新一系列数据集,详见[datasets](./datasets/) +- 【2024.02.27】更新英文readme和一系列数据集(舔狗和单轮对话) +- 【2024.02.23】推出基于InternLM2_7B_chat_qlora的 `温柔御姐心理医生艾薇`,[点击获取模型权重](https://openxlab.org.cn/models/detail/ajupyter/EmoLLM_aiwei),[配置文件](xtuner_config/aiwei-internlm2_chat_7b_qlora.py),[在线体验链接](https://openxlab.org.cn/apps/detail/ajupyter/EmoLLM-aiwei) +- 【2024.02.23】更新[若干微调配置](/xtuner_config/),新增 [data_pro.json](/datasets/data_pro.json)(数量更多、场景更全、更丰富)和 [aiwei.json](/datasets/aiwei.json)(温柔御姐角色扮演专用,带有Emoji表情),即将推出 `温柔御姐心理医生艾薇` +- 【2024.02.18】 [基于Qwen1_5-0_5B-Chat全量微调版本开源](https://www.modelscope.cn/models/aJupyter/EmoLLM_Qwen1_5-0_5B-Chat_full_sft/summary),算力有限的道友可以玩起来~
查看更多 -- 【2024.2.6】 EmoLLM在[**Openxlab** ](https://openxlab.org.cn/models/detail/jujimeizuo/EmoLLM_Model) 平台下载量高达18.7k,欢迎大家体验! +- 【2024.02.06】 EmoLLM在[**Openxlab** ](https://openxlab.org.cn/models/detail/jujimeizuo/EmoLLM_Model) 平台下载量高达18.7k,欢迎大家体验!

模型下载量

-- 【2024.2.5】 项目荣获公众号**NLP工程化**推文宣传[推文链接](https://mp.weixin.qq.com/s/78lrRl2tlXEKUfElnkVx4A),为博主推广一波,欢迎大家关注!!🥳🥳 +- 【2024.02.05】 项目荣获公众号**NLP工程化**推文宣传[推文链接](https://mp.weixin.qq.com/s/78lrRl2tlXEKUfElnkVx4A),为博主推广一波,欢迎大家关注!!🥳🥳

公众号二维码

-- 【2024.2.3】 [项目宣传视频](https://www.bilibili.com/video/BV1N7421N76X/)完成 😊 -- 【2024.1.27】 完善数据构建文档、微调指南、部署指南、Readme等相关文档 👏 -- 【2024.1.25】 EmoLLM V1.0 已部署上线 https://openxlab.org.cn/apps/detail/jujimeizuo/EmoLLM 😀 +- 【2024.02.03】 [项目宣传视频](https://www.bilibili.com/video/BV1N7421N76X/)完成 😊 +- 【2024.01.27】 完善数据构建文档、微调指南、部署指南、Readme等相关文档 👏 +- 【2024.01.25】 EmoLLM V1.0 已部署上线 https://openxlab.org.cn/apps/detail/jujimeizuo/EmoLLM 😀
-### 🏆荣誉栏 +## 🏆荣誉栏 - 项目荣获上海人工智能实验室举办的**2024浦源大模型系列挑战赛春季赛*****创新创意奖*** @@ -145,14 +147,14 @@ - 项目荣获公众号**NLP工程化**[推文宣传](https://mp.weixin.qq.com/s/78lrRl2tlXEKUfElnkVx4A) -### 🎯路线图 +## 🎯路线图

Roadmap_ZH -### 🔗框架图 +## 🔗框架图

@@ -162,10 +164,10 @@ ## 目录 - [EmoLLM-心理健康大模型](#emollm-心理健康大模型) - - [🎇最近更新](#最近更新) - - [🏆荣誉栏](#荣誉栏) - - [🎯路线图](#路线图) - - [🔗框架图](#框架图) + - [🎇最近更新](#最近更新) + - [🏆荣誉栏](#荣誉栏) + - [🎯路线图](#路线图) + - [🔗框架图](#框架图) - [目录](#目录) - [开发前的配置要求](#开发前的配置要求) - [**使用指南**](#使用指南) diff --git a/README_EN.md b/README_EN.md index 2ccccf3..e9ab047 100644 --- a/README_EN.md +++ b/README_EN.md @@ -60,7 +60,7 @@ | DeepSeek MoE_16B_chat | QLORA | [deepseek_moe_16b_chat_qlora_oasst1_e3.py](./xtuner_config/deepseek_moe_16b_chat_qlora_oasst1_e3.py) | | | Mixtral 8x7B_instruct | QLORA | [mixtral_8x7b_instruct_qlora_oasst1_e3.py](./xtuner_config/mixtral_8x7b_instruct_qlora_oasst1_e3.py) | | | LLaMA3_8b_instruct | QLORA | [aiwei_llama3_8b_instruct_qlora_e3.py](./xtuner_config/aiwei_llama3_8b_instruct_qlora_e3.py) | | -| LLaMA3_8b_instruct | QLORA | [llama3_8b_instruct_qlora_alpaca_e3_M.py](./xtuner_config/llama3_8b_instruct_qlora_alpaca_e3_M.py) |[OpenXLab](https://openxlab.org.cn/models/detail/chg0901/EmoLLM-Llama3-8B-Instruct2.0), [ModelScope](https://modelscope.cn/models/chg0901/EmoLLM-Llama3-8B-Instruct2.0/summary) | +| LLaMA3_8b_instruct | QLORA | [llama3_8b_instruct_qlora_alpaca_e3_M_ruozhi_scM.py](./xtuner_config/llama3_8b_instruct_qlora_alpaca_e3_M_ruozhi_scM.py) |[OpenXLab](https://openxlab.org.cn/models/detail/chg0901/EmoLLM-Llama3-8B-Instruct3.0), [ModelScope](https://modelscope.cn/models/chg0901/EmoLLM-Llama3-8B-Instruct3.0/summary) | | …… | …… | …… | …… | @@ -71,7 +71,7 @@ Everyone is welcome to contribute to this project ~ The Model aims to fully understand and promote the mental health of individuals, groups, and society. This model typically includes the following key components: -- Cognitive factors: Involving an individual's thought patterns, belief systems, cognitive biases, and problem-solving abilities. Cognitive factors significantly impact mental health as they affect how individuals interpret and respond to life events. +- Cognitive factors: Involving an individual's thought patterns, belief systems, cognitive biases, and problem-solving abilities. Cognitive factors significantly impact mental health as they affect how individuals interpret and respond to life events. - Emotional factors: Including emotion regulation, emotional expression, and emotional experiences. Emotional health is a crucial part of mental health, involving how individuals manage and express their emotions and how they recover from negative emotions. - Behavioral factors: Concerning an individual's behavior patterns, habits, and coping strategies. This includes stress management skills, social skills, and self-efficacy, which is the confidence in one's abilities. - Social environment: Comprising external factors such as family, work, community, and cultural background, which have direct and indirect impacts on an individual's mental health. 
@@ -100,47 +100,49 @@ The Model aims to fully understand and promote the mental health of individuals, -### Recent Updates - - [2024.4.20] [LLAMA3 fine-tuning guide](xtuner_config/README_llama3_8b_instruct_qlora_alpaca_e3_M.md) and based on [LLaMA3_8b_instruct's aiwei](https://openxlab.org.cn/models/detail/ajupyter/EmoLLM-LLaMA3_8b_instruct_aiwei) open source -- [2023.4.14] Added [Quick Start](docs/quick_start_EN.md) and Nanny level tutorial [BabyEmoLLM](Baby_EmoLLM.ipynb) -- [2024.4.2] Uploaded at Huggingface [Old Mother Counsellor](https://huggingface.co/brycewang2018/EmoLLM-mother/tree/main) -- 【2024.3.25】 [Mother-like Therapist] is released on Huggingface (https://huggingface.co/brycewang2018/EmoLLM-mother/tree/main) -- 【2024.3.25】 [Daddy-like Boy-Friend] is released on Baidu Paddle-Paddle AI Studio Platform (https://aistudio.baidu.com/community/app/68787) -- 【2024.3.24】 The **InternLM2-Base-7B QLoRA fine-tuned model** has been released on the **OpenXLab** and **ModelScope** platforms. For more details, please refer to [**InternLM2-Base-7B QLoRA**](./xtuner_config/README_internlm2_7b_base_qlora.md). -- 【2024.3.12】 [aiwei] is released on Baidu Paddle-Paddle AI Studio Platform (https://aistudio.baidu.com/community/app/63335) -- 【2024.3.11】 **EmoLLM V2.0 is greatly improved in all scores compared to EmoLLM V1.0. Surpasses the performance of Role-playing ChatGPT on counseling tasks!** [Click to experience EmoLLM V2.0](https://openxlab.org.cn/apps/detail/Farewell1/EmoLLMV2.0), update [dataset statistics and details](./datasets/), [Roadmap](./assets/Roadmap_ZH.png) -- 【2024.3.9】 Add concurrency acceleration [QA pair generation](./scripts/qa_generation/), [RAG pipeline](./rag/) -- 【2024.3.3】 [Based on InternLM2-7B-chat full fine-tuned version EmoLLM V2.0 open sourced](https://openxlab.org.cn/models/detail/ajupyter/EmoLLM_internlm2_7b_full), need two A100*80G, update professional evaluation, see [evaluate](./evaluate/), update PaddleOCR-based PDF to txt tool scripts, see [scripts](./scripts/). -- 【2024.2.29】 Updated objective assessment calculations, see [evaluate](./evaluate/) for details. A series of datasets have also been updated, see [datasets](./datasets/) for details. -- 【2024.2.27】 Updated English README and a series of datasets (licking dogs and one-round dialogue) -- 【2024.2.23】The "Gentle Lady Psychologist Ai Wei" based on InternLM2_7B_chat_qlora was launched. [Click here to obtain the model weights](https://openxlab.org.cn/models/detail/ajupyter/EmoLLM_aiwei), [configuration file](xtuner_config/aiwei-internlm2_chat_7b_qlora.py), [online experience link](https://openxlab.org.cn/apps/detail/ajupyter/EmoLLM-aiwei) +## Recent Updates -- 【2024.2.23】Updated [several fine-tuning configurations](/xtuner_config/), added [data_pro.json](/datasets/data_pro.json) (more quantity, more comprehensive scenarios, richer content) and [aiwei.json](/datasets/aiwei.json) (dedicated to the gentle lady role-play, featuring Emoji expressions), the "Gentle Lady Psychologist Ai Wei" is coming soon. 
+- [2024.05.04] [EmoLLM3.0 OpenXLab Demo](https://st-app-center-006861-9746-jlroxvg.openxlab.space/) based on LLaMA3_8b_instruct is now available ([restart link](https://openxlab.org.cn/apps/detail/chg0901/EmoLLM-Llama3-8B-Instruct3.0)), the [LLAMA3 fine-tuning guide](xtuner_config/README_llama3_8b_instruct_qlora_alpaca_e3_M.md) is updated, and the LLaMA3_8b_instruct QLoRA fine-tuned model EmoLLM3.0 weights are released on the [**OpenXLab**](https://openxlab.org.cn/models/detail/chg0901/EmoLLM-Llama3-8B-Instruct3.0) and [**ModelScope**](https://modelscope.cn/models/chg0901/EmoLLM-Llama3-8B-Instruct3.0/summary) platforms
+- [2024.04.20] The [LLAMA3 fine-tuning guide](xtuner_config/README_llama3_8b_instruct_qlora_alpaca_e3_M.md) and the [aiwei model based on LLaMA3_8b_instruct](https://openxlab.org.cn/models/detail/ajupyter/EmoLLM-LLaMA3_8b_instruct_aiwei) are open-sourced
+- [2023.04.14] Added the [Quick Start](docs/quick_start_EN.md) guide and the nanny-level tutorial [BabyEmoLLM](Baby_EmoLLM.ipynb)
+- [2024.04.02] Uploaded the [Old Mother Counsellor](https://huggingface.co/brycewang2018/EmoLLM-mother/tree/main) model to Hugging Face
+- [2024.03.25] The [Mother-like Therapist](https://huggingface.co/brycewang2018/EmoLLM-mother/tree/main) is released on Hugging Face
+- [2024.03.25] The [Daddy-like Boy-Friend](https://aistudio.baidu.com/community/app/68787) is released on the Baidu PaddlePaddle AI Studio platform
+- [2024.03.24] The **InternLM2-Base-7B QLoRA fine-tuned model** has been released on the **OpenXLab** and **ModelScope** platforms. For more details, please refer to [**InternLM2-Base-7B QLoRA**](./xtuner_config/README_internlm2_7b_base_qlora.md).
+- [2024.03.12] [aiwei](https://aistudio.baidu.com/community/app/63335) is released on the Baidu PaddlePaddle AI Studio platform
+- [2024.03.11] **EmoLLM V2.0 is greatly improved in all scores compared to EmoLLM V1.0 and surpasses the performance of role-playing ChatGPT on counseling tasks!** [Click to experience EmoLLM V2.0](https://openxlab.org.cn/apps/detail/Farewell1/EmoLLMV2.0); updated the [dataset statistics and details](./datasets/) and the [Roadmap](./assets/Roadmap_ZH.png)
+- [2024.03.09] Added concurrency acceleration for [QA pair generation](./scripts/qa_generation/) and the [RAG pipeline](./rag/)
+- [2024.03.03] [The EmoLLM V2.0 full fine-tuned version based on InternLM2-7B-chat is open-sourced](https://openxlab.org.cn/models/detail/ajupyter/EmoLLM_internlm2_7b_full) (requires two A100 80G GPUs); updated the professional evaluation, see [evaluate](./evaluate/), and the PaddleOCR-based PDF-to-txt tool scripts, see [scripts](./scripts/).
+- [2024.02.29] Updated the objective assessment calculations, see [evaluate](./evaluate/) for details. A series of datasets have also been updated, see [datasets](./datasets/) for details.
+- [2024.02.27] Updated the English README and a series of datasets (licking-dog and single-turn dialogue)
+- [2024.02.23] The "Gentle Lady Psychologist Ai Wei" based on InternLM2_7B_chat_qlora was launched. [Click here to obtain the model weights](https://openxlab.org.cn/models/detail/ajupyter/EmoLLM_aiwei), [configuration file](xtuner_config/aiwei-internlm2_chat_7b_qlora.py), [online experience link](https://openxlab.org.cn/apps/detail/ajupyter/EmoLLM-aiwei)
-- 【2024.2.18】 The full fine-tuned version based on Qwen1_5-0_5B-Chat has been [open-sourced](https://www.modelscope.cn/models/aJupyter/EmoLLM_Qwen1_5-0_5B-Chat_full_sft/summary). Friends with limited computational resources can now dive in and explore it.
+- [2024.02.23] Updated [several fine-tuning configurations](/xtuner_config/), added [data_pro.json](/datasets/data_pro.json) (more quantity, more comprehensive scenarios, richer content) and [aiwei.json](/datasets/aiwei.json) (dedicated to the gentle lady role-play, featuring Emoji expressions); the "Gentle Lady Psychologist Ai Wei" is coming soon.
+
+- [2024.02.18] The full fine-tuned version based on Qwen1_5-0_5B-Chat has been [open-sourced](https://www.modelscope.cn/models/aJupyter/EmoLLM_Qwen1_5-0_5B-Chat_full_sft/summary). Friends with limited computational resources can now dive in and explore it.

View More
-- 【2024.2.6】 [Open-sourced based on the Qwen1_5-0_5B-Chat full-scale fine-tuned version](https://www.modelscope.cn/models/aJupyter/EmoLLM_Qwen1_5-0_5B-Chat_full_sft/summary), friends with limited computing power can start experimenting~
+- [2024.02.06] [The full-scale fine-tuned version based on Qwen1_5-0_5B-Chat is open-sourced](https://www.modelscope.cn/models/aJupyter/EmoLLM_Qwen1_5-0_5B-Chat_full_sft/summary); friends with limited computing power can start experimenting~

模型下载量

-- 【2024.2.5】 The project has been promoted by the official WeChat account NLP Engineering. Here's the [link](https://mp.weixin.qq.com/s/78lrRl2tlXEKUfElnkVx4A) to the article. Welcome everyone to follow!! 🥳🥳 +- [2024.02.05] The project has been promoted by the official WeChat account NLP Engineering. Here's the [link](https://mp.weixin.qq.com/s/78lrRl2tlXEKUfElnkVx4A) to the article. Welcome everyone to follow!! 🥳🥳

公众号二维码

-- 【2024.2.3】 [Project Vedio](https://www.bilibili.com/video/BV1N7421N76X/) at bilibili 😊
-- 【2024.1.27】 Complete data construction documentation, fine-tuning guide, deployment guide, Readme, and other related documents 👏
-- 【2024.1.25】 EmoLLM V1.0 has deployed online https://openxlab.org.cn/apps/detail/jujimeizuo/EmoLLM 😀
+- [2024.02.03] [Project Video](https://www.bilibili.com/video/BV1N7421N76X/) released on Bilibili 😊
+- [2024.01.27] Completed the data construction documentation, fine-tuning guide, deployment guide, README, and other related documents 👏
+- [2024.01.25] EmoLLM V1.0 has been deployed online at https://openxlab.org.cn/apps/detail/jujimeizuo/EmoLLM 😀
-### Honors
+## Honors
- The project won the ***Innovation and Creativity Award*** in the **2024 Puyuan Large Model Series Challenge Spring Competition held by the Shanghai Artificial Intelligence Laboratory**
@@ -149,10 +151,9 @@ The Model aims to fully understand and promote the mental health of individuals,
Challenge Innovation and Creativity Award

- - The project has been promoted by the official WeChat account **NLP Engineering**. Here's the [link](https://mp.weixin.qq.com/s/78lrRl2tlXEKUfElnkVx4A). -### Roadmap +## Roadmap

@@ -162,9 +163,9 @@ The Model aims to fully understand and promote the mental health of individuals, ## Contents - [EmoLLM - Large Language Model for Mental Health](#emollm---large-language-model-for-mental-health) - - [Recent Updates](#recent-updates) - - [Honors](#honors) - - [Roadmap](#roadmap) + - [Recent Updates](#recent-updates) + - [Honors](#honors) + - [Roadmap](#roadmap) - [Contents](#contents) - [Pre-development Configuration Requirements.](#pre-development-configuration-requirements) - [**User Guide**](#user-guide) diff --git a/app.py b/app.py index f6d30fc..03099a0 100644 --- a/app.py +++ b/app.py @@ -1,335 +1,19 @@ -import copy import os -import warnings -from dataclasses import asdict, dataclass -from typing import Callable, List, Optional -import streamlit as st -import torch -from torch import nn -from transformers.generation.utils import LogitsProcessorList, StoppingCriteriaList -from transformers.utils import logging +os.system('streamlit run web_demo-Llama3.py --server.address=0.0.0.0 --server.port 7860') -from transformers import AutoTokenizer, AutoModelForCausalLM # isort: skip +model = "EmoLLM_aiwei" +# model = "EmoLLM_Model" +# model = "Llama3_Model" - -# warnings.filterwarnings("ignore") -logger = logging.get_logger(__name__) - - -@dataclass -class GenerationConfig: - # this config is used for chat to provide more diversity - max_length: int = 32768 - top_p: float = 0.8 - temperature: float = 0.8 - do_sample: bool = True - repetition_penalty: float = 1.005 - - -@torch.inference_mode() -def generate_interactive( - model, - tokenizer, - prompt, - generation_config: Optional[GenerationConfig] = None, - logits_processor: Optional[LogitsProcessorList] = None, - stopping_criteria: Optional[StoppingCriteriaList] = None, - prefix_allowed_tokens_fn: Optional[Callable[[int, torch.Tensor], List[int]]] = None, - additional_eos_token_id: Optional[int] = None, - **kwargs, -): - inputs = tokenizer([prompt], padding=True, return_tensors="pt") - input_length = len(inputs["input_ids"][0]) - for k, v in inputs.items(): - inputs[k] = v.cuda() - input_ids = inputs["input_ids"] - batch_size, input_ids_seq_length = input_ids.shape[0], input_ids.shape[-1] # noqa: F841 # pylint: disable=W0612 - if generation_config is None: - generation_config = model.generation_config - generation_config = copy.deepcopy(generation_config) - model_kwargs = generation_config.update(**kwargs) - bos_token_id, eos_token_id = ( # noqa: F841 # pylint: disable=W0612 - generation_config.bos_token_id, - generation_config.eos_token_id, - ) - if isinstance(eos_token_id, int): - eos_token_id = [eos_token_id] - if additional_eos_token_id is not None: - eos_token_id.append(additional_eos_token_id) - has_default_max_length = kwargs.get("max_length") is None and generation_config.max_length is not None - if has_default_max_length and generation_config.max_new_tokens is None: - warnings.warn( - f"Using `max_length`'s default ({generation_config.max_length}) to control the generation length. 
" - "This behaviour is deprecated and will be removed from the config in v5 of Transformers -- we" - " recommend using `max_new_tokens` to control the maximum length of the generation.", - UserWarning, - ) - elif generation_config.max_new_tokens is not None: - generation_config.max_length = generation_config.max_new_tokens + input_ids_seq_length - if not has_default_max_length: - logger.warn( # pylint: disable=W4902 - f"Both `max_new_tokens` (={generation_config.max_new_tokens}) and `max_length`(=" - f"{generation_config.max_length}) seem to have been set. `max_new_tokens` will take precedence. " - "Please refer to the documentation for more information. " - "(https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)", - UserWarning, - ) - - if input_ids_seq_length >= generation_config.max_length: - input_ids_string = "input_ids" - logger.warning( - f"Input length of {input_ids_string} is {input_ids_seq_length}, but `max_length` is set to" - f" {generation_config.max_length}. This can lead to unexpected behavior. You should consider" - " increasing `max_new_tokens`." - ) - - # 2. Set generation parameters if not already defined - logits_processor = logits_processor if logits_processor is not None else LogitsProcessorList() - stopping_criteria = stopping_criteria if stopping_criteria is not None else StoppingCriteriaList() - - logits_processor = model._get_logits_processor( - generation_config=generation_config, - input_ids_seq_length=input_ids_seq_length, - encoder_input_ids=input_ids, - prefix_allowed_tokens_fn=prefix_allowed_tokens_fn, - logits_processor=logits_processor, - ) - - stopping_criteria = model._get_stopping_criteria( - generation_config=generation_config, stopping_criteria=stopping_criteria - ) - logits_warper = model._get_logits_warper(generation_config) - - unfinished_sequences = input_ids.new(input_ids.shape[0]).fill_(1) - scores = None - while True: - model_inputs = model.prepare_inputs_for_generation(input_ids, **model_kwargs) - # forward pass to get next token - outputs = model( - **model_inputs, - return_dict=True, - output_attentions=False, - output_hidden_states=False, - ) - - next_token_logits = outputs.logits[:, -1, :] - - # pre-process distribution - next_token_scores = logits_processor(input_ids, next_token_logits) - next_token_scores = logits_warper(input_ids, next_token_scores) - - # sample - probs = nn.functional.softmax(next_token_scores, dim=-1) - if generation_config.do_sample: - next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1) - else: - next_tokens = torch.argmax(probs, dim=-1) - - # update generated ids, model inputs, and length for next step - input_ids = torch.cat([input_ids, next_tokens[:, None]], dim=-1) - model_kwargs = model._update_model_kwargs_for_generation(outputs, model_kwargs, is_encoder_decoder=False) - unfinished_sequences = unfinished_sequences.mul((min(next_tokens != i for i in eos_token_id)).long()) - - output_token_ids = input_ids[0].cpu().tolist() - output_token_ids = output_token_ids[input_length:] - for each_eos_token_id in eos_token_id: - if output_token_ids[-1] == each_eos_token_id: - output_token_ids = output_token_ids[:-1] - response = tokenizer.decode(output_token_ids) - - yield response - # stop when each sentence is finished, or if we exceed the maximum length - if unfinished_sequences.max() == 0 or stopping_criteria(input_ids, scores): - break - - -def on_btn_click(): - del st.session_state.messages - - -@st.cache_resource -def load_model(): - - # model_name0 = 
"./EmoLLM-Llama3-8B-Instruct3.0" - # print(model_name0) - - # print('pip install modelscope websockets') - # os.system(f'pip install modelscope websockets==11.0.3') - # from modelscope import snapshot_download - - # #模型下载 - # model_name = snapshot_download('chg0901/EmoLLM-Llama3-8B-Instruct3.0',cache_dir=model_name0) - # print(model_name) - - # model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", trust_remote_code=True, torch_dtype=torch.float16).eval() - # # model.eval() - # tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True) - - base_path = './EmoLLM-Llama3-8B-Instruct3.0' - os.system(f'git clone https://code.openxlab.org.cn/chg0901/EmoLLM-Llama3-8B-Instruct3.0.git {base_path}') - os.system(f'cd {base_path} && git lfs pull') - - - model = AutoModelForCausalLM.from_pretrained(base_path, device_map="auto", trust_remote_code=True, torch_dtype=torch.float16).eval() - # model.eval() - tokenizer = AutoTokenizer.from_pretrained(base_path, trust_remote_code=True) - - if tokenizer.pad_token is None: - tokenizer.pad_token = tokenizer.eos_token - - return model, tokenizer - - -def prepare_generation_config(): - with st.sidebar: - # 使用 Streamlit 的 markdown 函数添加 Markdown 文本 - st.image('assets/EmoLLM_logo_L.png', width=1, caption='EmoLLM Logo', use_column_width=True) - st.markdown("[访问 **EmoLLM** 官方repo: **SmartFlowAI/EmoLLM**](https://github.com/SmartFlowAI/EmoLLM)") - - max_length = st.slider("Max Length", min_value=8, max_value=32768, value=32768) - top_p = st.slider("Top P", 0.0, 1.0, 0.8, step=0.01) - temperature = st.slider("Temperature", 0.0, 1.0, 0.7, step=0.01) - st.button("Clear Chat History", on_click=on_btn_click) - - generation_config = GenerationConfig(max_length=max_length, top_p=top_p, temperature=temperature) - - return generation_config - - -user_prompt = '<|start_header_id|>user<|end_header_id|>\n\n{user}<|eot_id|>' -robot_prompt = '<|start_header_id|>assistant<|end_header_id|>\n\n{robot}<|eot_id|>' -cur_query_prompt = '<|start_header_id|>user<|end_header_id|>\n\n{user}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n' - - -def combine_history(prompt): - messages = st.session_state.messages - meta_instruction = ( - "你是心理健康助手EmoLLM, 由EmoLLM团队打造, 是一个研究过无数具有心理健康问题的病人与心理健康医生对话的心理专家, 在心理方面拥有广博的知识储备和丰富的研究咨询经验。你旨在通过专业心理咨询, 协助来访者完成心理诊断。请充分利用专业心理学知识与咨询技术, 一步步帮助来访者解决心理问题。\n\n" - ) - total_prompt =f"<|start_header_id|>system<|end_header_id|>\n\n{meta_instruction}<|eot_id|>\n\n" - for message in messages: - cur_content = message["content"] - if message["role"] == "user": - cur_prompt = user_prompt.format(user=cur_content) - elif message["role"] == "robot": - cur_prompt = robot_prompt.format(robot=cur_content) - else: - raise RuntimeError - total_prompt += cur_prompt - total_prompt = total_prompt + cur_query_prompt.format(user=prompt) - return total_prompt - - -def main(): - - # torch.cuda.empty_cache() - print("load model begin.") - model, tokenizer = load_model() - print("load model end.") - - user_avator = "assets/user.png" - robot_avator = "assets/EmoLLM.png" - - st.title("EmoLLM Llama3心理咨询室V3.0") - - generation_config = prepare_generation_config() - - # Initialize chat history - if "messages" not in st.session_state: - st.session_state.messages = [] - - # Display chat messages from history on app rerun - for message in st.session_state.messages: - with st.chat_message(message["role"], avatar=message.get("avatar")): - st.markdown(message["content"]) - - # Accept user input - if prompt := st.chat_input("我在这里,准备好倾听你的心声了。"): - # Display 
user message in chat message container - with st.chat_message("user", avatar=user_avator): - st.markdown(prompt) - - real_prompt = combine_history(prompt) - # Add user message to chat history - st.session_state.messages.append({"role": "user", "content": prompt, "avatar": user_avator}) - - with st.chat_message("robot", avatar=robot_avator): - message_placeholder = st.empty() - for cur_response in generate_interactive( - model=model, - tokenizer=tokenizer, - prompt=real_prompt, - additional_eos_token_id=128009, - **asdict(generation_config), - ): - # Display robot response in chat message container - message_placeholder.markdown(cur_response + "▌") - message_placeholder.markdown(cur_response) # pylint: disable=undefined-loop-variable - # Add robot response to chat history - st.session_state.messages.append( - { - "role": "robot", - "content": cur_response, # pylint: disable=undefined-loop-variable - "avatar": robot_avator, - } - ) - torch.cuda.empty_cache() - - -# if __name__ == '__main__': -# main() - - -# torch.cuda.empty_cache() -print("load model begin.") -model, tokenizer = load_model() -print("load model end.") - -user_avator = "assets/user.png" -robot_avator = "assets/EmoLLM.png" - -st.title("EmoLLM Llama3心理咨询室V3.0") - -generation_config = prepare_generation_config() - -# Initialize chat history -if "messages" not in st.session_state: - st.session_state.messages = [] - -# Display chat messages from history on app rerun -for message in st.session_state.messages: - with st.chat_message(message["role"], avatar=message.get("avatar")): - st.markdown(message["content"]) - -# Accept user input -if prompt := st.chat_input("我在这里,准备好倾听你的心声了。"): - # Display user message in chat message container - with st.chat_message("user", avatar=user_avator): - st.markdown(prompt) - - real_prompt = combine_history(prompt) - # Add user message to chat history - st.session_state.messages.append({"role": "user", "content": prompt, "avatar": user_avator}) - - with st.chat_message("robot", avatar=robot_avator): - message_placeholder = st.empty() - for cur_response in generate_interactive( - model=model, - tokenizer=tokenizer, - prompt=real_prompt, - additional_eos_token_id=128009, - **asdict(generation_config), - ): - # Display robot response in chat message container - message_placeholder.markdown(cur_response + "▌") - message_placeholder.markdown(cur_response) # pylint: disable=undefined-loop-variable - # Add robot response to chat history - st.session_state.messages.append( - { - "role": "robot", - "content": cur_response, # pylint: disable=undefined-loop-variable - "avatar": robot_avator, - } - ) - torch.cuda.empty_cache() +if model == "EmoLLM_aiwei": + os.system("python download_model.py ajupyter/EmoLLM_aiwei") + os.system('streamlit run web_demo-aiwei.py --server.address=0.0.0.0 --server.port 7860') +elif model == "EmoLLM_Model": + os.system("python download_model.py jujimeizuo/EmoLLM_Model") + os.system('streamlit run web_internlm2.py --server.address=0.0.0.0 --server.port 7860') +elif model == "Llama3_Model": + os.system("python download_model.py chg0901/EmoLLM-Llama3-8B-Instruct3.0") + os.system('streamlit run web_demo_Llama3.py --server.address=0.0.0.0 --server.port 7860') +else: + print("Please select one model") diff --git a/app_bk.py b/app_bk.py deleted file mode 100644 index 41c903b..0000000 --- a/app_bk.py +++ /dev/null @@ -1,356 +0,0 @@ -# import os - -# os.system('streamlit run web_demo-Llama3.py --server.address=0.0.0.0 --server.port 7860') - -# # #model = "EmoLLM_aiwei" -# # # model = 
"EmoLLM_Model" -# # model = "Llama3_Model" - -# # if model == "EmoLLM_aiwei": -# # os.system("python download_model.py ajupyter/EmoLLM_aiwei") -# # os.system('streamlit run web_demo-aiwei.py --server.address=0.0.0.0 --server.port 7860') -# # elif model == "EmoLLM_Model": -# # os.system("python download_model.py jujimeizuo/EmoLLM_Model") -# # os.system('streamlit run web_internlm2.py --server.address=0.0.0.0 --server.port 7860') -# # elif model == "Llama3_Model": -# # os.system('streamlit run web_demo_Llama3.py --server.address=0.0.0.0 --server.port 7860') -# # else: -# # print("Please select one model") - - - -import copy -import os -import warnings -from dataclasses import asdict, dataclass -from typing import Callable, List, Optional - -import streamlit as st -import torch -from torch import nn -from transformers.generation.utils import LogitsProcessorList, StoppingCriteriaList -from transformers.utils import logging - -from transformers import AutoTokenizer, AutoModelForCausalLM # isort: skip - - -# warnings.filterwarnings("ignore") -logger = logging.get_logger(__name__) - - -@dataclass -class GenerationConfig: - # this config is used for chat to provide more diversity - max_length: int = 32768 - top_p: float = 0.8 - temperature: float = 0.8 - do_sample: bool = True - repetition_penalty: float = 1.005 - - -@torch.inference_mode() -def generate_interactive( - model, - tokenizer, - prompt, - generation_config: Optional[GenerationConfig] = None, - logits_processor: Optional[LogitsProcessorList] = None, - stopping_criteria: Optional[StoppingCriteriaList] = None, - prefix_allowed_tokens_fn: Optional[Callable[[int, torch.Tensor], List[int]]] = None, - additional_eos_token_id: Optional[int] = None, - **kwargs, -): - inputs = tokenizer([prompt], padding=True, return_tensors="pt") - input_length = len(inputs["input_ids"][0]) - for k, v in inputs.items(): - inputs[k] = v.cuda() - input_ids = inputs["input_ids"] - batch_size, input_ids_seq_length = input_ids.shape[0], input_ids.shape[-1] # noqa: F841 # pylint: disable=W0612 - if generation_config is None: - generation_config = model.generation_config - generation_config = copy.deepcopy(generation_config) - model_kwargs = generation_config.update(**kwargs) - bos_token_id, eos_token_id = ( # noqa: F841 # pylint: disable=W0612 - generation_config.bos_token_id, - generation_config.eos_token_id, - ) - if isinstance(eos_token_id, int): - eos_token_id = [eos_token_id] - if additional_eos_token_id is not None: - eos_token_id.append(additional_eos_token_id) - has_default_max_length = kwargs.get("max_length") is None and generation_config.max_length is not None - if has_default_max_length and generation_config.max_new_tokens is None: - warnings.warn( - f"Using `max_length`'s default ({generation_config.max_length}) to control the generation length. " - "This behaviour is deprecated and will be removed from the config in v5 of Transformers -- we" - " recommend using `max_new_tokens` to control the maximum length of the generation.", - UserWarning, - ) - elif generation_config.max_new_tokens is not None: - generation_config.max_length = generation_config.max_new_tokens + input_ids_seq_length - if not has_default_max_length: - logger.warn( # pylint: disable=W4902 - f"Both `max_new_tokens` (={generation_config.max_new_tokens}) and `max_length`(=" - f"{generation_config.max_length}) seem to have been set. `max_new_tokens` will take precedence. " - "Please refer to the documentation for more information. 
" - "(https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)", - UserWarning, - ) - - if input_ids_seq_length >= generation_config.max_length: - input_ids_string = "input_ids" - logger.warning( - f"Input length of {input_ids_string} is {input_ids_seq_length}, but `max_length` is set to" - f" {generation_config.max_length}. This can lead to unexpected behavior. You should consider" - " increasing `max_new_tokens`." - ) - - # 2. Set generation parameters if not already defined - logits_processor = logits_processor if logits_processor is not None else LogitsProcessorList() - stopping_criteria = stopping_criteria if stopping_criteria is not None else StoppingCriteriaList() - - logits_processor = model._get_logits_processor( - generation_config=generation_config, - input_ids_seq_length=input_ids_seq_length, - encoder_input_ids=input_ids, - prefix_allowed_tokens_fn=prefix_allowed_tokens_fn, - logits_processor=logits_processor, - ) - - stopping_criteria = model._get_stopping_criteria( - generation_config=generation_config, stopping_criteria=stopping_criteria - ) - logits_warper = model._get_logits_warper(generation_config) - - unfinished_sequences = input_ids.new(input_ids.shape[0]).fill_(1) - scores = None - while True: - model_inputs = model.prepare_inputs_for_generation(input_ids, **model_kwargs) - # forward pass to get next token - outputs = model( - **model_inputs, - return_dict=True, - output_attentions=False, - output_hidden_states=False, - ) - - next_token_logits = outputs.logits[:, -1, :] - - # pre-process distribution - next_token_scores = logits_processor(input_ids, next_token_logits) - next_token_scores = logits_warper(input_ids, next_token_scores) - - # sample - probs = nn.functional.softmax(next_token_scores, dim=-1) - if generation_config.do_sample: - next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1) - else: - next_tokens = torch.argmax(probs, dim=-1) - - # update generated ids, model inputs, and length for next step - input_ids = torch.cat([input_ids, next_tokens[:, None]], dim=-1) - model_kwargs = model._update_model_kwargs_for_generation(outputs, model_kwargs, is_encoder_decoder=False) - unfinished_sequences = unfinished_sequences.mul((min(next_tokens != i for i in eos_token_id)).long()) - - output_token_ids = input_ids[0].cpu().tolist() - output_token_ids = output_token_ids[input_length:] - for each_eos_token_id in eos_token_id: - if output_token_ids[-1] == each_eos_token_id: - output_token_ids = output_token_ids[:-1] - response = tokenizer.decode(output_token_ids) - - yield response - # stop when each sentence is finished, or if we exceed the maximum length - if unfinished_sequences.max() == 0 or stopping_criteria(input_ids, scores): - break - - -def on_btn_click(): - del st.session_state.messages - - -@st.cache_resource -def load_model(): - - # model_name0 = "./EmoLLM-Llama3-8B-Instruct3.0" - # print(model_name0) - - # print('pip install modelscope websockets') - # os.system(f'pip install modelscope websockets==11.0.3') - # from modelscope import snapshot_download - - # #模型下载 - # model_name = snapshot_download('chg0901/EmoLLM-Llama3-8B-Instruct3.0',cache_dir=model_name0) - # print(model_name) - - # model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", trust_remote_code=True, torch_dtype=torch.float16).eval() - # # model.eval() - # tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True) - - base_path = './EmoLLM-Llama3-8B-Instruct3.0' - os.system(f'git clone 
https://code.openxlab.org.cn/chg0901/EmoLLM-Llama3-8B-Instruct3.0.git {base_path}') - os.system(f'cd {base_path} && git lfs pull') - - - model = AutoModelForCausalLM.from_pretrained(base_path, device_map="auto", trust_remote_code=True, torch_dtype=torch.float16).eval() - # model.eval() - tokenizer = AutoTokenizer.from_pretrained(base_path, trust_remote_code=True) - - if tokenizer.pad_token is None: - tokenizer.pad_token = tokenizer.eos_token - - return model, tokenizer - - -def prepare_generation_config(): - with st.sidebar: - # 使用 Streamlit 的 markdown 函数添加 Markdown 文本 - st.image('assets/EmoLLM_logo_L.png', width=1, caption='EmoLLM Logo', use_column_width=True) - st.markdown("[访问 **EmoLLM** 官方repo: **SmartFlowAI/EmoLLM**](https://github.com/SmartFlowAI/EmoLLM)") - - max_length = st.slider("Max Length", min_value=8, max_value=32768, value=32768) - top_p = st.slider("Top P", 0.0, 1.0, 0.8, step=0.01) - temperature = st.slider("Temperature", 0.0, 1.0, 0.7, step=0.01) - st.button("Clear Chat History", on_click=on_btn_click) - - generation_config = GenerationConfig(max_length=max_length, top_p=top_p, temperature=temperature) - - return generation_config - - -user_prompt = '<|start_header_id|>user<|end_header_id|>\n\n{user}<|eot_id|>' -robot_prompt = '<|start_header_id|>assistant<|end_header_id|>\n\n{robot}<|eot_id|>' -cur_query_prompt = '<|start_header_id|>user<|end_header_id|>\n\n{user}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n' - - -def combine_history(prompt): - messages = st.session_state.messages - meta_instruction = ( - "你是心理健康助手EmoLLM, 由EmoLLM团队打造, 是一个研究过无数具有心理健康问题的病人与心理健康医生对话的心理专家, 在心理方面拥有广博的知识储备和丰富的研究咨询经验。你旨在通过专业心理咨询, 协助来访者完成心理诊断。请充分利用专业心理学知识与咨询技术, 一步步帮助来访者解决心理问题。\n\n" - ) - total_prompt =f"<|start_header_id|>system<|end_header_id|>\n\n{meta_instruction}<|eot_id|>\n\n" - for message in messages: - cur_content = message["content"] - if message["role"] == "user": - cur_prompt = user_prompt.format(user=cur_content) - elif message["role"] == "robot": - cur_prompt = robot_prompt.format(robot=cur_content) - else: - raise RuntimeError - total_prompt += cur_prompt - total_prompt = total_prompt + cur_query_prompt.format(user=prompt) - return total_prompt - - -def main(): - - # torch.cuda.empty_cache() - print("load model begin.") - model, tokenizer = load_model() - print("load model end.") - - user_avator = "assets/user.png" - robot_avator = "assets/EmoLLM.png" - - st.title("EmoLLM Llama3心理咨询室V3.0") - - generation_config = prepare_generation_config() - - # Initialize chat history - if "messages" not in st.session_state: - st.session_state.messages = [] - - # Display chat messages from history on app rerun - for message in st.session_state.messages: - with st.chat_message(message["role"], avatar=message.get("avatar")): - st.markdown(message["content"]) - - # Accept user input - if prompt := st.chat_input("我在这里,准备好倾听你的心声了。"): - # Display user message in chat message container - with st.chat_message("user", avatar=user_avator): - st.markdown(prompt) - - real_prompt = combine_history(prompt) - # Add user message to chat history - st.session_state.messages.append({"role": "user", "content": prompt, "avatar": user_avator}) - - with st.chat_message("robot", avatar=robot_avator): - message_placeholder = st.empty() - for cur_response in generate_interactive( - model=model, - tokenizer=tokenizer, - prompt=real_prompt, - additional_eos_token_id=128009, - **asdict(generation_config), - ): - # Display robot response in chat message container - message_placeholder.markdown(cur_response + 
"▌") - message_placeholder.markdown(cur_response) # pylint: disable=undefined-loop-variable - # Add robot response to chat history - st.session_state.messages.append( - { - "role": "robot", - "content": cur_response, # pylint: disable=undefined-loop-variable - "avatar": robot_avator, - } - ) - torch.cuda.empty_cache() - - -# if __name__ == '__main__': -# main() - - -# torch.cuda.empty_cache() -print("load model begin.") -model, tokenizer = load_model() -print("load model end.") - -user_avator = "assets/user.png" -robot_avator = "assets/EmoLLM.png" - -st.title("EmoLLM Llama3心理咨询室V3.0") - -generation_config = prepare_generation_config() - -# Initialize chat history -if "messages" not in st.session_state: - st.session_state.messages = [] - -# Display chat messages from history on app rerun -for message in st.session_state.messages: - with st.chat_message(message["role"], avatar=message.get("avatar")): - st.markdown(message["content"]) - -# Accept user input -if prompt := st.chat_input("我在这里,准备好倾听你的心声了。"): - # Display user message in chat message container - with st.chat_message("user", avatar=user_avator): - st.markdown(prompt) - - real_prompt = combine_history(prompt) - # Add user message to chat history - st.session_state.messages.append({"role": "user", "content": prompt, "avatar": user_avator}) - - with st.chat_message("robot", avatar=robot_avator): - message_placeholder = st.empty() - for cur_response in generate_interactive( - model=model, - tokenizer=tokenizer, - prompt=real_prompt, - additional_eos_token_id=128009, - **asdict(generation_config), - ): - # Display robot response in chat message container - message_placeholder.markdown(cur_response + "▌") - message_placeholder.markdown(cur_response) # pylint: disable=undefined-loop-variable - # Add robot response to chat history - st.session_state.messages.append( - { - "role": "robot", - "content": cur_response, # pylint: disable=undefined-loop-variable - "avatar": robot_avator, - } - ) - torch.cuda.empty_cache() diff --git a/app_web_demo-Llama3.py b/app_web_demo-Llama3.py index 09421c3..bd5a6a8 100644 --- a/app_web_demo-Llama3.py +++ b/app_web_demo-Llama3.py @@ -151,11 +151,13 @@ def on_btn_click(): @st.cache_resource def load_model(): - # model_name0 = "./EmoLLM-Llama3-8B-Instruct3.0" - # print(model_name0) - print('pip install modelscope websockets') os.system(f'pip install modelscope websockets==11.0.3') + + ######## old model downloading method with modelscope ######## + # model_name0 = "./EmoLLM-Llama3-8B-Instruct3.0" + # print(model_name0) + # from modelscope import snapshot_download # #模型下载 @@ -166,11 +168,11 @@ def load_model(): # # model.eval() # tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True) + ######## new model downloading method with openxlab ######## base_path = './EmoLLM-Llama3-8B-Instruct3.0' os.system(f'git clone https://code.openxlab.org.cn/chg0901/EmoLLM-Llama3-8B-Instruct3.0.git {base_path}') os.system(f'cd {base_path} && git lfs pull') - model = AutoModelForCausalLM.from_pretrained(base_path, device_map="auto", trust_remote_code=True, torch_dtype=torch.float16).eval() # model.eval() tokenizer = AutoTokenizer.from_pretrained(base_path, trust_remote_code=True) diff --git a/assets/new_openxlab_app_demo.png b/assets/new_openxlab_app_demo.png new file mode 100644 index 0000000..3612f34 Binary files /dev/null and b/assets/new_openxlab_app_demo.png differ diff --git a/xtuner_config/README_llama3_8b_instruct_qlora_alpaca_e3_M.md b/xtuner_config/README_llama3_8b_instruct_qlora_alpaca_e3_M.md 
index 147960c..417e2d6 100644 --- a/xtuner_config/README_llama3_8b_instruct_qlora_alpaca_e3_M.md +++ b/xtuner_config/README_llama3_8b_instruct_qlora_alpaca_e3_M.md @@ -18,9 +18,13 @@ 已经上传了最新的训练配置文件, 进行了些许改动, 训练数据中添加了85条自我认知数据和240条弱智吧数据. +### 简评 + +Llama3 由于中文训练数据较少,因此微调后,部分中文逻辑能力会稍弱一些,后续会继续更新对基于中文对齐后的Llama3模型的EmoLLM模型微调训练。 + ### 更新的文件 -- 原始自我认知数据 [**self_cognition_EmoLLM.json**](../datasets/self_cognition_EmoLLM.json)(修改自[ChatGLM-Efficient-Tuning](https://github.com/hiyouga/ChatGLM-Efficient-Tuning/blob/main/data/self_cognition.json)) +- 原始自我认知数据 [**self_cognition_EmoLLM.json**](../datasets/self_cognition_EmoLLM.json) (修改自[ChatGLM-Efficient-Tuning](https://github.com/hiyouga/ChatGLM-Efficient-Tuning/blob/main/data/self_cognition.json)) - 处理后的符合对话格式的自我认知数据 [**processed_self_cognition_EmoLLM.json**](../datasets/processed/processed_self_cognition_EmoLLM.json) - 配置文件 [llama3_8b_instruct_qlora_alpaca_e3_M_ruozhi_scM.py](./llama3_8b_instruct_qlora_alpaca_e3_M_ruozhi_scM.py) - 弱智吧原始数据 [**ruozhiba_raw.jsonl**](../datasets/ruozhiba_raw.jsonl) @@ -28,6 +32,20 @@ - ruozhiba_raw_data_process.py处理之后的弱智吧数据 [**ruozhiba_format_emo.jsonl**](../datasets/processed/ruozhiba_format_emo.jsonl) - 数据集划分工具代码 [**split_dataset.py**](../datasets/split_dataset.py) - 调用split_dataset.py的示例代码 [**split_shuffle.py**](../datasets/split_shuffle.py) +- [![Open in OpenXLab](https://cdn-static.openxlab.org.cn/app-center/openxlab_app.svg)](https://openxlab.org.cn/apps/detail/chg0901/EmoLLM-Llama3-8B-Instruct3.0) **OpenXLab**部署文件及说明 + - 目前OpenXLab采用了新的模型下载方式 + + ```Python + base_path = './EmoLLM-Llama3-8B-Instruct3.0' + os.system(f'git clone https://code.openxlab.org.cn/chg0901/EmoLLM-Llama3-8B-Instruct3.0.git {base_path}') + os.system(f'cd {base_path} && git lfs pull') + ``` + + - 启动文件 [app_web_demo-Llama3.py](../app_web_demo-Llama3.py) + - git lfs 依赖文件 [packages.txt](../packages.txt) + - 部署时注意其他设置和具体细节,请参照[openxlab-deploy](https://github.com/InternLM/Tutorial/tree/camp2/tools/openxlab-deploy) + ![](../assets/new_openxlab_app_demo.png) + - 在线体验链接 [EmoLLM Llama3心理咨询室V3.0](https://st-app-center-006861-9746-jlroxvg.openxlab.space/) ,或者前往[OpenXLab EmoLLM3.0-Llama3](https://openxlab.org.cn/apps/detail/chg0901/EmoLLM-Llama3-8B-Instruct3.0)启动 ### 更新的有关参考教程 @@ -91,9 +109,9 @@ llama3_chat=dict( - 微调模型是为对话应用训练的。 - 为了获得它们的预期特性和性能,需要遵循 [ChatFormat](https://github.com/meta-llama/llama3/blob/main/llama/tokenizer.py#L202) 中定义的特定格式: - 1. 提示以特殊令牌 <|begin_of_text|> 开始,之后跟随一个或多个消息。 - 2. 每条消息以标签 <|start_header_id|> 开始,角色为 system、user 或 assistant,并以标签 <|end_header_id|> 结束。 - 3. 在双换行 \n\n 之后,消息的内容随之而来。每条消息的结尾由 <|eot_id|> 令牌标记。 + 1. 提示以特殊令牌 `<|begin_of_text|>` 开始,之后跟随一个或多个消息。 + 2. 每条消息以标签 `<|start_header_id|>` 开始,角色为 `system`、`user` 或 `assistant`,并以标签 `<|end_header_id|>` 结束。 + 3. 在双换行 `\n\n` 之后,消息的内容随之而来。每条消息的结尾由 `<|eot_id|>` 令牌标记。 - Ref: [ArtificialZeng/llama3_explained](https://github.com/ArtificialZeng/llama3_explained) ### 安装XTuner-0.1.18
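补充示例(仅作示意,非本仓库的正式接口):下面的 Python 片段演示如何按照上文 ChatFormat 规则手动拼接 Llama3 的对话提示词,便于理解 `<|begin_of_text|>`、`<|start_header_id|>`、`<|eot_id|>` 等特殊令牌的组织方式;其中函数名 `build_llama3_prompt` 与示例消息均为假设,实际部署可直接参考本仓库 web demo 中的 `combine_history` 实现。

```Python
# 示意代码:按上文 ChatFormat 规则拼接 Llama3 对话提示词(函数名与示例消息均为假设)
def build_llama3_prompt(system: str, history: list, query: str) -> str:
    # 提示词以特殊令牌 <|begin_of_text|> 开始
    prompt = "<|begin_of_text|>"
    # system 消息:<|start_header_id|>角色<|end_header_id|>\n\n内容<|eot_id|>
    prompt += f"<|start_header_id|>system<|end_header_id|>\n\n{system}<|eot_id|>"
    # 多轮历史:user 与 assistant 消息交替出现
    for user_msg, assistant_msg in history:
        prompt += f"<|start_header_id|>user<|end_header_id|>\n\n{user_msg}<|eot_id|>"
        prompt += f"<|start_header_id|>assistant<|end_header_id|>\n\n{assistant_msg}<|eot_id|>"
    # 当前用户问题,并以 assistant 头部收尾,让模型从这里继续生成
    prompt += f"<|start_header_id|>user<|end_header_id|>\n\n{query}<|eot_id|>"
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt


if __name__ == "__main__":
    print(build_llama3_prompt(
        system="你是心理健康助手EmoLLM。",
        history=[("最近压力很大。", "我在这里,愿意听你聊聊。")],
        query="我该怎么调节情绪?",
    ))
```

生成时可将 `<|eot_id|>`(token id 128009)作为额外的结束符,与本仓库 web demo 中 `additional_eos_token_id=128009` 的设置一致。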