From e32d9a11c2bbf7cc4523c1655f875c8cef003015 Mon Sep 17 00:00:00 2001
From: MING_X <119648793+MING-ZCH@users.noreply.github.com>
Date: Fri, 1 Mar 2024 20:32:26 +0800
Subject: [PATCH 1/4] Update and rename README.md to General evaluation.md

---
 evaluate/{README.md => General evaluation.md} | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
 rename evaluate/{README.md => General evaluation.md} (87%)

diff --git a/evaluate/README.md b/evaluate/General evaluation.md
similarity index 87%
rename from evaluate/README.md
rename to evaluate/General evaluation.md
index c15464e..1c716be 100644
--- a/evaluate/README.md
+++ b/evaluate/General evaluation.md
@@ -1,9 +1,8 @@
-
 # EmoLLM General Metrics Evaluation
 
 ## Introduction
 
-This README explains how to use the `eval.py` and `metric.py` scripts, which evaluate the generated output of EmoLLM, a mental-health large language model.
+This file explains how to use the `eval.py` and `metric.py` scripts, which evaluate the generated output of EmoLLM, a mental-health large language model.
 
 ## Installation
 

From aea78d5ef90774c97b13cc5b9aaaf70748b390e0 Mon Sep 17 00:00:00 2001
From: MING_X <119648793+MING-ZCH@users.noreply.github.com>
Date: Fri, 1 Mar 2024 20:59:01 +0800
Subject: [PATCH 2/4] Create Professional evaluation.md

---
 evaluate/Professional evaluation.md | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)
 create mode 100644 evaluate/Professional evaluation.md

diff --git a/evaluate/Professional evaluation.md b/evaluate/Professional evaluation.md
new file mode 100644
index 0000000..0f2bd09
--- /dev/null
+++ b/evaluate/Professional evaluation.md
@@ -0,0 +1,28 @@
+# EmoLLM Professional Metrics Evaluation
+
+## Introduction
+
+This document describes a professional evaluation method and reports EmoLLM's scores on the professional metrics.
+
+## Evaluation Method
+
+This evaluation uses the metrics and method proposed in the paper "CPsyCoun: A Report-based Multi-turn Dialogue Reconstruction and Evaluation Framework for Chinese Psychological Counseling".
+* Metrics: Comprehensiveness, Professionalism, Authenticity, Safety
+* Method: Turn-Based Dialogue Evaluation
+* Dataset: CPsyCounE
+
+## Evaluation Results
+
+Evaluated model: [EmoLLM](https://openxlab.org.cn/models/detail/jujimeizuo/EmoLLM_Model) (InternLM2-7B-chat + qlora). Scores:
+| Metric            | Value      |
+|-------------------|------------|
+| Comprehensiveness | 1.32       |
+| Professionalism   | 2.20       |
+| Authenticity      | 2.10       |
+| Safety            | 1.00       |
+
+## Comparison
+* [EmoLLM](https://openxlab.org.cn/models/detail/jujimeizuo/EmoLLM_Model) improves substantially over its base model, InternLM2-7B-Chat, and performs comparably to Role-playing ChatGPT on psychological counseling tasks.
+
+* The comparison figure is taken from the paper "CPsyCoun: A Report-based Multi-turn Dialogue Reconstruction and Evaluation Framework for Chinese Psychological Counseling".
+![image](https://github.com/MING-ZCH/EmoLLM/assets/119648793/abc9f626-11bc-4ec8-84a4-427c4600a720)

From 324a1da1ba661c2268ed6c82a3b0a7f33713ebf2 Mon Sep 17 00:00:00 2001
From: MING_X <119648793+MING-ZCH@users.noreply.github.com>
Date: Fri, 1 Mar 2024 21:01:56 +0800
Subject: [PATCH 3/4] Update General evaluation.md

---
 evaluate/General evaluation.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/evaluate/General evaluation.md b/evaluate/General evaluation.md
index 1c716be..fee6bca 100644
--- a/evaluate/General evaluation.md
+++ b/evaluate/General evaluation.md
@@ -2,7 +2,7 @@
 
 ## Introduction
 
-This file explains how to use the `eval.py` and `metric.py` scripts, which evaluate the generated output of EmoLLM, a mental-health large language model.
+This document explains how to use the `eval.py` and `metric.py` scripts, which evaluate the generated output of EmoLLM, a mental-health large language model.
 
 ## Installation
 

From 4301688bc87be6f2dcb2587400328cccd9875614 Mon Sep 17 00:00:00 2001
From: MING_X <119648793+MING-ZCH@users.noreply.github.com>
Date: Fri, 1 Mar 2024 21:05:46 +0800
Subject: [PATCH 4/4] Create README.md

---
 evaluate/README.md | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)
 create mode 100644 evaluate/README.md

diff --git a/evaluate/README.md b/evaluate/README.md
new file mode 100644
index 0000000..9f8bc46
--- /dev/null
+++ b/evaluate/README.md
@@ -0,0 +1,26 @@
+# EmoLLM Evaluation
+
+## General Metrics Evaluation
+
+* For the specific metrics and method, see General evaluation.md
+
+| Metric  | Value                |
+|---------|----------------------|
+| ROUGE-1 | 27.23%               |
+| ROUGE-2 | 8.55%                |
+| ROUGE-L | 17.05%               |
+| BLEU-1  | 26.65%               |
+| BLEU-2  | 13.11%               |
+| BLEU-3  | 7.19%                |
+| BLEU-4  | 4.05%                |
+
+## Professional Metrics Evaluation
+
+* For the specific metrics and method, see Professional evaluation.md
+
+| Metric            | Value      |
+|-------------------|------------|
+| Comprehensiveness | 1.32       |
+| Professionalism   | 2.20       |
+| Authenticity      | 2.10       |
+| Safety            | 1.00       |