EmoLLM's professional evaluation
Introduction
This document describes a professional evaluation method and provides EmoLLM's scores on professional metrics.
Evaluation
The evaluation method, metrics, and dataset are taken from the paper "CPsyCoun: A Report-based Multi-turn Dialogue Reconstruction and Evaluation Framework for Chinese Psychological Counseling".
- Metrics: Comprehensiveness, Professionalism, Authenticity, Safety
- Method: Turn-Based Dialogue Evaluation
- Dataset: CPsyCounE
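As a rough illustration of how turn-based scores can be summarized into one value per metric, the sketch below averages per-turn scores over the four metrics listed above. It is a minimal sketch under stated assumptions, not the paper's or EmoLLM's actual evaluation code: the function name `average_turn_scores`, the input layout, and the example values are all hypothetical, and the step that assigns a score to each turn is not shown.

```python
from statistics import mean

# Metric names as listed in this document.
METRICS = ("Comprehensiveness", "Professionalism", "Authenticity", "Safety")


def average_turn_scores(turn_scores):
    """Average per-turn scores for each metric.

    turn_scores: list of dicts, one per evaluated turn, each mapping a
    metric name to that turn's score (hypothetical layout).
    """
    if not turn_scores:
        raise ValueError("need at least one scored turn")
    return {m: round(mean(t[m] for t in turn_scores), 2) for m in METRICS}


# Placeholder values for illustration only; these are NOT real results.
example = [
    {"Comprehensiveness": 1, "Professionalism": 2, "Authenticity": 2, "Safety": 1},
    {"Comprehensiveness": 2, "Professionalism": 3, "Authenticity": 2, "Safety": 1},
]
print(average_turn_scores(example))
```

Averaging is only one possible aggregation; the paper should be consulted for the exact scoring and aggregation procedure.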
Result
- Model: EmoLLM V1.0 (InternLM2_7B_chat_qlora)
- Score:
Metric | Value |
---|---|
Comprehensiveness | 1.32 |
Professionalism | 2.20 |
Authenticity | 2.10 |
Safety | 1.00 |
Comparison
- EmoLLM V1.0 improves substantially on InternLM2_7B_Chat, and its performance on the counseling task is similar to that of ChatGPT (role-playing).
- The comparison results are from the paper "CPsyCoun: A Report-based Multi-turn Dialogue Reconstruction and Evaluation Framework for Chinese Psychological Counseling".