EmoLLM's professional evaluation
Introduction
This document describes a professional evaluation method and provides EmoLLM's scores on professional metrics.
Evaluation
The evaluation method, metrics, and dataset are taken from the paper "CPsyCoun: A Report-based Multi-turn Dialogue Reconstruction and Evaluation Framework for Chinese Psychological Counseling".
- Metrics: Comprehensiveness, Professionalism, Authenticity, Safety
- Method: Turn-Based Dialogue Evaluation
- Dataset: CPsyCounE
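
As a rough illustration of what a turn-based evaluation loop can look like, the sketch below scores a model's reply at every client turn and averages each metric over the dialogue. This is one plausible reading of "turn-based" evaluation, not the procedure confirmed by the paper: `evaluate_dialogue`, `echo_model`, and `constant_judge` are hypothetical names, and a real setup would plug in EmoLLM as the generator and an LLM or human rater as the judge.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

# The four metrics listed in this document.
METRICS = ("Comprehensiveness", "Professionalism", "Authenticity", "Safety")

Turn = Tuple[str, str]       # (client utterance, reference counselor reply)
History = List[Turn]
Scores = Dict[str, float]


def evaluate_dialogue(
    dialogue: List[Turn],
    generate: Callable[[History, str], str],
    judge: Callable[[History, str, str], Scores],
) -> Scores:
    """Score the model's reply at every turn and average each metric.

    Assumption: the history fed to the model is built from the reference
    replies, so each turn is judged independently of earlier model output.
    """
    if not dialogue:
        raise ValueError("dialogue must contain at least one turn")
    history: History = []
    per_turn: List[Scores] = []
    for client_msg, reference_reply in dialogue:
        reply = generate(history, client_msg)
        per_turn.append(judge(history, client_msg, reply))
        history.append((client_msg, reference_reply))
    return {m: mean(s[m] for s in per_turn) for m in METRICS}


# Hypothetical stand-ins so the sketch runs without a model or a judge.
def echo_model(history: History, client_msg: str) -> str:
    return f"I hear that you said: {client_msg}"


def constant_judge(history: History, client_msg: str, reply: str) -> Scores:
    return {"Comprehensiveness": 1.0, "Professionalism": 2.0,
            "Authenticity": 2.0, "Safety": 1.0}


demo = [
    ("I can't sleep lately.", "That sounds exhausting."),
    ("Work has been stressful.", "Tell me more about that."),
]
demo_scores = evaluate_dialogue(demo, echo_model, constant_judge)
print(demo_scores)
```

Averaging per metric over turns keeps the four scores separate, which matches how the results in this document are reported.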
Result
- Model:
  - EmoLLM V1.0 (InternLM2_7B_chat_qlora)
  - EmoLLM V2.0 (InternLM2_7B_chat_full)
- Score:
| Model | Comprehensiveness | Professionalism | Authenticity | Safety |
|---|---|---|---|---|
| InternLM2_7B_chat_qlora | 1.32 | 2.20 | 2.10 | 1.00 |
| InternLM2_7B_chat_full | 1.40 | 2.45 | 2.24 | 1.00 |
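
To make the gap between the two models explicit, the per-metric differences can be computed directly from the table. The snippet below is a minimal sketch that copies the scores from the table above; the helper name `score_deltas` is made up for illustration.

```python
# Scores copied from the table above.
scores = {
    "InternLM2_7B_chat_qlora": {
        "Comprehensiveness": 1.32, "Professionalism": 2.20,
        "Authenticity": 2.10, "Safety": 1.00,
    },
    "InternLM2_7B_chat_full": {
        "Comprehensiveness": 1.40, "Professionalism": 2.45,
        "Authenticity": 2.24, "Safety": 1.00,
    },
}


def score_deltas(base: dict, new: dict) -> dict:
    """Return new - base for every metric, rounded to two decimals."""
    return {metric: round(new[metric] - base[metric], 2) for metric in base}


deltas = score_deltas(
    scores["InternLM2_7B_chat_qlora"],
    scores["InternLM2_7B_chat_full"],
)
for metric, delta in deltas.items():
    print(f"{metric}: {delta:+.2f}")
```

This prints +0.08 for Comprehensiveness, +0.25 for Professionalism, +0.14 for Authenticity, and +0.00 for Safety.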
Comparison
- EmoLLM V2.0 (InternLM2_7B_chat_full) scores higher than EmoLLM V1.0 (InternLM2_7B_chat_qlora) on Comprehensiveness (1.40 vs. 1.32), Professionalism (2.45 vs. 2.20), and Authenticity (2.24 vs. 2.10), while Safety stays at 1.00 for both. It also surpasses Role-playing ChatGPT on counseling tasks.
- EmoLLM V1.0 improves substantially on the base InternLM2_7B_Chat model, and its performance on the counseling task is similar to that of ChatGPT (Role-playing).
- The comparison results are taken from the paper "CPsyCoun: A Report-based Multi-turn Dialogue Reconstruction and Evaluation Framework for Chinese Psychological Counseling".