
EmoLLM Evaluation

General Metrics Evaluation

| Model | ROUGE-1 | ROUGE-2 | ROUGE-L | BLEU-1 | BLEU-2 | BLEU-3 | BLEU-4 |
|---|---|---|---|---|---|---|---|
| Qwen1_5-0_5B-chat | 27.23% | 8.55% | 17.05% | 26.65% | 13.11% | 7.19% | 4.05% |
| InternLM2_7B_chat_qlora | 37.86% | 15.23% | 24.34% | 39.71% | 22.66% | 14.26% | 9.21% |
| InternLM2_7B_chat_full | 32.45% | 10.82% | 20.17% | 30.48% | 15.67% | 8.84% | 5.02% |
| InternLM2_7B_base_qlora_5epoch | 41.94% | 20.21% | 29.67% | 42.98% | 27.07% | 19.33% | 14.62% |
| InternLM2_7B_base_qlora_10epoch | 43.47% | 22.06% | 31.40% | 44.81% | 29.15% | 21.44% | 16.72% |
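
For readers unfamiliar with the columns above, here is a minimal sketch of how overlap metrics of this kind are usually defined. It is not the project's own evaluation code. The function names (`rouge_n`, `rouge_l`, `clipped_precision`) are hypothetical, the inputs are assumed to be already tokenized lists, and the BLEU part covers only the clipped n-gram precision, leaving out the brevity penalty and the geometric averaging that a full BLEU score applies.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """ROUGE-N as the F1 score of n-gram overlap (illustrative definition)."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    overlap = sum((cand & ref).values())  # multiset intersection
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def lcs_length(a, b):
    """Length of the longest common subsequence, via row-by-row DP."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L as the F1 score built on the longest common subsequence."""
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    return 2 * precision * recall / (precision + recall)


def clipped_precision(candidate, reference, n):
    """Clipped n-gram precision, the building block of BLEU-n.

    Assumption: no brevity penalty and no smoothing are applied here.
    """
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    total = sum(cand.values())
    return sum((cand & ref).values()) / total if total else 0.0


if __name__ == "__main__":
    candidate = "the cat sat on the mat".split()
    reference = "the cat is on the mat".split()
    print(f"ROUGE-1: {rouge_n(candidate, reference, 1):.2%}")
    print(f"ROUGE-2: {rouge_n(candidate, reference, 2):.2%}")
    print(f"ROUGE-L: {rouge_l(candidate, reference):.2%}")
    print(f"2-gram precision: {clipped_precision(candidate, reference, 2):.2%}")
```

Chinese text has to be segmented into tokens before functions like these are called. Which tokenizer and which scoring libraries were used to produce the table is not stated here, so numbers from this sketch should not be expected to match it.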

Professional Metrics Evaluation

| Model | Comprehensiveness | Professionalism | Authenticity | Safety |
|---|---|---|---|---|
| InternLM2_7B_chat_qlora | 1.32 | 2.20 | 2.10 | 1.00 |
| InternLM2_7B_chat_full | 1.40 | 2.45 | 2.24 | 1.00 |
| InternLM2_20B_chat_lora | 1.42 | 2.39 | 2.22 | 1.00 |