# EmoLLM's General Evaluation

## Introduction
This document explains how to use the `eval.py` and `metric.py` scripts, which evaluate the responses generated by EmoLLM, a large language model for mental health.

## Installation
- Python 3.x
- PyTorch
- Transformers
- Datasets
- NLTK
- Rouge
- Jieba
These dependencies can be installed with the following command:
```bash
pip install torch transformers datasets nltk rouge jieba
```
## Usage

### convert.py
Converts the raw multi-turn conversation data into single-turn samples for evaluation.
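A minimal sketch of this conversion, assuming each raw record stores its turns under a `conversation` key with `input`/`output` fields (the actual schema in the repository may differ):

```python
import json


def convert(src_path: str, dst_path: str) -> None:
    """Flatten multi-turn conversations into single-turn (input, output) pairs."""
    with open(src_path, encoding="utf-8") as f:
        records = json.load(f)

    pairs = []
    for record in records:
        # Each turn becomes an independent evaluation sample.
        for turn in record.get("conversation", []):
            pairs.append({"input": turn["input"], "output": turn["output"]})

    with open(dst_path, "w", encoding="utf-8") as f:
        json.dump(pairs, f, ensure_ascii=False, indent=2)


convert("data_raw.json", "data.json")  # hypothetical file names
```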
### eval.py

The `eval.py` script generates the doctor's responses and evaluates them. It runs in the following steps (sketched after the list):
- Load the model and tokenizer.
- Set test parameters, such as the number of test samples and the batch size.
- Load the test data.
- Generate responses and evaluate them.
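A minimal sketch of this flow, assuming a Hugging Face chat checkpoint and greedy decoding; the model path, prompt handling, and generation settings here are illustrative, not the script's exact values:

```python
import json

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "internlm/internlm2-chat-7b"  # hypothetical checkpoint path
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
tokenizer.padding_side = "left"  # so generated tokens line up at the end
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map="auto", trust_remote_code=True
).eval()

max_samples, batch_size = 100, 8  # test parameters

with open("data.json", encoding="utf-8") as f:
    samples = json.load(f)[:max_samples]

predictions, references = [], []
for i in range(0, len(samples), batch_size):
    batch = samples[i : i + batch_size]
    inputs = tokenizer(
        [s["input"] for s in batch], return_tensors="pt", padding=True
    ).to(model.device)
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
    for j, s in enumerate(batch):
        # Strip the (left-padded) prompt, keep only the generated reply.
        reply = tokenizer.decode(
            out[j][inputs["input_ids"].shape[1]:], skip_special_tokens=True
        )
        predictions.append(reply)
        references.append(s["output"])

# predictions/references are then scored with the functions in metric.py
```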
### metric.py

The `metric.py` script contains the functions that compute the evaluation metrics. Scoring can be configured at the character level or the word level, and currently includes BLEU and ROUGE scores.
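A sketch of such scoring functions, built on the `nltk`, `rouge`, and `jieba` packages listed above; the function names and the smoothing choice are illustrative:

```python
import jieba
from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu
from rouge import Rouge


def tokenize(text, char_level=False):
    # Character level splits into individual characters; word level uses jieba.
    return list(text) if char_level else list(jieba.cut(text))


def compute_metrics(prediction, reference, char_level=False):
    pred_tokens = tokenize(prediction, char_level)
    ref_tokens = tokenize(reference, char_level)

    smooth = SmoothingFunction().method3
    scores = {
        f"bleu-{n}": sentence_bleu(
            [ref_tokens],
            pred_tokens,
            weights=tuple(1 / n for _ in range(n)),
            smoothing_function=smooth,
        )
        for n in (1, 2, 3, 4)
    }
    # The rouge package expects whitespace-joined token strings.
    rouge = Rouge().get_scores(" ".join(pred_tokens), " ".join(ref_tokens))[0]
    scores.update({k: v["f"] for k, v in rouge.items()})  # rouge-1/2/l F1
    return scores
```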
## Results

Evaluating on the data in `data.json` yields the following results:
| Model | ROUGE-1 | ROUGE-2 | ROUGE-L | BLEU-1 | BLEU-2 | BLEU-3 | BLEU-4 |
|---|---|---|---|---|---|---|---|
| Qwen1_5-0_5B-chat | 27.23% | 8.55% | 17.05% | 26.65% | 13.11% | 7.19% | 4.05% |
| InternLM2_7B_chat_qlora | 37.86% | 15.23% | 24.34% | 39.71% | 22.66% | 14.26% | 9.21% |
| InternLM2_7B_chat_full | 32.45% | 10.82% | 20.17% | 30.48% | 15.67% | 8.84% | 5.02% |