
Anooyman 14890fad56 Update code (#8) * feat: add agents/actions/write_markdown * [ADD] add evaluation result of base model on 5/10 epochs * Rename mother.json to mother_v1_2439.json * Add files via upload * [DOC] update README * Update requirements.txt update mpi4py installation * Update README_EN.md update English comma * Update README.md Fine-tuning of the multi-turn dialogue model for the mother role is complete; it has been uploaded to Huggingface. * Fine-tuning script for the multi-turn mother-role dialogue * Update README.md Added author information for 王几行XING and 思在 * Update README_EN.md * Update README.md * Update README_EN.md * Update README_EN.md * Changes to be committed: modified: .gitignore modified: README.md modified: README_EN.md new file: assets/EmoLLM_transparent.png deleted: assets/Shusheng.jpg new file: assets/Shusheng.png new file: assets/aiwei_demo1.gif new file: assets/aiwei_demo2.gif new file: assets/aiwei_demo3.gif new file: assets/aiwei_demo4.gif * Update README.md rectify aiwei_demo.gif * Update README.md rectify aiwei_demo style * Changes to be committed: modified: README.md modified: README_EN.md * Changes to be committed: modified: README.md modified: README_EN.md * [Doc] update readme * [Doc] update readme * Update README.md * Update README_EN.md * Update README.md * Update README_EN.md * Delete datasets/mother_v1_2439.json * Rename mother_v2_3838.json to mother_v2.json * Delete datasets/mother_v2.json * Add files via upload * Update README.md * Update README_EN.md * [Doc] Update README_EN.md minor fix * Updated InternLM2-Base-7B QLoRA fine-tuned model links and evaluation results * add download_model.py script, automatic download of model libraries * Removed black borders from images, updated author information modified: README.md new file: assets/aiwei_demo.gif deleted: assets/aiwei_demo1.gif modified: assets/aiwei_demo2.gif modified: assets/aiwei_demo3.gif modified: assets/aiwei_demo4.gif * rectify aiwei_demo transparent * transparent * modify: aiwei_demo table--->div * modified: aiwei_demo * modify: div ---> table * modified: README.md * modified: README_EN.md * update model config file links * Create internlm2_20b_chat_lora_alpaca_e3.py Config file for the 20b model * update model config file links update model config file links * Revert "update model config file links" --------- Co-authored-by: jujimeizuo <fengzetao.zed@foxmail.com> Co-authored-by: xzw <62385492+aJupyter@users.noreply.github.com> Co-authored-by: Zeyu Ba <72795264+ZeyuBa@users.noreply.github.com> Co-authored-by: Bryce Wang <90940753+brycewang2018@users.noreply.github.com> Co-authored-by: zealot52099 <songyan5209@163.com> Co-authored-by: HongCheng <kwchenghong@gmail.com> Co-authored-by: Yicong <yicooong@qq.com> Co-authored-by: Yicooong <54353406+Yicooong@users.noreply.github.com> Co-authored-by: aJupyter <ajupyter@163.com> Co-authored-by: MING_X <119648793+MING-ZCH@users.noreply.github.com> Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com> Co-authored-by: HatBoy <null2none@163.com> Co-authored-by: ZhouXinAo <142309012+zxazys@users.noreply.github.com>		2024-04-14 10:09:17 +08:00
processed	Update process_merge.py	2024-03-21 16:07:18 +09:00
aiwei.json	feat：Add new finetune configurations and datasets	2024-02-23 11:36:58 +08:00
data_pro.json	feat：Add new finetune configurations and datasets	2024-02-23 11:36:58 +08:00
data.json	update data.json (delete 4 empty data)	2024-03-21 15:56:54 +09:00
deduplicate.py	Update main code (#2)	2024-03-24 11:51:19 +08:00
LICENSE	Update main code (#2)	2024-03-24 11:51:19 +08:00
mother_v1.json	Update code (#8)	2024-04-14 10:09:17 +08:00
mother_v2.json	Update code (#8)	2024-04-14 10:09:17 +08:00
multi_turn_dataset_1.json	upload smile.dataset	2024-02-28 17:44:48 +08:00
multi_turn_dataset_2.json	Add files via upload	2024-02-28 21:18:02 +08:00
README_EN.md	Update code (#8)	2024-04-14 10:09:17 +08:00
README.md	Update code (#8)	2024-04-14 10:09:17 +08:00
scientist.json	1111	2024-03-20 23:25:07 +08:00
single_turn_dataset_1.json	Upload datasets	2024-02-27 22:01:53 +08:00
single_turn_dataset_2.json	Upload datasets	2024-02-27 22:01:53 +08:00
SoulStar_data.json	add SoulStar_data	2024-03-03 17:28:26 +08:00
tiangou.json	feat：Add new finetune configurations and datasets	2024-02-24 22:39:10 +08:00

README_EN.md

EmoLLM's datasets

Dataset categories: General and Role-play
Data types: QA and Conversation
Summary: General (6 datasets), Role-play (5 datasets)

Type

QA: single-turn question-and-answer pairs
Conversation: multi-turn counseling dialogues
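To make the two types concrete, here is a minimal sketch that tells them apart by counting turns. It assumes an xtuner-style layout in which each record holds a "conversation" list of {"system", "input", "output"} turns; those field names are an assumption to verify against the actual JSON files, and record_type/count_types are hypothetical helpers, not code from this repo.

```python
import json

# Assumed record layout (xtuner-style; check against the real files):
# [{"conversation": [{"system": "...", "input": "...", "output": "..."}, ...]}, ...]


def record_type(record: dict) -> str:
    """Return 'QA' for a single input/output turn, 'Conversation' for several turns."""
    turns = record.get("conversation", [])
    if not turns:
        raise ValueError("record has no turns")
    return "QA" if len(turns) == 1 else "Conversation"


def count_types(records: list[dict]) -> dict[str, int]:
    """Tally how many records of each type a dataset contains."""
    counts = {"QA": 0, "Conversation": 0}
    for rec in records:
        counts[record_type(rec)] += 1
    return counts


sample = json.loads("""[
  {"conversation": [
    {"system": "You are a counselor.", "input": "I can't sleep.", "output": "How long has this been going on?"}
  ]},
  {"conversation": [
    {"system": "You are a counselor.", "input": "I feel anxious.", "output": "What usually triggers it?"},
    {"input": "Exams.", "output": "Let's talk about how you prepare for them."}
  ]}
]""")

print(count_types(sample))  # {'QA': 1, 'Conversation': 1}
```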

Summary

Category	Dataset	Type	Entries
General	data	Conversation	5,600+
General	data_pro	Conversation	36,500+
General	multi_turn_dataset_1	Conversation	36,000+
General	multi_turn_dataset_2	Conversation	27,000+
General	single_turn_dataset_1	QA	14,000+
General	single_turn_dataset_2	QA	18,300+
Role-play	aiwei	Conversation	4,000+
Role-play	SoulStar	QA	11,200+
Role-play	tiangou	Conversation	3,900+
Role-play	mother	Conversation	40,300+
Role-play	scientist	Conversation	28,400+
……	……	……	……

Source

General:

data: from this repo
data_pro: from this repo
multi_turn_dataset_1: from Smile
multi_turn_dataset_2: from CPsyCounD
single_turn_dataset_1: from this repo
single_turn_dataset_2: from this repo

Role-play:

aiwei: from this repo
tiangou: from this repo
SoulStar: from SoulStar
mother: from this repo
scientist: from this repo

Dataset deduplication: we combine exact matching with fuzzy matching (Simhash) to deduplicate the datasets, which improves the effectiveness of the fine-tuned model. Tuning the similarity threshold keeps the dataset high-quality while reducing the risk of discarding important samples through false matches.

https://algonotes.readthedocs.io/en/latest/Simhash.html
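A minimal sketch of the two-stage idea, written independently and not taken from this repo's deduplicate.py: an exact-match pass on whitespace-normalized text, then a 64-bit Simhash over character bigrams, dropping any sample whose Hamming distance to an already-kept sample is at or below a threshold. The bigram features, the MD5-derived token hashes, and the default threshold of 3 are assumptions chosen for illustration; simhash, hamming and deduplicate are hypothetical names.

```python
import hashlib
from collections import Counter

BITS = 64


def _hash64(token: str) -> int:
    # Stable 64-bit token hash from the first 8 bytes of MD5
    # (Python's built-in hash() is randomized per process).
    return int.from_bytes(hashlib.md5(token.encode("utf-8")).digest()[:8], "big")


def simhash(text: str, n: int = 2) -> int:
    """64-bit Simhash over character n-grams, each weighted by its frequency."""
    grams = Counter(text[i:i + n] for i in range(max(len(text) - n + 1, 1)))
    vector = [0] * BITS
    for gram, weight in grams.items():
        h = _hash64(gram)
        for bit in range(BITS):
            # Add the weight where the token hash has a 1 bit, subtract it where it has a 0 bit.
            vector[bit] += weight if (h >> bit) & 1 else -weight
    # The fingerprint keeps a 1 bit wherever the weighted sum is positive.
    return sum(1 << bit for bit in range(BITS) if vector[bit] > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def deduplicate(texts: list[str], threshold: int = 3) -> list[str]:
    """Drop exact duplicates, then near-duplicates within `threshold` bits of a kept sample."""
    seen_exact: set[str] = set()
    kept_prints: list[int] = []
    kept: list[str] = []
    for text in texts:
        key = " ".join(text.split())  # normalize whitespace for the exact-match pass
        if key in seen_exact:
            continue
        fp = simhash(key)
        if any(hamming(fp, other) <= threshold for other in kept_prints):
            continue
        seen_exact.add(key)
        kept_prints.append(fp)
        kept.append(text)
    return kept


docs = [
    "I have been feeling anxious before every exam.",
    "I have been feeling anxious before every exam.",  # exact duplicate, always dropped
    "I have been feeling anxious before every exam!",  # near duplicate, dropped only if within threshold
    "My mother and I keep arguing about my career choices.",
]
print(deduplicate(docs))
```

Raising threshold catches more near-duplicates but also raises the chance of discarding genuinely distinct samples; lowering it does the reverse, which is the trade-off described in the paragraph above. For multi-turn records, the turns would first have to be joined into one string per record before fingerprinting, and the pairwise comparison here is quadratic, so a large corpus would need bucketing of fingerprints.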
