zealot52099 9b4e58f732 [DOC]update datasets/README.md		2024-03-20 17:40:31 +08:00
processed	feat: add internlm2-chat-7b-config	2024-03-03 21:08:52 +08:00
aiwei.json	feat：Add new finetune configurations and datasets	2024-02-23 11:36:58 +08:00
data_pro.json	feat：Add new finetune configurations and datasets	2024-02-23 11:36:58 +08:00
data.json	feat: add datasets and update readme	2024-01-26 22:43:38 +08:00
deduplicate.py	add deduplicate.py	2024-03-19 20:09:44 +08:00
multi_turn_dataset_1.json	upload smile.dataset	2024-02-28 17:44:48 +08:00
multi_turn_dataset_2.json	Add files via upload	2024-02-28 21:18:02 +08:00
README_EN.md	Update README_EN.md	2024-03-10 16:09:17 +08:00
README.md	[DOC]update datasets/README.md	2024-03-20 17:40:31 +08:00
single_turn_dataset_1.json	Upload datasets	2024-02-27 22:01:53 +08:00
single_turn_dataset_2.json	Upload datasets	2024-02-27 22:01:53 +08:00
SoulStar_data.json	add SoulStar_data	2024-03-03 17:28:26 +08:00
tiangou.json	feat：Add new finetune configurations and datasets	2024-02-24 22:39:10 +08:00

README_EN.md

EmoLLM's datasets

Category of dataset: General and Role-play
Type of data: QA and Conversation
Summary: General (6 datasets), Role-play (3 datasets)

Category

General: general-purpose datasets covering psychological knowledge, counseling techniques, etc.
Role-play: role-playing datasets containing conversations in a character-specific style, etc.

Type

QA: question-and-answer pairs
Conversation: multi-turn consultation dialogue

Summary

Category	Dataset	Type	Total
General	data	Conversation	5,600+
General	data_pro	Conversation	36,500+
General	multi_turn_dataset_1	Conversation	36,000+
General	multi_turn_dataset_2	Conversation	27,000+
General	single_turn_dataset_1	QA	14,000+
General	single_turn_dataset_2	QA	18,300+
Role-play	aiwei	Conversation	4,000+
Role-play	SoulStar	QA	11,200+
Role-play	tiangou	Conversation	3,900+
……	……	……	……
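
The totals above can be spot-checked against the JSON files in this directory. The following is a minimal sketch, not part of the repo: it assumes each dataset file holds a single top-level JSON array and that one array element corresponds to one counted item, neither of which this README states. The helper names `count_records` and `summarize` are hypothetical.

```python
import json
from pathlib import Path


def count_records(path):
    """Return the number of top-level entries in a JSON dataset file.

    Assumption: the file contains one top-level JSON array.
    Raises ValueError if the top-level value is anything else.
    """
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    if not isinstance(records, list):
        raise ValueError(f"{path}: expected a top-level JSON array")
    return len(records)


def summarize(dataset_dir, filenames):
    """Map each existing file name to its entry count; skip missing files."""
    counts = {}
    for filename in filenames:
        path = Path(dataset_dir) / filename
        if path.exists():
            counts[filename] = count_records(path)
    return counts


if __name__ == "__main__":
    # File names taken from the directory listing; the directory path is an assumption.
    for name, total in summarize(".", ["data.json", "data_pro.json", "aiwei.json"]).items():
        print(f"{name}\t{total}")
```

If a file turns out to use a different layout (for example, an object wrapping the array), `count_records` raises an error rather than reporting a misleading count.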

Source

General:

data: from this repo
data_pro: from this repo
multi_turn_dataset_1: from Smile
multi_turn_dataset_2: from CPsyCounD
single_turn_dataset_1: from this repo
single_turn_dataset_2: from this repo

Role-play:

aiwei: from this repo
tiangou: from this repo
SoulStar: from SoulStar