OliveSensorAPI/datasets/README_EN.md
2024-03-10 16:09:17 +08:00

# EmoLLM's datasets
* Dataset categories: **General** and **Role-play**
* Data types: **QA** and **Conversation**
* Summary: General (**6 datasets**), Role-play (**3 datasets**)
## Category
* **General**: general-purpose datasets covering psychological knowledge, counseling techniques, etc.
* **Role-play**: role-playing datasets, including conversation data in the style of a specific character, etc.
## Type
* **QA**: single-turn question-and-answer pairs
* **Conversation**: multi-turn counseling dialogues
## Summary
| Category | Dataset | Type | Total |
| :---------: | :-------------------: | :----------: | :-----: |
| *General* | data | Conversation | 5,600+ |
| *General* | data_pro | Conversation | 36,500+ |
| *General* | multi_turn_dataset_1 | Conversation | 36,000+ |
| *General* | multi_turn_dataset_2 | Conversation | 27,000+ |
| *General* | single_turn_dataset_1 | QA | 14,000+ |
| *General* | single_turn_dataset_2 | QA | 18,300+ |
| *Role-play* | aiwei | Conversation | 4,000+ |
| *Role-play* | SoulStar | QA | 11,200+ |
| *Role-play* | tiangou | Conversation | 3,900+ |
| …… | …… | …… | …… |
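
For a rough sense of scale, the per-dataset counts in the table can be added up by category and by type. The sketch below is only illustrative: it assumes each "N+" entry can be treated as a lower bound of N samples, it ignores the elided "……" row, and the names `SUMMARY` and `totals_by` are hypothetical helpers that are not part of this repo.

```python
# Lower-bound sample counts copied from the summary table ("5600+" -> 5600).
# The trailing "……" row is elided in the table and is not represented here.
SUMMARY = [
    ("General", "data", "Conversation", 5600),
    ("General", "data_pro", "Conversation", 36500),
    ("General", "multi_turn_dataset_1", "Conversation", 36000),
    ("General", "multi_turn_dataset_2", "Conversation", 27000),
    ("General", "single_turn_dataset_1", "QA", 14000),
    ("General", "single_turn_dataset_2", "QA", 18300),
    ("Role-play", "aiwei", "Conversation", 4000),
    ("Role-play", "SoulStar", "QA", 11200),
    ("Role-play", "tiangou", "Conversation", 3900),
]


def totals_by(rows, column):
    """Sum the lower-bound counts grouped by the given column.

    column 0 groups by category, column 2 groups by data type.
    """
    totals = {}
    for row in rows:
        key = row[column]
        totals[key] = totals.get(key, 0) + row[3]
    return totals


by_category = totals_by(SUMMARY, 0)
by_type = totals_by(SUMMARY, 2)

print(by_category)
print(by_type)
```

Under these assumptions, the listed datasets contain at least 137,400 General and 19,100 Role-play samples, or at least 113,000 Conversation and 43,500 QA samples.
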
## Source
**General**
* dataset `data` from this repo
* dataset `data_pro` from this repo
* dataset `multi_turn_dataset_1` from [Smile](https://github.com/qiuhuachuan/smile)
* dataset `multi_turn_dataset_2` from [CPsyCounD](https://github.com/CAS-SIAT-XinHai/CPsyCoun)
* dataset `single_turn_dataset_1` from this repo
* dataset `single_turn_dataset_2` from this repo

**Role-play**
* dataset `aiwei` from this repo
* dataset `tiangou` from this repo
* dataset `SoulStar` from [SoulStar](https://github.com/Nobody-ML/SoulStar)