EmoLLM's datasets
- Category of dataset: General and Role-play
- Type of data: QA and Conversation
- Summary: General (6 datasets), Role-play (5 datasets)
Category
- General: generic datasets covering psychological knowledge, counseling techniques, etc.
- Role-play: role-playing datasets containing conversation data in the style of a specific character, etc.
Type
- QA: question-and-answer pairs
- Conversation: multi-turn counseling dialogues
Summary
Category | Dataset | Type | Total |
---|---|---|---|
General | data | Conversation | 5,600+ |
General | data_pro | Conversation | 36,500+ |
General | multi_turn_dataset_1 | Conversation | 36,000+ |
General | multi_turn_dataset_2 | Conversation | 27,000+ |
General | single_turn_dataset_1 | QA | 14,000+ |
General | single_turn_dataset_2 | QA | 18,300+ |
Role-play | aiwei | Conversation | 4,000+ |
Role-play | SoulStar | QA | 11,200+ |
Role-play | tiangou | Conversation | 3,900+ |
Role-play | mother | Conversation | 40,300+ |
Role-play | scientist | Conversation | 28,400+ |
…… | …… | …… | …… |
Source
General:
- dataset data from this repo
- dataset data_pro from this repo
- dataset multi_turn_dataset_1 from Smile
- dataset multi_turn_dataset_2 from CPsyCounD
- dataset single_turn_dataset_1 from this repo
- dataset single_turn_dataset_2 from this repo
Role-play:
- dataset aiwei from this repo
- dataset tiangou from this repo
- dataset SoulStar from SoulStar
- dataset mother from this repo
- dataset scientist from this repo
Dataset Deduplication: The dataset is deduplicated by combining exact matching with fuzzy matching (SimHash), which improves the quality of the fine-tuned model. Adjusting the similarity threshold reduces the risk of discarding important data through false matches while keeping the dataset quality high.
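The two-stage idea can be sketched as follows. This is a minimal illustration using only the Python standard library, not the repository's own deduplication script: the function names, the character 3-gram features, the MD5-based hashing, and the default threshold of 3 bits are all assumptions chosen for the example.

```python
import hashlib
import json


def simhash(text: str, bits: int = 64, ngram: int = 3) -> int:
    """Return a `bits`-wide SimHash fingerprint built from character n-grams."""
    # Character n-grams (an assumption) also work for text without spaces.
    grams = [text[i:i + ngram] for i in range(max(len(text) - ngram + 1, 1))]
    weights = [0] * bits
    for gram in grams:
        digest = hashlib.md5(gram.encode("utf-8")).digest()
        h = int.from_bytes(digest[:bits // 8], "big")
        # Each feature votes +1 or -1 on every bit position.
        for b in range(bits):
            weights[b] += 1 if (h >> b) & 1 else -1
    return sum(1 << b for b in range(bits) if weights[b] > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def deduplicate(records, threshold: int = 3):
    """Drop exact duplicates, then near-duplicates within `threshold` bits."""
    seen_exact = set()
    kept, fingerprints = [], []
    for record in records:
        # Stage 1: exact matching on a canonical serialization.
        key = json.dumps(record, ensure_ascii=False, sort_keys=True)
        if key in seen_exact:
            continue
        seen_exact.add(key)
        # Stage 2: fuzzy matching against fingerprints of kept records.
        fp = simhash(key)
        if any(hamming(fp, other) <= threshold for other in fingerprints):
            continue
        kept.append(record)
        fingerprints.append(fp)
    return kept
```

A lower `threshold` keeps more records and risks leaving near-duplicates in; a higher one removes more aggressively and risks dropping distinct records. The pairwise comparison in stage 2 is quadratic in the number of kept records, so a dataset of tens of thousands of conversations would likely need fingerprint bucketing or a dedicated SimHash index.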