Commit Graph

33 Commits

Author SHA1 Message Date
Anooyman
de0674ccf7
Update main code (#2)
* update rag/src/data_processing.py

* Add files via upload

allow user to load embedding & rerank models from cache

* Add files via upload

embedding_path = os.path.join(model_dir, 'embedding_model')  
rerank_path = os.path.join(model_dir, 'rerank_model')

* Test push to dev

Test push to dev

* Add files via upload

After merging, cleaning, and deduplicating the two mother-role multi-turn dialogue datasets, 2439 multi-turn dialogues remain (each with 6-8 turns).

* optimize deduplicate.py

Add time print information
save duplicate dataset as well
remove print(content)

* add base model qlora fine-tuning config file: internlm2_7b_base_qlora_e10_M_1e4_32_64.py

* add full finetune code from internlm2

* other 2 configs for base model

* update cli_internlm2.py

 three methods to load model

1. download model in openxlab
2. download model in modelscope
3. offline model

* create upload_modelscope.py

* add base model and update personal contributions

* add README.md for Emollm_Scientist

* Create README_internlm2_7b_base_qlora.md

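InternLM2 7B Base QLoRA fine-tuning guide

* [DOC] EmoLLM_Scientist fine-tuning guide

* [DOC] EmoLLM_Scientist fine-tuning guide

* [DOC] EmoLLM_Scientist fine-tuning guide

* [DOC] EmoLLM_Scientist fine-tuning guide

* [DOC] EmoLLM_Scientist fine-tuning guide

* [DOC] EmoLLM_Scientist fine-tuning guide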

* update

* [DOC]README_scientist.md

* delete config

* format update

* upload xlab

* add README_Model_Uploading.md and images

* modelscope model upload

* Modify Recent Updates

* update daddy-like Boy-Friend EmoLLM

* update model uploading with openxlab

* update model uploading with openxlab

---------

Co-authored-by: zealot52099 <songyan5209@163.com>
Co-authored-by: xzw <62385492+aJupyter@users.noreply.github.com>
Co-authored-by: zealot52099 <67356208+zealot52099@users.noreply.github.com>
Co-authored-by: Bryce Wang <90940753+brycewang2018@users.noreply.github.com>
Co-authored-by: HongCheng <kwchenghong@gmail.com>
2024-03-24 11:51:19 +08:00
xzw
8a1e0df9d3
[DOC]update datasets/README.md (#115) 2024-03-21 15:50:20 +08:00
HongCheng
4ff7910368
Update process_merge.py 2024-03-21 16:07:18 +09:00
HongCheng
d25a304c4d
Update process_single_turn_conversation_construction.py 2024-03-21 16:06:41 +09:00
HongCheng
085a01eafa add dataset processing code
1. update process.py for multi_turn_dataset (1 and 2) and data.json, data_pro.json
2. add datasets\processed\process_single_turn_conversation_construction.py for single-turn datasets (1 and 2)
3. add datasets\processed\process_merge.py for these 6 updated datasets in datasets\processed\
2024-03-21 16:01:54 +09:00
HongCheng
ce2cb5156c update data.json (delete 4 empty entries)
4 empty lines in data.json: 425, 483, 742, 1120
2024-03-21 15:56:54 +09:00
zealot52099
e2025cc8ea [DOC]update datasets/README.md 2024-03-21 08:24:15 +08:00
zealot52099
3b21f79c3c Merge branch 'dev' of https://github.com/SmartFlowAI/EmoLLM into dev 2024-03-21 07:59:16 +08:00
zealot52099
c354ffd7e0 [DOC]update datasets/README.md 2024-03-21 07:58:13 +08:00
xzw
f5eb0ddc93
Merge pull request #113 from lll997150986/main
scientist.json
2024-03-20 23:44:46 +08:00
jeky
dbdd731565 1111 2024-03-20 23:25:07 +08:00
zealot52099
77ff2d079c update deduplicate.py 2024-03-20 23:08:36 +08:00
zealot52099
41744ed604 [DOC] update datasets/README_EN.md 2024-03-20 17:52:23 +08:00
zealot52099
9b4e58f732 [DOC]update datasets/README.md 2024-03-20 17:40:31 +08:00
zealot52099
b542929c1d add deduplicate.py 2024-03-19 20:09:44 +08:00
zealot52099
861f12d47a add deduplicate.py 2024-03-19 16:41:09 +08:00
MING_X
b499aec9da
Update README_EN.md 2024-03-10 16:09:17 +08:00
MING_X
49998436b9
Update README.md 2024-03-10 16:04:31 +08:00
MING_X
3a49c22983
Create README_EN.md 2024-03-06 17:58:17 +08:00
MING_X
b8bd726849
Create README.md 2024-03-05 23:24:33 +08:00
aJupyter
4d8ae7d428 feat: add internlm2-chat-7b-config 2024-03-03 21:08:52 +08:00
Nobody-ML
a71de6ce24 add SoulStar_data 2024-03-03 17:28:26 +08:00
MING_X
4a1ef9c083
Add files via upload 2024-02-28 21:18:02 +08:00
MING_X
97f0cc068a
upload smile.dataset 2024-02-28 17:44:48 +08:00
MING_X
96b0cf76dd
Delete datasets/qa_dataset.json 2024-02-27 22:03:56 +08:00
MING_X
7ebb05c236
Upload datasets
two cleaned single_turn datasets from qa_dataset.json
2024-02-27 22:01:53 +08:00
MrCatAI
6739f2ed4c new_dataset 2024-02-26 17:25:06 +00:00
MrCatAI
6e70c62771 qa_dataset 2024-02-26 17:22:11 +00:00
ZhouXinAo
52c7d63d49
feat:Add new finetune configurations and datasets 2024-02-24 22:39:10 +08:00
ZhouXinAo
a691e78307
Delete datasets/tiangou.json 2024-02-24 22:38:52 +08:00
ZhouXinAo
f505efb0c4
Add files via upload
Add new finetune configurations and datasets
2024-02-24 22:37:08 +08:00
jupyter
1a6b8eac20 feat:Add new finetune configurations and datasets 2024-02-23 11:36:58 +08:00
jupyter
294d5d1d60 feat: add datasets and update readme 2024-01-26 22:43:38 +08:00