Commit Graph

17 Commits

Author SHA1 Message Date
HongCheng
63e32019f4
Update data_processing.py add rag.src. 2024-05-04 12:03:34 +09:00
HongCheng
93a7a8c25d
Update data_processing.py format 2024-05-03 00:48:46 +09:00
HongCheng
92782a13ea
Update data_processing.py 2024-05-03 00:29:05 +09:00
Anooyman
2632ec390d Add RAG into internlm2 2024-04-14 12:22:35 +08:00
Anooyman
25184d894c Revert "Revert "Update RAG""
This reverts commit a67596a215.
2024-03-27 21:14:19 +08:00
Anooyman
a67596a215 Revert "Update RAG"
This reverts commit d0663208e3.
2024-03-27 21:13:26 +08:00
Anooyman
d0663208e3 Update RAG 2024-03-27 21:11:06 +08:00
Anooyman
c50b834104
Merge branch 'dev' into main 2024-03-24 16:09:51 +08:00
Anooyman
8c81c222a9 Update 2024-03-24 15:48:59 +08:00
Anooyman
f44310f665 update 2024-03-24 15:18:35 +08:00
Anooyman
de0674ccf7
Update main code (#2)
* update rag/src/data_processing.py

* Add files via upload

allow user to load embedding & rerank models from cache

* Add files via upload

embedding_path = os.path.join(model_dir, 'embedding_model')  
rerank_path = os.path.join(model_dir, 'rerank_model')

* Test push to dev

Test push to dev

* Add files via upload

After merging, cleaning, and deduplicating the two "mother" multi-turn dialogue datasets, 2439 multi-turn dialogues remain (each with 6-8 turns).

* optimize deduplicate.py

Add time print information
save duplicate dataset as well
remove print(content)

* add base model qlora fine-tuning config file: internlm2_7b_base_qlora_e10_M_1e4_32_64.py

* add full finetune code from internlm2

* other 2 configs for base model

* update cli_internlm2.py

 three methods to load model

1. download model in openxlab
2. download model in modelscope
3. offline model

* create upload_modelscope.py

* add base model and update personal contributions

* add README.md for Emollm_Scientist

* Create README_internlm2_7b_base_qlora.md

InternLM2 7B Base QLoRA fine-tuning guide

* [DOC] EmoLLM_Scientist fine-tuning guide

* [DOC] EmoLLM_Scientist fine-tuning guide

* [DOC] EmoLLM_Scientist fine-tuning guide

* [DOC] EmoLLM_Scientist fine-tuning guide

* [DOC] EmoLLM_Scientist fine-tuning guide

* [DOC] EmoLLM_Scientist fine-tuning guide

* update

* [DOC]README_scientist.md

* delete config

* format update

* upload xlab

* add README_Model_Uploading.md and images

* modelscope model upload

* Modify Recent Updates

* update daddy-like Boy-Friend EmoLLM

* update model uploading with openxlab

* update model uploading with openxlab

---------

Co-authored-by: zealot52099 <songyan5209@163.com>
Co-authored-by: xzw <62385492+aJupyter@users.noreply.github.com>
Co-authored-by: zealot52099 <67356208+zealot52099@users.noreply.github.com>
Co-authored-by: Bryce Wang <90940753+brycewang2018@users.noreply.github.com>
Co-authored-by: HongCheng <kwchenghong@gmail.com>
2024-03-24 11:51:19 +08:00
zealot52099
0aa58372bb
Add files via upload
allow user to load embedding & rerank models from cache
2024-03-22 20:15:37 +08:00
zealot52099
b5af7793d6 update rag/src/data_processing.py 2024-03-22 07:39:44 +08:00
Anooyman
2d3bd4a8f5 Update RAG pipeline 2024-03-21 22:43:09 +08:00
zealot52099
fdf05f480c update rag/src/data_processing.py & main.py 2024-03-20 16:51:07 +08:00
zealot52099
98ecdda78d fix bug 2024-03-18 10:46:09 +08:00
zealot52099
5879afffe6 add data_processing.py 2024-03-18 10:32:27 +08:00