MING_X 2024-05-04 19:48:44 +08:00 committed by GitHub
commit 7dc7ea2ddd
GPG Key ID: B5690EEEBB952194
5 changed files with 15 additions and 92 deletions

@@ -122,12 +122,6 @@
<img src="https://github.com/SmartFlowAI/EmoLLM/assets/62385492/7e931682-c54d-4ded-bc67-79130c68d744" alt="Model downloads">
</p>
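- 【2024.2.5】 The project was featured in a post by the WeChat official account **NLP工程化** ([post link](https://mp.weixin.qq.com/s/78lrRl2tlXEKUfElnkVx4A)). A shout-out to the author; please give the account a follow! 🥳🥳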
<p align="center">
<img src="https://github.com/SmartFlowAI/EmoLLM/assets/62385492/47868d6a-2e91-4aa9-a630-e594c14295b4" alt="WeChat official account QR code">
</p>
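- 【2024.2.3】 The [project promo video](https://www.bilibili.com/video/BV1N7421N76X/) is complete 😊
- 【2024.1.27】 Improved the data construction docs, fine-tuning guide, deployment guide, README, and other related documentation 👏
- 【2024.1.25】 EmoLLM V1.0 has been deployed and is live at https://openxlab.org.cn/apps/detail/jujimeizuo/EmoLLM 😀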
@@ -143,7 +137,9 @@
<img src="assets/Shusheng.png" alt="InternLM Challenge Innovation and Creativity Award">
</p>
- The project was [promoted](https://mp.weixin.qq.com/s/78lrRl2tlXEKUfElnkVx4A) by the WeChat official account **NLP工程化**
- 🎉 Thanks to the following media outlets and WeChat official accounts for covering and supporting this project (listed in no particular order; apologies for any omissions, and thanks to everyone! Additions are welcome!): [NLP工程化](https://mp.weixin.qq.com/s/78lrRl2tlXEKUfElnkVx4A), [机智流](https://mp.weixin.qq.com/s/_wMCmssRMGd0Oz5OVVkjAA), [爱可可爱生活](https://mp.weixin.qq.com/s/4WaCg4OpkCWXEuWHuV4r3w), [阿郎小哥](https://mp.weixin.qq.com/s/_MSMeL1XHP0v5lDi3YaPVw), [大模型日知路](https://mp.weixin.qq.com/s/FYYibsCXtfU6FFM9TuKILA), [AI Code](https://mp.weixin.qq.com/s/yDWGY3S4CwCi6U_irsFmqA), and more!
- The project promo video [EmoLLM](https://www.bilibili.com/video/BV1N7421N76X/) has been released; come take a look 😀
### 🎯 Roadmap

@@ -149,9 +149,10 @@ The Model aims to fully understand and promote the mental health of individuals,
<img src="assets/Shusheng.png" alt="Challenge Innovation and Creativity Award">
</p>
- 🎉 Thanks to the following media and friends for their coverage and support of our project (listed in no particular order; apologies for any omissions, and thank you all! Feel free to add more!): [NLP工程化](https://mp.weixin.qq.com/s/78lrRl2tlXEKUfElnkVx4A), [机智流](https://mp.weixin.qq.com/s/_wMCmssRMGd0Oz5OVVkjAA), [爱可可爱生活](https://mp.weixin.qq.com/s/4WaCg4OpkCWXEuWHuV4r3w), [阿郎小哥](https://mp.weixin.qq.com/s/_MSMeL1XHP0v5lDi3YaPVw), [大模型日知路](https://mp.weixin.qq.com/s/FYYibsCXtfU6FFM9TuKILA), [AI Code](https://mp.weixin.qq.com/s/yDWGY3S4CwCi6U_irsFmqA), and more!
- The project has been promoted by the official WeChat account **NLP Engineering**. Here's the [link](https://mp.weixin.qq.com/s/78lrRl2tlXEKUfElnkVx4A).
- The project video [EmoLLM](https://www.bilibili.com/video/BV1N7421N76X/) has been released! 😀
### Roadmap
<p align="center">

@@ -24,6 +24,8 @@
| *General* | multi_turn_dataset_2 | Conversation | 27,000+ |
| *General* | single_turn_dataset_1 | QA | 14,000+ |
| *General* | single_turn_dataset_2 | QA | 18,300+ |
| *General* | self_cognition_EmoLLM | QA | 85+ |
| *General* | ruozhiba_raw | QA | 240+ |
| *Role-play* | aiwei | Conversation | 4000+ |
| *Role-play* | SoulStar | QA | 11,200+ |
| *Role-play* | tiangou | Conversation | 3900+ |
@@ -41,6 +43,8 @@
* Dataset `multi_turn_dataset_2` comes from [CPsyCounD](https://github.com/CAS-SIAT-XinHai/CPsyCoun)
* Dataset `single_turn_dataset_1` comes from this project
* Dataset `single_turn_dataset_2` comes from this project
* Dataset `self_cognition_EmoLLM` comes from this project
* Dataset `ruozhiba_raw` comes from [COIG-CQIA](https://huggingface.co/datasets/m-a-p/COIG-CQIA/viewer/ruozhiba)
### **Role-play**

@@ -22,6 +22,8 @@
| *General* | multi_turn_dataset_2 | Conversation | 27,000+ |
| *General* | single_turn_dataset_1 | QA | 14,000+ |
| *General* | single_turn_dataset_2 | QA | 18,300+ |
| *General* | self_cognition_EmoLLM | QA | 85+ |
| *General* | ruozhiba_raw | QA | 240+ |
| *Role-play* | aiwei | Conversation | 4000+ |
| *Role-play* | SoulStar | QA | 11,200+ |
| *Role-play* | tiangou | Conversation | 3900+ |
@@ -38,6 +40,8 @@
* dataset `multi_turn_dataset_2` from [CPsyCounD](https://github.com/CAS-SIAT-XinHai/CPsyCoun)
* dataset `single_turn_dataset_1` from this repo
* dataset `single_turn_dataset_2` from this repo
* dataset `self_cognition_EmoLLM` from this repo
* dataset `ruozhiba_raw` from [COIG-CQIA](https://huggingface.co/datasets/m-a-p/COIG-CQIA/viewer/ruozhiba)
**Role-play**
* dataset `aiwei` from this repo

@@ -4,90 +4,8 @@
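Uses the doc2x library to convert PDF files into structured Markdown documents.
Call it from code (requires an api_key)
Call it from code (requires an api_key); see `pdf2md.py` for details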
~~~python
import json
import os
import zipfile

import requests as rq


class PDF2MD:
    def __init__(self, api_key):
        self.api_key = api_key
        self.url = "https://api.doc2x.noedgeai.com/api/v1/pdf"
        self.export_url = "https://api.doc2x.noedgeai.com/api/export"

    def convert(self, filepath, to="md"):
        filename = os.path.splitext(os.path.basename(filepath))[0]
        headers = {"Authorization": "Bearer " + self.api_key}

        # Upload the PDF; the file handle is closed once the request has been sent
        with open(filepath, "rb") as pdf:
            res = rq.post(self.url, files={"file": pdf}, headers=headers, stream=True)
        if res.status_code != 200:
            print("[ERROR] status code: %d, body: %s" % (res.status_code, res.text))
            return

        # Log the streamed response and read the request uuid from each "data: ..." line
        uuid = None
        txt_path = filename + ".txt"
        with open(txt_path, "w", encoding="utf-8") as f:
            for line in res.iter_lines():
                if len(line) > 0:
                    decoded_line = line.decode("utf-8")
                    f.write(decoded_line + "\n")
                    print(decoded_line)
                    uuid = json.loads(decoded_line.replace("data: ", ""))["uuid"]
                    print(uuid)
        if uuid is None:
            print("[ERROR] no uuid received from the server")
            return

        # Choose the local file name for the exported result
        if to in ("md", "latex"):
            path = filename + ".zip"
        elif to == "docx":
            path = filename + ".docx"
        else:
            raise ValueError("unsupported export format: %s" % to)

        export_url = self.export_url + "?request_id=" + uuid + "&to=" + to
        res = rq.get(export_url, headers=headers)
        if res.status_code != 200:
            print("[ERROR] status code: %d, body: %s" % (res.status_code, res.text))
            return
        with open(path, "wb") as f:
            f.write(res.content)
        print("Download succeeded, saved to:", path)

        if to in ("md", "latex"):
            # Create a folder named after the original file and extract the archive into it
            os.makedirs(filename, exist_ok=True)
            with zipfile.ZipFile(path) as zip_file:
                zip_file.extractall(filename)

        if to == "md":
            # Find the extracted .md file
            extracted_md = None
            for file in os.listdir(filename):
                if file.endswith(".md"):
                    extracted_md = os.path.join(filename, file)
                    break
            if extracted_md is None:
                print("[ERROR] no .md file found in", path)
                return
            # Rename the .md file after the original PDF
            new_md_name = os.path.join(filename, filename + ".md")
            os.replace(extracted_md, new_md_name)
            print("Extracted and renamed the md file to:", new_md_name)


def main():
    api_key = "sk-xxx"
    filepath = r"test.pdf"
    converter = PDF2MD(api_key)
    converter.convert(filepath, to="md")


if __name__ == "__main__":
    main()
~~~
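The upload step reads the response as a stream of lines of the form `data: {...}`, takes the `uuid` from them, and passes it to the export endpoint as `request_id`. Below is a minimal offline sketch of those two steps, assuming (as the code above does) that every non-empty line is a JSON object with a `uuid` field. The helper names `last_uuid` and `export_url` are hypothetical, and the sample events are made up for illustration rather than captured from doc2x.

~~~python
import json
from urllib.parse import urlencode

# Endpoint taken from the example above; the helpers below are hypothetical.
EXPORT_URL = "https://api.doc2x.noedgeai.com/api/export"


def last_uuid(lines):
    """Return the uuid from the last non-empty `data: {...}` line, or None."""
    uuid = None
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        if not line.strip():
            continue  # skip blank keep-alive lines
        # Strip only a leading "data: " prefix, leaving the JSON payload intact
        payload = line[len("data: "):] if line.startswith("data: ") else line
        uuid = json.loads(payload)["uuid"]
    return uuid


def export_url(uuid, to="md"):
    """Build the export URL with escaped query parameters."""
    if to not in ("md", "latex", "docx"):
        raise ValueError("unsupported export format: " + to)
    return EXPORT_URL + "?" + urlencode({"request_id": uuid, "to": to})


if __name__ == "__main__":
    # Made-up sample events, not real doc2x output
    sample = [b'data: {"uuid": "abc-123"}', b"", b'data: {"uuid": "abc-123"}']
    uid = last_uuid(sample)
    print(uid)
    print(export_url(uid))
~~~

Stripping only the leading `data: ` prefix, rather than calling `replace` on the whole line, avoids altering a payload that happens to contain that text, and `urlencode` escapes the query values instead of concatenating them by hand.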
## Using the online PDF2MD service via the web