Multimodal support

+ Fixed several bugs: message-box line-break and whitespace issues; improved speech recognition;
+ Promoted the Easter egg to an official feature: Fay conversation and ChatGPT now run side by side;
+ Added YOLOv8 pose recognition;
+ Added VisualGLM-6B, a multimodal, standalone offline large language model.
xszyou 2023-05-27 17:03:43 +08:00
parent 65884afee9
commit ae1d2ae292
16 changed files with 310 additions and 48 deletions


@ -10,6 +10,8 @@ Fay Digital Human Assistant Edition is an important branch of the Fay open-source project, dedicated to building an intelligent
## **Recommended Integrations**
Integrating VisualGLM (Bilibili video)
Adding free local speech recognition to Fay (DAMO Academy FunASR): https://www.bilibili.com/video/BV1qs4y1g74e/?share_source=copy_web&vd_source=64cd9062f5046acba398177b62bea9ad
Running the ChatGLM-6B large model on a consumer PC, with Rasa conversation management in front: https://m.bilibili.com/video/BV1D14y1f7pr
@ -39,7 +41,7 @@ UE5 project: https://github.com/xszyou/fay-ue5
The controller communicates with UE over WebSocket
![](images/cs.png)
![](images/UE.png)
Download the project: [https://pan.baidu.com/s/1RBo2Pie6A5yTrCf1cn_Tuw?pwd=ck99](https://pan.baidu.com/s/1RBo2Pie6A5yTrCf1cn_Tuw?pwd=ck99)
@ -92,8 +94,11 @@ UE5 project: https://github.com/xszyou/fay-ue5
│   ├── ms_tts_sdk.py # Microsoft text-to-speech
│   ├── xf_aiui.py # iFLYTEK human-computer interaction / natural language processing
│   ├── chatgpt.py # GPT-3.5 integration
│   ├── nlp_gpt.py # chat.openai.com integration (no key required)
│   ├── yuan_1_0.py # Inspur Yuan large model integration
│   ├── nlp_rasa.py # Rasa conversation management on top of ChatGLM-6B (highly recommended)
│   ├── nlp_VisualGLM.py # VisualGLM-6B multimodal LLM integration
│   ├── yolov8.py # YOLOv8 pose recognition
│   └── xf_ltp.py # iFLYTEK sentiment analysis
├── bin # executables
├── core # digital human core
@ -109,28 +114,36 @@ UE5 project: https://github.com/xszyou/fay-ue5
│   └── window.py # window module
├── scheduler
│   └── thread_manager.py # thread scheduler
├── utils # utility modules
│   ├── config_util.py
│   ├── storer.py
│   └── util.py
└── test # full of surprises
```
## **III. Changelog**

**2023.05.27**
+ Fixed several bugs: message-box line-break and whitespace issues; improved speech recognition;
+ Promoted the Easter egg to an official feature: Fay conversation and ChatGPT now run side by side;
+ Added YOLOv8 pose recognition;
+ Added VisualGLM-6B, a multimodal, standalone offline large language model.

**2023.05.12**
+ Made the Fay Digital Human Assistant Edition the main branch; the sales edition moved to the [`fay-sales-edition`](https://github.com/TheRamU/Fay/tree/fay-sales-edition) branch;
+ Added a text chat window for the Fay assistant, with text and speech kept in sync;
+ Added local saving of conversation history;
+ Upgraded the ChatGLM-6B application logic: long-text and voice replies are now separated.
## **IV. Installation**

### **Environment**
- Python 3.9, 3.10
- Windows, macOS, Linux

### **Install dependencies**
@ -155,15 +168,16 @@ python main.py
| Module | Description | Link |
| ------------------------- | -------------------------- | ------------------------------------------------------------ |
| ./ai_module/ali_nls.py | Real-time speech recognition (optional; free for 3 months; pick one of the two ASR options) | https://ai.aliyun.com/nls/trans |
| ./ai_module/funasr.py | DAMO Academy open-source free local ASR (optional; pick one of the two ASR options) | fay/test/funasr/README.MD |
| ./ai_module/ms_tts_sdk.py | Microsoft emotional text-to-speech (optional; the free edge-tts is used when not configured) | https://azure.microsoft.com/zh-cn/services/cognitive-services/text-to-speech/ |
| ./ai_module/xf_ltp.py | iFLYTEK sentiment analysis | https://www.xfyun.cn/service/emotion-analysis |
| ./utils/ngrok_util.py | ngrok.cc tunneling (optional) | http://ngrok.cc |
| ./ai_module/yuan_1_0.py | Inspur Yuan large model (pick one NLP module) | https://air.inspur.com/ |
| ./ai_module/chatgpt.py | ChatGPT (pick one NLP module) | ******* |
| ./ai_module/xf_aiui.py | iFLYTEK natural language processing (pick one NLP module) | https://aiui.xfyun.cn/solution/webapi |
| ./ai_module/nlp_rasa.py | Rasa conversation management on top of ChatGLM-6B (pick one NLP module) | https://m.bilibili.com/video/BV1D14y1f7pr |
| ./ai_module/nlp_VisualGLM.py | VisualGLM-6B multimodal, standalone offline LLM (pick one NLP module) | Bilibili video |
@ -228,7 +242,7 @@ python main.py
Business contact QQ: 467665317. We provide development consulting, custom digital human models, and implementation of teaching resources for universities.
http://yafrm.com/forum.php?mod=viewthread&tid=302
Follow the WeChat official account (fay数字人) to get the latest QR code for the WeChat tech exchange group (**please star this repo first**)
![](images/gzh.jpg)


@ -0,0 +1,37 @@
"""
Integration with Tsinghua/Zhipu VisualGLM-6B. Install and start VisualGLM-6B
before using this module: https://github.com/THUDM/VisualGLM-6B
"""
import json
import requests
import uuid
import os
import cv2
from ai_module import yolov8

# Conversation history sent back to the model with every request
communication_history = []


def question(cont):
    if not yolov8.new_instance().get_status():
        return "请先启动“Fay Eyes”"
    content = {
        "text": cont,
        "history": communication_history
    }
    img = yolov8.new_instance().get_img()
    if img is None:  # the camera is running but no frame has been captured yet
        return "请先启动“Fay Eyes”"
    # save the current camera frame and attach it to the request
    filename = str(uuid.uuid4()) + ".jpg"
    filepath = os.path.join(os.getcwd(), "data", filename)
    cv2.imwrite(filepath, img)
    content["image"] = filepath
    url = "http://127.0.0.1:8080"
    print(content)
    req = json.dumps(content)
    headers = {'content-type': 'application/json'}
    r = requests.post(url, headers=headers, data=req)
    # save this exchange to the history
    communication_history.append([cont, r.text])
    return r.text + "\n(相片:" + filepath + ")"
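For reference, the request body assembled above can be built and inspected without the Fay runtime or a camera. A minimal sketch — the helper name `build_visualglm_request` is ours, not part of this commit — assuming the VisualGLM-6B `api.py` server accepts the `text`, `history`, and optional `image` fields exactly as used above:

```python
import json

def build_visualglm_request(text, history, image_path=None):
    """Assemble the JSON body that nlp_VisualGLM.question() posts to 127.0.0.1:8080."""
    payload = {"text": text, "history": history}
    if image_path is not None:  # attached only when a camera frame is available
        payload["image"] = image_path
    return json.dumps(payload, ensure_ascii=False)
```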


@ -1,13 +1,17 @@
from revChatGPT.V1 import Chatbot
from core.content_db import Content_Db
from utils import config_util as cfg
import time
count = 0
def question(cont):
    global count
    try:
        chatbot = Chatbot(config={
            "access_token": cfg.key_gpt_access_token,
            "paid": False,
            "collect_analytics": True,
            "model": "gpt-4",
            "conversation_id": cfg.key_gpt_conversation_id
        }, conversation_id=cfg.key_gpt_conversation_id,
            parent_id=None)
@ -16,6 +20,11 @@ def question(cont):
        response = ""
        for data in chatbot.ask(prompt):
            response = data["message"]
        count = 0
        return response
    except Exception as e:
        count += 1
        if count < 3:
            time.sleep(15)
            return question(cont)
        return 'gpt当前繁忙请稍后重试' + str(e)
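The new error handling above retries up to three times with a 15-second pause before giving up. The same pattern can be sketched as a standalone helper — the name `with_retries` is ours, not part of the commit:

```python
import time

def with_retries(fn, attempts=3, delay=15):
    """Call fn(); on failure wait `delay` seconds and retry, up to `attempts` tries.
    Re-raise the last error once the attempts are exhausted."""
    last_err = None
    for _ in range(attempts):
        try:
            return fn()
        except Exception as e:
            last_err = e
            time.sleep(delay)
    raise last_err
```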

ai_module/yolov8.py (new file, 146 lines)

@ -0,0 +1,146 @@
from ultralytics import YOLO
from scipy.spatial import procrustes
import numpy as np
import cv2
import time
from scheduler.thread_manager import MyThread

__fei_eyes = None


class FeiEyes:

    def __init__(self):
        """
        COCO keypoint indices:
        0: nose
        1: left eye    2: right eye
        3: left ear    4: right ear
        5: left shoulder    6: right shoulder
        7: left elbow    8: right elbow
        9: left wrist    10: right wrist
        11: left hip    12: right hip
        13: left knee    14: right knee
        15: left ankle    16: right ankle
        """
        self.POSE_PAIRS = [
            (3, 5), (5, 6),  # head and shoulders
            (5, 7), (6, 8), (7, 9), (8, 10),  # arms
            (11, 12), (11, 13), (12, 14), (13, 15)  # legs
        ]
        self.my_face = np.array([[154.4565, 193.7006],
                                 [181.8575, 164.8366],
                                 [117.1820, 164.3602],
                                 [213.5605, 193.0460],
                                 [62.7056, 193.5217]])
        self.is_running = False
        self.img = None
        self.person_count = 0
        self.stand_count = 0
        self.sit_count = 0

    def is_sitting(self, keypoints):
        left_hip, right_hip = keypoints[11][:2], keypoints[12][:2]
        left_knee, right_knee = keypoints[13][:2], keypoints[14][:2]
        left_ankle, right_ankle = keypoints[15][:2], keypoints[16][:2]
        # average y of the hips and knees
        hip_knee_y = (left_hip[1] + right_hip[1] + left_knee[1] + right_knee[1]) / 4
        # average y of the knees and ankles
        knee_ankle_y = (left_knee[1] + right_knee[1] + left_ankle[1] + right_ankle[1]) / 4
        # if the hip/knee average sits above the knee/ankle average, treat the pose as sitting
        return hip_knee_y < knee_ankle_y

    def is_standing(self, keypoints):
        head = keypoints[0][:2]
        left_ankle, right_ankle = keypoints[15][:2], keypoints[16][:2]
        # head high in the frame and feet on the ground; image y grows downward,
        # so a standing head has a smaller y than the ankles
        if head[1] < left_ankle[1] and head[1] < right_ankle[1]:
            return True
        else:
            return False

    def get_counts(self):
        if not self.is_running:
            return 0, 0, 0
        return self.person_count, self.stand_count, self.sit_count

    def get_status(self):
        return self.is_running

    def get_img(self):
        if self.is_running:
            return self.img
        else:
            return None

    def start(self):
        cap = cv2.VideoCapture(0)
        if cap.isOpened():
            self.is_running = True
            MyThread(target=self.run, args=[cap]).start()

    def stop(self):
        self.is_running = False

    def run(self, cap):
        model = YOLO("yolov8n-pose.pt")
        while self.is_running:
            time.sleep(0.033)
            ret, frame = cap.read()
            if not ret:
                break
            self.img = frame
            operated_frame = frame.copy()
            results = model.predict(operated_frame, verbose=False)
            person_count = 0
            sit_count = 0
            stand_count = 0
            for res in results:  # loop over results
                for box, cls in zip(res.boxes.xyxy, res.boxes.cls):  # loop over detections
                    x1, y1, x2, y2 = box
                    cv2.rectangle(operated_frame, (int(x1.item()), int(y1.item())), (int(x2.item()), int(y2.item())), (0, 255, 0), 2)
                    cv2.putText(operated_frame, f"{res.names[int(cls.item())]}", (int(x1.item()), int(y1.item()) - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
                if res.keypoints is not None and res.keypoints.size(0) > 0:  # check if keypoints exist
                    keypoints = res.keypoints[0]
                    # TODO: the face-similarity comparison still needs work
                    keypoints_np = keypoints[0:5].cpu().numpy()
                    mtx1, mtx2, disparity = procrustes(keypoints_np[:, :2], self.my_face)
                    # total number of people
                    person_count += 1
                    # number of people sitting
                    if self.is_sitting(keypoints):
                        sit_count += 1
                    # number of people standing
                    elif self.is_standing(keypoints):
                        stand_count += 1
                    for keypoint in keypoints:  # loop over keypoints
                        x, y, conf = keypoint
                        if conf > 0.5:  # draw keypoints with confidence greater than 0.5
                            cv2.circle(operated_frame, (int(x.item()), int(y.item())), 3, (0, 0, 255), -1)
                    # draw lines connecting keypoints
                    for pair in self.POSE_PAIRS:
                        pt1, pt2 = keypoints[pair[0]][:2], keypoints[pair[1]][:2]
                        conf1, conf2 = keypoints[pair[0]][2], keypoints[pair[1]][2]
                        if conf1 > 0.5 and conf2 > 0.5:
                            # cv2.line(operated_frame, (int(pt1[0].item()), int(pt1[1].item())), (int(pt2[0].item()), int(pt2[1].item())), (255, 255, 0), 2)
                            pass
            self.person_count = person_count
            self.sit_count = sit_count
            self.stand_count = stand_count
            cv2.imshow("YOLO v8 Fay Eyes", operated_frame)
            cv2.waitKey(1)
        cap.release()
        cv2.destroyAllWindows()


def new_instance():
    global __fei_eyes
    if __fei_eyes is None:
        __fei_eyes = FeiEyes()
    return __fei_eyes
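The sitting heuristic used by `FeiEyes.is_sitting` can be exercised without a camera or model weights. A sketch over plain `(x, y, conf)` tuples in place of tensors: it returns True when the average y of the hips and knees lies above the average y of the knees and ankles.

```python
def is_sitting(keypoints):
    """Sitting heuristic from FeiEyes, on a 17-element COCO keypoint list.
    Image y grows downward, so 'above' means a smaller y value."""
    hip_knee_y = (keypoints[11][1] + keypoints[12][1] +
                  keypoints[13][1] + keypoints[14][1]) / 4
    knee_ankle_y = (keypoints[13][1] + keypoints[14][1] +
                    keypoints[15][1] + keypoints[16][1]) / 4
    return hip_knee_y < knee_ankle_y
```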


@ -29,11 +29,13 @@ from core.content_db import Content_Db
from datetime import datetime
from ai_module import nlp_rasa
from ai_module import nlp_gpt
from ai_module import yolov8
from ai_module import nlp_VisualGLM as VisualGLM
# Handle text messages
def send_for_answer(msg, sendto):
    contentdb = Content_Db()
    contentdb.add_content('member', 'send', msg)
    text = ''
    textlist = []
    try:
@ -53,7 +55,8 @@ def send_for_answer(msg,sendto):
        elif cfg.key_chat_module == 'rasa':
            textlist = nlp_rasa.question(msg)
            text = textlist[0]['text']
        elif cfg.key_chat_module == "VisualGLM":
            text = VisualGLM.question(msg)
        else:
            raise RuntimeError('讯飞key、yuan key、chatgpt key都没有配置')
@ -289,12 +292,22 @@ class FeiFei:
# self.__isExecute = True #!!!!
                if index == 1:
                    fay_eyes = yolov8.new_instance()
                    if fay_eyes.get_status():  # YOLO is running
                        person_count, stand_count, sit_count = fay_eyes.get_counts()
                        if person_count != 1:  # interact only when exactly one person is present
                            wsa_server.get_web_instance().add_cmd({"panelMsg": "不是有且只有一个人,不互动"})
                            continue
                    if self.muting:  # a mute command is being executed
                        wsa_server.get_web_instance().add_cmd({"panelMsg": "静音指令正在执行,不互动"})
                        continue
                    contentdb = Content_Db()
                    contentdb.add_content('member', 'speak', self.q_msg)
                    wsa_server.get_web_instance().add_cmd({"panelReply": {"type": "member", "content": self.q_msg}})
                    answer = self.__get_answer(interact.interleaver, self.q_msg)
                    text = ''
                    textlist = []
                    if answer is None:
@ -312,6 +325,9 @@ class FeiFei:
                    elif cfg.key_chat_module == 'rasa':
                        textlist = nlp_rasa.question(self.q_msg)
                        text = textlist[0]['text']
                    elif cfg.key_chat_module == "VisualGLM":
                        text = VisualGLM.question(self.q_msg)
                    else:
                        raise RuntimeError('讯飞key、yuan key、chatgpt key都没有配置')
                    util.log(1, '自然语言处理完成. 耗时: {} ms'.format(math.floor((time.time() - tm) * 1000)))
@ -593,11 +609,10 @@ class FeiFei:
            wsa_server.get_web_instance().add_cmd({"panelMsg": self.a_msg})
            time.sleep(audio_length + 0.5)
            wsa_server.get_web_instance().add_cmd({"panelMsg": ""})
            if config_util.config["interact"]["playSound"]:
                util.log(1, '结束播放!')
            self.speaking = False
        except Exception as e:
            print(e)


@ -213,9 +213,9 @@ def stop():
    if recorderListener is not None:
        util.log(1, '正在关闭录音服务...')
        recorderListener.stop()
    # if deviceInputListener is not None:
    #     util.log(1, '正在关闭远程音频输入输出服务...')
    #     deviceInputListener.stop()
    util.log(1, '正在关闭核心服务...')
    feiFei.stop()
    util.log(1, '服务已关闭!')
@ -244,22 +244,15 @@ def start():
    liveRoom = config_util.config['source']['liveRoom']
    record = config_util.config['source']['record']
    if liveRoom['enabled']:
        util.log(1, '开启直播服务...')
        viewerListener = ViewerListener()  # listen to the live room
        viewerListener.start()
    if record['enabled']:
        util.log(1, '开启录音服务...')
        recorderListener = RecorderListener(record['device'], feiFei)  # listen to the microphone
        recorderListener.start()
    # edit by xszyou on 20230113: connect audio-input devices such as K210 boards and phones through this service
    # util.log(1, '开启远程设备音频输入服务...')
    # deviceInputListener = DeviceInputListener(feiFei)  # device audio input/output microphone
    # deviceInputListener.start()
    util.log(1, '注册命令...')
    MyThread(target=console_listener).start()  # listen to the console


@ -1,3 +1,4 @@
import imp
import json
import time
@ -10,10 +11,12 @@ import fay_booter
from core.tts_voice import EnumVoice
from gevent import pywsgi
from scheduler.thread_manager import MyThread
from utils import config_util, util
from core import wsa_server
from core import fay_core
from core.content_db import Content_Db
from ai_module import yolov8
__app = Flask(__name__)
CORS(__app, supports_credentials=True)
@ -40,6 +43,19 @@ def api_submit():
    # print(data)
    config_data = json.loads(data)
    config_util.save_config(config_data['config'])
    return '{"result":"successful"}'


@__app.route('/api/control-eyes', methods=['post'])
def control_eyes():
    eyes = yolov8.new_instance()
    if not eyes.get_status():
        eyes.start()
        util.log(1, "YOLO v8正在启动...")
    else:
        eyes.stop()
        util.log(1, "YOLO v8正在关闭...")
    return '{"result":"successful"}'


@ -20,6 +20,7 @@ new Vue({
fileList: {},
panel_msg: "",
play_sound_enabled: false,
visualization_detection_enabled: false,
source_liveRoom_enabled: false,
source_liveRoom_url: '',
source_record_enabled: false,
@ -233,6 +234,7 @@ new Vue({
let perception = interact["perception"]
let items = config["items"]
_this.play_sound_enabled = interact["playSound"]
_this.visualization_detection_enabled = interact["visualization"]
_this.source_liveRoom_enabled = source["liveRoom"]["enabled"]
_this.source_liveRoom_url = source["liveRoom"]["url"]
_this.source_record_enabled = source["record"]["enabled"]
@ -315,6 +317,7 @@ new Vue({
},
"interact": {
"playSound": this.play_sound_enabled,
"visualization": this.visualization_detection_enabled,
"QnA": this.interact_QnA,
"maxInteractTime": this.interact_maxInteractTime,
"perception": {
@ -378,6 +381,19 @@ new Vue({
xhr.open("post", url)
xhr.setRequestHeader("Content-type", "application/x-www-form-urlencoded")
xhr.send()
},
        postControlEyes() {
            let url = "http://127.0.0.1:5000/api/control-eyes";
            let xhr = new XMLHttpRequest()
            xhr.open("post", url)
            xhr.setRequestHeader("Content-type", "application/x-www-form-urlencoded")
            xhr.send()
            // toggle the local switch to mirror the server-side start/stop
            this.visualization_detection_enabled = !this.visualization_detection_enabled
        },
isEmptyItem(data) {
let isEmpty = true
@ -481,7 +497,7 @@ new Vue({
let xhr = new XMLHttpRequest()
xhr.open("post", url)
xhr.setRequestHeader("Content-type", "application/x-www-form-urlencoded")
xhr.send('data=' + encodeURIComponent(JSON.stringify(send_data)))
let executed = false
xhr.onreadystatechange = async function () {
if (!executed && xhr.status === 200) {
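The `encodeURIComponent` change above matters because a raw JSON string can contain `&`, `=`, and `+`, which an `application/x-www-form-urlencoded` body treats as field separators. The same effect, sketched in Python with `urllib.parse` (the sample values are illustrative only):

```python
import json
from urllib.parse import quote, parse_qs

payload = json.dumps({"msg": "a=1&b=2"})
raw_body = "data=" + payload                   # '&' and '=' get parsed as separators
safe_body = "data=" + quote(payload, safe="")  # roughly what encodeURIComponent does

parsed_raw = parse_qs(raw_body)    # the JSON is split apart and mangled
parsed_safe = parse_qs(safe_body)  # round-trips intact
```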


@ -36,6 +36,7 @@
padding: 20px;
overflow-y: scroll;
flex: 1;
white-space: pre-wrap;
}
.content:hover::-webkit-scrollbar-thumb {
@ -252,6 +253,10 @@
inactive-color="#ff4949">
</el-switch>
</li>
<li>
<el-button type="delete" class="btn_open" @click="postControlEyes()">Fay Eyes</el-button>
</li>
</ul>
</div>
</div>
@ -333,10 +338,10 @@
<div class="input-area">
<textarea v-model="send_msg" name="text" id="textarea" placeholder="发送些内容给Fay..."></textarea>
<div class="button-area">
<button id="send-btn" @click="send(1)">Fay</button>
<button id="send-btn" @click="send(2)" style="margin-left: 25px;">ChatGPT</button>
</div>
</div>
</div>



@ -43,8 +43,8 @@ if __name__ == '__main__':
    ws_server.start_server()
    web_ws_server = wsa_server.new_web_instance(port=10003)
    web_ws_server.start_server()
    # Edit by xszyou on 20230516: local ASR added; Aliyun ASR is now an optional setting
    if config_util.ASR_mode == "ali" and config_util.config['source']['record']['enabled']:
        ali_nls.start()
    flask_server.start()
    app = QApplication(sys.argv)


@ -19,4 +19,5 @@ pytz
gevent~=22.10.1
edge_tts~=6.1.3
eyed3
revChatGPT
ultralytics


@ -18,10 +18,10 @@ ms_tts_key=
ms_tts_region=
# iFLYTEK sentiment analysis service key: https://www.xfyun.cn/service/emotion-analysis/
xf_ltp_app_id=
xf_ltp_api_key=

# NLP module (pick one): xfaiui, yuan, chatgpt, rasa (requires chatglm and rasa to be running; https://m.bilibili.com/video/BV1D14y1f7pr), VisualGLM
chat_module=xfaiui

# iFLYTEK natural language processing service key (pick one NLP module): https://aiui.xfyun.cn/solution/webapi/
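Per the option list above, switching Fay to the new multimodal backend is a one-line config change (assuming the VisualGLM-6B `api.py` server is already serving on 127.0.0.1:8080, as nlp_VisualGLM.py expects):

```ini
# use the VisualGLM-6B multimodal model for NLP
chat_module=VisualGLM
```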


@ -1,5 +1,6 @@
import codecs
import os
import sys
import random
import time
@ -37,3 +38,12 @@ def printInfo(level, sender, text, send_time=-1):
def log(level, text):
    printInfo(level, "系统", text)


class DisablePrint:
    def __enter__(self):
        self._original_stdout = sys.stdout
        sys.stdout = open(os.devnull, 'w')

    def __exit__(self, exc_type, exc_val, exc_tb):
        sys.stdout.close()
        sys.stdout = self._original_stdout

yolov8n-pose.pt (new binary file)