feat: Add new finetune configurations and datasets

parent 7331ed04ef
commit 1a6b8eac20

README.md (52 changed lines)
@@ -2,17 +2,15 @@
 
 <!-- PROJECT SHIELDS -->
 
 [![Contributors][contributors-shield]][contributors-url]
 [![Forks][forks-shield]][forks-url]
 [![Issues][issues-shield]][issues-url]
 [![MIT License][license-shield]][license-url]
 [![Stargazers][stars-shield]][stars-url]
 
 <br />
 
 <!-- PROJECT LOGO -->
 
 <p align="center">
 <a href="https://github.com/aJupyter/EmoLLM/">
 <img src="assets/logo.jpeg" alt="Logo" width="30%">

@@ -35,7 +33,21 @@
 
 <!-- 本篇README.md面向开发者 -->
 
-**EmoLLM** 是一个能够支持 **理解用户-支持用户-帮助用户** 心理健康辅导链路的心理健康大模型,由 [InternLM2](https://github.com/InternLM/InternLM) 指令微调而来,欢迎大家star~⭐⭐
+**EmoLLM** 是一个能够支持 **理解用户-支持用户-帮助用户** 心理健康辅导链路的心理健康大模型,由 `LLM`指令微调而来,欢迎大家star~⭐⭐。目前已经开源的 `LLM`微调配置如下:
 
+| 模型                  | 类型     |
+| :-------------------: | :------: |
+| InternLM2_7B_chat     | qlora    |
+| InternLM2_1_8B_chat   | 全量微调 |
+| Qwen_7b_chat          | qlora    |
+| Qwen1_5-0_5B-Chat     | 全量微调 |
+| Baichuan2_13B_chat    | qlora    |
+| ChatGLM3_6B           | lora     |
+| DeepSeek MoE_16B_chat | qlora    |
+| Mixtral 8x7B_instruct | qlora    |
+| ……                    | ……       |
+
+欢迎大家为本项目做出贡献~
 
 ---
 
@@ -52,20 +64,23 @@
 
 ### 最近更新
 
+- 【2024.2.23】更新[若干微调配置](/xtuner_config/)(目前微调的模型请见)新增 [data_pro.json](/datasets/data_pro.json)(数量更多、场景更全、更丰富)和 [aiwei.json](/datasets/aiwei.json)(温柔御姐角色扮演专用,带有Emoji表情),即将推出 `温柔御姐心理医生艾薇`
 - 【2024.2.18】 [基于Qwen1_5-0_5B-Chat全量微调版本开源](https://www.modelscope.cn/models/aJupyter/EmoLLM_Qwen1_5-0_5B-Chat_full_sft/summary),算力有限的道友可以玩起来~
 - 【2024.2.6】 EmoLLM在[**Openxlab** ](https://openxlab.org.cn/models/detail/jujimeizuo/EmoLLM_Model) 平台下载量高达18.7k,欢迎大家体验!
 
 <p align="center">
 <img src="https://github.com/aJupyter/EmoLLM/assets/62385492/7e931682-c54d-4ded-bc67-79130c68d744" alt="模型下载量">
 </p>
 
+<details>
+<summary>查看更多</summary>
+
 - 【2024.2.5】 项目荣获公众号**NLP工程化**推文宣传[推文链接](https://mp.weixin.qq.com/s/78lrRl2tlXEKUfElnkVx4A),为博主推广一波,欢迎大家关注!!🥳🥳
 
 <p align="center">
 <img src="https://github.com/aJupyter/EmoLLM/assets/62385492/47868d6a-2e91-4aa9-a630-e594c14295b4" alt="公众号二维码">
 </p>
 
-<details>
-
-<summary>查看更多</summary>
 
 - 【2024.2.3】 [项目宣传视频](https://www.bilibili.com/video/BV1N7421N76X/)完成 😊
 - 【2024.1.27】 完善数据构建文档、微调指南、部署指南、Readme等相关文档 👏
 - 【2024.1.25】 完成EmoLLM第一版并部署上线 https://openxlab.org.cn/apps/detail/jujimeizuo/EmoLLM 😀

@@ -75,23 +90,26 @@
 ## 目录
 
 - [EmoLLM-心理健康大模型](#emollm-心理健康大模型)
-- [开发前的配置要求](#开发前的配置要求)
-- [**使用指南**](#使用指南)
+- [最近更新](#最近更新)
+- [目录](#目录)
+- [开发前的配置要求](#开发前的配置要求)
+- [**使用指南**](#使用指南)
 - [文件目录说明](#文件目录说明)
 - [数据构建](#数据构建)
 - [微调指南](#微调指南)
 - [部署指南](#部署指南)
 - [使用到的框架](#使用到的框架)
 - [如何参与本项目](#如何参与本项目)
 - [版本控制](#版本控制)
 - [作者(排名不分先后)](#作者排名不分先后)
 - [版权说明](#版权说明)
 - [特别鸣谢](#特别鸣谢)
+- [Star History](#star-history)
 - [🌟 Contributors](#-contributors)
 
 ###### 开发前的配置要求
 
-- 硬件:A100 40G
+- 硬件:A100 40G(仅针对InternLM2_7B_chat+qlora微调+deepspeed zero2优化)
 
 ###### **使用指南**
 

@@ -145,8 +163,6 @@ git clone https://github.com/aJupyter/EmoLLM.git
 - [Pytorch](https://pytorch.org/)
 - …
 
 
 #### 如何参与本项目
 
 贡献使开源社区成为一个学习、激励和创造的绝佳场所。你所作的任何贡献都是**非常感谢**的。
datasets/aiwei.json (new file, 24870 lines): file diff suppressed because it is too large
datasets/data_pro.json (new file, 138359 lines): file diff suppressed because one or more lines are too long
@@ -3,39 +3,38 @@ import os
 
 
 def save_merge_json(data_lis, file_path):
+    import json
+
     with open(file_path, 'wt', encoding='utf-8') as file:
-        json.dump(data_lis, file, indent=4, ensure_ascii=False)
+        json.dump(data_lis, file, ensure_ascii=False)
 
 
-def get_all_file_paths(folder_path, suffix=''):
-    print(folder_path)
-    files = os.listdir(folder_path)
-    path = []
-    for file in files:
-        file_path = os.path.join(folder_path, file)
-        if os.path.isdir(file_path):
-            path.extend(get_all_file_paths(file_path))
-        else:
-            if file_path.endswith(suffix):
-                path.append(file_path)
-    return path
+def get_all_file_paths(folder_path):
+    # 确保传入的是一个目录
+    if not os.path.isdir(folder_path):
+        raise ValueError(f"{folder_path} is not a valid directory")
+
+    # 获取文件夹下所有文件的路径
+    file_paths = [os.path.join(folder_path, file) for file in os.listdir(
+        folder_path) if os.path.isfile(os.path.join(folder_path, file))]
+    return file_paths
 
 
 if __name__ == '__main__':
     conversion_lis = []
-    folder_path = './'  # input
-    merge_path = 'merge.json'  # input
-    paths = get_all_file_paths(folder_path=folder_path, suffix='.json')
 
-    for path in paths:
+    for path in get_all_file_paths(r'data\res-aiwei'):
         print(path)
-        with open(path, 'rt', encoding='utf-8') as lines:
-            datas = []
-            for line in lines:
-                datas.append(line)
-            try:
-                datas = json.loads(''.join(datas))
-                conversion_lis.extend(datas)
-            except json.JSONDecodeError as e:
-                print(f"Error decoding JSON: {e}")
-    save_merge_json(data_lis=conversion_lis, file_path=merge_path)
+
+        with open(path, 'rt', encoding='utf-8') as file:
+            for line in file:
+                # 移除行尾的换行符
+                line = line.rstrip('\n')
+                # 解析JSON
+                try:
+                    data = json.loads(line)
+                    conversion_lis.append(data)
+                except json.JSONDecodeError as e:
+                    print(f"Error decoding JSON: {e}")
+    save_merge_json(data_lis=conversion_lis,
+                    file_path=r'.\merge.json')
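Note on the rewritten merge step above: it now assumes one JSON object per line in each source file and collects everything into a single JSON array. A minimal sketch of that flow, using an illustrative record layout (the actual dataset schema is not shown in this diff, so the field names here are assumptions):

import json

# Hypothetical JSONL input: one JSON object per line; field names are illustrative only.
sample_lines = [
    '{"conversation": [{"input": "我压力很大", "output": "..."}]}',
    '{"conversation": [{"input": "生活没意思", "output": "..."}]}',
]

merged = []
for line in sample_lines:
    merged.append(json.loads(line.rstrip('\n')))  # parse line by line, as the script does

# Same dump call as save_merge_json: one JSON array, non-ASCII kept readable.
with open('merge_demo.json', 'wt', encoding='utf-8') as f:
    json.dump(merged, f, ensure_ascii=False)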
xtuner_config/baichuan2_13b_chat_qlora_alpaca_e3.py (new file, 218 lines)
@@ -0,0 +1,218 @@

# Copyright (c) OpenMMLab. All rights reserved.
import torch
from datasets import load_dataset
from mmengine.dataset import DefaultSampler
from mmengine.hooks import (CheckpointHook, DistSamplerSeedHook, IterTimerHook,
                            LoggerHook, ParamSchedulerHook)
from mmengine.optim import AmpOptimWrapper, CosineAnnealingLR, LinearLR
from peft import LoraConfig
from torch.optim import AdamW
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig)

from xtuner.dataset import process_hf_dataset
from xtuner.dataset.collate_fns import default_collate_fn
from xtuner.dataset.map_fns import alpaca_map_fn, template_map_fn_factory
from xtuner.engine.hooks import (DatasetInfoHook, EvaluateChatHook,
                                 VarlenAttnArgsToMessageHubHook)
from xtuner.engine.runner import TrainLoop
from xtuner.model import SupervisedFinetune
from xtuner.utils import PROMPT_TEMPLATE, SYSTEM_TEMPLATE

from mmengine.visualization import Visualizer, WandbVisBackend, TensorboardVisBackend

#######################################################################
#                          PART 1  Settings                           #
#######################################################################
# Model
pretrained_model_name_or_path = '/root/model/baichuan-inc/Baichuan2-13B-Chat'
use_varlen_attn = False

# Data
data_path = './merge.json'
prompt_template = PROMPT_TEMPLATE.baichuan2_chat
max_length = 2048
pack_to_max_length = True

# Scheduler & Optimizer
batch_size = 16  # per_device
accumulative_counts = 4
dataloader_num_workers = 0
max_epochs = 3
optim_type = AdamW
lr = 2e-4
betas = (0.9, 0.999)
weight_decay = 0
max_norm = 1  # grad clip
warmup_ratio = 0.03

# Save
save_steps = 100
save_total_limit = 2  # Maximum checkpoints to keep (-1 means unlimited)

# Evaluate the generation performance during the training
evaluation_freq = 100
SYSTEM = "现在你是一个心理专家,我有一些心理问题,请你用专业的知识帮我解决。"
evaluation_inputs = [
    '我压力很大', '生活没意思', "非常容易羡慕别人啊"
]

#######################################################################
#                      PART 2  Model & Tokenizer                      #
#######################################################################
tokenizer = dict(
    type=AutoTokenizer.from_pretrained,
    pretrained_model_name_or_path=pretrained_model_name_or_path,
    trust_remote_code=True,
    padding_side='right')

model = dict(
    type=SupervisedFinetune,
    use_varlen_attn=use_varlen_attn,
    llm=dict(
        type=AutoModelForCausalLM.from_pretrained,
        pretrained_model_name_or_path=pretrained_model_name_or_path,
        trust_remote_code=True,
        torch_dtype=torch.float16,
        quantization_config=dict(
            type=BitsAndBytesConfig,
            load_in_4bit=True,
            load_in_8bit=False,
            llm_int8_threshold=6.0,
            llm_int8_has_fp16_weight=False,
            bnb_4bit_compute_dtype=torch.float16,
            bnb_4bit_use_double_quant=True,
            bnb_4bit_quant_type='nf4')),
    lora=dict(
        type=LoraConfig,
        r=64,
        lora_alpha=16,
        lora_dropout=0.1,
        bias='none',
        task_type='CAUSAL_LM'))

#######################################################################
#                      PART 3  Dataset & Dataloader                   #
#######################################################################
alpaca_en = dict(
    type=process_hf_dataset,
    dataset=dict(type=load_dataset, path='json', data_files=dict(train=data_path)),
    tokenizer=tokenizer,
    max_length=max_length,
    dataset_map_fn=None,
    template_map_fn=dict(
        type=template_map_fn_factory, template=prompt_template),
    remove_unused_columns=True,
    shuffle_before_pack=True,
    pack_to_max_length=pack_to_max_length,
    use_varlen_attn=use_varlen_attn)

train_dataloader = dict(
    batch_size=batch_size,
    num_workers=dataloader_num_workers,
    dataset=alpaca_en,
    sampler=dict(type=DefaultSampler, shuffle=True),
    collate_fn=dict(type=default_collate_fn, use_varlen_attn=use_varlen_attn))

#######################################################################
#                    PART 4  Scheduler & Optimizer                    #
#######################################################################
# optimizer
optim_wrapper = dict(
    type=AmpOptimWrapper,
    optimizer=dict(
        type=optim_type, lr=lr, betas=betas, weight_decay=weight_decay),
    clip_grad=dict(max_norm=max_norm, error_if_nonfinite=False),
    accumulative_counts=accumulative_counts,
    loss_scale='dynamic',
    dtype='float16')

# learning policy
# More information: https://github.com/open-mmlab/mmengine/blob/main/docs/en/tutorials/param_scheduler.md  # noqa: E501
param_scheduler = [
    dict(
        type=LinearLR,
        start_factor=1e-5,
        by_epoch=True,
        begin=0,
        end=warmup_ratio * max_epochs,
        convert_to_iter_based=True),
    dict(
        type=CosineAnnealingLR,
        eta_min=0.0,
        by_epoch=True,
        begin=warmup_ratio * max_epochs,
        end=max_epochs,
        convert_to_iter_based=True)
]

# train, val, test setting
train_cfg = dict(type=TrainLoop, max_epochs=max_epochs)

#######################################################################
#                           PART 5  Runtime                           #
#######################################################################
# Log the dialogue periodically during the training process, optional
custom_hooks = [
    dict(type=DatasetInfoHook, tokenizer=tokenizer),
    dict(
        type=EvaluateChatHook,
        tokenizer=tokenizer,
        every_n_iters=evaluation_freq,
        evaluation_inputs=evaluation_inputs,
        system=SYSTEM,
        prompt_template=prompt_template)
]

if use_varlen_attn:
    custom_hooks += [dict(type=VarlenAttnArgsToMessageHubHook)]

# configure default hooks
default_hooks = dict(
    # record the time of every iteration.
    timer=dict(type=IterTimerHook),
    # print log every 10 iterations.
    logger=dict(type=LoggerHook, log_metric_by_epoch=False, interval=10),
    # enable the parameter scheduler.
    param_scheduler=dict(type=ParamSchedulerHook),
    # save checkpoint per `save_steps`.
    checkpoint=dict(
        type=CheckpointHook,
        by_epoch=False,
        interval=save_steps,
        max_keep_ckpts=save_total_limit),
    # set sampler seed in distributed environment.
    sampler_seed=dict(type=DistSamplerSeedHook),
)

# configure environment
env_cfg = dict(
    # whether to enable cudnn benchmark
    cudnn_benchmark=False,
    # set multi process parameters
    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),
    # set distributed parameters
    dist_cfg=dict(backend='nccl'),
)

# set visualizer
visualizer = dict(
    type=Visualizer,
    vis_backends=[dict(type=WandbVisBackend)]
)

# set log level
log_level = 'INFO'

# load from which checkpoint
load_from = None

# whether to resume training from the loaded checkpoint
resume = False

# Defaults to use random seed and disable `deterministic`
randomness = dict(seed=None, deterministic=False)

# set log processor
log_processor = dict(by_epoch=False)
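The quantization_config block in the Baichuan2 config above is xtuner's dict-style spelling of a standard bitsandbytes 4-bit setup. A rough equivalent written as direct transformers calls, for orientation only (the model path is the one assumed in the config; adjust to your local copy):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    llm_int8_threshold=6.0,
    llm_int8_has_fp16_weight=False,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type='nf4',
)

# NF4-quantized base model; LoRA adapters are then trained on top of it (QLoRA).
model = AutoModelForCausalLM.from_pretrained(
    '/root/model/baichuan-inc/Baichuan2-13B-Chat',
    torch_dtype=torch.float16,
    quantization_config=bnb_config,
    trust_remote_code=True,
)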
xtuner_config/chatglm3_6b_lora_alpaca_e3.py (new file, 205 lines)
@@ -0,0 +1,205 @@

# Copyright (c) OpenMMLab. All rights reserved.
import torch
from datasets import load_dataset
from mmengine.dataset import DefaultSampler
from mmengine.hooks import (CheckpointHook, DistSamplerSeedHook, IterTimerHook,
                            LoggerHook, ParamSchedulerHook)
from mmengine.optim import AmpOptimWrapper, CosineAnnealingLR, LinearLR
from peft import LoraConfig
from torch.optim import AdamW
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig)

from xtuner.dataset import process_hf_dataset
from xtuner.dataset.collate_fns import default_collate_fn
from xtuner.dataset.map_fns import alpaca_map_fn, template_map_fn_factory
from xtuner.engine.hooks import (DatasetInfoHook, EvaluateChatHook,
                                 VarlenAttnArgsToMessageHubHook)
from xtuner.engine.runner import TrainLoop
from xtuner.model import SupervisedFinetune
from xtuner.utils import PROMPT_TEMPLATE, SYSTEM_TEMPLATE

#######################################################################
#                          PART 1  Settings                           #
#######################################################################
# Model
pretrained_model_name_or_path = '/root/model/ZhipuAI/chatglm3-6b'
use_varlen_attn = False

# Data
data_path = './merge.json'
prompt_template = PROMPT_TEMPLATE.chatglm3
max_length = 2048
pack_to_max_length = True

# Scheduler & Optimizer
batch_size = 20  # per_device
accumulative_counts = 4
dataloader_num_workers = 0
max_epochs = 3
optim_type = AdamW
lr = 2e-4
betas = (0.9, 0.999)
weight_decay = 0
max_norm = 1  # grad clip
warmup_ratio = 0.03

# Save
save_steps = 100
save_total_limit = 2  # Maximum checkpoints to keep (-1 means unlimited)

# Evaluate the generation performance during the training
evaluation_freq = 100
SYSTEM = "现在你是一个心理专家,我有一些心理问题,请你用专业的知识帮我解决。"
evaluation_inputs = [
    '我压力很大', '生活没意思', "非常容易羡慕别人啊"
]

#######################################################################
#                      PART 2  Model & Tokenizer                      #
#######################################################################
tokenizer = dict(
    type=AutoTokenizer.from_pretrained,
    pretrained_model_name_or_path=pretrained_model_name_or_path,
    trust_remote_code=True,
    encode_special_tokens=True,
    padding_side='left')

model = dict(
    type=SupervisedFinetune,
    use_varlen_attn=use_varlen_attn,
    llm=dict(
        type=AutoModelForCausalLM.from_pretrained,
        pretrained_model_name_or_path=pretrained_model_name_or_path,
        trust_remote_code=True,
        torch_dtype=torch.float16,
    ),
    lora=dict(
        type=LoraConfig,
        r=64,
        lora_alpha=16,
        lora_dropout=0.1,
        bias='none',
        task_type='CAUSAL_LM'))

#######################################################################
#                      PART 3  Dataset & Dataloader                   #
#######################################################################
alpaca_en = dict(
    type=process_hf_dataset,
    dataset=dict(type=load_dataset, path='json', data_files=dict(train=data_path)),
    tokenizer=tokenizer,
    max_length=max_length,
    dataset_map_fn=None,
    template_map_fn=dict(
        type=template_map_fn_factory, template=prompt_template),
    remove_unused_columns=True,
    shuffle_before_pack=True,
    pack_to_max_length=pack_to_max_length,
    use_varlen_attn=use_varlen_attn)

train_dataloader = dict(
    batch_size=batch_size,
    num_workers=dataloader_num_workers,
    dataset=alpaca_en,
    sampler=dict(type=DefaultSampler, shuffle=True),
    collate_fn=dict(type=default_collate_fn, use_varlen_attn=use_varlen_attn))

#######################################################################
#                    PART 4  Scheduler & Optimizer                    #
#######################################################################
# optimizer
optim_wrapper = dict(
    type=AmpOptimWrapper,
    optimizer=dict(
        type=optim_type, lr=lr, betas=betas, weight_decay=weight_decay),
    clip_grad=dict(max_norm=max_norm, error_if_nonfinite=False),
    accumulative_counts=accumulative_counts,
    loss_scale='dynamic',
    dtype='float16')

# learning policy
# More information: https://github.com/open-mmlab/mmengine/blob/main/docs/en/tutorials/param_scheduler.md  # noqa: E501
param_scheduler = [
    dict(
        type=LinearLR,
        start_factor=1e-5,
        by_epoch=True,
        begin=0,
        end=warmup_ratio * max_epochs,
        convert_to_iter_based=True),
    dict(
        type=CosineAnnealingLR,
        eta_min=0.0,
        by_epoch=True,
        begin=warmup_ratio * max_epochs,
        end=max_epochs,
        convert_to_iter_based=True)
]

# train, val, test setting
train_cfg = dict(type=TrainLoop, max_epochs=max_epochs)

#######################################################################
#                           PART 5  Runtime                           #
#######################################################################
# Log the dialogue periodically during the training process, optional
custom_hooks = [
    dict(type=DatasetInfoHook, tokenizer=tokenizer),
    dict(
        type=EvaluateChatHook,
        tokenizer=tokenizer,
        every_n_iters=evaluation_freq,
        evaluation_inputs=evaluation_inputs,
        system=SYSTEM,
        prompt_template=prompt_template)
]

if use_varlen_attn:
    custom_hooks += [dict(type=VarlenAttnArgsToMessageHubHook)]

# configure default hooks
default_hooks = dict(
    # record the time of every iteration.
    timer=dict(type=IterTimerHook),
    # print log every 10 iterations.
    logger=dict(type=LoggerHook, log_metric_by_epoch=False, interval=10),
    # enable the parameter scheduler.
    param_scheduler=dict(type=ParamSchedulerHook),
    # save checkpoint per `save_steps`.
    checkpoint=dict(
        type=CheckpointHook,
        by_epoch=False,
        interval=save_steps,
        max_keep_ckpts=save_total_limit),
    # set sampler seed in distributed environment.
    sampler_seed=dict(type=DistSamplerSeedHook),
)

# configure environment
env_cfg = dict(
    # whether to enable cudnn benchmark
    cudnn_benchmark=False,
    # set multi process parameters
    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),
    # set distributed parameters
    dist_cfg=dict(backend='nccl'),
)

# set visualizer
visualizer = None

# set log level
log_level = 'INFO'

# load from which checkpoint
load_from = None

# whether to resume training from the loaded checkpoint
resume = False

# Defaults to use random seed and disable `deterministic`
randomness = dict(seed=None, deterministic=False)

# set log processor
log_processor = dict(by_epoch=False)
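Unlike the Baichuan2 recipe, the ChatGLM3 config above trains plain LoRA on an fp16 model with no 4-bit quantization. Its lora dict maps onto an ordinary peft configuration; a minimal sketch of the same adapter settings (get_peft_model is standard peft usage, wired up internally by xtuner here):

from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=64,               # adapter rank, as in the config above
    lora_alpha=16,
    lora_dropout=0.1,
    bias='none',
    task_type='CAUSAL_LM',
)
# base_model would be the fp16 chatglm3-6b loaded via AutoModelForCausalLM.
# model = get_peft_model(base_model, lora_config)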
xtuner_config/deepseek_moe_16b_chat_qlora_oasst1_e3.py (new file, 216 lines)
@@ -0,0 +1,216 @@

# Copyright (c) OpenMMLab. All rights reserved.
import torch
from datasets import load_dataset
from mmengine.dataset import DefaultSampler
from mmengine.hooks import (CheckpointHook, DistSamplerSeedHook, IterTimerHook,
                            LoggerHook, ParamSchedulerHook)
from mmengine.optim import AmpOptimWrapper, CosineAnnealingLR, LinearLR
from peft import LoraConfig
from torch.optim import AdamW
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig)

from xtuner.dataset import process_hf_dataset
from xtuner.dataset.collate_fns import default_collate_fn
from xtuner.dataset.map_fns import oasst1_map_fn, template_map_fn_factory
from xtuner.engine.hooks import (DatasetInfoHook, EvaluateChatHook,
                                 VarlenAttnArgsToMessageHubHook)
from xtuner.engine.runner import TrainLoop
from xtuner.model import SupervisedFinetune
from xtuner.utils import PROMPT_TEMPLATE

from mmengine.visualization import Visualizer, WandbVisBackend, TensorboardVisBackend

#######################################################################
#                          PART 1  Settings                           #
#######################################################################
# Model
pretrained_model_name_or_path = '/root/model/deepseek-ai/deepseek-moe-16b-chat'
use_varlen_attn = False

# Data
data_path = './merge.json'
prompt_template = PROMPT_TEMPLATE.deepseek_moe
max_length = 2048
pack_to_max_length = True

# Scheduler & Optimizer
batch_size = 16  # per_device
accumulative_counts = 8
dataloader_num_workers = 0
max_epochs = 3
optim_type = AdamW
lr = 2e-4
betas = (0.9, 0.999)
weight_decay = 0
max_norm = 1  # grad clip
warmup_ratio = 0.03

# Save
save_steps = 100
save_total_limit = 2  # Maximum checkpoints to keep (-1 means unlimited)

# Evaluate the generation performance during the training
evaluation_freq = 100
SYSTEM = "现在你是一个心理专家,我有一些心理问题,请你用专业的知识帮我解决。"
evaluation_inputs = [
    '我压力很大', '生活没意思', "非常容易羡慕别人啊"
]

#######################################################################
#                      PART 2  Model & Tokenizer                      #
#######################################################################
tokenizer = dict(
    type=AutoTokenizer.from_pretrained,
    pretrained_model_name_or_path=pretrained_model_name_or_path,
    trust_remote_code=True,
    padding_side='right')

model = dict(
    type=SupervisedFinetune,
    use_varlen_attn=use_varlen_attn,
    llm=dict(
        type=AutoModelForCausalLM.from_pretrained,
        pretrained_model_name_or_path=pretrained_model_name_or_path,
        trust_remote_code=True,
        torch_dtype=torch.float16,
        quantization_config=dict(
            type=BitsAndBytesConfig,
            load_in_4bit=True,
            load_in_8bit=False,
            llm_int8_threshold=6.0,
            llm_int8_has_fp16_weight=False,
            bnb_4bit_compute_dtype=torch.float16,
            bnb_4bit_use_double_quant=True,
            bnb_4bit_quant_type='nf4')),
    lora=dict(
        type=LoraConfig,
        r=16,
        lora_alpha=16,
        lora_dropout=0.05,
        bias='none',
        task_type='CAUSAL_LM'))

#######################################################################
#                      PART 3  Dataset & Dataloader                   #
#######################################################################
train_dataset = dict(
    type=process_hf_dataset,
    dataset=dict(type=load_dataset, path='json', data_files=dict(train=data_path)),
    tokenizer=tokenizer,
    max_length=max_length,
    dataset_map_fn=None,
    template_map_fn=dict(
        type=template_map_fn_factory, template=prompt_template),
    remove_unused_columns=True,
    shuffle_before_pack=True,
    pack_to_max_length=pack_to_max_length,
    use_varlen_attn=use_varlen_attn)

train_dataloader = dict(
    batch_size=batch_size,
    num_workers=dataloader_num_workers,
    dataset=train_dataset,
    sampler=dict(type=DefaultSampler, shuffle=True),
    collate_fn=dict(type=default_collate_fn, use_varlen_attn=use_varlen_attn))

#######################################################################
#                    PART 4  Scheduler & Optimizer                    #
#######################################################################
# optimizer
optim_wrapper = dict(
    type=AmpOptimWrapper,
    optimizer=dict(
        type=optim_type, lr=lr, betas=betas, weight_decay=weight_decay),
    clip_grad=dict(max_norm=max_norm, error_if_nonfinite=False),
    accumulative_counts=accumulative_counts,
    loss_scale='dynamic',
    dtype='float16')

# learning policy
# More information: https://github.com/open-mmlab/mmengine/blob/main/docs/en/tutorials/param_scheduler.md  # noqa: E501
param_scheduler = [
    dict(
        type=LinearLR,
        start_factor=1e-5,
        by_epoch=True,
        begin=0,
        end=warmup_ratio * max_epochs,
        convert_to_iter_based=True),
    dict(
        type=CosineAnnealingLR,
        eta_min=0.0,
        by_epoch=True,
        begin=warmup_ratio * max_epochs,
        end=max_epochs,
        convert_to_iter_based=True)
]

# train, val, test setting
train_cfg = dict(type=TrainLoop, max_epochs=max_epochs)

#######################################################################
#                           PART 5  Runtime                           #
#######################################################################
# Log the dialogue periodically during the training process, optional
custom_hooks = [
    dict(type=DatasetInfoHook, tokenizer=tokenizer),
    dict(
        type=EvaluateChatHook,
        tokenizer=tokenizer,
        every_n_iters=evaluation_freq,
        evaluation_inputs=evaluation_inputs,
        system=SYSTEM,
        prompt_template=prompt_template)
]

if use_varlen_attn:
    custom_hooks += [dict(type=VarlenAttnArgsToMessageHubHook)]

# configure default hooks
default_hooks = dict(
    # record the time of every iteration.
    timer=dict(type=IterTimerHook),
    # print log every 10 iterations.
    logger=dict(type=LoggerHook, log_metric_by_epoch=False, interval=10),
    # enable the parameter scheduler.
    param_scheduler=dict(type=ParamSchedulerHook),
    # save checkpoint per `save_steps`.
    checkpoint=dict(
        type=CheckpointHook,
        by_epoch=False,
        interval=save_steps,
        max_keep_ckpts=save_total_limit),
    # set sampler seed in distributed environment.
    sampler_seed=dict(type=DistSamplerSeedHook),
)

# configure environment
env_cfg = dict(
    # whether to enable cudnn benchmark
    cudnn_benchmark=False,
    # set multi process parameters
    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),
    # set distributed parameters
    dist_cfg=dict(backend='nccl'),
)

# set visualizer
visualizer = dict(
    type=Visualizer,
    vis_backends=[dict(type=WandbVisBackend)]
)

# set log level
log_level = 'INFO'

# load from which checkpoint
load_from = None

# whether to resume training from the loaded checkpoint
resume = False

# Defaults to use random seed and disable `deterministic`
randomness = dict(seed=None, deterministic=False)

# set log processor
log_processor = dict(by_epoch=False)
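The DeepSeek MoE config above raises gradient accumulation to 8, so each optimizer step sees noticeably more packed sequences than the other recipes in this commit. A quick sanity check of the per-device numbers taken from the settings above (single GPU assumed; multiply by world size for multi-GPU runs):

batch_size = 16          # per_device, from the config
accumulative_counts = 8  # gradient accumulation steps, from the config
max_length = 2048        # each packed sample is up to this many tokens

samples_per_step = batch_size * accumulative_counts
print(samples_per_step)               # 128 packed sequences per optimizer step
print(samples_per_step * max_length)  # up to 262144 tokens contribute to one update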
xtuner_config/internlm2_1_8b_full_alpaca_e3.py (new file, 198 lines)
@@ -0,0 +1,198 @@

# Copyright (c) OpenMMLab. All rights reserved.
from datasets import load_dataset
from mmengine.dataset import DefaultSampler
from mmengine.hooks import (CheckpointHook, DistSamplerSeedHook, IterTimerHook,
                            LoggerHook, ParamSchedulerHook)
from mmengine.optim import AmpOptimWrapper, CosineAnnealingLR, LinearLR
from torch.optim import AdamW
from transformers import AutoModelForCausalLM, AutoTokenizer

from xtuner.dataset import process_hf_dataset
from xtuner.dataset.collate_fns import default_collate_fn
from xtuner.dataset.map_fns import alpaca_map_fn, template_map_fn_factory
from xtuner.engine.hooks import (DatasetInfoHook, EvaluateChatHook,
                                 VarlenAttnArgsToMessageHubHook)
from xtuner.engine.runner import TrainLoop
from xtuner.model import SupervisedFinetune
from xtuner.utils import PROMPT_TEMPLATE, SYSTEM_TEMPLATE

from mmengine.visualization import Visualizer, WandbVisBackend, TensorboardVisBackend

#######################################################################
#                          PART 1  Settings                           #
#######################################################################
# Model
pretrained_model_name_or_path = '/root/model/jayhust/internlm2-chat-1_8b'
use_varlen_attn = False

# Data
data_path = './merge.json'
prompt_template = PROMPT_TEMPLATE.default
max_length = 2048
pack_to_max_length = True

# Scheduler & Optimizer
batch_size = 16  # per_device
accumulative_counts = 4
dataloader_num_workers = 0
max_epochs = 3
optim_type = AdamW
lr = 2e-5
betas = (0.9, 0.999)
weight_decay = 0
max_norm = 1  # grad clip
warmup_ratio = 0.03

# Save
save_steps = 100
save_total_limit = 2  # Maximum checkpoints to keep (-1 means unlimited)

# Evaluate the generation performance during the training
evaluation_freq = 100
SYSTEM = "现在你是一个心理专家,我有一些心理问题,请你用专业的知识帮我解决。"
evaluation_inputs = [
    '我压力很大', '生活没意思', "非常容易羡慕别人啊"
]

#######################################################################
#                      PART 2  Model & Tokenizer                      #
#######################################################################
tokenizer = dict(
    type=AutoTokenizer.from_pretrained,
    pretrained_model_name_or_path=pretrained_model_name_or_path,
    trust_remote_code=True,
    padding_side='right')

model = dict(
    type=SupervisedFinetune,
    use_varlen_attn=use_varlen_attn,
    llm=dict(
        type=AutoModelForCausalLM.from_pretrained,
        pretrained_model_name_or_path=pretrained_model_name_or_path,
        trust_remote_code=True))

#######################################################################
#                      PART 3  Dataset & Dataloader                   #
#######################################################################
alpaca_en = dict(
    type=process_hf_dataset,
    dataset=dict(type=load_dataset, path='json', data_files=dict(train=data_path)),
    tokenizer=tokenizer,
    max_length=max_length,
    dataset_map_fn=None,
    template_map_fn=dict(
        type=template_map_fn_factory, template=prompt_template),
    remove_unused_columns=True,
    shuffle_before_pack=True,
    pack_to_max_length=pack_to_max_length,
    use_varlen_attn=use_varlen_attn)

train_dataloader = dict(
    batch_size=batch_size,
    num_workers=dataloader_num_workers,
    dataset=alpaca_en,
    sampler=dict(type=DefaultSampler, shuffle=True),
    collate_fn=dict(type=default_collate_fn, use_varlen_attn=use_varlen_attn))

#######################################################################
#                    PART 4  Scheduler & Optimizer                    #
#######################################################################
# optimizer
optim_wrapper = dict(
    type=AmpOptimWrapper,
    optimizer=dict(
        type=optim_type, lr=lr, betas=betas, weight_decay=weight_decay),
    clip_grad=dict(max_norm=max_norm, error_if_nonfinite=False),
    accumulative_counts=accumulative_counts,
    loss_scale='dynamic',
    dtype='float16')

# learning policy
# More information: https://github.com/open-mmlab/mmengine/blob/main/docs/en/tutorials/param_scheduler.md  # noqa: E501
param_scheduler = [
    dict(
        type=LinearLR,
        start_factor=1e-5,
        by_epoch=True,
        begin=0,
        end=warmup_ratio * max_epochs,
        convert_to_iter_based=True),
    dict(
        type=CosineAnnealingLR,
        eta_min=0.0,
        by_epoch=True,
        begin=warmup_ratio * max_epochs,
        end=max_epochs,
        convert_to_iter_based=True)
]

# train, val, test setting
train_cfg = dict(type=TrainLoop, max_epochs=max_epochs)

#######################################################################
#                           PART 5  Runtime                           #
#######################################################################
# Log the dialogue periodically during the training process, optional
custom_hooks = [
    dict(type=DatasetInfoHook, tokenizer=tokenizer),
    dict(
        type=EvaluateChatHook,
        tokenizer=tokenizer,
        every_n_iters=evaluation_freq,
        evaluation_inputs=evaluation_inputs,
        system=SYSTEM,
        prompt_template=prompt_template)
]

if use_varlen_attn:
    custom_hooks += [dict(type=VarlenAttnArgsToMessageHubHook)]

# configure default hooks
default_hooks = dict(
    # record the time of every iteration.
    timer=dict(type=IterTimerHook),
    # print log every 10 iterations.
    logger=dict(type=LoggerHook, log_metric_by_epoch=False, interval=10),
    # enable the parameter scheduler.
    param_scheduler=dict(type=ParamSchedulerHook),
    # save checkpoint per `save_steps`.
    checkpoint=dict(
        type=CheckpointHook,
        by_epoch=False,
        interval=save_steps,
        max_keep_ckpts=save_total_limit),
    # set sampler seed in distributed environment.
    sampler_seed=dict(type=DistSamplerSeedHook),
)

# configure environment
env_cfg = dict(
    # whether to enable cudnn benchmark
    cudnn_benchmark=False,
    # set multi process parameters
    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),
    # set distributed parameters
    dist_cfg=dict(backend='nccl'),
)

# set visualizer
visualizer = dict(
    type=Visualizer,
    vis_backends=[dict(type=WandbVisBackend)]
)

# set log level
log_level = 'INFO'

# load from which checkpoint
load_from = None

# whether to resume training from the loaded checkpoint
resume = False

# Defaults to use random seed and disable `deterministic`
randomness = dict(seed=None, deterministic=False)

# set log processor
log_processor = dict(by_epoch=False)
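In all of these configs the dataset block is a lazily-built dict that xtuner resolves at runtime; the inner load_dataset call is ordinary Hugging Face datasets usage. A sketch of what it resolves to, assuming merge.json (produced by the merge script changed in this commit) exists in the working directory:

from datasets import load_dataset

# './merge.json' is the merged training file these configs point data_path at.
ds = load_dataset('json', data_files=dict(train='./merge.json'))
print(ds['train'].num_rows)  # number of raw records before packing to max_length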
xtuner_config/mixtral_8x7b_instruct_qlora_oasst1_e3.py (new file, 221 lines)
@@ -0,0 +1,221 @@

# Copyright (c) OpenMMLab. All rights reserved.
import torch
from datasets import load_dataset
from mmengine.dataset import DefaultSampler
from mmengine.hooks import (CheckpointHook, DistSamplerSeedHook, IterTimerHook,
                            LoggerHook, ParamSchedulerHook)
from mmengine.optim import AmpOptimWrapper, CosineAnnealingLR, LinearLR
from peft import LoraConfig
from torch.optim import AdamW
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig)

from xtuner.dataset import process_hf_dataset
from xtuner.dataset.collate_fns import default_collate_fn
from xtuner.dataset.map_fns import oasst1_map_fn, template_map_fn_factory
from xtuner.engine.hooks import (DatasetInfoHook, EvaluateChatHook,
                                 VarlenAttnArgsToMessageHubHook)
from xtuner.engine.runner import TrainLoop
from xtuner.model import SupervisedFinetune
from xtuner.utils import PROMPT_TEMPLATE

from mmengine.visualization import Visualizer, WandbVisBackend, TensorboardVisBackend

#######################################################################
#                          PART 1  Settings                           #
#######################################################################
# Model
pretrained_model_name_or_path = '/root/model/HIT-SCIR/Chinese-Mixtral-8x7B'
use_varlen_attn = False

# Data
data_path = './merge.json'
prompt_template = PROMPT_TEMPLATE.mixtral
max_length = 2048
pack_to_max_length = True

# Scheduler & Optimizer
batch_size = 16  # per_device
accumulative_counts = 4
dataloader_num_workers = 0
max_epochs = 3
optim_type = AdamW
lr = 2e-4
betas = (0.9, 0.999)
weight_decay = 0
max_norm = 1  # grad clip
warmup_ratio = 0.03

# Save
save_steps = 500
save_total_limit = 2  # Maximum checkpoints to keep (-1 means unlimited)

# Evaluate the generation performance during the training
evaluation_freq = 500
SYSTEM = "现在你是一个心理专家,我有一些心理问题,请你用专业的知识帮我解决。"
evaluation_inputs = [
    '我压力很大', '生活没意思', "非常容易羡慕别人啊"
]

#######################################################################
#                      PART 2  Model & Tokenizer                      #
#######################################################################
tokenizer = dict(
    type=AutoTokenizer.from_pretrained,
    pretrained_model_name_or_path=pretrained_model_name_or_path,
    trust_remote_code=True,
    padding_side='right')

model = dict(
    type=SupervisedFinetune,
    use_varlen_attn=use_varlen_attn,
    llm=dict(
        type=AutoModelForCausalLM.from_pretrained,
        pretrained_model_name_or_path=pretrained_model_name_or_path,
        trust_remote_code=True,
        torch_dtype=torch.float16,
        quantization_config=dict(
            type=BitsAndBytesConfig,
            load_in_4bit=True,
            load_in_8bit=False,
            llm_int8_threshold=6.0,
            llm_int8_has_fp16_weight=False,
            bnb_4bit_compute_dtype=torch.float16,
            bnb_4bit_use_double_quant=True,
            bnb_4bit_quant_type='nf4')),
    lora=dict(
        type=LoraConfig,
        r=64,
        lora_alpha=16,
        lora_dropout=0.1,
        target_modules=[
            'q_proj', 'k_proj', 'v_proj', 'o_proj', 'w1', 'w2', 'w3'
        ],
        bias='none',
        task_type='CAUSAL_LM'))

#######################################################################
#                      PART 3  Dataset & Dataloader                   #
#######################################################################
train_dataset = dict(
    type=process_hf_dataset,
    dataset=dict(type=load_dataset, path='json', data_files=dict(train=data_path)),
    tokenizer=tokenizer,
    max_length=max_length,
    dataset_map_fn=None,
    template_map_fn=dict(
        type=template_map_fn_factory, template=prompt_template),
    remove_unused_columns=True,
    shuffle_before_pack=True,
    pack_to_max_length=pack_to_max_length,
    use_varlen_attn=use_varlen_attn)

train_dataloader = dict(
    batch_size=batch_size,
    num_workers=dataloader_num_workers,
    dataset=train_dataset,
    sampler=dict(type=DefaultSampler, shuffle=True),
    collate_fn=dict(type=default_collate_fn, use_varlen_attn=use_varlen_attn))

#######################################################################
#                    PART 4  Scheduler & Optimizer                    #
#######################################################################
# optimizer
optim_wrapper = dict(
    type=AmpOptimWrapper,
    optimizer=dict(
        type=optim_type, lr=lr, betas=betas, weight_decay=weight_decay),
    clip_grad=dict(max_norm=max_norm, error_if_nonfinite=False),
    accumulative_counts=accumulative_counts,
    loss_scale='dynamic',
    dtype='float16')

# learning policy
# More information: https://github.com/open-mmlab/mmengine/blob/main/docs/en/tutorials/param_scheduler.md  # noqa: E501
param_scheduler = [
    dict(
        type=LinearLR,
        start_factor=1e-5,
        by_epoch=True,
        begin=0,
        end=warmup_ratio * max_epochs,
        convert_to_iter_based=True),
    dict(
        type=CosineAnnealingLR,
        eta_min=0.0,
        by_epoch=True,
        begin=warmup_ratio * max_epochs,
        end=max_epochs,
        convert_to_iter_based=True)
]

# train, val, test setting
train_cfg = dict(type=TrainLoop, max_epochs=max_epochs)

#######################################################################
#                           PART 5  Runtime                           #
#######################################################################
# Log the dialogue periodically during the training process, optional
custom_hooks = [
    dict(type=DatasetInfoHook, tokenizer=tokenizer),
    dict(
        type=EvaluateChatHook,
        tokenizer=tokenizer,
        every_n_iters=evaluation_freq,
        evaluation_inputs=evaluation_inputs,
        system=SYSTEM,
        prompt_template=prompt_template)
]

if use_varlen_attn:
    custom_hooks += [dict(type=VarlenAttnArgsToMessageHubHook)]

# configure default hooks
default_hooks = dict(
    # record the time of every iteration.
    timer=dict(type=IterTimerHook),
    # print log every 10 iterations.
    logger=dict(type=LoggerHook, log_metric_by_epoch=False, interval=10),
    # enable the parameter scheduler.
    param_scheduler=dict(type=ParamSchedulerHook),
    # save checkpoint per `save_steps`.
    checkpoint=dict(
        type=CheckpointHook,
        by_epoch=False,
        interval=save_steps,
        max_keep_ckpts=save_total_limit),
    # set sampler seed in distributed environment.
    sampler_seed=dict(type=DistSamplerSeedHook),
)

# configure environment
env_cfg = dict(
    # whether to enable cudnn benchmark
    cudnn_benchmark=False,
    # set multi process parameters
    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),
    # set distributed parameters
    dist_cfg=dict(backend='nccl'),
)

# set visualizer
# visualizer = None
visualizer = dict(
    type=Visualizer,
    vis_backends=[dict(type=TensorboardVisBackend)]
)

# set log level
log_level = 'INFO'

# load from which checkpoint
load_from = None

# whether to resume training from the loaded checkpoint
resume = False

# Defaults to use random seed and disable `deterministic`
randomness = dict(seed=None, deterministic=False)

# set log processor
log_processor = dict(by_epoch=False)
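All of these recipes share the same two-phase schedule: a linear warmup over warmup_ratio of training, followed by cosine annealing to zero. With the values used in every config here, the boundary works out as below (epoch units; convert_to_iter_based=True turns them into iterations at runtime):

warmup_ratio = 0.03
max_epochs = 3

warmup_end = warmup_ratio * max_epochs
print(warmup_end)  # 0.09: LinearLR warms the LR up during the first 0.09 epoch
print(max_epochs)  # CosineAnnealingLR then decays the LR to 0.0 by epoch 3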
xtuner_config/qwen1_5_0_5_B_full.py (new file, 192 lines)
@@ -0,0 +1,192 @@

# Copyright (c) OpenMMLab. All rights reserved.
from datasets import load_dataset
from mmengine.dataset import DefaultSampler
from mmengine.hooks import (CheckpointHook, DistSamplerSeedHook, IterTimerHook,
                            LoggerHook, ParamSchedulerHook)
from mmengine.optim import AmpOptimWrapper, CosineAnnealingLR, LinearLR
from torch.optim import AdamW
from transformers import AutoModelForCausalLM, AutoTokenizer

from xtuner.dataset import process_hf_dataset
from xtuner.dataset.collate_fns import default_collate_fn
from xtuner.dataset.map_fns import alpaca_map_fn, template_map_fn_factory
from xtuner.engine.hooks import (DatasetInfoHook, EvaluateChatHook,
                                 VarlenAttnArgsToMessageHubHook)
from xtuner.engine.runner import TrainLoop
from xtuner.model import SupervisedFinetune
from xtuner.utils import PROMPT_TEMPLATE, SYSTEM_TEMPLATE

#######################################################################
#                          PART 1  Settings                           #
#######################################################################
# Model
pretrained_model_name_or_path = '/root/model/qwen/Qwen1___5-0___5B-Chat'
use_varlen_attn = False

# Data
data_path = './data_pro.json'
prompt_template = PROMPT_TEMPLATE.qwen_chat
max_length = 2048
pack_to_max_length = True

# Scheduler & Optimizer
batch_size = 16  # per_device
accumulative_counts = 4
dataloader_num_workers = 0
max_epochs = 3
optim_type = AdamW
lr = 2e-5
betas = (0.9, 0.999)
weight_decay = 0
max_norm = 1  # grad clip
warmup_ratio = 0.03

# Save
save_steps = 100
save_total_limit = 2  # Maximum checkpoints to keep (-1 means unlimited)

# Evaluate the generation performance during the training
evaluation_freq = 100
SYSTEM = "现在你是一个心理专家,我有一些心理问题,请你用专业的知识帮我解决。"
evaluation_inputs = [
    '我压力很大', '生活没意思', "非常容易羡慕别人啊"
]

#######################################################################
#                      PART 2  Model & Tokenizer                      #
#######################################################################
tokenizer = dict(
    type=AutoTokenizer.from_pretrained,
    pretrained_model_name_or_path=pretrained_model_name_or_path,
    trust_remote_code=True,
    padding_side='right')

model = dict(
    type=SupervisedFinetune,
    use_varlen_attn=use_varlen_attn,
    llm=dict(
        type=AutoModelForCausalLM.from_pretrained,
        pretrained_model_name_or_path=pretrained_model_name_or_path,
        trust_remote_code=True))

#######################################################################
#                      PART 3  Dataset & Dataloader                   #
#######################################################################
alpaca_en = dict(
    type=process_hf_dataset,
    dataset=dict(type=load_dataset, path='json', data_files=dict(train=data_path)),
    tokenizer=tokenizer,
    max_length=max_length,
    dataset_map_fn=None,
    template_map_fn=dict(
        type=template_map_fn_factory, template=prompt_template),
    remove_unused_columns=True,
    shuffle_before_pack=True,
    pack_to_max_length=pack_to_max_length,
    use_varlen_attn=use_varlen_attn)

train_dataloader = dict(
    batch_size=batch_size,
    num_workers=dataloader_num_workers,
    dataset=alpaca_en,
    sampler=dict(type=DefaultSampler, shuffle=True),
    collate_fn=dict(type=default_collate_fn, use_varlen_attn=use_varlen_attn))

#######################################################################
#                    PART 4  Scheduler & Optimizer                    #
#######################################################################
# optimizer
optim_wrapper = dict(
    type=AmpOptimWrapper,
    optimizer=dict(
        type=optim_type, lr=lr, betas=betas, weight_decay=weight_decay),
    clip_grad=dict(max_norm=max_norm, error_if_nonfinite=False),
    accumulative_counts=accumulative_counts,
    loss_scale='dynamic',
    dtype='float16')

# learning policy
# More information: https://github.com/open-mmlab/mmengine/blob/main/docs/en/tutorials/param_scheduler.md  # noqa: E501
param_scheduler = [
    dict(
        type=LinearLR,
        start_factor=1e-5,
        by_epoch=True,
        begin=0,
        end=warmup_ratio * max_epochs,
        convert_to_iter_based=True),
    dict(
        type=CosineAnnealingLR,
        eta_min=0.0,
        by_epoch=True,
        begin=warmup_ratio * max_epochs,
        end=max_epochs,
        convert_to_iter_based=True)
]

# train, val, test setting
train_cfg = dict(type=TrainLoop, max_epochs=max_epochs)

#######################################################################
#                           PART 5  Runtime                           #
#######################################################################
# Log the dialogue periodically during the training process, optional
custom_hooks = [
    dict(type=DatasetInfoHook, tokenizer=tokenizer),
    dict(
        type=EvaluateChatHook,
        tokenizer=tokenizer,
        every_n_iters=evaluation_freq,
        evaluation_inputs=evaluation_inputs,
        system=SYSTEM,
        prompt_template=prompt_template)
]

if use_varlen_attn:
    custom_hooks += [dict(type=VarlenAttnArgsToMessageHubHook)]

# configure default hooks
default_hooks = dict(
    # record the time of every iteration.
    timer=dict(type=IterTimerHook),
    # print log every 10 iterations.
    logger=dict(type=LoggerHook, log_metric_by_epoch=False, interval=10),
    # enable the parameter scheduler.
    param_scheduler=dict(type=ParamSchedulerHook),
    # save checkpoint per `save_steps`.
    checkpoint=dict(
        type=CheckpointHook,
        by_epoch=False,
        interval=save_steps,
        max_keep_ckpts=save_total_limit),
    # set sampler seed in distributed environment.
    sampler_seed=dict(type=DistSamplerSeedHook),
)

# configure environment
env_cfg = dict(
    # whether to enable cudnn benchmark
    cudnn_benchmark=False,
    # set multi process parameters
    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),
    # set distributed parameters
    dist_cfg=dict(backend='nccl'),
)

# set visualizer
visualizer = None

# set log level
log_level = 'INFO'

# load from which checkpoint
load_from = '/root/Emollm/work_dirs/qwen_0_5_B/iter_255.pth'

# whether to resume training from the loaded checkpoint
resume = False

# Defaults to use random seed and disable `deterministic`
randomness = dict(seed=None, deterministic=False)

# set log processor
log_processor = dict(by_epoch=False)