SWIFT supports the training, inference, evaluation, and deployment of nearly 200 LLMs and MLLMs (multimodal large models). Developers can apply the SWIFT framework directly to their own research and production environments, covering the complete pipeline from model training and evaluation to application. In addition to the lightweight training solutions provided by [PEFT](https://github.com/huggingface/peft), SWIFT offers a complete library of adapters supporting the latest training techniques, such as NEFTune, LoRA+, and LLaMA-PRO, which can be used directly in your own custom workflows without SWIFT's training scripts. SWIFT is also expanding to other modalities and currently supports full-parameter and LoRA training for AnimateDiff.
Our project uses SWIFT to fine-tune on our customized [dataset](https://github.com/SmartFlowAI/EmoLLM/blob/main/datasets), converted into a JSON format that SWIFT accepts (see the SWIFT code section); fine-tuning of Qwen-7B-Chat has already been completed.
SWIFT has extensive documentation; if you have any questions about using it, please check [here](https://github.com/modelscope/swift/tree/main/docs/source/LLM).
You can try SWIFT's web-UI functionality in the [Huggingface space](https://huggingface.co/spaces/tastelikefeet/swift) or on [ModelScope](https://www.modelscope.cn/studios/iic/Scalable-lightWeight-Infrastructure-for-Fine-Tuning/summary).
## 🎉 News
- 🔥2024.04.26: Completed SWIFT fine-tuning of the qwen-7b-chat model and uploaded it to [ModelScope](https://www.modelscope.cn/models/monbear/qwen-7b-chat-lora/summary).
- 🔥2024.04.27: Completed quantization of the fine-tuned qwen-7b-chat model and uploaded it to [ModelScope](https://www.modelscope.cn/models/monbear/qwen1half-7b-chat-lora/summary).
- 🔥2024.04.29: Won first prize in [AI 赋能大学计划“全国高校行”](https://mp.weixin.qq.com/s/yyaulQ1wBzKq5cXaGl2Wag).
## 🛠️ Installation and use of the SWIFT fine-tuning framework
### <u>Environment preparation</u>
GPU devices: A10, 3090, V100, A100 are acceptable.
This project's SWIFT fine-tuning uses the free compute resources provided by the ModelScope community, running in its GPU environment (8 CPU cores, 32 GB RAM, 24 GB GPU memory).
SWIFT runs in a Python environment; please make sure your Python version is 3.8 or higher.
Next we set up the experimental environment, which includes creating a virtual environment, installing ms-swift, and installing the related dependencies.
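The commands that follow do not show the virtual-environment step itself; below is one possible way to create it, assuming conda is available (the environment name and Python version are only examples):

```bash
# Create and activate an isolated environment for SWIFT (example name and Python version)
conda create -n swift python=3.10 -y
conda activate swift
```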
```bash
# Set a global pip mirror (to speed up downloads)
pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/
# Install ms-swift
git clone https://github.com/modelscope/swift.git
cd swift
pip install -e '.[llm]'
# If you want to use DeepSpeed.
pip install deepspeed -U
# If you want to use auto_gptq-based QLoRA training (recommended; performs better than BNB).
# Models supported by auto_gptq are listed on the "Supported Models and Datasets" page of the SWIFT docs:
# https://github.com/modelscope/swift/tree/main/docs/source/LLM
# auto_gptq and CUDA versions must match; choose the version according to the AutoGPTQ installation guide:
# https://github.com/AutoGPTQ/AutoGPTQ
pip install auto_gptq -U
# If you want to use BNB-based Qlora training.
pip install bitsandbytes -U
# Environment alignment (usually not needed; if you hit dependency errors, run
# `pip install -r requirements/framework.txt -U` and `pip install -r requirements/llm.txt -U`
# from the swift repository root; the repository is tested against the latest environment)
# Use your own dataset (we use our own conversation dataset aiwei.jsonl here)
CUDA_VISIBLE_DEVICES=0 swift sft \
--model_id_or_path qwen/Qwen-7B-Chat \
--dataset chatml.jsonl \
--output_dir output
# Use DDP
# Experimental environment: 2 * 3090
# 2 * 23GB GPU memory
CUDA_VISIBLE_DEVICES=0,1 \
NPROC_PER_NODE=2 \
swift sft \
--model_id_or_path qwen/Qwen-7B-Chat \
--dataset AI-ModelScope/blossom-math-v2 \
--output_dir output
# Multi-machine multi-card
# node0
CUDA_VISIBLE_DEVICES=0,1,2,3 \
NNODES=2 \
NODE_RANK=0 \
MASTER_ADDR=127.0.0.1 \
NPROC_PER_NODE=4 \
swift sft \
--model_id_or_path qwen/Qwen-7B-Chat \
--dataset AI-ModelScope/blossom-math-v2 \
--output_dir output
# node1
CUDA_VISIBLE_DEVICES=0,1,2,3 \
NNODES=2 \
NODE_RANK=1 \
MASTER_ADDR=xxx.xxx.xxx.xxx \
NPROC_PER_NODE=4 \
swift sft \
--model_id_or_path qwen/Qwen-7B-Chat \
--dataset AI-ModelScope/blossom-math-v2 \
--output_dir output
```
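After fine-tuning finishes, the resulting checkpoint can be chatted with via `swift infer`. A minimal sketch, assuming a LoRA checkpoint directory produced by the runs above (the path is a placeholder to replace with your actual checkpoint):

```bash
# Interactive inference with the fine-tuned checkpoint (replace the placeholder path)
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir 'output/qwen-7b-chat/vx-xxx/checkpoint-xxx'
```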
To lower the barrier to entry, SWIFT also provides a [web UI for training and inference](https://github.com/modelscope/swift/blob/main/docs/source/GetStarted/%E7%95%8C%E9%9D%A2%E8%AE%AD%E7%BB%83%E6%8E%A8%E7%90%86.md "界面训练推理"), as well as ready-made [sh scripts](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/qwen1half_7b_chat_awq/lora "sh脚本"). For more details, see SWIFT's [official documentation](https://github.com/modelscope/swift/blob/main/docs/source "官方文档") on GitHub.
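For reference, once ms-swift is installed the web UI can be launched locally with a single command (a minimal sketch):

```bash
# Launch SWIFT's Gradio-based web UI for training and inference
swift web-ui
```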
## 📃 Quantizing large models
SWIFT supports quantizing models with AWQ, GPTQ, BNB, HQQ, and EETQ. AWQ and GPTQ quantization support vLLM for inference acceleration; they require a calibration dataset and give better quantization quality, but quantize more slowly. BNB, HQQ, and EETQ need no calibration data and quantize faster. All five methods support QLoRA fine-tuning.
AWQ and GPTQ quantization must be performed with `swift export`, whereas BNB, HQQ, and EETQ can be applied directly at `sft` and `infer` time.
In terms of vLLM inference-acceleration support, AWQ and GPTQ are recommended; in terms of quantization quality, AWQ, HQQ, and GPTQ; in terms of quantization speed, HQQ.
Here we recommend combining AWQ quantization with QLoRA fine-tuning, as sketched below.
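As an illustration of that recommendation, QLoRA fine-tuning can be run directly on an AWQ-quantized base model. This is a sketch only: the model type and dataset file are assumptions for illustration (the sh-script example linked in the previous section shows the official qwen1half_7b_chat_awq/lora variant), and the AutoAWQ dependency installed in the next subsection is required:

```bash
# QLoRA (LoRA on an AWQ-quantized model); model type and dataset are illustrative
CUDA_VISIBLE_DEVICES=0 swift sft \
--model_type qwen1half-7b-chat-awq \
--sft_type lora \
--dataset aiwei.jsonl \
--output_dir output
```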
### Environment preparation
GPU devices: A10, 3090, V100, A100 are acceptable.
```bash
# To quantize with AWQ:
# AutoAWQ and CUDA versions must match; choose the version according to 'https://github.com/casper-hansen/AutoAWQ'
pip install autoawq -U
# To quantize with GPTQ:
# auto_gptq and CUDA versions must match; choose the version according to the AutoGPTQ installation guide:
# https://github.com/AutoGPTQ/AutoGPTQ
pip install auto_gptq -U
# To quantize with BNB:
pip install bitsandbytes -U
# To quantize with HQQ:
# Requires transformers > 4.40, installed from source, plus the hqq package
# (the two install commands below are an assumed sketch; check the hqq project for details)
pip install git+https://github.com/huggingface/transformers
pip install hqq
# Environment alignment (usually not needed; if you hit dependency errors, run the two commands below; the repository is tested against the latest environment)
pip install -r requirements/framework.txt -U
pip install -r requirements/llm.txt -U
```
### <u>Quantize the fine-tuned model</u>
```bash
# 'alpaca-zh alpaca-en sharegpt-gpt4-mini' is used as the calibration dataset for quantization
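# A minimal sketch of quantizing the fine-tuned checkpoint with `swift export`.
# NOTE: the checkpoint path is a placeholder, and the flags below are assumed from the
# ms-swift CLI of this era rather than taken from this project.
CUDA_VISIBLE_DEVICES=0 swift export \
--ckpt_dir 'output/qwen-7b-chat/vx-xxx/checkpoint-xxx' \
--merge_lora true \
--quant_bits 4 \
--quant_method awq \
--dataset alpaca-zh alpaca-en sharegpt-gpt4-mini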
```

- If you want to push your fine-tuned model to your own ModelScope hub, you can use the following command. After that, you can find the model under the `I Created` section of your ModelScope homepage. If you want to publish it, remember to write a `README.md`.
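A possible form of that command, as a sketch only: the hub model id, token, and checkpoint path are placeholders, and the push-related flags are assumed from recent `swift export` options rather than taken from this project.

```bash
# Push the fine-tuned (merged/quantized) checkpoint to the ModelScope hub
# NOTE: the model id, token, and checkpoint path below are placeholders
CUDA_VISIBLE_DEVICES=0 swift export \
--ckpt_dir 'output/qwen-7b-chat/vx-xxx/checkpoint-xxx' \
--push_to_hub true \
--hub_model_id 'your-namespace/qwen-7b-chat-lora' \
--hub_token 'your-modelscope-token'
```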