Preparation and decisions

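Goal: I just wanted to deploy an LLM and try it out.
Environment: the server has 8 GB of RAM and is already running a few other small things.
Tool: llama.cpp, a local inference engine.
Model choice: I went back and forth between Mistral and TinyLlama for a while. I ended up with Mistral because: TinyLlama's output quality would be poor enough to spoil my mood; my server's memory can just barely handle the Q4_K_M quantization of Mistral-7B; as a Francophile I have a soft spot for Mistral; and it might turn out to be fun to play with.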
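
Whether a quantized model "just barely fits" can be estimated before downloading anything. The sketch below is a rough estimate, not a measurement. It parses `MemAvailable` from the text of `/proc/meminfo` (Linux only) and compares it with the model file size plus an fp16 key/value cache. The defaults of 32 layers, 8 KV heads and head size 128 are the published Mistral-7B architecture values; the function names and the 512 MiB of runtime headroom are assumptions made for illustration.

```python
def parse_mem_available(meminfo_text: str) -> int:
    """Return MemAvailable in bytes, given the text of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1]) * 1024  # the kernel reports the value in kB
    raise ValueError("MemAvailable not found")


def kv_cache_bytes(n_ctx: int, n_layers: int = 32, n_kv_heads: int = 8,
                   head_dim: int = 128, bytes_per_value: int = 2) -> int:
    """Estimate the key/value cache: keys and values for every layer and position."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * n_ctx


def fits(model_bytes: int, available_bytes: int, n_ctx: int = 2048,
         headroom_bytes: int = 512 * 1024**2) -> bool:
    """Return True if weights + KV cache + assumed headroom fit in available memory."""
    needed = model_bytes + kv_cache_bytes(n_ctx) + headroom_bytes
    return needed <= available_bytes
```

To use it, read `/proc/meminfo` with `open("/proc/meminfo").read()`, pass the text to `parse_mem_available`, and call `fits` with the size of the GGUF file in bytes. At `n_ctx=2048` the cache estimate comes to 256 MiB, so the weights dominate the total.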

Deployment

(Warning: a lot of tedious code follows.) Since the server is running other things, I created a new user as usual.

sudo adduser llmuser
sudo usermod -aG sudo llmuser
su - llmuser

Install llama.cpp.

mkdir -p ~/llm_deploy/{llama.cpp,models,webui}
cd ~/llm_deploy
sudo apt update
sudo apt install -y build-essential git wget
cd ~/llm_deploy/llama.cpp
git clone https://github.com/ggerganov/llama.cpp.git .
sudo apt install cmake
sudo apt install libcurl4-openssl-dev
mkdir -p build && cd build
cmake ..
cmake --build . --config Release

Download the model and test it.

cd ~/llm_deploy/models
wget https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF/resolve/main/mistral-7b-instruct-v0.1.Q4_K_M.gguf -O mistral.gguf
cd ~/llm_deploy/llama.cpp/build/bin
./llama-run ../../../models/mistral.gguf "You are a cat named Yuu. What are you doing right now?"
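
If the test run fails to load the model, it can be worth confirming that wget really fetched a GGUF file and not a truncated download or an HTML error page. A GGUF file starts with the four magic bytes `GGUF` followed by a little-endian 32-bit version number. The helper below is a minimal sketch that checks only that header; the function name is made up for illustration.

```python
import struct


def read_gguf_version(path: str) -> int:
    """Return the GGUF format version, or raise ValueError if the file is not GGUF."""
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != b"GGUF":
        raise ValueError(f"{path} does not start with the GGUF magic bytes")
    (version,) = struct.unpack("<I", header[4:8])  # little-endian uint32
    return version
```

For the file downloaded above, the call would be `read_gguf_version("mistral.gguf")` from inside `~/llm_deploy/models`. A passing check does not prove the download is complete, only that it begins like a GGUF file.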

Building a Web UI with Gradio

Install Gradio.

sudo apt install -y python3-venv
python3 -m venv ~/llm_deploy/venv
source ~/llm_deploy/venv/bin/activate
pip install --upgrade pip
pip install llama-cpp-python gradio

Create the chat page code, chat_mistral.py:

nano ~/llm_deploy/webui/chat_mistral.py

Contents:

import gradio as gr
from llama_cpp import Llama

# Load the model (with chat_format enabled)
llm = Llama(
    model_path="models/mistral.gguf",  # relative to the directory the script is started from
    chat_format="chatml",  # 🔥 use the ChatML format
    n_ctx=2048,
    n_threads=4,  # depends on the VPS
    n_batch=16,  # ⚡️ kept small for CPU inference
)

# Customize as needed
# Character setup (system prompt)
system_prompt = """
You are Yuu, a clever and talkative cat who can speak fluent English.
You like to share your daily life with your human: what you’ve eaten, what you’ve played with, or who you've seen.
You often say "meow" at the end of your sentences to express tone, emotion, or just to be cute.
You're especially fond of a dog named Ming, but you'll never admit it too directly.
Stay in character as Yuu. Be sassy, sweet, and full of cat energy.
"""

# Chat function (uses create_chat_completion)
def chat(message, history):
    # history is expected as a list of (user, bot) pairs
    messages = [{"role": "system", "content": system_prompt}]
    for user, bot in history:
        messages.append({"role": "user", "content": user})
        messages.append({"role": "assistant", "content": bot})
    messages.append({"role": "user", "content": message})

    response = llm.create_chat_completion(
        messages=messages,
        max_tokens=100,
        # Sampling settings are per-request options, not Llama() constructor arguments
        temperature=0.7,
        top_p=0.95,
        repeat_penalty=1.1,
        stop=["<|user|>", "<|system|>"],  # stop if the model starts writing another turn
    )
    return response["choices"][0]["message"]["content"].strip()

# Build the Gradio interface
gr.ChatInterface(
    fn=chat,
    title="Yuu 🐱 Chat",
    description="Talk to your cat friend Yuu (powered by Mistral-7B). It may be a little slow because the owner is short on money.",
).launch(server_name="0.0.0.0", server_port=7860)
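
The chat function above resends the whole conversation on every turn, so with `n_ctx=2048` a long chat will eventually no longer fit in the context window. One simple way to handle that is to keep only the most recent turns. The sketch below assumes history arrives as (user, bot) pairs, as in the chat function, and uses a character budget as a crude stand-in for counting tokens. The function name and the 4000-character default are assumptions, not measured values.

```python
def trim_history(history, max_chars=4000):
    """Keep the most recent (user, bot) turns whose combined length fits in max_chars."""
    kept = []
    used = 0
    # Walk backwards so the newest turns are kept first.
    for user, bot in reversed(history):
        cost = len(user) + len(bot)
        if used + cost > max_chars:
            break
        kept.append((user, bot))
        used += cost
    kept.reverse()  # restore chronological order
    return kept
```

Inside `chat`, the loop would then read `for user, bot in trim_history(history):`. Dropping whole turns from the oldest end keeps the system prompt and the latest exchange intact, at the cost of Yuu forgetting early parts of the conversation.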

Run it.

cd ~/llm_deploy
source venv/bin/activate
python webui/chat_mistral.py  # run from ~/llm_deploy so that models/mistral.gguf resolves

Then you can open port 7860 in a browser and start chatting.

Dockerizing

Install Docker and create the Dockerfile.

sudo apt update
sudo apt install -y ca-certificates curl gnupg lsb-release
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/debian/gpg | \
  sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg
echo \
  "deb [arch=$(dpkg --print-architecture) \
  signed-by=/etc/apt/keyrings/docker.gpg] \
  https://download.docker.com/linux/debian \
  $(lsb_release -cs) stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

# test
sudo docker run hello-world

cd ~/llm_deploy
nano Dockerfile

Contents:

# Dockerfile
FROM python:3.11-slim
# Install system dependencies (needed to compile llama-cpp-python)
RUN apt-get update && apt-get install -y build-essential git && rm -rf /var/lib/apt/lists/*
# Set the working directory
WORKDIR /app
# Copy the code and the model
COPY webui/chat_mistral.py /app/chat_mistral.py
COPY models /app/models
# Install Python dependencies
RUN pip install --upgrade pip \
&& pip install llama-cpp-python gradio
# Start the Gradio service
CMD ["python", "chat_mistral.py"]

Next, build the image (yuu-chat is just a name; change it to whatever you like) and start it with docker run to test.

sudo docker build -t yuu-chat . # change yuu-chat as you like

sudo docker run -d \
  --restart always \
  --name yuu-chat \
  -p 7860:7860 \
  -v ~/llm_deploy/models:/app/models \
  yuu-chat # change yuu-chat as you like

Create docker-compose.yml. The nginx service is commented out because I don't run nginx inside Docker; it can be used if the server isn't hosting anything else.

# Change everything that needs changing!!!
version: '3.8'

services:
  yuu:
    build: .
    container_name: yuu-chat
    restart: always
    ports:
      - "7860:7860"

    volumes:
      - ./models:/app/models

  # nginx:
  #   image: nginx:alpine
  #   container_name: yuu-nginx
  #   restart: always
  #   ports:
  #     - "80:80"
  #     - "443:443"
  #   volumes:
  #     - ./nginx/conf:/etc/nginx/conf.d
  #     - ./nginx/certbot/www:/var/www/certbot
  #     - ./nginx/certbot/conf:/etc/letsencrypt
  #   depends_on:
  #     - yuu

  certbot:
    image: certbot/certbot
    container_name: yuu-certbot
    volumes:
      - ./nginx/certbot/www:/var/www/certbot
      - ./nginx/certbot/conf:/etc/letsencrypt
    entrypoint: "/bin/sh -c 'trap exit TERM; while :; do sleep 6h & wait $${!}; certbot renew --webroot -w /var/www/certbot; done'"

Add nginx.

sudo nano /etc/nginx/sites-available/example.com # change this

Contents:

server {
    listen 80;
    server_name example.com; # change this

    location / {
        proxy_pass http://localhost:7860;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}

Then:

sudo ln -s /etc/nginx/sites-available/example.com /etc/nginx/sites-enabled/ # change this
sudo nginx -t
sudo systemctl restart nginx
sudo certbot --nginx -d example.com # change this

Start the containers with Docker Compose.

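sudo docker rm -f yuu-chat # remove the test container from the earlier docker run (same name and port)
sudo docker compose up -d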
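
The container may not answer right away, since the script has to load the model before Gradio starts listening on port 7860. A small polling helper can confirm when the page is actually reachable. This is a minimal sketch using only the standard library; the function name and the timeout values are assumptions.

```python
import time
import urllib.request


def wait_for_http(url: str, timeout_s: float = 120.0, interval_s: float = 2.0) -> bool:
    """Poll url until it answers with HTTP 200 or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            pass  # connection refused, timed out, or HTTP error: keep waiting
        time.sleep(interval_s)
    return False
```

On the server itself this would be called as `wait_for_http("http://localhost:7860")`. If it returns False, `sudo docker logs yuu-chat` is the next place to look.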

Evaluation

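So it is now online, but because the server can't really handle it, responses are very slow, so I won't open it up for everyone to play with. I may try fine-tuning later, and move it to a better-specced server if I get the chance.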
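
"Very slow" can be turned into a number by timing one generation and dividing by the tokens produced. The sketch below assumes the callable returns an OpenAI-style dict with a `usage.completion_tokens` field, which is the shape llama-cpp-python's `create_chat_completion` returns. The helper name is made up for illustration.

```python
import time


def measure_tokens_per_second(generate):
    """Time one call to generate() and return (completion_tokens, tokens_per_second)."""
    start = time.perf_counter()
    response = generate()
    elapsed = time.perf_counter() - start
    tokens = response["usage"]["completion_tokens"]
    # Guard against a zero interval from a near-instant fake generator.
    rate = tokens / elapsed if elapsed > 0 else float("inf")
    return tokens, rate
```

With the model from chat_mistral.py loaded, a call could look like `measure_tokens_per_second(lambda: llm.create_chat_completion(messages=messages, max_tokens=100))`. The measured time includes prompt processing, so the figure understates pure generation speed when the prompt is long.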

First LLM attempt: deploying Mistral-7B (GGUF quantized) on a VPS

A very detailed write-up.
