灼眼者 Official Website

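Docker Deployment and Production Operations

Overview

Deploying OpenClaw in production means planning for high availability, scalability, and observability. This chapter walks through production-grade deployment with Docker Compose and Kubernetes, then sets up monitoring, alerting, and log collection.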

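Complete Docker Compose Configuration

For most small and medium-sized deployments, Docker Compose is usually the simplest option. Below is a complete configuration intended for production use. Variables such as ${DB_PASSWORD} and ${REDIS_PASSWORD} are interpolated by Compose from the shell environment or an .env file: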

version: '3.8'

services:
  openclaw:
    image: openclaw/openclaw:${VERSION:-latest}
    ports:
      - '3000:3000'
    environment:
      - NODE_ENV=production
      - DATABASE_URL=postgresql://openclaw:${DB_PASSWORD}@postgres:5432/openclaw
      # Redis is started with --requirepass below, so the URL must carry the password
      - REDIS_URL=redis://:${REDIS_PASSWORD}@redis:6379
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
      - OPENAI_API_KEY=${OPENAI_API_KEY}
    env_file:
      - .env.production
    volumes:
      - ./config:/app/config
      - ./data:/app/data
      - ./logs:/app/logs
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
    restart: unless-stopped
    healthcheck:
      test: ['CMD', 'curl', '-f', 'http://localhost:3000/health']
      interval: 30s
      timeout: 10s
      retries: 3
    logging:
      driver: 'json-file'
      options:
        max-size: '10m'
        max-file: '3'

  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_DB: openclaw
      POSTGRES_USER: openclaw
      POSTGRES_PASSWORD: ${DB_PASSWORD}
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ['CMD-SHELL', 'pg_isready -U openclaw']
      interval: 10s
      timeout: 5s
      retries: 5
    restart: unless-stopped

  redis:
    image: redis:7-alpine
    volumes:
      - redis_data:/data
    command: redis-server --appendonly yes --requirepass ${REDIS_PASSWORD}
    healthcheck:
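      # Authenticate before pinging; an unauthenticated ping is rejected with NOAUTH
      test: ['CMD-SHELL', 'redis-cli -a "${REDIS_PASSWORD}" --no-auth-warning ping | grep -q PONG']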
      interval: 10s
      timeout: 5s
      retries: 5
    restart: unless-stopped

  nginx:
    image: nginx:alpine
    ports:
      - '80:80'
      - '443:443'
    volumes:
      - ./nginx/nginx.conf:/etc/nginx/nginx.conf
      - ./nginx/ssl:/etc/nginx/ssl
      - ./nginx/www:/var/www/html
    depends_on:
      - openclaw
    restart: unless-stopped

volumes:
  postgres_data:
  redis_data:

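Multi-Service Orchestration Architecture

In production, OpenClaw usually runs alongside several supporting services. The core services are:

OpenClaw main service: handles message routing, model calls, and plugin execution
PostgreSQL: stores user data, session history, and configuration
Redis: provides caching, session management, and message queueing
Nginx: reverse proxy, SSL termination, and load balancing
Prometheus + Grafana: metrics collection and visualization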
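
The Compose file earlier in this chapter does not define the Prometheus and Grafana services listed here. The fragment below is a minimal sketch of how they might be added. The untagged images, the host port 3001, the ./prometheus directory, and the GRAFANA_PASSWORD variable are assumptions for illustration, not part of the original configuration.

```yaml
# Hypothetical additions to the Compose file; pin image tags before production use.
services:
  prometheus:
    image: prom/prometheus
    volumes:
      # Default config path read by the prom/prometheus image
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - prometheus_data:/prometheus
    ports:
      - '9090:9090'
    restart: unless-stopped

  grafana:
    image: grafana/grafana
    environment:
      # GRAFANA_PASSWORD is an assumed variable name
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD}
    volumes:
      - grafana_data:/var/lib/grafana
    ports:
      # Host port 3000 is already published by the openclaw service
      - '3001:3000'
    depends_on:
      - prometheus
    restart: unless-stopped

volumes:
  prometheus_data:
  grafana_data:
```

Because both services join the same Compose network, Prometheus can reach the application at openclaw:3000 without publishing any extra ports.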

Kubernetes Deployment Template

For large clusters, Kubernetes offers more powerful orchestration. The following is a Kubernetes deployment template for OpenClaw:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: openclaw
  namespace: openclaw
spec:
  replicas: 3
  selector:
    matchLabels:
      app: openclaw
  template:
    metadata:
      labels:
        app: openclaw
    spec:
      containers:
        - name: openclaw
          image: openclaw/openclaw:latest  # pin to a specific release tag in production
          ports:
            - containerPort: 3000
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: openclaw-secrets
                  key: database-url
            - name: ANTHROPIC_API_KEY
              valueFrom:
                secretKeyRef:
                  name: openclaw-secrets
                  key: anthropic-api-key
          resources:
            requests:
              cpu: '500m'
              memory: '512Mi'
            limits:
              cpu: '2000m'
              memory: '2Gi'
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: openclaw-service
  namespace: openclaw
spec:
  selector:
    app: openclaw
  ports:
    - protocol: TCP
      port: 3000
      targetPort: 3000
  type: ClusterIP
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: openclaw-hpa
  namespace: openclaw
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: openclaw
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
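
The Deployment above reads its credentials from a Secret named openclaw-secrets in the openclaw namespace, but neither object is defined in the template. A minimal sketch of both follows. The key names come from the Deployment's secretKeyRef entries. The values are placeholders, and the postgres host name is an assumption carried over from the Compose example.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: openclaw
---
apiVersion: v1
kind: Secret
metadata:
  name: openclaw-secrets
  namespace: openclaw
type: Opaque
stringData:
  # Placeholder values: replace before applying and keep this file out of version control.
  database-url: postgresql://openclaw:CHANGE_ME@postgres:5432/openclaw
  anthropic-api-key: CHANGE_ME
```

Using stringData lets the values be written in plain text; the API server stores them base64-encoded under data. Apply these objects before the Deployment so the pods can start.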

Monitoring and Alerting

Prometheus and Grafana are a widely used monitoring stack. OpenClaw's built-in metrics endpoint can be scraped by Prometheus directly:

# prometheus.yml
scrape_configs:
  - job_name: 'openclaw'
    scrape_interval: 15s
    metrics_path: '/metrics'
    static_configs:
      - targets:
          - 'openclaw:3000'
    relabel_configs:
      - source_labels: [__address__]
        target_label: instance
        replacement: 'openclaw-production'

  - job_name: 'node-exporter'
    scrape_interval: 30s
    static_configs:
      - targets:
          - 'node-exporter:9100'

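Key metrics to watch include the request latency distribution (P50/P95/P99), model call success rate, requests per second (RPS), active connections, memory and CPU usage, and database connection pool status. The following alerting rules are recommended: alert when P99 latency exceeds 5 seconds, alert when the error rate exceeds 1%, and send a scale-out notification when memory usage exceeds 85%.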
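
The three thresholds above could be expressed as Prometheus alerting rules along the following lines. This is a sketch under stated assumptions: the metric names http_request_duration_seconds_bucket and http_requests_total, and the status label, are hypothetical and must be replaced with whatever the OpenClaw metrics endpoint actually exports. The node_memory_* metrics are the standard ones from node-exporter.

```yaml
# alert_rules.yml (hypothetical file name; reference it from rule_files in prometheus.yml)
groups:
  - name: openclaw-alerts
    rules:
      - alert: OpenClawHighP99Latency
        # Assumed histogram metric name; P99 over a 5-minute window
        expr: |
          histogram_quantile(
            0.99,
            sum by (le) (rate(http_request_duration_seconds_bucket{job="openclaw"}[5m]))
          ) > 5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: 'P99 request latency is above 5 seconds'

      - alert: OpenClawHighErrorRate
        # Assumed counter metric and status label; share of 5xx responses
        expr: |
          sum(rate(http_requests_total{job="openclaw", status=~"5.."}[5m]))
            /
          sum(rate(http_requests_total{job="openclaw"}[5m])) > 0.01
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: 'Error rate is above 1%'

      - alert: HostHighMemoryUsage
        # Standard node-exporter memory metrics
        expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) > 0.85
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: 'Memory usage is above 85%; consider scaling out'
```

The for clauses keep short spikes from firing alerts; the durations shown are illustrative and should be tuned to the traffic pattern.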

Log Collection

Loki with Promtail is the recommended log collection stack. It is lighter-weight than ELK and fits naturally alongside Prometheus and Grafana:

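services:
  loki:
    image: grafana/loki:2.9
    # Point Loki at the mounted file; otherwise the image loads its bundled default config
    command: -config.file=/etc/loki/config.yaml
    ports:
      - '3100:3100'
    volumes:
      - ./loki/config.yaml:/etc/loki/config.yaml
      - loki_data:/loki

  promtail:
    image: grafana/promtail:2.9
    # Point Promtail at the mounted file; otherwise the image loads its bundled default config
    command: -config.file=/etc/promtail/config.yaml
    volumes:
      - ./logs:/var/log/openclaw
      - ./promtail/config.yaml:/etc/promtail/config.yaml

# Named volume used by the loki service
volumes:
  loki_data: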
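
The Promtail service mounts ./promtail/config.yaml, whose contents are not shown. A minimal sketch of that file is given below. It assumes OpenClaw writes files matching *.log into the mounted /var/log/openclaw directory and that Loki is reachable at loki:3100 on the Compose network; adjust the glob and labels to the actual log layout.

```yaml
# promtail/config.yaml
server:
  http_listen_port: 9080
  grpc_listen_port: 0

# Where Promtail records how far it has read in each file
positions:
  filename: /tmp/positions.yaml

# Loki push endpoint
clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  - job_name: openclaw
    static_configs:
      - targets:
          - localhost
        labels:
          job: openclaw
          # Assumed file pattern inside the mounted log directory
          __path__: /var/log/openclaw/*.log
```

With this in place, the logs can be queried in Grafana with a label selector such as {job="openclaw"}. Note that /tmp/positions.yaml is lost when the container is recreated; mounting a volume for it avoids re-reading old log lines.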

If you are more familiar with the Elastic Stack, you can instead collect logs with Filebeat, ship them to Elasticsearch, and analyze them in Kibana.

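Horizontal Scaling and Load Balancing

OpenClaw's stateless architecture lends itself to horizontal scaling, provided that shared state lives in external stores. Keep the following in mind when scaling out:

Session affinity: WebSocket-based real-time communication requires sticky sessions; store session state in Redis rather than in process memory
Message queue: use Redis Streams or RabbitMQ as a message buffer so that messages are not lost
Database connection pool: as the number of instances grows, configure a pooling middleware such as PgBouncer to keep database connections from being exhausted
Rate limiting: configure token-bucket rate limiting at the gateway to protect backend services from traffic bursts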
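
For the session affinity point, one option on Kubernetes is to enable client-IP affinity on the Service. The fragment below is a sketch that extends the openclaw-service definition from the Kubernetes template; the one-hour timeout is an assumed value.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: openclaw-service
  namespace: openclaw
spec:
  selector:
    app: openclaw
  # Route requests from the same client IP to the same pod
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 3600  # assumed value; affinity expires after one hour
  ports:
    - protocol: TCP
      port: 3000
      targetPort: 3000
  type: ClusterIP
```

Client-IP affinity depends on the source address the Service sees. When traffic arrives through an ingress controller or another proxy, all requests may appear to come from the proxy, in which case sticky sessions need to be configured at that layer instead. Keeping session state in Redis, as recommended above, reduces the dependence on affinity altogether.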

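Deployment Options Compared

Dimension	Docker Compose	Kubernetes	Bare metal
Deployment complexity	Low	High	Medium
Learning curve	Low	High	Medium
Elastic scaling	Manual	Automatic (HPA)	Manual
High availability	Single host	Multiple replicas + self-healing	Must be built yourself
Resource efficiency	Medium	High	High
Operations cost	Low	Medium	High
Rolling updates	Manual	Automatic	Manual
Suitable scale	Small/medium	Large	Small
Environment consistency	Good	Good	Poor
Ecosystem integration	Moderate	Rich	Limited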

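In summary, the suggested path is to launch quickly with Docker Compose in the early stage and migrate to a Kubernetes cluster as the user base grows. Whichever option you choose, set up monitoring, alerting, and logging from day one; they are the foundation of a stable production service.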