What & Why
Running AI agents, multiple web apps, databases, and tooling on a single server with no visibility or deployment guardrails is a reliability and security risk. This infrastructure was designed around four goals: separate concerns across two VPS nodes, manage deployments through Coolify as a PaaS rather than hand-maintained docker-compose files, enforce Tailscale-first admin access to minimize the public attack surface, and maintain automated backups with clear data persistence rules.
Key Features
- Two-node architecture — VPS1 (primary: OpenClaw, Nerve, agents) + VPS2 (Coolify control plane, hosted apps, Traefik reverse proxy)
- Coolify v4 PaaS — manages 5+ apps with persistent storage, auto-SSL via Let's Encrypt, API-driven deployments
- Traefik v3 reverse proxy — auto-routes all *.djasha.me subdomains with SSL; dynamic config for non-Coolify services on VPS1
- Tailscale-first access — all admin interfaces locked to a private Tailscale network; public exposure scoped to webhooks only
- Cloudflare DNS — 4 domains managed (djasha.me, neonoir.ai, asha.news, djasha.space) with full SSL mode
- Cloudflare R2 — S3-compatible object storage for backups and artifacts
- Daily backup cron — OpenClaw state backed up at 00:30 Asia/Amman
- Strict persistence rules — every Coolify app requires persistent storage configured before deploy; Docker volumes are source of truth
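The persistence and Tailscale-first rules in the list above could be enforced as a pre-deploy check. The sketch below is only an illustration: `AppSpec`, `predeploy_problems`, and the example volume name are hypothetical and are not part of Coolify or of this setup's tooling. It assumes admin interfaces bind to an IPv4 tailnet address, which Tailscale assigns from the CGNAT range 100.64.0.0/10.

```python
from dataclasses import dataclass, field
from ipaddress import ip_address, ip_network
from typing import List, Optional

# Tailscale assigns IPv4 node addresses from the CGNAT range 100.64.0.0/10.
TAILNET_V4 = ip_network("100.64.0.0/10")


@dataclass
class AppSpec:
    """Hypothetical description of an app that is about to be deployed."""
    name: str
    volumes: List[str] = field(default_factory=list)  # persistent mounts
    admin_bind: Optional[str] = None  # address the admin UI listens on, if any


def predeploy_problems(app: AppSpec) -> List[str]:
    """Return rule violations as messages; an empty list means the app passes."""
    problems = []
    # Rule 1: persistent storage must be configured before deploy.
    if not app.volumes:
        problems.append(f"{app.name}: no persistent storage configured")
    # Rule 2: admin interfaces must only listen on a tailnet address.
    if app.admin_bind is not None and ip_address(app.admin_bind) not in TAILNET_V4:
        problems.append(f"{app.name}: admin UI bound outside the tailnet ({app.admin_bind})")
    return problems


if __name__ == "__main__":
    ok = AppSpec("pocketbase", volumes=["pb_data"], admin_bind="100.101.102.103")
    bad = AppSpec("freshrss", volumes=[], admin_bind="0.0.0.0")
    print(predeploy_problems(ok))
    print(predeploy_problems(bad))
```

Returning a list of messages rather than raising on the first failure lets a deploy script report every violated rule at once.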
Tech & Implementation
Stack: Ubuntu 24.04 · Docker · Coolify 4.0-beta · Traefik v3 · Tailscale · Cloudflare · systemd · Hostinger VPS (x2)
Active services:
- VPS1 (systemd): OpenClaw Gateway, Nerve, Paperclip, Qdrant (vector DB), Ollama (local LLM)
- VPS2 (Coolify): Open WebUI, PocketBase, Portainer, FileBrowser, FreshRSS, Mattermost + Postgres, LiteLLM + Prometheus
Notable decision: Splitting OpenClaw/agents (VPS1, systemd-managed) from web apps (VPS2, Coolify-managed) means a Coolify update or misconfiguration can't take down the agent gateway. The agent infrastructure is treated as critical — always-on, manually managed — while web apps benefit from Coolify's deployment UX.
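One way to make this split concrete is to record which manager controls each service and ask what a failure of that manager would touch. The following is a minimal sketch built from the service list above; the `PLACEMENT` table and `blast_radius` helper are illustrative names, and the model covers process management only, not routing through Traefik.

```python
# Service placement as listed above: (node, manager that controls the process).
PLACEMENT = {
    "OpenClaw Gateway": ("VPS1", "systemd"),
    "Nerve": ("VPS1", "systemd"),
    "Paperclip": ("VPS1", "systemd"),
    "Qdrant": ("VPS1", "systemd"),
    "Ollama": ("VPS1", "systemd"),
    "Open WebUI": ("VPS2", "coolify"),
    "PocketBase": ("VPS2", "coolify"),
    "Portainer": ("VPS2", "coolify"),
    "FileBrowser": ("VPS2", "coolify"),
    "FreshRSS": ("VPS2", "coolify"),
    "Mattermost": ("VPS2", "coolify"),
    "LiteLLM": ("VPS2", "coolify"),
}


def blast_radius(manager: str) -> list:
    """Return the sorted names of services whose process depends on `manager`."""
    return sorted(name for name, (_, mgr) in PLACEMENT.items() if mgr == manager)


if __name__ == "__main__":
    # A Coolify outage touches only the VPS2 web apps in this model.
    print(blast_radius("coolify"))
    print("OpenClaw Gateway" in blast_radius("coolify"))
```

In this model a Coolify failure reaches the seven VPS2 apps and none of the five systemd-managed services on VPS1.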
Outcome
More than 15 services run in production across both nodes. Web apps deploy through Coolify with zero downtime. The agent gateway has maintained continuous uptime, and its daily backup is verified. Infrastructure knowledge is documented in INFRASTRUCTURE.md, which OpenClaw itself can read, enabling the agent to perform deployments, DNS changes, and service restarts autonomously.
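The write-up does not describe how the daily backup is verified. As one possible approach, a freshness check can compare the timestamp of the latest backup against the most recent 00:30 Asia/Amman run. The function names and the one-hour grace period below are assumptions made for illustration, not the actual verification logic.

```python
from datetime import datetime, time, timedelta
from zoneinfo import ZoneInfo

AMMAN = ZoneInfo("Asia/Amman")
BACKUP_AT = time(0, 30)  # scheduled backup time from the feature list


def last_scheduled_run(now: datetime) -> datetime:
    """Most recent 00:30 Asia/Amman at or before `now` (must be timezone-aware)."""
    local = now.astimezone(AMMAN)
    run = datetime.combine(local.date(), BACKUP_AT, tzinfo=AMMAN)
    if run > local:
        # Today's run has not happened yet, so fall back to yesterday's.
        run = datetime.combine(local.date() - timedelta(days=1), BACKUP_AT, tzinfo=AMMAN)
    return run


def backup_is_fresh(backup_finished: datetime, now: datetime,
                    grace: timedelta = timedelta(hours=1)) -> bool:
    """True if the latest backup covers the most recent scheduled run.

    During the (assumed) grace period right after a scheduled run, the
    previous day's backup still counts, since the new one may be in progress.
    """
    return backup_finished >= last_scheduled_run(now - grace)


if __name__ == "__main__":
    now = datetime(2025, 6, 10, 12, 0, tzinfo=AMMAN)
    print(backup_is_fresh(datetime(2025, 6, 10, 0, 35, tzinfo=AMMAN), now))
    print(backup_is_fresh(datetime(2025, 6, 9, 0, 35, tzinfo=AMMAN), now))
```

Working in timezone-aware datetimes keeps the check correct regardless of the server's own clock zone.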
Architecture
graph TB
subgraph VPS1["VPS1: Agents Node"]
OC["OpenClaw Gateway"]
Nerve["Nerve UI"]
Qdrant["Qdrant vector DB"]
Ollama["Ollama local LLM"]
Paperclip["Paperclip orchestration"]
end
subgraph VPS2["VPS2: Apps Node"]
Coolify["Coolify PaaS"]
Traefik["Traefik v3"]
MM["Mattermost + Postgres"]
WebUI["Open WebUI"]
LiteLLM["LiteLLM + Prometheus"]
PB["PocketBase"]
RSS["FreshRSS"]
end
subgraph "Network"
TS["Private Tailscale network"]
CF["Cloudflare DNS"]
R2["Cloudflare R2 backups"]
end
CF -->|"public traffic"| Traefik
TS -->|"private admin"| VPS1
TS -->|"private admin"| VPS2
Coolify --> Traefik
OC --> Nerve