Cloud API bills add up fast. You send a few thousand requests through GPT-4 or Claude, wire up some automation workflows, and suddenly you're looking at a $200/month invoice for something that runs twice a day. Worse, you're sending your business data (customer names, internal docs, proprietary processes) to someone else's servers every single time.
A homelab fixes both problems. For the price of two months of cloud API bills, you can own hardware that runs local models, hosts your automation stack, and gives you a private AI infrastructure that never phones home. The models have caught up. A 7B parameter model running on a $300 mini PC can handle classification, summarization, extraction, and basic reasoning at speeds that are fine for automation. You don't need GPT-4 for every task.
This guide walks through building an AI agent homelab from bare hardware to running autonomous workflows. Real hardware picks with real prices, a tested software stack, network architecture that works from anywhere, and actual configs you can paste into your terminal.
Who this is for: Developers, sysadmins, and tinkerers who want to run AI agents on hardware they own. You should be comfortable with Linux, Docker, and SSH. No ML background needed. We're using pre-trained models, not training them.
01 Hardware: What to Buy
Your hardware choice depends on one question: do you want to run local LLMs, or just host the orchestration layer and call cloud APIs? Both are valid. The orchestration-only path costs a third as much.
Option A: Orchestration-only (~$150-300)
If you're calling OpenAI, Anthropic, or Groq APIs and just need somewhere to run n8n, OpenClaw, databases, and scheduled jobs, almost anything works. A Raspberry Pi 5 (8GB) handles it. An Intel N100 mini PC handles it better. You don't need GPU, and you don't need much RAM.
- Budget pick: Beelink Mini S12 Pro (Intel N100, 16GB RAM, 500GB SSD) — ~$160
- Mid-range: Any N100/N305 mini PC with 16GB RAM — ~$200
- What you get: Enough compute for 20+ Docker containers, a PostgreSQL database, and dozens of concurrent automation workflows
The N100 sips 10 watts at idle. Your electric bill won't notice it exists.
Option B: Local LLM capable (~$400-800)
Running models locally requires more RAM than anything else. The model gets loaded into memory, and if it doesn't fit, you're swapping to disk and inference takes minutes instead of seconds. Rule of thumb: you need roughly 1GB of RAM per billion parameters at Q4 quantization.
- 7B models (Mistral 7B, Llama 3 8B): 8GB RAM minimum, 16GB comfortable
- 13B models (Llama 2 13B, CodeLlama 13B): 16GB minimum, 32GB comfortable
- 70B+ models: You need a GPU. Consumer hardware won't cut it for CPU inference at this scale.
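The rule of thumb above can be turned into a quick back-of-envelope calculator. This is a rough sketch, not a benchmark: the 4-bit weight size follows from the Q4 quantization mentioned above, but the overhead term (OS, KV cache, runtime buffers) is an assumption you should sanity-check against your own setup.

```python
def min_ram_gb(params_billion: float, bits_per_weight: int = 4) -> float:
    """Rough minimum RAM to hold a quantized model plus runtime overhead.

    weights: params * bits/8 bytes (e.g. 7B at Q4 is about 3.5GB)
    overhead: assumed 2GB base for OS/runtime plus ~0.25GB per billion
              parameters for KV cache and buffers (an estimate, not measured)
    """
    weights_gb = params_billion * bits_per_weight / 8
    overhead_gb = 2.0 + 0.25 * params_billion
    return weights_gb + overhead_gb

for size in (7, 13, 70):
    print(f"{size}B @ Q4: ~{min_ram_gb(size):.1f}GB minimum")
```

The numbers this produces line up with the tiers above: a 7B model lands around 7GB (hence "8GB minimum, 16GB comfortable"), and 70B blows past anything a consumer mini PC ships with.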
For a homelab that runs 7B-13B models locally while hosting your full automation stack:
- Good: Beelink SER5 (Ryzen 5 5560U, 16GB, 500GB) — ~$280. Handles 7B models.
- Better: Minisforum UM780 XTX (Ryzen 7 7840HS, 32GB, 1TB) — ~$500. Runs 13B models smoothly, has an iGPU that Ollama can use.
- Best value for GPU inference: Used Dell Optiplex or HP ProDesk + used NVIDIA RTX 3060 12GB (~$500 total). The 12GB VRAM handles 7B models at full speed and 13B quantized.
Don't overlook used enterprise gear. A refurbished Lenovo ThinkCentre M920q Tiny with a 9th-gen i5 and 32GB RAM sells for $120-150 on eBay. It won't run models, but as an orchestration node it's overkill.
Option C: Proxmox virtualization host (~$300-600)
If you want to isolate workloads (and you should), run Proxmox VE as your hypervisor. One physical box, multiple VMs: one for AI/Ollama, one for automation (n8n, OpenClaw), one for monitoring. If a rogue automation script eats all the RAM, your other VMs keep running.
Any of the machines above work as a Proxmox host. The AMD Ryzen options are better here because their iGPU can be passed through to a VM for hardware-accelerated inference, while the host runs headless.
02 Operating System & Virtualization
Install Proxmox VE if you want VM isolation. Install Ubuntu Server 24.04 LTS if you want simplicity. Both work. Proxmox adds overhead but gives you snapshots, live migration (if you later add a second node), and clean separation between workloads.
Proxmox setup (recommended)
# Download Proxmox VE 8.x ISO from proxmox.com
# Flash to USB with balenaEtcher or dd
# Boot, install, set a static IP
# After install, disable the enterprise repo (unless you have a subscription):
echo "deb http://download.proxmox.com/debian/pve bookworm pve-no-subscription" \
> /etc/apt/sources.list.d/pve-no-subscription.list
apt update && apt dist-upgrade -y
Create your first VM for the AI/automation workload:
# From the Proxmox web UI (https://your-ip:8006):
# → Create VM → Ubuntu Server 24.04 ISO
# → 4 cores, 16GB RAM (adjust to your hardware)
# → 100GB disk (thin provisioned)
# → Start after creation
Bare metal Ubuntu (alternative)
If you want things running in 15 minutes instead of 45, skip Proxmox and install Ubuntu Server directly. You lose VM isolation but gain simplicity. Good for single-purpose boxes.
# After installing Ubuntu Server 24.04:
sudo apt update && sudo apt upgrade -y
sudo apt install -y curl git htop tmux
03 The Software Stack
Here's the full stack, from bottom to top. Every piece is open source or has a generous free tier. Total cost for software: $0.
Docker & Docker Compose
Everything runs in containers. No exceptions. Containers give you reproducible deployments, easy rollbacks, and clean dependency isolation.
# Install Docker (official method):
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER
# Log out and back in, then verify:
docker run hello-world
Ollama — Local LLM inference
Ollama turns running local models into a single command. It handles quantization, GPU detection, memory management, and exposes an OpenAI-compatible API. If your app works with GPT, it works with Ollama by changing one URL.
# Install Ollama:
curl -fsSL https://ollama.ai/install.sh | sh
# Pull a model:
ollama pull llama3:8b
ollama pull mistral:7b
ollama pull nomic-embed-text # for embeddings
# Test it:
ollama run llama3:8b "Summarize the benefits of running AI locally in 3 sentences."
# Ollama API is now running on localhost:11434
# OpenAI-compatible endpoint: http://localhost:11434/v1/chat/completions
Ollama automatically uses your GPU if it detects one. On CPU-only systems, expect 5-15 tokens/second for 7B models — slow for chat, but perfectly fine for batch processing and automation.
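Because the endpoint speaks the OpenAI chat format, calling it from code needs nothing beyond the standard library. A minimal sketch, assuming Ollama is running on the default port with `llama3:8b` pulled; the live call is left commented out since it requires the server:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/v1/chat/completions"

def build_request(prompt: str, model: str = "llama3:8b") -> dict:
    """OpenAI-style chat payload; the same shape works against cloud APIs."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,  # keep outputs stable for automation tasks
    }

def chat(prompt: str, model: str = "llama3:8b") -> str:
    """POST to the local Ollama endpoint and return the reply text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_request(prompt, model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# chat("Summarize the benefits of running AI locally in 3 sentences.")
```

Swapping in a cloud provider later means changing `OLLAMA_URL` and adding an Authorization header; the payload stays the same.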
n8n — Workflow automation
n8n is the backbone of your automation layer. It connects to 400+ services, has a visual workflow editor, and supports custom JavaScript/Python nodes. Think Zapier, but self-hosted and with no per-execution pricing.
# docker-compose.yml for n8n:
services:
  n8n:
    image: n8nio/n8n:latest
    restart: always
    ports:
      - "5678:5678"
    environment:
      - N8N_BASIC_AUTH_ACTIVE=true
      - N8N_BASIC_AUTH_USER=admin
      - N8N_BASIC_AUTH_PASSWORD=your-secure-password
      - N8N_HOST=n8n.yourdomain.com
      - N8N_PROTOCOL=https
      - GENERIC_TIMEZONE=America/New_York
    volumes:
      - n8n_data:/home/node/.n8n

volumes:
  n8n_data:
docker compose up -d
n8n connects to Ollama natively via its "Chat Model" node. Point it at http://localhost:11434 and select your model. Now your workflows can classify emails, extract data from PDFs, generate responses, and make decisions. All running on your hardware.
OpenClaw — AI assistant gateway
OpenClaw gives you a persistent AI assistant that connects to Telegram, Discord, or any chat platform. It runs cron jobs, manages tools, and maintains memory across sessions. It's the "agent brain" that ties your homelab together.
# Install OpenClaw:
npm install -g openclaw
# Initialize configuration:
openclaw init
# Start the gateway:
openclaw gateway start
OpenClaw can use Ollama as its model backend, or it can call cloud APIs for tasks that need stronger reasoning (Claude, GPT-4). The pattern that works: use local models for high-volume, low-complexity tasks (classification, extraction, summarization), and route complex reasoning to cloud APIs on demand.
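That routing pattern is simple enough to sketch in a few lines. The task sets and model names here are illustrative assumptions, not anything OpenClaw ships with; the point is the shape of the decision: cheap local inference by default, cloud only when the task demands it.

```python
# High-volume, low-complexity work stays local; everything else escalates.
LOCAL_TASKS = {"classification", "extraction", "summarization"}

def pick_backend(task_type: str) -> dict:
    """Return the endpoint and model for a given task type."""
    if task_type in LOCAL_TASKS:
        # Free, private, fast enough for batch automation.
        return {"base_url": "http://localhost:11434/v1", "model": "llama3:8b"}
    # Stronger reasoning on demand, paid per call.
    return {"base_url": "https://api.openai.com/v1", "model": "gpt-4"}

print(pick_backend("classification")["model"])  # → llama3:8b
```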
PostgreSQL — Structured data store
Your agents need somewhere to store state, results, and historical data. SQLite works for single-agent setups. PostgreSQL works for everything else.
# Add to your docker-compose.yml:
  postgres:
    image: postgres:16
    restart: always
    environment:
      POSTGRES_USER: agent
      POSTGRES_PASSWORD: your-secure-password
      POSTGRES_DB: homelab
    ports:
      - "5432:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data

volumes:
  pgdata:
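What does "state, results, and historical data" look like in practice? A minimal sketch of a logging schema, demonstrated with the stdlib `sqlite3` module so it runs anywhere; the table and column names are assumptions for illustration. The same DDL works on the Postgres container above, with `SERIAL` and `TIMESTAMPTZ` in place of SQLite's `INTEGER PRIMARY KEY` and `TEXT` timestamps.

```python
import sqlite3

# Illustrative agent-state schema (adapt columns to your own workflows).
SCHEMA = """
CREATE TABLE IF NOT EXISTS email_log (
    id          INTEGER PRIMARY KEY,
    received_at TEXT NOT NULL,
    sender      TEXT NOT NULL,
    subject     TEXT,
    category    TEXT NOT NULL
        CHECK (category IN ('URGENT','ROUTINE','FYI','SPAM'))
);
"""

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
db.execute(
    "INSERT INTO email_log (received_at, sender, subject, category)"
    " VALUES (?, ?, ?, ?)",
    ("2025-01-01T09:00:00Z", "boss@example.com", "Server down", "URGENT"),
)
urgent = db.execute(
    "SELECT COUNT(*) FROM email_log WHERE category = 'URGENT'"
).fetchone()[0]
print(urgent)  # → 1
```

The `CHECK` constraint matters: it rejects any category the LLM invents that your workflow doesn't handle, turning a silent misroute into a visible insert error.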
Monitoring — know when things break
Agents that run unattended need monitoring. Without it, a failed workflow runs silently for weeks before you notice the data pipeline is stale.
# Minimal monitoring stack (add to docker-compose.yml):
  uptime-kuma:
    image: louislam/uptime-kuma:latest
    restart: always
    ports:
      - "3001:3001"
    volumes:
      - uptime_data:/app/data

volumes:
  uptime_data:
Uptime Kuma monitors your services and sends alerts via Telegram, Discord, email, or 20+ other channels. Set up checks for Ollama (http://localhost:11434), n8n (http://localhost:5678), and any custom endpoints your agents expose.
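Alongside Uptime Kuma, your agents can run their own dead-simple self-check. A stdlib-only sketch, with the service list mirroring the endpoints above; the live check is left commented out since it needs the services running.

```python
import urllib.error
import urllib.request

def is_healthy(url: str, timeout: float = 3.0) -> bool:
    """True if the endpoint answered at all; False if unreachable."""
    try:
        urllib.request.urlopen(url, timeout=timeout)
        return True
    except urllib.error.HTTPError:
        return True   # the server answered, just with an error status
    except OSError:
        return False  # connection refused, DNS failure, timeout

def alert_lines(status: dict) -> list:
    """One alert line per down service, in stable order."""
    return [
        f"ALERT: {name} unreachable"
        for name, ok in sorted(status.items())
        if not ok
    ]

SERVICES = {"ollama": "http://localhost:11434", "n8n": "http://localhost:5678"}
# print("\n".join(alert_lines({n: is_healthy(u) for n, u in SERVICES.items()})))
```

Wire the output into the same Telegram channel Uptime Kuma alerts to and you have two independent eyes on the stack.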
Want 12 ready-to-import n8n workflows?
The AI Automation Starter Kit includes 12 production-tested n8n workflows — lead scoring, content repurposing, email triage, web scraping, and more. Each one works with Ollama or cloud APIs. Import, configure, run.
Get the starter kit — $39 →
04 Network Architecture
Your homelab sits behind a NAT. Your laptop is at a coffee shop. Your phone is on cellular. You need secure access to your agents from anywhere without exposing ports to the internet.
Tailscale — mesh VPN
Tailscale creates a private network across all your devices using WireGuard. Every device gets a stable IP. No port forwarding, no dynamic DNS, no firewall holes. It's the single best thing you can install on a homelab.
# Install on your homelab server:
curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up
# Install on your laptop/phone too.
# Now your homelab is reachable at its Tailscale IP from anywhere.
Access n8n at http://100.x.x.x:5678, Ollama at http://100.x.x.x:11434, and Proxmox at https://100.x.x.x:8006 from any device on your tailnet. No public exposure.
Tailscale Serve — HTTPS without the hassle
If you want proper HTTPS with real certificates (useful for OpenClaw's webhook endpoints and n8n's OAuth callbacks):
# Expose n8n over HTTPS on your tailnet:
tailscale serve --bg https+insecure://localhost:5678
# Expose OpenClaw gateway:
tailscale serve --bg --https 443 http://localhost:18789
Now you have valid HTTPS certificates, automatic renewal, and zero attack surface. The ports are only reachable from your tailnet.
Reverse proxy for multiple services
If you're running 5+ services and want clean URLs, add Caddy as a reverse proxy:
# Caddyfile
n8n.homelab.local {
    reverse_proxy localhost:5678
}
ollama.homelab.local {
    reverse_proxy localhost:11434
}
monitor.homelab.local {
    reverse_proxy localhost:3001
}
Combined with Tailscale's MagicDNS, you get https://homelab.tail-abc123.ts.net URLs that resolve automatically on every device in your tailnet.
05 Building Your First Agent Workflow
Hardware is running. Software is installed. Now make it do something useful. Here's a real workflow: an inbox triage agent that reads incoming emails, classifies them by urgency and category, drafts responses for routine ones, and alerts you about anything that needs human attention.
The architecture
- Trigger: n8n polls your email inbox every 5 minutes (IMAP node)
- Classify: Send the email subject + first 500 chars to Ollama (Llama 3 8B) with a classification prompt
- Route: Based on the classification: urgent goes to Telegram alert, routine gets a draft reply, spam gets archived
- Store: Log every email and its classification to PostgreSQL for pattern analysis
- Learn: Weekly cron job analyzes the PostgreSQL data and updates the classification prompt with new patterns
The classification prompt is the key piece:
You are an email classifier. Categorize the following email into exactly one category:
- URGENT: requires human response within 4 hours
- ROUTINE: standard business email, can be auto-drafted
- FYI: informational, no response needed
- SPAM: promotional or unsolicited
Respond with ONLY the category name, nothing else.
Subject: {{$json.subject}}
From: {{$json.from}}
Body: {{$json.body.substring(0, 500)}}
Llama 3 8B handles this reliably, and because the model only has to emit a single category token, each email classifies in a second or two even on CPU. It's a constrained-output task, exactly what smaller models excel at.
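One practical wrinkle: despite the "ONLY the category name" instruction, small models occasionally add punctuation or a preamble. A defensive normalizer keeps the routing step deterministic. This is a sketch; defaulting unknown replies to URGENT is a deliberate (and debatable) assumption, chosen so anything the model mangles lands in front of a human rather than in the archive.

```python
CATEGORIES = ("URGENT", "ROUTINE", "FYI", "SPAM")

def normalize(raw: str) -> str:
    """Map a model reply to exactly one category, tolerating stray
    punctuation or explanation. Unrecognized replies default to URGENT
    so nothing slips through unseen."""
    cleaned = raw.strip().upper()
    for cat in CATEGORIES:
        if cat in cleaned:
            return cat
    return "URGENT"

print(normalize("  routine.\n"))    # → ROUTINE
print(normalize("Category: FYI"))   # → FYI
print(normalize("I'm not sure"))    # → URGENT
```

In n8n this lives in a small Code node between the Ollama node and the routing switch.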
More workflow ideas
- RSS → Summary → Telegram: Monitor 50 tech blogs, use Ollama to summarize new posts, send a daily digest to your chat
- GitHub → Code Review: When a PR is opened, pull the diff, send it to Claude for review, post the review as a PR comment
- Invoice extraction: PDF invoices arrive by email → OCR with Tesseract → extract amounts/dates/vendors with Ollama → insert into a spreadsheet
- Uptime alerting: When a monitored service goes down, the agent checks logs, identifies the likely cause, and sends a Telegram message with the diagnosis and suggested fix
- Lead scoring: New form submissions get scored by the LLM based on your criteria, high-quality leads go to CRM, others get an automated follow-up email
Each of these runs 24/7 on your homelab, costs nothing per execution, and keeps your data local.
06 Security Considerations
Your homelab is running agents that read your email, access your APIs, and make decisions. Secure it accordingly.
Basics that matter
- Don't expose services to the public internet. Use Tailscale for all remote access. Zero ports forwarded.
- Use separate API keys for each service. If your n8n instance is compromised, the attacker shouldn't get your OpenAI key, your email creds, and your database password.
- Enable authentication on everything. n8n has built-in auth. Ollama doesn't. Put it behind a reverse proxy with basic auth if anyone else shares your network.
- Encrypt your disks. Full-disk encryption (LUKS on Linux) means a stolen mini PC doesn't mean stolen data.
- Automatic updates. Enable unattended-upgrades for security patches. Your homelab shouldn't need babysitting to stay patched.
Agent-specific security
- Least privilege: Each agent should only have access to the APIs and data it needs. Don't give your email triage bot access to your banking API.
- Rate limiting: Put rate limits on any agent that can send messages or make API calls. A bug in a loop can send 10,000 Telegram messages in a minute.
- Human-in-the-loop: For any action with real consequences (sending emails, making purchases, modifying data), require human approval. Autonomous doesn't mean unsupervised.
- Log everything: Every agent action should be logged. When something goes wrong (and it will), you need the trail to debug it.
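The rate-limiting point deserves a concrete shape. The classic answer is a token bucket: a steady refill rate with a bounded burst. This sketch uses an injectable clock so it can be tested without sleeping; the rates in the example are illustrative, not any platform's actual limits.

```python
import time

class TokenBucket:
    """Allow at most `rate` actions per second, with bursts up to
    `capacity`. The clock is injectable for deterministic testing."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.tokens = capacity
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should drop, queue, or alert

# Example: cap a notifier at 1 message/second with a burst of 5.
# bucket = TokenBucket(rate=1.0, capacity=5)
# if bucket.allow(): send_telegram_message(...)
```

Put the bucket in front of every side-effecting call an agent can make; a looping bug then degrades into dropped messages instead of ten thousand sent ones.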
# Enable UFW and lock down to Tailscale:
sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow in on tailscale0
sudo ufw enable
# Now only Tailscale traffic reaches your services
07 Maintenance & Scaling
A homelab that works on day one but breaks on day thirty is useless. Build maintenance into the system.
Backups
Proxmox has built-in backup scheduling. For Docker volumes, use a cron job:
#!/bin/bash
# Backup all Docker volumes nightly (run from cron):
BACKUP_DIR=/mnt/backup/docker-volumes
DATE=$(date +%Y%m%d)
mkdir -p "$BACKUP_DIR"
for vol in $(docker volume ls -q); do
  docker run --rm -v "$vol":/source -v "$BACKUP_DIR":/backup \
    alpine tar czf "/backup/${vol}_${DATE}.tar.gz" -C /source .
done
# Keep 7 days of backups:
find "$BACKUP_DIR" -name "*.tar.gz" -mtime +7 -delete
Model updates
New model releases drop monthly. Update when it matters, not compulsively:
# Check for model updates:
ollama list
# Pull a newer version:
ollama pull llama3:8b
# Test before swapping your production workflows
ollama run llama3:8b "Test classification prompt here"
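"Test before swapping" can be more than an eyeball check. Keep a small labeled set of real inputs and compare accuracy between the old and new model before promoting it. A sketch of that harness; `stub_classify` is a stand-in here, and in practice it would call the Ollama API with the classification prompt from section 05. The golden-set examples are invented for illustration.

```python
# Tiny regression check: run a fixed labeled set through the classifier
# and only promote the new model if accuracy doesn't drop.
GOLDEN_SET = [
    ("Server is down, customers affected", "URGENT"),
    ("Weekly newsletter: 10 tips", "SPAM"),
    ("Meeting notes from Tuesday", "FYI"),
]

def accuracy(classify, cases) -> float:
    hits = sum(1 for text, expected in cases if classify(text) == expected)
    return hits / len(cases)

def stub_classify(text: str) -> str:
    """Stand-in for a real model call; wired to Ollama in practice."""
    return "URGENT" if "down" in text else "FYI"

print(f"{accuracy(stub_classify, GOLDEN_SET):.2f}")  # → 0.67
```

Run it once against the current model, once against the candidate, and keep whichever scores higher on your own data.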
Scaling up
When one box isn't enough:
- Add a second node to your Proxmox cluster. Live-migrate VMs between nodes for zero-downtime maintenance.
- Dedicate hardware — one machine for Ollama/GPU inference, one for orchestration. Prevents a heavy model load from starving your automation workflows.
- Add a NAS (Synology, TrueNAS) for centralized storage and backups. Separates compute from data.
- Use Tailscale subnet routers to expose your entire homelab subnet to your tailnet without installing Tailscale on every device.
What We Didn't Cover
This guide gets you from zero to a running AI agent homelab. But a production-grade setup goes deeper:
- Advanced Proxmox configuration — GPU passthrough, ZFS storage, high-availability clustering, backup rotation strategies
- Docker networking — custom bridge networks, container DNS, secrets management with Docker Swarm or Vault
- Monitoring at scale — Prometheus + Grafana dashboards, custom metrics for agent performance, alerting on model inference latency
- CI/CD for workflows — version-controlling n8n workflows, testing automation pipelines before deployment, rollback strategies
- Multi-site replication — running homelab nodes in different locations with Tailscale mesh, failover between sites
Get the Infrastructure Guides Bundle
Step-by-step guides for Proxmox, Docker, Tailscale, and monitoring — the full infrastructure layer under your AI agents. Includes GPU passthrough, backup automation, and multi-node clustering.
Download the bundle — $24 →
Start Small, Build Up
You don't need to build the whole stack on day one. Start with a $160 mini PC running Docker, n8n, and one automation workflow. Get that working, get it useful, then add Ollama. Then add Proxmox when you outgrow the single-box setup. Then add a second node when your homelab addiction truly takes hold.
The point isn't to build the perfect infrastructure. The point is to own your AI stack, control your data, and stop paying per-API-call for tasks a local model handles fine. Every workflow you move to your homelab is one that runs forever at zero marginal cost.
The hardware is cheap. The software is free. The only cost is your time. And if you're reading a homelab blog post, you were probably going to spend that time tinkering anyway.