As AI agents move into production, knowing what they cost, how often they fail, and how fast they respond becomes critical. AgentWatch is a self-hosted observability platform that gives you complete visibility into your AI agent workflows — token usage, latency (p50/p95), error rates, and cost per task — all in real time, with zero vendor lock-in. The entire system is a single FastAPI server (~364 lines), a SQLite database, and a vanilla JS dashboard with no build step required.
## Key Features
| Feature | Details |
|---|---|
| Zero-Code Integration | Auto-patches OpenAI and Anthropic clients — no code changes needed |
| Real-Time Streaming | Live event feed via Server-Sent Events (SSE) |
| Cost Tracking | Built-in per-model pricing for GPT-4o, Claude, and more |
| Latency Insights | p50, p95, and average latency per agent/model/task |
| Error Monitoring | Error rates broken down by agent, model, and task |
| Self-Hosted | SQLite + FastAPI — no Postgres, no external dependencies |
| Pip Installable | `pip install -e .` and `agentwatch serve` to start |
## Architecture
```
┌──────────────────────────────────────────────────────────────┐
│                      Your AI Agent Code                      │
└──────────────────────────────────────────────────────────────┘
                               ↓
                     ┌──────────────────┐
                     │  AgentWatch SDK  │
                     │  (auto-patching, │
                     │   decorator,     │
                     │   manual log)    │
                     └──────────────────┘
                               ↓
                     ┌──────────────────┐
                     │   Background     │
                     │   Event Sender   │
                     │   (batching,     │
                     │    threading)    │
                     └──────────────────┘
                               ↓
                 ┌─────────────────────────┐
                 │  FastAPI Server (8787)  │
                 │  ├─ REST API endpoints  │
                 │  ├─ SSE streaming       │
                 │  └─ Static dashboard    │
                 └─────────────────────────┘
                               ↓
                 ┌─────────────────────────┐
                 │    SQLite Database      │
                 │    (events, metrics)    │
                 └─────────────────────────┘
                               ↓
                 ┌─────────────────────────┐
                 │   Dashboard (Browser)   │
                 │   ├─ Metric cards       │
                 │   ├─ Charts (cost, lat) │
                 │   ├─ Live event feed    │
                 │   └─ Breakdown tables   │
                 └─────────────────────────┘
```
## Components
| Component | Purpose | Tech |
|---|---|---|
| SDK | Client instrumentation library | Python threading, httpx |
| Server | REST API + SSE, async event handling | FastAPI, aiosqlite |
| Database | Event storage & metrics aggregation | SQLite, async queries |
| Dashboard | Real-time monitoring UI | Vanilla JS, Chart.js, SSE |
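The SDK row above lists "Python threading" because events are handed to a background sender rather than posted inline. As a rough sketch of that batching pattern (names and defaults are hypothetical, and the HTTP call is replaced by an injectable `flush` callable — in the real SDK it would be an `httpx` POST):

```python
import queue
import threading

class EventSender:
    """Illustrative batching sender: agent code enqueues events and a
    daemon thread flushes them in batches, so logging never blocks."""

    def __init__(self, flush, batch_size=50, interval=1.0):
        self._queue = queue.Queue()
        self._flush = flush            # e.g. lambda batch: httpx.post(url, json=batch)
        self._batch_size = batch_size
        self._interval = interval      # max seconds to wait for more events
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def log(self, event: dict):
        # Called from the agent's hot path; only enqueues, never blocks on I/O.
        self._queue.put(event)

    def _run(self):
        batch = []
        while not self._stop.is_set() or not self._queue.empty():
            try:
                batch.append(self._queue.get(timeout=self._interval))
            except queue.Empty:
                pass
            # Flush when the batch is full, or when the queue drains.
            if batch and (len(batch) >= self._batch_size or self._queue.empty()):
                self._flush(batch)
                batch = []
        if batch:                      # flush anything left on shutdown
            self._flush(batch)

    def close(self):
        self._stop.set()
        self._thread.join()
```

Injecting `flush` keeps the sketch testable; the same shape works whether the sink is an HTTP endpoint or a local list.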
## SDK Integration Modes
AgentWatch provides four ways to instrument your agents:
### 1. Auto-Patching (Zero Code Changes)

```python
import openai

import agentwatch

agentwatch.init(
    server_url="http://localhost:8787",
    agent="research-bot",
    auto_patch=True,
)

# All OpenAI/Anthropic calls are now tracked automatically
client = openai.OpenAI()
response = client.chat.completions.create(model="gpt-4o", ...)
# ✅ Tokens, latency, cost recorded automatically
```
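Under the hood, auto-patching amounts to replacing a client method with a timing wrapper — a standard monkey-patching pattern. A minimal sketch of the mechanism (not AgentWatch's actual patcher; `record` stands in for the event pipeline):

```python
import functools
import time

def patch_method(obj, name, record):
    """Replace obj.<name> with a wrapper that times each call and
    reports success/error to `record`. Illustrative only."""
    original = getattr(obj, name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            result = original(*args, **kwargs)
            status = "success"
            return result
        except Exception:
            status = "error"
            raise  # tracking observes failures; it never swallows them
        finally:
            record({
                "method": name,
                "latency_ms": (time.perf_counter() - start) * 1000,
                "status": status,
            })

    setattr(obj, name, wrapper)
```

Because the wrapper re-raises, the caller sees exactly the behavior of the unpatched client; only the side-channel event is added.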
### 2. Decorator (Selective Tracking)

```python
from agentwatch import track

@track(task="summarize-documents", agent="research-bot")
def summarize(text):
    response = client.chat.completions.create(...)
    return response.choices[0].message.content
```
### 3. Manual Logging (Full Control)

```python
agentwatch.log_event(
    task_name="custom-analysis",
    agent_name="analytics-bot",
    model="gpt-4o-mini",
    input_tokens=500,
    output_tokens=100,
    latency_ms=1200.0,
    status="success",
)
```
### 4. Class-Based API

```python
from agentwatch import AgentWatch

watch = AgentWatch(server_url="http://localhost:8787", agent="my-agent")
watch.log_event(task_name="query", model="gpt-4o", ...)
```
## Dashboard
The dashboard is a single HTML file (~48 KB) with no build step, featuring a dark theme designed for long monitoring sessions:
- Metric Cards — Total cost, request count with success rate badge, p50/p95 latency, active agent count
- Cost & Tokens Chart — Dual Y-axis with cost as filled gradient line and tokens as dashed purple line
- Latency Chart — p50 and p95 with shaded band between them
- Live Event Feed — Real-time SSE stream with color-coded agent badges, task labels, token counts, and status dots
- Breakdown Tables — Sortable by agent, model, or task with color-coded error rate bars
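The live feed rides on Server-Sent Events, which is a plain-text protocol: each event is a `data:` line followed by a blank line, and the browser's `EventSource` parses the stream. A minimal sketch of the frame format the server would emit (payload fields are illustrative):

```python
import json

def sse_frame(event: dict) -> str:
    """Serialize one event as a Server-Sent Events frame:
    a 'data:' line carrying JSON, terminated by a blank line."""
    return f"data: {json.dumps(event)}\n\n"
```

On the client side, `new EventSource("/stream")` receives each frame's payload in `event.data`, ready for `JSON.parse`.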
## Built-In Cost Tracking
Pricing for popular models is included out of the box:
| Model | Input | Output |
|---|---|---|
| gpt-4o | $2.50/M | $10.00/M |
| gpt-4o-mini | $0.15/M | $0.60/M |
| claude-opus-4 | $15.00/M | $75.00/M |
| claude-sonnet-4 | $3.00/M | $15.00/M |
| claude-haiku-3.5 | $0.80/M | $4.00/M |
Models without a built-in pricing entry fall back to a default rate.
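Cost per call follows directly from the table: tokens divided by one million, times the per-million rate, summed over input and output. A sketch using the rates above (the `DEFAULT` fallback rate here is an assumption for illustration, not the server's actual default):

```python
# Per-million-token rates (USD) from the table above: (input, output).
PRICING = {
    "gpt-4o":           (2.50, 10.00),
    "gpt-4o-mini":      (0.15, 0.60),
    "claude-opus-4":    (15.00, 75.00),
    "claude-sonnet-4":  (3.00, 15.00),
    "claude-haiku-3.5": (0.80, 4.00),
}
DEFAULT = (1.00, 3.00)  # assumed fallback rate, not from the source

def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Compute call cost: tokens / 1M * per-million rate, input + output."""
    in_rate, out_rate = PRICING.get(model, DEFAULT)
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate
```

For example, the manual-logging event earlier (gpt-4o-mini, 500 input / 100 output tokens) works out to 500/1M × $0.15 + 100/1M × $0.60 = $0.000135.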
## Performance
| Metric | Value |
|---|---|
| Event ingestion | ~5,000 events/sec (batch mode) |
| Query latency | <50ms for typical queries |
| Dashboard refresh | ~500ms (parallel fetches) |
| Storage | ~50 KB per 1,000 events |
| Memory | ~50 MB baseline |
## Tech Stack
- Server: FastAPI with async/await throughout (~364 lines)
- Database: SQLite via aiosqlite with indexed queries and parameterized SQL (no injection vulnerabilities)
- SDK: Python with httpx, background threading for non-blocking event batching
- Dashboard: Vanilla HTML/CSS/JS with Chart.js — no Node.js, no build step
- CLI: `agentwatch serve` entry point with configurable host, port, and DB path
- Demo: Backfill mode generates 7 days of realistic simulated data (~3,500 events) with reproducible seeding
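The parameterized-SQL point above holds regardless of driver: values are bound via `?` placeholders instead of being interpolated into the query string. A synchronous `sqlite3` sketch of the pattern (the server itself uses `aiosqlite`; the table and column names here are illustrative, not the actual schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE events (
        id INTEGER PRIMARY KEY,
        agent_name TEXT NOT NULL,
        model TEXT NOT NULL,
        latency_ms REAL,
        status TEXT
    )
""")
# Index the column the dashboard breakdowns filter on.
conn.execute("CREATE INDEX idx_events_agent ON events (agent_name)")

# Placeholders bind values safely -- no string interpolation, no injection.
conn.execute(
    "INSERT INTO events (agent_name, model, latency_ms, status) VALUES (?, ?, ?, ?)",
    ("research-bot", "gpt-4o", 1200.0, "success"),
)
rows = conn.execute(
    "SELECT model, latency_ms FROM events WHERE agent_name = ?",
    ("research-bot",),
).fetchall()
```

The async version differs only in `await`ing the calls; the placeholder binding is identical.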
## Why AgentWatch
- Cost Control — Know exactly what each agent, model, and task costs before the bill arrives
- Reliability — Catch error-rate spikes and latency regressions in real time
- Visibility — Full audit trail of every LLM call without external dependencies
- Non-Intrusive — Tracking failures never break your application; events are silently dropped with a warning
- No Vendor Lock-In — Self-hosted, open source, MIT licensed, zero external services required
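The non-intrusive guarantee above reduces to one rule: any exception raised while recording an event is caught and downgraded to a warning instead of propagating into the agent. A minimal sketch of that failure mode (function names are hypothetical):

```python
import logging

logger = logging.getLogger("agentwatch")

def safe_log(send, event: dict) -> bool:
    """Attempt to send a tracking event; on any failure, drop the event
    with a warning so the caller's code path is never broken."""
    try:
        send(event)
        return True
    except Exception as exc:
        logger.warning("agentwatch: dropping event (%s)", exc)
        return False
```

The boolean return is for tests and diagnostics only; agent code can ignore it entirely.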