As AI agents move into production, knowing what they cost, how often they fail, and how fast they respond becomes critical. AgentWatch is a self-hosted observability platform that gives you complete visibility into your AI agent workflows — token usage, latency (p50/p95), error rates, and cost per task — all in real time, with zero vendor lock-in. The entire system is a single FastAPI server (~364 lines), a SQLite database, and a vanilla JS dashboard with no build step required.
## Key Features
| Feature | Details |
|---|---|
| Zero-Code Integration | Auto-patches OpenAI and Anthropic clients — no code changes needed |
| Real-Time Streaming | Live event feed via Server-Sent Events (SSE) |
| Cost Tracking | Built-in per-model pricing for GPT-4o, Claude, and more |
| Latency Insights | p50, p95, and average latency per agent/model/task |
| Error Monitoring | Error rates broken down by agent, model, and task |
| Self-Hosted | SQLite + FastAPI — no Postgres, no external dependencies |
| Pip Installable | `pip install -e .` and `agentwatch serve` to start |
## Architecture
```
┌──────────────────────────────────────────────────────────────┐
│                      Your AI Agent Code                      │
└──────────────────────────────────────────────────────────────┘
                               ↓
                     ┌──────────────────┐
                     │  AgentWatch SDK  │
                     │  (auto-patching, │
                     │   decorator,     │
                     │   manual log)    │
                     └──────────────────┘
                               ↓
                     ┌──────────────────┐
                     │   Background     │
                     │   Event Sender   │
                     │   (batching,     │
                     │    threading)    │
                     └──────────────────┘
                               ↓
                 ┌─────────────────────────┐
                 │  FastAPI Server (8787)  │
                 │  ├─ REST API endpoints  │
                 │  ├─ SSE streaming       │
                 │  └─ Static dashboard    │
                 └─────────────────────────┘
                               ↓
                 ┌─────────────────────────┐
                 │    SQLite Database      │
                 │    (events, metrics)    │
                 └─────────────────────────┘
                               ↓
                 ┌─────────────────────────┐
                 │   Dashboard (Browser)   │
                 │   ├─ Metric cards       │
                 │   ├─ Charts (cost, lat) │
                 │   ├─ Live event feed    │
                 │   └─ Breakdown tables   │
                 └─────────────────────────┘
```
## Components
| Component | Purpose | Tech |
|---|---|---|
| SDK | Client instrumentation library | Python threading, httpx |
| Server | REST API + SSE, async event handling | FastAPI, aiosqlite |
| Database | Event storage & metrics aggregation | SQLite, async queries |
| Dashboard | Real-time monitoring UI | Vanilla JS, Chart.js, SSE |
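The SDK row above lists "Python threading" because events are handed to a background sender rather than posted inline. As a rough sketch of that batching pattern (names and defaults are hypothetical, and the HTTP call is replaced by an injectable `flush` callable — in the real SDK it would be an `httpx` POST):

```python
import queue
import threading

class EventSender:
    """Illustrative batching sender: agent code enqueues events and a
    daemon thread flushes them in batches, so logging never blocks."""

    def __init__(self, flush, batch_size=50, interval=1.0):
        self._queue = queue.Queue()
        self._flush = flush            # e.g. lambda batch: httpx.post(url, json=batch)
        self._batch_size = batch_size
        self._interval = interval      # max seconds to wait for more events
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def log(self, event: dict):
        # Called from the agent's hot path; only enqueues, never blocks on I/O.
        self._queue.put(event)

    def _run(self):
        batch = []
        while not self._stop.is_set() or not self._queue.empty():
            try:
                batch.append(self._queue.get(timeout=self._interval))
            except queue.Empty:
                pass
            # Flush when the batch is full, or when the queue drains.
            if batch and (len(batch) >= self._batch_size or self._queue.empty()):
                self._flush(batch)
                batch = []
        if batch:                      # flush anything left on shutdown
            self._flush(batch)

    def close(self):
        self._stop.set()
        self._thread.join()
```

Injecting `flush` keeps the sketch testable; the same shape works whether the sink is an HTTP endpoint or a local list.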
## SDK Integration Modes
AgentWatch provides four ways to instrument your agents:
### 1. Auto-Patching (Zero Code Changes)

```python
import openai

import agentwatch

agentwatch.init(
    server_url="http://localhost:8787",
    agent="research-bot",
    auto_patch=True,
)

# All OpenAI/Anthropic calls are now tracked automatically
client = openai.OpenAI()
response = client.chat.completions.create(model="gpt-4o", ...)
# ✅ Tokens, latency, cost recorded automatically
```
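Under the hood, auto-patching amounts to replacing a client method with a timing wrapper — a standard monkey-patching pattern. A minimal sketch of the mechanism (not AgentWatch's actual patcher; `record` stands in for the event pipeline):

```python
import functools
import time

def patch_method(obj, name, record):
    """Replace obj.<name> with a wrapper that times each call and
    reports success/error to `record`. Illustrative only."""
    original = getattr(obj, name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            result = original(*args, **kwargs)
            status = "success"
            return result
        except Exception:
            status = "error"
            raise  # tracking observes failures; it never swallows them
        finally:
            record({
                "method": name,
                "latency_ms": (time.perf_counter() - start) * 1000,
                "status": status,
            })

    setattr(obj, name, wrapper)
```

Because the wrapper re-raises, the caller sees exactly the behavior of the unpatched client; only the side-channel event is added.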
### 2. Decorator (Selective Tracking)

```python
from agentwatch import track

@track(task="summarize-documents", agent="research-bot")
def summarize(text):
    response = client.chat.completions.create(...)
    return response.choices[0].message.content
```
### 3. Manual Logging (Full Control)

```python
agentwatch.log_event(
    task_name="custom-analysis",
    agent_name="analytics-bot",
    model="gpt-4o-mini",
    input_tokens=500,
    output_tokens=100,
    latency_ms=1200.0,
    status="success",
)
```
### 4. Class-Based API

```python
from agentwatch import AgentWatch

watch = AgentWatch(server_url="http://localhost:8787", agent="my-agent")
watch.log_event(task_name="query", model="gpt-4o", ...)
```
## Dashboard
The dashboard is a single HTML file (~48 KB) with no build step, featuring a dark theme designed for long monitoring sessions:
- Metric Cards — Total cost, request count with success rate badge, p50/p95 latency, active agent count
- Cost & Tokens Chart — Dual Y-axis with cost as filled gradient line and tokens as dashed purple line
- Latency Chart — p50 and p95 with shaded band between them
- Live Event Feed — Real-time SSE stream with color-coded agent badges, task labels, token counts, and status dots
- Breakdown Tables — Sortable by agent, model, or task with color-coded error rate bars
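The live feed rides on Server-Sent Events, which is a plain-text protocol: each event is a `data:` line followed by a blank line, and the browser's `EventSource` parses the stream. A minimal sketch of the frame format the server would emit (payload fields are illustrative):

```python
import json

def sse_frame(event: dict) -> str:
    """Serialize one event as a Server-Sent Events frame:
    a 'data:' line carrying JSON, terminated by a blank line."""
    return f"data: {json.dumps(event)}\n\n"
```

On the client side, `new EventSource("/stream")` receives each frame's payload in `event.data`, ready for `JSON.parse`.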
## Built-In Cost Tracking
Pricing for popular models is included out of the box:
| Model | Input | Output |
|---|---|---|
| gpt-4o | $2.50/M | $10.00/M |
| gpt-4o-mini | $0.15/M | $0.60/M |
| claude-opus-4 | $15.00/M | $75.00/M |
| claude-sonnet-4 | $3.00/M | $15.00/M |
| claude-haiku-3.5 | $0.80/M | $4.00/M |
Models without a built-in pricing entry fall back to a default rate.
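Cost per call follows directly from the table: tokens divided by one million, times the per-million rate, summed over input and output. A sketch using the rates above (the `DEFAULT` fallback rate here is an assumption for illustration, not the server's actual default):

```python
# Per-million-token rates (USD) from the table above: (input, output).
PRICING = {
    "gpt-4o":           (2.50, 10.00),
    "gpt-4o-mini":      (0.15, 0.60),
    "claude-opus-4":    (15.00, 75.00),
    "claude-sonnet-4":  (3.00, 15.00),
    "claude-haiku-3.5": (0.80, 4.00),
}
DEFAULT = (1.00, 3.00)  # assumed fallback rate, not from the source

def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Compute call cost: tokens / 1M * per-million rate, input + output."""
    in_rate, out_rate = PRICING.get(model, DEFAULT)
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate
```

For example, the manual-logging event earlier (gpt-4o-mini, 500 input / 100 output tokens) works out to 500/1M × $0.15 + 100/1M × $0.60 = $0.000135.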
## Performance
| Metric | Value |
|---|---|
| Event ingestion | ~5,000 events/sec (batch mode) |
| Query latency | <50ms for typical queries |
| Dashboard refresh | ~500ms (parallel fetches) |
| Storage | ~50 KB per 1,000 events |
| Memory | ~50 MB baseline |
## Tech Stack
- Server: FastAPI with async/await throughout (~364 lines)
- Database: SQLite via aiosqlite with indexed queries and parameterized SQL (no injection vulnerabilities)
- SDK: Python with httpx, background threading for non-blocking event batching
- Dashboard: Vanilla HTML/CSS/JS with Chart.js — no Node.js, no build step
- CLI: `agentwatch serve` entry point with configurable host, port, and DB path
- Demo: Backfill mode generates 7 days of realistic simulated data (~3,500 events) with reproducible seeding
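The parameterized-SQL point above holds regardless of driver: values are bound via `?` placeholders instead of being interpolated into the query string. A synchronous `sqlite3` sketch of the pattern (the server itself uses `aiosqlite`; the table and column names here are illustrative, not the actual schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE events (
        id INTEGER PRIMARY KEY,
        agent_name TEXT NOT NULL,
        model TEXT NOT NULL,
        latency_ms REAL,
        status TEXT
    )
""")
# Index the column the dashboard breakdowns filter on.
conn.execute("CREATE INDEX idx_events_agent ON events (agent_name)")

# Placeholders bind values safely -- no string interpolation, no injection.
conn.execute(
    "INSERT INTO events (agent_name, model, latency_ms, status) VALUES (?, ?, ?, ?)",
    ("research-bot", "gpt-4o", 1200.0, "success"),
)
rows = conn.execute(
    "SELECT model, latency_ms FROM events WHERE agent_name = ?",
    ("research-bot",),
).fetchall()
```

The async version differs only in `await`ing the calls; the placeholder binding is identical.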
## Why AgentWatch
- Cost Control — Know exactly what each agent, model, and task costs before the bill arrives
- Reliability — Catch error-rate spikes and latency regressions in real time
- Visibility — Full audit trail of every LLM call without external dependencies
- Non-Intrusive — Tracking failures never break your application; events are silently dropped with a warning
- No Vendor Lock-In — Self-hosted, open source, MIT licensed, zero external services required
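The non-intrusive guarantee above reduces to one rule: any exception raised while recording an event is caught and downgraded to a warning instead of propagating into the agent. A minimal sketch of that failure mode (function names are hypothetical):

```python
import logging

logger = logging.getLogger("agentwatch")

def safe_log(send, event: dict) -> bool:
    """Attempt to send a tracking event; on any failure, drop the event
    with a warning so the caller's code path is never broken."""
    try:
        send(event)
        return True
    except Exception as exc:
        logger.warning("agentwatch: dropping event (%s)", exc)
        return False
```

The boolean return is for tests and diagnostics only; agent code can ignore it entirely.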