Created by: Taiwo Hassan
Last update: 6 days ago
🔎 AI Observability Toolkit for OpenObserve MCP Server (Logs + Traces)

An MCP server that exposes 10 specialized AI tools for deep observability over your OpenObserve logs and traces.

Designed for AI agents to perform:

  • Schema inspection
  • Error fingerprinting
  • Traffic anomaly detection
  • Latency profiling
  • Dependency bottleneck detection
  • Precise trace forensics

This server transforms OpenObserve into a structured, AI-queryable observability engine.


🧠 Overview

Instead of giving an AI raw log access, this MCP server provides purpose-built forensic tools.

The AI can:

  • Inspect available fields before writing queries
  • Detect true root causes by deduplicating repeated errors
  • Identify traffic spikes
  • Profile p99 latency per operation
  • Detect cold starts
  • Identify slow dependencies
  • Map exactly which span failed inside a trace
  • Run flexible SQL queries on demand

🛠️ Exposed MCP Tools


1️⃣ Stream Schema Inspection

Purpose:
Allows the AI to see available fields before constructing queries.

What it does:

  • Executes DESCRIBE default
  • Prevents hallucinated fields
  • Enables safe query generation

2️⃣ Unique Error Fingerprinting

Purpose:
Groups identical error messages to reveal true root causes.

What it does:

  • Groups by message
  • Counts occurrences
  • Returns top recurring failures

3️⃣ Volume Trend Analysis

Purpose:
Detects sudden spikes in log volume.

What it does:

  • 1-minute histogram over _timestamp
  • Surfaces abnormal traffic bursts
  • Useful for detecting DDoS or recursive loops

4️⃣ Log Pattern Discovery

Purpose:
Summarizes common log prefixes to understand normal behavior.

What it does:

  • Groups by first 20 characters of message
  • Helps anomaly detection
  • Builds behavioral baseline

5️⃣ P99 Latency Analysis (Traces)

Purpose:
Identifies the slowest 1% of operations.

What it does:

  • Uses approx_percentile_cont(duration, 0.99)
  • Groups by operation_name
  • Surfaces performance outliers

6️⃣ Cold-Start Identification (Traces)

Purpose:
Detects slow initialization spans.

What it does:

  • Filters operation_name = 'init'
  • Identifies unusually long startup spans
  • Useful for serverless and containerized systems

7️⃣ Dependency Hotspots (Traces)

Purpose:
Finds which external service causes the most delay.

What it does:

  • Groups by service_name
  • Calculates average duration
  • Orders by slowest dependency

8️⃣ SQL Logs Query

Purpose:
Flexible SQL execution for logs.

What it does:

  • Accepts full SQL query
  • Time-bounded search
  • Supports root cause analysis, security auditing, performance debugging

9️⃣ Span Error Mapping (Traces)

Purpose:
Pinpoints exactly which span failed inside a trace.

What it does:

  • Filters spans where status_code >= 400
  • Requires trace_id
  • Returns span_id and operation_name

🔟 SQL Traces Query

Purpose:
Flexible SQL execution for trace data.

What it does:

  • Accepts full SQL query
  • Time-bounded search
  • Enables advanced trace-level investigations

⚙️ Architecture Notes

  • All tools are exposed via MCP Server Trigger
  • Connected using ai_tool bindings
  • Queries are time-bounded using start_time and end_time
  • Uses HTTP Basic Auth for OpenObserve
  • Logs endpoint: /api/default/_search
  • Traces endpoint: /api/default/_search?type=traces

🚀 Use Cases

  • Why did my API slow down in the last hour?
  • What are the most common errors today?
  • Which service dependency is causing latency?
  • Show me where this trace failed.
  • Is there abnormal traffic right now?

The AI can now answer all of these using your real telemetry.