Created by: Taiwo Hassan
Last update: 6 days ago
🔎 AI Observability Toolkit for OpenObserve MCP Server (Logs + Traces)

An MCP server that exposes 10 specialized AI tools for deep observability over your OpenObserve logs and traces.

Designed for AI agents to perform:

  • Schema inspection
  • Error fingerprinting
  • Traffic anomaly detection
  • Latency profiling
  • Dependency bottleneck detection
  • Precise trace forensics

This server transforms OpenObserve into a structured, AI-queryable observability engine.


🧠 Overview

Instead of giving an AI raw log access, this MCP server provides purpose-built forensic tools.

The AI can:

  • Inspect available fields before writing queries
  • Detect true root causes by deduplicating repeated errors
  • Identify traffic spikes
  • Profile p99 latency per operation
  • Detect cold starts
  • Identify slow dependencies
  • Map exactly which span failed inside a trace
  • Run flexible SQL queries on demand

🛠️ Exposed MCP Tools


1️⃣ Stream Schema Inspection

Purpose:
Allows the AI to see available fields before constructing queries.

What it does:

  • Executes DESCRIBE default
  • Prevents hallucinated fields
  • Enables safe query generation

2️⃣ Unique Error Fingerprinting

Purpose:
Groups identical error messages to reveal true root causes.

What it does:

  • Groups by message
  • Counts occurrences
  • Returns top recurring failures

3️⃣ Volume Trend Analysis

Purpose:
Detects sudden spikes in log volume.

What it does:

  • 1-minute histogram over _timestamp
  • Surfaces abnormal traffic bursts
  • Useful for detecting DDoS or recursive loops

4️⃣ Log Pattern Discovery

Purpose:
Summarizes common log prefixes to understand normal behavior.

What it does:

  • Groups by first 20 characters of message
  • Helps anomaly detection
  • Builds behavioral baseline

5️⃣ P99 Latency Analysis (Traces)

Purpose:
Identifies the slowest 1% of operations.

What it does:

  • Uses approx_percentile_cont(duration, 0.99)
  • Groups by operation_name
  • Surfaces performance outliers

6️⃣ Cold-Start Identification (Traces)

Purpose:
Detects slow initialization spans.

What it does:

  • Filters operation_name = 'init'
  • Identifies unusually long startup spans
  • Useful for serverless and containerized systems

7️⃣ Dependency Hotspots (Traces)

Purpose:
Finds which external service causes the most delay.

What it does:

  • Groups by service_name
  • Calculates average duration
  • Orders by slowest dependency

8️⃣ SQL Logs Query

Purpose:
Flexible SQL execution for logs.

What it does:

  • Accepts full SQL query
  • Time-bounded search
  • Supports root cause analysis, security auditing, performance debugging

9️⃣ Span Error Mapping (Traces)

Purpose:
Pinpoints exactly which span failed inside a trace.

What it does:

  • Filters spans where status_code >= 400
  • Requires trace_id
  • Returns span_id and operation_name

🔟 SQL Traces Query

Purpose:
Flexible SQL execution for trace data.

What it does:

  • Accepts full SQL query
  • Time-bounded search
  • Enables advanced trace-level investigations

⚙️ Architecture Notes

  • All tools are exposed via MCP Server Trigger
  • Connected using ai_tool bindings
  • Queries are time-bounded using start_time and end_time
  • Uses HTTP Basic Auth for OpenObserve
  • Logs endpoint: /api/default/_search
  • Traces endpoint: /api/default/_search?type=traces

🚀 Use Cases

  • Why did my API slow down in the last hour?
  • What are the most common errors today?
  • Which service dependency is causing latency?
  • Show me where this trace failed.
  • Is there abnormal traffic right now?

The AI can now answer all of these using your real telemetry.