# MCP Usage Guide
This guide shows you how to use PulseStream’s MCP tools in practice. Each section includes the tool calls involved and realistic example output so you know what to expect.
## MCP vs. the dashboard

Use the dashboard for overview, triage, and post-incident review. It gives you filtered views of incident groups, service health trends, and analysis history.
Use MCP when you are actively debugging a problem and want to ask questions in plain language from your IDE or terminal. MCP is conversational. You can follow up, drill down, and carry context across multiple questions without leaving your editor.
The two complement each other. A typical flow: notice a problem in the dashboard, switch to MCP to investigate, come back to the dashboard to verify things are stable after you fix it.
## Verify your connection first

Before investigating anything, confirm your token and workspace are correct.
Prompt: "Am I connected to PulseStream?"
Tool called: whoami
Example output:
```
PulseStream Connection Status
=============================
Workspace: production (ws_01abc23def456)
Organization: org_78xyz
Role: Customer
Cluster: cluster-uid-af3b2
Scopes: read:namespaces, read:incidents, read:services, chat
Connection: OK
```

If the workspace or cluster looks wrong, generate a new token at Settings - Access Tokens and call set_token.
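The examples in this guide use natural-language prompts from an MCP-aware editor or agent. If you want to drive the same tools from a script instead, here is a minimal sketch using the Python MCP client SDK. The pulsestream-mcp launch command and its arguments are assumptions — substitute whatever your PulseStream MCP server installation actually uses.

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Assumption: the PulseStream MCP server runs locally over stdio and is started
# by a "pulsestream-mcp" command. Adjust command, args, and env for your setup.
SERVER = StdioServerParameters(command="pulsestream-mcp", args=[])


async def check_connection() -> None:
    async with stdio_client(SERVER) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Same check as the "Am I connected?" prompt above.
            result = await session.call_tool("whoami")
            print(result.content[0].text)


asyncio.run(check_connection())
```

The later sketches in this guide assume a session opened this way and only show the call_tool invocations.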
## Workflow 1: Investigating an active incident

You get paged. There is an open critical incident. You want to know what is happening and what to do about it.
1. **List open incidents**

   Prompt: "Show me all open critical incidents"

   Tool called: list_incident_groups with status=open, severity=critical

   Example output:

   ```
   Incident Groups (2 total):

   1. OOMKilled: payment-processor - repeated restarts
      - ID: grp_9kx2mq7p
      - Status: open
      - Severity: critical
      - Incidents: 14
      - Service: payment-processor
      - First seen: 4/3/2026, 1:12:00 AM
      - Last seen: 4/3/2026, 3:47:00 AM

   2. CrashLoopBackOff: order-api - exit code 137
      - ID: grp_4rn8wt1c
      - Status: open
      - Severity: critical
      - Incidents: 6
      - Service: order-api
      - First seen: 4/3/2026, 2:55:00 AM
      - Last seen: 4/3/2026, 3:44:00 AM
   ```
2. **Get incident details**

   Prompt: "Tell me more about the payment-processor incident"

   Tool called: get_incident_group with group_id=grp_9kx2mq7p

   Example output (truncated):

   ```json
   {
     "id": "grp_9kx2mq7p",
     "title": "OOMKilled: payment-processor - repeated restarts",
     "status": "open",
     "severity": "critical",
     "incident_count": 14,
     "ai_analyzed": false,
     "service_name": "payment-processor",
     "service_namespace": "production",
     "first_seen": "2026-04-03T01:12:00Z",
     "last_seen": "2026-04-03T03:47:00Z",
     "incidents": [
       {
         "id": "inc_m3z9p",
         "error_message": "OOMKilled",
         "event_count": 8,
         "severity": "critical",
         "last_seen": "2026-04-03T03:47:00Z",
         "source": "kubernetes"
       }
     ]
   }
   ```
3. **Run AI root cause analysis**

   Prompt: "Analyze the payment-processor incident - what is the root cause?"

   Tool called: analyze_incident_group with group_id=grp_9kx2mq7p

   Example output (streaming, then delivered as a full response):

   ```
   ## Root Cause Analysis: payment-processor OOMKilled

   **Probable cause:** Memory leak in the Stripe webhook handler. The service is
   consuming unbounded memory when processing large webhook payloads (> 500KB),
   specifically in the event deserialization path introduced in the v2.8.1 deploy
   at 00:45 UTC.

   **Evidence:**
   - 14 OOMKilled events since 01:12 UTC, all within 35-90 minutes of pod start
   - Memory usage climbs steadily from ~180MB to the 512MB limit before each kill
   - Pattern consistent with object accumulation, not a spike load issue
   - The 14 events correlate with the Stripe webhook volume seen during peak hours

   **Affected scope:** 3 replicas of payment-processor in namespace production

   **Recommended actions:**
   1. Temporarily increase memory limit to 1Gi to stop the crash loop
   2. Review webhook payload parsing in src/webhooks/stripe.py - look for list accumulation across requests
   3. Deploy a fix before reverting the memory limit increase
   4. Set a memory alert at 80% to catch this pattern earlier next time

   **Related services:** order-api depends on payment-processor for checkout
   confirmation. It will degrade if payment-processor remains unstable.
   ```
4. **Ask follow-up questions**

   You can ask follow-up questions in the same conversation thread. Pass conversation_id back to preserve context across turns (a scripted version of this pattern appears after this list).

   Prompt: "What changed in the last deploy that could have caused this?"

   Tool called: send_group_chat_message with group_id=grp_9kx2mq7p, message="What changed in the last deploy...", conversation_id=<from previous response>

   The AI has context from the prior analysis and can give a more targeted answer without re-examining all the evidence.
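Scripted, the whole workflow looks roughly like this. It is a sketch that assumes a ClientSession opened and initialized as in the connection-check example; the group and conversation IDs are the example values from the outputs above, and in practice you would read them from the responses.

```python
from mcp import ClientSession


async def triage_payment_processor(session: ClientSession) -> None:
    # 1. List open critical incident groups.
    groups = await session.call_tool(
        "list_incident_groups", {"status": "open", "severity": "critical"}
    )
    print(groups.content[0].text)

    # 2. Get raw details, then trigger AI root cause analysis for one group.
    group_id = "grp_9kx2mq7p"  # taken from the listing above
    details = await session.call_tool("get_incident_group", {"group_id": group_id})
    print(details.content[0].text)
    analysis = await session.call_tool("analyze_incident_group", {"group_id": group_id})
    print(analysis.content[0].text)

    # 3. Follow up in the same thread; the conversation_id comes back with the analysis.
    followup = await session.call_tool(
        "send_group_chat_message",
        {
            "group_id": group_id,
            "message": "What changed in the last deploy that could have caused this?",
            "conversation_id": "conv_5xm3qt9a",  # example ID; read it from the real response
        },
    )
    print(followup.content[0].text)
```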
## Workflow 2: Proactive service investigation

You do not have an active incident, but something feels wrong. Your API gateway latency is elevated and you want to understand why.
Prompt: "Why is latency increasing on the api-gateway service in production?"
Tool called: investigate_service with service_name=api-gateway, namespace=production, message="Why is latency increasing on this service?"
The tool scopes the AI investigation to api-gateway in production and streams back findings. The response includes a conversation_id at the end for follow-ups.
Example output:
```
## Service Investigation: api-gateway (production)

**Observation:** api-gateway is showing a p99 latency increase of ~340ms starting
around 02:20 UTC. The service itself is healthy (2/2 replicas running), but upstream
calls to payment-processor are timing out intermittently.

**Root cause:** The elevated latency is downstream. api-gateway has a 5-second
timeout on payment-processor calls. When payment-processor pods restart due to
OOMKilled (see grp_9kx2mq7p), in-flight requests to api-gateway time out waiting
for a response that never comes.

**Current state:**
- api-gateway: healthy, 2 replicas, no errors in its own logs
- payment-processor: critical - 14 OOMKilled events since 01:12 UTC
- Dependency chain: api-gateway -> payment-processor -> stripe

**Recommended actions:**
1. Fix payment-processor first - that is the actual problem
2. Consider adding a circuit breaker on api-gateway's payment-processor calls to fail fast instead of waiting on timeouts
3. Return a 503 with a retry-after header instead of a timeout to callers

---
conversation_id: conv_7tq4mr2x
```

Follow up: "What would a circuit breaker look like for this dependency?"
Tool called: investigate_service with conversation_id=conv_7tq4mr2x
The AI picks up where it left off, with full context of what it already found.
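The same investigation as a script — again a sketch assuming a session opened as in the connection-check example, with conv_7tq4mr2x standing in for the conversation_id returned by the first call:

```python
from mcp import ClientSession


async def investigate_gateway_latency(session: ClientSession) -> None:
    first = await session.call_tool(
        "investigate_service",
        {
            "service_name": "api-gateway",
            "namespace": "production",
            "message": "Why is latency increasing on this service?",
        },
    )
    print(first.content[0].text)  # ends with a conversation_id, e.g. conv_7tq4mr2x

    # Continue the same thread by passing the conversation_id back.
    followup = await session.call_tool(
        "investigate_service",
        {
            "service_name": "api-gateway",
            "namespace": "production",
            "conversation_id": "conv_7tq4mr2x",
            "message": "What would a circuit breaker look like for this dependency?",
        },
    )
    print(followup.content[0].text)
```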
## Workflow 3: Namespace health check after a deployment

You just deployed a new version to the staging namespace and want to confirm everything looks healthy before promoting to production.
Prompt: "Check the staging namespace - everything healthy after the deployment?"
Tool called: investigate_namespace with namespace=staging, message="Check overall health after a recent deployment. Are any services unhealthy, crashing, or showing elevated error rates?"
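If you are scripting this check rather than prompting from your editor, a minimal sketch (assuming a session opened as in the connection-check example) might be:

```python
from mcp import ClientSession


async def post_deploy_check(session: ClientSession) -> None:
    report = await session.call_tool(
        "investigate_namespace",
        {
            "namespace": "staging",
            "message": (
                "Check overall health after a recent deployment. Are any services "
                "unhealthy, crashing, or showing elevated error rates?"
            ),
        },
    )
    print(report.content[0].text)
```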
Example output:
```
## Namespace Health: staging

**Summary:** staging is mostly healthy. One service needs attention.

**Healthy services (4):**
- api-gateway: 2/2 replicas running
- order-api: 3/3 replicas running
- notification-service: 1/1 replicas running
- frontend: 2/2 replicas running

**Needs attention (1):**
- worker-jobs: 1/3 replicas running. 2 pods are in CrashLoopBackOff with exit code 1. Error in pod logs: "DATABASE_URL environment variable not set". This suggests a missing or incorrectly named environment variable in the new deployment's ConfigMap.

**Recommended actions:**
1. Check the ConfigMap for worker-jobs - likely missing DATABASE_URL
2. Compare environment variables between the previous and current deployment
3. Redeploy with the correct env vars

**No new incidents detected** in the last 30 minutes for this namespace.

---
conversation_id: conv_2bp8ks5n
```

## Workflow 4: Resuming a previous investigation
You were investigating a memory issue yesterday and need to pick up where you left off.
1. **List previous conversations**

   Prompt: "Show me my recent investigations"

   Tool called: list_conversations

   Example output:

   ```
   Investigation Conversations (3 total):

   1. Service investigation: payment-processor
      - ID: conv_7tq4mr2x
      - Scope: namespace (production)
      - Messages: 6
      - Last: "What would a circuit breaker look like for this dependency?..."
      - Last Activity: 4/3/2026, 3:58:00 AM

   2. Service investigation: worker-jobs
      - ID: conv_2bp8ks5n
      - Scope: namespace (staging)
      - Messages: 4
      - Last: "Can you look at the ConfigMap diff between v2.7 and v2.8?..."
      - Last Activity: 4/2/2026, 11:14:00 PM

   3. Incident investigation: grp_9kx2mq7p
      - ID: conv_5xm3qt9a
      - Scope: cluster
      - Messages: 3
      - Last: "What changed in the last deploy that could have caused this?..."
      - Last Activity: 4/3/2026, 2:21:00 AM

   Use the conversation ID with investigate_cluster, investigate_namespace,
   investigate_service, or send_group_chat_message to resume.
   ```
2. **Resume the conversation**

   Prompt: "Resume the payment-processor investigation and tell me if the OOM issue is resolved"

   Tool called: investigate_service with service_name=payment-processor, namespace=production, conversation_id=conv_7tq4mr2x, message="Is the OOM issue resolved?"

   The AI has the full prior context, including what it found before, and can give a direct update without starting over.
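Scripted, the resume flow is two calls. A sketch, assuming a session opened as in the connection-check example and using the example conversation ID from the listing above:

```python
from mcp import ClientSession


async def resume_oom_investigation(session: ClientSession) -> None:
    # Find the conversation you want to pick back up.
    conversations = await session.call_tool("list_conversations")
    print(conversations.content[0].text)

    # Resume it; the AI keeps everything it found in the earlier turns.
    update = await session.call_tool(
        "investigate_service",
        {
            "service_name": "payment-processor",
            "namespace": "production",
            "conversation_id": "conv_7tq4mr2x",  # example ID from the listing above
            "message": "Is the OOM issue resolved?",
        },
    )
    print(update.content[0].text)
```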
## Multi-turn conversations: what context is retained

When you pass a conversation_id in a follow-up call, the AI has access to:
- Every question you asked in that conversation
- Every answer it gave
- The service, namespace, or incident group the conversation was scoped to
It does not re-run the full investigation from scratch on every turn. Follow-up answers are faster and more targeted because the AI is building on what it already knows.
The conversation_id is always returned at the end of responses from investigate_namespace, investigate_cluster, investigate_service, and send_group_chat_message. Copy it into your next message to keep the thread going.
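If you are scripting follow-ups, you can capture the conversation_id from the response text and feed it into the next call. A sketch, assuming a session opened as in the connection-check example and assuming the ID appears on a trailing "conversation_id: ..." line as in the example outputs above:

```python
import re
from typing import Optional

from mcp import ClientSession


async def ask_with_memory(
    session: ClientSession, question: str, conversation_id: Optional[str] = None
) -> Optional[str]:
    args = {"namespace": "production", "message": question}
    if conversation_id:
        args["conversation_id"] = conversation_id
    result = await session.call_tool("investigate_namespace", args)
    text = result.content[0].text
    print(text)

    # Responses end with a line like "conversation_id: conv_7tq4mr2x";
    # capture it so the next call stays in the same thread.
    match = re.search(r"conversation_id:\s*(\S+)", text)
    return match.group(1) if match else None
```

Each call hands back the ID for the next one, so a chain of questions stays in a single thread.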
## Tool categories: when to use what

### Incident tools

Use these when you have an active alert or a known incident.
| Tool | When to use |
|---|---|
| list_incident_groups | Start here. See all open incidents, filtered by severity or status. |
| get_incident_group | Get raw details and the list of individual incidents in a group. |
| analyze_incident_group | Trigger AI root cause analysis. Use this once per incident group. |
| send_group_chat_message | Ask follow-up questions about a specific incident. |
### Investigation tools

Use these for proactive debugging when you suspect a problem but there is no incident yet, or when you want to understand system state without a specific alert.
| Tool | When to use |
|---|---|
| investigate_service | You know which service you care about. Most targeted. |
| investigate_namespace | You want a health check of a whole namespace - good after deployments. |
| investigate_cluster | You have no idea where the problem is. Broadest scan. |
### Service and namespace tools

Use these to gather context before or alongside an investigation.
| Tool | When to use |
|---|---|
| list_services | See all services with their namespace, replica count, and analysis status. |
| get_service_details | Get full details on one service, including codebase analysis summary. |
| list_namespaces | See all namespaces with service counts and analysis state. |
| get_namespace | Get full details on one namespace. |
| get_namespace_analysis | Read a previously completed codebase analysis for a namespace. |
| analyze_namespace | Trigger a fresh codebase analysis. Slow but comprehensive. |
### Conversation tools

| Tool | When to use |
|---|---|
| list_conversations | Find a previous investigation you want to continue. |
| get_conversation | Get metadata on a specific conversation before resuming it. |
### Audit tool

| Tool | When to use |
|---|---|
| query_audit_logs | Search for who did what and when. Useful for post-incident review. |