
MCP Server Setup

CheckUpstream exposes an MCP (Model Context Protocol) server as a remote HTTP endpoint. Point your AI tool at one URL and OAuth handles the rest — no binary to install, no env vars to manage, no npm package to pin.


Zero-install setup

Drop this into your AI tool's MCP config and you're done:

{
  "mcpServers": {
    "checkupstream": {
      "url": "https://checkupstream.com/api/mcp/transport",
      "transport": "http"
    }
  }
}

Paths:

  • Claude Desktop (macOS) — ~/Library/Application Support/Claude/claude_desktop_config.json
  • Claude Code — project-level .mcp.json or user-level ~/.claude/mcp.json
  • Cursor — Settings → MCP → Add new MCP server → paste the URL

The first time Claude or Cursor calls a CheckUpstream tool, it opens your browser to the CheckUpstream consent screen. Approve the request and the AI tool caches a short-lived bearer token that it refreshes on its own. Nothing is stored on disk in plain text.


What you get

CheckUpstream's MCP server exposes three kinds of building blocks, and they're designed to be composed:

  • Tools — atomic operations the LLM calls to fetch data or perform an action (check_service, diagnose_error, acknowledge_incident). Reads are safe to auto-approve; writes are scope-gated.
  • Prompts — pre-built multi-step workflow templates that tell the LLM which tools to chain and how to interpret the results (debug_upstream_error chains diagnose_error → get_blast_radius → get_service_history). Invoke them from your AI tool's prompt picker.
  • Resources — URI-addressable entities (checkupstream://services/slug/stripe) the LLM can fetch directly or drill into from a tool result. Auto-complete against your org's actual services, projects, and incidents.

Tool vs. prompt. diagnose_error is a tool — it returns a verdict for one error message. debug_upstream_error is a prompt — it tells the LLM to call diagnose_error, reason about the result, then chain follow-up tools depending on the verdict. Use tools when you know exactly what you want; use prompts when you want the LLM to run a playbook.
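To make the division of labor concrete, here is a minimal Python sketch of the playbook the debug_upstream_error prompt describes. It is illustrative only: `call_tool` stands in for whatever your MCP client uses to dispatch a tool, and the `correlated_service` and `verdict` field names are assumptions, not CheckUpstream's actual schema.

```python
def run_debug_upstream_error(call_tool, error_message):
    """Illustrative playbook for the debug_upstream_error prompt.

    call_tool is whatever your MCP client uses to dispatch a tool;
    the field names below are hypothetical, not CheckUpstream's schema.
    """
    # Step 1: one atomic tool call -- error message in, diagnosis out.
    diagnosis = call_tool("diagnose_error", {"error": error_message})

    # Step 2: the prompt tells the LLM to branch on the result.
    service = diagnosis.get("correlated_service")
    if service:
        # Correlated with a known outage: widen the investigation.
        blast = call_tool("get_blast_radius", {"service": service})
        history = call_tool("get_service_history", {"service": service})
        return {"verdict": "upstream", "service": service,
                "blast_radius": blast, "history": history}

    # No correlation: treat the error as local to your code.
    return {"verdict": "local", "service": None}
```

In other words: reach for a tool when you already know the shape of the question, and for the prompt when you want this kind of branching run for you.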

Auto-generated from live introspection on 2026-04-13. 81 tools, 4 prompts, 4 resource templates. Run pnpm run inventory:mcp to refresh.

Read tools (60)

Scope: read. Auto-approvable by most clients.

  • audit_dependencies — Scan your project dependencies for upstream service risk — maps packages to services and flags active incidents.
  • check_service — Check the current status of a specific upstream service (e.g., Stripe, OpenAI, AWS).
  • dashboard_stats — Get a summary of your organization: project count, services monitored, active incidents.
  • diagnose_error — Analyze an error message and check if it correlates with a known upstream service outage.
  • find_projects_using_package — Reverse dependency lookup: which of your projects depend on this package, and what versions are pinned.
  • generate_postmortem — Generate a structured post-incident review for a specific incident.
  • get_active_incidents — Active/unresolved incidents affecting your tracked upstream dependencies.
  • get_architecture_recommendations — AI-generated architecture improvement suggestions based on the org's topology, dependencies, and incident history.
  • get_blast_radius — Analyze the blast radius of a service outage — shows cascade-affected downstream services, impacted projects, and organizations.
  • get_composite_sla_scorecard — Aggregate uptime and breach state for a composite SLA over a window.
  • get_dependency_graph — Full dependency graph for visualisation — projects, services, and the edges between them.
  • get_dependency_health — Summarize upstream health across your dependencies.
  • get_error_budget — Current error budget remaining for an SLA — how much downtime you can absorb this window before breaching.
  • get_historical_risk_report — Risk score time series for the org's tracked services over 30 or 90 days.
  • get_impact_analysis — Analyze which of your projects are affected by a specific upstream service, showing dependent packages.
  • get_incident — Full detail of a single incident: status, impact, started/resolved timestamps, affected service, customer-impact assessment.
  • get_incident_correlations — Which signals (community reports, SDK errors, status-page changes) correlate with a specific incident.
  • get_incident_org_impact — How much THIS specific incident hurts YOUR org: which projects depend on the affected service, which packages, criticality.
  • get_incident_timeline — Chronological status updates posted on a single incident — what the upstream said and when.
  • get_incident_trends — Get weekly or monthly incident trend data for your tracked services over the last 6 months.
  • get_mttr_predictions — Predicted mean-time-to-resolution for active incidents on tracked services.
  • get_my_error_breakdown — Error breakdown by type / status code / endpoint reported by your SDK over the configured window.
  • get_my_service_health — Most recent SDK-reported health metrics across your tracked services: error rate, latency, request volume, time window.
  • get_project_topology — Topology entries for one project — explicit deployment-target / hosting / runtime declarations.
  • get_recent_activity — Recent events across the org — incidents, alert dispatches, sync runs, member actions.
  • get_risk_scores — Get risk scores for all upstream services your organization depends on — weighted composite of uptime, error rate, incident frequency, resolution time.
  • get_runbook — Single runbook with its trigger conditions, action steps, and metadata.
  • get_service — Single-service detail with components, current status, status-page URL, last-checked timestamp, and recent incidents.
  • get_service_ai_analysis — Cached AI-generated analysis of a service's reliability, common failure modes, and recommended mitigations.
  • get_service_historical_context — Past incidents on this service that resolved in similar ways.
  • get_service_history — Get the incident history for a service over a time window.
  • get_service_reliability — Per-service reliability rollups: uptime %, MTTR, incident count, last incident, time series sparkline.
  • get_service_risk_scores — Risk scores per service across the org's tracked dependencies.
  • get_sla_attribution — Attribute SLA breaches to specific incidents over a window, ranked by downtime contribution.
  • get_sla_credit_evidence — All evidence collected for a single SLA credit claim — incidents, downtime breakdown, vendor terms.
  • get_sla_credit_summary — Aggregate SLA credit totals by status across the org.
  • get_sla_forecast — Forecast which SLAs are likely to breach by end of window based on current burn rate.
  • get_sla_history — Daily SLA snapshots for one SLA over the configured window.
  • get_sla_scorecard — Detailed SLA scorecard for the org with target uptime, current state, breaches, and credit estimates per SLA.
  • get_sla_status — Get SLA compliance scorecard for your organization — shows target vs actual uptime, compliance rate, downtime, and incident counts per service.
  • get_sla_suggestions — Suggested SLAs to create for services that don't have one yet, based on the org's catalog and watchlist.
  • get_upgrade_path — Get upgrade intelligence for an npm package — latest version, major bumps, breaking change boundaries, and mapped service info.
  • list_active_incidents — List all active incidents affecting your organization's upstream services.
  • list_active_maintenance — Maintenance windows currently in effect — useful for explaining "why is this service flaky right now?"
  • list_alert_configs — List all alert configurations for your organization.
  • list_alert_failures — Alerts that fired but couldn't be delivered (Slack webhook 500, email bounce, etc.).
  • list_community_signals — Outages reported by community signals (other CheckUpstream customers / SDK telemetry / public sources) that haven't yet hit the official status page.
  • list_composite_slas — All composite SLAs in your org — multi-service aggregate uptime targets (e.g. 'payments stack' = Stripe + Datadog + Redis).
  • list_incident_history — Resolved + unresolved incidents over the org's history-retention window.
  • list_inferred_topology — Topology entries auto-detected from telemetry / dependencies that haven't been confirmed by a human yet.
  • list_maintenance_windows — Scheduled maintenance windows for your org's services — both upcoming and past.
  • list_orphan_services — Services your org tracks that aren't currently used by any project.
  • list_projects — List all projects in your organization with dependency counts and sync status.
  • list_runbooks — List all runbooks configured for your organization, including trigger conditions, actions, and linked services.
  • list_service_catalog — All services CheckUpstream knows about — not just ones your org tracks.
  • list_services — List upstream services your organization depends on, with their current status and components.
  • list_sla_credits — SLA credits collected from vendors when their service breached its SLA.
  • list_slas — All SLA configurations in your org with their target uptime and current compliance state.
  • list_team_members — List all members of your organization.
  • list_vendor_sla_terms — The contractual SLA terms parsed from each vendor's official agreement — uptime targets, credit percentages, exclusions.
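On the wire, each of these is invoked with a standard MCP tools/call JSON-RPC request; your client constructs it for you, but the shape is useful to know. A sketch for check_service — the "service" argument name here is an assumption, since the client negotiates the real schema from the server's tool listing:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "check_service",
    "arguments": { "service": "stripe" }
  }
}
```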

Write tools (15)

Scope: write. Destructive ones accept dry_run: true for a preview.

  • acknowledge_incident — Mark an incident as acknowledged.
  • create_alert_config — Create a new alert configuration.
  • create_maintenance_window — Schedule a maintenance window for a service or the whole org.
  • create_runbook — Create a runbook that fires when an incident on a service hits a severity threshold.
  • delete_alert_config — Permanently delete an alert configuration.
  • delete_maintenance_window — Permanently delete a maintenance window.
  • delete_runbook — Permanently delete a runbook.
  • dismiss_community_signal — Mark a community-detected outage as a false positive for your org.
  • file_sla_credit_claim — Mark an SLA credit as filed against the vendor with an optional claim reference and notes.
  • resolve_sla_credit_claim — Mark an SLA credit claim as approved or denied by the vendor, with optional notes.
  • set_service_status — Override a service's current status (e.g. force 'major_outage' when you know about a vendor issue before it hits the status page).
  • test_alert_config — Send a synthetic test event through an alert config so you can verify the channel works.
  • unwatch_service — Remove a service from your org's watchlist.
  • update_alert_config — Update an existing alert config's settings.
  • watch_service — Add a service to your org's watchlist.
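Write tools use the same tools/call shape; for the destructive ones, dry_run rides along in the arguments. A sketch — the "runbook_id" key and the "rb_123" value are hypothetical placeholders, not a documented schema:

```json
{
  "jsonrpc": "2.0",
  "id": 2,
  "method": "tools/call",
  "params": {
    "name": "delete_runbook",
    "arguments": { "runbook_id": "rb_123", "dry_run": true }
  }
}
```

With dry_run set, the tool reports what it would delete without touching anything; drop the flag to commit.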

Prompts (4)

Invoked from the MCP client's prompt picker, not callable as tools.

  • triage_incident — Structured prompt for AI-assisted incident triage — gathers incident details, affected services, and blast radius to guide response.
  • debug_upstream_error — End-to-end debug flow for a suspected upstream error.
  • weekly_reliability_review — Generate a structured weekly reliability review for the org.
  • upgrade_risk_assessment — Risk-assess an npm package upgrade: find where the package is used, map the version delta, correlate with upstream service health, and recommend a rollout strategy.
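Under the hood, your client retrieves a prompt with the MCP prompts/get method rather than tools/call. A sketch of the request — the "error_message" argument name is an assumption; check the prompt's declared arguments in your client:

```json
{
  "jsonrpc": "2.0",
  "id": 3,
  "method": "prompts/get",
  "params": {
    "name": "debug_upstream_error",
    "arguments": { "error_message": "503 from api.stripe.com" }
  }
}
```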

Resources (4)

URI-addressable. Fetch via ReadMcpResourceTool or your client's equivalent.

  • checkupstream://services/{id} — Detailed service status including components and recent incidents.
  • checkupstream://services/slug/{slug} — Detailed service status addressable by slug (e.g. stripe, openai, aws).
  • checkupstream://projects/{name} — Project snapshot: dependency counts, sync status, and a summary of upstream health for the project.
  • checkupstream://incidents/{id} — Incident details including status updates and affected service.
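Resources are fetched with the MCP resources/read method, passing one of the URIs above with its template filled in:

```json
{
  "jsonrpc": "2.0",
  "id": 4,
  "method": "resources/read",
  "params": { "uri": "checkupstream://services/slug/stripe" }
}
```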

Managing connected tools

Connected AI tools show up in Settings → Credentials → AI Connections. Each entry displays the client name, issued date, and last-used timestamp. Revoking an entry invalidates the OAuth grant immediately — the next tool call from that client fails until the user reconnects.


Verify it works

After OAuth completes you land on a confirmation page that bounces you back to your AI tool automatically. Once you're back, run one prompt to prove the connection round-trips end to end:

  • "Ping CheckUpstream and tell me what my org looks like."

The AI client will call ping_mcp (zero-side-effect liveness probe) followed by whoami (org name + scopes + your most-used tools over the last 30 days). If you see your org name + scope set come back, you're set up.

Example prompts

Once connected, ask your AI assistant questions like:

  • "Is Stripe having any issues right now?"
  • "Audit my dependencies and tell me what's at risk."
  • "Which of my projects use the openai package?"
  • "Has Stripe had any incidents in the last 24 hours?"
  • "I'm getting a 503 from api.stripe.com — is there a known outage?"
  • "Show me the SLA compliance scorecard."
  • "Which services have the highest risk scores?"
  • "What does the weekly reliability review look like?"
  • "Triage incident `inc_abc123`."

Self-hosted CheckUpstream

If you run your own CheckUpstream instance, point your MCP client at its transport URL instead:

{
  "mcpServers": {
    "checkupstream": {
      "url": "https://your-instance.example.com/api/mcp/transport",
      "transport": "http"
    }
  }
}

OAuth discovery (/.well-known/oauth-authorization-server and /.well-known/oauth-protected-resource) is served by the same process — no extra configuration required.
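If you're curious what those discovery documents contain, they follow the RFC 8414 authorization server metadata format. A minimal sketch — the field names come from the RFC, but the endpoint paths shown are placeholders, not CheckUpstream's actual routes; your instance advertises its own:

```json
{
  "issuer": "https://your-instance.example.com",
  "authorization_endpoint": "https://your-instance.example.com/oauth/authorize",
  "token_endpoint": "https://your-instance.example.com/oauth/token"
}
```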


Troubleshooting

Start here: 30-second diagnosis

Most issues fall into one of two buckets: the server never connected, or it connected but your session can't see its tools. A single command tells you which:

claude mcp list

If checkupstream is missing or shows a connection error, run the recover-from-anything recipe below. If it shows ✓ Connected but your session has no CheckUpstream tools, fully restart your AI tool (see "Connected but no tools visible"). If neither matches, the specific-symptom sections below cover the rest.

The recover-from-anything fix

Nine times out of ten, this one recipe unsticks a broken install:

claude mcp remove checkupstream -s user
claude mcp add --transport http --scope user checkupstream https://checkupstream.com/api/mcp/transport

Then fully quit and restart your AI tool. Claude Code and Claude Desktop only register newly-connected MCP tools at session start — a hot reconnect leaves the tool list empty even after OAuth succeeds.

Specific symptoms

OAuth popup doesn't open
Check that your AI tool supports the remote MCP transport. Claude Desktop, Claude Code, and Cursor all support it as of 2025-06. For older clients, you may need to upgrade.

Browser shows "Authentication Successful" and your AI tool picks up automatically
Expected. Claude Code runs a short-lived loopback listener on http://localhost:<port>/callback while auth is pending — when your browser follows the post-approval redirect, the listener catches the code and state, exchanges the code for a bearer token, and transitions the MCP server from "Needs authentication" to connected. You don't have to paste anything.

Browser shows "This site can't be reached" on localhost:<port> after approving
Rare — it means the loopback listener exited before your browser followed the redirect (usually because you approved more than a minute after starting the flow). Restart the auth flow from your AI tool. If it keeps happening, make sure nothing on your machine is blocking the ephemeral port range (common culprits: aggressive VPN rules or a local firewall).

"Invalid or expired token"
The access token aged out (access tokens last 1 hour; refresh tokens last 30 days). Reconnect from your AI tool — it will run the refresh grant automatically, or re-run the full OAuth flow if the refresh token is gone too.

MCP server stuck in "Failed to connect"
Click reconnect from the MCP settings panel first. If that doesn't recover, remove and re-add the server:

claude mcp remove checkupstream -s user
claude mcp add --transport http checkupstream https://checkupstream.com/api/mcp/transport -s user

Then fully restart Claude Code so the fresh session re-registers the OAuth helper tools, and retry the first tool call. If this fails repeatedly on a self-hosted instance, verify that https://your-instance.example.com/api/mcp/transport serves a direct 401 when hit without a bearer token. A 3xx redirect means your deployment has a canonical-domain rewrite that strips the Authorization header on the hop; the fix is to make the MCP transport host match the canonical domain.

Tool not found after connecting
Restart your AI tool so it re-fetches the tool list. Some clients cache the tool list at startup.

Connected but no tools visible
claude mcp list shows ✓ Connected for checkupstream, but your Claude session doesn't expose any mcp__checkupstream__* tools when you ask about them. This happens when the MCP server finishes connecting after the session starts — the transport is alive, but the session's tool registry never picked the new server up. The fix is to fully quit (⌘Q / exit) and relaunch so the new session reads the connected tool list from scratch. A hot /mcp reconnect is NOT enough; a full session restart is.

Need to share access with your team
Each team member runs the OAuth flow from their own machine. Revoke individual grants from Settings → Credentials → AI Connections.