
MCP Server Setup

CheckUpstream exposes an MCP (Model Context Protocol) server as a remote HTTP endpoint. Point your AI tool at one URL and OAuth handles the rest — no binary to install, no env vars to manage, no npm package to pin.


Zero-install setup

Drop this into your AI tool's MCP config and you're done:

{
  "mcpServers": {
    "checkupstream": {
      "url": "https://checkupstream.com/api/mcp/transport",
      "transport": "http"
    }
  }
}

Paths:

  • Claude Desktop (macOS) — ~/Library/Application Support/Claude/claude_desktop_config.json
  • Claude Code — project-level .mcp.json or user-level ~/.claude/mcp.json
  • Cursor — Settings → MCP → Add new MCP server → paste the URL

The first time Claude or Cursor calls a CheckUpstream tool, it opens your browser to the CheckUpstream consent screen. Approve the request and the AI tool caches a short-lived bearer token that it refreshes on its own. Nothing is stored on disk in plain text.


What you get

CheckUpstream's MCP server exposes three kinds of building blocks, and they're designed to be composed:

  • Tools — atomic operations the LLM calls to fetch data or perform an action (check_service, diagnose_error, acknowledge_incident). Reads are safe to auto-approve; writes are scope-gated.
  • Prompts — pre-built multi-step workflow templates that tell the LLM which tools to chain and how to interpret the results (debug_upstream_error chains diagnose_error → get_blast_radius → get_service_history). Invoke them from your AI tool's prompt picker.
  • Resources — URI-addressable entities (checkupstream://services/slug/stripe) the LLM can fetch directly or drill into from a tool result. Auto-complete against your org's actual services, projects, and incidents.

Tool vs. prompt. diagnose_error is a tool — it returns a verdict for one error message. debug_upstream_error is a prompt — it tells the LLM to call diagnose_error, reason about the result, then chain follow-up tools depending on the verdict. Use tools when you know exactly what you want; use prompts when you want the LLM to run a playbook.
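To make the division of labor concrete, here is a minimal Python sketch of the playbook the debug_upstream_error prompt describes. It is illustrative only: `call_tool` stands in for whatever your MCP client uses to dispatch a tool, and the `correlated_service` and `verdict` field names are assumptions, not CheckUpstream's actual schema.

```python
def run_debug_upstream_error(call_tool, error_message):
    """Illustrative playbook for the debug_upstream_error prompt.

    call_tool is whatever your MCP client uses to dispatch a tool;
    the field names below are hypothetical, not CheckUpstream's schema.
    """
    # Step 1: one atomic tool call -- error message in, diagnosis out.
    diagnosis = call_tool("diagnose_error", {"error": error_message})

    # Step 2: the prompt tells the LLM to branch on the result.
    service = diagnosis.get("correlated_service")
    if service:
        # Correlated with a known outage: widen the investigation.
        blast = call_tool("get_blast_radius", {"service": service})
        history = call_tool("get_service_history", {"service": service})
        return {"verdict": "upstream", "service": service,
                "blast_radius": blast, "history": history}

    # No correlation: treat the error as local to your code.
    return {"verdict": "local", "service": None}
```

In other words: reach for a tool when you already know the shape of the question, and for the prompt when you want this kind of branching run for you.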

Auto-generated from live introspection on 2026-04-13. 81 tools, 4 prompts, 4 resource templates. Run pnpm run inventory:mcp to refresh.

Read tools (60)

Scope: read. Auto-approvable by most clients.

  • audit_dependencies — Scan your project dependencies for upstream service risk — maps packages to services and flags active incidents.
  • check_service — Check the current status of a specific upstream service (e.g., Stripe, OpenAI, AWS).
  • dashboard_stats — Get a summary of your organization: project count, services monitored, active incidents.
  • diagnose_error — Analyze an error message and check if it correlates with a known upstream service outage.
  • find_projects_using_package — Reverse dependency lookup: which of your projects depend on this package, and what versions are pinned.
  • generate_postmortem — Generate a structured post-incident review for a specific incident.
  • get_active_incidents — Active/unresolved incidents affecting your tracked upstream dependencies.
  • get_architecture_recommendations — AI-generated architecture improvement suggestions based on the org's topology, dependencies, and incident history.
  • get_blast_radius — Analyze the blast radius of a service outage — shows cascade-affected downstream services, impacted projects, and organizations.
  • get_composite_sla_scorecard — Aggregate uptime and breach state for a composite SLA over a window.
  • get_dependency_graph — Full dependency graph for visualisation — projects, services, and the edges between them.
  • get_dependency_health — Summarize upstream health across your dependencies.
  • get_error_budget — Current error budget remaining for an SLA — how much downtime you can absorb this window before breaching.
  • get_historical_risk_report — Risk score time series for the org's tracked services over 30 or 90 days.
  • get_impact_analysis — Analyze which of your projects are affected by a specific upstream service, showing dependent packages.
  • get_incident — Full detail of a single incident: status, impact, started/resolved timestamps, affected service, customer-impact assessment.
  • get_incident_correlations — Which signals (community reports, SDK errors, status-page changes) correlate with a specific incident.
  • get_incident_org_impact — How much THIS specific incident hurts YOUR org: which projects depend on the affected service, which packages, criticality.
  • get_incident_timeline — Chronological status updates posted on a single incident — what the upstream said and when.
  • get_incident_trends — Get weekly or monthly incident trend data for your tracked services over the last 6 months.
  • get_mttr_predictions — Predicted mean-time-to-resolution for active incidents on tracked services.
  • get_my_error_breakdown — Error breakdown by type / status code / endpoint reported by your SDK over the configured window.
  • get_my_service_health — Most recent SDK-reported health metrics across your tracked services: error rate, latency, request volume, time window.
  • get_project_topology — Topology entries for one project — explicit deployment-target / hosting / runtime declarations.
  • get_recent_activity — Recent events across the org — incidents, alert dispatches, sync runs, member actions.
  • get_risk_scores — Get risk scores for all upstream services your organization depends on — weighted composite of uptime, error rate, incident frequency, resolution time.
  • get_runbook — Single runbook with its trigger conditions, action steps, and metadata.
  • get_service — Single-service detail with components, current status, status-page URL, last-checked timestamp, and recent incidents.
  • get_service_ai_analysis — Cached AI-generated analysis of a service's reliability, common failure modes, and recommended mitigations.
  • get_service_historical_context — Past incidents on this service that resolved in similar ways.
  • get_service_history — Get the incident history for a service over a time window.
  • get_service_reliability — Per-service reliability rollups: uptime %, MTTR, incident count, last incident, time series sparkline.
  • get_service_risk_scores — Risk scores per service across the org's tracked dependencies.
  • get_sla_attribution — Attribute SLA breaches to specific incidents over a window, ranked by downtime contribution.
  • get_sla_credit_evidence — All evidence collected for a single SLA credit claim — incidents, downtime breakdown, vendor terms.
  • get_sla_credit_summary — Aggregate SLA credit totals by status across the org.
  • get_sla_forecast — Forecast which SLAs are likely to breach by end of window based on current burn rate.
  • get_sla_history — Daily SLA snapshots for one SLA over the configured window.
  • get_sla_scorecard — Detailed SLA scorecard for the org with target uptime, current state, breaches, and credit estimates per SLA.
  • get_sla_status — Get SLA compliance scorecard for your organization — shows target vs actual uptime, compliance rate, downtime, and incident counts per service.
  • get_sla_suggestions — Suggested SLAs to create for services that don't have one yet, based on the org's catalog and watchlist.
  • get_upgrade_path — Get upgrade intelligence for an npm package — latest version, major bumps, breaking change boundaries, and mapped service info.
  • list_active_incidents — List all active incidents affecting your organization's upstream services.
  • list_active_maintenance — Maintenance windows currently in effect — useful for explaining "why is this service flaky right now?"
  • list_alert_configs — List all alert configurations for your organization.
  • list_alert_failures — Alerts that fired but couldn't be delivered (Slack webhook 500, email bounce, etc.).
  • list_community_signals — Outages reported by community signals (other CheckUpstream customers / SDK telemetry / public sources) that haven't yet hit the official status page.
  • list_composite_slas — All composite SLAs in your org — multi-service aggregate uptime targets (e.g. 'payments stack' = Stripe + Datadog + Redis).
  • list_incident_history — Resolved + unresolved incidents over the org's history-retention window.
  • list_inferred_topology — Topology entries auto-detected from telemetry / dependencies that haven't been confirmed by a human yet.
  • list_maintenance_windows — Scheduled maintenance windows for your org's services — both upcoming and past.
  • list_orphan_services — Services your org tracks that aren't currently used by any project.
  • list_projects — List all projects in your organization with dependency counts and sync status.
  • list_runbooks — List all runbooks configured for your organization, including trigger conditions, actions, and linked services.
  • list_service_catalog — All services CheckUpstream knows about — not just ones your org tracks.
  • list_services — List upstream services your organization depends on, with their current status and components.
  • list_sla_credits — SLA credits collected from vendors when their service breached its SLA.
  • list_slas — All SLA configurations in your org with their target uptime and current compliance state.
  • list_team_members — List all members of your organization.
  • list_vendor_sla_terms — The contractual SLA terms parsed from each vendor's official agreement — uptime targets, credit percentages, exclusions.
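On the wire, each of these is invoked with a standard MCP tools/call JSON-RPC request; your client constructs it for you, but the shape is useful to know. A sketch for check_service — the "service" argument name here is an assumption, since the client negotiates the real schema from the server's tool listing:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "check_service",
    "arguments": { "service": "stripe" }
  }
}
```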

Write tools (15)

Scope: write. Destructive ones accept dry_run: true for a preview.

  • acknowledge_incident — Mark an incident as acknowledged.
  • create_alert_config — Create a new alert configuration.
  • create_maintenance_window — Schedule a maintenance window for a service or the whole org.
  • create_runbook — Create a runbook that fires when an incident on a service hits a severity threshold.
  • delete_alert_config — Permanently delete an alert configuration.
  • delete_maintenance_window — Permanently delete a maintenance window.
  • delete_runbook — Permanently delete a runbook.
  • dismiss_community_signal — Mark a community-detected outage as a false positive for your org.
  • file_sla_credit_claim — Mark an SLA credit as filed against the vendor with an optional claim reference and notes.
  • resolve_sla_credit_claim — Mark an SLA credit claim as approved or denied by the vendor, with optional notes.
  • set_service_status — Override a service's current status (e.g. force 'major_outage' when you know about a vendor issue before it hits the status page).
  • test_alert_config — Send a synthetic test event through an alert config so you can verify the channel works.
  • unwatch_service — Remove a service from your org's watchlist.
  • update_alert_config — Update an existing alert config's settings.
  • watch_service — Add a service to your org's watchlist.
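Write tools use the same tools/call shape; for the destructive ones, dry_run rides along in the arguments. A sketch — the "runbook_id" key and the "rb_123" value are hypothetical placeholders, not a documented schema:

```json
{
  "jsonrpc": "2.0",
  "id": 2,
  "method": "tools/call",
  "params": {
    "name": "delete_runbook",
    "arguments": { "runbook_id": "rb_123", "dry_run": true }
  }
}
```

With dry_run set, the tool reports what it would delete without touching anything; drop the flag to commit.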

Prompts (4)

Invoked from the MCP client's prompt picker, not callable as tools.

  • triage_incident — Structured prompt for AI-assisted incident triage — gathers incident details, affected services, and blast radius to guide response.
  • debug_upstream_error — End-to-end debug flow for a suspected upstream error.
  • weekly_reliability_review — Generate a structured weekly reliability review for the org.
  • upgrade_risk_assessment — Risk-assess an npm package upgrade: find where the package is used, map the version delta, correlate with upstream service health, and recommend a rollout strategy.
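Under the hood, your client retrieves a prompt with the MCP prompts/get method rather than tools/call. A sketch of the request — the "error_message" argument name is an assumption; check the prompt's declared arguments in your client:

```json
{
  "jsonrpc": "2.0",
  "id": 3,
  "method": "prompts/get",
  "params": {
    "name": "debug_upstream_error",
    "arguments": { "error_message": "503 from api.stripe.com" }
  }
}
```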

Resources (4)

URI-addressable. Fetch via ReadMcpResourceTool or your client's equivalent.

  • checkupstream://services/{id} — Detailed service status including components and recent incidents.
  • checkupstream://services/slug/{slug} — Detailed service status addressable by slug (e.g. stripe, openai, aws).
  • checkupstream://projects/{name} — Project snapshot: dependency counts, sync status, and a summary of upstream health for the project.
  • checkupstream://incidents/{id} — Incident details including status updates and affected service.
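Resources are fetched with the MCP resources/read method, passing one of the URIs above with its template filled in:

```json
{
  "jsonrpc": "2.0",
  "id": 4,
  "method": "resources/read",
  "params": { "uri": "checkupstream://services/slug/stripe" }
}
```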

Managing connected tools

Connected AI tools show up in Settings → Credentials → AI Connections. Each entry displays the client name, issued date, and last-used timestamp. Revoking an entry invalidates the OAuth grant immediately — the next tool call from that client fails until the user reconnects.


Verify it works

After OAuth completes you land on a confirmation page that bounces you back to your AI tool automatically. Once you're back, run one prompt to prove the connection round-trips end to end:

  • "Ping CheckUpstream and tell me what my org looks like."

The AI client will call ping_mcp (zero-side-effect liveness probe) followed by whoami (org name + scopes + your most-used tools over the last 30 days). If you see your org name + scope set come back, you're set up.

Example prompts

Once connected, ask your AI assistant questions like:

  • "Is Stripe having any issues right now?"
  • "Audit my dependencies and tell me what's at risk."
  • "Which of my projects use the openai package?"
  • "Has Stripe had any incidents in the last 24 hours?"
  • "I'm getting a 503 from api.stripe.com — is there a known outage?"
  • "Show me the SLA compliance scorecard."
  • "Which services have the highest risk scores?"
  • "What does the weekly reliability review look like?"
  • "Triage incident `inc_abc123`."

Self-hosted CheckUpstream

If you run your own CheckUpstream instance, point your MCP client at its transport URL instead:

{
  "mcpServers": {
    "checkupstream": {
      "url": "https://your-instance.example.com/api/mcp/transport",
      "transport": "http"
    }
  }
}

OAuth discovery (/.well-known/oauth-authorization-server and /.well-known/oauth-protected-resource) is served by the same process — no extra configuration required.
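If you're curious what those discovery documents contain, they follow the RFC 8414 authorization server metadata format. A minimal sketch — the field names come from the RFC, but the endpoint paths shown are placeholders, not CheckUpstream's actual routes; your instance advertises its own:

```json
{
  "issuer": "https://your-instance.example.com",
  "authorization_endpoint": "https://your-instance.example.com/oauth/authorize",
  "token_endpoint": "https://your-instance.example.com/oauth/token"
}
```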


Troubleshooting

Start here: 30-second diagnosis

Most issues fall into one of two buckets: the server never connected, or it connected but your session can't see its tools. A single command tells you which:

claude mcp list

If checkupstream is missing or shows a connection error, run the recover-from-anything recipe below. If it shows ✓ Connected but your session has no CheckUpstream tools, fully restart your AI tool (see "Connected but no tools visible"). If neither matches, the specific-symptom sections below cover the rest.

The recover-from-anything fix

Nine times out of ten, this one recipe unsticks a broken install:

claude mcp remove checkupstream -s user
claude mcp add --transport http --scope user checkupstream https://checkupstream.com/api/mcp/transport

Then fully quit and restart your AI tool. Claude Code and Claude Desktop only register newly-connected MCP tools at session start — a hot reconnect leaves the tool list empty even after OAuth succeeds.

Specific symptoms

OAuth popup doesn't open
Check that your AI tool supports the remote MCP transport. Claude Desktop, Claude Code, and Cursor all support it as of 2025-06. For older clients, you may need to upgrade.

Browser shows "Authentication Successful" and your AI tool picks up automatically
Expected. Claude Code runs a short-lived loopback listener on http://localhost:<port>/callback while auth is pending — when your browser follows the post-approval redirect, the listener catches the code and state, exchanges the code for a bearer token, and transitions the MCP server from "Needs authentication" to connected. You don't have to paste anything.

Browser shows "This site can't be reached" on localhost:<port> after approving
Rare — it means the loopback listener exited before your browser followed the redirect (usually because you approved more than a minute after starting the flow). Restart the auth flow from your AI tool. If it keeps happening, make sure nothing on your machine is blocking the ephemeral port range (common culprits: aggressive VPN rules or a local firewall).

"Invalid or expired token"
The access token aged out (access tokens last 1 hour; refresh tokens last 30 days). Reconnect from your AI tool — it will run the refresh grant automatically, or re-run the full OAuth flow if the refresh token is gone too.

MCP server stuck in "Failed to connect"
Click reconnect from the MCP settings panel first. If that doesn't recover, remove and re-add the server:

claude mcp remove checkupstream -s user
claude mcp add --transport http checkupstream https://checkupstream.com/api/mcp/transport -s user

Then fully restart Claude Code so the fresh session re-registers the OAuth helper tools, and retry the first tool call. If this fails repeatedly on a self-hosted instance, verify that https://your-instance.example.com/api/mcp/transport serves a direct 401 when hit without a bearer token. A 3xx redirect means your deployment has a canonical-domain rewrite that strips the Authorization header on the hop; the fix is to make the MCP transport host match the canonical domain.

Tool not found after connecting
Restart your AI tool so it re-fetches the tool list. Some clients cache the tool list at startup.

Connected but no tools visible
claude mcp list shows ✓ Connected for checkupstream, but your Claude session doesn't expose any mcp__checkupstream__* tools when you ask about them. This happens when the MCP server finishes connecting after the session starts — the transport is alive, but the session's tool registry never picked the new server up. The fix is to fully quit (⌘Q / exit) and relaunch so the new session reads the connected tool list from scratch. A hot /mcp reconnect is NOT enough; a full session restart is.

Need to share access with your team
Each team member runs the OAuth flow from their own machine. Revoke individual grants from Settings → Credentials → AI Connections.