An incident in Scaling represents a service disruption or degradation that requires your team’s attention. Each incident has a title, a severity level, and a status that moves forward through a fixed lifecycle — from the moment it’s opened until it’s fully resolved.
Incidents can be assigned directly to a user, or handed to the current on-call responder for a schedule — which also attaches the matching escalation policy. Every transition is recorded in a permanent audit trail.
Severity levels
Severity indicates the impact of an incident. Set it when you create the incident and update it if conditions change.
| Severity | When to use |
|---|---|
| critical | Complete outage or data loss affecting all users. Requires immediate action. |
| high | Major feature broken or significant performance degradation. Urgent response needed. |
| medium | Partial or intermittent impact. A workaround may exist. |
| low | Minor issue with minimal user impact. Can be addressed in normal working hours. |
Status lifecycle
Every incident starts at investigating and moves forward through four statuses. Transitions are one-way — you cannot skip a step or go back to a previous status.
investigating → identified → monitoring → resolved
| Status | Meaning |
|---|---|
| investigating | Your team is looking into the cause. Impact is not yet confirmed. |
| identified | The root cause is known. A fix is in progress. |
| monitoring | A fix has been applied. You are watching to confirm it holds. |
| resolved | The incident is over. No further action is required. |
Status transitions are enforced. You cannot move an incident from monitoring back to identified, and you cannot skip directly from investigating to resolved. Progress through each step in order.

The resolved status is terminal. Once an incident is resolved, no further updates can be posted on it. The only mutation that survives the terminal status is redaction of an existing update. If you need a post-incident write-up, post it as a final public update before transitioning to resolved.
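If you want to check a transition on the client before sending it, the rule can be mirrored locally. The following is a minimal sketch, not part of the Scaling API: `can_transition` is a hypothetical helper, and the server remains the source of truth for what is accepted.

```python
# Lifecycle order as documented: each status may only advance to the next one.
LIFECYCLE = ["investigating", "identified", "monitoring", "resolved"]


def can_transition(current: str, target: str) -> bool:
    """Return True only when target is the status immediately after current."""
    if current not in LIFECYCLE or target not in LIFECYCLE:
        return False
    return LIFECYCLE.index(target) == LIFECYCLE.index(current) + 1


print(can_transition("investigating", "identified"))  # True
print(can_transition("investigating", "resolved"))    # False (skips steps)
print(can_transition("monitoring", "identified"))     # False (goes backwards)
print(can_transition("resolved", "resolved"))         # False (resolved is terminal)
```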
Incident Updates
An Incident Update is the single unit of timeline activity on an incident. Every update carries:
| Field | Meaning |
|---|---|
| body | Free-text message (up to 10,000 characters). Optional for internal updates, required for public. |
| statusChange | Optional lifecycle transition. When set, the incident’s status advances in the same write. |
| visibility | Either internal (staff-only) or public (rendered on your status page). Default is internal. |
| postedBy | The user (or API key) who posted the update. |
| postedAt | Server-set ISO timestamp. |
At least one of body or statusChange must be present. A public update always requires a non-empty body.
This single concept replaces what used to be two separate things: free-form internal notes, and the implicit “row written every time you transitioned status.” Today, both flow through the same shape — the difference is whether body, statusChange, or both are set.
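The field rules for an Incident Update can be expressed as a small pre-flight check. This is an illustrative sketch only: `validate_update` is a hypothetical client-side helper, and treating a whitespace-only body as empty is an assumption rather than documented server behavior.

```python
from typing import List, Optional

MAX_BODY_LENGTH = 10_000  # documented upper bound for body


def validate_update(body: Optional[str],
                    status_change: Optional[str],
                    visibility: str = "internal") -> List[str]:
    """Return a list of rule violations; an empty list means the update looks valid."""
    errors = []
    # Assumption: a body made only of whitespace counts as empty.
    has_body = bool(body and body.strip())

    if visibility not in ("internal", "public"):
        errors.append("visibility must be internal or public")
    if not has_body and status_change is None:
        errors.append("at least one of body or statusChange is required")
    if visibility == "public" and not has_body:
        errors.append("a public update requires a non-empty body")
    if body is not None and len(body) > MAX_BODY_LENGTH:
        errors.append("body exceeds 10,000 characters")
    return errors


# An internal status transition with no message is allowed.
print(validate_update(None, "identified"))  # []
# A public update without a body is not.
print(validate_update(None, "identified", "public"))
```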
Visibility — internal vs public
| Visibility | Where it renders |
|---|---|
| internal | Staff incident timeline only. If statusChange is set, the transition still appears as a bare row on the public page, but the body never does. |
| public | Staff timeline and the public status page. The body is the message customers read. |
Every input surface (web, Slack, MCP, public API) defaults to internal. Publishing is always an explicit, deliberate action — you cannot publish by accident.
Publishing requires a covering Status Page
Posting a public update is rejected with NO_PUBLIC_SURFACE (400) unless your org has at least one published Status Page whose selected components overlap with the incident’s affected components. This is enforced server-side before any write.
This prevents the silent-failure case where you publish into the void — i.e., write a public message that no surface actually renders. Configure your Status Page to include the affected components before publishing. See Status pages for component selection.
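The coverage rule amounts to a set-overlap test between the incident's affected components and the components selected on each published Status Page. The sketch below shows that logic under assumed data shapes: the `published` and `componentIds` keys on a page are hypothetical names chosen for illustration, not documented response fields.

```python
from typing import Iterable, List


def has_public_surface(status_pages: List[dict],
                       affected_component_ids: Iterable[str]) -> bool:
    """True if any published page selects at least one affected component."""
    affected = set(affected_component_ids)
    return any(
        page["published"] and bool(affected & set(page["componentIds"]))
        for page in status_pages
    )


pages = [
    {"published": True, "componentIds": ["api", "web"]},
    {"published": False, "componentIds": ["db"]},  # draft page, does not count
]

print(has_public_surface(pages, ["api"]))  # True
print(has_public_surface(pages, ["db"]))   # False: only an unpublished page covers it
```

When the check is false, a public update would be rejected with NO_PUBLIC_SURFACE, so posting internally or fixing the Status Page configuration first avoids the failed request.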
Redacting a published mistake
Updates are append-only. To correct or remove a previous statement, redact it and post a new one — the system never silently edits a customer-visible record.
| Caller | Can redact when |
|---|---|
| The original author | Within 5 minutes of postedAt. |
| An organization admin | At any time. |
Redaction wipes the body, sets redactedAt and redactedBy, and preserves any statusChange the update carried — the system does not lie about lifecycle state. On the public status page, the slot remains visible at its original timestamp, rendered as “This update has been removed.” The original wording is gone, but the fact that something existed and was pulled back is visible.
Redaction is permitted even on resolved incidents — it is the only mutation that survives the terminal status.
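The redaction rules can be sketched as two small functions: one for the permission check and one for the effect on the record. This is an illustration under assumptions, not the server implementation: updates are modelled as plain dictionaries, `postedBy` is treated as a user ID string, a wiped body is represented as `None`, and the 5-minute boundary is assumed to be inclusive.

```python
from datetime import datetime, timedelta, timezone

AUTHOR_REDACTION_WINDOW = timedelta(minutes=5)


def can_redact(update: dict, caller_id: str, caller_is_admin: bool,
               now: datetime) -> bool:
    """Admins may redact at any time; authors only within the window."""
    if caller_is_admin:
        return True
    if caller_id != update["postedBy"]:
        return False
    # Assumption: exactly 5 minutes after postedAt is still allowed.
    return now - update["postedAt"] <= AUTHOR_REDACTION_WINDOW


def redact(update: dict, caller_id: str, now: datetime) -> dict:
    """Return a redacted copy: body wiped, statusChange left untouched."""
    redacted = dict(update)
    redacted["body"] = None
    redacted["redactedAt"] = now
    redacted["redactedBy"] = caller_id
    return redacted


posted = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
update = {"body": "Wrong region named", "statusChange": "identified",
          "postedBy": "user_1", "postedAt": posted}

print(can_redact(update, "user_1", False, posted + timedelta(minutes=4)))  # True
print(can_redact(update, "user_1", False, posted + timedelta(minutes=6)))  # False
print(redact(update, "user_1", posted + timedelta(minutes=4))["statusChange"])  # identified
```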
Status history and audit trail
The incident detail view shows the full ordered timeline of updates: internal notes, public messages, and status transitions interleaved at their actual post times. Each entry records who posted it, when, and (for redacted updates) who redacted it and when.
The legacy statusHistory field on the Get Incident response remains populated for backwards compatibility — it surfaces just the status transitions. For the full timeline (notes + transitions + public updates), call List Incident Updates.
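Conceptually, the status-only view is the full timeline filtered down to the updates that carry a statusChange. The sketch below shows that relationship on sample data. The output shape is hypothetical (the actual statusHistory format is not described here), and sorting by the raw postedAt string assumes all timestamps share the same ISO format and time zone.

```python
from typing import List


def status_transitions(updates: List[dict]) -> List[dict]:
    """Keep only updates that changed status, ordered by post time."""
    return [
        {"status": u["statusChange"], "postedAt": u["postedAt"], "postedBy": u["postedBy"]}
        for u in sorted(updates, key=lambda u: u["postedAt"])
        if u.get("statusChange")
    ]


timeline = [
    {"body": "Rolling back deploy", "statusChange": "identified",
     "postedAt": "2024-01-01T12:10:00Z", "postedBy": "user_1"},
    {"body": "Checking dashboards", "statusChange": None,
     "postedAt": "2024-01-01T12:05:00Z", "postedBy": "user_2"},
    {"body": None, "statusChange": "monitoring",
     "postedAt": "2024-01-01T12:30:00Z", "postedBy": "user_1"},
]

for row in status_transitions(timeline):
    print(row["postedAt"], row["status"])
```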
Creating an incident
When you create an incident, the following fields are available:
| Field | Required | Notes |
|---|---|---|
| title | Yes | 1–100 characters. |
| description | No | Additional context. Up to 10,000 characters. |
| severity | Yes | critical, high, medium, or low. |
| ownerId | Either | User ID to assign directly as the incident owner. |
| ownerScheduleId | Either | On-call schedule whose current responder becomes the owner. See below. |
| componentIds | No | UUIDs of affected components, up to 50. |
At least one of ownerId or ownerScheduleId is required so the paging path always has a target.
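The field constraints above can be checked before you send a create request. The following sketch is a hypothetical client-side validator, not an official SDK function; it covers only the limits listed in the table and does not verify that IDs exist or that componentIds are well-formed UUIDs.

```python
from typing import List

SEVERITIES = {"critical", "high", "medium", "low"}


def validate_create_incident(payload: dict) -> List[str]:
    """Return a list of constraint violations for a create-incident payload."""
    errors = []
    title = payload.get("title") or ""
    if not 1 <= len(title) <= 100:
        errors.append("title must be 1-100 characters")
    if payload.get("severity") not in SEVERITIES:
        errors.append("severity must be critical, high, medium, or low")
    if len(payload.get("description") or "") > 10_000:
        errors.append("description exceeds 10,000 characters")
    if not payload.get("ownerId") and not payload.get("ownerScheduleId"):
        errors.append("ownerId or ownerScheduleId is required")
    if len(payload.get("componentIds") or []) > 50:
        errors.append("at most 50 componentIds are allowed")
    return errors


# The schedule ID below is a made-up placeholder.
payload = {"title": "API latency spike", "severity": "high",
           "ownerScheduleId": "schedule-placeholder"}
print(validate_create_incident(payload))  # []
```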
Assigning an on-call owner
Pass ownerScheduleId at creation time to hand the incident to the team that is currently on-call. The server resolves ownership and escalation in one step:
- It looks up the current on-call responder for the schedule (including active overrides) and sets them as the incident owner.
- It searches your escalation policies for one whose layers target that schedule, and attaches the match.
If you also supply ownerId, that user wins as the incident owner — the schedule is still used to find a matching escalation policy. If no one is currently on-call and no ownerId was supplied, the incident is still created without an owner; you can assign one later from the incident detail page.
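The resolution rules can be summarised in a few lines. This is a sketch of the described precedence only, with assumed data shapes: the `layers` and `scheduleIds` keys on a policy are hypothetical, and returning the first matching policy is an assumption, since the behavior with several matching policies is not stated here.

```python
from typing import List, Optional


def resolve_owner(owner_id: Optional[str],
                  on_call_user_id: Optional[str]) -> Optional[str]:
    """An explicit ownerId wins; otherwise the on-call responder; otherwise no owner."""
    if owner_id is not None:
        return owner_id
    return on_call_user_id


def find_escalation_policy(policies: List[dict],
                           schedule_id: str) -> Optional[dict]:
    """Return the first policy with a layer that targets the schedule, if any."""
    for policy in policies:
        if any(schedule_id in layer["scheduleIds"] for layer in policy["layers"]):
            return policy
    return None


policies = [
    {"name": "Database", "layers": [{"scheduleIds": ["sched_db"]}]},
    {"name": "Platform", "layers": [{"scheduleIds": ["sched_a"]},
                                    {"scheduleIds": ["sched_b"]}]},
]

print(resolve_owner("user_9", "user_2"))  # user_9 (explicit owner wins)
print(resolve_owner(None, "user_2"))      # user_2 (current on-call responder)
print(resolve_owner(None, None))          # None (created without an owner)
print(find_escalation_policy(policies, "sched_b")["name"])  # Platform
```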
For critical and high severity incidents, pass ownerScheduleId so the right responders are paged automatically through the matching escalation policy. See Escalation Policies for how policies and schedules connect.