An incident in Scaling represents a service disruption or degradation that requires your team’s attention. Each incident has a title, a severity level, and a status that moves forward through a fixed lifecycle — from the moment it’s opened until it’s fully resolved. Incidents can be assigned directly to a user, or handed to the current on-call responder for a schedule — which also attaches the matching escalation policy. Every transition is recorded in a permanent audit trail.

Severity levels

Severity indicates the impact of an incident. Set it when you create the incident and update it if conditions change.
| Severity | When to use |
| --- | --- |
| critical | Complete outage or data loss affecting all users. Requires immediate action. |
| high | Major feature broken or significant performance degradation. Urgent response needed. |
| medium | Partial or intermittent impact. A workaround may exist. |
| low | Minor issue with minimal user impact. Can be addressed in normal working hours. |

Status lifecycle

Every incident starts at investigating and moves forward through four statuses. Transitions are one-way — you cannot skip a step or go back to a previous status.
investigating → identified → monitoring → resolved
| Status | Meaning |
| --- | --- |
| investigating | Your team is looking into the cause. Impact is not yet confirmed. |
| identified | The root cause is known. A fix is in progress. |
| monitoring | A fix has been applied. You are watching to confirm it holds. |
| resolved | The incident is over. No further action is required. |
Status transitions are enforced. You cannot move an incident from monitoring back to identified, and you cannot skip directly from investigating to resolved. Progress through each step in order.
resolved is terminal. Once an incident is resolved, no further updates can be posted on it. The only mutation that survives the terminal status is redaction of an existing update. If you need a post-incident write-up, post it as a final public update before transitioning to resolved.
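The one-way rule can be mirrored client-side before calling the server. The following is a minimal sketch, assuming only the four statuses and their order as listed above; the helper names `can_transition` and `next_status` are hypothetical and are not part of the Scaling API.

```python
from typing import Optional

# Lifecycle order as documented: each status may only advance to the next one.
STATUSES = ["investigating", "identified", "monitoring", "resolved"]


def can_transition(current: str, target: str) -> bool:
    """Return True only when target is the status immediately after current."""
    if current not in STATUSES or target not in STATUSES:
        return False
    return STATUSES.index(target) == STATUSES.index(current) + 1


def next_status(current: str) -> Optional[str]:
    """Return the single status reachable from current, or None if terminal."""
    position = STATUSES.index(current)
    if position + 1 < len(STATUSES):
        return STATUSES[position + 1]
    return None
```

A check like this only saves a round trip. The server remains the authority on whether a transition is accepted.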

Incident Updates

An Incident Update is the single unit of timeline activity on an incident. Every update carries:
| Field | Meaning |
| --- | --- |
| body | Free-text message (up to 10,000 characters). Optional for internal updates, required for public. |
| statusChange | Optional lifecycle transition. When set, the incident’s status advances in the same write. |
| visibility | Either internal (staff-only) or public (rendered on your status page). Default is internal. |
| postedBy | The user (or API key) who posted the update. |
| postedAt | Server-set ISO timestamp. |
At least one of body or statusChange must be present. A public update always requires a non-empty body.
This single concept replaces what used to be two separate things: free-form internal notes, and the implicit “row written every time you transitioned status.” Both now share the same shape; the only difference is whether body, statusChange, or both are set.
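The field rules above can be expressed as a small pre-flight check. This is a sketch under stated assumptions: `UpdateDraft` and `validate_update` are hypothetical names, the error strings are illustrative, and treating a whitespace-only body as empty is a guess rather than documented behavior.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_BODY_CHARS = 10_000  # limit stated in the field table


@dataclass
class UpdateDraft:
    body: Optional[str] = None
    status_change: Optional[str] = None
    visibility: str = "internal"  # documented default


def validate_update(draft: UpdateDraft) -> List[str]:
    """Return a list of rule violations; an empty list means the draft looks valid."""
    errors = []
    # Assumption: a body containing only whitespace counts as empty.
    has_body = bool(draft.body and draft.body.strip())
    if draft.visibility not in ("internal", "public"):
        errors.append("visibility must be 'internal' or 'public'")
    if not has_body and draft.status_change is None:
        errors.append("at least one of body or statusChange is required")
    if draft.visibility == "public" and not has_body:
        errors.append("a public update requires a non-empty body")
    if draft.body is not None and len(draft.body) > MAX_BODY_CHARS:
        errors.append("body exceeds 10,000 characters")
    return errors
```

For example, `validate_update(UpdateDraft(status_change="identified"))` returns an empty list, because an internal update may carry a transition without a body.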

Visibility — internal vs public

| Visibility | Where it renders |
| --- | --- |
| internal | Staff incident timeline only. If statusChange is set, the transition still appears as a bare row on the public page, but the body never does. |
| public | Staff timeline and the public status page. The body is the message customers read. |
Every input surface (web, Slack, MCP, public API) defaults to internal. Publishing is always an explicit, deliberate action — you cannot publish by accident.

Publishing requires a covering Status Page

Posting a public update is rejected with NO_PUBLIC_SURFACE (400) unless your org has at least one published Status Page whose selected components overlap with the incident’s affected components. The check runs server-side before any write, which prevents the silent failure of publishing into the void: writing a public message that no surface actually renders. Configure your Status Page to include the affected components before publishing. See Status pages for component selection.
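The coverage rule amounts to a set-overlap test. The sketch below illustrates that logic only; `StatusPage` and `has_public_surface` are hypothetical names, and the real server-side check may consider details not described here.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class StatusPage:
    published: bool
    component_ids: Set[str]


def has_public_surface(affected: Set[str], pages: List[StatusPage]) -> bool:
    """True when at least one published page shares a component with the incident."""
    # Assumption: an incident with no affected components overlaps nothing,
    # so it has no covering page under this rule.
    return any(
        page.published and not affected.isdisjoint(page.component_ids)
        for page in pages
    )
```

Under this reading, an unpublished page never counts, even if it lists every affected component.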

Redacting a published mistake

Updates are append-only. To correct or remove a previous statement, redact it and post a new one — the system never silently edits a customer-visible record.
| Caller | Can redact when |
| --- | --- |
| The original author | Within 5 minutes of postedAt. |
| An organization admin | At any time. |
Redaction wipes the body, sets redactedAt and redactedBy, and preserves any statusChange the update carried — the system does not lie about lifecycle state. On the public status page, the slot remains visible at its original timestamp, rendered as “This update has been removed.” The original wording is gone, but the fact that something existed and was pulled back is visible. Redaction is permitted even on resolved incidents — it is the only mutation that survives the terminal status.
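The permission rule and the effect of a redaction can be sketched together. Everything named here (`Update`, `can_redact`, `redact`) is hypothetical, and two details are assumptions: the 5-minute boundary is treated as inclusive, and a wiped body is represented as `None`.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta
from typing import Optional

AUTHOR_WINDOW = timedelta(minutes=5)  # documented window for the original author


@dataclass(frozen=True)
class Update:
    posted_by: str
    posted_at: datetime
    body: Optional[str]
    status_change: Optional[str] = None
    redacted_at: Optional[datetime] = None
    redacted_by: Optional[str] = None


def can_redact(update: Update, caller_id: str, is_admin: bool, now: datetime) -> bool:
    """Admins may redact at any time; authors only within the window."""
    if is_admin:
        return True
    # Assumption: exactly 5 minutes after postedAt still counts as "within".
    return caller_id == update.posted_by and now - update.posted_at <= AUTHOR_WINDOW


def redact(update: Update, caller_id: str, is_admin: bool, now: datetime) -> Update:
    """Return a redacted copy: body wiped, status_change preserved."""
    if not can_redact(update, caller_id, is_admin, now):
        raise PermissionError("caller may not redact this update")
    return replace(update, body=None, redacted_at=now, redacted_by=caller_id)
```

Returning a copy rather than mutating in place keeps the sketch consistent with the append-only model: the original record's timestamp and any status transition are carried over unchanged.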

Status history and audit trail

The incident detail view shows the full ordered timeline of updates: internal notes, public messages, and status transitions interleaved at their actual post times. Each entry records who posted it, when, and (for redacted updates) who redacted it and when. The legacy statusHistory field on the Get Incident response remains populated for backwards compatibility — it surfaces just the status transitions. For the full timeline (notes + transitions + public updates), call List Incident Updates.
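To illustrate how the transitions-only view relates to the full timeline, the sketch below filters a list of updates down to those that carry a statusChange. The dictionary shape is an assumption built from the field names in the Incident Updates table; the actual response format of List Incident Updates is not specified here, and `status_transitions` is a hypothetical helper.

```python
from typing import Any, Dict, List


def status_transitions(updates: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only updates that carry a statusChange, ordered by postedAt."""
    # Assumption: postedAt values are ISO 8601 strings in one uniform format
    # and timezone, so lexicographic order matches chronological order.
    ordered = sorted(updates, key=lambda update: update["postedAt"])
    return [
        {
            "status": update["statusChange"],
            "at": update["postedAt"],
            "by": update["postedBy"],
        }
        for update in ordered
        if update.get("statusChange")
    ]
```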

Creating an incident

When you create an incident, the following fields are available:
| Field | Required | Notes |
| --- | --- | --- |
| title | Yes | 1–100 characters. |
| description | No | Additional context. Up to 10,000 characters. |
| severity | Yes | critical, high, medium, or low. |
| ownerId | Either | User ID to assign directly as the incident owner. |
| ownerScheduleId | Either | On-call schedule whose current responder becomes the owner. See below. |
| componentIds | No | UUIDs of affected components, up to 50. |
At least one of ownerId or ownerScheduleId is required so the paging path always has a target.
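The constraints in the table can be checked before a request is sent. This is a minimal sketch: `validate_create_incident` is a hypothetical helper, the payload is assumed to be a plain dictionary keyed by the field names above, and the error strings are illustrative rather than server responses.

```python
from typing import Any, Dict, List

SEVERITIES = {"critical", "high", "medium", "low"}


def validate_create_incident(payload: Dict[str, Any]) -> List[str]:
    """Return a list of violations of the documented field constraints."""
    errors = []
    title = payload.get("title") or ""
    if not 1 <= len(title) <= 100:
        errors.append("title must be 1-100 characters")
    if payload.get("severity") not in SEVERITIES:
        errors.append("severity must be critical, high, medium, or low")
    if len(payload.get("description") or "") > 10_000:
        errors.append("description exceeds 10,000 characters")
    # At least one owner source is required so paging always has a target.
    if not payload.get("ownerId") and not payload.get("ownerScheduleId"):
        errors.append("one of ownerId or ownerScheduleId is required")
    if len(payload.get("componentIds") or []) > 50:
        errors.append("componentIds is limited to 50 entries")
    return errors
```

A payload such as `{"title": "API down", "severity": "critical", "ownerScheduleId": "sched-1"}` passes, where `sched-1` is a placeholder ID.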

Assigning an on-call owner

Pass ownerScheduleId at creation time to hand the incident to the team that is currently on-call. The server resolves ownership and escalation in one step:
  • It looks up the current on-call responder for the schedule (including active overrides) and sets them as the incident owner.
  • It searches your escalation policies for one whose layers target that schedule, and attaches the match.
If you also supply ownerId, that user wins as the incident owner — the schedule is still used to find a matching escalation policy. If no one is currently on-call and no ownerId was supplied, the incident is still created without an owner; you can assign one later from the incident detail page.
For critical and high severity incidents, pass ownerScheduleId so the right responders are paged automatically through the matching escalation policy. See Escalation Policies for how policies and schedules connect.
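The resolution rules in this section can be summarized in a few lines. The sketch is illustrative only: `resolve_owner` and `find_matching_policy` are hypothetical names, the policy shape (an `id` plus `layers` that each list `scheduleIds`) is assumed, and returning the first matching policy is a guess, since the behavior when several policies match is not described.

```python
from typing import Any, Dict, List, Optional


def resolve_owner(owner_id: Optional[str], on_call_user_id: Optional[str]) -> Optional[str]:
    """An explicit ownerId wins; otherwise use the current on-call responder.

    Returns None when neither is available, in which case the incident is
    created without an owner.
    """
    return owner_id or on_call_user_id


def find_matching_policy(schedule_id: str, policies: List[Dict[str, Any]]) -> Optional[str]:
    """Return the id of a policy with a layer targeting the schedule, if any."""
    for policy in policies:
        for layer in policy["layers"]:
            if schedule_id in layer["scheduleIds"]:
                # Assumption: the first match is attached.
                return policy["id"]
    return None
```

The two lookups are independent: the schedule still drives policy matching even when an explicit ownerId overrides the on-call responder as owner.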