This guide shows you how to publish CloudWatch alarms defined in AWS CDK to the Scaling AWS integration so that incidents open on OK → ALARM and auto-resolve on ALARM → OK. If you set up alarms in the console or with Terraform instead, see Ingest CloudWatch Alarms. For an overview of the integration, see AWS CloudWatch.

Why both transitions matter

The integration opens an incident when an alarm transitions to ALARM and resolves the matching incident when the same alarm transitions back to OK. If your CDK only wires the alarm action and forgets the OK action, CloudWatch never publishes the recovery transition to SNS — your incident stays open forever even after the underlying metric recovers. This is the single most common CDK mistake when connecting alarms to Scaling. The patterns below avoid it by default.
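To make the failure mode concrete, here is a minimal sketch of the open/resolve behavior described above. It is a hypothetical model written for this guide, not Scaling's implementation: the names actionFor and replay are illustrative, and the only inputs are the alarm states CloudWatch actually publishes to SNS.

```typescript
type AlarmState = 'OK' | 'ALARM' | 'INSUFFICIENT_DATA'
type IncidentAction = 'open' | 'resolve' | 'ignore'

// Hypothetical model of the documented behavior, not Scaling's real code.
function actionFor(newState: AlarmState, hasOpenIncident: boolean): IncidentAction {
  if (newState === 'ALARM') return hasOpenIncident ? 'ignore' : 'open'
  if (newState === 'OK') return hasOpenIncident ? 'resolve' : 'ignore'
  return 'ignore' // INSUFFICIENT_DATA is ignored by design
}

// Replays the notifications one alarm published and reports the outcome.
function replay(notifications: AlarmState[]): { opened: number; stillOpen: boolean } {
  let opened = 0
  let stillOpen = false
  for (const state of notifications) {
    const action = actionFor(state, stillOpen)
    if (action === 'open') {
      opened += 1
      stillOpen = true
    } else if (action === 'resolve') {
      stillOpen = false
    }
  }
  return { opened, stillOpen }
}

// Both actions wired: CloudWatch publishes ALARM, then OK.
const wired = replay(['ALARM', 'OK'])
// Only the alarm action wired: the recovery is never published.
const unwired = replay(['ALARM'])
console.log(wired.stillOpen, unwired.stillOpen)
```

In the second case the metric has recovered, but nothing ever tells the integration, so the incident stays open.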

Prerequisites

You should already have the AWS integration created in Scaling and the SNS topic + HTTPS subscription in place. If you don’t, follow the first four steps of Ingest CloudWatch Alarms and then come back here to wire your alarms. You’ll need:
  • An existing SNS topic subscribed to your Scaling webhook URL, with SignatureVersion=2 (Scaling rejects the SNS default of 1).
  • A CDK app using aws-cdk-lib v2 (TypeScript). The snippets below were written against aws-cdk-lib@^2.250.
In CDK, you reference the topic by its ARN — usually exported from the stack that owns it, or imported via Topic.fromTopicArn.
If your topic is also created in CDK, set signatureVersion: '2' on the Topic directly. SignatureVersion is a topic attribute, so every subscription on the topic then receives RSA-SHA256 signatures without any per-subscription configuration.
// SubscriptionProtocol is exported from aws-sns, not aws-sns-subscriptions.
import { SubscriptionProtocol, Topic } from 'aws-cdk-lib/aws-sns'
import { UrlSubscription } from 'aws-cdk-lib/aws-sns-subscriptions'

const scalingTopic = new Topic(this, 'scaling-alarms-topic', {
  topicName: 'scaling-alarms',
  // Scaling requires RSA-SHA256. SNS defaults to RSA-SHA1, which is rejected.
  signatureVersion: '2',
})

scalingTopic.addSubscription(
  new UrlSubscription(
    'https://api.scaling.cloud/webhooks/aws/<token>',
    { protocol: SubscriptionProtocol.HTTPS }
  )
)
For an existing topic you can’t recreate, update it in place with aws sns set-topic-attributes --topic-arn <topic-arn> --attribute-name SignatureVersion --attribute-value 2. SignatureVersion is set on the topic, not on the individual subscription.
The same topic can carry both ALARM and OK notifications for many alarms — you do not need one topic per transition. One topic per Scaling component is the right granularity.

Approach A: cdk-monitoring-constructs

If you’re using cdk-monitoring-constructs (the MonitoringFacade / SLO-style API), wire the OK transition by passing both onAlarmTopic and onOkTopic to SnsAlarmActionStrategy. By default the strategy only sets onAlarmTopic.
import { Stack, type StackProps } from 'aws-cdk-lib'
import type { IFunction } from 'aws-cdk-lib/aws-lambda'
import { Topic } from 'aws-cdk-lib/aws-sns'
import {
  MonitoringFacade,
  SnsAlarmActionStrategy,
} from 'cdk-monitoring-constructs'
import type { Construct } from 'constructs'

// The Lambda function to monitor is passed in by the stack that owns it.
interface PaymentsMonitoringStackProps extends StackProps {
  paymentsFn: IFunction
}

export class PaymentsMonitoringStack extends Stack {
  constructor(scope: Construct, id: string, props: PaymentsMonitoringStackProps) {
    super(scope, id, props)

    const scalingTopic = Topic.fromTopicArn(
      this,
      'scaling-alarms-topic',
      'arn:aws:sns:us-east-1:123456789012:scaling-alarms'
    )

    new MonitoringFacade(this, 'monitoring', {
      alarmFactoryDefaults: {
        alarmNamePrefix: 'payments',
        actionsEnabled: true,
        action: new SnsAlarmActionStrategy({
          onAlarmTopic: scalingTopic,
          // Required: publish OK transitions to the same topic so the
          // Scaling webhook can auto-resolve the incident on recovery.
          onOkTopic: scalingTopic,
        }),
      },
    })
      .monitorLambdaFunction({
        lambdaFunction: props.paymentsFn,
        addFaultRateAlarm: {
          Slo: {
            maxErrorRate: 0.01,
            alarmDescriptionOverride:
              '[P2] payments – Lambda unhandled error rate above 1% over 10 minutes.',
          },
        },
      })
  }
}
Every alarm produced through this facade — monitorLambdaFunction, monitorApiGateway, monitorSqsQueue, custom metrics — inherits both topics. You configure the wiring once.
SnsAlarmActionStrategy accepts onAlarmTopic, onOkTopic, and onInsufficientDataTopic independently. Omitting onOkTopic is the default — and the default is wrong for Scaling. Always set it explicitly.

Approach B: raw aws-cdk-lib/aws-cloudwatch

If you build Alarm instances directly, call both addAlarmAction and addOkAction with the same SnsAction. Scattering those two calls everywhere is error-prone, so wrap them in a small helper and route every alarm through it:
import {
  Alarm,
  type AlarmBase,
  ComparisonOperator,
} from 'aws-cdk-lib/aws-cloudwatch'
import { SnsAction } from 'aws-cdk-lib/aws-cloudwatch-actions'
import { Topic } from 'aws-cdk-lib/aws-sns'

const scalingTopic = Topic.fromTopicArn(
  this,
  'scaling-alarms-topic',
  'arn:aws:sns:us-east-1:123456789012:scaling-alarms'
)

const snsAction = new SnsAction(scalingTopic)

/**
 * Wire SNS to both ALARM and OK transitions so the Scaling webhook
 * can open AND auto-resolve the matching incident. Without
 * `addOkAction`, CloudWatch never publishes the recovery transition.
 *
 * The parameter is typed as AlarmBase (the shared base of Alarm and
 * CompositeAlarm) because the IAlarm interface does not declare
 * addAlarmAction or addOkAction.
 */
const routeAlarmToScaling = (alarm: AlarmBase): void => {
  alarm.addAlarmAction(snsAction)
  alarm.addOkAction(snsAction)
}

const dlqDepthAlarm = new Alarm(this, 'payments-dlq-depth', {
  alarmName: 'payments-dlq-depth',
  alarmDescription:
    '[P2] payments – DLQ has unprocessed messages. Handler failed beyond maxReceiveCount.',
  metric: dlq.metricApproximateNumberOfMessagesVisible(),
  threshold: 0,
  evaluationPeriods: 1,
  comparisonOperator: ComparisonOperator.GREATER_THAN_THRESHOLD,
})

routeAlarmToScaling(dlqDepthAlarm)
With the helper in place, the next person reviewing your stack only has to check that every alarm flows through routeAlarmToScaling, not that both methods are called on each alarm individually.

Set severity from the alarm description

Scaling parses an optional severity marker from the alarm’s AlarmDescription field and uses it for the opened incident. The recognised prefixes are:
Prefix | Incident severity
[P1]   | critical
[P2]   | high
[P3]   | medium
[P4]   | low
If the description is empty or has no recognised prefix, the incident defaults to high. The marker can appear anywhere in the description, but convention is to put it at the start so the level is obvious in CloudWatch too:
new Alarm(this, 'api-latency-p95', {
  alarmDescription:
    '[P3] payments – API p95 latency above 2000ms over 10 minutes. Check cold starts and DB query times.',
  // ...metric, threshold, etc.
})
The full description (including the marker) is also forwarded into the incident’s body so responders see the original CloudWatch context.
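If you want to lint your descriptions before deploying, the mapping above can be sketched as a small function. This is a hypothetical illustration, not Scaling's parser: the name severityFromDescription is made up, and it assumes the marker is case-sensitive and that the first recognised marker wins when a description contains more than one.

```typescript
type Severity = 'critical' | 'high' | 'medium' | 'low'

// Mapping taken from the table above.
const SEVERITY_BY_MARKER: Record<string, Severity> = {
  P1: 'critical',
  P2: 'high',
  P3: 'medium',
  P4: 'low',
}

// Hypothetical helper. Assumptions: case-sensitive markers, first match wins.
function severityFromDescription(description: string | undefined): Severity {
  // The marker may appear anywhere in the description, so search the whole string.
  const match = /\[(P[1-4])\]/.exec(description ?? '')
  const marker = match?.[1]
  if (marker === undefined) return 'high' // documented default
  return SEVERITY_BY_MARKER[marker] ?? 'high'
}

console.log(severityFromDescription('[P3] payments - API p95 latency above 2000ms'))
console.log(severityFromDescription(undefined))
```

A unit test over your alarm descriptions using a function like this catches a mistyped marker such as [P5] or (P2), which would otherwise silently fall back to high.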

Verification

After cdk deploy, confirm the OK transition is actually wired before you trust the auto-resolve path.
1. Check the synthesized template

Run cdk synth and inspect the generated CloudFormation. Every AWS::CloudWatch::Alarm resource should have both AlarmActions and OKActions populated with your SNS topic ARN:
Type: AWS::CloudWatch::Alarm
Properties:
  AlarmActions:
    - arn:aws:sns:us-east-1:123456789012:scaling-alarms
  OKActions:
    - arn:aws:sns:us-east-1:123456789012:scaling-alarms
If OKActions is missing or empty on any alarm, that alarm’s incident will never auto-resolve.
2. Confirm in the AWS console

Open any deployed alarm in the CloudWatch console. Under Actions, the In alarm and OK rows should both list your SNS topic. Insufficient data can be left empty — the integration ignores those notifications by design.
3. Force a transition end-to-end

Temporarily lower an alarm’s threshold (or trigger the underlying condition) so it transitions OK → ALARM. Within a few seconds a new incident appears in Scaling on the mapped component. Restore the threshold so the alarm returns to OK — the open incident should auto-resolve.
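The template check described above can also be automated so it runs in CI rather than by eye. The following is a minimal sketch under stated assumptions: alarmsMissingActions is a hypothetical helper, it only inspects AWS::CloudWatch::Alarm resources (not composite alarms), and it treats any non-empty AlarmActions and OKActions array as correctly wired without checking which topic it points at.

```typescript
interface TemplateResource {
  Type: string
  Properties?: Record<string, unknown>
}

interface Template {
  Resources?: Record<string, TemplateResource>
}

function isNonEmptyArray(value: unknown): boolean {
  return Array.isArray(value) && value.length > 0
}

// Hypothetical helper: returns the logical IDs of alarms that lack
// either AlarmActions or OKActions in a synthesized template.
function alarmsMissingActions(template: Template): string[] {
  return Object.entries(template.Resources ?? {})
    .filter(([, resource]) => resource.Type === 'AWS::CloudWatch::Alarm')
    .filter(([, resource]) => {
      const props = resource.Properties ?? {}
      return (
        !isNonEmptyArray(props['AlarmActions']) ||
        !isNonEmptyArray(props['OKActions'])
      )
    })
    .map(([logicalId]) => logicalId)
}

// Small in-memory template standing in for real cdk synth output.
const topicArn = 'arn:aws:sns:us-east-1:123456789012:scaling-alarms'
const sample: Template = {
  Resources: {
    Wired: {
      Type: 'AWS::CloudWatch::Alarm',
      Properties: { AlarmActions: [topicArn], OKActions: [topicArn] },
    },
    NoOk: {
      Type: 'AWS::CloudWatch::Alarm',
      Properties: { AlarmActions: [topicArn] },
    },
    AlarmsTopic: { Type: 'AWS::SNS::Topic' },
  },
}

const missing = alarmsMissingActions(sample)
console.log(missing)
```

To run it against a real stack, parse the JSON template that cdk synth writes under cdk.out and pass the result to the function. Failing the build when the returned list is non-empty keeps a forgotten OK action from reaching production.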

Common pitfalls

Symptom | Cause
Incident opens but never auto-resolves | addOkAction / onOkTopic is missing on the alarm, so CloudWatch isn’t publishing the recovery transition.
Notifications rejected with topic_arn_mismatch | The integration is pinned to a different topic ARN than the one your CDK is publishing through. Pinning happens on the first valid SubscriptionConfirmation. Either point your CDK at the pinned topic, or delete and recreate the integration.
Subscription stays in “Pending confirmation” | Two common causes. (1) The topic still publishes SignatureVersion=1: set signatureVersion: '2' on the Topic (or run aws sns set-topic-attributes on the existing topic) and Scaling will accept the next confirmation. (2) The webhook URL has trailing whitespace or path encoding issues from a copy-paste: recreate the subscription cleanly from the topic ARN.
Alarm spends time in INSUFFICIENT_DATA | Expected. The integration ignores INSUFFICIENT_DATA notifications by design, so you don’t need to suppress them. Configure treatMissingData based on your metric, not based on Scaling.
Same alarm opens multiple incidents | The original ALARM never recovered to OK, or OK actions aren’t wired. Re-evaluations don’t open a new incident while the previous one is still open, but a missed OK followed by a re-delivered ALARM can look like a duplicate. Verify OKActions in CloudFormation first.
Scaling enforces topic-ARN pinning per integration: the first valid delivery decides which topic ARN is allowed, and every subsequent delivery’s TopicArn is checked against the pinned value. If you migrate to a new topic, recreate the integration so the pin can be re-set.
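The pinning rule can be pictured with a short model. This is an illustrative sketch only, not Scaling's code: the Integration shape and the checkTopicArn name are assumptions, and the sketch assumes the delivery's signature has already been validated before the check runs.

```typescript
// Hypothetical shape: an integration starts unpinned.
interface Integration {
  pinnedTopicArn: string | null
}

type DeliveryResult = 'accepted' | 'topic_arn_mismatch'

// Illustrative model of pinning: the first valid delivery sets the pin,
// every later delivery must carry the same TopicArn.
function checkTopicArn(integration: Integration, topicArn: string): DeliveryResult {
  if (integration.pinnedTopicArn === null) {
    integration.pinnedTopicArn = topicArn
    return 'accepted'
  }
  return integration.pinnedTopicArn === topicArn ? 'accepted' : 'topic_arn_mismatch'
}

const integration: Integration = { pinnedTopicArn: null }
const oldTopic = 'arn:aws:sns:us-east-1:123456789012:scaling-alarms'
const newTopic = 'arn:aws:sns:us-east-1:123456789012:scaling-alarms-v2'

const first = checkTopicArn(integration, oldTopic) // pins oldTopic
const second = checkTopicArn(integration, oldTopic) // same topic
const migrated = checkTopicArn(integration, newTopic) // different topic
console.log(first, second, migrated)
```

The last call shows why a topic migration needs a recreated integration: nothing in the delivery path moves the pin once it is set.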

See also