Deploying to ECS
Use Self-Hosted SigNoz if you need:
- Full on-prem control
- Custom build tweaks
- Zero outbound traffic
Otherwise, try SigNoz Cloud for:
- Onboarding in 5 minutes
- Auto-scaling & automatic upgrades
- Zero ops maintenance
Amazon ECS gives you a fully managed control plane for orchestrating containers, while EC2‑backed capacity providers let you tie an Auto Scaling Group (ASG) to the cluster so that your compute pool grows and shrinks with demand. Running SigNoz on this foundation combines open‑source observability with native AWS elasticity and cost control:
- ECS (EC2 launch type) — lets you use custom AMIs, instance sizes, or Spot instances for heavier ClickHouse workloads.
- Capacity provider — automatically adds/removes EC2 instances as task demand fluctuates (no manual ASG tweaking).
SigNoz components:
- SigNoz
- SigNoz Collector
- ClickHouse
- ZooKeeper
1. Prerequisites
Before you begin, ensure you have the following in place:
- AWS CLI v2 installed and configured with appropriate credentials and region.
- An AWS account with adequate service quotas for:
- Amazon EBS (block storage)
- Amazon EC2 (compute instances)
- IAM (identity & access management)
- Amazon ECS cluster already created, with Capacity Providers enabled (EC2 + Auto Scaling Group)
- Read more: ECS Capacity Providers
- IAM roles for your tasks:
- Task Execution Role with the managed policy AmazonECSTaskExecutionRolePolicy
- Task Role for SigNoz containers (attach only the permissions your services need)
- SigNoz container images (signoz/*:<version>) pushed to a registry reachable by ECS (e.g. Amazon ECR or Docker Hub)
- A private S3 bucket (optional) for storing custom ClickHouse configs or backups
- A Custom AMI or Launch Template (optional) if you need custom kernel settings or plan to use Spot instances
- (Optional) Amazon EFS or EBS volumes attached & mounted on your ECS instances for persistent data (ClickHouse, dashboards, configs)
EBS sizing note: SigNoz stores metrics and traces in ClickHouse. Start with a gp3 volume (20 GiB, 3,000 IOPS, 125 MiB/s) and monitor usage to scale as needed.
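If you provision that data volume yourself, a minimal AWS CLI sketch (the availability zone and tag value are placeholders):
aws ec2 create-volume \
  --volume-type gp3 \
  --size 20 \
  --iops 3000 \
  --throughput 125 \
  --availability-zone <AZ1> \
  --tag-specifications 'ResourceType=volume,Tags=[{Key=Name,Value=signoz-clickhouse-data}]'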
2. Network Stack
2.1 CloudFormation Template
Save the JSON below as sigz-network.json.
{
"AWSTemplateFormatVersion": "2010-09-09",
"Description": "Network stack for SigNoz on ECS (generic)",
"Parameters": {
"VpcCidr": {"Type": "String", "Description": "CIDR for the VPC (e.g., 10.0.0.0/16)"},
"PublicSubnet1Cidr": {"Type": "String"},
"PublicSubnet2Cidr": {"Type": "String"},
"PrivateSubnet1Cidr": {"Type": "String"},
"PrivateSubnet2Cidr": {"Type": "String"},
"AZ1": {"Type": "AWS::EC2::AvailabilityZone::Name"},
"AZ2": {"Type": "AWS::EC2::AvailabilityZone::Name"}
},
"Resources": {
"Vpc": {
"Type": "AWS::EC2::VPC",
"Properties": {
"CidrBlock": {"Ref": "VpcCidr"},
"EnableDnsSupport": true,
"EnableDnsHostnames": true,
"Tags": [{"Key": "Name", "Value": "sigz-vpc"}]
}
},
"InternetGateway": {"Type": "AWS::EC2::InternetGateway"},
"VPCGatewayAttachment": {
"Type": "AWS::EC2::VPCGatewayAttachment",
"Properties": {
"VpcId": {"Ref": "Vpc"},
"InternetGatewayId": {"Ref": "InternetGateway"}
}
},
"PublicSubnet1": {
"Type": "AWS::EC2::Subnet",
"Properties": {
"VpcId": {"Ref": "Vpc"},
"CidrBlock": {"Ref": "PublicSubnet1Cidr"},
"AvailabilityZone": {"Ref": "AZ1"},
"MapPublicIpOnLaunch": true,
"Tags": [{"Key": "Name", "Value": "sigz-pub-1"}]
}
},
"PublicSubnet2": {
"Type": "AWS::EC2::Subnet",
"Properties": {
"VpcId": {"Ref": "Vpc"},
"CidrBlock": {"Ref": "PublicSubnet2Cidr"},
"AvailabilityZone": {"Ref": "AZ2"},
"MapPublicIpOnLaunch": true,
"Tags": [{"Key": "Name", "Value": "sigz-pub-2"}]
}
},
"PrivateSubnet1": {
"Type": "AWS::EC2::Subnet",
"Properties": {
"VpcId": {"Ref": "Vpc"},
"CidrBlock": {"Ref": "PrivateSubnet1Cidr"},
"AvailabilityZone": {"Ref": "AZ1"},
"Tags": [{"Key": "Name", "Value": "sigz-priv-1"}]
}
},
"PrivateSubnet2": {
"Type": "AWS::EC2::Subnet",
"Properties": {
"VpcId": {"Ref": "Vpc"},
"CidrBlock": {"Ref": "PrivateSubnet2Cidr"},
"AvailabilityZone": {"Ref": "AZ2"},
"Tags": [{"Key": "Name", "Value": "sigz-priv-2"}]
}
},
"EIPNat": {"Type": "AWS::EC2::EIP", "Properties": {"Domain": "vpc"}},
"NatGateway": {
"Type": "AWS::EC2::NatGateway",
"Properties": {
"SubnetId": {"Ref": "PublicSubnet1"},
"AllocationId": {"Fn::GetAtt": ["EIPNat", "AllocationId"]},
"Tags": [{"Key": "Name", "Value": "sigz-nat"}]
}
},
"PublicRouteTable": {
"Type": "AWS::EC2::RouteTable",
"Properties": {"VpcId": {"Ref": "Vpc"}}
},
"PublicRoute": {
"Type": "AWS::EC2::Route",
"DependsOn": "VPCGatewayAttachment",
"Properties": {
"RouteTableId": {"Ref": "PublicRouteTable"},
"DestinationCidrBlock": "0.0.0.0/0",
"GatewayId": {"Ref": "InternetGateway"}
}
},
"PrivateRouteTable": {"Type": "AWS::EC2::RouteTable", "Properties": {"VpcId": {"Ref": "Vpc"}}},
"PrivateRoute": {
"Type": "AWS::EC2::Route",
"Properties": {
"RouteTableId": {"Ref": "PrivateRouteTable"},
"DestinationCidrBlock": "0.0.0.0/0",
"NatGatewayId": {"Ref": "NatGateway"}
}
},
"PubAssoc1": {
"Type": "AWS::EC2::SubnetRouteTableAssociation",
"Properties": {"SubnetId": {"Ref": "PublicSubnet1"}, "RouteTableId": {"Ref": "PublicRouteTable"}}
},
"PubAssoc2": {
"Type": "AWS::EC2::SubnetRouteTableAssociation",
"Properties": {"SubnetId": {"Ref": "PublicSubnet2"}, "RouteTableId": {"Ref": "PublicRouteTable"}}
},
"PrivAssoc1": {
"Type": "AWS::EC2::SubnetRouteTableAssociation",
"Properties": {"SubnetId": {"Ref": "PrivateSubnet1"}, "RouteTableId": {"Ref": "PrivateRouteTable"}}
},
"PrivAssoc2": {
"Type": "AWS::EC2::SubnetRouteTableAssociation",
"Properties": {"SubnetId": {"Ref": "PrivateSubnet2"}, "RouteTableId": {"Ref": "PrivateRouteTable"}}
}
},
"Outputs": {
"VpcId": {"Value": {"Ref": "Vpc"}},
"PublicSubnetIds": {"Value": {"Fn::Join": [",", [{"Ref": "PublicSubnet1"}, {"Ref": "PublicSubnet2"}]]}},
"PrivateSubnetIds": {"Value": {"Fn::Join": [",", [{"Ref": "PrivateSubnet1"}, {"Ref": "PrivateSubnet2"}]]}}
}
}
2.2 Create the Stack
aws cloudformation deploy \
--template-file sigz-network.json \
--stack-name sigz-network \
--parameter-overrides \
VpcCidr=<VPC_CIDR> \
PublicSubnet1Cidr=<PUB1_CIDR> PublicSubnet2Cidr=<PUB2_CIDR> \
PrivateSubnet1Cidr=<PRIV1_CIDR> PrivateSubnet2Cidr=<PRIV2_CIDR> \
AZ1=<AZ1> AZ2=<AZ2> \
--capabilities CAPABILITY_NAMED_IAM
Note on NAT traffic: Private subnets route outbound traffic through the NAT Gateway. Estimate ~5 GB/mo per ClickHouse backup when sizing NAT data-processing costs.
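Once the stack is created, pull its outputs to fill the VPC and <PrivateSubnet…> placeholders used in later steps:
aws cloudformation describe-stacks \
  --stack-name sigz-network \
  --query "Stacks[0].Outputs" \
  --output table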
3. IAM Roles & Policies
3.1 Trust Policy
Save the policy below as ecs-trust.json; the commands in the next steps reference it.
{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Principal": {"Service": "ecs-tasks.amazonaws.com"},
"Action": "sts:AssumeRole"
}]
}
3.2 Execution Role (CLI)
aws iam create-role --role-name <SigNozECSTaskExecutionRole> --assume-role-policy-document file://ecs-trust.json
aws iam attach-role-policy --role-name <SigNozECSTaskExecutionRole> --policy-arn arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy
3.3 Task Role (least privilege)
aws iam create-role --role-name <SigNozTaskRole> --assume-role-policy-document file://ecs-trust.json
# Add permissions only if SigNoz needs them.
Console path: IAM → Roles → Create role → Trusted entity = ECS Task → attach policies.
4. Launch Template, ASG, and Capacity Provider
4.1 Launch Template
Include user data that installs (or, on the ECS-optimized AMI, simply configures) the ECS agent and joins the instance to the cluster; a minimal sketch follows.
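A minimal user-data sketch, assuming the ECS-optimized AMI (where the agent is preinstalled and only needs the cluster name):
#!/bin/bash
# Point the preinstalled ECS agent at your cluster.
echo "ECS_CLUSTER=<ECS_CLUSTER>" >> /etc/ecs/ecs.config
Base64-encode the script for the UserData field, e.g. base64 -w0 user-data.sh (GNU coreutils).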
aws ec2 create-launch-template \
--launch-template-name <SigNozLaunchTemplate> \
--version-description initial \
--launch-template-data '{
"ImageId": "<AMI_ID>",
"InstanceType": "t3.large",
"IamInstanceProfile": {"Arn": "arn:aws:iam::<ACCOUNT_ID>:instance-profile/<ECSInstanceProfile>"},
"UserData": "<BASE64-ENCODED-USERDATA>"
}'
(Console: EC2 → Launch templates → Create launch template.)
4.2 Auto Scaling Group (ASG)
aws autoscaling create-auto-scaling-group \
--auto-scaling-group-name <SigNozASG> \
--launch-template LaunchTemplateName=<SigNozLaunchTemplate>,Version='$Latest' \
--vpc-zone-identifier "<PrivateSubnet1>,<PrivateSubnet2>" \
--desired-capacity 2 --min-size 2 --max-size 6
4.3 Capacity Provider
aws ecs create-capacity-provider \
--name <SigNozCapProvider> \
--auto-scaling-group-provider '{"autoScalingGroupArn":"<ASG_ARN>","managedScaling":{"status":"ENABLED","targetCapacity":80}}'
aws ecs put-cluster-capacity-providers \
--cluster <ECS_CLUSTER> \
--capacity-providers <SigNozCapProvider> \
--default-capacity-provider-strategy capacityProvider=<SigNozCapProvider>,weight=1,base=0
(Console: ECS → Clusters → Capacity providers.)
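Verify the capacity provider before creating services:
aws ecs describe-capacity-providers --capacity-providers <SigNozCapProvider>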
5. Task Definitions
Below is a single, all-in-one ECS task definition that runs every SigNoz component in one task:
- init-clickhouse
- config-fetcher
- clickhouse
- zookeeper-1
- signoz
- otel-collector
- schema-migrator-sync and schema-migrator-async
Each section below shows one containerDefinition snippet. Replace all <…> placeholders with your own values.
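All containers below log to the same CloudWatch log group; create it once up front:
aws logs create-log-group --log-group-name /aws/ecs/<LOG_GROUP> --region <AWS_REGION>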
1. init-clickhouse
{
"name": "init-clickhouse",
"image": "clickhouse/clickhouse-server:24.1.2-alpine",
"cpu": 256,
"memory": 256,
"memoryReservation": 256,
"essential": false,
"entryPoint": ["sh", "-c"],
"command": [
"version=\"v0.0.1\" && node_os=$(uname -s | tr '[:upper:]' '[:lower:]') && node_arch=$(uname -m | sed s/aarch64/arm64/ | sed s/x86_64/amd64/) && \\
echo \"Fetching histogram-binary for ${node_os}/${node_arch}\" && cd /tmp && \\
wget -O histogram-quantile.tar.gz \"https://github.com/SigNoz/signoz/releases/download/histogram-quantile%2F${version}/histogram-quantile_${node_os}_${node_arch}.tar.gz\" && \\
tar -xvzf histogram-quantile.tar.gz && mv histogram-quantile /var/lib/clickhouse/user_scripts/histogramQuantile"
],
"mountPoints": [
{
"sourceVolume": "signoz-clickhouse-zookeeper",
"containerPath": "/var/lib/clickhouse/user_scripts"
}
],
"logConfiguration": {
"logDriver": "awslogs",
"options": {
"awslogs-group": "/aws/ecs/<LOG_GROUP>",
"awslogs-region": "<AWS_REGION>",
"awslogs-stream-prefix": "init-clickhouse"
}
}
}
2. config-fetcher
{
"name": "config-fetcher",
"image": "amazon/aws-cli:2.27.32",
"cpu": 10,
"memory": 102,
"memoryReservation": 102,
"essential": false,
"entryPoint": ["aws"],
"command": [
"s3", "cp",
"s3://<YOUR_CONFIG_BUCKET>/common/clickhouse/config.xml",
"/etc/clickhouse-server/config.xml"
],
"mountPoints": [
{
"sourceVolume": "signoz-clickhouse-zookeeper",
"containerPath": "/etc/clickhouse-server"
}
],
"logConfiguration": {
"logDriver": "awslogs",
"options": {
"awslogs-group": "/aws/ecs/<LOG_GROUP>",
"awslogs-region": "<AWS_REGION>",
"awslogs-stream-prefix": "config-fetcher"
}
}
}
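For this sidecar to succeed, the ClickHouse config must already exist at the S3 path it copies from; upload yours first:
aws s3 cp config.xml s3://<YOUR_CONFIG_BUCKET>/common/clickhouse/config.xml
The Task Role from step 3.3 also needs s3:GetObject on this bucket.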
3. clickhouse
{
"name": "clickhouse",
"image": "clickhouse/clickhouse-server:24.1.2-alpine",
"cpu": 1024,
"memory": 512,
"memoryReservation": 512,
"essential": true,
"portMappings": [
{ "containerPort": 9000, "hostPort": 9000, "protocol": "tcp" },
{ "containerPort": 8123, "hostPort": 8123, "protocol": "tcp" },
{ "containerPort": 9363, "hostPort": 9363, "protocol": "tcp" }
],
"dependsOn": [
{ "containerName": "init-clickhouse", "condition": "COMPLETE" },
{ "containerName": "zookeeper-1", "condition": "HEALTHY" },
{ "containerName": "config-fetcher", "condition": "COMPLETE" }
],
"mountPoints": [
{
"sourceVolume": "signoz-clickhouse-zookeeper",
"containerPath": "/var/lib/clickhouse"
},
{
"sourceVolume": "signoz-clickhouse-zookeeper",
"containerPath": "/etc/clickhouse-server"
}
],
"healthCheck": {
"command": ["CMD-SHELL", "wget --spider -q 0.0.0.0:8123/ping || exit 1"],
"interval": 30,
"timeout": 5,
"retries": 3,
"startPeriod": 30
},
"logConfiguration": {
"logDriver": "awslogs",
"options": {
"awslogs-group": "/aws/ecs/<LOG_GROUP>",
"awslogs-region": "<AWS_REGION>",
"awslogs-stream-prefix": "clickhouse"
}
}
}
4. zookeeper-1
{
"name": "zookeeper-1",
"image": "bitnami/zookeeper:3.7.1",
"cpu": 512,
"memory": 512,
"memoryReservation": 512,
"essential": true,
"portMappings": [
{ "containerPort": 2181, "hostPort": 2181, "protocol": "tcp" },
{ "containerPort": 2888, "hostPort": 2888, "protocol": "tcp" },
{ "containerPort": 3888, "hostPort": 3888, "protocol": "tcp" },
{ "containerPort": 9141, "hostPort": 9141, "protocol": "tcp" }
],
"environment": [
{ "name": "ALLOW_ANONYMOUS_LOGIN", "value": "yes" },
{ "name": "ZOO_SERVER_ID", "value": "1" },
{ "name": "ZOO_ENABLE_PROMETHEUS_METRICS","value": "yes" },
{ "name": "ZOO_AUTOPURGE_INTERVAL", "value": "1" },
{ "name": "ZOO_PROMETHEUS_METRICS_PORT_NUMBER","value":"9141"}
],
"healthCheck": {
"command": ["CMD-SHELL","curl -s -m 2 http://localhost:8080/commands/ruok | grep error | grep null"],
"interval": 30,
"timeout": 5,
"retries": 3,
"startPeriod": 30
},
"logConfiguration": {
"logDriver": "awslogs",
"options": {
"awslogs-group": "/aws/ecs/<LOG_GROUP>",
"awslogs-region": "<AWS_REGION>",
"awslogs-stream-prefix": "zookeeper"
}
}
}
5. signoz
{
"name": "signoz",
"image": "signoz/signoz:<VERSION>",
"cpu": 512,
"memory": 512,
"memoryReservation": 256,
"essential": true,
"portMappings": [
{ "containerPort": 8080, "hostPort": 8080, "protocol": "tcp" }
],
"command": ["--config=/root/config/prometheus.yml"],
"environment": [
{ "name": "SIGNOZ_ALERTMANAGER_PROVIDER", "value": "signoz" },
{ "name": "SIGNOZ_TELEMETRYSTORE_CLICKHOUSE_DSN","value": "tcp://clickhouse:9000" },
{ "name": "SIGNOZ_SQLSTORE_SQLITE_PATH", "value": "/var/lib/signoz/signoz.db" },
{ "name": "STORAGE", "value": "clickhouse" },
{ "name": "TELEMETRY_ENABLED", "value": "true" },
],
"mountPoints": [
{ "sourceVolume": "signoz-config", "containerPath": "/root/config" },
{ "sourceVolume": "signoz-sqlite", "containerPath": "/var/lib/signoz" }
],
"healthCheck": {
"command": ["CMD","wget","--spider","-q","localhost:8080/api/v1/health"],
"interval": 30,
"timeout": 5,
"retries": 3,
"startPeriod": 0
},
"logConfiguration": {
"logDriver": "awslogs",
"options": {
"awslogs-group": "/aws/ecs/<LOG_GROUP>",
"awslogs-region": "<AWS_REGION>",
"awslogs-stream-prefix": "signoz"
}
}
}
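Once the task is running, you can check the UI container from a host inside the VPC the same way its health check does (the host placeholder is wherever the task landed):
curl -s http://<SIGNOZ_HOST>:8080/api/v1/health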
6. otel-collector
{
"name": "otel-collector",
"image": "signoz/signoz-otel-collector:<OTELCOL_TAG>",
"cpu": 512,
"memory": 512,
"memoryReservation": 256,
"essential": false,
"portMappings": [
{ "containerPort": 4317, "hostPort": 4317, "protocol": "tcp" },
{ "containerPort": 4318, "hostPort": 4318, "protocol": "tcp" }
],
"command": [
"--config=/etc/otel-collector-config.yaml",
"--manager-config=/etc/manager-config.yaml",
"--copy-path=/var/tmp/collector-config.yaml",
"--feature-gates=-pkg.translator.prometheus.NormalizeName"
],
"dependsOn": [
{ "containerName": "signoz", "condition": "HEALTHY" }
],
"mountPoints": [
{ "sourceVolume": "otel-config", "containerPath": "/etc" }
],
"logConfiguration": {
"logDriver": "awslogs",
"options": {
"awslogs-group": "/aws/ecs/<LOG_GROUP>",
"awslogs-region": "<AWS_REGION>",
"awslogs-stream-prefix": "otel-collector"
}
}
}
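Applications then export telemetry to the collector's OTLP ports, for example via the standard OpenTelemetry SDK environment variable:
export OTEL_EXPORTER_OTLP_ENDPOINT="http://<COLLECTOR_HOST>:4318"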
7. schema-migrator-sync
{
"name": "schema-migrator-sync",
"image": "signoz/signoz-schema-migrator:<OTELCOL_TAG>",
"cpu": 128,
"memory": 256,
"memoryReservation": 128,
"essential": false,
"command": ["sync", "--dsn=tcp://clickhouse:9000", "--up="],
"dependsOn": [
{ "containerName": "clickhouse", "condition": "HEALTHY" }
],
"logConfiguration": {
"logDriver": "awslogs",
"options": {
"awslogs-group": "/aws/ecs/<LOG_GROUP>",
"awslogs-region": "<AWS_REGION>",
"awslogs-stream-prefix": "schema-migrator-sync"
}
}
}
8. schema-migrator-async
{
"name": "schema-migrator-async",
"image": "signoz/signoz-schema-migrator:<OTELCOL_TAG>",
"cpu": 128,
"memory": 256,
"memoryReservation": 128,
"essential": false,
"command": ["async", "--dsn=tcp://clickhouse:9000", "--up="],
"logConfiguration": {
"logDriver": "awslogs",
"options": {
"awslogs-group": "/aws/ecs/<LOG_GROUP>",
"awslogs-region": "<AWS_REGION>",
"awslogs-stream-prefix": "schema-migrator-async"
}
}
}
9. Volumes
The task-level volumes array must declare every sourceVolume referenced above; besides the one shown here, the container definitions also mount signoz-config, signoz-sqlite, and otel-config.
"volumes": [
{
"name": "signoz-clickhouse-zookeeper",
"configuredAtLaunch": true
}
],
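A minimal sketch for the remaining entries, assuming plain host paths on the EC2 instance (the paths are placeholders; swap in your EFS/EBS configuration if you provisioned it in the prerequisites):
{ "name": "signoz-config", "host": { "sourcePath": "/opt/signoz/config" } },
{ "name": "signoz-sqlite", "host": { "sourcePath": "/opt/signoz/sqlite" } },
{ "name": "otel-config", "host": { "sourcePath": "/opt/signoz/otel" } }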
10. General Task Definition config
"taskDefinition": {
"family": "signoz-full-stack",
"taskRoleArn": "arn:aws:iam::<AWS_ACCOUNT_ID>:role/ecsTaskRole",
"executionRoleArn": "arn:aws:iam::<AWS_ACCOUNT_ID>:role/ecsTaskExecutionRole",
"networkMode": "awsvpc",
"requiresCompatibilities": ["EC2"],
"runtimePlatform": {
"cpuArchitecture": "X86_64",
"operatingSystemFamily": "LINUX"
},
"cpu": "1843",
"memory": "1536",
"containerDefinitions": [ ... ],
"volumes": [ ... ]
}
6. Register the Task Definition
aws ecs register-task-definition --cli-input-json file://signoz-full-stack.json
# repeat if you split components across multiple task definitions
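Confirm the registration and note the revision for the next step:
aws ecs describe-task-definition --task-definition signoz-full-stack --query "taskDefinition.revision"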
7. Create the ECS Service
Create a service that runs the task definition, using the capacity provider strategy.
aws ecs create-service \
--cluster <ECS_CLUSTER> \
--service-name signoz-svc \
--task-definition signoz-full-stack:<REV> \
--desired-count 1 \
--capacity-provider-strategy capacityProvider=<SigNozCapProvider>,weight=1,base=0 \
--network-configuration "awsvpcConfiguration={subnets=[<PrivateSubnet1>,<PrivateSubnet2>],securityGroups=[<SG_ID>],assignPublicIp=DISABLED}" \
--enable-execute-command \
--deployment-circuit-breaker enable=true,rollback=true
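Watch the rollout while the service stabilizes:
aws ecs describe-services \
  --cluster <ECS_CLUSTER> \
  --services signoz-svc \
  --query "services[0].{status:status,running:runningCount,events:events[0:3].message}"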
8. Scaling & Capacity
| Layer | Metric | Action |
|---|---|---|
| Capacity Provider (ASG) | Average CPU > 70% | Scale out by one EC2 instance |
| ECS Service | CPUUtilization / MemoryUtilization | Adjust desiredCount (min 1 → max N) |
| ClickHouse EBS | IOPS / throughput saturation | Raise gp3 IOPS or volume size |
Managed Scaling on the capacity provider handles most scale‑out events automatically.
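For the service layer, a sketch using Application Auto Scaling target tracking on average CPU (cluster and service names are the ones used above):
aws application-autoscaling register-scalable-target \
  --service-namespace ecs \
  --scalable-dimension ecs:service:DesiredCount \
  --resource-id service/<ECS_CLUSTER>/signoz-svc \
  --min-capacity 1 --max-capacity 4

aws application-autoscaling put-scaling-policy \
  --service-namespace ecs \
  --scalable-dimension ecs:service:DesiredCount \
  --resource-id service/<ECS_CLUSTER>/signoz-svc \
  --policy-name signoz-cpu-target \
  --policy-type TargetTrackingScaling \
  --target-tracking-scaling-policy-configuration '{"TargetValue":70.0,"PredefinedMetricSpecification":{"PredefinedMetricType":"ECSServiceAverageCPUUtilization"}}'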
9. Cleanup Commands (CLI)
# Scale down all services
aws ecs update-service --cluster <ECS_CLUSTER> --service <SERVICE> --desired-count 0
# Delete services (after tasks stop)
aws ecs delete-service --cluster <ECS_CLUSTER> --service <SERVICE> --force
# Optionally delete capacity provider, cluster, ASG, and network stack
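One workable teardown order for those remaining resources (services must already be deleted, as above):
# Terminate the container instances first
aws autoscaling delete-auto-scaling-group --auto-scaling-group-name <SigNozASG> --force-delete
# Then the cluster, its capacity provider, and the network stack
aws ecs delete-cluster --cluster <ECS_CLUSTER>
aws ecs delete-capacity-provider --capacity-provider <SigNozCapProvider>
aws cloudformation delete-stack --stack-name sigz-network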