Scaling Workflow
Kubernetes scaling strategies including manual scaling, Horizontal Pod Autoscaling (HPA), and Vertical Pod Autoscaling (VPA).
When to Use
- Scaling applications based on load
- Setting up autoscaling policies
- Optimizing resource utilization
- Handling traffic spikes
Manual Scaling
Scale Deployment
# Scale to specific number of replicas
kubectl scale deployment myapp --replicas=5
# Scale multiple deployments
kubectl scale deployment myapp myapp2 --replicas=3
# Scale with label selector
kubectl scale deployment -l app=myapp --replicas=3
# Scale replica set
kubectl scale rs myapp-rs --replicas=5
# Scale stateful set
kubectl scale statefulset postgres --replicas=3
Verify Scaling
# Check deployment status
kubectl get deployment myapp
# Watch scaling progress
kubectl get pods -l app=myapp -w
# Check replica set
kubectl get rs -l app=myapp
Horizontal Pod Autoscaler (HPA)
Prerequisites
Install metrics-server:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
Verify metrics-server:
kubectl get deployment metrics-server -n kube-system
kubectl top nodes
kubectl top pods
Create HPA (CPU-based)
Imperative:
# Create HPA based on CPU utilization
kubectl autoscale deployment myapp --cpu-percent=70 --min=2 --max=10
# View HPA
kubectl get hpa myapp
# Describe HPA
kubectl describe hpa myapp
Declarative:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: myapp-hpa
namespace: production
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: myapp
minReplicas: 2
maxReplicas: 10
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
HPA with Multiple Metrics
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: myapp-hpa
namespace: production
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: myapp
minReplicas: 2
maxReplicas: 20
behavior:
scaleDown:
stabilizationWindowSeconds: 300
policies:
- type: Percent
value: 50
periodSeconds: 60
- type: Pods
value: 2
periodSeconds: 60
selectPolicy: Min
scaleUp:
stabilizationWindowSeconds: 0
policies:
- type: Percent
value: 100
periodSeconds: 30
- type: Pods
value: 4
periodSeconds: 30
selectPolicy: Max
metrics:
# CPU utilization
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
# Memory utilization
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80
# Custom metric (requests per second)
- type: Pods
pods:
metric:
name: http_requests_per_second
target:
type: AverageValue
averageValue: "1000"
HPA Based on Custom Metrics
Prometheus adapter example:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: myapp-hpa-custom
namespace: production
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: myapp
minReplicas: 2
maxReplicas: 10
metrics:
- type: Pods
pods:
metric:
name: http_requests_per_second
target:
type: AverageValue
averageValue: "1000"
- type: Pods
pods:
metric:
name: queue_length
target:
type: AverageValue
averageValue: "30"
Monitor HPA
# Get HPA status
kubectl get hpa
# Watch HPA in real-time
kubectl get hpa -w
# Describe HPA for details
kubectl describe hpa myapp-hpa
# View events
kubectl get events --sort-by=.metadata.creationTimestamp | grep HorizontalPodAutoscaler
# Check current metrics
kubectl top pods -l app=myapp
Test HPA
Load testing:
# Generate load with kubectl run
kubectl run -i --tty load-generator --rm --image=busybox --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://myapp; done"
# Watch HPA scale up
kubectl get hpa myapp-hpa -w
# Watch pods being created
kubectl get pods -l app=myapp -w
Delete HPA
# Delete HPA
kubectl delete hpa myapp-hpa
# Deployment will maintain last replica count
kubectl scale deployment myapp --replicas=3
Vertical Pod Autoscaler (VPA)
Install VPA
# Clone VPA repository
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
# Install VPA
./hack/vpa-up.sh
# Verify installation
kubectl get deployment -n kube-system | grep vpa
Create VPA
Recommendation mode (does not modify pods):
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
name: myapp-vpa
namespace: production
spec:
targetRef:
apiVersion: apps/v1
kind: Deployment
name: myapp
updatePolicy:
updateMode: "Off" # Recommendation only
Auto mode (updates pods):
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
name: myapp-vpa
namespace: production
spec:
targetRef:
apiVersion: apps/v1
kind: Deployment
name: myapp
updatePolicy:
updateMode: "Auto"
resourcePolicy:
containerPolicies:
- containerName: myapp
minAllowed:
cpu: 100m
memory: 128Mi
maxAllowed:
cpu: 2000m
memory: 4Gi
controlledResources: ["cpu", "memory"]
Initial mode (sets resources on pod creation only):
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
name: myapp-vpa
namespace: production
spec:
targetRef:
apiVersion: apps/v1
kind: Deployment
name: myapp
updatePolicy:
updateMode: "Initial"
View VPA Recommendations
# Get VPA status
kubectl get vpa myapp-vpa
# Describe VPA for recommendations
kubectl describe vpa myapp-vpa
# View recommendations in YAML
kubectl get vpa myapp-vpa -o yaml
Example output:
recommendation:
containerRecommendations:
- containerName: myapp
lowerBound:
cpu: 150m
memory: 256Mi
target:
cpu: 200m
memory: 512Mi
uncappedTarget:
cpu: 250m
memory: 768Mi
upperBound:
cpu: 400m
memory: 1Gi
Cluster Autoscaler
Cloud Provider Setup
AWS (EKS):
apiVersion: apps/v1
kind: Deployment
metadata:
name: cluster-autoscaler
namespace: kube-system
spec:
replicas: 1
selector:
matchLabels:
app: cluster-autoscaler
template:
metadata:
labels:
app: cluster-autoscaler
spec:
serviceAccountName: cluster-autoscaler
containers:
- image: k8s.gcr.io/autoscaling/cluster-autoscaler:v1.27.0
name: cluster-autoscaler
command:
- ./cluster-autoscaler
- --v=4
- --stderrthreshold=info
- --cloud-provider=aws
- --skip-nodes-with-local-storage=false
- --expander=least-waste
- --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster
- --balance-similar-node-groups
- --skip-nodes-with-system-pods=false
GKE (automatically enabled):
# Enable cluster autoscaler
gcloud container clusters update my-cluster \
--enable-autoscaling \
--min-nodes=1 \
--max-nodes=10 \
--zone=us-central1-a
Pod Disruption Budget (PDB)
Protect pods during scaling:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: myapp-pdb
namespace: production
spec:
minAvailable: 2 # Or use maxUnavailable: 1
selector:
matchLabels:
app: myapp
With percentage:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: myapp-pdb
namespace: production
spec:
maxUnavailable: 25%
selector:
matchLabels:
app: myapp
Scaling Best Practices
Resource Requests and Limits
Required for HPA:
resources:
requests:
cpu: 100m # HPA uses this for percentage calculation
memory: 128Mi
limits:
cpu: 500m
memory: 512Mi
HPA Configuration
Recommended settings:
minReplicas: 2+ for high availabilitymaxReplicas: 10-20x minReplicas- CPU target: 60-80% utilization
- Memory target: 70-90% utilization
- Scale-down stabilization: 300 seconds (5 minutes)
- Scale-up stabilization: 0-60 seconds
Scaling Behavior
Conservative scaling:
behavior:
scaleDown:
stabilizationWindowSeconds: 300 # Wait 5 minutes before scaling down
policies:
- type: Percent
value: 25 # Scale down by max 25% at a time
periodSeconds: 60
scaleUp:
stabilizationWindowSeconds: 60
policies:
- type: Pods
value: 2 # Add max 2 pods at a time
periodSeconds: 30
Aggressive scaling:
behavior:
scaleDown:
stabilizationWindowSeconds: 60
policies:
- type: Percent
value: 50
periodSeconds: 30
scaleUp:
stabilizationWindowSeconds: 0
policies:
- type: Percent
value: 100 # Double pods each time
periodSeconds: 15
Anti-Patterns to Avoid
- No resource requests - HPA cannot calculate metrics
- Too aggressive scaling - Causes thrashing
- VPA + HPA on same metric - Conflicts
- No PodDisruptionBudget - Unsafe during scaling
- Single replica minimum - No high availability
- No monitoring - Cannot tune scaling
Scaling Strategies
Predictive Scaling
Schedule-based autoscaling (CronJob pattern):
apiVersion: batch/v1
kind: CronJob
metadata:
name: scale-up-morning
namespace: production
spec:
schedule: "0 8 * * 1-5" # 8 AM Monday-Friday
jobTemplate:
spec:
template:
spec:
serviceAccountName: scaler
restartPolicy: OnFailure
containers:
- name: kubectl
image: bitnami/kubectl:latest
command:
- kubectl
- scale
- deployment/myapp
- --replicas=10
---
apiVersion: batch/v1
kind: CronJob
metadata:
name: scale-down-evening
namespace: production
spec:
schedule: "0 18 * * 1-5" # 6 PM Monday-Friday
jobTemplate:
spec:
template:
spec:
serviceAccountName: scaler
restartPolicy: OnFailure
containers:
- name: kubectl
image: bitnami/kubectl:latest
command:
- kubectl
- scale
- deployment/myapp
- --replicas=3
KEDA (Kubernetes Event-Driven Autoscaling)
Scale based on external metrics:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: myapp-scaledobject
namespace: production
spec:
scaleTargetRef:
name: myapp
minReplicaCount: 2
maxReplicaCount: 20
triggers:
# Scale based on Prometheus metric
- type: prometheus
metadata:
serverAddress: http://prometheus:9090
metricName: http_requests_per_second
threshold: '1000'
query: sum(rate(http_requests_total[2m]))
# Scale based on message queue
- type: rabbitmq
metadata:
queueName: tasks
queueLength: '50'
connectionFromEnv: RABBITMQ_URL
Monitoring Scaling
Metrics to Track
# Current resource usage
kubectl top pods -l app=myapp
kubectl top nodes
# HPA status
kubectl get hpa myapp-hpa
# Replica count over time
kubectl get deployment myapp -w
# Events
kubectl get events --sort-by=.metadata.creationTimestamp | grep -i scale
Prometheus Queries
Pod count:
kube_deployment_status_replicas{deployment="myapp"}
CPU utilization:
rate(container_cpu_usage_seconds_total{pod=~"myapp-.*"}[5m])
Memory usage:
container_memory_usage_bytes{pod=~"myapp-.*"}
HPA target metric:
kube_horizontalpodautoscaler_status_current_replicas{horizontalpodautoscaler="myapp-hpa"}
kube_horizontalpodautoscaler_status_desired_replicas{horizontalpodautoscaler="myapp-hpa"}
Troubleshooting Scaling
HPA Not Scaling
Check metrics availability:
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes
kubectl get --raw /apis/metrics.k8s.io/v1beta1/pods
kubectl top nodes
kubectl top pods -l app=myapp
Check resource requests:
kubectl get deployment myapp -o yaml | grep -A 5 resources
Check HPA status:
kubectl describe hpa myapp-hpa
# Look for:
# - Unable to get metrics
# - Failed to compute desired number of replicas
# - Resource requests not set
VPA Not Working
Check VPA installation:
kubectl get deployment -n kube-system | grep vpa
Check VPA status:
kubectl describe vpa myapp-vpa
Cluster Autoscaler Issues
Check logs:
kubectl logs -n kube-system -l app=cluster-autoscaler
Check node status:
kubectl get nodes
kubectl describe nodes
Scaling Checklist
- Resource requests and limits defined
- Metrics-server installed and working
- HPA configured with appropriate thresholds
- MinReplicas >= 2 for high availability
- PodDisruptionBudget configured
- Scaling behavior tuned for workload
- Monitoring and alerting set up
- Load testing performed
- Cost implications considered
- Documentation updated