Kubernetes Cost Optimization: 10 Proven Strategies to Cut Your Cloud Bill

Kubernetes has become the de facto standard for container orchestration, powering production workloads at companies of every size. But with great power come great cloud bills. Without deliberate optimization, a typical Kubernetes cluster can waste 30–50% of its allocated cloud spend on idle resources, over-provisioned workloads, and inefficient configurations.

Whether you're a DevOps engineer looking to trim infrastructure costs or an engineering manager tasked with reducing the monthly cloud invoice, this guide walks you through 10 proven, production-tested strategies for Kubernetes cost optimization, complete with YAML configs, kubectl commands, and a real-world case study that saved $6,400 per month.

Strategy 1: Right-Size Resource Requests and Limits

The single biggest source of Kubernetes waste is misconfigured resource requests and limits. When teams copy-paste default values or leave them unset entirely, the scheduler either over-allocates nodes or lets pods consume unbounded resources.

Why It Matters

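Kubernetes uses requests for scheduling and limits for enforcement. If a request is too high, you pay for capacity you never use. If it's too low, the scheduler packs pods onto nodes that cannot actually sustain them, and they face CPU contention or eviction under memory pressure. (CPU throttling and OOM kills come from limits that are set too tight.) The sweet spot is matching requests to actual usage with a small safety margin.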
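
As a rough illustration of "actual usage plus a small safety margin", the sketch below derives a CPU request from a list of usage samples. The function names, the 15% headroom, the rounding step, and the sample values are all assumptions made for this example, not values from any tool mentioned in this guide.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]

def recommend_request(samples_millicores, pct=95, headroom=0.15, step=10):
    """Chosen percentile plus headroom, rounded up to a multiple of `step` millicores."""
    target = percentile(samples_millicores, pct) * (1 + headroom)
    return math.ceil(target / step) * step

# Hypothetical CPU samples (millicores) collected from `kubectl top` over time
usage = [120, 135, 150, 160, 180, 190, 200, 205, 210, 215]
print(f"suggested request: {recommend_request(usage)}m")
```

With these made-up samples the p95 is 215m, and adding 15% headroom lands on a 250m request. In practice you would feed in days of samples rather than ten points.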

How to Do It

Step 1 — Measure actual usage with kubectl top:

# Check pod-level resource usage
kubectl top pods --all-namespaces --sort-by=cpu

# Check node-level utilization
kubectl top nodes

# Get detailed usage for a specific namespace
kubectl top pods -n production --containers

Step 2 — Right-size your pod specs:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api-server
  template:
    metadata:
      labels:
        app: api-server
    spec:
      containers:
        - name: api
          image: myregistry/api-server:v2.4.1
          resources:
            requests:
              cpu: "250m"    # 25% of a vCPU — based on p95 usage
              memory: "512Mi"
            limits:
              cpu: "500m"    # 2x request for burst capacity
              memory: "1Gi"
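
When comparing specs like the one above against measured usage, it helps to convert Kubernetes quantity strings into plain numbers. The following is a minimal sketch that only handles the suffixes used in this article (m for millicores; Ki, Mi, Gi, Ti for memory). The helper names are hypothetical, and the full quantity grammar (decimal suffixes, exponents) is deliberately left out.

```python
_BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}

def cpu_to_cores(quantity):
    """Convert a CPU quantity such as "250m" or "2" to cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)

def memory_to_bytes(quantity):
    """Convert a memory quantity such as "512Mi" or "1Gi" to bytes."""
    for suffix, factor in _BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(float(quantity[:-2]) * factor)
    return int(quantity)  # plain byte count

print(cpu_to_cores("250m"))      # fraction of one vCPU
print(memory_to_bytes("512Mi"))  # bytes
```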

Step 3 — Enforce with LimitRanges and ResourceQuotas:

apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: production
spec:
  limits:
    - default:
        cpu: "500m"
        memory: "1Gi"
      defaultRequest:
        cpu: "100m"
        memory: "256Mi"
      max:
        cpu: "2"
        memory: "4Gi"
      type: Container
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: production
spec:
  hard:
    requests.cpu: "20"
    requests.memory: "40Gi"
    limits.cpu: "40"
    limits.memory: "80Gi"
    persistentvolumeclaims: "10"

Pro Tip: Use tools like KRR (Kubernetes Resource Recommender) or Goldilocks to automatically analyze historical usage and recommend right-sized values.

Strategy 2: Horizontal Pod Autoscaler (HPA)

Workloads rarely need the same number of replicas at all hours. HPA automatically scales the number of pods based on observed CPU, memory, or custom metrics — so you only run what you need.

Implementation

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 65
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 75
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 50
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 30

Key configuration points:

averageUtilization: 65 scales out when average CPU usage exceeds 65% of the pods' CPU requests, so accurate requests (Strategy 1) matter here too. Adjust based on your latency tolerance.
stabilizationWindowSeconds: 300 (scale-down) prevents flapping: the HPA acts on the highest replica recommendation from the last 5 minutes before removing pods.
maxReplicas: 20 sets a hard ceiling to prevent runaway scaling.
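
To see how these settings interact, here is a simplified sketch of the replica calculation described in the Kubernetes HPA documentation: desired = ceil(current × currentMetric / targetMetric), skipped when the ratio is within a tolerance (0.1 by default). It ignores scaling behavior policies, unready pods, and missing metrics, and the function name is an assumption for illustration.

```python
import math

def desired_replicas(current_replicas, current_util, target_util,
                     min_replicas=3, max_replicas=20, tolerance=0.1):
    """Simplified HPA calculation for a single utilization metric."""
    ratio = current_util / target_util
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: no change
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))

print(desired_replicas(3, 90, 65))    # load spike on 3 pods
print(desired_replicas(10, 30, 65))   # quiet period on 10 pods
print(desired_replicas(15, 130, 65))  # would exceed maxReplicas
```

When several metrics are configured, as in the manifest above, the HPA computes a value per metric and uses the largest.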

Verify HPA Status

# Check HPA status and current metrics
kubectl get hpa -n production

# Watch HPA in real-time
kubectl get hpa api-server-hpa -n production -w

# Describe for detailed scaling events
kubectl describe hpa api-server-hpa -n production

Pro Tip: For custom metrics (requests per second, queue depth), integrate the Kubernetes Event-Driven Autoscaling (KEDA) operator, which supports over 60 data sources including Kafka, Redis, Prometheus, and AWS CloudWatch.

Strategy 3: Vertical Pod Autoscaler (VPA)

Not all workloads scale horizontally. Stateful services like databases, caches, and message brokers often benefit from vertical scaling — adjusting CPU and memory up or down based on usage.

Install VPA

# Clone the autoscaler repo
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler

# Install VPA components
./vpa-up.sh

# Verify installation
kubectl get pods -n kube-system | grep vpa

Configure VPA

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: postgres-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: postgres
  updatePolicy:
    updateMode: "Auto"  # Automatically applies recommendations
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        minAllowed:
          cpu: 100m
          memory: 256Mi
        maxAllowed:
          cpu: 4
          memory: 8Gi
        controlledResources: ["cpu", "memory"]

Warning: VPA in Auto mode will evict and restart pods to apply new resource settings. Use Off (recommendation-only) mode for sensitive workloads and review suggestions manually.

Check VPA Recommendations

# View VPA recommendations
kubectl describe vpa postgres-vpa -n production | grep -A 20 Recommendation

HPA + VPA: Use Them Together Carefully

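Do not let HPA and VPA act on the same resource metric (CPU or memory) for the same workload, because each controller's changes invalidate the other's calculations. You can still combine them on different dimensions, provided the VPA's controlledResources is narrowed to ["memory"]: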

HPA on CPU → scales replica count
VPA on memory → adjusts memory per pod

This gives you elastic capacity for both throughput and memory pressure.

Strategy 4: Cluster Autoscaler

Pod-level autoscaling only helps if your cluster has capacity. The Cluster Autoscaler (CA) automatically adds or removes nodes when pods are pending or nodes are underutilized.

Install on AWS EKS

# Download the Cluster Autoscaler manifest for your K8s version
curl -sO https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-autodiscover.yaml

# Set your cluster name
sed -i 's/<YOUR CLUSTER NAME>/my-prod-cluster/g' cluster-autoscaler-autodiscover.yaml

# Apply
kubectl apply -f cluster-autoscaler-autodiscover.yaml

Key Configuration

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  template:
    spec:
      containers:
        - image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.30.1
          name: cluster-autoscaler
          command:
            - ./cluster-autoscaler
            - --balance-similar-node-groups
            - --expander=priority
            - --scale-down-unneeded-time=10m
            - --scale-down-delay-after-add=10m
            - --max-node-provision-time=15m
            - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-prod-cluster

Key flags for cost optimization:

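--scale-down-unneeded-time=10m: a node is removed once it has been unneeded (empty, or below the utilization threshold with its pods reschedulable elsewhere) for 10 minutes. This matches the upstream default; lower it for more aggressive savings.
--scale-down-delay-after-add=10m: pauses scale-down evaluation for 10 minutes after a scale-up to prevent oscillation.
--expander=priority: picks node groups in the order defined in the cluster-autoscaler-priority-expander ConfigMap, which lets you rank Spot node groups above on-demand.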
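
The scale-down decision can be pictured roughly as follows. This is a simplified sketch, not the autoscaler's actual code: it assumes utilization is the larger of the CPU and memory request ratios, uses the upstream default threshold of 0.5, and ignores other blockers such as PodDisruptionBudgets or pods that cannot be rescheduled. The function name and sample numbers are hypothetical.

```python
def is_scale_down_candidate(cpu_requested_m, cpu_allocatable_m,
                            mem_requested_mi, mem_allocatable_mi,
                            unneeded_minutes,
                            threshold=0.5, unneeded_time_minutes=10):
    """True when the node is under-requested and has stayed that way long enough."""
    utilization = max(cpu_requested_m / cpu_allocatable_m,
                      mem_requested_mi / mem_allocatable_mi)
    return utilization < threshold and unneeded_minutes >= unneeded_time_minutes

# A lightly loaded node that has been unneeded for 12 minutes
print(is_scale_down_candidate(1500, 7910, 6000, 30000, unneeded_minutes=12))
# Same node, but memory requests are at two thirds of allocatable
print(is_scale_down_candidate(1500, 7910, 20000, 30000, unneeded_minutes=12))
```

Note that the decision is driven by requests, not live usage, which is another reason right-sized requests lower node counts.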

Verify Cluster Autoscaler

# Check CA logs
kubectl logs -f deployment/cluster-autoscaler -n kube-system

# Monitor node scaling events
kubectl get events --field-selector reason=ScaleDown -A -w

Strategy 5: Leverage Spot/Preemptible Instances

Spot instances (AWS), preemptible VMs (GCP), and spot VMs (Azure) offer 60–90% discounts on compute capacity in exchange for possible interruption with short notice. For stateless, fault-tolerant workloads, this is the single highest-impact cost optimization.

Node Group Architecture

Design your cluster with a mixed node group strategy:

# Production node group topology
# ┌──────────────────────────────────────┐
# │  On-Demand Node Group (base capacity) │
# │  min: 2, max: 5, type: m5.large      │
# │  Critical workloads only             │
# ├──────────────────────────────────────┤
# │  Spot Node Group A (compute)          │
# │  min: 1, max: 20, types: diverse     │
# │  Stateless web/API workloads         │
# ├──────────────────────────────────────┤
# │  Spot Node Group B (memory)           │
# │  min: 0, max: 10, types: diverse     │
# │  Batch jobs, workers                 │
# └──────────────────────────────────────┘

Pod-Level Spot Toleration and Node Affinity

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-frontend
  namespace: production
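spec:
  replicas: 6
  selector:
    matchLabels:
      app: web-frontend
  template:
    metadata:
      labels:
        app: web-frontend
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  # Label set by EKS managed node groups; self-managed groups need an equivalent custom label
                  - key: eks.amazonaws.com/capacityType
                    operator: In
                    values:
                      - SPOT
      # Assumes the Spot nodes carry a custom "spot-instance" taint
      tolerations: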
        - key: "spot-instance"
          operator: "Exists"
          effect: "NoSchedule"
      containers:
        - name: web
          image: myregistry/web-frontend:v1.8.0
          resources:
            requests:
              cpu: "200m"
              memory: "512Mi"
          # Enable graceful shutdown on Spot interruption
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "nginx -s quit && sleep 10"]

Install the Node Termination Handler

For AWS, install AWS Node Termination Handler to gracefully drain nodes before Spot interruption:

helm repo add eks https://aws.github.io/eks-charts
helm install node-termination-handler eks/aws-node-termination-handler \
  --namespace kube-system \
  --set nodeSelector.eks\\.amazonaws\\.com/capacityType=SPOT \
  --set awsRegion=us-east-1

Cost Impact: A medium-sized cluster running 70% Spot instances can reduce the total compute bill by 50% or more compared to running entirely on-demand.
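
The arithmetic behind that estimate is simple enough to check by hand. The sketch below assumes a flat discount across all Spot capacity and ignores interruption overhead; the function names are made up for this example.

```python
def blended_savings(spot_fraction, spot_discount):
    """Fraction of the on-demand compute bill saved by moving part of it to Spot."""
    return spot_fraction * spot_discount

def required_discount(target_savings, spot_fraction):
    """Average Spot discount needed to reach a target overall saving."""
    return target_savings / spot_fraction

print(f"{blended_savings(0.70, 0.65):.1%}")    # 70% Spot at a 65% discount
print(f"{required_discount(0.50, 0.70):.1%}")  # discount needed for a 50% cut
```

At 70% Spot coverage, a 65% average discount saves about 45.5% of the compute bill, and reaching 50% needs an average discount of roughly 71%, which sits inside the 60–90% range quoted above.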

Strategy 6: Storage Optimization

Storage costs are the hidden budget killer in Kubernetes. Misconfigured PersistentVolumeClaims (PVCs), orphaned volumes, and over-provisioned storage classes silently drain your cloud budget month after month.

Audit Your Storage

# List all PVCs across namespaces
kubectl get pvc --all-namespaces -o wide

# Find PVCs not mounted by any pod (candidates for cleanup): all PVCs minus PVCs referenced by pods
comm -23 \
  <(kubectl get pvc -A -o json | jq -r '.items[] | "\(.metadata.namespace)/\(.metadata.name)"' | sort -u) \
  <(kubectl get pods -A -o json | jq -r '.items[] | .metadata.namespace as $ns | .spec.volumes[]? | select(.persistentVolumeClaim) | "\($ns)/\(.persistentVolumeClaim.claimName)"' | sort -u)

# List storage classes
kubectl get storageclass

Use Tiered Storage Classes

# Fast SSD for databases (higher cost)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  throughput: "300"
  iops: "3000"
  encrypted: "true"
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
---
# Standard for dev/test workloads (lower cost)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard-hdd
provisioner: ebs.csi.aws.com
parameters:
  type: st1   # Throughput-optimized HDD; roughly 40-45% cheaper per GB than gp3, but volumes have a 125 GiB minimum
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true

Automated PVC Cleanup

apiVersion: batch/v1
kind: CronJob
metadata:
  name: pvc-cleaner
  namespace: kube-system
spec:
  schedule: "0 2 * * *"  # Daily at 2 AM
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: pvc-cleaner-sa
          containers:
            - name: kubectl
              image: bitnami/kubectl:1.30
              command:
                - /bin/bash
                - -c
                - |
                  echo "Scanning for unattached PVCs..."
                  for pvc in $(kubectl get pvc -A -o jsonpath='{range .items[*]}{.metadata.namespace}{"/"}{.metadata.name}{" "}{end}'); do
                    ns=$(echo "$pvc" | cut -d/ -f1)
                    name=$(echo "$pvc" | cut -d/ -f2)
                    # Check if any pod references this PVC (kubectl and grep only, so the image does not need jq)
                    if ! kubectl get pods -n "$ns" -o jsonpath='{range .items[*]}{range .spec.volumes[*]}{.persistentVolumeClaim.claimName}{"\n"}{end}{end}' | grep -qx -- "$name"; then
                      echo "Orphaned PVC: $ns/$name"
                      # kubectl delete pvc -n $ns $name  # Uncomment to auto-delete
                    fi
                  done
          restartPolicy: OnFailure

Right-Size Volumes

# List provisioned capacity per PVC (compare with the kubelet_volume_stats_used_bytes metric for actual usage)
for pvc in $(kubectl get pvc -A -o jsonpath='{range .items[*]}{.metadata.namespace}{"/"}{.metadata.name}{" "}{end}'); do
  ns=$(echo $pvc | cut -d/ -f1)
  name=$(echo $pvc | cut -d/ -f2)
  capacity=$(kubectl get pvc -n $ns $name -o jsonpath='{.status.capacity.storage}')
  echo "$ns/$name — Provisioned: $capacity"
done

Delete unused volumes and migrate oversized ones to smaller PVCs (Kubernetes can expand a PVC in place but cannot shrink one). A cluster with 50 PVCs at 100GB each where only 15GB is used is wasting 4.25TB of paid storage.
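
The 4.25TB figure follows directly from the numbers in that example. The sketch below reproduces it and attaches a monthly cost; the $0.08 per GB-month price is an assumption for illustration only, so substitute your own provider's rate.

```python
def wasted_gb(pvc_count, provisioned_gb, used_gb):
    """Provisioned-but-unused capacity across identical PVCs."""
    return pvc_count * (provisioned_gb - used_gb)

def monthly_waste_cost(wasted, price_per_gb_month):
    """Monthly cost of the unused capacity at a given per-GB price."""
    return wasted * price_per_gb_month

waste = wasted_gb(pvc_count=50, provisioned_gb=100, used_gb=15)
print(f"{waste} GB wasted ({waste / 1000} TB)")
# Assumed price, not a quoted one
print(f"${monthly_waste_cost(waste, 0.08):.2f} per month")
```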

Strategy 7: Reduce Network Egress Costs

Cloud providers charge for data leaving their network (egress). In a microservices architecture, poorly designed service-to-service communication can generate significant cross-AZ or cross-region traffic charges.

Best Practices

1. Keep traffic in-cluster with ClusterIP services:

apiVersion: v1
kind: Service
metadata:
  name: backend-svc
  namespace: production
spec:
  type: ClusterIP   # Default — keeps traffic internal
  selector:
    app: backend
  ports:
    - port: 8080
      targetPort: 8080

2. Use a service mesh for locality-aware routing:

Istio and Linkerd support locality-aware load balancing, which prioritizes sending traffic to endpoints in the same AZ to avoid cross-AZ data transfer fees.

# Istio DestinationRule with locality LB
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: backend-dr
  namespace: production
spec:
  host: backend-svc
  trafficPolicy:
    loadBalancer:
      localityLbSetting:
        enabled: true
        distribute:
          # Istio localities are written as region/zone/sub-zone
          - from: "us-east-1/us-east-1a/*"
            to:
              "us-east-1/us-east-1a/*": 80
              "us-east-1/us-east-1b/*": 20
    outlierDetection:
      consecutive5xxErrors: 3
      interval: 30s
      baseEjectionTime: 60s

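3. Co-locate chatty services with Pod Affinity:

spec:
  affinity:
    podAffinity:   # affinity pulls pods together; anti-affinity would spread them apart
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app: frontend
            topologyKey: "topology.kubernetes.io/zone"  # same zone is enough to avoid cross-AZ charges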

4. Use a CDN for static assets to prevent traffic from hitting your cluster entirely.

Cost Impact: Cross-AZ data transfer costs ~$0.01–0.02 per GB on AWS and GCP. For high-traffic microservice clusters, locality-aware routing alone can save $500–$2,000/month.
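
To sanity-check that range against your own traffic, a back-of-the-envelope calculation like the one below is usually enough. The traffic volume, the two-thirds cross-AZ share (uniform routing across three zones), and the $0.02 per GB round-trip rate are assumptions chosen for illustration.

```python
def cross_az_cost(total_gb, cross_az_fraction, rate_per_gb):
    """Monthly charge for the share of service-to-service traffic that crosses zones."""
    return total_gb * cross_az_fraction * rate_per_gb

monthly_gb = 100_000  # hypothetical: 100 TB of internal traffic per month
before = cross_az_cost(monthly_gb, 2 / 3, 0.02)  # uniform routing across 3 AZs
after = cross_az_cost(monthly_gb, 0.20, 0.02)    # 80/20 locality split
print(f"before: ${before:,.0f}  after: ${after:,.0f}  saved: ${before - after:,.0f}")
```

Under these assumptions the saving is a little over $900 per month, which is consistent with the range quoted above.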

Strategy 8: Implement Monitoring and Cost Visibility Tools

You can't optimize what you can't measure. A robust monitoring stack gives you the data to make informed cost decisions and catch waste before it compounds.

Recommended Tools Stack

| Tool | Purpose | Best For |
|------|---------|----------|
| KubeCost | Real-time cost allocation per namespace, label, deployment | Cost visibility |
| Prometheus + Grafana | Infrastructure and application metrics | Performance monitoring |
| Goldilocks | Automated resource recommendation | Right-sizing |
| KRR | CLI-based resource recommendations | Quick audits |
| Cloud Custodian | Policy-based resource governance | Compliance + cleanup |

Install KubeCost

helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm install kubecost kubecost/cost-analyzer \
  --namespace kubecost \
  --create-namespace \
  --set kubecostToken="..." \
  --set prometheus.server.persistentVolume.size=64Gi \
  --set persistentVolume.size=32Gi

# Port-forward to access the dashboard
kubectl port-forward --namespace kubecost deployment/kubecost-cost-analyzer 9090:9090

Key Prometheus Queries for Cost Monitoring

# CPU utilization vs request (efficiency)
# container!="" drops the pod-level cgroup series that would otherwise double-count usage
sum(rate(container_cpu_usage_seconds_total{container!="", container!="POD"}[5m])) by (namespace)
/
sum(kube_pod_container_resource_requests{resource="cpu"}) by (namespace)

# Memory waste (allocated but unused)
sum(kube_pod_container_resource_requests{resource="memory"}) by (namespace)
-
sum(container_memory_working_set_bytes{container!="", container!="POD"}) by (namespace)

# Idle pods (near-zero CPU for 1 hour)
avg_over_time(rate(container_cpu_usage_seconds_total[1h])[1h:5m]) < 0.001

Set Up Alerting for Cost Anomalies

groups:
  - name: cost-alerts
    rules:
      - alert: HighCPUWaste
        expr: |
          (sum(kube_pod_container_resource_requests{resource="cpu"}) by (namespace)
          - sum(rate(container_cpu_usage_seconds_total{container!="", container!="POD"}[5m])) by (namespace))
          / sum(kube_pod_container_resource_requests{resource="cpu"}) by (namespace)
          > 0.5
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "More than 50% CPU waste in {{ $labels.namespace }}"
          description: "Namespace {{ $labels.namespace }} has over 50% unused CPU requests."
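
The alert expression boils down to a ratio and a duration check. The sketch below mirrors that logic outside Prometheus so the threshold is easy to reason about; the function names and sample values are hypothetical, and treating `for: 1h` as "every sample in the window breaches" is a simplification.

```python
def cpu_waste_ratio(requested_cores, used_cores):
    """Share of requested CPU that is not being used."""
    if requested_cores <= 0:
        raise ValueError("requested_cores must be positive")
    return (requested_cores - used_cores) / requested_cores

def should_alert(ratios_over_window, threshold=0.5):
    """Fire only if every sample in the evaluation window exceeds the threshold."""
    return bool(ratios_over_window) and all(r > threshold for r in ratios_over_window)

# A namespace requesting 10 cores but using 1.8 (18% utilization)
print(f"{cpu_waste_ratio(10, 1.8):.0%} of requested CPU is idle")
print(should_alert([0.60, 0.70, 0.55]))  # sustained waste
print(should_alert([0.60, 0.40]))        # dipped below the threshold
```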

Strategy 9: Adopt FinOps Practices

Technology alone won't fix your cloud bill. FinOps — the practice of bringing financial accountability to variable cloud spend — creates the organizational framework that makes cost optimization sustainable.

Build a FinOps Culture

1. Assign cost ownership with labels:

# Standardized labels required on all workloads
metadata:
  labels:
    app.kubernetes.io/name: payment-service
    app.kubernetes.io/team: payments-team
    finops.cost-center: "eng-platform"
    finops.environment: "production"
  annotations:
    # Label values cannot contain "@", so contact details belong in an annotation
    finops.budget-owner: "jane.doe@company.com"

2. Enforce labeling with Kyverno policies:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-finops-labels
spec:
  validationFailureAction: Enforce
  rules:
    - name: require-cost-center
      match:
        resources:
          kinds:
            - Pod
            - Deployment
            - StatefulSet
      validate:
        message: "All workloads must include the 'finops.cost-center' label."
        pattern:
          metadata:
            labels:
              finops.cost-center: "?*"

3. Set up cost dashboards and monthly reviews:

Create Grafana dashboards showing per-namespace cost trends
Hold monthly FinOps reviews with engineering teams
Set budget alerts at 80% and 100% of monthly allocation
Share a weekly "cost optimization scorecard" with team leads

4. Implement showback or chargeback:

Use KubeCost or native cloud billing to generate per-team cost reports. When teams see their own bill, behavior changes fast.

Pro Tip: Start with showback (visibility only) before implementing chargeback (actual budget deductions). Teams need time to build cost awareness before they're penalized for waste.
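
A showback report is essentially a group-by on the cost-center label. The sketch below shows the idea on hand-written data. The workload records, their costs, and the function names are all hypothetical; real figures would come from KubeCost or your cloud billing export. It also flags workloads that lack the label, mirroring what the Kyverno policy enforces at admission time.

```python
from collections import defaultdict

REQUIRED_LABEL = "finops.cost-center"

def missing_required_label(workload, label=REQUIRED_LABEL):
    """True when the label is absent or empty (the "?*" pattern needs at least one character)."""
    return not (workload.get("labels") or {}).get(label)

def showback(workloads, label=REQUIRED_LABEL):
    """Sum monthly cost per cost center; unlabeled spend goes to 'unallocated'."""
    totals = defaultdict(float)
    for w in workloads:
        owner = (w.get("labels") or {}).get(label) or "unallocated"
        totals[owner] += w["monthly_cost"]
    return dict(totals)

# Hypothetical per-workload costs
workloads = [
    {"name": "payment-service", "labels": {"finops.cost-center": "eng-platform"}, "monthly_cost": 1200.0},
    {"name": "api-gateway", "labels": {"finops.cost-center": "eng-platform"}, "monthly_cost": 800.0},
    {"name": "legacy-cron", "labels": {}, "monthly_cost": 300.0},
]
print(showback(workloads))
print([w["name"] for w in workloads if missing_required_label(w)])
```

The size of the "unallocated" bucket is a useful early metric: it shows how much spend nobody owns yet.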

Strategy 10: Real-World Case Study (From $12K to $5.6K per Month)

Let's walk through a real optimization engagement for a mid-stage SaaS company running a 25-node production cluster on AWS EKS. The goal: reduce the $12,000/month cloud bill by at least 30%.

The Starting Point

| Metric | Value |
|--------|-------|
| Monthly cloud bill | $12,000 |
| Node count | 25 × m5.2xlarge (8 vCPU, 32 GB) |
| Average CPU utilization | 18% |
| Average memory utilization | 34% |
| Spot instance ratio | 0% (all on-demand) |
| Orphaned PVCs | 23 (totaling 2.3 TB) |

Phase 1: Audit and Right-Size (Week 1–2)

Actions taken:

Installed KubeCost and Goldilocks for cost visibility
Ran kubectl top analysis for 7 days to capture p95 usage patterns
Right-sized requests on 47 deployments — average CPU request reduced from 500m to 200m
Deleted 23 orphaned PVCs, saving 2.3 TB of gp2 storage
Resized 8 oversized PVCs from 500GB to 100GB

# Snapshot of the right-sizing script
for deploy in $(kubectl get deploy -n production -o name); do
  echo "Analyzing $deploy..."
  kubectl describe $deploy -n production | grep -A2 "Limits\|Requests"
  # Manual review + update via kubectl patch
done

Monthly savings: ~$1,800

Phase 2: Implement Autoscaling (Week 3–4)

Actions taken:

Deployed HPA on all web-tier and API-tier deployments
Configured VPA in recommendation mode for stateful workloads
Enabled Cluster Autoscaler with --scale-down-unneeded-time=10m
Set min replicas to 2 (down from static 6) for non-critical services

# Apply HPA across all production web services
for deploy in web-frontend api-gateway notification-service; do
  kubectl autoscale deployment $deploy -n production \
    --cpu-percent=65 \
    --min=2 \
    --max=15
done

Monthly savings: ~$1,500 (from nodes scaling down during off-peak hours)

Phase 3: Migrate to Spot Instances (Week 5–6)

Actions taken:

Created two Spot node groups with diverse instance types (m5.large, m5a.large, m5ad.large)
Configured podAntiAffinity and topologySpreadConstraints for high availability
Installed AWS Node Termination Handler for graceful Spot interruptions
Migrated 70% of workloads to Spot nodes, keeping critical services on on-demand

# topologySpreadConstraints for HA on Spot
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: api-gateway

Monthly savings: ~$2,500 (Spot pricing at ~65% discount for migrated workloads)

Phase 4: Network and Storage Polish (Week 7–8)

Actions taken:

Enabled Istio locality-aware load balancing to reduce cross-AZ traffic
Switched dev/staging storage from gp3 to st1 (throughput HDD)
Implemented the PVC cleanup CronJob for automated orphan detection
Added a Cloudflare CDN in front of static assets to reduce egress

Monthly savings: ~$600

Results

| Metric | Before | After | Change |
|--------|--------|-------|--------|
| Monthly cloud bill | $12,000 | $5,600 | −53% |
| Avg CPU utilization | 18% | 52% | +189% |
| Avg memory utilization | 34% | 61% | +79% |
| Spot instance ratio | 0% | 70% | n/a |
| Orphaned PVCs | 23 | 0 | n/a |

Total monthly savings: $6,400 — exceeding the 30% target by a wide margin.
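
The headline numbers can be re-derived from the per-phase savings reported above. The short check below does only that arithmetic; the variable and function names are illustrative.

```python
baseline = 12_000  # starting monthly bill in USD
phase_savings = {
    "audit and right-size": 1_800,
    "autoscaling": 1_500,
    "spot migration": 2_500,
    "network and storage": 600,
}

total_savings = sum(phase_savings.values())
new_bill = baseline - total_savings
reduction_pct = round(100 * total_savings / baseline)

def relative_change_pct(before, after):
    """Relative change between two utilization figures, in percent."""
    return round(100 * (after - before) / before)

print(total_savings, new_bill, reduction_pct)
print(relative_change_pct(18, 52), relative_change_pct(34, 61))
```

The four phases sum to $6,400, leaving a $5,600 bill (a 53% reduction), and the utilization changes match the +189% and +79% shown in the results table.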

Key Lessons

Visibility comes first. KubeCost made the waste impossible to ignore.
Right-sizing is the fastest win. It took 2 weeks and saved 15%.
Spot instances are the biggest lever. They required more engineering effort but delivered the largest savings.
FinOps culture sustains gains. Monthly reviews prevented cost regression.

Conclusion

Kubernetes cost optimization is not a one-time activity — it's an ongoing practice that combines the right tools, the right configurations, and the right organizational culture. Here's a quick recap of the 10 strategies:

1. Right-size requests and limits: the fastest, highest-ROI change you can make today
2. Implement HPA: scale horizontally with demand
3. Deploy VPA: scale vertically for stateful workloads
4. Enable Cluster Autoscaler: don't pay for empty nodes
5. Use Spot instances: the single biggest cost reducer
6. Optimize storage: audit, tier, and clean up PVCs
7. Reduce network egress: keep traffic local with service mesh routing
8. Install monitoring tools: KubeCost, Prometheus, Goldilocks
9. Adopt FinOps: make cost everyone's responsibility
10. Apply all strategies together: real savings come from compounding improvements

Start with Strategy 1 this week. Install KubeCost. Run kubectl top. The data will tell you exactly where your money is going — and where to start cutting.

Ready to optimize your Kubernetes costs? Begin with a cost audit today and implement these strategies incrementally. Your cloud bill — and your CFO — will thank you.
