02. App Deployment Flow

Overview

This document describes the end-to-end deployment flows for applications managed through the Internal Developer Portal. All deployments follow GitOps principles and are orchestrated through Argo Workflows, with ArgoCD handling the actual Kubernetes deployments.

Key Principles

GitOps-First: All changes tracked in Git
Automated: Minimal manual intervention
Auditable: Complete history and traceability
Safe: Validation gates at every step
Reversible: Easy rollback capabilities
Multi-Cluster: Consistent deployment across regions

Deployment Strategies

Strategy Comparison

Strategy	Use Case	Traffic Split	Deployment Time	Risk Level	Rollback Speed
Standard	Non-critical updates, dev/test	N/A	Fast	Medium	Fast
Blue/Green	Major releases, critical services	0/100 → 100/0	Medium	Low	Instant
Canary	High-risk changes, gradual rollout	Progressive	Slow	Very Low	Fast
Rolling	Standard updates, stateless apps	Gradual	Medium	Medium	Medium
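
The use cases in the comparison can be read as a simple selection rule. The sketch below is one hypothetical encoding of that rule; the function name, its inputs, and the order of the checks are assumptions made for illustration, not the portal's actual decision logic.

```python
# Hypothetical helper: maps change characteristics to a strategy from the
# comparison table. The precedence of the checks is an assumption.
def choose_strategy(environment: str, high_risk: bool,
                    critical_service: bool, stateless: bool) -> str:
    if environment in ("dev", "test"):
        return "standard"    # non-critical updates, dev/test
    if high_risk:
        return "canary"      # high-risk changes, gradual rollout
    if critical_service:
        return "blue-green"  # major releases, critical services
    if stateless:
        return "rolling"     # standard updates, stateless apps
    return "standard"
```

For example, `choose_strategy("production", high_risk=True, critical_service=True, stateless=True)` picks `canary`, because this sketch checks risk before criticality.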

Decision Tree

Workflow Architecture

Component Interaction

Standard Deployment Process

Overview

Standard deployment is the default strategy for most applications. The workflow builds the Docker image, pushes it to the registry, updates the image tag in the GitOps values file, and lets ArgoCD sync the change to the cluster.

Process Flow

Workflow Template Structure

# deployment-standard.yaml
apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: deployment-standard
  namespace: argo
spec:
  entrypoint: main

  arguments:
    parameters:
      - name: app-name
      - name: app-repo
      - name: gitops-repo
      - name: environment
      - name: version
      - name: image-registry

  templates:
    - name: main
      steps:
        - - name: validate
            template: validate-params

        - - name: build
            template: build-and-push

        - - name: update-gitops
            template: update-gitops-values

        - - name: wait-for-sync
            template: wait-argocd-sync

        - - name: health-check
            template: validate-deployment

    - name: validate-params
      script:
        image: alpine:latest
        command: [sh]
        source: |
          # Validate all required parameters
          echo "Validating deployment parameters..."
          # Check version format
          # Check permissions
          # Validate environment exists

    - name: build-and-push
      container:
        image: gcr.io/kaniko-project/executor:latest
        args:
          - --dockerfile=Dockerfile
          - --context=git://{{workflow.parameters.app-repo}}
          - --destination={{workflow.parameters.image-registry}}/{{workflow.parameters.app-name}}:{{workflow.parameters.version}}
          - --cache=true

    - name: update-gitops-values
      script:
        image: alpine/git:latest
        command: [sh]
        source: |
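          # Clone GitOps repository into a fixed directory so the cd below
          # works regardless of the repository name.
          # Note: yq must also be available in the image; the stock
          # alpine/git image may not include it.
          git clone {{workflow.parameters.gitops-repo}} gitops
          cd gitops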

          # Update image tag in values file
          yq eval '.image.tag = "{{workflow.parameters.version}}"' \
            -i apps/{{workflow.parameters.app-name}}/{{workflow.parameters.environment}}/values.yaml

          # Commit and push
          git config user.name "Backstage IDP"
          git config user.email "[email protected]"
          git add .
          git commit -m "Deploy {{workflow.parameters.app-name}} {{workflow.parameters.version}} to {{workflow.parameters.environment}}"
          git push origin main

    - name: wait-argocd-sync
      script:
        image: argoproj/argocd:latest
        command: [sh]
        source: |
          argocd app wait {{workflow.parameters.app-name}}-{{workflow.parameters.environment}} \
            --timeout 600 \
            --health

    - name: validate-deployment
      script:
        image: bitnami/kubectl:latest
        command: [sh]
        source: |
          # Check pod status
          kubectl get pods -n {{workflow.parameters.environment}} \
            -l app={{workflow.parameters.app-name}}

          # Run smoke tests (-f makes curl exit non-zero on HTTP errors,
          # so a failing health endpoint fails this step)
          kubectl run smoke-test --rm -i --restart=Never \
            --image=curlimages/curl:latest \
            -- curl -fsS http://{{workflow.parameters.app-name}}.{{workflow.parameters.environment}}.svc.cluster.local/health
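
The validate-params step in the template above only lists its checks as comments. A minimal sketch of what such checks could look like follows, assuming version tags of the form v1.2.3 and a fixed set of environment names; the pattern, the environment set, and the function name are illustrative assumptions, and the permission check is left out because it depends on the RBAC setup.

```python
import re

# Assumed conventions (not confirmed by the workflow itself):
# semantic version tags such as v1.2.3 and a fixed list of environments.
VERSION_PATTERN = re.compile(r"^v\d+\.\d+\.\d+$")
KNOWN_ENVIRONMENTS = {"dev", "staging", "prod", "production"}
REQUIRED = ("app-name", "app-repo", "gitops-repo",
            "environment", "version", "image-registry")


def validate_params(params: dict) -> list:
    """Return a list of problems; an empty list means the input is valid."""
    problems = [f"missing parameter: {name}"
                for name in REQUIRED if not params.get(name)]
    version = params.get("version", "")
    if version and not VERSION_PATTERN.match(version):
        problems.append(f"invalid version format: {version}")
    environment = params.get("environment", "")
    if environment and environment not in KNOWN_ENVIRONMENTS:
        problems.append(f"unknown environment: {environment}")
    return problems
```

Returning every problem at once, rather than stopping at the first, lets the portal show the user a complete list before any build starts.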

User Experience in Backstage

Navigate to Application: User selects their application from catalog
Deployment Tab: Click on "Deployments" tab
Trigger Deployment:
- Select environment (dev, staging, production)
- Choose deployment strategy
- Optionally override image tag or commit SHA
Monitor Progress: Real-time workflow progress with logs
Validate Success: View deployment status and health checks

Blue/Green Deployment

Overview

Blue/Green deployment maintains two identical production environments. Traffic is switched instantly from the old version (Blue) to the new version (Green) once validated.

Architecture

Process Flow

Deployment Steps

Phase 1: Deploy Green Environment

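# Step 1: Deploy new version to Green
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp-green
  namespace: argocd  # assumes ArgoCD is installed in the argocd namespace
spec:
  project: default  # required field; assumes the built-in default project
  destination:
    namespace: production
    server: https://kubernetes.default.svc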
  source:
    repoURL: https://git.company.com/gitops/myapp
    targetRevision: main
    path: overlays/production-green
    helm:
      values: |
        replicaCount: 3
        image:
          tag: v2.0.0
        service:
          name: myapp-green
        labels:
          slot: green
          version: v2.0.0

Phase 2: Validate Green Environment

#!/bin/bash
# Automated validation tests
set -euo pipefail  # abort on the first failing check

GREEN_ENDPOINT="http://myapp-green.production.svc.cluster.local"

# Health check
curl -f "$GREEN_ENDPOINT/health" || exit 1

# Smoke tests
curl -f "$GREEN_ENDPOINT/api/v1/status" || exit 1

# Integration tests
kubectl run integration-test --rm -i --restart=Never \
  --image=company/test-runner:latest \
  -- pytest tests/integration --target="$GREEN_ENDPOINT"

# Load test (optional): the local script is streamed to k6 over stdin
kubectl run load-test --rm -i --restart=Never \
  --image=grafana/k6:latest \
  -- run --env TARGET="$GREEN_ENDPOINT" - < load-test.js

Phase 3: Switch Traffic

# Update Ingress or VirtualService
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp
spec:
  hosts:
    - myapp.company.com
  http:
    - match:
        - headers:
            x-version:
              exact: "preview"
      route:
        - destination:
            host: myapp-green
            port:
              number: 80
    - route:
        - destination:
            host: myapp-green  # Changed from myapp-blue
            port:
              number: 80
          weight: 100
        - destination:
            host: myapp-blue
            port:
              number: 80
          weight: 0

Phase 4: Cleanup (Optional)

After the validation period (e.g., 24 hours):

Scale down Blue environment
Rename Green to Blue for next deployment
Document deployment completion

Workflow Template

apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: deployment-blue-green
spec:
  entrypoint: main
  arguments:
    parameters:
      - name: app-name
      - name: version
      - name: environment
      - name: current-slot  # blue or green

  templates:
    - name: main
      steps:
        - - name: determine-target-slot
            template: get-inactive-slot

        - - name: deploy-to-target
            template: deploy-inactive-slot
            arguments:
              parameters:
                - name: target-slot
                  value: "{{steps.determine-target-slot.outputs.result}}"

        - - name: validate-target
            template: run-validation
            arguments:
              parameters:
                - name: target-slot
                  value: "{{steps.determine-target-slot.outputs.result}}"

        - - name: await-approval
            template: manual-approval

        - - name: switch-traffic
            template: update-traffic-routing
            arguments:
              parameters:
                - name: target-slot
                  value: "{{steps.determine-target-slot.outputs.result}}"
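
The template above refers to a get-inactive-slot step and a traffic-routing step without showing their logic. A minimal sketch of both follows, assuming exactly two slots named blue and green and Services named `<app>-<slot>`; the function names are hypothetical and the output only mirrors the shape of the weighted route list used in Phase 3.

```python
# Hypothetical logic for the slot selection and traffic switch steps.
SLOTS = ("blue", "green")


def inactive_slot(current_slot: str) -> str:
    """Return the slot that is not serving traffic (the deployment target)."""
    if current_slot not in SLOTS:
        raise ValueError(f"unknown slot: {current_slot}")
    return "green" if current_slot == "blue" else "blue"


def route_weights(active_slot: str, app_name: str) -> list:
    """Build weighted destinations: 100 for the active slot, 0 for the other."""
    if active_slot not in SLOTS:
        raise ValueError(f"unknown slot: {active_slot}")
    return [
        {"host": f"{app_name}-{slot}", "weight": 100 if slot == active_slot else 0}
        for slot in SLOTS
    ]
```

With `current-slot` set to `blue`, `inactive_slot("blue")` yields `green`, and `route_weights("green", "myapp")` sends all traffic to `myapp-green` while keeping `myapp-blue` in the route list at weight 0 for a fast switch back.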

Canary Deployment

Overview

Canary deployment first routes a small share of traffic to the new version, then increases that share step by step while the metrics stay healthy, until the new version serves all traffic.

Progressive Rollout Strategy

Process Flow

Metrics Evaluation

# Canary analysis configuration
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: myapp
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp

  progressDeadlineSeconds: 60

  service:
    port: 80
    targetPort: 8080

  analysis:
    interval: 1m
    threshold: 5
    maxWeight: 50
    stepWeight: 10

    metrics:
      - name: request-success-rate
        thresholdRange:
          min: 99
        interval: 1m

      - name: request-duration
        thresholdRange:
          max: 500
        interval: 1m

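      # Not a Flagger built-in metric: it needs a MetricTemplate referenced
      # through templateRef (the template name below is an assumption)
      - name: error-rate
        templateRef:
          name: error-rate
          namespace: production
        thresholdRange:
          max: 1
        interval: 1m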

    webhooks:
      - name: load-test
        url: http://flagger-loadtester/
        timeout: 5s
        metadata:
          cmd: "hey -z 1m -q 10 -c 2 http://myapp-canary/"

      - name: smoke-test
        type: pre-rollout  # run once as a gate before traffic is shifted
        url: http://flagger-loadtester/
        timeout: 5s
        metadata:
          type: bash
          cmd: "curl -f http://myapp-canary/health"
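
To make the interplay of `stepWeight`, `maxWeight`, and `threshold` concrete, the following sketch simulates the analysis loop with the values from the configuration above. It is a simplified model written for illustration, not Flagger's implementation: each list entry stands for one analysis interval, and the function name and return values are assumptions.

```python
def simulate_canary(check_results, step_weight=10, max_weight=50, threshold=5):
    """Simulate canary analysis.

    check_results holds one boolean per interval: True when all metric
    checks passed. Returns (outcome, canary weight after each interval).
    """
    weight, failed = 0, 0
    history = []
    for passed in check_results:
        if passed:
            # Healthy interval: shift more traffic to the canary
            weight = min(weight + step_weight, max_weight)
        else:
            failed += 1
            if failed >= threshold:
                # Too many failed checks: send all traffic back
                history.append(0)
                return "rolled-back", history
        history.append(weight)
        if weight >= max_weight:
            return "promoted", history
    return "in-progress", history
```

Under this model, five healthy intervals move the canary through 10, 20, 30, 40, and 50 percent before promotion, while five failed checks at any point trigger a rollback.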

Decision Matrix for Rollback

Metric	Threshold	Action
Error Rate	> 1%	Immediate rollback
P95 Latency	> 2x baseline	Immediate rollback
Success Rate	< 99%	Immediate rollback
5xx Errors	> 10 per minute	Immediate rollback
Pod Crash	Any canary pod crashes	Pause and investigate
Memory Usage	> 90%	Pause deployment
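
The decision matrix can be expressed as a single function. The sketch below assumes the metrics arrive as a dictionary with the hypothetical keys shown in the code; how they are collected is outside its scope.

```python
def rollback_action(m: dict) -> str:
    """Map current canary metrics to an action from the decision matrix."""
    immediate = (
        m["error_rate_pct"] > 1                                   # Error Rate > 1%
        or m["p95_latency_ms"] > 2 * m["baseline_p95_latency_ms"]  # P95 > 2x baseline
        or m["success_rate_pct"] < 99                             # Success Rate < 99%
        or m["errors_5xx_per_min"] > 10                           # 5xx > 10 per minute
    )
    if immediate:
        return "rollback"
    # Pod crashes and memory pressure pause the rollout instead
    if m["canary_pod_crashes"] > 0 or m["memory_usage_pct"] > 90:
        return "pause"
    return "continue"
```

Rollback conditions are checked before pause conditions, so a canary that both crashes and exceeds the error-rate threshold is rolled back rather than paused.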

Rollback Procedures

Automatic Rollback Triggers

Rollback Strategies

1. Instant Rollback (Blue/Green)

apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: rollback-bluegreen
spec:
  entrypoint: instant-rollback
  templates:
    - name: instant-rollback
      steps:
        - - name: identify-previous
            template: get-previous-version

        - - name: switch-traffic
            template: update-routing
            arguments:
              parameters:
                - name: target-version
                  value: "{{steps.identify-previous.outputs.result}}"

        - - name: verify
            template: verify-rollback

Timeline: < 30 seconds

2. Gradual Rollback (Canary)

apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: rollback-canary
spec:
  entrypoint: gradual-rollback
  templates:
    - name: gradual-rollback
      steps:
        - - name: reduce-canary-100-to-50
            template: update-traffic-weight
            arguments:
              parameters:
                - name: canary-weight
                  value: "50"

        - - name: wait-30s
            template: sleep
            arguments:
              parameters:
                - name: duration
                  value: "30"

        - - name: reduce-canary-50-to-0
            template: update-traffic-weight
            arguments:
              parameters:
                - name: canary-weight
                  value: "0"

        - - name: delete-canary
            template: cleanup-canary-pods

Timeline: 1-2 minutes

3. Full Rollback (Standard)

#!/bin/bash
# Complete rollback script
set -euo pipefail  # stop on the first failed command

if [ "$#" -ne 3 ]; then
  echo "Usage: $0 <app-name> <environment> <previous-version>" >&2
  exit 1
fi

APP_NAME=$1
ENVIRONMENT=$2
PREVIOUS_VERSION=$3

echo "Rolling back $APP_NAME in $ENVIRONMENT to $PREVIOUS_VERSION"

# Update GitOps repository (expects an existing checkout in ./gitops)
cd gitops
git pull origin main

# Revert to previous version
yq eval ".image.tag = \"$PREVIOUS_VERSION\"" \
  -i "apps/$APP_NAME/$ENVIRONMENT/values.yaml"

# Commit and push
git add .
git commit -m "Rollback $APP_NAME to $PREVIOUS_VERSION in $ENVIRONMENT"
git push origin main

# Trigger the ArgoCD sync and wait until the application is healthy
argocd app sync "$APP_NAME-$ENVIRONMENT"
argocd app wait "$APP_NAME-$ENVIRONMENT" --health --timeout 300

# Verify rollback
kubectl get pods -n "$ENVIRONMENT" -l "app=$APP_NAME"

Timeline: 2-5 minutes

Manual Rollback from UI

Navigate to application in Backstage
Go to "Deployment History"
Select previous successful deployment
Click "Rollback to this version"
Confirm rollback
Monitor rollback progress

Multi-Cluster Deployment

Cluster Topology

Multi-Cluster Deployment Flow

Progressive Multi-Cluster Rollout

apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: deployment-multi-cluster-progressive
spec:
  entrypoint: main
  arguments:
    parameters:
      - name: app-name
      - name: version
      - name: clusters
        value: '["us-east-1", "us-west-1", "eu-central-1"]'

  templates:
    - name: main
      steps:
        # Build once
        - - name: build-image
            template: build-and-push

        # Deploy to first cluster (canary region)
        - - name: deploy-canary-region
            template: deploy-to-cluster
            arguments:
              parameters:
                - name: cluster
                  value: "us-east-1"

        # Validate canary region
        - - name: validate-canary
            template: validate-deployment
            arguments:
              parameters:
                - name: cluster
                  value: "us-east-1"

        # Soak time: let the canary region run for a fixed period before continuing
        - - name: soak-time
            template: sleep
            arguments:
              parameters:
                - name: duration
                  value: "300"  # 5 minutes

        # Deploy to remaining clusters in parallel
        - - name: deploy-remaining
            template: deploy-to-cluster
            arguments:
              parameters:
                - name: cluster
                  value: "{{item}}"
            withItems:
              - us-west-1
              - eu-central-1

        # Validate all clusters
        - - name: validate-all
            template: validate-deployment
            arguments:
              parameters:
                - name: cluster
                  value: "{{item}}"
            withItems:
              - us-east-1
              - us-west-1
              - eu-central-1
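
The rollout order in the template above (one canary region first, then the remaining clusters in parallel) can be derived from the `clusters` parameter instead of being written out by hand. The sketch below assumes the first entry of the JSON list is the canary region; the function name is hypothetical.

```python
import json


def plan_waves(clusters_json: str) -> list:
    """Split a JSON list of clusters into rollout waves.

    Wave 1 is the first cluster (canary region); wave 2 holds all
    remaining clusters, which can be deployed in parallel.
    """
    clusters = json.loads(clusters_json)
    if not clusters:
        return []
    waves = [[clusters[0]]]
    if len(clusters) > 1:
        waves.append(clusters[1:])
    return waves
```

For the default parameter value, `plan_waves('["us-east-1", "us-west-1", "eu-central-1"]')` returns `[["us-east-1"], ["us-west-1", "eu-central-1"]]`.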

Workflow Templates

Template Library

Template Name	Purpose	Duration	Rollback Support
`deployment-standard`	Standard rolling deployment	5-10 min	Yes
`deployment-blue-green`	Blue/Green deployment	15-20 min	Instant
`deployment-canary`	Canary deployment	30-60 min	Automatic
`rollback-instant`	Immediate rollback	< 1 min	N/A
`rollback-gradual`	Gradual rollback	2-5 min	N/A
`traffic-switch`	Update traffic routing	< 1 min	Yes
`multi-cluster-deploy`	Deploy to all clusters	10-15 min	Per-cluster
`health-check`	Validate deployment health	2 min	N/A

Common Workflow Parameters

parameters:
  # Application identification
  - name: app-name
    description: "Name of the application"

  - name: app-repo
    description: "Git repository URL for application source"

  - name: gitops-repo
    description: "Git repository URL for GitOps configurations"

  # Version control
  - name: version
    description: "Version tag for the deployment (e.g., v1.2.3)"

  - name: commit-sha
    description: "Git commit SHA to build from"

  # Environment and cluster
  - name: environment
    description: "Target environment (dev, staging, production)"

  - name: cluster
    description: "Target Kubernetes cluster"

  - name: namespace
    description: "Kubernetes namespace"

  # Container registry
  - name: image-registry
    description: "Container registry URL"

  - name: registry-credentials
    description: "Secret name for registry authentication"

  # Deployment strategy
  - name: strategy
    description: "Deployment strategy (standard, blue-green, canary)"
    enum: [standard, blue-green, canary]

  # Configuration
  - name: config-overrides
    description: "JSON object with configuration overrides"

  - name: replica-count
    description: "Number of pod replicas"
    default: "3"

  # Validation
  - name: skip-tests
    description: "Skip automated tests"
    default: "false"

  - name: auto-rollback
    description: "Enable automatic rollback on failure"
    default: "true"
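
As a rough illustration of how the defaults and the `strategy` enum above interact with user input, the sketch below merges supplied values over the documented defaults and rejects unknown strategies. It only models the three defaults listed above; the function name is an assumption and this is not how Argo Workflows itself resolves parameters.

```python
# Defaults and allowed values copied from the parameter list above
DEFAULTS = {"replica-count": "3", "skip-tests": "false", "auto-rollback": "true"}
STRATEGIES = ("standard", "blue-green", "canary")


def resolve_parameters(supplied: dict) -> dict:
    """Merge supplied parameters over the defaults and validate the strategy."""
    params = {**DEFAULTS, **supplied}
    strategy = params.get("strategy")
    if strategy not in STRATEGIES:
        raise ValueError(
            f"strategy must be one of {', '.join(STRATEGIES)}; got {strategy!r}"
        )
    return params
```

Calling `resolve_parameters({"strategy": "canary", "replica-count": "5"})` keeps the overridden replica count and fills in `skip-tests` and `auto-rollback` from the defaults.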

GitOps Repository Structure

Recommended Structure

gitops/
├── apps/
│   ├── app1/
│   │   ├── base/
│   │   │   ├── deployment.yaml
│   │   │   ├── service.yaml
│   │   │   ├── kustomization.yaml
│   │   │   └── configmap.yaml
│   │   ├── overlays/
│   │   │   ├── dev/
│   │   │   │   ├── kustomization.yaml
│   │   │   │   ├── values.yaml
│   │   │   │   └── patches/
│   │   │   ├── staging/
│   │   │   │   ├── kustomization.yaml
│   │   │   │   ├── values.yaml
│   │   │   │   └── patches/
│   │   │   └── production/
│   │   │       ├── kustomization.yaml
│   │   │       ├── values.yaml
│   │   │       ├── blue/
│   │   │       │   ├── kustomization.yaml
│   │   │       │   └── values.yaml
│   │   │       └── green/
│   │   │           ├── kustomization.yaml
│   │   │           └── values.yaml
│   │   └── clusters/
│   │       ├── us-east-1/
│   │       ├── us-west-1/
│   │       └── eu-central-1/
│   └── app2/
│       └── ...
├── platform/
│   ├── argocd/
│   ├── argo-workflows/
│   ├── monitoring/
│   └── ingress/
└── clusters/
    ├── us-east-1/
    │   ├── apps.yaml
    │   └── config.yaml
    ├── us-west-1/
    └── eu-central-1/
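
Given this layout, the location of a values file follows mechanically from the application, environment, and (for Blue/Green) slot. A small sketch of that mapping, with a hypothetical function name:

```python
from pathlib import PurePosixPath
from typing import Optional


def values_path(app: str, environment: str, slot: Optional[str] = None) -> str:
    """Return the values.yaml path for an app overlay in the layout above."""
    path = PurePosixPath("apps") / app / "overlays" / environment
    if slot is not None:
        # Blue/Green slots live one level below the environment overlay
        path = path / slot
    return str(path / "values.yaml")
```

For example, `values_path("app1", "staging")` gives `apps/app1/overlays/staging/values.yaml`, and `values_path("app1", "production", "green")` gives `apps/app1/overlays/production/green/values.yaml`.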

Example: Application Values

# apps/myapp/overlays/production/values.yaml
replicaCount: 3

image:
  repository: registry.company.com/myapp
  tag: v1.2.3
  pullPolicy: IfNotPresent

service:
  type: ClusterIP
  port: 80
  targetPort: 8080

ingress:
  enabled: true
  className: nginx
  hosts:
    - host: myapp.company.com
      paths:
        - path: /
          pathType: Prefix

resources:
  limits:
    cpu: 1000m
    memory: 1Gi
  requests:
    cpu: 500m
    memory: 512Mi

autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70

healthChecks:
  livenessProbe:
    httpGet:
      path: /health/live
      port: 8080
    initialDelaySeconds: 30
    periodSeconds: 10
  readinessProbe:
    httpGet:
      path: /health/ready
      port: 8080
    initialDelaySeconds: 5
    periodSeconds: 5

env:
  - name: ENVIRONMENT
    value: production
  - name: LOG_LEVEL
    value: info
  - name: DATABASE_HOST
    valueFrom:
      secretKeyRef:
        name: myapp-secrets
        key: db-host

Environment Management

Environment Hierarchy

Environment Configuration

Environment	Purpose	Auto-Deploy	Approval Required	Clusters	Replicas
Development	Feature testing	Yes	No	1 (shared)	1
Staging	Pre-production validation	Yes	No	1 (dedicated)	2
Production	Live traffic	No	Yes	3 (multi-region)	3-10

Promotion Workflow

apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: promote-environment
spec:
  entrypoint: main
  arguments:
    parameters:
      - name: app-name
      - name: source-env
      - name: target-env

  templates:
    - name: main
      steps:
        # Get current version in source environment
        - - name: get-source-version
            template: get-deployed-version
            arguments:
              parameters:
                - name: environment
                  value: "{{workflow.parameters.source-env}}"

        # Run validation tests
        - - name: validate-source
            template: run-tests
            arguments:
              parameters:
                - name: environment
                  value: "{{workflow.parameters.source-env}}"

        # Request approval for production
        - - name: request-approval
            template: manual-approval
            when: "{{workflow.parameters.target-env}} == production"

        # Deploy to target environment
        - - name: deploy-to-target
            template: deploy
            arguments:
              parameters:
                - name: environment
                  value: "{{workflow.parameters.target-env}}"
                - name: version
                  value: "{{steps.get-source-version.outputs.result}}"

        # Validate target deployment
        - - name: validate-target
            template: run-tests
            arguments:
              parameters:
                - name: environment
                  value: "{{workflow.parameters.target-env}}"
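
The approval gate in the workflow above matches the Environment Configuration table: only production requires approval. The sketch below encodes that table and lists the steps a promotion would actually run; the dictionary keys and the function name are assumptions for illustration.

```python
# Auto-deploy and approval settings taken from the environment table
ENVIRONMENTS = {
    "development": {"auto_deploy": True, "approval_required": False},
    "staging": {"auto_deploy": True, "approval_required": False},
    "production": {"auto_deploy": False, "approval_required": True},
}


def promotion_steps(source_env: str, target_env: str) -> list:
    """List the workflow steps that run for a given promotion."""
    for env in (source_env, target_env):
        if env not in ENVIRONMENTS:
            raise ValueError(f"unknown environment: {env}")
    steps = ["get-source-version", "validate-source"]
    if ENVIRONMENTS[target_env]["approval_required"]:
        # Mirrors the `when` condition on the request-approval step
        steps.append("request-approval")
    steps += ["deploy-to-target", "validate-target"]
    return steps
```

A promotion from staging to production includes `request-approval`; a promotion from development to staging skips it.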

Validation and Health Checks

Health Check Levels

Automated Tests

1. Smoke Tests

#!/bin/bash
# smoke-tests.sh

APP_URL=$1
NAMESPACE=$2

echo "Running smoke tests for $APP_URL"

# Test 1: Health endpoint
echo "Test 1: Health endpoint"
if curl -f -s "$APP_URL/health" | grep -q "healthy"; then
    echo "✓ Health check passed"
else
    echo "✗ Health check failed"
    exit 1
fi

# Test 2: Readiness endpoint
echo "Test 2: Readiness endpoint"
if curl -f -s "$APP_URL/health/ready" | grep -q "ready"; then
    echo "✓ Readiness check passed"
else
    echo "✗ Readiness check failed"
    exit 1
fi

# Test 3: API version endpoint
echo "Test 3: API version"
# "// empty" makes jq print nothing (instead of "null") when the field is missing
VERSION=$(curl -f -s "$APP_URL/api/version" | jq -r '.version // empty')
if [ -n "$VERSION" ]; then
    echo "✓ API version: $VERSION"
else
    echo "✗ API version check failed"
    exit 1
fi

# Test 4: Metrics endpoint
echo "Test 4: Metrics endpoint"
if curl -f -s "$APP_URL/metrics" | grep -q "# HELP"; then
    echo "✓ Metrics endpoint responding"
else
    echo "✗ Metrics endpoint failed"
    exit 1
fi

echo "All smoke tests passed!"

2. Integration Tests

# integration_tests.py
import requests
import pytest
import os

# Fail fast with a KeyError if APP_URL is not set
BASE_URL = os.environ["APP_URL"].rstrip("/")

def test_database_connection():
    """Test database connectivity"""
    response = requests.get(f"{BASE_URL}/health/db")
    assert response.status_code == 200
    assert response.json()['status'] == 'connected'

def test_cache_connection():
    """Test Redis cache connectivity"""
    response = requests.get(f"{BASE_URL}/health/cache")
    assert response.status_code == 200
    assert response.json()['status'] == 'connected'

def test_api_crud_operations():
    """Test basic CRUD operations"""
    # Create
    create_response = requests.post(
        f"{BASE_URL}/api/v1/items",
        json={"name": "test", "value": 123}
    )
    assert create_response.status_code == 201
    item_id = create_response.json()['id']

    # Read
    get_response = requests.get(f"{BASE_URL}/api/v1/items/{item_id}")
    assert get_response.status_code == 200
    assert get_response.json()['name'] == "test"

    # Update
    update_response = requests.put(
        f"{BASE_URL}/api/v1/items/{item_id}",
        json={"name": "updated", "value": 456}
    )
    assert update_response.status_code == 200

    # Delete
    delete_response = requests.delete(f"{BASE_URL}/api/v1/items/{item_id}")
    assert delete_response.status_code == 204

def test_authentication():
    """Test authentication flow"""
    # Login
    auth_response = requests.post(
        f"{BASE_URL}/api/v1/auth/login",
        json={"username": "testuser", "password": "testpass"}
    )
    assert auth_response.status_code == 200
    token = auth_response.json()['token']

    # Access protected endpoint
    headers = {"Authorization": f"Bearer {token}"}
    protected_response = requests.get(
        f"{BASE_URL}/api/v1/protected",
        headers=headers
    )
    assert protected_response.status_code == 200

def test_rate_limiting():
    """Test rate limiting"""
    # Make multiple requests
    responses = [
        requests.get(f"{BASE_URL}/api/v1/items")
        for _ in range(110)  # Assuming limit is 100/min
    ]

    # Check that some requests are rate limited
    status_codes = [r.status_code for r in responses]
    assert 429 in status_codes  # Too Many Requests

Performance Validation

// k6-load-test.js
import http from 'k6/http';
import { check, sleep } from 'k6';

export let options = {
  stages: [
    { duration: '1m', target: 50 },   // Ramp up
    { duration: '3m', target: 50 },   // Stay at 50 users
    { duration: '1m', target: 100 },  // Ramp up more
    { duration: '3m', target: 100 },  // Stay at 100 users
    { duration: '1m', target: 0 },    // Ramp down
  ],
  thresholds: {
    'http_req_duration': ['p(95)<500'],  // 95% of requests under 500ms
    'http_req_failed': ['rate<0.01'],    // Error rate under 1%
  },
};

export default function () {
  let response = http.get(`${__ENV.APP_URL}/api/v1/items`);

  check(response, {
    'status is 200': (r) => r.status === 200,
    'response time < 500ms': (r) => r.timings.duration < 500,
  });

  sleep(1);
}

Best Practices

1. Deployment Best Practices

✅ Always deploy through GitOps; never apply changes manually with kubectl
✅ Tag images with semantic versions
✅ Include git commit SHA in image labels
✅ Test in lower environments first
✅ Implement health checks in applications
✅ Use resource limits and requests
✅ Enable pod disruption budgets
✅ Implement graceful shutdown
✅ Monitor during and after deployment
✅ Document deployment procedures

2. Traffic Management Best Practices

✅ Start with small traffic percentages
✅ Monitor error rates continuously
✅ Have automated rollback triggers
✅ Keep old version running during transition
✅ Test with synthetic traffic first
✅ Use feature flags for risky changes
✅ Implement circuit breakers
✅ Log all traffic switches

3. Security Best Practices

✅ Scan images before deployment
✅ Use least-privilege RBAC
✅ Never commit secrets to Git
✅ Rotate secrets regularly
✅ Enable audit logging
✅ Implement network policies
✅ Use signed container images
✅ Apply security updates regularly

Troubleshooting

Common Issues and Solutions

Issue	Symptoms	Solution
Image Pull Error	Pods in ImagePullBackOff	Check registry credentials, verify image exists
Pod Crashes	CrashLoopBackOff	Check logs, verify resources, check dependencies
Slow Rollout	Deployment takes too long	Increase readiness probe timeout, check resource availability
Traffic Not Switching	Old version still receiving traffic	Verify ingress/service mesh configuration
ArgoCD Not Syncing	Changes in Git not applied	Check ArgoCD sync policy, verify repository access

Debug Commands

# Check workflow status
kubectl get workflows -n argo

# View workflow logs
argo logs -n argo <workflow-name>

# Check ArgoCD application status
argocd app get <app-name>

# Check pod status
kubectl get pods -n <namespace> -l app=<app-name>

# View pod logs
kubectl logs -n <namespace> -l app=<app-name> --tail=100

# Describe failing pod
kubectl describe pod -n <namespace> <pod-name>

# Check events
kubectl get events -n <namespace> --sort-by='.lastTimestamp'

Metrics and KPIs

Deployment Metrics

Deployment Frequency: Number of deployments per day/week
Lead Time: Time from commit to production
Change Failure Rate: Percentage of deployments causing issues
Mean Time to Recovery (MTTR): Time to recover from failures
Deployment Success Rate: Percentage of successful deployments
Rollback Rate: Percentage of deployments rolled back
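
Several of these metrics are simple ratios over a deployment log. The sketch below shows one way to compute them, assuming each deployment is recorded with the hypothetical fields in the `Deployment` dataclass; the record shape is illustrative and not an existing portal schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Deployment:
    committed_at: datetime   # when the change was committed
    deployed_at: datetime    # when it reached production
    succeeded: bool
    caused_incident: bool
    rolled_back: bool


def summarize(deployments: list) -> dict:
    """Compute rate metrics (in percent) and mean lead time (in minutes)."""
    total = len(deployments)
    if total == 0:
        raise ValueError("no deployments to summarize")
    lead = sum((d.deployed_at - d.committed_at for d in deployments), timedelta())
    return {
        "success_rate_pct": 100 * sum(d.succeeded for d in deployments) / total,
        "change_failure_rate_pct": 100 * sum(d.caused_incident for d in deployments) / total,
        "rollback_rate_pct": 100 * sum(d.rolled_back for d in deployments) / total,
        "mean_lead_time_min": lead.total_seconds() / 60 / total,
    }
```

Deployment frequency and MTTR need incident and calendar data that this record does not carry, so they are left out of the sketch.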

Target SLIs

Metric	Target
Deployment Duration	< 15 minutes (95th percentile)
Deployment Success Rate	> 95%
Rollback Time	< 5 minutes
Change Failure Rate	< 5%
MTTR	< 30 minutes
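
A minimal sketch of checking measured values against these targets follows. It uses a nearest-rank percentile for the deployment duration; the function names and the input units (minutes and percent) are assumptions.

```python
def percentile(values: list, pct: int):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    if not values:
        raise ValueError("no values")
    ordered = sorted(values)
    # Integer ceiling of pct/100 * n avoids floating-point rounding issues
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


def check_targets(durations_min: list, success_rate_pct: float,
                  rollback_time_min: float, change_failure_rate_pct: float,
                  mttr_min: float) -> dict:
    """Return True per metric when the target from the table is met."""
    return {
        "Deployment Duration": percentile(durations_min, 95) < 15,
        "Deployment Success Rate": success_rate_pct > 95,
        "Rollback Time": rollback_time_min < 5,
        "Change Failure Rate": change_failure_rate_pct < 5,
        "MTTR": mttr_min < 30,
    }
```

Because the duration target is a 95th percentile, a single slow deployment out of twenty does not by itself break the target.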
