
01. Architecture Overview

This document outlines the architecture for an Internal Developer Portal (IDP) built on Spotify's Backstage platform. The IDP serves as a self-service platform enabling application teams to manage their deployments across multiple Kubernetes clusters with minimal operational knowledge.

System Overview

Purpose

The Internal Developer Portal provides a unified interface for application teams to:

  • Register and manage applications in the service catalog
  • Trigger deployments across multiple environments and clusters
  • Manage traffic routing strategies (Blue/Green, Canary)
  • Monitor deployment status and health
  • Perform rollbacks and version management
  • Access documentation and best practices

Key Stakeholders

| Role | Responsibilities |
| --- | --- |
| Platform Engineering Team | Maintain IDP, Argo Workflows templates, cluster infrastructure |
| Application Teams | Deploy and manage their applications via IDP |
| Operations Team | Monitor platform health, manage incidents |
| Security Team | Ensure compliance, manage access controls |

Architecture Principles

1. Self-Service First

  • Teams can perform all deployment operations without platform team intervention
  • Clear, intuitive interfaces with minimal learning curve
  • Guided workflows with validation and safety checks

2. Security by Default

  • RBAC enforcement at all layers
  • Audit logging for all operations
  • Secrets management via dedicated systems (e.g., Vault, Kubernetes Secrets)
  • No direct cluster access required

3. Declarative Configuration

  • All application configurations stored as code (GitOps)
  • Version controlled and auditable
  • Infrastructure as Code principles

4. Multi-Tenancy

  • Clear isolation between teams and applications
  • Resource quotas and limits
  • Cost attribution and tracking

5. Observability

  • Comprehensive logging and monitoring
  • Deployment tracking and history
  • Performance metrics and SLIs

6. Progressive Delivery

  • Support for multiple deployment strategies
  • Traffic splitting capabilities
  • Automated rollback on failures
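
The interplay between traffic splitting and automated rollback can be sketched as a small decision function. This is a minimal illustration, not the portal's implementation: the `CanaryStep` type, the step weights, and the error-rate threshold are all assumptions chosen for the example.

```typescript
// Hypothetical types and thresholds, for illustration only.
interface CanaryStep {
  canaryWeight: number; // percent of traffic sent to the new version
}

interface CanaryOutcome {
  decision: "promote" | "rollback";
  failedAtStep: number | null; // index of the step that breached the threshold
}

// Walk the canary steps in order. errorRates[i] is the error rate (0..1)
// observed while step i was active. A missing observation counts as a
// failure so the rollout fails safe.
function evaluateCanary(
  canarySteps: CanaryStep[],
  errorRates: number[],
  maxErrorRate: number,
): CanaryOutcome {
  for (let i = 0; i < canarySteps.length; i++) {
    const observed = errorRates[i];
    if (observed === undefined || observed > maxErrorRate) {
      return { decision: "rollback", failedAtStep: i };
    }
  }
  return { decision: "promote", failedAtStep: null };
}

const canarySteps: CanaryStep[] = [
  { canaryWeight: 10 },
  { canaryWeight: 50 },
  { canaryWeight: 100 },
];
const healthyRollout = evaluateCanary(canarySteps, [0.001, 0.002, 0.001], 0.01);
const failingRollout = evaluateCanary(canarySteps, [0.001, 0.05], 0.01);
```

Here `healthyRollout` is promoted, while `failingRollout` is rolled back at the second step because its error rate exceeds the assumed 1% threshold.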

Core Components

Component Details

1. Backstage Platform

Technology: Node.js, React, TypeScript

Purpose: Core IDP platform providing unified interface

Key Features:

  • Service Catalog: Central registry of all applications, services, and resources
  • Software Templates: Scaffolding for new applications
  • TechDocs: Documentation as code
  • Plugin Architecture: Extensible via custom plugins

2. Custom Backstage Plugins

a) Deployment Plugin
  • Trigger Argo Workflows for deployments
  • Display deployment status and history
  • Support for multiple deployment strategies
  • Environment-specific configurations
b) Traffic Management Plugin
  • Blue/Green deployment controls
  • Canary release management
  • Traffic splitting configuration
  • Rollback capabilities
c) Multi-Cluster Management Plugin
  • View applications across clusters
  • Cluster health and capacity
  • Regional deployment status
d) Monitoring & Observability Plugin
  • Integration with monitoring systems (Prometheus, Grafana)
  • Application metrics and logs
  • Alert management
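
As a sketch of how the Deployment Plugin might trigger a workflow, the function below only constructs the HTTP request and does not send it. It assumes the Argo Server REST endpoint for submitting from a `WorkflowTemplate` (`POST /api/v1/workflows/{namespace}/submit`); the parameter names (`app-name`, `environment`, `image-tag`) and the `DeployRequest` type are hypothetical, and the request shape should be checked against the API reference of the deployed Argo Workflows version.

```typescript
// Hypothetical input collected by the plugin UI.
interface DeployRequest {
  appName: string;
  environment: string;
  imageTag: string;
}

interface SubmitCall {
  method: "POST";
  path: string;
  body: {
    namespace: string;
    resourceKind: "WorkflowTemplate";
    resourceName: string;
    submitOptions: { parameters: string[] };
  };
}

// Build (but do not send) the submit request for the deploy-application template.
function buildSubmitCall(namespace: string, req: DeployRequest): SubmitCall {
  return {
    method: "POST",
    path: `/api/v1/workflows/${encodeURIComponent(namespace)}/submit`,
    body: {
      namespace,
      resourceKind: "WorkflowTemplate",
      resourceName: "deploy-application",
      submitOptions: {
        // Parameters are passed as "key=value" strings; the keys are assumptions.
        parameters: [
          `app-name=${req.appName}`,
          `environment=${req.environment}`,
          `image-tag=${req.imageTag}`,
        ],
      },
    },
  };
}

const submitCall = buildSubmitCall("team-payments", {
  appName: "payments-api",
  environment: "staging",
  imageTag: "1.4.2",
});
```

Keeping request construction separate from the network call makes the plugin backend easy to unit test and lets validation run before anything reaches the cluster.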

3. Argo Workflows

Purpose: Workflow orchestration for CI/CD pipelines

Key Workflow Templates:

  • build-and-push-image: Build Docker image and push to registry
  • update-gitops-values: Update image tags in ArgoCD values
  • deploy-application: Orchestrate full deployment
  • rollback-deployment: Automated rollback procedures
  • traffic-switch: Manage traffic routing
  • health-check: Post-deployment validation
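
One way `deploy-application` could compose the other templates is sketched below as a plain sequential runner. The step order and the rule "roll back only after the GitOps repository may have changed" are assumptions; the real templates run asynchronously inside Argo Workflows, whereas this sketch uses a synchronous callback to keep the control flow visible.

```typescript
// Runs one workflow template by name and reports success (hypothetical callback).
type StepRunner = (template: string) => boolean;

// Assumed order of the templates orchestrated by deploy-application.
const DEPLOY_SEQUENCE: string[] = [
  "build-and-push-image",
  "update-gitops-values",
  "health-check",
];

function runDeployment(run: StepRunner): { executed: string[]; succeeded: boolean } {
  const executed: string[] = [];
  for (const template of DEPLOY_SEQUENCE) {
    executed.push(template);
    if (!run(template)) {
      // Only roll back if the GitOps update step was reached; a failed
      // image build leaves the running version untouched.
      if (executed.indexOf("update-gitops-values") !== -1) {
        executed.push("rollback-deployment");
        run("rollback-deployment");
      }
      return { executed, succeeded: false };
    }
  }
  return { executed, succeeded: true };
}

// Example: the post-deployment health check fails, which triggers a rollback.
const failedHealthCheck = runDeployment((template) => template !== "health-check");
```

In this example `failedHealthCheck.executed` ends with `rollback-deployment`, while a failure in `build-and-push-image` would stop the run without a rollback.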

4. ArgoCD

Purpose: GitOps continuous delivery

Configuration:

  • Application-per-environment pattern
  • App-of-apps pattern for multi-cluster
  • Sync policies and health checks
  • Progressive sync waves
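
The application-per-environment pattern can be illustrated by generating one Argo CD `Application` object per target. The manifest fields (`spec.source`, `spec.destination`, `spec.syncPolicy.automated`) are standard Argo CD fields; the repository layout (`apps/<app>/<environment>`), the `default` project, the `argocd` namespace, and the URLs are assumptions made for the example.

```typescript
// Hypothetical description of where an environment is deployed.
interface DeployTarget {
  environment: string;
  clusterUrl: string;
}

interface ArgoApplication {
  apiVersion: "argoproj.io/v1alpha1";
  kind: "Application";
  metadata: { name: string; namespace: string };
  spec: {
    project: string;
    source: { repoURL: string; targetRevision: string; path: string };
    destination: { server: string; namespace: string };
    syncPolicy: { automated: { prune: boolean; selfHeal: boolean } };
  };
}

// One Application per (app, environment) pair, all reading from one GitOps repo.
function buildApplications(
  appName: string,
  repoURL: string,
  deployTargets: DeployTarget[],
): ArgoApplication[] {
  return deployTargets.map((t) => ({
    apiVersion: "argoproj.io/v1alpha1",
    kind: "Application",
    metadata: { name: `${appName}-${t.environment}`, namespace: "argocd" },
    spec: {
      project: "default",
      source: {
        repoURL,
        targetRevision: "HEAD",
        path: `apps/${appName}/${t.environment}`, // assumed repo layout
      },
      destination: { server: t.clusterUrl, namespace: appName },
      syncPolicy: { automated: { prune: true, selfHeal: true } },
    },
  }));
}

const paymentsApps = buildApplications(
  "payments-api",
  "https://git.example.com/platform/gitops.git",
  [
    { environment: "dev", clusterUrl: "https://dev.k8s.example.com" },
    { environment: "prod", clusterUrl: "https://prod.k8s.example.com" },
  ],
);
```

An app-of-apps parent would then point at the directory holding these generated manifests, so adding an environment is a single Git change.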

5. Kubernetes Clusters

Multi-Cluster Setup:

  • Geographic redundancy across regions
  • Active-active configuration
  • Cross-cluster service mesh
  • Centralized monitoring

Integration Architecture

Integration Points

1. Identity & Access Management

  • LDAP/Active Directory integration
  • OAuth 2.0 / OIDC authentication
  • RBAC at Backstage, Kubernetes, and Argo levels
  • Service accounts for automation

2. Git Repositories

  • Application Repos: Source code and Dockerfiles
  • GitOps Repos: Kubernetes manifests and Helm values
  • Catalog Repos: Backstage entity definitions
  • Webhook integration for automated triggers

3. Container Registry

  • Private Docker registry (Artifactory, ECR)
  • Image scanning and vulnerability assessment
  • Retention policies
  • Access controls per team

4. Secrets Management

  • External secrets operator integration
  • HashiCorp Vault or cloud-native solutions
  • Automatic secret rotation
  • Audit logging

5. Monitoring & Observability

  • Prometheus for metrics
  • Grafana for visualization
  • ELK/Loki for logging
  • Jaeger/Tempo for tracing
  • Integration with Backstage dashboards

6. Notification Systems

  • Slack integration for deployment notifications
  • Email alerts for failures
  • PagerDuty for incident management
  • Webhook support for custom integrations
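
The routing implied by the list above can be sketched as a pure function from a deployment event to the channels that should receive it. The `DeploymentEvent` shape and the exact rules (Slack for every deployment, email on failure, PagerDuty only when an incident is declared) are one assumed reading of these bullets, not a documented policy.

```typescript
type NotificationChannel = "slack" | "email" | "pagerduty" | "webhook";

// Hypothetical event emitted when a deployment finishes.
interface DeploymentEvent {
  app: string;
  outcome: "succeeded" | "failed";
  incidentDeclared: boolean;
}

function channelsFor(
  evt: DeploymentEvent,
  hasCustomWebhook: boolean,
): NotificationChannel[] {
  const channels: NotificationChannel[] = ["slack"]; // every deployment is announced
  if (evt.outcome === "failed") {
    channels.push("email"); // failures additionally alert by email
  }
  if (evt.incidentDeclared) {
    channels.push("pagerduty"); // incidents page the on-call engineer
  }
  if (hasCustomWebhook) {
    channels.push("webhook"); // team-defined integration
  }
  return channels;
}

const routineDeploy = channelsFor(
  { app: "payments-api", outcome: "succeeded", incidentDeclared: false },
  false,
);
const failedDeployWithIncident = channelsFor(
  { app: "payments-api", outcome: "failed", incidentDeclared: true },
  true,
);
```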

High-Level Architecture


Data Flow

1. Application Deployment Flow

2. Traffic Switch Flow (Blue/Green)

3. Canary Deployment Flow


Security Architecture

1. Authentication & Authorization

2. Security Layers

| Layer | Security Controls |
| --- | --- |
| Network | Network policies; service mesh mTLS; API Gateway with WAF; private subnets |
| Application | OIDC authentication; RBAC authorization; input validation; CSRF protection |
| API | API key management; rate limiting; request signing; audit logging |
| Secrets | External secrets operator; Vault integration; no secrets in Git; rotation policies |
| Container | Image scanning; signed images; non-root containers; resource limits |
| Kubernetes | Pod security standards; network policies; RBAC; admission controllers |

3. RBAC Model

# Example RBAC Structure
roles:
  - name: application-owner
    permissions:
      - deploy:own-app
      - rollback:own-app
      - view:own-app-metrics
      - switch-traffic:own-app

  - name: application-developer
    permissions:
      - view:own-app
      - deploy:dev-environment
      - view:own-app-logs

  - name: platform-admin
    permissions:
      - deploy:*
      - configure:platform
      - manage:users
      - view:all-metrics

  - name: viewer
    permissions:
      - view:catalog
      - view:documentation
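
A permission check against this role structure might look like the sketch below. The role and permission strings are copied from the example above; the `isAllowed` helper and the wildcard rule (`deploy:*` grants any `deploy:` permission) are assumptions, and the sketch does not resolve what `own-app` means for a given user, which a real policy engine would have to do.

```typescript
// Role-to-permission mapping taken from the example RBAC structure.
const ROLE_PERMISSIONS: Record<string, string[]> = {
  "application-owner": [
    "deploy:own-app",
    "rollback:own-app",
    "view:own-app-metrics",
    "switch-traffic:own-app",
  ],
  "application-developer": ["view:own-app", "deploy:dev-environment", "view:own-app-logs"],
  "platform-admin": ["deploy:*", "configure:platform", "manage:users", "view:all-metrics"],
  viewer: ["view:catalog", "view:documentation"],
};

// Returns true if the role holds the exact permission, or a wildcard
// grant for the same action (for example "deploy:*").
function isAllowed(role: string, permission: string): boolean {
  const granted = ROLE_PERMISSIONS[role] ?? []; // unknown roles get nothing
  const action = permission.split(":")[0];
  return granted.some((g) => g === permission || g === `${action}:*`);
}

const adminCanDeployAnywhere = isAllowed("platform-admin", "deploy:prod-environment");
const viewerCanDeploy = isAllowed("viewer", "deploy:own-app");
```

With this mapping, `adminCanDeployAnywhere` is true through the wildcard and `viewerCanDeploy` is false, since unknown roles and unlisted permissions are denied by default.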

4. Audit & Compliance

  • All API calls logged with user identity
  • Deployment history with approval chains
  • Change tracking in Git repositories
  • Compliance reports for regulatory requirements
  • Retention policies for logs and audit trails

Scalability & Reliability

1. Horizontal Scaling

| Component | Scaling Strategy |
| --- | --- |
| Backstage | Multiple replicas behind load balancer |
| Argo Workflows | Controller replicas with leader election |
| ArgoCD | Multiple application controllers |
| Databases | PostgreSQL with replication |
| Redis | Redis cluster for caching |

2. High Availability

3. Disaster Recovery

  • RTO (Recovery Time Objective): 30 minutes
  • RPO (Recovery Point Objective): 5 minutes

DR Strategy:

  1. Multi-region deployment of all components
  2. Database replication with automated failover
  3. GitOps repositories as the source of truth, so infrastructure and application state can be redeployed from Git
  4. Regular backup of Backstage catalog
  5. Documented runbooks for failure scenarios

4. Performance Targets

| Metric | Target | Measurement |
| --- | --- | --- |
| Page Load Time | < 2 seconds | 95th percentile |
| API Response Time | < 500ms | 95th percentile |
| Deployment Trigger | < 5 seconds | Average |
| Catalog Search | < 1 second | 95th percentile |
| Concurrent Users | 500+ | Peak capacity |
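
Checking a 95th-percentile target against measured latencies can be sketched as follows. The nearest-rank percentile method and the sample values are assumptions for illustration; monitoring systems such as Prometheus typically estimate percentiles from histograms instead.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) {
    throw new Error("no samples");
  }
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  const value = sorted[rank - 1];
  if (value === undefined) {
    throw new Error("percentile out of range");
  }
  return value;
}

// A target written as "< 500ms" is met only if the percentile is strictly below it.
function meetsTarget(samples: number[], p: number, targetMs: number): boolean {
  return percentile(samples, p) < targetMs;
}

// Hypothetical API response times in milliseconds.
const apiLatenciesMs = [120, 300, 450, 480, 510];
const apiP95 = percentile(apiLatenciesMs, 95);
const apiWithinTarget = meetsTarget(apiLatenciesMs, 95, 500);
```

With only five samples the 95th percentile is the slowest request, so a single outlier fails the target; in practice the check would run over a much larger window.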
