01. Architecture Overview
This document outlines the architecture for an Internal Developer Portal (IDP) built on Spotify's Backstage platform. The IDP serves as a self-service platform enabling application teams to manage their deployments across multiple Kubernetes clusters with minimal operational knowledge.
System Overview
Purpose
The Internal Developer Portal provides a unified interface for application teams to:
- Register and manage applications in the service catalog
- Trigger deployments across multiple environments and clusters
- Manage traffic routing strategies (Blue/Green, Canary)
- Monitor deployment status and health
- Perform rollbacks and version management
- Access documentation and best practices
Key Stakeholders
| Role | Responsibilities |
|---|---|
| Platform Engineering Team | Maintain IDP, Argo Workflows templates, cluster infrastructure |
| Application Teams | Deploy and manage their applications via IDP |
| Operations Team | Monitor platform health, manage incidents |
| Security Team | Ensure compliance, manage access controls |
Architecture Principles
1. Self-Service First
- Teams can perform all deployment operations without platform team intervention
- Clear, intuitive interfaces with minimal learning curve
- Guided workflows with validation and safety checks
2. Security by Default
- RBAC enforcement at all layers
- Audit logging for all operations
- Secrets managed outside application code and Git (e.g., HashiCorp Vault, Kubernetes Secrets)
- No direct cluster access required
3. Declarative Configuration
- All application configurations stored as code (GitOps)
- Version controlled and auditable
- Infrastructure as Code principles
4. Multi-Tenancy
- Clear isolation between teams and applications
- Resource quotas and limits
- Cost attribution and tracking
5. Observability
- Comprehensive logging and monitoring
- Deployment tracking and history
- Performance metrics and SLIs
6. Progressive Delivery
- Support for multiple deployment strategies
- Traffic splitting capabilities
- Automated rollback on failures
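The progressive delivery principle can be illustrated with a minimal sketch of a canary progression that rolls back automatically on the first failed health check. The step percentages and the `is_healthy` callback are hypothetical; in this architecture the equivalent logic would presumably live in the traffic-switch and rollback workflow templates.

```python
from typing import Callable, Sequence, Tuple


def run_canary(steps: Sequence[int], is_healthy: Callable[[int], bool]) -> Tuple[str, int]:
    """Shift traffic to the new version step by step.

    steps: canary traffic percentages in increasing order (hypothetical values).
    is_healthy: callback that checks the canary at the given traffic percentage.
    Returns (outcome, final canary traffic percentage).
    """
    for percent in steps:
        if not is_healthy(percent):
            # Automated rollback: all traffic returns to the stable version.
            return ("rolled_back", 0)
    # Every step passed its health check, so the new version takes all traffic.
    return ("promoted", 100)
```

A caller would pass a health check backed by real metrics, for example an error-rate query against the monitoring system.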
Core Components
Component Details
1. Backstage Platform
Technology: Node.js, React, TypeScript
Purpose: Core IDP platform providing a unified interface
Key Features:
- Service Catalog: Central registry of all applications, services, and resources
- Software Templates: Scaffolding for new applications
- TechDocs: Documentation as code
- Plugin Architecture: Extensible via custom plugins
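As an illustration of what registering an application in the Service Catalog involves, the sketch below checks a Component entity for the fields the Backstage descriptor format expects. It assumes the `backstage.io/v1alpha1` descriptor version; `validate_component` is a hypothetical helper, not part of Backstage itself.

```python
from typing import List

# Fields a Component entity is expected to carry under spec.
REQUIRED_SPEC_FIELDS = ("type", "lifecycle", "owner")


def validate_component(entity: dict) -> List[str]:
    """Return a list of problems found in a Component entity (empty if none)."""
    errors = []
    if entity.get("apiVersion") != "backstage.io/v1alpha1":
        errors.append("apiVersion must be backstage.io/v1alpha1")
    if entity.get("kind") != "Component":
        errors.append("kind must be Component")
    if not entity.get("metadata", {}).get("name"):
        errors.append("metadata.name is required")
    spec = entity.get("spec", {})
    for field in REQUIRED_SPEC_FIELDS:
        if not spec.get(field):
            errors.append(f"spec.{field} is required")
    return errors
```

A check like this could run in a pre-merge pipeline on the catalog repository so that malformed entities never reach the portal.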
2. Custom Backstage Plugins
a) Deployment Plugin
- Trigger Argo Workflows for deployments
- Display deployment status and history
- Support for multiple deployment strategies
- Environment-specific configurations
b) Traffic Management Plugin
- Blue/Green deployment controls
- Canary release management
- Traffic splitting configuration
- Rollback capabilities
c) Multi-Cluster Management Plugin
- View applications across clusters
- Cluster health and capacity
- Regional deployment status
d) Monitoring & Observability Plugin
- Integration with monitoring systems (Prometheus, Grafana)
- Application metrics and logs
- Alert management
3. Argo Workflows
Purpose: Workflow orchestration for CI/CD pipelines
Key Workflow Templates:
- build-and-push-image: Build Docker image and push to registry
- update-gitops-values: Update image tags in ArgoCD values
- deploy-application: Orchestrate full deployment
- rollback-deployment: Automated rollback procedures
- traffic-switch: Manage traffic routing
- health-check: Post-deployment validation
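The relationship between these templates can be sketched as a small orchestration loop. The step ordering below is an assumption (the actual deploy-application template may sequence or parallelize steps differently), and `run_step` is a hypothetical callback standing in for submitting a workflow and waiting for its result.

```python
from typing import Callable, List

# Assumed ordering of the templates inside deploy-application.
DEPLOY_STEPS = ["build-and-push-image", "update-gitops-values", "health-check"]


def deploy_application(run_step: Callable[[str], bool]) -> List[str]:
    """Run each step in order; on the first failure, run the rollback template.

    run_step returns True when the named step succeeds.
    Returns the names of the steps that were started, in order.
    """
    executed = []
    for step in DEPLOY_STEPS:
        executed.append(step)
        if not run_step(step):
            # Stop the pipeline and hand over to the rollback procedure.
            executed.append("rollback-deployment")
            run_step("rollback-deployment")
            break
    return executed
```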
4. ArgoCD
Purpose: GitOps continuous delivery
Configuration:
- Application-per-environment pattern
- App-of-apps pattern for multi-cluster
- Sync policies and health checks
- Progressive sync waves
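Sync waves are expressed through the `argocd.argoproj.io/sync-wave` annotation, with lower waves applied first and unannotated resources treated as wave 0. The sketch below groups manifests by wave to show the resulting apply order; the manifests are plain dictionaries and `group_by_sync_wave` is a hypothetical helper, not an ArgoCD API.

```python
from collections import defaultdict
from typing import Dict, List

SYNC_WAVE_ANNOTATION = "argocd.argoproj.io/sync-wave"


def group_by_sync_wave(manifests: List[dict]) -> Dict[int, List[str]]:
    """Group resource names by sync wave, ordered from the lowest wave up."""
    waves = defaultdict(list)
    for manifest in manifests:
        annotations = manifest.get("metadata", {}).get("annotations", {})
        # Resources without the annotation belong to wave 0.
        wave = int(annotations.get(SYNC_WAVE_ANNOTATION, "0"))
        waves[wave].append(manifest["metadata"]["name"])
    return {wave: waves[wave] for wave in sorted(waves)}
```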
5. Kubernetes Clusters
Multi-Cluster Setup:
- Geographic redundancy across regions
- Active-active configuration
- Cross-cluster service mesh
- Centralized monitoring
Integration Architecture
Integration Points
1. Identity & Access Management
- LDAP/Active Directory integration
- OAuth 2.0 / OIDC authentication
- RBAC at Backstage, Kubernetes, and Argo levels
- Service accounts for automation
2. Git Repositories
- Application Repos: Source code and Dockerfiles
- GitOps Repos: Kubernetes manifests and Helm values
- Catalog Repos: Backstage entity definitions
- Webhook integration for automated triggers
3. Container Registry
- Private container registry (e.g., Artifactory, ECR)
- Image scanning and vulnerability assessment
- Retention policies
- Access controls per team
4. Secrets Management
- External secrets operator integration
- HashiCorp Vault or cloud-native solutions
- Automatic secret rotation
- Audit logging
5. Monitoring & Observability
- Prometheus for metrics
- Grafana for visualization
- ELK/Loki for logging
- Jaeger/Tempo for tracing
- Integration with Backstage dashboards
6. Notification Systems
- Slack integration for deployment notifications
- Email alerts for failures
- PagerDuty for incident management
- Webhook support for custom integrations
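One way to read the notification integrations is as a routing table from event type to channels. The event names and the exact routing below are hypothetical; only the channel roles (Slack for deployment notifications, email for failures, PagerDuty for incidents, webhooks for custom integrations) come from the list above.

```python
from typing import List

# Hypothetical event types mapped to the channels described above.
ROUTES = {
    "deployment_succeeded": ["slack"],
    "deployment_failed": ["slack", "email"],
    "incident": ["pagerduty"],
}


def notification_channels(event_type: str) -> List[str]:
    """Return the channels for an event; unknown events go to the webhook."""
    return ROUTES.get(event_type, ["webhook"])
```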
High-Level Architecture
Data Flow
1. Application Deployment Flow
2. Traffic Switch Flow (Blue/Green)
3. Canary Deployment Flow
Security Architecture
1. Authentication & Authorization
2. Security Layers
| Layer | Security Controls |
|---|---|
| Network | Network policies, service mesh mTLS, API gateway with WAF, private subnets |
| Application | OIDC authentication, RBAC authorization, input validation, CSRF protection |
| API | API key management, rate limiting, request signing, audit logging |
| Secrets | External secrets operator, Vault integration, no secrets in Git, rotation policies |
| Container | Image scanning, signed images, non-root containers, resource limits |
| Kubernetes | Pod security standards, network policies, RBAC, admission controllers |
3. RBAC Model
# Example RBAC Structure
roles:
  - name: application-owner
    permissions:
      - deploy:own-app
      - rollback:own-app
      - view:own-app-metrics
      - switch-traffic:own-app
  - name: application-developer
    permissions:
      - view:own-app
      - deploy:dev-environment
      - view:own-app-logs
  - name: platform-admin
    permissions:
      - deploy:*
      - configure:platform
      - manage:users
      - view:all-metrics
  - name: viewer
    permissions:
      - view:catalog
      - view:documentation
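A permission check against this structure could look like the following sketch. It treats `*` as a wildcard, which is an assumed reading of `deploy:*`, and it does not resolve what "own-app" means for a given user; ownership scoping would need catalog data. `is_allowed` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# Mirrors the example RBAC structure above.
ROLES = {
    "application-owner": [
        "deploy:own-app", "rollback:own-app",
        "view:own-app-metrics", "switch-traffic:own-app",
    ],
    "application-developer": [
        "view:own-app", "deploy:dev-environment", "view:own-app-logs",
    ],
    "platform-admin": [
        "deploy:*", "configure:platform", "manage:users", "view:all-metrics",
    ],
    "viewer": ["view:catalog", "view:documentation"],
}


def is_allowed(role: str, permission: str) -> bool:
    """True if any pattern granted to the role matches the requested permission."""
    return any(fnmatchcase(permission, pattern) for pattern in ROLES.get(role, []))
```

Unknown roles receive no permissions, so the check denies by default.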
4. Audit & Compliance
- All API calls logged with user identity
- Deployment history with approval chains
- Change tracking in Git repositories
- Compliance reports for regulatory requirements
- Retention policies for logs and audit trails
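To make "logged with user identity" concrete, the sketch below builds one structured audit record as a JSON line. The field names are assumptions chosen for illustration, not a schema defined by this architecture.

```python
import json
from datetime import datetime, timezone


def audit_record(user: str, action: str, target: str, timestamp: datetime) -> str:
    """Serialize one audit event as a JSON line with a UTC timestamp."""
    record = {
        "timestamp": timestamp.astimezone(timezone.utc).isoformat(),
        "user": user,      # authenticated identity from the OIDC session
        "action": action,  # e.g. "deploy" or "rollback"
        "target": target,  # application or resource acted on
    }
    return json.dumps(record, sort_keys=True)
```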
Scalability & Reliability
1. Horizontal Scaling
| Component | Scaling Strategy |
|---|---|
| Backstage | Multiple replicas behind load balancer |
| Argo Workflows | Controller replicas with leader election |
| ArgoCD | Multiple application controllers |
| Databases | PostgreSQL with replication |
| Redis | Redis cluster for caching |
2. High Availability
3. Disaster Recovery
- Recovery Time Objective (RTO): 30 minutes
- Recovery Point Objective (RPO): 5 minutes
DR Strategy:
- Multi-region deployment of all components
- Database replication with automated failover
- GitOps repositories allow cluster and application state to be rebuilt from Git
- Regular backup of Backstage catalog
- Documented runbooks for failure scenarios
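The two recovery objectives translate into simple checks: replication lag bounds potential data loss (RPO), and the recovery time measured in a DR drill bounds downtime (RTO). The sketch below is one way to flag violations; how the two inputs are measured is left as an assumption.

```python
from datetime import timedelta
from typing import List

RTO = timedelta(minutes=30)  # maximum tolerated time to restore service
RPO = timedelta(minutes=5)   # maximum tolerated window of lost data


def dr_violations(replication_lag: timedelta, measured_recovery: timedelta) -> List[str]:
    """Return which objectives are violated by the observed values."""
    violations = []
    if replication_lag > RPO:
        violations.append("RPO")
    if measured_recovery > RTO:
        violations.append("RTO")
    return violations
```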
4. Performance Targets
| Metric | Target | Measurement |
|---|---|---|
| Page Load Time | < 2 seconds | 95th percentile |
| API Response Time | < 500 ms | 95th percentile |
| Deployment Trigger | < 5 seconds | Average |
| Catalog Search | < 1 second | 95th percentile |
| Concurrent Users | 500+ | Peak capacity |
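Most of these targets are stated at the 95th percentile, so verifying them means computing a percentile over measured latencies rather than an average. The sketch below uses the nearest-rank method, which is one of several common percentile definitions; the sample values in the usage are hypothetical.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def meets_target(samples: Sequence[float], target: float, pct: float = 95.0) -> bool:
    """True if the chosen percentile is strictly below the target."""
    return percentile(samples, pct) < target
```

For example, `meets_target(api_latencies_seconds, 0.5)` would check the API response time target, given a list of measured latencies in seconds.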