05. Team Onboarding and User Guide

Welcome to the IDP

What is the Internal Developer Portal?

The Internal Developer Portal (IDP) is a self-service platform that empowers your team to deploy and manage applications across multiple Kubernetes clusters without requiring deep knowledge of the underlying infrastructure.

What Can You Do?

Key Benefits

Self-Service: Deploy without waiting for the platform team
Safe: Automated validation and easy rollbacks
Consistent: Same process across all environments
Visible: Track all deployments and their status
Multi-Region: Deploy to multiple clusters automatically

Getting Started

1. Access the Portal

Navigate to: https://idp.company.com

2. Sign In

Use your company LDAP/AD credentials
You'll be automatically signed in via SSO

3. First Login

Upon first login, you'll see:

Dashboard: Overview of your team's applications
Catalog: All registered services
Documentation: TechDocs for all services
Your Profile: Your teams and permissions

4. Understand Your Permissions

Role	Permissions
Developer	View apps, Deploy to dev, View logs
Team Lead	All Developer permissions + Deploy to staging
Application Owner	All Team Lead permissions + Deploy to production, Manage traffic
Platform Admin	Full access to all applications

Check your permissions:

Click your profile icon (top right)
Select "Settings"
View "Teams & Permissions"
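
The role table above reads as a cumulative hierarchy. The following is a minimal Python sketch of that reading, assuming each role inherits everything from the role before it. The permission strings and the `can` helper are hypothetical names chosen for illustration; they are not the portal's actual API.

```python
# Hypothetical model of the role table; all names are illustrative only.
DEVELOPER = {"view_apps", "deploy:dev", "view_logs"}
TEAM_LEAD = DEVELOPER | {"deploy:staging"}
APPLICATION_OWNER = TEAM_LEAD | {"deploy:production", "manage_traffic"}

ROLE_PERMISSIONS = {
    "Developer": DEVELOPER,
    "Team Lead": TEAM_LEAD,
    "Application Owner": APPLICATION_OWNER,
}


def can(role: str, permission: str) -> bool:
    """Return True if the given role grants the given permission."""
    if role == "Platform Admin":
        return True  # full access to all applications
    return permission in ROLE_PERMISSIONS.get(role, set())
```

Under these assumptions, `can("Team Lead", "deploy:production")` is false, which is why a production deployment needs an Application Owner.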

Registering Your Application

Prerequisites

Before registering your application, ensure you have:

✅ A Git repository for your application
✅ A Dockerfile in your repository
✅ Basic understanding of your app's requirements
✅ Owner or contributor access to the repository

Step 1: Prepare catalog-info.yaml

Create a file named catalog-info.yaml in the root of your repository:

apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: my-awesome-app
  description: My awesome application
  annotations:
    # Link to your Git repository
    github.com/project-slug: company/my-awesome-app

    # Documentation path (optional)
    backstage.io/techdocs-ref: dir:.

    # Grafana dashboard (optional)
    grafana/dashboard-selector: 'app=my-awesome-app'

  tags:
    - nodejs
    - api
    - rest

  links:
    - url: https://wiki.company.com/my-awesome-app
      title: Wiki
      icon: docs

spec:
  type: service
  lifecycle: experimental  # experimental, production, deprecated
  owner: team-awesome      # Your team name
  system: awesome-system   # System this belongs to

  # Dependencies (optional)
  dependsOn:
    - resource:postgres-db
    - component:auth-service

  # APIs provided (optional)
  providesApis:
    - my-awesome-api

  # APIs consumed (optional)
  consumesApis:
    - payment-api
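
Before committing, it can help to sanity-check the file. The sketch below assumes the YAML has already been loaded into a Python dict by a YAML parser of your choice. `check_component` is a hypothetical helper, and the name pattern reflects our understanding of Backstage's default naming rule (1 to 63 characters, alphanumeric runs separated by `-`, `_`, or `.`), which you should verify against your portal's configuration.

```python
import re

# Assumed default Backstage name rule: alphanumeric runs separated by a single
# '-', '_' or '.', at most 63 characters. Verify against your portal.
NAME_PATTERN = re.compile(r"[A-Za-z0-9]+([-_.][A-Za-z0-9]+)*")


def check_component(entity: dict) -> list:
    """Return a list of problems found in a parsed catalog-info.yaml entity."""
    problems = []
    if entity.get("apiVersion") != "backstage.io/v1alpha1":
        problems.append("apiVersion should be backstage.io/v1alpha1")
    if entity.get("kind") != "Component":
        problems.append("kind should be Component")

    name = (entity.get("metadata") or {}).get("name") or ""
    if len(name) > 63 or not NAME_PATTERN.fullmatch(name):
        problems.append(f"metadata.name is missing or invalid: {name!r}")

    spec = entity.get("spec") or {}
    for field in ("type", "lifecycle", "owner"):
        if not spec.get(field):
            problems.append(f"spec.{field} is required")
    return problems
```

An empty list means the basic fields from the example above are present; the portal's "Analyze" step remains the authoritative check.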

Step 2: Define API (Optional)

If your application exposes an API, create an API entity:

---
apiVersion: backstage.io/v1alpha1
kind: API
metadata:
  name: my-awesome-api
  description: REST API for my awesome app
spec:
  type: openapi
  lifecycle: production
  owner: team-awesome
  definition:
    # Path to OpenAPI spec
    $text: ./openapi.yaml

Step 3: Commit and Push

git add catalog-info.yaml
git commit -m "Add Backstage catalog info"
git push origin main

Step 4: Register in Portal

Go to the Catalog page
Click "Register Existing Component"
Select "URL"

Enter the URL of the catalog-info.yaml file in your repository:

https://github.com/company/my-awesome-app/blob/main/catalog-info.yaml

Click "Analyze"
Review the entities found
Click "Import"

Step 5: Verify Registration

Navigate to Catalog
Search for your application
Click on it to view details
Verify all information is correct

Deploying Your Application

Deployment Overview

Step 1: Navigate to Your Application

Go to Catalog
Click on your application
Click the "Deployments" tab

Step 2: Trigger a Deployment

Option A: Standard Deployment (Recommended for Dev/Staging)

Steps:

Click "Deploy" button
Environment: Select from dropdown
- dev - Development environment
- staging - Staging environment
- production - Production environment
Strategy: Select "Standard (Rolling Update)"
Version (optional):
- Leave empty for latest commit
- Or specify a version tag (e.g., v1.2.3)
- Or specify a git commit SHA
Click "Deploy"
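
The three Version options can be told apart mechanically. Here is a minimal Python sketch; `classify_version` is a hypothetical helper, and treating any 7 to 40 character hexadecimal string as a commit SHA is an assumption rather than documented portal behavior.

```python
import re

TAG = re.compile(r"v\d+\.\d+\.\d+")   # e.g. v1.2.3
SHA = re.compile(r"[0-9a-f]{7,40}")   # abbreviated or full git commit SHA


def classify_version(value: str) -> str:
    """Classify the Version field as latest commit, version tag, or commit SHA."""
    value = value.strip()
    if not value:
        return "latest-commit"  # field left empty
    if TAG.fullmatch(value):
        return "tag"
    if SHA.fullmatch(value.lower()):
        return "commit-sha"
    raise ValueError(f"unrecognized version: {value!r}")
```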

Option B: Blue/Green Deployment (Production)

When to use: Major releases, critical updates

Steps:

Click "Deploy"
Environment: production
Strategy: "Blue/Green"
Version: Specify version
Click "Deploy"
Wait for Green deployment to complete
Run tests against Green environment
Switch traffic when ready (see Traffic Management section)

Option C: Canary Deployment (High-Risk Changes)

When to use: High-risk changes, gradual rollouts

Steps:

Click "Deploy"
Environment: production
Strategy: "Canary"
Canary Percentage: Start with 10
Version: Specify version
Click "Deploy"
The system monitors metrics automatically
The system gradually increases traffic while the canary stays healthy
The system rolls back automatically if metrics degrade

Step 3: Monitor Deployment Progress

The deployment page will show:

Current Status: Running, Succeeded, Failed
Progress Steps: Each step with status
Logs: Real-time workflow logs
Timeline: Estimated completion time

Step 4: Verify Deployment

Once complete, verify:

Status: Shows "Succeeded" in green
Health Checks: All pods healthy
Version: Correct version deployed
Metrics: No error spikes

Managing Deployments

View Deployment History

Steps:

Go to your application in Catalog
Click "Deployments" tab
Scroll to "Deployment History"
Filter by environment if needed

Information shown:

Timestamp
Version deployed
Environment
Strategy used
Status (Success/Failed)
Duration
Deployed by (your username)
Actions (Rollback button)

Rollback a Deployment

Steps:

Go to Deployment History
Find the deployment you want to roll back to
Click the "Rollback" button (undo icon)
Confirm the rollback
Monitor the rollback progress

Rollback times:

Blue/Green: < 30 seconds (instant switch)
Standard: 2-5 minutes
Canary: 1-2 minutes

Cancel an In-Progress Deployment

If a deployment is stuck or needs to be stopped:

Go to deployment details
Click "Cancel Deployment"
Confirm cancellation
System will clean up resources

Traffic Management

Blue/Green Traffic Switch

Steps:

After the Blue/Green deployment completes, go to the "Traffic Management" tab
Review current traffic split:
- Blue: 100% (old version)
- Green: 0% (new version)
Run validation tests on Green
Click "Switch to Green"
Confirm the switch
Monitor metrics after switch

Safety features:

Keep Blue running for 24 hours (for quick rollback)
Instant rollback to Blue if needed
Health checks before switching
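
The switch and its safety features can be pictured as a small state machine. The Python sketch below is a toy model under stated assumptions: `BlueGreenRouter` and its methods are hypothetical names, and the real switch is performed by the portal, not by this code.

```python
class BlueGreenRouter:
    """Toy model of a Blue/Green traffic switch (not the portal's real API)."""

    def __init__(self) -> None:
        self.active = "blue"  # all traffic starts on the old version

    def split(self) -> dict:
        """Return the current traffic split in percent."""
        green = 100 if self.active == "green" else 0
        return {"blue": 100 - green, "green": green}

    def switch_to_green(self, green_healthy: bool) -> None:
        """Move all traffic to Green, but only after its health checks pass."""
        if not green_healthy:
            raise RuntimeError("Green failed health checks; traffic stays on Blue")
        self.active = "green"

    def rollback(self) -> None:
        """Send all traffic back to Blue, which is kept running after the switch."""
        self.active = "blue"
```

Because Blue is never torn down during the switch, `rollback` only changes where traffic goes, which is why it is fast.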

Canary Traffic Control

Automatic mode (Recommended):

The system automatically increases traffic based on error rates and latency
The system rolls back automatically if issues are detected

Manual mode:

Go to "Traffic Management" tab
View current canary percentage
Click "Increase Canary Traffic"
Select new percentage (10%, 25%, 50%, 75%, 100%)
Monitor metrics
Repeat until 100%

Rollback canary:

Click "Rollback Canary"
Traffic immediately routes to stable version
Canary pods are removed
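
Automatic and manual mode follow the same ladder of percentages. A minimal Python sketch of one promotion decision is shown here; the step list comes from the manual-mode options above, while `next_canary_percentage` and the convention of returning 0 for a rollback are assumptions made for illustration.

```python
CANARY_STEPS = [10, 25, 50, 75, 100]


def next_canary_percentage(current: int, healthy: bool) -> int:
    """Return the canary traffic percentage to apply next.

    Unhealthy metrics send all traffic back to the stable version (0%).
    """
    if not healthy:
        return 0
    for step in CANARY_STEPS:
        if step > current:
            return step
    return 100  # already fully promoted
```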

Traffic Split Metrics

Monitor these metrics during traffic changes:

Error Rate: Should stay < 1%
Response Time: Should stay within 10% of baseline
Success Rate: Should stay > 99%
Traffic Distribution: Verify actual vs. intended split
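
Verifying the actual split against the intended one is simple arithmetic over request counts. The sketch below assumes you can read per-version request counts from your dashboard; the function names and the 2-point tolerance are hypothetical choices, not portal defaults.

```python
def actual_canary_pct(stable_requests: int, canary_requests: int) -> float:
    """Percentage of all requests that reached the new version."""
    total = stable_requests + canary_requests
    if total == 0:
        return 0.0  # no traffic yet, nothing to compare
    return 100.0 * canary_requests / total


def split_matches(intended_pct: float, stable_requests: int,
                  canary_requests: int, tolerance_pct: float = 2.0) -> bool:
    """True if the observed split is within the tolerance of the intended split."""
    observed = actual_canary_pct(stable_requests, canary_requests)
    return abs(observed - intended_pct) <= tolerance_pct
```

For example, 900 stable requests and 100 canary requests give an observed split of 10%, which matches an intended 10% canary but not an intended 25% one.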

Monitoring and Observability

Application Dashboard

Access your dashboard:

Navigate to your application
Click "Monitoring" tab
View real-time metrics

Key Metrics to Monitor

Metric	What to Watch	Alert Threshold
Request Rate	Traffic volume	Sudden drops
Error Rate	Failed requests	> 1%
Response Time	Latency (P95)	> 500ms
CPU Usage	Resource utilization	> 80%
Memory Usage	Memory consumption	> 85%
Pod Crashes	Restarts	Any crashes
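
The numeric thresholds in the table can be expressed as data and checked against a metrics sample. This is a minimal Python sketch; the metric key names and the `breached` helper are hypothetical, and the "sudden drops" rule for request rate is left out because the table gives no number for it.

```python
# Thresholds taken from the table above; key names are illustrative.
THRESHOLDS = {
    "error_rate_pct": 1.0,
    "p95_latency_ms": 500.0,
    "cpu_pct": 80.0,
    "memory_pct": 85.0,
    "pod_restarts": 0,
}


def breached(sample: dict) -> list:
    """Return the names of metrics in the sample that exceed their threshold."""
    return [name for name, limit in THRESHOLDS.items()
            if sample.get(name, 0) > limit]
```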

View Logs

Real-time logs:

Go to "Logs" tab
Select environment
Select pod (if multiple)
View streaming logs

Search logs:

# Example searches
error
status=500
user_id=12345

Filter by:

Log level (INFO, WARN, ERROR)
Time range
Pod name
Keywords
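
Combining a log-level filter with a keyword search works like this. The Python sketch assumes each log line starts with its level followed by a space, which may not match your application's log format; `filter_logs` is a hypothetical helper, not a portal function.

```python
def filter_logs(lines, level=None, keyword=None):
    """Keep lines whose first token equals the level and that contain the keyword.

    Either filter can be omitted by leaving it as None.
    """
    matches = []
    for line in lines:
        if level and line.split(" ", 1)[0] != level:
            continue
        if keyword and keyword not in line:
            continue
        matches.append(line)
    return matches
```

Calling `filter_logs(lines, level="ERROR", keyword="status=500")` keeps only ERROR lines that mention a 500 status.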

Alerts and Notifications

Set up alerts:

Go to "Alerts" tab
Click "Create Alert"
Configure:
- Metric to monitor
- Threshold
- Notification channel (Slack, Email)
Save alert

Alert types:

High error rate
Slow response time
Pod crashes
High resource usage
Deployment failures

Troubleshooting

Common Issues

Issue 1: Deployment Stuck in "Pending"

Symptoms:

Deployment shows "Pending" for > 5 minutes
No progress in logs

Possible Causes:

Image not found in registry
Insufficient cluster resources
Configuration error

How to debug:

Click on deployment to see details
View "Workflow Logs"
Check for error messages

Common fixes:

Fix Dockerfile if build failed
Verify registry access
Check resource quotas
Review configuration

Issue 2: Pods Crash After Deployment

Symptoms:

Deployment succeeds
Pods start but crash immediately
Status shows "CrashLoopBackOff"

How to debug:

Go to "Logs" tab
View pod logs
Look for:
- Configuration errors
- Missing environment variables
- Database connection issues
- Port conflicts

Common fixes:

Update environment variables
Fix configuration
Check dependencies (DB, Redis, etc.)
Verify resource limits

Issue 3: Slow Deployment

Symptoms:

Deployment takes > 15 minutes
Progress seems stuck

How to debug:

Check each step duration
Identify bottleneck:
- Build step slow? Optimize Dockerfile
- Image push slow? Check network/registry
- Sync slow? Check cluster resources

Issue 4: Traffic Switch Fails

Symptoms:

Traffic switch command completes
Traffic still goes to old version

How to debug:

Check ingress/gateway configuration
Verify service labels
Check pod labels match
Review service mesh config

Contact support if issues persist for more than 30 minutes.

Best Practices

Development Workflow

Deployment Best Practices

1. Always Test in Lower Environments First

✅ Do:

Deploy to dev first
Test thoroughly in staging
Use production-like data in staging

❌ Don't:

Skip dev/staging
Deploy untested code to production
Assume it works if it worked locally

2. Use Semantic Versioning

✅ Do:

v1.0.0 - Major release
v1.1.0 - New feature (minor)
v1.1.1 - Bug fix (patch)

❌ Don't:

v1
my-feature
latest (in production)
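
The tag format above can be validated and bumped mechanically. Below is a minimal Python sketch, assuming tags always take the form vMAJOR.MINOR.PATCH with no pre-release suffix; `parse` and `bump` are hypothetical helper names.

```python
import re

SEMVER_TAG = re.compile(r"v(\d+)\.(\d+)\.(\d+)")


def parse(tag: str) -> tuple:
    """Split a tag like v1.2.3 into (major, minor, patch) integers."""
    match = SEMVER_TAG.fullmatch(tag)
    if match is None:
        raise ValueError(f"not a semantic version tag: {tag!r}")
    return tuple(int(part) for part in match.groups())


def bump(tag: str, part: str) -> str:
    """Return the next tag for a 'major', 'minor', or 'patch' change."""
    major, minor, patch = parse(tag)
    if part == "major":
        return f"v{major + 1}.0.0"
    if part == "minor":
        return f"v{major}.{minor + 1}.0"
    if part == "patch":
        return f"v{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")
```

Parsing into integers also gives correct ordering: `parse("v1.10.0")` compares greater than `parse("v1.9.3")`, which a plain string comparison would get wrong.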

3. Deploy During Low-Traffic Hours

✅ Do:

Production: Deploy during maintenance windows
Use Blue/Green for zero-downtime
Schedule deployments

❌ Don't:

Deploy during peak hours
Deploy on Friday afternoons
Deploy without announcement

4. Monitor After Deployment

✅ Do:

Watch metrics for 30 minutes post-deployment
Check error rates
Verify logs for issues
Test critical paths

❌ Don't:

Deploy and leave immediately
Ignore alerts
Assume it's working

5. Document Changes

✅ Do:

Write clear commit messages
Update documentation
Note breaking changes
Tag releases in Git

❌ Don't:

Use generic commit messages ("fix", "update")
Leave docs outdated
Forget to tag releases

Traffic Management Best Practices

1. Blue/Green Deployments

✅ Do:

Test Green thoroughly before switch
Keep Blue for 24 hours
Have rollback plan
Monitor after switch

2. Canary Deployments

✅ Do:

Start with 10% traffic
Wait 15-30 minutes between increases
Monitor error rates closely
Use automatic rollback

❌ Don't:

Jump directly to 100%
Ignore metrics
Disable automatic rollback

Configuration Best Practices

1. Environment Variables

✅ Do:

env:
  - name: DATABASE_HOST
    valueFrom:
      secretKeyRef:
        name: app-secrets
        key: db-host
  - name: LOG_LEVEL
    value: "info"

❌ Don't:

env:
  - name: DATABASE_PASSWORD
    value: "plain-text-password"  # NEVER!
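
On the application side, values injected this way arrive as ordinary environment variables. The Python sketch below shows one way to read them and fail fast when a secret-backed variable is missing, a common cause of the pod crashes described under Troubleshooting. `load_settings` is a hypothetical helper; the variable names come from the example above.

```python
import os


def load_settings(env=None) -> dict:
    """Read settings injected by the deployment; fail fast if a value is missing."""
    if env is None:
        env = os.environ
    host = env.get("DATABASE_HOST")
    if not host:
        raise RuntimeError("DATABASE_HOST is not set; check the app-secrets Secret")
    return {
        "database_host": host,
        "log_level": env.get("LOG_LEVEL", "info"),  # non-secret, safe default
    }
```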

2. Resource Limits

✅ Do:

resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 512Mi

❌ Don't:

Omit resource limits
Set limits too low (causes crashes)
Set limits too high (wastes resources)
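
A quick way to catch an inconsistent resources block is to convert the quantities to plain numbers and confirm that every request is at most its limit. The Python sketch below handles only the suffixes used in the example (`m` for CPU; `Ki`, `Mi`, `Gi` for memory), not the full Kubernetes quantity syntax, and all function names are hypothetical.

```python
MEMORY_UNITS = {"Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3}


def cpu_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as '100m' or '0.5' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def memory_bytes(quantity: str) -> int:
    """Convert a memory quantity such as '128Mi' to bytes."""
    for suffix, factor in MEMORY_UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: already in bytes


def requests_within_limits(resources: dict) -> bool:
    """True if the CPU and memory requests do not exceed their limits."""
    requests, limits = resources["requests"], resources["limits"]
    return (cpu_millicores(requests["cpu"]) <= cpu_millicores(limits["cpu"])
            and memory_bytes(requests["memory"]) <= memory_bytes(limits["memory"]))
```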

Quick Reference

Common Commands

Task	Steps
Register App	Catalog → Register → Enter URL → Import
Deploy to Dev	App → Deployments → Deploy → dev → Standard → Deploy
Deploy to Prod	App → Deployments → Deploy → production → Blue/Green → Deploy
Rollback	App → Deployments → History → Find deployment → Rollback
Switch Traffic	App → Traffic Management → Switch to Green → Confirm
View Logs	App → Logs → Select environment → View
