Kubernetes Architecture: The 10,000-Foot View
Kubernetes (K8s) is a container orchestration platform. It schedules containers across a cluster of machines, handles networking between them, scales them up/down based on load, restarts them when they crash, and rolls out updates without downtime.
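The architecture has two halves. The Control Plane manages cluster state: the API Server is the front door for every request, etcd stores the state, the Scheduler assigns Pods to nodes, and the Controller Manager runs the loops that reconcile actual state with desired state. Worker Nodes run your containers in Pods, with a kubelet agent on each node starting and monitoring them. You interact with the API Server using kubectl, usually by applying YAML manifests.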
Everything in Kubernetes is a "resource" with a declarative YAML specification. You declare the desired state ("I want 3 replicas of this container"), and Kubernetes continuously works to make the actual state match the desired state.
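To make the desired-versus-actual split concrete, here is a sketch of how a Deployment might look when read back from the API server. The resource name and all field values are invented for illustration; the point is only that spec holds what you declared and status holds what the cluster currently observes.

```yaml
# Illustrative excerpt only: values are made up for the example.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ink-api            # example name
spec:
  replicas: 3              # desired state: what you declared
status:
  replicas: 3              # actual state: Pods that currently exist
  readyReplicas: 2         # one Pod is not passing its readiness check yet
  availableReplicas: 2
```

While readyReplicas is below spec.replicas, the controllers keep working; once the two match, there is nothing left to reconcile.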
Pods, Deployments & ReplicaSets
A Pod is the smallest deployable unit — one or more containers that share networking (same IP), storage, and lifecycle. In practice, most Pods run a single container. Multi-container Pods are used for sidecar patterns (log collectors, service mesh proxies).
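A minimal sketch of the sidecar pattern, assuming the main container writes a log file to /var/log/app/app.log (that path, the Pod name, and the busybox sidecar are assumptions for illustration, not part of the original setup). Both containers mount the same emptyDir volume, which is how they share storage inside one Pod.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ink-api-with-sidecar       # hypothetical name
spec:
  volumes:
    - name: logs
      emptyDir: {}                 # scratch volume that lives as long as the Pod
  containers:
    - name: api
      image: inkandhorizon/api:v2.1.0
      volumeMounts:
        - name: logs
          mountPath: /var/log/app  # assumed log directory
    - name: log-tailer             # sidecar: streams the shared log file to stdout
      image: busybox:1.36
      command: ["sh", "-c", "tail -F /var/log/app/app.log"]
      volumeMounts:
        - name: logs
          mountPath: /var/log/app
          readOnly: true
```

In a real cluster this Pod template would normally sit inside a Deployment rather than being created directly.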
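A Deployment manages ReplicaSets, which in turn manage Pods. It handles: creating N replicas, rolling updates (replacing old Pods with new ones incrementally rather than all at once), rollbacks (reverting to a previous revision), and self-healing (creating replacement Pods when one fails or is deleted; restarting a crashed container inside a running Pod is the kubelet's job).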
You almost never create Pods directly. Instead, you create a Deployment, which creates a ReplicaSet, which creates Pods. This three-level hierarchy enables seamless updates and rollbacks.
# deployment.yaml: production-ready Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ink-api
  labels:
    app: ink-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: ink-api
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # Max 1 extra Pod during update
      maxUnavailable: 0    # Never reduce below desired count
  template:
    metadata:
      labels:
        app: ink-api
    spec:
      containers:
        - name: api
          image: inkandhorizon/api:v2.1.0
          ports:
            - containerPort: 3000
          resources:
            requests:
              cpu: 250m
              memory: 256Mi
            limits:
              cpu: 500m
              memory: 512Mi
          readinessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 15
            periodSeconds: 20
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: ink-secrets
                  key: database-url
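The Deployment → ReplicaSet → Pod hierarchy is visible in the metadata of the objects Kubernetes creates. The excerpt below sketches what a Pod created through the ink-api Deployment might look like; the hash suffixes are invented, since real names are generated by the controllers, and fields such as uid are omitted.

```yaml
# Illustrative excerpt only: generated names and hashes are made up.
apiVersion: v1
kind: Pod
metadata:
  name: ink-api-6d4f9c7b8d-x2k9p
  labels:
    app: ink-api
    pod-template-hash: 6d4f9c7b8d   # added by the Deployment controller
  ownerReferences:
    - apiVersion: apps/v1
      kind: ReplicaSet              # the Pod is owned by a ReplicaSet...
      name: ink-api-6d4f9c7b8d      # ...which is itself owned by the Deployment
      controller: true
```

A rolling update creates a new ReplicaSet with a different template hash and scales it up while scaling the old one down, which is also what makes a rollback possible: the old ReplicaSet is still there to scale back up.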
Services & Networking
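Each Pod gets its own IP, and that IP changes whenever the Pod is replaced (rescheduled, rolled, or recreated after a failure). Services provide a stable DNS name and virtual IP that route traffic to Pods that are ready. The three types you will use most are ClusterIP (internal traffic, the default), NodePort (a port opened on every node), and LoadBalancer (a cloud provider load balancer); a fourth, ExternalName, maps a Service to an external DNS name.
For production ingress (routing external HTTP/HTTPS traffic to Services), the Kubernetes project, as of 1.31, directs new work toward the Gateway API, the successor to the Ingress resource. The Ingress API is frozen: it remains stable and supported, but new features land in the Gateway API, which is installed as a set of CRDs plus a controller implementation.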
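For comparison with the internal-only default, this is a minimal sketch of a LoadBalancer Service for the same Pods. The Service name is hypothetical, and what actually gets provisioned depends on the cloud provider or load balancer controller in your cluster.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: ink-api-public     # hypothetical name
spec:
  type: LoadBalancer       # asks the provider for an external load balancer
  selector:
    app: ink-api           # same label selector as a ClusterIP Service would use
  ports:
    - port: 80             # port exposed by the load balancer
      targetPort: 3000     # port the container listens on
```

Changing type to NodePort would instead open a port on every node and leave external load balancing to you.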
# Service: Stable endpoint for Pods
apiVersion: v1
kind: Service
metadata:
  name: ink-api-service
spec:
  selector:
    app: ink-api           # Routes to Pods with this label
  ports:
    - port: 80
      targetPort: 3000
  type: ClusterIP          # Internal only (default)
---
# Gateway API: Modern HTTP routing (replaces Ingress)
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: ink-routes
spec:
  parentRefs:
    - name: main-gateway
  hostnames:
    - "inkandhorizon.com"
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /api
      backendRefs:
        - name: ink-api-service
          port: 80
    - matches:
        - path:
            type: PathPrefix
            value: /
      backendRefs:
        - name: ink-web-service
          port: 80
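The HTTPRoute refers to a parent named main-gateway, which has to exist as a Gateway resource. A minimal sketch of what that Gateway could look like follows; the gatewayClassName is a placeholder, because the real value depends on which Gateway API implementation is installed in the cluster.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: main-gateway
spec:
  gatewayClassName: example-gateway-class   # assumption: set by your Gateway controller
  listeners:
    - name: http
      protocol: HTTP
      port: 80
      hostname: "inkandhorizon.com"
      allowedRoutes:
        namespaces:
          from: Same                        # only routes in this namespace may attach
```

The split is deliberate: the Gateway describes the listener (typically owned by a platform team), while HTTPRoutes describe per-application routing.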
HPA v2: Horizontal Pod Autoscaling
HPA v2 automatically scales the number of Pod replicas based on metrics: CPU utilization, memory usage, custom metrics (request rate, queue depth), or external metrics (cloud provider metrics).
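The key parameters are minReplicas (floor), maxReplicas (ceiling), and a target value for each metric. By default the HPA controller re-evaluates metrics every 15 seconds and adjusts the replica count to keep the metric near its target. To prevent oscillation, scale-down uses a stabilization window (300 seconds by default): the controller acts on the highest replica recommendation seen during that window, so a brief dip in load does not immediately remove Pods.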
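The scaling decision itself follows the formula given in the Kubernetes documentation. The worked numbers are assumptions chosen for illustration: 4 current replicas, an observed average CPU utilization of 90%, and a target of 70%.

```latex
\[
\text{desiredReplicas} =
  \left\lceil \text{currentReplicas} \times
  \frac{\text{currentMetricValue}}{\text{desiredMetricValue}} \right\rceil
\]
\[
\left\lceil 4 \times \frac{90}{70} \right\rceil = \lceil 5.14 \rceil = 6
\]
```

Utilization is measured against each container's resource request, not its limit, so with a CPU request of 250m a 70% target corresponds to roughly 175m of average usage per Pod.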
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ink-api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ink-api
  minReplicas: 2
  maxReplicas: 10
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
        - type: Percent
          value: 50                      # Scale up max 50% at a time
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300    # 5min cooldown
      policies:
        - type: Pods
          value: 1                       # Scale down 1 Pod at a time
          periodSeconds: 60
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70         # Scale when CPU > 70%
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80         # Scale when memory > 80%
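Custom metrics such as request rate use a different metric type than CPU and memory. The sketch below assumes a per-Pod metric named http_requests_per_second is exposed through a custom metrics adapter; the metric name, the HPA name, and the target value are all hypothetical, and without such an adapter the HPA cannot read this metric.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ink-api-hpa-rps                  # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ink-api
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Pods                         # per-Pod custom metric, averaged across Pods
      pods:
        metric:
          name: http_requests_per_second # assumption: provided by a metrics adapter
        target:
          type: AverageValue
          averageValue: "100"            # aim for about 100 requests/s per Pod
```

When several metrics are listed, the HPA computes a replica count for each and uses the largest.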
Pod Disruption Budgets & High Availability
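Pod Disruption Budgets (PDBs) protect your application during voluntary disruptions such as node drains for maintenance and cluster upgrades. They limit how many Pods of an application can be evicted at once, so that a minimum number stays available. Involuntary disruptions (hardware failure, a spot instance reclaimed without a graceful drain) cannot be prevented by a PDB, although they do count against the budget.
Without a PDB, draining a node evicts every Pod on it at once; if most of your replicas happen to sit on that node, capacity drops sharply until replacements start elsewhere. With a PDB saying "keep at least 2 Pods available," the eviction API refuses any eviction that would drop the application below 2 available Pods, and kubectl drain retries until replacement Pods are ready on other nodes.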
# PDB: Always keep at least 2 replicas available
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: ink-api-pdb
spec:
  minAvailable: 2          # OR: maxUnavailable: 1
  selector:
    matchLabels:
      app: ink-api
---
# Topology Spread: Distribute Pods across nodes/zones
# (fragment: merge into the Pod template of the ink-api Deployment)
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: ink-api
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: ink-api
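The eviction API makes its decision from the PDB's status. As a sketch, with 3 healthy ink-api Pods and minAvailable: 2 the status might look like the excerpt below (values are illustrative, not captured from a real cluster).

```yaml
# Illustrative excerpt only: status values are made up for the example.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: ink-api-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: ink-api
status:
  expectedPods: 3
  currentHealthy: 3
  desiredHealthy: 2
  disruptionsAllowed: 1    # currentHealthy - desiredHealthy
```

One interaction worth checking: if an autoscaler scales the Deployment down to 2 replicas while the PDB still requires 2 available, disruptionsAllowed becomes 0 and node drains will wait until something changes. Using maxUnavailable: 1 instead avoids that particular deadlock.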
Key Takeaways
Kubernetes is the "operating system of the cloud" — it schedules, scales, heals, and updates your containers across a cluster. The core concepts are: Pods (compute), Services (networking), Deployments (lifecycle), HPA (scaling), and PDBs (availability).
For system design interviews: explain the Pod → ReplicaSet → Deployment hierarchy, describe rolling updates with zero downtime, discuss HPA scaling policies, and demonstrate how PDBs + topology spread constraints achieve high availability.
In 2026, the Gateway API is the recommended path for new HTTP routing; Ingress is frozen rather than removed, so existing Ingress resources keep working. Use HPA v2 with behavior controls for stable autoscaling. Always set resource requests/limits and health probes on every container.