Comprehensive Guide: Running GPU Workloads on Kubernetes
Understanding GPU Computing in Kubernetes
Architectural Overview
GPU (Graphics Processing Unit) computing leverages specialized processors originally designed for rendering graphics to perform parallel computations. Unlike CPUs, which are optimized for sequential processing with a small number of powerful cores, GPUs contain thousands of smaller cores optimized for handling many tasks simultaneously.
Key Components in Kubernetes GPU Architecture:
- Hardware Layer
  - Physical GPU cards (e.g., NVIDIA Tesla V100, A100)
  - PCIe interface connection
  - GPU memory (VRAM)
  - CUDA cores for parallel processing
- Driver Layer
  - NVIDIA drivers: Interface between hardware and software
  - CUDA toolkit: Programming interface for GPU computing
  - Container runtime hooks: Enable GPU access from containers
- Container Runtime Layer
  - NVIDIA Container Toolkit (formerly nvidia-docker2)
  - Container runtime (containerd/Docker)
  - Device plugin interface
- Kubernetes Layer
  - NVIDIA Device Plugin
  - Kubernetes scheduler
  - Resource allocation system
How GPU Scheduling Works in Kubernetes
Resource Advertisement
- The NVIDIA Device Plugin runs as a DaemonSet
- Discovers GPUs on each node
- Advertises GPU resources to Kubernetes API server
- Updates node capacity with the nvidia.com/gpu resource
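A quick way to confirm what the plugin has advertised is to query the node object directly. A minimal sketch, assuming a node named gpu-node-1 (substitute your own node name):

# Show the GPU count the device plugin registered on the node
kubectl get node gpu-node-1 -o jsonpath='{.status.capacity.nvidia\.com/gpu}'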
Resource Scheduling Process
Pod Request → Scheduler → Device Plugin → GPU Allocation
- Pod makes GPU request through resource limits
- Kubernetes scheduler finds eligible nodes
- Device Plugin handles GPU assignment
- Container runtime configures GPU access
GPU Resource Isolation
- Each container gets exclusive access to assigned GPUs
- GPU memory is not oversubscribed
- NVIDIA driver enforces hardware-level isolation
Deep Dive: NVIDIA Device Plugin Workflow
Initialization Phase
Device Plugin Start → GPU Discovery → Resource Registration
- Scans system for NVIDIA GPUs
- Creates socket for Kubernetes communication
- Registers as device plugin with kubelet
Operation Phase
List GPUs → Monitor Health → Handle Allocation
- Maintains list of available GPUs
- Monitors GPU health and status
- Handles allocation requests from kubelet
Resource Management
Pod Request → Allocation → Environment Setup → Container Start
- Maps GPU devices to containers
- Sets up NVIDIA runtime environment
- Configures container GPU access
Multi-Instance GPU (MIG) Technology
For NVIDIA A100 GPUs, MIG allows:
- Partitioning
  - A single GPU split into up to 7 instances
  - Each instance has dedicated:
    - Compute resources
    - Memory
    - Memory bandwidth
    - Cache
- Isolation Levels
  - Hardware-level isolation
  - Memory protection
  - Error containment
  - QoS guarantee
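To make the partitioning concrete, here is a rough node-level sketch of enabling MIG and creating instances with nvidia-smi (it assumes an A100 at GPU index 0 and root access; on managed platforms such as AKS the MIG profile is usually selected on the node pool or via the NVIDIA GPU Operator rather than by hand):

# Enable MIG mode on GPU 0 (takes effect after a GPU reset or node reboot)
sudo nvidia-smi -i 0 -mig 1
# Create two 1g.5gb GPU instances plus their compute instances
sudo nvidia-smi mig -cgi 1g.5gb,1g.5gb -C
# List the resulting GPU instances
sudo nvidia-smi mig -lgi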
GPU Memory Management
Allocation Modes
- Exclusive process mode
- Time-slicing mode (shared; see the configuration sketch below)
- MIG mode (for A100)
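For the time-slicing mode, the NVIDIA device plugin accepts a sharing configuration; the sketch below shows the general shape of such a config (the replica count of 4 is an arbitrary example, and the file is passed to the plugin, for instance through its Helm chart's config options; check the plugin documentation for your version):

version: v1
sharing:
  timeSlicing:
    resources:
    - name: nvidia.com/gpu
      replicas: 4

With 4 replicas, a single physical GPU is advertised as four schedulable nvidia.com/gpu resources, with no memory isolation between the pods that share it.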
Memory Hierarchy
Global Memory → L2 Cache → L1 Cache → CUDA Cores
Resource Limits
- GPU memory is not oversubscribed by Kubernetes
- In the default exclusive mode, a container sees the full memory of its assigned GPU
- Hardware-level memory partitioning is enforced only when MIG is used
Understanding the GPU Workflow in Kubernetes
1. Pod Scheduling Flow
Pod Creation → Resource Check → Node Selection → GPU Binding → Container Start
- Pod Creation
  - User creates pod with GPU requirements
  - Scheduler receives pod specification
- Resource Check
  - Scheduler checks GPU availability
  - Validates node constraints and taints
  - Considers topology requirements
- Node Selection
  - Scheduler selects optimal node
  - Considers GPU availability and type
  - Evaluates other scheduling constraints
- GPU Binding
  - Device plugin assigns specific GPUs
  - Sets up environment variables
  - Configures container runtime
- Container Start
  - Container runtime initializes with GPU access
  - NVIDIA driver provides GPU interface
  - Application gains GPU access
2. Data Flow in GPU Computing
Application → CUDA API → Driver → GPU → Memory → Results
- Application Layer
  - Makes CUDA API calls
  - Manages data transfers
  - Orchestrates computations
- CUDA Layer
  - Translates API calls to driver commands
  - Manages memory transfers
  - Handles kernel execution
- Driver Layer
  - Controls GPU hardware
  - Manages memory allocation
  - Schedules operations
- Hardware Layer
  - Executes CUDA kernels
  - Performs memory operations
  - Returns results
3. Resource Lifecycle
Allocation → Usage → Release → Cleanup
- Resource Allocation
  - Kubernetes reserves GPU
  - Device plugin configures access
  - Container gets exclusive use
- Resource Usage
  - Application uses GPU
  - Monitoring tracks utilization
  - Resource limits enforced
- Resource Release
  - Pod termination triggers release
  - GPU returned to pool
  - Resources cleaned up
- Cleanup Process
  - Memory cleared
  - GPU state reset
  - Resources marked available
Best Practices and Guidelines
Resource Optimization
- Batch Processing
  - Group similar workloads
  - Use job queues effectively
  - Implement proper backoff strategies
- Memory Management
  - Monitor GPU memory usage
  - Implement proper cleanup
  - Use appropriate batch sizes
- Compute Optimization
  - Use optimal CUDA algorithms
  - Balance CPU and GPU work
  - Minimize data transfers
Enough talk, let’s dig in
Prerequisites
Before starting, ensure you have:
- An active Kubernetes cluster (e.g., an AKS cluster)
- kubectl command-line tool installed and configured
- Access to create or modify node pools
- Helm v3 installed (for NVIDIA device plugin installation)
GPU Node Pool Setup
1. Select appropriate GPU VM size
Choose a GPU-enabled VM size based on your workload requirements. Common options include:
- Standard_NC6s_v3 (1 NVIDIA Tesla V100)
- Standard_NC24rs_v3 (4 NVIDIA Tesla V100 with RDMA)
- Standard_ND96asr_v4 (8 NVIDIA A100)
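To check which GPU sizes are actually available in your region before creating the pool, the Azure CLI can list them; a minimal sketch, assuming the eastus region and filtering on the NC family:

# List NC-series (GPU) VM sizes available in a region
az vm list-sizes --location eastus --query "[?contains(name, 'NC')].name" -o table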
2. Create GPU node pool
az aks nodepool add \
--resource-group myResourceGroup \
--cluster-name myAKSCluster \
--name gpunodepool \
--node-count 1 \
--node-vm-size Standard_NC6s_v3 \
--node-taints sku=gpu:NoSchedule \
--enable-cluster-autoscaler \
--min-count 1 \
--max-count 3
Key parameters explained:
- --node-taints: Ensures only GPU workloads are scheduled on these expensive nodes
- --enable-cluster-autoscaler: Automatically scales nodes based on demand
- --min-count and --max-count: Define scaling boundaries
NVIDIA Driver Installation
Option 1: Default AKS Installation
By default, AKS automatically installs NVIDIA drivers on GPU-capable nodes. This is the recommended approach for most users.
Option 2: Manual Installation (if needed)
If you need to manually install or update drivers:
- Connect to the node:
# Get node name
kubectl get nodes
# Connect to node (requires SSH access)
ssh username@node-ip
- Install NVIDIA drivers:
# Update package list
sudo apt update
# Install ubuntu-drivers utility
sudo apt install -y ubuntu-drivers-common
# Install recommended NVIDIA drivers
sudo ubuntu-drivers install
# Reboot node
sudo reboot
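After the node comes back up, a quick sanity check confirms the driver loaded and the GPU is visible:

# On the node, verify the driver version and visible GPUs
nvidia-smi
# Confirm the kernel module is loaded
lsmod | grep nvidia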
NVIDIA Device Plugin Installation
The NVIDIA device plugin is required for Kubernetes to recognize and schedule GPU resources.
Using Helm (Recommended Method)
- Add NVIDIA Helm repository:
helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
helm repo update
- Install the device plugin:
helm install nvdp nvdp/nvidia-device-plugin \
--version=0.15.0 \
--namespace nvidia-device-plugin \
--create-namespace
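Because the GPU node pool above was created with the sku=gpu:NoSchedule taint, the device plugin DaemonSet needs a matching toleration or it will never land on those nodes. A minimal sketch using a Helm values file (this assumes the chart exposes a standard tolerations value; verify against the chart version you install):

# gpu-values.yaml
tolerations:
- key: "sku"
  operator: "Equal"
  value: "gpu"
  effect: "NoSchedule"

# Apply it on install/upgrade
helm upgrade --install nvdp nvdp/nvidia-device-plugin \
  --version=0.15.0 \
  --namespace nvidia-device-plugin \
  --create-namespace \
  -f gpu-values.yaml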
Verification Steps
- Verify node GPU capability:
kubectl get nodes -o wide
kubectl describe node <gpu-node-name>
Look for the following in the output:
Capacity:
  nvidia.com/gpu: 1
Allocatable:
  nvidia.com/gpu: 1
- Test GPU detection:
# Create a test pod
# Note: the --limits flag is deprecated/removed in newer kubectl releases, and the
# tainted GPU nodes also require a toleration, so if this one-liner fails,
# apply the Pod manifest shown in the next section instead
kubectl run nvidia-smi --rm -it \
  --image=nvidia/cuda:11.8.0-base-ubuntu22.04 \
  --limits=nvidia.com/gpu=1 \
  --command -- nvidia-smi
Running GPU Workloads
Example GPU Workload YAML
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  containers:
  - name: gpu-container
    image: nvidia/cuda:11.8.0-base-ubuntu22.04
    command: ["nvidia-smi", "-l", "30"]
    resources:
      limits:
        nvidia.com/gpu: 1
  tolerations:
  - key: "sku"
    operator: "Equal"
    value: "gpu"
    effect: "NoSchedule"
Key components explained:
- resources.limits.nvidia.com/gpu: Specifies GPU requirement
- tolerations: Matches node taints to allow scheduling
- image: Use CUDA-compatible container image
Deploy the workload:
kubectl apply -f gpu-workload.yaml
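After applying the manifest, confirm the pod was scheduled onto a GPU node and is reporting the device:

# Check scheduling status and which node the pod landed on
kubectl get pod gpu-pod -o wide
# The container runs nvidia-smi in a loop, so its logs should list the GPU
kubectl logs gpu-pod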
Monitoring and Troubleshooting
Monitor GPU Usage
Using Container Insights, you can monitor:
- GPU duty cycle
- GPU memory usage
- Number of GPUs allocated/available

Key metrics:
- containerGpuDutyCycle
- containerGpumemoryUsedBytes
- nodeGpuAllocatable
Common Troubleshooting Steps
- Check GPU driver status:
kubectl exec -it <pod-name> -- nvidia-smi
- Verify NVIDIA device plugin:
kubectl get pods -n nvidia-device-plugin
- Check pod events:
kubectl describe pod <pod-name>
Best Practices and Advanced Considerations
1. Resource Management Strategy
Optimal Resource Allocation
- Define explicit GPU resource requests and limits in pod specifications
- Implement resource quotas at namespace level to control GPU allocation (see the ResourceQuota sketch below)
- Use pod priority classes for critical GPU workloads
apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload
spec:
  priorityClassName: high-priority-gpu
  containers:
  - name: gpu-container
    image: nvidia/cuda:11.8.0-base-ubuntu22.04
    resources:
      limits:
        nvidia.com/gpu: 1
      requests:
        nvidia.com/gpu: 1
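For the namespace-level quota mentioned above, GPUs (like other extended resources) can be capped with a standard ResourceQuota; a minimal sketch, assuming a hypothetical team namespace named ml-team:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: gpu-quota
  namespace: ml-team
spec:
  hard:
    requests.nvidia.com/gpu: "4"

Only the requests.* form is supported for extended resources, and Kubernetes does not allow overcommitting them (requests and limits must match), so this effectively caps the namespace at four GPUs.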
Node Management
- Implement node taints and tolerations for GPU nodes
# Node taint example
kubectl taint nodes gpu-node-1 gpu=true:NoSchedule
# Pod toleration example
tolerations:
- key: "gpu"
  operator: "Equal"
  value: "true"
  effect: "NoSchedule"
- Use node labels for GPU-specific workload targeting
- Configure node affinity rules for specialized GPU workloads
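A minimal sketch of label-based targeting (the gpu-type label and its value are arbitrary examples, not labels applied automatically by AKS):

# Label the node with its accelerator type
kubectl label nodes gpu-node-1 gpu-type=a100

# In the pod spec, target that label
nodeSelector:
  gpu-type: a100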
2. Container Optimization
Image Management
- Base image selection:
  - Use official NVIDIA CUDA images as base
  - Choose appropriate CUDA version for your workload
  - Consider slim variants for reduced image size
Build Optimization
- Implement multi-stage builds to minimize image size
- Include only necessary CUDA libraries
- Cache commonly used data in persistent volumes
# Example multi-stage build
FROM nvidia/cuda:11.8.0-base-ubuntu22.04 as builder
# Build steps...
FROM nvidia/cuda:11.8.0-runtime-ubuntu22.04
COPY --from=builder /app/binary /app/
3. Performance Monitoring and Optimization
Metrics Collection
- Implement comprehensive monitoring:
  - GPU utilization percentage
  - Memory usage patterns
  - Temperature and power consumption
  - Error rates and throttling events
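These metrics are commonly collected with NVIDIA's DCGM exporter, which exposes them to Prometheus; a minimal installation sketch (repository URL and chart name as published in the dcgm-exporter documentation; verify against the current release):

helm repo add gpu-helm-charts https://nvidia.github.io/dcgm-exporter/helm-charts
helm repo update
helm install dcgm-exporter gpu-helm-charts/dcgm-exporter \
  --namespace monitoring --create-namespace

Keep in mind that metric names depend on the exporter you deploy (DCGM exposes, for example, DCGM_FI_DEV_GPU_UTIL rather than nvidia_gpu_duty_cycle), so align alert expressions like the one below with whatever exporter is actually running.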
Alert Configuration
# Example PrometheusRule for GPU alerts
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: gpu-alerts
spec:
  groups:
  - name: gpu.rules
    rules:
    - alert: HighGPUUsage
      expr: nvidia_gpu_duty_cycle > 90
      for: 10m
4. Cost Management
Resource Scheduling
- Implement spot instances for fault-tolerant workloads
- Use node auto-scaling based on GPU demand
- Configure pod disruption budgets for critical workloads
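For the spot-instance approach on AKS, a GPU spot node pool can be added alongside the on-demand pool; a hedged sketch reusing the earlier resource names (prices and counts are examples):

az aks nodepool add \
  --resource-group myResourceGroup \
  --cluster-name myAKSCluster \
  --name gpuspot \
  --node-vm-size Standard_NC6s_v3 \
  --priority Spot \
  --eviction-policy Delete \
  --spot-max-price -1 \
  --node-taints sku=gpu:NoSchedule \
  --enable-cluster-autoscaler \
  --min-count 1 \
  --max-count 3

AKS also applies its own spot-related taint to such pools, so fault-tolerant workloads need the corresponding toleration in addition to the sku=gpu one.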
Workload Optimization
# Example CronJob for off-peak processing
apiVersion: batch/v1
kind: CronJob
metadata:
  name: gpu-batch-job
spec:
  schedule: "0 2 * * *" # Run at 2 AM
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: gpu-processor
            image: nvidia/cuda:11.8.0-base-ubuntu22.04
            resources:
              limits:
                nvidia.com/gpu: 1
5. Security Implementation
Access Control
- Use RBAC to control who may create pods in GPU-enabled namespaces (GPU quantities themselves are capped with ResourceQuota, as sketched earlier, since RBAC rules cannot reference a resource name such as nvidia.com/gpu)
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: gpu-user
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["create"]
Runtime Security
- Enable SecurityContext for GPU containers
- Implement network policies for GPU workloads
- Regular security scanning of GPU container images
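A hedged sketch of a restrictive container-level securityContext for a GPU workload; GPU access itself does not require privileged mode because the device plugin injects only the assigned devices, but verify these settings against your application's needs (for example, runAsNonRoot requires an image that defines a non-root user):

# Container-level securityContext for a GPU workload
securityContext:
  runAsNonRoot: true
  allowPrivilegeEscalation: false
  capabilities:
    drop: ["ALL"]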
6. Advanced Scenarios
Multi-Instance GPU (MIG) Configuration
# Example MIG profile configuration
apiVersion: v1
kind: Pod
metadata:
  name: mig-workload
spec:
  containers:
  - name: gpu-container
    image: nvidia/cuda:11.8.0-base-ubuntu22.04
    resources:
      limits:
        nvidia.com/mig-1g.5gb: 1
RDMA for High-Performance Computing
- Configure RDMA-capable networks
- Optimize for low-latency communication
- Implement proper RDMA security controls
Development Environment Optimization
- Set up GPU sharing for development teams
- Implement resource quotas per development namespace
- Create development-specific GPU profiles
Finishing Points
- Regular Maintenance Checklist
  - Weekly verification of driver updates
  - Monthly performance baseline checks
  - Quarterly capacity planning review
- Documentation Requirements
  - Maintain GPU allocation policies
  - Document troubleshooting procedures
  - Keep upgrade procedures up to date
- Future Considerations
  - Plan for GPU architecture upgrades
  - Evaluate emerging GPU technologies
  - Monitor Kubernetes GPU feature development
- Emergency Procedures
  - GPU failure handling protocol
  - Workload failover procedures
  - Emergency contact information