Kueue Quick Start: Fair-Share GPU/TPU Queueing on Kubernetes

This post helps you understand what Kueue is, why you need it, and how to use it on shared GPU/TPU clusters in about 10 minutes.

Why Kubernetes Alone Is Not Enough

Kubernetes can run batch workloads through Job or CronJob, but it does not natively manage admission timing, queueing behavior, or cluster-level resource governance for those workloads. Kubernetes is excellent at placing Pods on nodes, but not at deciding when a batch workload should be admitted.

For AI/ML training jobs, you often need expensive accelerators like GPUs or TPUs. In multi-team shared clusters, the native scheduling model quickly shows limitations:

  • No true queueing mechanism: when resources are insufficient, Jobs may fail, or Pods remain Pending instead of being admitted by policy.
  • No quota governance: one team can consume all GPUs/TPUs while others wait.
  • No workload priority at admission: low-priority experiments can block high-priority production training.
  • No fair sharing guarantees: teams cannot be guaranteed a reasonable share of cluster resources.

Kueue (pronounced like “queue”) is a Kubernetes-native job queueing system designed to solve exactly these problems. It holds Jobs in a queue and admits them only when enough resources are available within quota.

What Is Kueue?

Kueue is a Kubernetes controller that sits between Job submission and the Kubernetes scheduler. It does not replace the scheduler. Instead, it decides when Jobs are allowed to start.

If Kubernetes is the cluster’s operating system, Kueue is the air traffic control tower deciding which flight (workload) can take off and when.

Core capabilities provided by Kueue:

  • Quota Management: enforce per-team resource limits.
  • Priority-based Scheduling: high-priority workloads are admitted first.
  • Fair Sharing: distribute GPU/TPU resources fairly across teams.
  • Resource-aware Admission: admit Jobs only when required resources are available.
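
As a minimal sketch of how priority is expressed, assuming the same kueue.x-k8s.io/v1beta2 API version used later in this post: a WorkloadPriorityClass assigns a numeric priority, and a Job opts in with the kueue.x-k8s.io/priority-class label. The names prod-training and team-a-queue, the value 10000, and the container command are hypothetical.

```yaml
# Hypothetical priority class; name and value are illustrative only.
apiVersion: kueue.x-k8s.io/v1beta2
kind: WorkloadPriorityClass
metadata:
  name: prod-training
value: 10000
description: "Production training runs"
---
# A Job that opts in to the priority class through a label.
apiVersion: batch/v1
kind: Job
metadata:
  generateName: prod-train-
  labels:
    kueue.x-k8s.io/queue-name: team-a-queue       # hypothetical LocalQueue
    kueue.x-k8s.io/priority-class: prod-training  # references the class above
spec:
  template:
    spec:
      containers:
      - name: trainer
        image: busybox:1.36
        command: ["sh", "-c", "echo training; sleep 30"]
        resources:
          requests:
            cpu: "1"
      restartPolicy: Never
```

Workloads with higher values are considered first for admission. This priority applies to Kueue's queue ordering and is separate from Pod priority.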

Kueue vs Job vs JobSet

A common question is: “If I already use JobSet, do I still need Kueue?”
Yes, because they solve different layers of the problem and work best together.

| Layer | Tool | Responsibility |
| --- | --- | --- |
| Pod scheduling | Native Kubernetes Scheduler | Places Pods onto specific nodes |
| Job orchestration | JobSet | Groups dependent Jobs into one coordinated unit (for example, 1 leader + 3 workers starting together) |
| Job admission | Kueue | Decides when a Job can run based on quota, priority, and fair-sharing policy |

  • Job only: single batch task, no queue governance. Good for simple dedicated-cluster workloads.
  • JobSet only: multi-role distributed workload orchestration, but no cluster-wide quota governance.
  • Kueue only: queueing and quota control for independent Jobs.
  • Kueue + JobSet: full solution for large-scale TPU/GPU training with both orchestration and resource governance.

In short: JobSet controls how workloads run; Kueue controls when they can run.
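
A sketch of the combined setup, assuming the JobSet controller is installed alongside Kueue and that a LocalQueue named team-a-queue exists (the queue name, replicated job names, and container commands are placeholders). The queue label goes on the JobSet itself, so Kueue admits the leader and workers together rather than one Job at a time:

```yaml
apiVersion: jobset.x-k8s.io/v1alpha2
kind: JobSet
metadata:
  generateName: training-
  labels:
    kueue.x-k8s.io/queue-name: team-a-queue  # hypothetical LocalQueue
spec:
  replicatedJobs:
  - name: leader            # 1 leader Pod
    replicas: 1
    template:
      spec:
        parallelism: 1
        completions: 1
        backoffLimit: 0
        template:
          spec:
            restartPolicy: Never
            containers:
            - name: leader
              image: busybox:1.36
              command: ["sh", "-c", "echo leader; sleep 30"]
              resources:
                requests:
                  cpu: "1"
  - name: workers           # 3 worker Pods in one child Job
    replicas: 1
    template:
      spec:
        parallelism: 3
        completions: 3
        backoffLimit: 0
        template:
          spec:
            restartPolicy: Never
            containers:
            - name: worker
              image: busybox:1.36
              command: ["sh", "-c", "echo worker; sleep 30"]
              resources:
                requests:
                  cpu: "1"
```

With these requests, the JobSet needs 4 CPUs of quota in total (1 for the leader plus 3 for the workers) before it can start.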

Core Concepts

Before writing YAML, understand these four Kueue abstractions:

| Concept | Description | Analogy |
| --- | --- | --- |
| ResourceFlavor | Defines resource types in the cluster (for example, TPU v6e 1x1 topology, A100 GPU, CPU-only nodes), mapped via node labels | Menu: all dishes the kitchen can serve |
| ClusterQueue | Cluster-scoped queue with per-flavor quotas (nominalQuota) to control total concurrent resource consumption | Kitchen: controls total capacity |
| LocalQueue | Namespace-scoped queue where users submit workloads; forwards requests to ClusterQueue for admission checks | Waiter: takes your order and sends it to the kitchen |
| Workload | Kueue’s internal object wrapping a Job/JobSet, auto-created when a Job has kueue.x-k8s.io/queue-name | Order ticket: tracks request state |
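
To make the "mapped via node labels" part concrete, here is a hedged sketch of a ResourceFlavor for A100 nodes. The flavor name a100 is made up, and the label shown is the accelerator label GKE applies to GPU nodes; on other platforms, substitute whatever label your GPU nodes carry.

```yaml
apiVersion: kueue.x-k8s.io/v1beta2
kind: ResourceFlavor
metadata:
  name: a100   # hypothetical flavor name
spec:
  nodeLabels:
    # Assumes GKE-style node labels; adjust for your cluster.
    cloud.google.com/gke-accelerator: nvidia-tesla-a100
```

A ClusterQueue then references this flavor by name in its resourceGroups, and workloads admitted under it are steered to nodes carrying the matching label.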

When to Use Kueue

| Scenario | Why Kueue Helps | Typical Example |
| --- | --- | --- |
| Multi-team shared cluster | Teams compete for limited GPU/TPU resources; quota + fair sharing are required | Team A and Team B share the same 8x A100 cluster |
| High-value accelerators | TPU/GPU resources should not sit idle due to poor admission behavior | TPU v6e is expensive; one idle hour wastes cost |
| Batch workloads | Training, data pipelines, and CI/CD jobs can wait in queue | Model training queues during peak hours and runs off-peak |
| Cloud cost control | Hard quotas prevent overprovisioning and unexpected spend | Limit a team to 16 TPU chips max |
| Multi-slice TPU training | JobSet orchestrates distributed slices; Kueue controls admission timing | 3 TPU slices for multislice JAX training |

If more than one team submits batch jobs to the same cluster, Kueue is usually worth adopting. If you already use JobSet for multi-node training, adding Kueue completes the resource-governance layer.
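
The "limit a team to 16 TPU chips" case might look roughly like the ClusterQueue below. This is a sketch under assumptions: the namespace team-a, the flavor tpu-v6e, and the CPU and memory figures are invented for illustration, and google.com/tpu is the TPU resource name as exposed on GKE. CPU and memory are covered as well because the ClusterQueue needs quota for every resource the Pods request.

```yaml
apiVersion: kueue.x-k8s.io/v1beta2
kind: ClusterQueue
metadata:
  name: team-a-tpu   # hypothetical queue name
spec:
  namespaceSelector:
    matchLabels:
      # Label that Kubernetes sets automatically on every namespace.
      kubernetes.io/metadata.name: team-a
  resourceGroups:
  - coveredResources: ["cpu", "memory", "google.com/tpu"]
    flavors:
    - name: tpu-v6e   # hypothetical ResourceFlavor, defined separately
      resources:
      - name: "cpu"
        nominalQuota: 64       # illustrative value
      - name: "memory"
        nominalQuota: 256Gi    # illustrative value
      - name: "google.com/tpu"
        nominalQuota: 16       # at most 16 TPU chips in use at once
```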

How It Works: Admission Flow

  1. Submit a Job or JobSet with the kueue.x-k8s.io/queue-name label set to the name of a LocalQueue.
  2. Kueue creates a Workload object and checks available ClusterQueue quota.
  3. If quota is sufficient, the Job is admitted and Pods proceed to scheduling.
  4. If quota is insufficient, the Job waits in queue until resources are released.
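
Under the hood, this flow relies on the Job's spec.suspend field. The trimmed views below sketch what the same Job looks like before and after admission; they are illustrations, not manifests to apply, and the generated name sample-job-abc12 is hypothetical.

```yaml
# While queued (steps 2 and 4): the Job is suspended, so no Pods are created.
apiVersion: batch/v1
kind: Job
metadata:
  name: sample-job-abc12   # hypothetical generated name
  labels:
    kueue.x-k8s.io/queue-name: user-queue
spec:
  suspend: true
---
# After admission (step 3): Kueue unsuspends the Job and Pods go to the scheduler.
apiVersion: batch/v1
kind: Job
metadata:
  name: sample-job-abc12
  labels:
    kueue.x-k8s.io/queue-name: user-queue
spec:
  suspend: false
```

This is why Kueue does not need to replace the scheduler: a suspended Job has no Pods to schedule.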

Quick Start: Configure Kueue in 3 Steps

This example uses CPU and memory as quota resources so you can validate Kueue behavior on any standard Kubernetes cluster, without GPUs or TPUs.

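Architecture Overview

The setup consists of three objects: Jobs in the default namespace are submitted to the LocalQueue user-queue, which forwards them to the ClusterQueue cluster-queue, which in turn draws its CPU and memory quota from the ResourceFlavor default-flavor.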

Step 1: Install Kueue

# Install Kueue
helm install kueue oci://registry.k8s.io/kueue/charts/kueue \
    --namespace kueue-system \
    --create-namespace \
    --wait --timeout 300s

Step 2: Configure ResourceFlavor + ClusterQueue + LocalQueue

# ResourceFlavor: use the default flavor (no node restriction)
apiVersion: kueue.x-k8s.io/v1beta2
kind: ResourceFlavor
metadata:
  name: default-flavor
spec: {}  # Empty spec means any node is eligible
---
# ClusterQueue: set total CPU and memory quotas
apiVersion: kueue.x-k8s.io/v1beta2
kind: ClusterQueue
metadata:
  name: cluster-queue
spec:
  namespaceSelector: {}  # Allow all namespaces
  queueingStrategy: BestEffortFIFO
  resourceGroups:
  - coveredResources: ["cpu", "memory"]
    flavors:
    - name: default-flavor
      resources:
      - name: "cpu"
        nominalQuota: 4      # Total 4 CPU cores available
      - name: "memory"
        nominalQuota: 8Gi    # Total 8Gi memory available
---
# LocalQueue: user-facing submission entrypoint
apiVersion: kueue.x-k8s.io/v1beta2
kind: LocalQueue
metadata:
  namespace: default
  name: user-queue
spec:
  clusterQueue: cluster-queue

Step 3: Submit a Job with the Kueue Queue Label

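Submit the Job below twice. Because the manifest uses generateName, create it with kubectl create -f rather than kubectl apply. Each Job requests 3 CPUs, and the ClusterQueue has a quota of only 4 CPUs in total, so the first Job is admitted immediately and the second waits until the first finishes and releases its quota.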

# Sample Job: requests 3 CPU + 4Gi memory (create it twice to get both Jobs)
apiVersion: batch/v1
kind: Job
metadata:
  generateName: sample-job-
  labels:
    kueue.x-k8s.io/queue-name: user-queue  # Points to LocalQueue
spec:
  template:
    spec:
      containers:
      - name: worker
        image: busybox:1.36
        command: ["sh", "-c", "echo 'Hello from Kueue!'; sleep 30"]
        resources:
          requests:
            cpu: "3"
            memory: "4Gi"
      restartPolicy: Never
  backoffLimit: 0
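
As an optional extra experiment, the sketch below is a smaller Job that fits in the quota left over while the first 3-CPU Job is running (1 CPU and 4Gi of memory remain). Because cluster-queue uses the BestEffortFIFO strategy, Kueue should be able to admit this Job even though an older 3-CPU Job is still waiting ahead of it. The generateName prefix and the command are placeholders.

```yaml
# Small Job: requests 1 CPU + 1Gi memory, which fits in the remaining quota
apiVersion: batch/v1
kind: Job
metadata:
  generateName: small-job-   # hypothetical name prefix
  labels:
    kueue.x-k8s.io/queue-name: user-queue
spec:
  template:
    spec:
      containers:
      - name: worker
        image: busybox:1.36
        command: ["sh", "-c", "echo 'Small job admitted'; sleep 10"]
        resources:
          requests:
            cpu: "1"
            memory: "1Gi"
      restartPolicy: Never
  backoffLimit: 0
```

With StrictFIFO instead, this Job would be expected to wait behind the blocked 3-CPU Job.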

Use these commands to observe Kueue admission behavior:

# Check ClusterQueue status (quota usage)
kubectl get clusterqueue cluster-queue -o wide

# Check Workload objects (admitted vs waiting jobs)
kubectl get workloads -n default

GKE AI Series

Eason Cao
Eason is an engineer working at FAANG and living in Europe. He was accredited as AWS Professional Solution Architect, AWS Professional DevOps Engineer and CNCF Certified Kubernetes Administrator. He started his Kubernetes journey in 2017 and enjoys solving real-world business problems.