Kueue Quick Start: Fair-Share GPU/TPU Queueing on Kubernetes

This post helps you understand what Kueue is, why you need it, and how to use it on shared GPU/TPU clusters in about 10 minutes.

Why Kubernetes Alone Is Not Enough

Kubernetes can run batch workloads through Job or CronJob, but it does not natively manage admission timing, queueing behavior, or cluster-level resource governance for them. It is excellent at placing Pods on nodes, but not at deciding when a batch workload should be admitted.

For AI/ML training jobs, you often need expensive accelerators like GPUs or TPUs. In multi-team shared clusters, the native scheduling model quickly shows limitations:

No true queueing mechanism: when resources are insufficient, Pods sit in Pending, or Jobs fail, rather than waiting in a queue and being admitted by policy.
No quota governance: one team can consume all GPUs/TPUs while others wait.
No workload priority at admission: low-priority experiments can block high-priority production training.
No fair sharing guarantees: teams cannot be guaranteed a reasonable share of cluster resources.

Kueue (pronounced like “queue”) is a Kubernetes-native job queueing system designed to solve exactly these problems. It holds Jobs in a queue and admits them only when enough resources are available within quota.

What Is Kueue?

Kueue is a Kubernetes controller that sits between Job submission and the Kubernetes scheduler. It does not replace the scheduler. Instead, it decides when Jobs are allowed to start: a queued Job stays suspended, so no Pods are created, until Kueue admits it.

If Kubernetes is the cluster’s operating system, Kueue is the air traffic control tower deciding which flight (workload) can take off and when.

Core capabilities provided by Kueue:

Quota Management: enforce per-team resource limits.
Priority-based Admission: high-priority workloads are admitted first.
Fair Sharing: distribute GPU/TPU resources fairly across teams.
Resource-aware Admission: admit Jobs only when required resources are available.
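
As a rough illustration of priority-based admission, the sketch below orders pending workloads by priority and then by submission time. The `PendingWorkload` class, its fields, and the workload names are hypothetical and only model the idea; they are not Kueue's API or its internal implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PendingWorkload:
    """Hypothetical stand-in for a workload waiting in a queue."""
    name: str
    priority: int      # higher value means more important
    submitted_at: int  # logical timestamp; smaller means submitted earlier


def admission_order(pending):
    """Order workloads by priority (highest first), then by submission time."""
    return sorted(pending, key=lambda w: (-w.priority, w.submitted_at))


pending = [
    PendingWorkload("experiment-a", priority=10, submitted_at=1),
    PendingWorkload("prod-training", priority=100, submitted_at=2),
    PendingWorkload("experiment-b", priority=10, submitted_at=3),
]

ordered_names = [w.name for w in admission_order(pending)]
print(ordered_names)
```

In this toy model the production training job is considered first even though it was submitted after the first experiment, and the two equal-priority experiments keep their submission order.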

Kueue vs Job vs JobSet

A common question is: “If I already use JobSet, do I still need Kueue?”
Yes, because they solve different layers of the problem and work best together.

Layer	Tool	Responsibility
Pod scheduling	Native Kubernetes Scheduler	Places Pods onto specific nodes
Job orchestration	JobSet	Groups dependent Jobs into one coordinated unit (for example, 1 leader + 3 workers starting together)
Job admission	Kueue	Decides when a Job can run based on quota, priority, and fair-sharing policy

Job only: single batch task, no queue governance. Good for simple dedicated-cluster workloads.
JobSet only: multi-role distributed workload orchestration, but no cluster-wide quota governance.
Kueue only: queueing and quota control for independent Jobs.
Kueue + JobSet: full solution for large-scale TPU/GPU training with both orchestration and resource governance.

In short: JobSet controls how workloads run; Kueue controls when they can run.
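
To make the "how" versus "when" split concrete, here is a minimal sketch that assumes a grouped workload is counted against quota as one unit: either the whole group fits and is admitted, or all of it waits. The dictionary layout, the `chips_per_pod` field, and the numbers are illustrative assumptions, not JobSet or Kueue fields.

```python
def total_request(replicated_jobs):
    """Sum the accelerator chips requested by every Pod in the group."""
    return sum(
        job["replicas"] * job["pods_per_replica"] * job["chips_per_pod"]
        for job in replicated_jobs
    )


def admit_all_or_nothing(replicated_jobs, free_chips):
    """Admit the group only if its combined request fits in the free quota."""
    return total_request(replicated_jobs) <= free_chips


# Hypothetical group: 1 leader + 3 workers, each Pod asking for 4 chips.
jobset = [
    {"name": "leader", "replicas": 1, "pods_per_replica": 1, "chips_per_pod": 4},
    {"name": "workers", "replicas": 3, "pods_per_replica": 1, "chips_per_pod": 4},
]

needed = total_request(jobset)
print(needed, admit_all_or_nothing(jobset, 16), admit_all_or_nothing(jobset, 12))
```

With 12 free chips the leader and two workers would fit individually, but the group as a whole does not, so nothing starts and no accelerators are held by a partially running training job.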

Core Concepts

Before writing YAML, understand these four Kueue abstractions:

Concept	Description	Analogy
ResourceFlavor	Defines resource types in the cluster (for example, TPU v6e 1x1 topology, A100 GPU, CPU-only nodes), mapped via node labels	Menu: all dishes the kitchen can serve
ClusterQueue	Cluster-scoped queue with per-flavor quotas (`nominalQuota`) to control total concurrent resource consumption	Kitchen: controls total capacity
LocalQueue	Namespace-scoped queue where users submit workloads; forwards requests to ClusterQueue for admission checks	Waiter: takes your order and sends it to the kitchen
Workload	Kueue’s internal object wrapping a Job/JobSet, created automatically when the Job carries the `kueue.x-k8s.io/queue-name` label	Order ticket: tracks request state

When to Use Kueue

Scenario	Why Kueue Helps	Typical Example
Multi-team shared cluster	Teams compete for limited GPU/TPU resources; quota + fair sharing are required	Team A and Team B share the same 8x A100 cluster
High-value accelerators	TPU/GPU resources should not sit idle due to poor admission behavior	TPU v6e is expensive; every idle hour is wasted spend
Batch workloads	Training, data pipelines, and CI/CD jobs can wait in queue	Model training queues during peak hours and runs off-peak
Cloud cost control	Hard quotas prevent overprovisioning and unexpected spend	Limit a team to 16 TPU chips max
Multi-slice TPU training	JobSet orchestrates distributed slices; Kueue controls admission timing	3 TPU slices for multislice JAX training

If more than one team submits batch jobs to the same cluster, Kueue is usually worth adopting. If you already use JobSet for multi-node training, adding Kueue completes the resource-governance layer.

How It Works: Admission Flow

Submit a Job or JobSet with the kueue.x-k8s.io/queue-name label pointing at a LocalQueue.
Kueue creates a Workload object and checks it against the available quota of the ClusterQueue behind that LocalQueue.
If quota is sufficient, the Job is admitted and Pods proceed to scheduling.
If quota is insufficient, the Job waits in queue until resources are released.
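
The flow above can be sketched as a small simulation. This is a toy model under stated assumptions: `ClusterQueueModel` is a hypothetical class tracking a single CPU quota, it only ever looks at the head of the queue, and it ignores priorities, flavors, and everything else a real controller handles.

```python
from collections import deque


class ClusterQueueModel:
    """Toy model of one queue guarding a single CPU quota."""

    def __init__(self, nominal_quota):
        self.nominal_quota = nominal_quota
        self.used = 0
        self.pending = deque()  # (name, cpu) pairs waiting for quota
        self.admitted = {}      # name -> cpu currently counted against quota

    def submit(self, name, cpu):
        """Queue a workload, then admit whatever fits."""
        self.pending.append((name, cpu))
        self._admit_pending()

    def finish(self, name):
        """Release the quota held by a finished workload and re-check the queue."""
        self.used -= self.admitted.pop(name)
        self._admit_pending()

    def _admit_pending(self):
        # Admit from the head of the queue for as long as the head fits.
        while self.pending and self.used + self.pending[0][1] <= self.nominal_quota:
            name, cpu = self.pending.popleft()
            self.admitted[name] = cpu
            self.used += cpu


queue = ClusterQueueModel(nominal_quota=8)
queue.submit("job-a", cpu=5)
queue.submit("job-b", cpu=4)  # 5 + 4 > 8, so job-b has to wait
state_before = (sorted(queue.admitted), [name for name, _ in queue.pending])

queue.finish("job-a")         # quota is released, job-b can now be admitted
state_after = (sorted(queue.admitted), [name for name, _ in queue.pending])

print(state_before)
print(state_after)
```

The point of the model is that a waiting workload is not an error state: it is admitted automatically as soon as a finished workload releases enough quota.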

Quick Start: Configure Kueue in 3 Steps

This example uses CPU and memory as quota resources so you can validate Kueue behavior on any standard Kubernetes cluster, without GPUs or TPUs.

[Figure: architecture overview]

Step 1: Install Kueue

# Install Kueue with Helm (pick a release that serves the v1beta2 API used in the manifests below)
helm install kueue oci://registry.k8s.io/kueue/charts/kueue \
    --namespace kueue-system \
    --create-namespace \
    --wait --timeout 300s

Step 2: Configure ResourceFlavor + ClusterQueue + LocalQueue

# ResourceFlavor: use the default flavor (no node restriction)
apiVersion: kueue.x-k8s.io/v1beta2
kind: ResourceFlavor
metadata:
  name: default-flavor
spec: {}  # Empty spec means any node is eligible
---
# ClusterQueue: set total CPU and memory quotas
apiVersion: kueue.x-k8s.io/v1beta2
kind: ClusterQueue
metadata:
  name: cluster-queue
spec:
  namespaceSelector: {}  # Allow all namespaces
  queueingStrategy: BestEffortFIFO
  resourceGroups:
  - coveredResources: ["cpu", "memory"]
    flavors:
    - name: default-flavor
      resources:
      - name: "cpu"
        nominalQuota: 4      # Total 4 CPU cores available
      - name: "memory"
        nominalQuota: 8Gi    # Total 8Gi memory available
---
# LocalQueue: user-facing submission entrypoint
apiVersion: kueue.x-k8s.io/v1beta2
kind: LocalQueue
metadata:
  namespace: default
  name: user-queue
spec:
  clusterQueue: cluster-queue
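
The ClusterQueue above sets `queueingStrategy: BestEffortFIFO`. The sketch below contrasts it with the other strategy, `StrictFIFO`, in a deliberately simplified form: with strict ordering a workload that does not fit blocks everything behind it, while best-effort ordering lets later workloads that do fit go ahead. The `admit` function and the job names are hypothetical, and a single CPU number stands in for the full quota check.

```python
def admit(pending, free_cpu, strategy):
    """Return the names admitted from `pending`, a list of (name, cpu) in queue order."""
    if strategy not in ("StrictFIFO", "BestEffortFIFO"):
        raise ValueError(f"unknown strategy: {strategy}")

    admitted = []
    for name, cpu in pending:
        if cpu <= free_cpu:
            admitted.append(name)
            free_cpu -= cpu
        elif strategy == "StrictFIFO":
            break  # the workload at the head blocks everything queued behind it
        # BestEffortFIFO: skip the workload that does not fit and keep looking
    return admitted


pending = [("job-a", 3), ("job-b", 1)]  # job-a was submitted first
strict_result = admit(pending, free_cpu=2, strategy="StrictFIFO")
best_effort_result = admit(pending, free_cpu=2, strategy="BestEffortFIFO")

print(strict_result)
print(best_effort_result)
```

Best-effort ordering keeps utilization higher on a busy cluster, at the cost that a large workload can wait longer while smaller ones slip past it.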

Step 3: Submit a Job with the Kueue Queue Label

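Submit the Job below twice. Each copy requests 3 CPU, and the ClusterQueue has only 4 CPU in total, so the first Job is admitted immediately and the second waits until the first finishes and releases its quota. Because the manifest uses generateName, create it with kubectl create -f rather than kubectl apply, which rejects generateName; each kubectl create call produces a new, uniquely named Job.

# Sample Job: requests 3 CPU + 4Gi memory (create it twice)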
apiVersion: batch/v1
kind: Job
metadata:
  generateName: sample-job-
  labels:
    kueue.x-k8s.io/queue-name: user-queue  # Points to LocalQueue
spec:
  template:
    spec:
      containers:
      - name: worker
        image: busybox:1.36
        command: ["sh", "-c", "echo 'Hello from Kueue!'; sleep 30"]
        resources:
          requests:
            cpu: "3"
            memory: "4Gi"
      restartPolicy: Never
  backoffLimit: 0
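
The expected outcome follows from simple arithmetic on the quotas and requests used in this quick start. The check below is only a sketch of that arithmetic; the helper names `fits` and `over_quota` are made up for illustration, and memory is expressed in Gi as a plain number.

```python
QUOTA = {"cpu": 4, "memory_gi": 8}        # nominalQuota values from the ClusterQueue
JOB_REQUEST = {"cpu": 3, "memory_gi": 4}  # requests of one sample Job


def fits(used, request, quota):
    """A workload fits only if every covered resource stays within quota."""
    return all(used[r] + request[r] <= quota[r] for r in quota)


def over_quota(used, request, quota):
    """List the resources that would exceed quota if the request were admitted."""
    return [r for r in quota if used[r] + request[r] > quota[r]]


used = {"cpu": 0, "memory_gi": 0}
first_fits = fits(used, JOB_REQUEST, QUOTA)

# Count the first Job against quota, then evaluate the second one.
used = {r: used[r] + JOB_REQUEST[r] for r in used}
second_fits = fits(used, JOB_REQUEST, QUOTA)
blocking = over_quota(used, JOB_REQUEST, QUOTA)

print(first_fits, second_fits, blocking)
```

Memory alone would allow both Jobs (4Gi + 4Gi equals the 8Gi quota), so CPU is the resource that keeps the second Job waiting.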

Use these commands to observe Kueue admission behavior:

# Check ClusterQueue status (quota usage)
kubectl get clusterqueue cluster-queue -o wide

# Check Workload objects (admitted vs waiting jobs)
kubectl get workloads -n default

GKE AI Series

07 Mar 2026

Eason Cao is an engineer working at FAANG and living in Europe. He was accredited as an AWS Professional Solutions Architect, AWS Professional DevOps Engineer, and CNCF Certified Kubernetes Administrator. He started his Kubernetes journey in 2017 and enjoys solving real-world business problems.