Deployment Patterns for Heterogeneous Compute: Scheduling GPUs and RISC‑V Cores with NVLink
Practical orchestration patterns for RISC‑V CPUs and NVLink‑connected GPUs: topology-aware scheduling, device plugins, MIG, and cost optimization (2026).
The orchestration headache of mixed RISC‑V CPUs and NVLink‑connected GPUs
You know the problem: toolchains are fragmented, scheduling is ad hoc, and cloud bills spike when a job misplaces data or waits on slow PCIe transfers. In 2026, with SiFive integrating NVLink Fusion into RISC‑V platforms and cloud vendors offering more sovereign edge options, workloads that span RISC‑V CPU cores and Nvidia GPUs over NVLink are no longer hypothetical — they're production targets. This creates new orchestration, runtime, and scheduler challenges — and big opportunities to reduce cost and improve throughput if you get deployment patterns right.
Executive summary (most important first)
Key takeaways:
- Design clusters around NVLink domains: treat an NVLink‑connected RISC‑V CPU + GPU fabric as a first‑class scheduling unit.
- Use device plugins + Resource Topology APIs to expose NVLink topology and GPU partitions (MIG) to schedulers.
- Prefer co‑scheduling (gang scheduling) for latency‑sensitive, tightly coupled jobs; use bin‑packing and prefetching for throughput jobs.
- Build multi‑arch container images and run them on native RISC‑V nodes; reserve QEMU emulation for testing only.
- Optimize cost with node pools, autoscaling, and GPU sharing (MIG) while respecting NVLink locality to avoid cross‑fabric traffic.
Why NVLink + RISC‑V changes scheduling in 2026
Late 2025 and early 2026 saw a jump in hardware variety: SiFive announced integration of NVLink Fusion with RISC‑V IP, enabling tighter CPU↔GPU coherency and lower latency than traditional PCIe links. At the same time, orchestration projects (Kubernetes, cluster autoscalers and cloud providers) have shipped richer topology APIs and device plugin maturity that make it feasible to schedule on fabric locality rather than just per‑node resources.
That combination means you can exploit shared address spaces and peer‑to‑peer GPU transfers to dramatically cut host CPU overhead and data movement. But it also breaks naive scheduling models that treat CPU, memory and GPU as independent resources. To get predictable performance and lower costs you must schedule with NVLink topology in mind.
Core deployment patterns
1) NVLink domain nodes (preferred)
Group hardware that shares NVLink into a logical scheduling domain. A domain can be a single board (RISC‑V SoC + GPU), a chassis with multiple GPUs connected by NVLink/PCIe switches, or a rack with NVLink Fusion fabrics.
- Expose a single composite resource representing the domain (e.g., nvlink-domain/1), plus fine‑grained GPU and CPU resources — see the illustrative Node object after this list.
- Schedule pods that need tight CPU↔GPU coupling to the domain resource; this prevents cross‑fabric traffic and keeps latency low.
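As a sketch, here is how a node inside such a domain might look once a device plugin has registered both the fine‑grained and composite resources. The nvlink.domain label and the nvlink-domain/domain resource name are illustrative conventions, not a vendor standard, and the status section is populated by the kubelet and plugin rather than applied by hand:

# Illustrative Node after a device plugin registers a composite
# NVLink-domain resource alongside per-GPU resources.
apiVersion: v1
kind: Node
metadata:
  name: riscv-nvlink-node-01
  labels:
    kubernetes.io/arch: riscv64
    nvlink.domain: domain-42      # hypothetical domain label
status:
  capacity:
    cpu: "64"
    nvidia.com/gpu: "4"
    nvlink-domain/domain: "1"     # one composite unit for the whole fabric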
2) Co‑scheduling (gang / affinity scheduling)
For workloads that partition work across RISC‑V threads and GPU kernels (e.g., inference pipelines that do pre/post processing on CPU and ML ops on GPU), use co‑scheduling:
- Use Kubernetes PodAffinity or a scheduler extender to ensure the CPU and GPU containers land on the same NVLink domain (see the affinity sketch after this list).
- For multi‑pod jobs, use a Job controller that submits all pods together and blocks scheduling until a suitable NVLink domain is available.
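A minimal affinity sketch for the first approach, assuming the hypothetical nvlink.domain node label from the previous section is set on every node in a domain:

# Pod-level affinity: co-locate the GPU worker with any pod labeled
# app: preproc in the same NVLink domain. topologyKey may be any node
# label key, so the illustrative nvlink.domain label works here.
affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: preproc
        topologyKey: nvlink.domain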
3) Gang scheduling with queueing tokens
Implement a lightweight admission controller that issues a token for the NVLink domain. Pods in a gang must hold the token to proceed. This reduces partial allocation and wasted GPU cycles.
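If you would rather not build the token service yourself, the coscheduling plugin from kubernetes-sigs/scheduler-plugins provides equivalent semantics through a PodGroup resource. A sketch, assuming that plugin is installed:

# A gang of two pods (CPU preproc + GPU worker): neither is scheduled
# until both can be placed, the same guarantee a domain token gives.
apiVersion: scheduling.x-k8s.io/v1alpha1
kind: PodGroup
metadata:
  name: nvlink-inference-gang
spec:
  minMember: 2
  scheduleTimeoutSeconds: 60

Member pods opt in by carrying the scheduling.x-k8s.io/pod-group label set to the PodGroup name.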
4) Lazy data staging and prefetching
Use a small helper pod (or init container) pinned to the NVLink domain to prefetch model weights into GPU memory using unified memory or NVLink peer‑to‑peer, reducing startup latency. Combine prefetching with eviction policies to keep hot models resident.
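A minimal sketch of such a helper as an init container, assuming a hypothetical myorg/model-prefetch image that stages weights over NVLink before the serving container starts:

# Init container pinned (via the pod's domain affinity) to the NVLink
# domain; it warms GPU memory so the main container starts hot.
initContainers:
  - name: model-prefetch
    image: myorg/model-prefetch:2026      # hypothetical prefetcher image
    args: ["--model=transformer-xl", "--keep-resident"]
    resources:
      limits:
        nvidia.com/gpu: 1                 # needs GPU access to stage weights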
How to expose NVLink topology to Kubernetes
You need the control plane to see not just “two GPUs” but the NVLink connectivity graph and NUMA zones. Combine these components:
- Device Plugin: Build or extend a device plugin to enumerate GPUs, MIG instances, and NVLink links. Publish resources like nvidia.com/gpu, nvidia.com/mig-1g.5gb, and nvlink.zone/0.
- Resource Topology API: Use the Resource Topology API (mature by 2025/26) or the Node Resource Topology CR to publish detailed NUMA and device locality to the scheduler — see work on edge orchestration & topology for related patterns.
- Node Feature Discovery (NFD): Advertise CPU microarchitecture (riscv64 variants) and NVLink Fusion capabilities as labels so you can select compatible nodes.
Device plugin example (high level)
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nvlink-device-plugin
spec:
  selector:
    matchLabels:
      app: nvlink-device-plugin
  template:
    metadata:
      labels:
        app: nvlink-device-plugin
    spec:
      containers:
        - name: nvlink-plugin
          image: myorg/nvlink-device-plugin:2026
          args: ["--publish-topology", "--advertise-nvlink"]
          # (production plugins also mount /var/lib/kubelet/device-plugins
          # to register with the kubelet; omitted here for brevity)
This plugin should register extended resources and optionally expose a gRPC endpoint for scheduler extenders to query topology.
Scheduling strategies and examples
Topology‑aware scheduling
Enable or build a scheduler that uses topology hints. Kubernetes' default scheduler supports affinity, anti‑affinity and resource requests; extend it with a scheduler extender or a custom scheduler that takes Node Resource Topology and device plugin hints to compute placement scores.
Score nodes higher when they satisfy:
- Same NVLink domain
- Enough GPU memory or MIG slices for the job
- Available RISC‑V CPU cores with required ISA subsets (vector units, FPU)
Sample Pod manifest requesting NVLink locality
apiVersion: v1
kind: Pod
metadata:
  name: inference-pod
spec:
  nodeSelector:
    risc-v.arch: "rv64g"
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: nvlink.domain
                operator: In
                values:
                  - domain-42
  tolerations:
    - key: "nvlink.reserve"
      operator: "Exists"
  containers:
    - name: preproc
      image: myorg/preproc:riscv64
      resources:
        requests:
          cpu: "2"
        limits:
          cpu: "4"
    - name: gpu-worker
      image: myorg/gpu-worker:latest
      resources:
        limits:
          nvidia.com/gpu: 1
      env:
        - name: NVLINK_DOMAIN
          valueFrom:
            fieldRef:
              fieldPath: metadata.labels['nvlink.domain']
This manifest pins the whole pod to a RISC‑V node inside NVLink domain-42 via node labels and affinity; because the CPU preprocessor and GPU worker share a pod, they are guaranteed to land on the same NVLink domain.
Scheduler extenders and admission controls
For complex topologies create a scheduler extender that receives candidate nodes and computes a topology score. The extender can also act as an admission controller to implement gang scheduling tokens and NVLink resource booking.
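Wiring an extender into the control plane is mostly configuration. A sketch of a KubeSchedulerConfiguration, assuming your extender service is reachable at the URL shown and implements the standard filter/prioritize verbs:

# The default scheduler calls the extender during filtering and scoring;
# the extender answers with topology-aware node scores.
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
extenders:
  - urlPrefix: "http://nvlink-extender.kube-system.svc:8888/scheduler"
    filterVerb: "filter"
    prioritizeVerb: "prioritize"
    weight: 5
    nodeCacheCapable: true
    managedResources:
      - name: nvidia.com/gpu
        ignoredByScheduler: false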
Runtime considerations
Container runtimes and multi‑arch images
By 2026 RISC‑V native nodes are available from multiple silicon vendors. Use multi‑arch image manifests so the right binaries are pulled for riscv64 vs amd64 nodes:
docker buildx build --platform linux/riscv64,linux/amd64 -t myorg/gpu-worker:2026 --push .
Avoid runtime emulation (QEMU) in production — performance and deterministic behavior suffer. Use native RISC‑V builds or stage work on RISC‑V nodes. For examples and starter manifests see our micro-app templates.
GPU runtime (NVIDIA Container Toolkit)
Use the NVIDIA Container Toolkit (containerd or CRI) with NVLink and MIG support enabled. Confirm the toolkit version supports NVLink Fusion features and RISC‑V host kernels where applicable.
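One concrete piece you will likely need is a RuntimeClass so GPU pods select the NVIDIA runtime handler; the handler name below assumes the common nvidia-container-runtime setup in containerd:

# GPU pods reference this class with runtimeClassName: nvidia.
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: nvidia
handler: nvidia    # must match the runtime name configured in containerd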
NUMA, CPUManager and TopologyManager
Enable Kubernetes CPUManager for static CPU allocation and TopologyManager with policy set to single-numa-node or restricted depending on how tight your NVLink/NUMA coupling is. This prevents CPU threads from being scheduled on a NUMA node that is remote from the GPU's NVLink attachment. Integrate these settings with monitoring and instrumentation to detect drift and tail-latency issues.
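A minimal kubelet configuration fragment for NVLink‑domain nodes; the reserved CPU range is an assumption you should adapt to your SoC layout:

# KubeletConfiguration excerpt: static CPU pinning plus NUMA-aligned
# device allocation, so pinned threads stay local to the GPU's NVLink port.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cpuManagerPolicy: static
topologyManagerPolicy: single-numa-node   # or "restricted" for looser coupling
topologyManagerScope: pod
reservedSystemCPUs: "0-1"                 # example: keep cores 0-1 for the OS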
Performance tuning checklist
- Pin critical threads with cpuset and set CPUManagerPolicy: static.
- Use hugepages for workloads with heavy memory I/O (see the resources snippet after this list).
- Enable PCIe/NVLink peer access and verify with nvidia-smi topo -m.
- Use MIG to partition GPUs for multiple tenants while preserving NVLink locality; ensure partitions map to NVLink endpoints.
- Measure end‑to‑end latency; if CPU↔GPU transfer dominates, prefer unified memory or NVLink direct access patterns.
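For the hugepages item, note that static CPU pinning also requires Guaranteed QoS (integer CPU counts, requests equal to limits). A container resources sketch with illustrative sizes:

# Guaranteed QoS + hugepages: integer CPUs with requests == limits,
# which is what cpuManagerPolicy: static needs in order to pin threads.
resources:
  requests:
    cpu: "4"
    memory: "8Gi"
    hugepages-2Mi: "2Gi"
  limits:
    cpu: "4"
    memory: "8Gi"
    hugepages-2Mi: "2Gi"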
Cost optimization patterns
Specialized hardware costs: RISC‑V+NVLink racks are expensive. Use these tactics:
- Node pools: Keep specialized NVLink nodes in a separate node pool; route only compatible workloads to them.
- MIG sharing: Use MIG to increase utilization with multiple smaller workloads sharing a GPU.
- Autoscaler with topology awareness: Extend the cluster autoscaler to request additional NVLink domains (new nodes) only when a full domain is needed — avoid scaling for partial allocations. Tie autoscaler hooks into your operational runbook (see the Operational Playbook below).
- Spot vs reserved: For batch jobs use cheaper reserved or spot racks where available, but avoid preemptible nodes for tight latency‑sensitive inference unless your framework supports graceful migration and checkpointing.
- Prefetching and caching: Keep popular models resident on NVLink domains to reduce repeated transfer costs — patterns here are similar to edge caching and prefetch techniques discussed in edge testbed work.
Security, compliance and sovereignty
Specialized hardware often lives in sovereign or private clouds due to export controls, data locality and firmware trust needs. AWS and other cloud providers announced sovereign cloud offerings in 2025/26; when deploying RISC‑V+NVLink fabrics consider:
- Supply chain attestations for RISC‑V cores and GPU firmware, and strong device onboarding processes.
- Private clusters or sovereign cloud regions when data residency is required.
- RBAC for NVLink domain reservation and GPU access logs for chargeback.
Operational playbook (step‑by‑step)
- Inventory hardware: map NVLink links, MIG slices, RISC‑V core features, and NUMA nodes.
- Deploy a device plugin that publishes GPUs, MIGs, and NVLink domain labels.
- Install Node Feature Discovery to expose RISC‑V ISA variants and acceleration features.
- Publish Node Resource Topology so the scheduler can reason about locality.
- Create node pools and taint NVLink domains to isolate workloads until you implement scheduling policies (see the taint example after this list).
- Implement scheduler extender or custom scheduler with topology scoring and gang scheduling token logic.
- Build multi‑arch images and validate on testbeds that mirror the NVLink topology.
- Run perf benchmarks (microbenchmarks, end‑to‑end) and iterate on CPU pinning, hugepages and MIG configurations.
- Enable autoscaler hooks that understand NVLink domains to avoid partial scale events.
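For the taint step, a sketch of an NVLink‑domain node carrying the reservation taint that the earlier pod manifest tolerates (key names follow the same illustrative conventions):

# Only pods tolerating nvlink.reserve can land on this node, keeping
# the expensive domain free for topology-aware workloads.
apiVersion: v1
kind: Node
metadata:
  name: riscv-nvlink-node-01
  labels:
    nvlink.domain: domain-42
spec:
  taints:
    - key: nvlink.reserve
      value: "true"
      effect: NoSchedule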
Real‑world example: Inference pipeline
Scenario: a stream processing pipeline performs data normalization on RISC‑V cores, runs a transformer inference on an NVLink‑attached GPU, then aggregates results on CPU.
Pattern applied:
- Use a single pod with two containers: CPU preproc (riscv64 image) and GPU worker. Both pinned to the same NVLink domain via nodeSelector/affinity.
- Device plugin publishes MIG slices; the GPU worker requests a MIG slice to allow multiple inference pods per GPU.
- TopologyManager ensures CPU threads are on the same NUMA node as the NVLink endpoint.
- Autoscaler adds full NVLink domain nodes only when all MIG slices are consumed.
Outcome: 30–60% reduction in end‑to‑end latency due to reduced data movement and improved CPU/GPU locality; 20% cost saving by packing multiple inference streams using MIG.
Troubleshooting common failure modes
- Pods land on nodes with missing NVLink: Check device plugin logs and NodeResourceTopology objects.
- High tail latency: Verify CPU pinning, NUMA placement, and that NVLink peer access is enabled.
- MIG slice contention: Use quota and admission controls to prevent oversubscription (a sample ResourceQuota follows this list).
- Cross‑domain transfers: Add affinity rules and nodeSelector to avoid accidental cross‑NVLink scheduling.
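For MIG contention, a per‑namespace ResourceQuota is a simple guardrail; the MIG resource name follows the NVIDIA device plugin's naming and should be adjusted to the profiles you actually expose:

# Cap how many 1g.5gb MIG slices the inference namespace can request.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: mig-slice-quota
  namespace: inference
spec:
  hard:
    requests.nvidia.com/mig-1g.5gb: "4"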
Future trends and predictions (2026+)
Expectations for the next 12–24 months:
- Standardized NVLink topology APIs: Vendors and CNCF projects will converge on a topology API for NVLink‑style fabrics.
- RISC‑V edge clusters: RISC‑V will be common in edge inference nodes paired with NVLink GPUs for power efficiency.
- Cloud offerings: Sovereign and vertical cloud providers will offer racks with RISC‑V+NVLink configurations for regulated industries.
- Better MIG + NVLink tooling: Tools will map MIG slices to NVLink endpoints automatically, improving packing efficiency.
In 2026, topology matters as much as raw resources. If your scheduler can’t see NVLink, it can’t schedule efficiently.
Actionable checklist
- Map NVLink domains and publish them via Device Plugin + Resource Topology API.
- Build multi‑arch images and deploy on native RISC‑V nodes — avoid emulation in production. See our multi‑arch examples.
- Enable CPUManager and TopologyManager, and pin critical threads.
- Implement gang scheduling or a scheduler extender for NVLink domain reservation.
- Use MIG to improve utilization but preserve NVLink locality.
- Extend autoscaler to scale NVLink domains, not individual GPUs.
Further reading and tools
- SiFive / NVIDIA NVLink Fusion announcements (2025–2026)
- Kubernetes Resource Topology API and Node Feature Discovery projects
- NVIDIA Container Toolkit docs and MIG best practices
- Open source scheduler extenders for topology‑aware scheduling
Conclusion and call to action
Heterogeneous compute with RISC‑V CPU cores and NVLink‑connected Nvidia GPUs is becoming mainstream in 2026. To realize cost and performance gains you must design schedulers and runtimes that understand fabric topology, co‑schedule related CPU and GPU work, and adopt runtime features like MIG and CPU pinning. Without topology‑aware placement you will pay for wasted transfers and underutilized GPUs.
Try this now: Start by deploying a device plugin that publishes NVLink domains and run a small co‑scheduled pod (CPU preproc + GPU worker) pinned to the same domain. Measure latency, iterate on NUMA pinning and MIG configuration, then extend your autoscaler to manage NVLink domains.
Need a working reference? Visit our sample repo implementing a Kubernetes device plugin, scheduler extender, and example manifests for RISC‑V + NVLink deployments — or contact our team for an architecture review tailored to your hardware topology. If you need guidance on sovereign hosting and isolation patterns, review the AWS European Sovereign Cloud material and tie those controls into your deployment plan.
Related Reading
- AWS European Sovereign Cloud: Technical Controls & Isolation
- The Evolution of Quantum Testbeds (Edge Orchestration & Observability)
- Micro‑App Template Pack: Multi‑Arch Examples & Starter Manifests
- Secure Remote Onboarding for Field Devices (Edge‑Aware Playbook)