Edge AI Prototyping Kit: Repo and Templates for Pi 5 + AI HAT+2 (Model Serving, Push Updates, Telemetry)

2026-02-28

Starter repo and templates to run edge AI on Raspberry Pi 5 + AI HAT+2—model serving, signed OTA, and telemetry for fast, safe rollouts.

Launch Edge AI Prototypes Faster: Pi 5 + AI HAT+2 Starter Kit (OTA, Model Serving, Telemetry)

If your team is juggling fragile toolchains, slow device onboarding, and no repeatable way to push model updates to field hardware, this kit is for you. In 2026 the expectation is simple: prototypes should move to production-grade demos in days, not months. This article gives you a vetted starter repo and deployment templates to prototype on Raspberry Pi 5 + AI HAT+2 with atomic OTA updates, production-like model serving, and telemetry you can act on.

The problem space (quick)

Dev and Ops teams building edge AI face common constraints:

  • Fragmented stacks: different model runtimes (TFLite, ONNX, PyTorch), OS versions, and drivers.
  • Unreliable OTA: ad-hoc scripts that corrupt devices or require manual intervention.
  • Limited observability: no consistent telemetry for inference latency, model drift, or hardware health.
  • Slow CI/CD: container builds and device deployment are not automated for edge fleets.

Why Raspberry Pi 5 + AI HAT+2 matters in 2026

By late 2025 and into 2026, the Raspberry Pi 5 combined with vendor NPUs like the AI HAT+2 has moved the needle: on-device generative and vision models are now practical at reasonable cost for many prototypes. This makes the Pi 5 a de facto platform for proof-of-concept edge AI—if you have reliable tooling to manage model deployment, updates, and telemetry.

What this starter kit gives your team (high level)

  • Repo skeleton: Dockerized model server (FastAPI + ONNX Runtime), systemd bootstrap, and model packaging conventions.
  • OTA templates: S3 + signed artifact approach for atomic updates, plus a Mender-compatible path for teams using managed device management.
  • Telemetry: Prometheus-compatible /metrics, logs shipped to Loki, and a lightweight MQTT pipeline for cloud events.
  • CI/CD: GitHub Actions workflows to build images, push to GHCR, publish artifacts, and trigger OTA rollouts to device groups.
  • IaC: Terraform snippets to create S3 buckets, minimal IAM policy, and a webhook Lambda to signal devices.

Use this file-tree as the canonical starter. Keep code and infra separated for clarity.

pi5-edge-ai-kit/
  ├─ device/                   # Pi device bootstrap & runtime
  │   ├─ bootstrap.sh
  │   ├─ systemd/edge-ai.service
  │   └─ model-server/         # containerized server used on-device
  │       ├─ Dockerfile
  │       ├─ app.py
  │       └─ requirements.txt
  ├─ ota/                      # OTA artifact packaging & helper scripts
  │   ├─ package-artifact.sh
  │   └─ sign-artifact.sh
  ├─ ci/                       # GitHub Actions workflows
  │   ├─ build-and-publish.yml
  │   └─ ota-rollout.yml
  ├─ infra/                    # Terraform or CloudFormation examples
  │   ├─ main.tf
  │   └─ variables.tf
  └─ docs/
      └─ getting-started.md
  

Model serving: lightweight, fast, restartless

Keep the on-device server minimal so updates are atomic and rollback-friendly. The pattern below uses a container running a simple FastAPI endpoint that loads an ONNX file from /opt/models/current/ and serves inference. The key is to support model hot-swap without container rebuilds—swap the symlink and send SIGHUP to reload.

app.py (simplified)

from fastapi import FastAPI
import onnxruntime as ort
import signal

app = FastAPI()
model_path = "/opt/models/current/model.onnx"
session = None

def load_model():
    global session
    session = ort.InferenceSession(model_path)

@app.on_event("startup")
def startup():
    load_model()

@app.post("/infer")
def infer(payload: dict):
    # transform payload to numpy input, omitted for brevity
    inputs = ...
    out = session.run(None, inputs)
    return {"result": out}

def handle_reload(signum, frame):
    load_model()

signal.signal(signal.SIGHUP, handle_reload)

When an OTA swaps /opt/models/current to a new model, the OTA agent sends a SIGHUP to the container PID. The process reloads the ONNX session without rebuilding the container image.
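The swap itself reduces to a couple of standard-library calls: create the new symlink under a temporary name, rename it over `current` (rename is atomic on POSIX filesystems), then signal the server. The function name, paths, and `server_pid` handling below are illustrative assumptions of this sketch, not a fixed kit API:

```python
import os
import signal

def atomic_model_swap(models_root, new_version, server_pid=None):
    """Atomically repoint `current` at a newly unpacked model directory.

    Assumes <models_root>/<new_version>/ already contains the verified
    model. A symlink is created under a temporary name and then renamed
    over `current`, so readers always see either the old or the new
    target, never a half-written link.
    """
    current = os.path.join(models_root, "current")
    tmp_link = os.path.join(models_root, "current.tmp")
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(new_version, tmp_link)  # relative target keeps the tree relocatable
    os.replace(tmp_link, current)      # atomic rename over the old symlink
    if server_pid is not None:
        os.kill(server_pid, signal.SIGHUP)  # ask the server to reload the model
```

Because the target is a relative symlink, `ort.InferenceSession("/opt/models/current/model.onnx")` resolves to the new version on the next reload without any path changes in the server.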

OTA strategy — safe, atomic, auditable

Two practical approaches are included in the templates. Pick the one that fits your team:

Option A — Managed device updates (Mender / balena)

  • Best when you need rollout control, delta updates, and device authentication out-of-the-box.
  • Use Mender or balena templates in the repo (replace placeholders with your account).

Option B — S3 + signed artifact + systemd updater (DIY)

Lightweight and transparent. Workflow:

  1. CI builds model artifact and packages it: tar.gz of /opt/models/ plus a manifest.json.
  2. Sign artifact with a private key; upload artifact and .sig to S3 (or your blob store).
  3. Device cron or systemd-timer checks S3 for new manifests, downloads the artifact, verifies signature, unpacks to /opt/models/, then atomically updates /opt/models/current (symlink swap) and sends SIGHUP to apps.
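The verification in step 3 can be sketched as a checksum comparison against the manifest; a real deployment would also verify the detached `.sig` (e.g. with `openssl dgst` or minisign) before trusting the manifest at all. The manifest field names (`version`, `sha256`) are conventions of this sketch, not a fixed format:

```python
import hashlib
import json

def verify_artifact(artifact_path, manifest_path):
    """Check a downloaded artifact against the sha256 recorded in manifest.json.

    Assumes a manifest shaped like {"version": "...", "sha256": "..."}.
    Returns the version string on success; raises ValueError on mismatch
    so the updater aborts before touching /opt/models/.
    """
    with open(manifest_path) as f:
        manifest = json.load(f)
    h = hashlib.sha256()
    with open(artifact_path, "rb") as f:
        # hash in chunks so large model tarballs don't blow device memory
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    if h.hexdigest() != manifest["sha256"]:
        raise ValueError(f"checksum mismatch for {artifact_path}")
    return manifest["version"]
```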

Example package-artifact.sh

#!/bin/bash
set -euo pipefail
VERSION="${1:?usage: package-artifact.sh <version>}"
TAR="model-${VERSION}.tar.gz"
tar -czf "${TAR}" model.onnx manifest.json
./sign-artifact.sh "${TAR}" > "${TAR}.sig"
aws s3 cp "${TAR}" "s3://my-edge-artifacts/${TAR}"
aws s3 cp "${TAR}.sig" "s3://my-edge-artifacts/${TAR}.sig"
echo "Uploaded ${TAR}"

This approach supports easy rollbacks: keep previous versions on disk and switch the symlink back if health checks fail.
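That rollback rule is the symlink swap run in reverse, gated by a health probe. The probe is injected as a callable (an HTTP check against /health, a local test inference, ...) so the mechanism stays transport-agnostic; names and paths here are illustrative:

```python
import os

def rollback_if_unhealthy(models_root, previous_version, healthy):
    """Switch `current` back to the previous model if the health probe fails.

    `healthy` is any zero-argument callable returning bool;
    `previous_version` is a directory kept on disk from the last
    known-good release. Returns True if a rollback happened.
    """
    if healthy():
        return False
    current = os.path.join(models_root, "current")
    tmp_link = os.path.join(models_root, "current.tmp")
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(previous_version, tmp_link)
    os.replace(tmp_link, current)  # atomic switch back to the known-good version
    return True
```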

Telemetry: metrics, logs, and usage events

Edge telemetry should be actionable and low-bandwidth. 2026 trends favor a hybrid approach: push high-frequency, small metrics (latency, CPU temperature, NPU utilization) to a local Prometheus Pushgateway or a small local TSDB, and periodically batch lower-priority events (model usage, anomalies) to the cloud via MQTT or HTTP.
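The batching half of that pattern can be sketched as a small buffer that flushes a gzip-compressed JSON blob through any transport callable (HTTP POST, MQTT publish, ...); the class and parameter names are this sketch's own:

```python
import gzip
import json
import time

class EventBatcher:
    """Buffer low-priority events and flush them as one compressed blob.

    `send` is any callable taking bytes; injecting it keeps the batcher
    transport-agnostic. Flushing on count or age bounds both memory use
    on the device and egress on metered edge links.
    """
    def __init__(self, send, max_events=100, max_age_s=300):
        self.send = send
        self.max_events = max_events
        self.max_age_s = max_age_s
        self.buffer = []
        self.started = time.monotonic()

    def add(self, event):
        self.buffer.append(event)
        age = time.monotonic() - self.started
        if len(self.buffer) >= self.max_events or age >= self.max_age_s:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        payload = gzip.compress(json.dumps(self.buffer).encode())
        self.send(payload)
        self.buffer = []
        self.started = time.monotonic()
```

On shutdown, call `flush()` once more so a partially filled buffer is not lost.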

What to collect

  • Inference latency (p50/p95), requests/sec
  • Model version & checksum
  • Hardware metrics: CPU temp, memory, NPU utilization
  • Health events: model load failure, signature mismatch, OTA errors

/metrics endpoint (Prometheus)

# expose Prometheus metrics alongside the API
from prometheus_client import start_http_server, Summary, Gauge

inference_latency = Summary('inference_latency_seconds', 'Inference latency in seconds')
model_version_g = Gauge('model_version', 'Model version as int')

start_http_server(9100)  # serves /metrics on :9100 next to the FastAPI port

@app.post('/infer')
@inference_latency.time()
def infer(payload: dict):
    # ... run inference as in app.py ...
    model_version_g.set(current_model_version)
    return {"result": out}

Push metrics from devices to a local Prometheus instance at the edge (gateway) or use a pushgateway for intermittent connectivity. To reduce egress costs, batch events and compress payloads before sending to your cloud ingestion endpoint.

CI/CD: GitHub Actions workflows (practical)

Automate the full flow: build container, run unit tests, publish model artifacts, and trigger OTA. Below are the core steps we include in ci/build-and-publish.yml:

name: Build and Publish
on:
  push:
    branches: [ main ]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up QEMU
        uses: docker/setup-qemu-action@v2
      - name: Set up Buildx
        uses: docker/setup-buildx-action@v2
      - name: Log in to GHCR
        uses: docker/login-action@v2
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Build image
        uses: docker/build-push-action@v4
        with:
          context: ./device/model-server
          platforms: linux/arm64
          push: true
          tags: ghcr.io/${{ github.repository_owner }}/pi5-model-server:${{ github.sha }}
      - name: Package model artifact
        run: |
          cd model-artifacts && ./package-artifact.sh "${GITHUB_SHA}"
      - name: Upload artifacts to S3
        uses: jakejarvis/s3-sync-action@v0.5.1
        with:
          args: --acl private
        env:
          AWS_S3_BUCKET: ${{ secrets.OTA_BUCKET }}
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_KEY }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET }}

Then a separate workflow (ota-rollout.yml) calls your device-management webhook or updates a DynamoDB manifest with the new version, triggering devices to check for updates.

Minimal Terraform for OTA artifacts (example)

Use Terraform to provision a secure artifact bucket and a minimal Lambda webhook to notify devices. Keep policies tight: only CI/publisher can write, devices can read specific paths.

resource "aws_s3_bucket" "artifacts" {
  bucket = var.ota_bucket_name
  acl    = "private"
}

resource "aws_iam_policy" "ci_publish" {
  name = "ci-publish-policy"
  policy = jsonencode({
    Version = "2012-10-17",
    Statement = [
      {
        Action   = ["s3:PutObject"],
        Effect   = "Allow",
        Resource = "${aws_s3_bucket.artifacts.arn}/*"
      }
    ]
  })
}

Bootstrapping a Pi 5 device (practical checklist)

  1. Flash a 64-bit OS (Raspberry Pi OS 64-bit or Ubuntu 24.04 LTS for Pi) and enable SSH.
  2. Install vendor drivers for AI HAT+2 and the vendor SDK (follow vendor docs; our repo points to current SDK links updated in 2026).
  3. Clone the starter repo and run bootstrap.sh which registers the device with your fleet and installs the edge runtime.
  4. Run smoke tests: curl /health and /metrics, run a local inference, and verify NPU usage with vendor tooling.
  5. Enroll device in OTA group; do a staged rollout to a single device, then a canary group, then global.

Advanced strategies and 2026 predictions

Teams that move faster in 2026 adopt these advanced patterns:

  • Model pruning & quantization pipelines instrumented in CI so artifacts are reproducible and small for OTA.
  • Hybrid inference: run small models on-device and fallback to cloud for heavy generative tasks when connectivity is good.
  • Adaptive rollout driven by telemetry—if a canary shows a latency spike, pause and rollback automatically.
  • Policy-driven updates: require signatures and enforce hardware attestation before applying an update for supply-chain security.

Late-2025/early-2026 trends show more vendor support for standardized NPU runtimes and improved driver maturity. Expect more of the ecosystem—tool vendors, cloud providers, and device-management platforms—to provide Pi 5-specific templates through 2026, lowering the operational burden for teams that adopt best practices now.

Real-world example: From commit to field in under 30 minutes (walkthrough)

  1. Developer commits a quantized ONNX model to /model-artifacts and opens PR.
  2. CI runs unit tests, builds arm64 container, packages the artifact, and uploads to S3.
  3. CI triggers the OTA webhook, which writes the new manifest and notifies the device-group.
  4. Canary device polls S3, downloads, verifies signature, swaps /opt/models/current, sends SIGHUP to reload server, and reports success via telemetry.
  5. If telemetry shows anomalies, CI triggers an automated rollback to the previous artifact and notifies the team.

Actionable takeaways (start here)

  • Clone the starter repo skeleton and adapt the model-server to your runtime (ONNX recommended for portability).
  • Implement signed artifacts and atomic symlink swaps for safe OTA—avoid copying over live files.
  • Expose /metrics and batch higher-cost telemetry—measure p95 latency before rollout.
  • Automate CI to publish both container images and signed artifacts; make OTA a separate controlled workflow.
  • Start with a small canary pool of devices before broad rollouts; use telemetry-driven rollback rules.

Security & compliance notes

Edge devices are part of your supply chain. Key controls:

  • Sign all artifacts and validate signatures on-device.
  • Use least-privilege IAM for artifact publishing and device access.
  • Encrypt telemetry in transit (TLS + MQTT over TLS) and at rest if stored in cloud.
  • Log OTA and model-change events for audit trail.

Next steps & resources

We maintain a starter repo and a set of templates (CI workflows, Terraform snippets, and sample systemd updaters) designed for Pi 5 + AI HAT+2 prototypes. Use them to validate concepts before investing in a commercial device-management solution.

Pro tip: prioritize model reproducibility—version your model artifacts with checksum, quantization settings, and Docker image hash together in one manifest.
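That manifest can be as simple as one JSON object built at packaging time; the field names here are illustrative, not a fixed schema:

```python
import hashlib
import json

def build_manifest(version, model_path, image_digest, quantization):
    """Bundle everything needed to reproduce a release into one manifest.

    Records the release version, the sha256 of the model file, the exact
    container image digest it was validated against, and the quantization
    settings used to produce the artifact.
    """
    with open(model_path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    return {
        "version": version,
        "model_sha256": digest,
        "image_digest": image_digest,
        "quantization": quantization,
    }

# Written next to the model before packaging, e.g.:
# json.dump(build_manifest("1.2.0", "model.onnx",
#                          "sha256:...", {"dtype": "int8"}),
#           open("manifest.json", "w"), indent=2)
```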

Call to action

Ready to move from PoC to production-ready prototypes? Clone the starter kit repo (example: github.com/dev-tools-cloud/pi5-edge-ai-kit), run the bootstrap on a Pi 5 + AI HAT+2, and follow the CI/CD templates to perform your first signed OTA rollout. If your team needs a hands-on workshop, we offer tailored onboarding to help lock down signing keys, create rollout policies, and integrate telemetry with your observability stack.

Get started: clone the repo, run ./device/bootstrap.sh on a test Pi, and trigger the CI pipeline to see a model go from commit to canary device in under 30 minutes.
