Build a Smart Routing Micro App: Aggregate Google Maps, Waze, and Local Data with an LLM

2026-02-08

Tutorial: combine Google Maps, Waze, and local data with an LLM to generate personalized, observable, cost-controlled routing recommendations.

Stop trusting a single map—build a smarter, personalized routing micro-app

Routing decisions in production environments are fragmented: different navigation APIs give different routes, local conditions and user preferences aren't captured, and LLMs can help—but only if you integrate them correctly. This guide shows you how to build a micro-app that aggregates Google Maps, Waze, and local telemetry, then uses an LLM to produce personalized routing recommendations with observability and cost controls suitable for production in 2026.

Why this matters in 2026

Micro-apps and AI-first tooling have matured since the vibe-coding surge of 2023–2025. Organizations now expect fast, auditable decisions from routing systems that reduce time-to-resolution and cloud spend. At the same time, big-platform moves (for example, the high-profile Gemini partnerships in 2025–early 2026) make LLMs a practical option for personalization and explanation generation. The result: you can now combine multiple navigation sources plus local context to generate routes that are not just optimal on distance or ETA, but are tuned to a driver's habits, vehicle, and local policies.

High-level architecture

Here's the simple, production-ready architecture we'll implement:

  • Edge client: Web/mobile micro-app that requests routes and shows explanations
  • Routing aggregator API (serverless or container): queries Google Maps, Waze or partner feed, and local data store
  • Normalization & scoring: converts heterogeneous route responses to a canonical schema
  • Personalization engine (LLM + embeddings): ranks routes and generates human-readable rationales
  • Cache & vector DB: Redis for route caching, Pinecone/Weaviate for user embeddings — consider caching tooling notes like CacheOps Pro.
  • Observability + cost control: Prometheus/Grafana metrics, budget rules, model selection

Design principles (quick wins)

  • Fail fast, fail safely: always provide a fallback route (e.g., last known-good cached route). These are core principles in resilient architecture.
  • Minimize API calls: candidate pruning + caching reduces third-party billable requests
  • Model-fit: use small LLMs for scoring and larger models for explanations only when needed
  • Auditability: timestamps, inputs, and decisions persisted for postmortem and compliance

Step 1 — Inventory APIs and confirm terms

Before you write a line of code, list the endpoints and pricing. For 2026, typical sources include:

  • Google Maps Routes API (Directions/Routes Matrix, advanced route attributes)
  • Waze: Waze Transport SDK or partner feeds (Waze for Cities/Connected Citizens and some enterprise routing endpoints). Waze's public routing surface is more limited than Google's; plan for partnerships or SDKs for production use.
  • Local feeds: city open-data feeds, private telemetry (fleet GPS), event APIs, and sensor streams

Important: check each provider’s Terms of Service (rate limits, display rules, and caching restrictions). Implement consent and privacy controls for driver data to match evolving 2025–2026 regulations.

Step 2 — Normalize route responses

Different APIs return different schemas. Normalize them to a canonical model so downstream logic is simple.

Canonical route schema

interface RouteCandidate {
  id: string;
  provider: 'google' | 'waze' | 'local';
  polyline: string;
  eta_seconds: number;
  distance_meters: number;
  congestion_score?: number; // 0-1, derived
  tolls?: number;
  road_classification?: string[];
  attributes: Record<string, unknown>;
}

Normalization example (Node.js/TypeScript):

// Map a Google Directions response onto the canonical RouteCandidate shape
async function normalizeGoogleResponse(googleResp: any): Promise<RouteCandidate[]> {
  return googleResp.routes.map((r: any, idx: number) => ({
    id: `google-${r.summary || idx}`,
    provider: 'google' as const,
    // Directions responses carry the encoded route at overview_polyline.points;
    // decode/re-encode to your canonical polyline format downstream
    polyline: r.overview_polyline?.points ?? r.polyline,
    eta_seconds: r.legs.reduce((s: number, l: any) => s + l.duration.value, 0),
    distance_meters: r.legs.reduce((s: number, l: any) => s + l.distance.value, 0),
    attributes: r
  }));
}
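The schema marks congestion_score as derived. One way to derive it (an assumption, not a provider field) is to compare the live ETA against a free-flow ETA when the provider exposes one, such as a duration-without-traffic value:

```typescript
// Derive a 0-1 congestion score from live vs. free-flow ETA.
// 0 means no delay; the score approaches 1 as delay dominates the trip.
// freeFlowSeconds is assumed to come from the provider's no-traffic duration.
function deriveCongestionScore(etaSeconds: number, freeFlowSeconds: number): number {
  if (etaSeconds <= 0 || freeFlowSeconds <= 0) return 0; // guard invalid inputs
  const delayRatio = (etaSeconds - freeFlowSeconds) / etaSeconds;
  // Clamp to the canonical 0-1 range; an ETA faster than free flow counts as zero congestion
  return Math.min(1, Math.max(0, delayRatio));
}
```

Whatever formula you choose, keep it consistent across providers so congestion scores remain comparable in the scoring stage.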

Step 3 — Candidate generation and pruning

Don't query every API for every request. Use a candidate-generation strategy:

  1. Check Redis cache for origin/destination pairs with user profile: if present and fresh, return cached
  2. Fetch from the cheapest API first (e.g., internal local feed) to get baseline candidates
  3. If baseline doesn't meet constraints (time, distance, or user preference), query Google and Waze in parallel
  4. Stop when you have N candidates (configurable default N=5)
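Steps 2-3 of the strategy above reduce to a gating decision: fan out to the paid APIs only when the cheap baseline is insufficient. A minimal sketch, with illustrative field names:

```typescript
interface Candidate { id: string; eta_seconds: number; distance_meters: number; }
interface Constraints { max_eta_seconds?: number; max_candidates: number; }

// Decide whether the cheap baseline candidates are sufficient, or whether
// the aggregator should fan out to Google/Waze for more options.
function needsFanOut(baseline: Candidate[], constraints: Constraints): boolean {
  // Already at the candidate cap — no reason to spend more API calls
  if (baseline.length >= constraints.max_candidates) return false;
  // Fan out if no baseline route satisfies the ETA constraint (or there is no baseline at all)
  const satisfied = baseline.some(c =>
    constraints.max_eta_seconds === undefined ||
    c.eta_seconds <= constraints.max_eta_seconds);
  return !satisfied || baseline.length === 0;
}
```

The same pattern extends to other constraints (tolls, road classes); the point is that every `true` result here is a billable fan-out, so log it as a metric.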

Parallel call example (Node.js)

// Kick off both provider calls concurrently, then keep whichever settle successfully
const results = await Promise.allSettled([callGoogleRoutes(req), callWazeRoutes(req)]);
const [googleRes, wazeRes] = results; // each result is { status, value } or { status, reason }
// Accept fulfilled responses and normalize; rejected providers are simply skipped

Step 4 — Personalization: embeddings + LLM

Use embeddings to represent driver preferences and local context, then let the LLM score and explain. Operationalizing embeddings and LLM calls safely and cost-effectively benefits from governance and CI patterns covered in From Micro-App to Production.

Data model for personalization

  • User profile embedding: vehicle_type, preferred roads, avoid_tolls, comfort_vs_speed
  • Route embedding: features from canonical route — congestion, distance, road types
  • Local context embedding: city events, construction indicators, weather

Workflow

  1. Create embeddings for route candidates and user profile using a small embedding model (lower cost)
  2. Retrieve top-k route candidates by cosine similarity between user profile and route embeddings
  3. Pass top candidates to an LLM for final ranking and natural-language rationale
// pseudo-code: build prompt for LLM
const prompt = `User profile: ${JSON.stringify(userProfile)}

Routes:
${candidateRoutes.map((r, i) => `${i + 1}. ETA ${Math.round(r.eta_seconds / 60)}min, ${(r.distance_meters / 1000).toFixed(1)}km, provider=${r.provider}`).join('\n')}

Task: Rank the routes for this user and explain tradeoffs in one short paragraph per route.`
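Step 2 of the workflow (top-k retrieval by cosine similarity) can be sketched with a plain dot-product implementation; in production your vector DB performs this server-side:

```typescript
// Cosine similarity between two equal-length embedding vectors
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  const denom = Math.sqrt(na) * Math.sqrt(nb);
  return denom === 0 ? 0 : dot / denom; // guard zero vectors
}

// Return the indices of the top-k route embeddings most similar to the user profile
function topK(userEmb: number[], routeEmbs: number[][], k: number): number[] {
  return routeEmbs
    .map((emb, i) => ({ i, sim: cosine(userEmb, emb) }))
    .sort((a, b) => b.sim - a.sim)
    .slice(0, k)
    .map(x => x.i);
}
```

Only the routes surviving this cut are serialized into the LLM prompt, which keeps token counts bounded regardless of how many candidates the aggregator produced.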

Model-selection tip (cost control): use a small LLM (or a cheaper embedding-only model with a scoring head) to produce numeric scores, and only call a larger, expensive model for the top-1 explanation if requested by the user. This reduces token usage and cost. For organizational cost and productivity tradeoffs see Developer Productivity and Cost Signals in 2026.

Step 5 — Prompt engineering & safety

Write prompts that are explicit about constraints and the required output structure so the LLM returns structured JSON you can trust. Example:

{
  "instruction": "Rank and score the candidate routes from 0-100 for this user. Return JSON array: [{route_id, score, explanation}].",
  "data": { userProfile, routes }
}
Always include an instruction to output machine-parseable JSON. That reduces downstream parsing errors and keeps audit logs clean.
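Even with an explicit JSON instruction, validate the model's output server-side before acting on it. A minimal sketch, assuming the field names from the prompt above:

```typescript
interface RankedRoute { route_id: string; score: number; explanation: string; }

// Parse and validate the LLM's response. Returns null on any violation so the
// caller can fall back to heuristic ranking instead of crashing the request.
function parseRanking(raw: string): RankedRoute[] | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null; // not JSON at all — likely a prose or hallucinated answer
  }
  if (!Array.isArray(parsed)) return null;
  const valid = parsed.every(item =>
    typeof item === 'object' && item !== null &&
    typeof (item as any).route_id === 'string' &&
    typeof (item as any).score === 'number' &&
    (item as any).score >= 0 && (item as any).score <= 100 &&
    typeof (item as any).explanation === 'string');
  return valid ? (parsed as RankedRoute[]) : null;
}
```

Rejecting out-of-range scores here (rather than clamping them) also gives you a clean signal to log for the model-drift dashboards discussed later.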

Step 6 — Observability and SLOs

Track the right metrics so you can manage cost and reliability:

  • API metrics: api_calls_google, api_calls_waze, api_calls_local
  • LLM metrics: llm_calls_total, llm_tokens_in, llm_tokens_out, llm_cost_usd
  • Performance: request_latency_ms, backend_errors
  • Business: avg_route_selection_time, user_accept_rate

Prometheus metric examples (exposition):

# HELP routing_api_calls_total Total third-party routing API calls
# TYPE routing_api_calls_total counter
routing_api_calls_total{provider="google"} 1234
routing_api_calls_total{provider="waze"} 432

Alerts and dashboards

  • Alert if llm_cost_usd/day > budget_threshold
  • Alert if api error rate > 2% over 5m
  • Dashboard panels: per-provider latency and per-route selection breakdown
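The budget alert above can also be enforced in-process, so a runaway day degrades to heuristic-only ranking instead of overspending. A sketch with illustrative thresholds:

```typescript
interface BudgetState { spent_usd: number; daily_budget_usd: number; }

type BudgetDecision = 'ok' | 'warn' | 'block';

// Gate an LLM call against the daily budget. 'warn' at 80% lets the alert fire
// before the hard cap; 'block' should route the request to the heuristic fallback.
function checkBudget(state: BudgetState, estimatedCallCostUsd: number): BudgetDecision {
  const projected = state.spent_usd + estimatedCallCostUsd;
  if (projected > state.daily_budget_usd) return 'block';
  if (projected > 0.8 * state.daily_budget_usd) return 'warn';
  return 'ok';
}
```

Checking the projected cost (current spend plus the estimated call) rather than only past spend prevents the final call of the day from blowing through the cap.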

Step 7 — Cost controls (practical tactics)

Routing micro-apps can get expensive because of mapping API fees and LLM token costs. Use these tactics:

  1. Cache results for common origin/destination pairs with TTLs based on freshness needs — leveraging cache tooling and reviews like CacheOps Pro helps you pick eviction and coalescing strategies.
  2. Model tiering: small model for numeric scoring, large model only for textual explanation on demand
  3. Call coalescing: coalesce near-simultaneous identical requests to one back-end call
  4. Selective refresh: only call live APIs if cached route is stale or an event indicates change (traffic incident webhook)
  5. Per-user quotas: apply rate-limiting and credits to prevent abuse

Example: when a user requests a route, first compute a cheap estimated score using heuristics (distance, speed limit) and embeddings. If the estimated score difference between top-2 candidates > threshold, skip expensive LLM calls and return the heuristics-based top choice.
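The gating rule from that example can be sketched as follows; the threshold value and score field are assumptions to tune against your own acceptance data:

```typescript
interface ScoredCandidate { id: string; heuristicScore: number; }

// Return true when the heuristic winner is clear enough that the LLM call can
// be skipped entirely, saving tokens on unambiguous requests.
function canSkipLLM(candidates: ScoredCandidate[], threshold = 0.15): boolean {
  if (candidates.length < 2) return true; // nothing to disambiguate
  const sorted = [...candidates].sort((a, b) => b.heuristicScore - a.heuristicScore);
  // Only pay for the LLM when the top two candidates are genuinely close
  return sorted[0].heuristicScore - sorted[1].heuristicScore > threshold;
}
```

Instrument how often this returns true: that ratio is effectively your LLM cost-savings rate, and a sudden shift in it is an early drift signal.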

Step 8 — Observability for bias, safety, and audit

Because the LLM influences routing, log the LLM inputs and outputs (hashed/anonymized where needed). Keep an immutable audit trail:

  • request_id, user_profile_id (hashed), candidate_ids, model_id, model_version, tokens_used
  • decision rationale (short text), decision_score

Store these logs in an append-only store (e.g., AWS S3 or a secure log DB) for compliance and postmortem analysis. Build dashboards that show model drift in scoring over time. For observability patterns and SLO design see Observability in 2026.
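An audit record carrying the fields listed above might be assembled like this; SHA-256 hashing of the profile id is an assumption — use whatever anonymization scheme your compliance team mandates:

```typescript
import { createHash } from 'crypto';

interface AuditRecord {
  request_id: string;
  user_profile_hash: string;
  candidate_ids: string[];
  model_id: string;
  model_version: string;
  tokens_used: number;
  decision_score: number;
  rationale: string;
  timestamp: string;
}

function buildAuditRecord(
  requestId: string, userProfileId: string, candidateIds: string[],
  modelId: string, modelVersion: string, tokensUsed: number,
  decisionScore: number, rationale: string
): AuditRecord {
  return {
    request_id: requestId,
    // Hash the raw profile id so the append-only log never stores PII directly
    user_profile_hash: createHash('sha256').update(userProfileId).digest('hex'),
    candidate_ids: candidateIds,
    model_id: modelId,
    model_version: modelVersion,
    tokens_used: tokensUsed,
    decision_score: decisionScore,
    rationale: rationale,
    timestamp: new Date().toISOString(),
  };
}
```

Pinning model_id and model_version per record is what makes drift analysis possible later: you can group scores by model version and spot distribution shifts after an upgrade.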

Step 9 — Implementation: minimal end-to-end example

The following is a condensed implementation sketch you can iterate from. It's intentionally compact; adapt to your infra.

1) Aggregator endpoint (Express/Node)

import express from 'express';
import { getCachedRoutes, cacheRoutes } from './cache';
import { fetchGoogleRoutes, fetchWazeRoutes, fetchLocalRoutes } from './providers';
import { normalize } from './normalize';
import { scoreWithEmbeddingsAndLLM } from './personalize';

const app = express();

app.post('/route', async (req, res) => {
  const { origin, destination, userProfile } = req.body;
  const cacheKey = `route:${origin}:${destination}:${userProfile.id}`;

  const cached = await getCachedRoutes(cacheKey);
  if (cached) return res.json(cached);

  // Candidate generation (prune cheap first)
  const local = await fetchLocalRoutes(origin, destination).catch(() => []);
  const [google, waze] = await Promise.allSettled([
    fetchGoogleRoutes(origin, destination),
    fetchWazeRoutes(origin, destination)
  ]);

  const candidates = normalize([
    ...local,
    ...(google.status === 'fulfilled' ? google.value : []),
    ...(waze.status === 'fulfilled' ? waze.value : [])
  ]);

  // Personalize and score
  const ranked = await scoreWithEmbeddingsAndLLM(candidates, userProfile);

  await cacheRoutes(cacheKey, ranked, { ttl: 30 }); // TTL in seconds; tune to your freshness needs
  res.json(ranked);
});

2) Scoring function (pseudo)

async function scoreWithEmbeddingsAndLLM(routes, userProfile) {
  // 1) create embeddings for user and routes (cheap model)
  const userEmb = await embeddingClient.embed(userProfile);
  const routeEmbeds = await Promise.all(routes.map(r => embeddingClient.embed(r.attributes)));

  // 2) similarity scoring
  const scores = routes.map((r, i) => ({
    route: r,
    sim: cosine(userEmb, routeEmbeds[i])
  })).sort((a,b) => b.sim - a.sim).slice(0, 4);

  // 3) call small LLM for numeric adjustments
  const smallLLMResponse = await smallLLM.call({ prompt: buildNumericPrompt(scores) });
  const adjusted = applyNumericAdjustments(scores, smallLLMResponse);

  // 4) optionally call larger LLM for explanation for top-1
  if (shouldExplain(adjusted[0])) {
    const explanation = await largeLLM.call({ prompt: buildExplainPrompt(adjusted[0]) });
    adjusted[0].explanation = explanation.text;
  }

  return adjusted;
}

Step 10 — Deployment and CI/CD

Choose serverless (Cloud Run, AWS Lambda) for cost-efficient burst scaling, or containers for predictable performance. Implement CI/CD and governance best practices from From Micro-App to Production to keep model and infra rollouts traceable. Example CI pipeline:

  1. Lint & unit tests
  2. Run emulator integration tests (stub Google/Waze APIs with recorded fixtures)
  3. Canary deploy to 10% of traffic and monitor SLOs
  4. Promote to prod when errors < thresholds for 1 hour

Operational checklist before going live

  • Confirm API quotas and set escalation: auto-block or degrade gracefully when costs spike
  • Enable key rotation for API keys and store secrets in Vault
  • Implement rate-limiting and per-user budgets
  • Document data retention and logging policy (GDPR/CCPA alignment)

Troubleshooting & common pitfalls

  • Inconsistent polylines: normalize to a canonical encoding (e.g., polyline6) and snap to road geometry before comparisons
  • LLM hallucinations: require structured JSON outputs and validate fields server-side
  • Billing surprises: set daily cost caps on LLM provider and alert on threshold breaches
  • Waze availability: you may need enterprise partnerships for production-grade routing—plan fallback strategies and resilient backends; see Micro-Events, Pop‑Ups and Resilient Backends.

Example UX: what the micro-app returns

{
  "request_id": "abc-123",
  "routes": [
    {
      "route_id": "google-1",
      "score": 92,
      "eta_minutes": 22,
      "distance_km": 15.2,
      "provider": "google",
      "explanation": "Fastest route with low-congestion highways; suits driver pref to avoid tolls."
    },
    {
      "route_id": "waze-2",
      "score": 78,
      "eta_minutes": 25,
      "provider": "waze",
      "explanation": "Shorter but passes through downtown congestion."
    }
  ]
}

Emerging patterns in 2026

As of early 2026, teams adopting hybrid LLM strategies and multi-source telemetry outperform single-source routing systems in user acceptance and incident avoidance. Emerging patterns include:

  • Federated personalization: keep sensitive profile data on-device and only send hashed embeddings to backend
  • Edge LLMs: run compact models on the device for instant scoring, falling back to cloud LLM for complex explanations. For indexing and edge delivery patterns see Indexing Manuals for the Edge Era.
  • Policy-aware routing: incorporate city-specific low-emission zones and dynamic pricing into the scoring function

Actionable takeaways

  • Start with a canonical route schema and a caching layer to avoid unnecessary API calls.
  • Use embeddings + a small LLM for scoring; reserve big models for human-facing explanations.
  • Instrument LLM usage and third-party API calls with cost metrics and alerts before you scale. Observability guidance is available at Observability in 2026.
  • Design for graceful degradation when a provider (e.g., Waze) isn't available—cache last-known-good routes and show an explicit fallback message.

Further reading and references

Relevant industry developments through 2025–early 2026 include the growth of micro-apps and shifts in LLM partnerships that make hybrid routing+AI patterns practical. Check platform docs for the most current API terms and pricing.

Call to action

Ready to prototype? Clone the sample repo (see quickstart branch) and deploy the aggregator to a free Cloud Run or Lambda tier. Instrument Prometheus metrics and add a daily budget alert for LLM spend. If you want, I can generate a starter repo with provider stubs, caching, observability dashboards, and a sample LLM prompt template tailored to your fleet and policy needs—tell me your preferred cloud and I’ll sketch the CI pipeline. For CI/CD and governance best-practices, consult From Micro-App to Production.

