Step-by-Step: How to Run a Company-Wide SaaS Usage Audit (with Scripts)


Unknown
2026-03-04
10 min read

A hands-on guide with runnable scripts and SQL to audit SaaS usage, measure active users, cost, and overlap for tool rationalization.

Hook — Why your SaaS bills still surprise you (and how to stop it)

You run a modern engineering org in 2026: dozens of cloud-first tools, scattered admin consoles, and a procurement process that still relies on expense reports. The result is predictable — recurring bills you can't fully justify, overlapping functionality across teams, and license seats that sit unused. This guide gives a practical, step-by-step playbook with runnable scripts, SQL queries, and API calls so you can quantify active users, sessions, spend, and overlap across SaaS vendors and build an evidence-based tool-rationalization plan.

Executive summary

Goal: Produce a repeatable, automated SaaS usage audit that answers: who uses which tool, how often, how much we pay, and where duplication creates waste. By the end of this article you will have:

  • Data collection scripts for common sources (SSO, Google Workspace, Slack, vendor billing)
  • Normalization and schema to load into a central Postgres or BigQuery warehouse
  • SQL queries that compute MAU, seat utilization, cost per active user, and overlap (Jaccard index)
  • An automation pattern to run this monthly and produce stakeholder reports

Late 2025 and early 2026 saw three industry shifts that change the rules:

  • SSO telemetry consolidation became mainstream: Okta, Azure AD, and Google Workspace logs are now routinely pushed to SIEMs and warehouses, making user activity easier to centralize.
  • Procurement + FinOps alignment accelerated: more teams apply FinOps principles to SaaS line items, demanding cost-per-active-user metrics.
  • AI tool proliferation exploded: dozens of AI assistants per team means faster sprawl unless you audit frequently.

Audit cadence in 2026: run a lightweight audit monthly and a deep audit quarterly.

Prerequisites and data sources

Collect these sources to get a complete picture:

  • Identity provider logs (Okta, Azure AD / Microsoft Graph, Google Workspace Reports)
  • SaaS admin APIs (Slack, Atlassian, GitHub, Zoom, Salesforce)
  • Billing and procurement (Stripe/Paddle invoices for SaaS vendors, ERP exports, corporate card feeds)
  • Expense and purchase orders via CSV or API from Spend platforms
  • SSO last-login and session events for activity-based thresholds

Architecture overview — simple ETL to a warehouse

We recommend this flow: API extraction -> normalization layer -> warehouse (Postgres / BigQuery) -> reporting (Metabase / Looker / Superset). Automate with GitHub Actions or Airflow.

Key normalization rules

  • Canonicalize user emails to lowercase and trim aliases (e.g., user+tag@example.com -> user@example.com)
  • Use primary email from SSO as the canonical identity
  • Standardize timestamps to UTC and ISO 8601
  • Record vendor, app_name, metric_type (user, session, license, invoice), metric_value, and date
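The first rule can be sketched in Python. The `+tag` stripping below is a deliberately simple assumption; it does not handle vendor-specific quirks like Gmail's dot-insensitive local parts.

```python
import re

def canonical_email(raw: str) -> str:
    """Lowercase, trim whitespace, and strip +tag aliases from the local part."""
    email = raw.strip().lower()
    local, _, domain = email.partition("@")
    local = re.sub(r"\+.*", "", local)  # user+tag -> user
    return f"{local}@{domain}"
```

Run every email through this before it touches the warehouse, so joins never miss on casing or aliases.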

Step 1 — Extract identity and activity data

Below are runnable snippets to extract users and activity from common sources. Replace TOKEN, ORG, and other placeholders before running.

Okta — list users with lastLogin

#!/usr/bin/env bash
OKTA_DOMAIN='org-example.okta.com'
API_TOKEN='REPLACE_WITH_OKTA_TOKEN'

curl -s -H 'Accept: application/json' \
     -H "Authorization: SSWS ${API_TOKEN}" \
     "https://${OKTA_DOMAIN}/api/v1/users?limit=200" | jq '.'

Okta's user object includes profile.login and lastLogin via the System Log API if you need session events.

Microsoft Graph — get users and signIns

#!/usr/bin/env bash
TENANT_ID='REPLACE'
CLIENT_ID='REPLACE'
CLIENT_SECRET='REPLACE'

# get OAuth token
TOKEN=$(curl -s -X POST "https://login.microsoftonline.com/${TENANT_ID}/oauth2/v2.0/token" \
  -d "grant_type=client_credentials&scope=https://graph.microsoft.com/.default&client_id=${CLIENT_ID}&client_secret=${CLIENT_SECRET}" \
  | jq -r '.access_token')

# list users
curl -s -H "Authorization: Bearer ${TOKEN}" \
  'https://graph.microsoft.com/v1.0/users' | jq '.'

# sign-in logs (requires audit logs permissions)
curl -s -H "Authorization: Bearer ${TOKEN}" \
  'https://graph.microsoft.com/v1.0/auditLogs/signIns' | jq '.'

Google Workspace Reports API — usage reports

# Use a service account and gcloud auth; example using python google-api-client
from googleapiclient.discovery import build
from google.oauth2 import service_account

SCOPES = ['https://www.googleapis.com/auth/admin.reports.usage.readonly']
creds = service_account.Credentials.from_service_account_file('sa.json', scopes=SCOPES)
service = build('admin', 'reports_v1', credentials=creds)
res = service.userUsageReport().get(userKey='all', date='2026-01-01').execute()
print(res)

Slack — users and active sessions

#!/usr/bin/env bash
SLACK_TOKEN='xoxp-REPLACE'

# list users
curl -s -H "Authorization: Bearer ${SLACK_TOKEN}" \
  'https://slack.com/api/users.list' | jq '.'

# workspace analytics require Enterprise API or admin scopes; use team analytics endpoints if available

Many vendors expose last_activity or last_login fields. When those are unavailable, use proxy signals such as recent messages, commits, or meeting presence.

Step 2 — Extract billing and spend

Billing often lives outside vendor APIs; collect:

  • Vendor invoices CSV from procurement
  • Stripe invoices if you pay vendors via card and vendors offer Stripe access
  • Corporate card CSVs (Visa, Amex) with vendor name parsing

Example: fetch Stripe invoices (if you manage a vendor account)

#!/usr/bin/env bash
STRIPE_KEY='sk_live_REPLACE'

curl -s -u "${STRIPE_KEY}:" 'https://api.stripe.com/v1/invoices?limit=100' | jq '.'

For corporate card feeds, export CSV, normalize vendor name to your canonical vendor list, and join on nearest match (fuzzy matching with trigram is recommended).
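Inside Postgres you would typically lean on the pg_trgm extension for this; as a language-agnostic sketch, here is the same trigram similarity in pure Python (the padding roughly mimics pg_trgm's two leading and one trailing space, and the 0.3 threshold matches pg_trgm's default):

```python
def trigrams(s: str) -> set:
    padded = f"  {s.lower().strip()} "  # pad roughly like pg_trgm
    return {padded[i:i + 3] for i in range(len(padded) - 2)}

def trigram_similarity(a: str, b: str) -> float:
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def best_vendor_match(card_name: str, canonical_vendors: list, threshold: float = 0.3):
    """Return the closest canonical vendor key, or None if nothing clears the threshold."""
    score, name = max((trigram_similarity(card_name, v), v) for v in canonical_vendors)
    return name if score >= threshold else None
```

Card descriptors that match nothing above the threshold go to a manual-review queue rather than being force-mapped.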

Step 3 — Normalization and schema (Postgres)

Load all normalized rows into a central schema. Example Postgres schema:

-- schema.sql
CREATE TABLE saas_vendor (
  vendor_id serial PRIMARY KEY,
  vendor_key text UNIQUE, -- e.g., 'slack', 'github'
  vendor_name text
);

CREATE TABLE saas_metric (
  id serial PRIMARY KEY,
  vendor_key text NOT NULL,
  metric_type text NOT NULL, -- 'user','session','license','invoice'
  metric_date date NOT NULL,
  metric_value numeric NOT NULL,
  email text, -- nullable for invoice rows
  raw_payload jsonb,
  created_at timestamptz default now()
);

Insert records directly from your extractor scripts. Using a single canonical email enables cross-vendor joins.
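As an illustration, a transformer from an Okta-style user payload into rows matching the schema above might look like this (the `profile.login` and `lastLogin` fields follow Okta's user object; the tuple shape mirrors the `saas_metric` columns):

```python
import json
from datetime import datetime, timezone

def okta_user_to_rows(payload: dict) -> list:
    """Map one Okta-style user payload to
    (vendor_key, metric_type, metric_date, metric_value, email, raw_payload) tuples."""
    email = payload.get("profile", {}).get("login", "").strip().lower()
    today = datetime.now(timezone.utc).date().isoformat()
    rows = [("okta", "user", today, 1, email, json.dumps(payload))]
    if payload.get("lastLogin"):
        # lastLogin is ISO 8601; keep only the date portion for metric_date
        rows.append(("okta", "session", payload["lastLogin"][:10], 1, email, json.dumps(payload)))
    return rows
```

Feed the tuples straight into a parameterized INSERT; one transformer per vendor keeps the normalization rules in one place.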

Step 4 — Core SQL queries to answer the business questions

Monthly Active Users (MAU) per vendor

-- MAU per vendor in the last 30 days
SELECT vendor_key,
       COUNT(DISTINCT email) AS mau_30d
FROM saas_metric
WHERE metric_type = 'session'
  AND metric_date >= current_date - INTERVAL '30 days'
GROUP BY vendor_key
ORDER BY mau_30d DESC;

Seat utilization — active seats vs purchased seats

This assumes invoices contain purchased_seats in raw_payload or a separate license_count metric.

-- seat utilization last invoice month
WITH latest_invoices AS (
  SELECT vendor_key,
         (raw_payload->>'purchased_seats')::int AS purchased_seats,
         date_trunc('month', metric_date) AS month
  FROM saas_metric
  WHERE metric_type = 'invoice'
),
active AS (
  SELECT vendor_key,
         date_trunc('month', metric_date) AS month,
         COUNT(DISTINCT email) AS active_users
  FROM saas_metric
  WHERE metric_type = 'session'
  GROUP BY vendor_key, date_trunc('month', metric_date)
)
SELECT i.vendor_key, i.month, i.purchased_seats, a.active_users,
       round((a.active_users::numeric / GREATEST(i.purchased_seats,1)) * 100,2) AS utilization_pct
FROM latest_invoices i
LEFT JOIN active a ON a.vendor_key = i.vendor_key AND a.month = i.month
ORDER BY utilization_pct ASC;

Cost per active user

-- cost per active user last 30 days
WITH spend AS (
  SELECT vendor_key, SUM((raw_payload->>'amount')::numeric) AS total_spend, date_trunc('month', metric_date) AS month
  FROM saas_metric
  WHERE metric_type = 'invoice'
  GROUP BY vendor_key, date_trunc('month', metric_date)
),
active AS (
  SELECT vendor_key, COUNT(DISTINCT email) AS mau_30d
  FROM saas_metric
  WHERE metric_type = 'session'
    AND metric_date >= current_date - INTERVAL '30 days'
  GROUP BY vendor_key
)
SELECT s.vendor_key, s.total_spend, a.mau_30d,
       round((s.total_spend / GREATEST(a.mau_30d,1))::numeric,2) AS cost_per_active_user
FROM spend s
LEFT JOIN active a ON a.vendor_key = s.vendor_key
ORDER BY cost_per_active_user DESC;

Overlap matrix (Jaccard index) — which tools share users

This computes pairwise Jaccard similarity: |A ∩ B| / |A ∪ B| for active users in last 30 days.
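For intuition, the same formula in a few lines of Python before the warehouse version:

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B|; defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```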

-- build sets
WITH users AS (
  SELECT vendor_key, email
  FROM saas_metric
  WHERE metric_type = 'session'
    AND metric_date >= current_date - INTERVAL '30 days'
  GROUP BY vendor_key, email
),
pairs AS (
  SELECT a.vendor_key AS vendor_a, b.vendor_key AS vendor_b,
         COUNT(*) AS intersection -- join condition already guarantees a.email = b.email
  FROM users a
  JOIN users b ON a.email = b.email
  WHERE a.vendor_key < b.vendor_key
  GROUP BY a.vendor_key, b.vendor_key
),
counts AS (
  SELECT vendor_key, COUNT(DISTINCT email) AS cnt
  FROM users
  GROUP BY vendor_key
)
SELECT p.vendor_a, p.vendor_b, p.intersection,
       (p.intersection::numeric / (c1.cnt + c2.cnt - p.intersection)) AS jaccard
FROM pairs p
JOIN counts c1 ON c1.vendor_key = p.vendor_a
JOIN counts c2 ON c2.vendor_key = p.vendor_b
ORDER BY jaccard DESC NULLS LAST;

Find underused paid tools

-- vendors with purchased seats and low utilization
SELECT i.vendor_key, i.purchased_seats, COALESCE(a.active_users, 0) AS active_users,
       round((COALESCE(a.active_users, 0)::numeric / i.purchased_seats) * 100, 2) AS utilization_pct
FROM (
  SELECT vendor_key, (raw_payload->>'purchased_seats')::int AS purchased_seats, date_trunc('month', metric_date) AS month
  FROM saas_metric
  WHERE metric_type = 'invoice'
) i
LEFT JOIN (
  SELECT vendor_key, COUNT(DISTINCT email) AS active_users, date_trunc('month', metric_date) AS month
  FROM saas_metric
  WHERE metric_type = 'session'
  GROUP BY vendor_key, date_trunc('month', metric_date)
) a ON a.vendor_key = i.vendor_key AND a.month = i.month
WHERE i.purchased_seats > 0 AND COALESCE(a.active_users, 0)::numeric / i.purchased_seats < 0.25
ORDER BY utilization_pct ASC;

Step 5 — Automation and orchestration

Use GitHub Actions for a simple monthly pipeline. Example workflow (skeleton):

name: monthly-saas-audit
on:
  schedule:
    - cron: '0 6 1 * *' # first of month UTC
jobs:
  extract-load:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run extractors
        env:
          OKTA_TOKEN: ${{ secrets.OKTA_TOKEN }}
          SLACK_TOKEN: ${{ secrets.SLACK_TOKEN }}
        run: |
          python extract_okta.py --out db
          python extract_slack.py --out db
      - name: Run SQL transforms
        env:
          DB_CONN: ${{ secrets.DB_CONN }}
        run: psql "$DB_CONN" -f schema.sql

For larger orgs, use Airflow or Prefect to enrich, validate, and alert when spend spikes or utilization drops below thresholds.

Step 6 — Reporting and stakeholder-ready deliverables

Create a two-page stakeholder PDF and an interactive dashboard. Minimum metrics to show:

  • Top 20 vendors by spend
  • MAU and seat utilization for paid tools
  • Overlap heatmap (Jaccard) to highlight consolidation targets
  • One-line recommendations per vendor (e.g., renegotiate seats, consolidate, or divest)

Rule of thumb: If utilization < 30% and overlap with other tools > 40%, flag for review.
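That rule of thumb translates directly into a filter. The input shape here (dicts with `vendor_key`, `utilization_pct`, and `max_jaccard`) is an assumption about how you export the query results from the dashboard:

```python
def consolidation_candidates(vendors: list) -> list:
    """Apply the rule of thumb: utilization under 30% and peak overlap above 0.40."""
    return [v["vendor_key"] for v in vendors
            if v["utilization_pct"] < 30 and v["max_jaccard"] > 0.40]
```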

Step 7 — Decision framework for rationalization

Use a simple RICE-like approach adapted for SaaS: Reach (MAU), Impact (critical workflows), Confidence (data quality), Effort (migration cost). Prioritize tools with low reach, low impact, low confidence, and high cost.
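A minimal scoring sketch; the scale choices (impact on 1-3, confidence on 0-1, effort in person-weeks) are assumptions you should calibrate to your org. Low scores mark divestment candidates:

```python
def saas_rice_score(reach_mau: int, impact: float, confidence: float, effort: float) -> float:
    """Classic RICE shape: (reach * impact * confidence) / effort.
    Guard against zero effort with a tiny floor."""
    return (reach_mau * impact * confidence) / max(effort, 1e-9)
```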

Advanced techniques

1) Jaccard + Clustering for consolidation candidates

Compute pairwise Jaccard, then run hierarchical clustering to find groups of highly overlapping tools. These clusters become consolidation candidates for deeper interviews.
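A dependency-free sketch of the idea using greedy single-linkage merging (a production pipeline might use scipy's hierarchical clustering instead; the 0.4 threshold is an assumption):

```python
def cluster_tools(user_sets: dict, threshold: float = 0.4) -> list:
    """Greedy single-linkage clustering: repeatedly merge tool clusters
    whose combined user sets have Jaccard similarity above threshold."""
    def jaccard(a, b):
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    clusters = [({tool}, users) for tool, users in user_sets.items()]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if jaccard(clusters[i][1], clusters[j][1]) > threshold:
                    clusters[i] = (clusters[i][0] | clusters[j][0],
                                   clusters[i][1] | clusters[j][1])
                    del clusters[j]
                    merged = True
                    break
            if merged:
                break
    return [sorted(tools) for tools, _ in clusters]
```

Multi-tool clusters in the output are your interview shortlist.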

2) Session-weighted usage

Not all sessions are equal: weight sessions by duration or feature use (e.g., editor usage in a dev IDE). Add a metric_weight column to saas_metric and compute weighted MAU.
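One way to compute weighted MAU is to count a user as active only when their summed weight clears a floor; the default of 1.0 below is an arbitrary assumption to tune per tool:

```python
from collections import defaultdict

def weighted_mau(sessions, min_weight: float = 1.0) -> dict:
    """sessions: iterable of (vendor_key, email, weight) tuples.
    A user counts toward a vendor's MAU only if their total weight reaches min_weight."""
    per_user = defaultdict(float)
    for vendor, email, weight in sessions:
        per_user[(vendor, email)] += weight
    mau = defaultdict(int)
    for (vendor, _email), total in per_user.items():
        if total >= min_weight:
            mau[vendor] += 1
    return dict(mau)
```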

3) Attribute-based deduplication

When users have multiple emails, use SSO primary email, then fallback to common name + domain matching. Flag ambiguous cases for manual review.

Troubleshooting and data quality tips

  • Missing last-login? Use alternative signals: Slack message timestamps, GitHub commits, Calendar events.
  • Vendor names mismatch in card feeds? Maintain a mapping table and use trigram fuzzy matches for ingestion.
  • API rate limits? Batch and paginate; use incremental cursors where supported.
  • PII concerns? Hash emails for cross-vendor joins where policy dictates; keep plaintext in a restricted table with strict ACLs.
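For the PII point, a keyed hash is safer than a bare SHA-256 (emails are trivially dictionary-attacked without a key) while still producing a stable cross-vendor join key. The secret should live in your secrets manager:

```python
import hashlib
import hmac

def pseudonymize_email(email: str, secret: bytes) -> str:
    """HMAC-SHA256 of the canonicalized email: same input + same key = same
    join key across vendors, with no plaintext email in the warehouse."""
    canonical = email.strip().lower()
    return hmac.new(secret, canonical.encode("utf-8"), hashlib.sha256).hexdigest()
```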

Practical audit playbook (one-month timeline)

  1. Week 1: Gather permissions, identify data owners, and map data sources.
  2. Week 2: Run extractors for identity, sessions, and billing. Load into warehouse.
  3. Week 3: Run SQL queries, build dashboard, and identify top 10 candidates for review.
  4. Week 4: Conduct stakeholder interviews for each candidate, finalize recommendations, and schedule changes (contract renegotiation, seat reduction).

2026 predictions — the next 18 months for SaaS governance

  • SSO-first observability: SSO logs will be the canonical source of truth for user activity across SaaS.
  • SaaSOps platforms mature: Expect more out-of-the-box connectors and recommendations powered by AI for consolidation.
  • Procurement will adopt quota-based approvals: automated guardrails for new SaaS spend tied to headcount and utilization.

Actionable takeaways

  • Start small: script SSO exports and join with billing; you'll get most of the ROI quickly.
  • Measure utilization, not just headcount; cost-per-active-user is the key FinOps metric for SaaS.
  • Automate monthly. Monthly audits catch sprawl from new AI tools faster than quarterly reviews.
  • Use overlap (Jaccard) to prioritize consolidation efforts — high overlap + low utilization = low-hanging fruit.

Final checklist before your first stakeholder report

  • Canonical user identity available and normalized
  • At least one month of session events ingested
  • Invoice or procurement data mapped to vendor_key
  • Dashboard with MAU, utilization, cost-per-active-user, and overlap heatmap

Call to action

Run the scripts above against a small pilot set of vendors (SSO + top 5 paid vendors) this month. Export the key SQL queries into your BI tool and present a one-page recommendation for the top 3 consolidation candidates. If you'd like, download our ready-to-deploy extractor repo and Postgres schema to jumpstart your audit — reach out to your internal platform team or the dev-tools.cloud community for templates and peer reviews.


Related Topics

#how-to #tooling #automation