Step-by-Step: How to Run a Company-Wide SaaS Usage Audit (with Scripts)


Unknown
2026-03-04
10 min read

A hands-on guide with runnable scripts and SQL to audit SaaS usage, measure active users, cost, and overlap for tool rationalization.

Hook — Why your SaaS bills still surprise you (and how to stop it)

You run a modern engineering org in 2026: dozens of cloud-first tools, scattered admin consoles, and a procurement process that still relies on expense reports. The result is predictable — recurring bills you can't fully justify, overlapping functionality across teams, and license seats that sit unused. This guide gives a practical, step-by-step playbook with runnable scripts, SQL queries, and API calls so you can quantify active users, sessions, spend, and overlap across SaaS vendors and build an evidence-based tool-rationalization plan.

Executive summary

Goal: Produce a repeatable, automated SaaS usage audit that answers: who uses which tool, how often, how much we pay, and where duplication creates waste. By the end of this article you will have:

  • Data collection scripts for common sources (SSO, Google Workspace, Slack, vendor billing)
  • Normalization and schema to load into a central Postgres or BigQuery warehouse
  • SQL queries that compute MAU, seat utilization, cost per active user, and overlap (Jaccard index)
  • An automation pattern to run this monthly and produce stakeholder reports

Late 2025 and early 2026 saw three industry shifts that change the rules:

  • SSO telemetry consolidation became mainstream: Okta, Azure AD, and Google Workspace logs are now routinely pushed to SIEMs and warehouses, making user activity easier to centralize.
  • Procurement + FinOps alignment accelerated: more teams apply FinOps principles to SaaS line items, demanding cost-per-active-user metrics.
  • AI tool proliferation exploded: dozens of AI assistants per team means faster sprawl unless you audit frequently.

Audit cadence in 2026: run a lightweight audit monthly and a deep audit quarterly.

Prerequisites and data sources

Collect these sources to get a complete picture:

  • Identity provider logs (Okta, Azure AD / Microsoft Graph, Google Workspace Reports)
  • SaaS admin APIs (Slack, Atlassian, GitHub, Zoom, Salesforce)
  • Billing and procurement (Stripe/Paddle invoices for SaaS vendors, ERP exports, corporate card feeds)
  • Expense and purchase orders via CSV or API from Spend platforms
  • SSO last-login and session events for activity-based thresholds

Architecture overview — simple ETL to a warehouse

We recommend this flow: API extraction -> normalization layer -> warehouse (Postgres / BigQuery) -> reporting (Metabase / Looker / Superset). Automate with GitHub Actions or Airflow.

Key normalization rules

  • Canonicalize user emails to lowercase and trim aliases (e.g., user+tag@example.com -> user@example.com)
  • Use primary email from SSO as the canonical identity
  • Standardize timestamps to UTC and ISO 8601
  • Record vendor, app_name, metric_type (user, session, license, invoice), metric_value, and date
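The first rule can be sketched in Python. The `+tag` stripping below is a deliberately simple assumption; it does not handle vendor-specific quirks like Gmail's dot-insensitive local parts.

```python
import re

def canonical_email(raw: str) -> str:
    """Lowercase, trim whitespace, and strip +tag aliases from the local part."""
    email = raw.strip().lower()
    local, _, domain = email.partition("@")
    local = re.sub(r"\+.*", "", local)  # user+tag -> user
    return f"{local}@{domain}"
```

Run every email through this before it touches the warehouse, so joins never miss on casing or aliases.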

Step 1 — Extract identity and activity data

Below are runnable snippets to extract users and activity from common sources. Replace TOKEN, ORG, and other placeholders before running.

Okta — list users with lastLogin

#!/usr/bin/env bash
OKTA_DOMAIN='org-example.okta.com'
API_TOKEN='REPLACE_WITH_OKTA_TOKEN'

curl -s -H 'Accept: application/json' \
     -H "Authorization: SSWS ${API_TOKEN}" \
     "https://${OKTA_DOMAIN}/api/v1/users?limit=200" | jq '.'

Okta's user object includes profile.login and lastLogin via the System Log API if you need session events.

Microsoft Graph — get users and signIns

#!/usr/bin/env bash
TENANT_ID='REPLACE'
CLIENT_ID='REPLACE'
CLIENT_SECRET='REPLACE'

# get OAuth token
TOKEN=$(curl -s -X POST "https://login.microsoftonline.com/${TENANT_ID}/oauth2/v2.0/token" \
  -d "grant_type=client_credentials&scope=https://graph.microsoft.com/.default&client_id=${CLIENT_ID}&client_secret=${CLIENT_SECRET}" \
  | jq -r '.access_token')

# list users
curl -s -H "Authorization: Bearer ${TOKEN}" \
  'https://graph.microsoft.com/v1.0/users' | jq '.'

# sign-in logs (requires audit logs permissions)
curl -s -H "Authorization: Bearer ${TOKEN}" \
  'https://graph.microsoft.com/v1.0/auditLogs/signIns' | jq '.'

Google Workspace Reports API — usage reports

# Use a service account and gcloud auth; example using python google-api-client
from googleapiclient.discovery import build
from google.oauth2 import service_account

SCOPES = ['https://www.googleapis.com/auth/admin.reports.usage.readonly']
creds = service_account.Credentials.from_service_account_file('sa.json', scopes=SCOPES)
service = build('admin', 'reports_v1', credentials=creds)
res = service.userUsageReport().get(userKey='all', date='2026-01-01').execute()
print(res)

Slack — users and active sessions

#!/usr/bin/env bash
SLACK_TOKEN='xoxp-REPLACE'

# list users
curl -s -H "Authorization: Bearer ${SLACK_TOKEN}" \
  'https://slack.com/api/users.list' | jq '.'

# workspace analytics require Enterprise API or admin scopes; use team analytics endpoints if available

Many vendors expose last_activity or last_login fields. When those are unavailable, use proxy signals such as recent messages, commits, or meeting presence.

Step 2 — Extract billing and spend

Billing often lives outside vendor APIs; collect:

  • Vendor invoices CSV from procurement
  • Stripe invoices if you pay vendors via card and vendors offer Stripe access
  • Corporate card CSVs (Visa, Amex) with vendor name parsing

Example: fetch Stripe invoices (if you manage a vendor account)

#!/usr/bin/env bash
STRIPE_KEY='sk_live_REPLACE'

curl -s -u "${STRIPE_KEY}:" 'https://api.stripe.com/v1/invoices?limit=100' | jq '.'

For corporate card feeds, export CSV, normalize vendor name to your canonical vendor list, and join on nearest match (fuzzy matching with trigram is recommended).
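Inside Postgres you would typically lean on the pg_trgm extension for this; as a language-agnostic sketch, here is the same trigram similarity in pure Python (the padding roughly mimics pg_trgm's two leading and one trailing space, and the 0.3 threshold matches pg_trgm's default):

```python
def trigrams(s: str) -> set:
    padded = f"  {s.lower().strip()} "  # pad roughly like pg_trgm
    return {padded[i:i + 3] for i in range(len(padded) - 2)}

def trigram_similarity(a: str, b: str) -> float:
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def best_vendor_match(card_name: str, canonical_vendors: list, threshold: float = 0.3):
    """Return the closest canonical vendor key, or None if nothing clears the threshold."""
    score, name = max((trigram_similarity(card_name, v), v) for v in canonical_vendors)
    return name if score >= threshold else None
```

Card descriptors that match nothing above the threshold go to a manual-review queue rather than being force-mapped.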

Step 3 — Normalization and schema (Postgres)

Load all normalized rows into a central schema. Example Postgres schema:

-- schema.sql
CREATE TABLE saas_vendor (
  vendor_id serial PRIMARY KEY,
  vendor_key text UNIQUE, -- e.g., 'slack', 'github'
  vendor_name text
);

CREATE TABLE saas_metric (
  id serial PRIMARY KEY,
  vendor_key text NOT NULL,
  metric_type text NOT NULL, -- 'user','session','license','invoice'
  metric_date date NOT NULL,
  metric_value numeric NOT NULL,
  email text, -- nullable for invoice rows
  raw_payload jsonb,
  created_at timestamptz default now()
);

Insert records directly from your extractor scripts. Using a single canonical email enables cross-vendor joins.
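As an illustration, a transformer from an Okta-style user payload into rows matching the schema above might look like this (the `profile.login` and `lastLogin` fields follow Okta's user object; the tuple shape mirrors the `saas_metric` columns):

```python
import json
from datetime import datetime, timezone

def okta_user_to_rows(payload: dict) -> list:
    """Map one Okta-style user payload to
    (vendor_key, metric_type, metric_date, metric_value, email, raw_payload) tuples."""
    email = payload.get("profile", {}).get("login", "").strip().lower()
    today = datetime.now(timezone.utc).date().isoformat()
    rows = [("okta", "user", today, 1, email, json.dumps(payload))]
    if payload.get("lastLogin"):
        # lastLogin is ISO 8601; keep only the date portion for metric_date
        rows.append(("okta", "session", payload["lastLogin"][:10], 1, email, json.dumps(payload)))
    return rows
```

Feed the tuples straight into a parameterized INSERT; one transformer per vendor keeps the normalization rules in one place.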

Step 4 — Core SQL queries to answer the business questions

Monthly Active Users (MAU) per vendor

-- MAU per vendor in the last 30 days
SELECT vendor_key,
       COUNT(DISTINCT email) AS mau_30d
FROM saas_metric
WHERE metric_type = 'session'
  AND metric_date >= current_date - INTERVAL '30 days'
GROUP BY vendor_key
ORDER BY mau_30d DESC;

Seat utilization — active seats vs purchased seats

This assumes invoices contain purchased_seats in raw_payload or a separate license_count metric.

-- seat utilization last invoice month
WITH latest_invoices AS (
  SELECT vendor_key,
         (raw_payload->>'purchased_seats')::int AS purchased_seats,
         date_trunc('month', metric_date) AS month
  FROM saas_metric
  WHERE metric_type = 'invoice'
),
active AS (
  SELECT vendor_key,
         date_trunc('month', metric_date) AS month,
         COUNT(DISTINCT email) AS active_users
  FROM saas_metric
  WHERE metric_type = 'session'
  GROUP BY vendor_key, date_trunc('month', metric_date)
)
SELECT i.vendor_key, i.month, i.purchased_seats, a.active_users,
       round((a.active_users::numeric / GREATEST(i.purchased_seats,1)) * 100,2) AS utilization_pct
FROM latest_invoices i
LEFT JOIN active a ON a.vendor_key = i.vendor_key AND a.month = i.month
ORDER BY utilization_pct ASC;

Cost per active user

-- cost per active user last 30 days
WITH spend AS (
  SELECT vendor_key, SUM((raw_payload->>'amount')::numeric) AS total_spend, date_trunc('month', metric_date) AS month
  FROM saas_metric
  WHERE metric_type = 'invoice'
  GROUP BY vendor_key, date_trunc('month', metric_date)
),
active AS (
  SELECT vendor_key, COUNT(DISTINCT email) AS mau_30d
  FROM saas_metric
  WHERE metric_type = 'session'
    AND metric_date >= current_date - INTERVAL '30 days'
  GROUP BY vendor_key
)
SELECT s.vendor_key, s.total_spend, a.mau_30d,
       round((s.total_spend / GREATEST(a.mau_30d,1))::numeric,2) AS cost_per_active_user
FROM spend s
LEFT JOIN active a ON a.vendor_key = s.vendor_key
ORDER BY cost_per_active_user DESC;

Overlap matrix (Jaccard index) — which tools share users

This computes pairwise Jaccard similarity: |A ∩ B| / |A ∪ B| for active users in last 30 days.
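For intuition, the same formula in a few lines of Python before the warehouse version:

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B|; defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```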

-- build sets
WITH users AS (
  SELECT vendor_key, email
  FROM saas_metric
  WHERE metric_type = 'session'
    AND metric_date >= current_date - INTERVAL '30 days'
  GROUP BY vendor_key, email
),
pairs AS (
  SELECT a.vendor_key AS vendor_a, b.vendor_key AS vendor_b,
         COUNT(*) AS intersection -- join condition already guarantees a.email = b.email
  FROM users a
  JOIN users b ON a.email = b.email
  WHERE a.vendor_key < b.vendor_key
  GROUP BY a.vendor_key, b.vendor_key
),
counts AS (
  SELECT vendor_key, COUNT(DISTINCT email) AS cnt
  FROM users
  GROUP BY vendor_key
)
SELECT p.vendor_a, p.vendor_b, p.intersection,
       (p.intersection::numeric / (c1.cnt + c2.cnt - p.intersection)) AS jaccard
FROM pairs p
JOIN counts c1 ON c1.vendor_key = p.vendor_a
JOIN counts c2 ON c2.vendor_key = p.vendor_b
ORDER BY jaccard DESC NULLS LAST;

Find underused paid tools

-- vendors with purchased seats and low utilization
SELECT i.vendor_key, i.purchased_seats, COALESCE(a.active_users, 0) AS active_users,
       round((COALESCE(a.active_users, 0)::numeric / i.purchased_seats) * 100, 2) AS utilization_pct
FROM (
  SELECT vendor_key, (raw_payload->>'purchased_seats')::int AS purchased_seats, date_trunc('month', metric_date) AS month
  FROM saas_metric
  WHERE metric_type = 'invoice'
) i
LEFT JOIN (
  SELECT vendor_key, COUNT(DISTINCT email) AS active_users, date_trunc('month', metric_date) AS month
  FROM saas_metric
  WHERE metric_type = 'session'
  GROUP BY vendor_key, date_trunc('month', metric_date)
) a ON a.vendor_key = i.vendor_key AND a.month = i.month
WHERE i.purchased_seats > 0 AND COALESCE(a.active_users, 0)::numeric / i.purchased_seats < 0.25
ORDER BY utilization_pct ASC;

Step 5 — Automation and orchestration

Use GitHub Actions for a simple monthly pipeline. Example workflow (skeleton):

name: monthly-saas-audit
on:
  schedule:
    - cron: '0 6 1 * *' # first of month UTC
jobs:
  extract-load:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run extractors
        env:
          OKTA_TOKEN: ${{ secrets.OKTA_TOKEN }}
          SLACK_TOKEN: ${{ secrets.SLACK_TOKEN }}
        run: |
          python extract_okta.py --out db
          python extract_slack.py --out db
      - name: Run SQL transforms
        env:
          DB_CONN: ${{ secrets.DB_CONN }}
        run: psql "$DB_CONN" -f schema.sql

For larger orgs, use Airflow or Prefect to enrich, validate, and alert when spend spikes or utilization drops below thresholds.

Step 6 — Reporting and stakeholder-ready deliverables

Create a two-page stakeholder PDF and an interactive dashboard. Minimum metrics to show:

  • Top 20 vendors by spend
  • MAU and seat utilization for paid tools
  • Overlap heatmap (Jaccard) to highlight consolidation targets
  • One-line recommendations per vendor (e.g., renegotiate seats, consolidate, or divest)

Rule of thumb: If utilization < 30% and overlap with other tools > 40%, flag for review.
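That rule of thumb translates directly into a filter. The input shape here (dicts with `vendor_key`, `utilization_pct`, and `max_jaccard`) is an assumption about how you export the query results from the dashboard:

```python
def consolidation_candidates(vendors: list) -> list:
    """Apply the rule of thumb: utilization under 30% and peak overlap above 0.40."""
    return [v["vendor_key"] for v in vendors
            if v["utilization_pct"] < 30 and v["max_jaccard"] > 0.40]
```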

Step 7 — Decision framework for rationalization

Use a simple RICE-like approach adapted for SaaS: Reach (MAU), Impact (critical workflows), Confidence (data quality), Effort (migration cost). Prioritize tools with low reach, low impact, low confidence, and high cost.
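A minimal scoring sketch; the scale choices (impact on 1-3, confidence on 0-1, effort in person-weeks) are assumptions you should calibrate to your org. Low scores mark divestment candidates:

```python
def saas_rice_score(reach_mau: int, impact: float, confidence: float, effort: float) -> float:
    """Classic RICE shape: (reach * impact * confidence) / effort.
    Guard against zero effort with a tiny floor."""
    return (reach_mau * impact * confidence) / max(effort, 1e-9)
```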

Advanced techniques

1) Jaccard + Clustering for consolidation candidates

Compute pairwise Jaccard, then run hierarchical clustering to find groups of highly overlapping tools. These clusters become consolidation candidates for deeper interviews.
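A dependency-free sketch of the idea using greedy single-linkage merging (a production pipeline might use scipy's hierarchical clustering instead; the 0.4 threshold is an assumption):

```python
def cluster_tools(user_sets: dict, threshold: float = 0.4) -> list:
    """Greedy single-linkage clustering: repeatedly merge tool clusters
    whose combined user sets have Jaccard similarity above threshold."""
    def jaccard(a, b):
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    clusters = [({tool}, users) for tool, users in user_sets.items()]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if jaccard(clusters[i][1], clusters[j][1]) > threshold:
                    clusters[i] = (clusters[i][0] | clusters[j][0],
                                   clusters[i][1] | clusters[j][1])
                    del clusters[j]
                    merged = True
                    break
            if merged:
                break
    return [sorted(tools) for tools, _ in clusters]
```

Multi-tool clusters in the output are your interview shortlist.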

2) Session-weighted usage

Not all sessions are equal: weight sessions by duration or feature use (e.g., editor usage in a dev IDE). Add a metric_weight column to saas_metric and compute weighted MAU.
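One way to compute weighted MAU is to count a user as active only when their summed weight clears a floor; the default of 1.0 below is an arbitrary assumption to tune per tool:

```python
from collections import defaultdict

def weighted_mau(sessions, min_weight: float = 1.0) -> dict:
    """sessions: iterable of (vendor_key, email, weight) tuples.
    A user counts toward a vendor's MAU only if their total weight reaches min_weight."""
    per_user = defaultdict(float)
    for vendor, email, weight in sessions:
        per_user[(vendor, email)] += weight
    mau = defaultdict(int)
    for (vendor, _email), total in per_user.items():
        if total >= min_weight:
            mau[vendor] += 1
    return dict(mau)
```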

3) Attribute-based deduplication

When users have multiple emails, use SSO primary email, then fallback to common name + domain matching. Flag ambiguous cases for manual review.

Troubleshooting and data quality tips

  • Missing last-login? Use alternative signals: Slack message timestamps, GitHub commits, Calendar events.
  • Vendor names mismatch in card feeds? Maintain a mapping table and use trigram fuzzy matches for ingestion.
  • API rate limits? Batch and paginate; use incremental cursors where supported.
  • PII concerns? Hash emails for cross-vendor joins where policy dictates; keep plaintext in a restricted table with strict ACLs.
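For the PII point, a keyed hash is safer than a bare SHA-256 (emails are trivially dictionary-attacked without a key) while still producing a stable cross-vendor join key. The secret should live in your secrets manager:

```python
import hashlib
import hmac

def pseudonymize_email(email: str, secret: bytes) -> str:
    """HMAC-SHA256 of the canonicalized email: same input + same key = same
    join key across vendors, with no plaintext email in the warehouse."""
    canonical = email.strip().lower()
    return hmac.new(secret, canonical.encode("utf-8"), hashlib.sha256).hexdigest()
```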

Practical audit playbook (one-month timeline)

  1. Week 1: Gather permissions, identify data owners, and map data sources.
  2. Week 2: Run extractors for identity, sessions, and billing. Load into warehouse.
  3. Week 3: Run SQL queries, build dashboard, and identify top 10 candidates for review.
  4. Week 4: Conduct stakeholder interviews for each candidate, finalize recommendations, and schedule changes (contract renegotiation, seat reduction).

2026 predictions — the next 18 months for SaaS governance

  • SSO-first observability: SSO logs will be the canonical source of truth for user activity across SaaS.
  • SaaSOps platforms mature: Expect more out-of-the-box connectors and recommendations powered by AI for consolidation.
  • Procurement will adopt quota-based approvals: automated guardrails for new SaaS spend tied to headcount and utilization.

Actionable takeaways

  • Start small: script SSO exports and join with billing; you'll get most of the ROI quickly.
  • Measure utilization, not just headcount; cost-per-active-user is the key FinOps metric for SaaS.
  • Automate monthly. Monthly audits catch sprawl from new AI tools faster than quarterly reviews.
  • Use overlap (Jaccard) to prioritize consolidation efforts — high overlap + low utilization = low-hanging fruit.

Final checklist before your first stakeholder report

  • Canonical user identity available and normalized
  • At least one month of session events ingested
  • Invoice or procurement data mapped to vendor_key
  • Dashboard with MAU, utilization, cost-per-active-user, and overlap heatmap

Call to action

Run the scripts above against a small pilot set of vendors (SSO + top 5 paid vendors) this month. Export the key SQL queries into your BI tool and present a one-page recommendation for the top 3 consolidation candidates. If you'd like, download our ready-to-deploy extractor repo and Postgres schema to jumpstart your audit — reach out to your internal platform team or the dev-tools.cloud community for templates and peer reviews.


Related Topics

#how-to #tooling #automation