Step-by-Step: How to Run a Company-Wide SaaS Usage Audit (with Scripts)
A hands-on guide with runnable scripts and SQL to audit SaaS usage, measure active users, cost, and overlap for tool rationalization.
Hook — Why your SaaS bills still surprise you (and how to stop it)
You run a modern engineering org in 2026: dozens of cloud-first tools, scattered admin consoles, and a procurement process that still relies on expense reports. The result is predictable — recurring bills you can't fully justify, overlapping functionality across teams, and license seats that sit unused. This guide gives a practical, step-by-step playbook with runnable scripts, SQL queries, and API calls so you can quantify active users, sessions, spend, and overlap across SaaS vendors and build an evidence-based tool-rationalization plan.
Executive summary
Goal: Produce a repeatable, automated SaaS usage audit that answers: who uses which tool, how often, how much we pay, and where duplication creates waste. By the end of this article you will have:
- Data collection scripts for common sources (SSO, Google Workspace, Slack, vendor billing)
- Normalization and schema to load into a central Postgres or BigQuery warehouse
- SQL queries that compute MAU, seat utilization, cost per active user, and overlap (Jaccard index)
- An automation pattern to run this monthly and produce stakeholder reports
Why now? 2026 trends that make audits urgent
Late 2025 and early 2026 saw three industry shifts that change the rules:
- SSO telemetry consolidation became mainstream: Okta, Azure AD, and Google Workspace logs are now routinely pushed to SIEMs and warehouses, making user activity easier to centralize.
- Procurement + FinOps alignment accelerated: more teams apply FinOps principles to SaaS line items, demanding cost-per-active-user metrics.
- AI tool proliferation exploded: dozens of AI/assistants per team means faster sprawl unless you audit frequently.
Audit cadence in 2026: run a lightweight audit monthly and a deep audit quarterly.
Prerequisites and data sources
Collect these sources to get a complete picture:
- Identity provider logs (Okta, Azure AD / Microsoft Graph, Google Workspace Reports)
- SaaS admin APIs (Slack, Atlassian, GitHub, Zoom, Salesforce)
- Billing and procurement (Stripe/Paddle invoices for SaaS vendors, ERP exports, corporate card feeds)
- Expense and purchase orders via CSV or API from Spend platforms
- SSO last-login and session events for activity-based thresholds
Architecture overview — simple ETL to a warehouse
We recommend this flow: API extraction -> normalization layer -> warehouse (Postgres / BigQuery) -> reporting (Metabase / Looker / Superset). Automate with GitHub Actions or Airflow.
Key normalization rules
- Canonicalize user emails to lowercase and trim aliases (e.g., user+tag@example.com -> user@example.com)
- Use primary email from SSO as the canonical identity
- Standardize timestamps to UTC and ISO 8601
- Record vendor, app_name, metric_type (user, session, license, invoice), metric_value, and date
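The normalization rules above can be sketched as small helpers (a minimal sketch; the `+tag` stripping assumes your org treats plus-aliases as the same mailbox):

```python
from datetime import datetime, timezone

def canonicalize_email(raw: str) -> str:
    """Lowercase, trim, and strip +tag aliases: User+x@Example.com -> user@example.com."""
    email = raw.strip().lower()
    local, _, domain = email.partition('@')
    return f"{local.split('+', 1)[0]}@{domain}"

def to_utc_iso(ts: datetime) -> str:
    """Standardize a timezone-aware timestamp to UTC ISO 8601."""
    return ts.astimezone(timezone.utc).isoformat()
```

Run every extracted row through these before loading, so cross-vendor joins line up.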
Step 1 — Extract identity and activity data
Below are runnable snippets to extract users and activity from common sources. Replace TOKEN, ORG, and other placeholders before running.
Okta — list users with lastLogin
#!/usr/bin/env bash
OKTA_DOMAIN='org-example.okta.com'
API_TOKEN='REPLACE_WITH_OKTA_TOKEN'
curl -s -H 'Accept: application/json' \
  -H "Authorization: SSWS ${API_TOKEN}" \
  "https://${OKTA_DOMAIN}/api/v1/users?limit=200" | jq '.'
Okta's user object includes profile.login and lastLogin; use the System Log API if you need individual session events.
Microsoft Graph — get users and signIns
#!/usr/bin/env bash
TENANT_ID='REPLACE'
CLIENT_ID='REPLACE'
CLIENT_SECRET='REPLACE'
# get OAuth token
TOKEN=$(curl -s -X POST "https://login.microsoftonline.com/${TENANT_ID}/oauth2/v2.0/token" \
  -d "grant_type=client_credentials&scope=https://graph.microsoft.com/.default&client_id=${CLIENT_ID}&client_secret=${CLIENT_SECRET}" \
  | jq -r '.access_token')
# list users
curl -s -H "Authorization: Bearer ${TOKEN}" \
  'https://graph.microsoft.com/v1.0/users' | jq '.'
# sign-in logs (requires audit logs permissions)
curl -s -H "Authorization: Bearer ${TOKEN}" \
  'https://graph.microsoft.com/v1.0/auditLogs/signIns' | jq '.'
Google Workspace Reports API — usage reports
# Use a service account (domain-wide delegation); example using google-api-python-client
from google.oauth2 import service_account
from googleapiclient.discovery import build

SCOPES = ['https://www.googleapis.com/auth/admin.reports.usage.readonly']
creds = service_account.Credentials.from_service_account_file('sa.json', scopes=SCOPES)
service = build('admin', 'reports_v1', credentials=creds)
res = service.userUsageReport().get(userKey='all', date='2026-01-01').execute()
print(res)
Slack — users and active sessions
#!/usr/bin/env bash
SLACK_TOKEN='xoxp-REPLACE'
# list users
curl -s -H "Authorization: Bearer ${SLACK_TOKEN}" \
  'https://slack.com/api/users.list' | jq '.'
# workspace analytics require the Enterprise Grid API or admin scopes; use team analytics endpoints if available
Many vendors expose last_activity or last_login fields. When those are unavailable, fall back to proxy signals such as recent messages, commits, or meeting presence.
Step 2 — Extract billing and spend
Billing often lives outside vendor APIs; collect:
- Vendor invoices CSV from procurement
- Stripe invoices if you pay vendors via card and vendors offer Stripe access
- Corporate card CSVs (Visa, Amex) with vendor name parsing
Example: fetch Stripe invoices (if you manage a vendor account)
#!/usr/bin/env bash
STRIPE_KEY='sk_live_REPLACE'
curl -s -u "${STRIPE_KEY}:" 'https://api.stripe.com/v1/invoices?limit=100' | jq '.'
For corporate card feeds, export CSV, normalize vendor name to your canonical vendor list, and join on nearest match (fuzzy matching with trigram is recommended).
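Before wiring up Postgres's pg_trgm extension, the trigram match can be prototyped in pure Python; `best_vendor_match` and the 0.3 threshold are illustrative assumptions:

```python
def trigrams(s: str) -> set:
    """Pad the string and return its set of 3-character shingles (pg_trgm-style)."""
    s = f"  {s.lower().strip()} "
    return {s[i:i + 3] for i in range(len(s) - 2)}

def trigram_similarity(a: str, b: str) -> float:
    """Jaccard similarity of trigram sets, in [0, 1]."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

def best_vendor_match(card_name: str, canonical: list, threshold: float = 0.3):
    """Return the closest canonical vendor, or None if nothing clears the threshold."""
    score, vendor = max((trigram_similarity(card_name, v), v) for v in canonical)
    return vendor if score >= threshold else None
```

Anything that returns None goes into a manual-review queue rather than being silently dropped.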
Step 3 — Normalization and schema (Postgres)
Load all normalized rows into a central schema. Example Postgres schema:
-- schema.sql
CREATE TABLE saas_vendor (
vendor_id serial PRIMARY KEY,
vendor_key text UNIQUE, -- e.g., 'slack', 'github'
vendor_name text
);
CREATE TABLE saas_metric (
id serial PRIMARY KEY,
vendor_key text NOT NULL,
metric_type text NOT NULL, -- 'user','session','license','invoice'
metric_date date NOT NULL,
metric_value numeric NOT NULL,
email text, -- nullable for invoice rows
raw_payload jsonb,
created_at timestamptz default now()
);
Insert records directly from your extractor scripts. Using a single canonical email enables cross-vendor joins.
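A minimal sketch of the insert path, assuming a psycopg2-style driver with `%s` placeholders (`metric_row` is a hypothetical helper; only the parameter building is shown):

```python
import json

INSERT_SQL = (
    "INSERT INTO saas_metric "
    "(vendor_key, metric_type, metric_date, metric_value, email, raw_payload) "
    "VALUES (%s, %s, %s, %s, %s, %s)"
)

def metric_row(vendor_key, metric_type, metric_date, metric_value, email=None, raw=None):
    """Build one parameter tuple: email canonicalized, raw_payload serialized for jsonb."""
    assert metric_type in {'user', 'session', 'license', 'invoice'}
    return (vendor_key, metric_type, metric_date, metric_value,
            email.strip().lower() if email else None,
            json.dumps(raw or {}))
```

Then `cur.execute(INSERT_SQL, metric_row('slack', 'session', '2026-01-15', 1, 'user@example.com'))` in your extractor's load step.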
Step 4 — Core SQL queries to answer the business questions
Monthly Active Users (MAU) per vendor
-- MAU per vendor in the last 30 days
SELECT vendor_key,
COUNT(DISTINCT email) AS mau_30d
FROM saas_metric
WHERE metric_type = 'session'
AND metric_date >= current_date - INTERVAL '30 days'
GROUP BY vendor_key
ORDER BY mau_30d DESC;
Seat utilization — active seats vs purchased seats
This assumes invoices contain purchased_seats in raw_payload or a separate license_count metric.
-- seat utilization last invoice month
WITH latest_invoices AS (
SELECT DISTINCT ON (vendor_key)
vendor_key,
(raw_payload->>'purchased_seats')::int AS purchased_seats,
date_trunc('month', metric_date) AS month
FROM saas_metric
WHERE metric_type = 'invoice'
ORDER BY vendor_key, metric_date DESC
),
active AS (
SELECT vendor_key,
date_trunc('month', metric_date) AS month,
COUNT(DISTINCT email) AS active_users
FROM saas_metric
WHERE metric_type = 'session'
GROUP BY vendor_key, date_trunc('month', metric_date)
)
SELECT i.vendor_key, i.month, i.purchased_seats, a.active_users,
round((a.active_users::numeric / GREATEST(i.purchased_seats,1)) * 100,2) AS utilization_pct
FROM latest_invoices i
LEFT JOIN active a ON a.vendor_key = i.vendor_key AND a.month = i.month
ORDER BY utilization_pct ASC;
Cost per active user
-- cost per active user last 30 days
WITH spend AS (
SELECT vendor_key, SUM((raw_payload->>'amount')::numeric) AS total_spend
FROM saas_metric
WHERE metric_type = 'invoice'
AND metric_date >= current_date - INTERVAL '30 days'
GROUP BY vendor_key
),
active AS (
SELECT vendor_key, COUNT(DISTINCT email) AS mau_30d
FROM saas_metric
WHERE metric_type = 'session'
AND metric_date >= current_date - INTERVAL '30 days'
GROUP BY vendor_key
)
SELECT s.vendor_key, s.total_spend, a.mau_30d,
round((s.total_spend / GREATEST(a.mau_30d,1))::numeric,2) AS cost_per_active_user
FROM spend s
LEFT JOIN active a ON a.vendor_key = s.vendor_key
ORDER BY cost_per_active_user DESC;
Overlap matrix (Jaccard index) — which tools share users
This computes pairwise Jaccard similarity: |A ∩ B| / |A ∪ B| for active users in last 30 days.
-- build sets
WITH users AS (
SELECT vendor_key, email
FROM saas_metric
WHERE metric_type = 'session'
AND metric_date >= current_date - INTERVAL '30 days'
GROUP BY vendor_key, email
),
pairs AS (
SELECT a.vendor_key AS vendor_a, b.vendor_key AS vendor_b,
COUNT(*) AS intersection
FROM users a
JOIN users b ON a.email = b.email
WHERE a.vendor_key < b.vendor_key
GROUP BY a.vendor_key, b.vendor_key
),
),
counts AS (
SELECT vendor_key, COUNT(DISTINCT email) AS cnt
FROM users
GROUP BY vendor_key
)
SELECT p.vendor_a, p.vendor_b, p.intersection,
(p.intersection::numeric / (c1.cnt + c2.cnt - p.intersection)) AS jaccard
FROM pairs p
JOIN counts c1 ON c1.vendor_key = p.vendor_a
JOIN counts c2 ON c2.vendor_key = p.vendor_b
ORDER BY jaccard DESC NULLS LAST;
Find underused paid tools
-- vendor with purchased seats and low utilization
SELECT i.vendor_key, i.purchased_seats, COALESCE(a.active_users, 0) AS active_users,
round((COALESCE(a.active_users, 0)::numeric / i.purchased_seats) * 100, 2) AS utilization_pct
FROM (
SELECT vendor_key, (raw_payload->>'purchased_seats')::int AS purchased_seats, date_trunc('month', metric_date) AS month
FROM saas_metric
WHERE metric_type='invoice'
) i
LEFT JOIN (
SELECT vendor_key, COUNT(DISTINCT email) AS active_users, date_trunc('month', metric_date) AS month
FROM saas_metric
WHERE metric_type='session'
GROUP BY vendor_key, date_trunc('month', metric_date)
) a ON a.vendor_key = i.vendor_key AND a.month = i.month
WHERE i.purchased_seats > 0 AND (COALESCE(a.active_users, 0)::numeric / i.purchased_seats) < 0.25
ORDER BY utilization_pct ASC;
Step 5 — Automation and orchestration
Use GitHub Actions for a simple monthly pipeline. Example workflow (skeleton):
name: monthly-saas-audit
on:
  schedule:
    - cron: '0 6 1 * *'  # first of month, 06:00 UTC
jobs:
  extract-load:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run extractors
        env:
          OKTA_TOKEN: ${{ secrets.OKTA_TOKEN }}
          SLACK_TOKEN: ${{ secrets.SLACK_TOKEN }}
        run: |
          python extract_okta.py --out db
          python extract_slack.py --out db
      - name: Run SQL transforms
        run: psql "$DB_CONN" -f schema.sql
For larger orgs, use Airflow or Prefect to enrich, validate, and alert when spend spikes or utilization drops below thresholds.
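The spend-spike alert can be sketched in plain Python regardless of orchestrator; the 50% month-over-month threshold is an assumption you should tune:

```python
def spend_alerts(monthly_spend: dict, spike_pct: float = 0.5) -> list:
    """Flag vendors whose latest monthly spend grew more than spike_pct vs the prior month.

    monthly_spend maps vendor_key -> [older, ..., latest] monthly totals.
    Returns (vendor_key, growth) pairs sorted by growth, largest first.
    """
    alerts = []
    for vendor, series in monthly_spend.items():
        if len(series) < 2 or series[-2] <= 0:
            continue  # need two months of data and a nonzero baseline
        growth = (series[-1] - series[-2]) / series[-2]
        if growth > spike_pct:
            alerts.append((vendor, round(growth, 2)))
    return sorted(alerts, key=lambda t: -t[1])
```

Feed the output to Slack or email in the final task of the pipeline.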
Step 6 — Reporting and stakeholder-ready deliverables
Create a two-page stakeholder PDF and an interactive dashboard. Minimum metrics to show:
- Top 20 vendors by spend
- MAU and seat utilization for paid tools
- Overlap heatmap (Jaccard) to highlight consolidation targets
- One-line recommendations per vendor (e.g., renegotiate seats, consolidate, or divest)
Rule of thumb: If utilization < 30% and overlap with other tools > 40%, flag for review.
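That rule of thumb translates directly into a filter (field names here are illustrative, matching the dashboard metrics above):

```python
def review_candidates(vendors):
    """Flag vendors with utilization < 30% and overlap with another tool > 40%.

    vendors: iterable of dicts with vendor_key, utilization_pct, max_overlap_pct.
    """
    return [v['vendor_key'] for v in vendors
            if v['utilization_pct'] < 30 and v['max_overlap_pct'] > 40]
```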
Step 7 — Decision framework for rationalization
Use a simple RICE-like approach adapted for SaaS: Reach (MAU), Impact (critical workflows), Confidence (data quality), Effort (migration cost). Prioritize tools with low reach, low impact, low confidence, and high cost.
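One way to operationalize the adapted score (the formula and scales are assumptions, not a standard):

```python
def rationalization_score(reach, impact, confidence, effort):
    """RICE-style score: reach (MAU) * impact (1-3) * confidence (0-1) / effort (person-weeks).

    Low scores paired with high spend are the strongest divest candidates.
    """
    return (reach * impact * confidence) / max(effort, 1)
```

Rank vendors ascending by score, descending by spend, and review the top of that list first.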
Advanced techniques
1) Jaccard + Clustering for consolidation candidates
Compute pairwise Jaccard, then run hierarchical clustering to find groups of highly overlapping tools. These clusters become consolidation candidates for deeper interviews.
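As a lightweight stand-in for full hierarchical clustering, a union-find pass at a Jaccard cutoff (0.4 here, an assumption) produces the same single-linkage clusters:

```python
def overlap_clusters(pairs, threshold=0.4):
    """Greedy single-linkage clustering: union tools whose Jaccard >= threshold.

    pairs: iterable of (vendor_a, vendor_b, jaccard) rows from the overlap query.
    Returns clusters of size >= 2 (the consolidation candidates).
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, j in pairs:
        find(a), find(b)  # register both vendors
        if j >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for v in parent:
        clusters.setdefault(find(v), set()).add(v)
    return [c for c in clusters.values() if len(c) > 1]
```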
2) Session-weighted usage
Not all sessions are equal: weight sessions by duration or feature use (e.g., editor usage in a dev IDE). Add a metric_weight column to saas_metric and compute weighted MAU.
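The weighted aggregation can be sketched in Python over (vendor_key, email, weight) session rows:

```python
from collections import defaultdict

def weighted_usage(rows):
    """Aggregate session rows (vendor_key, email, weight) into per-vendor totals.

    Returns vendor_key -> (distinct_users, total_weight): distinct_users is plain MAU,
    total_weight is the duration/feature-weighted activity.
    """
    users = defaultdict(set)
    weight = defaultdict(float)
    for vendor, email, w in rows:
        users[vendor].add(email)
        weight[vendor] += w
    return {v: (len(users[v]), weight[v]) for v in users}
```

Comparing the two numbers surfaces tools where a few heavy users account for almost all activity.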
3) Attribute-based deduplication
When users have multiple emails, use SSO primary email, then fallback to common name + domain matching. Flag ambiguous cases for manual review.
Troubleshooting and data quality tips
- Missing last-login? Use alternative signals: Slack message timestamps, GitHub commits, Calendar events.
- Vendor names mismatch in card feeds? Maintain a mapping table and use trigram fuzzy matches for ingestion.
- API rate limits? Batch and paginate; use incremental cursors where supported.
- PII concerns? Hash emails for cross-vendor joins where policy dictates; keep plaintext in a restricted table with strict ACLs.
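For the hashed-join pattern, an HMAC with an org-held pepper is safer than a bare SHA-256, since a bare hash of a known email list is trivially reversible (the pepper name below is illustrative):

```python
import hashlib
import hmac

def hash_email(email: str, pepper: bytes) -> str:
    """Keyed hash of the canonical email for cross-vendor joins without plaintext PII.

    Without the pepper, an attacker cannot recompute hashes from a list of known emails,
    so the pepper must live only in the restricted ingestion environment.
    """
    return hmac.new(pepper, email.strip().lower().encode(), hashlib.sha256).hexdigest()
```

Because canonicalization happens inside the function, the same mailbox always hashes to the same join key across vendors.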
Practical audit playbook (one-month timeline)
- Week 1: Gather permissions, identify data owners, and map data sources.
- Week 2: Run extractors for identity, sessions, and billing. Load into warehouse.
- Week 3: Run SQL queries, build dashboard, and identify top 10 candidates for review.
- Week 4: Conduct stakeholder interviews for each candidate, finalize recommendations, and schedule changes (contract renegotiation, seat reduction).
2026 predictions — the next 18 months for SaaS governance
- SSO-first observability: SSO logs will be the canonical source of truth for user activity across SaaS.
- SaaSOps platforms mature: Expect more out-of-the-box connectors and recommendations powered by AI for consolidation.
- Procurement will adopt quota-based approvals: automated guardrails for new SaaS spend tied to headcount and utilization.
Actionable takeaways
- Start small: script SSO exports and join with billing; you'll get most of the ROI quickly.
- Measure utilization, not just headcount; cost-per-active-user is the key FinOps metric for SaaS.
- Automate monthly. Monthly audits catch sprawl from new AI tools faster than quarterly reviews.
- Use overlap (Jaccard) to prioritize consolidation efforts — high overlap + low utilization = low-hanging fruit.
Final checklist before your first stakeholder report
- Canonical user identity available and normalized
- At least one month of session events ingested
- Invoice or procurement data mapped to vendor_key
- Dashboard with MAU, utilization, cost-per-active-user, and overlap heatmap
Call to action
Run the scripts above against a small pilot set of vendors (SSO + top 5 paid vendors) this month. Export the key SQL queries into your BI tool and present a one-page recommendation for the top 3 consolidation candidates. If you'd like, download our ready-to-deploy extractor repo and Postgres schema to jumpstart your audit — reach out to your internal platform team or the dev-tools.cloud community for templates and peer reviews.