Network Engineering

Multi-Site Config Drift Detection & Remediation Tracking

Peer-consensus configuration auditing across a large multi-site switch fleet — finds drift by comparing each device against its role-peers (no golden config required), then tracks every finding to closure in a live dashboard with per-round diffs and email alerts.

Python · SecureCRT · Google Apps Script · Google Sheets · pandas

Problem

A large multi-site enterprise network undergoing standardization had hundreds of access and distribution switches that had drifted over years of independent, site-by-site changes. There was no single “golden” config to compare against — the correct configuration legitimately varied by device role and site, so a naive template diff produced thousands of false positives and got ignored.

The team needed two things a hand-maintained spreadsheet couldn’t give them: a way to separate genuine outliers from legitimate per-role variation, and a way to track remediation across many engineers and many weeks without the status living in one person’s head.

Solution

Built a peer-consensus comparison engine. Instead of diffing against a baseline that didn’t exist, each device is compared against the consensus of its role-peers. For any given setting — VLANs, SVIs, ACLs, routing knobs, service config — if fourteen of fifteen peers agree and one differs, that one is the drift. This finds real outliers without needing an authoritative golden config.
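The consensus rule can be illustrated with a minimal sketch. This is not the production engine: the function name find_drift, the flat {device: {setting: value}} layout, and the 0.8 agreement threshold are all assumptions made for the example, and it uses only the standard library.

```python
from collections import Counter


def find_drift(settings_by_device, min_agreement=0.8):
    """Return (device, setting, actual, consensus) tuples for one role group.

    settings_by_device: {device: {setting: value}} (hypothetical layout).
    min_agreement: assumed threshold; below it there is no consensus to drift from.
    """
    findings = []
    all_settings = sorted({s for cfg in settings_by_device.values() for s in cfg})
    for setting in all_settings:
        # Absence is recorded as None so a missing setting can itself be the outlier.
        values = {dev: cfg.get(setting) for dev, cfg in settings_by_device.items()}
        consensus, votes = Counter(values.values()).most_common(1)[0]
        if votes / len(values) < min_agreement:
            continue  # peers disagree too much: treat as legitimate variation
        for dev, val in sorted(values.items()):
            if val != consensus:
                findings.append((dev, setting, val, consensus))
    return findings


# Fifteen role-peers; one points at a different NTP server.
peers = {
    f"sw{i:02d}": {"ntp_server": "10.0.0.1", "stp_mode": "rapid-pvst"}
    for i in range(1, 16)
}
peers["sw07"]["ntp_server"] = "10.9.9.9"

drift = find_drift(peers)
print(drift)
```

With fourteen of fifteen peers in agreement, only sw07 is reported. A setting on which peers split evenly produces no finding at all, which is what keeps legitimate variation out of the list.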

The output drives a live Google Sheets dashboard with four views: a Summary, a Findings list, a Drift Matrix (device × setting), and a Remediation tab that tracks each finding from open to resolved across successive crawl snapshots. Every new crawl diffs against the prior one and emails exactly what changed since last time.
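As a rough illustration of the Drift Matrix view, the sketch below pivots a flat findings list into a device × setting grid. The field names (device, setting, status) and the function build_drift_matrix are assumptions for the example, and it uses plain lists rather than pandas to stay dependency-free.

```python
def build_drift_matrix(findings):
    """Pivot flat findings into rows: a header, then one row per device."""
    devices = sorted({f["device"] for f in findings})
    settings = sorted({f["setting"] for f in findings})
    status = {(f["device"], f["setting"]): f["status"] for f in findings}
    header = ["device"] + settings
    # Cells with no finding stay blank so the grid reads as "nothing to fix here".
    rows = [[d] + [status.get((d, s), "") for s in settings] for d in devices]
    return [header] + rows


findings = [
    {"device": "sw07", "setting": "ntp_server", "status": "open"},
    {"device": "sw07", "setting": "stp_mode", "status": "resolved"},
    {"device": "sw12", "setting": "ntp_server", "status": "open"},
]

matrix = build_drift_matrix(findings)
for row in matrix:
    print(row)
```

A list of rows like this maps directly onto a CSV file or a rectangular sheet range, which is the shape the dashboard tabs consume.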

Architecture

SecureCRT collection script  (runs from the management jump host)
    │  raw running-configs
    ▼
Python parse + compare engine
    ├── normalize per device/role
    ├── peer-consensus scoring  ──→ outliers, not template noise
    └── decommission map        ──→ removals score as progress
    │  CSVs (summary / findings / matrix / remediation)
    ▼
Google Drive  →  Apps Script  →  Sheet (4 tabs) + email-on-change

Each crawl is dated and staged, so the dashboard shows velocity — findings resolved per round — not just a point-in-time count.

Key Decisions

Peer-consensus over golden-config. No authoritative baseline existed and building one by hand would have been its own multi-week project. Consensus surfaces the outliers directly from the fleet itself.

Retirement-aware scoring. Some configuration was being removed as part of the standardization. Peer-consensus alone scores a removal backwards — the device that already removed it looks like the outlier. A decommission map flags those elements so presence counts as remaining work and removal counts as progress.
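A minimal sketch of how a decommission map can override the consensus verdict. The labels (remaining_work, progress, drift, ok), the function classify, and the example setting name are hypothetical; the real map format is not described here.

```python
def classify(setting, present, consensus_present, decommissioned):
    """Classify one device/setting observation.

    present: True if this device still carries the setting.
    consensus_present: True if most role-peers carry it.
    decommissioned: set of settings scheduled for removal (assumed format).
    """
    if setting in decommissioned:
        # For retiring config the consensus is ignored: presence is the finding.
        return "remaining_work" if present else "progress"
    return "ok" if present == consensus_present else "drift"


decommissioned = {"vlan_300_legacy"}

# Most peers still carry the legacy VLAN; this device has already removed it.
early_mover = classify("vlan_300_legacy", False, True, decommissioned)
laggard = classify("vlan_300_legacy", True, True, decommissioned)
# The same observation without the map would be flagged as an outlier.
unmapped = classify("vlan_300_legacy", False, True, set())

print(early_mover, laggard, unmapped)
```

The last case shows the backwards score the map exists to prevent: without it, the device that did the cleanup first is reported as drift.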

Snapshot-over-snapshot diffing. Every crawl is staged and dated. The dashboard reports how many findings closed each round and emails the delta, so stakeholders see momentum without asking for a status update.
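The per-round delta reduces to set arithmetic over finding keys. The sketch below assumes each finding is identified by a (device, setting) pair, which is an illustrative choice rather than the actual key used; diff_snapshots is a hypothetical name.

```python
def diff_snapshots(previous, current):
    """Compare two crawl snapshots, each a set of (device, setting) keys."""
    return {
        "closed": sorted(previous - current),      # resolved since last round
        "new": sorted(current - previous),         # appeared this round
        "still_open": sorted(previous & current),  # carried over
    }


round_1 = {("sw07", "ntp_server"), ("sw12", "ntp_server"), ("sw03", "stp_mode")}
round_2 = {("sw12", "ntp_server"), ("sw09", "acl_mgmt")}

delta = diff_snapshots(round_1, round_2)
print(f"closed={len(delta['closed'])} new={len(delta['new'])} "
      f"still_open={len(delta['still_open'])}")
```

The closed count per round is the velocity figure, and the three lists together are the content of a change email.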

Dashboard in Google Sheets. The consumers were engineers and managers who already live in Sheets. A shareable link, built-in email alerts, and zero new tooling to learn beat a bespoke web app nobody would open.

Results

  • Hundreds of real configuration findings triaged out of thousands of raw differences
  • Genuine outliers cleanly separated from legitimate per-role variation
  • Remediation tracked to closure across multiple engineers and many crawl rounds
  • Per-round diffs and email summaries so progress is visible without status meetings
  • Reusable pattern — the same SecureCRT → Drive → Apps Script flow now drives several other dashboards

How This Scales

  • Source-of-truth blend — pull intended config from NetBox and combine intent with peer-consensus for higher-confidence findings.
  • Auto-generated remediation — emit the corrective CLI per finding, ready to stage into a config-push workflow.
  • Multi-vendor parsing — extend the parser beyond a single OS family using the same scoring engine.
  • Continuous mode — nightly crawl plus a drift journal that alerts the moment a remediated device regresses.

Tech Stack

  • Collection: SecureCRT scripting (run from the management jump host)
  • Engine: Python, pandas (parse, normalize, peer-consensus scoring)
  • Dashboard: Google Apps Script + Google Sheets (Summary / Findings / Drift Matrix / Remediation)
  • Alerting: scheduled diffs with email-on-change
