The Air-gap Network That Saved Us
How a 2014 Compliance Requirement Became Our 2018 Ransomware Recovery Plan
In cybersecurity, you don’t always build systems for the problems you expect.
Sometimes, the thing that saves you was never meant to exist for that reason at all.
That’s exactly what happened to us.
2014 — A Contract, a Quake, and a Wall
Back in 2014, our largest client at the time—HHC’s Home Health Services division in New York—asked us to strengthen our disaster-recovery posture.
Their leadership was convinced California was overdue for a major earthquake.
To protect their data, they wanted our production environment moved out of state.
So, we migrated everything from our Anaheim office into the IO Datacenter (now Iron Mountain) in Phoenix, AZ.
Just as the dust settled, another requirement surfaced.
HHC was transitioning parts of its business to Epic, a competitor of ours.
Epic’s legal team insisted that a “Chinese wall” be built—an isolated environment separating our infrastructure from theirs.
Most people would have seen that as an administrative burden.
I saw it as an opportunity to design a true air-gap network.
Designing the “Epic” Network
I built the new network from scratch:
- Cisco ASA 5516-X firewall
- Cisco 2960 access switches
- Dell PowerEdge R640 server with a RAID-6 array (pre-vSAN era)
- A brand-new Active Directory forest: acme.local
We’d originally planned to establish a site-to-site VPN between us and HHC.
But something in my gut said no.
“If one of our users ever gets infected,” I thought,
“I don’t want to be responsible for compromising a hospital network.”
So, I left it air-gapped—physically segmented, no VPN, no shared authentication.
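For anyone who wants to verify rather than assume that a segment is truly isolated, a minimal check looks like the sketch below: from a host inside the isolated segment, try to reach the production network and confirm that every attempt fails. The subnets, hosts, and ports here are hypothetical placeholders, not our actual addressing.

```python
"""Minimal air-gap verification sketch (hypothetical addresses and ports).

Run from a host inside the isolated segment. Every connection attempt
toward the production network should fail; any success means the
"air gap" has a hole in it.
"""
import socket

# Hypothetical production hosts and common service ports --
# substitute your real production addressing plan.
PRODUCTION_TARGETS = ["10.10.0.10", "10.10.0.20", "10.10.1.5"]
PORTS = [53, 88, 135, 389, 445, 3389]  # DNS, Kerberos, RPC, LDAP, SMB, RDP
TIMEOUT_SECONDS = 3

def can_connect(host: str, port: int) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=TIMEOUT_SECONDS):
            return True
    except OSError:
        return False

leaks = [(h, p) for h in PRODUCTION_TARGETS for p in PORTS if can_connect(h, p)]

if leaks:
    print("AIR GAP VIOLATION -- reachable production endpoints:")
    for host, port in leaks:
        print(f"  {host}:{port}")
else:
    print("No production endpoints reachable. Segmentation holds.")
```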
The project was completed, tested, and ready to go.
And then… the deal fell apart.
For the next few years, that pristine, isolated network sat idle in the rack—powered on but unused, a technological ghost town.
2018 — When the Lights Went Out
Fast-forward four years.
It’s May 2018, and we get hit by ransomware.
Not a scare, not a test—an actual, spreading, file-encrypting outbreak.
Within hours, the productiondomain.com domain was compromised.
Authentication broke, servers were locking up, and users couldn’t work.
Tawny—our CIO—and I met in the war room to figure out a path forward.
Backups were intact, but restoration would take time.
We needed users operational now.
Then I remembered the dormant environment.
The Epic network.
Completely separate. Completely clean.
Activating the Lifeboat
I went to the rack, pulled the patch cables from the productiondomain.com switch stack, and moved them over to the acme.local patch panel.
Within minutes, users were authenticating into a domain that hadn’t seen a login in years—and it worked flawlessly.
It wasn’t elegant.
It wasn’t automated.
But it was segmented, and that made all the difference.
While users continued working through the isolated environment, I spent the next two days performing bare-metal restores on production servers.
Using Data Instant Replay snapshots (Dell Compellent’s early snapshot tech), I rebuilt each volume from the last known good image.
Our web applications were already externally accessible, so end-users didn’t even need internal network access.
The “Epic” segment effectively became a temporary production bubble, allowing the business to operate while we rebuilt everything else.
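The restore logic itself was simple in principle: for every volume, find the newest snapshot taken before the outbreak began and rebuild from that. The sketch below illustrates only that selection step; it is not the Compellent/Data Instant Replay API, and the catalog, volume names, and timestamps are hypothetical.

```python
"""Sketch of "last known good" snapshot selection (hypothetical catalog).

This is not the Compellent / Data Instant Replay API -- just the selection
logic: for each volume, pick the newest snapshot taken strictly before the
compromise window opened.
"""
from datetime import datetime

# Hypothetical snapshot catalog: volume -> list of snapshot timestamps.
SNAPSHOT_CATALOG = {
    "vol-app01": [datetime(2018, 5, 14, 2, 0), datetime(2018, 5, 15, 2, 0)],
    "vol-sql01": [datetime(2018, 5, 14, 3, 0), datetime(2018, 5, 15, 3, 0)],
}

COMPROMISE_START = datetime(2018, 5, 15, 2, 30)  # assumed outbreak start time

def last_known_good(snapshots, compromise_start):
    """Newest snapshot strictly older than the compromise start, or None."""
    clean = [s for s in snapshots if s < compromise_start]
    return max(clean) if clean else None

for volume, snapshots in SNAPSHOT_CATALOG.items():
    snap = last_known_good(snapshots, COMPROMISE_START)
    if snap is None:
        print(f"{volume}: no clean snapshot -- fall back to offline backups")
    else:
        # In practice this is where the array-side restore and the
        # bare-metal rebuild of the server would be triggered.
        print(f"{volume}: restore from snapshot taken {snap:%Y-%m-%d %H:%M}")
```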
Why It Worked
That improvised recovery plan succeeded for one reason: segmentation.
When the ransomware hit, every system trusted by the main domain was vulnerable.
But the Epic network lived outside that trust boundary—different domain, different hardware, no shared credentials, no shared replication.
It was a Chinese wall in the fullest sense of the term: born of a legal requirement, and ultimately lifesaving.
Technical Breakdown
| Layer | productiondomain.com | acme.local |
|---|---|---|
| Domain | Primary production AD | Isolated AD forest |
| Network | Shared corporate LAN | Dedicated, physically separate switch fabric |
| Firewall | Shared perimeter | Dedicated Cisco ASA 5516-X |
| Connectivity | Internet + internal VPNs | Stand-alone, no site-to-site tunnels |
| Storage | Compellent SAN w/ Data Instant Replay | Local RAID-6 on R640 |
| Dependency | Centralized | Autonomous |
That separation meant the ransomware couldn’t traverse authentication trusts or shared SMB paths.
The infected network stayed quarantined while the alternate domain served as a clean recovery enclave.
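If you want to confirm that a recovery forest really has no trust paths back to production, you can enumerate its trustedDomain objects over LDAP; an empty result means no inbound or outbound trusts. A minimal sketch using the ldap3 Python library follows, with placeholder server name, credentials, and base DN.

```python
"""Check that an AD forest has no trusts (placeholder server and credentials).

Trust relationships are stored as trustedDomain objects in the System
container, so an empty result set means no AD trust paths out of this forest.
"""
from ldap3 import Server, Connection, NTLM, SUBTREE

# Placeholder values -- substitute your DC, audit account, and base DN.
server = Server("dc01.acme.local")
conn = Connection(
    server,
    user="ACME\\audit-svc",
    password="<redacted>",
    authentication=NTLM,
    auto_bind=True,
)

conn.search(
    search_base="CN=System,DC=acme,DC=local",
    search_filter="(objectClass=trustedDomain)",
    search_scope=SUBTREE,
    attributes=["trustPartner", "trustDirection"],
)

if conn.entries:
    print("Trusts found -- this forest is NOT isolated:")
    for entry in conn.entries:
        print(f"  partner={entry.trustPartner}, direction={entry.trustDirection}")
else:
    print("No trustedDomain objects: no AD trust paths out of this forest.")
```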
Business Continuity, Not Just Recovery
Most disaster-recovery plans focus on restoring systems.
This event taught me that continuity—keeping users productive during the outage—is just as critical.
Because we had built a completely autonomous environment years earlier, we had an instant fallback that required no cloud orchestration, no vendor tickets, no third-party dependencies.
It was the purest form of Zero Dependence: self-contained, offline-capable, and secure by design.
Decommissioning the Legend
That “temporary” network kept running quietly for years.
It became our shadow infrastructure—always on, rarely touched, always ready.
Seventeen months ago, we migrated most workloads into Azure and began retiring legacy colocation gear.
Just 75 days ago, we finally powered down the Irvine colocation—the last physical reminder of the Epic network that once saved our company.
I pulled the patch cords myself, thinking: This rack survived a ransomware event because we over-engineered for compliance.
It was bittersweet.
The system that was built for a contract clause ended up protecting our entire business.
Lessons Learned
- Segmentation Isn’t Optional. The cleanest firewall in the world won’t save you if your trust model is flat. True isolation, physical or logical, is your final line of defense.
- Design for “What If,” Not “What Now.” The best contingencies often come from saying no to convenience. Refusing that site-to-site VPN in 2014 prevented a hospital breach in 2018.
- Compliance Can Create Resilience. What started as a legal formality built a structure strong enough to carry us through an existential event.
- Snapshots Aren’t Backups, but They’re Lifesavers. Data Instant Replay gave us point-in-time recovery without needing to rebuild entire servers from tape or cloud storage.
- Document Everything. When chaos hits, clarity is currency. Because I had diagrams and notes from 2014, recovery in 2018 wasn’t guesswork; it was replaying a known sequence.
From “What Could Go Wrong” to “What Went Right”
Looking back, it’s clear: the network we built for someone else became the network that saved us.
If I had built that system with convenience in mind—shared DNS, single sign-on, one firewall—it would have fallen like the rest.
Instead, caution, redundancy, and a healthy dose of paranoia created a digital lifeboat.
That’s the paradox of cybersecurity leadership:
The systems people complain about as “overkill” in good times are the same ones that make recovery possible when everything goes dark.
The Broader Lesson
In architecture, redundancy is often seen as waste.
In cybersecurity, it’s insurance.
But in leadership, redundancy is foresight—the ability to imagine future failure and build compassionately against it.
When I hear teams debate the cost of segmentation or the time required to implement Zero Trust, I think back to that summer.
A few extra cables, a second domain, and a firewall bought us something priceless: time.
Time to restore.
Time to think.
Time to lead.
Closing Thoughts
That experience reshaped how I define success in cybersecurity.
It’s not about preventing every incident; it’s about designing a world where incidents don’t end you.
So, the next time someone tells you a “Chinese wall” or air-gap is overkill, remember this story.
Because sometimes the wall you build to keep others out is the one that keeps you standing.
#CyberResilience #ZeroDependence #Leadership #IncidentResponse