The Latency That Wouldn’t Die

It was February 2022.

We had just installed and configured our new Dell PowerStore 500T SAN — a big step forward in our modernization journey.

I started migrating workloads off our outgoing Dell Compellent array. What used to take minutes suddenly took hours. Migrations crawled. Systems lagged. Something wasn’t right.

At first, I assumed it was a storage or zoning issue. Then, our DBA mentioned something that changed everything:

“We’re seeing noise in the database. Failover events between AG nodes are spiking.”
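
If you've never chased that particular symptom: the failover events themselves land in the AlwaysOn_health extended event session and the cluster log, but the quickest snapshot of what our DBA was describing is the replica-health DMVs. A minimal check (assuming a standard Availability Group setup) looks something like this:

```sql
-- Snapshot of Availability Group replica health, run from the primary.
-- Replicas flapping between roles, showing DISCONNECTED, or dropping out
-- of HEALTHY here lines up with the "failover noise" our DBA was seeing.
SELECT ag.name AS ag_name,
       ar.replica_server_name,
       ars.role_desc,
       ars.connected_state_desc,
       ars.synchronization_health_desc
FROM sys.availability_groups AS ag
JOIN sys.availability_replicas AS ar
     ON ar.group_id = ag.group_id
JOIN sys.dm_hadr_availability_replica_states AS ars
     ON ars.replica_id = ar.replica_id;
```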

At least now I had a direction.

I reached out to [Redacted] at Sidepath, who recommended opening a Dell ticket for the PowerStore. We did — but it went nowhere. Next, our CEO asked me to talk to an IT consulting firm. Their solution? “You need better monitoring.”

We already had monitoring. I didn’t want more dashboards.

I wanted the latency to go away.

So I reached out to Brent Ozar — one of the best SQL Server minds in the business.

Within 24 hours, Brent sent back a detailed report that flipped the investigation on its head.

The problem wasn’t the SAN at all — it was SQL Server:

TempDB was undersized and constantly shrinking and growing.

Instant File Initialization wasn’t enabled.

The memory allocated to SQL Server was way too low.

SQL was four years behind on patches.

Brent gave me six actionable items.

I implemented every one of them: increasing memory to 64GB, growing TempDB to 750GB, turning on Instant File Initialization, patching SQL Server, and tuning the memory configuration.
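
If you want to sanity-check the same things on your own instance, here's a rough T-SQL sketch of the checks and changes involved. The 64GB and 750GB figures are the ones from our fix; the TempDB logical file name (tempdev) and the single-file layout are placeholders, so spread the space across however many data files you actually run.

```sql
-- Set max server memory (64 GB = 65536 MB). If the host itself only has
-- 64 GB of RAM, leave a few GB of headroom for the OS instead.
EXEC sys.sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sys.sp_configure 'max server memory (MB)', 65536;
RECONFIGURE;

-- Confirm Instant File Initialization. IFI itself is granted by giving the
-- service account the "Perform volume maintenance tasks" privilege, not by
-- T-SQL; this DMV column (newer builds) only reports whether it is on.
SELECT servicename, instant_file_initialization_enabled
FROM sys.dm_server_services;

-- Pre-grow TempDB so it stops auto-growing (and being shrunk) under load.
-- 'tempdev' is the default primary data file; split the 750 GB evenly
-- across all of your TempDB data files.
ALTER DATABASE tempdb
MODIFY FILE (NAME = tempdev, SIZE = 750GB);

-- Check the patch level; four years behind means a CU marathon.
SELECT @@VERSION;
```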

The database was happy again.

But the latency persisted elsewhere.

So I dug deeper. This time, into the fabric layer.

I reached out to Brocade, our Fibre Channel switch vendor.

After months of back-and-forth with Tier 3, I pulled in [Redacted] from Sidepath — a networking genius who’s saved me more times than I can count.

We run a dual-fabric SAN that [Redacted] originally helped architect.

As a test, he disabled Port 6 on our SANB1 switch.

The moment that port went offline, the system was instantly happy again.

Brocade confirmed it: one HBA port had been poisoning the entire fabric.
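
If you ever have to hunt one of these down yourself: on Brocade (FOS) switches the isolation test looks roughly like the sketch below. This is the standard FOS command set rather than a transcript of what [Redacted] actually ran, and port 6 here simply stands in for whatever port your error counters point at.

```
# Per-port error counters; a single port racking up CRC errors, enc_out
# errors, and class-3 discards is the classic sign of a bad HBA, SFP, or
# cable dragging the whole fabric down.
porterrshow

# Drill into the suspect port and its optics.
portshow 6
sfpshow 6

# Take it out of the fabric as a test; with a dual-fabric design, traffic
# keeps flowing over the other path.
portdisable 6

# Re-enable only after the offending HBA, SFP, or cable has been replaced.
portenable 6
```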

We replaced the FC switches and the HBA, and everything returned to normal.

No new storage arrays. No costly consulting engagements.

Just persistence, collaboration, and a refusal to settle for “it’s fine.”

Sometimes, solving the problem means following the trail all the way down the stack — from SQL to storage to SAN — until you find the real culprit.

And sometimes, it takes the right people — like Brent Ozar, [Redacted], and [Redacted] — to help you see what’s been hiding in plain sight.

Ever had a problem that turned out to be way deeper than you thought?

What started as SQL troubleshooting turned into a six-month journey through every layer of the stack — but we came out stronger (and a lot wiser).
