Resolving SQL Server Performance Issues: A Case Study

It was February 2022.
We had just finished installing and configuring our Dell PowerStore 500T SAN — the next step in modernizing our infrastructure.

I started migrating workloads from the old Dell Compellent SAN to the new PowerStore. What used to take minutes suddenly took hours. The system crawled. Something wasn’t right.

At first, I assumed it was storage- or network-related — maybe throughput, maybe zoning. But then our DBA mentioned something interesting:

“We’re seeing noise in the database. Failover events between the AG nodes are spiking.”

That was the clue.
The issue wasn’t in the network — it was deeper.

I reached out to Brent Ozar, a SQL Server consultant I’d followed for years. Within a day, Brent sent back a detailed analysis that completely reframed the problem:

  • TempDB was undersized and constantly shrinking/growing.
  • Instant File Initialization wasn’t enabled.
  • Memory allocation was far below best practice.
  • SQL Server was four years behind in patching.

I spent the next few days implementing his six recommendations:
✅ Enabled Instant File Initialization
✅ Increased memory to 64 GB on both nodes
✅ Grew TempDB from 250 GB to 750 GB
✅ Patched SQL Server 2016 to SP3
✅ Reconfigured max memory settings
✅ Rebalanced the AG nodes
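For the T-SQL-visible pieces of that list, the changes look roughly like this — a sketch, not our exact script; the file name and sizes are illustrative, and Instant File Initialization itself is granted outside T-SQL (via the "Perform volume maintenance tasks" policy or the SQL Server setup checkbox):

```sql
-- Cap SQL Server's memory so the OS keeps headroom
-- (example value for a 64 GB node; tune for your workload)
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)', 57344;
RECONFIGURE;

-- Pre-size a TempDB data file so it stops shrinking/growing
-- ('tempdev' is the default logical name; repeat per data file)
ALTER DATABASE tempdb
MODIFY FILE (NAME = tempdev, SIZE = 96000MB, FILEGROWTH = 1024MB);
```

Pre-sizing TempDB matters because the constant autogrow/shrink cycle Brent flagged forces file-level work during normal query activity; sizing it up front, with IFI enabled so data-file growth is near-instant, removes that churn entirely.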

And just like that — performance returned to normal. No additional consulting costs. No new hardware. Just analysis, execution, and a little persistence.

What started as a latency issue turned into a lesson in causality over correlation.
Sometimes the problem you’re chasing isn’t the one you need to fix — and the right data (and a few good people) can make all the difference.

Skills: Root Cause Analysis • IT Infrastructure Analytics • SQL Optimization • Resiliency Engineering • Causality over Correlation
