Last year, we planned to move our data warehouse from a 2 socket server to a 4 socket server that had been left in the data centre by an abandoned desktop virtualization project.

  • The 2 socket server: Dell R610, with 2 Intel Xeon E5640s @ 2.66GHz and 64GB of RAM
  • The 4 socket server: Dell R910, with 4 Intel Xeon X7550s @ 2.00GHz and 128GB of RAM
Both ran a 64-bit OS and SQL Server Enterprise. We decided to test the new server before moving directly, so our first test was to run a large import (40GB) using SSIS from a temporary table in one database into the data warehouse, and then process the related cube.

To cut a long story short, the performance was far worse, despite the extra RAM, the extra cores, and the storage being a DAS attached via SAS.

The first thing we realised was that the extra RAM would not make a lot of difference. The data warehouse is over 6TB, and with multiple import jobs running, plus all the temp tables we use when switching in partitions, and all the selects being run when processing cubes at the end of the ETL phase, an extra 64GB of RAM isn’t really going to help a great deal.

We were concerned that the DAS was not performing as we’d hoped, as it was one of the big changes we’d made. However, after firing up Performance Monitor and checking the appropriate counters, we could find no issues.
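For anyone wanting to repeat the storage check, the key disk-latency counters can be sampled from PowerShell with the built-in Get-Counter cmdlet. This is a sketch rather than our exact capture, and the counter names assume an English-locale Windows install:

```powershell
# Sample average disk latency and queue length across all physical disks,
# once a second, five times. Read/write latencies consistently above
# roughly 20ms would point at a storage problem.
$counters = '\PhysicalDisk(_Total)\Avg. Disk sec/Read',
            '\PhysicalDisk(_Total)\Avg. Disk sec/Write',
            '\PhysicalDisk(_Total)\Current Disk Queue Length'
Get-Counter -Counter $counters -SampleInterval 1 -MaxSamples 5 |
    ForEach-Object { $_.CounterSamples | Select-Object Path, CookedValue }
```

On a healthy DAS the cooked latency values sit in the low milliseconds, which is what we saw.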

One of the things we noticed was that the new 4 socket server had a slower clock speed than the 2 socket machine. Could this be the source of our perceived performance drop? After a bit of investigation, confirmed by Glenn Berry at SQL Performance, it turns out a larger server is not necessarily a faster server:

One common misconception is that bigger Intel-based servers (in terms of socket counts) are faster servers. This is simply not true, for a number of reasons. The sales volume and market share of two-socket servers is much higher than it is for four-socket and larger servers. There is also less engineering and validation work required for two-socket capable Intel processors compared to four-socket capable Intel processors. Because of these factors, Intel releases new processor architectures more frequently and earlier for lower socket count servers. Currently, Intel’s single-socket E3 family is using the 22nm Ivy Bridge and the two-socket E5 family is using the 32nm Sandy Bridge-EP, while Intel E7 family is using the older 32nm Westmere-EX microarchitecture.
So the more sockets a server has, the older the technology in the CPU, and the slower the individual cores. This explanation certainly matched our experience. If we use the Intel site to compare the processors, we can see that the X7550 is Nehalem-EX, a Tock release (a new microarchitecture), whilst the E5640 is Westmere-EP, the Tick (die shrink) of the Nehalem architecture. For a detailed explanation of the Tick-Tock release cycle, follow this link:


For a full comparison, follow this link:,47923

So we wrote a PowerShell script that loops over a sine calculation and returns the duration. Run the script below on each machine and you will get a reasonable reflection of how long a single-core calculation takes.

powershell.exe -sta
$i = 0
$sw = [Diagnostics.Stopwatch]::StartNew()
do { [math]::Sin(1); $i++ }
while ($i -le 10000)
$sw.Stop()
$sw.Elapsed

Based on the results we got back from the machines, we noticed that the R910 was about 30% slower than the R610. Coincidentally, the actual SQL workload performance on the R910 was also about 30% worse than on the R610, so the numbers matched up. It was clear that our workload benefited more from fewer, faster cores than from more, slower ones.

So we attached the DAS to the old server, and the new server is now used as a Hyper-V host, which Ops were very happy about, as they had expected to get the older machine with fewer cores and less RAM.

We felt that maybe the extra L3 cache would have helped for a data warehouse workload, and maybe for heavier loads it would, but again, for our workload we’d take a faster clock speed over a larger cache at this juncture. If we were to upgrade now, we’d stick with a two socket server, and if I had the luxury to test, I’d start with the Xeon E5-2690, which has eight cores plus Hyper-Threading, clocked at 2.90GHz (Turbo Boost to 3.8GHz), with a 20MB L3 cache: the same number of cores for licensing, a faster clock, and a larger L3 cache.
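As a quick sanity check before comparing boxes like this, the socket, core, and clock details of a machine can be pulled via WMI. A sketch, using the era-appropriate Get-WmiObject (newer PowerShell versions replace it with Get-CimInstance):

```powershell
# List each physical processor with its core count, logical processor
# count (doubled if Hyper-Threading is on), and rated clock speed in MHz.
Get-WmiObject Win32_Processor |
    Select-Object Name, NumberOfCores, NumberOfLogicalProcessors, MaxClockSpeed
```

Running this on both servers makes the trade-off obvious at a glance: the R910 reports more rows and more cores, but a lower MaxClockSpeed per core.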


It was reassuring to find that article by Glenn Berry. After finding that the new box was slower, we spent a lot of time going through the SQL configuration making sure that both servers were optimized (network packet size, max server memory, Hyper-Threading enabled, etc.), and we were confident that both boxes were configured correctly.

I’m sure that for some people 4 sockets is the way to go, as a 2 socket machine couldn’t cope with the capacity of their workload. And we too had assumed our workload would benefit from more cores. But it was clear from our findings that the current 2 socket box with faster cores was better suited to our workload than a 4 socket behemoth. It also saved us a lot of money in licensing, as SQL Server Enterprise’s licensing model is core based.