Sandy Bridge-E looks awesome

Looking for new hardware to run WRF? Intel or AMD? Check this forum.
Posts: 17
Joined: Sun Jan 02, 2011 12:24 pm

Post by Boogie » Thu Dec 08, 2011 5:25 pm

I just thought I'd share the benchmark results for the new Intel Sandy Bridge-E processor.
It's the new flagship Intel Core i7-3960X, with six cores at 3.3 GHz.

The SPEC CPU2006 benchmark suite includes WRF 2.0.2 as one of its test components, and it reports floating-point throughput for various CPUs, including the values from the WRF test. So here are the numbers:

i7-3960X: 233 pts
i7-2600K: 112 pts
AMD 1090T: 83 pts

My current year-old 1090T pales in comparison. And I thought the 2600K was already very good! The new 3960X costs about $1000, but its little brother, the 3930K, cuts the price tag almost in half while only giving up 100 MHz of clock speed and 3 MB of L3 cache, so I would expect almost identical benchmark values for half the cost.
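To put rough numbers on that, here is a small sketch that turns the three scores above into ratios against the 1090T and makes a naive guess at the 3930K. The guess assumes the WRF score scales linearly with base clock and ignores the smaller L3 cache and turbo behaviour, so treat it as a back-of-the-envelope figure, not a measurement. The 3.2 GHz value is just 3.3 GHz minus the 100 MHz mentioned above.

```python
# SPEC CPU2006 WRF component scores quoted in this post.
scores = {"i7-3960X": 233, "i7-2600K": 112, "AMD 1090T": 83}

def relative_to(baseline, table):
    """Return each CPU's score as a multiple of the baseline CPU's score."""
    return {cpu: pts / table[baseline] for cpu, pts in table.items()}

ratios = relative_to("AMD 1090T", scores)
for cpu, ratio in ratios.items():
    print(f"{cpu}: {ratio:.2f}x the 1090T")

# ASSUMPTION: score scales linearly with base clock (ignores L3 size, turbo).
clock_3960x_ghz = 3.3
clock_3930k_ghz = clock_3960x_ghz - 0.1  # "100 MHz" slower, per the post
est_3930k = scores["i7-3960X"] * clock_3930k_ghz / clock_3960x_ghz
print(f"Naive 3930K estimate: {est_3930k:.0f} pts")
```

If real 3930K results come in well below that estimate, the missing 3 MB of L3 cache would be the first thing to suspect.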

If anyone gets either of these new processors, please post your WRF EMS benchmarks. I can't wait for them ;)

Posts: 199
Joined: Sun Jun 24, 2012 8:42 pm
Location: Los Angeles, CA, USA

Re: Sandy Bridge-E looks awesome

Post by pattim » Thu Jul 05, 2012 4:42 pm

I don't quite know what to make of all these comparisons. I did run EMS on an Intel chip (an i7) and found that the extra threads beyond the number of physical cores didn't do anything. That sort of gives Intel chips half the cores they tout in sales info, since they seem to rely on virtual cores (hyperthreading) to get a high core count. OTOH, AMD's hardware cores seem less numerically efficient than Intel's, so 6 AMD hardware cores are going to be slower than 6 Intel hardware cores.

Then again, there is a limit to the number of Intel cores you can get in a single box. You can get more than 48 AMD cores in a single box for under $10,000, and I'm pretty sure the Intel core count doesn't go that high. Then there are the questions of scalability and memory speed, which can dominate everything. The biggest AMD core-count processors don't have as many HT (HyperTransport) links between cores. I'm not sure how that affects domain scalability, or whether the compilers and operating systems are NUMA-aware enough to automatically arrange the computational domain to maximize scalability.

I'm getting ready to run EMS on an AMD Magny-Cours box to see how "automatic" this is. I'm not enough of a cWizard to attempt to optimize EMS for the AMD NUMA, but it seems there's a lot of potential there. I think it's a multi-level problem that also involves MPICH*. I know it's possible with OpenMPI to assign specific jobs (i.e., computational sub-domains) to specific threads on specific core/memory locations, but I don't know if that's true for MPICH2, which EMS uses, or if WRF would attempt to use such capabilities if they exist. I think one would want to decompose the computational domain so that adjacent sub-domains are also adjacent in CPU/memory space, and that takes knowledge of both the AMD HT interconnect topology and WRF itself. For instance, does MPI use PIO or DMA or ...? But you can tell I'm getting a little out of my league here...
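To illustrate what "adjacent in CPU/memory space" could mean, here is a toy sketch. It lays the MPI ranks out as a 2D grid of tiles, assigns tiles to NUMA nodes in two different ways, and counts how many neighbouring tile pairs end up on different nodes. Everything in it is an assumption for illustration: the 8 x 6 tile grid, the 8 nodes of 6 cores each, and the function names. It does not reflect how WRF, EMS, or MPICH2 actually place ranks, and it ignores the HT topology between nodes.

```python
from itertools import product

def row_major_mapping(px, py, cores_per_node):
    """Fill NUMA nodes in plain rank order, where rank = j * px + i."""
    return {(i, j): (j * px + i) // cores_per_node
            for j, i in product(range(py), range(px))}

def blocked_mapping(px, py, bx, by):
    """Give each bx-by-by block of tiles to a single NUMA node.
    Assumes bx divides px and by divides py."""
    blocks_per_row = px // bx
    return {(i, j): (j // by) * blocks_per_row + (i // bx)
            for j, i in product(range(py), range(px))}

def cross_node_edges(mapping, px, py):
    """Count neighbouring tile pairs (east and north) on different nodes."""
    count = 0
    for (i, j), node in mapping.items():
        for di, dj in ((1, 0), (0, 1)):
            neighbour = (i + di, j + dj)
            if neighbour in mapping and mapping[neighbour] != node:
                count += 1
    return count

# HYPOTHETICAL layout: 48 ranks as an 8 x 6 grid, 8 NUMA nodes of 6 cores.
px, py = 8, 6
row = row_major_mapping(px, py, cores_per_node=6)
blk = blocked_mapping(px, py, bx=2, by=3)

print("row-major cross-node pairs:", cross_node_edges(row, px, py))
print("blocked cross-node pairs:  ", cross_node_edges(blk, px, py))
```

In this toy layout the blocked assignment leaves fewer neighbour pairs straddling a node boundary than plain rank order does, which is the kind of halo-exchange locality I have in mind. Whether the MPI launcher can actually be told to pin ranks that way is the open question.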
