Memory bandwidth as bottleneck for WRF performance

meteoadriatic
Posts: 1603
Joined: Wed Aug 19, 2009 10:05 am


Post by meteoadriatic » Mon Jun 18, 2012 11:37 am

I found some interesting material on how memory bandwidth limits WRF performance even when the CPU could deliver more...

Quote about WRF performance from
http://en.community.dell.com/techcenter ... /2356.aspx
... in the Westmere case, when we go beyond 8 cores per node, we see diminishing returns of 12% performance improvement when increasing cores by 25% from 8 to 10. The drop off is more significant between 10 and 12 cores as the code only gets a 2.7% increase in performance with 20% more cores!
That page also has several WRF benchmarks, and it is worth reading!
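To put the quoted Dell numbers in perspective, one can compute what fraction of ideal (linear) speedup the extra cores actually deliver. This is a minimal Python sketch; the function name `marginal_efficiency` is hypothetical, and the inputs are only the percentages from the quote above:

```python
def marginal_efficiency(perf_gain: float, core_gain: float) -> float:
    """Fraction of ideal speedup delivered by the added cores.

    perf_gain and core_gain are fractional increases (0.12 means 12%).
    1.0 means the extra cores scale perfectly; 0.0 means they add nothing.
    """
    return perf_gain / core_gain


# Figures taken from the Dell Westmere WRF quote.
steps = {
    "8 -> 10 cores": (0.12, 10 / 8 - 1),     # +12% performance for +25% cores
    "10 -> 12 cores": (0.027, 12 / 10 - 1),  # +2.7% performance for +20% cores
}

for label, (perf, cores) in steps.items():
    print(f"{label}: {marginal_efficiency(perf, cores):.1%} of ideal")
```

With these inputs it prints 48.0% for the 8 to 10 core step and 13.5% for the 10 to 12 core step, which makes the "diminishing returns" in the quote concrete.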

I found that page while doing some research, after following up on this quote from
http://software.intel.com/en-us/article ... hitecture/
WRF is very sensitive to memory bandwidth.
So I decided to run my own tests to see how memory speed affects WRF performance in my environment. I used my standard testing platform: a Core 2 Quad Q9450 with 4GB of DDR2 RAM running at the stock memory clock of 800MHz. For this test I set the system FSB back from 350MHz to the stock 333MHz.

Using the memtest utility, I found that my memory has a bandwidth of 4505MB/s when set to 800MHz. Then I ran my NMM domain test (not the WRFEMS benchmark case!) and got a result of 30 minutes and 40 seconds.

After that I downclocked my memory modules to 667MHz. They are rated for 800MHz, so I probably can't go one step up; that's why I went one step down instead. The memtest utility now reads only 3958MB/s of memory bandwidth. The CPU still runs at its stock clock (2.66GHz) and the FSB is unchanged (333MHz). Let's see whether this memory slowdown creates a bottleneck for WRF on my system! I ran the same domain as before and this time got a result of 34 minutes and 37 seconds! Woohoo, we really do have a bottleneck!
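For a rough cross-check of a memtest reading from a script, one option is to time a large in-memory copy, which is the idea behind the Copy kernel of the STREAM benchmark. This Python sketch is only an approximation under that assumption: the function name `copy_bandwidth_mib_s` and the buffer sizes are hypothetical, and its result will not match memtest's figure exactly.

```python
import time


def copy_bandwidth_mib_s(size_mb: int = 128, repeats: int = 5) -> float:
    """Rough memory bandwidth estimate from timing large buffer copies.

    In CPython, slice assignment between equal-length bytearrays is a single
    C-level memory copy, so interpreter overhead is small next to the copy.
    Each copy reads size_mb MiB and writes size_mb MiB, so 2 * size_mb MiB
    are counted per copy. The fastest of several repeats is reported, which
    is also how STREAM reports its best rate.
    """
    n = size_mb * 1024 * 1024
    src = bytearray(n)
    dst = bytearray(n)
    dst[:] = src  # warm-up copy so every page of both buffers is touched
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = src
        best = min(best, time.perf_counter() - start)
    return 2 * size_mb / best


if __name__ == "__main__":
    print(f"~{copy_bandwidth_mib_s():.0f} MiB/s")
```

Running it once at each memory setting should show the same direction of change as the memtest numbers, even if the absolute values differ.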

Let's look at some numbers...
800MHz : 667MHz = 1.20 (the 800MHz clock is 20% higher)
4505MB/s : 3958MB/s = 1.14 (14% more bandwidth at 800MHz)
34m37s : 30m40s = 1.13 (13% more time at 667MHz)
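The ratios above can be reproduced in a few lines. This minimal Python sketch uses only the numbers from this post; it also expresses each change relative to the 800MHz baseline, which is why those percentages come out a little lower than the ratios suggest:

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a minutes:seconds runtime into seconds."""
    return 60 * minutes + seconds


clock = (800, 667)                                  # MHz: baseline, downclocked
bandwidth = (4505, 3958)                            # MB/s from memtest
runtime = (to_seconds(30, 40), to_seconds(34, 37))  # 1840 s, 2077 s

# Ratios, as in the post (larger value over smaller value).
print(f"clock ratio     {clock[0] / clock[1]:.2f}")          # 1.20
print(f"bandwidth ratio {bandwidth[0] / bandwidth[1]:.2f}")  # 1.14
print(f"runtime ratio   {runtime[1] / runtime[0]:.2f}")      # 1.13

# The same changes expressed relative to the 800MHz baseline.
print(f"clock drop      {1 - clock[1] / clock[0]:.0%}")          # 17%
print(f"bandwidth drop  {1 - bandwidth[1] / bandwidth[0]:.0%}")  # 12%
print(f"runtime growth  {runtime[1] / runtime[0] - 1:.0%}")      # 13%
```

Either way you frame it, the runtime change tracks the bandwidth change much more closely than it tracks the clock change.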

What this example shows is that the WRF slowdown, in percent, is almost equal to the loss of memory bandwidth, which tells me that memory bandwidth largely determines the maximum performance my CPU can deliver.
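One way to quantify "pretty much determines" is a simple Amdahl-style split (an assumed model, not something measured here): runtime is treated as one part that ignores memory speed plus one part that grows in proportion to 1/bandwidth. Solving for the bandwidth-bound share from the two runs gives a rough estimate. The function name is hypothetical, and with only two data points the result is indicative at best:

```python
def bandwidth_bound_fraction(t_fast: float, t_slow: float,
                             bw_fast: float, bw_slow: float) -> float:
    """Estimate the share of baseline runtime that scales with 1/bandwidth.

    Assumed model: T = T_other + T_mem, where only T_mem grows by the
    factor bw_fast / bw_slow when bandwidth drops from bw_fast to bw_slow.
    Returns ~1.0 for a fully bandwidth-bound run, ~0.0 if memory speed
    does not matter.
    """
    time_ratio = t_slow / t_fast
    bw_ratio = bw_fast / bw_slow
    return (time_ratio - 1) / (bw_ratio - 1)


# 30m40s at 4505MB/s versus 34m37s at 3958MB/s
f = bandwidth_bound_fraction(1840, 2077, 4505, 3958)
print(f"{f:.2f}")  # 0.93
```

Under this model, roughly 93% of the 800MHz runtime behaves as if it were limited by memory bandwidth, which is consistent with the conclusion above.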

Does this mean that when we buy new hardware for WRF, we need to look more at memory bandwidth than at CPU power?? It looks like it, but only when the CPU can process data faster than the memory can supply it! If bandwidth is large enough, it won't create a bottleneck, so adding CPU power will boost WRF performance significantly. If instead you have a powerful CPU and not-so-great memory bandwidth, adding even more CPU power won't help much; investing in faster-clocked RAM will do the trick instead. These days, if you have a powerful CPU, you can probably go with DDR3 rated at 2000MHz or more instead of the standard 1600 or 1333MHz and get significantly more performance out of WRF. First check whether your motherboard supports those memory overspeeds.
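The trade-off described above is what the roofline model captures (a standard performance model, brought in here for illustration rather than taken from this thread): attainable throughput is the lower of peak compute and memory bandwidth times the code's arithmetic intensity (flops per byte moved). In this Python sketch every number is hypothetical; WRF's actual arithmetic intensity is not given here.

```python
def attainable_gflops(peak_gflops: float, bandwidth_gb_s: float,
                      flops_per_byte: float) -> float:
    """Roofline model: performance is capped by whichever limit is lower."""
    return min(peak_gflops, bandwidth_gb_s * flops_per_byte)


# Hypothetical machine and code; illustrative numbers, not measurements.
intensity = 0.5  # flops per byte of memory traffic (assumed, low = memory-hungry)

base = attainable_gflops(40.0, 10.0, intensity)      # 5.0: memory-bound
more_cpu = attainable_gflops(80.0, 10.0, intensity)  # still 5.0: extra CPU sits idle
more_bw = attainable_gflops(40.0, 15.0, intensity)   # 7.5: faster RAM helps

# A code with high arithmetic intensity hits the CPU limit instead.
compute_bound = attainable_gflops(40.0, 10.0, 8.0)   # 40.0: CPU is the limit

print(base, more_cpu, more_bw, compute_bound)
```

Doubling peak compute changes nothing while the bandwidth term is the smaller one, whereas raising bandwidth lifts performance until the compute ceiling is reached.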

pattim
Posts: 199
Joined: Sun Jun 24, 2012 8:42 pm
Location: Los Angeles, CA, USA

Re: Memory bandwidth as bottleneck for WRF performance

Post by pattim » Wed Jul 04, 2012 1:29 am

I have seen this too, with an FDTD code I was running. They claimed it ran *so much* better on Xeons than on Opterons, but on closer inspection, the Xeon system they used as an example was one of the first to use DDR3, while AMD Opterons back then still used DDR2. So I think memory speed is a crucial unsung hero in HPC.
