AMD Bulldozer Disappoints

Looking for new hardware to run WRF? Intel or AMD? Check this forum.
pattim
Posts: 199
Joined: Sun Jun 24, 2012 8:42 pm
Location: Los Angeles, CA, USA

Re: AMD Bulldozer Disappoints

Post by pattim » Mon Jun 25, 2012 8:05 pm

Zoyx wrote:I exchanged emails with Bob about the Open64 compilers back when the Bulldozers first came out; here are the tips he passed along...
I took a look at the site and it appears you could use the AMD compilers; however, I have a few comments:

1. The EMS 64-bit binaries are fully optimized for AMD and Intel processors. That's not to say that you cannot squeeze out some additional performance using AMD's compiler, but it will require quite a bit of time on your part.

2. The AMD compiler system is relatively new, so expect some setbacks along the way.

3. Binaries built from the off-the-shelf WRF release will not work correctly with the EMS. You will need to replace a number of routines in WPS and WRF with those available in the wrfems/util/wrfems directory. After that you will have the same version that is provided with the EMS.

4. You will also have to build netCDF and MPICH2 with the same AMD compilers. You may also need to build libz, libjasper, and libpng as well, although I think the RPM versions will work. I compile everything from scratch with the same PGI compilers to be safe.
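For reference, a minimal sketch of what points 1 and 4 might look like in practice, assuming AMD's Open64 x86 compiler drivers (opencc/openf90); the -march=bdver1 Bulldozer flag, version numbers, and install paths are all assumptions, not something from Bob's notes:

    # Use one compiler family for the whole stack (Open64 driver names
    # and the Bulldozer tuning flag are assumptions; adjust to taste)
    export CC=opencc
    export FC=openf90
    export F77=openf90
    export CFLAGS="-O3 -march=bdver1"
    export FFLAGS="-O3 -march=bdver1"
    export FCFLAGS="-O3 -march=bdver1"

    # netCDF first, since WRF and WPS link against it
    cd netcdf-4.1.3
    ./configure --prefix=/opt/netcdf-open64
    make && make install

    # then MPICH2 with the same compilers
    cd ../mpich2-1.4.1p1
    ./configure --prefix=/opt/mpich2-open64
    make && make install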
I am not sure I really believe the compilers make a huge difference, but we know that HyperTransport 3 (HT3) and memory clock speeds can. I think the biggest difference is in properly handling NUMA, at least for Opterons, and I don't think there are any compiler settings for that. It seems to be handled by Linux system calls and by clever mapping of the domains to specific processors, using knowledge of the Opteron HT3 topology.

That throws it back to the level of the FORTRAN code, specifically the assignment of subdomain locations. I think that lives in the MPI code, which is why OpenMPI might be better than MPICH, which I believe predates modern CPUs. It would also be different for 2P and 4P machines. I guess Intel is spared all of this because they have a better inter-CPU snoop algorithm (or so I've read).
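To make that concrete, here is a quick sketch of the Linux-level side of it (numactl is a standard tool on most distributions; the node number and executable name are just placeholders):

    # Show the NUMA topology: nodes, which CPUs belong to each node,
    # and how much memory each node owns
    numactl --hardware

    # Pin a process to NUMA node 0 and force its allocations onto that
    # node's local memory
    numactl --cpunodebind=0 --membind=0 ./wrf.exe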

Come to think of it, maybe this question is better asked on an MPI forum? I think all the EMS does is issue "mpirun" commands (and maybe configure them first). I know OpenMPI has switches to bind processes and memory to CPUs.
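For instance (a sketch using the OpenMPI 1.4/1.6-era switch spellings; the rank count is a placeholder):

    # Bind each MPI rank to a core, spread ranks across sockets, and
    # print the resulting bindings so you can verify the placement
    mpirun -np 16 --bind-to-core --bysocket --report-bindings ./wrf.exe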
