I am seeing strange behaviour on one computer, a dual Xeon E5-2620 v3. It is generally nice and fast, but if I use the adaptive time step (as I always want to), performance becomes very inconsistent.
The actual time step is just fine; what is not fine is the wall-clock time per time step!
For example, this is the rsl.out.0000 output at the start of one run:
Tile Strategy is not specified. Assuming 1D-Y
WRF TILE 1 IS 1 IE 66 JS 1 JE 3
WRF TILE 2 IS 1 IE 66 JS 4 JE 6
WRF TILE 3 IS 1 IE 66 JS 7 JE 8
WRF TILE 4 IS 1 IE 66 JS 9 JE 11
WRF NUMBER OF TILES = 4
Timing for main (dt= 30.00): time 2017-08-11_00:00:30 on domain 1: 1.60754 elapsed seconds
Timing for main (dt= 31.50): time 2017-08-11_00:01:01 on domain 1: 0.10679 elapsed seconds
Timing for main (dt= 33.08): time 2017-08-11_00:01:34 on domain 1: 0.10773 elapsed seconds
Timing for main (dt= 34.73): time 2017-08-11_00:02:09 on domain 1: 0.11009 elapsed seconds
Timing for main (dt= 36.47): time 2017-08-11_00:02:45 on domain 1: 0.10913 elapsed seconds
Timing for main (dt= 38.29): time 2017-08-11_00:03:24 on domain 1: 0.10669 elapsed seconds
Timing for main (dt= 40.20): time 2017-08-11_00:04:04 on domain 1: 0.10780 elapsed seconds
Timing for main (dt= 42.21): time 2017-08-11_00:04:46 on domain 1: 0.11021 elapsed seconds
Timing for main (dt= 44.32): time 2017-08-11_00:05:30 on domain 1: 0.10720 elapsed seconds
Timing for main (dt= 46.54): time 2017-08-11_00:06:17 on domain 1: 0.10789 elapsed seconds
Timing for main (dt= 48.87): time 2017-08-11_00:07:06 on domain 1: 0.10854 elapsed seconds
Timing for main (dt= 51.31): time 2017-08-11_00:07:57 on domain 1: 0.10734 elapsed seconds
Timing for main (dt= 53.88): time 2017-08-11_00:08:51 on domain 1: 0.10791 elapsed seconds
.....
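To put numbers on the time per time step, the "Timing for main" lines can be pulled out of rsl.out.0000 and summarized. The following is only a minimal sketch: the function names (parse_timings, summarize, slow_steps) and the "2x the median" threshold are my own assumptions for illustration, not anything provided by WRF, and the regular expression assumes the log line layout shown above.

```python
import re
import statistics

# Matches lines such as:
# Timing for main (dt= 30.00): time 2017-08-11_00:00:30 on domain 1: 1.60754 elapsed seconds
# (layout assumed from the log excerpt above)
TIMING_RE = re.compile(
    r"Timing for main \(dt=\s*([\d.]+)\): time (\S+) on domain\s+(\d+):"
    r"\s*([\d.]+) elapsed seconds"
)


def parse_timings(text):
    """Return a list of (dt, model_time, domain, elapsed_seconds) tuples."""
    return [
        (float(dt), stamp, int(domain), float(elapsed))
        for dt, stamp, domain, elapsed in TIMING_RE.findall(text)
    ]


def summarize(timings, skip_first=True):
    """Summarize wall-clock seconds per step.

    The first step is skipped by default because it includes
    initialization and is not representative.
    """
    rows = timings[1:] if skip_first and len(timings) > 1 else timings
    elapsed = [row[3] for row in rows]
    return {
        "steps": len(elapsed),
        "min": min(elapsed),
        "max": max(elapsed),
        "mean": statistics.mean(elapsed),
        "median": statistics.median(elapsed),
    }


def slow_steps(timings, factor=2.0):
    """Return the steps whose elapsed time exceeds factor * median.

    The factor is an arbitrary illustrative threshold.
    """
    median = statistics.median(row[3] for row in timings)
    return [row for row in timings if row[3] > factor * median]
```

Reading the whole file into a string and calling summarize(parse_timings(text)) gives the spread of elapsed seconds per step, and slow_steps(parse_timings(text)) lists the model times at which the slowdown appears, which makes adaptive and fixed time step runs easier to compare.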
Of course, there is no obvious cause such as something else running on the server; nothing else runs. The system load is exactly 22.0 (I use 22 of the 24 available cores). Nothing obvious happens, but the run slows down significantly.
I was banging my head against the wall for several days, until I tried a fixed time step. Guess what? It runs at exactly the expected speed: total running time differs by only seconds from run to run, elapsed seconds per time step agree to the 2nd or 3rd decimal place, and so on. No problems whatsoever.
I should also mention that I tried 3.9 and 3.8.1, with the same result. I stripped everything but the essentials from the namelist, with the same result. Every other computer runs just fine with the adaptive time step; this one does not, and as a result adaptive time step runs are usually slower than fixed ones. I also can't count on a run reliably finishing in the expected time frame at all.
Has anybody encountered anything similar?