The Asynchronous Partitioned Global Address Space (APGAS) programming model enables programmers to express the parallelism and locality necessary for high performance scientific applications on extreme-scale systems. We used the well-known LULESH hydrodynamics proxy application to explore the performance and programmability of the APGAS model as expressed in the X10 programming language. By extending previous work on efficient exchange of ghost regions in stencil codes, and by taking advantage of enhanced runtime support for collective communications, the X10 implementation of LULESH exhibits superior performance to a reference MPI code when scaling to many nodes of a large HPC system. Runtime support for local parallel iteration allows the efficient use of all cores within a node. The X10 implementation was around 10% faster than the reference version using C++/OpenMP/MPI, across a range of 125 to 4,096 places (750 to 24,576 cores). Our improvements to the X10 runtime have broad applicability to other scientific application codes.
By: Josh Milthorpe, David Grove, Benjamin Herta, Olivier Tardieu
Published in: RC25555 in 2015
LIMITED DISTRIBUTION NOTICE:
This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).