Exploiting Heterogeneous Parallelism
Our compiler, referred to as a 'single source' compiler, is a parallelizing, simdizing compiler that generates, from a single C or Fortran source file, multiple binaries targeting the PPE and SPE processing elements. Our goal is to generate highly optimized code for the multiple levels of parallelism while abstracting away the intricacies of the underlying architecture, so that the user can develop applications for a parallel architecture as if it had a single shared memory image.
At the core of our compilation strategy is our technique for abstracting the small local memories of the SPEs. Each SPE has a 256 KB local store that holds both data and instructions. An SPE can directly access only its local store, so every read or write of a location in shared system memory requires a DMA transfer. Managing these transfers by hand imposes a significant burden on the programmer, especially for large programs that access large amounts of data. Our compiler-controlled software cache, memory hierarchy optimizations, and code partitioning techniques assume that all data resides in shared system memory; they transfer code and data automatically while preserving coherence across the SPE local stores and system memory. This infrastructure underpins parallelism across the Cell processing elements. Our current compiler exposes this parallelism through OpenMP pragmas, but our techniques should readily support the auto-parallelization techniques already present in the compiler framework. Other parallelization paradigms, such as UPC, could also be built on this framework.
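To make the software-cache idea concrete, the following is a minimal sketch of a direct-mapped, write-back cache of the kind a compiler could place in front of each shared-memory access on an SPE. It is an illustration under stated assumptions, not the compiler's actual design: the line size, line count, and all names (`cache_lookup`, `cached_load_i32`, `cached_store_i32`, `cache_flush`, `dma_get`, `dma_put`) are hypothetical, and the "DMA" transfers are simulated with `memcpy` between two ordinary arrays so the code runs on any host. On a real SPE the miss and write-back paths would issue MFC DMA commands instead, and coherence across several SPEs would require more than the simple flush shown here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical parameters, chosen for illustration only. */
#define LINE_SIZE    128u          /* bytes per cache line */
#define NUM_LINES    64u           /* direct-mapped: 64 lines = 8 KB of local store */
#define SYS_MEM_SIZE (1u << 16)    /* size of the simulated shared system memory */

static uint8_t  system_memory[SYS_MEM_SIZE];       /* stands in for shared system memory */
static uint8_t  local_store[NUM_LINES][LINE_SIZE]; /* stands in for the SPE local store */
static uint32_t tags[NUM_LINES];                   /* line address held by each slot */
static int      valid[NUM_LINES], dirty[NUM_LINES];
static unsigned dma_gets, dma_puts;                /* count simulated DMA transfers */

/* Simulated DMA: plain memcpy here; an SPE would use MFC DMA commands. */
static void dma_get(void *ls, uint32_t ea, size_t n) { memcpy(ls, &system_memory[ea], n); dma_gets++; }
static void dma_put(const void *ls, uint32_t ea, size_t n) { memcpy(&system_memory[ea], ls, n); dma_puts++; }

/* Return a local-store pointer for system address ea, fetching the line on a miss
   and writing back a dirty victim first. */
static uint8_t *cache_lookup(uint32_t ea, int for_write) {
    assert(ea < SYS_MEM_SIZE);
    uint32_t line_addr = ea & ~(LINE_SIZE - 1u);
    uint32_t idx = (line_addr / LINE_SIZE) % NUM_LINES;
    if (!valid[idx] || tags[idx] != line_addr) {             /* miss */
        if (valid[idx] && dirty[idx])
            dma_put(local_store[idx], tags[idx], LINE_SIZE);  /* write back victim */
        dma_get(local_store[idx], line_addr, LINE_SIZE);
        tags[idx] = line_addr;
        valid[idx] = 1;
        dirty[idx] = 0;
    }
    if (for_write)
        dirty[idx] = 1;
    return &local_store[idx][ea - line_addr];
}

/* 4-byte aligned accesses never straddle a line, since LINE_SIZE is a multiple of 4. */
int32_t cached_load_i32(uint32_t ea) {
    assert((ea & 3u) == 0);
    int32_t v;
    memcpy(&v, cache_lookup(ea, 0), sizeof v);
    return v;
}

void cached_store_i32(uint32_t ea, int32_t v) {
    assert((ea & 3u) == 0);
    memcpy(cache_lookup(ea, 1), &v, sizeof v);
}

/* Write back all dirty lines, e.g. at a synchronization point such as a barrier. */
void cache_flush(void) {
    for (uint32_t i = 0; i < NUM_LINES; i++)
        if (valid[i] && dirty[i]) {
            dma_put(local_store[i], tags[i], LINE_SIZE);
            dirty[i] = 0;
        }
}
```

In this sketch, a compiler would rewrite a shared-memory access such as `a[i] = x` into `cached_store_i32(addr_of_a + 4 * i, x)`, and would insert `cache_flush()` where the program synchronizes. A direct-mapped organization keeps the hit path to a mask, a shift, and one tag compare, which matters when the check is inlined before every access; the cost is conflict misses when two addresses map to the same slot.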
