Optimized Code Generation

Optimized SPE Code Generation  

Target: SPE (Cell)

In addition to the traditional compiler optimizations, we provide the following optimizations.

Scalar on SIMD Units. Most SPE instructions, including all memory instructions, are SIMD instructions operating on 128 bits of data at a time. As a result, all scalar code in a program must be adapted to run correctly on the SPE's SIMD units. Most notably, every scalar store must be turned into a read-modify-write sequence, because a store always writes 16 bytes of data at once. By performing aggressive register allocation and by allocating temporary scalars in distinct 16-byte memory locations, we can avoid most of this overhead, at the cost of additional storage.
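The read-modify-write sequence for a scalar store can be illustrated with a small portable C model. This is only a sketch under stated assumptions: the `quadword` type and the helpers `load_quadword`, `store_quadword`, and `store_scalar_u32` are hypothetical names introduced here, the local store is modeled as a plain byte array, and the code does not reproduce the SPE instruction sequence the compiler actually emits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 128-bit quadword (not an SPE intrinsic type). */
typedef struct { uint8_t bytes[16]; } quadword;

/* Memory is only accessible one aligned 16-byte quadword at a time. */
static quadword load_quadword(const uint8_t *local_store, uint32_t addr) {
    quadword q;
    memcpy(q.bytes, local_store + (addr & ~15u), sizeof q.bytes);
    return q;
}

static void store_quadword(uint8_t *local_store, uint32_t addr, quadword q) {
    memcpy(local_store + (addr & ~15u), q.bytes, sizeof q.bytes);
}

/* Store a 32-bit scalar using only quadword accesses. */
static void store_scalar_u32(uint8_t *local_store, uint32_t addr, uint32_t value) {
    assert(addr % 4 == 0);                                 /* naturally aligned scalar */
    quadword q = load_quadword(local_store, addr);         /* read the enclosing quadword */
    memcpy(q.bytes + (addr & 15u), &value, sizeof value);  /* modify only the scalar's bytes */
    store_quadword(local_store, addr, q);                  /* write all 16 bytes back */
}
```

Calling `store_scalar_u32(ls, 20, v)` on a byte array `ls` changes only bytes 20 to 23 and leaves the other 12 bytes of the enclosing quadword untouched. That preservation is what the extra load buys. A scalar that has a 16-byte location to itself needs no such care, which is why allocating temporaries in distinct quadwords removes most of the overhead.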

Branch Optimization. The SPE hardware has no dynamic branch prediction but provides a special branch hint instruction, which indicates likely-taken branches. The compiler inserts such hints where suitable.
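As a source-level analogy only, the sketch below uses the GCC/Clang builtin `__builtin_expect` to tell a compiler which way a branch usually goes. The text does not state that the compiler described here derives its hints from such annotations, so that mapping is an assumption. The `LIKELY` and `UNLIKELY` macros and the function `sum_nonnegative` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* __builtin_expect is a GCC/Clang builtin; other compilers get a no-op. */
#if defined(__GNUC__) || defined(__clang__)
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define LIKELY(x)   (x)
#define UNLIKELY(x) (x)
#endif

/* Sum the non-negative entries; negative values are assumed to be rare. */
static long sum_nonnegative(const int *values, size_t n) {
    assert(values != NULL || n == 0);
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (UNLIKELY(values[i] < 0)) {
            continue;               /* rarely taken path */
        }
        sum += values[i];           /* common path */
    }
    return sum;
}
```

The annotation does not change the result, only the compiler's knowledge of the expected direction. On hardware without dynamic prediction that static knowledge is all there is to go on, whether it comes from annotations, profiles, or compiler heuristics.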

Instruction Scheduling, Bundling, and Instruction Fetch Handling. The SPE hardware supports dual issue of independent instructions but imposes some code layout restrictions. Namely, to dual-issue, the even-pipe instruction must be at an even-word PC address and the odd-pipe instruction must be at an odd-word PC address. The compiler adjusts the code layout by inserting nops where required. Another compiler optimization is explicit instruction fetch from the local memory, which improves memory-intensive code because load/store and instruction fetch share the same local memory port, with load/store instructions having priority over instruction fetch.
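The layout restriction can be made concrete with a deliberately simplified model. The sketch below assumes each instruction is tagged with the pipe it executes on and pads with a nop whenever the next word address has the wrong parity. The `slot_kind` enum and the function `layout_with_nops` are hypothetical, and the model pads unconditionally, whereas a real scheduler would presumably pad only where dual issue is actually possible and profitable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical instruction classification for the model. */
typedef enum { PIPE_EVEN, PIPE_ODD, PIPE_NOP } slot_kind;

/*
 * Lay out `n` instructions at consecutive word addresses starting at
 * `start_word`. A nop is inserted before any instruction whose pipe
 * parity does not match the parity of its word address.
 * Returns the number of slots written to `out`.
 */
static size_t layout_with_nops(const slot_kind *in, size_t n, size_t start_word,
                               slot_kind *out, size_t out_cap) {
    size_t w = 0;
    for (size_t i = 0; i < n; i++) {
        assert(in[i] != PIPE_NOP);                    /* input holds real instructions only */
        size_t addr_parity = (start_word + w) % 2;
        size_t want_parity = (in[i] == PIPE_ODD) ? 1 : 0;
        if (addr_parity != want_parity) {
            assert(w < out_cap);
            out[w++] = PIPE_NOP;                      /* pad to reach the right parity */
        }
        assert(w < out_cap);
        out[w++] = in[i];
    }
    return w;
}
```

For the input sequence even, even, odd, odd starting at word 0, the model produces even, nop, even, odd, nop, odd. Each even/odd pair then occupies an even/odd word pair, at the cost of two extra slots.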