Poor data locality is one of the main causes of low performance in many modern programs. The problem concerns hardware architects on one hand, and compiler and algorithm designers on the other. Application programs often exhibit poor data locality because programmers are unaware of the cache hierarchy; even programmers who are aware of it lack appropriate tools for dealing with caching effects, since it is very hard to analyze which memory layout or access pattern yields better data locality. Poor locality in application programs can cause a high cache miss ratio and thus remarkably low performance. To mitigate this problem, hardware architectures were designed to include sophisticated cache systems; advanced cache replacement algorithms, trace caches, and prefetching mechanisms are examples. These show benefits for scientific code but are less effective on general-purpose programs.
In the past, compiler designers focused their work on data cache optimization in scientific code whose main data structures are matrices. Many optimizations were introduced in this field, such as blocking and software prefetching (which can use hardware support). Only in the last decade have researchers begun to take an interest in general-purpose programs that use heap-allocated objects and Recursive Data Structures (RDSs) such as linked lists, trees, and graphs. Optimizations for scientific code and matrix accesses improve data locality by changing the access patterns to the matrices (temporal locality). Changing the access patterns of an RDS, on the other hand, is a very hard task, so we propose instead to change the memory layout of these data structures to improve their data locality (spatial locality). The layout of a data structure is changed by changing its definition in the high-level language. The problem is how to find the best layout of a data structure, that is, the layout that gives the best data locality. Our proposed solution is to collect information about the accesses to the data structure and analyze it. To accomplish this we use a data profiling method that is fast, has a small memory footprint, and is simple to implement.
This work provides a method for generating recommendations for changing a data structure's definition in order to obtain better data locality. It is divided into two parts: the first part collects the data access profiling information, and the second analyzes the profiling results and generates recommendations for changing the structure definition. Performance results showing the expected improvements in data cache miss ratios and execution times are presented.