IBM Journal of Research and Development
IBM Skip to main content
  Home     Products & services     Support & downloads     My account  

  Select a country  
Journals Home  
  Systems Journal  
Journal of Research
and Development
    Current Issue  
    Recent Issues  
    Papers in Progress  
    Recent publications  
    Author's Guide  
  Contact Us  
  Related links:  
     IBM Research  

IBM Journal of Research and Development  
Volume 41, Number 6, Page 711 (1997)
Nontopical issue
  Full article: arrowHTML arrowPDF arrowASCII   arrowCopyright info


Improving the memory-system performance of sparse-matrix vector multiplication

by S. Toledo
Sparse-matrix vector multiplication is an important kernel that often runs inefficiently on superscalar RISC processors. This paper describes techniques that increase instruction-level parallelism and improve performance. The techniques include reordering to reduce cache misses (originally due to Das et al.), blocking to reduce load instructions, and prefetching to prevent multiple load-store units from stalling simultaneously. The techniques improve performance from about 40 MFLOPS (on a well-ordered matrix) to more than 100 MFLOPS on a 266-MFLOPS machine. The techniques are applicable to other superscalar RISC processors as well, and have improved performance on a Sun UltraSPARC™ I workstation, for example.
Related Subjects: Algorithms; Computation; Mathematical functions and techniques; Memory (computer) design and technology; Reduced-instruction-set computers (RISC)