Cache Optimization for Structured and Unstructured Grid Multigrid

Craig C. Douglas
and Jonathan Hu
University of Kentucky
325 McV ey Hall - CCS
Lexington, KY 40506-0045, USA

Markus Kowarschik
and Ulrich Rüde
Lehrstuhl fur Systemsimulation (IMMD 10)
Institut fur Informatik
Universität Erlangen-Nurnberg
Martensstrasse 3, D-91058 Erlangen, Germany

Christian Weiss
Lehrstuhl fur Rechnertechnik und Rechnerorganisation (LRR-TUM)
Institut fur Informatik
Technische Universität München
D-80290 München, Germany


Many current computer designs employ caches and a hierarchical memory architecture. The speed of a code depends on how well the cache structure is exploited. The number of cache misses provides a better measure for comparing algorithms than the number of multiplies. In this paper, suitable blocking strategies for both structured and unstructured grids will be introduced. They improve the cache usage without changing the underlying algorithm. In particular, bitwise compatibility is guaranteed between the standard and the high performance implementations ofthe algorithms. This is illustrated by comparisons for various multigrid algorithms on a selection of different computers for problems in two and three dimensions. The code re-structuring can yield performance improvements of factors of 2-5. This allows the modified codes to achieve a much higher percentage of the peak performance of the CPU than is usually observed with standard implementations.

Key words. Computer architectures,iterative algorithms, multigrid, high performance computing,cache.

AMS subject classifications. 65M55, 65N55, 65F10, 68-04, 65Y99