A Highly Scalable Unstructured Agglomeration Multigrid Algorithm for Viscous Turbulent Flows

Dimitri J. Mavriplis

ICASE MS 403 NASA Langley Research Center Hampton, VA 23681


Abstract

The development and testing of a parallel unstructured agglomeration multigrid algorithm for steady-state aerodynamic flows is discussed. The agglomeration multigrid strategy uses a graph algorithm to construct the coarse multigrid levels from the given fine grid, similar to an algebraic multigrid approach, but operates directly on the non-linear system using the FAS approach. An isotropic agglomeration version which operates on an un-weighted graph as well as a directional agglomeration approach (operating on a weighted graph) have both been devised and tested. Two preconditioning techniques are employed to relieve stiffness due to high-aspect ratio grid cells in the boundary layer regions, and the stiffness associated with regions of nearly incompressible flow.

The preconditioned multigrid solver is parallelized using the domain-decomposition approach with the MPI message passing interface. The scalability and convergence rate of the resulting multigrid algorithm using the isotropic and directional agglomeration strategies is examined on an SGI ORIGIN 2000 and a Cray T3E. The scalability of the single (non-multigrid) algorithm is contrasted with that of the multigrid algorithm using a V and a W-cycle. For medium size problems involving several million grid points, near perfect scalability is obtained for the single grid algorithm, while only a slight drop-off in parallel efficiency is observed for the multigrid V and W-cycles, using up to 128 processors on the SGI Origin 2000, and up to 512 processors on the Cray T3E. For a large problem using 25 million grid points, good scalability is observed for the multigrid algorithm using up to 1450 processors on a Cray T3E, even when the coarsest grid level contains fewer points than the total number of processors.

At present, various preprocessing operations, which require little overall cpu time, such as the coarse level construction and the partitioning of these levels are performed sequentially. Future work will concentrate on parallelizing all of these aspects fo the solver on distributed memory parallel machines.