Considerations for Parallel CFD Enhancements on SGI ccNUMA and Cluster Architectures

Mark Kremenetsky, PhD, Principal Scientist, CFD Applications
SGI Mountain View, CA, 650.933.2304
Tom Tysinger, PhD, Principal Engineer
Fluent Inc, Lebanon, NH, 603.643.2600
Stan Posey, HPC Applications Market Development
SGI Mountain View, CA, 650.933.1689


The maturity of Computational Fluid Dynamics (CFD) methods and the increasing computational power of contemporary computers have enabled industry to incorporate CFD technology into several stages of the design process. As the application of CFD technology grows from component-level to system-level analysis, the complexity and size of models increase continuously. Successful simulation requires synergy between CAD, grid generation, and solvers.

The requirement for shorter design cycles has put severe limits on the turnaround time of numerical simulations. The time required for (1) mesh generation for computational domains of complex geometry and (2) obtaining numerical solutions for flows with complex physics has traditionally been the pacing item for CFD applications. Unstructured grid generation techniques and parallel algorithms have been instrumental in making such calculations affordable. The availability of these algorithms in commercial packages has grown in the last few years, and parallel performance has become a very important factor in the selection of such methods for production work.

Although extensive research has been devoted to determining the optimum parallel paradigm, in practice the best parallel performance can be obtained only when algorithms and paradigms take into consideration the architectural design of the target computer system. This paper addresses the issues related to efficient performance of the commercial CFD software FLUENT on a cache-coherent Non-Uniform Memory Access (ccNUMA) architecture. Also presented are results from implementations of FLUENT on clusters of workstations running the Linux and SGI IRIX operating systems. Issues related to the performance of the message passing system and to memory-processor affinity are investigated with a view to efficient scalability of FLUENT when applied to a variety of industrial problems.