Distributed-Memory Parallel COMSOL
Basic Cluster Concepts
The following terms occur frequently when describing the hardware for cluster computing and shared-memory parallel computing:
Compute node: The compute nodes are where the distributed computing occurs. A COMSOL Multiphysics server runs on each compute node and communicates with the other compute nodes using MPI (Message Passing Interface).
Host: The host is a physical machine (hardware) with a network adapter and a unique network address. The host is part of the cluster. It is sometimes referred to as a physical node.
Core: A core is a processor core used for shared-memory parallelism by a compute node running on a host with multiple cores.
The number of hosts used and the number of compute nodes are usually the same. For some special problem types, such as very small problems with many parameters, it can be beneficial to run more than one compute node on a single host.
The Linux® and Windows® versions of COMSOL Multiphysics support a distributed-memory mode. Distributed mode starts the number of compute nodes set by the user; each compute node is a separate process running one COMSOL instance. A compute node is not the same as a physical node (computer), but the two can coincide. When running in distributed mode, COMSOL Multiphysics uses MPI to communicate between the processes in the distributed environment.
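As a minimal sketch of how a distributed run is typically started from the command line (assuming a Linux installation with the comsol command on the path; -nn sets the number of compute nodes, and the file names are placeholders):

  comsol batch -nn 4 -inputfile model.mph -outputfile model_solved.mph

Each of the 4 compute nodes started this way is a separate MPI process running its own COMSOL instance.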
Figure 21-1: Schematic of a cluster with 3 physical nodes (computers) with 4 cores each.
The distributed-memory mode can be combined with the shared-memory parallelism of COMSOL Multiphysics. All modes in which COMSOL Multiphysics can run are able to use the distributed-memory mode.
For the schematic in Figure 21-1, you can choose any number of compute nodes between 1 and 12. Each compute node, in turn, can use between 1 and 4 cores for shared-memory parallelism. On Windows, COMSOL Multiphysics by default uses as many cores as are available on each physical node for shared-memory parallelism. This is suboptimal if the number of compute nodes differs from the number of physical nodes, so it is recommended that you set the number of cores explicitly. For the schematic example, if you run 6 compute nodes, the optimal number of cores per node is 2, so that the total number of cores used is 6·2 = 12.
For the same example, if you are the sole user of the system for the duration of the computation and your problem requires a lot of memory, use 3 compute nodes with 4 shared-memory cores each. If, on the other hand, your problem is small, use 12 compute nodes with 1 shared-memory core each. This way you make the best use of shared-memory and distributed-memory parallelism for each problem.
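As a sketch, the two configurations above would typically be requested with the -nn (number of compute nodes) and -np (cores per node) options; the host file and model file names are placeholders:

  comsol batch -nn 3 -np 4 -f hostfile -inputfile large_model.mph -outputfile out.mph
  comsol batch -nn 12 -np 1 -f hostfile -inputfile small_model.mph -outputfile out.mph

Here hostfile is a plain-text file listing the physical nodes of the cluster, one per line.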
You do not need a cluster to benefit from the distributed-memory model. On a multiprocessor computer, you can run multiple compute nodes. This can be useful for small parameter sweeps, for example. Make sure that the number of compute nodes times the number of cores per node does not exceed the number of available cores; otherwise, performance deteriorates significantly.
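On a single 12-core computer, for example, a small parameter sweep could be run with 12 compute nodes of 1 core each (a sketch with placeholder file names; no host file is given, so all processes start locally, and -nn times -np, here 12·1 = 12, does not exceed the 12 available cores):

  comsol batch -nn 12 -np 1 -inputfile sweep.mph -outputfile sweep_solved.mph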