Running COMSOL in Parallel on Clusters
You control the options for running COMSOL Multiphysics on a cluster from the Study node in the Model Builder. To enable the cluster computing feature, click the Show More Options button and select Batch and Cluster in the Show More Options dialog box. Then, in the Model Builder, right-click a Study node and select Cluster Computing. You must have a floating network license (FNL) to run COMSOL Multiphysics in distributed-memory parallel mode.
See also Cluster Computing and Cluster License Handling. For information about cluster installations, see Cluster Installation in the COMSOL Multiphysics Installation Guide.
The Micromixer — Cluster Version and Joule Heating of a Microactuator — Distributed Parameter Version models show how to set up a model for running COMSOL Multiphysics in parallel on a cluster. The first model uses distributed-memory computing to reduce the per-node memory usage of a large fluid flow model. The second model speeds up the solution of a distributed parameter sweep.
The following sections describe how to run cluster jobs on Windows and Linux.
Running a Cluster Job on Windows
This section outlines the main steps for running a cluster job on Windows®. Before you start, check that the installation of COMSOL Multiphysics follows these guidelines:
If you work on a desktop PC, which is recommended, install the COMSOL software on that local PC. Also install Microsoft® HPC Pack on the desktop PC before you start. Microsoft® HPC Pack makes it possible to access the cluster from your workstation. The latest version can be downloaded from Microsoft. Alternatively, you can log in to the cluster, for example via Remote Desktop.
Running Cluster Jobs in the COMSOL Desktop
To run a cluster job using COMSOL Desktop, follow these steps:
1
2
In a complete model, right-click the Study node and select Cluster Computing.
3
In the Settings window for Cluster Computing, select Microsoft® HPC Pack from the Scheduler type list (this is typically a preference setting). This provides access to all parameters that you need for communication with the cluster.
4
5
You can define more details in the Settings window for the Cluster Computing node under Job Configurations. When you submit a job, COMSOL Multiphysics adds a Cluster Computing node. If you want to change or inspect its settings before submitting the first job, right-click the Study node and select Show Default Solver.
6
After submitting the job to the cluster, you can monitor the progress in the Progress window and the Log window. The Progress window shows the progress of the batch data and external processes, and the Log window contains a log with information about the solver operations for each parameter in a parametric sweep, for example. You can also get details about a cluster job in the Windows Job Manager, which is available in the HPC Pack.
Running Cluster Jobs from the Command Line
You can do the same cluster simulation from the command line using, for example, a scheduler script. The Cluster Computing node is not needed in this case.
This command launches a COMSOL MPI job on a cluster without involving the scheduler:
mpiexec -n 2 comsolclusterbatch.exe -inputfile comsoltest.mph -outputfile output.mph -batchlog b.log
You can use the job submit command to submit a COMSOL Multiphysics job to the Windows scheduler.
For additional information about running COMSOL Multiphysics on clusters from the command line, see the section COMSOL Cluster Commands for Windows.
Running a Cluster Job on Linux
Before you begin, make sure that the license manager is up and running and reachable from all compute nodes and the head node.
Running Cluster Jobs in the COMSOL Desktop
To run a cluster job using COMSOL Desktop, use the steps below. Skip Steps 1 and 3 if you are running COMSOL Multiphysics on the machine from which you want to start the cluster job.
1
Start the COMSOL Multiphysics server on the Linux® system with the command comsol mphserver. Note the port number that is displayed (for example, COMSOL 6.2 started listening on port 2036).
2
3
From the File menu, choose COMSOL Multiphysics Server>Connect to Server. In the Connect to Server dialog box, use the login credentials that you entered at the startup of the COMSOL Multiphysics server.
4
In a complete model, right-click the Study node and select Cluster Computing.
5
In the Settings window for Cluster Computing, select General from the Scheduler type list for Linux clusters. This provides access to all parameters that you need for communication with the cluster.
6
7
You can define more details in the Settings window for the Cluster Computing node under Job Configurations. When you submit a job, COMSOL Multiphysics adds a Cluster Computing node. If you want to change or inspect its settings before submitting the first job, right-click the Study node and select Show Default Solver.
8
After submitting the job to the cluster, you can monitor the progress in the Progress window and the Log window. The Progress window shows the progress of the batch data and external processes, and the Log window contains a log with information about the solver operations for each parameter in a parametric sweep, for example.
Running Cluster Jobs from the Command Line
You can do the same cluster simulation from the command line using, for example, a scheduler script. The Cluster Computing node is not needed in this case.
This command launches a COMSOL MPI job on a cluster without involving the scheduler:
comsol -nn 2 batch -inputfile comsoltest.mph -outputfile output.mph -batchlog b.log
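As a minimal sketch of how the command above might be assembled inside a scheduler script, the following shell function builds the command line from its arguments and prints it. The function name build_comsol_cmd and its argument order are hypothetical, and only the options shown in the command above are used. This is a dry run: a real script would execute the command rather than print it.

```shell
#!/bin/sh
# Hypothetical helper (not part of COMSOL): assemble the batch command line
# for a cluster run. Arguments: <nodes> <input file> <output file> <log file>
# Dry run only: the command is printed, not executed.
build_comsol_cmd() {
    nn=$1
    input=$2
    output=$3
    log=$4
    printf 'comsol -nn %s batch -inputfile %s -outputfile %s -batchlog %s\n' \
        "$nn" "$input" "$output" "$log"
}

# Reproduces the command shown above.
build_comsol_cmd 2 comsoltest.mph output.mph b.log
```

Keeping the node count and file names as arguments makes it easier to reuse one script for several models.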
For additional information about running COMSOL Multiphysics on clusters from the command line, see the section COMSOL Cluster Commands for Linux.
Running Cluster Sweeps
A Cluster Sweep node that you add to a study can be viewed as a combination of a Parametric Sweep node and a Cluster Computing subnode under Job Configurations. Each parameter tuple starts one Cluster Computing job. Most settings can therefore be derived from these two nodes. The main additions are the following options in the Batch Settings section:
Under Before sweep: The Clear meshes and Clear solutions check boxes, which specify what to do before saving the model.
Under During sweep: The Synchronize solutions and Synchronize accumulated probe table check boxes, which specify what data to insert back into the original model.
Under After sweep: The Output model to file check box, which controls whether the batch jobs save a model file.
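To illustrate the idea that each parameter tuple starts its own job, the following shell sketch prints one batch command per parameter value. It is a conceptual dry run, not a description of what the Cluster Sweep node runs internally. The function name print_sweep_jobs, the parameter name V0, and the output file naming scheme are hypothetical. The -pname and -plist options are assumed here as the batch options for passing a parameter name and value; they are not described in this section, so check them against the command reference before use.

```shell
#!/bin/sh
# Conceptual sketch: one batch command per parameter value (dry run, prints only).
# Assumptions: -pname/-plist pass the parameter name and value; the node count,
# input file, and output naming scheme are chosen for illustration.
print_sweep_jobs() {
    pname=$1
    shift
    for value in "$@"; do
        printf 'comsol -nn 2 batch -inputfile comsoltest.mph -outputfile output_%s_%s.mph -pname %s -plist %s\n' \
            "$pname" "$value" "$pname" "$value"
    done
}

# Three parameter values give three independent job commands.
print_sweep_jobs V0 1 2 3
```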
Cluster License Handling
To run COMSOL Multiphysics simulations in distributed-memory parallel mode (on a cluster), you must have a floating network license (FNL). Look for the keyword CLUSTERNODE in your license file. When running a cluster job, COMSOL Multiphysics uses the following license components and license check-out procedures:
Stopping and Outputting the Solution When Running a Cluster Job or Batch Job
If you have a model running on a cluster in batch mode, for example, you can monitor the solver log. If you notice that the solver starts diverging, you may want to stop the solution process and output the available solutions. To do so, use one of the following commands, for example:
echo "Cancel" > outputfile.mph.status
to cancel the solution, or
echo "Stop 2" > outputfile.mph.status
to stop the solution on progress level 2, or
echo "Stop" > outputfile.mph.status
to stop immediately.
In those commands, replace outputfile with inputfile if a separate output file is not specified.
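The commands above can be wrapped in a small helper so that the status file name is always derived from the model file name. The following is a minimal sketch; the function name request_stop is hypothetical, and it only writes the request text to the .status file as described above.

```shell
#!/bin/sh
# Hypothetical helper (not part of COMSOL): write a stop or cancel request
# to the status file that belongs to a running batch job.
# Arguments: <model file the job writes to> <request text, e.g. "Cancel" or "Stop 2">
request_stop() {
    model=$1
    request=$2
    # The status file has the model file name with ".status" appended.
    echo "$request" > "${model}.status"
}

# Example: ask the job that writes outputfile.mph to stop at progress level 2.
request_stop outputfile.mph "Stop 2"
```

If no separate output file was specified for the job, pass the input file name as the first argument instead.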
Specifying the Solution Storage Format
The COMSOL Multiphysics software supports three solution storage formats on clusters. With the first format, the entire solution is stored on all nodes. With the second format, only a single node stores the entire solution. The third format stores the solution using distributed storage on the cluster. The savings in disk space when using a single node can be significant for time-dependent problems, eigenvalue problems, and solutions using a parametric solver. To select the first format, go to the Solution node under Solver Configurations and choose Solution>Store on All Nodes. To select the second format, choose Solution>Store on a Single Node. To select the third format, choose Solution>Store Solution Using Distributed Storage; this format can improve performance through parallel I/O.
Running FEAST in a Parallel MPI Mode
The main computational cost for the FEAST eigenvalue solver consists of assembling the Jacobian matrix and solving an independent linear system for each quadrature point along the complex contour. Parallelism is therefore central to speeding up the FEAST solver, especially for large problems with many integration points. The parallel version of FEAST distributes the quadrature points to different computational nodes; each node forms the Jacobian matrix and solves the corresponding linear systems independently and locally.
The performance of the parallel version is dominated by the node that handles the largest number of quadrature points. To make optimal use of the parallel code, choose the number of nodes so that the quadrature points can be distributed equally among them. For example, when running a model in parallel on 3 nodes, using 15 integration points for the eigenvalue solver performs somewhat better than using 16 integration points.
Furthermore, the COMSOL Multiphysics software supports OpenMP parallelism when assembling the matrix and solving the linear system. Hybrid configurations (MPI plus OpenMP) can therefore give the best performance on parallel platforms. For small problems that fit well in the memory of a workstation, the hybrid configuration is an alternative worth considering. On a 16-core workstation, for example, it might be advantageous for 16 integration points to run COMSOL Multiphysics in hybrid mode with 4 nodes and 4 cores on each node; that is, specify -nn 4 -np 4.
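The load-balancing argument above can be checked with a short calculation. The sketch below computes the largest number of quadrature points that any one node must handle, assuming the points are spread as evenly as possible across the nodes (an assumption about the distribution, not a documented rule). The function name max_points_per_node is hypothetical.

```shell
#!/bin/sh
# Largest number of quadrature points handled by any single node, assuming the
# points are spread as evenly as possible (integer ceiling of points/nodes).
max_points_per_node() {
    points=$1
    nodes=$2
    echo $(( (points + nodes - 1) / nodes ))
}

max_points_per_node 15 3   # prints 5
max_points_per_node 16 3   # prints 6
max_points_per_node 16 4   # prints 4
```

With 3 nodes, 16 points leave one node with 6 points while 15 points give every node 5, which is consistent with the 15-point case performing somewhat better.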
To enable MPI parallelism for the FEAST eigenvalue solver, you must run COMSOL Multiphysics in cluster mode (requires a floating network license). In the COMSOL Desktop, under Model Builder, click Show More Options and select the Study>Batch and Cluster check box. The Distribute linear system solution check box then appears for the FEAST eigenvalue solver in the Settings windows for the Eigenvalue and Eigenfrequency study nodes and for the Eigenvalue Solver node. Select it to run FEAST in parallel.