The Message Passing Library (MPL)#
Using the C++ MPL (Message Passing Library) in conjunction with MPI (Message Passing Interface) offers several advantages for developers working on parallel computing and distributed systems:
Type Safety and Modern C++ Features: One of the primary benefits of using MPL with MPI is enhanced type safety. MPL is designed around modern C++ standards, using templates and compile-time type deduction so that message-passing operations are type-safe, reducing the likelihood of runtime errors due to type mismatches. Leveraging modern C++ features such as lambdas, the auto keyword, and constexpr also lets developers write more concise and maintainable code, following idiomatic C++ patterns that are both robust and easy to understand.
Improved Abstraction and Usability: MPL provides a higher-level abstraction over the traditional MPI C/C++ bindings. By encapsulating common MPI operations into C++ classes and functions, MPL simplifies the syntax and reduces boilerplate code, leading to clearer and more readable codebases. Developers can focus more on the problem they are solving rather than the intricacies of MPI. This abstraction layer not only makes it easier to perform common tasks like point-to-point communication, broadcasting, and collective operations but also allows for the integration of complex C++ data types seamlessly.
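As a small illustration of this higher-level interface, the sketch below (an illustrative example under the assumptions stated here, not taken verbatim from the MPL documentation) sends a double from rank 0 to rank 1. Unlike the raw MPI C API, no MPI_Datatype, buffer pointer, or count arguments are needed, because MPL deduces them from the C++ type:

```cpp
#include <iostream>
#include <mpl/mpl.hpp>

int main() {
  const mpl::communicator &comm_world = mpl::environment::comm_world();
  if (comm_world.size() < 2)
    return 0;  // this sketch needs at least two processes
  if (comm_world.rank() == 0) {
    double x = 3.14159;
    // Datatype and count are deduced from the C++ type of x;
    // compare with MPI_Send(&x, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD).
    comm_world.send(x, 1);
  } else if (comm_world.rank() == 1) {
    double x;
    comm_world.recv(x, 0);
    std::cout << "rank 1 received " << x << '\n';
  }
  return 0;
}
```

Compile with an MPI wrapper such as mpic++ and launch with at least two processes (for example, mpirun -np 2) as shown in the examples below.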
Enhanced Productivity and Reduced Development Time: By simplifying MPI’s verbose and often error-prone API, MPL significantly reduces the learning curve for new developers and speeds up the development process for seasoned programmers. MPL’s design encourages best practices by providing a more intuitive API, which reduces the likelihood of common MPI-related bugs (such as incorrect buffer sizes, improper type handling, and synchronization issues). The result is a shorter development cycle, quicker debugging, and faster time-to-market for scientific and engineering applications that rely on parallel computing.
Scalability and Performance Optimization: MPL is optimized to work efficiently with MPI’s underlying communication mechanisms. It takes advantage of MPI’s high-performance, low-latency communication capabilities while providing additional optimizations that are facilitated by modern C++ compilers. This means that applications developed using MPL can scale efficiently across large numbers of processors, making it suitable for high-performance computing (HPC) environments. Additionally, MPL’s API is designed to minimize the overhead introduced by its abstractions, ensuring that applications can achieve near-native MPI performance.
Portability and Compatibility: MPL is designed to be fully compatible with the MPI standard, ensuring that existing MPI-based applications can be easily ported to use MPL without significant changes to the underlying code. This backward compatibility makes MPL an attractive option for developers looking to modernize their codebases while maintaining the ability to run on various platforms that support MPI. Moreover, MPL’s abstraction layer can help mitigate platform-specific quirks and differences in MPI implementations, further enhancing portability.
In summary, using the C++ MPL library with MPI provides a combination of modern C++ features, improved type safety, usability, enhanced productivity, and performance optimization. These benefits make it a powerful tool for developers working on parallel computing applications, helping to write more efficient, maintainable, and scalable code.
Examples#
Hello, World!#
To create a “Hello, World” style program in C++ that utilizes the Message Passing Interface (MPI) for parallelism with the Message Passing Library (MPL), you’ll first need to ensure that you have MPL set up in your environment. MPL is a modern C++ wrapper for MPI, which simplifies the process of using MPI by providing a more C++-like interface.
Here’s a simple example of a “Hello, World” program using MPL and MPI:
Prerequisites#
MPI Library: You need an MPI implementation such as Open MPI or MPICH installed on your system.
MPL: You need MPL itself. MPL is header-only, so you can include it in your project without a complicated installation process; the source is available from the MPL repository on GitHub (https://github.com/rabauke/mpl).
Tip
On M3, load the required modules with module load gcc/11.2.0 openmpi.
Hello World using MPL with MPI#
#include <iostream>
#include <mpl/mpl.hpp>

int main() {
  // Initialize the MPL environment and obtain the world communicator
  const mpl::communicator &comm_world = mpl::environment::comm_world();
  // Get the rank (ID) of the process
  int rank = comm_world.rank();
  // Get the total number of processes
  int size = comm_world.size();
  // Print Hello World from each process
  std::cout << "Hello, World from process " << rank << " out of "
            << size << " processes!" << std::endl;
  return 0;
}
Explanation#
Including MPL: #include <mpl/mpl.hpp> pulls in the MPL header, which provides the classes and functions needed to interact with MPI. MPL initializes and finalizes the MPI environment automatically, so no explicit MPI_Init or MPI_Finalize calls are required.
Initialize Communicator: const mpl::communicator &comm_world = mpl::environment::comm_world(); obtains the default communicator (the analogue of MPI_COMM_WORLD in standard MPI).
Rank and Size: comm_world.rank() returns the rank (ID) of the current process, and comm_world.size() returns the total number of processes in the communicator.
Printing: Each process prints its rank and the total number of processes, making the parallel execution visible.
Compilation and Execution#
To compile and run this program, you typically use an MPI compiler wrapper such as mpic++. Alternatively, you can use CMake to download the dependencies automatically.
Compile the Program:
mpic++ -o hello_mpl hello_mpl.cpp
Alternatively, with CMake, place the following in CMakeLists.txt:
cmake_minimum_required(VERSION 3.14)
project(HelloMPLMPI LANGUAGES CXX)

# Specify C++ standard
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED True)

# Find MPI
find_package(MPI REQUIRED)

# Use FetchContent to download MPL if not available
include(FetchContent)
FetchContent_Declare(
  MPL
  GIT_REPOSITORY https://github.com/rabauke/mpl.git
  GIT_TAG master
)
FetchContent_MakeAvailable(MPL)

# Add the executable
add_executable(hello_mpl hello_mpl.cpp)

# Include MPL headers
target_include_directories(hello_mpl PRIVATE ${mpl_SOURCE_DIR})

# Link against MPI
target_link_libraries(hello_mpl PRIVATE MPI::MPI_CXX)

Then configure and build:
mkdir build
cd build
cmake ..
cmake --build .
Run the Program: Specify the number of processes using mpirun or mpiexec:
mpirun -np 4 ./hello_mpl
This command will start 4 processes, and you should see output from each process indicating its rank.
Example Output#
If you run the program with 4 processes, you might see something like:
Hello, World from process 0 out of 4 processes!
Hello, World from process 1 out of 4 processes!
Hello, World from process 2 out of 4 processes!
Hello, World from process 3 out of 4 processes!
Notes#
Ensure that the MPL headers are correctly included in your project path or specified using the include flag during compilation.
The order of output may vary as processes execute independently.
This simple program demonstrates how to initialize an MPI environment using MPL, obtain information about each process, and print output in a parallelized manner.
Matrix-Matrix Multiplication#
To create a C++ program using MPL (Message Passing Library) for MPI-based parallel matrix-matrix multiplication, we need to distribute the work across multiple processes. Each process will handle a portion of the matrix multiplication, and we’ll use MPI to manage communication between the processes.
For simplicity, consider square matrices of size N x N
. Each process will
compute a part of the result matrix. We’ll use row-wise decomposition, where
each process is responsible for a specific set of rows in the result matrix.
Steps to Implement the Program:#
Distribute the Matrices: Distribute parts of matrices A and B among the MPI processes.
Perform Local Computation: Each process computes its part of the result matrix C.
Gather Results: Use MPI to gather all parts of the result matrix C from the different processes.
C++ Code Using MPL for MPI-Based Parallel Matrix-Matrix Multiplication#
Here’s a C++ program that performs parallel matrix-matrix multiplication using MPL:
#include <iostream>
#include <string>
#include <vector>
#include <mpl/mpl.hpp>

// Print an N x N matrix stored contiguously in row-major order.
void print_matrix(const std::vector<int> &matrix, int N, const std::string &name) {
  std::cout << name << ":\n";
  for (int i = 0; i < N; ++i) {
    for (int j = 0; j < N; ++j)
      std::cout << matrix[i * N + j] << " ";
    std::cout << "\n";
  }
}

int main() {
  const mpl::communicator &comm_world = mpl::environment::comm_world();
  int rank = comm_world.rank();
  int size = comm_world.size();

  const int N = 4;  // Size of the matrix (N x N)
  if (N % size != 0) {
    if (rank == 0)
      std::cerr << "N must be divisible by the number of processes\n";
    return 1;
  }

  // Matrices are stored contiguously in row-major order so that MPL can
  // transfer them with simple layouts.
  std::vector<int> A(N * N), B(N * N), C(N * N, 0);  // C is the result matrix

  if (rank == 0) {
    // Initialize matrices A and B with the values 1, 2, 3, ...
    int value = 1;
    for (int i = 0; i < N * N; ++i) {
      A[i] = value;
      B[i] = value;
      ++value;
    }
    print_matrix(A, N, "Matrix A");
    print_matrix(B, N, "Matrix B");
  }

  // Broadcast matrices A and B from process 0 to all processes.
  mpl::contiguous_layout<int> matrix_layout(N * N);
  comm_world.bcast(0, A.data(), matrix_layout);
  comm_world.bcast(0, B.data(), matrix_layout);

  // Row-wise decomposition: each process handles N / size rows.
  int rows_per_process = N / size;
  int start_row = rank * rows_per_process;
  int end_row = start_row + rows_per_process;

  // Each process computes its assigned rows of matrix C.
  for (int i = start_row; i < end_row; ++i)
    for (int j = 0; j < N; ++j)
      for (int k = 0; k < N; ++k)
        C[i * N + j] += A[i * N + k] * B[k * N + j];

  // Gather the computed row blocks on process 0.
  mpl::contiguous_layout<int> block_layout(rows_per_process * N);
  if (rank == 0) {
    std::vector<int> full_C(N * N);
    comm_world.gather(0, C.data() + start_row * N, block_layout,
                      full_C.data(), block_layout);
    print_matrix(full_C, N, "Result Matrix C");
  } else {
    comm_world.gather(0, C.data() + start_row * N, block_layout);
  }
  return 0;
}
Key Parts of the Code#
Matrix Initialization: Matrices A and B are initialized on process 0; the program uses integer values for simplicity.
Broadcast A and B: Both matrices are broadcast from process 0 to all other processes with comm_world.bcast, using an mpl::contiguous_layout to describe the contiguous block of N * N integers. Broadcasting A as well as B is necessary because every process multiplies its rows of A against all of B.
Local Computation: Each process computes its assigned rows of matrix C using its local copies of A and B.
Gather Results: The computed row blocks are collected on process 0 with comm_world.gather, and the final result matrix C is printed by process 0.
Compilation and Execution#
Compile:
mpic++ -o matrix_multiplication matrix_multiplication.cpp
Run (with 4 processes, for example):
mpirun -np 4 ./matrix_multiplication
Notes#
Ensure that MPL is correctly installed and its headers are on the include path.
Adjust the matrix size N and the number of processes to fit your system's capability.
For simplicity, this code requires the number of rows N to be divisible by the number of processes. Supporting a remainder would require an uneven row decomposition and a gatherv-style collective to collect blocks of different sizes.