API Reference
- mpi4torch.JoinDummies(loopthrough: torch.Tensor, dummies: List[torch.Tensor]) -> torch.Tensor
This function joins multiple dummy dependencies into the DAG.
From the perspective of the forward pass, this function is mostly a no-op, since it simply loops through its first argument and discards the dummies argument. However, for the backward pass, the AD engine still considers the dummies as actual dependencies. The main use of this function is thus to manually encode dependencies that the AD engine does not see on its own. See also the introductory text in the Implications for mpi4torch section on how to use this function.
- Parameters:
loopthrough – Variable to pass through.
dummies – List of tensors that are added as dummy dependencies to the DAG.
- Returns:
Tensor that is a shallow copy of loopthrough, but whose grad_fn is JoinDummiesBackward.
- Return type:
torch.Tensor
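A minimal sketch of the intended usage, assuming execution under mpirun with at least two ranks (the tag and destination values are illustrative, and Isend/Irecv/Wait are documented further below):

```python
import torch
import mpi4torch

comm = mpi4torch.COMM_WORLD

x = torch.ones(4, requires_grad=True)
y = 3.0 * x

if comm.rank == 0:
    # The send has no visible data dependency on y, so the AD engine
    # would not record it in the DAG on its own.
    handle = comm.Isend(x, 1, 0)
    # Forward: y is passed through unchanged (a shallow copy).
    # Backward: handle.dummy is treated as an actual dependency,
    # keeping the send in the backward pass.
    y = mpi4torch.JoinDummies(y, [handle.dummy])
    comm.Wait(handle)
elif comm.rank == 1:
    buf = comm.Wait(comm.Irecv(torch.empty(4), 0, 0))
```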
- mpi4torch.JoinDummiesHandle(handle: mpi4torch.WaitHandle, dummies: List[torch.Tensor]) -> mpi4torch.WaitHandle
This function has the same purpose as JoinDummies(), but accepts an mpi4torch.WaitHandle as its first argument.
- Parameters:
handle – mpi4torch.WaitHandle to pass through.
dummies – List of tensors that are added as dummy dependencies to the DAG.
- Returns:
A wait handle with the additional dummy dependencies added.
- Return type:
mpi4torch.WaitHandle
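A short sketch, again assuming at least two ranks; the tensor `extra` is purely illustrative:

```python
import torch
import mpi4torch

comm = mpi4torch.COMM_WORLD

if comm.rank == 1:
    extra = torch.zeros(1, requires_grad=True)  # illustrative extra dependency
    handle = comm.Irecv(torch.empty(4), 0, 0)
    # The returned handle now carries `extra` as an additional dummy
    # dependency, so the AD engine orders it relative to the receive.
    handle = mpi4torch.JoinDummiesHandle(handle, [extra])
    result = comm.Wait(handle)
elif comm.rank == 0:
    comm.Wait(comm.Isend(torch.arange(4.0), 1, 0))
```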
- mpi4torch.MPI_MAX
- mpi4torch.MPI_MIN
- mpi4torch.MPI_SUM
- mpi4torch.MPI_PROD
- mpi4torch.MPI_LAND
- mpi4torch.MPI_BAND
- mpi4torch.MPI_LOR
- mpi4torch.MPI_BOR
- mpi4torch.MPI_LXOR
- mpi4torch.MPI_BXOR
- mpi4torch.MPI_MINLOC
- mpi4torch.MPI_MAXLOC
- mpi4torch.COMM_WORLD = <mpi4torch.MPI_Communicator object>
World communicator MPI_COMM_WORLD.
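For example, to inspect the local process within the world communicator (rank and size are documented under MPI_Communicator below):

```python
import mpi4torch

comm = mpi4torch.COMM_WORLD
print(f"rank {comm.rank} of {comm.size} processes")
```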
- class mpi4torch.MPI_Communicator(comm: torch.ScriptClass)
MPI communicator wrapper class
The only supported ways to construct an MPI_Communicator are currently either through mpi4torch.COMM_WORLD or through mpi4torch.comm_from_mpi4py().
Note
All methods with an underscore suffix are in-place operations.
- Allreduce(tensor: Tensor, op: int) -> Tensor
Combines values from all processes and distributes the result back to all processes.
The combination operation is performed element-wise on the tensor.
This function wraps MPI_Allreduce.
- Parameters:
tensor – torch.Tensor that shall be combined. It needs to have the same shape on all processes.
op – Operation used to combine the results. The only supported operations are mpi4torch.MPI_MAX, mpi4torch.MPI_MIN, mpi4torch.MPI_SUM, mpi4torch.MPI_PROD, mpi4torch.MPI_LAND, mpi4torch.MPI_BAND, mpi4torch.MPI_LOR, mpi4torch.MPI_BOR, mpi4torch.MPI_LXOR, mpi4torch.MPI_BXOR, mpi4torch.MPI_MINLOC and mpi4torch.MPI_MAXLOC.
- Returns:
Combined tensor of the same shape as the input tensor.
- Return type:
torch.Tensor
Note
Only mpi4torch.MPI_SUM is supported in the backward pass at the moment.
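A minimal differentiable sketch, assuming the script is launched under mpirun so that all ranks execute it collectively:

```python
import torch
import mpi4torch

comm = mpi4torch.COMM_WORLD

# Every rank contributes a tensor of the same shape.
x = torch.full((4,), float(comm.rank), requires_grad=True)

# Element-wise sum across all ranks; every rank receives the same result.
y = comm.Allreduce(x, mpi4torch.MPI_SUM)

# MPI_SUM is currently the only op supported in the backward pass.
# backward() is collective here: all ranks must call it.
y.sum().backward()
```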
- Bcast_(tensor: Tensor, root: int) -> Tensor
Broadcasts a tensor from the root process to all other processes.
This is an in-place operation.
This function wraps MPI_Bcast.
- Parameters:
tensor – torch.Tensor that shall be broadcast. It needs to have the same shape on all processes, since this is an in-place operation.
root – The root process, whose tensor shall be broadcast to the others.
- Returns:
For rank == root this is the same as the input tensor. For all other processes this is the input tensor filled with the content from the root process.
- Return type:
torch.Tensor
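A small sketch of the broadcast pattern; the buffer contents on non-root ranks before the call are irrelevant:

```python
import torch
import mpi4torch

comm = mpi4torch.COMM_WORLD

# The buffer must have the same shape on all ranks.
t = torch.empty(5)
if comm.rank == 0:
    t[:] = torch.arange(5.0)

# In-place broadcast: afterwards every rank holds rank 0's values.
t = comm.Bcast_(t, 0)
```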
- Irecv(tensor: Tensor, source: int, tag: int) -> WaitHandle
- Isend(tensor: Tensor, dest: int, tag: int) -> WaitHandle
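Both non-blocking calls return an mpi4torch.WaitHandle that must be completed with Wait() (documented below). A minimal two-rank sketch, with an arbitrary tag value:

```python
import torch
import mpi4torch

comm = mpi4torch.COMM_WORLD  # assumes >= 2 ranks

if comm.rank == 0:
    handle = comm.Isend(torch.arange(4.0), 1, 42)
    comm.Wait(handle)              # complete the send
elif comm.rank == 1:
    handle = comm.Irecv(torch.empty(4), 0, 42)
    data = comm.Wait(handle)       # returns the received tensor
```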
- Reduce_(tensor: Tensor, op: int, root: int) -> Tensor
Reduces multiple tensors of the same shape, scattered over all processes, to a single tensor of the same shape stored on the root process.
The combination operation is performed element-wise on the tensor.
This is an in-place operation.
This function wraps MPI_Reduce.
- Parameters:
tensor – torch.Tensor that shall be reduced. It needs to have the same shape on all processes, since the reduction is an element-wise operation.
op – Operation used to combine the results. The only supported operations are mpi4torch.MPI_MAX, mpi4torch.MPI_MIN, mpi4torch.MPI_SUM, mpi4torch.MPI_PROD, mpi4torch.MPI_LAND, mpi4torch.MPI_BAND, mpi4torch.MPI_LOR, mpi4torch.MPI_BOR, mpi4torch.MPI_LXOR, mpi4torch.MPI_BXOR, mpi4torch.MPI_MINLOC and mpi4torch.MPI_MAXLOC.
root – The root process, where the resulting tensor shall be gathered.
- Returns:
For rank == root the result stores the reduced tensor. For all other processes the content of the resulting tensor is undefined, with the exception that the result shall still suffice as input for the second argument of mpi4torch.JoinDummies().
- Return type:
torch.Tensor
Note
Only mpi4torch.MPI_SUM is supported in the backward pass at the moment.
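A forward-only sketch; a non-leaf tensor is used since the reduction operates in place:

```python
import torch
import mpi4torch

comm = mpi4torch.COMM_WORLD

x0 = torch.ones(3, requires_grad=True)
x = 2.0 * x0            # non-leaf tensor, suitable for an in-place op

# Element-wise sum of all ranks' tensors, stored on rank 0.
y = comm.Reduce_(x, mpi4torch.MPI_SUM, 0)

if comm.rank == 0:
    print(y)            # reduced values; content is undefined on other ranks
```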
- Wait(waithandle: WaitHandle) -> Tensor
- property rank: int
The rank or identification number of the local process with respect to this communicator.
The processes participating in a communicator are consecutively given ranks in the interval [0, mpi4torch.MPI_Communicator.size - 1].
- class mpi4torch.WaitHandle(raw_handle: List[Tensor])
Class representing a wait handle, as returned by one of the non-blocking MPI calls.
- property dummy
A dummy variable that allows the WaitHandle to be passed, as part of the dummies argument, to mpi4torch.JoinDummies() and mpi4torch.JoinDummiesHandle().
- mpi4torch.comm_from_mpi4py(comm) -> MPI_Communicator
Converts an mpi4py communicator to an mpi4torch.MPI_Communicator.
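For instance, to wrap an mpi4py sub-communicator (the split criterion here is purely illustrative):

```python
from mpi4py import MPI
import mpi4torch

# Split the world into two sub-communicators and wrap one for mpi4torch.
sub = MPI.COMM_WORLD.Split(color=MPI.COMM_WORLD.rank % 2, key=0)
comm = mpi4torch.comm_from_mpi4py(sub)
```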
- mpi4torch.deactivate_cuda_aware_mpi_support() -> None
Deactivates the CUDA-aware MPI support.
Calling this function forces mpi4torch to first move any tensor into main memory before calling an MPI function on it, and to move the result back into device memory after the MPI call has finished.
Note
This function is useful in situations in which MPI advertises CUDA-awareness but the functionality is not really supported by the backend.
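A minimal sketch; the assumption is that the call is made once, before any MPI operation touches a CUDA tensor:

```python
import mpi4torch

# Force staging through host memory for all subsequent MPI calls.
mpi4torch.deactivate_cuda_aware_mpi_support()
```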