API Reference
- mpi4torch.JoinDummies(loopthrough: torch.Tensor, dummies: List[torch.Tensor]) -> torch.Tensor
This function joins multiple dummy dependencies into the DAG.
From the perspective of the forward pass, this function is mostly a no-op, since it simply loops through its first argument and discards the dummies argument. However, for the backward pass, the AD engine still considers the dummies as actual dependencies. The main use of this function is thus to manually encode dependencies that the AD engine does not see on its own. See also the introductory text in the Implications for mpi4torch section on how to use this function.
- Parameters:
loopthrough – Variable to pass through.
dummies – List of tensors that are added as dummy dependencies to the DAG.
- Returns:
Tensor that is a shallow copy of loopthrough, but whose grad_fn is JoinDummiesBackward.
- Return type:
torch.Tensor
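A minimal sketch of the intended usage, assuming execution under mpirun with at least two ranks (the tag and destination values are illustrative, and Isend/Irecv/Wait are documented further below):

```python
import torch
import mpi4torch

comm = mpi4torch.COMM_WORLD

x = torch.ones(4, requires_grad=True)
y = 3.0 * x

if comm.rank == 0:
    # The send has no visible data dependency on y, so the AD engine
    # would not record it in the DAG on its own.
    handle = comm.Isend(x, 1, 0)
    # Forward: y is passed through unchanged (a shallow copy).
    # Backward: handle.dummy is treated as an actual dependency,
    # keeping the send in the backward pass.
    y = mpi4torch.JoinDummies(y, [handle.dummy])
    comm.Wait(handle)
elif comm.rank == 1:
    buf = comm.Wait(comm.Irecv(torch.empty(4), 0, 0))
```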
- mpi4torch.JoinDummiesHandle(handle: mpi4torch.WaitHandle, dummies: List[torch.Tensor]) -> mpi4torch.WaitHandle
This function has the same purpose as JoinDummies(), but accepts an mpi4torch.WaitHandle as its first argument.
- Parameters:
handle – mpi4torch.WaitHandle to pass through.
dummies – List of tensors that are added as dummy dependencies to the DAG.
- Returns:
A wait handle with the additional dummy dependencies added.
- Return type:
mpi4torch.WaitHandle
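A short sketch, again assuming at least two ranks; the tensor `extra` is purely illustrative:

```python
import torch
import mpi4torch

comm = mpi4torch.COMM_WORLD

if comm.rank == 1:
    extra = torch.zeros(1, requires_grad=True)  # illustrative extra dependency
    handle = comm.Irecv(torch.empty(4), 0, 0)
    # The returned handle now carries `extra` as an additional dummy
    # dependency, so the AD engine orders it relative to the receive.
    handle = mpi4torch.JoinDummiesHandle(handle, [extra])
    result = comm.Wait(handle)
elif comm.rank == 0:
    comm.Wait(comm.Isend(torch.arange(4.0), 1, 0))
```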
- mpi4torch.MPI_MAX
- mpi4torch.MPI_MIN
- mpi4torch.MPI_SUM
- mpi4torch.MPI_PROD
- mpi4torch.MPI_LAND
- mpi4torch.MPI_BAND
- mpi4torch.MPI_LOR
- mpi4torch.MPI_BOR
- mpi4torch.MPI_LXOR
- mpi4torch.MPI_BXOR
- mpi4torch.MPI_MINLOC
- mpi4torch.MPI_MAXLOC
- mpi4torch.COMM_WORLD = <mpi4torch.MPI_Communicator object>
World communicator MPI_COMM_WORLD.
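For example, to inspect the local process within the world communicator (rank and size are documented under MPI_Communicator below):

```python
import mpi4torch

comm = mpi4torch.COMM_WORLD
print(f"rank {comm.rank} of {comm.size} processes")
```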
- class mpi4torch.MPI_Communicator(comm: torch.ScriptClass)
MPI communicator wrapper class
The only supported ways to construct an MPI_Communicator are currently either through mpi4torch.COMM_WORLD or through mpi4torch.comm_from_mpi4py().
Note
All methods with an underscore suffix are in-place operations.
- Allreduce(tensor: Tensor, op: int) -> Tensor
Combines values from all processes and distributes the result back to all processes.
The combination operation is performed element-wise on the tensor.
This function wraps MPI_Allreduce.
- Parameters:
tensor – torch.Tensor that shall be combined. It needs to have the same shape on all processes.
op – Operation used to combine the results. The only supported operations are mpi4torch.MPI_MAX, mpi4torch.MPI_MIN, mpi4torch.MPI_SUM, mpi4torch.MPI_PROD, mpi4torch.MPI_LAND, mpi4torch.MPI_BAND, mpi4torch.MPI_LOR, mpi4torch.MPI_BOR, mpi4torch.MPI_LXOR, mpi4torch.MPI_BXOR, mpi4torch.MPI_MINLOC and mpi4torch.MPI_MAXLOC.
- Returns:
Combined tensor of the same shape as the input tensor.
- Return type:
torch.Tensor
Note
Only mpi4torch.MPI_SUM is supported in the backward pass at the moment.
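A minimal differentiable sketch, assuming the script is launched under mpirun so that all ranks execute it collectively:

```python
import torch
import mpi4torch

comm = mpi4torch.COMM_WORLD

# Every rank contributes a tensor of the same shape.
x = torch.full((4,), float(comm.rank), requires_grad=True)

# Element-wise sum across all ranks; every rank receives the same result.
y = comm.Allreduce(x, mpi4torch.MPI_SUM)

# MPI_SUM is currently the only op supported in the backward pass.
# backward() is collective here: all ranks must call it.
y.sum().backward()
```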
- Bcast_(tensor: Tensor, root: int) -> Tensor
Broadcasts a tensor from the root process to all other processes.
This is an in-place operation.
This function wraps MPI_Bcast.
- Parameters:
tensor – torch.Tensor that shall be broadcast. It needs to have the same shape on all processes, since this is an in-place operation.
root – The root process, whose tensor shall be broadcast to the others.
- Returns:
For rank == root this is the same as the input tensor. For all other processes this is the input tensor filled with the content from the root process.
- Return type:
torch.Tensor
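A small sketch of the broadcast pattern; the buffer contents on non-root ranks before the call are irrelevant:

```python
import torch
import mpi4torch

comm = mpi4torch.COMM_WORLD

# The buffer must have the same shape on all ranks.
t = torch.empty(5)
if comm.rank == 0:
    t[:] = torch.arange(5.0)

# In-place broadcast: afterwards every rank holds rank 0's values.
t = comm.Bcast_(t, 0)
```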
- Irecv(tensor: Tensor, source: int, tag: int) -> WaitHandle
- Isend(tensor: Tensor, dest: int, tag: int) -> WaitHandle
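Both non-blocking calls return an mpi4torch.WaitHandle that must be completed with Wait() (documented below). A minimal two-rank sketch, with an arbitrary tag value:

```python
import torch
import mpi4torch

comm = mpi4torch.COMM_WORLD  # assumes >= 2 ranks

if comm.rank == 0:
    handle = comm.Isend(torch.arange(4.0), 1, 42)
    comm.Wait(handle)              # complete the send
elif comm.rank == 1:
    handle = comm.Irecv(torch.empty(4), 0, 42)
    data = comm.Wait(handle)       # returns the received tensor
```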
- Reduce_(tensor: Tensor, op: int, root: int) -> Tensor
Reduces multiple tensors of the same shape, scattered over all processes, to a single tensor of the same shape stored on the root process.
The combination operation is performed element-wise on the tensor.
This is an in-place operation.
This function wraps MPI_Reduce.
- Parameters:
tensor – torch.Tensor that shall be reduced. It needs to have the same shape on all processes, since the reduction is an element-wise operation.
op – Operation used to combine the results. The only supported operations are mpi4torch.MPI_MAX, mpi4torch.MPI_MIN, mpi4torch.MPI_SUM, mpi4torch.MPI_PROD, mpi4torch.MPI_LAND, mpi4torch.MPI_BAND, mpi4torch.MPI_LOR, mpi4torch.MPI_BOR, mpi4torch.MPI_LXOR, mpi4torch.MPI_BXOR, mpi4torch.MPI_MINLOC and mpi4torch.MPI_MAXLOC.
root – The root process, where the resulting tensor shall be gathered.
- Returns:
For rank == root the result stores the reduced tensor. For all other processes the content of the resulting tensor is undefined, with the exception that the result shall still suffice as input for the second argument of mpi4torch.JoinDummies().
- Return type:
torch.Tensor
Note
Only mpi4torch.MPI_SUM is supported in the backward pass at the moment.
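A forward-only sketch; a non-leaf tensor is used since the reduction operates in place:

```python
import torch
import mpi4torch

comm = mpi4torch.COMM_WORLD

x0 = torch.ones(3, requires_grad=True)
x = 2.0 * x0            # non-leaf tensor, suitable for an in-place op

# Element-wise sum of all ranks' tensors, stored on rank 0.
y = comm.Reduce_(x, mpi4torch.MPI_SUM, 0)

if comm.rank == 0:
    print(y)            # reduced values; content is undefined on other ranks
```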
- Wait(waithandle: WaitHandle) -> Tensor
- property rank: int
The rank or identification number of the local process with respect to this communicator.
The processes participating in a communicator are consecutively given ranks in the interval [0, mpi4torch.MPI_Communicator.size - 1].
- class mpi4torch.WaitHandle(raw_handle: List[Tensor])
Class representing a wait handle, as returned by one of the non-blocking MPI calls.
- property dummy
A dummy variable that allows the WaitHandle to be passed, as part of the dummies argument, to mpi4torch.JoinDummies() and mpi4torch.JoinDummiesHandle().
- mpi4torch.comm_from_mpi4py(comm) -> MPI_Communicator
Converts an mpi4py communicator to an mpi4torch.MPI_Communicator.
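For instance, to wrap an mpi4py sub-communicator (the split criterion here is purely illustrative):

```python
from mpi4py import MPI
import mpi4torch

# Split the world into two sub-communicators and wrap one for mpi4torch.
sub = MPI.COMM_WORLD.Split(color=MPI.COMM_WORLD.rank % 2, key=0)
comm = mpi4torch.comm_from_mpi4py(sub)
```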
- mpi4torch.deactivate_cuda_aware_mpi_support() -> None
Deactivates the CUDA-aware MPI support.
Calling this function forces mpi4torch to first move any tensor into main memory before calling an MPI function on it, and to move the result back into device memory after the MPI call has finished.
Note
This function is useful in situations in which MPI advertises CUDA-awareness but the functionality is not really supported by the backend.
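A minimal sketch; the assumption is that the call is made once, before any MPI operation touches a CUDA tensor:

```python
import mpi4torch

# Force staging through host memory for all subsequent MPI calls.
mpi4torch.deactivate_cuda_aware_mpi_support()
```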