Base#
A simple fully-connected layer.

A simple graph convolutional operator where the message function is a simple linear projection and the aggregation is a simple average.

Learns a standard temporal convolutional filter. 

 class Dense(input_size, output_size, activation='linear', dropout=0.0, bias=True)[source]#
A simple fully-connected layer.
 Parameters:
input_size (int) – Size of the input features.
output_size (int) – Size of the output features.
activation (str, optional) – Activation function. (default: 'linear')
dropout (float, optional) – Dropout probability. (default: 0.0)
bias (bool, optional) – If set to False, the layer will not learn an additive bias. (default: True)
 forward(x)[source]#
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
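As an illustrative sketch only (not the library's implementation), a fully-connected layer of this kind computes an affine map followed by an activation; the function below assumes a ReLU activation and, for simplicity, a precomputed dropout mask:

```python
import numpy as np

def dense(x, weight, bias=None, activation="relu", dropout_mask=None):
    """Illustrative fully-connected layer: y = act(x @ W^T + b)."""
    y = x @ weight.T
    if bias is not None:
        y = y + bias
    if activation == "relu":
        y = np.maximum(y, 0.0)
    # 'linear' would leave y unchanged; at training time dropout
    # multiplies by a random binary mask (here passed in explicitly)
    if dropout_mask is not None:
        y = y * dropout_mask
    return y

x = np.array([[1.0, -2.0]])          # (batch=1, input_size=2)
W = np.array([[1.0, 1.0],            # (output_size=2, input_size=2)
              [0.5, -0.5]])
b = np.zeros(2)
print(dense(x, W, b))                # relu([[-1., 1.5]]) -> [[0., 1.5]]
```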
 class GraphConv(input_size: int, output_size: int, bias: bool = True, asymmetric_norm: bool = True, root_weight: bool = True, activation='linear', cached: bool = False, **kwargs)[source]#
A simple graph convolutional operator where the message function is a simple linear projection and the aggregation is a simple average. In other terms:
\[\mathbf{X}^{\prime} = \mathbf{\hat{D}}^{-1} \mathbf{A} \mathbf{X} \boldsymbol{\Theta}\]
 Parameters:
input_size (int) – Size of the input features.
output_size (int) – Size of the output features.
add_self_loops (bool, optional) – If set to True, will add self-loops to the input graph. (default: False)
bias (bool, optional) – If set to False, the layer will not learn an additive bias. (default: True)
**kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.
 message(x_j: Tensor, edge_weight) → Tensor[source]#
Constructs messages from node \(j\) to node \(i\) in analogy to \(\phi_{\mathbf{\Theta}}\) for each edge in edge_index. This function can take any argument as input which was initially passed to propagate(). Furthermore, tensors passed to propagate() can be mapped to the respective nodes \(i\) and \(j\) by appending _i or _j to the variable name, e.g., x_i and x_j.
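The update rule \(\mathbf{X}^{\prime} = \mathbf{\hat{D}}^{-1} \mathbf{A} \mathbf{X} \boldsymbol{\Theta}\) can be sketched with dense matrices; the snippet below is an illustration of that formula only, not the layer's sparse message-passing implementation:

```python
import numpy as np

# Illustrative dense-matrix version of the operator above: each node
# averages its neighbours' features (D^{-1} A X) and then applies the
# shared linear projection Theta.
def graph_conv(adj, x, theta):
    deg = adj.sum(axis=1, keepdims=True)   # \hat{D}: number of neighbours
    deg[deg == 0] = 1.0                    # guard isolated nodes
    return (adj @ x) / deg @ theta         # D^{-1} A X Theta

# tiny graph: node 0 <-> nodes 1 and 2
adj = np.array([[0., 1., 1.],
                [1., 0., 0.],
                [1., 0., 0.]])
x = np.array([[1.], [2.], [4.]])           # one feature per node
theta = np.array([[2.]])                   # 1x1 projection
print(graph_conv(adj, x, theta))           # [[6.], [2.], [2.]]
```

Node 0 averages its two neighbours ((2 + 4) / 2 = 3) before the projection doubles it, matching the normalization by \(\mathbf{\hat{D}}^{-1}\).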
 class TemporalConv2d(input_channels, output_channels, kernel_size, dilation=1, stride=1, bias=True, padding=0, causal_pad=True, weight_norm=False, channel_last=False)[source]#
Learns a standard temporal convolutional filter.
 Parameters:
input_channels (int) – Number of input channels.
output_channels (int) – Number of output channels.
kernel_size (int) – Size of the convolution kernel.
dilation (int, optional) – Spacing between kernel elements.
stride (int, optional) – Stride of the convolution.
bias (bool, optional) – Whether to add a learnable bias to the output of the convolution.
padding (int or tuple, optional) – Padding of the input. Used only if causal_pad is False.
causal_pad (bool, optional) – Whether to pad the input so as to preserve causality.
weight_norm (bool, optional) – Whether to apply weight normalization to the parameters of the filter.
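The effect of causal_pad can be illustrated with a minimal 1D sketch: left-padding the sequence by (kernel_size - 1) * dilation keeps each output sample from depending on future inputs. The function below is an illustration of the idea, not the layer's implementation:

```python
import numpy as np

# Minimal 1D illustration of causal padding: left-pad by
# (kernel_size - 1) * dilation so output[t] only sees x[<= t].
def causal_conv1d(x, kernel, dilation=1):
    k = len(kernel)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])
    # kernel[j] multiplies the sample j * dilation steps in the past
    return np.array([
        sum(kernel[j] * xp[t + pad - j * dilation] for j in range(k))
        for t in range(len(x))
    ])

# kernel [1, 1]: each output is current sample + previous sample
print(causal_conv1d(np.array([1., 2., 3.]), np.array([1., 1.])))  # [1. 3. 5.]
```

Note that the output has the same length as the input, and the first output uses only the zero-padding and x[0], preserving causality.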
 class GatedTemporalConv2d(input_channels, output_channels, kernel_size, dilation=1, stride=1, bias=True, padding=0, causal_pad=True, weight_norm=False, channel_last=False)[source]#
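No description is given for this class. A common design for gated temporal convolutions (an assumption here, not taken from these docs) pairs two convolution branches through a WaveNet-style gated activation:

```python
import numpy as np

# Hypothetical gated activation (WaveNet-style): a "filter" branch passed
# through tanh and a "gate" branch passed through a sigmoid, multiplied
# element-wise. Whether GatedTemporalConv2d combines its two TemporalConv2d
# branches exactly this way is an assumption.
def gated(filter_out, gate_out):
    sigmoid = 1.0 / (1.0 + np.exp(-gate_out))
    return np.tanh(filter_out) * sigmoid

# gate near 0 suppresses the output; gate near 1 passes tanh(filter) through
print(gated(np.array([0.0, 10.0]), np.array([0.0, 10.0])))
```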
 class AttentionEncoder(embed_dim, qdim: Optional[int] = None, kdim: Optional[int] = None, vdim: Optional[int] = None, add_positional_encoding: bool = False, bias: bool = True, activation: Optional[str] = None)[source]#
 forward(query: Tensor, key: Optional[Tensor] = None, value: Optional[Tensor] = None)[source]#
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
 class MultiHeadAttention(embed_dim, heads, qdim: Optional[int] = None, kdim: Optional[int] = None, vdim: Optional[int] = None, axis='steps', dropout=0.0, bias=True, add_bias_kv=False, add_zero_attn=False, device=None, dtype=None, causal=False)[source]#
 forward(query: Tensor, key: Optional[Tensor] = None, value: Optional[Tensor] = None, key_padding_mask: Optional[Tensor] = None, need_weights: bool = True, attn_mask: Optional[Tensor] = None)[source]#
 Parameters:
query – map a query and a set of key-value pairs to an output. See “Attention Is All You Need” for more details.
key – map a query and a set of key-value pairs to an output. See “Attention Is All You Need” for more details.
value – map a query and a set of key-value pairs to an output. See “Attention Is All You Need” for more details.
key_padding_mask – if provided, specified padding elements in the key will be ignored by the attention. When given a binary mask and a value is True, the corresponding value on the attention layer will be ignored. When given a byte mask and a value is non-zero, the corresponding value on the attention layer will be ignored.
need_weights – output attn_output_weights.
attn_mask – 2D or 3D mask that prevents attention to certain positions. A 2D mask will be broadcasted for all the batches while a 3D mask allows to specify a different mask for the entries of each batch.
 Shapes for inputs:
query: \((L, N, E)\) where L is the target sequence length, N is the batch size, E is the embedding dimension. \((N, L, E)\) if batch_first is True.
key: \((S, N, E)\), where S is the source sequence length, N is the batch size, E is the embedding dimension. \((N, S, E)\) if batch_first is True.
value: \((S, N, E)\) where S is the source sequence length, N is the batch size, E is the embedding dimension. \((N, S, E)\) if batch_first is True.
key_padding_mask: \((N, S)\) where N is the batch size, S is the source sequence length. If a ByteTensor is provided, the non-zero positions will be ignored while the zero positions will be unchanged. If a BoolTensor is provided, the positions with the value of True will be ignored while the positions with the value of False will be unchanged.
attn_mask: if a 2D mask: \((L, S)\) where L is the target sequence length, S is the source sequence length. If a 3D mask: \((N\cdot\text{num\_heads}, L, S)\) where N is the batch size, L is the target sequence length, S is the source sequence length. attn_mask ensures that position i is allowed to attend the unmasked positions. If a ByteTensor is provided, the non-zero positions are not allowed to attend while the zero positions will be unchanged. If a BoolTensor is provided, positions with True are not allowed to attend while positions with False will be unchanged. If a FloatTensor is provided, it will be added to the attention weight.
 Shapes for outputs:
attn_output: \((L, N, E)\) where L is the target sequence length, N is the batch size, E is the embedding dimension. \((N, L, E)\) if batch_first is True.
attn_output_weights: \((N, L, S)\) where N is the batch size, L is the target sequence length, S is the source sequence length.
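To illustrate the BoolTensor convention for attn_mask described above, the sketch below builds a causal mask of shape \((L, S)\) in which True marks positions that are not allowed to be attended (a hypothetical helper, not part of the library):

```python
import numpy as np

# Hypothetical helper: boolean causal mask of shape (L, S). Per the
# BoolTensor convention above, True = "not allowed to attend", so
# position i may only attend positions <= i.
def causal_mask(L, S):
    return np.triu(np.ones((L, S), dtype=bool), k=1)

print(causal_mask(3, 3))
# [[False  True  True]
#  [False False  True]
#  [False False False]]
```

A FloatTensor variant would instead place -inf at the True positions and 0 elsewhere, since float masks are added to the attention weights.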
 class CausalLinearAttention(embed_dim, heads, qdim: Optional[int] = None, kdim: Optional[int] = None, vdim: Optional[int] = None, out_channels: Optional[int] = None, concat: bool = True, dim: int = 1)[source]#
 forward(query: Tensor, key: Optional[Tensor] = None, value: Optional[Tensor] = None)[source]#
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
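No description is given for this class. Causal linear attention is commonly computed (as in Katharopoulos et al., "Transformers are RNNs") with running sums over time through a positive feature map \(\phi\), giving \(O(L)\) cost instead of the \(O(L^2)\) of softmax attention. The sketch below is an illustration under that assumption, not this class's implementation:

```python
import numpy as np

# A simple positive feature map (illustrative; the linear-attention paper
# uses elu(x) + 1).
def phi(x):
    return np.maximum(x, 0.0) + 1.0

# Causal linear attention via running sums: at each step t, accumulate
# S = sum_s phi(k_s) v_s^T and z = sum_s phi(k_s) over the past, then
# out_t = phi(q_t)^T S / (phi(q_t)^T z). No L x L attention matrix is formed.
def causal_linear_attention(q, k, v):
    L, d = q.shape
    out = np.zeros_like(v)
    S = np.zeros((d, v.shape[1]))          # running sum of phi(k)^T v
    z = np.zeros(d)                        # running normalizer
    for t in range(L):
        qt, kt = phi(q[t]), phi(k[t])
        S += np.outer(kt, v[t])
        z += kt
        out[t] = (qt @ S) / (qt @ z)
    return out

# with identical queries and keys, each output is the running mean of v
q = k = np.ones((3, 2))
v = np.array([[1.], [3.], [5.]])
print(causal_linear_attention(q, k, v))    # [[1.], [2.], [3.]]
```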