Base

Dense

A simple fully-connected layer.

GraphConv

A simple graph convolutional operator where the message function is a simple linear projection and the aggregation a simple average.

TemporalConv2d

Learns a standard temporal convolutional filter.

GatedTemporalConv2d

Learns a temporal convolutional filter with a gating mechanism.

AttentionEncoder

Encoder layer that linearly projects queries, keys, and values to a shared embedding space, with optional positional encoding.

MultiHeadAttention

Multi-head attention layer, applicable along a configurable axis of the input and optionally causal.

CausalLinearAttention

Causal attention layer with cost linear in the sequence length.

class Dense(input_size, output_size, activation='linear', dropout=0.0, bias=True)

A simple fully-connected layer.

Parameters:
  • input_size (int) – Size of the input.

  • output_size (int) – Size of the output.

  • activation (str, optional) – Activation function.

  • dropout (float, optional) – Dropout rate.

  • bias (bool, optional) – Whether to use a bias.

forward(x)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
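
A minimal usage sketch (the import path is an assumption; adapt it to wherever this module lives in your installation):

>>> import torch
>>> from tsl.nn.layers import Dense  # assumed import path
>>> layer = Dense(input_size=16, output_size=32, activation='relu', dropout=0.1)
>>> x = torch.randn(8, 16)  # batch of 8 samples with 16 features
>>> layer(x).shape          # call the module itself so that hooks run
torch.Size([8, 32])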

class GraphConv(input_size: int, output_size: int, bias: bool = True, asymmetric_norm: bool = True, root_weight: bool = True, activation='linear', cached: bool = False, **kwargs)

A simple graph convolutional operator where the message function is a simple linear projection and the aggregation a simple average. In other terms:

\[\mathbf{X}^{\prime} = \mathbf{\hat{D}}^{-1} \mathbf{A} \mathbf{X} \boldsymbol{\Theta}\]
Parameters:
  • input_size (int) – Size of the input features.

  • output_size (int) – Size of the output features.

  • asymmetric_norm (bool, optional) – If set to True, the adjacency matrix is normalized asymmetrically as \(\mathbf{\hat{D}}^{-1} \mathbf{A}\), consistently with the formula above. (default: True)

  • root_weight (bool, optional) – If set to True, the root node features are transformed by a separate learnable weight and added to the output. (default: True)

  • activation (str, optional) – Activation function. (default: 'linear')

  • cached (bool, optional) – If set to True, the normalized adjacency matrix is cached for reuse in subsequent calls. (default: False)

  • bias (bool, optional) – If set to False, the layer will not learn an additive bias. (default: True)

  • **kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.

reset_parameters()

Resets all learnable parameters of the module.

message(x_j: Tensor, edge_weight) → Tensor

Constructs messages from node \(j\) to node \(i\) in analogy to \(\phi_{\mathbf{\Theta}}\) for each edge in edge_index. This function can take any argument as input which was initially passed to propagate(). Furthermore, tensors passed to propagate() can be mapped to the respective nodes \(i\) and \(j\) by appending _i or _j to the variable name, e.g. x_i and x_j.
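
A toy-graph usage sketch, assuming the usual MessagePassing calling convention of x and edge_index (the message signature above suggests an optional edge_weight is also accepted; import path assumed):

>>> import torch
>>> from tsl.nn.layers import GraphConv  # assumed import path
>>> conv = GraphConv(input_size=16, output_size=32)
>>> x = torch.randn(4, 16)                 # 4 nodes with 16 features
>>> edge_index = torch.tensor([[0, 1, 2],  # directed edges 0→1, 1→2, 2→3
...                            [1, 2, 3]])
>>> conv(x, edge_index).shape
torch.Size([4, 32])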

class TemporalConv2d(input_channels, output_channels, kernel_size, dilation=1, stride=1, bias=True, padding=0, causal_pad=True, weight_norm=False, channel_last=False)

Learns a standard temporal convolutional filter.

Parameters:
  • input_channels (int) – Input size.

  • output_channels (int) – Output size.

  • kernel_size (int) – Size of the convolution kernel.

  • dilation (int, optional) – Spacing between kernel elements.

  • stride (int, optional) – Stride of the convolution.

  • bias (bool, optional) – Whether to add a learnable bias to the output of the convolution.

  • padding (int or tuple, optional) – Padding of the input. Used only if causal_pad is False.

  • causal_pad (bool, optional) – Whether to pad the input so as to preserve causality.

  • weight_norm (bool, optional) – Whether to apply weight normalization to the parameters of the filter.

  • channel_last (bool, optional) – If set to True, the channel dimension is expected to be the last one of the input.
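
A usage sketch under the assumption that, with channel_last=False, the input layout is (batch, channels, nodes, steps) and the convolution runs along the trailing time axis; with causal_pad=True the number of steps should be preserved (verify both assumptions against the implementation):

>>> import torch
>>> from tsl.nn.layers import TemporalConv2d  # assumed import path
>>> conv = TemporalConv2d(input_channels=16, output_channels=32,
...                       kernel_size=3, causal_pad=True)
>>> x = torch.randn(8, 16, 20, 12)  # assumed (batch, channels, nodes, steps)
>>> out = conv(x)                   # expected shape: (8, 32, 20, 12)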

class GatedTemporalConv2d(input_channels, output_channels, kernel_size, dilation=1, stride=1, bias=True, padding=0, causal_pad=True, weight_norm=False, channel_last=False)

Learns a temporal convolutional filter with a gating mechanism; it accepts the same arguments as TemporalConv2d.
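
Since the constructor mirrors TemporalConv2d, a drop-in usage sketch looks the same; the gating is internal to the layer (import path assumed):

>>> from tsl.nn.layers import GatedTemporalConv2d  # assumed import path
>>> gated = GatedTemporalConv2d(input_channels=16, output_channels=32,
...                             kernel_size=3, causal_pad=True)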

class AttentionEncoder(embed_dim, qdim: Optional[int] = None, kdim: Optional[int] = None, vdim: Optional[int] = None, add_positional_encoding: bool = False, bias: bool = True, activation: Optional[str] = None)

Encoder layer that linearly projects queries, keys, and values to a shared embedding space, optionally adding a positional encoding.

forward(query: Tensor, key: Optional[Tensor] = None, value: Optional[Tensor] = None)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
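
A sketch assuming a step-first (steps, batch, features) layout and that, as the Optional defaults suggest, an omitted key and value fall back to the query; the exact return value depends on the implementation:

>>> import torch
>>> from tsl.nn.layers import AttentionEncoder  # assumed import path
>>> enc = AttentionEncoder(embed_dim=32, qdim=16, kdim=16, vdim=16)
>>> query = torch.randn(12, 8, 16)  # assumed (steps, batch, qdim)
>>> out = enc(query)                # key and value default to the query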

class MultiHeadAttention(embed_dim, heads, qdim: Optional[int] = None, kdim: Optional[int] = None, vdim: Optional[int] = None, axis='steps', dropout=0.0, bias=True, add_bias_kv=False, add_zero_attn=False, device=None, dtype=None, causal=False)

Multi-head attention layer that can be applied along a configurable axis of the input (e.g., the time steps) and optionally made causal.

forward(query: Tensor, key: Optional[Tensor] = None, value: Optional[Tensor] = None, key_padding_mask: Optional[Tensor] = None, need_weights: bool = True, attn_mask: Optional[Tensor] = None)
Parameters:
  • query – Query embeddings; attention maps a query and a set of key-value pairs to an output. See “Attention Is All You Need” for more details.

  • key – Key embeddings.

  • value – Value embeddings.

  • key_padding_mask – If provided, specified padding elements in the key will be ignored by the attention. When given a binary mask with True values, the corresponding values on the attention layer will be ignored. When given a byte mask with non-zero values, the corresponding values on the attention layer will be ignored.

  • need_weights – Whether to also return attn_output_weights.

  • attn_mask – 2D or 3D mask that prevents attention to certain positions. A 2D mask will be broadcast across all batches, while a 3D mask allows specifying a different mask for each entry in the batch.

Shapes for inputs:
  • query: \((L, N, E)\) where L is the target sequence length, N is the batch size, E is the embedding dimension. \((N, L, E)\) if batch_first is True.

  • key: \((S, N, E)\), where S is the source sequence length, N is the batch size, E is the embedding dimension. \((N, S, E)\) if batch_first is True.

  • value: \((S, N, E)\) where S is the source sequence length, N is the batch size, E is the embedding dimension. \((N, S, E)\) if batch_first is True.

  • key_padding_mask: \((N, S)\) where N is the batch size, S is the source sequence length. If a ByteTensor is provided, the non-zero positions will be ignored while the zero positions will be unchanged. If a BoolTensor is provided, the positions with the value of True will be ignored while the positions with the value of False will be unchanged.

  • attn_mask: if a 2D mask: \((L, S)\) where L is the target sequence length, S is the source sequence length.

    If a 3D mask: \((N\cdot\text{num\_heads}, L, S)\) where N is the batch size, L is the target sequence length, S is the source sequence length. attn_mask ensures that position \(i\) is allowed to attend only to the unmasked positions. If a ByteTensor is provided, the non-zero positions are not allowed to attend while the zero positions will be unchanged. If a BoolTensor is provided, positions with True are not allowed to attend, while False values will be unchanged. If a FloatTensor is provided, it will be added to the attention weight.

Shapes for outputs:
  • attn_output: \((L, N, E)\) where L is the target sequence length, N is the batch size, E is the embedding dimension. \((N, L, E)\) if batch_first is True.

  • attn_output_weights: \((N, L, S)\) where N is the batch size, L is the target sequence length, S is the source sequence length.
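
A self-attention sketch following the \((L, N, E)\) convention above (no batch_first flag appears in this constructor, so the step-first layout is assumed, as is the import path):

>>> import torch
>>> from tsl.nn.layers import MultiHeadAttention  # assumed import path
>>> attn = MultiHeadAttention(embed_dim=32, heads=4, axis='steps', causal=True)
>>> query = torch.randn(12, 8, 32)  # L=12 steps, N=8, E=32
>>> out, weights = attn(query)      # key and value default to the query
>>> out.shape, weights.shape        # as documented in the shapes above
(torch.Size([12, 8, 32]), torch.Size([8, 12, 12]))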

class CausalLinearAttention(embed_dim, heads, qdim: Optional[int] = None, kdim: Optional[int] = None, vdim: Optional[int] = None, out_channels: Optional[int] = None, concat: bool = True, dim: int = 1)

Causal attention layer whose computational cost grows linearly with the sequence length.

forward(query: Tensor, key: Optional[Tensor] = None, value: Optional[Tensor] = None)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
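
For context, causal linear attention (in the spirit of Katharopoulos et al., 2020, “Transformers are RNNs”) replaces the softmax with a kernel feature map, so attention can be computed at a cost linear in the sequence length while still masking the future. A minimal usage sketch (import path and layout assumed):

>>> import torch
>>> from tsl.nn.layers import CausalLinearAttention  # assumed import path
>>> attn = CausalLinearAttention(embed_dim=32, heads=4)
>>> query = torch.randn(12, 8, 32)  # assumed (steps, batch, features)
>>> out = attn(query)  # output at step t depends only on steps <= t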