Layers#

Graph Convolutions#

DenseGraphConv

Simple Dense Graph Convolution performing X' = AXW + b.

DenseGraphConvOrderK

Dense implementation of the spatial diffusion of order K.

DiffConv

The Diffusion Convolution Layer from the paper "Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting" (Li et al., ICLR 2018).

MultiHeadGraphAttention

GATConv

Extension of GATConv for static graphs with multidimensional features.

GATLayer

GRIL

SpatioTemporalAtt

GatedGraphNetwork

Gated Graph Neural Network layer (with residual connections), inspired by Satorras et al., "Multivariate Time Series Forecasting with Latent Graph Inference", arXiv 2022.

AdaptiveGraphConv

Dense Adaptive Graph Conv operator from Bai et al. "Adaptive Graph Convolutional Recurrent Network for Traffic Forecasting", NeurIPS 2020.

class DenseGraphConv(input_size, output_size, bias=True)[source]#

Simple Dense Graph Convolution performing X’ = AXW + b.

Parameters:
  • input_size – Size of the input.

  • output_size – Output size.

  • bias – Whether to add a learnable bias.
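
Example (a minimal usage sketch; the import path and the forward(x, adj) call pattern are assumptions, not documented above):

    import torch
    from tsl.nn.layers import DenseGraphConv  # assumed import path

    num_nodes, input_size, output_size = 10, 16, 32
    x = torch.randn(4, num_nodes, input_size)   # node features [batch, nodes, channels]
    adj = torch.rand(num_nodes, num_nodes)      # dense adjacency matrix A

    conv = DenseGraphConv(input_size, output_size, bias=True)
    out = conv(x, adj)                          # X' = A X W + b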

class DenseGraphConvOrderK(input_size, output_size, support_len=3, order=2, include_self=True, channel_last=False)[source]#

Dense implementation of the spatial diffusion of order K. Adapted from: https://github.com/nnzhan/Graph-WaveNet

Parameters:
  • input_size (int) – Size of the input.

  • output_size (int) – Size of the output.

  • support_len (int) – Number of reference operators.

  • order (int) – Order of the diffusion process.

  • include_self (bool) – Whether to include the central node or not.

  • channel_last (bool, optional) – Whether to use the layout “B S N C” as opposed to “B C N S”. (default: False)
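
Example (a minimal sketch; the forward(x, support) call and the use of a list of dense adjacency matrices as support are assumptions inferred from the parameters above):

    import torch
    from tsl.nn.layers import DenseGraphConvOrderK  # assumed import path

    batch, steps, num_nodes, channels = 4, 12, 10, 16
    x = torch.randn(batch, steps, num_nodes, channels)              # "B S N C" layout
    support = [torch.rand(num_nodes, num_nodes) for _ in range(2)]  # support_len=2 operators

    conv = DenseGraphConvOrderK(input_size=channels, output_size=32,
                                support_len=2, order=2, channel_last=True)
    out = conv(x, support)   # diffusion up to order 2 over both supports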

class DiffConv(in_channels, out_channels, k, root_weight: bool = True, add_backward: bool = True, bias: bool = True)[source]#

The Diffusion Convolution Layer from the paper “Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting” (Li et al., ICLR 2018).

Parameters:
  • in_channels (int) – Number of input features.

  • out_channels (int) – Number of output features.

  • k (int) – Filter size \(K\).

  • root_weight (bool) – If True, then add a filter also for the \(0\)-order neighborhood (i.e., the root node itself). (default True)

  • add_backward (bool) – If True, then additional \(K\) filters are learnt for the transposed connectivity. (default True)

  • bias (bool, optional) – If True, add a trainable additive bias. (default: True)
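
Example (a minimal sketch; the import path, the forward(x, edge_index, edge_weight) call, and the batched node-feature layout are assumptions, not documented above):

    import torch
    from tsl.nn.layers import DiffConv  # assumed import path

    num_nodes = 5
    x = torch.randn(8, num_nodes, 16)          # node features [batch, nodes, features]
    edge_index = torch.tensor([[0, 1, 2, 3],   # source nodes
                               [1, 2, 3, 4]])  # target nodes
    edge_weight = torch.rand(edge_index.size(1))

    conv = DiffConv(in_channels=16, out_channels=32, k=2,
                    root_weight=True, add_backward=True)
    out = conv(x, edge_index, edge_weight)     # diffusion over forward and backward transitions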

static compute_support_index(edge_index: Union[Tensor, SparseTensor], edge_weight: Optional[Tensor] = None, num_nodes: Optional[int] = None, add_backward: bool = True) → List[source]#

Normalize the connectivity weights and (optionally) add normalized backward weights.

reset_parameters()[source]#

Resets all learnable parameters of the module.

class MultiHeadGraphAttention(embed_dim, heads: int = 1, qdim: Optional[int] = None, kdim: Optional[int] = None, vdim: Optional[int] = None, edge_dim: Optional[int] = None, concat: bool = True, dropout: float = 0.0, bias: bool = True, root_weight: bool = True, **kwargs)[source]#
reset_parameters()[source]#

Resets all learnable parameters of the module.

forward(query: Tensor, key: Tensor, value: Tensor, edge_index: Union[Tensor, SparseTensor], edge_attr: Optional[Tensor] = None, return_attention_weights: Optional[bool] = False, return_attention_matrix: Optional[bool] = False)[source]#

Runs the forward pass of the module.

message(q_i: Tensor, k_j: Optional[Tensor], v_j: Optional[Tensor], edge_attr: Optional[Tensor], index: Tensor, size_i: int) → Tensor[source]#

Constructs messages from node \(j\) to node \(i\) in analogy to \(\phi_{\mathbf{\Theta}}\) for each edge in edge_index. This function can take any argument as input which was initially passed to propagate(). Furthermore, tensors passed to propagate() can be mapped to the respective nodes \(i\) and \(j\) by appending _i or _j to the variable name, e.g., x_i and x_j.
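
Example (a minimal sketch based on the forward() signature above; the import path and the per-node feature layout are assumptions):

    import torch
    from tsl.nn.layers import MultiHeadGraphAttention  # assumed import path

    num_nodes, embed_dim = 6, 32
    q = torch.randn(num_nodes, embed_dim)              # query features, one row per node
    k = torch.randn(num_nodes, embed_dim)              # key features
    v = torch.randn(num_nodes, embed_dim)              # value features
    edge_index = torch.tensor([[0, 1, 2, 3, 4],
                               [1, 2, 3, 4, 5]])

    attn = MultiHeadGraphAttention(embed_dim=embed_dim, heads=4)
    out = attn(q, k, v, edge_index)   # attention restricted to the given edges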

class GATConv(in_channels: Union[int, Tuple[int, int]], out_channels: int, heads: int = 1, concat: bool = True, dim: int = -2, negative_slope: float = 0.2, dropout: float = 0.0, add_self_loops: bool = True, edge_dim: Optional[int] = None, fill_value: Union[float, Tensor, str] = 'mean', bias: bool = True, **kwargs)[source]#

Extension of GATConv for static graphs with multidimensional features.

The graph attentional operator from the “Graph Attention Networks” paper

\[\mathbf{x}^{\prime}_i = \alpha_{i,i}\mathbf{\Theta}\mathbf{x}_{i} + \sum_{j \in \mathcal{N}(i)} \alpha_{i,j}\mathbf{\Theta}\mathbf{x}_{j},\]

where the attention coefficients \(\alpha_{i,j}\) are computed as

\[\alpha_{i,j} = \frac{ \exp\left(\mathrm{LeakyReLU}\left(\mathbf{a}^{\top} [\mathbf{\Theta}\mathbf{x}_i \, \Vert \, \mathbf{\Theta}\mathbf{x}_j] \right)\right)} {\sum_{k \in \mathcal{N}(i) \cup \{ i \}} \exp\left(\mathrm{LeakyReLU}\left(\mathbf{a}^{\top} [\mathbf{\Theta}\mathbf{x}_i \, \Vert \, \mathbf{\Theta}\mathbf{x}_k] \right)\right)}.\]

If the graph has multi-dimensional edge features \(\mathbf{e}_{i,j}\), the attention coefficients \(\alpha_{i,j}\) are computed as

\[\alpha_{i,j} = \frac{ \exp\left(\mathrm{LeakyReLU}\left(\mathbf{a}^{\top} [\mathbf{\Theta}\mathbf{x}_i \, \Vert \, \mathbf{\Theta}\mathbf{x}_j \, \Vert \, \mathbf{\Theta}_{e} \mathbf{e}_{i,j}]\right)\right)} {\sum_{k \in \mathcal{N}(i) \cup \{ i \}} \exp\left(\mathrm{LeakyReLU}\left(\mathbf{a}^{\top} [\mathbf{\Theta}\mathbf{x}_i \, \Vert \, \mathbf{\Theta}\mathbf{x}_k \, \Vert \, \mathbf{\Theta}_{e} \mathbf{e}_{i,k}]\right)\right)}.\]
Parameters:
  • in_channels (int or tuple) – Size of each input sample, or -1 to derive the size from the first input(s) to the forward method. A tuple corresponds to the sizes of source and target dimensionalities.

  • out_channels (int) – Size of each output sample.

  • heads (int, optional) – Number of multi-head-attentions. (default: 1)

  • concat (bool, optional) – If set to True, the output dimension of each attention head is out_channels/heads and all heads’ outputs are concatenated, resulting in out_channels number of features. If set to False, the multi-head attentions are averaged instead of concatenated. (default: True)

  • dim (int) – The axis along which to propagate. (default: -2)

  • negative_slope (float, optional) – LeakyReLU angle of the negative slope. (default: 0.2)

  • dropout (float, optional) – Dropout probability of the normalized attention coefficients which exposes each node to a stochastically sampled neighborhood during training. (default: 0)

  • add_self_loops (bool, optional) – If set to False, will not add self-loops to the input graph. (default: True)

  • edge_dim (int, optional) – Edge feature dimensionality (in case there are any). (default: None)

  • fill_value (float or Tensor or str, optional) – The way to generate edge features of self-loops (in case edge_dim != None). If given as float or torch.Tensor, edge features of self-loops will be directly given by fill_value. If given as str, edge features of self-loops are computed by aggregating all features of edges that point to the specific node, according to a reduce operation. ("add", "mean", "min", "max", "mul"). (default: "mean")

  • bias (bool, optional) – If set to False, the layer will not learn an additive bias. (default: True)

  • **kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.

Shapes:
  • input – node features \((*, |\mathcal{V}|, *, F_{in})\) or \(((*, |\mathcal{V}_s|, *, F_s), (*, |\mathcal{V}_t|, *, F_t))\) if bipartite, edge indices \((2, |\mathcal{E}|)\), edge features \((|\mathcal{E}|, D)\) (optional)

  • output – node features \((*, |\mathcal{V}|, *, F_{out})\) or \((*, |\mathcal{V}_t|, *, F_{out})\) if bipartite, attention weights \(((2, |\mathcal{E}|), (|\mathcal{E}|, H))\) if need_weights is True, else None
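
Example (a minimal sketch; the forward(x, edge_index) call mirrors the torch_geometric GATConv interface and is an assumption here, as is the import path):

    import torch
    from tsl.nn.layers import GATConv  # assumed import path

    batch, steps, num_nodes, in_channels = 2, 12, 5, 16
    x = torch.randn(batch, steps, num_nodes, in_channels)   # node axis at dim=-2
    edge_index = torch.tensor([[0, 1, 2, 3],
                               [1, 2, 3, 4]])

    conv = GATConv(in_channels, out_channels=32, heads=4, dim=-2)
    out = conv(x, edge_index)   # 4 heads of size 32/4, concatenated to 32 features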

reset_parameters()[source]#

Resets all learnable parameters of the module.

class GATLayer(d_model, n_heads, concat=False, dropout=0.1)[source]#
forward(x, edge_index)[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class GRIL(input_size: int, hidden_size: int, exog_size: int = 0, n_layers: int = 1, n_nodes: Optional[int] = None, kernel_size: int = 2, decoder_order: int = 1, layer_norm: bool = False, dropout: float = 0.0)[source]#
forward(x: Tensor, edge_index: LongTensor, edge_weight: Optional[Tensor] = None, mask: Optional[Tensor] = None, u: Optional[Tensor] = None, h: Optional[Union[List[Tensor], Tensor]] = None)[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class SpatioTemporalAtt(d_in, d_model, d_ff, n_heads, dropout, pool_size=1, pooling_op='mean')[source]#
forward(x, **kwargs)[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class GatedGraphNetwork(input_size: int, output_size: int, activation: str = 'silu', parametrized_skip_conn: bool = False)[source]#

Gated Graph Neural Network layer (with residual connections), inspired by Satorras et al., “Multivariate Time Series Forecasting with Latent Graph Inference”, arXiv 2022.

Parameters:
  • input_size (int) – Input channels.

  • output_size (int) – Output channels.

  • activation (str, optional) – Activation function. (default: 'silu')

  • parametrized_skip_conn (bool, optional) – Whether to add a linear layer in the residual connection even if input and output dimensions match. (default: False)
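
Example (a minimal sketch; the import path and the forward(x, edge_index) call pattern are assumptions based on the message() interface below):

    import torch
    from tsl.nn.layers import GatedGraphNetwork  # assumed import path

    num_nodes = 7
    x = torch.randn(4, num_nodes, 32)             # node features [batch, nodes, channels]
    edge_index = torch.tensor([[0, 1, 2, 3, 4, 5],
                               [1, 2, 3, 4, 5, 6]])

    layer = GatedGraphNetwork(input_size=32, output_size=32, activation='silu')
    out = layer(x, edge_index)                    # residual connection applied internally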

message(x_i, x_j)[source]#

Constructs messages from node \(j\) to node \(i\) in analogy to \(\phi_{\mathbf{\Theta}}\) for each edge in edge_index. This function can take any argument as input which was initially passed to propagate(). Furthermore, tensors passed to propagate() can be mapped to the respective nodes \(i\) and \(j\) by appending _i or _j to the variable name, e.g., x_i and x_j.

class AdaptiveGraphConv(input_size, emb_size, output_size, num_nodes, bias=True)[source]#

Dense Adaptive Graph Conv operator from Bai et al. “Adaptive Graph Convolutional Recurrent Network for Traffic Forecasting”, NeurIPS 2020.

Parameters:
  • input_size – Size of the input.

  • emb_size – Size of the input node embeddings.

  • output_size – Output size.

  • num_nodes – Number of nodes in the input graph.

  • bias – Whether to add a learnable bias.
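
Example (a sketch only; the import path and, in particular, the forward(x, e) call with externally provided node embeddings are assumptions inferred from the parameters above, not documented here):

    import torch
    from tsl.nn.layers import AdaptiveGraphConv  # assumed import path

    num_nodes, input_size, emb_size = 10, 16, 8
    x = torch.randn(4, num_nodes, input_size)    # node features [batch, nodes, channels]
    e = torch.randn(num_nodes, emb_size)         # node embeddings (e.g., an nn.Parameter)

    conv = AdaptiveGraphConv(input_size, emb_size, output_size=32, num_nodes=num_nodes)
    out = conv(x, e)   # graph support inferred from the node embeddings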

Norm Layers#

Norm

Applies a normalization of the specified type.

LayerNorm

Applies layer normalization.

InstanceNorm

Applies graph-wise instance normalization.

BatchNorm

Applies graph-wise batch normalization.

class Norm(norm_type, in_channels, **kwargs)[source]#

Applies a normalization of the specified type.

Parameters:
  • norm_type – The type of normalization to apply.

  • in_channels (int) – Size of each input sample.

class LayerNorm(in_channels, eps=1e-05, affine=True)[source]#

Applies layer normalization.

Parameters:
  • in_channels (int) – Size of each input sample.

  • eps (float, optional) – A value added to the denominator for numerical stability. (default: 1e-5)

  • affine (bool, optional) – If set to True, this module has learnable affine parameters \(\gamma\) and \(\beta\). (default: True)
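
Example (a minimal sketch; the import path, the forward(x) call, and normalization over the trailing channel dimension are assumptions):

    import torch
    from tsl.nn.layers import LayerNorm  # assumed import path

    x = torch.randn(8, 12, 10, 64)        # e.g., [batch, steps, nodes, channels]
    norm = LayerNorm(in_channels=64)
    out = norm(x)                         # normalized over the 64 channels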

class InstanceNorm(in_channels, eps=1e-05, affine=True)[source]#

Applies graph-wise instance normalization.

Parameters:
  • in_channels (int) – Size of each input sample.

  • eps (float, optional) – A value added to the denominator for numerical stability. (default: 1e-5)

  • affine (bool, optional) – If set to True, this module has learnable affine parameters \(\gamma\) and \(\beta\). (default: True)

forward(x: Tensor) → Tensor[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class BatchNorm(in_channels, eps: float = 1e-05, momentum: float = 0.1, affine: bool = True, track_running_stats: bool = True)[source]#

Applies graph-wise batch normalization.

Parameters:
  • in_channels (int) – Size of each input sample.

  • eps (float, optional) – A value added to the denominator for numerical stability. (default: 1e-5)

  • momentum (float, optional) – Running stats momentum. (default: 0.1)

  • affine (bool, optional) – If set to True, this module has learnable affine parameters \(\gamma\) and \(\beta\). (default: True)

  • track_running_stats (bool, optional) – Whether to track stats to perform batch norm. (default: True)

General Layers#

LinkPredictor

Output a pairwise score for each couple of input elements.

PositionalEncoding

Implementation of the positional encoding from Vaswani et al. 2017.

Lambda

Call a generic function on the input.

Concatenate

Concatenate tensors along dimension dim.

Select

Apply select() to select one element from a Tensor along a dimension.

GradNorm

Scales the gradient in back-propagation.

class LinkPredictor(emb_size, ff_size, hidden_size, dropout=0.0, activation='relu')[source]#

Output a pairwise score for each couple of input elements. Can be used as a building block for a graph learning model.

\[\mathbf{S} = \left(\text{MLP}_s(\mathbf{E})\right) \left(\text{MLP}_t(\mathbf{E})\right)^T\]
Parameters:
  • emb_size – Size of the input embeddings.

  • ff_size – Size of the hidden layer used to learn the scores.

  • dropout – Dropout probability.

  • activation – Activation function used in the hidden layer.
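
Example (a minimal sketch; the import path and the forward(emb) call returning a square score matrix are assumptions based on the formula above):

    import torch
    from tsl.nn.layers import LinkPredictor  # assumed import path

    num_nodes, emb_size = 20, 32
    emb = torch.randn(num_nodes, emb_size)    # one embedding vector per node

    predictor = LinkPredictor(emb_size=emb_size, ff_size=64, hidden_size=32)
    scores = predictor(emb)                   # pairwise scores S, shape [num_nodes, num_nodes]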

class PositionalEncoding(d_model, dropout=0.0, max_len=5000, affinity=False, batch_first=True)[source]#

Implementation of the positional encoding from Vaswani et al., 2017.
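
Example (a minimal sketch; the import path and the [batch, steps, d_model] input layout with batch_first=True are assumptions):

    import torch
    from tsl.nn.layers import PositionalEncoding  # assumed import path

    x = torch.randn(8, 24, 64)                    # [batch, steps, d_model]
    pos = PositionalEncoding(d_model=64, dropout=0.1)
    out = pos(x)                                  # input plus sinusoidal positional encodings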

forward(x)[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class Lambda(function: Callable)[source]#

Call a generic function on the input.

Parameters:

function (callable) – The function to call in forward(input).

forward(input: Tensor) → Tensor[source]#

Returns self.function(input).
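
Example (a minimal sketch; only the import path is an assumption):

    import torch
    from tsl.nn.layers import Lambda  # assumed import path

    relu_sum = Lambda(lambda x: torch.relu(x).sum(dim=-1))
    out = relu_sum(torch.randn(4, 10))   # same as torch.relu(x).sum(dim=-1)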

class Concatenate(dim: int = 0)[source]#

Concatenate tensors along dimension dim.

The tensors' dimensions are matched (i.e., broadcast if necessary) before concatenation.

Parameters:

dim (int) – The dimension to concatenate on. (default: 0)

forward(tensors: Union[Tuple[Tensor, ...], List[Tensor]]) → Tensor[source]#

Returns expand_then_cat() on input tensors.
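
Example (a minimal sketch; the import path and the illustrative tensor layout are assumptions):

    import torch
    from tsl.nn.layers import Concatenate  # assumed import path

    cat = Concatenate(dim=-1)
    x = torch.randn(8, 12, 10, 32)   # [batch, steps, nodes, channels]
    u = torch.randn(8, 12, 1, 4)     # exogenous features, broadcast over the node dimension
    out = cat([x, u])                # shape [8, 12, 10, 36]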

class Select(dim: int, index: int)[source]#

Apply select() to select one element from a Tensor along a dimension.

This layer returns a view of the original tensor with the given dimension removed.

Parameters:
  • dim (int) – The dimension to slice.

  • index (int) – The index to select with.

forward(tensor: Tensor) → Tensor[source]#

Returns select() on input tensor.
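
Example (a minimal sketch; only the import path and the illustrative tensor layout are assumptions):

    import torch
    from tsl.nn.layers import Select  # assumed import path

    last_step = Select(dim=1, index=-1)
    x = torch.randn(8, 12, 10, 32)   # [batch, steps, nodes, channels]
    out = last_step(x)               # view with the step dimension removed: [8, 10, 32]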

class GradNorm(*args, **kwargs)[source]#

Scales the gradient in back-propagation. The forward pass is an identity operation.