Layers#
Graph Convolutions#
Simple Dense Graph Convolution performing X' = AXW + b. 

Dense implementation of the spatial diffusion of order K.

The Diffusion Convolution Layer from the paper "Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting" (Li et al., ICLR 2018).

Extension of GATConv for static graphs with multi-dimensional features.

Gated Graph Neural Network layer (with residual connections) inspired by Satorras et al., "Multivariate Time Series Forecasting with Latent Graph Inference", arXiv 2022.

Dense Adaptive Graph Conv operator from Bai et al., "Adaptive Graph Convolutional Recurrent Network for Traffic Forecasting", NeurIPS 2020.
 class DenseGraphConv(input_size, output_size, bias=True)[source]#
Simple Dense Graph Convolution performing X’ = AXW + b.
 Parameters:
input_size – Size of the input.
output_size – Output size.
bias – Whether to add a learnable bias.
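To make the operation concrete, here is a minimal from-scratch sketch of X' = AXW + b in plain PyTorch. The class name and tensor layout are illustrative assumptions, not the library implementation (which may differ, e.g., in supported layouts).

import torch
from torch import nn

class DenseGraphConvSketch(nn.Module):
    """Illustrative dense graph convolution: X' = A X W + b."""
    def __init__(self, input_size, output_size, bias=True):
        super().__init__()
        self.linear = nn.Linear(input_size, output_size, bias=bias)

    def forward(self, x, adj):
        # x: [batch, nodes, input_size], adj: [nodes, nodes] (dense)
        return self.linear(torch.matmul(adj, x))  # (A X) W + b

x = torch.randn(8, 20, 16)                   # 8 graphs, 20 nodes, 16 features
adj = torch.rand(20, 20)                     # dense adjacency matrix
out = DenseGraphConvSketch(16, 32)(x, adj)   # -> [8, 20, 32]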
 class DenseGraphConvOrderK(input_size, output_size, support_len=3, order=2, include_self=True, channel_last=False)[source]#
Dense implementation of the spatial diffusion of order K. Adapted from: https://github.com/nnzhan/GraphWaveNet
 Parameters:
input_size (int) – Size of the input.
output_size (int) – Size of the output.
support_len (int) – Number of reference operators.
order (int) – Order of the diffusion process.
include_self (bool) – Whether to include the central node or not.
channel_last (bool, optional) – Whether to use the layout “B S N C” as opposed to “B C N S”.
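A hedged sketch of the order-K diffusion this layer implements, in the style of GraphWaveNet's graph convolution: each support matrix is applied order times and the intermediate results are concatenated along the channel dimension (the final linear projection to output_size is omitted). Function name and layouts are assumptions for illustration.

import torch

def diffusion_order_k(x, supports, order=2, include_self=True):
    # x: [batch, nodes, channels]; each support: [nodes, nodes]
    out = [x] if include_self else []
    for a in supports:
        xk = x
        for _ in range(order):
            xk = torch.matmul(a, xk)   # one diffusion step
            out.append(xk)
    return torch.cat(out, dim=-1)      # channels * (support_len * order + self)

x = torch.randn(4, 10, 8)
supports = [torch.rand(10, 10) for _ in range(3)]   # support_len = 3
print(diffusion_order_k(x, supports).shape)         # [4, 10, 56]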
 class DiffConv(in_channels, out_channels, k, root_weight: bool = True, add_backward: bool = True, bias: bool = True)[source]#
The Diffusion Convolution Layer from the paper “Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting” (Li et al., ICLR 2018).
 Parameters:
in_channels (int) – Number of input features.
out_channels (int) – Number of output features.
k (int) – Filter size \(K\).
root_weight (bool) – If True, then add a filter also for the \(0\)-order neighborhood (i.e., the root node itself). (default: True)
add_backward (bool) – If True, then additional \(K\) filters are learnt for the transposed connectivity. (default: True)
bias (bool, optional) – If True, add a trainable additive bias. (default: True)
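As a rough sketch of the diffusion convolution, the layer sums K powers of the forward transition matrix applied to the input and, when add_backward=True, K more for the backward (transposed) direction. The sketch below assumes row-normalized adjacency as the transition matrix, as in the paper; all names are illustrative.

import torch

def diff_conv_sketch(x, adj, w_fwd, w_bwd=None):
    # x: [nodes, channels]; adj: [nodes, nodes]
    # w_fwd / w_bwd: lists of K weight matrices [channels, out_channels]
    out = 0
    p = adj / adj.sum(1, keepdim=True).clamp(min=1e-8)   # forward transition
    xk = x
    for w in w_fwd:                                      # k = 1 .. K
        xk = p @ xk
        out = out + xk @ w
    if w_bwd is not None:                                # add_backward=True
        p = adj.t() / adj.t().sum(1, keepdim=True).clamp(min=1e-8)
        xk = x
        for w in w_bwd:
            xk = p @ xk
            out = out + xk @ w
    return out

adj = torch.rand(10, 10)
x = torch.randn(10, 8)
k = 2
out = diff_conv_sketch(x, adj,
                       [torch.randn(8, 16) for _ in range(k)],
                       [torch.randn(8, 16) for _ in range(k)])   # [10, 16]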
 class MultiHeadGraphAttention(embed_dim, heads: int = 1, qdim: Optional[int] = None, kdim: Optional[int] = None, vdim: Optional[int] = None, edge_dim: Optional[int] = None, concat: bool = True, dropout: float = 0.0, bias: bool = True, root_weight: bool = True, **kwargs)[source]#

 forward(query: Tensor, key: Tensor, value: Tensor, edge_index: Union[Tensor, SparseTensor], edge_attr: Optional[Tensor] = None, return_attention_weights: Optional[bool] = False, return_attention_matrix: Optional[bool] = False)[source]#
Runs the forward pass of the module.
 message(q_i: Tensor, k_j: Optional[Tensor], v_j: Optional[Tensor], edge_attr: Optional[Tensor], index: Tensor, size_i: int) → Tensor[source]#
Constructs messages from node \(j\) to node \(i\) in analogy to \(\phi_{\mathbf{\Theta}}\) for each edge in edge_index. This function can take any argument as input which was initially passed to propagate(). Furthermore, tensors passed to propagate() can be mapped to the respective nodes \(i\) and \(j\) by appending _i or _j to the variable name, e.g., x_i and x_j.
 class GATConv(in_channels: Union[int, Tuple[int, int]], out_channels: int, heads: int = 1, concat: bool = True, dim: int = 2, negative_slope: float = 0.2, dropout: float = 0.0, add_self_loops: bool = True, edge_dim: Optional[int] = None, fill_value: Union[float, Tensor, str] = 'mean', bias: bool = True, **kwargs)[source]#
Extension of GATConv for static graphs with multi-dimensional features.
The graph attentional operator from the “Graph Attention Networks” paper
\[\mathbf{x}^{\prime}_i = \alpha_{i,i}\mathbf{\Theta}\mathbf{x}_{i} + \sum_{j \in \mathcal{N}(i)} \alpha_{i,j}\mathbf{\Theta}\mathbf{x}_{j},\]
where the attention coefficients \(\alpha_{i,j}\) are computed as
\[\alpha_{i,j} = \frac{ \exp\left(\mathrm{LeakyReLU}\left(\mathbf{a}^{\top} [\mathbf{\Theta}\mathbf{x}_i \, \Vert \, \mathbf{\Theta}\mathbf{x}_j] \right)\right)} {\sum_{k \in \mathcal{N}(i) \cup \{ i \}} \exp\left(\mathrm{LeakyReLU}\left(\mathbf{a}^{\top} [\mathbf{\Theta}\mathbf{x}_i \, \Vert \, \mathbf{\Theta}\mathbf{x}_k] \right)\right)}.\]
If the graph has multi-dimensional edge features \(\mathbf{e}_{i,j}\), the attention coefficients \(\alpha_{i,j}\) are computed as
\[\alpha_{i,j} = \frac{ \exp\left(\mathrm{LeakyReLU}\left(\mathbf{a}^{\top} [\mathbf{\Theta}\mathbf{x}_i \, \Vert \, \mathbf{\Theta}\mathbf{x}_j \, \Vert \, \mathbf{\Theta}_{e} \mathbf{e}_{i,j}]\right)\right)} {\sum_{k \in \mathcal{N}(i) \cup \{ i \}} \exp\left(\mathrm{LeakyReLU}\left(\mathbf{a}^{\top} [\mathbf{\Theta}\mathbf{x}_i \, \Vert \, \mathbf{\Theta}\mathbf{x}_k \, \Vert \, \mathbf{\Theta}_{e} \mathbf{e}_{i,k}]\right)\right)}.\]
 Parameters:
in_channels (int or tuple) – Size of each input sample, or -1 to derive the size from the first input(s) to the forward method. A tuple corresponds to the sizes of source and target dimensionalities.
out_channels (int) – Size of each output sample.
heads (int, optional) – Number of multi-head-attentions. (default: 1)
concat (bool, optional) – If set to True, the output dimension of each attention head is out_channels/heads and all heads’ outputs are concatenated, resulting in out_channels number of features. If set to False, the multi-head attentions are averaged instead of concatenated. (default: True)
dim (int) – The axis along which to propagate. (default: 2)
negative_slope (float, optional) – LeakyReLU angle of the negative slope. (default: 0.2)
dropout (float, optional) – Dropout probability of the normalized attention coefficients which exposes each node to a stochastically sampled neighborhood during training. (default: 0)
add_self_loops (bool, optional) – If set to False, will not add self-loops to the input graph. (default: True)
edge_dim (int, optional) – Edge feature dimensionality (in case there are any). (default: None)
fill_value (float or Tensor or str, optional) – The way to generate edge features of self-loops (in case edge_dim != None). If given as float or torch.Tensor, edge features of self-loops will be directly given by fill_value. If given as str, edge features of self-loops are computed by aggregating all features of edges that point to the specific node, according to a reduce operation ("add", "mean", "min", "max", "mul"). (default: "mean")
bias (bool, optional) – If set to False, the layer will not learn an additive bias. (default: True)
**kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.
 Shapes:
input – node features \((*, \mathcal{V}, *, F_{in})\) or \(((*, \mathcal{V}_s, *, F_s), (*, \mathcal{V}_t, *, F_t))\) if bipartite; edge indices \((2, \mathcal{E})\); edge features \((\mathcal{E}, D)\) (optional)
output – node features \((*, \mathcal{V}, *, F_{out})\), or \((*, \mathcal{V}_t, *, F_{out})\) if bipartite; attention_weights \(((2, \mathcal{E}), (\mathcal{E}, H))\) if need_weights is True, else None
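To unpack the attention formula, here is a from-scratch computation of single-head coefficients \(\alpha_{i,j}\) without edge features. It assumes edge_index already contains self-loops, so the softmax runs over \(\mathcal{N}(i) \cup \{i\}\); this is a didactic sketch, not the layer's implementation.

import torch
import torch.nn.functional as F

def gat_alpha(x, edge_index, theta, a, negative_slope=0.2):
    # x: [N, F_in]; theta: [F_in, F_out]; a: [2 * F_out]
    # edge_index: [2, E] with edges j -> i (self-loops included)
    h = x @ theta                                         # Θx
    src, dst = edge_index
    e = F.leaky_relu(torch.cat([h[dst], h[src]], -1) @ a, negative_slope)
    num = (e - e.max()).exp()                             # stabilized exp
    denom = torch.zeros(x.size(0)).index_add_(0, dst, num)
    return num / denom[dst]                               # α_{i,j}, one per edge

x = torch.randn(4, 8)
edge_index = torch.tensor([[1, 2, 3, 0, 1, 2, 3],
                           [0, 0, 1, 0, 1, 2, 3]])        # incl. self-loops
alpha = gat_alpha(x, edge_index, torch.randn(8, 6), torch.randn(12))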
 class GATLayer(d_model, n_heads, concat=False, dropout=0.1)[source]#
 forward(x, edge_index)[source]#
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
 class GRIL(input_size: int, hidden_size: int, exog_size: int = 0, n_layers: int = 1, n_nodes: Optional[int] = None, kernel_size: int = 2, decoder_order: int = 1, layer_norm: bool = False, dropout: float = 0.0)[source]#
 forward(x: Tensor, edge_index: LongTensor, edge_weight: Optional[Tensor] = None, mask: Optional[Tensor] = None, u: Optional[Tensor] = None, h: Optional[Union[List[Tensor], Tensor]] = None)[source]#
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
 class SpatioTemporalAtt(d_in, d_model, d_ff, n_heads, dropout, pool_size=1, pooling_op='mean')[source]#
 forward(x, **kwargs)[source]#
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
 class GatedGraphNetwork(input_size: int, output_size: int, activation: str = 'silu', parametrized_skip_conn: bool = False)[source]#
Gated Graph Neural Network layer (with residual connections) inspired by Satorras et al., “Multivariate Time Series Forecasting with Latent Graph Inference”, arXiv 2022.
 Parameters:
input_size (int) – Size of the input.
output_size (int) – Size of the output.
activation (str, optional) – Activation function. (default: 'silu')
parametrized_skip_conn (bool, optional) – Whether to use a learned linear transformation in the skip connection. (default: False)
 message(x_i, x_j)[source]#
Constructs messages from node \(j\) to node \(i\) in analogy to \(\phi_{\mathbf{\Theta}}\) for each edge in edge_index. This function can take any argument as input which was initially passed to propagate(). Furthermore, tensors passed to propagate() can be mapped to the respective nodes \(i\) and \(j\) by appending _i or _j to the variable name, e.g., x_i and x_j.
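Since the exact architecture is not spelled out here, the following is only a loosely hedged sketch of a gated message-passing layer with a residual connection, in the spirit of the description above; the module names, the specific gating, and the MLP shapes are all assumptions.

import torch
from torch import nn

class GatedGraphNetworkSketch(nn.Module):
    def __init__(self, size):
        super().__init__()
        self.msg_mlp = nn.Sequential(nn.Linear(2 * size, size), nn.SiLU())
        self.gate = nn.Sequential(nn.Linear(size, 1), nn.Sigmoid())
        self.upd_mlp = nn.Sequential(nn.Linear(2 * size, size), nn.SiLU())

    def forward(self, x, edge_index):
        src, dst = edge_index                                # edges j -> i
        m = self.msg_mlp(torch.cat([x[dst], x[src]], -1))    # message φ(x_i, x_j)
        m = self.gate(m) * m                                 # gate each message
        agg = torch.zeros_like(x).index_add_(0, dst, m)      # sum-aggregate
        return x + self.upd_mlp(torch.cat([x, agg], -1))     # residual update

x = torch.randn(10, 32)
edge_index = torch.randint(0, 10, (2, 40))
out = GatedGraphNetworkSketch(32)(x, edge_index)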
 class AdaptiveGraphConv(input_size, emb_size, output_size, num_nodes, bias=True)[source]#
Dense Adaptive Graph Conv operator from Bai et al., “Adaptive Graph Convolutional Recurrent Network for Traffic Forecasting”, NeurIPS 2020.
 Parameters:
input_size – Size of the input.
emb_size – Size of the input node embeddings.
output_size – Output size.
num_nodes – Number of nodes in the input graph.
bias – Whether to add a learnable bias.
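The key idea, sketched below, is that the adjacency is not given but inferred from learnable node embeddings \(\mathbf{E}\), as in AGCRN: \(A = \mathrm{softmax}(\mathrm{ReLU}(\mathbf{E}\mathbf{E}^\top))\). The paper additionally generates the convolution weights from \(\mathbf{E}\); that step is omitted here, and the layout is an assumption.

import torch
import torch.nn.functional as F

num_nodes, emb_size = 20, 10
emb = torch.nn.Parameter(torch.randn(num_nodes, emb_size))  # node embeddings E
adj = F.softmax(F.relu(emb @ emb.t()), dim=1)               # inferred adjacency
x = torch.randn(4, num_nodes, 16)                           # [batch, nodes, features]
out = torch.matmul(adj, x)                                  # aggregation with learned A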
Norm Layers#
Applies a normalization of the specified type. 

Applies layer normalization. 

Applies graph-wise instance normalization.

Applies graph-wise batch normalization.
 class Norm(norm_type, in_channels, **kwargs)[source]#
Applies a normalization of the specified type.
 Parameters:
norm_type – The type of normalization to apply.
in_channels (int) – Size of each input sample.
 class InstanceNorm(in_channels, eps=1e-05, affine=True)[source]#
Applies graphwise instance normalization.
 Parameters:
in_channels (int) – Size of each input sample.
eps (float, optional) – A value added to the denominator for numerical stability. (default: 1e-5)
affine (bool, optional) – If set to True, this module has learnable affine parameters \(\gamma\) and \(\beta\). (default: True)
 forward(x: Tensor) → Tensor[source]#
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
 class BatchNorm(in_channels, eps: float = 1e-05, momentum: float = 0.1, affine: bool = True, track_running_stats: bool = True)[source]#
Applies graphwise batch normalization.
 Parameters:
in_channels (int) – Size of each input sample.
eps (float, optional) – A value added to the denominator for numerical stability. (default: 1e-5)
affine (bool, optional) – If set to True, this module has learnable affine parameters \(\gamma\) and \(\beta\). (default: True)
track_running_stats (bool, optional) – Whether to track stats to perform batch norm. (default: True)
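A small sketch of what "graph-wise" batch normalization plausibly computes: per-channel statistics shared across the batch and node dimensions. This is an assumed reading of the layout; the library class additionally maintains running statistics when track_running_stats=True.

import torch

x = torch.randn(8, 20, 16)                     # [batch, nodes, channels]
mean = x.mean(dim=(0, 1), keepdim=True)        # statistics over batch and nodes
var = x.var(dim=(0, 1), unbiased=False, keepdim=True)
x_norm = (x - mean) / torch.sqrt(var + 1e-5)   # then scaled/shifted by γ, β if affine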
General Layers#
Output a pairwise score for each pair of input elements.

Implementation of the positional encoding from Vaswani et al. 2017. 

Call a generic function on the input. 

Concatenate tensors along dimension dim.

Apply 

Scales the gradient in backpropagation. 
 class LinkPredictor(emb_size, ff_size, hidden_size, dropout=0.0, activation='relu')[source]#
Output a pairwise score for each pair of input elements. Can be used as a building block for a graph learning model.
\[\mathbf{S} = \left(\text{MLP}_s(\mathbf{E})\right) \left(\text{MLP}_t(\mathbf{E})\right)^T\]
 Parameters:
emb_size – Size of the input embeddings.
ff_size – Size of the hidden layer used to learn the scores.
dropout – Dropout probability.
activation – Activation function used in the hidden layer.
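The scoring rule above is easy to reproduce from scratch. In this sketch both MLPs have one hidden layer of size ff_size and project to a common hidden_size before the outer product of scores; the sizes and the exact MLP depth are assumptions.

import torch
from torch import nn

emb_size, ff_size, hidden_size, n_nodes = 10, 32, 16, 20
mlp_s = nn.Sequential(nn.Linear(emb_size, ff_size), nn.ReLU(),
                      nn.Linear(ff_size, hidden_size))   # source MLP
mlp_t = nn.Sequential(nn.Linear(emb_size, ff_size), nn.ReLU(),
                      nn.Linear(ff_size, hidden_size))   # target MLP
e = torch.randn(n_nodes, emb_size)                       # node embeddings E
scores = mlp_s(e) @ mlp_t(e).t()                         # S: [n_nodes, n_nodes]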
 class PositionalEncoding(d_model, dropout=0.0, max_len=5000, affinity=False, batch_first=True)[source]#
Implementation of the positional encoding from Vaswani et al., 2017.
 forward(x)[source]#
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
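The encoding itself is the standard sinusoidal one from "Attention Is All You Need": PE(pos, 2i) = sin(pos / 10000^{2i/d_model}) and PE(pos, 2i+1) = cos(pos / 10000^{2i/d_model}). The sketch below writes out this formula only; the layer additionally handles dropout, batching, and the optional affinity parameter.

import math
import torch

def sinusoidal_encoding(max_len, d_model):
    pos = torch.arange(max_len).unsqueeze(1).float()
    div = torch.exp(torch.arange(0, d_model, 2).float()
                    * (-math.log(10000.0) / d_model))
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(pos * div)   # even channels
    pe[:, 1::2] = torch.cos(pos * div)   # odd channels
    return pe

pe = sinusoidal_encoding(5000, 64)       # [max_len, d_model]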
 class Lambda(function: Callable)[source]#
Call a generic function on the input.
 Parameters:
function (callable) – The function to call in forward(input).
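Functionally, Lambda is just a module wrapper around a callable, which makes arbitrary tensor ops usable inside nn.Sequential. A minimal equivalent sketch (the class name here is illustrative):

import torch
from torch import nn

class LambdaSketch(nn.Module):
    def __init__(self, function):
        super().__init__()
        self.function = function

    def forward(self, input):
        return self.function(input)      # forward(input) calls the function

model = nn.Sequential(nn.Linear(16, 32),
                      LambdaSketch(lambda t: t.clamp(min=0.0)))
out = model(torch.randn(8, 16))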
 class Concatenate(dim: int = 0)[source]#
Concatenate tensors along dimension dim. The tensors' dimensions are matched (i.e., broadcasted if necessary) before concatenation.
 Parameters:
dim (int) – The dimension to concatenate on. (default: 0)