Layers#

This module contains all the neural layers available in tsl.

Graph Convolutional Layers#

The subpackage tsl.nn.layers.graph_convs contains the graph convolutional layers.

GraphConv

A simple graph convolutional operator where the message function is a linear projection and the aggregation is a simple average.

DenseGraphConv

A dense graph convolution performing \(\mathbf{X}^{\prime} = \mathbf{\tilde{A}} \mathbf{X} \boldsymbol{\Theta} + \mathbf{b}\).

DenseGraphConvOrderK

Dense implementation of the spatial diffusion convolution of order \(K\).

DiffConv

The Diffusion Convolution Layer from the paper "Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting" (Li et al., ICLR 2018).

DiffusionConv

Alias for DiffConv.

GraphPolyVAR

Polynomial spatiotemporal graph filter from the paper "Forecasting time series with VARMA recursions on graphs." (Isufi et al., IEEE Transactions on Signal Processing 2019).

MultiHeadGraphAttention

The multi-head attention from the paper Attention Is All You Need (Vaswani et al., NeurIPS 2017) for graph-structured data.

GATConv

Extension of GATConv for static graphs with multidimensional features.

GatedGraphNetwork

Gated Graph Neural Network layer (with residual connections) inspired by the FC-GNN model from the paper "Multivariate Time Series Forecasting with Latent Graph Inference" (Satorras et al., 2022).

AdaptiveGraphConv

The Dense Adaptive Graph Convolution operator from the paper "Adaptive Graph Convolutional Recurrent Network for Traffic Forecasting" (Bai et al., NeurIPS 2020).

SpatiotemporalCrossAttention

HierarchicalSpatiotemporalCrossAttention

class GraphConv(input_size: int, output_size: int, bias: bool = True, norm: str = 'mean', root_weight: bool = True, activation: Optional[str] = None, cached: bool = False)[source]#

A simple graph convolutional operator where the message function is a linear projection and the aggregation is a simple average. In other terms:

\[\mathbf{X}^{\prime} = \mathbf{\hat{D}}^{-1} \mathbf{\tilde{A}} \mathbf{X} \boldsymbol{\Theta} + \mathbf{b} .\]
Parameters:
  • input_size (int) – Size of the input features.

  • output_size (int) – Size of the output features.

  • bias (bool) – If False, then the layer will not learn an additive bias vector. (default: True)

  • norm (str) – The normalization used for edges and edge weights. If 'mean', then edge weights are normalized as \(a_{j \rightarrow i} = \frac{a_{j \rightarrow i}}{deg_{i}}\). Other available options are: 'gcn', 'asym' and 'none'. (default: 'mean')

  • root_weight (bool) – If True, then add a linear layer for the root node itself (a skip connection). (default True)

  • activation (str, optional) – Activation function to be used, None for identity function (i.e., no activation). (default: None)

  • cached (bool) – If True, then cache the normalized edge weights computed in the first call. (default False)

reset_parameters()[source]#

Resets all learnable parameters of the module.
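
Example (a minimal usage sketch; the forward call conv(x, edge_index, edge_weight=None) and the [batch, time, nodes, features] layout are assumptions, not documented above, and the import path follows the module description):

>>> import torch
>>> from tsl.nn.layers import GraphConv
>>> conv = GraphConv(input_size=16, output_size=32)
>>> x = torch.randn(8, 12, 20, 16)              # [batch, time, nodes, features]
>>> edge_index = torch.randint(0, 20, (2, 64))  # random connectivity over 20 nodes
>>> out = conv(x, edge_index)                   # feature dimension mapped to output_size (32)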

class DenseGraphConv(input_size, output_size, bias=True)[source]#

A dense graph convolution performing \(\mathbf{X}^{\prime} = \mathbf{\tilde{A}} \mathbf{X} \boldsymbol{\Theta} + \mathbf{b}\).

Parameters:
  • input_size – Size of the input.

  • output_size – Output size.

  • bias – Whether to add a learnable bias.

class DenseGraphConvOrderK(input_size, output_size, support_len=3, order=2, include_self=True, channel_last=False)[source]#

Dense implementation of the spatial diffusion convolution of order \(K\).

Parameters:
  • input_size (int) – Size of the input.

  • output_size (int) – Size of the output.

  • support_len (int) – Number of reference operators.

  • order (int) – Order of the diffusion process.

  • include_self (bool) – Whether to include the central node or not.

  • channel_last (bool, optional) – Whether to use the pattern “b t n f” as opposed to “b f n t”.

class DiffConv(in_channels: int, out_channels: int, k: int, root_weight: bool = True, add_backward: bool = True, bias: bool = True, activation: Optional[str] = None)[source]#

The Diffusion Convolution Layer from the paper “Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting” (Li et al., ICLR 2018).

Parameters:
  • in_channels (int) – Number of input features.

  • out_channels (int) – Number of output features.

  • k (int) – Filter size \(K\).

  • root_weight (bool) – If True, then add a filter also for the \(0\)-order neighborhood (i.e., the root node itself). (default True)

  • add_backward (bool) – If True, then additional \(K\) filters are learnt for the transposed connectivity. (default True)

  • bias (bool, optional) – If True, add a trainable additive bias. (default: True)

  • activation (str, optional) – Activation function to be used, None for identity function (i.e., no activation). (default: None)

static compute_support_index(edge_index: Union[Tensor, SparseTensor], edge_weight: Optional[Tensor] = None, num_nodes: Optional[int] = None, add_backward: bool = True) → List[source]#

Normalize the connectivity weights and (optionally) add normalized backward weights.

reset_parameters()[source]#

Resets all learnable parameters of the module.
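
Example (a sketch under the same assumptions as for GraphConv above: x shaped [batch, time, nodes, features] and a forward call taking edge_index with optional edge_weight):

>>> import torch
>>> from tsl.nn.layers import DiffConv
>>> conv = DiffConv(in_channels=16, out_channels=32, k=2)
>>> x = torch.randn(8, 12, 20, 16)              # [batch, time, nodes, features]
>>> edge_index = torch.randint(0, 20, (2, 64))
>>> edge_weight = torch.rand(64)
>>> out = conv(x, edge_index, edge_weight)      # feature dimension mapped to out_channels (32)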

class DiffusionConv(in_channels: int, out_channels: int, k: int, root_weight: bool = True, add_backward: bool = True, bias: bool = True, activation: Optional[str] = None)[source]#

Alias for DiffConv.

class GraphPolyVAR(temporal_order: int, spatial_order: int, norm: str = 'none', cached: bool = False)[source]#

Polynomial spatiotemporal graph filter from the paper “Forecasting time series with VARMA recursions on graphs.” (Isufi et al., IEEE Transactions on Signal Processing 2019).

\[\mathbf{X}_t = \sum_{p=1}^{P} \sum_{l=1}^{L} \Theta_{p,l} \cdot \mathbf{\tilde{A}}^{l-1} \mathbf{X}_{t-p}\]
where
  • \(\mathbf{\tilde{A}}\) is a graph shift operator (GSO);

  • \(\Theta \in \mathbb{R}^{P \times L}\) are the filter coefficients accounting for up to \(L\)-hop neighbors and \(P\) time steps in the past.

Parameters:
  • temporal_order (int) – The filter temporal order \(P\).

  • spatial_order (int) – The filter spatial order \(L\).

  • norm (str) – The normalization used for edges and edge weights. The available options are: 'gcn', 'asym' and 'none'. (default: 'none')

  • cached (bool) – If True, then cache the normalized edge weights computed in the first call. (default False)

reset_parameters()[source]#

Resets all learnable parameters of the module.

class MultiHeadGraphAttention(embed_dim: int, num_heads: int = 1, qdim: Optional[int] = None, kdim: Optional[int] = None, vdim: Optional[int] = None, edge_dim: Optional[int] = None, concat: bool = True, dropout: float = 0.0, root_weight: bool = True, bias: bool = True, **kwargs)[source]#

The multi-head attention from the paper Attention Is All You Need (Vaswani et al., NeurIPS 2017) for graph-structured data.

Parameters:
  • embed_dim (int) – Size of the embedding dimension.

  • num_heads (int) – Number of attention heads. (default: 1)

  • qdim (int, optional) – Number of features of the query. If None, then defaults to embed_dim. (default: None)

  • kdim (int, optional) – Number of features of the key. If None, then defaults to embed_dim. (default: None)

  • vdim (int, optional) – Number of features of the value. If None, then defaults to embed_dim. (default: None)

  • edge_dim (int, optional) – Number of edge features (None if there are no edge features). (default: None)

  • concat (bool) – If True, then the heads’ outputs are concatenated along the feature dimension, and the dimension of each head’s output is embed_dim / num_heads. Note that the total number of features in output is embed_dim in both cases. (default: True)

  • dropout (float, optional) – The dropout rate. (default: 0)

  • root_weight (bool) – If True, then add a skip connection from the input with a linear transformation. (default True)

  • bias (bool, optional) – If True, then add a bias vector in output. (default: True)

  • **kwargs – keyword arguments for the super(MessagePassing) call.

reset_parameters()[source]#

Resets all learnable parameters of the module.

class GATConv(in_channels: Union[int, Tuple[int, int]], out_channels: int, heads: int = 1, concat: bool = True, dim: int = -2, negative_slope: float = 0.2, dropout: float = 0.0, add_self_loops: bool = True, edge_dim: Optional[int] = None, fill_value: Union[float, Tensor, str] = 'mean', bias: bool = True, **kwargs)[source]#

Extension of GATConv for static graphs with multidimensional features.

The graph attentional operator from the “Graph Attention Networks” paper

\[\mathbf{x}^{\prime}_i = \alpha_{i,i}\mathbf{\Theta}\mathbf{x}_{i} + \sum_{j \in \mathcal{N}(i)} \alpha_{i,j}\mathbf{\Theta}\mathbf{x}_{j},\]

where the attention coefficients \(\alpha_{i,j}\) are computed as

\[\alpha_{i,j} = \frac{ \exp\left(\mathrm{LeakyReLU}\left(\mathbf{a}^{\top} [\mathbf{\Theta}\mathbf{x}_i \, \Vert \, \mathbf{\Theta}\mathbf{x}_j] \right)\right)} {\sum_{k \in \mathcal{N}(i) \cup \{ i \}} \exp\left(\mathrm{LeakyReLU}\left(\mathbf{a}^{\top} [\mathbf{\Theta}\mathbf{x}_i \, \Vert \, \mathbf{\Theta}\mathbf{x}_k] \right)\right)}.\]

If the graph has multi-dimensional edge features \(\mathbf{e}_{i,j}\), the attention coefficients \(\alpha_{i,j}\) are computed as

\[\alpha_{i,j} = \frac{ \exp\left(\mathrm{LeakyReLU}\left(\mathbf{a}^{\top} [\mathbf{\Theta}\mathbf{x}_i \, \Vert \, \mathbf{\Theta}\mathbf{x}_j \, \Vert \, \mathbf{\Theta}_{e} \mathbf{e}_{i,j}]\right)\right)} {\sum_{k \in \mathcal{N}(i) \cup \{ i \}} \exp\left(\mathrm{LeakyReLU}\left(\mathbf{a}^{\top} [\mathbf{\Theta}\mathbf{x}_i \, \Vert \, \mathbf{\Theta}\mathbf{x}_k \, \Vert \, \mathbf{\Theta}_{e} \mathbf{e}_{i,k}]\right)\right)}.\]
Parameters:
  • in_channels (int or tuple) – Size of each input sample, or -1 to derive the size from the first input(s) to the forward method. A tuple corresponds to the sizes of source and target dimensionalities.

  • out_channels (int) – Size of each output sample.

  • heads (int, optional) – Number of multi-head-attentions. (default: 1)

  • concat (bool, optional) – If set to True, the output dimension of each attention head is out_channels/heads and all heads’ output are concatenated, resulting in out_channels number of features. If set to False, the multi-head attentions are averaged instead of concatenated. (default: True)

  • dim (int) – The axis along which to propagate. (default: -2)

  • negative_slope (float, optional) – LeakyReLU angle of the negative slope. (default: 0.2)

  • dropout (float, optional) – Dropout probability of the normalized attention coefficients which exposes each node to a stochastically sampled neighborhood during training. (default: 0)

  • add_self_loops (bool, optional) – If set to False, will not add self-loops to the input graph. (default: True)

  • edge_dim (int, optional) – Edge feature dimensionality (in case there are any). (default: None)

  • fill_value (float or Tensor or str, optional) – The way to generate edge features of self-loops (in case edge_dim != None). If given as float or torch.Tensor, edge features of self-loops will be directly given by fill_value. If given as str, edge features of self-loops are computed by aggregating all features of edges that point to the specific node, according to a reduce operation. ("add", "mean", "min", "max", "mul"). (default: "mean")

  • bias (bool, optional) – If set to False, the layer will not learn an additive bias. (default: True)

  • **kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.

Shapes:
  • input: node features \((*, |\mathcal{V}|, *, F_{in})\) or \(((*, |\mathcal{V_s}|, *, F_s), (*, |\mathcal{V_t}|, *, F_t))\) if bipartite, edge indices \((2, |\mathcal{E}|)\), edge features \((|\mathcal{E}|, D)\) (optional)

  • output: node features \((*, |\mathcal{V}|, *, F_{out})\), or \((*, |\mathcal{V}_t|, *, F_{out})\) if bipartite; attention weights \(((2, |\mathcal{E}|), (|\mathcal{E}|, H))\) if need_weights is True, else None

reset_parameters()[source]#

Resets all learnable parameters of the module.
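
Example (a sketch based on the shapes listed above, with the node axis at dim=-2; the exact [batch, time, nodes, features] layout is an assumption):

>>> import torch
>>> from tsl.nn.layers import GATConv
>>> conv = GATConv(in_channels=16, out_channels=8, heads=4, dim=-2)
>>> x = torch.randn(8, 12, 20, 16)              # nodes on dimension -2
>>> edge_index = torch.randint(0, 20, (2, 64))
>>> out = conv(x, edge_index)                   # node features with F_out = out_channels (8)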

class GatedGraphNetwork(input_size: int, output_size: int, activation: str = 'silu', parametrized_skip_conn: bool = False)[source]#

Gated Graph Neural Network layer (with residual connections) inspired by the FC-GNN model from the paper “Multivariate Time Series Forecasting with Latent Graph Inference” (Satorras et al., 2022).

Parameters:
  • input_size (int) – Input channels.

  • output_size (int) – Output channels.

  • activation (str, optional) – Activation function.

  • parametrized_skip_conn (bool, optional) – If True, then add a linear layer in the residual connection even if input and output dimensions match. (default: False)
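
Example (a minimal sketch, assuming the layer is called on node features together with an edge_index and that no edge weights are required):

>>> import torch
>>> from tsl.nn.layers import GatedGraphNetwork
>>> layer = GatedGraphNetwork(input_size=32, output_size=32)
>>> x = torch.randn(8, 12, 20, 32)              # [batch, time, nodes, features]
>>> edge_index = torch.randint(0, 20, (2, 64))
>>> out = layer(x, edge_index)                  # identity residual connection, since sizes match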

class AdaptiveGraphConv(input_size: int, emb_size: int, output_size: int, num_nodes: int, bias: bool = True)[source]#

The Dense Adaptive Graph Convolution operator from the paper “Adaptive Graph Convolutional Recurrent Network for Traffic Forecasting” (Bai et al., NeurIPS 2020).

Parameters:
  • input_size – Size of the input.

  • emb_size – Size of the input node embeddings.

  • output_size – Output size.

  • num_nodes – Number of nodes in the input graph.

  • bias – Whether to add a learnable bias.

class SpatiotemporalCrossAttention(input_size: Union[int, Tuple[int, int]], output_size: int, msg_size: Optional[int] = None, msg_layers: int = 1, root_weight: bool = True, reweigh: Optional[str] = None, temporal_self_attention: bool = True, mask_temporal: bool = True, mask_spatial: bool = True, norm: bool = True, dropout: float = 0.0, **kwargs)[source]#
reset_parameters()[source]#

Resets all learnable parameters of the module.

forward(x: Tuple[Tensor, Optional[Tensor]], edge_index: Union[Tensor, SparseTensor], edge_weight: Optional[Tensor] = None, mask: Optional[Tensor] = None)[source]#

Runs the forward pass of the module.

message(x_i: Tensor, x_j: Tensor, edge_weight: Optional[Tensor], mask_j: Optional[Tensor]) → Tensor[source]#

Constructs messages from node \(j\) to node \(i\) in analogy to \(\phi_{\mathbf{\Theta}}\) for each edge in edge_index. This function can take any argument as input which was initially passed to propagate(). Furthermore, tensors passed to propagate() can be mapped to the respective nodes \(i\) and \(j\) by appending _i or _j to the variable name, e.g., x_i and x_j.

class HierarchicalSpatiotemporalCrossAttention(h_size: int, z_size: int, msg_size: Optional[int] = None, msg_layers: int = 1, root_weight: bool = True, reweigh: Optional[str] = None, update_z_cross: bool = True, mask_temporal: bool = True, mask_spatial: bool = True, norm: bool = True, dropout: float = 0.0, aggr: str = 'add', **kwargs)[source]#
reset_parameters()[source]#

Resets all learnable parameters of the module.

forward(h: Tensor, z: Tensor, edge_index: Union[Tensor, SparseTensor], mask: Optional[Tensor] = None)[source]#

Runs the forward pass of the module.

message(h_i: Tensor, h_j: Tensor, z_i: Tensor, z_j: Tensor, index, size_i, mask_j: Optional[Tensor]) → Tensor[source]#

Constructs messages from node \(j\) to node \(i\) in analogy to \(\phi_{\mathbf{\Theta}}\) for each edge in edge_index. This function can take any argument as input which was initially passed to propagate(). Furthermore, tensors passed to propagate() can be mapped to the respective nodes \(i\) and \(j\) by appending _i or _j to the variable name, e.g., x_i and x_j.

Recurrent Layers#

The subpackage tsl.nn.layers.recurrent contains the cells used in encoders that process the input sequence in a recurrent fashion.

Base classes#

RNNCellBase

Base class for implementing recurrent neural networks (RNN) cells.

GRUCellBase

Base class for implementing gated recurrent unit (GRU) cells.

LSTMCellBase

Base class for implementing long short-term memory (LSTM) cells.

GraphGRUCellBase

Base class for implementing graph-based gated recurrent unit (GRU) cells.

GraphLSTMCellBase

Base class for implementing graph-based long short-term memory (LSTM) cells.

Implemented cells#

GRUCell

A gated recurrent unit (GRU) cell

LSTMCell

A long short-term memory (LSTM) cell.

GraphConvGRUCell

Gated Recurrent Unit with GraphConv as graph convolution in the gates, based on the paper "Structured Sequence Modeling with Graph Convolutional Recurrent Networks" (Seo et al., ICONIP 2017).

GraphConvLSTMCell

LSTM with GraphConv as graph convolution in the gates, based on the paper "Structured Sequence Modeling with Graph Convolutional Recurrent Networks" (Seo et al., ICONIP 2017).

DCRNNCell

The Diffusion Convolutional Recurrent cell from the paper "Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting" (Li et al., ICLR 2018).

DenseDCRNNCell

Dense implementation of the Diffusion Convolutional Recurrent cell from the paper "Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting" (Li et al., ICLR 2018).

AGCRNCell

The Adaptive Graph Convolutional cell from the paper "Adaptive Graph Convolutional Recurrent Network for Traffic Forecasting" (Bai et al., NeurIPS 2020).

EvolveGCNOCell

EvolveGCNO cell from the paper "EvolveGCN: Evolving Graph Convolutional Networks for Dynamic Graphs" (Pareja et al., AAAI 2020).

EvolveGCNHCell

EvolveGCNH cell from the paper "EvolveGCN: Evolving Graph Convolutional Networks for Dynamic Graphs" (Pareja et al., AAAI 2020).

GRINCell

The Graph Recurrent Imputation cell with Diffusion Convolution from the paper "Filling the G_ap_s: Multivariate Time Series Imputation by Graph Neural Networks" (Cini et al., ICLR 2022).

class RNNCellBase[source]#

Base class for implementing recurrent neural networks (RNN) cells.

class GRUCellBase(hidden_size: int, forget_gate: Module, update_gate: Module, candidate_gate: Module)[source]#

Base class for implementing gated recurrent unit (GRU) cells.

class LSTMCellBase(hidden_size: int, input_gate: Module, forget_gate: Module, cell_gate: Module, output_gate: Module)[source]#

Base class for implementing long short-term memory (LSTM) cells.

class GraphGRUCellBase(hidden_size: int, forget_gate: Module, update_gate: Module, candidate_gate: Module)[source]#

Base class for implementing graph-based gated recurrent unit (GRU) cells.

class GraphLSTMCellBase(hidden_size: int, input_gate: Module, forget_gate: Module, cell_gate: Module, output_gate: Module)[source]#

Base class for implementing graph-based long short-term memory (LSTM) cells.

class GRUCell(input_size: int, hidden_size: int, bias: bool = True, device=None, dtype=None)[source]#

A gated recurrent unit (GRU) cell

\[\begin{split}\begin{array}{ll} r = \sigma(W_{ir} x + b_{ir} + W_{hr} h + b_{hr}) \\ z = \sigma(W_{iz} x + b_{iz} + W_{hz} h + b_{hz}) \\ n = \tanh(W_{in} x + b_{in} + r * (W_{hn} h + b_{hn})) \\ h' = (1 - z) * n + z * h \end{array}\end{split}\]

where \(\sigma\) is the sigmoid function, and \(*\) is the Hadamard product.

Parameters:
  • input_size – The number of expected features in the input x

  • hidden_size – The number of features in the hidden state h

  • bias – If False, then the layer does not use bias weights b_ih and b_hh. Default: True

Inputs: input, hidden
  • input : tensor containing input features

  • hidden : tensor containing the initial hidden state for each element in the batch. Defaults to zero if not provided.

Outputs: h’
  • h’ : tensor containing the next hidden state for each element in the batch

Shape:
  • input: \((N, H_{in})\) or \((H_{in})\) tensor containing input features, where \(H_{in}\) = input_size.

  • hidden: \((N, H_{out})\) or \((H_{out})\) tensor containing the initial hidden state, where \(H_{out}\) = hidden_size. Defaults to zero if not provided.

  • output: \((N, H_{out})\) or \((H_{out})\) tensor containing the next hidden state.

weight_ih#

the learnable input-hidden weights, of shape (3*hidden_size, input_size)

Type:

torch.Tensor

weight_hh#

the learnable hidden-hidden weights, of shape (3*hidden_size, hidden_size)

Type:

torch.Tensor

bias_ih#

the learnable input-hidden bias, of shape (3*hidden_size)

bias_hh#

the learnable hidden-hidden bias, of shape (3*hidden_size)

Note

All the weights and biases are initialized from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{1}{\text{hidden\_size}}\)

On certain ROCm devices, when using float16 inputs this module will use different precision for backward.

Examples:

>>> import torch
>>> from torch import nn
>>> rnn = nn.GRUCell(10, 20)
>>> input = torch.randn(6, 3, 10)
>>> hx = torch.randn(3, 20)
>>> output = []
>>> for i in range(6):
...     hx = rnn(input[i], hx)
...     output.append(hx)
class LSTMCell(input_size: int, hidden_size: int, bias: bool = True, device=None, dtype=None)[source]#

A long short-term memory (LSTM) cell.

\[\begin{split}\begin{array}{ll} i = \sigma(W_{ii} x + b_{ii} + W_{hi} h + b_{hi}) \\ f = \sigma(W_{if} x + b_{if} + W_{hf} h + b_{hf}) \\ g = \tanh(W_{ig} x + b_{ig} + W_{hg} h + b_{hg}) \\ o = \sigma(W_{io} x + b_{io} + W_{ho} h + b_{ho}) \\ c' = f * c + i * g \\ h' = o * \tanh(c') \\ \end{array}\end{split}\]

where \(\sigma\) is the sigmoid function, and \(*\) is the Hadamard product.

Parameters:
  • input_size – The number of expected features in the input x

  • hidden_size – The number of features in the hidden state h

  • bias – If False, then the layer does not use bias weights b_ih and b_hh. Default: True

Inputs: input, (h_0, c_0)
  • input of shape (batch, input_size) or (input_size): tensor containing input features

  • h_0 of shape (batch, hidden_size) or (hidden_size): tensor containing the initial hidden state

  • c_0 of shape (batch, hidden_size) or (hidden_size): tensor containing the initial cell state

    If (h_0, c_0) is not provided, both h_0 and c_0 default to zero.

Outputs: (h_1, c_1)
  • h_1 of shape (batch, hidden_size) or (hidden_size): tensor containing the next hidden state

  • c_1 of shape (batch, hidden_size) or (hidden_size): tensor containing the next cell state

weight_ih#

the learnable input-hidden weights, of shape (4*hidden_size, input_size)

Type:

torch.Tensor

weight_hh#

the learnable hidden-hidden weights, of shape (4*hidden_size, hidden_size)

Type:

torch.Tensor

bias_ih#

the learnable input-hidden bias, of shape (4*hidden_size)

bias_hh#

the learnable hidden-hidden bias, of shape (4*hidden_size)

Note

All the weights and biases are initialized from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{1}{\text{hidden\_size}}\)

On certain ROCm devices, when using float16 inputs this module will use different precision for backward.

Examples:

>>> import torch
>>> from torch import nn
>>> rnn = nn.LSTMCell(10, 20) # (input_size, hidden_size)
>>> input = torch.randn(2, 3, 10) # (time_steps, batch, input_size)
>>> hx = torch.randn(3, 20) # (batch, hidden_size)
>>> cx = torch.randn(3, 20)
>>> output = []
>>> for i in range(input.size()[0]):
...     hx, cx = rnn(input[i], (hx, cx))
...     output.append(hx)
>>> output = torch.stack(output, dim=0)
class GraphConvGRUCell(input_size: int, hidden_size: int, bias: bool = True, norm: str = 'mean', root_weight: bool = True, cached: bool = False, **kwargs)[source]#

Gated Recurrent Unit with GraphConv as graph convolution in the gates, based on the paper “Structured Sequence Modeling with Graph Convolutional Recurrent Networks” (Seo et al., ICONIP 2017).

Parameters:
  • input_size (int) – Size of the input.

  • hidden_size (int) – Number of units in the hidden state.

  • bias (bool) – If True, then the layer will learn an additive bias for each gate. (default: True)

  • norm (str) – The normalization used for edges and edge weights. If 'mean', then edge weights are normalized as \(a_{j \rightarrow i} = \frac{a_{j \rightarrow i}}{deg_{i}}\). Other available options are: 'gcn', 'asym' and 'none'. (default: 'mean')

  • root_weight (bool) – If True, then add a filter (with different weights) for the root node itself. (default True)

  • cached (bool) – If True, then cache the normalized edge weights computed in the first call. (default False)

  • **kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.
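
Example (a sketch of the recurrent loop; the argument order cell(x_t, h, edge_index) and the zero-initialized hidden state are assumptions about the cell interface):

>>> import torch
>>> from tsl.nn.layers import GraphConvGRUCell
>>> cell = GraphConvGRUCell(input_size=16, hidden_size=32)
>>> x = torch.randn(8, 12, 20, 16)              # [batch, time, nodes, features]
>>> edge_index = torch.randint(0, 20, (2, 64))
>>> h = torch.zeros(8, 20, 32)                  # one hidden vector per node
>>> for t in range(12):
...     h = cell(x[:, t], h, edge_index)        # hidden state updated step by step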

class GraphConvLSTMCell(input_size: int, hidden_size: int, bias: bool = True, norm: str = 'mean', root_weight: bool = True, cached: bool = False, **kwargs)[source]#

LSTM with GraphConv as graph convolution in the gates, based on the paper “Structured Sequence Modeling with Graph Convolutional Recurrent Networks” (Seo et al., ICONIP 2017).

Parameters:
  • input_size (int) – Size of the input.

  • hidden_size (int) – Number of units in the hidden state.

  • bias (bool) – If True, then the layer will learn an additive bias for each gate. (default: True)

  • norm (str) – Normalization used by the graph convolutional layer. (default: 'mean')

  • root_weight (bool) – If True, then add a filter (with different weights) for the root node itself. (default True)

  • cached (bool) – If True, then cache the normalized edge weights computed in the first call. (default False)

  • **kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.

class DCRNNCell(input_size: int, hidden_size: int, k: int = 2, root_weight: bool = True, add_backward: bool = True, bias: bool = True)[source]#

The Diffusion Convolutional Recurrent cell from the paper “Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting” (Li et al., ICLR 2018).

Parameters:
  • input_size – Size of the input.

  • hidden_size – Number of units in the hidden state.

  • k – Size of the diffusion kernel.

  • root_weight – Whether to learn a separate transformation for the central node.

class DenseDCRNNCell(input_size: int, hidden_size: int, k: int = 2, root_weight: bool = False)[source]#

Dense implementation of the Diffusion Convolutional Recurrent cell from the paper “Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting” (Li et al., ICLR 2018).

In this implementation, the adjacency matrix is dense and the convolution is performed with matrix multiplication.

Parameters:
  • input_size – Size of the input.

  • hidden_size – Number of units in the hidden state.

  • k – Size of the diffusion kernel.

  • root_weight (bool) – Whether to learn a separate transformation for the central node.

class AGCRNCell(input_size: int, emb_size: int, hidden_size: int, num_nodes: int, bias: bool = True)[source]#

The Adaptive Graph Convolutional cell from the paper “Adaptive Graph Convolutional Recurrent Network for Traffic Forecasting” (Bai et al., NeurIPS 2020).

Parameters:
  • input_size – Size of the input.

  • emb_size – Size of the input node embeddings.

  • hidden_size – Output size.

  • num_nodes – Number of nodes in the input graph.

class EvolveGCNOCell(in_size, out_size, norm, activation='relu', root_weight=False, bias=True, cached=False)[source]#

EvolveGCNO cell from the paper “EvolveGCN: Evolving Graph Convolutional Networks for Dynamic Graphs” (Pareja et al., AAAI 2020).

This variant of the model simply updates the weights of the graph convolution.

Parameters:
  • in_size (int) – Size of the input.

  • out_size (int) – Number of units in the hidden state.

  • norm (str) – Method used to normalize the adjacency matrix.

  • activation (str) – Activation function after the GCN layer.

  • root_weight (bool) – Whether to add a parametrized skip connection.

  • bias (bool) – Whether to learn a bias.

  • cached (bool) – Whether to cache normalized edge_weights.

reset_parameters()[source]#

Resets all learnable parameters of the module.

class EvolveGCNHCell(in_size, out_size, norm, activation='relu', root_weight=False, bias=True, cached=False)[source]#

EvolveGCNH cell from the paper “EvolveGCN: Evolving Graph Convolutional Networks for Dynamic Graphs” (Pareja et al., AAAI 2020).

This variant of the model adapts the weights of the graph convolution by looking at node features.

Parameters:
  • in_size (int) – Size of the input.

  • out_size (int) – Number of units in the hidden state.

  • norm (str) – Method used to normalize the adjacency matrix.

  • activation (str) – Activation function after the GCN layer.

  • root_weight (bool) – Whether to add a parametrized skip connection.

  • bias (bool) – Whether to learn a bias.

  • cached (bool) – Whether to cache normalized edge_weights.

reset_parameters()[source]#

Resets all learnable parameters of the module.

class GRINCell(input_size: int, hidden_size: int, exog_size: int = 0, n_layers: int = 1, n_nodes: Optional[int] = None, kernel_size: int = 2, decoder_order: int = 1, layer_norm: bool = False, dropout: float = 0.0)[source]#

The Graph Recurrent Imputation cell with Diffusion Convolution from the paper “Filling the G_ap_s: Multivariate Time Series Imputation by Graph Neural Networks” (Cini et al., ICLR 2022).

Parameters:
  • input_size (int) – Size of the input.

  • hidden_size (int) – Number of units in the DCRNN hidden layer.

  • exog_size (int) – Number of channels in the exogenous variables, if any. (default: 0)

  • n_layers (int) – Number of stacked DCRNN cells. (default: 1)

  • n_nodes (int, optional) – Number of nodes in the input graph. (default: None)

  • kernel_size (int) – Order of the spatial diffusion process in the DCRNN cells. (default: 2)

  • decoder_order (int) – Order of the spatial diffusion process in the spatial decoder. (default: 1)

  • layer_norm (bool, optional) – If True, then use layer normalization. (default: False)

  • dropout (float, optional) – Dropout probability in the DCRNN cells. (default: 0)

Multi Layers#

The subpackage tsl.nn.layers.multi contains layers that perform an operation using a different set of parameters for the different instances stacked along a dimension of the input data (e.g., the node dimension). They can be used to process each node (or time step) with independent parameters, breaking the permutation equivariance of the original operation.

MultiLinear

Applies linear transformations with different weights to the different instances in the input data.

MultiDense

Applies linear transformations with different weights to the different instances in the input data with a final nonlinear activation.

MultiConv1d

Applies convolutions with different weights to the different instances in the input data.

MultiGRUCell

Multiple parallel gated recurrent unit (GRU) cells.

MultiLSTMCell

Multiple parallel long short-term memory (LSTM) cells.

class MultiLinear(in_channels: int, out_channels: int, n_instances: int, *, ndim: Optional[int] = None, pattern: Optional[str] = None, instance_dim: Union[int, str] = -2, channel_dim: Union[int, str] = -1, bias: bool = True, device=None, dtype=None)[source]#

Applies linear transformations with different weights to the different instances in the input data.

\[\mathbf{X}^{\prime} = [\boldsymbol{\Theta}_i \mathbf{x}_i + \mathbf{b}_i]_{i=0,\ldots,N}\]
Parameters:
  • in_channels (int) – Size of instance input sample.

  • out_channels (int) – Size of instance output sample.

  • n_instances (int) – The number \(N\) of parallel linear operations. Each operation has different weights and biases.

  • instance_dim (int or str) – Dimension of the instances (must match n_instances at runtime). (default: -2)

  • channel_dim (int or str) – Dimension of the input channels. (default: -1)

  • bias (bool) – If True, then the layer will learn an additive bias for each instance. (default: True)

  • device (optional) – The device of the parameters. (default: None)

  • dtype (optional) – The data type of the parameters. (default: None)

Examples

>>> import torch
>>> m = MultiLinear(20, 32, 10, pattern='t n f', instance_dim='n')
>>> input = torch.randn(64, 12, 10, 20)  # shape: [b t n f]
>>> output = m(input)
>>> print(output.size())
torch.Size([64, 12, 10, 32])
forward(input: Tensor) → Tensor[source]#

Compute \(\mathbf{X}^{\prime} = [\boldsymbol{\Theta}_i \mathbf{x}_i + \mathbf{b}_i]_{i=0,\ldots,N}\)

class MultiDense(in_channels: int, out_channels: int, n_instances: int, activation: str = 'relu', dropout: float = 0.0, *, ndim: Optional[int] = None, pattern: Optional[str] = None, instance_dim: int = -2, channel_dim: int = -1, bias: bool = True, device=None, dtype=None)[source]#

Applies linear transformations with different weights to the different instances in the input data with a final nonlinear activation.

\[\mathbf{X}^{\prime} = \left[\sigma\left(\boldsymbol{\Theta}_i \mathbf{x}_i + \mathbf{b}_i \right)\right]_{i=0,\ldots,N}\]
Parameters:
  • in_channels (int) – Size of instance input sample.

  • out_channels (int) – Size of instance output sample.

  • n_instances (int) – The number \(N\) of parallel linear operations. Each operation has different weights and biases.

  • activation (str, optional) – Activation function to be used. (default: 'relu')

  • dropout (float, optional) – Dropout rate. (default: 0)

  • instance_dim (int or str) – Dimension of the instances (must match n_instances at runtime). (default: -2)

  • channel_dim (int or str) – Dimension of the input channels. (default: -1)

  • bias (bool) – If True, then the layer will learn an additive bias for each instance. (default: True)

  • device (optional) – The device of the parameters. (default: None)

  • dtype (optional) – The data type of the parameters. (default: None)

forward(input: Tensor) → Tensor[source]#

Compute \(\mathbf{X}^{\prime} = \left[\sigma\left(\boldsymbol{ \Theta}_i\mathbf{x}_i + \mathbf{b}_i \right)\right]_{i=0,\ldots,N}\).

class MultiConv1d(in_channels: int, out_channels: int, n_instances: int, kernel_size: int, stride: int = 1, padding: Union[str, int] = 0, dilation: int = 1, bias: bool = True, device=None, dtype=None)[source]#

Applies convolutions with different weights to the different instances in the input data.

class MultiGRUCell(input_size: int, hidden_size: int, n_instances: int, bias: bool = True, device=None, dtype=None)[source]#

Multiple parallel gated recurrent unit (GRU) cells.

\[\begin{split}\begin{array}{ll} r = \sigma(W_{ir} x + b_{ir} + W_{hr} h + b_{hr}) \\ z = \sigma(W_{iz} x + b_{iz} + W_{hz} h + b_{hz}) \\ n = \tanh(W_{in} x + b_{in} + r * (W_{hn} h + b_{hn})) \\ h' = (1 - z) * n + z * h \end{array}\end{split}\]

where \(\sigma\) is the sigmoid function, and \(*\) is the Hadamard product.

Parameters:
  • input_size (int) – The number of features in the instance input sample.

  • hidden_size (int) – The number of features in the instance hidden state.

  • n_instances (int) – The number of parallel GRU cells. Each cell has different weights.

  • bias (bool) – If True, then the layer will learn an additive bias for each instance gate. (default: True)

  • device (optional) – The device of the parameters. (default: None)

  • dtype (optional) – The data type of the parameters. (default: None)

Examples:

>>> rnn = MultiGRUCell(20, 32, 10)
>>> input = torch.randn(64, 12, 10, 20)
>>> h = None
>>> output = []
>>> for i in range(12):
...     h = rnn(input[:, i], h)
...     output.append(h)
>>> output = torch.stack(output, dim=1)
>>> print(output.size())
torch.Size([64, 12, 10, 32])
class MultiLSTMCell(input_size: int, hidden_size: int, n_instances: int, bias: bool = True, device=None, dtype=None)[source]#

Multiple parallel long short-term memory (LSTM) cells.

\[\begin{split}\begin{array}{ll} i = \sigma(W_{ii} x + b_{ii} + W_{hi} h + b_{hi}) \\ f = \sigma(W_{if} x + b_{if} + W_{hf} h + b_{hf}) \\ g = \tanh(W_{ig} x + b_{ig} + W_{hg} h + b_{hg}) \\ o = \sigma(W_{io} x + b_{io} + W_{ho} h + b_{ho}) \\ c' = f * c + i * g \\ h' = o * \tanh(c') \\ \end{array}\end{split}\]

where \(\sigma\) is the sigmoid function, and \(*\) is the Hadamard product.

Parameters:
  • input_size (int) – The number of features in the instance input sample.

  • hidden_size (int) – The number of features in the instance hidden state.

  • n_instances (int) – The number of parallel LSTM cells. Each cell has different weights.

  • bias (bool) – If True, then the layer will learn an additive bias for each instance gate. (default: True)

  • device (optional) – The device of the parameters. (default: None)

  • dtype (optional) – The data type of the parameters. (default: None)

Examples:

>>> rnn = MultiLSTMCell(20, 32, 10)
>>> input = torch.randn(64, 12, 10, 20)
>>> h = None
>>> output = []
>>> for i in range(12):
...     h = rnn(input[:, i], h)  # h = h, c
...     output.append(h[0])      # i-th output is h_i
>>> output = torch.stack(output, dim=1)
>>> print(output.size())
torch.Size([64, 12, 10, 32])

Normalization Layers#

The subpackage tsl.nn.layers.norm contains the normalization layers.

Norm

Applies a normalization of the specified type.

LayerNorm

Applies layer normalization.

InstanceNorm

Applies graph-wise instance normalization.

BatchNorm

Applies graph-wise batch normalization.

class Norm(norm_type, in_channels, **kwargs)[source]#

Applies a normalization of the specified type.

Parameters:

in_channels (int) – Size of each input sample.
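
Example (a minimal sketch; the accepted norm_type values and the fact that the layer is applied directly to the feature tensor are assumptions, with 'layer' guessed from the layers documented below):

>>> import torch
>>> from tsl.nn.layers import Norm
>>> norm = Norm('layer', 32)                    # norm_type assumed to name one of the layers below
>>> x = torch.randn(8, 12, 20, 32)
>>> out = norm(x)                               # same shape, normalized features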

class LayerNorm(in_channels, eps=1e-05, affine=True)[source]#

Applies layer normalization.

Parameters:
  • in_channels (int) – Size of each input sample.

  • eps (float, optional) – A value added to the denominator for numerical stability. (default: 1e-5)

  • affine (bool, optional) – If set to True, this module has learnable affine parameters \(\gamma\) and \(\beta\). (default: True)

class InstanceNorm(in_channels, eps=1e-05, affine=True)[source]#

Applies graph-wise instance normalization.

Parameters:
  • in_channels (int) – Size of each input sample.

  • eps (float, optional) – A value added to the denominator for numerical stability. (default: 1e-5)

  • affine (bool, optional) – If set to True, this module has learnable affine parameters \(\gamma\) and \(\beta\). (default: True)

class BatchNorm(in_channels, eps: float = 1e-05, momentum: float = 0.1, affine: bool = True, track_running_stats: bool = True)[source]#

Applies graph-wise batch normalization.

Parameters:
  • in_channels (int) – Size of each input sample.

  • eps (float, optional) – A value added to the denominator for numerical stability. (default: 1e-5)

  • momentum (float, optional) – Momentum for the running statistics. (default: 0.1)

  • affine (bool, optional) – If set to True, this module has learnable affine parameters \(\gamma\) and \(\beta\). (default: True)

  • track_running_stats (bool, optional) – Whether to track stats to perform batch norm. (default: True)

Base Layers#

The subpackage tsl.nn.layers.base contains basic layers used at the core of other layers.

Dense

A simple fully-connected layer implementing

TemporalConv

Learns a standard temporal convolutional filter.

GatedTemporalConv

Temporal convolutional filter with gated tanh connection.

NodeEmbedding

Creates a table of node embeddings with the specified size.

PositionalEncoding

The positional encoding from the paper "Attention Is All You Need" (Vaswani et al., NeurIPS 2017).

MultiHeadAttention

The multi-head attention from the paper "Attention Is All You Need" (Vaswani et al., NeurIPS 2017) for spatiotemporal data.

TemporalSelfAttention

Temporal Self Attention layer.

SpatialSelfAttention

Spatial Self Attention layer.

class Dense(input_size: int, output_size: int, activation: str = 'relu', dropout: float = 0.0, bias: bool = True)[source]#

A simple fully-connected layer implementing

\[\mathbf{x}^{\prime} = \sigma\left(\boldsymbol{\Theta}\mathbf{x} + \mathbf{b}\right)\]

where \(\mathbf{x} \in \mathbb{R}^{d_{in}}, \mathbf{x}^{\prime} \in \mathbb{R}^{d_{out}}\) are the input and output features, respectively, \(\boldsymbol{\Theta} \in \mathbb{R}^{d_{out} \times d_{in}}\) and \(\mathbf{b} \in \mathbb{R}^{d_{out}}\) are trainable parameters, and \(\sigma\) is an activation function.

Parameters:
  • input_size (int) – Number of input features.

  • output_size (int) – Number of output features.

  • activation (str, optional) – Activation function to be used. (default: 'relu')

  • dropout (float, optional) – The dropout rate. (default: 0)

  • bias (bool, optional) – If True, then the bias vector is used. (default: True)
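
Example (a small sketch, assuming the layer acts on the trailing feature dimension like torch.nn.Linear):

>>> import torch
>>> from tsl.nn.layers import Dense
>>> layer = Dense(input_size=16, output_size=64, activation='relu', dropout=0.1)
>>> x = torch.randn(8, 12, 20, 16)
>>> out = layer(x)                              # trailing dimension mapped from 16 to 64, then ReLU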

class TemporalConv(input_channels: int, output_channels: int, kernel_size: int, dilation: int = 1, stride: int = 1, bias: bool = True, padding: Union[int, Tuple[int]] = 0, causal_pad: bool = True, weight_norm: bool = False, channel_last: bool = False)[source]#

Learns a standard temporal convolutional filter.

Parameters:
  • input_channels (int) – Input size.

  • output_channels (int) – Output size.

  • kernel_size (int) – Size of the convolution kernel.

  • dilation (int, optional) – Spacing between kernel elements.

  • stride (int, optional) – Stride of the convolution.

  • bias (bool, optional) – Whether to add a learnable bias to the output of the convolution.

  • padding (int or tuple, optional) – Padding of the input. Used only if causal_pad is False.

  • causal_pad (bool, optional) – Whether to pad the input as to preserve causality.

  • weight_norm (bool, optional) – Whether to apply weight normalization to the parameters of the filter.
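
Example (a sketch with channel_last=True; the [batch, time, nodes, features] layout and the effect of causal padding on the output length are assumptions):

>>> import torch
>>> from tsl.nn.layers import TemporalConv
>>> conv = TemporalConv(input_channels=16, output_channels=32,
...                     kernel_size=3, causal_pad=True, channel_last=True)
>>> x = torch.randn(8, 12, 20, 16)              # [batch, time, nodes, features]
>>> out = conv(x)                               # channels mapped to 32 (time length assumed preserved)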

class GatedTemporalConv(input_channels: int, output_channels: int, kernel_size: int, dilation: int = 1, stride: int = 1, bias: bool = True, padding: Union[int, Tuple[int]] = 0, causal_pad: bool = True, weight_norm: bool = False, channel_last: bool = False)[source]#

Temporal convolutional filter with gated tanh connection.

class NodeEmbedding(n_nodes: int, emb_size: int, initializer: Union[str, Tensor] = 'uniform', requires_grad: bool = True)[source]#

Creates a table of node embeddings with the specified size.

Parameters:
  • n_nodes (int) – Number of elements for which to store an embedding.

  • emb_size (int) – Size of the embedding.

  • initializer (str or Tensor) – Initialization method. (default: 'uniform')

  • requires_grad (bool) – Whether to compute gradients for the embeddings. (default True)
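
Example (a sketch; calling the module with no arguments is assumed to return the full embedding table):

>>> from tsl.nn.layers import NodeEmbedding
>>> emb = NodeEmbedding(n_nodes=20, emb_size=32)
>>> table = emb()                               # assumed to return the learnable (20, 32) table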

class PositionalEncoding(d_model: int, dropout: float = 0.0, max_len: int = 5000, affinity: bool = False, batch_first=True)[source]#

The positional encoding from the paper “Attention Is All You Need” (Vaswani et al., NeurIPS 2017).

class MultiHeadAttention(embed_dim: int, heads: int, qdim: Optional[int] = None, kdim: Optional[int] = None, vdim: Optional[int] = None, axis: Union[str, int] = 'time', add_bias_kv: bool = False, add_zero_attn: bool = False, causal: bool = False, dropout: float = 0.0, bias: bool = True, device=None, dtype=None)[source]#

The multi-head attention from the paper “Attention Is All You Need” (Vaswani et al., NeurIPS 2017) for spatiotemporal data.

Parameters:
  • embed_dim (int) – Size of the hidden dimension associated with each node at each time step.

  • heads (int) – Number of parallel attention heads.

  • qdim (int, optional) – Size of the query dimension. If None, then defaults to embed_dim. (default: None)

  • kdim (int, optional) – Size of the key dimension. If None, then defaults to embed_dim. (default: None)

  • vdim (int, optional) – Size of the value dimension. If None, then defaults to embed_dim. (default: None)

  • axis (str) – Dimension on which to apply attention to update the representations. Can be either 'time' or 'nodes'. (default: 'time')

  • add_bias_kv (bool) – If True, then adds bias to the key and value sequences. (default: False)

  • add_zero_attn (bool) – If True, then adds a new batch of zeros to the key and value sequences. (default: False)

  • causal (bool) – If True, then causally mask attention scores in temporal attention (has an effect only if axis is 'time'). (default: False)

  • dropout (float) – Dropout probability. (default: 0.)

  • bias (bool) – Whether to add a learnable bias. (default: True)

  • device (optional) – Device on which to store the parameters. (default: None)

  • dtype (optional) – Data Type of the parameters. (default: None)

class TemporalSelfAttention(embed_dim, num_heads, in_channels=None, dropout=0.0, bias=True, device=None, dtype=None)[source]#

Temporal Self Attention layer.

Parameters:
  • embed_dim (int) – Size of the hidden dimension associated with each node at each time step.

  • num_heads (int) – Number of parallel attention heads.

  • dropout (float) – Dropout probability.

  • bias (bool, optional) – Whether to add a learnable bias.

  • device (optional) – Device on which to store the parameters.

  • dtype (optional) – Data Type of the parameters.

Examples:
>>> import torch
>>> m = TemporalSelfAttention(32, 4, -1)
>>> input = torch.randn(128, 24, 10, 20)
>>> output, _ = m(input)
>>> print(output.size())
torch.Size([128, 24, 10, 32])
class SpatialSelfAttention(embed_dim, num_heads, in_channels=None, dropout=0.0, bias=True, device=None, dtype=None)[source]#

Spatial Self Attention layer.

Parameters:
  • embed_dim (int) – Size of the hidden dimension associated with each node at each time step.

  • num_heads (int) – Number of parallel attention heads.

  • dropout (float) – Dropout probability.

  • bias (bool, optional) – Whether to add a learnable bias.

  • device (optional) – Device on which to store the parameters.

  • dtype (optional) – Data Type of the parameters.

Examples:
>>> import torch
>>> m = SpatialSelfAttention(32, 4, -1)
>>> input = torch.randn(128, 24, 10, 20)
>>> output, _ = m(input)
>>> print(output.size())
torch.Size([128, 24, 10, 32])

Operational Layers#

The subpackage tsl.nn.layers.ops contains operational layers that do not involve learnable parameters.

Lambda

Call a generic function on the input.

Concatenate

Concatenate tensors along dimension dim.

Select

Apply select() to select one element from a Tensor along a dimension.

GradNorm

Scales the gradient in back-propagation.

Activation

A utility layer for any activation function.

class Lambda(function: Callable)[source]#

Call a generic function on the input.

Parameters:

function (callable) – The function to call in forward(input).

extra_repr() → str[source]#

Set the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

forward(input: Tensor) → Tensor[source]#

Returns self.function(input).
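
Example (wrapping a simple reduction so that it can be used as a module, e.g. inside a torch.nn.Sequential):

>>> import torch
>>> from tsl.nn.layers import Lambda
>>> pool_time = Lambda(lambda x: x.mean(dim=1))  # average over the time dimension
>>> x = torch.randn(8, 12, 20, 16)
>>> pool_time(x).shape
torch.Size([8, 20, 16])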

class Concatenate(dim: int = 0)[source]#

Concatenate tensors along dimension dim.

The tensors dimensions are matched (i.e., broadcasted if necessary) before concatenation.

Parameters:

dim (int) – The dimension to concatenate on. (default: 0)

forward(tensors: Union[Tuple[Tensor, ...], List[Tensor]]) → Tensor[source]#

Returns expand_then_cat() on input tensors.
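
Example (concatenating two feature tensors along the channel dimension):

>>> import torch
>>> from tsl.nn.layers import Concatenate
>>> cat = Concatenate(dim=-1)
>>> a = torch.randn(8, 12, 20, 16)
>>> b = torch.randn(8, 12, 20, 32)
>>> cat([a, b]).shape
torch.Size([8, 12, 20, 48])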

class Select(dim: int, index: int)[source]#

Apply select() to select one element from a Tensor along a dimension.

This layer returns a view of the original tensor with the given dimension removed.

Parameters:
  • dim (int) – The dimension to slice.

  • index (int) – The index to select with.

forward(tensor: Tensor) → Tensor[source]#

Returns select() on input tensor.
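
Example (extracting a single time step; the selected dimension is removed from the output, as noted above):

>>> import torch
>>> from tsl.nn.layers import Select
>>> first_step = Select(dim=1, index=0)         # pick the first element along dimension 1
>>> x = torch.randn(8, 12, 20, 16)
>>> first_step(x).shape
torch.Size([8, 20, 16])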

class GradNorm(*args, **kwargs)[source]#

Scales the gradient in back-propagation. In the forward pass, it is an identity operation.

class Activation(activation: str, **kwargs)[source]#

A utility layer for any activation function.

Parameters:
  • activation (str) – Name of the activation function.

  • **kwargs – Keyword arguments for the activation layer.

forward(x)[source]#

Returns self.activation(input).
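
Example (a small sketch; the set of accepted activation names is not listed above, so 'relu' is assumed to be among them):

>>> import torch
>>> from tsl.nn.layers import Activation
>>> act = Activation('relu')
>>> out = act(torch.randn(8, 12, 20, 16))       # element-wise ReLU, shape unchanged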