Blocks#
Encoders#
| Class | Description |
| --- | --- |
| MLP | Simple multi-layer perceptron encoder with optional linear readout. |
| ResidualMLP | Multi-layer perceptron with residual connections. |
| MultiMLP | A multi-layer perceptron (MLP), with optional linear readout, using different weights for each element along the specified dimension. |
| ConditionalBlock | Simple layer to condition the input on a set of exogenous variables. |
| TemporalConvNet | Simple TCN encoder with optional linear readout. |
| SpatioTemporalConvNet | Spatiotemporal convolutional encoder with optional linear readout. |
| ConditionalTCNBlock | Mirrors the architecture of ConditionalBlock, but using temporal convolutions instead of affine transformations. |
| TransformerLayer | A Transformer layer from the paper "Attention Is All You Need" (Vaswani et al., NeurIPS 2017). |
| SpatioTemporalTransformerLayer | A TransformerLayer that attends both the spatial and temporal dimensions by stacking two MultiHeadAttention layers. |
| Transformer | A stack of Transformer layers. |
Recurrent Encoders#
| Class | Description |
| --- | --- |
| RNNBase | Base class for implementing recurrent neural networks (RNNs). |
| RNN | Simple RNN encoder with optional linear readout. |
| MultiRNN | A Recurrent Neural Network whose cells' weights are not shared among the different instances. |
| GraphConvRNN | The Graph Convolutional Recurrent Network based on the paper "Structured Sequence Modeling with Graph Convolutional Recurrent Networks" (Seo et al., ICONIP 2017), using GraphConv as the graph convolution. |
| DCRNN | The Diffusion Convolutional Recurrent Neural Network from the paper "Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting" (Li et al., ICLR 2018). |
| DenseDCRNN | Dense implementation of the Diffusion Convolutional Recurrent Neural Network from the paper "Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting" (Li et al., ICLR 2018). |
| AGCRN | The Adaptive Graph Convolutional Recurrent Network from the paper "Adaptive Graph Convolutional Recurrent Network for Traffic Forecasting" (Bai et al., NeurIPS 2020). |
| EvolveGCN | EvolveGCN encoder from the paper "EvolveGCN: Evolving Graph Convolutional Networks for Dynamic Graphs" (Pareja et al., AAAI 2020). |
- class MLP(input_size, hidden_size, output_size=None, exog_size=None, n_layers=1, activation='relu', dropout=0.0)[source]#
Simple Multi-layer Perceptron encoder with optional linear readout.
- Parameters:
input_size (int) – Input size.
hidden_size (int) – Units in the hidden layers.
output_size (int, optional) – Size of the optional readout.
exog_size (int, optional) – Size of the optional exogenous variables.
n_layers (int, optional) – Number of hidden layers. (default: 1)
activation (str, optional) – Activation function. (default: relu)
dropout (float, optional) – Dropout probability.
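A minimal usage sketch. The import path follows the tsl.nn.blocks.encoders module referenced in this section; the 4-dimensional input layout and the element-wise application over the feature dimension are assumptions.

```python
import torch
from tsl.nn.blocks.encoders import MLP  # import path assumed from this section

# assumed layout: (batch, steps, nodes, features)
x = torch.randn(32, 12, 20, 4)

mlp = MLP(input_size=4, hidden_size=64, output_size=16,
          n_layers=2, activation='relu', dropout=0.1)
out = mlp(x)  # assumed to act on the last (feature) dimension
print(out.shape)  # expected: torch.Size([32, 12, 20, 16])
```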
- class ResidualMLP(input_size, hidden_size, output_size=None, exog_size=None, n_layers=1, activation='relu', dropout=0.0, parametrized_skip=False)[source]#
Multi-layer Perceptron with residual connections.
- Parameters:
input_size (int) – Input size.
hidden_size (int) – Units in the hidden layers.
output_size (int, optional) – Size of the optional readout.
exog_size (int, optional) – Size of the optional exogenous variables.
n_layers (int, optional) – Number of hidden layers. (default: 1)
activation (str, optional) – Activation function. (default: relu)
dropout (float, optional) – Dropout probability. (default: 0.)
parametrized_skip (bool, optional) – Whether to use parametrized skip connections for the residuals.
- class MultiMLP(input_size: int, hidden_size: int, n_instances: int, *, ndim: Optional[int] = None, pattern: Optional[str] = None, instance_dim: int = -2, output_size: Optional[int] = None, exog_size: Optional[int] = None, n_layers: int = 1, activation: str = 'relu', dropout: float = 0.0)[source]#
A multi-layer perceptron (MLP), with optional linear readout, that uses different weights for each element along the specified dimension.
- Parameters:
input_size (int) – Input size.
hidden_size (int) – Units in the hidden layers.
output_size (int, optional) – Size of the optional readout.
exog_size (int, optional) – Size of the optional exogenous variables.
n_layers (int, optional) – Number of hidden layers. (default: 1)
activation (str, optional) – Activation function. (default: 'relu')
dropout (float, optional) – Dropout probability. (default: 0.)
- class ConditionalBlock(input_size, exog_size, output_size, dropout=0.0, skip_connection=False, activation='relu')[source]#
Simple layer to condition the input on a set of exogenous variables.
\[\text{CondBlock}(\mathbf{x}, \mathbf{u}) = \text{MLP}_x(\mathbf{x}) + \text{MLP}_u(\mathbf{u})\]
- Parameters:
input_size (int) – Input size.
exog_size (int) – Size of the covariates.
output_size (int) – Output size.
dropout (float, optional) – Dropout probability.
skip_connection (bool, optional) – Whether to add a parametrized residual connection. (default: False).
activation (str, optional) – Activation function.
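A sketch of how the conditioning block might be used. The forward signature forward(x, u) mirrors the formula above but is an assumption, as is the input layout.

```python
import torch
from tsl.nn.blocks.encoders import ConditionalBlock  # import path assumed

x = torch.randn(32, 12, 20, 8)  # input, assumed layout (batch, steps, nodes, features)
u = torch.randn(32, 12, 20, 3)  # exogenous variables aligned with x

cond = ConditionalBlock(input_size=8, exog_size=3, output_size=32,
                        dropout=0.1, skip_connection=True)
h = cond(x, u)  # MLP_x(x) + MLP_u(u), as in the formula above
print(h.shape)  # expected: torch.Size([32, 12, 20, 32])
```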
- class TemporalConvNet(input_channels, hidden_channels, kernel_size, dilation, stride=1, exog_channels=None, output_channels=None, n_layers=1, gated=False, dropout=0.0, activation='relu', exponential_dilation=False, weight_norm=False, causal_padding=True, bias=True, channel_last=True)[source]#
Simple TCN encoder with optional linear readout.
- Parameters:
input_channels (int) – Input size.
hidden_channels (int) – Channels in the hidden layers.
kernel_size (int) – Size of the convolutional kernel.
dilation (int) – Dilation coefficient of the convolutional kernel.
stride (int, optional) – Stride of the convolutional kernel.
exog_channels (int, optional) – Channels of the optional exogenous variables.
output_channels (int, optional) – Channels in the output layer.
n_layers (int, optional) – Number of hidden layers. (default: 1)
gated (bool, optional) – Whether to use the GatedTanH activation function. (default: False)
dropout (float, optional) – Dropout probability.
activation (str, optional) – Activation function. (default: 'relu')
exponential_dilation (bool, optional) – Whether to increase the dilation factor exponentially at each layer.
weight_norm (bool, optional) – Whether to apply weight normalization to the temporal convolutional filters.
causal_padding (bool, optional) – Whether to pad the input sequence so as to preserve causality.
bias (bool, optional) – Whether to add a learnable bias to the output.
channel_last (bool, optional) – If True, the input must have layout (b s n c); (b c n s) otherwise.
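A usage sketch with the channel-last layout documented above; the forward call on a plain tensor is an assumption.

```python
import torch
from tsl.nn.blocks.encoders import TemporalConvNet  # import path assumed

# channel_last=True -> layout (b s n c), i.e. (batch, steps, nodes, channels)
x = torch.randn(32, 24, 20, 4)

tcn = TemporalConvNet(input_channels=4, hidden_channels=32, kernel_size=3,
                      dilation=2, n_layers=3, exponential_dilation=True,
                      causal_padding=True, output_channels=16)
out = tcn(x)
print(out.shape)  # expected: torch.Size([32, 24, 20, 16]) with causal padding
```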
- class SpatioTemporalConvNet(input_size, output_size, temporal_kernel_size, spatial_kernel_size, temporal_convs=2, spatial_convs=1, dilation=1, norm='none', dropout=0.0, gated=False, pad=True, activation='relu')[source]#
Spatiotemporal convolutional encoder with optional linear readout.
Applies several temporal convolutions followed by diffusion convolution over a graph.
- Parameters:
input_size (int) – Input size.
output_size (int) – Channels in the output representation.
temporal_kernel_size (int) – Size of the temporal convolutional kernel.
spatial_kernel_size (int) – Size of the spatial diffusion kernel.
temporal_convs (int, optional) – Number of temporal convolutions. (default: 2)
spatial_convs (int, optional) – Number of spatial convolutions. (default: 1)
dilation (int) – Dilation coefficient of the temporal convolutional kernel.
norm (str, optional) – Type of normalization applied to the hidden units.
dropout (float, optional) – Dropout probability.
gated (bool, optional) – Whether to use the GatedTanH activation function after temporal convolutions. (default: False)
pad (bool, optional) – Whether to pad the input sequence to preserve the sequence length.
activation (str, optional) – Activation function. (default: 'relu')
- class ConditionalTCNBlock(input_size, exog_size, output_size, kernel_size, dilation=1, dropout=0.0, gated=False, activation='relu', weight_norm=False, channel_last=True, skip_connection=False)[source]#
Mirrors the architecture of tsl.nn.blocks.encoders.ConditionalBlock, but using temporal convolutions instead of affine transformations.
- Parameters:
input_size (int) – Size of the input.
exog_size (int) – Size of the exogenous variables.
output_size (int) – Size of the output.
kernel_size (int) – Size of the convolution kernel.
dilation (int) – Spacing between kernel elements.
dropout (float) – Dropout probability.
gated (bool) – Whether to use gated tanh activations.
activation (str, optional) – Activation function.
weight_norm (bool) – Whether to apply weight normalization to the parameters of the filter.
channel_last (bool) – If True, input data must follow the (b t n f) layout; assumes (b f n t) otherwise.
skip_connection (bool) – If True, adds a parametrized skip connection from the input to the output.
- class TransformerLayer(input_size, hidden_size, ff_size=None, n_heads=1, axis='time', causal=True, activation='elu', dropout=0.0)[source]#
A Transformer layer from the paper “Attention Is All You Need” (Vaswani et al., NeurIPS 2017).
This layer can be instantiated to attend the temporal or spatial dimension.
- Parameters:
input_size (int) – Input size.
hidden_size (int) – Dimension of the learned representations.
ff_size (int) – Units in the MLP after self attention.
n_heads (int, optional) – Number of parallel attention heads.
axis (str, optional) – Dimension on which to apply attention to update the representations. Can be either 'time' or 'nodes'. (default: 'time')
causal (bool, optional) – If True, causally mask attention scores in temporal attention (has an effect only if axis is 'time'). (default: True)
activation (str, optional) – Activation function.
dropout (float, optional) – Dropout probability.
- class SpatioTemporalTransformerLayer(input_size, hidden_size, ff_size=None, n_heads=1, causal=True, activation='elu', dropout=0.0)[source]#
A TransformerLayer that attends both the spatial and temporal dimensions by stacking two MultiHeadAttention layers.
- Parameters:
input_size (int) – Input size.
hidden_size (int) – Dimension of the learned representations.
ff_size (int) – Units in the MLP after self attention.
n_heads (int, optional) – Number of parallel attention heads.
causal (bool, optional) – If True, causally mask attention scores in temporal attention. (default: True)
activation (str, optional) – Activation function.
dropout (float, optional) – Dropout probability.
- class Transformer(input_size, hidden_size, ff_size=None, output_size=None, n_layers=1, n_heads=1, axis='time', causal=True, activation='elu', dropout=0.0)[source]#
A stack of Transformer layers.
- Parameters:
input_size (int) – Input size.
hidden_size (int) – Dimension of the learned representations.
ff_size (int) – Units in the MLP after self attention.
output_size (int, optional) – Size of an optional linear readout.
n_layers (int, optional) – Number of Transformer layers.
n_heads (int, optional) – Number of parallel attention heads.
axis (str, optional) – Dimension on which to apply attention to update the representations. Can be either 'time', 'nodes', or 'both'. (default: 'time')
causal (bool, optional) – If True, causally mask attention scores in temporal attention (has an effect only if axis is 'time' or 'both'). (default: True)
activation (str, optional) – Activation function.
dropout (float, optional) – Dropout probability.
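A sketch of a small Transformer stack attending both dimensions; the 4-dimensional input layout is an assumption.

```python
import torch
from tsl.nn.blocks.encoders import Transformer  # import path assumed

x = torch.randn(32, 12, 20, 16)  # assumed layout (batch, time, nodes, features)

transformer = Transformer(input_size=16, hidden_size=32, ff_size=64,
                          output_size=8, n_layers=2, n_heads=4,
                          axis='both', causal=True)
out = transformer(x)
print(out.shape)  # expected: torch.Size([32, 12, 20, 8])
```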
- class RNNBase(cells: Union[RNNCellBase, List[RNNCellBase], ModuleList], cat_states_layers: bool = False, return_only_last_state: bool = False)[source]#
Base class for implementing recurrent neural networks (RNNs).
- class RNN(input_size: int, hidden_size: int, exog_size: Optional[int] = None, output_size: Optional[int] = None, n_layers: int = 1, return_only_last_state: bool = False, cell: str = 'gru', bias: bool = True, dropout: float = 0.0, **kwargs)[source]#
Simple RNN encoder with optional linear readout.
- Parameters:
input_size (int) – Input size.
hidden_size (int) – Units in the hidden layers.
exog_size (int, optional) – Size of the optional exogenous variables.
output_size (int, optional) – Size of the optional readout.
n_layers (int, optional) – Number of hidden layers. (default: 1)
cell (str, optional) – Type of cell that should be used (options: 'gru', 'lstm'). (default: 'gru')
dropout (float, optional) – Dropout probability. (default: 0.)
- forward(x: Tensor, u: Optional[Tensor] = None)[source]#
Process the input sequence x with optional exogenous variables u.
- Parameters:
x (Tensor) – Input data.
u (Tensor) – Exogenous data.
- Shapes:
x – \((B, T, N, F_x)\) where \(B\) is the batch dimension, \(T\) is the number of time steps, \(N\) is the number of nodes, and \(F_x\) is the number of input features.
u – \((B, T, N, F_u)\) or \((B, T, F_u)\) where \(B\) is the batch dimension, \(T\) is the number of time steps, \(N\) is the number of nodes (optional), and \(F_u\) is the number of exogenous features.
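A usage sketch following the shapes documented above; the import path and the output shape with return_only_last_state=True are assumptions.

```python
import torch
from tsl.nn.blocks.encoders import RNN  # import path assumed

B, T, N, F_x, F_u = 32, 12, 20, 4, 3
x = torch.randn(B, T, N, F_x)  # (B, T, N, F_x), as documented above
u = torch.randn(B, T, F_u)     # exogenous variables shared across nodes

rnn = RNN(input_size=F_x, hidden_size=32, exog_size=F_u,
          output_size=16, n_layers=2, cell='gru',
          return_only_last_state=True)
h = rnn(x, u)  # only the state at the last step; assumed shape (B, N, 16)
```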
- class MultiRNN(input_size: int, hidden_size: int, n_instances: int, n_layers: int = 1, cat_states_layers: bool = False, return_only_last_state: bool = False, cell: str = 'gru', bias: bool = True, **kwargs)[source]#
A Recurrent Neural Network whose cells’ weights are not shared among the different instances.
- class GraphConvRNN(input_size: int, hidden_size: int, n_layers: int = 1, cat_states_layers: bool = False, return_only_last_state: bool = False, cell: str = 'gru', bias: bool = True, asymmetric_norm: bool = True, root_weight: bool = True, activation: Optional[str] = None, cached: bool = False, **kwargs)[source]#
The Graph Convolutional Recurrent Network based on the paper "Structured Sequence Modeling with Graph Convolutional Recurrent Networks" (Seo et al., ICONIP 2017), using GraphConv as the graph convolution.
- Parameters:
input_size (int) – Size of the input.
hidden_size (int) – Number of units in the hidden state.
n_layers (int) – Number of hidden layers. (default: 1)
cat_states_layers (bool) – If True, the states of each layer are concatenated along the feature dimension. (default: False)
return_only_last_state (bool) – If True, the forward() method returns only the state at the end of the processing, instead of the full sequence of states. (default: False)
cell (str) – Type of graph recurrent cell that should be used (options: 'gru', 'lstm'). (default: 'gru')
bias (bool) – If False, the layer will not learn an additive bias vector for each gate. (default: True)
asymmetric_norm (bool) – If True, normalize the edge weights as \(a_{j \rightarrow i} = \frac{a_{j \rightarrow i}}{\deg_{i}}\); otherwise apply the GCN normalization. (default: True)
root_weight (bool) – If True, add a filter (with different weights) for the root node itself. (default: True)
activation (str, optional) – Activation function to be used, None for the identity function (i.e., no activation). (default: None)
cached (bool) – If True, cache the normalized edge weights computed in the first call. (default: False)
**kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.
- class DCRNN(input_size: int, hidden_size: int, n_layers: int = 1, cat_states_layers: bool = False, return_only_last_state: bool = False, k: int = 2, root_weight: bool = True, add_backward: bool = True, bias: bool = True)[source]#
The Diffusion Convolutional Recurrent Neural Network from the paper “Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting” (Li et al., ICLR 2018).
- Parameters:
input_size – Size of the input.
hidden_size – Number of units in the hidden state.
n_layers – Number of layers.
k – Size of the diffusion kernel.
root_weight – Whether to learn a separate transformation for the central node.
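A sketch of how a graph recurrent encoder such as DCRNN might be driven; the forward signature (x, edge_index, edge_weight) follows the PyTorch Geometric convention and is an assumption, as is the import path.

```python
import torch
from tsl.nn.blocks.encoders import DCRNN  # import path assumed

B, T, N, F = 32, 12, 20, 4
x = torch.randn(B, T, N, F)
# toy graph over the N nodes in COO format (PyTorch Geometric convention)
edge_index = torch.randint(0, N, (2, 60))
edge_weight = torch.rand(60)

dcrnn = DCRNN(input_size=F, hidden_size=32, n_layers=2, k=2,
              return_only_last_state=True)
# forward signature (x, edge_index, edge_weight) is an assumption
h = dcrnn(x, edge_index, edge_weight)
```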
- class DenseDCRNN(input_size: int, hidden_size: int, n_layers: int = 1, cat_states_layers: bool = False, return_only_last_state: bool = False, k: int = 2, root_weight: bool = False)[source]#
Dense implementation of the Diffusion Convolutional Recurrent Neural Network from the paper “Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting” (Li et al., ICLR 2018).
In this implementation, the adjacency matrix is dense and the convolution is performed with matrix multiplication.
- Parameters:
input_size – Size of the input.
hidden_size – Number of units in the hidden state.
n_layers – Number of layers.
k – Size of the diffusion kernel.
root_weight – Whether to learn a separate transformation for the central node.
- class AGCRN(input_size: int, emb_size: int, hidden_size: int, num_nodes: int, n_layers: int = 1, cat_states_layers: bool = False, return_only_last_state: bool = False, bias: bool = True)[source]#
The Adaptive Graph Convolutional Recurrent Network from the paper “Adaptive Graph Convolutional Recurrent Network for Traffic Forecasting” (Bai et al., NeurIPS 2020).
- Parameters:
input_size – Size of the input.
emb_size – Size of the input node embeddings.
hidden_size – Output size.
num_nodes – Number of nodes in the input graph.
n_layers – Number of recurrent layers.
- class EvolveGCN(input_size, hidden_size, n_layers, norm, variant='H', root_weight=False, cached=False, activation='relu')[source]#
EvolveGCN encoder from the paper “EvolveGCN: Evolving Graph Convolutional Networks for Dynamic Graphs” (Pareja et al., AAAI 2020).
- Parameters:
input_size (int) – Size of the input.
hidden_size (int) – Number of hidden units in each hidden layer.
n_layers (int) – Number of layers in the encoder.
asymmetric_norm (bool) – Whether to consider the input graph as directed.
variant (str) – Variant of EvolveGCN to use (options: ‘H’ or ‘O’)
root_weight (bool) – Whether to add a parametrized skip connection.
cached (bool) – Whether to cache normalized edge_weights.
activation (str) – Activation after each GCN layer.
Decoders#
| Class | Description |
| --- | --- |
| AttPool | Pool representations along a dimension with learned softmax scores. |
| GCNDecoder | GCN decoder for multistep forecasting. |
| LinearReadout | Simple linear readout for multistep forecasting. |
| MLPDecoder | Simple MLP decoder for multistep forecasting. |
| MultiHorizonMLPDecoder | Decoder for multistep forecasting based on the paper "A Multi-Horizon Quantile Recurrent Forecaster" (Wen et al., 2018). |
- class AttPool(input_size: int, dim: int)[source]#
Pool representations along a dimension with learned softmax scores.
- class GCNDecoder(input_size: int, hidden_size: int, output_size: int, horizon: int = 1, n_layers: int = 1, activation: str = 'relu', dropout: float = 0.0)[source]#
GCN decoder for multistep forecasting.
Applies multiple graph convolutional layers followed by a feed-forward layer and a linear readout. If the input representation has a temporal dimension, this model will simply take as input the representation corresponding to the last step.
- Parameters:
input_size (int) – Input size.
hidden_size (int) – Hidden size.
output_size (int) – Output size.
horizon (int) – Number of time steps in the prediction horizon. (default: 1)
n_layers (int) – Number of layers in the decoder. (default: 1)
activation (str, optional) – Activation function to be used. (default: 'relu')
dropout (float, optional) – Dropout probability applied in the hidden layers. (default: 0)
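A decoding sketch; the forward signature with edge_index and the output shape are assumptions.

```python
import torch
from tsl.nn.blocks.decoders import GCNDecoder  # import path assumed

B, T, N, F = 32, 12, 20, 32
h = torch.randn(B, T, N, F)  # encoder output; only the last step is used
edge_index = torch.randint(0, N, (2, 60))  # toy graph in COO format

decoder = GCNDecoder(input_size=F, hidden_size=64, output_size=1,
                     horizon=3, n_layers=2)
# forward signature (h, edge_index) is an assumption
y_hat = decoder(h, edge_index)  # assumed shape: (B, horizon, N, output_size)
```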
- class LinearReadout(input_size: int, output_size: int, horizon: int = 1, bias: bool = True)[source]#
Simple linear readout for multistep forecasting.
If the input representation has a temporal dimension, this model will simply take the representation corresponding to the last step.
- class MLPDecoder(input_size: int, hidden_size: int, output_size: int, horizon: int = 1, n_layers: int = 1, receptive_field: int = 1, activation: str = 'relu', dropout: float = 0.0)[source]#
Simple MLP decoder for multistep forecasting.
If the input representation has a temporal dimension, this model will take the flattened representations corresponding to the last receptive_field time steps.
- Parameters:
input_size (int) – Input size.
hidden_size (int) – Hidden size.
output_size (int) – Output size.
horizon (int) – Number of steps to predict. (default: 1)
n_layers (int) – Number of hidden layers in the decoder. (default: 1)
receptive_field (int) – Number of steps to consider for decoding. (default: 1)
activation (str, optional) – Activation function to be used. (default: 'relu')
dropout (float, optional) – Dropout probability applied in the hidden layers. (default: 0)
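A decoding sketch; the single-tensor forward call and the output shape are assumptions.

```python
import torch
from tsl.nn.blocks.decoders import MLPDecoder  # import path assumed

B, T, N, F = 32, 12, 20, 32
h = torch.randn(B, T, N, F)  # encoder output

decoder = MLPDecoder(input_size=F, hidden_size=64, output_size=1,
                     horizon=6, receptive_field=4, n_layers=2)
# the last `receptive_field` steps are flattened before decoding
y_hat = decoder(h)  # assumed shape: (B, horizon, N, output_size)
```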
- class MultiHorizonMLPDecoder(input_size, exog_size, hidden_size, context_size, output_size, n_layers, horizon, activation='relu', dropout=0.0)[source]#
Decoder for multistep forecasting based on the paper “A Multi-Horizon Quantile Recurrent Forecaster” (Wen et al., 2018).
It requires exogenous variables synchronized with the forecasting horizon.
- Parameters:
input_size (int) – Size of the input.
exog_size (int) – Size of the horizon exogenous variables.
hidden_size (int) – Number of hidden units.
context_size (int) – Number of units used to condition the forecasting of each step.
output_size (int) – Output channels.
n_layers (int) – Number of hidden layers.
horizon (int) – Forecasting horizon.
activation (str, optional) – Activation function.
dropout (float, optional) – Dropout probability.