Blocks#

Encoders#

ConditionalBlock

Simple layer to condition the input on a set of exogenous variables.

ConditionalTCNBlock

Mirrors the architecture of ConditionalBlock but using temporal convolutions instead of affine transformations.

MLP

Simple Multi-layer Perceptron encoder with optional linear readout.

ResidualMLP

Multi-layer Perceptron with residual connections.

RNN

Simple RNN encoder with optional linear readout.

DCRNNCell

Diffusion Convolutional Recurrent Cell.

DenseDCRNNCell

Diffusion Convolutional Recurrent Cell.

GraphConvGRUCell

Gated Recurrent Unit with GraphConv gates.

GraphConvLSTMCell

LSTM with GraphConv gates.

AGCRNCell

Adaptive Graph Convolutional Cell.

EvolveGCNOCell

EvolveGCNO model from Pareja et al., "EvolveGCN: Evolving Graph Convolutional Networks for Dynamic Graphs", AAAI 2020.

EvolveGCNHCell

EvolveGCNH model from Pareja et al., "EvolveGCN: Evolving Graph Convolutional Networks for Dynamic Graphs", AAAI 2020.

DCRNN

Diffusion Convolutional Recurrent Network, from the paper "Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting".

DenseDCRNN

Diffusion Convolutional Recurrent Network.

GraphConvGRU

GraphConv GRU network.

GraphConvLSTM

GraphConv LSTM network.

EvolveGCN

EvolveGCN encoder from Pareja et al., "EvolveGCN: Evolving Graph Convolutional Networks for Dynamic Graphs", AAAI 2020.

AGCRN

Adaptive Graph Convolutional Recurrent Network.

TemporalConvNet

Simple TCN encoder with optional linear readout.

SpatioTemporalConvNet

SpatioTemporalConvolutional encoder with optional linear readout.

TransformerLayer

A Transformer layer from the paper "Attention Is All You Need" (Vaswani et al., NeurIPS 2017).

SpatioTemporalTransformerLayer

A TransformerLayer which attends to both the spatial and temporal dimensions by stacking two MultiHeadAttention layers.

Transformer

A stack of Transformer layers.

class ConditionalBlock(input_size, exog_size, output_size, dropout=0.0, skip_connection=False, activation='relu')[source]#

Simple layer to condition the input on a set of exogenous variables.

\[\text{CondBlock}(\mathbf{x}, \mathbf{u}) = \left(\text{MLP}_x(\mathbf{x})\right) + \left(\text{MLP}_u(\mathbf{u})\right)\]
Parameters:
  • input_size (int) – Input size.

  • exog_size (int) – Size of the covariates.

  • output_size (int) – Output size.

  • dropout (float, optional) – Dropout probability.

  • skip_connection (bool, optional) – Whether to add a parametrized residual connection. (default: False).

  • activation (str, optional) – Activation function.

forward(x, u=None)[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
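
A minimal usage sketch of ConditionalBlock; the import path and the (batch, steps, nodes, channels) input layout are assumptions based on the conventions used elsewhere on this page:

    import torch
    from tsl.nn.blocks.encoders import ConditionalBlock  # import path is an assumption

    block = ConditionalBlock(input_size=8, exog_size=4, output_size=32,
                             dropout=0.1, skip_connection=True)

    # (batch, steps, nodes, channels) layout assumed
    x = torch.randn(16, 12, 20, 8)   # input signal
    u = torch.randn(16, 12, 20, 4)   # exogenous variables
    out = block(x, u)                # last dimension becomes output_size (32)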

class ConditionalTCNBlock(input_size, exog_size, output_size, kernel_size, dilation=1, dropout=0.0, gated=False, activation='relu', weight_norm=False, channel_last=True, skip_connection=False)[source]#

Mirrors the architecture of ConditionalBlock but using temporal convolutions instead of affine transformations.

Parameters:
  • input_size (int) – Size of the input.

  • exog_size (int) – Size of the exogenous variables.

  • output_size (int) – Size of the output.

  • kernel_size (int) – Size of the convolution kernel.

  • dilation (int, optional) – Spacing between kernel elements.

  • dropout (float, optional) – Dropout probability.

  • gated (bool, optional) – Whether to use gated tanh activations.

  • activation (str, optional) – Activation function.

  • weight_norm (bool, optional) – Whether to apply weight normalization to the parameters of the filter.

  • channel_last (bool, optional) – If True, input data must follow the (b s n c) layout; (b c n s) is assumed otherwise.

  • skip_connection (bool, optional) – If True adds a parametrized skip connection from the input to the output.

class MLP(input_size, hidden_size, output_size=None, exog_size=None, n_layers=1, activation='relu', dropout=0.0)[source]#

Simple Multi-layer Perceptron encoder with optional linear readout.

Parameters:
  • input_size (int) – Input size.

  • hidden_size (int) – Units in the hidden layers.

  • output_size (int, optional) – Size of the optional readout.

  • exog_size (int, optional) – Size of the optional exogenous variables.

  • n_layers (int, optional) – Number of hidden layers. (default: 1)

  • activation (str, optional) – Activation function. (default: relu)

  • dropout (float, optional) – Dropout probability.
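
A minimal construction sketch; the import path, the (batch, steps, nodes, channels) layout, and the assumption that forward accepts (x, u) like ConditionalBlock are not confirmed by this page:

    import torch
    from tsl.nn.blocks.encoders import MLP  # import path is an assumption

    mlp = MLP(input_size=8, hidden_size=64, output_size=32,
              exog_size=4, n_layers=2, dropout=0.1)

    x = torch.randn(16, 12, 20, 8)   # (batch, steps, nodes, channels) assumed
    u = torch.randn(16, 12, 20, 4)   # optional exogenous variables
    h = mlp(x, u)                    # assumed signature forward(x, u=None)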

class ResidualMLP(input_size, hidden_size, output_size=None, exog_size=None, n_layers=1, activation='relu', dropout=0.0, parametrized_skip=False)[source]#

Multi-layer Perceptron with residual connections.

Parameters:
  • input_size (int) – Input size.

  • hidden_size (int) – Units in the hidden layers.

  • output_size (int, optional) – Size of the optional readout.

  • exog_size (int, optional) – Size of the optional exogenous variables.

  • n_layers (int, optional) – Number of hidden layers. (default: 1)

  • activation (str, optional) – Activation function. (default: relu)

  • dropout (float, optional) – Dropout probability. (default: 0.)

  • parametrized_skip (bool, optional) – Whether to use parametrized skip connections for the residuals.

class RNN(input_size, hidden_size, exog_size=None, output_size=None, n_layers=1, dropout=0.0, cell='gru')[source]#

Simple RNN encoder with optional linear readout.

Parameters:
  • input_size (int) – Input size.

  • hidden_size (int) – Units in the hidden layers.

  • exog_size (int, optional) – Size of the optional exogenous variables.

  • output_size (int, optional) – Size of the optional readout.

  • n_layers (int, optional) – Number of hidden layers. (default: 1)

  • cell (str, optional) – Type of cell to be used (options: [gru, lstm]). (default: gru)

  • dropout (float, optional) – Dropout probability.

forward(x, u=None, return_last_state=False)[source]#
Parameters:
  • x (torch.Tensor) – Input tensor.

  • u (torch.Tensor, optional) – Optional exogenous variables.

  • return_last_state (bool, optional) – Whether to return only the state corresponding to the last time step.
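
A usage sketch of the documented forward signature; the import path and input layout are assumptions:

    import torch
    from tsl.nn.blocks.encoders import RNN  # import path is an assumption

    rnn = RNN(input_size=8, hidden_size=64, output_size=32,
              n_layers=2, cell='gru')

    x = torch.randn(16, 12, 20, 8)            # (batch, steps, nodes, channels) assumed
    h_seq = rnn(x)                            # states for every time step
    h_last = rnn(x, return_last_state=True)   # state of the last step only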

class DCRNNCell(input_size, output_size, k=2, root_weight=True)[source]#

Diffusion Convolutional Recurrent Cell.

Parameters:
  • input_size – Size of the input.

  • output_size – Number of units in the hidden state.

  • k – Size of the diffusion kernel.

  • root_weight – Whether to learn a separate transformation for the central node.

class DenseDCRNNCell(input_size, output_size, k=2, root_weight=False)[source]#

Diffusion Convolutional Recurrent Cell operating on a dense adjacency matrix.

Parameters:
  • input_size – Size of the input.

  • output_size – Number of units in the hidden state.

  • k – Size of the diffusion kernel.

  • root_weight (bool) – Whether to learn a separate transformation for the central node.

class GraphConvGRUCell(in_size, out_size, root_weight=True)[source]#

Gated Recurrent Unit with GraphConv gates. Loosely based on Seo et al., "Structured Sequence Modeling with Graph Convolutional Recurrent Networks", ICONIP 2017.

Parameters:
  • in_size – Size of the input.

  • out_size – Number of units in the hidden state.

  • root_weight – Whether to learn a separate transformation for the central node.

class GraphConvLSTMCell(in_size, out_size, root_weight=True)[source]#

LSTM with GraphConv gates. Loosely based on Seo et al., "Structured Sequence Modeling with Graph Convolutional Recurrent Networks", ICONIP 2017.

Parameters:
  • in_size – Size of the input.

  • out_size – Number of units in the hidden state.

  • root_weight – Whether to learn a separate transformation for the central node.

class AGCRNCell(in_size, emb_size, out_size, num_nodes)[source]#

Adaptive Graph Convolutional Cell. Based on Bai et al. “Adaptive Graph Convolutional Recurrent Network for Traffic Forecasting”, NeurIPS 2020

Parameters:
  • in_size – Size of the input.

  • emb_size – Size of the input node embeddings.

  • out_size – Output size.

  • num_nodes – Number of nodes in the input graph.

class EvolveGCNOCell(in_size, out_size, asymmetric_norm, activation='relu', root_weight=False, bias=True, cached=False)[source]#

EvolveGCNO model from Pareja et al., "EvolveGCN: Evolving Graph Convolutional Networks for Dynamic Graphs", AAAI 2020. This variant of the model simply updates the weights of the graph convolution.

Parameters:
  • in_size (int) – Size of the input.

  • out_size (int) – Number of units in the hidden state.

  • asymmetric_norm (bool) – Whether to consider the graph as directed when normalizing weights.

  • activation (str) – Activation function after the GCN layer.

  • root_weight (bool) – Whether to add a parametrized skip connection.

  • bias (bool) – Whether to learn a bias.

  • cached (bool) – Whether to cache normalized edge_weights.

reset_parameters()[source]#

Resets all learnable parameters of the module.

class EvolveGCNHCell(in_size, out_size, asymmetric_norm, activation='relu', root_weight=False, bias=True, cached=False)[source]#

EvolveGCNH model from Pareja et al., "EvolveGCN: Evolving Graph Convolutional Networks for Dynamic Graphs", AAAI 2020. This variant of the model adapts the weights of the graph convolution by looking at node features.

Parameters:
  • in_size (int) – Size of the input.

  • out_size (int) – Number of units in the hidden state.

  • asymmetric_norm (bool) – Whether to consider the graph as directed when normalizing weights.

  • activation (str) – Activation function after the GCN layer.

  • root_weight (bool) – Whether to add a parametrized skip connection.

  • bias (bool) – Whether to learn a bias.

  • cached (bool) – Whether to cache normalized edge_weights.

reset_parameters()[source]#

Resets all learnable parameters of the module.

message(x_j: Tensor, edge_weight) → Tensor[source]#

Constructs messages from node \(j\) to node \(i\) in analogy to \(\phi_{\mathbf{\Theta}}\) for each edge in edge_index. This function can take any argument as input which was initially passed to propagate(). Furthermore, tensors passed to propagate() can be mapped to the respective nodes \(i\) and \(j\) by appending _i or _j to the variable name, e.g., x_i and x_j.

class DCRNN(input_size, hidden_size, n_layers=1, k=2, root_weight=True)[source]#

Diffusion Convolutional Recurrent Network, from Li et al., "Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting", ICLR 2018.

Parameters:
  • input_size – Size of the input.

  • hidden_size – Number of units in the hidden state.

  • n_layers – Number of layers.

  • k – Size of the diffusion kernel.

  • root_weight – Whether to learn a separate transformation for the central node.

class DenseDCRNN(input_size, hidden_size, n_layers=1, k=2, root_weight=False)[source]#

Dense implementation of the Diffusion Convolutional Recurrent Network, operating on a dense adjacency matrix.

From Li et al., "Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting", ICLR 2018.

Parameters:
  • input_size – Size of the input.

  • hidden_size – Number of units in the hidden state.

  • n_layers – Number of layers.

  • k – Size of the diffusion kernel.

  • root_weight – Whether to learn a separate transformation for the central node.

forward(x, adj, h=None)[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
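
A sketch of the documented forward(x, adj, h=None) call with a dense adjacency matrix; the import path and input layout are assumptions:

    import torch
    from tsl.nn.blocks.encoders import DenseDCRNN  # import path is an assumption

    dcrnn = DenseDCRNN(input_size=8, hidden_size=64, n_layers=2, k=2)

    n_nodes = 20
    x = torch.randn(16, 12, n_nodes, 8)     # (batch, steps, nodes, channels) assumed
    adj = torch.rand(n_nodes, n_nodes)      # dense adjacency matrix
    out = dcrnn(x, adj)                     # documented signature: forward(x, adj, h=None)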

class GraphConvGRU(input_size, hidden_size, n_layers=1, root_weight=True)[source]#

GraphConv GRU network.

Loosely based on Seo et al., "Structured Sequence Modeling with Graph Convolutional Recurrent Networks", ICONIP 2017.

Parameters:
  • input_size (int) – Size of the input.

  • hidden_size (int) – Number of units in the hidden state.

  • n_layers (int, optional) – Number of hidden layers.

  • root_weight (bool, optional) – Whether to learn a separate transformation for the central node.

class GraphConvLSTM(input_size, hidden_size, n_layers=1, root_weight=True)[source]#

GraphConv LSTM network.

Loosely based on Seo et al., "Structured Sequence Modeling with Graph Convolutional Recurrent Networks", ICONIP 2017.

Parameters:
  • input_size (int) – Size of the input.

  • hidden_size (int) – Number of units in the hidden state.

  • n_layers (int, optional) – Number of hidden layers.

  • root_weight (bool, optional) – Whether to learn a separate transformation for the central node.

class EvolveGCN(input_size, hidden_size, n_layers, asymmetric_norm, variant='H', root_weight=False, cached=False, activation='relu')[source]#

EvolveGCN encoder from Pareja et al., "EvolveGCN: Evolving Graph Convolutional Networks for Dynamic Graphs", AAAI 2020.

Parameters:
  • input_size (int) – Size of the input.

  • hidden_size (int) – Number of hidden units in each hidden layer.

  • n_layers (int) – Number of layers in the encoder.

  • asymmetric_norm (bool) – Whether to consider the input graph as directed.

  • variant (str) – Variant of EvolveGCN to use (options: ‘H’ or ‘O’)

  • root_weight (bool) – Whether to add a parametrized skip connection.

  • cached (bool) – Whether to cache normalized edge_weights.

  • activation (str) – Activation after each GCN layer.

forward(x, edge_index, edge_weight=None)[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
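
A sketch of the documented forward(x, edge_index, edge_weight=None) call; the import path, input layout, and the random connectivity are assumptions:

    import torch
    from tsl.nn.blocks.encoders import EvolveGCN  # import path is an assumption

    enc = EvolveGCN(input_size=8, hidden_size=64, n_layers=2,
                    asymmetric_norm=False, variant='H')

    x = torch.randn(16, 12, 20, 8)              # (batch, steps, nodes, channels) assumed
    edge_index = torch.randint(0, 20, (2, 50))  # COO connectivity
    edge_weight = torch.rand(50)
    out = enc(x, edge_index, edge_weight)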

class AGCRN(input_size, emb_size, hidden_size, num_nodes, n_layers=1)[source]#

Adaptive Graph Convolutional Recurrent Network. Based on Bai et al. “Adaptive Graph Convolutional Recurrent Network for Traffic Forecasting”, NeurIPS 2020

Parameters:
  • input_size – Size of the input.

  • emb_size – Size of the input node embeddings.

  • hidden_size – Output size.

  • num_nodes – Number of nodes in the input graph.

  • n_layers – Number of recurrent layers.

forward(x, *args, h=None, **kwargs)[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
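
Since the graph is learned adaptively from node embeddings, only the input signal is strictly required by forward; the import path and input layout are assumptions:

    import torch
    from tsl.nn.blocks.encoders import AGCRN  # import path is an assumption

    agcrn = AGCRN(input_size=8, emb_size=10, hidden_size=64,
                  num_nodes=20, n_layers=2)

    x = torch.randn(16, 12, 20, 8)   # (batch, steps, nodes, channels) assumed
    out = agcrn(x)                   # documented signature: forward(x, *args, h=None, **kwargs)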

class TemporalConvNet(input_channels, hidden_channels, kernel_size, dilation, stride=1, exog_channels=None, output_channels=None, n_layers=1, gated=False, dropout=0.0, activation='relu', exponential_dilation=False, weight_norm=False, causal_padding=True, bias=True, channel_last=True)[source]#

Simple TCN encoder with optional linear readout.

Parameters:
  • input_channels (int) – Input size.

  • hidden_channels (int) – Channels in the hidden layers.

  • kernel_size (int) – Size of the convolutional kernel.

  • dilation (int) – Dilation coefficient of the convolutional kernel.

  • stride (int, optional) – Stride of the convolutional kernel.

  • exog_channels (int, optional) – Channels of the optional exogenous variables.

  • output_channels (int, optional) – Channels in the output layer.

  • n_layers (int, optional) – Number of hidden layers. (default: 1)

  • gated (bool, optional) – Whether to use the gated tanh activation function. (default: False)

  • dropout (float, optional) – Dropout probability.

  • activation (str, optional) – Activation function. (default: relu)

  • exponential_dilation (bool, optional) – Whether to increase exponentially the dilation factor at each layer.

  • weight_norm (bool, optional) – Whether to apply weight normalization to the temporal convolutional filters.

  • causal_padding (bool, optional) – Whether to pad the input sequence to preserve causality.

  • bias (bool, optional) – Whether to add a learnable bias to the output.

  • channel_last (bool, optional) – If True, input must have layout (b s n c); (b c n s) otherwise.
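
A construction sketch; the import path and the assumption that forward takes just the input tensor are not confirmed by this page:

    import torch
    from tsl.nn.blocks.encoders import TemporalConvNet  # import path is an assumption

    tcn = TemporalConvNet(input_channels=8, hidden_channels=64,
                          kernel_size=3, dilation=2, n_layers=3,
                          output_channels=32, exponential_dilation=True,
                          causal_padding=True, channel_last=True)

    x = torch.randn(16, 24, 20, 8)   # (b s n c) layout, as documented for channel_last=True
    out = tcn(x)                     # with causal padding the 24 steps should be preserved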

class SpatioTemporalConvNet(input_size, output_size, temporal_kernel_size, spatial_kernel_size, temporal_convs=2, spatial_convs=1, dilation=1, norm='none', dropout=0.0, gated=False, pad=True, activation='relu')[source]#

SpatioTemporalConvolutional encoder with optional linear readout. Applies several temporal convolutions followed by diffusion convolution over a graph.

Parameters:
  • input_size (int) – Input size.

  • output_size (int) – Channels in the output representation.

  • temporal_kernel_size (int) – Size of the temporal convolutional kernel.

  • spatial_kernel_size (int) – Size of the spatial diffusion kernel.

  • temporal_convs (int, optional) – Number of temporal convolutions. (default: 2)

  • spatial_convs (int, optional) – Number of spatial convolutions. (default: 1)

  • dilation (int) – Dilation coefficient of the temporal convolutional kernel.

  • norm (str, optional) – Type of normalization applied to the hidden units.

  • dropout (float, optional) – Dropout probability.

  • gated (bool, optional) – Whether to use the gated tanh activation function after temporal convolutions. (default: False)

  • pad (bool, optional) – Whether to pad the input sequence to preserve the sequence length.

  • activation (str, optional) – Activation function. (default: relu)
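
A construction sketch; the import path, the input layout, and the assumption that forward accepts (x, edge_index) like the other graph-based encoders are unverified:

    import torch
    from tsl.nn.blocks.encoders import SpatioTemporalConvNet  # import path is an assumption

    stcn = SpatioTemporalConvNet(input_size=8, output_size=32,
                                 temporal_kernel_size=3, spatial_kernel_size=2,
                                 temporal_convs=2, spatial_convs=1, pad=True)

    x = torch.randn(16, 24, 20, 8)              # (batch, steps, nodes, channels) assumed
    edge_index = torch.randint(0, 20, (2, 50))  # COO connectivity (assumed forward signature)
    out = stcn(x, edge_index)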

class TransformerLayer(input_size, hidden_size, ff_size=None, n_heads=1, axis='time', causal=True, activation='elu', dropout=0.0)[source]#

A Transformer layer from the paper “Attention Is All You Need” (Vaswani et al., NeurIPS 2017).

This layer can be instantiated to attend the temporal or spatial dimension.

Parameters:
  • input_size (int) – Input size.

  • hidden_size (int) – Dimension of the learned representations.

  • ff_size (int) – Units in the MLP after self attention.

  • n_heads (int, optional) – Number of parallel attention heads.

  • axis (str, optional) – Dimension on which to apply attention to update the representations. Can be either ‘time’ or ‘nodes’. (default: 'time')

  • causal (bool, optional) – If True, then causally mask attention scores in temporal attention (has an effect only if axis is 'time'). (default: True)

  • activation (str, optional) – Activation function.

  • dropout (float, optional) – Dropout probability.

class SpatioTemporalTransformerLayer(input_size, hidden_size, ff_size=None, n_heads=1, causal=True, activation='elu', dropout=0.0)[source]#

A TransformerLayer which attends to both the spatial and temporal dimensions by stacking two MultiHeadAttention layers.

Parameters:
  • input_size (int) – Input size.

  • hidden_size (int) – Dimension of the learned representations.

  • ff_size (int) – Units in the MLP after self attention.

  • n_heads (int, optional) – Number of parallel attention heads.

  • causal (bool, optional) – If True, then causally mask attention scores in temporal attention. (default: True)

  • activation (str, optional) – Activation function.

  • dropout (float, optional) – Dropout probability.

class Transformer(input_size, hidden_size, ff_size=None, output_size=None, n_layers=1, n_heads=1, axis='time', causal=True, activation='elu', dropout=0.0)[source]#

A stack of Transformer layers.

Parameters:
  • input_size (int) – Input size.

  • hidden_size (int) – Dimension of the learned representations.

  • ff_size (int) – Units in the MLP after self attention.

  • output_size (int, optional) – Size of an optional linear readout.

  • n_layers (int, optional) – Number of Transformer layers.

  • n_heads (int, optional) – Number of parallel attention heads.

  • axis (str, optional) – Dimension on which to apply attention to update the representations. Can be either ‘time’, ‘nodes’, or ‘both’. (default: 'time')

  • causal (bool, optional) – If True, then causally mask attention scores in temporal attention (has an effect only if axis is 'time' or 'both'). (default: True)

  • activation (str, optional) – Activation function.

  • dropout (float, optional) – Dropout probability.
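
A construction sketch; the import path, the input layout, and the assumption that forward takes the input tensor directly are not confirmed here:

    import torch
    from tsl.nn.blocks.encoders import Transformer  # import path is an assumption

    transformer = Transformer(input_size=8, hidden_size=64, ff_size=128,
                              output_size=32, n_layers=2, n_heads=4,
                              axis='both', causal=True)

    x = torch.randn(16, 12, 20, 8)   # (batch, steps, nodes, channels) assumed
    out = transformer(x)             # attention applied over both time and nodes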

Decoders#

AttPool

Pool representations along a dimension with learned softmax scores.

GCNDecoder

GCN decoder for multistep forecasting.

LinearReadout

Simple linear readout for multistep forecasting.

MLPDecoder

Simple MLP decoder for multistep forecasting.

MultiHorizonMLPDecoder

Decoder for multistep forecasting based on Wen et al., "A Multi-Horizon Quantile Recurrent Forecaster" (2018).

class AttPool(input_size, dim)[source]#

Pool representations along a dimension with learned softmax scores.

Parameters:
  • input_size (int) – Input size.

  • dim (int) – Dimension on which to apply the attention pooling.

forward(x)[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
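
A sketch of pooling an encoder output along the temporal dimension; the import path and input layout are assumptions:

    import torch
    from tsl.nn.blocks.decoders import AttPool  # import path is an assumption

    pool = AttPool(input_size=64, dim=1)   # dim=1 is the step dimension under the assumed layout

    h = torch.randn(16, 12, 20, 64)        # (batch, steps, nodes, channels) assumed
    out = pool(h)                          # steps collapsed with learned softmax scores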

class GCNDecoder(input_size, hidden_size, output_size, horizon=1, n_layers=1, activation='relu', dropout=0.0)[source]#

GCN decoder for multistep forecasting. Applies multiple graph convolutional layers followed by a feed-forward layer and a linear readout.

If the input representation has a temporal dimension, this model will simply take as input the representation corresponding to the last step.

Parameters:
  • input_size (int) – Input size.

  • hidden_size (int) – Hidden size.

  • output_size (int) – Output size.

  • horizon (int) – Output steps.

  • n_layers (int, optional) – Number of layers in the decoder. (default: 1)

  • activation (str, optional) – Activation function to use.

  • dropout (float, optional) – Dropout probability applied in the hidden layers.

class LinearReadout(input_size, output_size, horizon=1, bias=True)[source]#

Simple linear readout for multistep forecasting.

If the input representation has a temporal dimension, this model will simply take the representation corresponding to the last step.

Parameters:
  • input_size (int) – Input size.

  • output_size (int) – Output size.

  • horizon (int) – Number of steps to predict.

  • bias (bool) – Whether to add a learnable bias.

forward(h)[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
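
A sketch mapping an encoder output to a multistep forecast; the import path, input layout, and the expected output shape are assumptions:

    import torch
    from tsl.nn.blocks.decoders import LinearReadout  # import path is an assumption

    readout = LinearReadout(input_size=64, output_size=1, horizon=12)

    h = torch.randn(16, 24, 20, 64)   # encoder output, (batch, steps, nodes, channels) assumed
    y_hat = readout(h)                # uses only the last step; shape (16, 12, 20, 1) expected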

class MLPDecoder(input_size, hidden_size, output_size, horizon=1, n_layers=1, receptive_field=1, activation='relu', dropout=0.0)[source]#

Simple MLP decoder for multistep forecasting.

If the input representation has a temporal dimension, this model will take the flattened representations corresponding to the last receptive_field time steps.

Parameters:
  • input_size (int) – Input size.

  • hidden_size (int) – Hidden size.

  • output_size (int) – Output size.

  • horizon (int) – Output steps.

  • n_layers (int, optional) – Number of layers in the decoder. (default: 1)

  • receptive_field (int, optional) – Number of steps to consider for decoding. (default: 1)

  • activation (str, optional) – Activation function to use.

  • dropout (float, optional) – Dropout probability applied in the hidden layers.

forward(h)[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
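
A sketch analogous to LinearReadout but decoding from the last receptive_field steps; the import path, input layout, and output shape are assumptions:

    import torch
    from tsl.nn.blocks.decoders import MLPDecoder  # import path is an assumption

    decoder = MLPDecoder(input_size=64, hidden_size=128, output_size=1,
                         horizon=12, n_layers=2, receptive_field=4)

    h = torch.randn(16, 24, 20, 64)   # encoder output, (batch, steps, nodes, channels) assumed
    y_hat = decoder(h)                # flattens the last 4 steps; shape (16, 12, 20, 1) expected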

class MultiHorizonMLPDecoder(input_size, exog_size, hidden_size, context_size, output_size, n_layers, horizon, activation='relu', dropout=0.0)[source]#

Decoder for multistep forecasting based on Wen et al., "A Multi-Horizon Quantile Recurrent Forecaster", 2018.

It requires exogenous variables synchronized with the forecasting horizon.

Parameters:
  • input_size (int) – Size of the input.

  • exog_size (int) – Size of the horizon exogenous variables.

  • hidden_size (int) – Number of hidden units.

  • context_size (int) – Number of units used to condition the forecasting of each step.

  • output_size (int) – Output channels.

  • n_layers (int) – Number of hidden layers.

  • horizon (int) – Forecasting horizon.

  • activation (str, optional) – Activation function.

  • dropout (float, optional) – Dropout probability.