Blocks#
Encoders#
| Class | Description |
| --- | --- |
| MLP | Simple multi-layer perceptron encoder with optional linear readout. |
| ResidualMLP | Multi-layer perceptron with residual connections. |
| MultiMLP | A multi-layer perceptron (MLP), with optional linear readout, using different weights for each element along the specified dimension. |
| ConditionalBlock | Simple layer to condition the input on a set of exogenous variables. |
| TemporalConvNet | Simple TCN encoder with optional linear readout. |
| SpatioTemporalConvNet | Spatiotemporal convolutional encoder with optional linear readout. |
| ConditionalTCNBlock | Mirrors the architecture of ConditionalBlock, but using temporal convolutions instead of affine transformations. |
| TransformerLayer | A Transformer layer from the paper "Attention Is All You Need" (Vaswani et al., NeurIPS 2017). |
| SpatioTemporalTransformerLayer | A TransformerLayer that attends both the spatial and temporal dimensions by stacking two MultiHeadAttention layers. |
| Transformer | A stack of Transformer layers. |
Recurrent Encoders#
| Class | Description |
| --- | --- |
| RNNBase | Base class for implementing recurrent neural networks (RNNs). |
| RNN | Simple RNN encoder with optional linear readout. |
| MultiRNN | A Recurrent Neural Network whose cells' weights are not shared among the different instances. |
| GraphConvRNN | The Graph Convolutional Recurrent Network based on the paper "Structured Sequence Modeling with Graph Convolutional Recurrent Networks" (Seo et al., ICONIP 2017), using GraphConv as the graph convolution. |
| DCRNN | The Diffusion Convolutional Recurrent Neural Network from the paper "Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting" (Li et al., ICLR 2018). |
| DenseDCRNN | Dense implementation of the Diffusion Convolutional Recurrent Neural Network from the paper "Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting" (Li et al., ICLR 2018). |
| AGCRN | The Adaptive Graph Convolutional Recurrent Network from the paper "Adaptive Graph Convolutional Recurrent Network for Traffic Forecasting" (Bai et al., NeurIPS 2020). |
| EvolveGCN | EvolveGCN encoder from the paper "EvolveGCN: Evolving Graph Convolutional Networks for Dynamic Graphs" (Pareja et al., AAAI 2020). |
- class MLP(input_size, hidden_size, output_size=None, exog_size=None, n_layers=1, activation='relu', dropout=0.0)[source]#
Simple Multi-layer Perceptron encoder with optional linear readout.
- Parameters:
input_size (int) – Input size.
hidden_size (int) – Units in the hidden layers.
output_size (int, optional) – Size of the optional readout.
exog_size (int, optional) – Size of the optional exogenous variables.
n_layers (int, optional) – Number of hidden layers. (default: 1)
activation (str, optional) – Activation function. (default: relu)
dropout (float, optional) – Dropout probability.
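A minimal usage sketch. The import path follows the tsl.nn.blocks.encoders module referenced in this section; the 4-dimensional input layout and the element-wise application over the feature dimension are assumptions.

```python
import torch
from tsl.nn.blocks.encoders import MLP  # import path assumed from this section

# assumed layout: (batch, steps, nodes, features)
x = torch.randn(32, 12, 20, 4)

mlp = MLP(input_size=4, hidden_size=64, output_size=16,
          n_layers=2, activation='relu', dropout=0.1)
out = mlp(x)  # assumed to act on the last (feature) dimension
print(out.shape)  # expected: torch.Size([32, 12, 20, 16])
```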
- class ResidualMLP(input_size, hidden_size, output_size=None, exog_size=None, n_layers=1, activation='relu', dropout=0.0, parametrized_skip=False)[source]#
Multi-layer Perceptron with residual connections.
- Parameters:
input_size (int) – Input size.
hidden_size (int) – Units in the hidden layers.
output_size (int, optional) – Size of the optional readout.
exog_size (int, optional) – Size of the optional exogenous variables.
n_layers (int, optional) – Number of hidden layers. (default: 1)
activation (str, optional) – Activation function. (default: relu)
dropout (float, optional) – Dropout probability. (default: 0.)
parametrized_skip (bool, optional) – Whether to use parametrized skip connections for the residuals.
- class MultiMLP(input_size: int, hidden_size: int, n_instances: int, *, ndim: Optional[int] = None, pattern: Optional[str] = None, instance_dim: int = -2, output_size: Optional[int] = None, exog_size: Optional[int] = None, n_layers: int = 1, activation: str = 'relu', dropout: float = 0.0)[source]#
A multi-layer perceptron (MLP), with optional linear readout, that uses different weights for each element along the specified dimension.
- Parameters:
input_size (int) – Input size.
hidden_size (int) – Units in the hidden layers.
output_size (int, optional) – Size of the optional readout.
exog_size (int, optional) – Size of the optional exogenous variables.
n_layers (int, optional) – Number of hidden layers. (default: 1)
activation (str, optional) – Activation function. (default: 'relu')
dropout (float, optional) – Dropout probability. (default: 0.)
- class ConditionalBlock(input_size, exog_size, output_size, dropout=0.0, skip_connection=False, activation='relu')[source]#
Simple layer to condition the input on a set of exogenous variables.
\[\text{CondBlock}(\mathbf{x}, \mathbf{u}) = \text{MLP}_x(\mathbf{x}) + \text{MLP}_u(\mathbf{u})\]
- Parameters:
input_size (int) – Input size.
exog_size (int) – Size of the covariates.
output_size (int) – Output size.
dropout (float, optional) – Dropout probability.
skip_connection (bool, optional) – Whether to add a parametrized residual connection. (default: False).
activation (str, optional) – Activation function.
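A sketch of how the conditioning block might be used. The forward signature forward(x, u) mirrors the formula above but is an assumption, as is the input layout.

```python
import torch
from tsl.nn.blocks.encoders import ConditionalBlock  # import path assumed

x = torch.randn(32, 12, 20, 8)  # input, assumed layout (batch, steps, nodes, features)
u = torch.randn(32, 12, 20, 3)  # exogenous variables aligned with x

cond = ConditionalBlock(input_size=8, exog_size=3, output_size=32,
                        dropout=0.1, skip_connection=True)
h = cond(x, u)  # MLP_x(x) + MLP_u(u), as in the formula above
print(h.shape)  # expected: torch.Size([32, 12, 20, 32])
```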
- class TemporalConvNet(input_channels, hidden_channels, kernel_size, dilation, stride=1, exog_channels=None, output_channels=None, n_layers=1, gated=False, dropout=0.0, activation='relu', exponential_dilation=False, weight_norm=False, causal_padding=True, bias=True, channel_last=True)[source]#
Simple TCN encoder with optional linear readout.
- Parameters:
input_channels (int) – Input size.
hidden_channels (int) – Channels in the hidden layers.
kernel_size (int) – Size of the convolutional kernel.
dilation (int) – Dilation coefficient of the convolutional kernel.
stride (int, optional) – Stride of the convolutional kernel.
exog_channels (int, optional) – Channels of the optional exogenous variables.
output_channels (int, optional) – Channels in the output layer.
n_layers (int, optional) – Number of hidden layers. (default: 1)
gated (bool, optional) – Whether to use the GatedTanH activation function. (default: False)
dropout (float, optional) – Dropout probability.
activation (str, optional) – Activation function. (default: 'relu')
exponential_dilation (bool, optional) – Whether to increase the dilation factor exponentially at each layer.
weight_norm (bool, optional) – Whether to apply weight normalization to the temporal convolutional filters.
causal_padding (bool, optional) – Whether to pad the input sequence so as to preserve causality.
bias (bool, optional) – Whether to add a learnable bias to the output.
channel_last (bool, optional) – If True, the input must have layout (b s n c); (b c n s) otherwise.
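A usage sketch with the channel-last layout documented above; the forward call on a plain tensor is an assumption.

```python
import torch
from tsl.nn.blocks.encoders import TemporalConvNet  # import path assumed

# channel_last=True -> layout (b s n c), i.e. (batch, steps, nodes, channels)
x = torch.randn(32, 24, 20, 4)

tcn = TemporalConvNet(input_channels=4, hidden_channels=32, kernel_size=3,
                      dilation=2, n_layers=3, exponential_dilation=True,
                      causal_padding=True, output_channels=16)
out = tcn(x)
print(out.shape)  # expected: torch.Size([32, 24, 20, 16]) with causal padding
```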
- class SpatioTemporalConvNet(input_size, output_size, temporal_kernel_size, spatial_kernel_size, temporal_convs=2, spatial_convs=1, dilation=1, norm='none', dropout=0.0, gated=False, pad=True, activation='relu')[source]#
Spatiotemporal convolutional encoder with optional linear readout.
Applies several temporal convolutions followed by diffusion convolution over a graph.
- Parameters:
input_size (int) – Input size.
output_size (int) – Channels in the output representation.
temporal_kernel_size (int) – Size of the temporal convolutional kernel.
spatial_kernel_size (int) – Size of the spatial diffusion kernel.
temporal_convs (int, optional) – Number of temporal convolutions. (default: 2)
spatial_convs (int, optional) – Number of spatial convolutions. (default: 1)
dilation (int) – Dilation coefficient of the temporal convolutional kernel.
norm (str, optional) – Type of normalization applied to the hidden units.
dropout (float, optional) – Dropout probability.
gated (bool, optional) – Whether to use the GatedTanH activation function after temporal convolutions. (default: False)
pad (bool, optional) – Whether to pad the input sequence to preserve the sequence length.
activation (str, optional) – Activation function. (default: 'relu')
- class ConditionalTCNBlock(input_size, exog_size, output_size, kernel_size, dilation=1, dropout=0.0, gated=False, activation='relu', weight_norm=False, channel_last=True, skip_connection=False)[source]#
Mirrors the architecture of tsl.nn.blocks.encoders.ConditionalBlock, but using temporal convolutions instead of affine transformations.
- Parameters:
input_size (int) – Size of the input.
exog_size (int) – Size of the exogenous variables.
output_size (int) – Size of the output.
kernel_size (int) – Size of the convolution kernel.
dilation (int) – Spacing between kernel elements.
dropout (float) – Dropout probability.
gated (bool) – Whether to use gated tanh activations.
activation (str, optional) – Activation function.
weight_norm (bool) – Whether to apply weight normalization to the parameters of the filter.
channel_last (bool) – If True, input data must follow the (b t n f) layout; assumes (b f n t) otherwise.
skip_connection (bool) – If True, adds a parametrized skip connection from the input to the output.
- class TransformerLayer(input_size, hidden_size, ff_size=None, n_heads=1, axis='time', causal=True, activation='elu', dropout=0.0)[source]#
A Transformer layer from the paper “Attention Is All You Need” (Vaswani et al., NeurIPS 2017).
This layer can be instantiated to attend the temporal or spatial dimension.
- Parameters:
input_size (int) – Input size.
hidden_size (int) – Dimension of the learned representations.
ff_size (int) – Units in the MLP after self attention.
n_heads (int, optional) – Number of parallel attention heads.
axis (str, optional) – Dimension on which to apply attention to update the representations. Can be either 'time' or 'nodes'. (default: 'time')
causal (bool, optional) – If True, causally mask attention scores in temporal attention (has an effect only if axis is 'time'). (default: True)
activation (str, optional) – Activation function.
dropout (float, optional) – Dropout probability.
- class SpatioTemporalTransformerLayer(input_size, hidden_size, ff_size=None, n_heads=1, causal=True, activation='elu', dropout=0.0)[source]#
A TransformerLayer that attends both the spatial and temporal dimensions by stacking two MultiHeadAttention layers.
- Parameters:
input_size (int) – Input size.
hidden_size (int) – Dimension of the learned representations.
ff_size (int) – Units in the MLP after self attention.
n_heads (int, optional) – Number of parallel attention heads.
causal (bool, optional) – If True, causally mask attention scores in temporal attention. (default: True)
activation (str, optional) – Activation function.
dropout (float, optional) – Dropout probability.
- class Transformer(input_size, hidden_size, ff_size=None, output_size=None, n_layers=1, n_heads=1, axis='time', causal=True, activation='elu', dropout=0.0)[source]#
A stack of Transformer layers.
- Parameters:
input_size (int) – Input size.
hidden_size (int) – Dimension of the learned representations.
ff_size (int) – Units in the MLP after self attention.
output_size (int, optional) – Size of an optional linear readout.
n_layers (int, optional) – Number of Transformer layers.
n_heads (int, optional) – Number of parallel attention heads.
axis (str, optional) – Dimension on which to apply attention to update the representations. Can be either 'time', 'nodes', or 'both'. (default: 'time')
causal (bool, optional) – If True, causally mask attention scores in temporal attention (has an effect only if axis is 'time' or 'both'). (default: True)
activation (str, optional) – Activation function.
dropout (float, optional) – Dropout probability.
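A sketch of a small Transformer stack attending both dimensions; the 4-dimensional input layout is an assumption.

```python
import torch
from tsl.nn.blocks.encoders import Transformer  # import path assumed

x = torch.randn(32, 12, 20, 16)  # assumed layout (batch, time, nodes, features)

transformer = Transformer(input_size=16, hidden_size=32, ff_size=64,
                          output_size=8, n_layers=2, n_heads=4,
                          axis='both', causal=True)
out = transformer(x)
print(out.shape)  # expected: torch.Size([32, 12, 20, 8])
```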
- class RNNBase(cells: Union[RNNCellBase, List[RNNCellBase], ModuleList], cat_states_layers: bool = False, return_only_last_state: bool = False)[source]#
Base class for implementing recurrent neural networks (RNNs).
- class RNN(input_size: int, hidden_size: int, exog_size: Optional[int] = None, output_size: Optional[int] = None, n_layers: int = 1, return_only_last_state: bool = False, cell: str = 'gru', bias: bool = True, dropout: float = 0.0, **kwargs)[source]#
Simple RNN encoder with optional linear readout.
- Parameters:
input_size (int) – Input size.
hidden_size (int) – Units in the hidden layers.
exog_size (int, optional) – Size of the optional exogenous variables.
output_size (int, optional) – Size of the optional readout.
n_layers (int, optional) – Number of hidden layers. (default: 1)
cell (str, optional) – Type of cell that should be used (options: 'gru', 'lstm'). (default: 'gru')
dropout (float, optional) – Dropout probability. (default: 0.)
- forward(x: Tensor, u: Optional[Tensor] = None)[source]#
Process the input sequence x with optional exogenous variables u.
- Parameters:
x (Tensor) – Input data.
u (Tensor) – Exogenous data.
- Shapes:
x – \((B, T, N, F_x)\) where \(B\) is the batch dimension, \(T\) is the number of time steps, \(N\) is the number of nodes, and \(F_x\) is the number of input features.
u – \((B, T, N, F_u)\) or \((B, T, F_u)\) where \(B\) is the batch dimension, \(T\) is the number of time steps, \(N\) is the number of nodes (optional), and \(F_u\) is the number of exogenous features.
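A usage sketch following the shapes documented above; the import path and the output shape with return_only_last_state=True are assumptions.

```python
import torch
from tsl.nn.blocks.encoders import RNN  # import path assumed

B, T, N, F_x, F_u = 32, 12, 20, 4, 3
x = torch.randn(B, T, N, F_x)  # (B, T, N, F_x), as documented above
u = torch.randn(B, T, F_u)     # exogenous variables shared across nodes

rnn = RNN(input_size=F_x, hidden_size=32, exog_size=F_u,
          output_size=16, n_layers=2, cell='gru',
          return_only_last_state=True)
h = rnn(x, u)  # only the state at the last step; assumed shape (B, N, 16)
```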
- class MultiRNN(input_size: int, hidden_size: int, n_instances: int, n_layers: int = 1, cat_states_layers: bool = False, return_only_last_state: bool = False, cell: str = 'gru', bias: bool = True, **kwargs)[source]#
A Recurrent Neural Network whose cells’ weights are not shared among the different instances.
- class GraphConvRNN(input_size: int, hidden_size: int, n_layers: int = 1, cat_states_layers: bool = False, return_only_last_state: bool = False, cell: str = 'gru', bias: bool = True, asymmetric_norm: bool = True, root_weight: bool = True, activation: Optional[str] = None, cached: bool = False, **kwargs)[source]#
The Graph Convolutional Recurrent Network based on the paper "Structured Sequence Modeling with Graph Convolutional Recurrent Networks" (Seo et al., ICONIP 2017), using GraphConv as the graph convolution.
- Parameters:
input_size (int) – Size of the input.
hidden_size (int) – Number of units in the hidden state.
n_layers (int) – Number of hidden layers. (default: 1)
cat_states_layers (bool) – If True, the states of each layer are concatenated along the feature dimension. (default: False)
return_only_last_state (bool) – If True, the forward() method returns only the state at the end of the processing, instead of the full sequence of states. (default: False)
cell (str) – Type of graph recurrent cell that should be used (options: 'gru', 'lstm'). (default: 'gru')
bias (bool) – If False, the layer will not learn an additive bias vector for each gate. (default: True)
asymmetric_norm (bool) – If True, normalize the edge weights as \(a_{j \rightarrow i} = \frac{a_{j \rightarrow i}}{\deg_{i}}\); otherwise apply the GCN normalization. (default: True)
root_weight (bool) – If True, add a filter (with different weights) for the root node itself. (default: True)
activation (str, optional) – Activation function to be used, None for the identity function (i.e., no activation). (default: None)
cached (bool) – If True, cache the normalized edge weights computed in the first call. (default: False)
**kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.
- class DCRNN(input_size: int, hidden_size: int, n_layers: int = 1, cat_states_layers: bool = False, return_only_last_state: bool = False, k: int = 2, root_weight: bool = True, add_backward: bool = True, bias: bool = True)[source]#
The Diffusion Convolutional Recurrent Neural Network from the paper “Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting” (Li et al., ICLR 2018).
- Parameters:
input_size – Size of the input.
hidden_size – Number of units in the hidden state.
n_layers – Number of layers.
k – Size of the diffusion kernel.
root_weight – Whether to learn a separate transformation for the central node.
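A sketch of how a graph recurrent encoder such as DCRNN might be driven; the forward signature (x, edge_index, edge_weight) follows the PyTorch Geometric convention and is an assumption, as is the import path.

```python
import torch
from tsl.nn.blocks.encoders import DCRNN  # import path assumed

B, T, N, F = 32, 12, 20, 4
x = torch.randn(B, T, N, F)
# toy graph over the N nodes in COO format (PyTorch Geometric convention)
edge_index = torch.randint(0, N, (2, 60))
edge_weight = torch.rand(60)

dcrnn = DCRNN(input_size=F, hidden_size=32, n_layers=2, k=2,
              return_only_last_state=True)
# forward signature (x, edge_index, edge_weight) is an assumption
h = dcrnn(x, edge_index, edge_weight)
```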
- class DenseDCRNN(input_size: int, hidden_size: int, n_layers: int = 1, cat_states_layers: bool = False, return_only_last_state: bool = False, k: int = 2, root_weight: bool = False)[source]#
Dense implementation of the Diffusion Convolutional Recurrent Neural Network from the paper “Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting” (Li et al., ICLR 2018).
In this implementation, the adjacency matrix is dense and the convolution is performed with matrix multiplication.
- Parameters:
input_size – Size of the input.
hidden_size – Number of units in the hidden state.
n_layers – Number of layers.
k – Size of the diffusion kernel.
root_weight – Whether to learn a separate transformation for the central node.
- class AGCRN(input_size: int, emb_size: int, hidden_size: int, num_nodes: int, n_layers: int = 1, cat_states_layers: bool = False, return_only_last_state: bool = False, bias: bool = True)[source]#
The Adaptive Graph Convolutional Recurrent Network from the paper “Adaptive Graph Convolutional Recurrent Network for Traffic Forecasting” (Bai et al., NeurIPS 2020).
- Parameters:
input_size – Size of the input.
emb_size – Size of the input node embeddings.
hidden_size – Output size.
num_nodes – Number of nodes in the input graph.
n_layers – Number of recurrent layers.
- class EvolveGCN(input_size, hidden_size, n_layers, norm, variant='H', root_weight=False, cached=False, activation='relu')[source]#
EvolveGCN encoder from the paper “EvolveGCN: Evolving Graph Convolutional Networks for Dynamic Graphs” (Pareja et al., AAAI 2020).
- Parameters:
input_size (int) – Size of the input.
hidden_size (int) – Number of hidden units in each hidden layer.
n_layers (int) – Number of layers in the encoder.
asymmetric_norm (bool) – Whether to consider the input graph as directed.
variant (str) – Variant of EvolveGCN to use (options: ‘H’ or ‘O’)
root_weight (bool) – Whether to add a parametrized skip connection.
cached (bool) – Whether to cache normalized edge_weights.
activation (str) – Activation after each GCN layer.
Decoders#
| Class | Description |
| --- | --- |
| AttPool | Pool representations along a dimension with learned softmax scores. |
| GCNDecoder | GCN decoder for multistep forecasting. |
| LinearReadout | Simple linear readout for multistep forecasting. |
| MLPDecoder | Simple MLP decoder for multistep forecasting. |
| MultiHorizonMLPDecoder | Decoder for multistep forecasting based on the paper "A Multi-Horizon Quantile Recurrent Forecaster" (Wen et al., 2018). |
- class AttPool(input_size: int, dim: int)[source]#
Pool representations along a dimension with learned softmax scores.
- class GCNDecoder(input_size: int, hidden_size: int, output_size: int, horizon: int = 1, n_layers: int = 1, activation: str = 'relu', dropout: float = 0.0)[source]#
GCN decoder for multistep forecasting.
Applies multiple graph convolutional layers followed by a feed-forward layer and a linear readout. If the input representation has a temporal dimension, this model will simply take as input the representation corresponding to the last step.
- Parameters:
input_size (int) – Input size.
hidden_size (int) – Hidden size.
output_size (int) – Output size.
horizon (int) – Number of time steps in the prediction horizon. (default: 1)
n_layers (int) – Number of layers in the decoder. (default: 1)
activation (str, optional) – Activation function to be used. (default: 'relu')
dropout (float, optional) – Dropout probability applied in the hidden layers. (default: 0)
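A decoding sketch; the forward signature with edge_index and the output shape are assumptions.

```python
import torch
from tsl.nn.blocks.decoders import GCNDecoder  # import path assumed

B, T, N, F = 32, 12, 20, 32
h = torch.randn(B, T, N, F)  # encoder output; only the last step is used
edge_index = torch.randint(0, N, (2, 60))  # toy graph in COO format

decoder = GCNDecoder(input_size=F, hidden_size=64, output_size=1,
                     horizon=3, n_layers=2)
# forward signature (h, edge_index) is an assumption
y_hat = decoder(h, edge_index)  # assumed shape: (B, horizon, N, output_size)
```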
- class LinearReadout(input_size: int, output_size: int, horizon: int = 1, bias: bool = True)[source]#
Simple linear readout for multistep forecasting.
If the input representation has a temporal dimension, this model will simply take the representation corresponding to the last step.
- class MLPDecoder(input_size: int, hidden_size: int, output_size: int, horizon: int = 1, n_layers: int = 1, receptive_field: int = 1, activation: str = 'relu', dropout: float = 0.0)[source]#
Simple MLP decoder for multistep forecasting.
If the input representation has a temporal dimension, this model will take the flattened representations corresponding to the last receptive_field time steps.
- Parameters:
input_size (int) – Input size.
hidden_size (int) – Hidden size.
output_size (int) – Output size.
horizon (int) – Number of steps to predict. (default: 1)
n_layers (int) – Number of hidden layers in the decoder. (default: 1)
receptive_field (int) – Number of steps to consider for decoding. (default: 1)
activation (str, optional) – Activation function to be used. (default: 'relu')
dropout (float, optional) – Dropout probability applied in the hidden layers. (default: 0)
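A decoding sketch; the single-tensor forward call and the output shape are assumptions.

```python
import torch
from tsl.nn.blocks.decoders import MLPDecoder  # import path assumed

B, T, N, F = 32, 12, 20, 32
h = torch.randn(B, T, N, F)  # encoder output

decoder = MLPDecoder(input_size=F, hidden_size=64, output_size=1,
                     horizon=6, receptive_field=4, n_layers=2)
# the last `receptive_field` steps are flattened before decoding
y_hat = decoder(h)  # assumed shape: (B, horizon, N, output_size)
```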
- class MultiHorizonMLPDecoder(input_size, exog_size, hidden_size, context_size, output_size, n_layers, horizon, activation='relu', dropout=0.0)[source]#
Decoder for multistep forecasting based on the paper “A Multi-Horizon Quantile Recurrent Forecaster” (Wen et al., 2018).
It requires exogenous variables synchronized with the forecasting horizon.
- Parameters:
input_size (int) – Size of the input.
exog_size (int) – Size of the horizon exogenous variables.
hidden_size (int) – Number of hidden units.
context_size (int) – Number of units used to condition the forecasting of each step.
output_size (int) – Output channels.
n_layers (int) – Number of hidden layers.
horizon (int) – Forecasting horizon.
activation (str, optional) – Activation function.
dropout (float, optional) – Dropout probability.