Layers#
This module contains all the neural layers available in tsl.
Graph Convolutional Layers#
The subpackage tsl.nn.layers.graph_convs
contains the graph convolutional
layers.
GraphConv | A simple graph convolutional operator where the message function is a simple linear projection and aggregation a simple average. |
DenseGraphConv | A dense graph convolution performing \(\mathbf{X}^{\prime} = \mathbf{\tilde{A}} \mathbf{X} \boldsymbol{\Theta} + \mathbf{b}\). |
DenseGraphConvOrderK | Dense implementation of the spatial diffusion convolution of order \(K\). |
DiffConv | The Diffusion Convolution Layer from the paper "Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting" (Li et al., ICLR 2018). |
DiffusionConv | Alias for DiffConv. |
GraphPolyVAR | Polynomial spatiotemporal graph filter from the paper "Forecasting time series with VARMA recursions on graphs." (Isufi et al., IEEE Transactions on Signal Processing 2019). |
MultiHeadGraphAttention | The multi-head attention from the paper "Attention Is All You Need" (Vaswani et al., NeurIPS 2017) for graph-structured data. |
GATConv | Extension of torch_geometric.nn.conv.GATConv for static graphs with multidimensional features. |
GatedGraphNetwork | Gated Graph Neural Network layer (with residual connections) inspired by the FC-GNN model from the paper "Multivariate Time Series Forecasting with Latent Graph Inference" (Satorras et al., 2022). |
AdaptiveGraphConv | The Dense Adaptive Graph Convolution operator from the paper "Adaptive Graph Convolutional Recurrent Network for Traffic Forecasting" (Bai et al., NeurIPS 2020). |
- class GraphConv(input_size: int, output_size: int, bias: bool = True, norm: str = 'mean', root_weight: bool = True, activation: Optional[str] = None, cached: bool = False)[source]#
A simple graph convolutional operator where the message function is a simple linear projection and aggregation a simple average. In other terms:
\[\mathbf{X}^{\prime} = \mathbf{\hat{D}}^{-1} \mathbf{\tilde{A}} \mathbf{X} \boldsymbol{\Theta} + \mathbf{b} .\]
- Parameters:
input_size (int) – Size of the input features.
output_size (int) – Size of the output features.
bias (bool) – If False, then the layer will not learn an additive bias vector. (default: True)
norm (str) – The normalization used for edges and edge weights. If 'mean', then edge weights are normalized as \(a_{j \rightarrow i} = \frac{a_{j \rightarrow i}} {deg_{i}}\); other available options are 'gcn', 'asym' and 'none'. (default: 'mean')
root_weight (bool) – If True, then add a linear layer for the root node itself (a skip connection). (default: True)
activation (str, optional) – Activation function to be used, None for identity function (i.e., no activation). (default: None)
cached (bool) – If True, then cache the normalized edge weights computed in the first call. (default: False)
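The mean-normalized propagation rule above can be sketched on a dense adjacency matrix. This is only an illustration of the formula in NumPy, not the tsl implementation: the function name `mean_graph_conv`, the dense `adj` layout (`adj[i, j]` is the weight of edge j -> i) and the omission of the `root_weight` and `activation` options are all assumptions made for brevity.

```python
import numpy as np

def mean_graph_conv(x, adj, theta, bias):
    """Dense sketch of X' = D^{-1} A X Theta + b (hypothetical helper)."""
    deg = adj.sum(axis=1, keepdims=True)   # incoming weight of each target node i
    deg[deg == 0] = 1.0                    # isolated nodes: avoid division by zero
    adj_norm = adj / deg                   # a_{j->i} / deg_i
    return adj_norm @ x @ theta + bias

# adj[i, j] is the weight of edge j -> i
adj = np.array([[0., 1., 1.],
                [1., 0., 0.],
                [0., 0., 0.]])
x = np.array([[1., 2.], [3., 4.], [5., 6.]])
out = mean_graph_conv(x, adj, theta=np.eye(2), bias=np.zeros(2))
```

With an identity projection, node 0 receives the average of the features of nodes 1 and 2, node 1 copies node 0, and the isolated node 2 receives only the bias.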
- class DenseGraphConv(input_size, output_size, bias=True)[source]#
A dense graph convolution performing \(\mathbf{X}^{\prime} = \mathbf{\tilde{A}} \mathbf{X} \boldsymbol{\Theta} + \mathbf{b}\).
- Parameters:
input_size – Size of the input.
output_size – Output size.
bias – Whether to add a learnable bias.
- class DenseGraphConvOrderK(input_size, output_size, support_len=3, order=2, include_self=True, channel_last=False)[source]#
Dense implementation of the spatial diffusion convolution of order \(K\).
- Parameters:
input_size (int) – Size of the input.
output_size (int) – Size of the output.
support_len (int) – Number of reference operators.
order (int) – Order of the diffusion process.
include_self (bool) – Whether to include the central node or not.
channel_last (bool, optional) – Whether to use the pattern “b t n f” as opposed to “b f n t”.
- class DiffConv(in_channels: int, out_channels: int, k: int, root_weight: bool = True, add_backward: bool = True, bias: bool = True, activation: Optional[str] = None)[source]#
The Diffusion Convolution Layer from the paper “Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting” (Li et al., ICLR 2018).
- Parameters:
in_channels (int) – Number of input features.
out_channels (int) – Number of output features.
k (int) – Filter size \(K\).
root_weight (bool) – If True, then add a filter also for the \(0\)-order neighborhood (i.e., the root node itself). (default: True)
add_backward (bool) – If True, then additional \(K\) filters are learnt for the transposed connectivity. (default: True)
bias (bool, optional) – If True, add a trainable additive bias. (default: True)
activation (str, optional) – Activation function to be used, None for identity function (i.e., no activation). (default: None)
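As a rough illustration of how the `k`, `root_weight` and `add_backward` options interact, the following NumPy sketch applies a bidirectional diffusion filter on a dense adjacency matrix. The helper names (`random_walk`, `diffusion_conv`) and the dense layout are assumptions for this example only; the actual layer operates on sparse connectivity and may differ in detail.

```python
import numpy as np

def random_walk(adj):
    """Row-normalize a dense adjacency matrix: P = D^{-1} A."""
    deg = adj.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0
    return adj / deg

def diffusion_conv(x, adj, thetas_fwd, thetas_bwd, theta_root):
    """Sum of K forward and K backward diffusion steps plus a 0-order term."""
    out = x @ theta_root                          # root node term (root_weight=True)
    p_fwd, p_bwd = random_walk(adj), random_walk(adj.T)
    h_fwd, h_bwd = x, x
    for th_f, th_b in zip(thetas_fwd, thetas_bwd):
        h_fwd = p_fwd @ h_fwd                     # one more forward diffusion step
        h_bwd = p_bwd @ h_bwd                     # same on the transposed graph
        out = out + h_fwd @ th_f + h_bwd @ th_b
    return out

rng = np.random.default_rng(0)
n, f_in, f_out, k = 4, 3, 5, 2
x = rng.normal(size=(n, f_in))
adj = (rng.random((n, n)) > 0.5).astype(float)
thetas_fwd = rng.normal(size=(k, f_in, f_out))
thetas_bwd = rng.normal(size=(k, f_in, f_out))
theta_root = rng.normal(size=(f_in, f_out))
out = diffusion_conv(x, adj, thetas_fwd, thetas_bwd, theta_root)
```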
- class DiffusionConv(in_channels: int, out_channels: int, k: int, root_weight: bool = True, add_backward: bool = True, bias: bool = True, activation: Optional[str] = None)[source]#
Alias for DiffConv.
- class GraphPolyVAR(temporal_order: int, spatial_order: int, norm: str = 'none', cached: bool = False)[source]#
Polynomial spatiotemporal graph filter from the paper “Forecasting time series with VARMA recursions on graphs.” (Isufi et al., IEEE Transactions on Signal Processing 2019).
\[\mathbf{X}_t = \sum_{p=1}^{P} \sum_{l=1}^{L} \Theta_{p,l} \cdot \mathbf{\tilde{A}}^{l-1} \mathbf{X}_{t-p}\]
where
\(\mathbf{\tilde{A}}\) is a graph shift operator (GSO);
\(\Theta \in \mathbb{R}^{P \times L}\) are the filter coefficients accounting for up to \(L\)-hop neighbors and \(P\) time steps in the past.
- Parameters:
temporal_order (int) – The filter temporal order \(P\).
spatial_order (int) – The filter spatial order \(L\).
norm (str) – The normalization used for edges and edge weights. The available options are: 'gcn', 'asym' and 'none'. (default: 'none')
cached (bool) – If True, then cache the normalized edge weights computed in the first call. (default: False)
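The double sum in the filter definition can be evaluated by reusing successive powers of the graph shift operator. The sketch below is a direct NumPy transcription of the formula under assumed conventions (`x_past[p - 1]` holds \(\mathbf{X}_{t-p}\), and `gso` is a dense matrix); the name `graph_poly_var` is hypothetical and this is not the layer's actual code.

```python
import numpy as np

def graph_poly_var(x_past, gso, theta):
    """x_past: [P, N, F] with x_past[p-1] = X_{t-p}; gso: [N, N]; theta: [P, L]."""
    n_steps, n_hops = theta.shape
    out = np.zeros_like(x_past[0])
    for p in range(n_steps):
        h = x_past[p]                  # A^0 X_{t-p}
        for l in range(n_hops):
            out += theta[p, l] * h     # Theta_{p,l} * A^{l-1} X_{t-p} (1-based l)
            h = gso @ h                # next power of the shift operator
    return out

rng = np.random.default_rng(0)
x_past = rng.normal(size=(2, 4, 1))    # P=2 past steps, N=4 nodes, F=1 feature
gso = rng.random((4, 4))
theta = rng.normal(size=(2, 3))        # P=2, L=3
x_t = graph_poly_var(x_past, gso, theta)
```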
- class MultiHeadGraphAttention(embed_dim: int, num_heads: int = 1, qdim: Optional[int] = None, kdim: Optional[int] = None, vdim: Optional[int] = None, edge_dim: Optional[int] = None, concat: bool = True, dropout: float = 0.0, root_weight: bool = True, bias: bool = True, **kwargs)[source]#
The multi-head attention from the paper “Attention Is All You Need” (Vaswani et al., NeurIPS 2017) for graph-structured data.
- Parameters:
embed_dim (int) – Size of the embedding dimension.
num_heads (int) – Number of attention heads. (default: 1)
qdim (int, optional) – Number of features of the query. If None, then defaults to embed_dim. (default: None)
kdim (int, optional) – Number of features of the key. If None, then defaults to embed_dim. (default: None)
vdim (int, optional) – Number of features of the value. If None, then defaults to embed_dim. (default: None)
edge_dim (int, optional) – Number of edge features (None if there are no edge features). (default: None)
concat (bool) – If True, then the heads’ outputs are concatenated along the feature dimension, and the dimension of each head’s output is embed_dim / num_heads. Note that the total number of features in output is embed_dim in both cases. (default: True)
dropout (float, optional) – The dropout rate. (default: 0)
root_weight (bool) – If True, then add a skip connection from the input with a linear transformation. (default: True)
bias (bool, optional) – If True, then add a bias vector in output. (default: True)
**kwargs – Keyword arguments for the super(MessagePassing) call.
- class GATConv(in_channels: Union[int, Tuple[int, int]], out_channels: int, heads: int = 1, concat: bool = True, dim: int = -2, negative_slope: float = 0.2, dropout: float = 0.0, add_self_loops: bool = True, edge_dim: Optional[int] = None, fill_value: Union[float, Tensor, str] = 'mean', bias: bool = True, **kwargs)[source]#
Extension of torch_geometric.nn.conv.GATConv for static graphs with multidimensional features.
The graph attentional operator from the “Graph Attention Networks” paper
\[\mathbf{x}^{\prime}_i = \alpha_{i,i}\mathbf{\Theta}\mathbf{x}_{i} + \sum_{j \in \mathcal{N}(i)} \alpha_{i,j}\mathbf{\Theta}\mathbf{x}_{j},\]
where the attention coefficients \(\alpha_{i,j}\) are computed as
\[\alpha_{i,j} = \frac{ \exp\left(\mathrm{LeakyReLU}\left(\mathbf{a}^{\top} [\mathbf{\Theta}\mathbf{x}_i \, \Vert \, \mathbf{\Theta}\mathbf{x}_j] \right)\right)} {\sum_{k \in \mathcal{N}(i) \cup \{ i \}} \exp\left(\mathrm{LeakyReLU}\left(\mathbf{a}^{\top} [\mathbf{\Theta}\mathbf{x}_i \, \Vert \, \mathbf{\Theta}\mathbf{x}_k] \right)\right)}.\]
If the graph has multi-dimensional edge features \(\mathbf{e}_{i,j}\), the attention coefficients \(\alpha_{i,j}\) are computed as
\[\alpha_{i,j} = \frac{ \exp\left(\mathrm{LeakyReLU}\left(\mathbf{a}^{\top} [\mathbf{\Theta}\mathbf{x}_i \, \Vert \, \mathbf{\Theta}\mathbf{x}_j \, \Vert \, \mathbf{\Theta}_{e} \mathbf{e}_{i,j}]\right)\right)} {\sum_{k \in \mathcal{N}(i) \cup \{ i \}} \exp\left(\mathrm{LeakyReLU}\left(\mathbf{a}^{\top} [\mathbf{\Theta}\mathbf{x}_i \, \Vert \, \mathbf{\Theta}\mathbf{x}_k \, \Vert \, \mathbf{\Theta}_{e} \mathbf{e}_{i,k}]\right)\right)}.\]
- Parameters:
in_channels (int or tuple) – Size of each input sample, or -1 to derive the size from the first input(s) to the forward method. A tuple corresponds to the sizes of source and target dimensionalities.
out_channels (int) – Size of each output sample.
heads (int, optional) – Number of multi-head-attentions. (default: 1)
concat (bool, optional) – If set to True, the output dimension of each attention head is out_channels/heads and all heads’ outputs are concatenated, resulting in out_channels number of features. If set to False, the multi-head attentions are averaged instead of concatenated. (default: True)
dim (int) – The axis along which to propagate. (default: -2)
negative_slope (float, optional) – LeakyReLU angle of the negative slope. (default: 0.2)
dropout (float, optional) – Dropout probability of the normalized attention coefficients which exposes each node to a stochastically sampled neighborhood during training. (default: 0)
add_self_loops (bool, optional) – If set to False, will not add self-loops to the input graph. (default: True)
edge_dim (int, optional) – Edge feature dimensionality (in case there are any). (default: None)
fill_value (float or Tensor or str, optional) – The way to generate edge features of self-loops (in case edge_dim != None). If given as float or torch.Tensor, edge features of self-loops will be directly given by fill_value. If given as str, edge features of self-loops are computed by aggregating all features of edges that point to the specific node, according to a reduce operation ("add", "mean", "min", "max", "mul"). (default: "mean")
bias (bool, optional) – If set to False, the layer will not learn an additive bias. (default: True)
**kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.
- Shapes:
- input – node features \((*, |\mathcal{V}|, *, F_{in})\) or \(((*, |\mathcal{V_s}|, *, F_s), (*, |\mathcal{V_t}|, *, F_t))\) if bipartite, edge indices \((2, |\mathcal{E}|)\), edge features \((|\mathcal{E}|, D)\) (optional)
- output – node features \((*, |\mathcal{V}|, *, F_{out})\) or \((*, |\mathcal{V}_t|, *, F_{out})\) if bipartite, attention_weights \(((2, |\mathcal{E}|), (|\mathcal{E}|, H))\) if need_weights is True else None
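To make the attention formula for the case without edge features concrete, here is a single-head NumPy sketch. It assumes that self-loops are already present in `edge_index`, that `edge_index[0]` holds source nodes j and `edge_index[1]` target nodes i, and that features are 2-dimensional `[N, F]`; the function name `gat_attention` is hypothetical and the code is not taken from tsl or PyTorch Geometric.

```python
import numpy as np

def gat_attention(x, edge_index, theta, a, negative_slope=0.2):
    """Single-head GAT sketch: returns aggregated features and edge coefficients."""
    h = x @ theta                                       # Theta x, shape [N, F_out]
    src, dst = edge_index
    e = np.concatenate([h[dst], h[src]], axis=-1) @ a   # a^T [Theta x_i || Theta x_j]
    e = np.where(e > 0, e, negative_slope * e)          # LeakyReLU
    e = np.exp(e - e.max())                             # shift for numerical stability
    denom = np.zeros(x.shape[0])
    np.add.at(denom, dst, e)                            # per-target normalization
    alpha = e / denom[dst]
    out = np.zeros_like(h)
    np.add.at(out, dst, alpha[:, None] * h[src])        # weighted sum over neighbors
    return out, alpha

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
theta = rng.normal(size=(4, 2))
a = rng.normal(size=(4,))                               # size 2 * F_out
edge_index = np.array([[0, 1, 2, 1, 2, 0],              # sources j (self-loops first)
                       [0, 1, 2, 0, 0, 1]])             # targets i
out, alpha = gat_attention(x, edge_index, theta, a)
```

The coefficients of the edges entering each node sum to one, so a node whose only incoming edge is its self-loop (node 2 here) simply keeps its projected features.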
- class GatedGraphNetwork(input_size: int, output_size: int, activation: str = 'silu', parametrized_skip_conn: bool = False)[source]#
Gated Graph Neural Network layer (with residual connections) inspired by the FC-GNN model from the paper “Multivariate Time Series Forecasting with Latent Graph Inference” (Satorras et al., 2022).
- class AdaptiveGraphConv(input_size: int, emb_size: int, output_size: int, num_nodes: int, bias: bool = True)[source]#
The Dense Adaptive Graph Convolution operator from the paper “Adaptive Graph Convolutional Recurrent Network for Traffic Forecasting” (Bai et al., NeurIPS 2020).
- Parameters:
input_size – Size of the input.
emb_size – Size of the input node embeddings.
output_size – Output size.
num_nodes – Number of nodes in the input graph.
bias – Whether to add a learnable bias.
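The cited paper describes the adaptive adjacency as a row-wise softmax over the ReLU of the embedding similarity matrix. The NumPy sketch below illustrates only that step, under the assumption that `emb` has shape `[num_nodes, emb_size]`; the function name `adaptive_adjacency` is hypothetical and the layer's actual implementation may differ.

```python
import numpy as np

def adaptive_adjacency(emb):
    """Row-wise softmax(ReLU(E E^T)) computed from node embeddings E."""
    scores = np.maximum(emb @ emb.T, 0.0)                 # ReLU(E E^T)
    scores = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
emb = rng.normal(size=(5, 8))      # num_nodes=5, emb_size=8
adj = adaptive_adjacency(emb)
```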
Recurrent Layers#
The subpackage tsl.nn.layers.recurrent
contains the cells used in
encoders that process the input sequence in a recurrent fashion.
Base classes#
Base class for implementing recurrent neural networks (RNN) cells. |
|
Base class for implementing gated recurrent unit (GRU) cells. |
|
Base class for implementing long short-term memory (LSTM) cells. |
|
Base class for implementing graph-based gated recurrent unit (GRU) cells. |
|
Base class for implementing graph-based long short-term memory (LSTM) cells. |
Implemented cells#
GRUCell | A gated recurrent unit (GRU) cell. |
LSTMCell | A long short-term memory (LSTM) cell. |
GraphConvGRUCell | Gated Recurrent Unit with GraphConv as graph convolution in the gates. |
GraphConvLSTMCell | LSTM with GraphConv as graph convolution in the gates. |
DCRNNCell | The Diffusion Convolutional Recurrent cell from the paper "Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting" (Li et al., ICLR 2018). |
DenseDCRNNCell | Dense implementation of the Diffusion Convolutional Recurrent cell from the paper "Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting" (Li et al., ICLR 2018). |
AGCRNCell | The Adaptive Graph Convolutional cell from the paper "Adaptive Graph Convolutional Recurrent Network for Traffic Forecasting" (Bai et al., NeurIPS 2020). |
EvolveGCNOCell | EvolveGCNO cell from the paper "EvolveGCN: Evolving Graph Convolutional Networks for Dynamic Graphs" (Pareja et al., AAAI 2020). |
EvolveGCNHCell | EvolveGCNH cell from the paper "EvolveGCN: Evolving Graph Convolutional Networks for Dynamic Graphs" (Pareja et al., AAAI 2020). |
GRINCell | The Graph Recurrent Imputation cell with Diffusion Convolution from the paper "Filling the G_ap_s: Multivariate Time Series Imputation by Graph Neural Networks" (Cini et al., ICLR 2022). |
- class GRUCellBase(hidden_size: int, forget_gate: Module, update_gate: Module, candidate_gate: Module)[source]#
Base class for implementing gated recurrent unit (GRU) cells.
- class LSTMCellBase(hidden_size: int, input_gate: Module, forget_gate: Module, cell_gate: Module, output_gate: Module)[source]#
Base class for implementing long short-term memory (LSTM) cells.
- class GraphGRUCellBase(hidden_size: int, forget_gate: Module, update_gate: Module, candidate_gate: Module)[source]#
Base class for implementing graph-based gated recurrent unit (GRU) cells.
- class GraphLSTMCellBase(hidden_size: int, input_gate: Module, forget_gate: Module, cell_gate: Module, output_gate: Module)[source]#
Base class for implementing graph-based long short-term memory (LSTM) cells.
- class GRUCell(input_size: int, hidden_size: int, bias: bool = True, device=None, dtype=None)[source]#
A gated recurrent unit (GRU) cell
\[\begin{split}\begin{array}{ll} r = \sigma(W_{ir} x + b_{ir} + W_{hr} h + b_{hr}) \\ z = \sigma(W_{iz} x + b_{iz} + W_{hz} h + b_{hz}) \\ n = \tanh(W_{in} x + b_{in} + r * (W_{hn} h + b_{hn})) \\ h' = (1 - z) * n + z * h \end{array}\end{split}\]where \(\sigma\) is the sigmoid function, and \(*\) is the Hadamard product.
- Parameters:
input_size – The number of expected features in the input x
hidden_size – The number of features in the hidden state h
bias – If False, then the layer does not use bias weights b_ih and b_hh. Default: True
- Inputs: input, hidden
input : tensor containing input features
hidden : tensor containing the initial hidden state for each element in the batch. Defaults to zero if not provided.
- Outputs: h’
h’ : tensor containing the next hidden state for each element in the batch
- Shape:
- input – \((N, H_{in})\) or \((H_{in})\) tensor containing input features where \(H_{in}\) = input_size.
- hidden – \((N, H_{out})\) or \((H_{out})\) tensor containing the initial hidden state where \(H_{out}\) = hidden_size. Defaults to zero if not provided.
- output – \((N, H_{out})\) or \((H_{out})\) tensor containing the next hidden state.
- weight_ih#
the learnable input-hidden weights, of shape (3*hidden_size, input_size)
- weight_hh#
the learnable hidden-hidden weights, of shape (3*hidden_size, hidden_size)
- bias_ih#
the learnable input-hidden bias, of shape (3*hidden_size)
- bias_hh#
the learnable hidden-hidden bias, of shape (3*hidden_size)
Note
All the weights and biases are initialized from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{1}{\text{hidden\_size}}\)
On certain ROCm devices, when using float16 inputs this module will use different precision for backward.
Examples:
>>> import torch
>>> from torch import nn
>>> rnn = nn.GRUCell(10, 20)
>>> input = torch.randn(6, 3, 10)
>>> hx = torch.randn(3, 20)
>>> output = []
>>> for i in range(6):
...     hx = rnn(input[i], hx)
...     output.append(hx)
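For readers who want to trace the GRU update equations by hand, the following NumPy sketch performs a single step. It assumes the stacked weight layout described above, with the (r, z, n) blocks concatenated along the first axis of `weight_ih` and `weight_hh`; the function name `gru_step` is hypothetical and the code is an illustration of the equations, not the library implementation.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def gru_step(x, h, w_ih, w_hh, b_ih, b_hh):
    """One GRU step; x: [N, H_in], h: [N, H_out], weights stacked as (r, z, n)."""
    gi = x @ w_ih.T + b_ih                    # input contributions, [N, 3 * H_out]
    gh = h @ w_hh.T + b_hh                    # hidden contributions, [N, 3 * H_out]
    i_r, i_z, i_n = np.split(gi, 3, axis=-1)
    h_r, h_z, h_n = np.split(gh, 3, axis=-1)
    r = sigmoid(i_r + h_r)                    # reset gate
    z = sigmoid(i_z + h_z)                    # update gate
    n = np.tanh(i_n + r * h_n)                # candidate state
    return (1.0 - z) * n + z * h

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 10))
h = rng.normal(size=(3, 20))
w_ih = rng.normal(size=(60, 10))
w_hh = rng.normal(size=(60, 20))
h_next = gru_step(x, h, w_ih, w_hh, np.zeros(60), np.zeros(60))
```

With all weights and biases set to zero, both gates equal 0.5 and the candidate state is zero, so the step halves the hidden state; this makes a convenient sanity check.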
- class LSTMCell(input_size: int, hidden_size: int, bias: bool = True, device=None, dtype=None)[source]#
A long short-term memory (LSTM) cell.
\[\begin{split}\begin{array}{ll} i = \sigma(W_{ii} x + b_{ii} + W_{hi} h + b_{hi}) \\ f = \sigma(W_{if} x + b_{if} + W_{hf} h + b_{hf}) \\ g = \tanh(W_{ig} x + b_{ig} + W_{hg} h + b_{hg}) \\ o = \sigma(W_{io} x + b_{io} + W_{ho} h + b_{ho}) \\ c' = f * c + i * g \\ h' = o * \tanh(c') \\ \end{array}\end{split}\]where \(\sigma\) is the sigmoid function, and \(*\) is the Hadamard product.
- Parameters:
input_size – The number of expected features in the input x
hidden_size – The number of features in the hidden state h
bias – If False, then the layer does not use bias weights b_ih and b_hh. Default: True
- Inputs: input, (h_0, c_0)
input of shape (batch, input_size) or (input_size): tensor containing input features
h_0 of shape (batch, hidden_size) or (hidden_size): tensor containing the initial hidden state
c_0 of shape (batch, hidden_size) or (hidden_size): tensor containing the initial cell state
If (h_0, c_0) is not provided, both h_0 and c_0 default to zero.
- Outputs: (h_1, c_1)
h_1 of shape (batch, hidden_size) or (hidden_size): tensor containing the next hidden state
c_1 of shape (batch, hidden_size) or (hidden_size): tensor containing the next cell state
- weight_ih#
the learnable input-hidden weights, of shape (4*hidden_size, input_size)
- weight_hh#
the learnable hidden-hidden weights, of shape (4*hidden_size, hidden_size)
- bias_ih#
the learnable input-hidden bias, of shape (4*hidden_size)
- bias_hh#
the learnable hidden-hidden bias, of shape (4*hidden_size)
Note
All the weights and biases are initialized from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{1}{\text{hidden\_size}}\)
On certain ROCm devices, when using float16 inputs this module will use different precision for backward.
Examples:
>>> import torch
>>> from torch import nn
>>> rnn = nn.LSTMCell(10, 20)  # (input_size, hidden_size)
>>> input = torch.randn(2, 3, 10)  # (time_steps, batch, input_size)
>>> hx = torch.randn(3, 20)  # (batch, hidden_size)
>>> cx = torch.randn(3, 20)
>>> output = []
>>> for i in range(input.size()[0]):
...     hx, cx = rnn(input[i], (hx, cx))
...     output.append(hx)
>>> output = torch.stack(output, dim=0)
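The LSTM update equations can be traced in the same way. This NumPy sketch assumes the four gate blocks are stacked in the order (i, f, g, o) along the first axis of the weights, matching the shapes listed above; `lstm_step` is a hypothetical name used only for illustration.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def lstm_step(x, h, c, w_ih, w_hh, b_ih, b_hh):
    """One LSTM step; returns the next hidden and cell states."""
    gates = x @ w_ih.T + b_ih + h @ w_hh.T + b_hh   # [N, 4 * H]
    i, f, g, o = np.split(gates, 4, axis=-1)        # assumed order: i, f, g, o
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    g = np.tanh(g)
    c_next = f * c + i * g                          # c' = f * c + i * g
    h_next = o * np.tanh(c_next)                    # h' = o * tanh(c')
    return h_next, c_next

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 10))
h = rng.normal(size=(3, 20))
c = rng.normal(size=(3, 20))
w_ih = rng.normal(size=(80, 10))
w_hh = rng.normal(size=(80, 20))
h_next, c_next = lstm_step(x, h, c, w_ih, w_hh, np.zeros(80), np.zeros(80))
```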
- class GraphConvGRUCell(input_size: int, hidden_size: int, bias: bool = True, norm: str = 'mean', root_weight: bool = True, cached: bool = False, **kwargs)[source]#
Gated Recurrent Unit with GraphConv as graph convolution in the gates, based on the paper “Structured Sequence Modeling with Graph Convolutional Recurrent Networks” (Seo et al., ICONIP 2017).
- Parameters:
input_size (int) – Size of the input.
hidden_size (int) – Number of units in the hidden state.
bias (bool) – If True, then the layer will learn an additive bias for each gate. (default: True)
norm (str) – Normalization used by the graph convolutional layer. (default: 'mean')
root_weight (bool) – If True, then add a filter (with different weights) for the root node itself. (default: True)
cached (bool) – If True, then cache the normalized edge weights computed in the first call. (default: False)
**kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.
- class GraphConvLSTMCell(input_size: int, hidden_size: int, bias: bool = True, norm: str = 'mean', root_weight: bool = True, cached: bool = False, **kwargs)[source]#
LSTM with GraphConv as graph convolution in the gates, based on the paper “Structured Sequence Modeling with Graph Convolutional Recurrent Networks” (Seo et al., ICONIP 2017).
- Parameters:
input_size (int) – Size of the input.
hidden_size (int) – Number of units in the hidden state.
bias (bool) – If True, then the layer will learn an additive bias for each gate. (default: True)
norm (str) – Normalization used by the graph convolutional layer. (default: 'mean')
root_weight (bool) – If True, then add a filter (with different weights) for the root node itself. (default: True)
cached (bool) – If True, then cache the normalized edge weights computed in the first call. (default: False)
**kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.
- class DCRNNCell(input_size: int, hidden_size: int, k: int = 2, root_weight: bool = True, add_backward: bool = True, bias: bool = True)[source]#
The Diffusion Convolutional Recurrent cell from the paper “Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting” (Li et al., ICLR 2018).
- Parameters:
input_size – Size of the input.
hidden_size – Number of units in the hidden state.
k – Size of the diffusion kernel.
root_weight – Whether to learn a separate transformation for the central node.
- class DenseDCRNNCell(input_size: int, hidden_size: int, k: int = 2, root_weight: bool = False)[source]#
Dense implementation of the Diffusion Convolutional Recurrent cell from the paper “Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting” (Li et al., ICLR 2018).
In this implementation, the adjacency matrix is dense and the convolution is performed with matrix multiplication.
- Parameters:
input_size – Size of the input.
hidden_size – Number of units in the hidden state.
k – Size of the diffusion kernel.
root_weight (bool) – Whether to learn a separate transformation for the central node.
- class AGCRNCell(input_size: int, emb_size: int, hidden_size: int, num_nodes: int, bias: bool = True)[source]#
The Adaptive Graph Convolutional cell from the paper “Adaptive Graph Convolutional Recurrent Network for Traffic Forecasting” (Bai et al., NeurIPS 2020).
- Parameters:
input_size – Size of the input.
emb_size – Size of the input node embeddings.
hidden_size – Output size.
num_nodes – Number of nodes in the input graph.
- class EvolveGCNOCell(in_size, out_size, norm, activation='relu', root_weight=False, bias=True, cached=False)[source]#
EvolveGCNO cell from the paper “EvolveGCN: Evolving Graph Convolutional Networks for Dynamic Graphs” (Pareja et al., AAAI 2020).
This variant of the model simply updates the weights of the graph convolution.
- Parameters:
in_size (int) – Size of the input.
out_size (int) – Number of units in the hidden state.
norm (str) – Method used to normalize the adjacency matrix.
activation (str) – Activation function after the GCN layer.
root_weight (bool) – Whether to add a parametrized skip connection.
bias (bool) – Whether to learn a bias.
cached (bool) – Whether to cache normalized edge_weights.
- class EvolveGCNHCell(in_size, out_size, norm, activation='relu', root_weight=False, bias=True, cached=False)[source]#
EvolveGCNH cell from the paper “EvolveGCN: Evolving Graph Convolutional Networks for Dynamic Graphs” (Pareja et al., AAAI 2020).
This variant of the model adapts the weights of the graph convolution by looking at node features.
- Parameters:
in_size (int) – Size of the input.
out_size (int) – Number of units in the hidden state.
norm (str) – Method used to normalize the adjacency matrix.
activation (str) – Activation function after the GCN layer.
root_weight (bool) – Whether to add a parametrized skip connection.
bias (bool) – Whether to learn a bias.
cached (bool) – Whether to cache normalized edge_weights.
- class GRINCell(input_size: int, hidden_size: int, exog_size: int = 0, n_layers: int = 1, n_nodes: Optional[int] = None, kernel_size: int = 2, decoder_order: int = 1, layer_norm: bool = False, dropout: float = 0.0)[source]#
The Graph Recurrent Imputation cell with Diffusion Convolution from the paper “Filling the G_ap_s: Multivariate Time Series Imputation by Graph Neural Networks” (Cini et al., ICLR 2022).
- Parameters:
input_size (int) – Size of the input.
hidden_size (int) – Number of units in the DCRNN hidden layer. (default: 64)
exog_size (int) – Number of channels in the exogenous variables, if any. (default: 0)
n_layers (int) – Number of stacked DCRNN cells. (default: 1)
n_nodes (int, optional) – Number of nodes in the input graph. (default: None)
kernel_size (int) – Order of the spatial diffusion process in the DCRNN cells. (default: 2)
decoder_order (int) – Order of the spatial diffusion process in the spatial decoder. (default: 1)
layer_norm (bool, optional) – If True, then use layer normalization. (default: False)
dropout (float, optional) – Dropout probability in the DCRNN cells. (default: 0)
Multi Layers#
The subpackage tsl.nn.layers.multi
contains the layers that perform an
operation using a different set of parameters for the different instances
stacked in a dimension of the input data (e.g., the node dimension). They can be
used to process each node (or time step) with independent parameters, breaking
the permutation equivariance of the original operation.
MultiLinear | Applies linear transformations with different weights to the different instances in the input data. |
MultiDense | Applies linear transformations with different weights to the different instances in the input data with a final nonlinear activation. |
MultiConv1d | Applies convolutions with different weights to the different instances in the input data. |
MultiGRUCell | Multiple parallel gated recurrent unit (GRU) cells. |
MultiLSTMCell | Multiple parallel long short-term memory (LSTM) cells. |
- class MultiLinear(in_channels: int, out_channels: int, n_instances: int, *, ndim: Optional[int] = None, pattern: Optional[str] = None, instance_dim: Union[int, str] = -2, channel_dim: Union[int, str] = -1, bias: bool = True, device=None, dtype=None)[source]#
Applies linear transformations with different weights to the different instances in the input data.
\[\mathbf{X}^{\prime} = [\boldsymbol{\Theta}_i \mathbf{x}_i + \mathbf{b}_i]_{i=0,\ldots,N}\]
- Parameters:
in_channels (int) – Size of instance input sample.
out_channels (int) – Size of instance output sample.
n_instances (int) – The number \(N\) of parallel linear operations. Each operation has different weights and biases.
instance_dim (int or str) – Dimension of the instances (must match n_instances at runtime). (default: -2)
channel_dim (int or str) – Dimension of the input channels. (default: -1)
bias (bool) – If True, then the layer will learn an additive bias for each instance. (default: True)
device (optional) – The device of the parameters. (default: None)
dtype (optional) – The data type of the parameters. (default: None)
Examples
>>> import torch
>>> from tsl.nn.layers.multi import MultiLinear
>>> m = MultiLinear(20, 32, 10, pattern='t n f', instance_dim='n')
>>> input = torch.randn(64, 12, 10, 20)  # shape: [b t n f]
>>> output = m(input)
>>> print(output.size())
torch.Size([64, 12, 10, 32])
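The per-instance transformation can be written as a single `einsum` over a weight tensor that carries one matrix per instance. The sketch below is a NumPy illustration under assumed shapes (`weight` is `[n, f_in, f_out]`, `bias` is `[n, f_out]`, input is `[b, t, n, f_in]`); `multi_linear` is a hypothetical helper, not the layer's implementation.

```python
import numpy as np

def multi_linear(x, weight, bias):
    """Apply a different linear map to each instance along the n dimension."""
    # x: [b, t, n, f_in], weight: [n, f_in, f_out], bias: [n, f_out]
    return np.einsum('btnf,nfo->btno', x, weight) + bias

rng = np.random.default_rng(0)
b, t, n, f_in, f_out = 4, 12, 10, 20, 32
x = rng.normal(size=(b, t, n, f_in))
weight = rng.normal(size=(n, f_in, f_out))   # one matrix per instance
bias = rng.normal(size=(n, f_out))
out = multi_linear(x, weight, bias)
```

Because instance 3 is always multiplied by `weight[3]`, permuting the instances of the input without permuting the weights changes the result, which is why such layers are not permutation equivariant.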
- class MultiDense(in_channels: int, out_channels: int, n_instances: int, activation: str = 'relu', dropout: float = 0.0, *, ndim: Optional[int] = None, pattern: Optional[str] = None, instance_dim: int = -2, channel_dim: int = -1, bias: bool = True, device=None, dtype=None)[source]#
Applies linear transformations with different weights to the different instances in the input data with a final nonlinear activation.
\[\mathbf{X}^{\prime} = \left[\sigma\left(\boldsymbol{\Theta}_i \mathbf{x}_i + \mathbf{b}_i \right)\right]_{i=0,\ldots,N}\]
- Parameters:
in_channels (int) – Size of instance input sample.
out_channels (int) – Size of instance output sample.
n_instances (int) – The number \(N\) of parallel linear operations. Each operation has different weights and biases.
activation (str, optional) – Activation function to be used. (default: 'relu')
dropout (float, optional) – Dropout rate. (default: 0)
instance_dim (int or str) – Dimension of the instances (must match n_instances at runtime). (default: -2)
channel_dim (int or str) – Dimension of the input channels. (default: -1)
bias (bool) – If True, then the layer will learn an additive bias for each instance. (default: True)
device (optional) – The device of the parameters. (default: None)
dtype (optional) – The data type of the parameters. (default: None)
- class MultiConv1d(in_channels: int, out_channels: int, n_instances: int, kernel_size: int, stride: int = 1, padding: Union[str, int] = 0, dilation: int = 1, bias: bool = True, device=None, dtype=None)[source]#
Applies convolutions with different weights to the different instances in the input data.
- class MultiGRUCell(input_size: int, hidden_size: int, n_instances: int, bias: bool = True, device=None, dtype=None)[source]#
Multiple parallel gated recurrent unit (GRU) cells.
\[\begin{split}\begin{array}{ll} r = \sigma(W_{ir} x + b_{ir} + W_{hr} h + b_{hr}) \\ z = \sigma(W_{iz} x + b_{iz} + W_{hz} h + b_{hz}) \\ n = \tanh(W_{in} x + b_{in} + r * (W_{hn} h + b_{hn})) \\ h' = (1 - z) * n + z * h \end{array}\end{split}\]where \(\sigma\) is the sigmoid function, and \(*\) is the Hadamard product.
- Parameters:
input_size (int) – The number of features in the instance input sample.
hidden_size (int) – The number of features in the instance hidden state.
n_instances (int) – The number of parallel GRU cells. Each cell has different weights.
bias (bool) – If True, then the layer will learn an additive bias for each instance gate. (default: True)
device (optional) – The device of the parameters. (default: None)
dtype (optional) – The data type of the parameters. (default: None)
Examples:
>>> import torch
>>> from tsl.nn.layers.multi import MultiGRUCell
>>> rnn = MultiGRUCell(20, 32, 10)
>>> input = torch.randn(64, 12, 10, 20)
>>> h = None
>>> output = []
>>> for i in range(12):
...     h = rnn(input[:, i], h)
...     output.append(h)
>>> output = torch.stack(output, dim=1)
>>> print(output.size())
torch.Size([64, 12, 10, 32])
- class MultiLSTMCell(input_size: int, hidden_size: int, n_instances: int, bias: bool = True, device=None, dtype=None)[source]#
Multiple parallel long short-term memory (LSTM) cells.
\[\begin{split}\begin{array}{ll} i = \sigma(W_{ii} x + b_{ii} + W_{hi} h + b_{hi}) \\ f = \sigma(W_{if} x + b_{if} + W_{hf} h + b_{hf}) \\ g = \tanh(W_{ig} x + b_{ig} + W_{hg} h + b_{hg}) \\ o = \sigma(W_{io} x + b_{io} + W_{ho} h + b_{ho}) \\ c' = f * c + i * g \\ h' = o * \tanh(c') \\ \end{array}\end{split}\]where \(\sigma\) is the sigmoid function, and \(*\) is the Hadamard product.
- Parameters:
input_size (int) – The number of features in the instance input sample.
hidden_size (int) – The number of features in the instance hidden state.
n_instances (int) – The number of parallel LSTM cells. Each cell has different weights.
bias (bool) – If True, then the layer will learn an additive bias for each instance gate. (default: True)
device (optional) – The device of the parameters. (default: None)
dtype (optional) – The data type of the parameters. (default: None)
Examples:
>>> import torch
>>> from tsl.nn.layers.multi import MultiLSTMCell
>>> rnn = MultiLSTMCell(20, 32, 10)
>>> input = torch.randn(64, 12, 10, 20)
>>> h = None
>>> output = []
>>> for i in range(12):
...     h = rnn(input[:, i], h)  # h = h, c
...     output.append(h[0])  # i-th output is h_i
>>> output = torch.stack(output, dim=1)
>>> print(output.size())
torch.Size([64, 12, 10, 32])
Normalization Layers#
The subpackage tsl.nn.layers.norm
contains the normalization layers.
Applies a normalization of the specified type. |
|
Applies layer normalization. |
|
Applies graph-wise instance normalization. |
|
Applies graph-wise batch normalization. |
- class Norm(norm_type, in_channels, **kwargs)[source]#
Applies a normalization of the specified type.
- Parameters:
in_channels (int) – Size of each input sample.
- class InstanceNorm(in_channels, eps=1e-05, affine=True)[source]#
Applies graph-wise instance normalization.
- class BatchNorm(in_channels, eps: float = 1e-05, momentum: float = 0.1, affine: bool = True, track_running_stats: bool = True)[source]#
Applies graph-wise batch normalization.
- Parameters:
in_channels (int) – Size of each input sample.
eps (float, optional) – A value added to the denominator for numerical stability. (default: 1e-5)
affine (bool, optional) – If set to True, this module has learnable affine parameters \(\gamma\) and \(\beta\). (default: True)
track_running_stats (bool, optional) – Whether to track stats to perform batch norm. (default: True)
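The parameters above mirror those of `torch.nn.BatchNorm1d`. As a rough sketch only, the snippet below assumes that statistics are computed per channel over the batch, step, and node dimensions; this is one plausible reading, and the exact dimensions tsl normalizes over are not stated in this section.

```python
import torch
from torch import nn

in_channels = 16
bn = nn.BatchNorm1d(in_channels, eps=1e-5, momentum=0.1,
                    affine=True, track_running_stats=True)

# Assumed layout: [batch, steps, nodes, channels]
x = torch.randn(8, 12, 10, in_channels) * 3.0 + 5.0

bn.train()
# Flatten every leading dimension so that each channel is normalized
# with statistics taken over batch, steps and nodes together.
out = bn(x.reshape(-1, in_channels)).reshape(x.shape)
```

In training mode the output has roughly zero mean and unit variance per channel, and `bn.running_mean` is updated for later use in evaluation mode.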
Base Layers#
The subpackage tsl.nn.layers.base
contains basic layers used at the core
of other layers.
Dense | A simple fully-connected layer implementing \(\mathbf{x}^{\prime} = \sigma\left(\boldsymbol{\Theta}\mathbf{x} + \mathbf{b}\right)\).
TemporalConv | Learns a standard temporal convolutional filter.
GatedTemporalConv | Temporal convolutional filter with gated tanh connection.
NodeEmbedding | Creates a table of node embeddings with the specified size.
PositionalEncoding | The positional encoding from the paper "Attention Is All You Need" (Vaswani et al., NeurIPS 2017).
MultiHeadAttention | The multi-head attention from the paper "Attention Is All You Need" (Vaswani et al., NeurIPS 2017) for spatiotemporal data.
TemporalSelfAttention | Temporal Self Attention layer.
SpatialSelfAttention | Spatial Self Attention layer.
- class Dense(input_size: int, output_size: int, activation: str = 'relu', dropout: float = 0.0, bias: bool = True)[source]#
A simple fully-connected layer implementing
\[\mathbf{x}^{\prime} = \sigma\left(\boldsymbol{\Theta}\mathbf{x} + \mathbf{b}\right)\]
where \(\mathbf{x} \in \mathbb{R}^{d_{in}}\) and \(\mathbf{x}^{\prime} \in \mathbb{R}^{d_{out}}\) are the input and output features, respectively, \(\boldsymbol{\Theta} \in \mathbb{R}^{d_{out} \times d_{in}}\) and \(\mathbf{b} \in \mathbb{R}^{d_{out}}\) are trainable parameters, and \(\sigma\) is an activation function.
- Parameters:
input_size (int) – Number of input features.
output_size (int) – Number of output features.
activation (str, optional) – Activation function to be used. (default: 'relu')
dropout (float, optional) – The dropout rate. (default: 0)
bias (bool, optional) – If True, then the bias vector is used. (default: True)
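The formula and parameters above correspond to a linear map followed by an activation. A minimal equivalent built from standard PyTorch modules might look as follows; placing dropout after the activation is an assumption of this sketch, not something stated in this section.

```python
import torch
from torch import nn

# Sketch of x' = sigma(Theta x + b) with the default 'relu' activation.
dense_sketch = nn.Sequential(
    nn.Linear(20, 32, bias=True),  # Theta and b
    nn.ReLU(),                     # sigma
    nn.Dropout(p=0.0),             # dropout rate (position assumed)
)

x = torch.randn(64, 12, 10, 20)    # features on the last dimension
out = dense_sketch(x)              # linear layers act on the last dimension
```

Because `nn.Linear` operates on the last dimension, the same weights are shared across batch, time steps, and nodes.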
- class TemporalConv(input_channels, output_channels, kernel_size, dilation=1, stride=1, bias=True, padding=0, causal_pad=True, weight_norm=False, channel_last=False)[source]#
Learns a standard temporal convolutional filter.
- Parameters:
input_channels (int) – Input size.
output_channels (int) – Output size.
kernel_size (int) – Size of the convolution kernel.
dilation (int, optional) – Spacing between kernel elements.
stride (int, optional) – Stride of the convolution.
bias (bool, optional) – Whether to add a learnable bias to the output of the convolution.
padding (int or tuple, optional) – Padding of the input. Used only if causal_pad is False.
causal_pad (bool, optional) – Whether to pad the input so as to preserve causality.
weight_norm (bool, optional) – Whether to apply weight normalization to the parameters of the filter.
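Causal padding means padding only on the left of the time axis, so that the output at step t never depends on later steps. The sketch below illustrates the idea with `torch.nn.Conv1d` on a `[batch, channels, steps]` tensor; the layout and the use of `Conv1d` are assumptions of the example and may differ from how tsl implements the layer.

```python
import torch
import torch.nn.functional as F
from torch import nn

kernel_size, dilation = 3, 2
conv = nn.Conv1d(in_channels=4, out_channels=8,
                 kernel_size=kernel_size, dilation=dilation)

x = torch.randn(2, 4, 12)               # [batch, channels, steps]
pad = (kernel_size - 1) * dilation      # receptive field minus one
out = conv(F.pad(x, (pad, 0)))          # pad on the left only

# Causality check: altering steps >= 6 must leave outputs before step 6 unchanged.
x_future = x.clone()
x_future[..., 6:] = 0.0
out_future = conv(F.pad(x_future, (pad, 0)))
```

With left padding of `(kernel_size - 1) * dilation`, the output keeps the same number of steps as the input.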
- class GatedTemporalConv(input_channels, output_channels, kernel_size, dilation=1, stride=1, bias=True, padding=0, causal_pad=True, weight_norm=False, channel_last=False)[source]#
Temporal convolutional filter with gated tanh connection.
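A gated tanh connection is commonly computed as `tanh(filter) * sigmoid(gate)`, with both branches produced by one convolution with twice the output channels. The following sketch assumes that formulation and a `[batch, channels, steps]` layout; neither detail is confirmed by this section.

```python
import torch
import torch.nn.functional as F
from torch import nn

out_channels, kernel_size = 8, 3
# One convolution yields both the filter and the gate branch.
conv = nn.Conv1d(4, 2 * out_channels, kernel_size)

x = torch.randn(2, 4, 12)                    # [batch, channels, steps]
x = F.pad(x, (kernel_size - 1, 0))           # causal (left) padding
filt, gate = conv(x).chunk(2, dim=1)         # split along the channel axis
out = torch.tanh(filt) * torch.sigmoid(gate)
```

The sigmoid branch acts as a soft switch in (0, 1) on the tanh branch, so every output value lies in [-1, 1].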
- class NodeEmbedding(n_nodes: int, emb_size: int, initializer: Union[str, Tensor] = 'uniform', requires_grad: bool = True)[source]#
Creates a table of node embeddings with the specified size.
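Conceptually, a node embedding table is a learnable matrix with one row per node. The class `NodeEmbeddingSketch` below is a hypothetical stand-in written for illustration; the initialization range and the `forward` signature are assumptions and may not match tsl.

```python
import math

import torch
from torch import nn


class NodeEmbeddingSketch(nn.Module):
    """Learnable [n_nodes, emb_size] table (illustrative only)."""

    def __init__(self, n_nodes, emb_size, requires_grad=True):
        super().__init__()
        bound = 1.0 / math.sqrt(emb_size)  # assumed uniform init range
        table = torch.empty(n_nodes, emb_size).uniform_(-bound, bound)
        self.emb = nn.Parameter(table, requires_grad=requires_grad)

    def forward(self):
        return self.emb


emb = NodeEmbeddingSketch(n_nodes=10, emb_size=16)
x = torch.randn(64, 12, 10, 4)                 # [batch, steps, nodes, channels]
e = emb().expand(64, 12, -1, -1)               # repeat the table over batch and steps
x_aug = torch.cat([x, e], dim=-1)              # node-specific features appended
```

A typical use is to append the embedding of each node to its input features, which gives shared layers a way to tell nodes apart.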
- class PositionalEncoding(d_model: int, dropout: float = 0.0, max_len: int = 5000, affinity: bool = False, batch_first=True)[source]#
The positional encoding from the paper “Attention Is All You Need” (Vaswani et al., NeurIPS 2017).
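The sinusoidal encoding described in the cited paper can be sketched as below. The function `sinusoidal_encoding` is a hypothetical helper that assumes an even `d_model`; it does not cover the `affinity` or `batch_first` options of the class.

```python
import math

import torch


def sinusoidal_encoding(max_len, d_model):
    """Return a [max_len, d_model] table of sine/cosine encodings."""
    position = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)
    # Frequencies decay geometrically from 1 to 1/10000 across channel pairs.
    div_term = torch.exp(
        torch.arange(0, d_model, 2, dtype=torch.float32)
        * (-math.log(10000.0) / d_model))
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)  # even channels
    pe[:, 1::2] = torch.cos(position * div_term)  # odd channels
    return pe


pe = sinusoidal_encoding(max_len=5000, d_model=32)
x = torch.randn(64, 24, 32)          # [batch, steps, d_model]
x_enc = x + pe[:x.size(1)]           # add the first 24 positions
```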
- class MultiHeadAttention(embed_dim, heads, qdim: Optional[int] = None, kdim: Optional[int] = None, vdim: Optional[int] = None, axis='steps', dropout=0.0, bias=True, add_bias_kv=False, add_zero_attn=False, device=None, dtype=None, causal=False)[source]#
The multi-head attention from the paper “Attention Is All You Need” (Vaswani et al., NeurIPS 2017) for spatiotemporal data.
- class TemporalSelfAttention(embed_dim, num_heads, in_channels=None, dropout=0.0, bias=True, device=None, dtype=None)[source]#
Temporal Self Attention layer.
- Parameters:
embed_dim (int) – Size of the hidden dimension associated with each node at each time step.
num_heads (int) – Number of parallel attention heads.
dropout (float) – Dropout probability.
bias (bool, optional) – Whether to add a learnable bias.
device (optional) – Device on which to store the model.
dtype (optional) – Data type of the parameters.
Examples:
>>> import torch
>>> m = TemporalSelfAttention(32, 4, -1)
>>> input = torch.randn(128, 24, 10, 20)
>>> output, _ = m(input)
>>> print(output.size())
torch.Size([128, 24, 10, 32])
- class SpatialSelfAttention(embed_dim, num_heads, in_channels=None, dropout=0.0, bias=True, device=None, dtype=None)[source]#
Spatial Self Attention layer.
- Parameters:
embed_dim (int) – Size of the hidden dimension associated with each node at each time step.
num_heads (int) – Number of parallel attention heads.
dropout (float) – Dropout probability.
bias (bool, optional) – Whether to add a learnable bias.
device (optional) – Device on which to store the model.
dtype (optional) – Data type of the parameters.
Examples:
>>> import torch
>>> m = SpatialSelfAttention(32, 4, -1)
>>> input = torch.randn(128, 24, 10, 20)
>>> output, _ = m(input)
>>> print(output.size())
torch.Size([128, 24, 10, 32])
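The difference between temporal and spatial self-attention is the axis along which tokens attend to each other. The sketch below illustrates this with `torch.nn.MultiheadAttention` by folding the other axis into the batch; the helper `self_attention_along` and the input projection are hypothetical and do not reproduce the internals of the tsl layers.

```python
import torch
from torch import nn


def self_attention_along(x, attn, axis):
    """Self-attention over one axis of x: [batch, steps, nodes, channels].

    axis='steps' attends across time separately for each node;
    axis='nodes' attends across nodes separately for each time step.
    """
    b, s, n, c = x.shape
    if axis == 'steps':
        seq = x.permute(0, 2, 1, 3).reshape(b * n, s, c)  # nodes folded into batch
        out, _ = attn(seq, seq, seq)
        return out.reshape(b, n, s, c).permute(0, 2, 1, 3)
    if axis == 'nodes':
        seq = x.reshape(b * s, n, c)                      # steps folded into batch
        out, _ = attn(seq, seq, seq)
        return out.reshape(b, s, n, c)
    raise ValueError(f"Unknown axis: {axis}")


embed_dim, num_heads = 32, 4
proj = nn.Linear(20, embed_dim)      # maps 20 input channels to embed_dim (assumed)
attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

x = torch.randn(8, 24, 10, 20)
h = proj(x)
out_temporal = self_attention_along(h, attn, 'steps')
out_spatial = self_attention_along(h, attn, 'nodes')
```

Both calls return a tensor of the same shape as the projected input, which matches the output size shown in the examples above.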
Operational Layers#
The subpackage tsl.nn.layers.ops
contains operational layers that do not
involve learnable parameters.
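Lambda | Call a generic function on the input.
Concatenate | Concatenate tensors along dimension dim.
Select | Apply select() to select one element from a Tensor along a dimension.
GradNorm | Scales the gradient in back-propagation.
Activation | A utility layer for any activation function.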
- class Lambda(function: Callable)[source]#
Call a generic function on the input.
- Parameters:
function (callable) – The function to call in forward(input).
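Wrapping a function in a module makes it usable inside containers such as `torch.nn.Sequential`. The class `LambdaSketch` below is an illustrative stand-in written for this example, not the tsl class itself.

```python
import torch
from torch import nn


class LambdaSketch(nn.Module):
    """Calls an arbitrary function in forward (illustrative only)."""

    def __init__(self, function):
        super().__init__()
        self.function = function

    def forward(self, input):
        return self.function(input)


# Average over the time dimension as a parameter-free step in a pipeline.
model = nn.Sequential(
    nn.Linear(20, 32),
    LambdaSketch(lambda t: t.mean(dim=1)),
)
out = model(torch.randn(64, 12, 20))   # [batch, steps, features] -> [batch, 32]
```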
- class Concatenate(dim: int = 0)[source]#
Concatenate tensors along dimension dim.
The tensors' dimensions are matched (i.e., broadcast if necessary) before concatenation.
- Parameters:
dim (int) – The dimension to concatenate on. (default: 0)
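Matching dimensions before concatenation can be sketched as follows: every dimension except the concatenation one is broadcast to a common size, then `torch.cat` is applied. The helper `broadcast_cat` is hypothetical, and the exact broadcasting rules used by tsl may differ.

```python
import torch


def broadcast_cat(tensors, dim=0):
    """Broadcast all dimensions except `dim`, then concatenate along `dim`."""
    ndim = max(t.dim() for t in tensors)
    dim = dim % ndim  # normalize negative indices
    # Left-pad shapes with singleton dimensions so all tensors have ndim dims.
    tensors = [t.reshape((1,) * (ndim - t.dim()) + tuple(t.shape))
               for t in tensors]
    target = [max(t.size(d) for t in tensors) for d in range(ndim)]
    expanded = []
    for t in tensors:
        shape = list(target)
        shape[dim] = t.size(dim)  # keep each tensor's own size along `dim`
        expanded.append(t.expand(shape))
    return torch.cat(expanded, dim=dim)


x = torch.randn(64, 12, 10, 20)   # [batch, steps, nodes, channels]
e = torch.randn(10, 16)           # per-node features, no batch or time axis
out = broadcast_cat([x, e], dim=-1)
```

Here the per-node tensor is repeated over batch and time before being appended to the channels of `x`.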
- class Select(dim: int, index: int)[source]#
Apply select() to select one element from a Tensor along a dimension.
This layer returns a view of the original tensor with the given dimension removed.