Models#

class BaseModel(return_type: Optional[Type[Union[Tensor, Dict, List, Tuple]]] = None)[source]#

Base class for creating neural models.

This class provides useful utilities for the model designer:

  • the methods add_model_specific_args() and add_argparse_args() automatically add to an ArgumentParser the arguments needed to initialize the model (with typing and default values).

  • the method loss() can be used to compute a custom loss on the provided training target. Inference modules in tsl will call this method for the loss computation, if implemented in the model.

  • the method predict() can be used to define a variation of the forward() function for only inference purposes (e.g., removing outputs used only for auxiliary tasks during training).

  • the parameter return_type specifies the return type of the forward() function (Tensor, list, or dict); see the sketch after this list.
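
A minimal sketch of a BaseModel subclass implementing the hooks above (the import path, input layout, and loss call convention are assumptions, not part of the documented API):

    import torch.nn.functional as F
    from torch import Tensor, nn

    from tsl.nn.models import BaseModel  # import path assumed

    class AuxTaskModel(BaseModel):
        """Hypothetical model with an auxiliary reconstruction head."""

        def __init__(self, input_size: int, horizon: int, hidden_size: int = 32):
            super().__init__(return_type=tuple)  # forward() returns a tuple
            self.encoder = nn.GRU(input_size, hidden_size, batch_first=True)
            self.readout = nn.Linear(hidden_size, horizon * input_size)
            self.aux_head = nn.Linear(hidden_size, input_size)
            self.horizon, self.input_size = horizon, input_size

        def forward(self, x: Tensor):
            # x: [batch, time, features]
            h, _ = self.encoder(x)
            y_hat = self.readout(h[:, -1]).view(-1, self.horizon, self.input_size)
            x_rec = self.aux_head(h)  # auxiliary output, used only during training
            return y_hat, x_rec

        def loss(self, target: Tensor, x: Tensor) -> Tensor:
            # custom loss: forecasting error plus a reconstruction penalty
            y_hat, x_rec = self(x)
            return F.mse_loss(y_hat, target) + 0.1 * F.mse_loss(x_rec, x)

        def predict(self, x: Tensor) -> Tensor:
            # inference-only forward: discard the auxiliary output
            y_hat, _ = self(x)
            return y_hat

With such a subclass, both has_loss and has_predict evaluate to True.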

property has_loss: bool#

Returns True if the model has implemented the loss() method.

property has_predict: bool#

Returns True if the model has implemented the predict() method.

loss(target, *args, **kwargs)[source]#

Compute a custom loss w.r.t. target.

predict(*args, **kwargs)[source]#

Forward function used only for inference.

classmethod get_model_signature() dict[source]#

Get signature of the model’s __init__() function.

classmethod get_forward_signature() dict[source]#

Get signature of the model’s forward() function.

classmethod filter_model_args_(mapping: dict)[source]#

Remove from mapping all the keys that are not arguments of the model’s __init__() function.

classmethod model_excluded_args() Set[source]#

Set of arguments of __init__() to be excluded when adding model’s args to an ArgumentParser (see add_model_specific_args()).

classmethod add_model_specific_args(parser: ArgumentParser)[source]#

Adds to the ArgumentParser parser the arguments needed to initialize the model (with typing and default values).

The arguments added are all the parameters of the __init__() method, excluding the keys returned by model_excluded_args().

classmethod add_argparse_args(parser: ArgumentParser, exclude_args: Optional[Set] = None)[source]#

Adds to the ArgumentParser parser all the parameters of the __init__() method (with typing and default values).
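
A hedged sketch of how these classmethods can be combined, using DCRNNModel (documented below); the import path and command-line flag naming are assumptions:

    from argparse import ArgumentParser

    from tsl.nn.models import DCRNNModel  # import path assumed

    parser = ArgumentParser()
    DCRNNModel.add_model_specific_args(parser)  # adds __init__ args, minus excluded ones

    args = parser.parse_args()              # model arguments passed as CLI flags
    hparams = vars(args)
    DCRNNModel.filter_model_args_(hparams)  # keep only keys accepted by __init__
    model = DCRNNModel(**hparams)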

Spatiotemporal Models#

DCRNNModel

The Diffusion Convolutional Recurrent Neural Network from the paper "Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting" (Li et al., ICLR 2018).

GraphWaveNetModel

The Graph WaveNet model from the paper "Graph WaveNet for Deep Spatial-Temporal Graph Modeling" (Wu et al., IJCAI 2019).

GatedGraphNetworkModel

Simple time-then-space model with an MLP with residual connections as encoder (flattened time dimension) and a gated GN decoder with node identification.

RNNEncGCNDecModel

Simple time-then-space model.

STCNModel

Spatiotemporal GNN with interleaved temporal and spatial diffusion convolutions.

GRINModel

The Graph Recurrent Imputation Network with DCRNN cells from the paper "Filling the G_ap_s: Multivariate Time Series Imputation by Graph Neural Networks" (Cini et al., ICLR 2022).

EvolveGCNModel

The EvolveGCN model from the paper "EvolveGCN: Evolving Graph Convolutional Networks for Dynamic Graphs" (Pareja et al., AAAI 2020).

GRUGCNModel

Simple time-then-space model with a GRU encoder and a GCN decoder from the paper "On the Equivalence Between Temporal and Static Equivariant Graph Representations" (Guo et al., ICML 2022).

AGCRNModel

The Adaptive Graph Convolutional Recurrent Network from the paper "Adaptive Graph Convolutional Recurrent Network for Traffic Forecasting" (Bai et al., NeurIPS 2020).

class DCRNNModel(input_size: int, output_size: int, horizon: int, exog_size: int = 0, hidden_size: int = 32, kernel_size: int = 2, ff_size: int = 256, n_layers: int = 1, cache_support: bool = False, dropout: float = 0.0, activation: str = 'relu')[source]#

The Diffusion Convolutional Recurrent Neural Network from the paper “Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting” (Li et al., ICLR 2018).

Differently from the original implementation, the recurrent decoder is substituted with a fixed-length nonlinear readout.

Parameters:
  • input_size (int) – Number of features of the input sample.

  • output_size (int) – Number of output channels.

  • horizon (int) – Number of future time steps to forecast.

  • exog_size (int) – Number of features of the input covariate, if any. (default: 0)

  • hidden_size (int) – Number of hidden units. (default: 32)

  • kernel_size (int) – Order of the spatial diffusion process. (default: 2)

  • ff_size (int) – Number of units in the nonlinear readout. (default: 256)

  • n_layers (int) – Number of DCRNN cells. (default: 1)

  • cache_support (bool) – If True, then cache the support matrices of the diffusion process across forward passes. (default: False)

  • dropout (float) – Dropout probability. (default: 0)

  • activation (str) – Activation function in the readout. (default: 'relu')

forward(x: Tensor, edge_index: Union[Tensor, SparseTensor], edge_weight: Optional[Tensor] = None, u: Optional[Tensor] = None) Tensor[source]#
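
A usage sketch for DCRNNModel follows; it assumes tsl's [batch, time, nodes, features] input layout and a COO edge_index, to be checked against the installed version:

    import torch

    from tsl.nn.models import DCRNNModel  # import path assumed

    batch, time, nodes, features, horizon = 8, 12, 20, 1, 3
    model = DCRNNModel(input_size=features, output_size=features, horizon=horizon)

    x = torch.randn(batch, time, nodes, features)
    edge_index = torch.randint(0, nodes, (2, 50))  # random COO connectivity
    edge_weight = torch.rand(edge_index.size(1))

    y_hat = model(x, edge_index, edge_weight)
    # expected shape: [batch, horizon, nodes, output_size]
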
class GraphWaveNetModel(input_size: int, output_size: int, horizon: int, exog_size: int = 0, hidden_size: int = 32, ff_size: int = 256, n_layers: int = 8, temporal_kernel_size: int = 2, spatial_kernel_size: int = 2, learned_adjacency: bool = True, n_nodes: Optional[int] = None, emb_size: int = 10, dilation: int = 2, dilation_mod: int = 2, norm: str = 'batch', dropout: float = 0.3)[source]#

The Graph WaveNet model from the paper “Graph WaveNet for Deep Spatial-Temporal Graph Modeling” (Wu et al., IJCAI 2019).

Parameters:
  • input_size (int) – Number of features of the input sample.

  • output_size (int) – Number of output channels.

  • horizon (int) – Number of future time steps to forecast.

  • exog_size (int) – Number of features of the input covariate, if any. (default: 0)

  • hidden_size (int) – Number of hidden units. (default: 32)

  • ff_size (int) – Number of units in the nonlinear readout. (default: 256)

  • n_layers (int) – Number of Graph WaveNet blocks. (default: 8)

  • temporal_kernel_size (int) – Size of the temporal convolution kernel. (default: 2)

  • spatial_kernel_size (int) – Order of the spatial diffusion process. (default: 2)

  • learned_adjacency (bool) – If True, then consider an additional learned adjacency matrix. (default: True)

  • n_nodes (int, optional) – Number of nodes in the input graph, required only when learned_adjacency is True. (default: None)

  • emb_size (int) – Number of features in the node embeddings used for graph learning. (default: 10)

  • dilation (int) – Dilation of the temporal convolutional kernels. (default: 2)

  • dilation_mod (int) – Length of the cycle for the dilation coefficient. (default: 2)

  • norm (str) – Normalization strategy. (default: 'batch')

  • dropout (float) – Dropout probability. (default: 0.3)

get_learned_adj()[source]#
forward(x: Tensor, edge_index: Union[Tensor, SparseTensor], edge_weight: Optional[Tensor] = None, u: Optional[Tensor] = None) Tensor[source]#
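
Since n_nodes is required whenever learned_adjacency is True, a minimal construction might look as follows (a sketch; the shape returned by get_learned_adj() is an assumption):

    from tsl.nn.models import GraphWaveNetModel  # import path assumed

    model = GraphWaveNetModel(input_size=1, output_size=1, horizon=12,
                              learned_adjacency=True, n_nodes=207)

    adj = model.get_learned_adj()  # learned adjacency, presumably [n_nodes, n_nodes]
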
class GatedGraphNetworkModel(input_size: int, input_window_size: int, horizon: int, n_nodes: int, hidden_size: int, output_size: Optional[int] = None, exog_size: int = 0, enc_layers: int = 1, gnn_layers: int = 1, full_graph: bool = True, activation: str = 'silu')[source]#

Simple time-then-space model with an MLP with residual connections as encoder (flattened time dimension) and a gated GN decoder with node identification.

Inspired by the FC-GNN model from the paper “Multivariate Time Series Forecasting with Latent Graph Inference” (Satorras et al., 2022).

Parameters:
  • input_size (int) – Size of the input.

  • input_window_size (int) – Size of the input window (this model cannot process sequences of variable length).

  • hidden_size (int) – Number of hidden units in each hidden layer.

  • output_size (int) – Size of the output.

  • horizon (int) – Forecasting steps.

  • n_nodes (int) – Number of nodes.

  • exog_size (int) – Size of the optional exogenous variables.

  • enc_layers (int) – Number of layers in the MLP encoder.

  • gnn_layers (int) – Number of GNN layers in the decoder.

  • full_graph (bool) – Whether to use a fully connected graph in the GNN; in that case, the model turns into a dense spatial attention layer.

forward(x, edge_index=None, u=None)[source]#
class RNNEncGCNDecModel(input_size, hidden_size, output_size, exog_size, rnn_layers, gcn_layers, rnn_dropout, gcn_dropout, horizon, cell_type='gru', activation='relu')[source]#

Simple time-then-space model.

Input time series are encoded in vectors using an RNN and then decoded using a stack of GCN layers.

Parameters:
  • input_size (int) – Input size.

  • hidden_size (int) – Units in the hidden layers.

  • output_size (int) – Size of the optional readout.

  • exog_size (int) – Size of the exogenous variables.

  • rnn_layers (int) – Number of recurrent layers in the encoder.

  • gcn_layers (int) – Number of graph convolutional layers in the decoder.

  • rnn_dropout (float, optional) – Dropout probability in the RNN encoder.

  • gcn_dropout (float, optional) – Dropout probability in the GCN decoder.

  • horizon (int) – Forecasting horizon.

  • cell_type (str, optional) – Type of cell that should be used. (options: ['gru', 'lstm']). (default: 'gru')

  • activation (str, optional) – Activation function.

forward(x, edge_index, edge_weight, u=None, **kwargs)[source]#
class STCNModel(input_size, exog_size, hidden_size, ff_size, output_size, n_layers, horizon, temporal_kernel_size, spatial_kernel_size, temporal_convs_layer=2, spatial_convs_layer=1, dilation=1, norm='none', gated=False, activation='relu', dropout=0.0)[source]#

Spatiotemporal GNN with interleaved temporal and spatial diffusion convolutions.

Parameters:
  • input_size (int) – Size of the input.

  • exog_size (int) – Size of the exogenous variables.

  • hidden_size (int) – Number of units in the hidden layer.

  • ff_size (int) – Number of units in the hidden layers of the nonlinear readout.

  • output_size (int) – Number of output channels.

  • n_layers (int) – Number of stacked spatiotemporal convolutional blocks.

  • horizon (int) – Forecasting horizon.

  • temporal_kernel_size (int) – Size of the temporal convolution kernel.

  • spatial_kernel_size (int) – Order of the spatial diffusion process.

  • temporal_convs_layer (int, optional) – Number of temporal convolutions in each block. (default: 2)

  • spatial_convs_layer (int, optional) – Number of spatial convolutions in each block. (default: 1)

  • dilation (int, optional) – Dilation of the temporal convolutional kernels.

  • norm (str, optional) – Normalization strategy.

  • gated (bool, optional) – Whether to use gated TanH activation in the temporal convolutional layers.

  • activation (str, optional) – Activation function.

  • dropout (float, optional) – Dropout probability.

forward(x, edge_index, edge_weight=None, u=None, **kwargs)[source]#

class GRINModel(input_size: int, hidden_size: int = 64, ff_size: int = 128, embedding_size: Optional[int] = None, exog_size: int = 0, n_layers: int = 1, n_nodes: Optional[int] = None, kernel_size: int = 2, decoder_order: int = 1, layer_norm: bool = False, dropout: float = 0.0, ff_dropout: float = 0.0, merge_mode: str = 'mlp')[source]#

The Graph Recurrent Imputation Network with DCRNN cells from the paper “Filling the G_ap_s: Multivariate Time Series Imputation by Graph Neural Networks” (Cini et al., ICLR 2022).

Parameters:
  • input_size (int) – Size of the input.

  • hidden_size (int) – Number of units in the DCRNN hidden layer. (default: 64)

  • ff_size (int) – Number of units in the nonlinear readout. (default: 128)

  • embedding_size (int, optional) – Number of features in the optional node embeddings. (default: None)

  • exog_size (int) – Number of channels in the exogenous variables, if any. (default: 0)

  • n_layers (int) – Number of DCRNN cells. (default: 1)

  • n_nodes (int, optional) – Number of nodes in the input graph. (default: None)

  • kernel_size (int) – Order of the spatial diffusion process in the DCRNN cells. (default: 2)

  • decoder_order (int) – Order of the spatial diffusion process in the spatial decoder. (default: 1)

  • layer_norm (bool, optional) – If True, then use layer normalization. (default: False)

  • dropout (float, optional) – Dropout probability in the DCRNN cells. (default: 0)

  • ff_dropout (float, optional) – Dropout probability in the readout. (default: 0)

  • merge_mode (str, optional) – Strategy used to merge representations coming from the two branches of the bidirectional model. (default: 'mlp')

forward(x: Tensor, edge_index: Union[Tensor, SparseTensor], edge_weight: Optional[Tensor] = None, mask: Optional[Tensor] = None, u: Optional[Tensor] = None) list[source]#
predict(x: Tensor, edge_index: Union[Tensor, SparseTensor], edge_weight: Optional[Tensor] = None, mask: Optional[Tensor] = None, u: Optional[Tensor] = None) Tensor[source]#
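
The split between forward() (training, returns a list including auxiliary predictions) and predict() (inference, returns the imputed tensor only) can be sketched as follows; the mask convention (nonzero means observed) is an assumption to verify:

    import torch

    from tsl.nn.models import GRINModel  # import path assumed

    batch, time, nodes, features = 4, 24, 10, 1
    model = GRINModel(input_size=features)

    x = torch.randn(batch, time, nodes, features)
    mask = torch.rand_like(x) > 0.2                # assumed: True = observed value
    edge_index = torch.randint(0, nodes, (2, 30))

    outputs = model(x, edge_index, mask=mask)        # training: list of outputs
    x_hat = model.predict(x, edge_index, mask=mask)  # inference: imputations only
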
class EvolveGCNModel(input_size, hidden_size, output_size, horizon, exog_size, n_layers, norm, root_weight, cached, variant='H', activation='relu')[source]#

The EvolveGCN model from the paper “EvolveGCN: Evolving Graph Convolutional Networks for Dynamic Graphs” (Pareja et al., AAAI 2020).

Parameters:
  • input_size (int) – Size of the input.

  • hidden_size (int) – Number of hidden units in each hidden layer.

  • output_size (int) – Size of the output.

  • horizon (int) – Forecasting steps.

  • exog_size (int) – Size of the optional exogenous variables.

  • n_layers (int) – Number of layers in the encoder.

  • norm (bool) – Whether to consider the input graph as directed (asymmetric normalization of the adjacency matrix).

  • root_weight (bool) – Whether to add a parametrized skip connection.

  • cached (bool) – Whether to cache normalized edge_weights.

  • variant (str) – Variant of EvolveGCN to use (options: ['H', 'O']). (default: 'H')

  • activation (str) – Activation after each GCN layer.

forward(x, edge_index, edge_weight=None, u=None)[source]#
class GRUGCNModel(input_size, hidden_size, output_size, horizon, exog_size, enc_layers, gcn_layers, norm='mean', encode_edges=False, activation='softplus')[source]#

Simple time-then-space model with a GRU encoder and a GCN decoder from the paper “On the Equivalence Between Temporal and Static Equivariant Graph Representations” (Guo et al., ICML 2022).

Parameters:
  • input_size (int) – Size of the input.

  • hidden_size (int) – Number of hidden units in each hidden layer.

  • output_size (int) – Size of the output.

  • horizon (int) – Forecasting steps.

  • exog_size (int) – Size of the optional exogenous variables.

  • enc_layers (int) – Number of layers in the GRU encoder.

  • gcn_layers (int) – Number of GCN layers in the GCN decoder.

  • norm (str) – Normalization used by the graph convolutional layers. (default: 'mean')

  • encode_edges (bool, optional) – If True, then encode the input edge features. (default: False)

  • activation (str, optional) – Activation function. (default: 'softplus')

forward(x, edge_index, edge_weight=None, edge_features=None, u=None)[source]#
class AGCRNModel(input_size: int, output_size: int, horizon: int, n_nodes: int, hidden_size: int = 64, emb_size: int = 10, exog_size: int = 0, n_layers: int = 1)[source]#

The Adaptive Graph Convolutional Recurrent Network from the paper “Adaptive Graph Convolutional Recurrent Network for Traffic Forecasting” (Bai et al., NeurIPS 2020).

Parameters:
  • input_size (int) – Number of features of the input sample.

  • output_size (int) – Number of output channels.

  • horizon (int) – Number of future time steps to forecast.

  • exog_size (int) – Number of features of the input covariate, if any.

  • hidden_size (int) – Number of hidden units. (default: 64)

  • emb_size (int) – Size of the learned node embeddings. (default: 10)

  • n_nodes (int) – Number of nodes in the input (static) graph.

  • n_layers (int) – Number of AGCRN cells. (default: 1)

forward(x: Tensor, u: Optional[Tensor] = None) Tensor[source]#
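
Note that forward() takes no connectivity: the adjacency is inferred from the learned node embeddings, which is why n_nodes is required. A brief sketch (import path and input layout assumed):

    import torch

    from tsl.nn.models import AGCRNModel  # import path assumed

    batch, time, nodes, features, horizon = 8, 12, 20, 1, 3
    model = AGCRNModel(input_size=features, output_size=features,
                       horizon=horizon, n_nodes=nodes)

    x = torch.randn(batch, time, nodes, features)
    y_hat = model(x)  # no edge_index: the graph is learned from node embeddings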

Temporal Models#

ARModel

Simple univariate linear AR model for multistep forecasting.

VARModel

A simple VAR model for multistep forecasting.

RNNModel

Simple RNN for multistep forecasting.

FCRNNModel

A simple fully connected RNN for multistep forecasting that simply flattens data along the spatial dimension.

TCNModel

A simple Causal Dilated Temporal Convolutional Network for multistep forecasting.

TransformerModel

A Transformer from the paper "Attention Is All You Need" (Vaswani et al., NeurIPS 2017) for multistep time series forecasting.

RNNImputerModel

Fill the blanks with 1-step-ahead predictions of a recurrent network.

BiRNNImputerModel

Fill the blanks with 1-step-ahead predictions of a bidirectional recurrent neural network.

class ARModel(input_size: int, temporal_order: int, output_size: int, horizon: int, exog_size: int = 0, bias: bool = True)[source]#

Simple univariate linear AR model for multistep forecasting.

Parameters:
  • input_size (int) – Size of the input.

  • temporal_order (int) – Order of the autoregression, i.e., the number of past time steps used to compute the forecast.

  • output_size (int) – Number of output channels.

  • exog_size (int) – Size of the exogenous variables.

  • horizon (int) – Forecasting horizon.

  • bias (bool) – If True, then add a learnable bias. (default: True)

forward(x: Tensor, u: Optional[Tensor] = None) Tensor[source]#
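
For orientation, with p = temporal_order, a standard order-p univariate AR predictor computes (the exact parametrization in the implementation may differ):

\[\hat{x}_{t+1} = \sum_{i=0}^{p-1} \theta_{i} \, x_{t-i} + b\]
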
class VARModel(input_size: int, temporal_order: int, output_size: int, horizon: int, n_nodes: int, exog_size: int = 0, bias: bool = True)[source]#

A simple VAR model for multistep forecasting.

Parameters:
  • input_size (int) – Size of the input.

  • n_nodes (int) – Number of nodes.

  • temporal_order (int) – Order of the autoregression, i.e., the number of past time steps used to compute the forecast.

  • output_size (int) – Number of output channels.

  • exog_size (int) – Size of the exogenous variables.

  • horizon (int) – Forecasting horizon.

  • bias (bool) – If True, then add a learnable bias. (default: True)

forward(x: Tensor, u: Optional[Tensor] = None) Tensor[source]#
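
Its multivariate counterpart couples all nodes and channels, regressing each output on the full past state (again the standard formulation, given for orientation):

\[\hat{\mathbf{x}}_{t+1} = \sum_{i=0}^{p-1} \Theta_{i} \, \mathbf{x}_{t-i} + \mathbf{b}, \qquad \mathbf{x}_{t} \in \mathbb{R}^{N \cdot d}\]

with N = n_nodes and d = input_size, which is why n_nodes is required here but not in ARModel.
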
class RNNModel(input_size: int, output_size: int, horizon: int, exog_size: int = 0, hidden_size: int = 32, ff_size: int = 64, rec_layers: int = 1, ff_layers: int = 1, rec_dropout: float = 0.0, ff_dropout: float = 0.0, cell_type: str = 'gru', activation: str = 'relu')[source]#

Simple RNN for multistep forecasting.

Parameters:
  • input_size (int) – Size of the input.

  • hidden_size (int) – Number of units in the recurrent cell.

  • output_size (int) – Number of output channels.

  • ff_size (int) – Number of units in each hidden layer of the readout.

  • exog_size (int) – Size of the exogenous variables.

  • rec_layers (int) – Number of RNN layers.

  • ff_layers (int) – Number of hidden layers in the decoder.

  • rec_dropout (float, optional) – Dropout probability in the RNN encoder.

  • ff_dropout (float, optional) – Dropout probability in the readout.

  • horizon (int) – Forecasting horizon.

  • cell_type (str, optional) – Type of cell that should be used (options: ['gru', 'lstm']). (default: 'gru')

  • activation (str, optional) – Activation function. (default: 'relu')

forward(x: Tensor, u: Optional[Tensor] = None) Tensor[source]#
class FCRNNModel(input_size: int, output_size: int, horizon: int, n_nodes: int, exog_size: Optional[int] = None, hidden_size: int = 32, ff_size: int = 64, rec_layers: int = 1, ff_layers: int = 1, rec_dropout: float = 0.0, ff_dropout: float = 0.0, cell_type: str = 'gru', activation: str = 'relu')[source]#

A simple fully connected RNN for multistep forecasting that simply flattens data along the spatial dimension.

Parameters:
  • input_size (int) – Size of the input.

  • hidden_size (int) – Number of units in the recurrent cell.

  • output_size (int) – Number of output channels.

  • ff_size (int) – Number of units in each hidden layer of the readout.

  • exog_size (int) – Size of the exogenous variables.

  • rec_layers (int) – Number of RNN layers.

  • ff_layers (int) – Number of hidden layers in the decoder.

  • rec_dropout (float, optional) – Dropout probability in the RNN encoder.

  • ff_dropout (float, optional) – Dropout probability in the readout.

  • horizon (int) – Forecasting horizon.

  • cell_type (str, optional) – Type of cell that should be used (options: ['gru', 'lstm']). (default: 'gru')

  • activation (str, optional) – Activation function. (default: 'relu')

forward(x: Tensor, u: Optional[Tensor] = None) Tensor[source]#
class TCNModel(input_size: int, output_size: int, horizon: int, exog_size: int = 0, hidden_size: int = 32, ff_size: int = 32, kernel_size: int = 2, n_layers: int = 4, n_convs_layer: int = 2, readout_kernel_size: int = 1, dilation: int = 2, gated: bool = False, resnet: bool = True, norm: str = 'batch', dropout: float = 0.0, activation: str = 'relu')[source]#

A simple Causal Dilated Temporal Convolutional Network for multistep forecasting. Learned temporal embeddings are pooled together using dynamic weights.

Parameters:
  • input_size (int) – Number of features of the input sample.

  • output_size (int) – Number of output channels.

  • horizon (int) – Number of future time steps to forecast.

  • exog_size (int) – Number of features of the input covariate, if any. (default: 0)

  • hidden_size (int) – Number of hidden units. (default: 32)

  • ff_size (int) – Number of units in the hidden layers of the decoder. (default: 32)

  • kernel_size (int) – Size of the convolutional kernel. (default: 2)

  • n_layers (int) – Number of TCN blocks. (default: 4)

  • n_convs_layer (int) – Number of temporal convolutions in each layer. (default: 2)

  • readout_kernel_size (int) – Width of the readout convolutional kernel. (default: 1)

  • dilation (int) – Dilation coefficient of the convolutional kernel. (default: 2)

  • gated (bool) – If True, then the gated_tanh() activation function is used. (default: False)

  • resnet (bool) – If True, then residual connections are used. (default: True)

  • norm (str) – Normalization strategy. (default: 'batch')

  • dropout (float) – Dropout probability. (default: 0)

  • activation (str) – Activation function. (default: 'relu')

forward(x: Tensor, u: Optional[Tensor] = None) Tensor[source]#
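
As a rule of thumb, assuming the dilation grows geometrically across the stacked convolutions (an assumption about this implementation), a causal dilated stack with kernel size k, base dilation d, and L dilated convolutions has receptive field

\[R = 1 + (k - 1) \sum_{l=0}^{L-1} d^{\,l}\]

e.g., k = 2, d = 2, L = 4 gives R = 1 + (1 + 2 + 4 + 8) = 16 time steps.
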
class TransformerModel(input_size: int, output_size: int, horizon: int, exog_size: int = 0, hidden_size: int = 32, ff_size: int = 32, n_heads: int = 1, n_layers: int = 1, dropout: float = 0.0, axis: str = 'time', activation: str = 'elu')[source]#

A Transformer from the paper “Attention Is All You Need” (Vaswani et al., NeurIPS 2017) for multistep time series forecasting.

Parameters:
  • input_size (int) – Input size.

  • hidden_size (int) – Dimension of the learned representations.

  • output_size (int) – Dimension of the output.

  • ff_size (int) – Units in the MLP after self attention.

  • exog_size (int) – Dimension of the exogenous variables.

  • horizon (int) – Number of forecasting steps.

  • n_heads (int, optional) – Number of parallel attention heads.

  • n_layers (int, optional) – Number of layers.

  • dropout (float, optional) – Dropout probability.

  • axis (str, optional) – Dimension on which to apply attention to update the representations. Can be either 'time', 'nodes', or 'both'. (default: 'time')

  • activation (str, optional) – Activation function.

forward(x: Tensor, u: Optional[Tensor] = None) Tensor[source]#
class RNNImputerModel(input_size: int, hidden_size: int = 64, exog_size: int = 0, cell: str = 'gru', concat_mask: bool = True, fully_connected: bool = False, n_nodes: Optional[int] = None, detach_input: bool = False, n_layers: int = 1, cat_states_layers: bool = False)[source]#

Fill the blanks with 1-step-ahead predictions of a recurrent network.

\[\bar{x}_{t} = m_{t} \cdot x_{t} + (1 - m_{t}) \cdot \hat{x}_{t}\]
Parameters:
  • input_size (int) – Number of features of the input sample.

  • hidden_size (int) – Number of hidden units. (default: 64)

  • exog_size (int) – Number of features of the input covariate, if any. (default: 0)

  • cell (str) – Type of recurrent cell to be used, one of ['gru', 'lstm']. (default: 'gru')

  • concat_mask (bool) – If True, then the input tensor is concatenated to the mask when fed to the RNN cell. (default: True)

  • fully_connected (bool) – If True, then node and feature dimensions are flattened together. (default: False)

  • n_nodes (int, optional) – The number of nodes in the input sample, to be provided in case fully_connected is True. (default: None)

  • detach_input (bool) – If True, call detach() on predictions before they are used to fill the gaps, breaking the error backpropagation. (default: False)

  • n_layers (int, optional) – Number of hidden layers. (default: 1)

  • cat_states_layers (bool) – If True, then the states of the RNN are concatenated together. (default: False)

forward(x: Tensor, mask: Tensor, u: Optional[Tensor] = None, return_hidden: bool = False) Union[Tensor, list][source]#
predict(x: Tensor, mask: Tensor, u: Optional[Tensor] = None) Tensor[source]#
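
A usage sketch mirroring the equation above (mask convention and input layout assumed):

    import torch

    from tsl.nn.models import RNNImputerModel  # import path assumed

    batch, time, nodes, features = 4, 24, 10, 1
    model = RNNImputerModel(input_size=features)

    x = torch.randn(batch, time, nodes, features)
    mask = (torch.rand_like(x) > 0.2).float()  # assumed: 1 = observed, 0 = missing

    x_hat = model.predict(x, mask=mask)  # gaps filled with 1-step-ahead predictions
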
class BiRNNImputerModel(input_size: int, hidden_size: int = 64, exog_size: int = 0, cell: str = 'gru', concat_mask: bool = True, fully_connected: bool = False, n_nodes: Optional[int] = None, detach_input: bool = False, n_layers: int = 1, cat_states_layers: bool = False, dropout: float = 0.0)[source]#

Fill the blanks with 1-step-ahead predictions of a bidirectional recurrent neural network.

Parameters:
  • input_size (int) – Number of features of the input sample.

  • hidden_size (int) – Number of hidden units. (default: 64)

  • exog_size (int) – Number of features of the input covariate, if any. (default: 0)

  • cell (str) – Type of recurrent cell to be used, one of ['gru', 'lstm']. (default: 'gru')

  • concat_mask (bool) – If True, then the input tensor is concatenated to the mask when fed to the RNN cell. (default: True)

  • fully_connected (bool) – If True, then node and feature dimensions are flattened together. (default: False)

  • n_nodes (int, optional) – The number of nodes in the input sample, to be provided in case fully_connected is True. (default: None)

  • detach_input (bool) – If True, call detach() on predictions before they are used to fill the gaps, breaking the error backpropagation. (default: False)

  • n_layers (int, optional) – Number of hidden layers. (default: 1)

  • return_previous_state (bool) – If True, then the returned states are shifted one-step behind the imputations. (default: True)

  • cat_states_layers (bool) – If True, then the states of the RNN are concatenated together. (default: False)

  • dropout (float, optional) – Dropout probability in the decoder. (default: 0.)

forward(x: Tensor, mask: Tensor, u: Optional[Tensor] = None, return_hidden: bool = False, return_predictions: bool = True) Union[Tensor, list][source]#
predict(x: Tensor, mask: Tensor, u: Optional[Tensor] = None) Tensor[source]#