PyTorch Datasets#

SpatioTemporalDataset

Base class for structures that are bridges between Datasets and Models.

ImputationDataset

A dataset for imputation tasks.

class SpatioTemporalDataset(target: Union[DataFrame, ndarray, Tensor], index: Optional[Union[DatetimeIndex, PeriodIndex, TimedeltaIndex]] = None, mask: Optional[Union[DataFrame, ndarray, Tensor]] = None, connectivity: Optional[Union[Tensor, SparseTensor, ndarray, coo_matrix, csr_matrix, csc_matrix, Tuple[Union[DataFrame, ndarray, Tensor]]]] = None, covariates: Optional[Mapping[str, Union[DataFrame, ndarray, Tensor]]] = None, input_map: Optional[Union[Mapping, BatchMap]] = None, target_map: Optional[Union[Mapping, BatchMap]] = None, auxiliary_map: Optional[Union[Mapping, BatchMap]] = None, scalers: Optional[Mapping[str, Scaler]] = None, trend: Optional[Union[DataFrame, ndarray, Tensor]] = None, transform: Optional[Callable] = None, window: int = 12, horizon: int = 1, delay: int = 0, stride: int = 1, window_lag: int = 1, horizon_lag: int = 1, precision: Union[int, str] = 32, name: Optional[str] = None)[source]#

Base class for structures that are bridges between Datasets and Models.

A SpatioTemporalDataset takes a Dataset as input and builds a proper structure to feed deep models.

Parameters:
  • target (DataArray) – Data relative to the primary channels.

  • index (TemporalIndex, optional) – Temporal indices for the data. (default: None)

  • mask (DataArray, optional) – Boolean mask denoting if signal in data is valid (1) or not (0). (default: None)

  • connectivity (SparseTensArray, tuple, optional) – The adjacency matrix defining nodes’ relational information. It can be either a dense/sparse matrix \(\mathbf{A} \in \mathbb{R}^{N \times N}\) or an (edge_index \(\in \mathbb{N}^{2 \times E}\), edge_weight \(\in \mathbb{R}^{E})\) tuple. The input layout will be preserved (e.g., a sparse matrix will be stored as a torch_sparse.SparseTensor). In any case, the connectivity will be stored in the attribute edge_index, and the weights, if present, will be stored as edge_weight. (default: None)

  • covariates (dict, optional) – Dictionary of exogenous channels with label. An exogenous element is a temporal array with node- or graph-level channels which are covariates to the main signal. The temporal dimension must be equal to the temporal dimension of data, as well as the number of nodes if the exogenous is node-level. (default: None)

  • input_map (BatchMap or dict, optional) – Defines how data (i.e., the target and the covariates) are mapped to dataset sample input. Keys in the mapping are keys in both item and item.input, while values are BatchMapItem. (default: None)

  • target_map (BatchMap or dict, optional) – Defines how data (i.e., the target and the covariates) are mapped to dataset sample target. Keys in the mapping are keys in both item and item.target, while values are BatchMapItem. (default: None)

  • auxiliary_map (BatchMap or dict, optional) – Defines how data (i.e., the target and the covariates) are added as additional attributes to the dataset sample. Keys in the mapping are keys only in item, while values are BatchMapItem. (default: None)

  • scalers (Mapping or None) – Dictionary of scalers that must be used for data preprocessing. (default: None)

  • trend (DataArray, optional) – Trend paired with main signal. Must have the same shape as data. (default: None)

  • transform (callable, optional) – A function/transform that takes in a tsl.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • window (int) – Length (in number of steps) of the lookback window. (default: 12)

  • horizon (int) – Length (in number of steps) of the prediction horizon. (default: 1)

  • delay (int) – Offset (in number of steps) between end of window and start of horizon. (default: 0)

  • stride (int) – Offset (in number of steps) between a sample and the next one. (default: 1)

  • window_lag (int) – Sampling frequency (in number of steps) in lookback window. (default: 1)

  • horizon_lag (int) – Sampling frequency (in number of steps) in prediction horizon. (default: 1)

  • precision (int or str, optional) – The float precision to store the data. Can be expressed as number (16, 32, or 64) or string (“half”, “full”, “double”). (default: 32)

  • name (str, optional) – The name of the dataset. (default: None)
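To make the slicing parameters concrete, here is a plain-Python sketch (not tsl code; the function name sliding_samples is hypothetical, and window_lag/horizon_lag are omitted for brevity) of how window, horizon, delay, and stride determine the input and target step indices of each sample:

```python
# Sketch: derive each sample's window and horizon step indices from
# (window, horizon, delay, stride). Lag parameters are omitted.
def sliding_samples(n_steps, window=12, horizon=1, delay=0, stride=1):
    samples = []
    start = 0
    while start + window + delay + horizon <= n_steps:
        window_idx = list(range(start, start + window))
        horizon_start = start + window + delay
        horizon_idx = list(range(horizon_start, horizon_start + horizon))
        samples.append((window_idx, horizon_idx))
        start += stride  # offset between consecutive samples
    return samples

samples = sliding_samples(n_steps=20, window=12, horizon=3)
# 6 samples; the first has input steps 0-11 and target steps 12-14
```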

property n_steps: int#

Total number of time steps in the dataset.

property n_nodes: int#

Number of nodes in the dataset.

property n_channels: int#

Number of channels in dataset’s target.

property n_edges: Optional[int]#

Number of edges in the dataset, if a connectivity is set.

property shape: tuple#

Shape of the target tensor.

property patterns: dict#

Shows the dimension of dataset’s tensors in a more informative way.

The pattern mapping can be useful to get a glimpse of how the data are arranged. The convention we use is the following:

  • ‘t’ stands for “number of time steps”

  • ‘n’ stands for “number of nodes”

  • ‘f’ stands for “number of features” (per node)

  • ‘e’ stands for “number of edges”
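For instance, a dataset with a node-level target, a mask, and a sparse connectivity might expose a mapping like the following (illustrative values; the actual keys depend on the dataset content):

```python
# Illustrative pattern mapping for a dataset with target, mask and
# COO connectivity; each letter names one axis of the tensor.
patterns = {
    'target': 't n f',      # time steps x nodes x features
    'mask': 't n f',
    'edge_index': '2 e',    # COO index over e edges
    'edge_weight': 'e',
}
```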

property batch_patterns: dict#

Shows the dimension of dataset’s tensors in a more informative way.

The pattern mapping can be useful to get a glimpse of how the data are arranged. The convention we use is the following:

  • ‘t’ stands for “number of time steps”

  • ‘n’ stands for “number of nodes”

  • ‘f’ stands for “number of features” (per node)

  • ‘e’ stands for “number of edges”

property keys: list#

Keys in dataset.

property batch_keys: list#

Keys in dataset item.

property horizon_offset: int#

Lag of starting step of horizon.

property sample_span: int#

Total number of steps of an item, including window and horizon.

property samples_offset: int#

Difference (in number of steps) between two items.

property indices: Tensor#

Indices of the dataset. The i-th item is mapped to indices[i].

property n_samples: int#

Number of samples (i.e., items) in the dataset.
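These quantities follow from the constructor parameters by simple arithmetic. A worked example for the case delay >= 0 (illustrative formulas mirroring the definitions above, not tsl internals):

```python
window, horizon, delay, stride = 12, 3, 0, 1
n_steps = 100

horizon_offset = window + delay          # lag of the first horizon step
sample_span = horizon_offset + horizon   # steps covered by one item
samples_offset = stride                  # step difference between items
n_samples = (n_steps - sample_span) // stride + 1

# horizon_offset = 12, sample_span = 15, n_samples = 86
```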

property covariates: dict#

Mapping of dataset’s covariates.

property exogenous: dict#

Time-varying covariates of the dataset’s target.

property attributes: dict#

Static features related to the dataset.

property n_covariates: int#

Number of covariates in the dataset.

property has_connectivity: bool#

Whether the dataset has a connectivity.

property has_mask: bool#

Whether the dataset has a mask denoting valid values in target.

property has_covariates: bool#

Whether the dataset has covariates to the target tensor.

set_data(data: Union[DataFrame, ndarray, Tensor])[source]#

Set sequence of primary channels at self.data.

set_mask(mask: Optional[Union[DataFrame, ndarray, Tensor]], add_to_auxiliary_map: bool = True)[source]#

Set mask of target channels, i.e., a bool for each (node, time step, channel) triplet denoting if the corresponding value in target is observed (True) or not (False).

set_connectivity(connectivity: Union[Tensor, SparseTensor, ndarray, coo_matrix, csr_matrix, csc_matrix, Tuple[Union[DataFrame, ndarray, Tensor]]], target_layout: Optional[str] = None)[source]#

Set dataset connectivity.

The input can be either a dense/sparse matrix \(\mathbf{A} \in \mathbb{R}^{N \times N}\) or an (edge_index \(\in \mathbb{N}^{2 \times E}\), edge_weight \(\in \mathbb{R}^{E})\) tuple. If target_layout is None, the input layout will be preserved (e.g., a sparse matrix will be stored as a torch_sparse.SparseTensor), otherwise the connectivity is converted to the specified layout. In any case, the connectivity will be stored in the attribute edge_index, and the weights, if present, will be stored as edge_weight.

Parameters:
  • connectivity (SparseTensArray, tuple, optional) – The connectivity

  • target_layout (str, optional) – If specified, the input connectivity is converted to this layout. Possible options are [dense, sparse, edge_index]. If None, the target layout is inferred from the input. (default: None)
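The dense and COO layouts encode the same information. A minimal NumPy sketch of converting a dense adjacency matrix into the (edge_index, edge_weight) pair (illustrative only; tsl performs such conversions internally):

```python
import numpy as np

# Dense N x N adjacency; nonzero entries are weighted edges.
A = np.array([[0., 1., 0.],
              [0., 0., 2.],
              [3., 0., 0.]])

row, col = np.nonzero(A)
edge_index = np.stack([row, col])   # shape (2, E)
edge_weight = A[row, col]           # shape (E,)

# edge_index: [[0, 1, 2], [1, 2, 0]], edge_weight: [1., 2., 3.]
```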

add_covariate(name: str, value: Union[DataFrame, ndarray, Tensor], pattern: Optional[str] = None, add_to_input_map: bool = True, synch_mode: Optional[SynchMode] = None, preprocess: bool = True, convert_precision: bool = True)[source]#

Add covariate to the dataset. Examples of covariate are exogenous signals (in the form of dynamic multidimensional data) or static attributes (e.g., graph/node metadata). Parameter pattern specifies what each axis refers to:

  • ‘t’: temporal dimension;

  • ‘n’: node dimension;

  • ‘c’/’f’: channels/features dimension.

For instance, the pattern of a node-level covariate is ‘t n f’, while a pairwise metric between nodes has pattern ‘n n’.

Parameters:
  • name (str) – the name of the object. You can then access the added object as dataset.{name}.

  • value (DataArray) – the object to be added. Can be a DataFrame, a ndarray or a Tensor.

  • pattern (str, optional) –

    the pattern of the object. A pattern specifies what each axis refers to:

    • ’t’: temporal dimension;

    • ’n’: node dimension;

    • ’c’/’f’: channels/features dimension.

    If None, the pattern is inferred from the shape. (default None)

  • add_to_input_map (bool) – Whether to map the covariate to dataset item when calling get methods. (default: True)

  • synch_mode (SynchMode) – How to synchronize the exogenous variable inside dataset item, i.e., with the window slice (SynchMode.WINDOW) or horizon (SynchMode.HORIZON). It applies only for time-varying covariates. (default: SynchMode.WINDOW)

  • preprocess (bool) – If True and the dataset has a scaler with same key, then data are scaled when calling get methods. (default: True)

  • convert_precision (bool) – If True, then cast value with dataset’s precision. (default: True)
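To make the pattern convention concrete, here are hypothetical covariates and the pattern string each would take (shapes only; the arrays are random placeholders):

```python
import numpy as np

n_steps, n_nodes = 100, 20

u = np.random.rand(n_steps, n_nodes, 2)   # node-level exogenous -> 't n f'
dist = np.random.rand(n_nodes, n_nodes)   # pairwise metric      -> 'n n'
cal = np.random.rand(n_steps, 4)          # graph-level signal   -> 't f'

# e.g., dataset.add_covariate('u', u, pattern='t n f')
```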

update_covariate(name: str, value: Optional[Union[DataFrame, ndarray, Tensor]] = None, pattern: Optional[str] = None, add_to_input_map: bool = True, synch_mode: Optional[SynchMode] = None, preprocess: Optional[bool] = None)[source]#

Update a covariate already in the dataset.

Parameters:
  • name (str) – the name of the object. You can then access the added object as dataset.{name}.

  • value (DataArray, optional) – the object to be added. Can be a DataFrame, a ndarray or a Tensor.

  • pattern (str, optional) –

    the pattern of the object. A pattern specifies what each axis refers to:

    • ’t’: temporal dimension;

    • ’n’: node dimension;

    • ’c’/’f’: channels/features dimension.

    If None, the pattern is inferred from the shape. (default None)

  • add_to_input_map (bool) – Whether to map the covariate to dataset item when calling get methods. (default: True)

  • synch_mode (SynchMode) – How to synchronize the exogenous variable inside dataset item, i.e., with the window slice (SynchMode.WINDOW) or horizon (SynchMode.HORIZON). It applies only for time-varying covariates. (default: SynchMode.WINDOW)

  • preprocess (bool) – If True and the dataset has a scaler with same key, then data are scaled when calling get methods. (default: True)

remove_covariate(name: str)[source]#

Delete covariate from the dataset.

Parameters:

name (str) – the name of the covariate to be deleted.

add_exogenous(name: str, value: Union[DataFrame, ndarray, Tensor], node_level: bool = True, add_to_input_map: bool = True, synch_mode: SynchMode = SynchMode.WINDOW, preprocess: bool = True)[source]#

Shortcut method to add a time-varying covariate.

Exogenous variables are time-varying covariates of the dataset’s target. They can either be graph-level (i.e., with same temporal length as target but with no node dimension) or node-level (i.e., with same temporal and node size as target).

Parameters:
  • name (str) – The name of the exogenous variable. If the name starts with "global_", the variable is assumed to be graph-level (overriding parameter node_level), and the "global_" prefix is removed from the name.

  • value (DataArray) – The data sequence. Can be a DataFrame, a ndarray or a Tensor.

  • node_level (bool) – Whether the input variable is node- or graph-level. If a 2-dimensional array is given and node_level is True, it is assumed that the input has one channel. (default: True)

  • add_to_input_map (bool) – Whether to map the exogenous variable to dataset item when calling get methods. (default: True)

  • synch_mode (SynchMode) – How to synchronize the exogenous variable inside dataset item, i.e., with the window slice (SynchMode.WINDOW) or horizon (SynchMode.HORIZON). (default: SynchMode.WINDOW)

  • preprocess (bool) – If True and the dataset has a scaler with same key, then data are scaled when calling get methods. (default: True)

Returns:

the dataset with the added exogenous variable.

Return type:

SpatioTemporalDataset

set_trend(trend: Optional[Union[DataFrame, ndarray, Tensor]])[source]#

Set trend of dataset’s target data.

add_scaler(key: str, scaler: Union[Scaler, ScalerModule])[source]#

Add a tsl.data.preprocessing.Scaler for the object indexed by key in the dataset.

Parameters:
  • key (str) – The name of the variable associated to the scaler. It must be a temporal variable, i.e., the target or an exogenous variable.

  • scaler (Scaler) – The Scaler.

reduce(time_index: Optional[Union[slice, List, Tuple, Tensor, ndarray]] = None, node_index: Optional[Union[slice, List, Tuple, Tensor, ndarray]] = None)[source]#

Reduce the dataset in terms of number of steps and nodes. Returns a copy of the reduced dataset.

If the dataset has a connectivity, edges ending at or starting from removed nodes will be removed as well.

Parameters:
  • time_index (IndexSlice, optional) – index or mask of the time steps to keep after reduction. (default: None)

  • node_index (IndexSlice, optional) – index or mask of the nodes to keep after reduction. (default: None)
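The edge filtering described above can be sketched in NumPy as follows (illustrative only; reduce() additionally relabels the retained nodes):

```python
import numpy as np

edge_index = np.array([[0, 1, 2, 3],
                       [1, 2, 0, 1]])
keep = np.array([0, 1, 2])   # node_index: nodes to retain

# keep only edges whose endpoints both survive the reduction
mask = np.isin(edge_index, keep).all(axis=0)
reduced = edge_index[:, mask]
# reduced: [[0, 1, 2], [1, 2, 0]]  (the edge from node 3 is dropped)
```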

reduce_(time_index: Optional[Union[slice, List, Tuple, Tensor, ndarray]] = None, node_index: Optional[Union[slice, List, Tuple, Tensor, ndarray]] = None)[source]#

Reduce the dataset in terms of number of steps and nodes. This is an inplace operation.

If the dataset has a connectivity, edges ending at or starting from removed nodes will be removed as well.

Parameters:
  • time_index (IndexSlice, optional) – index or mask of the time steps to keep after reduction. (default: None)

  • node_index (IndexSlice, optional) – index or mask of the nodes to keep after reduction. (default: None)

save(filename: str) None[source]#

Save SpatioTemporalDataset to disk.

Parameters:

filename (str) – path to filename for storage.

classmethod load(filename: str) SpatioTemporalDataset[source]#

Load instance of SpatioTemporalDataset from disk.

Parameters:

filename (str) – path of SpatioTemporalDataset.

classmethod from_dataset(dataset, connectivity: Optional[Union[Tensor, SparseTensor, ndarray, coo_matrix, csr_matrix, csc_matrix, Tuple[Union[DataFrame, ndarray, Tensor]]]] = None, covariate_keys: Optional[List[str]] = None, input_map: Optional[Union[Mapping, BatchMap]] = None, target_map: Optional[Union[Mapping, BatchMap]] = None, auxiliary_map: Optional[Union[Mapping, BatchMap]] = None, scalers: Optional[Mapping[str, Scaler]] = None, trend: Optional[Union[DataFrame, ndarray, Tensor]] = None, window: int = 12, horizon: int = 1, delay: int = 0, stride: int = 1, window_lag: int = 1, horizon_lag: int = 1) SpatioTemporalDataset[source]#

Create a SpatioTemporalDataset from a TabularDataset.

class ImputationDataset(target: Union[DataFrame, ndarray, Tensor], eval_mask: Union[DataFrame, ndarray, Tensor], index: Optional[Union[DatetimeIndex, PeriodIndex, TimedeltaIndex]] = None, mask: Optional[Union[DataFrame, ndarray, Tensor]] = None, connectivity: Optional[Union[Tensor, SparseTensor, ndarray, coo_matrix, csr_matrix, csc_matrix, Tuple[Union[DataFrame, ndarray, Tensor]]]] = None, covariates: Optional[Mapping[str, Union[DataFrame, ndarray, Tensor]]] = None, input_map: Optional[Union[Mapping, BatchMap]] = None, target_map: Optional[Union[Mapping, BatchMap]] = None, auxiliary_map: Optional[Union[Mapping, BatchMap]] = None, scalers: Optional[Mapping[str, Scaler]] = None, trend: Optional[Union[DataFrame, ndarray, Tensor]] = None, transform: Optional[Callable] = None, window: int = 12, stride: int = 1, window_lag: int = 1, precision: Union[int, str] = 32, name: Optional[str] = None)[source]#

A dataset for imputation tasks. It is a subclass of SpatioTemporalDataset and inherits most of its attributes. The main difference is the addition of an eval_mask attribute, a boolean mask denoting which values in target are to be used for evaluating imputations.

Parameters:
  • target (DataArray) – Data relative to the primary channels.

  • eval_mask (DataArray) – Boolean mask denoting values that can be used for evaluating imputations. The mask is True if the corresponding value must be used for evaluation and False otherwise.

  • index (TemporalIndex, optional) – Temporal indices for the data. (default: None)

  • mask (DataArray, optional) – Boolean mask denoting if signal in data is valid (True) or not (False). (default: None)

  • connectivity (SparseTensArray, tuple, optional) – The adjacency matrix defining nodes’ relational information. It can be either a dense/sparse matrix \(\mathbf{A} \in \mathbb{R}^{N \times N}\) or an (edge_index \(\in \mathbb{N}^{2 \times E}\), edge_weight \(\in \mathbb{R}^{E})\) tuple. The input layout will be preserved (e.g., a sparse matrix will be stored as a torch_sparse.SparseTensor). In any case, the connectivity will be stored in the attribute edge_index, and the weights, if present, will be stored as edge_weight. (default: None)

  • covariates (dict, optional) – Dictionary of exogenous channels with label. An exogenous element is a temporal array with node- or graph-level channels which are covariates to the main signal. The temporal dimension must be equal to the temporal dimension of data, as well as the number of nodes if the exogenous is node-level. (default: None)

  • input_map (BatchMap or dict, optional) – Defines how data (i.e., the target and the covariates) are mapped to dataset sample input. Keys in the mapping are keys in both item and item.input, while values are BatchMapItem. (default: None)

  • target_map (BatchMap or dict, optional) – Defines how data (i.e., the target and the covariates) are mapped to dataset sample target. Keys in the mapping are keys in both item and item.target, while values are BatchMapItem. (default: None)

  • auxiliary_map (BatchMap or dict, optional) – Defines how data (i.e., the target and the covariates) are added as additional attributes to the dataset sample. Keys in the mapping are keys only in item, while values are BatchMapItem. (default: None)

  • scalers (Mapping or None) – Dictionary of scalers that must be used for data preprocessing. (default: None)

  • trend (DataArray, optional) – Trend paired with main signal. Must have the same shape as data. (default: None)

  • transform (callable, optional) – A function/transform that takes in a tsl.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • window (int) – Length (in number of steps) of the lookback window. (default: 12)

  • stride (int) – Offset (in number of steps) between a sample and the next one. (default: 1)

  • window_lag (int) – Sampling frequency (in number of steps) in lookback window. (default: 1)

  • precision (int or str, optional) – The float precision to store the data. Can be expressed as number (16, 32, or 64) or string (“half”, “full”, “double”). (default: 32)

  • name (str, optional) – The name of the dataset. (default: None)
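A common way the two masks combine during training (a hypothetical sketch; the exact usage depends on the imputation model): values that are observed but reserved for evaluation are hidden from the model by intersecting mask with the complement of eval_mask.

```python
import numpy as np

# Hypothetical masks for a single node over 6 steps (True = valid).
mask = np.array([1, 1, 0, 1, 1, 1], dtype=bool)       # observed values
eval_mask = np.array([0, 1, 0, 0, 1, 0], dtype=bool)  # held out for scoring

training_mask = mask & ~eval_mask
# the model sees steps 0, 3, 5; steps 1 and 4 stay hidden for evaluation
```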

set_mask(mask: Optional[Union[DataFrame, ndarray, Tensor]], add_to_input_map: bool = True)[source]#

Set mask of target channels, i.e., a bool for each (node, time step, channel) triplet denoting if the corresponding value in target is observed (True) or not (False).