PyTorch Datasets#
`SpatioTemporalDataset` – Base class for structures that are bridges between Datasets and Models.
`ImputationDataset` – Extension of `SpatioTemporalDataset` for imputation.
- class SpatioTemporalDataset(target: Union[DataFrame, ndarray, Tensor], index: Optional[Union[DatetimeIndex, PeriodIndex, TimedeltaIndex]] = None, mask: Optional[Union[DataFrame, ndarray, Tensor]] = None, connectivity: Optional[Union[Tensor, SparseTensor, ndarray, coo_matrix, csr_matrix, csc_matrix, Tuple[Union[DataFrame, ndarray, Tensor]]]] = None, covariates: Optional[Mapping[str, Union[DataFrame, ndarray, Tensor]]] = None, input_map: Optional[Union[Mapping, BatchMap]] = None, target_map: Optional[Union[Mapping, BatchMap]] = None, auxiliary_map: Optional[Union[Mapping, BatchMap]] = None, scalers: Optional[Mapping[str, Scaler]] = None, trend: Optional[Union[DataFrame, ndarray, Tensor]] = None, transform: Optional[Callable] = None, window: int = 12, horizon: int = 1, delay: int = 0, stride: int = 1, window_lag: int = 1, horizon_lag: int = 1, precision: Union[int, str] = 32, name: Optional[str] = None)[source]#
Base class for structures that are bridges between Datasets and Models.
A `SpatioTemporalDataset` takes as input a `Dataset` and builds a proper structure to feed deep models. A minimal construction sketch is given after the parameter list below.
- Parameters:
target (DataArray) – Data relative to the primary channels.
index (TemporalIndex, optional) – Temporal indices for the data. (default: `None`)
mask (DataArray, optional) – Boolean mask denoting if the signal in data is valid (1) or not (0). (default: `None`)
connectivity (SparseTensArray, tuple, optional) – The adjacency matrix defining nodes’ relational information. It can be either a dense/sparse matrix \(\mathbf{A} \in \mathbb{R}^{N \times N}\) or an (`edge_index` \(\in \mathbb{N}^{2 \times E}\), `edge_weight` \(\in \mathbb{R}^{E}\)) tuple. The input layout will be preserved (e.g., a sparse matrix will be stored as a `torch_sparse.SparseTensor`). In any case, the connectivity will be stored in the attribute `edge_index`, and the weights, if any, will be stored as `edge_weight`. (default: `None`)
covariates (dict, optional) – Dictionary of exogenous channels with label. An exogenous element is a temporal array with node- or graph-level channels which are covariates to the main signal. The temporal dimension must match that of the data, and so must the number of nodes if the covariate is node-level. (default: `None`)
input_map (BatchMap or dict, optional) – Defines how data (i.e., the target and the covariates) are mapped to the input of a dataset sample. Keys in the mapping are keys in both `item` and `item.input`, while values are `BatchMapItem`. (default: `None`)
target_map (BatchMap or dict, optional) – Defines how data (i.e., the target and the covariates) are mapped to the target of a dataset sample. Keys in the mapping are keys in both `item` and `item.target`, while values are `BatchMapItem`. (default: `None`)
auxiliary_map (BatchMap or dict, optional) – Defines how data (i.e., the target and the covariates) are added as additional attributes to the dataset sample. Keys in the mapping are keys only in `item`, while values are `BatchMapItem`. (default: `None`)
scalers (Mapping or None) – Dictionary of scalers that must be used for data preprocessing. (default: `None`)
trend (DataArray, optional) – Trend paired with the main signal. Must be of the same shape as the data. (default: `None`)
transform (callable, optional) – A function/transform that takes in a `tsl.data.Data` object and returns a transformed version. The data object will be transformed before every access. (default: `None`)
window (int) – Length (in number of steps) of the lookback window. (default: 12)
horizon (int) – Length (in number of steps) of the prediction horizon. (default: 1)
delay (int) – Offset (in number of steps) between the end of the window and the start of the horizon. (default: 0)
stride (int) – Offset (in number of steps) between a sample and the next one. (default: 1)
window_lag (int) – Sampling frequency (in number of steps) inside the lookback window. (default: 1)
horizon_lag (int) – Sampling frequency (in number of steps) inside the prediction horizon. (default: 1)
precision (int or str, optional) – The float precision at which to store the data. Can be expressed as a number (16, 32, or 64) or a string (“half”, “full”, “double”). (default: 32)
name (str, optional) – The (optional) name of the dataset.
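As a minimal construction sketch (not taken from the library’s examples; shapes, variable names, and values are illustrative), a dataset can be built from in-memory arrays as follows:

```python
import numpy as np
from tsl.data import SpatioTemporalDataset

# Synthetic target signal with pattern 't n f': 100 steps, 5 nodes, 1 channel.
steps, nodes, channels = 100, 5, 1
target = np.random.rand(steps, nodes, channels)
mask = np.ones_like(target, dtype=bool)            # every value is observed
edge_index = np.array([[0, 1, 2, 3],               # 2 x E connectivity
                       [1, 2, 3, 4]])
edge_weight = np.ones(edge_index.shape[1], dtype=np.float32)

dataset = SpatioTemporalDataset(target=target,
                                mask=mask,
                                connectivity=(edge_index, edge_weight),
                                window=12,   # 12-step lookback window
                                horizon=1,   # predict 1 step ahead
                                stride=1)
print(len(dataset))   # number of (window, horizon) samples
sample = dataset[0]   # a tsl.data.Data object
```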
- property patterns: dict#
Shows the dimensions of the dataset’s tensors in a more informative way.
The pattern mapping can be useful to get a glimpse of how the data are arranged. The convention we use is the following:
‘t’ stands for “number of time steps”
‘n’ stands for “number of nodes”
‘f’ stands for “number of features” (per node)
‘e’ stands for “number of edges”
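Continuing the construction sketch above, one can inspect the patterns directly; the printed mapping below is illustrative, and the exact keys depend on the data stored in the dataset:

```python
print(dataset.patterns)
# e.g. {'target': 't n f', 'mask': 't n f',
#       'edge_index': '2 e', 'edge_weight': 'e'}
```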
- property batch_patterns: dict#
Shows the dimensions of the dataset’s tensors in a more informative way.
The pattern mapping can be useful to get a glimpse of how the data are arranged. The convention we use is the following:
‘t’ stands for “number of time steps”
‘n’ stands for “number of nodes”
‘f’ stands for “number of features” (per node)
‘e’ stands for “number of edges”
- set_data(data: Union[DataFrame, ndarray, Tensor])[source]#
Set sequence of primary channels at `self.data`.
- set_mask(mask: Optional[Union[DataFrame, ndarray, Tensor]])[source]#
Set the mask of the target channels, i.e., a boolean for each (node, time step, channel) triplet denoting whether the corresponding value in the target is observed (`True`) or not (`False`).
- set_connectivity(connectivity: Union[Tensor, SparseTensor, ndarray, coo_matrix, csr_matrix, csc_matrix, Tuple[Union[DataFrame, ndarray, Tensor]]], target_layout: Optional[str] = None)[source]#
Set dataset connectivity.
The input can be either a dense/sparse matrix \(\mathbf{A} \in \mathbb{R}^{N \times N}\) or an (`edge_index` \(\in \mathbb{N}^{2 \times E}\), `edge_weight` \(\in \mathbb{R}^{E}\)) tuple. If `target_layout` is `None`, the input layout will be preserved (e.g., a sparse matrix will be stored as a `torch_sparse.SparseTensor`); otherwise, the connectivity is converted to the specified layout. In any case, the connectivity will be stored in the attribute `edge_index`, and the weights, if any, will be stored as `edge_weight`.
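A small sketch, reusing the NumPy arrays defined in the construction example above (values are illustrative):

```python
# Replace the connectivity with an (edge_index, edge_weight) tuple.
new_edge_index = np.array([[0, 1, 2],
                           [1, 2, 0]])
new_edge_weight = np.ones(new_edge_index.shape[1], dtype=np.float32)
dataset.set_connectivity((new_edge_index, new_edge_weight))
```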
- add_covariate(name: str, value: Union[DataFrame, ndarray, Tensor], pattern: Optional[str] = None, add_to_input_map: bool = True, synch_mode: Optional[SynchMode] = None, preprocess: bool = True)[source]#
Add a covariate to the dataset. Examples of covariates are exogenous signals (in the form of dynamic multidimensional data) or static attributes (e.g., graph/node metadata). The parameter `pattern` specifies what each axis refers to:
‘t’: temporal dimension;
‘n’: node dimension;
‘c’/‘f’: channels/features dimension.
For instance, the pattern of a node-level covariate is ‘t n f’, while a pairwise metric between nodes has pattern ‘n n’ (a sketch is given after the parameter list below).
- Parameters:
name (str) – the name of the object. You can then access the added object as `dataset.{name}`.
value (DataArray) – the object to be added. Can be a `DataFrame`, an `ndarray` or a `Tensor`.
pattern (str, optional) – the pattern of the object. A pattern specifies what each axis refers to: ‘t’: temporal dimension; ‘n’: node dimension; ‘c’/‘f’: channels/features dimension. If `None`, the pattern is inferred from the shape. (default: `None`)
add_to_input_map (bool) – Whether to map the covariate to the dataset item when calling `get` methods. (default: `True`)
synch_mode (SynchMode) – How to synchronize the covariate inside the dataset item, i.e., with the window slice (`SynchMode.WINDOW`) or the horizon (`SynchMode.HORIZON`). It applies only to time-varying covariates. (default: `SynchMode.WINDOW`)
preprocess (bool) – If `True` and the dataset has a scaler with the same key, then data are scaled when calling `get` methods. (default: `True`)
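A sketch of adding covariates with explicit patterns (arrays and names are illustrative, reusing the synthetic data from the construction example):

```python
# Node-level, time-varying covariate with pattern 't n f'
# (e.g., two weather channels per node).
u = np.random.rand(steps, nodes, 2)
dataset.add_covariate('u', u, pattern='t n f')

# Static pairwise metric between nodes with pattern 'n n'
# (e.g., a distance matrix).
dist = np.random.rand(nodes, nodes)
dataset.add_covariate('dist', dist, pattern='n n')

# The added objects are now accessible as dataset.u and dataset.dist.
```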
- update_covariate(name: str, value: Optional[Union[DataFrame, ndarray, Tensor]] = None, pattern: Optional[str] = None, add_to_input_map: bool = True, synch_mode: Optional[SynchMode] = None, preprocess: Optional[bool] = None)[source]#
Update a covariate already in the dataset.
- Parameters:
name (str) – the name of the object. You can then access the added object as `dataset.{name}`.
value (DataArray, optional) – the object to be added. Can be a `DataFrame`, an `ndarray` or a `Tensor`.
pattern (str, optional) – the pattern of the object. A pattern specifies what each axis refers to: ‘t’: temporal dimension; ‘n’: node dimension; ‘c’/‘f’: channels/features dimension. If `None`, the pattern is inferred from the shape. (default: `None`)
add_to_input_map (bool) – Whether to map the covariate to the dataset item when calling `get` methods. (default: `True`)
synch_mode (SynchMode) – How to synchronize the covariate inside the dataset item, i.e., with the window slice (`SynchMode.WINDOW`) or the horizon (`SynchMode.HORIZON`). It applies only to time-varying covariates. (default: `SynchMode.WINDOW`)
preprocess (bool) – If `True` and the dataset has a scaler with the same key, then data are scaled when calling `get` methods. (default: `True`)
- remove_covariate(name: str)[source]#
Delete covariate from the dataset.
- Parameters:
name (str) – the name of the covariate to be deleted.
- add_exogenous(name: str, value: Union[DataFrame, ndarray, Tensor], node_level: bool = True, add_to_input_map: bool = True, synch_mode: SynchMode = SynchMode.WINDOW, preprocess: bool = True)[source]#
Shortcut method to add a time-varying covariate.
Exogenous variables are time-varying covariates of the dataset’s target. They can either be graph-level (i.e., with the same temporal length as `target` but with no node dimension) or node-level (i.e., with the same temporal and node size as `target`). A usage sketch is given after this entry.
- Parameters:
name (str) – The name of the exogenous variable. If the name starts with `"global_"`, the variable is assumed to be graph-level (overriding parameter `node_level`), and the `"global_"` prefix is removed from the name.
value (DataArray) – The data sequence. Can be a `DataFrame`, an `ndarray` or a `Tensor`.
node_level (bool) – Whether the input variable is node- or graph-level. If a 2-dimensional array is given and `node_level` is `True`, it is assumed that the input has one channel. (default: `True`)
add_to_input_map (bool) – Whether to map the exogenous variable to the dataset item when calling `get` methods. (default: `True`)
synch_mode (SynchMode) – How to synchronize the exogenous variable inside the dataset item, i.e., with the window slice (`SynchMode.WINDOW`) or the horizon (`SynchMode.HORIZON`). (default: `SynchMode.WINDOW`)
preprocess (bool) – If `True` and the dataset has a scaler with the same key, then data are scaled when calling `get` methods. (default: `True`)
- Returns:
the dataset with the added exogenous variable.
- Return type:
SpatioTemporalDataset
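A sketch of both graph-level and node-level usage (arrays and names are illustrative):

```python
# Graph-level exogenous: the "global_" prefix marks it as graph-level,
# and the variable is then stored as dataset.time_enc.
time_enc = np.random.rand(steps, 4)
dataset.add_exogenous('global_time_enc', time_enc)

# Node-level exogenous with one channel per node.
flow = np.random.rand(steps, nodes, 1)
dataset.add_exogenous('flow', flow, node_level=True)
```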
- set_trend(trend: Optional[Union[DataFrame, ndarray, Tensor]])[source]#
Set trend of dataset’s target data.
- add_scaler(key: str, scaler: Union[Scaler, ScalerModule])[source]#
Add a `tsl.data.preprocessing.Scaler` for the object indexed by `key` in the dataset.
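A sketch, assuming the `StandardScaler` from `tsl.data.preprocessing` and using the key `'target'` for the primary channels:

```python
from tsl.data.preprocessing import StandardScaler

# Normalize the target over the time and node dimensions.
scaler = StandardScaler(axis=(0, 1))
dataset.add_scaler('target', scaler)
```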
- reduce(time_index: Optional[Union[slice, List, Tuple, Tensor, ndarray]] = None, node_index: Optional[Union[slice, List, Tuple, Tensor, ndarray]] = None)[source]#
Reduce the dataset in terms of number of steps and nodes. Returns a copy of the reduced dataset.
If the dataset has a connectivity, edges ending at or starting from removed nodes will be removed as well.
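For example (indices are illustrative):

```python
# Keep only the first 50 time steps and the first 3 nodes; `dataset` is untouched.
reduced = dataset.reduce(time_index=slice(0, 50), node_index=[0, 1, 2])
```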
- reduce_(time_index: Optional[Union[slice, List, Tuple, Tensor, ndarray]] = None, node_index: Optional[Union[slice, List, Tuple, Tensor, ndarray]] = None)[source]#
Reduce the dataset in terms of number of steps and nodes. This is an inplace operation.
If the dataset has a connectivity, edges ending at or starting from removed nodes will be removed as well.
- save(filename: str) None [source]#
Save the `SpatioTemporalDataset` to disk.
- Parameters:
filename (str) – path to filename for storage.
- classmethod load(filename: str) SpatioTemporalDataset [source]#
Load an instance of `SpatioTemporalDataset` from disk.
- Parameters:
filename (str) – path of the stored `SpatioTemporalDataset`.
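A round-trip sketch (the filename is illustrative):

```python
dataset.save('dataset.pt')
restored = SpatioTemporalDataset.load('dataset.pt')
```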
- classmethod from_dataset(dataset, connectivity: Optional[Union[Tensor, SparseTensor, ndarray, coo_matrix, csr_matrix, csc_matrix, Tuple[Union[DataFrame, ndarray, Tensor]]]] = None, input_map: Optional[Union[Mapping, BatchMap]] = None, target_map: Optional[Union[Mapping, BatchMap]] = None, auxiliary_map: Optional[Union[Mapping, BatchMap]] = None, scalers: Optional[Mapping[str, Scaler]] = None, trend: Optional[Union[DataFrame, ndarray, Tensor]] = None, window: int = 12, horizon: int = 1, delay: int = 0, stride: int = 1, window_lag: int = 1, horizon_lag: int = 1) SpatioTemporalDataset [source]#
Create a `SpatioTemporalDataset` from a `TabularDataset`.
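A sketch assuming one of the benchmark tabular datasets shipped with the library (here `MetrLA` from `tsl.datasets`) and its `get_connectivity` method:

```python
from tsl.datasets import MetrLA

tabular = MetrLA()   # a TabularDataset
torch_dataset = SpatioTemporalDataset.from_dataset(
    tabular,
    connectivity=tabular.get_connectivity(threshold=0.1),
    window=24,
    horizon=3)
```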
- class ImputationDataset(target: Union[DataFrame, ndarray, Tensor], eval_mask: Union[DataFrame, ndarray, Tensor], index: Optional[Union[DatetimeIndex, PeriodIndex, TimedeltaIndex]] = None, input_mask: Optional[Union[DataFrame, ndarray, Tensor]] = None, connectivity: Optional[Union[Tensor, SparseTensor, ndarray, coo_matrix, csr_matrix, csc_matrix, Tuple[Union[DataFrame, ndarray, Tensor]]]] = None, covariates: Optional[Mapping[str, Union[DataFrame, ndarray, Tensor]]] = None, input_map: Optional[Union[Mapping, BatchMap]] = None, target_map: Optional[Union[Mapping, BatchMap]] = None, auxiliary_map: Optional[Union[Mapping, BatchMap]] = None, scalers: Optional[Mapping[str, Scaler]] = None, trend: Optional[Union[DataFrame, ndarray, Tensor]] = None, transform: Optional[Callable] = None, window: int = 12, stride: int = 1, window_lag: int = 1, horizon_lag: int = 1, precision: Union[int, str] = 32, name: Optional[str] = None)[source]#
Extension of `SpatioTemporalDataset` for imputation.
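A construction sketch for an imputation task, reusing the synthetic arrays from the examples above (the random evaluation mask is illustrative):

```python
from tsl.data import ImputationDataset

# Hold out ~20% of the observed values for evaluation.
eval_mask = np.random.rand(steps, nodes, channels) < 0.2
imputation_dataset = ImputationDataset(target=target,
                                       eval_mask=eval_mask,
                                       input_mask=mask & ~eval_mask,
                                       connectivity=(edge_index, edge_weight),
                                       window=12,
                                       stride=1)
```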