Spatiotemporal Dataset#
Warning
This page is still under development.
An elegant and effective way to handle datasets in PyTorch is the torch.utils.data.Dataset object. It lets us access samples as in any Python mapping object by implementing the __getitem__() and __len__() protocols. As such, the idx-th sample in such a map-style dataset is retrieved by dataset[idx] (see the PyTorch tutorial on datasets and dataloaders for more information).
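As a minimal sketch of the map-style protocol, the toy class below implements the two methods in plain Python. The class name ToySequenceDataset and its pairing logic are hypothetical and belong to neither PyTorch nor TSL; a real dataset would typically subclass torch.utils.data.Dataset and return tensors.

```python
class ToySequenceDataset:
    """Hypothetical map-style dataset: sample idx is (value, next value)."""

    def __init__(self, values):
        self.values = list(values)

    def __len__(self):
        # One sample per pair of consecutive values.
        return max(len(self.values) - 1, 0)

    def __getitem__(self, idx):
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        return self.values[idx], self.values[idx + 1]


dataset = ToySequenceDataset([10, 20, 30, 40])
n_samples = len(dataset)  # calls __len__
second = dataset[1]       # calls __getitem__
```

Because only these two methods are needed, any object that provides them can be indexed with dataset[idx] and queried with len(dataset).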
The main class in TSL for handling spatiotemporal datasets is the tsl.data.SpatioTemporalDataset object, which inherits directly from the PyTorch Dataset. The core functionality of SpatioTemporalDataset is to map (long) sequences of spatiotemporal data into tsl.data.Data samples. In this section, we explain in detail how to properly create SpatioTemporalDataset objects.
Sliding window#
A common approach in time-series analysis when dealing with long sequences of data consists in splitting the data along the temporal dimension in sliding windows of fixed length. Then, for supervised learning methods, the created windows are associated with a label that can be a new sequence of data (as in regression problems) or a class (in the case of classification problems).
The SpatioTemporalDataset object eases the creation of such a dataset from tabular data. In particular, the parameters used to define how to create the windows from the entire sequence are the following:
window: length of the temporal window.
horizon: length of the target sequence (e.g., forecasting horizon).
delay: number of steps between the window’s end and the target sequence’s beginning.
stride: number of steps between a sample and the next one.
window_lag: window’s sampling frequency (in number of time steps).
horizon_lag: horizon’s sampling frequency (in number of time steps).
Fig. 1 shows graphically how these parameters affect the dataset partitioning into windows.
Fig. 1 Sliding window parameters.#
In the case illustrated in the figure, we have window=6, horizon=4, delay=3, and stride=3, with unit window_lag and horizon_lag. Note that the number of samples n_samples will always be lower than the number of time steps n_steps.
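The effect of these parameters can be sketched in plain Python. The function below is an illustration under stated assumptions, not TSL's implementation: its name is hypothetical, the delay is assumed to be counted from the first step after the window's end, and n_steps=20 is an arbitrary choice. The remaining values are those of the figure.

```python
def sliding_window_indices(n_steps, window, horizon, delay=0, stride=1,
                           window_lag=1, horizon_lag=1):
    """Return, for each sample, the (window, horizon) time-step indices."""
    # Assumption: the horizon starts `delay` steps after the window's end.
    horizon_offset = window + delay
    # Number of consecutive time steps needed to build one sample.
    sample_span = max(window, horizon_offset + horizon)
    samples = []
    for start in range(0, n_steps - sample_span + 1, stride):
        window_idx = [start + i for i in range(0, window, window_lag)]
        horizon_idx = [start + i
                       for i in range(horizon_offset,
                                      horizon_offset + horizon,
                                      horizon_lag)]
        samples.append((window_idx, horizon_idx))
    return samples


samples = sliding_window_indices(n_steps=20, window=6, horizon=4,
                                 delay=3, stride=3)
for window_idx, horizon_idx in samples:
    print(window_idx, "->", horizon_idx)
```

Under these assumptions each sample spans 13 time steps, so a sequence of 20 steps yields only three samples, starting at steps 0, 3 and 6.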
Note
The SpatioTemporalDataset object is automatically partitioned into samples every time any of these parameters is updated. You can override the computed windows by assigning specific sample indices to the dataset (see set_indices()).
We report in Table 1 some example configurations for prediction/forecasting problems.
 | Window | Horizon | Delay | Stride |
---|---|---|---|---|
\(H\)-step-ahead prediction | Any | \(H\) | 0 | Any |
\(L\)-lagged \(H\)-step-ahead prediction | Any | \(H\) | \(L\) | Any |
\(H\)-step-ahead predictions (disjoint windows) | Any | \(H\) | 0 | \(H\) |
Nonetheless, we can play around with these parameters to enable more complex configurations, such as window reconstruction. Table 2 shows some examples of how to set the windowing parameters for imputation.
 | Window | Horizon | Delay | Stride |
---|---|---|---|---|
In-window imputation | \(W\) | \(W\) | \(-W\) | Any |
In-window imputation with \(K\)-th steps of warmup | \(W\) | \(W - K\) | \(-W\) | Any |
\(t\)-th step imputation | \(2t - 1\) | \(1\) | \(-t\) | \(1\) |
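To see how a negative delay places the target inside the window, the sketch below computes the target positions relative to the window's first step. It assumes that the delay is counted from the first step after the window's end; the helper name horizon_positions and the values W=5 and t=3 are hypothetical and chosen only for illustration.

```python
def horizon_positions(window, horizon, delay):
    """Target positions, counted from the window's first step (0-based)."""
    start = window + delay  # assumption: delay is relative to the window's end
    return list(range(start, start + horizon))


W = 5
# In-window imputation: the target covers the whole window.
in_window = horizon_positions(window=W, horizon=W, delay=-W)

t = 3
# t-th step imputation: a single target step in the middle of the window.
t_th_step = horizon_positions(window=2 * t - 1, horizon=1, delay=-t)

print(in_window, t_th_step)
```

With these values, the first configuration targets every step of the window, while the second targets only the central step, leaving t - 1 steps of context on each side.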
ImputationDataset
The tsl.data.ImputationDataset object provides shortcut APIs for the creation of SpatioTemporalDataset objects tailored for the imputation task.
Adding spatiotemporal data#
A spatiotemporal dataset needs spatiotemporal data. In standard autoregressive problems (e.g., forecasting), the objective is to model future values of a time series conditioned on a (finite) set of past observations. We call the 3-dimensional tensor representing this time series – spanning the temporal, spatial and feature dimensions – the target of the dataset.
The target argument is the only mandatory argument for creating a SpatioTemporalDataset. Unless otherwise specified (see Mapping tensors to graph attributes), the tensor set as target is mapped in the dataset sample dataset[idx] as:
dataset[idx].x, the sequence of past observations, lasting for dataset.window time steps.
dataset[idx].y, the sequence of future values with length dataset.horizon.
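The mapping from target to x and y can be illustrated with nested lists standing in for a tensor. This is only a sketch: get_sample is a hypothetical helper rather than a TSL function, and it assumes unit lags, no delay and the same index arithmetic as a plain sliding window.

```python
# Toy target with 8 time steps, 2 nodes and 1 feature,
# stored as nested lists indexed as [time][node][feature].
target = [[[float(10 * t + n)] for n in range(2)] for t in range(8)]

window, horizon, delay, stride = 3, 2, 0, 1


def get_sample(idx):
    """Return the (x, y) pair of sample idx under the assumed windowing."""
    start = idx * stride
    x = target[start:start + window]       # past observations
    y_start = start + window + delay
    y = target[y_start:y_start + horizon]  # future values
    return x, y


x, y = get_sample(1)
print(len(x), len(y))  # number of time steps in x and in y
```

Here x holds time steps 1 to 3 and y holds time steps 4 and 5, so x has as many steps as the window and y as many as the horizon.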
Note
The target tensor is always assumed to have three dimensions: time, nodes (i.e., spatial points) and features. If the input data is two-dimensional, then a dummy feature dimension of size one is inferred.
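The idea of the dummy feature dimension can be sketched with nested lists. The helper ensure_three_dims is hypothetical and is not how TSL performs the conversion; with PyTorch tensors, the equivalent operation would be adding a trailing axis of size one, for instance with unsqueeze(-1).

```python
def ensure_three_dims(data):
    """Turn a [time][node] table into [time][node][feature] with one feature."""
    first = data[0][0]
    if isinstance(first, (list, tuple)):
        return data  # a feature axis is already present
    # Wrap every scalar in a list of length one (the dummy feature axis).
    return [[[value] for value in row] for row in data]


table = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3 time steps, 2 nodes
target = ensure_three_dims(table)
print(target)
```

Data that already has a feature axis is returned unchanged, so the helper can be applied safely to both two- and three-dimensional inputs.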
Any other data coming into play is handled as a covariate to the target sequence. Covariates are not restricted to a specific shape or number of dimensions. It is good practice to specify which dimension each axis in the data refers to by means of patterns.