Spatiotemporal Dataset#
Warning
This page is still under development.
An elegant and effective way to handle datasets in PyTorch is the torch.utils.data.Dataset object. This object allows us to access the samples as in any Python mapping object, by implementing the __getitem__() and __len__() protocols. As such, the idx-th sample in such a map-style dataset is retrieved by dataset[idx] (see the PyTorch tutorial on datasets and dataloaders for more information).
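As a minimal sketch of the map-style protocol, the hypothetical class below wraps a list of samples and exposes it through __getitem__() and __len__(). It is written as a plain Python class to stay dependency-free; in PyTorch it would subclass torch.utils.data.Dataset. The name ToyDataset and the toy samples are assumptions made for illustration only.

```python
class ToyDataset:
    """Hypothetical map-style dataset.

    A real PyTorch dataset would subclass torch.utils.data.Dataset;
    a plain class is used here only to keep the sketch dependency-free.
    """

    def __init__(self, samples):
        self.samples = list(samples)

    def __len__(self):
        # Number of samples in the dataset.
        return len(self.samples)

    def __getitem__(self, idx):
        # Retrieve the idx-th sample.
        return self.samples[idx]


dataset = ToyDataset([10, 20, 30])
print(len(dataset))  # 3
print(dataset[1])    # 20
```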
The main class in tsl for handling spatiotemporal datasets is the tsl.data.SpatioTemporalDataset object, which inherits directly from the PyTorch Dataset. The core functionality of SpatioTemporalDataset is to map (long) sequences of spatiotemporal data into tsl.data.Data samples. In this section, we explain in detail how to properly create SpatioTemporalDataset objects.
Sliding window#
A common approach in time series analysis when dealing with long sequences of data consists in splitting the data along the temporal dimension into sliding windows of fixed length. Then, for supervised learning methods, each window is associated with a label, which can be a new sequence of data (as in regression problems) or a class (as in classification problems).
The SpatioTemporalDataset object eases the creation of such a dataset from tabular data. In particular, the parameters used to define how to create the windows from the entire sequence are the following:
window: length of the temporal window.
horizon: length of the target sequence (e.g., forecasting horizon).
delay: number of steps between the window’s end and the target sequence’s beginning.
stride: number of steps between a sample and the next one.
window_lag: window’s sampling frequency (in number of time steps).
horizon_lag: horizon’s sampling frequency (in number of time steps).
Fig. 1 shows graphically how these parameters affect the partitioning of the dataset into windows. In the case illustrated in the figure, we have window=6, horizon=4, delay=3, and stride=3, with unitary window_lag and horizon_lag. Note that the number of samples n_samples will always be lower than the number of time steps n_steps.
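The windowing parameters can be made concrete with a small pure-Python sketch. The function below is not tsl's implementation: its name, the choice of n_steps=30, and the assumption that the target starts delay steps after the window's end are all illustrative. It lists, for each sample, the time steps falling in the window and in the horizon.

```python
def sliding_window_indices(n_steps, window, horizon, delay=0, stride=1,
                           window_lag=1, horizon_lag=1):
    """Return one (window_steps, horizon_steps) pair per sample.

    Illustrative sketch only, not the tsl implementation.
    """
    # Assumption: the target starts `delay` steps after the window's end.
    horizon_offset = window + delay
    # Number of consecutive time steps spanned by a single sample.
    sample_span = max(window, horizon_offset + horizon)
    samples = []
    # Consecutive samples start `stride` steps apart.
    for start in range(0, n_steps - sample_span + 1, stride):
        window_steps = list(range(start, start + window, window_lag))
        horizon_steps = list(range(start + horizon_offset,
                                   start + horizon_offset + horizon,
                                   horizon_lag))
        samples.append((window_steps, horizon_steps))
    return samples


# Same configuration as in the figure, on a toy sequence of 30 steps.
samples = sliding_window_indices(n_steps=30, window=6, horizon=4,
                                 delay=3, stride=3)
print(len(samples))   # 6
print(samples[0])     # ([0, 1, 2, 3, 4, 5], [9, 10, 11, 12])
print(samples[1][0])  # [3, 4, 5, 6, 7, 8]
```

Each sample here spans 13 consecutive steps (6 + 3 + 4), so a sequence of 30 steps yields only 6 samples when the stride is 3.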
Note
The SpatioTemporalDataset object is automatically partitioned into samples every time any of these parameters is updated. You can override the computed windows by assigning specific sample indices to the dataset (see set_indices()).
We report in Table 1 some example configurations for prediction/forecasting problems.
| | Window | Horizon | Delay | Stride |
|---|---|---|---|---|
| \(H\)-step-ahead prediction | Any | \(H\) | 0 | Any |
| \(L\)-lagged \(H\)-step-ahead prediction | Any | \(H\) | \(L\) | Any |
| \(H\)-step-ahead predictions (disjoint windows) | Any | \(H\) | 0 | \(H\) |
Nonetheless, we can play around with these parameters to enable more complex configurations, such as window reconstruction. Table 2 shows some examples of how to set the windowing parameters for imputation.
| | Window | Horizon | Delay | Stride |
|---|---|---|---|---|
| In-window imputation | \(W\) | \(W\) | \(-W\) | Any |
| In-window imputation with \(K\) steps of warm-up | \(W\) | \(W - K\) | \(-W\) | Any |
| \(t\)-th step imputation | \(2t - 1\) | \(1\) | \(-t\) | \(1\) |
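To see why imputation configurations rely on a delay that points back into the window, the sketch below computes the steps covered by the window and by the target of a single sample. It assumes that the delay is counted from the window's end, so that a negative delay moves the target inside the window; the helper sample_steps and the values of W and t are hypothetical and not part of tsl.

```python
def sample_steps(start, window, horizon, delay):
    """Time steps covered by the window and by the target of one sample.

    Illustrative helper, not part of tsl.
    """
    window_steps = list(range(start, start + window))
    # Assumption: a negative delay places the target inside the window.
    target_start = start + window + delay
    target_steps = list(range(target_start, target_start + horizon))
    return window_steps, target_steps


# In-window imputation: the target coincides with the whole window.
W = 5
in_window, in_target = sample_steps(0, window=W, horizon=W, delay=-W)
print(in_target == in_window)  # True

# t-th step imputation: the target is the central step of the window.
t = 3
mid_window, mid_target = sample_steps(0, window=2 * t - 1, horizon=1, delay=-t)
print(mid_window)  # [0, 1, 2, 3, 4]
print(mid_target)  # [2]
```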
ImputationDataset
The tsl.data.ImputationDataset object provides shortcut APIs for the creation of SpatioTemporalDataset objects tailored to the imputation task.
Adding spatiotemporal data#
A spatiotemporal dataset needs spatiotemporal data. In standard autoregressive problems (e.g., forecasting), the objective is to model future values of a time series conditioned on a (finite) set of past observations. We call the 3-dimensional tensor representing this time series, spanning the temporal, spatial and feature dimensions, the target of the dataset.
The target argument is the only mandatory argument for creating a SpatioTemporalDataset. Unless otherwise specified (see Mapping tensors to graph attributes), the tensor set as target is mapped in the dataset sample dataset[idx] as:
dataset[idx].x, the sequence of past observations, lasting for dataset.window time steps.
dataset[idx].y, the sequence of future values, with length dataset.horizon.
Note
The target tensor is always assumed to have three dimensions: time, nodes (i.e., spatial points) and features. If the input data is bidimensional, then a dummy feature dimension of size one is inferred.
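The mapping from the target to the x and y attributes can be sketched without any dependency by using nested lists in place of tensors. The helpers ensure_three_dims and get_sample below are hypothetical and only mimic the behaviour described above, under the assumption that y starts delay steps after the end of x; they are not the tsl implementation.

```python
def ensure_three_dims(data):
    """Add a trailing feature axis of size 1 to [time, nodes] data."""
    if isinstance(data[0][0], list):
        return data  # already [time, nodes, features]
    return [[[value] for value in row] for row in data]


def get_sample(target, start, window, horizon, delay=0):
    """Slice the target into past observations x and future values y."""
    x = target[start:start + window]
    y_start = start + window + delay
    y = target[y_start:y_start + horizon]
    return x, y


# Toy bidimensional input: 8 time steps, 2 nodes.
data = [[float(10 * t + n) for n in range(2)] for t in range(8)]
target = ensure_three_dims(data)

x, y = get_sample(target, start=0, window=4, horizon=2)
print(len(x), len(x[0]), len(x[0][0]))  # 4 2 1
print(len(y), len(y[0]), len(y[0][0]))  # 2 2 1
print(y[0])  # [[40.0], [41.0]]
```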
Any other data coming into play is handled as a covariate to the target sequence. Covariates are not restricted to a specific shape or number of dimensions. It is good practice to specify which dimension each axis of the data refers to by means of patterns.
Understanding patterns#
Spatial relationships#
Mapping tensors to graph attributes#
Understanding patterns#
The t > n > f Convention#
In tsl, tabular data of this form are represented by following the [Time, Node, Features] (T N F) convention. Considering the previous example, we represent measurements acquired by 400 air quality monitoring stations in a day (with a sampling interval of one hour) as a tensor \(\mathbf{X}\) with dimensions \(\left(24, 400, 3 \right)\).
Note
Unless otherwise stated, all layers and models in tsl.nn expect as input a 4-dimensional tensor shaped as [batch_size, steps, nodes, channels].
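The two layouts can be checked with a short dependency-free sketch, where nested lists stand in for tensors. The helper nested_shape and the batch of two samples are assumptions made for illustration; in practice these would be PyTorch tensors.

```python
def nested_shape(x):
    """Shape of a regular nested list, read along the first element."""
    shape = []
    while isinstance(x, list):
        shape.append(len(x))
        x = x[0]
    return tuple(shape)


# One day of hourly measurements from 400 stations with 3 channels.
steps, nodes, features = 24, 400, 3
X = [[[0.0] * features for _ in range(nodes)] for _ in range(steps)]
print(nested_shape(X))  # (24, 400, 3)

# Stacking samples along a new leading axis gives the batched layout
# [batch_size, steps, nodes, channels].
batch = [X, X]
print(nested_shape(batch))  # (2, 24, 400, 3)
```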