Spatiotemporal Dataset#

An elegant and effective way to handle datasets in PyTorch is the torch.utils.data.Dataset object. It lets us access samples as in any Python mapping object by implementing the __getitem__() and __len__() protocols. As such, the idx-th sample of a map-style dataset is retrieved with dataset[idx] (see the PyTorch tutorial on datasets and dataloaders for more information).
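The map-style protocol can be sketched in plain Python. The class name SquaresDataset and its contents are hypothetical and only illustrate the two methods; an actual PyTorch dataset would subclass torch.utils.data.Dataset and implement the same pair.

```python
class SquaresDataset:
    """Toy map-style dataset: sample idx is the pair (idx, idx ** 2)."""

    def __init__(self, n_samples):
        self.n_samples = n_samples

    def __len__(self):
        # Number of samples, returned by len(dataset).
        return self.n_samples

    def __getitem__(self, idx):
        # Called when evaluating dataset[idx].
        if not 0 <= idx < self.n_samples:
            raise IndexError(idx)
        return idx, idx ** 2


dataset = SquaresDataset(5)
print(len(dataset))  # 5
print(dataset[3])    # (3, 9)
```

Raising IndexError past the last sample also lets a plain for loop over the dataset terminate.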

The main class in tsl for handling spatiotemporal datasets is the tsl.data.SpatioTemporalDataset object, which inherits directly from the PyTorch Dataset. The core functionality of SpatioTemporalDataset is to map (long) sequences of spatiotemporal data into tsl.data.Data samples. In this section, we explain in detail how to properly create SpatioTemporalDataset objects.

Sliding window#

A common approach in time-series analysis when dealing with long sequences of data consists in splitting the data along the temporal dimension into sliding windows of fixed length. Then, for supervised learning methods, each window is associated with a label that can be a new sequence of data (as in regression problems) or a class (as in classification problems).

The SpatioTemporalDataset object eases the creation of such a dataset from tabular data. In particular, the parameters used to define how to create the windows from the entire sequence are the following:

• window: length of the temporal window.

• horizon: length of the target sequence (e.g., forecasting horizon).

• delay: number of steps between the window’s end and the target sequence’s beginning.

• stride: number of steps between a sample and the next one.

• window_lag: window’s sampling frequency (in number of time steps).

• horizon_lag: horizon’s sampling frequency (in number of time steps).

Fig. 1 shows graphically how these parameters affect the dataset partitioning into windows.

In the case illustrated in the figure, we have window=6, horizon=4, delay=3, and stride=3, with unitary window_lag and horizon_lag. Note that the number of samples n_samples will always be lower than the number of time steps n_steps.
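To make the effect of these parameters concrete, here is a minimal sketch that enumerates the time steps falling in each window and target sequence. It is not the tsl implementation: the function name sliding_window_samples is hypothetical, it assumes window_lag = horizon_lag = 1 and delay >= -window, and the sequence length n_steps=25 is an arbitrary choice.

```python
def sliding_window_samples(n_steps, window, horizon, delay=0, stride=1):
    """Return one (window_steps, horizon_steps) pair of index lists per sample.

    Assumes unitary window_lag and horizon_lag, and delay >= -window.
    """
    # First target step, relative to the start of the sample.
    horizon_offset = window + delay
    # Number of consecutive time steps that a single sample spans.
    span = max(window, horizon_offset + horizon)
    samples = []
    for start in range(0, n_steps - span + 1, stride):
        window_steps = list(range(start, start + window))
        first_target = start + horizon_offset
        horizon_steps = list(range(first_target, first_target + horizon))
        samples.append((window_steps, horizon_steps))
    return samples


# Same settings as in the figure, on a hypothetical sequence of 25 steps.
samples = sliding_window_samples(n_steps=25, window=6, horizon=4, delay=3, stride=3)
print(len(samples))  # 5
print(samples[0])    # ([0, 1, 2, 3, 4, 5], [9, 10, 11, 12])
```

Under these assumptions each sample spans 6 + 3 + 4 = 13 steps, so only 5 samples fit in 25 steps when the stride is 3.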

Note

The SpatioTemporalDataset object is automatically partitioned into samples every time any of these parameters is updated. You can override the computed windows by assigning specific sample indices to the dataset (see set_indices()).

We report in Table 1 some example configurations for prediction/forecasting problems.

Table 1 Examples of windowing parameters settings (prediction).#

| Setting | Window | Horizon | Delay | Stride |
| --- | --- | --- | --- | --- |
| $$H$$-step-ahead prediction | Any | $$H$$ | 0 | Any |
| $$L$$-lagged $$H$$-step-ahead prediction | Any | $$H$$ | $$L$$ | Any |
| $$H$$-step-ahead predictions (disjoint windows) | Any | $$H$$ | 0 | $$H$$ |
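The three prediction settings of Table 1 can be compared with a small sketch that prints the inclusive (first, last) time steps of the window and of the target for the first two samples. The helper sample_ranges and the values window=4, H=3, L=2 and n_steps=12 are hypothetical choices made for illustration, with unitary lags assumed.

```python
def sample_ranges(n_steps, window, horizon, delay, stride):
    """Yield ((w_first, w_last), (h_first, h_last)) per sample, bounds inclusive.

    Assumes delay >= 0 and unitary window_lag and horizon_lag.
    """
    span = window + delay + horizon  # time steps covered by one sample
    for start in range(0, n_steps - span + 1, stride):
        h_first = start + window + delay
        yield (start, start + window - 1), (h_first, h_first + horizon - 1)


H, L = 3, 2
configs = {
    "H-step-ahead": dict(window=4, horizon=H, delay=0, stride=1),
    "L-lagged H-step-ahead": dict(window=4, horizon=H, delay=L, stride=1),
    "H-step-ahead (disjoint)": dict(window=4, horizon=H, delay=0, stride=H),
}
for name, params in configs.items():
    first_two = list(sample_ranges(n_steps=12, **params))[:2]
    print(name, first_two)
```

With stride equal to $$H$$, the targets of consecutive samples cover steps 4 to 6 and then 7 to 9, so they do not overlap.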

Nonetheless, we can play around with these parameters to enable more complex configurations, such as window reconstruction. Table 2 shows some examples of how to set the windowing parameters for imputation.

Table 2 Examples of windowing parameters settings (imputation).#

| Setting | Window | Horizon | Delay | Stride |
| --- | --- | --- | --- | --- |
| In-window imputation | $$W$$ | $$W$$ | $$-W$$ | Any |
| In-window imputation with $$K$$-th steps of warmup | $$W$$ | $$W - K$$ | $$-W$$ | Any |
| $$t$$-th step imputation | $$2t - 1$$ | $$1$$ | $$-t$$ | $$1$$ |
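A negative delay moves the target back inside the window. The sketch below checks this for the first and last rows of Table 2, reading delay as the offset between the window's end and the target's first step. The helper sample_steps and the values W=5 and t=3 are hypothetical, and unitary lags are assumed.

```python
def sample_steps(start, window, horizon, delay):
    """Return the (window_steps, horizon_steps) index lists of one sample."""
    window_steps = list(range(start, start + window))
    # The target begins `delay` steps after the end of the window,
    # so a negative delay places it inside the window.
    first_target = start + window + delay
    horizon_steps = list(range(first_target, first_target + horizon))
    return window_steps, horizon_steps


# In-window imputation: the target coincides with the window.
W = 5
in_window = sample_steps(0, window=W, horizon=W, delay=-W)
print(in_window)  # ([0, 1, 2, 3, 4], [0, 1, 2, 3, 4])

# t-th step imputation: the target is the central step of the window.
t = 3
centre = sample_steps(0, window=2 * t - 1, horizon=1, delay=-t)
print(centre)  # ([0, 1, 2, 3, 4], [2])
```

In the second case the target index 2 is the third step of the window when counting from one, with $$t - 1$$ steps of context on each side.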

ImputationDataset

The tsl.data.ImputationDataset object provides shortcut APIs for the creation of SpatioTemporalDataset objects tailored for the imputation task.

A spatiotemporal dataset needs spatiotemporal data. In standard autoregressive problems (e.g., forecasting), the objective is to model future values of a time series conditioned on a (finite) set of past observations. We call the 3-dimensional tensor representing this time series (spanning the temporal, spatial, and feature dimensions) the target of the dataset.

The target argument is the only mandatory argument for creating a SpatioTemporalDataset. Unless otherwise specified (see Mapping tensors to graph attributes), the tensor set as target is mapped in the sample dataset[idx] to:

• dataset[idx].x, the sequence of past observations, lasting for dataset.window time steps.

• dataset[idx].y, the sequence of future values with length dataset.horizon.
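The mapping from the target to x and y can be mimicked with nested lists. This is a plain-Python stand-in, not the tsl.data.Data object: the function get_sample, the dictionary keys and the toy values are assumptions, and unitary lags are assumed.

```python
n_steps, n_nodes = 10, 2
# target[t][n][f]: toy values that encode the time step and the node.
target = [[[10 * t + n] for n in range(n_nodes)] for t in range(n_steps)]

window, horizon, delay = 4, 2, 0


def get_sample(idx, stride=1):
    """Slice the target into past observations (x) and future values (y)."""
    start = idx * stride
    x = target[start:start + window]
    first_target = start + window + delay
    y = target[first_target:first_target + horizon]
    return {"x": x, "y": y}


sample = get_sample(0)
print(len(sample["x"]), len(sample["y"]))  # 4 2
print(sample["y"])  # [[[40], [41]], [[50], [51]]]
```

Here x holds time steps 0 to 3 and y holds time steps 4 and 5, each with all nodes and features.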

Note

The target tensor is always assumed to have three dimensions: time, nodes (i.e., spatial points), and features. If the input data is two-dimensional, a dummy feature dimension of size one is inferred.
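The effect of the dummy feature dimension can be shown on nested lists. The helpers shape and add_feature_axis are hypothetical and say nothing about how tsl performs the conversion internally; with tensors, the equivalent operation is appending a trailing axis of size one.

```python
def shape(nested):
    """Return the dimensions of a regular nested list as a tuple."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


def add_feature_axis(data_2d):
    """Turn [time, nodes] data into [time, nodes, 1] data."""
    return [[[value] for value in row] for row in data_2d]


data_2d = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]  # 2 time steps, 3 nodes
data_3d = add_feature_axis(data_2d)
print(shape(data_2d))  # (2, 3)
print(shape(data_3d))  # (2, 3, 1)
```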

Any other data coming into play is handled as a covariate to the target sequence. Covariates are not restricted to a specific shape or number of dimensions. It is good practice to specify, by means of patterns, which dimension each axis in the data refers to.

SpatioTemporalDataset API

See more about the class APIs.

Notebook

Check the introductory notebook.

Understanding patterns#

The t > n > f Convention#

In tsl, tabular data of this form are represented by following the [Time, Node, Features] (T N F) convention. Considering the previous example, we represent measurements acquired by 400 air quality monitoring stations in a day (with a sampling interval of one hour) as a tensor $$\mathbf{X}$$ with dimensions $$\left(24, 400, 3 \right)$$.

Note

Unless otherwise stated, all layers and models in tsl.nn expect as input a 4-dim tensor shaped as [batch_size, steps, nodes, channels].
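The two layouts can be sketched with nested lists standing in for tensors. The zero values and the batch of two identical samples are placeholders chosen only to show the axis order.

```python
n_steps, n_nodes, n_features = 24, 400, 3

# X[t][n][f]: one day of hourly readings from 400 stations, 3 features each.
X = [[[0.0] * n_features for _ in range(n_nodes)] for _ in range(n_steps)]

# Features measured by node 42 at time step 5.
reading = X[5][42]
print(len(reading))  # 3

# Grouping samples adds a leading batch axis:
# [batch_size, steps, nodes, channels].
batch = [X, X]
batch_shape = (len(batch), len(batch[0]), len(batch[0][0]), len(batch[0][0][0]))
print(batch_shape)  # (2, 24, 400, 3)
```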