Data Structures#

tsl is meant to deal with discrete-time spatiotemporal data, i.e., signals that evolve over discrete points in time and space. Common input examples are data coming from sensor networks. In principle, data of this kind can be represented by 3-dimensional tensors, with:

  1. The Time (t) dimension, accounting for the temporal evolution of the signal within a node (i.e., a sensor).

  2. The Node (n) dimension, accounting for simultaneous observations measured at the different nodes in the network in a given time step.

  3. The Features (f) or Channels dimension, allowing for multiple (heterogeneous) measurements at the same spatio-temporal point.

We call a spatiotemporal graph a tensor with finite t, n, and f dimensions, paired with the underlying topology. In tsl, we use the class tsl.data.Data to represent and store the attributes of a single spatiotemporal graph.

The Data object#

The tsl.data.Data object contains attributes related to a single spatiotemporal graph. This object extends torch_geometric.data.Data, preserving all its functionalities and adding utilities for spatiotemporal data processing. The main APIs of Data include:

  • Data.input: view on the tensors stored in Data that are meant to serve as input to the model. In the simplest case of a single node-attribute matrix, we could just have Data.input.x.

  • Data.target: view on the tensors stored in Data used as labels to train the model. In the common case of a single label, we could just have Data.input.y.

  • Data.edge_index: graph connectivity. Can be in COO format (i.e., a Tensor of shape [2, E]) or in form of a torch_sparse.SparseTensor with shape [N, N]. For dynamic graphs – with time-varying topology – edge_index is a Python list of Tensor.

  • Data.edge_weight: weights of the graph connectivity, if Data.edge_index is not a torch_sparse.SparseTensor. For dynamic graphs, edge_weight is a Python list of Tensor.

  • Data.mask: binary mask indicating the data in Data.target.y to be used as ground-truth for the loss (default is None).

  • Data.transform: mapping of ScalerModule, whose keys must be transformable (or transformed) tensors in Data.

  • Data.pattern: mapping containing the pattern for each tensor in Data.

None of these attributes are required and custom attributes can be seamlessly added. Data.input and Data.target – of type StorageView – provide a view on the unique (shared) storage in Data, such that the same key in Data.input and Data.target cannot reference different objects.

Dynamic graphs

If the graph connectivity changes over time, you can pass Python lists as Data.edge_index and Data.edge_weight.

We now consider a simple fully-connected, undirected graph with 3 nodes as the underlying topology. We assume to have a univariate signal – uniformly sampled and synchronized across nodes – on each node, plus a graph-wise exogenous variable (may be, for instance, an encoding of time, equal for all nodes). If we now want to forecast the next step given a sequence of 12 observations, our Data object would look like this:

import torch
from tsl.data import Data

edge_index = torch.tensor([[0, 0, 1, 1, 2, 2],
                           [1, 2, 0, 2, 0, 1]], dtype=torch.long)

input = dict(
    x=torch.randn(12, 3, 1),  # t=12  n=3  f=1
    u=torch.randn(12, 4)      # t=12       f=4
)
target = dict(
    y=torch.randn(1, 3, 1)    # t=1   n=3  f=1
)

data = Data(input=input, target=target, edge_index=edge_index)
>>> Data(
      input=(x=[12, 3, 1], u=[12, 4]),
      target=(y=[1, 3, 1]),
      has_mask=False
    )

Since we know also to which dimension each axis refers to in the tensors, it is a best practice to explicit them in the Data object through patterns.

pattern = dict(x='t n f', u='t f', y='t n f')

data = Data(input=input, target=target, edge_index=edge_index,
            pattern=pattern)
>>> Data(
      input=(x=[t=12, n=3, f=1], u=[t=12, f=4]),
      target=(y=[t=1, n=3, f=1]),
      has_mask=False
    )

Patterns

The usage of patterns is not mandatory, although they clarify the dimensions of each tensor in a spatiotemporal graph object and are used internally by tsl for operations on graphs (e.g., reduction to subgraph, temporal resampling, tensors collation).

The StaticBatch object#

The tsl.data.StaticBatch object models a temporal graph signal over a static graph: while data change over time, the topology does not. This object extends tsl.data.Data, and has two additional methods for collating (and separating) Data objects into StaticBatch objects.

The class method from_data_list() creates a new tsl.data.StaticBatch object from a list of Data objects. The implicit assumption is that all objects in the list share the same topology, and only the graph in the first object is kept. Accordingly, all the tensors in the Data objects having a static signal (i.e., without temporal dimension) are not collated – only one copy of them is kept. Instead, all time-varying data are stacked along the first dimension, as usually done in mini-batch collations. Also, ScalerModule objects are collated or copied in a similar fashion. Consider also that the changes made in the tensors are then reflected in the StaticBatch object’s patterns.

Conversely, the method get_example() allows accessing the idx-th sample in the batch. This can be equally achieved through the __get_item__ function as StaticBatch[idx], which supports also slices. Note that you can use this function also on StaticBatch that have been directly instantiated, without the use of the method from_data_list().

The DisjointBatch object#

More generally, data at hand come from a possibly dynamic setting, in which also the underlying topology changes over time. We supports two different types of discrete-time dynamic graph signals:

  • Disjoint Graph Signals, where the topology is static within the temporal window of a sample, but may change from a sample to another. This is a common scenario when we put together multiple temporal graph signals, each on a different (static) graph.

  • Dynamic Graph Signals, where the topology may change not only from sample to sample, but also from a time step to another in the same temporal window.

The aggregation of samples into mini-batches is handled in both these cases by the tsl.data.DisjointBatch object, a subclass of torch_geometric.data.Batch for dynamic spatiotemporal graphs.

tsl.data API

Read the docs of this module.

Notebook

Check the introductory notebook.