Prototypes#
The submodule tsl.datasets.prototypes provides interfaces that can help in creating new datasets. All datasets provided by the library are implemented by extending these interfaces.
The most general interface is Dataset, which is the parent class for every dataset in tsl. The more complete class TabularDataset provides useful functionalities for multivariate time series datasets with data in a tabular format, i.e., with time, node, and feature dimensions. Data passed to this dataset should be pandas.DataFrame and/or numpy.ndarray objects. Missing values are supported either by setting the missing entries to nan or by explicitly setting the mask attribute.
If your data are timestamped, meaning that each observation is associated with a specific date and time, then you can consider using DatetimeDataset, which extends TabularDataset and provides additional functionalities for temporal data (e.g., datetime_encoded(), resample()). This class accepts a DataFrame with an index of type DatetimeIndex and columns of type MultiIndex (with nodes as the first level and channels as the second) for the target.
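As an illustrative sketch (plain pandas/numpy, without importing tsl; node and channel names are made up), a target frame in the format described above has a DatetimeIndex and a two-level (nodes, channels) MultiIndex on the columns:

```python
import numpy as np
import pandas as pd

# Hourly index over two days, for 3 nodes with 2 channels each.
index = pd.date_range("2024-01-01", periods=48, freq="h")
columns = pd.MultiIndex.from_product(
    [["node_0", "node_1", "node_2"], ["speed", "flow"]],
    names=["nodes", "channels"],
)
target = pd.DataFrame(
    np.random.rand(len(index), len(columns)), index=index, columns=columns
)

# Missing values can be marked with nan...
target.iloc[0, 0] = np.nan
# ...or tracked explicitly with a boolean mask of the same shape.
mask = ~target.isna()
```

Such a frame (with the optional mask) could then be passed as the target of the dataset.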
- class Dataset(*args, **kwargs)[source]#
Base class for Datasets in tsl.
- Parameters:
name (str, optional) – Name of the dataset. If None, use the name of the class. (default: None)
temporal_aggregation (str) – Function (as string) used for aggregation along the temporal dimension. (default: 'sum')
spatial_aggregation (str) – Permutation invariant function (as string) used for aggregation along the nodes' dimension. (default: 'sum')
- property length: int#
Returns the length – in terms of time steps – of the dataset.
- Returns:
Temporal length of the dataset.
- Return type:
int
- property n_nodes: int#
Returns the number of nodes in the dataset. In the case of a dynamic graph, n_nodes is the total number of nodes present in at least one time step.
- Returns:
Total number of nodes in the dataset.
- Return type:
int
- property n_channels: int#
Returns the number of node-level channels of the main signal in the dataset.
- Returns:
Number of channels of the main signal.
- Return type:
int
- property raw_file_names: Union[str, Sequence[str], Mapping[str, str]]#
The name of the files in the
self.root_dir
folder that must be present in order to skip downloading.
- property required_file_names: Union[str, Sequence[str], Mapping[str, str]]#
The name of the files in the
self.root_dir
folder that must be present in order to skip building.
- property raw_files_paths: Union[List[str], Mapping[str, str]]#
The absolute filepaths that must be present in order to skip downloading.
- property required_files_paths: Union[List[str], Mapping[str, str]]#
The absolute filepaths that must be present in order to skip building.
- property raw_files_paths_list: List[str]#
The list of absolute filepaths that must be present in order to skip downloading.
- property required_files_paths_list: List[str]#
The list of absolute filepaths that are required to load the dataset.
- dataframe() Union[DataFrame, List[DataFrame]] [source]#
Returns a pandas representation of the dataset in the form of a
DataFrame
. May be a list of DataFrames if the dataset has a dynamic structure.
- numpy(return_idx: bool = False) Union[ndarray, List[ndarray], Tuple[ndarray, Series], Tuple[List[ndarray], Series]] [source]#
Returns a numpy representation of the dataset in the form of a ndarray. If return_idx is True, it also returns a Series that can be used as index. May be a list of ndarrays (and Series) if the dataset has a dynamic structure.
- save_pickle(filename: str) None [source]#
Save Dataset to disk.
- Parameters:
filename (str) – path to filename for storage.
- compute_similarity(method: str, **kwargs) Optional[ndarray] [source]#
Implements the options for the similarity matrix \(\mathbf{S} \in \mathbb{R}^{N \times N}\) computation, according to method.
- Parameters:
method (str) – Method for the similarity computation.
**kwargs (optional) – Additional optional keyword arguments.
- Returns:
The similarity dense matrix.
- Return type:
ndarray
- get_similarity(method: Optional[str] = None, save: bool = False, **kwargs) ndarray [source]#
Returns the matrix \(\mathbf{S} \in \mathbb{R}^{N \times N}\), where \(N=\) self.n_nodes, with the pairwise similarity scores between nodes.
- Parameters:
method (str, optional) – Method for the similarity computation. If None, defaults to the dataset-specific default method. (default: None)
save (bool) – Whether to save the similarity matrix in the dataset's directory after computation. (default: False)
**kwargs (optional) – Additional optional keyword arguments.
- Returns:
The similarity dense matrix.
- Return type:
ndarray
- Raises:
ValueError – If the similarity method is not valid.
- get_connectivity(method: Optional[str] = None, threshold: Optional[float] = None, knn: Optional[int] = None, binary_weights: bool = False, include_self: bool = True, force_symmetric: bool = False, normalize_axis: Optional[int] = None, layout: str = 'edge_index', **kwargs) Union[ndarray, Tuple, coo_matrix, csr_matrix, csc_matrix] [source]#
Returns the weighted adjacency matrix \(\mathbf{A} \in \mathbb{R}^{N \times N}\), where \(N=\) self.n_nodes. The element \(a_{i,j} \in \mathbf{A}\) is 0 if there is no edge connecting node \(i\) to node \(j\). The return type depends on the specified layout (default: edge_index).
- Parameters:
method (str, optional) – Method for the similarity computation. If None, defaults to the dataset-specific default method. (default: None)
threshold (float, optional) – If not None, set to 0 the values below the threshold. (default: None)
knn (int, optional) – If not None, keep only the \(k=\) knn nearest incoming neighbors. (default: None)
binary_weights (bool) – If True, the positive weights of the adjacency matrix are set to 1. (default: False)
include_self (bool) – If False, self-loops are never taken into account. (default: True)
force_symmetric (bool) – Force the adjacency matrix to be symmetric by taking the maximum value between the two directions for each edge. (default: False)
normalize_axis (int, optional) – Divide the edge weight \(a_{i, j}\) by \(\sum_k a_{i, k}\) if normalize_axis=0, or by \(\sum_k a_{k, j}\) if normalize_axis=1. None for no normalization. (default: None)
layout (str) – Convert the matrix to a dense/sparse format. Available options are:
dense: keep the matrix dense, \(\mathbf{A} \in \mathbb{R}^{N \times N}\).
edge_index: convert to an (edge_index, edge_weight) tuple, where edge_index has shape \([2, E]\) and edge_weight has shape \([E]\), \(E\) being the number of edges.
coo/csr/csc: convert to the specified scipy sparse matrix type.
(default: edge_index)
**kwargs (optional) – Additional optional keyword arguments for the similarity computation.
- Returns:
The connectivity matrix in the specified layout.
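To make the edge_index layout concrete, here is a minimal numpy sketch of thresholding a dense weighted matrix and converting it to an (edge_index, edge_weight) pair. This mirrors the semantics described above but is not the library's implementation:

```python
import numpy as np

def dense_to_edge_index(adj: np.ndarray, threshold: float = 0.0):
    """Convert a dense weighted adjacency matrix to an
    (edge_index, edge_weight) pair, mimicking layout='edge_index'."""
    # Set to 0 the values at or below the threshold.
    adj = np.where(adj > threshold, adj, 0.0)
    rows, cols = np.nonzero(adj)
    edge_index = np.stack([rows, cols])  # shape [2, E]
    edge_weight = adj[rows, cols]        # shape [E]
    return edge_index, edge_weight

adj = np.array([[0.0, 0.9, 0.1],
                [0.9, 0.0, 0.0],
                [0.1, 0.0, 0.0]])
edge_index, edge_weight = dense_to_edge_index(adj, threshold=0.5)
# Only the two edges with weight 0.9 survive: (0, 1) and (1, 0).
```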
- get_splitter(method: Optional[str] = None, *args, **kwargs) Splitter [source]#
Returns the splitter for a SpatioTemporalDataset. A Splitter provides the splits of the dataset – in terms of indices – for cross-validation.
- aggregate(node_index: Optional[Iterable[Iterable]] = None)[source]#
Aggregates nodes given an index of cluster assignments (spatial aggregation).
- Parameters:
node_index – Sequence of grouped node ids.
- get_config() dict [source]#
Returns the keyword arguments (as a dict) for instantiating a SpatioTemporalDataset.
- class TabularDataset(*args, **kwargs)[source]#
Base Dataset class for tabular data.
Tabular data are assumed to be 3-dimensional arrays where the dimensions represent time, nodes, and features, respectively. They can be either DataFrame or ndarray.
- Parameters:
target (FrameArray) – DataFrame or numpy.ndarray containing the data related to the target signals. The first dimension (or the DataFrame index) is considered the temporal dimension. The second dimension represents nodes, and the last one denotes the number of channels. If the input array is bi-dimensional (or the DataFrame's columns are not a MultiIndex), the sequence is assumed to be univariate (number of channels = 1). If the DataFrame's columns are a MultiIndex with two levels, we assume nodes are at the first level and channels at the second.
covariates (dict, optional) – named mapping of DataFrame or numpy.ndarray representing covariates. Examples of covariates are exogenous signals (in the form of dynamic, multidimensional data) or static attributes (e.g., graph/node metadata). You can specify what each axis refers to by providing a pattern for each item in the mapping. Every item can be:
a DataFrame or ndarray: in this case, the pattern is inferred from the shape (if possible).
a dict with keys 'value' and 'pattern' indexing the covariate object and the relative pattern, respectively.
(default: None)
mask (FrameArray, optional) – Boolean mask denoting if values in the target are valid (True) or not (False). (default: None)
similarity_score (str) – Default method to compute the similarity matrix with compute_similarity. It must be inside the dataset's similarity_options. (default: None)
temporal_aggregation (str) – Default temporal aggregation method after resampling. (default: sum)
spatial_aggregation (str) – Default spatial aggregation method for aggregate, i.e., how to aggregate multiple nodes together. (default: sum)
default_splitting_method (str, optional) – Default splitting method for the dataset, i.e., how to split the dataset into train/val/test. (default: temporal)
force_synchronization (bool) – Synchronize all time-varying covariates with the target. (default: True)
name (str, optional) – Optional name of the dataset. (default: class_name)
precision (int or str, optional) – Numerical precision for data: 16 (or "half"), 32 (or "full"), or 64 (or "double"). (default: 32)
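For array inputs, a brief sketch (plain numpy; the names are illustrative) of the (time, nodes, channels) convention and of a mask derived from missing entries:

```python
import numpy as np

steps, nodes, channels = 24, 5, 2

# A 3-D target follows the (time, nodes, channels) convention: pattern 't n f'.
target = np.random.rand(steps, nodes, channels)
target[3, 1, 0] = np.nan  # a missing observation

# A boolean mask marks valid (True) vs. missing (False) entries.
mask = ~np.isnan(target)

# A 2-D array would be treated as univariate (channels = 1): pattern 't n'.
univariate = np.random.rand(steps, nodes)
```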
- property patterns: dict#
Shows the dimensions of the data in the dataset in a more informative way.
The pattern mapping can be useful to get a glimpse of how data are arranged. The convention we use is the following:
‘t’ stands for “number of time steps”
‘n’ stands for “number of nodes”
‘f’ stands for “number of features” (per node)
- property exogenous#
Time-varying covariates of the dataset’s target.
- property attributes#
Static features related to the dataset.
- set_target(value: Union[DataFrame, ndarray])[source]#
Set the sequence of target channels at self.target.
- set_mask(mask: Optional[Union[DataFrame, ndarray]])[source]#
Set the mask of target channels, i.e., a boolean for each (node, time step, feature) triplet denoting if the corresponding value in the target is observed (True) or not (False).
- add_covariate(name: str, value: Union[DataFrame, ndarray], pattern: Optional[str] = None)[source]#
Add a covariate to the dataset. Examples of covariates are exogenous signals (in the form of dynamic multidimensional data) or static attributes (e.g., graph/node metadata). The parameter pattern specifies what each axis refers to:
't': temporal dimension;
'n': node dimension;
'c'/'f': channels/features dimension.
For instance, the pattern of a node-level covariate is 't n f', while a pairwise metric between nodes has pattern 'n n'.
- Parameters:
name (str) – the name of the object. You can then access the added object as dataset.{name}.
value (FrameArray) – the object to be added.
pattern (str, optional) – the pattern of the object. A pattern specifies what each axis refers to:
't': temporal dimension;
'n': node dimension;
'c'/'f': channels/features dimension.
If None, the pattern is inferred from the shape. (default: None)
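A rough sketch of how patterns relate to array shapes (plain numpy; the per-axis correspondence shown here is an illustration of the convention, not the library's actual inference logic):

```python
import numpy as np

steps, nodes = 24, 5

# Each covariate pairs a value with its pattern; one pattern token per axis.
covariates = {
    "temperature": (np.random.rand(steps, nodes, 1), "t n f"),  # time-varying, node-level
    "distances":   (np.random.rand(nodes, nodes),    "n n"),    # pairwise node metric
    "node_type":   (np.random.rand(nodes, 3),        "n f"),    # static node attributes
}

# Sanity check: the number of axes matches the number of pattern tokens.
for name, (value, pattern) in covariates.items():
    assert value.ndim == len(pattern.split())
```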
- add_exogenous(name: str, value: Union[DataFrame, ndarray], node_level: bool = True)[source]#
Shortcut method to add a time-varying covariate.
- aggregate(node_index: Optional[Union[Index, Mapping]] = None, aggr: Optional[str] = None, mask_tolerance: float = 0.0)[source]#
Aggregates nodes given an index of cluster assignments (spatial aggregation).
- Parameters:
node_index – Sequence of grouped node ids.
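Conceptually, spatial aggregation groups node columns by a cluster assignment and reduces each group. A minimal numpy sketch of this idea (not the library's implementation), using sum as the aggregation function:

```python
import numpy as np

steps, nodes = 10, 4
target = np.arange(steps * nodes, dtype=float).reshape(steps, nodes)

# node_index assigns each node to a cluster: nodes 0,1 -> cluster 0; nodes 2,3 -> cluster 1.
node_index = np.array([0, 0, 1, 1])

n_clusters = node_index.max() + 1
aggregated = np.stack(
    [target[:, node_index == c].sum(axis=1) for c in range(n_clusters)],
    axis=1,
)  # shape: (steps, n_clusters)
```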
- dataframe() DataFrame [source]#
Returns a pandas representation of the dataset in the form of a
DataFrame
. May be a list of DataFrames if the dataset has a dynamic structure.
- numpy(return_idx=False) Union[ndarray, Tuple[ndarray, Index]] [source]#
Returns a numpy representation of the dataset in the form of a ndarray. If return_idx is True, it also returns an Index that can be used as index. May be a list of ndarrays (and Index) if the dataset has a dynamic structure.
- compute_similarity(method: str, **kwargs) Optional[ndarray] #
Implements the options for the similarity matrix \(\mathbf{S} \in \mathbb{R}^{N \times N}\) computation, according to method.
- Parameters:
method (str) – Method for the similarity computation.
**kwargs (optional) – Additional optional keyword arguments.
- Returns:
The similarity dense matrix.
- Return type:
ndarray
- get_config() dict #
Returns the keyword arguments (as a dict) for instantiating a SpatioTemporalDataset.
- get_connectivity(method: Optional[str] = None, threshold: Optional[float] = None, knn: Optional[int] = None, binary_weights: bool = False, include_self: bool = True, force_symmetric: bool = False, normalize_axis: Optional[int] = None, layout: str = 'edge_index', **kwargs) Union[ndarray, Tuple, coo_matrix, csr_matrix, csc_matrix] #
Returns the weighted adjacency matrix \(\mathbf{A} \in \mathbb{R}^{N \times N}\), where \(N=\) self.n_nodes. The element \(a_{i,j} \in \mathbf{A}\) is 0 if there is no edge connecting node \(i\) to node \(j\). The return type depends on the specified layout (default: edge_index).
- Parameters:
method (str, optional) – Method for the similarity computation. If None, defaults to the dataset-specific default method. (default: None)
threshold (float, optional) – If not None, set to 0 the values below the threshold. (default: None)
knn (int, optional) – If not None, keep only the \(k=\) knn nearest incoming neighbors. (default: None)
binary_weights (bool) – If True, the positive weights of the adjacency matrix are set to 1. (default: False)
include_self (bool) – If False, self-loops are never taken into account. (default: True)
force_symmetric (bool) – Force the adjacency matrix to be symmetric by taking the maximum value between the two directions for each edge. (default: False)
normalize_axis (int, optional) – Divide the edge weight \(a_{i, j}\) by \(\sum_k a_{i, k}\) if normalize_axis=0, or by \(\sum_k a_{k, j}\) if normalize_axis=1. None for no normalization. (default: None)
layout (str) – Convert the matrix to a dense/sparse format. Available options are:
dense: keep the matrix dense, \(\mathbf{A} \in \mathbb{R}^{N \times N}\).
edge_index: convert to an (edge_index, edge_weight) tuple, where edge_index has shape \([2, E]\) and edge_weight has shape \([E]\), \(E\) being the number of edges.
coo/csr/csc: convert to the specified scipy sparse matrix type.
(default: edge_index)
**kwargs (optional) – Additional optional keyword arguments for the similarity computation.
- Returns:
The connectivity matrix in the specified layout.
- get_similarity(method: Optional[str] = None, save: bool = False, **kwargs) ndarray #
Returns the matrix \(\mathbf{S} \in \mathbb{R}^{N \times N}\), where \(N=\) self.n_nodes, with the pairwise similarity scores between nodes.
- Parameters:
method (str, optional) – Method for the similarity computation. If None, defaults to the dataset-specific default method. (default: None)
save (bool) – Whether to save the similarity matrix in the dataset's directory after computation. (default: False)
**kwargs (optional) – Additional optional keyword arguments.
- Returns:
The similarity dense matrix.
- Return type:
ndarray
- Raises:
ValueError – If the similarity method is not valid.
- get_splitter(method: Optional[str] = None, *args, **kwargs) Splitter #
Returns the splitter for a SpatioTemporalDataset. A Splitter provides the splits of the dataset – in terms of indices – for cross-validation.
- load_raw(*args, **kwargs)#
Loads raw dataset without any data preprocessing.
- property raw_file_names: Union[str, Sequence[str], Mapping[str, str]]#
The name of the files in the
self.root_dir
folder that must be present in order to skip downloading.
- property raw_files_paths: Union[List[str], Mapping[str, str]]#
The absolute filepaths that must be present in order to skip downloading.
- property raw_files_paths_list: List[str]#
The list of absolute filepaths that must be present in order to skip downloading.
- property required_file_names: Union[str, Sequence[str], Mapping[str, str]]#
The name of the files in the
self.root_dir
folder that must be present in order to skip building.
- property required_files_paths: Union[List[str], Mapping[str, str]]#
The absolute filepaths that must be present in order to skip building.
- class DatetimeDataset(*args, **kwargs)[source]#
Create a tsl dataset from a pandas.DataFrame.
- Parameters:
target (pandas.Dataframe) – DataFrame containing the data related to the main signals. The index is considered the temporal dimension. The columns are identified as:
nodes: if there is only one level (we assume the number of channels to be 1).
(nodes, channels): if there are two levels (i.e., if columns is a MultiIndex). We assume nodes are at the first level and channels at the second.
covariates (dict, optional) – named mapping of DataFrame or numpy.ndarray representing covariates. Examples of covariates are exogenous signals (in the form of dynamic, multidimensional data) or static attributes (e.g., graph/node metadata). You can specify what each axis refers to by providing a pattern for each item in the mapping. Every item can be:
a DataFrame or ndarray: in this case, the pattern is inferred from the shape (if possible).
a dict with keys 'value' and 'pattern' indexing the covariate object and the relative pattern, respectively.
(default: None)
mask (pandas.Dataframe or numpy.ndarray, optional) – Boolean mask denoting if values in the data are valid (True) or not (False). (default: None)
freq (str, optional) – Force a sampling rate, possibly by resampling. (default: None)
similarity_score (str) – Default method to compute the similarity matrix with compute_similarity. It must be inside the dataset's similarity_options. (default: None)
temporal_aggregation (str) – Default temporal aggregation method after resampling. This method is used during instantiation to resample the dataset. It must be inside the dataset's temporal_aggregation_options. (default: sum)
spatial_aggregation (str) – Default spatial aggregation method for aggregate, i.e., how to aggregate multiple nodes together. It must be inside the dataset's spatial_aggregation_options. (default: sum)
default_splitting_method (str, optional) – Default splitting method for the dataset, i.e., how to split the dataset into train/val/test. (default: temporal)
sort_index (bool) – Whether to sort the dataset chronologically at initialization. (default: True)
force_synchronization (bool) – Synchronize all time-varying covariates with the target. (default: True)
name (str, optional) – Optional name of the dataset. (default: class_name)
precision (int or str, optional) – Numerical precision for data: 16 (or "half"), 32 (or "full"), or 64 (or "double"). (default: 32)
- add_covariate(name: str, value: Union[DataFrame, ndarray], pattern: Optional[str] = None)#
Add a covariate to the dataset. Examples of covariates are exogenous signals (in the form of dynamic multidimensional data) or static attributes (e.g., graph/node metadata). The parameter pattern specifies what each axis refers to:
't': temporal dimension;
'n': node dimension;
'c'/'f': channels/features dimension.
For instance, the pattern of a node-level covariate is 't n f', while a pairwise metric between nodes has pattern 'n n'.
- Parameters:
name (str) – the name of the object. You can then access the added object as dataset.{name}.
value (FrameArray) – the object to be added.
pattern (str, optional) – the pattern of the object. A pattern specifies what each axis refers to:
't': temporal dimension;
'n': node dimension;
'c'/'f': channels/features dimension.
If None, the pattern is inferred from the shape. (default: None)
- add_exogenous(name: str, value: Union[DataFrame, ndarray], node_level: bool = True)#
Shortcut method to add a time-varying covariate.
- aggregate(node_index: Optional[Union[Index, Mapping]] = None, aggr: Optional[str] = None, mask_tolerance: float = 0.0)#
Aggregates nodes given an index of cluster assignments (spatial aggregation).
- Parameters:
node_index – Sequence of grouped node ids.
- property attributes#
Static features related to the dataset.
- compute_similarity(method: str, **kwargs) Optional[ndarray] #
Implements the options for the similarity matrix \(\mathbf{S} \in \mathbb{R}^{N \times N}\) computation, according to method.
- Parameters:
method (str) – Method for the similarity computation.
**kwargs (optional) – Additional optional keyword arguments.
- Returns:
The similarity dense matrix.
- Return type:
ndarray
- dataframe() DataFrame #
Returns a pandas representation of the dataset in the form of a
DataFrame
. May be a list of DataFrames if the dataset has a dynamic structure.
- datetime_encoded(units: Union[str, List]) DataFrame #
Transform dataset's temporal index into covariates using sinusoidal transformations. Each temporal unit is used as the period to compute the operations, obtaining two features (\(\sin\) and \(\cos\)) for each unit.
- datetime_idx(units: Union[str, List]) DataFrame #
Transform dataset’s temporal index into compact index encoding for each temporal unit specified.
- datetime_onehot(units: Union[str, List]) DataFrame #
Transform dataset's temporal index into one-hot encodings for each temporal unit specified. Internally, this function calls pandas.get_dummies().
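A sketch of the sinusoidal-encoding idea behind datetime_encoded() (plain pandas/numpy; the actual output format of the library method may differ):

```python
import numpy as np
import pandas as pd

index = pd.date_range("2024-01-01", periods=48, freq="h")

# Encode the daily cycle: map hour-of-day onto the unit circle,
# yielding two features (sin and cos) for the 'day' unit.
phase = 2 * np.pi * index.hour / 24
encoded = pd.DataFrame(
    {"day_sin": np.sin(phase), "day_cos": np.cos(phase)}, index=index
)
```

This encoding makes midnight and 23:00 close in feature space, which a raw hour-of-day integer would not.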
- property exogenous#
Time-varying covariates of the dataset’s target.
- get_config() dict #
Returns the keyword arguments (as a dict) for instantiating a SpatioTemporalDataset.
- get_connectivity(method: Optional[str] = None, threshold: Optional[float] = None, knn: Optional[int] = None, binary_weights: bool = False, include_self: bool = True, force_symmetric: bool = False, normalize_axis: Optional[int] = None, layout: str = 'edge_index', **kwargs) Union[ndarray, Tuple, coo_matrix, csr_matrix, csc_matrix] #
Returns the weighted adjacency matrix \(\mathbf{A} \in \mathbb{R}^{N \times N}\), where \(N=\) self.n_nodes. The element \(a_{i,j} \in \mathbf{A}\) is 0 if there is no edge connecting node \(i\) to node \(j\). The return type depends on the specified layout (default: edge_index).
- Parameters:
method (str, optional) – Method for the similarity computation. If None, defaults to the dataset-specific default method. (default: None)
threshold (float, optional) – If not None, set to 0 the values below the threshold. (default: None)
knn (int, optional) – If not None, keep only the \(k=\) knn nearest incoming neighbors. (default: None)
binary_weights (bool) – If True, the positive weights of the adjacency matrix are set to 1. (default: False)
include_self (bool) – If False, self-loops are never taken into account. (default: True)
force_symmetric (bool) – Force the adjacency matrix to be symmetric by taking the maximum value between the two directions for each edge. (default: False)
normalize_axis (int, optional) – Divide the edge weight \(a_{i, j}\) by \(\sum_k a_{i, k}\) if normalize_axis=0, or by \(\sum_k a_{k, j}\) if normalize_axis=1. None for no normalization. (default: None)
layout (str) – Convert the matrix to a dense/sparse format. Available options are:
dense: keep the matrix dense, \(\mathbf{A} \in \mathbb{R}^{N \times N}\).
edge_index: convert to an (edge_index, edge_weight) tuple, where edge_index has shape \([2, E]\) and edge_weight has shape \([E]\), \(E\) being the number of edges.
coo/csr/csc: convert to the specified scipy sparse matrix type.
(default: edge_index)
**kwargs (optional) – Additional optional keyword arguments for the similarity computation.
- Returns:
The connectivity matrix in the specified layout.
- get_similarity(method: Optional[str] = None, save: bool = False, **kwargs) ndarray #
Returns the matrix \(\mathbf{S} \in \mathbb{R}^{N \times N}\), where \(N=\) self.n_nodes, with the pairwise similarity scores between nodes.
- Parameters:
method (str, optional) – Method for the similarity computation. If None, defaults to the dataset-specific default method. (default: None)
save (bool) – Whether to save the similarity matrix in the dataset's directory after computation. (default: False)
**kwargs (optional) – Additional optional keyword arguments.
- Returns:
The similarity dense matrix.
- Return type:
ndarray
- Raises:
ValueError – If the similarity method is not valid.
- get_splitter(method: Optional[str] = None, *args, **kwargs) Splitter #
Returns the splitter for a SpatioTemporalDataset. A Splitter provides the splits of the dataset – in terms of indices – for cross-validation.
- holidays_onehot(country, subdiv=None) DataFrame #
Returns a DataFrame indicating whether each dataset timestamp falls on a holiday. See https://python-holidays.readthedocs.io/en/latest/.
- Parameters:
- Returns:
DataFrame with one column ("holiday") as one-hot encoding (1 if the timestamp is in a holiday, 0 otherwise).
- Return type:
DataFrame
- load_raw(*args, **kwargs)#
Loads raw dataset without any data preprocessing.
- numpy(return_idx=False) Union[ndarray, Tuple[ndarray, Index]] #
Returns a numpy representation of the dataset in the form of a ndarray. If return_idx is True, it also returns an Index that can be used as index. May be a list of ndarrays (and Index) if the dataset has a dynamic structure.
- property patterns: dict#
Shows the dimensions of the data in the dataset in a more informative way.
The pattern mapping can be useful to get a glimpse of how data are arranged. The convention we use is the following:
‘t’ stands for “number of time steps”
‘n’ stands for “number of nodes”
‘f’ stands for “number of features” (per node)
- property raw_file_names: Union[str, Sequence[str], Mapping[str, str]]#
The name of the files in the
self.root_dir
folder that must be present in order to skip downloading.
- property raw_files_paths: Union[List[str], Mapping[str, str]]#
The absolute filepaths that must be present in order to skip downloading.
- property raw_files_paths_list: List[str]#
The list of absolute filepaths that must be present in order to skip downloading.
- property required_file_names: Union[str, Sequence[str], Mapping[str, str]]#
The name of the files in the
self.root_dir
folder that must be present in order to skip building.
- property required_files_paths: Union[List[str], Mapping[str, str]]#
The absolute filepaths that must be present in order to skip building.
- property required_files_paths_list: List[str]#
The list of absolute filepaths that are required to load the dataset.
- save_pickle(filename: str) None #
Save Dataset to disk.
- Parameters:
filename (str) – path to filename for storage.