Reference
The ssb_timeseries package is a helper library for production and analysis of statistical data in the form of time series.
It is designed to make it as easy as possible to store data and metadata for datasets and series in ways that are consistent with the information model, and to facilitate integration with automated workflows.
Functionality includes:
Read and write data and metadata
Metadata maintenance: tagging, detagging, retagging
Search and filtering
Time algebra: downsampling and upsampling to other time resolutions
Linear algebra operations with sets (matrices) and series (column vectors)
Metadata aware calculations, like unit conversions and aggregation over taxonomy hierarchies
Basic plotting
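To make the "time algebra" item concrete, the following is a minimal sketch of downsampling from monthly to quarterly resolution by summation. It uses only the Python standard library and does not call ssb_timeseries; the function name downsample_to_quarters and the sample data are hypothetical, chosen for illustration only.

```python
from collections import defaultdict
from datetime import date


def downsample_to_quarters(monthly: dict[date, float]) -> dict[tuple[int, int], float]:
    """Sum monthly observations into (year, quarter) buckets.

    Hypothetical helper for illustration; not part of ssb_timeseries.
    """
    quarterly: dict[tuple[int, int], float] = defaultdict(float)
    for period_start, value in monthly.items():
        # Months 1-3 map to quarter 1, months 4-6 to quarter 2, and so on.
        quarter = (period_start.month - 1) // 3 + 1
        quarterly[(period_start.year, quarter)] += value
    return dict(quarterly)


# Six monthly observations: January through June 2024.
monthly_values = {date(2024, month, 1): float(month) for month in range(1, 7)}
quarterly_values = downsample_to_quarters(monthly_values)
```

Summation suits flow variables; stock variables would more likely use the last or mean value of each period, so the aggregation function is a choice that depends on the series.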
The most practical entry points are the ssb_timeseries.dataset and ssb_timeseries.catalog modules.
The ssb_timeseries.dataset module and its Dataset class are the very core of the ssb_timeseries package, defining most of the key functionality.
The dataset is the unit of analysis for both the information model and workflow integration, and performance will benefit from linear algebra with sets as matrices consisting of series column vectors.
As described in the Information model, time series datasets may consist of any number of series of the same type. Series types are defined by properties.Versioning and properties.Temporality; see properties.SeriesType.
It is also strongly encouraged to make sure that all series in a dataset have the same resolution, and to minimize the number of gaps in the series. Very sparse data is a strong indication that a dataset is not well defined: it may indicate that series in the set have different origins. What counts as a ‘gap’ in this context is any representation of an undefined value: None, null, or NaN (“not a number”), as opposed to the number zero. The number zero is a gray area: it can be perfectly valid, but it can also be an indication that not all the series should be part of the same set.
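The definition of gaps can be sketched in a few lines of plain Python. The helpers is_gap and gap_share below are hypothetical and not part of ssb_timeseries; they only illustrate that None and NaN count as undefined while zero does not.

```python
import math


def is_gap(value: object) -> bool:
    """Treat None and NaN as undefined; zero is a real number, not a gap."""
    if value is None:
        return True
    return isinstance(value, float) and math.isnan(value)


def gap_share(series: dict[str, list]) -> dict[str, float]:
    """Return the fraction of undefined observations for each non-empty series."""
    return {
        name: sum(is_gap(v) for v in values) / len(values)
        for name, values in series.items()
        if values
    }


# Hypothetical dataset with two series of equal length.
example_set = {
    "a": [1.0, 2.0, 0.0, 4.0],           # the zero is a valid observation
    "b": [None, float("nan"), 3.0, 4.0],  # two undefined observations
}
shares = gap_share(example_set)
```

A high share for some series and a low share for others is the kind of signal that might suggest the series do not belong in the same set.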
See also
See the documentation for the ssb_timeseries.catalog module for tools to search for datasets or series by name or metadata.
The ssb_timeseries.catalog module provides several tools for searching for datasets or series in every Repository of a Catalog.
The catalog is essentially just a logical collection of repositories, providing a search interface across all of them.
Searches can list or count sets, series or items (both). The search criteria can be complete names (equals), parts of names (contains), or metadata attributes (tags).
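The three kinds of criteria can be illustrated with a small, self-contained sketch. The Item class and search function below are hypothetical stand-ins written for this example; they are not the Catalog API and make no claim about its actual signatures.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """Hypothetical stand-in for a catalog entry; not the library's CatalogItem."""

    name: str
    object_type: str  # "dataset" or "series"
    tags: dict[str, str] = field(default_factory=dict)


def search(items, equals=None, contains=None, tags=None):
    """Keep the items that satisfy every criterion that was supplied."""
    result = []
    for item in items:
        if equals is not None and item.name != equals:
            continue  # complete name must match exactly
        if contains is not None and contains not in item.name:
            continue  # name must include the given fragment
        if tags and any(item.tags.get(k) != v for k, v in tags.items()):
            continue  # every requested tag must match
        result.append(item)
    return result


items = [
    Item("prices_monthly", "dataset", {"unit": "NOK"}),
    Item("prices_monthly_food", "series", {"unit": "NOK"}),
    Item("volumes_monthly", "dataset", {"unit": "tonnes"}),
]
by_name = search(items, equals="prices_monthly")
by_part = search(items, contains="prices")
by_tag = search(items, tags={"unit": "tonnes"})
count_nok = len(search(items, tags={"unit": "NOK"}))  # counting instead of listing
```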
A returned CatalogItem instance is identified by name and descriptive metadata; the repository, object type, and relationships to parent and child objects are also provided. Other information, like lineage and data quality metrics, may be added later.
>>> from ssb_timeseries.catalog import Catalog
>>> everything = Catalog().items()
The other modules of the package are helpers used by these core modules and are not intended for direct use.
Some notable exceptions are the taxonomy and hierarchy features of ssb_timeseries.meta and the type definitions in ssb_timeseries.properties.
ssb_timeseries.config may be used for initial setup and for later switching between repositories, if needed.
The ssb_timeseries.io module seeks to make storage agnostic of whether data and metadata are stored in files or databases, and ssb_timeseries.fs is an abstraction over local and GCS file systems.
The package includes several modules: