ssb_timeseries.meta

The ssb_timeseries.meta module provides tools for managing metadata associated with datasets and time series. It defines the core data structures for tags and exposes functionality for creating and manipulating taxonomies.

Functionality is imported from submodules to create a single, convenient point of access, which serves as the public API for metadata operations.

Public API

The following classes and functions are exposed as the public API of the meta module.

Classes

class Taxonomy(*, klass_id=0, data=None, path='', name='Taxonomy', sep='.', **kwargs)

Bases: object

Wraps taxonomies defined in KLASS or JSON files in an object structure.

Variables:
  • name (str) – The name of the taxonomy.

  • structure_type (str) – The type of structure, e.g., ‘list’, ‘tree’, ‘graph’.

  • levels (int) – The number of levels not counting the root node.

  • entities (pa.Table) – Entity definitions, represented as a PyArrow Table.

  • structure (bigtree.Node) – The hierarchical structure of the taxonomy.

Parameters:
  • klass_id (int)

  • data (list[dict[str, str]] | IntoFrameT | None)

  • path (PathStr)

  • name (str)

  • sep (str)

  • kwargs (Any)

Note

Structure: Relations between entities of the taxonomy. Both lists and trees will be represented as hierarchies, with the root node being the taxonomy. Level two will be the first item level, so a flat list will have two levels. Hierarchies with a natural top or “root” node should have a single node at level two.

Lookups: Listing of supported names for all entities, mapping different categories of names of different standards and in different languages to a unique identifier.
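
The level counting described in the note can be made concrete with a minimal sketch in plain Python. It does not use the library; the `code`/`parentCode` keys follow the example given for permutations on this page, while the helper names `build_parent_map` and `level_of`, the root label, and the treatment of parentCode '0' as "no parent" are assumptions made for illustration.

```python
# Illustrative stand-in, not the library's implementation.
ROOT = "Taxonomy"  # hypothetical root label; the taxonomy itself is level one


def build_parent_map(records: list[dict[str, str]], top: str = "0") -> dict[str, str]:
    """Map each code to its parent, attaching top-level items to the root."""
    parents = {}
    for record in records:
        parent = record["parentCode"]
        # Assumption: parentCode "0" marks an item without a parent.
        parents[record["code"]] = ROOT if parent == top else parent
    return parents


def level_of(code: str, parents: dict[str, str]) -> int:
    """Count levels from the root (level 1) down to the given code."""
    level = 1
    while code != ROOT:
        code = parents[code]
        level += 1
    return level


flat = build_parent_map([
    {"code": "a1", "parentCode": "0"},
    {"code": "a2", "parentCode": "0"},
])
nested = build_parent_map([
    {"code": "top", "parentCode": "0"},
    {"code": "child", "parentCode": "top"},
])

# A flat list has two levels (root plus items); one nesting step adds a third.
flat_levels = max(level_of(code, flat) for code in flat)
nested_levels = max(level_of(code, nested) for code in nested)
```

The sketch counts the root as level one, as the note does; the levels attribute listed above excludes the root node.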

__eq__(other)

Checks for equality. Taxonomies are considered equal if their codes and hierarchical relations are the same.

Return type:

bool

Parameters:

other (object)
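
One way to picture this notion of equality is to compare the sets of (code, parent) pairs of two taxonomies. The sketch below works on flat records in plain Python and is only an illustration of the idea; the helper name `relations` is hypothetical, and whether the library ignores record order in exactly this way is an assumption.

```python
# Illustrative stand-in: compares taxonomies given as flat records,
# not the library's Taxonomy objects.
def relations(records: list[dict[str, str]]) -> set[tuple[str, str]]:
    """Return the set of (code, parent code) pairs in a list of records."""
    return {(r["code"], r["parentCode"]) for r in records}


original = [
    {"code": "A", "parentCode": "0"},
    {"code": "A1", "parentCode": "A"},
]
reordered = list(reversed(original))
moved = [
    {"code": "A", "parentCode": "0"},
    {"code": "A1", "parentCode": "0"},  # same codes, different parent
]

# Same codes and relations: equal, regardless of record order.
same_structure = relations(original) == relations(reordered)
# Same codes but a changed parent: not equal.
changed_structure = relations(original) == relations(moved)
```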

__getitem__(key)

Get tree node by name (KLASS code).

Return type:

Node

Parameters:

key (str)

__init__(*, klass_id=0, data=None, path='', name='Taxonomy', sep='.', **kwargs)

Create a Taxonomy object from one of the following: a klass_id, data (a list of dictionaries or a dataframe), or a path to a JSON file.

Taxonomy items are listed in .entities, and hierarchical relationships are mapped in .structure.

Optional keyword arguments:
  • substitutions (dict) – Code values to be replaced, e.g. {'substitute_this': 'with_this', 'and_this': 'as well'}.

Parameters:
  • klass_id (int)

  • data (list[dict[str, str]] | IntoFrameT | None)

  • path (str | PathLike[str])

  • name (str)

  • sep (str)

  • kwargs (Any)

Return type:

None

__sub__(other)

Return the tree difference between this taxonomy's structure and the other tree structure.

Return type:

Node

Parameters:

other (Node)

all_nodes()

Return all nodes in the taxonomy.

Return type:

list[Node]

leaf_nodes(name='')

Return all leaf nodes in the taxonomy.

Return type:

list[Node] | list[str]

Parameters:

name (Node | str)

parent_nodes()

Return all non-leaf nodes in the taxonomy.

Return type:

list[Node]
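
The split between leaf and non-leaf nodes can be sketched on flat records: a code is a parent if some other record names it as its parent, and a leaf otherwise. This is a plain-Python illustration under the assumption of `code`/`parentCode` records; it returns codes rather than Node objects and leaves out the root node.

```python
# Illustrative stand-in operating on flat records, not on bigtree nodes.
records = [
    {"code": "A", "parentCode": "0"},
    {"code": "A1", "parentCode": "A"},
    {"code": "A2", "parentCode": "A"},
    {"code": "B", "parentCode": "0"},
]

codes = [r["code"] for r in records]
# Every code that appears as somebody's parent has at least one child.
codes_with_children = {r["parentCode"] for r in records}

parent_codes = [c for c in codes if c in codes_with_children]
leaf_codes = [c for c in codes if c not in codes_with_children]
```

Here "A" is the only non-leaf item, while "A1", "A2" and the childless "B" are leaves.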

print_tree(*args, **kwargs)

Return the tree structure as a string.

The implementation is not elegant: it relies on printing the tree to standard output, which would be preferable to avoid, but it works.

Return type:

str

save(path)

Save the taxonomy to a JSON file.

The file can be read back using Taxonomy(path=<path to file>).

Return type:

None

Parameters:

path (str | PathLike[str])

substitute(substitutions)

Substitute ‘code’ and ‘parent’ values with items in the substitution dictionary.

Return type:

None

Parameters:

substitutions (dict)
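
The effect of a substitution dictionary can be sketched on flat records as follows. This is an illustration in plain Python, not the library's code: the helper name `substitute_codes` is hypothetical, the records are assumed to carry ‘code’ and ‘parent’ keys, and unlike the method above (which returns None) the sketch returns a new list.

```python
# Illustrative stand-in, not the library's implementation.
def substitute_codes(
    records: list[dict[str, str]], substitutions: dict[str, str]
) -> list[dict[str, str]]:
    """Replace 'code' and 'parent' values found in the substitution dict."""
    return [
        {
            **record,
            # Values without an entry in the dict are kept unchanged.
            "code": substitutions.get(record["code"], record["code"]),
            "parent": substitutions.get(record["parent"], record["parent"]),
        }
        for record in records
    ]


records = [
    {"code": "A", "parent": "root"},
    {"code": "A1", "parent": "A"},
]
renamed = substitute_codes(records, {"A": "alpha"})
```

Replacing both fields keeps the hierarchy intact: the child "A1" follows its renamed parent.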

subtree(key)

Get subtree of node identified by name (KLASS code).

Return type:

typing.Any

Parameters:

key (str)

KlassTaxonomy

alias of list[dict[Hashable, Any] | dict[str, str | None]]

Type Aliases

DatasetTagDict

alias of dict[str, Any]

SeriesTagDict

alias of dict[str, dict[str, str | list[str]]]

TagDict

alias of dict[str, str | list[str]]

TagValue = str | list[str]


Functions

add_tag_values(old, additions, recursive=False)

Add tag values to a tag dict.

If a tag already has a value, the new values are appended and the result is stored as a list. With recursive=True, nested dicts are also traversed.

Parameters:
  • old (TagDict)

  • additions (TagDict)

  • recursive (bool)

Return type:

TagDict

delete_dataset_tags(dictionary, *args, **kwargs)

Remove selected attributes from a dataset tag dictionary.

Parameters:
  • dictionary (DatasetTagDict)

  • args (str)

  • kwargs (SeriesTagDict | bool)

Return type:

DatasetTagDict

delete_series_tags(dictionary, *args, **kwargs)

Remove selected series attributes from a series or dataset tag dictionary.

Return type:

dict[str, dict[str, str | list[str]]] | dict[str, Any]

Parameters:
  • dictionary (dict[str, dict[str, str | list[str]]] | dict[str, Any])

  • args (str)

  • kwargs (str | list[str])

filter_tags(tags, criteria)

Filter tags based on the specified criteria.

Parameters:
  • tags (dict[str, dict[str, typing.Any]]) – The dictionary of tags to filter.

  • criteria (dict[str, str | list[str]]) – The criteria to filter by. Values can be single strings or lists of strings.

Return type:

dict[str, dict[str, typing.Any]]

Returns:

A dictionary of tags that match the criteria.

inherit_set_tags(tags)

Return the tags that are inherited from the set.

Return type:

dict[str, typing.Any]

Parameters:

tags (dict[str, Any] | dict[str, dict[str, str | list[str]]])

matches_criteria(tag, criteria)

Check if a tag matches the specified criteria.

Parameters:
  • tag (dict[str, typing.Any]) – The tag to check.

  • criteria (dict[str, str | list[str]]) – The criteria to match against. Values can be single strings or lists of strings.

Return type:

bool

Returns:

True if the tag matches the criteria, False otherwise.

replace_dataset_tags(existing, old, new, recursive=False)

Alter selected attribute-value pairs in a tag dictionary.

Parameters:
  • existing (TagDict)

  • old (TagDict)

  • new (TagDict)

  • recursive (bool)

Return type:

DatasetTagDict

search_by_tags(tags, criteria)

Filter tags based on the specified criteria and return the keys.

Parameters:
  • tags (dict[str, dict[str, typing.Any]]) – The dictionary of tags to filter.

  • criteria (dict[str, str | list[str]]) – The criteria to filter by. Values can be single strings or lists of strings.

Return type:

list[str]

Returns:

A list of keys for tags that match the criteria.

permutations(taxonomies, filters='')

For a dict of the form {'a': Taxonomy(A), 'b': Taxonomy(B)}, return permutations of items in A and B, subject to filters.

Filters are experimental and quite likely to change type or implementation. Notably, support for custom functions and include/exclude lists may be considered. For now, filters is a str or a list[str] whose length matches the number of taxonomies. Each value selects a Taxonomy tree function as follows:

  • 'all' | 'all_nodes' -> .all_nodes()

  • 'parents' | 'parent_nodes' -> .parent_nodes()

  • 'leaves' | 'leaf_nodes' | 'children' | 'child_nodes' -> .leaf_nodes()

If no filters are provided, the default is 'all'.

Examples

>>> from ssb_timeseries.meta import Taxonomy, permutations
>>> tax_a = Taxonomy(data=[{'code': 'a1', 'parentCode': '0'}, {'code': 'a2', 'parentCode': '0'}])
>>> tax_b = Taxonomy(data=[{'code': 'b1', 'parentCode': '0'}, {'code': 'b2', 'parentCode': '0'}])
>>> permutations({'A': tax_a, 'B': tax_b})
[{'A': 'a1', 'B': 'b1'}, {'A': 'a1', 'B': 'b2'}, {'A': 'a2', 'B': 'b1'}, {'A': 'a2', 'B': 'b2'}]

Return type:

list[dict]

Parameters:
  • taxonomies (dict[str, Taxonomy])

  • filters (list[str] | str)
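
The shape of the result in the example above corresponds to a Cartesian product over the selected codes of each taxonomy. A minimal sketch with `itertools.product`, using plain lists of codes in place of Taxonomy objects, might look like this; the helper name `combine` is hypothetical and filters are not modelled.

```python
from itertools import product


# Illustrative stand-in: takes lists of codes rather than Taxonomy objects.
def combine(codes_by_name: dict[str, list[str]]) -> list[dict[str, str]]:
    """Return one dict per combination, keyed by the input names."""
    names = list(codes_by_name)
    return [
        dict(zip(names, combination))
        for combination in product(*codes_by_name.values())
    ]


combos = combine({"A": ["a1", "a2"], "B": ["b1", "b2"]})
```

With two codes in each list, the sketch yields four dictionaries in the same order as the documented example output.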