ssb_timeseries.meta

The ssb_timeseries.meta module is responsible for metadata maintenance.

Dataset and series tags are handled as Python dictionaries and stored in JSON files and Parquet headers. The meta module takes care of the mechanics of manipulating the dictionaries.

It also consumes taxonomies. That functionality should live in ssb-python-klass or another metadata library, and is likely subject to refactoring later.

class KlassItem

Bases: TypedDict

The structure of taxonomy items as returned by the API (JSON) and klass (Pandas DataFrame).

code: str

A unique entity identifier within the taxonomy. It may very well consist of numeric values, but will be represented as a string.

name: str

A unique human-readable name. Not nullable.

parentCode: str

The code for the parent entity.

presentationName: str

A “self explanatory” unique name, if applicable.

shortName: str

A short version / mnemonic for name, if applicable.

validFrom: datetime | str

Date or ISO string representing the start of the entity lifespan.

validTo: datetime | str

Date or ISO string representing the end of the entity lifespan.

exception MissingAttributeError

Bases: Exception

At least one required attribute was not provided.

class Taxonomy(*, klass_id=0, data=None, path='', root_name='Taxonomy', sep='.', **kwargs)

Bases: object

Wraps taxonomies defined in KLASS or JSON files in an object structure.

Parameters:
  • klass_id (int)

  • data (list[dict[str, str]] | None)

  • path (str | PathLike[str])

  • root_name (str)

  • sep (str)

  • kwargs (Any)

definition

Descriptions of the taxonomy.

Type:

str

name

structure_type

enum: list | tree | graph

levels

number of levels not counting the root node

entities

Entity definitions, represented as a dataframe with columns as defined by KlassItem.

Type:

pd.DataFrame

Notes

structure:

Relations between entities of the taxonomy. Both lists and trees are represented as hierarchies, with the taxonomy itself as the root node. Level two is the first item level, so a flat list has two levels. Hierarchies with a natural top or “root” node should have a single node at level two.
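The level convention for structure can be illustrated with a small standalone sketch. It does not use the Taxonomy class; the helpers build_paths and depth are hypothetical, and the way root_name and sep are combined into paths is an assumption made for illustration.

```python
def build_paths(items, root_name="Taxonomy", sep="."):
    """Return {code: path from the root} for items with 'code' and 'parentCode'.

    Hypothetical helper: how paths are composed is an assumption.
    """
    parents = {item["code"]: item.get("parentCode") for item in items}

    def path(code):
        parent = parents.get(code)
        if not parent:  # no parent: attach directly to the root node
            return f"{root_name}{sep}{code}"
        return f"{path(parent)}{sep}{code}"

    return {code: path(code) for code in parents}


def depth(paths, sep="."):
    """Number of levels, counting the root node as level one."""
    return max(p.count(sep) + 1 for p in paths.values())


# A flat list: every item hangs directly under the root.
flat = [{"code": "a", "parentCode": None}, {"code": "b", "parentCode": None}]

# A tree with a natural top node "A" at level two.
tree = [
    {"code": "A", "parentCode": None},
    {"code": "A1", "parentCode": "A"},
    {"code": "A2", "parentCode": "A"},
]

flat_paths = build_paths(flat)
tree_paths = build_paths(tree)
```

With the root counted as level one, the flat list has two levels and the tree has three.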

lookups:

Listing of supported names for all entities, mapping different categories of names of different standards and in different languages to a unique identifier.

Create a Taxonomy object from either a klass_id, a data dictionary or a path to a JSON file.

Taxonomy items are listed in .entities, and hierarchical relationships are mapped in .structure.

Optional keyword arguments:
  • substitutions (dict): Code values to be replaced, for example {'substitute_this': 'with_this', 'and_this': 'as well'}.

__eq__(other)

Checks for equality. Taxonomies are considered equal if their codes and hierarchical relations are the same.

Return type:

bool

Parameters:

other (Self)

__getitem__(key)

Get tree node by name (KLASS code).

Return type:

Node

Parameters:

key (str)

__init__(*, klass_id=0, data=None, path='', root_name='Taxonomy', sep='.', **kwargs)

Create a Taxonomy object from either a klass_id, a data dictionary or a path to a JSON file.

Taxonomy items are listed in .entities, and hierarchical relationships are mapped in .structure.

Optional keyword arguments:
  • substitutions (dict): Code values to be replaced, for example {'substitute_this': 'with_this', 'and_this': 'as well'}.

Parameters:
  • klass_id (int)

  • data (list[dict[str, str]] | None)

  • path (str | PathLike[str])

  • root_name (str)

  • sep (str)

  • kwargs (Any)

Return type:

None

__sub__(other)

Return the tree difference between the two taxonomy (tree) structures.

Return type:

Node

Parameters:

other (Node)

all_nodes()

Return all nodes in the taxonomy.

Return type:

list[Node]

leaf_nodes(name='')

Return all leaf nodes in the taxonomy.

Return type:

list[Node]

Parameters:

name (Node | str)

parent_nodes()

Return all non-leaf nodes in the taxonomy.

Return type:

list[Node]

print_tree(*args, **kwargs)

Return a string with the tree structure.

The implementation is inelegant: it would be preferable not to print the tree to stdout, but this works.

Return type:

str

save(path)

Save taxonomy to a JSON file.

The file can be read using Taxonomy(path=<path to file>).

Return type:

None

Parameters:

path (str | PathLike[str])

substitute(substitutions)

Substitute 'code' and 'parent' values with items in the substitution dictionary.

Return type:

None

Parameters:

substitutions (dict)
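A minimal sketch of the substitution idea, operating on a plain list of dictionaries rather than on a Taxonomy. The helper substitute_codes is hypothetical; it assumes that 'parent' refers to the parentCode field, and unlike the documented method (which returns None) it returns new items instead of modifying in place.

```python
def substitute_codes(items, substitutions):
    """Return copies of items with 'code' and 'parentCode' values replaced.

    Hypothetical helper for illustration; not the library implementation.
    """
    result = []
    for item in items:
        updated = dict(item)  # copy, so the input is left untouched
        for field in ("code", "parentCode"):
            value = updated.get(field)
            if value in substitutions:
                updated[field] = substitutions[value]
        result.append(updated)
    return result


items = [
    {"code": "A", "parentCode": None},
    {"code": "A1", "parentCode": "A"},
]
renamed = substitute_codes(items, {"A": "top"})
```

Replacing both fields with the same mapping keeps parent references consistent after a code is renamed.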

subtree(key)

Get subtree of node identified by name (KLASS code).

Return type:

Any

Parameters:

key (str)

add_root_node(df, root_node)

Prepend root node row to taxonomy dataframe.

Return type:

DataFrame

Parameters:
  • df (DataFrame)

  • root_node (dict[str, str | None])

add_tag_values(old, additions, recursive=False)

Add tag values to a tag dict.

Will append new tags as a list if any values already exist. With recursive=True, nested dicts are also traversed.

Parameters:
  • old (dict[str, str | list[str]])

  • additions (dict[str, str | list[str]])

  • recursive (bool)

Return type:

dict[str, str | list[str]]
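The append behaviour described above might look roughly like the following standalone sketch. The helper add_values is hypothetical and skips the recursive option; collapsing a single remaining value back to a plain string, and ignoring duplicates, are assumptions.

```python
def as_list(value):
    """Wrap a single string in a list; copy any other iterable into a list."""
    return [value] if isinstance(value, str) else list(value)


def add_values(old, additions):
    """Merge additions into a copy of old, keeping existing values.

    Hypothetical helper: duplicate handling is an assumption.
    """
    merged = dict(old)
    for key, value in additions.items():
        if key not in merged:
            merged[key] = value
            continue
        combined = as_list(merged[key])
        for candidate in as_list(value):
            if candidate not in combined:
                combined.append(candidate)
        # A single value stays a plain string; several become a list.
        merged[key] = combined[0] if len(combined) == 1 else combined
    return merged


tags = add_values({"unit": "NOK"}, {"unit": "USD", "freq": "M"})
```

Here tags ends up with a list for 'unit' (both the existing and the added value) and a plain string for the new 'freq' attribute.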

delete_dataset_tags(dictionary, *args, **kwargs)

Remove selected attributes from dataset tag dictionary.

Parameters:
  • dictionary (dict[str, dict[str, str | list[str]] | dict[str, dict[str, str | list[str]]]])

  • args (str)

  • kwargs (dict[str, dict[str, str | list[str]]] | bool)

Return type:

dict[str, dict[str, str | list[str]] | dict[str, dict[str, str | list[str]]]]

delete_series_tags(dictionary, *args, **kwargs)

Remove selected series attributes from series or dataset tag dictionary.

Return type:

dict[str, dict[str, str | list[str]]] | dict[str, dict[str, str | list[str]] | dict[str, dict[str, str | list[str]]]]

Parameters:
  • dictionary (dict[str, dict[str, str | list[str]]] | dict[str, dict[str, str | list[str]] | dict[str, dict[str, str | list[str]]]])

  • args (str)

  • kwargs (str | list[str])

filter_tags(tags, criteria)

Filter tags based on the specified criteria.

Parameters:
  • tags (dict[str, dict[str, any]]) – The dictionary of tags to filter.

  • criteria (dict[str, str | list[str]]) – The criteria to filter by. Values can be single strings or lists of strings.

Returns:

A dictionary of tags that match the criteria.

Return type:

dict[str, dict[str, any]]

inherit_set_tags(tags)

Return the tags that are inherited from the set.

Return type:

dict[str, Any]

Parameters:

tags (dict[str, dict[str, str | list[str]] | dict[str, dict[str, str | list[str]]]] | dict[str, dict[str, str | list[str]]])

matches_criteria(tag, criteria)

Check if a tag matches the specified criteria.

Parameters:
  • tag (dict[str, any]) – The tag to check.

  • criteria (dict[str, str | list[str]]) – The criteria to match against. Values can be single strings or lists of strings.

Returns:

True if the tag matches the criteria, False otherwise.

Return type:

bool
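To make the criteria format concrete, here is a standalone sketch of one plausible matching rule. The helpers matches and filter_by are hypothetical, and the semantics are assumptions: every key in the criteria must be present in the tag, and a list of values is satisfied if any one of them is found.

```python
def as_set(value):
    """Treat a single string as a one-element set."""
    return {value} if isinstance(value, str) else set(value)


def matches(tag, criteria):
    """Return True if the tag satisfies every criterion (assumed semantics)."""
    for key, wanted in criteria.items():
        if key not in tag:
            return False
        if not as_set(wanted) & as_set(tag[key]):
            return False
    return True


def filter_by(tags, criteria):
    """Keep only the entries of tags whose tag dict matches the criteria."""
    return {name: tag for name, tag in tags.items() if matches(tag, criteria)}


series_tags = {
    "s1": {"unit": "NOK", "freq": "M"},
    "s2": {"unit": ["NOK", "USD"], "freq": "Q"},
    "s3": {"unit": "EUR", "freq": "M"},
}
nok_or_usd = filter_by(series_tags, {"unit": ["NOK", "USD"]})
monthly_nok = filter_by(series_tags, {"unit": "NOK", "freq": "M"})
```

Under these assumptions, the first filter keeps s1 and s2, and the second keeps only s1.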

permutations(taxonomies, filters='')

For a dict of the form {'a': Taxonomy(A), 'b': Taxonomy(B)}, returns permutations of items in A and B, subject to filters.

Filters are experimental and quite likely to change type / implementation. Notably, support for custom functions and include/exclude lists may be considered. For now, a str or list[str] with length matching the taxonomies identifies Taxonomy tree functions as follows:

  • 'all' | 'all_nodes' -> .all_nodes()

  • 'parents' | 'parent_nodes' -> .parent_nodes()

  • 'leaves' | 'leaf_nodes' | 'children' | 'child_nodes' -> .leaf_nodes()

If no filters are provided, the default is ‘all’.

Examples

TODO: add some. See code for dataset.aggregate() for a notable use case.

Parameters:
  • taxonomies (dict[str, Taxonomy])

  • filters (list[str] | str)

Return type:

list[dict]
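The shape of the result can be sketched without the Taxonomy class. The example below assumes that "permutations" here means the Cartesian product of the selected items, and uses plain lists of codes in place of taxonomies; the helper combinations_of is hypothetical.

```python
from itertools import product


def combinations_of(named_lists):
    """Return one dict per combination of items, keyed like the input.

    Hypothetical stand-in: plain lists of codes replace Taxonomy objects.
    """
    keys = list(named_lists)
    pools = [named_lists[key] for key in keys]
    return [dict(zip(keys, combo)) for combo in product(*pools)]


combos = combinations_of({"a": ["x", "y"], "b": ["1", "2"]})
```

Two lists of two items each yield four dictionaries, each holding one item from 'a' and one from 'b'.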

replace_dataset_tags(existing, old, new, recursive=False)

Alter selected attribute-value pairs in a tag dictionary.

Parameters:
  • existing (dict[str, str | list[str]])

  • old (dict[str, str | list[str]])

  • new (dict[str, str | list[str]])

  • recursive (bool)

Return type:

dict[str, dict[str, str | list[str]] | dict[str, dict[str, str | list[str]]]]

rm_tag_values(existing, tags_to_remove, recursive=False)

Remove tag values from a tag dict.

Both the values to remove and the existing tag values can be a string or a list of strings.

Parameters:
  • existing (dict[str, str | list[str]])

  • tags_to_remove (dict[str, str | list[str]])

  • recursive (bool | list[str])

Return type:

dict[str, str | list[str]]

rm_tags(existing, *args, recursive=False)

Remove attribute from tag dict regardless of value.

Return type:

dict[str, str | list[str]]

Parameters:
  • existing (dict[str, str | list[str]])

  • args (str)

  • recursive (bool | list[str])

search_by_tags(tags, criteria)

Filter tags based on the specified criteria.

Parameters:
  • tags (dict[str, dict[str, any]]) – The dictionary of tags to filter.

  • criteria (dict[str, str | list[str]]) – The criteria to filter by. Values can be single strings or lists of strings.

Returns:

A dictionary of tags that match the criteria.

Return type:

dict[str, dict[str, any]]

series_tag_dict_edit(existing, replace, new)

Alter selected attributes in a Dataset.tag[‘series’] dictionary.

Either 'replace' or 'new' (or both) must be specified:

  • If replace == {}, new tags are appended (aka 'tag_series').

  • If new == {}, 'replace' tags are deleted (aka 'detag_series').

  • If both are specified, 'replace' tags are deleted before 'new' tags are appended (aka 'retag_series').

Return type:

dict[str, dict[str, str | list[str]]]

Parameters:
  • existing (dict[str, dict[str, str | list[str]]])

  • replace (dict[str, str | list[str]])

  • new (dict[str, str | list[str]])
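The three modes can be illustrated with a standalone sketch. The helper edit_series_tags is hypothetical and is not the library implementation; dropping an attribute when no values remain, and collapsing a single value to a plain string, are assumptions.

```python
def as_list(value):
    """Wrap a single string in a list; copy any other iterable into a list."""
    return [value] if isinstance(value, str) else list(value)


def edit_series_tags(existing, replace, new):
    """Delete 'replace' values, then append 'new' values, for every series."""
    if not replace and not new:
        raise ValueError("Specify 'replace', 'new', or both.")
    edited = {}
    for series, tags in existing.items():
        updated = {}
        for key, value in tags.items():
            values = as_list(value)
            if key in replace:  # delete step
                drop = as_list(replace[key])
                values = [v for v in values if v not in drop]
            if values:
                updated[key] = values
        for key, value in new.items():  # append step
            current = updated.setdefault(key, [])
            for candidate in as_list(value):
                if candidate not in current:
                    current.append(candidate)
        edited[series] = {k: v[0] if len(v) == 1 else v for k, v in updated.items()}
    return edited


existing = {"s1": {"unit": "NOK"}, "s2": {"unit": ["NOK", "USD"]}}
tagged = edit_series_tags(existing, {}, {"status": "draft"})
detagged = edit_series_tags(existing, {"unit": "USD"}, {})
retagged = edit_series_tags(existing, {"unit": "NOK"}, {"unit": "EUR"})
```

In the retag case, s1 ends up with 'EUR' alone, while s2 keeps 'USD' and gains 'EUR'.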

to_tag_value(tag)

If input is a list of unique strings.

Return type:

str | list[str]

Parameters:

tag (str | list[str] | set)

unique_tag_values(arg)

Wraps string input in a list and ensures that the list items are unique.

Return type:

list[str]

Parameters:

arg (Any)
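A minimal sketch of the wrap-and-deduplicate behaviour described above. The helper unique_values is hypothetical; preserving the original order of the values is an assumption.

```python
def unique_values(arg):
    """Wrap a string in a list, then drop duplicates while keeping order."""
    values = [arg] if isinstance(arg, str) else list(arg)
    # dict.fromkeys keeps the first occurrence of each value, in order.
    return list(dict.fromkeys(values))


single = unique_values("a")
deduplicated = unique_values(["a", "b", "a"])
```

Checking for str first matters: a string is itself iterable, so passing it straight to list() would split it into characters.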