`ssb_timeseries.meta`¶

The ssb_timeseries.meta module is responsible for metadata maintenance.

Dataset and series tags are handled as Python dictionaries and stored in JSON files and Parquet headers. The meta module takes care of the mechanics of manipulating the dictionaries.

It also consumes taxonomies. That functionality should live in ssb-python-klass or other metadata libraries, and is likely subject to refactoring later.

exception MissingAttributeError¶

Bases: Exception

At least one required attribute was not provided.

class Taxonomy(*, klass_id=0, data=None, path='', name='Taxonomy', sep='.', **kwargs)¶

Bases: object

Wraps taxonomies defined in KLASS or JSON files in an object structure.

Parameters:

klass_id (int)
data (list[dict[str, str]] | IntoFrameT | None)
path (str | PathLike[str])
name (str)
sep (str)
kwargs (Any)

name¶

Type:: str

structure_type¶: enum: list | tree | graph

levels¶: number of levels not counting the root node

entities¶

Entity definitions, represented as a dataframe with columns as defined by KlassItem.

Type:: pa.Table

structure¶

Type:: bigtree.tree

Notes

structure:: Relations between entities of the taxonomy. Both lists and trees will be represented as hierarchies; with the root node being the taxonomy. Level two will be the first item level, so a flat list will have two levels. Hierarchies with a natural top or “root” node should have a single node at level two.
lookups:: Listing of supported names for all entities, mapping different categories of names of different standards and in different languages to a unique identifier.
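
The level convention described in the notes can be illustrated with a minimal sketch. It uses plain dictionaries rather than the library's Taxonomy or bigtree classes; the record fields `code` and `parent`, the helper names and the sample codes are assumptions made for illustration only.

```python
from collections import defaultdict


def build_children(records, root="Taxonomy"):
    """Map each parent code to its child codes; items without a parent hang off the root."""
    children = defaultdict(list)
    for rec in records:
        parent = rec.get("parent") or root
        children[parent].append(rec["code"])
    return dict(children)


def depth(children, node):
    """Number of levels in the subtree starting at node, counting node itself."""
    kids = children.get(node, [])
    return 1 + max((depth(children, kid) for kid in kids), default=0)


# A flat list: every item sits directly below the taxonomy root.
flat = [{"code": "a", "parent": None}, {"code": "b", "parent": None}]

# A hierarchy with a natural top node: a single node at level two.
tree = [
    {"code": "total", "parent": None},
    {"code": "a", "parent": "total"},
    {"code": "b", "parent": "total"},
]

flat_children = build_children(flat)
tree_children = build_children(tree)

flat_levels = depth(flat_children, "Taxonomy")  # root level plus item level
tree_levels = depth(tree_children, "Taxonomy")  # root, "total", then its children

# Leaves are codes that never appear as a parent.
tree_leaves = [rec["code"] for rec in tree if rec["code"] not in tree_children]
```

Here the depth count includes the root node, which matches the wording of the note on structure; the levels attribute is documented as not counting the root.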

__eq__(other)¶

Checks for equality. Taxonomies are considered equal if their codes and hierarchical relations are the same.

Return type:: bool
Parameters:: other (Self)

__getitem__(key)¶

Get tree node by name (KLASS code).

Return type:: Node
Parameters:: key (str)

__init__(*, klass_id=0, data=None, path='', name='Taxonomy', sep='.', **kwargs)¶

Create a Taxonomy object from a klass_id, from data (a list of dictionaries or a dataframe), or from a path to a JSON file.

Taxonomy items are listed in .entities and hierarchical relationships are mapped in .structure.

Optional keyword arguments:

substitutions (dict) – Code values to be replaced: {‘substitute_this’: ‘with_this’, ‘and_this’: ‘as well’}

Parameters:

klass_id (int)
data (list[dict[str, str]] | IntoFrameT | None)
path (str | PathLike[str])
name (str)
sep (str)
kwargs (Any)

Return type:

None

__sub__(other)¶

Return the tree difference between the two taxonomy (tree) structures.

Return type:: Node
Parameters:: other (Node)

all_nodes()¶

Return all nodes in the taxonomy.

Return type:: list[Node]

leaf_nodes(name='')¶

Return all leaf nodes in the taxonomy.

Return type:: list[Node]
Parameters:: name (Node | str)

parent_nodes()¶

Return all non-leaf nodes in the taxonomy.

Return type:: list[Node]

print_tree(*args, **kwargs)¶

Return a string with the tree structure.

Implementation is ugly! It would be preferable not to print the tree to std out. … but this works.

Return type:: str

save(path)¶

Save taxonomy to json file.

The file can be read using Taxonomy(path=<path to file>).

Return type:: None
Parameters:: path (str | PathLike[str])

substitute(substitutions)¶

Substitute ‘code’ and ‘parent’ values with items in the substitution dictionary.

Return type:: None
Parameters:: substitutions (dict)
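
A minimal sketch of what such a substitution could look like on a list of records. This is not the library's implementation: the function name `substitute_codes` and the sample records are hypothetical, and unlike the method above (which returns None) this sketch returns new records instead of modifying anything in place.

```python
def substitute_codes(records, substitutions):
    """Return copies of the records with 'code' and 'parent' values replaced.

    Values without an entry in the substitutions dictionary are kept as they are.
    Other fields are never touched.
    """
    return [
        {
            key: substitutions.get(value, value) if key in ("code", "parent") else value
            for key, value in rec.items()
        }
        for rec in records
    ]


# Hypothetical sample data for illustration.
records = [
    {"code": "substitute_this", "parent": None, "name": "top"},
    {"code": "child", "parent": "substitute_this", "name": "substitute_this"},
]
replaced = substitute_codes(records, {"substitute_this": "with_this"})
```

Note that the `name` field of the second record keeps its value even though it equals a key in the substitution dictionary, since only `code` and `parent` are considered.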

subtree(key)¶

Get subtree of node identified by name (KLASS code).

Return type:: Any
Parameters:: key (str)

add_tag_values(old, additions, recursive=False)¶

Add tag values to a tag dict.

Will append new tags as a list if any values already exist. With recursive=True, nested dicts are also traversed.

Parameters:

old (dict[str, str | list[str]])
additions (dict[str, str | list[str]])
recursive (bool)

Return type:

dict[str, str | list[str]]
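
The append behaviour can be sketched as follows for the non-recursive case. The helper names are hypothetical, and two details are assumptions rather than documented behaviour: values that are already present are not repeated, and a tag that ends up with a single value stays a plain string.

```python
def as_list(value):
    """Wrap a single string in a list; copy any other iterable into a list."""
    return [value] if isinstance(value, str) else list(value)


def add_values(old, additions):
    """Return a new tag dict where additions are appended to any existing values."""
    merged = dict(old)
    for key, value in additions.items():
        if key not in merged:
            # New attribute: take the value as given.
            merged[key] = value
            continue
        combined = as_list(merged[key])
        for item in as_list(value):
            if item not in combined:  # assumption: no duplicate values
                combined.append(item)
        # Assumption: a single remaining value is kept as a plain string.
        merged[key] = combined if len(combined) > 1 else combined[0]
    return merged


tags = add_values({"unit": "NOK"}, {"unit": "USD", "freq": "M"})
```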

camel_to_snake(name)¶

Convert CamelCase to snake_case, handling acronyms.

Return type:: str
Parameters:: name (str)

Example

‘HTTPConnection’ -> ‘http_connection’
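
One common way to get this behaviour is a two-pass regular expression. The sketch below is an assumption about how such a conversion can be written, not the library's own code; the function name `camel_to_snake_sketch` is hypothetical.

```python
import re


def camel_to_snake_sketch(name: str) -> str:
    """Convert CamelCase to snake_case, keeping acronyms together."""
    # Pass 1: split before a capitalised word, e.g. 'HTTPConnection' -> 'HTTP_Connection'.
    partial = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", name)
    # Pass 2: split between a lowercase letter or digit and an uppercase letter.
    partial = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", partial)
    return partial.lower()


converted = camel_to_snake_sketch("HTTPConnection")
```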

delete_dataset_tags(dictionary, *args, **kwargs)¶

Remove selected attributes from dataset tag dictionary.

Parameters:

dictionary (dict[str, dict[str, str | list[str]] | dict[str, dict[str, str | list[str]]]])
args (str)
kwargs (dict[str, dict[str, str | list[str]]] | bool)

Return type:

dict[str, dict[str, str | list[str]] | dict[str, dict[str, str | list[str]]]]

delete_series_tags(dictionary, *args, **kwargs)¶

Remove selected series attributes from series or dataset tag dictionary.

Return type:

Parameters:

dictionary (dict[str, dict[str, str | list[str]]] | dict[str, dict[str, str | list[str]] | dict[str, dict[str, str | list[str]]]])
args (str)
kwargs (str | list[str])

filter_tags(tags, criteria)¶

Filter tags based on the specified criteria.

Parameters:

tags (dict[str, dict[str, any]]) – The dictionary of tags to filter.
criteria (dict[str, str | list[str]]) – The criteria to filter by. Values can be single strings or lists of strings.

Returns:

A dictionary of tags that match the criteria.

Return type:

dict[str, dict[str, any]]

inherit_set_tags(tags)¶

Return the tags that are inherited from the set.

Return type:: dict[str, Any]
Parameters:: tags (dict[str, dict[str, str | list[str]] | dict[str, dict[str, str | list[str]]]] | dict[str, dict[str, str | list[str]]])

klass_classification(klass_id)¶

Get KLASS classification identified by ID as a list of dicts.

Return type:: list[dict[Hashable, Any] | dict[str, str | None]]
Parameters:: klass_id (int)

matches_criteria(tag, criteria)¶

Check if a tag matches the specified criteria.

Parameters:

tag (dict[str, any]) – The tag to check.
criteria (dict[str, str | list[str]]) – The criteria to match against. Values can be single strings or lists of strings.

Returns:

True if the tag matches the criteria, False otherwise.

Return type:

bool
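
The exact matching rules are not spelled out above. The following sketch assumes that every key in the criteria must be present in the tag, and that a key matches when the tag and the criterion share at least one value. All names and sample tags are hypothetical; the second helper shows how a filter over a dictionary of tags could build on the single-tag check.

```python
def as_set(value):
    """Treat a single value as a one-element set; copy lists, tuples and sets."""
    return set(value) if isinstance(value, (list, tuple, set)) else {value}


def matches(tag, criteria):
    """True if, for every criteria key, the tag shares at least one value with it."""
    return all(
        key in tag and bool(as_set(tag[key]) & as_set(wanted))
        for key, wanted in criteria.items()
    )


def filter_by(tags, criteria):
    """Keep only the entries whose tag dict matches the criteria."""
    return {name: tag for name, tag in tags.items() if matches(tag, criteria)}


# Hypothetical series tags for illustration.
series_tags = {
    "x_a": {"unit": "NOK", "region": ["a", "b"]},
    "x_b": {"unit": "USD", "region": "b"},
}
only_a = filter_by(series_tags, {"region": "a"})
both = filter_by(series_tags, {"unit": ["NOK", "USD"], "region": "b"})
```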

permutations(taxonomies, filters='')¶

For a dict of the form {‘a’: Taxonomy(A), ‘b’: Taxonomy(B)}, returns permutations of items in A and B, subject to filters.

Filters are experimental and quite likely to change type / implementation. Notably, support for custom functions and include/exclude lists may be considered. For now, a str | list[str] with length matching the taxonomies identifies Taxonomy tree functions as follows:

‘all’ | ‘all_nodes’ –> .all_nodes()
‘parents’ | ‘parent_nodes’ –> .parent_nodes()
‘leaves’ | ‘leaf_nodes’ | ‘children’ | ‘child_nodes’ –> .leaf_nodes()

If no filters are provided, the default is ‘all’.

Examples

TODO: add some. See code for dataset.aggregate() for a notable use case.

Parameters:

taxonomies (dict[str, Taxonomy])
filters (list[str] | str)

Return type:

list[dict]
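
Until examples are added, the shape of the result can be sketched with plain lists of codes standing in for Taxonomy objects. The sketch assumes the result is the Cartesian product of the selected items, with one dictionary per combination keyed like the input; the function name `combine` and the sample codes are hypothetical.

```python
from itertools import product


def combine(code_lists):
    """Return one dict per combination of codes, keyed like the input dict."""
    keys = list(code_lists)
    return [dict(zip(keys, combo)) for combo in product(*code_lists.values())]


# Plain lists stand in for the nodes selected from each taxonomy.
combos = combine({"a": ["a1", "a2"], "b": ["b1", "b2", "b3"]})
```

With two codes for ‘a’ and three for ‘b’, this yields six dictionaries, the first being {‘a’: ‘a1’, ‘b’: ‘b1’}.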

records_to_arrow(records)¶

Creates a PyArrow Table from a list of dictionaries (row/records).

Parameters:: records (list[dict[str, Any]]) – A list where each element is a dictionary representing a row.
Return type:: Table
Returns:: A pyarrow.Table representing the data.

replace_dataset_tags(existing, old, new, recursive=False)¶

Alter selected attribute-value pairs in a tag dictionary.

Parameters:

existing (dict[str, str | list[str]])
old (dict[str, str | list[str]])
new (dict[str, str | list[str]])
recursive (bool)

Return type:

dict[str, dict[str, str | list[str]] | dict[str, dict[str, str | list[str]]]]

rm_tag_values(existing, tags_to_remove, recursive=False)¶

Remove tag value from tag dict.

Both the values to remove and the existing tag values can be a string or a list of strings.

Parameters:

existing (dict[str, str | list[str]])
tags_to_remove (dict[str, str | list[str]])
recursive (bool | list[str])

Return type:

dict[str, str | list[str]]

rm_tags(existing, *args, recursive=False)¶

Remove attribute from tag dict regardless of value.

Return type:

dict[str, str | list[str]]

Parameters:

existing (dict[str, str | list[str]])
args (str)
recursive (bool | list[str])

search_by_tags(tags, criteria)¶

Filter tags based on the specified criteria.

Parameters:

tags (dict[str, dict[str, any]]) – The dictionary of tags to filter.
criteria (dict[str, str | list[str]]) – The criteria to filter by. Values can be single strings or lists of strings.

Returns:

A dictionary of tags that match the criteria.

Return type:

dict[str, dict[str, any]]

series_tag_dict_edit(existing, replace, new)¶

Alter selected attributes in a Dataset.tag[‘series’] dictionary.

Either ‘replace’ or ‘new’ (or both) must be specified. If ‘replace == {}’, new tags are appended (aka ‘tag_series’). If ‘new == {}’, ‘replace’ tags are deleted (aka ‘detag_series’). If both are specified, ‘replace’ are deleted before ‘new’ are appended (aka ‘retag_series’).

Return type:

dict[str, dict[str, str | list[str]]]

Parameters:

existing (dict[str, dict[str, str | list[str]]])
replace (dict[str, str | list[str]])
new (dict[str, str | list[str]])

to_tag_value(tag)¶

If input is a list of unique strings.

Return type:: str | list[str]
Parameters:: tag (str | list[str] | set)

unique_tag_values(arg)¶

Wraps string input in a list and ensures the list items are unique.

Return type:: list[str]
Parameters:: arg (Any)
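
A minimal sketch of this normalisation, assuming that the order of first occurrence is preserved (the library may order the result differently). The function name `unique_values` is hypothetical.

```python
def unique_values(arg):
    """Wrap a single string in a list and drop repeated items."""
    values = [arg] if isinstance(arg, str) else list(arg)
    # dict.fromkeys keeps the first occurrence of each item, in order.
    return list(dict.fromkeys(values))


single = unique_values("a")
deduplicated = unique_values(["b", "a", "b"])
```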
