ssb_timeseries.meta
The ssb_timeseries.meta module defines the public API for metadata operations on datasets and time series. It provides the core data structures for tags and functions for creating and manipulating tags and taxonomies. Functionality is imported from submodules to create a single, convenient point of access.
Public API
The following classes and functions are exposed as the public API of the meta module.
Classes
- class Taxonomy(*, klass_id=0, data=None, path='', name='Taxonomy', sep='.', **kwargs)
Bases: object
Wraps taxonomies defined in KLASS or JSON files in an object structure.
- Variables:
name (str) – The name of the taxonomy.
structure_type (str) – The type of structure, e.g., ‘list’, ‘tree’, ‘graph’.
levels (int) – The number of levels not counting the root node.
entities (pa.Table) – Entity definitions, represented as a PyArrow Table.
structure (bigtree.Node) – The hierarchical structure of the taxonomy.
- Parameters:
klass_id (int)
data (list[dict[str, str]] | IntoFrameT | None)
path (PathStr)
name (str)
sep (str)
kwargs (Any)
Note
Structure: Relations between entities of the taxonomy. Both lists and trees will be represented as hierarchies, with the root node being the taxonomy. Level two will be the first item level, so a flat list will have two levels. Hierarchies with a natural top or “root” node should have a single node at level two.
Lookups: Listing of supported names for all entities, mapping different categories of names of different standards and in different languages to a unique identifier.
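As a rough illustration of the note above, the sketch below uses plain dictionaries rather than the library's own classes to show how a flat list and a small tree both become hierarchies under a single root. The record layout with `code` and `parentCode` keys follows the permutations example in this page; the helpers `depth_of` and `count_levels` and the root label are hypothetical.

```python
# Sketch only: plain dicts stand in for the taxonomy structure.
ROOT = "0"  # hypothetical label for the taxonomy root node


def depth_of(code, parents):
    """Count levels from the root (level one) down to `code`."""
    level = 1
    while code != ROOT:
        code = parents[code]
        level += 1
    return level


def count_levels(records):
    """Return the number of levels, counting the root as level one."""
    parents = {r["code"]: r["parentCode"] for r in records}
    return max(depth_of(code, parents) for code in parents)


flat_list = [
    {"code": "a1", "parentCode": ROOT},
    {"code": "a2", "parentCode": ROOT},
]
small_tree = [
    {"code": "top", "parentCode": ROOT},
    {"code": "x", "parentCode": "top"},
    {"code": "y", "parentCode": "top"},
]

flat_levels = count_levels(flat_list)   # root plus the items
tree_levels = count_levels(small_tree)  # root, single top node, children
```

With the root counted as level one, the flat list has two levels, as the note describes. The `levels` attribute is documented as not counting the root, so it would presumably report one fewer.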
- __eq__(other)
Checks for equality. Taxonomies are considered equal if their codes and hierarchical relations are the same.
- Return type:
bool
- Parameters:
other (object)
- __getitem__(key)
Get tree node by name (KLASS code).
- Return type:
Node
- Parameters:
key (str)
- __init__(*, klass_id=0, data=None, path='', name='Taxonomy', sep='.', **kwargs)
Create a Taxonomy object from a klass_id, from data (a list of dictionaries or a dataframe), or from a path to a JSON file.
Taxonomy items are listed in .entities and hierarchical relationships are mapped in .structure.
Optional keyword arguments:
substitutions (dict) – Code values to be replaced, e.g. {'substitute_this': 'with_this', 'and_this': 'as well'}.
- Parameters:
klass_id (int)
data (list[dict[str, str]] | IntoFrameT | None)
path (str | PathLike[str])
name (str)
sep (str)
kwargs (Any)
- Return type:
None
- __sub__(other)
Return the tree difference between the two taxonomy (tree) structures.
- Return type:
Node
- Parameters:
other (Node)
- all_nodes()
Return all nodes in the taxonomy.
- Return type:
list[Node]
- leaf_nodes(name='')
Return all leaf nodes in the taxonomy.
- Return type:
list[Node] | list[str]
- Parameters:
name (Node | str)
- parent_nodes()
Return all non-leaf nodes in the taxonomy.
- Return type:
list[Node]
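To make the distinction between leaf and non-leaf nodes concrete, here is a minimal sketch that classifies codes from plain `code`/`parentCode` records. It does not use the library and only assumes that a leaf is a node nobody names as a parent; whether the real methods also return the root node is not stated here.

```python
# Sketch only: classify codes from (code, parentCode) records.
records = [
    {"code": "top", "parentCode": "0"},
    {"code": "x", "parentCode": "top"},
    {"code": "y", "parentCode": "top"},
]

codes = [r["code"] for r in records]
# Every value that appears as a parent has at least one child.
has_children = {r["parentCode"] for r in records}

parent_codes = [c for c in codes if c in has_children]    # non-leaf nodes
leaf_codes = [c for c in codes if c not in has_children]  # leaf nodes
```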
- print_tree(*args, **kwargs)
Return a string with the tree structure.
Note: the implementation is inelegant. It would be preferable not to print the tree to stdout, but this works.
- Return type:
str
- save(path)
Save taxonomy to a JSON file.
The file can be read using Taxonomy(path=<path to file>).
- Return type:
None
- Parameters:
path (str | PathLike[str])
- substitute(substitutions)
Substitute 'code' and 'parent' values with items in the substitution dictionary.
- Return type:
None
- Parameters:
substitutions (dict)
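The effect of a substitution dictionary can be sketched on plain records. This is an illustration of the idea, not the method's implementation: the helper `substitute_codes` is hypothetical, and unlike `substitute`, which returns None, it returns a new list instead of modifying anything in place.

```python
def substitute_codes(records, substitutions):
    """Return new records with 'code' and 'parent' values replaced."""
    return [
        {
            # Only the two key columns are rewritten; other fields are kept.
            key: substitutions.get(value, value) if key in ("code", "parent") else value
            for key, value in record.items()
        }
        for record in records
    ]


records = [
    {"code": "a", "parent": "0", "name": "a"},
    {"code": "b", "parent": "a", "name": "b"},
]
renamed = substitute_codes(records, {"a": "A"})
```

Both the entity's own code and every reference to it as a parent are replaced, so the hierarchy stays connected after renaming.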
- subtree(key)
Get subtree of node identified by name (KLASS code).
- Return type:
typing.Any
- Parameters:
key (str)
- KlassTaxonomy
alias of list[dict[Hashable, Any] | dict[str, str | None]]
Type Aliases
- DatasetTagDict
alias of dict[str, Any]
- SeriesTagDict
alias of dict[str, dict[str, str | list[str]]]
- TagDict
alias of dict[str, str | list[str]]
- TagValue = str | list[str]
Functions
- add_tag_values(old, additions, recursive=False)
Add tag values to a tag dict.
New values are appended as a list if any values already exist. With recursive=True, nested dicts are also traversed.
- Parameters:
old (TagDict)
additions (TagDict)
recursive (bool)
- Return type:
TagDict
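The appending behaviour described above might look roughly like the following sketch. It is not the library's code: `merge_tag_values` is a hypothetical stand-in, and it assumes that values already present are not added twice, that a single value stays a plain string, and that the input dictionary is left unchanged.

```python
def merge_tag_values(old, additions, recursive=False):
    """Hypothetical sketch: add values from `additions` to a copy of `old`."""
    merged = dict(old)
    for key, new_value in additions.items():
        if key not in merged:
            merged[key] = new_value
            continue
        current = merged[key]
        if isinstance(current, dict) and isinstance(new_value, dict):
            # Nested dicts are only traversed when recursive=True.
            if recursive:
                merged[key] = merge_tag_values(current, new_value, recursive=True)
            continue
        # Normalise both sides to lists, then append values not already present.
        values = list(current) if isinstance(current, list) else [current]
        extra = new_value if isinstance(new_value, list) else [new_value]
        for value in extra:
            if value not in values:
                values.append(value)
        merged[key] = values[0] if len(values) == 1 else values
    return merged


tags = {"unit": "NOK", "area": "oslo"}
result = merge_tag_values(tags, {"area": "bergen", "freq": "monthly"})
```

Here the existing `area` value becomes a two-item list, while the new `freq` key is simply added.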
- delete_dataset_tags(dictionary, *args, **kwargs)
Remove selected attributes from dataset tag dictionary.
- Parameters:
dictionary (DatasetTagDict)
args (str)
kwargs (SeriesTagDict | bool)
- Return type:
DatasetTagDict
- delete_series_tags(dictionary, *args, **kwargs)
Remove selected series attributes from series or dataset tag dictionary.
- Return type:
dict[str, dict[str, str | list[str]]] | dict[str, Any]
- Parameters:
dictionary (dict[str, dict[str, str | list[str]]] | dict[str, Any])
args (str)
kwargs (str | list[str])
- filter_tags(tags, criteria)
Filter tags based on the specified criteria.
- Parameters:
tags (dict[str, dict[str, typing.Any]]) – The dictionary of tags to filter.
criteria (dict[str, str | list[str]]) – The criteria to filter by. Values can be single strings or lists of strings.
- Return type:
dict[str, dict[str, typing.Any]]
- Returns:
A dictionary of tags that match the criteria.
- inherit_set_tags(tags)
Return the tags that are inherited from the set.
- Return type:
dict[str, typing.Any]
- Parameters:
tags (dict[str, Any] | dict[str, dict[str, str | list[str]]])
- matches_criteria(tag, criteria)
Check if a tag matches the specified criteria.
- Parameters:
tag (dict[str, typing.Any]) – The tag to check.
criteria (dict[str, str | list[str]]) – The criteria to match against. Values can be single strings or lists of strings.
- Return type:
bool
- Returns:
True if the tag matches the criteria, False otherwise.
- replace_dataset_tags(existing, old, new, recursive=False)
Alter selected attribute-value pairs in a tag dictionary.
- Parameters:
existing (TagDict)
old (TagDict)
new (TagDict)
recursive (bool)
- Return type:
DatasetTagDict
- search_by_tags(tags, criteria)
Filter tags based on the specified criteria and return the keys.
- Parameters:
tags (dict[str, dict[str, typing.Any]]) – The dictionary of tags to filter.
criteria (dict[str, str | list[str]]) – The criteria to filter by. Values can be single strings or lists of strings.
- Return type:
list[str]
- Returns:
A list of keys for tags that match the criteria.
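The relationship between matches_criteria, filter_tags and search_by_tags can be sketched in plain Python. The helpers below are hypothetical stand-ins rather than the library's implementation, and the matching rule is an assumption: every key in the criteria must be present in the tag, and at least one of the tag's values for that key must be among the accepted values.

```python
def as_list(value):
    """Wrap a single value in a list; leave lists unchanged."""
    return value if isinstance(value, list) else [value]


def matches(tag, criteria):
    """Assumed semantics: every criterion key must match at least one value."""
    return all(
        key in tag and any(v in as_list(criteria[key]) for v in as_list(tag[key]))
        for key in criteria
    )


def filter_by(tags, criteria):
    """Keep the entries whose tag dict matches the criteria."""
    return {name: tag for name, tag in tags.items() if matches(tag, criteria)}


def search_keys(tags, criteria):
    """Return only the keys of the matching entries."""
    return list(filter_by(tags, criteria))


series_tags = {
    "s1": {"unit": "NOK", "area": "oslo"},
    "s2": {"unit": "NOK", "area": "bergen"},
    "s3": {"unit": "USD", "area": "oslo"},
}
criteria = {"unit": "NOK", "area": ["oslo", "bergen"]}

kept = filter_by(series_tags, criteria)    # dict of matching entries
keys = search_keys(series_tags, criteria)  # just their keys
```

In this sketch the filter returns the matching entries with their tags, while the search returns only their names, mirroring the documented return types.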
- permutations(taxonomies, filters='')
For a dict of the form {'a': Taxonomy(A), 'b': Taxonomy(B)}, returns permutations of items in A and B, subject to filters.
Filters are experimental and quite likely to change type or implementation. Notably, support for custom functions and include/exclude lists may be considered. For now, a str or a list[str] with length matching the taxonomies identifies Taxonomy tree functions as follows:
'all' | 'all_nodes' --> .all_nodes()
'parents' | 'parent_nodes' --> .parent_nodes()
'leaves' | 'leaf_nodes' | 'children' | 'child_nodes' --> .leaf_nodes()
If no filters are provided, the default is 'all'.
Examples
>>> from ssb_timeseries.meta import Taxonomy, permutations
>>> tax_a = Taxonomy(data=[{'code': 'a1', 'parentCode': '0'}, {'code': 'a2', 'parentCode': '0'}])
>>> tax_b = Taxonomy(data=[{'code': 'b1', 'parentCode': '0'}, {'code': 'b2', 'parentCode': '0'}])
>>> permutations({'A': tax_a, 'B': tax_b})
[{'A': 'a1', 'B': 'b1'}, {'A': 'a1', 'B': 'b2'}, {'A': 'a2', 'B': 'b1'}, {'A': 'a2', 'B': 'b2'}]
- Return type:
list[dict]
- Parameters:
taxonomies (dict[str, Taxonomy])
filters (list[str] | str)
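The result shape in the example above corresponds to a Cartesian product over the selected codes of each taxonomy. A minimal sketch of that idea, using plain lists of codes in place of Taxonomy objects and a hypothetical helper `code_permutations`, could be written with itertools.product. It ignores filters entirely and is not the library's implementation.

```python
from itertools import product


def code_permutations(codes_by_name):
    """Hypothetical helper: combine one code from each named list."""
    names = list(codes_by_name)
    return [
        dict(zip(names, combo))
        for combo in product(*(codes_by_name[n] for n in names))
    ]


combos = code_permutations({"A": ["a1", "a2"], "B": ["b1", "b2"]})
```

Each resulting dictionary picks exactly one code per taxonomy name, so two lists of two codes yield four combinations.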