ssb_timeseries.meta
¶
The ssb_timeseries.meta module is responsible for metadata maintenance.
Dataset and series tags are handled as Python dictionaries and stored in JSON files and Parquet headers. The meta module takes care of the mechanics of manipulating the dictionaries.
It also consumes taxonomies. That functionality should live in ssb-python-klass or other metadata libraries, and is likely subject to refactoring later.
- class KlassItem¶
Bases:
TypedDict
The structure of taxonomy items as returned by the API (JSON) and klass (Pandas DataFrame).
-
code:
str
¶ A unique entity identifier within the taxonomy. It may very well consist of numeric values, but will be represented as a string.
-
name:
str
¶ A unique human readable name. Not nullable.
-
parentCode:
str
¶ The code for the parent entity.
-
presentationName:
str
¶ A “self explanatory” unique name, if applicable.
-
shortName:
str
¶ A short version / mnemonic for name, if applicable.
-
validFrom:
datetime
|str
¶ Date or ISO string representing the start of the entity lifespan.
-
validTo:
datetime
|str
¶ Date or ISO string representing the end of the entity lifespan.
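To make the field list above concrete, here is a minimal sketch of a dictionary shaped like a KlassItem. The class `KlassItemSketch` is a hypothetical local mirror of the documented fields, not the library's own class, and all values are made up for illustration.

```python
from datetime import datetime
from typing import TypedDict, Union


class KlassItemSketch(TypedDict):
    """Hypothetical local mirror of the documented KlassItem fields."""

    code: str
    name: str
    parentCode: str
    presentationName: str
    shortName: str
    validFrom: Union[datetime, str]
    validTo: Union[datetime, str]


# Illustrative values only; they are not taken from any real classification.
item: KlassItemSketch = {
    "code": "01",  # numeric-looking codes are still strings
    "name": "Example entity",
    "parentCode": "0",
    "presentationName": "01 - Example entity",
    "shortName": "Example",
    "validFrom": datetime(2020, 1, 1),
    "validTo": "2029-12-31",  # an ISO string is also allowed
}

print(sorted(item))
```

Because a TypedDict is an ordinary dict at runtime, items like this can be collected in a list and turned into a dataframe with one column per field.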
- exception MissingAttributeError¶
Bases:
Exception
At least one required attribute was not provided.
- class Taxonomy(*, klass_id=0, data=None, path='', root_name='Taxonomy', sep='.', **kwargs)¶
Bases:
object
Wraps taxonomies defined in KLASS or JSON files in an object structure.
- Parameters:
klass_id (int)
data (list[dict[str, str]] | None)
path (str | PathLike[str])
root_name (str)
sep (str)
kwargs (Any)
- definition¶
Descriptions of the taxonomy.
- Type:
str
- name¶
- structure_type¶
enum: list | tree | graph
- levels¶
number of levels not counting the root node
- entities¶
Entity definitions, represented as a dataframe with columns as defined by KlassItem.
- Type:
pd.DataFrame
Notes
- structure:
Relations between entities of the taxonomy. Both lists and trees will be represented as hierarchies; with the root node being the taxonomy. Level two will be the first item level, so a flat list will have two levels. Hierarchies with a natural top or “root” node should have a single node at level two.
- lookups:
Listing of supported names for all entities, mapping different categories of names of different standards and in different languages to a unique identifier.
Create a Taxonomy object from either a klass_id, a data dictionary or a path to a JSON file.
Taxonomy items are listed in .entities and hierarchical relationships mapped in .structure.
Optional keyword arguments:
substitutions (dict): Code values to be replaced: {'substitute_this': 'with_this', 'and_this': 'as well'}
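The level counting described in the notes on structure can be sketched in plain Python. This is not the library's implementation: it is a minimal illustration that assumes items without a parentCode hang directly under the root node, and the rows are made up.

```python
# Made-up entity rows using the documented 'code' and 'parentCode' fields.
rows = [
    {"code": "A", "parentCode": None},
    {"code": "A1", "parentCode": "A"},
    {"code": "A2", "parentCode": "A"},
    {"code": "B", "parentCode": None},
]
ROOT = "Taxonomy"  # the documented default for root_name

# Assumption: items without a parent are attached directly to the root node.
parent_of = {row["code"]: row["parentCode"] or ROOT for row in rows}


def level(code: str) -> int:
    """Level of a node, counting the root (the taxonomy itself) as level one."""
    return 1 if code == ROOT else 1 + level(parent_of[code])


# Non-leaf nodes are those that appear as a parent of some other node.
non_leaves = set(parent_of.values())
leaves = [code for code in parent_of if code not in non_leaves]

print({code: level(code) for code in parent_of})
print(leaves)
```

With this counting, "A" and "B" sit at level two and "A1" and "A2" at level three; a flat list would stop at level two, in line with the notes.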
- __eq__(other)¶
Checks for equality. Taxonomies are considered equal if their codes and hierarchical relations are the same.
- Return type:
bool
- Parameters:
other (Self)
- __getitem__(key)¶
Get tree node by name (KLASS code).
- Return type:
Node
- Parameters:
key (str)
- __init__(*, klass_id=0, data=None, path='', root_name='Taxonomy', sep='.', **kwargs)¶
Create a Taxonomy object from either a klass_id, a data dictionary or a path to a JSON file.
Taxonomy items are listed in .entities and hierarchical relationships mapped in .structure.
Optional keyword arguments:
substitutions (dict): Code values to be replaced: {'substitute_this': 'with_this', 'and_this': 'as well'}
- Parameters:
klass_id (int)
data (list[dict[str, str]] | None)
path (str | PathLike[str])
root_name (str)
sep (str)
kwargs (Any)
- Return type:
None
- __sub__(other)¶
Return the tree difference between the two taxonomy (tree) structures.
- Return type:
Node
- Parameters:
other (Node)
- all_nodes()¶
Return all nodes in the taxonomy.
- Return type:
list[Node]
- leaf_nodes(name='')¶
Return all leaf nodes in the taxonomy.
- Return type:
list[Node]
- Parameters:
name (Node | str)
- parent_nodes()¶
Return all non-leaf nodes in the taxonomy.
- Return type:
list[Node]
- print_tree(*args, **kwargs)¶
Return a string with the tree structure.
The implementation is not elegant: it would be preferable not to print the tree to standard output, but this works.
- Return type:
str
- save(path)¶
Save taxonomy to a JSON file.
The file can be read using Taxonomy(path=<path to file>).
- Return type:
None
- Parameters:
path (str | PathLike[str])
- substitute(substitutions)¶
Substitute 'code' and 'parent' values with items in the substitution dictionary.
- Return type:
None
- Parameters:
substitutions (dict)
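As an illustration of what such a substitution does to taxonomy items, the sketch below applies a substitution dictionary to plain rows. The helper `substitute_codes` is hypothetical and returns a new list, whereas the documented method returns None; the rows are made up, and the 'parent' value is assumed to be the 'parentCode' field.

```python
def substitute_codes(rows: list[dict], substitutions: dict) -> list[dict]:
    """Return copies of rows with 'code' and 'parentCode' values substituted."""
    return [
        {
            **row,
            # dict.get falls back to the original value when no substitute exists.
            "code": substitutions.get(row["code"], row["code"]),
            "parentCode": substitutions.get(row["parentCode"], row["parentCode"]),
        }
        for row in rows
    ]


rows = [
    {"code": "A", "parentCode": None, "name": "First"},
    {"code": "A1", "parentCode": "A", "name": "Child of first"},
]
renamed = substitute_codes(rows, {"A": "alpha"})
print(renamed)
```

Both the item itself and every reference to it as a parent are renamed, so the hierarchy stays intact.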
- subtree(key)¶
Get subtree of node identified by name (KLASS code).
- Return type:
Any
- Parameters:
key (str)
- add_root_node(df, root_node)¶
Prepend root node row to taxonomy dataframe.
- Return type:
DataFrame
- Parameters:
df (DataFrame)
root_node (dict[str, str | None])
- add_tag_values(old, additions, recursive=False)¶
Add tag values to a tag dict.
New values are appended as a list if any values already exist for a tag. With recursive=True, nested dicts are also traversed.
- Parameters:
old (dict[str, str | list[str]])
additions (dict[str, str | list[str]])
recursive (bool)
- Return type:
dict[str, str | list[str]]
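The append behaviour described above can be sketched as follows. The helper `merge_tag_values` is a hypothetical stand-in, not the library function: it ignores the recursive option, and it assumes that duplicate values are dropped and that a single remaining value stays a plain string.

```python
def merge_tag_values(old: dict, additions: dict) -> dict:
    """Merge additions into a copy of old, collecting multiple values in lists."""
    merged = dict(old)
    for key, value in additions.items():
        if key not in merged:
            merged[key] = value
            continue
        existing = merged[key]
        combined = [existing] if isinstance(existing, str) else list(existing)
        new_values = [value] if isinstance(value, str) else list(value)
        for new_value in new_values:
            if new_value not in combined:  # assumption: values are kept unique
                combined.append(new_value)
        # Assumption: a single value is stored as a string, not a one-item list.
        merged[key] = combined[0] if len(combined) == 1 else combined
    return merged


tags = merge_tag_values({"unit": "NOK"}, {"unit": "USD", "freq": "M"})
print(tags)
```

Here "unit" ends up with both values in a list, while "freq" is added as a new single-valued tag.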
- delete_dataset_tags(dictionary, *args, **kwargs)¶
Remove selected attributes from dataset tag dictionary.
- Parameters:
dictionary (dict[str, dict[str, str | list[str]] | dict[str, dict[str, str | list[str]]]])
args (str)
kwargs (dict[str, dict[str, str | list[str]]] | bool)
- Return type:
dict[str, dict[str, str | list[str]] | dict[str, dict[str, str | list[str]]]]
- delete_series_tags(dictionary, *args, **kwargs)¶
Remove selected series attributes from series or dataset tag dictionary.
- Return type:
dict[str, dict[str, str | list[str]]] | dict[str, dict[str, str | list[str]] | dict[str, dict[str, str | list[str]]]]
- Parameters:
dictionary (dict[str, dict[str, str | list[str]]] | dict[str, dict[str, str | list[str]] | dict[str, dict[str, str | list[str]]]])
args (str)
kwargs (str | list[str])
- filter_tags(tags, criteria)¶
Filter tags based on the specified criteria.
- Parameters:
tags (dict[str, dict[str, any]]) – The dictionary of tags to filter.
criteria (dict[str, str | list[str]]) – The criteria to filter by. Values can be single strings or lists of strings.
- Returns:
A dictionary of tags that match the criteria.
- Return type:
dict[str, dict[str, any]]
- inherit_set_tags(tags)¶
Return the tags that are inherited from the set.
- Return type:
dict[str, Any]
- Parameters:
tags (dict[str, dict[str, str | list[str]] | dict[str, dict[str, str | list[str]]]] | dict[str, dict[str, str | list[str]]])
- matches_criteria(tag, criteria)¶
Check if a tag matches the specified criteria.
- Parameters:
tag (dict[str, any]) – The tag to check.
criteria (dict[str, str | list[str]]) – The criteria to match against. Values can be single strings or lists of strings.
- Returns:
True if the tag matches the criteria, False otherwise.
- Return type:
bool
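One plausible reading of criteria matching is sketched below. The exact semantics are an assumption: here every attribute in the criteria must be present in the tag, and a list means "any of these values". The helpers `tag_matches` and `filter_by_criteria` are hypothetical names, not the library functions.

```python
from typing import Any


def as_set(value: Any) -> set:
    """Normalise None, a single string or a list of strings to a set."""
    if value is None:
        return set()
    return {value} if isinstance(value, str) else set(value)


def tag_matches(tag: dict[str, Any], criteria: dict[str, Any]) -> bool:
    """Assumed semantics: all criteria keys must match, any listed value suffices."""
    return all(as_set(tag.get(key)) & as_set(wanted) for key, wanted in criteria.items())


def filter_by_criteria(tags: dict[str, dict[str, Any]], criteria: dict[str, Any]) -> dict:
    """Keep only the named tag dicts that satisfy the criteria."""
    return {name: tag for name, tag in tags.items() if tag_matches(tag, criteria)}


series_tags = {
    "x": {"unit": "NOK", "region": ["north", "south"]},
    "y": {"unit": "USD", "region": "north"},
}
selected = filter_by_criteria(series_tags, {"region": "north", "unit": ["NOK", "EUR"]})
print(list(selected))
```

Both series match on region, but only "x" has a unit among the accepted values, so it is the only one kept.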
- permutations(taxonomies, filters='')¶
For a dict of the form {'a': Taxonomy(A), 'b': Taxonomy(B)}, returns permutations of items in A and B, subject to filters.
Filters are experimental and quite likely to change type / implementation. Notably, support for custom functions and include/exclude lists may be considered. For now: str | list[str] with length matching the taxonomies identifies Taxonomy tree functions as follows:
'all' | 'all_nodes' --> .all_nodes()
'parents' | 'parent_nodes' --> .parent_nodes()
'leaves' | 'leaf_nodes' | 'children' | 'child_nodes' --> .leaf_nodes()
If no filters are provided, the default is 'all'.
Examples
TODO: add some. See code for dataset.aggregate() for a notable use case.
- Parameters:
taxonomies (dict[str, Taxonomy])
filters (list[str] | str)
- Return type:
list[dict]
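The shape of the result can be illustrated without a Taxonomy object. The sketch below assumes that the result pairs every selected item of one taxonomy with every selected item of the others (a Cartesian product); the node names are made up, and plain lists stand in for the nodes that the filters would select.

```python
from itertools import product

# Plain lists of made-up node names stand in for the filtered taxonomy nodes.
selected_nodes = {
    "a": ["a1", "a2"],
    "b": ["b1", "b2", "b3"],
}

# One dict per combination, keyed by the same names as the input dict.
combinations = [
    dict(zip(selected_nodes, values))
    for values in product(*selected_nodes.values())
]

print(len(combinations))
print(combinations[0])
```

Each resulting dict has the same keys as the input, which makes it convenient as a set of tag criteria for one combination of taxonomy items.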
- replace_dataset_tags(existing, old, new, recursive=False)¶
Alter selected attribute-value pairs in a tag dictionary.
- Parameters:
existing (dict[str, str | list[str]])
old (dict[str, str | list[str]])
new (dict[str, str | list[str]])
recursive (bool)
- Return type:
dict[str, dict[str, str | list[str]] | dict[str, dict[str, str | list[str]]]]
- rm_tag_values(existing, tags_to_remove, recursive=False)¶
Remove tag values from a tag dict.
Both the values to remove and the existing tag values can be a string or a list of strings.
- Parameters:
existing (dict[str, str | list[str]])
tags_to_remove (dict[str, str | list[str]])
recursive (bool | list[str])
- Return type:
dict[str, str | list[str]]
- rm_tags(existing, *args, recursive=False)¶
Remove attribute from tag dict regardless of value.
- Return type:
dict[str, str | list[str]]
- Parameters:
existing (dict[str, str | list[str]])
args (str)
recursive (bool | list[str])
- search_by_tags(tags, criteria)¶
Filter tags based on the specified criteria.
- Parameters:
tags (dict[str, dict[str, any]]) – The dictionary of tags to filter.
criteria (dict[str, str | list[str]]) – The criteria to filter by. Values can be single strings or lists of strings.
- Returns:
A dictionary of tags that match the criteria.
- Return type:
dict[str, dict[str, any]]
- series_tag_dict_edit(existing, replace, new)¶
Alter selected attributes in a Dataset.tag['series'] dictionary.
Either 'replace' or 'new' (or both) must be specified.
If 'replace == {}', new tags are appended (aka 'tag_series').
If 'new == {}', 'replace' tags are deleted (aka 'detag_series').
If both are specified, 'replace' tags are deleted before 'new' tags are appended (aka 'retag_series').
- Return type:
dict[str, dict[str, str | list[str]]]
- Parameters:
existing (dict[str, dict[str, str | list[str]]])
replace (dict[str, str | list[str]])
new (dict[str, str | list[str]])
- to_tag_value(tag)¶
Convert the input to a tag value: a string or a list of unique strings.
- Return type:
str | list[str]
- Parameters:
tag (str | list[str] | set)
- unique_tag_values(arg)¶
Wraps string input in a list and ensures that the list items are unique.
- Return type:
list[str]
- Parameters:
arg (Any)
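The normalisation described for unique_tag_values can be sketched in a few lines. The helper `as_unique_list` is a hypothetical stand-in, and keeping the order of first occurrence is an assumption; the library function may order the values differently.

```python
from typing import Any


def as_unique_list(arg: Any) -> list[str]:
    """Wrap a single string in a list, then drop duplicate values."""
    values = [arg] if isinstance(arg, str) else list(arg)
    # dict.fromkeys keeps the first occurrence of each value, in order.
    return list(dict.fromkeys(values))


print(as_unique_list("NOK"))
print(as_unique_list(["NOK", "USD", "NOK"]))
```

Treating a bare string as a one-item list lets callers handle single-valued and multi-valued tags with the same code path.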