ssb_timeseries.catalog
The ssb_timeseries.catalog module provides several tools for searching for datasets or series in every Repository of a Catalog.
The catalog is essentially just a logical collection of repositories, providing a search interface across all of them.
Searches can list or count datasets, series, or items (both datasets and series). The search criteria can be complete names (equals), parts of names (contains), or metadata attributes (tags).
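To make the name criteria concrete, here is a minimal sketch of how equals and contains filtering could behave. The helper name_matches is hypothetical and not part of ssb_timeseries; it assumes that an empty string disables a criterion and that, when both criteria are given, a name must satisfy both.

```python
def name_matches(name: str, equals: str = "", contains: str = "") -> bool:
    """Hypothetical helper: apply 'equals' and 'contains' name criteria.

    Assumption: an empty string disables a criterion, and when both are
    given a name must satisfy both.
    """
    if equals and name != equals:
        return False
    if contains and contains not in name:
        return False
    return True


# Made-up names, for illustration only.
names = ["KOSTRA_2023", "KOSTRA_2024", "population"]
kostra = [n for n in names if name_matches(n, contains="KOSTRA")]
exact = [n for n in names if name_matches(n, equals="population")]
everything = [n for n in names if name_matches(n)]
```

With the defaults, every name matches, which mirrors the documented behaviour that the default ‘’ searches within all sets.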
A returned CatalogItem instance is identified by name and descriptive metadata. It also records the repository, the object type, and the relationships to parent and child objects. Other information, such as lineage and data quality metrics, may be added later.
>>> from ssb_timeseries.catalog import Catalog
>>> everything = Catalog().items()
- class Catalog(config)¶
Bases:
_CatalogProtocol
A data catalog collects metadata from one or more physical data repositories and performs searches across them.
Add all repositories in the configuration to catalog object.
Repositories are essentially just named locations.
Example
>>> from ssb_timeseries.catalog import Catalog, Repository
>>> from ssb_timeseries.config import CONFIG
>>> some_directory = CONFIG.catalog
>>> repo1 = Repository(name="test_1", directory=some_directory)
>>> repo2 = Repository(name="test_2", directory=some_directory)
>>> catalog = Catalog(config=[repo1, repo2])
>>> series_in_repo1 = repo1.series(contains='KOSTRA')
>>> sets_in_catalog = catalog.datasets()
- Parameters:
config (list[_FileRepositoryProtocol])
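The idea of a catalog as a logical collection that searches across its repositories can be sketched with in-memory stand-ins. FakeRepository and FakeCatalog below are hypothetical classes written for illustration; they are not the library's Repository and Catalog, and they return plain (repository, dataset) tuples rather than CatalogItem objects.

```python
class FakeRepository:
    """Hypothetical in-memory stand-in for a repository (a named location)."""

    def __init__(self, name: str, dataset_names: list[str]) -> None:
        self.name = name
        self._dataset_names = dataset_names

    def datasets(self, *, contains: str = "") -> list[tuple[str, str]]:
        # An empty 'contains' is a substring of every name, so it matches all.
        return [(self.name, d) for d in self._dataset_names if contains in d]


class FakeCatalog:
    """Hypothetical catalog: delegates to each repository and accumulates."""

    def __init__(self, config: list[FakeRepository]) -> None:
        self.repositories = list(config)

    def datasets(self, *, contains: str = "") -> list[tuple[str, str]]:
        found: list[tuple[str, str]] = []
        for repo in self.repositories:
            found.extend(repo.datasets(contains=contains))
        return found

    def count(self, *, contains: str = "") -> int:
        return len(self.datasets(contains=contains))


repo1 = FakeRepository("test_1", ["KOSTRA_a", "other"])
repo2 = FakeRepository("test_2", ["KOSTRA_b"])
catalog = FakeCatalog(config=[repo1, repo2])
kostra_sets = catalog.datasets(contains="KOSTRA")
```

The catalog holds no data of its own in this sketch; each search is delegated to the repositories and the results are concatenated.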
- __init__(config)¶
Add all repositories in the configuration to catalog object.
Repositories are essentially just named locations.
- Parameters:
config (list[_FileRepositoryProtocol])
- Return type:
None
- __repr__()¶
Return a machine readable string representation that can regenerate the catalog object.
- Return type:
str
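A representation that "can regenerate the object" usually means that evaluating the string rebuilds an equal object. The sketch below shows that convention with a hypothetical NamedLocation class; it does not reproduce the actual string that Catalog or Repository emit.

```python
class NamedLocation:
    """Hypothetical class illustrating a round-trippable __repr__."""

    def __init__(self, name: str = "", directory: str = "") -> None:
        self.name = name
        self.directory = directory

    def __repr__(self) -> str:
        # !r quotes the strings so the output is valid constructor syntax.
        return f"NamedLocation(name={self.name!r}, directory={self.directory!r})"

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, NamedLocation):
            return NotImplemented
        return (self.name, self.directory) == (other.name, other.directory)


location = NamedLocation(name="test_1", directory="/tmp/data")
# eval is used only to demonstrate the round trip on a trusted string.
rebuilt = eval(repr(location))
```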
- count(*, object_type='', equals='', contains='', tags=None)¶
Count items of specified object type that match the criteria.
- Parameters:
object_type (str) – ‘dataset’ or ‘series’.
equals (str) – Search within datasets where names are equal to the argument. The default ‘’ searches within all sets.
contains (str) – Search within datasets where names contain the argument. The default ‘’ searches within all sets.
tags (dict) – Filter the sets or series in the result set by the specified tags. Defaults to None. All tags in the dict must be satisfied for the same series (tags are combined by AND). If a list of values is provided for a tag, the criterion is satisfied by any one of them (OR). Support for list(dict), to satisfy alternative sets of criteria (the dicts would be combined by OR), is planned but not yet implemented.
- Return type:
int
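The tag semantics described above (AND across tag keys, OR across a list of values for one key) can be sketched as follows. The function matches_tags and the example tag names are hypothetical, and the sketch assumes each object carries a single scalar value per tag.

```python
from typing import Any, Optional


def matches_tags(object_tags: dict[str, Any], criteria: Optional[dict[str, Any]]) -> bool:
    """Hypothetical check: every key in criteria must match (AND);
    a list of wanted values matches if any one of them fits (OR)."""
    if not criteria:
        # None or an empty dict applies no tag filter.
        return True
    for key, wanted in criteria.items():
        if key not in object_tags:
            return False
        alternatives = wanted if isinstance(wanted, list) else [wanted]
        if object_tags[key] not in alternatives:
            return False
    return True


# Made-up tags, for illustration only.
series_tags = {"unit": "NOK", "region": "Oslo"}
both_match = matches_tags(series_tags, {"unit": "NOK", "region": ["Oslo", "Bergen"]})
one_fails = matches_tags(series_tags, {"unit": "NOK", "region": "Bergen"})
```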
- datasets(**kwargs)¶
Search in all repositories for datasets that match the criteria.
- Parameters:
equals (str) – Search within datasets where names are equal to the argument. The default ‘’ searches within all sets.
contains (str) – Search within datasets where names contain the argument. The default ‘’ searches within all sets.
tags (dict) – Filter the sets or series in the result set by the specified tags. Defaults to None. All tags in the dict must be satisfied for the same series (tags are combined by AND). If a list of values is provided for a tag, the criterion is satisfied by any one of them (OR). Support for list(dict), to satisfy alternative sets of criteria (the dicts would be combined by OR), is planned but not yet implemented.
- Return type:
list[CatalogItem]
- items(datasets=True, series=True, equals='', contains='', **kwargs)¶
Search all repositories for items (datasets and/or series) and aggregate the results into a single list.
- Return type:
list[CatalogItem]
- Parameters:
datasets (bool)
series (bool)
equals (str)
contains (str)
- series(**kwargs)¶
Search in all datasets in all repositories for series that match the criteria.
- Parameters:
equals (str) – Search within datasets where names are equal to the argument. The default ‘’ searches within all sets.
contains (str) – Search within datasets where names contain the argument. The default ‘’ searches within all sets.
tags (dict) – Filter the sets or series in the result set by the specified tags. Defaults to None. All tags in the dict must be satisfied for the same series (tags are combined by AND). If a list of values is provided for a tag, the criterion is satisfied by any one of them (OR). Support for list(dict), to satisfy alternative sets of criteria (the dicts would be combined by OR), is planned but not yet implemented.
- Return type:
list[CatalogItem]
- class CatalogItem(repository_name, object_name, object_type, object_tags, parent=None, children=None)¶
Bases:
object
A single item (set or series) in the data catalog.
- Parameters:
repository_name (str)
object_name (str)
object_type (str)
object_tags (dict[str, Any])
parent (Any)
children (set[Any] | list[Any] | None)
- __eq__(other)¶
Catalog items are considered equal if object type and object name are equal.
- Return type:
bool
- __hash__()¶
Hash function must be provided to be able to make sets of catalog items.
This implementation requires that for each object_type there is only one item with any given object name in each repository.
- Return type:
int
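The interplay between equality and hashing can be illustrated with a small dataclass. Item below is a hypothetical stand-in, not the library's CatalogItem. It compares on object type and object name, as documented, and hashes on the same pair so that equal items always share a hash; whether the real implementation also uses the repository name is not stated here.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(eq=False)  # eq=False keeps the hand-written __eq__ and __hash__
class Item:
    """Hypothetical stand-in for a catalog item."""

    repository_name: str
    object_name: str
    object_type: str
    object_tags: dict[str, Any] = field(default_factory=dict)

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Item):
            return NotImplemented
        return (self.object_type, self.object_name) == (
            other.object_type,
            other.object_name,
        )

    def __hash__(self) -> int:
        # Assumption: hash on the same fields as __eq__.
        return hash((self.object_type, self.object_name))


a = Item("repo_1", "KOSTRA", "dataset")
b = Item("repo_2", "KOSTRA", "dataset")
c = Item("repo_1", "KOSTRA", "series")
unique = {a, b, c}
```

In this sketch a and b collapse into one set member even though they live in different repositories, which shows why unique names per object type matter when items are collected into sets.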
- children: set[Any] | list[Any] | None = None¶
The children of the object.
- get()¶
Return the dataset.
- Return type:
Any
- has_tags(tags)¶
Check if the catalog item has all tags provided in criteria.
- Return type:
bool
- Parameters:
tags (Any)
- object_name: str¶
The name of the object.
- object_tags: dict[str, Any]¶
The tags of the object.
- object_type: str¶
The type of the object.
- parent: Any = None¶
The parent of the object.
- repository_name: str¶
The repository name that contains the object.
- class ObjectType(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)¶
Bases:
Enum
Supported object types for data catalog items.
- DATASET = 'dataset'¶
- SERIES = 'series'¶
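Because the members carry string values, a plain string such as the object_type argument of count() can be mapped to a member by value. The sketch below re-declares the enum locally with the two documented members so that it runs without the package; the lookup behaviour shown is standard enum.Enum behaviour.

```python
from enum import Enum


class ObjectType(Enum):
    """Local re-declaration mirroring the two documented members."""

    DATASET = "dataset"
    SERIES = "series"


# Look a member up by its value.
kind = ObjectType("series")

# An unsupported string raises ValueError.
try:
    ObjectType("table")
    rejected = False
except ValueError:
    rejected = True
```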
- class Repository(name='', directory='', repo_config=None)¶
Bases:
_CatalogProtocol
A physical storage repository for timeseries datasets.
Initiate one repository.
- Parameters:
name (str)
directory (str)
repo_config (_FileRepositoryProtocol | None)
- __init__(name='', directory='', repo_config=None)¶
Initiate one repository.
- Parameters:
name (str)
directory (str)
repo_config (_FileRepositoryProtocol | None)
- Return type:
None
- __repr__()¶
Return a machine readable string representation that can regenerate the repository object.
- Return type:
str
- count(*, object_type='', **kwargs)¶
Count items of specified object type that match the criteria.
- Parameters:
object_type (str) – ‘dataset’ or ‘series’.
equals (str) – Search within datasets where names are equal to the argument. The default ‘’ searches within all sets.
contains (str) – Search within datasets where names contain the argument. The default ‘’ searches within all sets.
tags (dict) – Filter the sets or series in the result set by the specified tags. Defaults to None. All tags in the dict must be satisfied for the same series (tags are combined by AND). If a list of values is provided for a tag, the criterion is satisfied by any one of them (OR). Support for list(dict), to satisfy alternative sets of criteria (the dicts would be combined by OR), is planned but not yet implemented.
- Return type:
int
- datasets(*, equals='', contains='', tags=None)¶
Search in all repositories for datasets that match the criteria.
- Parameters:
equals (str) – Search within datasets where names are equal to the argument. The default ‘’ searches within all sets.
contains (str) – Search within datasets where names contain the argument. The default ‘’ searches within all sets.
tags (dict) – Filter the sets or series in the result set by the specified tags. Defaults to None. All tags in the dict must be satisfied for the same series (tags are combined by AND). If a list of values is provided for a tag, the criterion is satisfied by any one of them (OR). Support for list(dict), to satisfy alternative sets of criteria (the dicts would be combined by OR), is planned but not yet implemented.
- Return type:
list[CatalogItem]
- directory: str = ''¶
- files(*, contains='', equals='')¶
Return all files in the repository.
- Return type:
list[str]
- Parameters:
contains (str)
equals (str)
- items(datasets=True, series=True, equals='', contains='', tags=None)¶
Search in all repositories for items (either sets or series) that match the criteria.
- Parameters:
datasets (bool) – Search for ‘datasets’.
series (bool) – Search for ‘series’.
equals (str) – Search within datasets where names are equal to the argument. The default ‘’ searches within all sets.
contains (str) – Search within datasets where names contain the argument. The default ‘’ searches within all sets.
tags (dict) – Filter the sets or series in the result set by the specified tags. Defaults to None. All tags in the dict must be satisfied for the same series (tags are combined by AND). If a list of values is provided for a tag, the criterion is satisfied by any one of them (OR). Support for list(dict), to satisfy alternative sets of criteria (the dicts would be combined by OR), is planned but not yet implemented.
- Return type:
list[CatalogItem]
- name: str = ''¶
- series(*, equals='', contains='', tags=None)¶
Search in all datasets in all repositories for series that match the criteria.
- Parameters:
equals (str) – Search within datasets where names are equal to the argument. The default ‘’ searches within all sets.
contains (str) – Search within datasets where names contain the argument. The default ‘’ searches within all sets.
tags (dict) – Filter the sets or series in the result set by the specified tags. Defaults to None. All tags in the dict must be satisfied for the same series (tags are combined by AND). If a list of values is provided for a tag, the criterion is satisfied by any one of them (OR). Support for list(dict), to satisfy alternative sets of criteria (the dicts would be combined by OR), is planned but not yet implemented.
- Return type:
list[CatalogItem]
- class _CatalogProtocol(*args, **kwargs)¶
Bases:
Protocol
Defines the required methods for catalogs and repositories.
Catalogs consist of one or more repositories, and hence perform searches across all repositories and accumulate the results.
- series(equals: str, contains: str, tags: TagDict | list[TagDict]) -> list[CatalogItem]
- datasets(equals: str, contains: str, tags: TagDict | list[TagDict]) -> list[CatalogItem]
- items(equals: str, contains: str, tags: TagDict | list[TagDict]) -> list[CatalogItem]
- count(object_type: str, equals: str, contains: str, tags: TagDict | list[TagDict]) -> int
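A protocol of this kind relies on structural typing: any class that provides the required methods conforms, without inheriting from the protocol. The sketch below uses a reduced, hypothetical protocol named SupportsCount with only a count() method, plus a hypothetical InMemoryStore; neither is part of ssb_timeseries.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class SupportsCount(Protocol):
    """Hypothetical, reduced protocol: only the count() method."""

    def count(self, *, object_type: str = "", equals: str = "", contains: str = "") -> int:
        ...


class InMemoryStore:
    """Satisfies SupportsCount structurally, without inheriting from it."""

    def __init__(self, names_by_type: dict[str, list[str]]) -> None:
        self._names_by_type = names_by_type

    def count(self, *, object_type: str = "", equals: str = "", contains: str = "") -> int:
        # Assumption: an unknown or empty object_type simply matches nothing.
        names = self._names_by_type.get(object_type, [])
        return sum(
            1
            for name in names
            if (not equals or name == equals) and contains in name
        )


store = InMemoryStore({"dataset": ["a", "ab"], "series": ["abc"]})
conforms = isinstance(store, SupportsCount)
datasets_with_b = store.count(object_type="dataset", contains="b")
```

Note that runtime_checkable only verifies that the method exists, not that its signature matches; a static type checker is needed for the latter.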
- count(*, object_type='', equals='', contains='', tags=None)¶
Count items of specified object type that match the criteria.
- Parameters:
object_type (str) – ‘dataset’ or ‘series’.
equals (str) – Search within datasets where names are equal to the argument. The default ‘’ searches within all sets.
contains (str) – Search within datasets where names contain the argument. The default ‘’ searches within all sets.
tags (dict) – Filter the sets or series in the result set by the specified tags. Defaults to None. All tags in the dict must be satisfied for the same series (tags are combined by AND). If a list of values is provided for a tag, the criterion is satisfied by any one of them (OR). Support for list(dict), to satisfy alternative sets of criteria (the dicts would be combined by OR), is planned but not yet implemented.
- Return type:
int
- datasets(*, equals='', contains='', tags=None)¶
Search in all repositories for datasets that match the criteria.
- Parameters:
equals (str) – Search within datasets where names are equal to the argument. The default ‘’ searches within all sets.
contains (str) – Search within datasets where names contain the argument. The default ‘’ searches within all sets.
tags (dict) – Filter the sets or series in the result set by the specified tags. Defaults to None. All tags in the dict must be satisfied for the same series (tags are combined by AND). If a list of values is provided for a tag, the criterion is satisfied by any one of them (OR). Support for list(dict), to satisfy alternative sets of criteria (the dicts would be combined by OR), is planned but not yet implemented.
- Return type:
list[CatalogItem]
- items(*, datasets=True, series=True, equals='', contains='', tags=None)¶
Search in all repositories for items (either sets or series) that match the criteria.
- Parameters:
datasets (bool) – Search for ‘datasets’.
series (bool) – Search for ‘series’.
equals (str) – Search within datasets where names are equal to the argument. The default ‘’ searches within all sets.
contains (str) – Search within datasets where names contain the argument. The default ‘’ searches within all sets.
tags (dict) – Filter the sets or series in the result set by the specified tags. Defaults to None. All tags in the dict must be satisfied for the same series (tags are combined by AND). If a list of values is provided for a tag, the criterion is satisfied by any one of them (OR). Support for list(dict), to satisfy alternative sets of criteria (the dicts would be combined by OR), is planned but not yet implemented.
- series(*, equals='', contains='', tags=None)¶
Search in all datasets in all repositories for series that match the criteria.
- Parameters:
equals (str) – Search within datasets where names are equal to the argument. The default ‘’ searches within all sets.
contains (str) – Search within datasets where names contain the argument. The default ‘’ searches within all sets.
tags (dict) – Filter the sets or series in the result set by the specified tags. Defaults to None. All tags in the dict must be satisfied for the same series (tags are combined by AND). If a list of values is provided for a tag, the criterion is satisfied by any one of them (OR). Support for list(dict), to satisfy alternative sets of criteria (the dicts would be combined by OR), is planned but not yet implemented.
- Return type:
list[CatalogItem]