ssb_timeseries.io.json_metadata
Provides a file-based I/O handler for storing metadata in JSON format.
This handler registers dataset metadata in a central “catalog” directory. The structure is /<repository_catalog_path>/<dataset_name>-metadata.json.
This approach duplicates metadata that might also be stored in data files (like Parquet headers), but provides a fast and searchable central index.
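The catalog layout described above can be illustrated with a minimal sketch that uses only the standard library. The helper names (`metadata_path`, `write_metadata`, `read_metadata`) and the example tags are hypothetical and are not part of this module; the only thing taken from the text is the `<dataset_name>-metadata.json` naming inside a catalog directory.

```python
import json
import tempfile
from pathlib import Path


def metadata_path(catalog_dir: str, set_name: str) -> Path:
    """Build <catalog_dir>/<set_name>-metadata.json (hypothetical helper)."""
    return Path(catalog_dir) / f"{set_name}-metadata.json"


def write_metadata(catalog_dir: str, set_name: str, tags: dict) -> Path:
    """Write one dataset's tags to its own JSON file in the catalog."""
    path = metadata_path(catalog_dir, set_name)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(tags, indent=2), encoding="utf-8")
    return path


def read_metadata(catalog_dir: str, set_name: str) -> dict:
    """Read a dataset's tags back from the catalog."""
    text = metadata_path(catalog_dir, set_name).read_text(encoding="utf-8")
    return json.loads(text)


# A temporary directory stands in for the repository's catalog path.
with tempfile.TemporaryDirectory() as catalog:
    example_tags = {"name": "my_dataset", "versioning": "none"}  # made-up tags
    written = write_metadata(catalog, "my_dataset", example_tags)
    written_name = written.name
    round_tripped = read_metadata(catalog, "my_dataset")
```

Keeping one file per dataset means a lookup by name is a single file read, and a search only has to list one directory.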
- class JsonMetaIO(repository, set_name='')
Bases: object
Provides file-based metadata storage for time series Datasets.
This class handles reading and writing metadata to a central catalog, where each dataset’s metadata is stored in a separate JSON file.
- Parameters:
repository (FileBasedRepository)
set_name (str)
- __init__(repository, set_name='')
Initialize the handler for a given repository and dataset.
- Parameters:
repository (Repository) – The repository configuration dictionary.
set_name (str) – The name of the dataset to operate on.
- Return type:
None
- property dir: str
Return the configured catalog directory path for the repository.
- property exists: bool
Check whether the metadata file for the instance's dataset exists.
- fullpath(set_name='')
Return the full path to a dataset’s metadata file.
- Return type:
str
- Parameters:
set_name (str)
- read(**kwargs)
Read and return the metadata for a given dataset.
- Parameters:
**kwargs – May include ‘set_name’ to override the instance’s default.
- Return type:
dict
- search(**kwargs)
Search the catalog for datasets and series matching given criteria.
- Parameters:
**kwargs – Search criteria including ‘equals’, ‘contains’, ‘pattern’, ‘tags’, ‘datasets’ (bool), and ‘series’ (bool).
- Return type:
list[dict]
- Returns:
A list of dictionaries, where each dictionary represents a matching dataset or series.
- write(tags, set_name)
Write metadata tags to a dataset’s JSON file.
- Parameters:
tags (dict) – The dictionary of metadata to write.
set_name (str) – The name of the dataset.
- Return type:
None
- class SearchResult(name: str, type_directory: str)
Bases: NamedTuple
Represents a single item in a metadata search result.
- Parameters:
name (str)
type_directory (str)
- name: str
Alias for field number 0
- type_directory: str
Alias for field number 1
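Because SearchResult is a NamedTuple, each field can be read by name or by position. The sketch below defines a stand-in class with the same two documented fields rather than importing the real one, and the values are made up for illustration.

```python
from typing import NamedTuple


class SearchResult(NamedTuple):
    """Stand-in mirroring the documented fields, for illustration only."""

    name: str
    type_directory: str


# Made-up values; real results come from a catalog search.
hit = SearchResult(name="my_dataset", type_directory="some_type_dir")

# Tuple unpacking follows the field order: name first, then type_directory.
name, type_directory = hit
```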
- find_metadata_files(path, pattern='', contains='', equals='', **kwargs)
Find metadata JSON files in the catalog directory.
- Parameters:
path (str | PathLike[str]) – The directory path to search in.
pattern (str) – A glob pattern to match against dataset names.
contains (str) – A substring to match within dataset names.
equals (str) – An exact dataset name to match.
**kwargs – Additional arguments passed to the underlying search function.
- Return type:
list[str]
- Returns:
A list of full paths to the matching metadata files.
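To make the three name criteria concrete, here is a rough standard-library sketch of how such a lookup could behave. It is not the library's implementation: `find_files` is a hypothetical stand-in, and the precedence among `equals`, `contains` and `pattern` (first non-empty one wins) is an assumption, as is matching against the file name with the `-metadata.json` suffix removed.

```python
import fnmatch
import tempfile
from pathlib import Path

SUFFIX = "-metadata.json"


def find_files(path, pattern="", contains="", equals=""):
    """Return sorted full paths of metadata files whose dataset name matches.

    Assumption: only the first non-empty criterion among equals,
    contains and pattern is applied; with none set, every file matches.
    """
    matches = []
    for file in Path(path).glob(f"*{SUFFIX}"):
        set_name = file.name[: -len(SUFFIX)]  # strip the metadata suffix
        if equals:
            keep = set_name == equals
        elif contains:
            keep = contains in set_name
        elif pattern:
            keep = fnmatch.fnmatchcase(set_name, pattern)
        else:
            keep = True
        if keep:
            matches.append(str(file))
    return sorted(matches)


# Build a throwaway catalog with three empty metadata files.
with tempfile.TemporaryDirectory() as catalog:
    for set_name in ("prices_monthly", "prices_yearly", "wages_monthly"):
        (Path(catalog) / f"{set_name}{SUFFIX}").write_text("{}", encoding="utf-8")
    by_pattern = [Path(p).name for p in find_files(catalog, pattern="prices_*")]
    by_contains = [Path(p).name for p in find_files(catalog, contains="monthly")]
    by_equals = [Path(p).name for p in find_files(catalog, equals="wages_monthly")]
```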
- tags_from_json(dict_with_json_string, byte_encoded=True)
Deserialize a tag dictionary from the Parquet metadata format.
This is the reverse of tags_to_json, extracting the JSON string from the container dictionary and parsing it.
- Return type:
dict
- Parameters:
dict_with_json_string (dict)
byte_encoded (bool)
- tags_from_json_file(file_or_files)
Read and parse one or more metadata JSON files.
- Return type:
dict[str, Any] | list[dict[str, Any]]
- Parameters:
file_or_files (str | PathLike[str] | list[str | PathLike[str]])
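The return type suggests that a single path yields one dictionary while a list of paths yields a list of dictionaries. A minimal sketch of that dispatch, using a hypothetical `load_tags` helper and made-up file contents, might look like this:

```python
import json
import os
import tempfile
from pathlib import Path


def load_tags(file_or_files):
    """Parse one JSON file into a dict, or several files into a list of dicts."""
    if isinstance(file_or_files, (str, os.PathLike)):
        with open(file_or_files, encoding="utf-8") as f:
            return json.load(f)
    # Anything else is treated as an iterable of paths.
    return [load_tags(f) for f in file_or_files]


with tempfile.TemporaryDirectory() as catalog:
    first = Path(catalog) / "a-metadata.json"
    second = Path(catalog) / "b-metadata.json"
    first.write_text(json.dumps({"name": "a"}), encoding="utf-8")
    second.write_text(json.dumps({"name": "b"}), encoding="utf-8")
    single = load_tags(first)                   # one path gives one dict
    several = load_tags([first, str(second)])   # a list gives a list of dicts
```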
- tags_to_json(x)
Serialize a tag dictionary into a format suitable for Parquet metadata.
This function encodes the entire tag dictionary as a single JSON string within a new dictionary, which is required for compatibility with the PyArrow schema metadata.
- Return type:
dict[str, str]
- Parameters:
x (dict[str, str | list[str]])
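The round trip between tags_to_json and tags_from_json can be sketched with the standard library alone. The container key `"json"` and the helper names `to_container` and `from_container` are assumptions made for illustration; the real functions may use a different key. The bytes conversion simulates the fact that PyArrow returns schema metadata with bytes keys and values, which is presumably what `byte_encoded` refers to.

```python
import json

KEY = "json"  # hypothetical container key, not confirmed by the source


def to_container(tags: dict) -> dict:
    """Pack the whole tag dictionary into one JSON string under a single key."""
    return {KEY: json.dumps(tags)}


def from_container(container: dict, byte_encoded: bool = True) -> dict:
    """Extract the JSON string from the container and parse it back into a dict."""
    if byte_encoded:
        raw = container[KEY.encode("utf-8")].decode("utf-8")
    else:
        raw = container[KEY]
    return json.loads(raw)


tags = {"name": "my_dataset", "series": ["a", "b"]}  # made-up tags
packed = to_container(tags)

# Simulate metadata read back from a file, where keys and values are bytes.
as_bytes = {k.encode("utf-8"): v.encode("utf-8") for k, v in packed.items()}

restored_from_bytes = from_container(as_bytes, byte_encoded=True)
restored_from_str = from_container(packed, byte_encoded=False)
```

Packing everything into one string sidesteps the restriction that schema metadata values must be flat strings or bytes, so nested values such as lists survive the round trip.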