ssb_timeseries.io.json_metadata

Provides a file-based I/O handler for storing metadata in JSON format.

This handler registers dataset metadata in a central “catalog” directory and stores one file per dataset at /<repository_catalog_path>/<dataset_name>-metadata.json.

This approach duplicates metadata that may also be stored in the data files themselves (for example, in Parquet headers), but in exchange it provides a fast, searchable central index.
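A minimal sketch of the catalog layout described above. It assumes only the documented /<repository_catalog_path>/<dataset_name>-metadata.json pattern; the helper name metadata_path is hypothetical and is not part of the module.

```python
from pathlib import Path


def metadata_path(catalog_dir: str, set_name: str) -> Path:
    """Hypothetical helper: build <catalog_dir>/<set_name>-metadata.json."""
    return Path(catalog_dir) / f"{set_name}-metadata.json"


# as_posix() gives forward slashes on every platform.
print(metadata_path("/data/catalog", "sales").as_posix())
# /data/catalog/sales-metadata.json
```

Because each dataset maps to exactly one file name, the catalog can be listed or globbed without opening any data files.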

class JsonMetaIO(repository, set_name='')

Bases: object

Provides file-based metadata storage for time series Datasets.

This class handles reading and writing metadata to a central catalog, where each dataset’s metadata is stored in a separate JSON file.

Parameters:
  • repository (FileBasedRepository)

  • set_name (str)

__init__(repository, set_name='')

Initialize the handler for a given repository and dataset.

Parameters:
  • repository (Repository) – The repository configuration dictionary.

  • set_name (str) – The name of the dataset to operate on.

Return type:

None

property dir: str

Return the configured catalog directory path for the repository.

property exists: bool

Check whether the metadata file for the handler’s dataset exists.

fullpath(set_name='')

Return the full path to a dataset’s metadata file.

Return type:

str

Parameters:

set_name (str)

read(**kwargs)

Read and return the metadata for a given dataset.

Parameters:

**kwargs – May include ‘set_name’ to override the instance’s default.

Return type:

dict

search(**kwargs)

Search the catalog for datasets and series matching given criteria.

Parameters:

**kwargs – Search criteria including ‘equals’, ‘contains’, ‘pattern’, ‘tags’, ‘datasets’ (bool), and ‘series’ (bool).

Return type:

list[dict]

Returns:

A list of dictionaries, where each dictionary represents a matching dataset or series.

write(tags, set_name)

Write metadata tags to a dataset’s JSON file.

Parameters:
  • tags (dict) – The dictionary of metadata to write.

  • set_name (str) – The name of the dataset.

Return type:

None
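The exists/fullpath/read/write contract can be illustrated with a simplified stand-in. This is a sketch under stated assumptions, not the real JsonMetaIO: the class MiniJsonMetaIO is hypothetical, it takes a plain catalog directory instead of a repository object, and it assumes read() falls back to the instance’s set_name when none is passed.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


class MiniJsonMetaIO:
    """Hypothetical, simplified stand-in for JsonMetaIO.

    Uses a plain catalog directory instead of a repository object.
    """

    def __init__(self, catalog_dir: str, set_name: str = "") -> None:
        self.catalog_dir = Path(catalog_dir)
        self.set_name = set_name

    def fullpath(self, set_name: str = "") -> Path:
        # Fall back to the instance's default dataset name.
        name = set_name or self.set_name
        return self.catalog_dir / f"{name}-metadata.json"

    @property
    def exists(self) -> bool:
        return self.fullpath().is_file()

    def write(self, tags: Dict[str, Any], set_name: str) -> None:
        path = self.fullpath(set_name)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(tags, indent=2), encoding="utf-8")

    def read(self, **kwargs: Any) -> Dict[str, Any]:
        # 'set_name' in kwargs overrides the instance default, as documented.
        path = self.fullpath(kwargs.get("set_name", ""))
        return json.loads(path.read_text(encoding="utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    handler = MiniJsonMetaIO(tmp, set_name="sales")
    print(handler.exists)  # False
    handler.write({"name": "sales", "series": {}}, set_name="sales")
    print(handler.exists)  # True
    print(handler.read())  # {'name': 'sales', 'series': {}}
```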

class SearchResult(name: str, type_directory: str)

Bases: NamedTuple

Represents a single item in a metadata search result.

Parameters:
  • name (str)

  • type_directory (str)

name: str

Alias for field number 0

type_directory: str

Alias for field number 1
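Since SearchResult is documented as a NamedTuple with the fields name and type_directory, it behaves like an ordinary tuple whose fields can also be accessed by name. The sketch below redeclares the type from its documented signature; the field values are hypothetical.

```python
from typing import NamedTuple


class SearchResult(NamedTuple):
    """Redeclared from the documented signature for illustration."""

    name: str            # field number 0
    type_directory: str  # field number 1


hit = SearchResult(name="sales", type_directory="example_type_dir")  # hypothetical values
print(hit[0] == hit.name)  # True
name, type_dir = hit       # unpacks like a plain tuple
print(hit._asdict())       # {'name': 'sales', 'type_directory': 'example_type_dir'}
```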

find_metadata_files(path, pattern='', contains='', equals='', **kwargs)

Find metadata JSON files in the catalog directory.

Parameters:
  • path (str | PathLike[str]) – The directory path to search in.

  • pattern (str) – A glob pattern to match against dataset names.

  • contains (str) – A substring to match within dataset names.

  • equals (str) – An exact dataset name to match.

  • **kwargs – Additional arguments passed to the underlying search function.

Return type:

list[str]

Returns:

A list of full paths to the matching metadata files.
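A possible reading of the three name filters is sketched below. It is a hypothetical re-implementation, not the module’s code: it assumes the filters apply to the dataset name (the file name minus the -metadata.json suffix), that several filters combine with AND, and that pattern is a case-sensitive glob. The function name find_metadata_files_sketch is hypothetical.

```python
import fnmatch
import tempfile
from pathlib import Path
from typing import List

SUFFIX = "-metadata.json"


def find_metadata_files_sketch(path, pattern="", contains="", equals="") -> List[str]:
    """Hypothetical sketch; the real filter semantics may differ."""
    results = []
    for file in sorted(Path(path).glob(f"*{SUFFIX}")):
        name = file.name[: -len(SUFFIX)]  # strip the suffix to get the dataset name
        if equals and name != equals:
            continue
        if contains and contains not in name:
            continue
        if pattern and not fnmatch.fnmatchcase(name, pattern):
            continue
        results.append(str(file))
    return results


with tempfile.TemporaryDirectory() as tmp:
    for n in ("sales", "sales_daily", "prices"):
        (Path(tmp) / f"{n}{SUFFIX}").write_text("{}")
    print([Path(p).name for p in find_metadata_files_sketch(tmp, contains="sales")])
    # ['sales-metadata.json', 'sales_daily-metadata.json']
```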

tags_from_json(dict_with_json_string, byte_encoded=True)

Deserialize a tag dictionary from the Parquet metadata format.

This is the inverse of tags_to_json: it extracts the JSON string from the container dictionary and parses it.

Return type:

dict

Parameters:
  • dict_with_json_string (dict)

  • byte_encoded (bool)

tags_from_json_file(file_or_files)

Read and parse one or more metadata JSON files.

Return type:

dict[str, Any] | list[dict[str, Any]]

Parameters:

file_or_files (str | PathLike[str] | list[str | PathLike[str]])
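The documented return type depends on the argument: a single path yields one dict, and a list of paths yields a list of dicts. A minimal sketch of that dispatch, assuming each file holds one JSON object; read_tag_files is a hypothetical name.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Dict, List, Union

PathLike = Union[str, os.PathLike]


def read_tag_files(
    file_or_files: Union[PathLike, List[PathLike]],
) -> Union[Dict[str, Any], List[Dict[str, Any]]]:
    """Hypothetical sketch: one path -> one dict, a list of paths -> a list of dicts."""
    if isinstance(file_or_files, list):
        return [read_tag_files(f) for f in file_or_files]
    return json.loads(Path(file_or_files).read_text(encoding="utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    a = Path(tmp) / "a-metadata.json"
    b = Path(tmp) / "b-metadata.json"
    a.write_text(json.dumps({"name": "a"}), encoding="utf-8")
    b.write_text(json.dumps({"name": "b"}), encoding="utf-8")
    print(read_tag_files(a))       # {'name': 'a'}
    print(read_tag_files([a, b]))  # [{'name': 'a'}, {'name': 'b'}]
```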

tags_to_json(x)

Serialize a tag dictionary into a format suitable for Parquet metadata.

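This function encodes the entire tag dictionary as a single JSON string, stored as the only value in a new dictionary. This is required for compatibility with PyArrow schema metadata, which is a flat mapping of string (or bytes) keys to string (or bytes) values and cannot hold nested structures such as lists directly.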

Return type:

dict[str, str]

Parameters:

x (dict[str, str | list[str]])
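The round trip between tags_to_json and tags_from_json can be sketched as follows. The container key used by the real functions is not documented here, so the sketch uses a hypothetical key "json". It also assumes that byte_encoded=True covers the case where the metadata comes back from PyArrow, whose Schema.metadata returns bytes keys and bytes values. The function names ending in _sketch are hypothetical.

```python
import json
from typing import Dict, List, Union

JSON_KEY = "json"  # hypothetical container key; the real key is not documented here
Tags = Dict[str, Union[str, List[str]]]


def tags_to_json_sketch(x: Tags) -> Dict[str, str]:
    # Pack the whole tag dict into ONE string value so it fits flat schema metadata.
    return {JSON_KEY: json.dumps(x, ensure_ascii=False)}


def tags_from_json_sketch(d: dict, byte_encoded: bool = True) -> dict:
    if byte_encoded:
        # Assumed input: bytes keys and bytes values, as returned by PyArrow.
        return json.loads(d[JSON_KEY.encode("utf-8")].decode("utf-8"))
    return json.loads(d[JSON_KEY])


tags = {"name": "sales", "unit": "NOK", "labels": ["a", "b"]}
packed = tags_to_json_sketch(tags)
# Simulate what a Parquet write/read cycle hands back: bytes -> bytes.
as_bytes = {k.encode("utf-8"): v.encode("utf-8") for k, v in packed.items()}
print(tags_from_json_sketch(as_bytes) == tags)                   # True
print(tags_from_json_sketch(packed, byte_encoded=False) == tags)  # True
```

Packing everything under one key keeps list-valued tags intact, at the cost of having to parse the whole string to read any single tag.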