ssb_timeseries.io.protocols

Defines the structural contracts for I/O handlers using typing.Protocol.

This module specifies the formal API that a custom I/O plugin must adhere to. Because these are Protocols (structural typing), plugin authors can write handler classes that are compatible with ssb-timeseries without inheriting from any of its base classes: any class with matching method signatures conforms. This keeps plugins decoupled from the library's class hierarchy.
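
Structural typing means conformance is decided by a class's shape alone, not its ancestry. A minimal, self-contained sketch of the mechanism (the `Reader` and `CsvReader` names are illustrative and not part of ssb-timeseries):

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Reader(Protocol):
    """A tiny protocol: any class with a matching read() conforms."""

    def read(self) -> str: ...


class CsvReader:
    """No inheritance from Reader, yet it satisfies the protocol."""

    def read(self) -> str:
        return "a;b;c"


assert isinstance(CsvReader(), Reader)  # structural match, not nominal
```

Note that `isinstance` checks against a `runtime_checkable` Protocol only verify that the methods exist; static type checkers such as mypy additionally verify the signatures.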

class DataReadWrite(repository, set_name, set_type, as_of_utc=None, **kwargs)

Bases: Protocol

Defines the contract (protocol) for data IO handlers.

Parameters:
  • repository (str | dict)

  • set_name (str)

  • set_type (str)

  • as_of_utc (datetime | None)

__init__(repository, set_name, set_type, as_of_utc=None, **kwargs)

Initialize the IO handler with configuration for a specific data storage.

This constructor is called by the IO dispatcher. It binds the handler instance to a specific repository, dataset, and (optionally) version.

Parameters:
  • repository (str | dict) – The data repository name or configuration.

  • set_name (str) – The dataset name.

  • set_type (str) – The data type for the dataset.

  • as_of_utc (datetime | None) – The version marker (should be timezone aware).

  • **kwargs – Any parameters defined for the handler in the configuration.

Return type:

None

property exists: bool

Check if the dataset exists in the configured storage.

read(*args, **kwargs)

Read data from the configured storage.

Return type:

typing.Any

Returns:

The dataset’s data in a dataframe-like format (e.g., PyArrow Table). If the data does not exist, an empty dataframe should be returned.

versions(*args, **kwargs)

Retrieve a list of available versions for the dataset.

Return type:

list[datetime | str]

Returns:

A sorted list of version identifiers (datetimes or strings).

write(data, tags=None)

Write the dataset’s data to the configured storage.

This method should handle both the creation of new data files and the updating/merging of data into existing files, depending on the versioning strategy of the dataset.

Parameters:
  • data (typing.Any) – The data to be written (e.g., a pandas DataFrame or PyArrow Table).

  • tags (dict | None) – A dictionary of metadata tags to be stored with the data, often in the file’s schema.

Return type:

None
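
To make the contract concrete, here is a toy handler whose methods mirror the DataReadWrite signatures above. It is a sketch only: the class name is hypothetical, and data lives in an in-memory dict keyed by (set_name, as_of_utc) rather than in real storage such as Parquet files.

```python
from datetime import datetime, timezone
from typing import Any


class InMemoryDataHandler:
    """Toy handler matching the DataReadWrite shape (illustrative only)."""

    _store: dict[tuple, Any] = {}  # (set_name, as_of_utc) -> data

    def __init__(self, repository, set_name, set_type, as_of_utc=None, **kwargs):
        self.repository = repository
        self.set_name = set_name
        self.set_type = set_type
        self.as_of_utc = as_of_utc

    @property
    def exists(self) -> bool:
        # True if any version of this dataset has been written.
        return any(key[0] == self.set_name for key in self._store)

    def read(self, *args, **kwargs) -> Any:
        # Return the stored data, or an empty container if nothing exists.
        return self._store.get((self.set_name, self.as_of_utc), [])

    def versions(self, *args, **kwargs) -> list:
        # Sorted version markers recorded for this dataset.
        return sorted(
            k[1] for k in self._store if k[0] == self.set_name and k[1] is not None
        )

    def write(self, data, tags=None) -> None:
        # A real handler would persist data (and tags) to storage here.
        self._store[(self.set_name, self.as_of_utc)] = data


as_of = datetime(2024, 1, 1, tzinfo=timezone.utc)
handler = InMemoryDataHandler("demo_repo", "toy_set", "simple", as_of_utc=as_of)
handler.write([1, 2, 3], tags={"unit": "count"})
assert handler.exists
assert handler.read() == [1, 2, 3]
```

A real plugin would return a dataframe-like object (e.g., a PyArrow Table) from `read` and map `repository` onto an actual storage location; the point here is only that matching the method names, signatures, and return types is sufficient to satisfy the protocol.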

class MetadataReadWrite(repository, set_name, **kwargs)

Bases: Protocol

Defines the contract (protocol) for metadata IO handlers.

Parameters:
  • repository (str | dict)

  • set_name (str)

__init__(repository, set_name, **kwargs)

Initialize the IO handler for a specific metadata storage.

Parameters:
  • repository (str | dict) – The metadata repository name or configuration.

  • set_name (str) – The dataset name to operate on.

  • **kwargs – Any parameters defined for the handler in the configuration.

Return type:

None

exists(name)

Check if metadata for a given dataset name exists.

Parameters:
  • name (str) – The dataset name to check for.

Return type:

bool

find(**kwargs)

Find datasets in the configured storage based on metadata criteria.

Return type:

bool

read(**kwargs)

Read metadata from the configured storage.

Return type:

dict[str, typing.Any]

Returns:

A dictionary containing the metadata tags for the dataset.

classmethod search(**kwargs)

Search and retrieve metadata from the configured storage.

This method should allow searching for datasets based on various metadata criteria.

Return type:

dict[str, typing.Any]

Returns:

A dictionary or list of dictionaries containing the search results.

write(**kwargs)

Write metadata to the configured storage.

Return type:

None
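
Analogously, a toy handler matching the MetadataReadWrite shape might look like the sketch below. The class name and in-memory storage are illustrative; note that `find` is declared to return a bool in the protocol above, which is read here as "does any dataset match the criteria?".

```python
from typing import Any


class InMemoryMetaHandler:
    """Toy handler matching the MetadataReadWrite shape (illustrative only)."""

    _meta: dict[str, dict[str, Any]] = {}  # set_name -> metadata tags

    def __init__(self, repository, set_name, **kwargs):
        self.repository = repository
        self.set_name = set_name

    def exists(self, name: str) -> bool:
        return name in self._meta

    def find(self, **kwargs) -> bool:
        # True if any stored dataset's tags match every given criterion.
        return any(
            all(tags.get(k) == v for k, v in kwargs.items())
            for tags in self._meta.values()
        )

    def read(self, **kwargs) -> dict[str, Any]:
        return self._meta.get(self.set_name, {})

    @classmethod
    def search(cls, **kwargs) -> dict[str, Any]:
        # All datasets whose tags match every given criterion.
        return {
            name: tags
            for name, tags in cls._meta.items()
            if all(tags.get(k) == v for k, v in kwargs.items())
        }

    def write(self, **kwargs) -> None:
        self._meta[self.set_name] = dict(kwargs)


meta = InMemoryMetaHandler("demo_repo", "toy_set")
meta.write(unit="count", frequency="monthly")
assert meta.exists("toy_set")
assert meta.read() == {"unit": "count", "frequency": "monthly"}
```

As with the data handler, the dispatcher only requires that the method names and signatures line up; how the metadata is actually stored (JSON files, a database, an object store) is entirely up to the plugin.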