ssb_timeseries.io
The IO module provides the high-level facade for all data and metadata I/O.
This module serves as the single, authoritative entry point for all storage operations. The functions exposed here are intended to be the exclusive interface used by the rest of the library (such as the Dataset class) to interact with the storage layer. This includes read_data, read_metadata, save, search, find, persist, and versions.
This facade design decouples the core application logic from the specifics of the storage backends. The underlying implementation is a pluggable, configuration-driven system that dispatches tasks to the appropriate backend handler based on the active project configuration.
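The configuration-driven dispatch described above can be sketched roughly as follows. This is a minimal illustration, not the library's actual implementation: `HANDLERS`, `CONFIG`, `InMemoryHandler`, `get_handler` and `facade_read` are hypothetical names, and the layout of the configuration dictionary is an assumption made for the example.

```python
from typing import Callable, Protocol


class DataHandler(Protocol):
    """Minimal interface every backend in this sketch must satisfy."""

    def read(self, set_name: str) -> list[dict]: ...


class InMemoryHandler:
    """Hypothetical backend that serves rows from a plain dictionary."""

    def __init__(self, options: dict) -> None:
        self._store: dict[str, list[dict]] = options.get("seed", {})

    def read(self, set_name: str) -> list[dict]:
        return self._store.get(set_name, [])


# Registry mapping the handler name used in the configuration to a factory.
HANDLERS: dict[str, Callable[[dict], DataHandler]] = {
    "in_memory": InMemoryHandler,
}

# Hypothetical project configuration; the real schema is not shown here.
CONFIG = {
    "repositories": {
        "demo": {
            "handler": "in_memory",
            "options": {"seed": {"prices": [{"t": 1, "v": 2.0}]}},
        },
    },
}


def get_handler(repository: str) -> DataHandler:
    """Look up the repository's configuration and build its handler."""
    repo_cfg = CONFIG["repositories"][repository]
    factory = HANDLERS[repo_cfg["handler"]]
    return factory(repo_cfg["options"])


def facade_read(repository: str, set_name: str) -> list[dict]:
    """Facade function: callers never touch a concrete handler class."""
    return get_handler(repository).read(set_name)
```

In this sketch, swapping the backend means changing the `handler` entry in the configuration; code calling `facade_read` stays the same.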
### Convenience Wrappers (DataIO, MetaIO)
The Dataset class does not call the dispatch mechanism directly. Instead, it uses two helper classes, DataIO and MetaIO. An instance such as DataIO(my_dataset) holds the dataset context, and its .dh property calls _io_handler to get the appropriate data handler on demand. MetaIO does the same for metadata. This keeps the interaction simple from the Dataset class’s perspective.
Internal components like DataIO, MetaIO, and the concrete handler modules (e.g., ssb_timeseries.io.pyarrow_simple) are considered implementation details of this facade. They should not be imported or used directly by other parts of the application.
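The wrapper pattern described above, where an object holds the dataset context and resolves its handler through a property, might look something like the sketch below. All names here (`FakeDataset`, `CsvHandler`, `ParquetHandler`, `resolve_handler`, `DataWrapper`) are hypothetical stand-ins; the real `DataIO` and `_io_handler` may differ in every detail. Resolving the handler on every access is a choice made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class FakeDataset:
    """Stand-in for Dataset with only the fields this sketch needs."""

    name: str
    repository: str


class CsvHandler:
    """Hypothetical handler for one storage format."""

    def __init__(self, ds: FakeDataset) -> None:
        self.ds = ds

    def describe(self) -> str:
        return f"csv:{self.ds.name}"


class ParquetHandler:
    """Hypothetical handler for another storage format."""

    def __init__(self, ds: FakeDataset) -> None:
        self.ds = ds

    def describe(self) -> str:
        return f"parquet:{self.ds.name}"


def resolve_handler(ds: FakeDataset):
    """Hypothetical dispatch function: pick a handler from the repository."""
    handlers = {"archive": CsvHandler, "main": ParquetHandler}
    return handlers[ds.repository](ds)


class DataWrapper:
    """Holds the dataset context and exposes the handler as a property."""

    def __init__(self, ds: FakeDataset) -> None:
        self.ds = ds

    @property
    def dh(self):
        # Resolved on every access, so the caller never stores a handler.
        return resolve_handler(self.ds)
```

A caller would then write `DataWrapper(ds).dh.describe()` without knowing which handler class is behind `dh`.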
- class DataIO(ds)
Bases: object
Provides a generic IO interface for the data of a specific dataset.
- Parameters:
ds (Dataset)
- __init__(ds)
Initialize the data IO handler for the given Dataset.
- Parameters:
ds (Dataset)
- Return type:
None
- property dh: DataReadWrite
Expose the configured IO handler for data operations.
- class MetaIO(ds=None, repository='')
Bases: object
Provides a generic IO interface for the metadata of a specific dataset.
- Parameters:
ds (Dataset | None)
repository (str)
- __init__(ds=None, repository='')
Initialize the metadata IO handler.
The handler can be bound to a Dataset instance or a repository name.
- Parameters:
ds (Dataset | None)
repository (str)
- Return type:
None
- property dh: MetadataReadWrite
Expose the configured IO handler for metadata operations.
- read(set_name='')
Read metadata for a given dataset.
- Return type:
dict
- Parameters:
set_name (str)
- search(**kwargs)
Search for datasets within a single repository.
- Return type:
list[dict]
- write(set_name='', tags=None)
Write metadata for a given dataset.
- Return type:
None
- Parameters:
set_name (str)
tags (dict[str, str | list[str]] | None)
- find(set_name='', repository='', require_one=False, require_unique=False, **kwargs)
Find dataset metadata by name in specified or all repositories.
- Parameters:
set_name (str) – The name of the dataset to find.
repository (str | dict) – The specific repository to search in. If empty, searches all.
require_one (bool) – If True, raises an error if no results are found.
require_unique (bool) – If True, raises an error if more than one result is found.
**kwargs – Unused, but present for compatibility.
- Return type:
list[dict] | dict
- Returns:
A single dictionary if one result is found, otherwise a list of dictionaries.
- Raises:
LookupError – If require_one or require_unique is True and the number of results does not match the requirement.
- persist(ds)
Copy a dataset snapshot to its configured immutable and shared locations.
This function relies on a snapshots section being defined in the project configuration. The dataset’s process_stage and sharing attributes determine the exact destination paths.
See also
For detailed configuration examples, refer to the guide on Configure I/O.
- Parameters:
ds (Dataset) – The Dataset object to persist.
- Return type:
None
- read_data(repository, set_name, as_of_tz=None)
Read the data for a single dataset into a dataframe.
- Parameters:
repository (str | dict) – The repository name or configuration dictionary.
set_name (str) – The name of the dataset.
as_of_tz (datetime | None) – The version timestamp if the dataset is versioned.
- Return type:
Union[narwhals.typing.IntoDataFrame, narwhals.typing.IntoLazyFrame]
- Returns:
A dataframe containing the dataset’s data.
- read_metadata(repository, set_name)
Read the metadata for a single dataset from the configured handler.
- Parameters:
repository (str | dict) – The repository name or configuration dictionary.
set_name (str) – The name of the dataset.
- Return type:
dict
- Returns:
A dictionary containing the dataset’s metadata.
- save(ds)
Write a dataset’s data and metadata to storage.
- Parameters:
ds (Dataset) – The Dataset object to save.
- Return type:
None
- search(**kwargs)
Search for datasets or series across one or more repositories.
- Parameters:
**kwargs – Search criteria such as ‘equals’, ‘contains’, ‘pattern’, ‘tags’, and ‘repositories’. See JsonMetaIO.search for details.
- Return type:
list[dict]
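As a rough illustration of how keyword criteria such as those listed for search could filter a metadata catalog, consider the sketch below. The semantics are assumptions made for the example (exact name match for `equals`, substring match for `contains`, regular expression for `pattern`, key/value equality for `tags`); the authoritative behaviour is defined by JsonMetaIO.search. The helpers `matches` and `search_catalog`, the `CATALOG` data, and the `name`/`tags` record layout are all hypothetical.

```python
import re

# Hypothetical metadata records; the real metadata layout is not shown here.
CATALOG = [
    {"name": "price_index", "tags": {"unit": "index"}},
    {"name": "gdp_2020", "tags": {"unit": "nok"}},
    {"name": "gdp_2021", "tags": {"unit": "nok"}},
]


def matches(meta, equals=None, contains=None, pattern=None, tags=None):
    """Return True if a record satisfies every criterion that was given."""
    name = meta.get("name", "")
    if equals is not None and name != equals:
        return False
    if contains is not None and contains not in name:
        return False
    if pattern is not None and re.search(pattern, name) is None:
        return False
    if tags is not None:
        meta_tags = meta.get("tags", {})
        # Assumed semantics: every requested tag must match exactly.
        for key, wanted in tags.items():
            if meta_tags.get(key) != wanted:
                return False
    return True


def search_catalog(catalog, **criteria):
    """Filter the catalog, combining all criteria with logical AND."""
    return [meta for meta in catalog if matches(meta, **criteria)]
```

For instance, `search_catalog(CATALOG, pattern=r"^gdp_\d{4}$", tags={"unit": "nok"})` selects the two `gdp_` records in this sample data.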