ssb_timeseries.io.pyarrow_hive

Provides a Hive-partitioned, file-based I/O handler for the Parquet format.

This handler stores datasets in a wide format (series as columns) with embedded metadata, using a fixed Hive-partitioned directory structure. For example:

<repository_root>/
├── data_type=AS_OF_AT/
│   └── dataset=my_versioned_dataset/
│       ├── as_of=2023-01-01T120000+0000/
│       │   └── part-0.parquet
│       └── as_of=2023-01-02T120000+0000/
│           └── part-0.parquet
└── data_type=NONE_AT/
    └── dataset=my_dataset/
        └── as_of=__HIVE_DEFAULT_PARTITION__/
            └── part-0.parquet

class HiveFileSystem(repository, set_name, set_type, as_of_utc=None, **kwargs)

Bases: object

A filesystem abstraction for reading and writing Hive-partitioned datasets.

Parameters:
  • repository (Any)

  • set_name (str)

  • set_type (types.SeriesType)

  • as_of_utc (datetime | None)

  • kwargs (dict[str, Any])

__init__(repository, set_name, set_type, as_of_utc=None, **kwargs)

Initialize the filesystem handler for a given dataset.

Parameters:
  • repository (Any)

  • set_name (str)

  • set_type (SeriesType)

  • as_of_utc (datetime | None)

  • kwargs (dict[str, Any])

Return type:

None

property directory: str

Return the data directory for the dataset.

property exists: bool

Check if the dataset directory exists.

read(*args, **kwargs)

Read a partitioned dataset from the filesystem.

Return type:

narwhals.typing.FrameT

property root: str

Return the root path of the configured repository.

versions()

List available dataset versions by inspecting the as_of= partition subdirectories.

Return type:

list[datetime | str]

write(data, tags=None)

Write data to the filesystem, partitioned according to the dataset's versioning scheme.

Parameters:
  • data (narwhals.typing.FrameT)

  • tags (dict | None)

Return type:

None