ssb_timeseries.io.pyarrow_hive¶
Provides a Hive-partitioned, file-based I/O handler for the Parquet format.
This handler stores each dataset in wide format (one column per series) with embedded metadata, using a fixed Hive-partitioned directory layout. For example:
<repository_root>/
├── data_type=AS_OF_AT/
│   └── dataset=my_versioned_dataset/
│       ├── as_of=2023-01-01T120000+0000/
│       │   └── part-0.parquet
│       └── as_of=2023-01-02T120000+0000/
│           └── part-0.parquet
└── data_type=NONE_AT/
    └── dataset=my_dataset/
        └── as_of=__HIVE_DEFAULT_PARTITION__/
            └── part-0.parquet
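Because the key=value directory names follow the standard Hive convention, the stored files can typically also be inspected directly with PyArrow's dataset API, independently of this handler. A minimal sketch, assuming a hypothetical repository root path:
import pyarrow.dataset as ds

# Hypothetical repository root; substitute your configured path.
repo_root = "/path/to/repository_root"

# PyArrow discovers data_type, dataset and as_of as partition columns
# from the key=value directory names when "hive" partitioning is used.
hive = ds.dataset(repo_root, format="parquet", partitioning="hive")

# Read only one dataset's files; __HIVE_DEFAULT_PARTITION__ maps to a null as_of.
table = hive.to_table(filter=ds.field("dataset") == "my_dataset")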
- class HiveFileSystem(repository, set_name, set_type, as_of_utc=None, **kwargs)¶
Bases: object
A filesystem abstraction for reading and writing Hive-partitioned datasets.
- Parameters:
repository (Any)
set_name (str)
set_type (types.SeriesType)
as_of_utc (datetime | None)
kwargs (dict[str, Any])
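A minimal construction sketch. The repository value, the SeriesType import path, and the AS_OF_AT helper shown here are assumptions for illustration; use the repository and series type configured in your project:
from datetime import datetime, timezone

from ssb_timeseries.io.pyarrow_hive import HiveFileSystem
from ssb_timeseries.properties import SeriesType  # import path is an assumption

# Hypothetical repository root; a configured repository object works the same way.
repo = "/path/to/repository_root"

fs = HiveFileSystem(
    repository=repo,
    set_name="my_versioned_dataset",
    set_type=SeriesType.as_of_at(),  # assumed helper for a versioned (AS_OF_AT) type
    as_of_utc=datetime(2023, 1, 1, 12, 0, tzinfo=timezone.utc),
)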
- __init__(repository, set_name, set_type, as_of_utc=None, **kwargs)¶
Initialize the filesystem handler for a given dataset.
- Parameters:
repository (Any)
set_name (str)
set_type (SeriesType)
as_of_utc (datetime | None)
kwargs (dict[str, Any])
- Return type:
None
- property directory: str¶
Return the data directory for the dataset.
- property exists: bool¶
Check if the dataset directory exists.
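The path properties can be combined to check whether anything has been written yet. A short sketch, continuing from the fs instance constructed above:
print(fs.directory)  # data directory for this dataset within the repository
if not fs.exists:
    print("No data has been written for this dataset yet.")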
- read(*args, **kwargs)¶
Read a partitioned dataset from the filesystem.
- Return type:
narwhals.typing.FrameT
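A read sketch, continuing from the fs instance above; the returned frame uses the wide layout described at the top of this page:
df = fs.read()     # wide frame: one column per series
print(df.columns)  # assumes a pandas- or polars-like frame with a .columns attribute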
- property root: str¶
Return the root path of the configured repository.
- versions()¶
List available versions by inspecting subdirectories.
- Return type:
list[datetime | str]
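A sketch listing the stored versions found in the partition subdirectories, continuing from the fs instance above:
for version in fs.versions():
    print(version)  # datetime for versioned (AS_OF) datasets, otherwise a string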
- write(data, tags=None)¶
Write data to the filesystem, partitioned by versioning scheme.
- Parameters:
data (narwhals.typing.FrameT)
tags (dict | None)
- Return type:
None
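A write sketch, continuing from the fs instance above. The column names and tags are illustrative assumptions; the handler expects data in the wide format described at the top of this page:
import pandas as pd

# Hypothetical wide-format frame: a date column plus one column per series.
data = pd.DataFrame(
    {
        "valid_at": pd.date_range("2023-01-01", periods=3, freq="MS"),  # column name is an assumption
        "series_a": [1.0, 2.0, 3.0],
        "series_b": [4.0, 5.0, 6.0],
    }
)

# For AS_OF datasets, the as_of_utc passed to the constructor determines
# which partition the files are written to.
fs.write(data, tags={"source": "example"})  # tags: optional metadata, illustrative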