ssb_timeseries.io.pyarrow_simple
Provides a simple, file-based I/O handler for the Parquet format, built on PyArrow.
This handler stores datasets in a wide format (series as columns) with embedded metadata, using a defined directory structure. For example:
<repository_root>/
├── AS_OF_AT/
│   └── my_versioned_dataset/
│       ├── my_versioned_dataset-as_of_20230101T120000+0000-data.parquet
│       └── my_versioned_dataset-as_of_20230102T120000+0000-data.parquet
└── NONE_AT/
    └── my_dataset/
        └── my_dataset-latest-data.parquet
It uses PyArrow for eager reading and writing.
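The layout above can be sketched as a small path-building helper. This is an illustration only, not the handler's actual code: the function `sketch_data_path` and its arguments are hypothetical, and the naming pattern is inferred from the example tree.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def sketch_data_path(root, type_directory, set_name, as_of_utc=None):
    """Build a data file path that mirrors the example tree (hypothetical helper)."""
    if as_of_utc is None:
        # Unversioned datasets keep a single "latest" file.
        marker = "latest"
    else:
        # Versioned datasets get one file per as-of timestamp, rendered in UTC.
        # The datetime is assumed to be timezone-aware.
        stamp = as_of_utc.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%S%z")
        marker = f"as_of_{stamp}"
    filename = f"{set_name}-{marker}-data.parquet"
    return PurePosixPath(root) / type_directory / set_name / filename


versioned = sketch_data_path(
    "repo",
    "AS_OF_AT",
    "my_versioned_dataset",
    datetime(2023, 1, 1, 12, 0, tzinfo=timezone.utc),
)
latest = sketch_data_path("repo", "NONE_AT", "my_dataset")
```

In the real handler the equivalent values are exposed through the `directory`, `filename` and `fullpath` properties of `FileSystem`.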
- class FileSystem(repository, set_name, set_type, as_of_utc=None, process_stage='statistikk', sharing=None)
Bases: object
A filesystem abstraction for reading and writing dataset data.
- Parameters:
repository (Any)
set_name (str)
set_type (types.SeriesType)
as_of_utc (datetime | None)
process_stage (str)
sharing (dict | None)
- __init__(repository, set_name, set_type, as_of_utc=None, process_stage='statistikk', sharing=None)
Initialize the filesystem handler for a given dataset.
This method calculates the necessary directory structure based on the dataset’s type and name.
- Parameters:
repository (Any)
set_name (str)
set_type (SeriesType)
as_of_utc (datetime | None)
process_stage (str)
sharing (dict | None)
- Return type:
None
- property directory: str
Return the data directory for the dataset.
- property exists: bool
Check if the data file for the dataset exists.
- property filename: str
Construct the standard filename for the dataset’s data file.
- property fullpath: str
Return the full path to the dataset’s data file.
- read(interval='')
Read data from the filesystem.
Returns an empty dataframe if the file is not found.
- Parameters:
interval (str)
- Return type:
Table
- property root: str
Return the root path of the configured repository.
- versions(file_pattern='*', pattern='as_of')
List all available version markers from the data directory.
- Parameters:
file_pattern (str)
pattern (str | Versioning)
- Return type:
list[datetime | str]
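As a rough illustration of what listing version markers involves, the sketch below pulls as-of timestamps out of filenames shaped like the ones in the module's example tree. The regular expression and the helper `sketch_versions` are assumptions inferred from those filenames, not the library's own parsing code.

```python
import re
from datetime import datetime, timezone

# Pattern inferred from the example filenames; an assumption, not the library's regex.
AS_OF_MARKER = re.compile(r"-as_of_(\d{8}T\d{6}[+-]\d{4})-data\.parquet$")


def sketch_versions(filenames):
    """Extract as-of timestamps from filenames and return them sorted."""
    found = []
    for name in filenames:
        match = AS_OF_MARKER.search(name)
        if match:
            found.append(datetime.strptime(match.group(1), "%Y%m%dT%H%M%S%z"))
    return sorted(found)


names = [
    "my_versioned_dataset-as_of_20230102T120000+0000-data.parquet",
    "my_versioned_dataset-as_of_20230101T120000+0000-data.parquet",
    "my_dataset-latest-data.parquet",  # no as-of marker, so it is skipped
]
found_versions = sketch_versions(names)
first_version = datetime(2023, 1, 1, 12, 0, tzinfo=timezone.utc)
```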
- write(data, tags=None)
Write data to the filesystem.
If versioning is AS_OF, a new file is always created. If versioning is NONE, new data is merged into the existing file.
- Parameters:
data (narwhals.typing.FrameT)
tags (dict | None)
- Return type:
None
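The difference between the two versioning modes can be shown with a toy model in which a dictionary stands in for the directory. Everything here is hypothetical: `sketch_write` is not part of the module, rows are plain `{period: value}` dicts rather than dataframes, and the merge rule (incoming rows win on overlapping periods) is an assumption, since the exact merge semantics are not specified above.

```python
def sketch_write(store, set_name, rows, versioning, as_of=None):
    """Toy model: `store` maps filenames to {period: value} dicts."""
    if versioning == "AS_OF":
        # Each write creates a new file named after its as-of marker.
        key = f"{set_name}-as_of_{as_of}-data.parquet"
        store[key] = dict(rows)
    else:
        # Versioning NONE: merge into the single "latest" file.
        key = f"{set_name}-latest-data.parquet"
        merged = dict(store.get(key, {}))
        merged.update(rows)  # assumption: incoming rows win on overlapping periods
        store[key] = merged
    return key


store = {}
sketch_write(store, "my_dataset", {"2023-01": 1.0, "2023-02": 2.0}, "NONE")
sketch_write(store, "my_dataset", {"2023-02": 2.5, "2023-03": 3.0}, "NONE")
sketch_write(store, "my_versioned_dataset", {"2023-01": 1.0}, "AS_OF", "20230101T120000+0000")
sketch_write(store, "my_versioned_dataset", {"2023-01": 1.1}, "AS_OF", "20230102T120000+0000")
```

After these four calls the store holds one merged "latest" file and two separate as-of files, which matches the shape of the example directory tree.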
- class SearchResult(name: str, type_directory: str)
Bases: NamedTuple
Represents a single item in a search result.
- Parameters:
name (str)
type_directory (str)
- name: str
Alias for field number 0
- type_directory: str
Alias for field number 1
- find_datasets(pattern='', exclude='metadata', repository='')
Search for dataset directories in all configured repositories.
- Parameters:
pattern (str | PathLike[str]) – A glob pattern to match against directory names.
exclude (str) – A substring to exclude from the search results.
repository (list[str | PathLike[str]] | str | PathLike[str]) – A specific repository path to search in. If empty, searches all configured repositories.
- Return type:
list[SearchResult]
- Returns:
A list of SearchResult objects for the found datasets.
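A search of this kind can be approximated with the standard library alone. The sketch below is not the function's implementation: `Hit` is a stand-in for `SearchResult`, `sketch_find` is a hypothetical helper that searches a single root, and it assumes dataset directories sit exactly two levels below that root, as in the module's example tree.

```python
import tempfile
from fnmatch import fnmatch
from pathlib import Path
from typing import NamedTuple


class Hit(NamedTuple):
    """Stand-in for SearchResult with the same two fields."""

    name: str
    type_directory: str


def sketch_find(root, pattern="", exclude="metadata"):
    """Match dataset directories two levels below `root` (hypothetical helper)."""
    hits = []
    for type_dir in sorted(Path(root).iterdir()):
        if not type_dir.is_dir():
            continue
        for dataset_dir in sorted(type_dir.iterdir()):
            if not dataset_dir.is_dir():
                continue
            if exclude and exclude in dataset_dir.name:
                continue  # drop names containing the excluded substring
            if fnmatch(dataset_dir.name, pattern or "*"):
                hits.append(Hit(dataset_dir.name, type_dir.name))
    return hits


with tempfile.TemporaryDirectory() as tmp:
    for sub in ("AS_OF_AT/my_versioned_dataset", "NONE_AT/my_dataset", "NONE_AT/metadata"):
        (Path(tmp) / sub).mkdir(parents=True)
    all_hits = sketch_find(tmp)
    versioned_hits = sketch_find(tmp, pattern="*versioned*")
```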
- last_version_number_by_regex(directory, pattern='*')
Return the max version number from files in a directory matching a pattern.
- Parameters:
directory (str)
pattern (str)
- Return type:
str