ssb_timeseries.io.pyarrow_simple

Provides a simple, file-based I/O handler for the Parquet format, built on PyArrow.

This handler stores datasets in a wide format (series as columns) with embedded metadata, using a defined directory structure. For example:

<repository_root>/
├── AS_OF_AT/
│   └── my_versioned_dataset/
│       ├── my_versioned_dataset-as_of_20230101T120000+0000-data.parquet
│       └── my_versioned_dataset-as_of_20230102T120000+0000-data.parquet
└── NONE_AT/
    └── my_dataset/
        └── my_dataset-latest-data.parquet

It uses PyArrow for eager reading and writing.
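
As a rough illustration of the layout above, the following sketch builds a data file path from a repository root, a dataset name and an optional as-of timestamp. The helper `sketch_data_path` is hypothetical and not part of this module. It covers only the two type directories shown in the tree and assumes the timestamp format seen in the example filenames.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def sketch_data_path(root, set_name, as_of_utc=None):
    """Hypothetical helper: build a data file path matching the example layout."""
    if as_of_utc is None:
        # Unversioned datasets keep a single "latest" file.
        type_dir = "NONE_AT"
        marker = "latest"
    else:
        # Versioned datasets get one file per as-of timestamp.
        # Assumption: the marker format mirrors the example filenames.
        type_dir = "AS_OF_AT"
        marker = "as_of_" + as_of_utc.strftime("%Y%m%dT%H%M%S%z")
    filename = f"{set_name}-{marker}-data.parquet"
    return PurePosixPath(root) / type_dir / set_name / filename


versioned = sketch_data_path(
    "/repo",
    "my_versioned_dataset",
    datetime(2023, 1, 1, 12, 0, tzinfo=timezone.utc),
)
unversioned = sketch_data_path("/repo", "my_dataset")
print(versioned)
print(unversioned)
```

In the class itself, the directory, filename and fullpath properties expose the corresponding parts of this path.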

class FileSystem(repository, set_name, set_type, as_of_utc=None, process_stage='statistikk', sharing=None)

Bases: object

A filesystem abstraction for reading and writing dataset data.

Parameters:
  • repository (Any)

  • set_name (str)

  • set_type (types.SeriesType)

  • as_of_utc (datetime | None)

  • process_stage (str)

  • sharing (dict | None)

__init__(repository, set_name, set_type, as_of_utc=None, process_stage='statistikk', sharing=None)

Initialize the filesystem handler for a given dataset.

This method calculates the necessary directory structure based on the dataset’s type and name.

Parameters:
  • repository (Any)

  • set_name (str)

  • set_type (SeriesType)

  • as_of_utc (datetime | None)

  • process_stage (str)

  • sharing (dict | None)

Return type:

None

property directory: str

Return the data directory for the dataset.

property exists: bool

Check if the data file for the dataset exists.

property filename: str

Construct the standard filename for the dataset’s data file.

property fullpath: str

Return the full path to the dataset’s data file.

read(interval='')

Read data from the filesystem.

Returns an empty table if the file is not found.

Return type:

Table

Parameters:
  • interval (str)

property root: str

Return the root path of the configured repository.

versions(file_pattern='*', pattern='as_of')

List all available version markers from the data directory.
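
A minimal sketch of how version markers could be recovered from filenames, assuming the `as_of_<timestamp>` naming shown in the directory example. The helper `sketch_versions` and the regular expression are illustrative only and do not reproduce this method's implementation.

```python
import re
from datetime import datetime

# Assumed marker format, taken from the example filenames in the directory tree.
AS_OF_MARKER = re.compile(r"-as_of_(\d{8}T\d{6}[+-]\d{4})-data\.parquet$")


def sketch_versions(filenames):
    """Hypothetical helper: extract sorted as-of timestamps from filenames."""
    found = []
    for name in filenames:
        match = AS_OF_MARKER.search(name)
        if match:
            # %z parses the numeric UTC offset, giving a timezone-aware datetime.
            found.append(datetime.strptime(match.group(1), "%Y%m%dT%H%M%S%z"))
    return sorted(found)


names = [
    "my_versioned_dataset-as_of_20230102T120000+0000-data.parquet",
    "my_versioned_dataset-as_of_20230101T120000+0000-data.parquet",
    "notes.txt",  # ignored: carries no as-of marker
]
markers = sketch_versions(names)
print([m.isoformat() for m in markers])
```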

Return type:

list[datetime | str]

Parameters:
  • file_pattern (str)

  • pattern (str)

write(data, tags=None)

Write data to the filesystem.

If versioning is AS_OF, each write creates a new file. If versioning is NONE, new data is merged into the existing file.

Return type:

None

Parameters:
  • data (narwhals.typing.FrameT)

  • tags (dict | None)
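
The difference between the two write modes can be modelled in memory. This is a toy sketch and not the handler's implementation: `sketch_write` is hypothetical, a dict stands in for the files on disk, and rows are keyed by period. The documentation does not say how overlapping rows are resolved on merge, so the sketch assumes that newly written values win.

```python
def sketch_write(store, set_name, rows, as_of=None):
    """Hypothetical model of the two write modes; `store` maps filename -> rows."""
    if as_of is not None:
        # AS_OF: each write lands in its own file, named after the version marker.
        store[f"{set_name}-as_of_{as_of}-data.parquet"] = dict(rows)
    else:
        # NONE: merge into the single "latest" file.
        # Assumption: on overlapping keys the newly written values win.
        latest = store.setdefault(f"{set_name}-latest-data.parquet", {})
        latest.update(rows)
    return store


store = {}
sketch_write(store, "my_dataset", {"2023-01": 1.0, "2023-02": 2.0})
sketch_write(store, "my_dataset", {"2023-02": 2.5, "2023-03": 3.0})
sketch_write(store, "my_versioned_dataset", {"2023-01": 1.0}, as_of="20230101T120000+0000")
sketch_write(store, "my_versioned_dataset", {"2023-01": 1.1}, as_of="20230102T120000+0000")

for filename in sorted(store):
    print(filename, store[filename])
```

After these four calls the model holds one merged file for the unversioned dataset and two separate files for the versioned one.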

class SearchResult(name: str, type_directory: str)

Bases: NamedTuple

Represents a single item in a search result.

Parameters:
  • name (str)

  • type_directory (str)

name: str

Alias for field number 0

type_directory: str

Alias for field number 1
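
Because SearchResult is a NamedTuple, its fields can be read by name, by index, or by unpacking. The class below is a local stand-in that mirrors the documented fields so the snippet runs without the library, and the field values are made up for illustration.

```python
from typing import NamedTuple


class SearchResult(NamedTuple):
    """Local stand-in mirroring the documented fields."""

    name: str
    type_directory: str


hit = SearchResult(name="my_dataset", type_directory="NONE_AT")

# Access by attribute name or by position (field 0 and field 1).
print(hit.name, hit[1])

# Tuple unpacking follows the field order.
name, type_directory = hit
```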

find_datasets(pattern='', exclude='metadata', repository='')

Search for dataset directories in all configured repositories.

Parameters:
  • pattern (str | PathLike[str]) – A glob pattern to match against directory names.

  • exclude (str) – A substring to exclude from the search results.

  • repository (list[str | PathLike[str]] | str | PathLike[str]) – A specific repository path to search in. If empty, searches all configured repositories.

Return type:

list[SearchResult]

Returns:

A list of SearchResult objects for the found datasets.
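
The search can be approximated with the standard library, assuming the `<root>/<type directory>/<dataset>` layout from the module description. The helper `sketch_find_datasets` is hypothetical: it searches a single root only, and its matching rules (fnmatch on the directory name, substring exclusion on the relative path) are assumptions, not the function's confirmed behaviour.

```python
import tempfile
from fnmatch import fnmatch
from pathlib import Path
from typing import NamedTuple


class SearchResult(NamedTuple):
    """Local stand-in mirroring the documented fields."""

    name: str
    type_directory: str


def sketch_find_datasets(root, pattern="", exclude="metadata"):
    """Hypothetical helper: list dataset directories under <root>/<type>/<name>."""
    root = Path(root)
    results = []
    for path in sorted(root.glob("*/*")):
        if not path.is_dir():
            continue
        # An empty pattern matches every directory name.
        if not fnmatch(path.name, pattern or "*"):
            continue
        # Assumption: exclusion is a substring test on the relative path.
        if exclude and exclude in path.relative_to(root).as_posix():
            continue
        results.append(SearchResult(path.name, path.parent.name))
    return results


with tempfile.TemporaryDirectory() as tmp:
    for sub in ("NONE_AT/my_dataset", "AS_OF_AT/my_versioned_dataset", "NONE_AT/metadata"):
        (Path(tmp) / sub).mkdir(parents=True)
    found = sketch_find_datasets(tmp)
    only_versioned = sketch_find_datasets(tmp, pattern="*versioned*")

print(found)
print(only_versioned)
```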

last_version_number_by_regex(directory, pattern='*')

Return the highest version number found among the files in a directory that match a pattern.

Return type:

str

Parameters:
  • directory (str)

  • pattern (str)