ssb_timeseries.fs

The main purpose of the ssb_timeseries.fs module is to provide file-based I/O that works on both a local file system and Google Cloud Storage (GCS).

cp(from_path, to_path)

Copy file … regardless of whether the source and target locations are on the local fs or GCS.

Return type:

None

Parameters:
  • from_path (str | PathLike[str])

  • to_path (str | PathLike[str])

existing_subpath(path)

Return the existing part of a path on local or GCS file system.

Return type:

str | PathLike[str]

Parameters:

path (str | PathLike[str])
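
To illustrate what "the existing part of a path" can mean, here is a minimal sketch for local paths only. It is not the module's implementation: the helper name existing_subpath_sketch is hypothetical, and GCS handling is left out.

```python
import tempfile
from pathlib import Path


def existing_subpath_sketch(path):
    """Return the longest leading part of `path` that exists (local paths only)."""
    candidate = Path(path)
    # Walk upwards until something exists or the top of the path is reached.
    while not candidate.exists() and candidate != candidate.parent:
        candidate = candidate.parent
    return str(candidate)


with tempfile.TemporaryDirectory() as tmp:
    # Neither 'a', 'b' nor 'file.txt' exist, so only the temp directory remains.
    missing = Path(tmp) / "a" / "b" / "file.txt"
    found = existing_subpath_sketch(missing)
    found_is_tmp = Path(found) == Path(tmp)
```

A caller could use such a result to decide which directories still need to be created.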

exists(path)

Check if a given (local or GCS) path exists.

Return type:

bool

Parameters:

path (str | PathLike[str])

file_count(path, create=False)

Count files in path. Should work regardless of whether the path is on the local fs or GCS.

Return type:

int

Parameters:
  • path (str | PathLike[str])

  • create (bool)

find(search_path, equals='', contains='', pattern='', search_sub_dirs=True, full_path=False, replace_root=False)

Find files and subdirectories with names matching pattern. Should work for both local and GCS filesystems.

Return type:

list[str]

Parameters:
  • search_path (str | PathLike[str])

  • equals (str)

  • contains (str)

  • pattern (str)

  • search_sub_dirs (bool)

  • full_path (bool)

  • replace_root (bool)
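
The exact matching rules behind equals, contains and pattern are not spelled out above. The sketch below shows one plausible reading for local directories only: equals is an exact name match, contains is a substring match, and pattern is a shell-style wildcard. These semantics, the precedence between them, and the helper name find_sketch are all assumptions; full_path and replace_root are not covered.

```python
import fnmatch
import os
import tempfile
from pathlib import Path


def find_sketch(search_path, equals="", contains="", pattern="", search_sub_dirs=True):
    """Return sorted names of files and directories that match (local only)."""

    def matches(name):
        # Assumed precedence: equals, then contains, then pattern, else everything.
        if equals:
            return name == equals
        if contains:
            return contains in name
        return fnmatch.fnmatch(name, pattern or "*")

    found = []
    for _root, dirs, files in os.walk(search_path):
        found.extend(name for name in dirs + files if matches(name))
        if not search_sub_dirs:
            break  # os.walk yields the top directory first; stop after it
    return sorted(found)


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sub").mkdir()
    (Path(tmp) / "a_data.txt").write_text("a")
    (Path(tmp) / "sub" / "b_data.txt").write_text("b")
    (Path(tmp) / "sub" / "other.csv").write_text("c")

    by_substring = find_sketch(tmp, contains="data")
    top_level_csv = find_sketch(tmp, pattern="*.csv", search_sub_dirs=False)
    by_name = find_sketch(tmp, equals="sub")
```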

fs_type(path)

Check filesystem type (local or GCS) for a given path.

Return type:

str

Parameters:

path (str | PathLike[str])

is_gcs(path)

Check if path is on GCS.

Return type:

bool

Parameters:

path (str | PathLike[str])

is_local(path)

Check if path is local.

Return type:

bool

Parameters:

path (str | PathLike[str])
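
How fs_type(), is_gcs() and is_local() relate can be sketched with a simple prefix test. The rule used here (a path counts as GCS when it starts with "gs:/"), the returned labels "gcs" and "local", the function names and the bucket name are assumptions made for illustration, not the documented behaviour.

```python
def is_gcs_sketch(path):
    """True when the path looks like a GCS URI (assumed rule: 'gs:/' prefix)."""
    # 'gs:/' also matches URIs whose 'gs://' has been shortened to 'gs:/'.
    return str(path).startswith("gs:/")


def is_local_sketch(path):
    """Anything not recognised as GCS is treated as local."""
    return not is_gcs_sketch(path)


def fs_type_sketch(path):
    """Return a label for the file system; the labels are placeholders."""
    return "gcs" if is_gcs_sketch(path) else "local"


remote_type = fs_type_sketch("gs://bucket/data/file.parquet")
local_type = fs_type_sketch("/home/user/data/file.parquet")
shortened_is_gcs = is_gcs_sketch("gs:/bucket/data/file.parquet")
```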

ls(path, pattern='*', create=False)

List files. Should work regardless of whether the filesystem is local or GCS.

Return type:

list[str]

Parameters:
  • path (str)

  • pattern (str)

  • create (bool)

mk_parent_dir(path)

Ensure a parent directory exists. … regardless of whether the filesystem is local or GCS.

Return type:

None

Parameters:

path (str | PathLike[str])
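
For the local case, ensuring that a parent directory exists can be sketched with pathlib. The helper name mk_parent_dir_sketch is hypothetical and GCS is not handled; the point is that only the parent is created, not the file itself.

```python
import tempfile
from pathlib import Path


def mk_parent_dir_sketch(path):
    """Create the parent directory of `path`, including missing ancestors."""
    Path(path).parent.mkdir(parents=True, exist_ok=True)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "new" / "nested" / "file.txt"
    mk_parent_dir_sketch(target)
    parent_created = target.parent.is_dir()
    file_created = target.exists()
```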

mkdir(path)

Make directory regardless of whether the filesystem is local or GCS.

Return type:

None

Parameters:

path (str | PathLike[str])

mv(from_path, to_path)

Move file … regardless of whether the source and target locations are on the local fs or GCS.

Return type:

None

Parameters:
  • from_path (str | PathLike[str])

  • to_path (str | PathLike[str])
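
The difference between cp() and mv() is the usual one: after a copy the source remains, after a move it does not. The sketch below shows this for local files using the standard library shutil module; it does not call ssb_timeseries.fs and does not cover transfers to or from GCS.

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "source.txt"
    source.write_text("hello")

    # Copy: the source stays in place.
    copied = Path(tmp) / "copy.txt"
    shutil.copy2(source, copied)
    source_kept = source.exists()

    # Move: the source is gone afterwards.
    moved = Path(tmp) / "moved.txt"
    shutil.move(str(source), str(moved))
    source_gone = not source.exists()

    contents = (copied.read_text(), moved.read_text())
```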

pandas_read_parquet(path)

Quick and dirty implementation; to be replaced later.

Return type:

DataFrame

Parameters:

path (str | PathLike[str])

pandas_write_parquet(df, path)

Quick and dirty implementation; to be replaced later.

Return type:

None

Parameters:
  • df (DataFrame)

  • path (str | PathLike[str])

path(*args)

Join args to form a path. Makes sure that GCS paths begin with a double slash: gs://…

Return type:

str

Parameters:

args (str | PathLike[str])
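
A minimal sketch of joining path parts while keeping the double slash of a GCS URI intact. The helper name join_path_sketch and the bucket name are hypothetical, and the repair rule is an assumption about what "begin with a double slash" implies, not the module's code.

```python
import posixpath


def join_path_sketch(*args):
    """Join parts with '/' and make sure a GCS scheme reads 'gs://'."""
    joined = posixpath.join(*(str(arg) for arg in args))
    # Repair a scheme that has lost a slash, e.g. 'gs:/bucket' -> 'gs://bucket'.
    if joined.startswith("gs:/") and not joined.startswith("gs://"):
        joined = "gs://" + joined[len("gs:/"):]
    return joined


local = join_path_sketch("/data", "series", "a.parquet")
remote = join_path_sketch("gs://bucket", "series", "a.parquet")
repaired = join_path_sketch("gs:/bucket", "series")
```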

path_to_str(path)

Normalise as strings.

This is a trick to make automated tests pass on Windows.

Return type:

str | PathLike[str]

Parameters:

path (str | PathLike[str])

read_json(path)

Read json file from path on either local fs or GCS.

Return type:

dict

Parameters:

path (str | PathLike[str])

read_parquet(path, returntype='pandas')

TODO: Add faster pyarrow implementations enforcing type based schemas.

Return type:

tuple[table, Schema]

Parameters:
  • path (str | PathLike[str])

  • returntype (str)

read_text(path, file_format='')

Read a text file from specified path on either local fs or GCS.

Return type:

dict

Parameters:
  • path (str | PathLike[str])

  • file_format (str)

remove_prefix(path)

Helper function to compensate for some os.* functions shortening gs://<path> to gs:/<path>.

Return type:

str

Parameters:

path (str | PathLike[str])
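
The shortening mentioned above can be reproduced with the standard library: normalising a GCS URI with POSIX path rules collapses the double slash after the scheme. The helper strip_gs_prefix below is a hypothetical illustration of removing the prefix in either form; what remove_prefix() itself returns is not specified here. The bucket and file names are made up.

```python
import posixpath

uri = "gs://bucket/folder/../file.txt"

# Path normalisation treats '//' as a redundant separator and collapses it.
shortened = posixpath.normpath(uri)


def strip_gs_prefix(path):
    """Remove a leading 'gs://' or shortened 'gs:/' (hypothetical helper)."""
    text = str(path)
    for prefix in ("gs://", "gs:/"):  # test the longer form first
        if text.startswith(prefix):
            return text[len(prefix):]
    return text


from_shortened = strip_gs_prefix(shortened)
from_full = strip_gs_prefix("gs://bucket/file.txt")
```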

rm(path)

Remove file from local or GCS filesystem. Nonrecursive. For a recursive variant, see rmtree().

Return type:

None

Parameters:

path (str | PathLike[str])

rmtree(path)

Recursively remove a directory and all its subdirectories and files regardless of local or GCS filesystem.

Return type:

None

Parameters:

path (str)
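
The nonrecursive versus recursive distinction between rm() and rmtree() mirrors os.remove and shutil.rmtree in the standard library. The sketch below shows the local behaviour only and does not call ssb_timeseries.fs.

```python
import os
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    tree = Path(tmp) / "tree"
    (tree / "sub").mkdir(parents=True)
    leaf = tree / "sub" / "leaf.txt"
    leaf.write_text("x")

    # Nonrecursive: removes a single file and leaves its directory in place.
    os.remove(leaf)
    leaf_removed = not leaf.exists()
    dir_kept = (tree / "sub").is_dir()

    # Recursive: removes the directory and everything below it.
    shutil.rmtree(tree)
    tree_removed = not tree.exists()
```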

same_path(*args)

Return common part of path, for two or more files. Files must be on same file system, but the file system can be either local or GCS.

Return type:

str | PathLike[str]
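
The "common part" of several paths can be illustrated with posixpath.commonpath from the standard library. This is a sketch of the concept, not of same_path() itself, and the paths are made-up examples. Note that on a GCS URI the standard library call drops one slash from the scheme, which is the kind of shortening this module has to compensate for.

```python
import posixpath

local_files = [
    "/data/series/set_a/file1.parquet",
    "/data/series/set_b/file2.parquet",
]
local_common = posixpath.commonpath(local_files)

gcs_files = [
    "gs://bucket/series/set_a/file1.parquet",
    "gs://bucket/series/set_b/file2.parquet",
]
# Empty components are discarded, so 'gs://' comes back as 'gs:/'.
gcs_common = posixpath.commonpath(gcs_files)
```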

to_arrow(df, schema=None)

Convert a Pandas or Polars dataframe to Pyarrow table, cast schema if provided.

Return type:

Table

Parameters:
  • df (pyarrow.Table | pandas.DataFrame | polars.DataFrame)

  • schema (Schema | None)

touch(path)

Touch file regardless of whether the filesystem is local or GCS; return path.

Return type:

str | PathLike[str]

Parameters:

path (str | PathLike[str])
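
A local-only sketch of "touch and return the path", using pathlib. The helper name touch_sketch is hypothetical, and returning the path as a plain string is an assumption.

```python
import tempfile
from pathlib import Path


def touch_sketch(path):
    """Create an empty file if it does not exist and return its path."""
    target = Path(path)
    target.touch(exist_ok=True)
    return str(target)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "marker.txt"
    returned = touch_sketch(target)
    was_created = target.is_file()
    size = target.stat().st_size
    same_path_returned = Path(returned) == target
```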

wrap_return_as_str(func)

Decorator to normalise outputs using path_to_str().

Return type:

Callable

Parameters:

func (Callable)
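
A decorator of this kind can be sketched as follows. The normaliser to_str_sketch is a stand-in for path_to_str(), whose exact behaviour is not described here; both helper names are hypothetical.

```python
import functools
from pathlib import PurePosixPath


def to_str_sketch(value):
    """Stand-in normaliser: convert a path-like value to a plain string."""
    return str(value)


def wrap_return_as_str_sketch(func):
    """Pass the wrapped function's return value through the normaliser."""

    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        return to_str_sketch(func(*args, **kwargs))

    return wrapper


@wrap_return_as_str_sketch
def parent_dir(path):
    """Return the parent directory as a path object (before wrapping)."""
    return PurePosixPath(path).parent


result = parent_dir("/data/series/file.parquet")
```

Callers then always receive a string, whatever path type the wrapped function works with internally.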

write_json(path, content)

Write json file to path on either local fs or GCS.

Return type:

None

Parameters:
  • path (str | PathLike[str])

  • content (str | dict)
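
The round trip between write_json() and read_json() can be sketched for local files with the standard library json module. The helper names are hypothetical, and the handling of a str argument (treated as JSON text that is written unchanged) is an assumption.

```python
import json
import tempfile
from pathlib import Path


def write_json_sketch(path, content):
    """Write `content` to `path` as JSON (local files only)."""
    # Assumption: a str is already JSON text; a dict is serialised first.
    text = content if isinstance(content, str) else json.dumps(content)
    Path(path).write_text(text, encoding="utf-8")


def read_json_sketch(path):
    """Read a JSON file and return the parsed content."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    from_dict = Path(tmp) / "from_dict.json"
    write_json_sketch(from_dict, {"name": "example", "values": [1, 2, 3]})
    loaded = read_json_sketch(from_dict)

    from_text = Path(tmp) / "from_text.json"
    write_json_sketch(from_text, '{"a": 1}')
    loaded_from_text = read_json_sketch(from_text)
```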

write_parquet(data, path, schema=None, **kwargs)

TODO: Add faster pyarrow implementations enforcing type based schemas.

Return type:

None

Parameters:
  • data (pyarrow.Table | pandas.DataFrame | polars.DataFrame)

  • path (str | PathLike[str])

  • schema (Schema | None)

write_text(path, content, file_format)

Write a text file to path on either local fs or GCS.

Return type:

None

Parameters:
  • path (str | PathLike[str])

  • content (str | dict)

  • file_format (str)