ssb_timeseries.fs
The main purpose of the ssb_timeseries.fs module is to allow file-based IO that works on both a local file system and Google Cloud Storage (GCS).
- cp(from_path, to_path)
Copy a file. Should work regardless of whether the source and target locations are on a local file system or GCS.
- Return type:
None
- Parameters:
from_path (str | PathLike[str])
to_path (str | PathLike[str])
- existing_subpath(path)
Return the existing part of a path on a local or GCS file system.
- Return type:
str | PathLike[str]
- Parameters:
path (str | PathLike[str])
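To illustrate what "the existing part of a path" means, here is a minimal sketch for the local file system only. The name `existing_part` is hypothetical, and the approach (walking up the parents until one exists) is an assumption, not necessarily how existing_subpath() is implemented.

```python
import tempfile
from pathlib import Path


def existing_part(path) -> Path:
    """Hypothetical helper: return the longest leading part of `path` that exists."""
    candidate = Path(path)
    while not candidate.exists():
        parent = candidate.parent
        if parent == candidate:
            # Reached the root or anchor without finding anything that exists.
            break
        candidate = parent
    return candidate


# Usage: only the temporary directory exists, not the nested path below it.
root = Path(tempfile.mkdtemp())
missing = root / "not" / "created" / "yet.parquet"
found = existing_part(missing)
```

A GCS variant would need a client library to test for existence; that part is deliberately left out of the sketch.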
- exists(path)
Check if a given (local or GCS) path exists.
- Return type:
bool
- Parameters:
path (str | PathLike[str])
- file_count(path, create=False)
Count files in path. Should work regardless of whether the path is on a local file system or GCS.
- Return type:
int
- Parameters:
path (str | PathLike[str])
create (bool)
- find(search_path, equals='', contains='', pattern='', search_sub_dirs=True, full_path=False, replace_root=False)
Find files and subdirectories with names matching pattern. Should work for both local and GCS filesystems.
- Return type:
list[str]
- Parameters:
search_path (str | PathLike[str])
equals (str)
contains (str)
pattern (str)
search_sub_dirs (bool)
full_path (bool)
replace_root (bool)
- fs_type(path)
Check filesystem type (local or GCS) for a given path.
- Return type:
str
- Parameters:
path (str | PathLike[str])
- is_gcs(path)
Check if path is on GCS.
- Return type:
bool
- Parameters:
path (str | PathLike[str])
- is_local(path)
Check if path is local.
- Return type:
bool
- Parameters:
path (str | PathLike[str])
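The three checks fs_type(), is_gcs() and is_local() all rest on telling a GCS path from a local one. A minimal sketch of one way to do that is shown here. The function name `guess_fs_type` and the returned labels "gcs" and "local" are assumptions made for illustration; the values the module actually returns are not documented in this section.

```python
import os
from pathlib import PurePosixPath


def guess_fs_type(path) -> str:
    """Hypothetical check: treat anything starting with 'gs:/' as GCS."""
    # Match a single slash as well, since some path handling shortens
    # 'gs://' to 'gs:/' (see remove_prefix()).
    text = os.fspath(path)
    return "gcs" if text.startswith("gs:/") else "local"


# Usage with both plain strings and a PathLike object.
kind_gcs = guess_fs_type("gs://bucket/folder/file.parquet")
kind_local = guess_fs_type("/home/user/file.parquet")
kind_pathlike = guess_fs_type(PurePosixPath("gs://bucket/folder"))
```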
- ls(path, pattern='*', create=False)
List files. Should work regardless of whether the filesystem is local or GCS.
- Return type:
list[str]
- Parameters:
path (str)
pattern (str)
create (bool)
- mk_parent_dir(path)
Ensure that the parent directory exists, regardless of whether the filesystem is local or GCS.
- Return type:
None
- Parameters:
path (str | PathLike[str])
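For the local case, ensuring that a parent directory exists can be sketched with pathlib alone. The helper name `ensure_parent_dir` is hypothetical, and the sketch makes no claim about how mk_parent_dir() handles GCS paths.

```python
import tempfile
from pathlib import Path


def ensure_parent_dir(path) -> None:
    """Hypothetical helper: create the parent directory of `path` if it is missing."""
    # parents=True creates intermediate directories as needed;
    # exist_ok=True makes the call safe to repeat.
    Path(path).parent.mkdir(parents=True, exist_ok=True)


# Usage: the directories are created, the file itself is not.
root = Path(tempfile.mkdtemp())
target = root / "a" / "b" / "file.json"
ensure_parent_dir(target)
ensure_parent_dir(target)  # second call is a no-op
```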
- mkdir(path)
Make a directory, regardless of whether the filesystem is local or GCS.
- Return type:
None
- Parameters:
path (str | PathLike[str])
- mv(from_path, to_path)
Move a file. Should work regardless of whether the source and target locations are on a local file system or GCS.
- Return type:
None
- Parameters:
from_path (str | PathLike[str])
to_path (str | PathLike[str])
- pandas_read_parquet(path)
Read a parquet file into a pandas DataFrame. Quick and dirty implementation; to be replaced later.
- Return type:
DataFrame
- Parameters:
path (str | PathLike[str])
- pandas_write_parquet(df, path)
Write a pandas DataFrame to a parquet file. Quick and dirty implementation; to be replaced later.
- Return type:
None
- Parameters:
df (DataFrame)
path (str | PathLike[str])
- path(*args)
Join args to form a path. Makes sure that GCS paths begin with a double slash: gs://…
- Return type:
str
- Parameters:
args (str | PathLike[str])
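The behaviour described for path() can be sketched as a join followed by a repair of the GCS prefix. The name `join_path` is hypothetical, and joining with POSIX separators is an assumption; the sketch is not the module's implementation.

```python
import posixpath

GCS_PREFIX = "gs://"


def join_path(*args) -> str:
    """Hypothetical helper: join parts and make sure GCS paths start with 'gs://'."""
    joined = posixpath.join(*(str(a) for a in args))
    # Repair a prefix that has lost one slash, e.g. 'gs:/bucket/...'.
    if joined.startswith("gs:/") and not joined.startswith(GCS_PREFIX):
        joined = GCS_PREFIX + joined[len("gs:/"):]
    return joined


# Usage: a shortened prefix is restored, other paths are left alone.
repaired = join_path("gs:/bucket", "folder", "file.parquet")
untouched = join_path("gs://bucket", "folder")
local = join_path("/home/user", "data")
```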
- path_to_str(path)
Normalise paths as strings.
This is a trick to make automated tests pass on Windows.
- Return type:
str | PathLike[str]
- Parameters:
path (str | PathLike[str])
- read_json(path)
Read json file from path on either local fs or GCS.
- Return type:
dict
- Parameters:
path (str | PathLike[str])
- read_parquet(path, returntype='pandas')
TODO: Add faster pyarrow implementations enforcing type-based schemas.
- Return type:
tuple[table, Schema]
- Parameters:
path (str | PathLike[str])
returntype (str)
- read_text(path, file_format='')
Read a text file from specified path on either local fs or GCS.
- Return type:
dict
- Parameters:
path (str | PathLike[str])
file_format (str)
- remove_prefix(path)
Helper function to compensate for some os.* functions shortening gs://<path> to gs:/<path>.
- Return type:
str
- Parameters:
path (str | PathLike[str])
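The shortening that remove_prefix() compensates for is easy to reproduce with the standard library. The snippet below only demonstrates the problem, using `posixpath.normpath` and `pathlib.PurePosixPath` as two examples of path handling that collapses the double slash; the bucket and file names are made up.

```python
import posixpath
from pathlib import PurePosixPath

original = "gs://bucket/folder/file.parquet"

# Normalising the path collapses the repeated slash after the scheme.
via_normpath = posixpath.normpath(original)

# Round-tripping through a pure path object has the same effect.
via_pathlib = str(PurePosixPath(original))

print(via_normpath)  # gs:/bucket/folder/file.parquet
print(via_pathlib)   # gs:/bucket/folder/file.parquet
```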
- rm(path)
Remove file from local or GCS filesystem. Nonrecursive. For a recursive variant, see rmtree().
- Return type:
None
- Parameters:
path (str | PathLike[str])
- rmtree(path)
Recursively remove a directory and all its subdirectories and files regardless of local or GCS filesystem.
- Return type:
None
- Parameters:
path (str)
- same_path(*args)
Return the common part of the paths of two or more files. The files must be on the same file system, which can be either local or GCS.
- Return type:
str | PathLike[str]
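The idea of a "common part" of several paths can be illustrated with the standard library. The sketch below uses `posixpath.commonpath` on local-style paths; `common_part` is a hypothetical name, and GCS paths would need the gs:// scheme handled separately, which is not attempted here.

```python
import posixpath


def common_part(*paths: str) -> str:
    """Hypothetical helper: longest common leading sub-path of the given paths."""
    return posixpath.commonpath(paths)


# Usage: the two files share the directory '/data/set'.
shared = common_part("/data/set/a/x.parquet", "/data/set/b/y.parquet")
```

Unlike a plain string-prefix comparison, `commonpath` works on whole path components, so `/data/set` and `/data/settings` share only `/data`.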
- to_arrow(df, schema=None)
Convert a pandas or Polars dataframe to a PyArrow table; cast to schema if provided.
- Return type:
Table
- Parameters:
df (Table | DataFrame | DataFrame)
schema (Schema | None)
- touch(path)
Touch a file regardless of whether the filesystem is local or GCS; return the path.
- Return type:
str | PathLike[str]
- Parameters:
path (str | PathLike[str])
- wrap_return_as_str(func)
Decorator to normalise outputs using path_to_str().
- Return type:
Callable
- Parameters:
func (Callable)
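A decorator that normalises return values can be sketched as follows. The names `to_posix_str` and `return_as_str` are hypothetical stand-ins, and converting to forward slashes is an assumption about what "normalise" means here, suggested by the note that path_to_str() exists to make tests pass on Windows.

```python
from functools import wraps
from pathlib import PurePath, PureWindowsPath
from typing import Callable


def to_posix_str(path) -> str:
    """Hypothetical normaliser: return the path as a string with forward slashes."""
    if isinstance(path, PurePath):
        return path.as_posix()
    return str(path)


def return_as_str(func: Callable) -> Callable:
    """Hypothetical decorator: pass the wrapped function's result through to_posix_str()."""

    @wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        return to_posix_str(func(*args, **kwargs))

    return wrapper


@return_as_str
def data_file() -> PureWindowsPath:
    # A Windows-style path, as might be produced when tests run on Windows.
    return PureWindowsPath(r"data\series\file.parquet")


normalised = data_file()
```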
- write_json(path, content)
Write json file to path on either local fs or GCS.
- Return type:
None
- Parameters:
path (str | PathLike[str])
content (str | dict)
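A local-only round trip in the spirit of write_json() and read_json() might look like the sketch below. The names `dump_json` and `load_json` are hypothetical, and treating a str argument as already serialised JSON is an assumption; the section does not say how the module handles the str case.

```python
import json
import tempfile
from pathlib import Path


def dump_json(path, content) -> None:
    """Hypothetical writer: accept a dict or an already serialised JSON string."""
    text = content if isinstance(content, str) else json.dumps(content)
    Path(path).write_text(text, encoding="utf-8")


def load_json(path) -> dict:
    """Hypothetical reader: parse the file at `path` as JSON."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Usage: write a dict, then a JSON string, and read both back.
folder = Path(tempfile.mkdtemp())
dump_json(folder / "from_dict.json", {"name": "series", "versions": [1, 2]})
dump_json(folder / "from_str.json", '{"name": "series"}')
from_dict = load_json(folder / "from_dict.json")
from_str = load_json(folder / "from_str.json")
```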
- write_parquet(data, path, schema=None, **kwargs)
TODO: Add faster pyarrow implementations enforcing type-based schemas.
- Return type:
None
- Parameters:
data (Table | DataFrame | DataFrame)
path (str | PathLike[str])
schema (Schema | None)
- write_text(path, content, file_format)
Write a text file to path on either local fs or GCS.
- Return type:
None
- Parameters:
path (str | PathLike[str])
content (str | dict)
file_format (str)