fagfunksjoner.paths package

Submodules

fagfunksjoner.paths.git module

Functions that read information from Git configuration files.

name_from_gitconfig()

Find the username from the git config in the current system.

Returns:

The username found in the git config.

Return type:

str

Raises:

FileNotFoundError – If no .gitconfig file is found while navigating up through the parent folders on storage.
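
The lookup described above can be sketched roughly as follows. This is a minimal illustration, not the package's implementation: the helper names find_gitconfig and name_from_file are hypothetical, and the sketch assumes that the username is stored on a "name = ..." line in the file.

```python
from pathlib import Path


def find_gitconfig(start: Path) -> Path:
    """Walk from `start` towards the filesystem root; return the first .gitconfig found."""
    start = start.resolve()
    for folder in [start, *start.parents]:
        candidate = folder / ".gitconfig"
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(f"No .gitconfig found in or above {start}")


def name_from_file(gitconfig: Path) -> str:
    """Return the value of the first 'name = ...' line (assumed layout of the file)."""
    for line in gitconfig.read_text(encoding="utf-8").splitlines():
        key, separator, value = line.partition("=")
        if separator and key.strip() == "name":
            return value.strip()
    # Hypothetical error handling; the documented function only lists FileNotFoundError.
    raise ValueError(f"No name entry in {gitconfig}")
```

Calling name_from_file(find_gitconfig(Path.cwd())) would then approximate what name_from_gitconfig() is documented to do.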

fagfunksjoner.paths.project_root module

This module lets you easily navigate to the root of your local project files.

One of its main uses is importing local functions in a notebook-based project. Notebooks run from the folder they are opened from, not from the project root, while the functions usually live in .py files located in other folders than the notebooks.

class ProjectRoot

Bases: object

Context manager for importing local modules using “with”.

As in:

with ProjectRoot():
    from src.functions.local_functions import local_function

This way the class navigates to the project root and back again in a single statement.
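
To illustrate the back-and-forth navigation, here is a minimal sketch of a context manager with similar behavior. ProjectRootSketch is a hypothetical stand-in, not the real class: it only changes the working directory, and it assumes the project root is the nearest parent folder containing “.git”. Whether imports then resolve from the new working directory depends on how sys.path is set up in your environment.

```python
import os
from pathlib import Path


class ProjectRootSketch:
    """Hypothetical stand-in: chdir to the folder holding .git, then back again."""

    def __enter__(self) -> Path:
        self.start_dir = Path.cwd()
        for folder in [self.start_dir, *self.start_dir.parents]:
            if (folder / ".git").exists():
                os.chdir(folder)
                return folder
        raise OSError(f"No .git folder found in or above {self.start_dir}")

    def __exit__(self, exc_type, exc, traceback) -> None:
        # Always return to the starting folder, even if the body raised.
        os.chdir(self.start_dir)
```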

static load_toml(config_file)

Looks for a .toml file to load the contents from.

Looks in the current folder, the specified path, and the project root.

Parameters:

config_file (str) – The path or filename of the config-file to load.

Returns:

The contents of the toml-file.

Return type:

dict[Any]

find_root()

Finds the root of the project, based on the hidden folder “.git”.

You should usually have this folder only in your project root. The function changes the current working directory back and forth, but should end up in the original starting directory.

Returns:

The project root folder.

Return type:

Path

Raises:

OSError – If the file specified is not found in the current folder, the specified path, or the project root.

load_toml(config_file)

Look for a .toml file to load the contents from.

Looks in the current folder, the specified path, and the project root.

Parameters:

config_file (str) – The path or filename of the config-file to load.

Returns:

The contents of the toml-file

Return type:

dict[Any]

Raises:

OSError – If the file specified is not found in the current folder, the specified path, or the project root.

navigate_root()

Changes the current working directory to the project root.

Saves the folder it starts from in the module-level global variable START_DIR.

Returns:

The starting directory (the folder you were in when the function was called), as a pathlib Path.

As a side effect, the current working directory is changed to the project root, which is different from the returned path.

Return type:

Path

return_to_work_dir()

Navigate back to the last recorded START_DIR.

Return type:

None

fagfunksjoner.paths.versions module

This module works with filepaths and the versioning convention at SSB.

Its main purpose is handling file versions according to Statistics Norway (SSB) standards. It helps you bump a path to the next version and find the latest version of paths in use on storage.

The module is not intended for files that do not follow the version naming convention. For example, the __DOC.json files will not work, because their names do not end with a version such as “_v1” before the file extension.

construct_file_pattern(filepath, version_denoter='*')

Constructs a file pattern for versioned file paths.

This function generates a file pattern by extracting the base file name and its extension, allowing the version part to be replaced by a specified version denoter (default is ‘*’). If the filepath does not contain an extension, ‘.parquet’ is assumed.

Parameters:
  • filepath (str) – The input file path with a version number.

  • version_denoter (str) – A placeholder for the version number in the pattern (default is ‘*’).

Returns:

The constructed file pattern with the version denoter in place of the actual version.

Return type:

str
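
As a rough illustration of the pattern construction, the sketch below swaps the version digits for the denoter. The function name file_pattern_sketch is hypothetical, and the sketch assumes the directory part of the path is kept and that the input contains “_v”; the real function may differ on both points.

```python
def file_pattern_sketch(filepath: str, version_denoter: str = "*") -> str:
    """Swap the version digits after the last '_v' for `version_denoter`."""
    stem, _, tail = filepath.rpartition("_v")
    # Keep the extension if the tail has one, otherwise fall back to .parquet.
    extension = "." + tail.split(".", 1)[1] if "." in tail else ".parquet"
    return f"{stem}_v{version_denoter}{extension}"


print(file_pattern_sketch("path/to/file_v1.parquet"))  # path/to/file_v*.parquet
```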

get_file_name(filepath)

Extracts the base file name from a given file path, excluding the version number.

This function extracts the file name before the ‘_v’ version indicator and removes any preceding directory path. For example, if the input is ‘path/to/file_v1.parquet’, it will return ‘file’.

Parameters:

filepath (str) – The file path string containing the file name and version information.

Returns:

The base file name without the version number and directory path.

Return type:

str
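
The extraction can be sketched in two string operations. The name file_name_sketch is hypothetical, and the sketch assumes forward slashes as path separators.

```python
def file_name_sketch(filepath: str) -> str:
    """Return the file name without directories and without the '_v<digits>' part."""
    name = filepath.rsplit("/", 1)[-1]  # drop any directory path
    return name.rsplit("_v", 1)[0]      # keep what precedes the last '_v'


print(file_name_sketch("path/to/file_v1.parquet"))  # file
```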

get_fileversions(filepath)

Retrieves a list of file versions matching a specified pattern.

This function generates a glob pattern based on the provided file path and retrieves all matching versions. It supports both local files and files stored in Google Cloud Storage (GCS). If the filepath points to a cloud location (e.g., starting with ‘gs://’, ‘http’, or ‘ssb-’), it uses a GCS file system client to find matches; otherwise, it searches for files locally using the glob module.

Parameters:

filepath (str) – The input file path with a version indicator.

Returns:

A list of file paths matching the version pattern.

Return type:

list[str]
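
The local branch of this lookup can be sketched with the standard glob module. This is an illustration only: the names CLOUD_PREFIXES, looks_like_cloud_path and local_fileversions_sketch are hypothetical, and the cloud branch (which would need a GCS file system client) is left out.

```python
import glob

# Prefixes named in the description above as indicating a cloud location.
CLOUD_PREFIXES = ("gs://", "http", "ssb-")


def looks_like_cloud_path(filepath: str) -> bool:
    """Decide whether a path should be handled by a cloud client instead of glob."""
    return filepath.startswith(CLOUD_PREFIXES)


def local_fileversions_sketch(filepath: str) -> list[str]:
    """List local files that differ from `filepath` only in the version digits."""
    stem, _, tail = filepath.rpartition("_v")
    extension = "." + tail.split(".", 1)[1] if "." in tail else ".parquet"
    # glob.escape protects special characters in the folder and file name.
    return sorted(glob.glob(f"{glob.escape(stem)}_v*{extension}"))
```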

get_latest_fileversions(glob_list_path)

Receives a list of filenames with multiple versions and returns the latest versions of the files.

We recommend creating the input list with a glob operation. See the documentation for glob operations:
  • GCS: https://gcsfs.readthedocs.io/en/latest/api.html#gcsfs.core.GCSFileSystem.glob

  • Locally: https://docs.python.org/3/library/glob.html

Parameters:

glob_list_path (list[str] | str) – List of strings, or a single string, representing filepaths. We recommend creating the list with a glob operation.

Returns:

List of strings with unique filepaths and their latest versions.

Return type:

list[str]

Raises:

TypeError – If parameter does not fit with type-narrowing to list of strings.

Example:

import dapla as dp
from fagfunksjoner.paths.versions import get_latest_fileversions

fs = dp.FileClient.get_gcs_file_system()
all_files = fs.glob("gs://dir/statdata_v*.parquet")
latest_files = get_latest_fileversions(all_files)

get_version_number(filepath)

Extracts the version number from a given file path.

This function parses the file path to retrieve the version number, which should be indicated using ‘_v’ followed by digits before the file extension. For example, a valid file path would be ‘file_v1.parquet’. If the naming convention is not followed, a ValueError is raised.

Parameters:

filepath (str) – The file path string containing the version information.

Returns:

The extracted version number as an integer.

Return type:

int

Raises:

ValueError – If the filepath does not contain ‘_v’ followed by digits.
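
A regular expression is one way to express this parsing rule. The sketch below uses the hypothetical name version_number_sketch and assumes a single extension (or none) after the version digits.

```python
import re


def version_number_sketch(filepath: str) -> int:
    """Return the digits after '_v' that sit right before the file extension."""
    match = re.search(r"_v(\d+)(?:\.[^/.]+)?$", filepath)
    if match is None:
        raise ValueError(f"Expected '_v' followed by digits before the extension: {filepath}")
    return int(match.group(1))


print(version_number_sketch("file_v1.parquet"))  # 1
```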

latest_version_number(filepath)

Finds the latest version number in use for a file.

Parameters:

filepath (str) – GCS or local filepath. It should be the full path and must follow the naming standard, e.g. ssb-prod-ofi-skatteregn-data-produkt/skatteregn/inndata/skd_data/2023/skd_p2023-01_v1.parquet or /ssb/stammeXX/kortkode/inndata/skd_data/2023/skd_p2023-01_v1.parquet

Returns:

The latest version number for the file.

Return type:

int

latest_version_path(filepath)

Finds the path to the latest version of a specified file.

This function retrieves all versioned files matching the provided file path pattern and identifies the latest version. It supports both Google Cloud Storage (GCS) paths and local file paths, provided they follow the required naming convention with version numbers (e.g., ‘_v1’). If no versions are found, it defaults to returning a pattern representing version 1.

Parameters:

filepath (str) – The full path of the file, either a GCS path or a local path. It should follow the naming standard, including the version indicator.

Returns:

The path to the latest version of the file. If no versions are found, returns a pattern for version 1 of the file.

Return type:

str

Raises:
  • ValueError – If get_latest_fileversions returns a list of more than one file.

  • ValueError – If the filepath does not follow the naming convention with ‘_v’ followed by digits to denote version, when a versioned file is required.

Examples

  • ‘ssb-prod-ofi-skatteregn-data-produkt/skatteregn/inndata/skd_data/2023/skd_p2023-01_v1.parquet’

  • ‘/ssb/stammeXX/kortkode/inndata/skd_data/2023/skd_p2023-01_v1.parquet’

next_version_number(filepath)

Finds the next version number for a new file.

Parameters:

filepath (str) – GCS or local filepath. It should be the full path and must follow the naming standard, e.g. ssb-prod-ofi-skatteregn-data-produkt/skatteregn/inndata/skd_data/2023/skd_p2023-01_v1.parquet or /ssb/stammeXX/kortkode/inndata/skd_data/2023/skd_p2023-01_v1.parquet

Returns:

The next version number for the file.

Return type:

int

next_version_path(filepath)

Generates a new file path with an incremented version number.

Constructs a filepath for a new version of a file, based on the latest existing version found in the same folder. It skips to one after the highest version it finds, incrementing the version number by one so that the new file path is unique.

Parameters:

filepath (str) – The path of the file.

Returns:

The new file path with an incremented version number and specified suffix.

Return type:

str

Example:

next_version_path('gs://my-bucket/datasets/data_v1.parquet')
'gs://my-bucket/datasets/data_v2.parquet'
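
For local files, the “one after the highest version” rule can be sketched as below. The function name next_version_path_sketch is hypothetical, GCS paths are not handled, and returning version 1 when no versions exist yet is an assumption of this sketch.

```python
import glob
import re


def next_version_path_sketch(filepath: str) -> str:
    """Return `filepath` with its version set to one above the highest existing one."""
    stem, _, tail = filepath.rpartition("_v")
    extension = "." + tail.split(".", 1)[1] if "." in tail else ".parquet"
    version_at_end = re.compile(r"_v(\d+)" + re.escape(extension) + "$")
    versions = []
    for existing in glob.glob(f"{glob.escape(stem)}_v*{extension}"):
        match = version_at_end.search(existing)
        if match:
            versions.append(int(match.group(1)))
    # Assumption: start at version 1 when nothing has been written yet.
    next_version = max(versions, default=0) + 1
    return f"{stem}_v{next_version}{extension}"
```

Because the highest existing version is used, gaps in the sequence are not filled: with versions 1, 2 and 10 on disk, the sketch returns version 11.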

Module contents

Module to manipulate paths at SSB.