ssb_timeseries.config

Configurations for the SSB timeseries library.

The environment variable TIMESERIES_CONFIG is expected to point to a JSON file with configurations. If the file exists, its contents are loaded into a Config object, CONFIG, when the configuration module is loaded.

In most cases, this happens behind the scenes when ssb_timeseries.dataset or ssb_timeseries.catalog is imported.

Directly accessing the configuration module should only be required when manipulating configurations from Python code.
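As a minimal sketch of the setup described above, the snippet below writes a JSON file and points TIMESERIES_CONFIG at it using only the standard library. The directory, the repository name, and all values are hypothetical placeholders. The keys mirror the ConfigDict fields documented further down, with repositories written as a list as on the Config class; the file is not guaranteed to pass the library's own validation.

```python
import json
import os
import tempfile
from pathlib import Path

# Hypothetical location; any writable path works.
config_dir = Path(tempfile.mkdtemp())
config_path = config_dir / "timeseries_config.json"

# Placeholder values; the keys follow the documented ConfigDict fields.
settings = {
    "configuration_file": str(config_path),
    "logging": {"version": 1},
    "repositories": [
        {
            "name": "example",
            "directory": str(config_dir / "data"),
            "catalog": str(config_dir / "metadata"),
        }
    ],
}

config_path.write_text(json.dumps(settings, indent=2), encoding="utf-8")

# The configuration module is documented to consult this variable when it loads.
os.environ["TIMESERIES_CONFIG"] = str(config_path)
```

Setting the variable before importing ssb_timeseries.dataset or ssb_timeseries.catalog matters, since the file is read at import time.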

Example

>>> from ssb_timeseries.config import CONFIG
>>> CONFIG.catalog = 'gs://{bucket}/timeseries/metadata/'
>>> CONFIG.save()

For switching between preset configurations, use the timeseries-config command:

poetry run timeseries-config <option>

which is equivalent to:

python ./config.py <option>

See ssb_timeseries.config.main() for details on the named options.

CONFIG = <ssb_timeseries.config.Config object>

A Config object.

class Config(**kwargs)

Bases: object

Configuration class; for reading and writing timeseries configurations.

If instantiated with no parameters, a configuration file is expected to exist: either in the location specified by the environment variable TIMESERIES_CONFIG or in the default location in the user’s home directory. If not, an error is raised.

If the configuration_file parameter is specified, configurations will be loaded from that file. No other parameters are required. A FileNotFoundError or FileDoesNotExist error will be raised if the file is not found. In this case, no attempt is made to load configurations from locations specified by the environment variable or defaults.

If any additional parameters are provided, they will override values from the configuration file. If the result is not a valid configuration, a ValidationError is raised.

If one or more parameters are provided, but the configuration_file parameter is not among them, configurations are identified by the environment variable TIMESERIES_CONFIG or the default configuration file location (in that order of priority). Provided parameters override values from the configuration file. If the result is not a valid configuration, an error is raised.

The returned configuration is held in memory only and is not saved until the save() method is called. The configuration is then saved to a file, and the environment variable TIMESERIES_CONFIG is set to reflect the location of that file.
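The lookup order described above can be sketched as follows. This is an illustration of the documented priority only, not the library's implementation; resolve_config_path and merge_overrides are hypothetical helper names, and the default path is taken from the description of save().

```python
import os
from pathlib import Path

ENV_VAR_NAME = "TIMESERIES_CONFIG"
DEFAULT_PATH = Path.home() / ".config" / "ssb_timeseries" / "timeseries_config.json"


def resolve_config_path(configuration_file=None, environ=None):
    """Return the file to read, following the documented order of priority."""
    environ = os.environ if environ is None else environ
    if configuration_file:
        # An explicit file wins; no fallback to the environment or defaults.
        return Path(configuration_file)
    if environ.get(ENV_VAR_NAME):
        return Path(environ[ENV_VAR_NAME])
    return DEFAULT_PATH


def merge_overrides(from_file, **overrides):
    """Keyword arguments take priority over values read from the file."""
    return {**from_file, **overrides}
```

Passing the environment as an argument keeps the sketch easy to test without touching the real process environment.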

__eq__(other)

Equality test.

Return type:

bool

Parameters:

other (Self | dict)

__getitem__(item)

Get the value of a configuration.

Return type:

Any | None

Parameters:

item (str)

__init__(**kwargs)

Initialize Config object from keyword arguments.

Keyword Arguments:
  • preset (str) – Optional. Name of a preset configuration. If provided, the preset configuration is loaded, and no other parameters are considered.

  • configuration_file (str) – Path to the configuration file. If the parameter is not provided, the environment variable TIMESERIES_CONFIG is used. If the environment variable is not set, the default configuration file location is used.

  • repositories (list[FileBasedRepository]) – New in version 0.5.0. Replaces bucket, timeseries_root and catalog.

  • timeseries_root (str) – Path to the root directory for time series data. If one of these identifies a valid JSON file, the configuration is loaded from that file and no other parameters are required. If provided, they will override values from the configuration file.

  • catalog (str) – Path to the catalog file.

  • log_file (str) – Path to the log file.

  • bucket (str) – Name of the GCS bucket.

  • ignore_file (bool)

Raises:
  • FileNotFoundError – If the configuration file implied by the parameters (provided or omitted) does not exist.

  • ValidationError – If the resulting configuration is not valid.

  • MissingEnvironmentVariableError – If the environment variable TIMESERIES_CONFIG is not defined.

Return type:

None

Examples

To load an existing preset configuration:

>>> from ssb_timeseries.config import Config
>>> config = Config(preset='daplalab')

__str__()

Return timeseries configurations as JSON string.

Return type:

str

classmethod active()

Force reload the file identified by ENV_VAR_NAME and return the configuration.

Return type:

Self

apply(configuration)

Set configuration values from a dictionary.

Return type:

None

Parameters:

configuration (dict)

configuration_file: str | PathLike[str]

The path to the configuration file.

property is_valid: bool

Check if the configuration has all required fields.

property log_file: str

Get file name from logging configuration, if a file based log handler is defined.

logging: dict

Logging configuration as a valid logging.dictConfig.

repositories: list[FileBasedRepository]

A list of time series repositories.

save(path='')

Saves configurations to the JSON file defined by path or configuration_file.

If path is set, it takes precedence, and configuration_file is set accordingly.

Parameters:

path (PathStr) – Full path of the JSON file to save to. If not specified, it will attempt to use the environment variable TIMESERIES_CONFIG before falling back to the default location $HOME/.config/ssb_timeseries/timeseries_config.json.

Raises:

ValueError – If path is not provided and configuration_file is not set.

Return type:

None
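A rough sketch of the save behaviour, assuming the simplest reading of the description: an explicit path takes precedence, configuration_file is updated to match, and TIMESERIES_CONFIG is set to the saved location. save_settings is a hypothetical stand-in operating on a plain dictionary, not the Config.save() implementation, and it omits the fallback to the environment variable and default location.

```python
import json
import os
import tempfile
from pathlib import Path


def save_settings(settings, path=""):
    """Write settings to a JSON file and record where they were saved."""
    target = path or settings.get("configuration_file", "")
    if not target:
        raise ValueError("No path given and configuration_file is not set.")
    # An explicit path wins, and configuration_file follows it.
    settings["configuration_file"] = str(target)
    Path(target).parent.mkdir(parents=True, exist_ok=True)
    Path(target).write_text(json.dumps(settings, indent=2), encoding="utf-8")
    os.environ["TIMESERIES_CONFIG"] = str(target)
    return str(target)


# Usage with a throwaway directory.
target = Path(tempfile.mkdtemp()) / "timeseries_config.json"
saved_to = save_settings({"logging": {}, "repositories": []}, path=target)
```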

class ConfigDict

Bases: TypedDict

Required attributes for configuration.

configuration_file: Required[str]
log_file: NotRequired[str]
logging: Required[dict[str, Any]]
repositories: Required[dict[str, FileBasedRepository]]

DAPLA_BUCKET = 'gs://<teamname>-'

Returns the Dapla product bucket name for the current environment: gs://{DAPLA_TEAM}-{DAPLA_ENV}.

DAPLA_ENV = ''

Returns the Dapla environment: ‘prod’ | ‘test’ | ‘dev’.

DAPLA_TEAM = '<teamname>'

Returns the Dapla team/project name.

class DictObject(dict_)

Bases: object

Helper class to convert dict to object.

Parameters:

dict_ (dict)

classmethod from_dict(d)

Parameters:

d (dict)
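To illustrate what a dict-to-object helper of this kind typically does, here is a minimal sketch. DictObjectSketch is a hypothetical stand-in; whether the library's DictObject recurses into nested dictionaries is an assumption made here for illustration.

```python
class DictObjectSketch:
    """Expose dictionary keys as attributes, recursing into nested dicts."""

    def __init__(self, dict_):
        for key, value in dict_.items():
            # Assumption: nested dictionaries become nested objects.
            if isinstance(value, dict):
                value = DictObjectSketch(value)
            setattr(self, key, value)

    @classmethod
    def from_dict(cls, d):
        """Alternative constructor mirroring the documented classmethod."""
        return cls(d)


obj = DictObjectSketch.from_dict({"logging": {"version": 1}, "bucket": "example"})
```

With this, obj.logging.version reads more naturally than obj["logging"]["version"], which is the usual motivation for such a helper.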

class FileBasedRepository

Bases: TypedDict

Defines required attributes for file based repositories.

catalog: str

Directory for meta data files.

Can be equal to the data directory, a subdirectory, or any other location. Multiple repositories can share a single catalog directory. TODO: consider optionality: Set equal to data root directory if not provided.

directory: Required[str]

Root directory for data storage; contains one directory per data type and (optionally) logs and metadata.

name: Required[str]
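A repository entry with the three documented fields could look like the sketch below. RepositorySketch is a local stand-in that mirrors the field names only; the paths and the repository name are hypothetical.

```python
from typing import TypedDict


class RepositorySketch(TypedDict):
    """Local stand-in mirroring the documented FileBasedRepository fields."""

    name: str
    directory: str
    catalog: str


# Hypothetical paths: here the catalog is a subdirectory of the data directory.
repository: RepositorySketch = {
    "name": "example",
    "directory": "/data/timeseries",
    "catalog": "/data/timeseries/metadata",
}
```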

exception MissingEnvironmentVariableError

Bases: Exception

The environment variable TIMESERIES_CONFIG must be defined.

exception ValidationError

Bases: Exception

Configuration validation error.

active_file(path='')

If a path is provided, sets environment variable ENV_VAR_NAME to specify the location of the configuration file.

Returns the value of the environment variable.

Return type:

str

Parameters:

path (str | PathLike[str])

configuration_schema(version='0.3.1')

Return the JSON schema for the configuration file.

Return type:

dict

Parameters:

version (str)

convert_schema_v1_to_v2(config)

Convert a configuration from schema version 1 to version 2. Temporary helper, kept until the migration is done.

Return type:

dict

Parameters:

config (dict)

is_valid_config(configuration)

Check if a dictionary is a valid configuration ConfigDict.

Return type:

tuple[bool, object]

Parameters:

configuration (dict)
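The tuple return type suggests a pass/fail flag plus some detail about the outcome. The sketch below shows that shape with a plain required-key check. It is an assumption-laden illustration: the library may validate against the JSON schema from configuration_schema() instead, and what its second tuple element contains is not documented here.

```python
# Keys marked Required in the documented ConfigDict.
REQUIRED_KEYS = ("configuration_file", "logging", "repositories")


def is_valid_config_sketch(configuration):
    """Return (True, None) if all required keys exist, else (False, missing keys)."""
    missing = [key for key in REQUIRED_KEYS if key not in configuration]
    if missing:
        return False, missing
    return True, None


ok, detail = is_valid_config_sketch({"configuration_file": "c.json", "logging": {}})
```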

load_json_file(path, error_on_missing=False)

Read configurations from a JSON file and return them as a dictionary.

Return type:

dict

Parameters:
  • path (str | PathLike[str])

  • error_on_missing (bool)
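A minimal sketch of a loader with this signature follows. Returning an empty dictionary for a missing file when error_on_missing is False is an assumption; the documentation above does not say what the library returns in that case.

```python
import json
import tempfile
from pathlib import Path


def load_json_file_sketch(path, error_on_missing=False):
    """Return the parsed JSON; a missing file yields {} unless errors are requested."""
    file = Path(path)
    if not file.exists():
        if error_on_missing:
            raise FileNotFoundError(f"No configuration file at {file}")
        return {}  # assumed behaviour for the non-strict case
    return json.loads(file.read_text(encoding="utf-8"))


# Usage: a path that does not exist is tolerated by default.
missing_path = Path(tempfile.mkdtemp()) / "no_such_file.json"
nothing = load_json_file_sketch(missing_path)
```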

main(*args)

Set configurations to predefined defaults when run from command line.

Use:

poetry run timeseries-config <option>

or

python ./config.py <option>

Parameters:

*args (str) – ‘home’ | ‘gcs’ | ‘daplalab’.

Raises:

ValueError – If args is not ‘home’ | ‘gcs’ | ‘daplalab’.

Return type:

None

path_str(*args)

Concatenate paths as string: str(Path(…)).

Return type:

str
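The description corresponds to a one-line wrapper around pathlib. The sketch below uses a hypothetical name, path_str_sketch, to make clear it is not the library function itself.

```python
from pathlib import Path


def path_str_sketch(*args):
    """Join path segments and return the result as a string."""
    return str(Path(*args))


# The separator in the result depends on the operating system.
joined = path_str_sketch("data", "timeseries", "metadata")
```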

presets(named_config)

Set configurations to predefined defaults.

Raises:

ValueError – If named_config is not ‘home’ | ‘gcs’ | ‘daplalab’.

Return type:

dict | ConfigDict

Parameters:

named_config (str)

unset_env_var()

Unsets the environment variable ENV_VAR_NAME and returns the value that was unset.

Return type:

str
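The two environment-variable helpers in this module, active_file() and unset_env_var(), can be approximated with the standard library as sketched below. The function names carry a _sketch suffix to mark them as stand-ins, and the assumption is that ENV_VAR_NAME holds the string 'TIMESERIES_CONFIG'. Returning an empty string when the variable is unset is also an assumption.

```python
import os

# Assumption: the module constant names the TIMESERIES_CONFIG variable.
ENV_VAR_NAME = "TIMESERIES_CONFIG"


def active_file_sketch(path=""):
    """Optionally set the variable, then return its current value."""
    if path:
        os.environ[ENV_VAR_NAME] = str(path)
    return os.environ.get(ENV_VAR_NAME, "")


def unset_env_var_sketch():
    """Remove the variable and return the value it held ('' if it was unset)."""
    return os.environ.pop(ENV_VAR_NAME, "")
```

Changes made through os.environ affect the current process and its children only; they do not persist in the user's shell.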