dapla_metadata.datasets.utility package

dapla_metadata.datasets.utility.constants module

Repository for constant values in Datadoc backend.

dapla_metadata.datasets.utility.enums module

Enumerations used in Datadoc.

class EncryptionAlgorithm(*values)[source]

Bases: str, Enum

Encryption algorithm values for pseudonymization algorithms offered on Dapla.

DAEAD_ENCRYPTION_ALGORITHM = 'TINK-DAEAD'
PAPIS_ENCRYPTION_ALGORITHM = 'TINK-FPE'
class SupportedLanguages(*values)[source]

Bases: str, Enum

The list of languages metadata may be recorded in.

Reference: https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry

ENGLISH = 'en'
NORSK_BOKMÅL = 'nb'
NORSK_NYNORSK = 'nn'
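Because the enumerations in this module derive from both str and Enum, their members can be looked up by value and compared directly with plain strings. The following is a minimal sketch that re-declares SupportedLanguages locally so that it runs without the package installed; it mirrors the documented members but is not the library's source.

```python
from enum import Enum


class SupportedLanguages(str, Enum):
    """Local re-declaration for illustration, mirroring the documented members."""

    ENGLISH = "en"
    NORSK_BOKMÅL = "nb"
    NORSK_NYNORSK = "nn"


# Look a member up by its value, e.g. a language code read from a file.
language = SupportedLanguages("nb")

# The str mixin makes members compare equal to their plain string values.
is_plain_string_equal = language == "nb"
```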

dapla_metadata.datasets.utility.urn module

Validate, parse and render URNs.

class ReferenceUrlTypes(*values)[source]

Bases: Enum

The general category of the URL.

This can be useful to refer to when constructing a URL from a URN for a specific context.

API = 1
FRONTEND = 2
class SsbNaisDomains(*values)[source]

Bases: str, Enum

The available domains on SSB's Nais instance.

PROD_EXTERNAL = 'ssb.no'
PROD_INTERNAL = 'intern.ssb.no'
TEST_EXTERNAL = 'test.ssb.no'
TEST_INTERNAL = 'intern.test.ssb.no'
class UrnConverter(urn_base, id_pattern, url_bases)[source]

Bases: object

Converts URLs to URNs and vice versa.

Parameters:
urn_base

The format for the URN, up to the identifier.

id_pattern

A capturing group pattern which matches identifiers for this resource.

url_bases

The list of all the different URL representations for a resource. There will typically be a number of URL representations for a particular resource, depending on which system or technology they are accessed through and other technical factors. This list defines which concrete URLs can be considered equivalent to a URN.

convert_url_to_urn(url)[source]

Convert a URL to a generalized URN for that same resource.

Parameters:

url (str | AnyUrl) – The URL to convert.

Returns:

The URN or None if it can’t be converted.

Return type:

str | None

get_id(urn_or_url)[source]

Get an identifier from a URN or URL.

Parameters:

urn_or_url (str | AnyUrl) – The URN or URL referring to a particular resource.

Returns:

The identifier for the resource, or None if it cannot be extracted.

Return type:

str | None

get_url(identifier, url_type, visibility='public')[source]

Build a concrete URL to reference a resource.

There are typically multiple URLs used to refer to one resource. This method attempts to support known variations.

Parameters:
  • identifier (str) – The identifier of the resource the URL refers to.

  • url_type (ReferenceUrlTypes) – The representation type of the URL.

  • visibility (UrlVisibility, optional) – Whether the URL should be that which is publicly available or not. Defaults to “public”.

Returns:

The concrete URL. None if we cannot satisfy the supplied requirements.

Return type:

str | None

get_urn(identifier)[source]

Build a URN for the given identifier.

Return type:

str

Parameters:

identifier (str)

id_pattern: str
is_id(value)[source]

Check if the value is an identifier for this URN type.

Parameters:

value (str) – The value to check.

Return type:

bool

url_bases: list[tuple[ReferenceUrlTypes, str]]
urn_base: str
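To make the relationship between urn_base, id_pattern and url_bases concrete, here is a simplified, hypothetical stand-in for the converter. It is a sketch under stated assumptions, not the library's implementation: the class name SketchUrnConverter, the example.com URLs, the "urn:example:resource" base, and the choice to join base and identifier with ":" or "/" are all invented for illustration.

```python
import re
from enum import Enum


class ReferenceUrlTypes(Enum):
    API = 1
    FRONTEND = 2


class SketchUrnConverter:
    """Hypothetical stand-in; the real UrnConverter may behave differently."""

    def __init__(self, urn_base, id_pattern, url_bases):
        self.urn_base = urn_base
        self.id_pattern = id_pattern
        self.url_bases = url_bases

    def is_id(self, value):
        # The whole value must match the identifier pattern.
        return re.fullmatch(self.id_pattern, value) is not None

    def get_urn(self, identifier):
        # Assumption: base and identifier are joined with a colon.
        return f"{self.urn_base}:{identifier}"

    def get_id(self, urn_or_url):
        text = str(urn_or_url)
        prefixes = [self.urn_base + ":"] + [base + "/" for _, base in self.url_bases]
        for prefix in prefixes:
            candidate = text[len(prefix):]
            if text.startswith(prefix) and self.is_id(candidate):
                return candidate
        return None

    def convert_url_to_urn(self, url):
        text = str(url)
        for _, base in self.url_bases:
            prefix = base + "/"
            if text.startswith(prefix) and self.is_id(text[len(prefix):]):
                return self.get_urn(text[len(prefix):])
        return None


converter = SketchUrnConverter(
    urn_base="urn:example:resource",
    id_pattern=r"([a-z0-9]{8})",
    url_bases=[
        (ReferenceUrlTypes.API, "https://api.example.com/resources"),
        (ReferenceUrlTypes.FRONTEND, "https://www.example.com/resources"),
    ],
)
urn = converter.convert_url_to_urn("https://www.example.com/resources/abc12345")
```

In this sketch both URL bases map to the same URN, which is the sense in which several concrete URLs are treated as equivalent to one URN.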
convert_uris_to_urns(variables, field_name, converters)[source]

Where URIs are recognized URLs, convert them to URNs.

Where the value is not a known URL we preserve the value as it is and log an ERROR level message.

Parameters:
  • variables (VariableListType) – The list of variables.

  • field_name (str) – The name of the field which has URLs to convert to URNs.

  • converters (Iterable[UrnConverter]) – One or more converters which implement conversion of URLs into one specific URN format. These will typically be specific to an individual metadata reference system.

Return type:

None
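The convert-or-preserve behaviour described above could look roughly like the following. This is only a sketch: SketchVariable, PrefixConverter, the field name definition_uri and the example URLs are hypothetical, and the real function operates on the library's own variable models and UrnConverter instances.

```python
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger(__name__)


@dataclass
class SketchVariable:
    """Hypothetical variable with one URI-valued field."""

    short_name: str
    definition_uri: Optional[str] = None


class PrefixConverter:
    """Hypothetical converter that recognises a single URL prefix."""

    def __init__(self, url_prefix, urn_prefix):
        self.url_prefix = url_prefix
        self.urn_prefix = urn_prefix

    def convert_url_to_urn(self, url):
        if url.startswith(self.url_prefix):
            return self.urn_prefix + url[len(self.url_prefix):]
        return None


def convert_uris_to_urns_sketch(variables, field_name, converters):
    converters = list(converters)  # allow any iterable, reused per variable
    for variable in variables:
        value = getattr(variable, field_name, None)
        if value is None:
            continue
        for converter in converters:
            urn = converter.convert_url_to_urn(str(value))
            if urn is not None:
                setattr(variable, field_name, urn)
                break
        else:
            # No converter recognised the value: keep it and log an error.
            logger.error("Could not convert %s in field %s", value, field_name)


variables = [
    SketchVariable("pers", "https://www.example.com/definitions/abc"),
    SketchVariable("fnr", "not-a-known-url"),
]
convert_uris_to_urns_sketch(
    variables,
    "definition_uri",
    [PrefixConverter("https://www.example.com/definitions/", "urn:example:definition:")],
)
```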

dapla_metadata.datasets.utility.utils module

calculate_percentage(completed, total)[source]

Calculate percentage as a rounded integer.

Parameters:
  • completed (int) – The number of completed items.

  • total (int) – The total number of items.

Return type:

int

Returns:

The rounded percentage of completed items out of the total.
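A plausible reading of this function is sketched below. The rounding mode and the handling of a zero total are assumptions of the sketch; the reference above does not specify them.

```python
def calculate_percentage_sketch(completed: int, total: int) -> int:
    """Return completed/total as a rounded integer percentage."""
    if total == 0:
        # Assumption of this sketch: nothing to complete counts as 0 percent.
        return 0
    return round(completed / total * 100)


quarter = calculate_percentage_sketch(1, 4)
two_thirds = calculate_percentage_sketch(2, 3)
```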

derive_assessment_from_state(state)[source]

Derive assessment from dataset state.

Parameters:

state (DataSetState) – The state of the dataset.

Return type:

Assessment

Returns:

The derived assessment of the dataset.

get_current_date()[source]

Return the current date as a str.

Return type:

str

get_missing_obligatory_dataset_fields(dataset)[source]

Identify all obligatory dataset fields that are missing values.

This function checks for obligatory fields that are either directly missing (i.e., set to None) or have multilanguage values with empty content.

Parameters:

dataset (Union[Dataset, Dataset]) – The dataset object to examine. This object must support the model_dump() method which returns a dictionary of field names and values.

Returns:

A list of field names (as strings) that are missing values. This includes:

  • Fields that are directly None and are listed as obligatory metadata.

  • Multilanguage fields (listed as obligatory metadata) where the value exists but the primary language text is empty.

Return type:

list
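The two kinds of "missing" described here can be illustrated on a plain dictionary such as model_dump() might return. Everything specific in this sketch is hypothetical: the obligatory field list, the set of multilanguage fields, the choice of "nb" as the primary language, and the representation of a multilanguage value as a language-code-to-text mapping.

```python
# Hypothetical configuration, for illustration only.
OBLIGATORY_FIELDS = ["short_name", "name", "description"]
MULTILANGUAGE_FIELDS = {"name", "description"}
PRIMARY_LANGUAGE = "nb"


def missing_obligatory_fields_sketch(dumped: dict) -> list:
    missing = []
    for field in OBLIGATORY_FIELDS:
        value = dumped.get(field)
        if value is None:
            # Directly missing.
            missing.append(field)
        elif field in MULTILANGUAGE_FIELDS and not value.get(PRIMARY_LANGUAGE):
            # Present, but the primary language text is empty.
            missing.append(field)
    return missing


missing = missing_obligatory_fields_sketch(
    {
        "short_name": "person_data_v1",
        "name": {"nb": "", "en": "Persons"},
        "description": None,
    }
)
```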

get_missing_obligatory_variables_fields(variables)[source]

Identify obligatory variable fields that are missing values for each variable.

This function checks for obligatory fields that are either directly missing (i.e., set to None) or have multilanguage values with empty content.

Parameters:

variables (list) – A list of variable objects to check for missing obligatory fields.

Return type:

list[dict]

Returns:

A list of dictionaries with variable short names as keys and lists of missing obligatory variable fields as values. This includes:

  • Fields that are directly None and are listed as obligatory metadata.

  • Multilanguage fields (listed as obligatory metadata) where the value exists but the primary language text is empty.
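The return shape, a list of single-key dictionaries, can be awkward to consume directly. The sketch below uses made-up data in that shape (the missing field names are hypothetical) and shows one way to flatten it.

```python
# Made-up result in the documented shape: one dict per variable,
# keyed by the variable's short name.
missing_per_variable = [
    {"pers": ["name", "data_type"]},
    {"fnr": ["name"]},
]

# Merge into a single mapping from short name to missing fields.
by_short_name = {
    short_name: fields
    for entry in missing_per_variable
    for short_name, fields in entry.items()
}

# Count every missing field across all variables.
total_missing = sum(len(fields) for fields in by_short_name.values())
```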

get_missing_obligatory_variables_pseudo_fields(variables)[source]

Identify obligatory variable pseudonymization fields that are missing values for each variable.

This function checks for obligatory fields that are directly missing (i.e., set to None).

Parameters:

variables (list[Variable]) – A list of variable objects to check for missing obligatory pseudonymization fields.

Return type:

list[dict]

Returns:

A list of dictionaries with variable short names as keys and lists of missing obligatory variable pseudonymization fields as values. This includes fields that are directly None and are listed as obligatory metadata.

get_timestamp_now()[source]

Return a timestamp for the current moment.

Return type:

datetime

incorrect_date_order(date_from, date_until)[source]

Evaluate the chronological order of two dates.

This function checks if ‘date until’ is earlier than ‘date from’. If so, it indicates an incorrect date order.

Parameters:
  • date_from (Optional[date]) – The start date of the time period.

  • date_until (Optional[date]) – The end date of the time period.

Return type:

bool

Returns:

True if ‘date_until’ is earlier than ‘date_from’ or if only ‘date_from’ is None, False otherwise.

Example

>>> incorrect_date_order(datetime.date(1980, 1, 1), datetime.date(1967, 1, 1))
True
>>> incorrect_date_order(datetime.date(1967, 1, 1), datetime.date(1980, 1, 1))
False
>>> incorrect_date_order(None, datetime.date(2024, 7, 1))
True
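The documented rule can be written out as a small standalone function. This is a sketch consistent with the examples above, not the library's source; in particular, returning False when both dates are None is an assumption.

```python
import datetime
from typing import Optional


def incorrect_date_order_sketch(
    date_from: Optional[datetime.date],
    date_until: Optional[datetime.date],
) -> bool:
    # Only the start date is missing: treated as an incorrect order.
    if date_from is None and date_until is not None:
        return True
    # Both dates present: incorrect when the end precedes the start.
    return bool(date_from and date_until and date_until < date_from)


reversed_order = incorrect_date_order_sketch(
    datetime.date(1980, 1, 1), datetime.date(1967, 1, 1)
)
```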
num_obligatory_dataset_fields_completed(dataset)[source]

Count the number of completed obligatory dataset fields.

This function returns the total count of obligatory fields in the dataset that have values (are not None).

Parameters:

dataset (Union[Dataset, Dataset]) – The dataset object for which to count the fields.

Return type:

int

Returns:

The number of obligatory dataset fields that have been completed (not None).

num_obligatory_pseudo_fields_missing(variables)[source]

Count the number of obligatory pseudonymization fields that are missing.

Parameters:

variables (list[Variable]) – The variables to count obligatory fields for.

Return type:

int

Returns:

The number of obligatory pseudonymization fields that are missing.

num_obligatory_variable_fields_completed(variable)[source]

Count the number of obligatory fields completed for one variable.

This function calculates the total number of obligatory fields that have values (are not None) for one variable in the list.

Parameters:

variable (Variable) – The variable to count obligatory fields for.

Return type:

int

Returns:

The total number of obligatory variable fields that have been completed (not None) for one variable.

num_obligatory_variables_fields_completed(variables)[source]

Count the number of obligatory fields completed for all variables.

This function calculates the total number of obligatory fields that have values (are not None) across all variables in the list.

Parameters:

variables (list) – A list with variable objects.

Return type:

int

Returns:

The total number of obligatory variable fields that have been completed (not None) for all variables.

running_in_notebook()[source]

Return True if running in Jupyter Notebook.

Return type:

bool
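One common way to detect a Jupyter kernel is to inspect the class of the active IPython shell. The sketch below shows that technique; whether the library uses this exact check is an assumption.

```python
def running_in_notebook_sketch() -> bool:
    """Return True when executed inside a Jupyter kernel."""
    try:
        # get_ipython is only defined when running under IPython.
        shell_name = get_ipython().__class__.__name__  # type: ignore[name-defined]
    except NameError:
        return False
    # ZMQInteractiveShell is the shell class used by Jupyter kernels.
    return shell_name == "ZMQInteractiveShell"


in_notebook = running_in_notebook_sketch()
```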

set_dataset_owner(dataset)[source]

Set the owner of the dataset from the DAPLA_GROUP_CONTEXT environment variable.

Parameters:

dataset (Union[Dataset, Dataset]) – The dataset object to set the owner on.

Return type:

None
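A minimal sketch of reading the environment variable and writing it onto a dataset follows. The attribute name owner, the example group value, and the choice to copy the value unchanged are assumptions; the library may derive the owner from the group name differently.

```python
import os
from types import SimpleNamespace


def set_dataset_owner_sketch(dataset, environ=os.environ) -> None:
    """Copy DAPLA_GROUP_CONTEXT onto the dataset if it is set."""
    group = environ.get("DAPLA_GROUP_CONTEXT")
    if group:
        # Assumption: the raw value is stored as-is on a field named owner.
        dataset.owner = group


dataset = SimpleNamespace(owner=None)
# A plain dict stands in for the process environment in this example.
set_dataset_owner_sketch(dataset, {"DAPLA_GROUP_CONTEXT": "example-team-developers"})
```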

set_default_values_dataset(dataset)[source]

Set default values on dataset.

Parameters:

dataset (Union[Dataset, Dataset]) – The dataset object to set default values on.

Return type:

None

Example

>>> dataset = all_optional_model.Dataset(id=None)
>>> set_default_values_dataset(dataset)
>>> dataset.id is not None
True
set_default_values_pseudonymization(variable, pseudonymization)[source]

Populate pseudonymization fields with defaults based on the encryption algorithm.

Updates the encryption key reference and encryption parameters if they are not set, handling both PAPIS and DAEAD algorithms. Leaves unknown algorithms unchanged.

Return type:

None

Parameters:
  • variable (Variable | Variable)

  • pseudonymization (Pseudonymization | Pseudonymization | None)

set_default_values_variables(variables)[source]

Set default values on variables.

Parameters:

variables (Union[list[Variable], list[Variable]]) – A list of variable objects to set default values on.

Return type:

None

Example

>>> variables = [all_optional_model.Variable(short_name="pers", id=None, is_personal_data=None), all_optional_model.Variable(short_name="fnr", id='9662875c-c245-41de-b667-12ad2091a1ee', is_personal_data=True)]
>>> set_default_values_variables(variables)
>>> isinstance(variables[0].id, uuid.UUID)
True
>>> variables[1].is_personal_data == True
True
>>> variables[0].is_personal_data == False
True
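The defaults exercised by the example above (a generated id and is_personal_data falling back to False) can be sketched without the package. SketchVariable is a hypothetical stand-in for the model class, and the use of uuid.uuid4 is an assumption; the example only shows that the id becomes a UUID.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchVariable:
    """Hypothetical stand-in for the library's Variable model."""

    short_name: str
    id: Optional[uuid.UUID] = None
    is_personal_data: Optional[bool] = None


def set_default_values_variables_sketch(variables) -> None:
    for variable in variables:
        if variable.id is None:
            variable.id = uuid.uuid4()
        if variable.is_personal_data is None:
            variable.is_personal_data = False


sketch_variables = [
    SketchVariable("pers"),
    SketchVariable("fnr", uuid.UUID("9662875c-c245-41de-b667-12ad2091a1ee"), True),
]
set_default_values_variables_sketch(sketch_variables)
```

Values that are already set are left untouched, so the function is safe to call more than once in this sketch.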
set_variables_inherit_from_dataset(dataset, variables)[source]

Set specific dataset values on a list of variable objects.

This function populates ‘data source’, ‘temporality type’, ‘contains data from’, and ‘contains data until’ fields in each variable if they are not set (None). The values are inherited from the corresponding fields in the dataset.

Parameters:
  • dataset (Union[Dataset, Dataset]) – The dataset object from which to inherit values.

  • variables (list) – A list of variable objects to update with dataset values.

Return type:

None

Example

>>> dataset = all_optional_model.Dataset(short_name='person_data_v1', id='9662875c-c245-41de-b667-12ad2091a1ee', contains_data_from="2010-09-05", contains_data_until="2022-09-05")
>>> variables = [all_optional_model.Variable(short_name="pers", data_source=None, temporality_type=None, contains_data_from=None, contains_data_until=None)]
>>> set_variables_inherit_from_dataset(dataset, variables)
>>> variables[0].contains_data_from == dataset.contains_data_from
True
>>> variables[0].contains_data_until == dataset.contains_data_until
True
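The inheritance rule, copy a dataset value only where the variable's own value is None, can be sketched as below. SimpleNamespace objects stand in for the library's models, and the sample values are invented for illustration.

```python
from types import SimpleNamespace

# Field names taken from the example above.
INHERITED_FIELDS = (
    "data_source",
    "temporality_type",
    "contains_data_from",
    "contains_data_until",
)


def inherit_from_dataset_sketch(dataset, variables) -> None:
    for variable in variables:
        for field in INHERITED_FIELDS:
            # Only fill gaps; explicit variable values take precedence.
            if getattr(variable, field) is None:
                setattr(variable, field, getattr(dataset, field))


sketch_dataset = SimpleNamespace(
    data_source="example-source",
    temporality_type="FIXED",
    contains_data_from="2010-09-05",
    contains_data_until="2022-09-05",
)
sketch_variable = SimpleNamespace(
    data_source=None,
    temporality_type="EVENT",
    contains_data_from=None,
    contains_data_until=None,
)
inherit_from_dataset_sketch(sketch_dataset, [sketch_variable])
```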