dapla_metadata.datasets.utility package¶
dapla_metadata.datasets.utility.constants module¶
Repository for constant values in Datadoc backend.
dapla_metadata.datasets.utility.enums module¶
Enumerations used in Datadoc.
dapla_metadata.datasets.utility.urn module¶
Validate, parse and render URNs.
- class ReferenceUrlTypes(*values)[source]¶
Bases: Enum
The general category of the URL.
This can be useful to refer to when constructing a URL from a URN for a specific context.
- API = 1¶
- FRONTEND = 2¶
- class SsbNaisDomains(*values)[source]¶
Bases: str, Enum
The available domains on SSB's Nais instance.
- PROD_EXTERNAL = 'ssb.no'¶
- PROD_INTERNAL = 'intern.ssb.no'¶
- TEST_EXTERNAL = 'test.ssb.no'¶
- TEST_INTERNAL = 'intern.test.ssb.no'¶
- class UrnConverter(urn_base, id_pattern, url_bases)[source]¶
Bases: object
Converts URLs to URNs and vice versa.
- Parameters:
urn_base (str)
id_pattern (str)
url_bases (list[tuple[ReferenceUrlTypes, str]])
- urn_base¶
The format for the URN, up to the identifier.
- id_pattern¶
A capturing group pattern which matches identifiers for this resource.
- url_bases¶
The list of all the different URL representations for a resource. There will typically be a number of URL representations for a particular resource, depending on which system or technology they are accessed through and other technical factors. This list defines which concrete URLs can be considered equivalent to a URN.
- convert_url_to_urn(url)[source]¶
Convert a URL to a generalized URN for that same resource.
- Parameters:
url (str | AnyUrl) – The URL to convert.
- Returns:
The URN or None if it can’t be converted.
- Return type:
str | None
- get_id(urn_or_url)[source]¶
Get an identifier from a URN or URL.
- Parameters:
urn_or_url (str | AnyUrl) – The URN or URL referring to a particular resource.
- Returns:
The identifier for the resource, or None if it cannot be extracted.
- Return type:
str | None
- get_url(identifier, url_type, visibility='public')[source]¶
Build concrete URL to reference a resource.
There are typically multiple URLs used to refer to one resource; this method attempts to support the known variations.
- Parameters:
identifier (str) – The identifier of the resource the URL refers to.
url_type (ReferenceUrlTypes) – The representation type of the URL
visibility (UrlVisibility, optional) – Whether the URL should be that which is publicly available or not. Defaults to “public”.
- Returns:
The concrete URL. None if we cannot satisfy the supplied requirements.
- Return type:
str | None
- get_urn(identifier)[source]¶
Build a URN for the given identifier.
- Parameters:
identifier (str)
- Return type:
str
- id_pattern: str¶
- is_id(value)[source]¶
Check if the value is an identifier for this URN type.
- Parameters:
value (str) – The value to check.
- Return type:
bool
- url_bases: list[tuple[ReferenceUrlTypes, str]]¶
- urn_base: str¶
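To make the relationship between urn_base, id_pattern and url_bases concrete, here is a minimal self-contained sketch. It does not import dapla_metadata and is not the library's implementation: SketchUrnConverter, UrlType, the example.test hosts and the urn:example:concept base are all hypothetical, and joining the base to the identifier with ":" (URN) or "/" (URL) is an assumption.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class UrlType(Enum):
    """Hypothetical stand-in for ReferenceUrlTypes."""

    API = 1
    FRONTEND = 2


@dataclass
class SketchUrnConverter:
    """Illustrative converter; an assumption, not the real UrnConverter."""

    urn_base: str  # the URN up to (not including) the identifier
    id_pattern: str  # capturing group that matches one identifier
    url_bases: List[Tuple[UrlType, str]]  # URL prefixes equivalent to the URN

    def is_id(self, value: str) -> bool:
        # The whole value must match the identifier pattern.
        return re.fullmatch(self.id_pattern, value) is not None

    def get_urn(self, identifier: str) -> str:
        # Assumption: base and identifier are joined with a colon.
        return f"{self.urn_base}:{identifier}"

    def get_id(self, urn_or_url: str) -> Optional[str]:
        # Try the URN prefix first, then every known URL prefix.
        prefixes = [f"{self.urn_base}:"] + [f"{base}/" for _, base in self.url_bases]
        for prefix in prefixes:
            candidate = urn_or_url[len(prefix):]
            if urn_or_url.startswith(prefix) and self.is_id(candidate):
                return candidate
        return None

    def convert_url_to_urn(self, url: str) -> Optional[str]:
        # Only known URL prefixes are converted; anything else yields None.
        for _, base in self.url_bases:
            candidate = url[len(base) + 1:]
            if url.startswith(f"{base}/") and self.is_id(candidate):
                return self.get_urn(candidate)
        return None


converter = SketchUrnConverter(
    urn_base="urn:example:concept",
    id_pattern=r"([0-9]+)",
    url_bases=[
        (UrlType.API, "https://api.example.test/concepts"),
        (UrlType.FRONTEND, "https://www.example.test/concepts"),
    ],
)
print(converter.convert_url_to_urn("https://www.example.test/concepts/42"))
print(converter.convert_url_to_urn("https://elsewhere.example.test/42"))
```

In this sketch the first call prints urn:example:concept:42 and the second prints None, mirroring the documented behaviour of returning None when a URL cannot be converted.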
- convert_uris_to_urns(variables, field_name, converters)[source]¶
Where URIs are recognized URLs, convert them to URNs.
Where the value is not a known URL we preserve the value as it is and log an ERROR level message.
- Parameters:
variables (VariableListType) – The list of variables.
field_name (str) – The name of the field which has URLs to convert to URNs
converters (Iterable[UrnConverter]) – One or more converters which implement conversion of URLs into one specific URN format. These will typically be specific to an individual metadata reference system.
- Return type:
None
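The in-place conversion described above can be sketched as follows. This is an illustration under stated assumptions, not the library's code: SketchVariable, its definition_uri field, example_converter and the example.test URLs are hypothetical, and converters are modelled as plain callables rather than UrnConverter instances.

```python
import logging
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional

logger = logging.getLogger(__name__)


@dataclass
class SketchVariable:
    """Hypothetical stand-in for a variable model."""

    short_name: str
    definition_uri: Optional[str] = None  # hypothetical field holding a URL


def convert_field_to_urns(
    variables: List[SketchVariable],
    field_name: str,
    converters: Iterable[Callable[[str], Optional[str]]],
) -> None:
    """Replace recognized URLs with URNs in place; keep anything else."""
    converters = list(converters)  # allow several passes over an iterator
    for variable in variables:
        value = getattr(variable, field_name, None)
        if value is None:
            continue
        for convert in converters:
            urn = convert(str(value))
            if urn is not None:
                setattr(variable, field_name, urn)
                break
        else:
            # No converter recognized the value: preserve it and log an error.
            logger.error("Could not convert %s=%r to a URN", field_name, value)


def example_converter(url: str) -> Optional[str]:
    # Hypothetical converter for a single made-up reference system.
    prefix = "https://www.example.test/concepts/"
    return f"urn:example:concept:{url[len(prefix):]}" if url.startswith(prefix) else None


variables = [
    SketchVariable("pers", "https://www.example.test/concepts/42"),
    SketchVariable("fnr", "https://unknown.example.test/7"),
]
convert_field_to_urns(variables, "definition_uri", [example_converter])
```

After the call, the first variable holds the URN while the second keeps its original URL and triggers an ERROR log record.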
dapla_metadata.datasets.utility.utils module¶
- calculate_percentage(completed, total)[source]¶
Calculate percentage as a rounded integer.
- Parameters:
completed (int) – The number of completed items.
total (int) – The total number of items.
- Returns:
The rounded percentage of completed items out of the total.
- Return type:
int
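A rounded integer percentage can be computed as in the sketch below. The name percentage_sketch is hypothetical, and the source does not say how a total of zero is handled, so this version simply assumes a non-zero total.

```python
def percentage_sketch(completed: int, total: int) -> int:
    """Return completed/total as a whole-number percentage."""
    # Assumption: total is non-zero; zero would raise ZeroDivisionError here.
    return round(completed / total * 100)


print(percentage_sketch(1, 3))  # 33
print(percentage_sketch(2, 3))  # 67
```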
- derive_assessment_from_state(state)[source]¶
Derive assessment from dataset state.
- Parameters:
state (DataSetState) – The state of the dataset.
- Returns:
The derived assessment of the dataset.
- Return type:
Assessment
- get_missing_obligatory_dataset_fields(dataset)[source]¶
Identify all obligatory dataset fields that are missing values.
This function checks for obligatory fields that are either directly missing (i.e., set to None) or have multilanguage values with empty content.
- Parameters:
dataset (Union[Dataset, Dataset]) – The dataset object to examine. This object must support the model_dump() method, which returns a dictionary of field names and values.
- Returns:
A list of field names (as strings) that are missing values. This includes:
- Fields that are directly None and are listed as obligatory metadata.
- Multilanguage fields (listed as obligatory metadata) where the value exists but the primary language text is empty.
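The two "missing" conditions can be illustrated on a plain dictionary of the kind model_dump() returns. Everything specific here is an assumption for illustration: the OBLIGATORY_FIELDS list, the languageCode/languageText keys, and treating the first list entry as the primary language are not confirmed by this page.

```python
from typing import Any, Dict, List

# Hypothetical obligatory field names, for illustration only.
OBLIGATORY_FIELDS = ["name", "description", "owner"]


def missing_obligatory_fields(dumped: Dict[str, Any]) -> List[str]:
    """List obligatory fields that are None or have an empty primary text."""
    missing = []
    for field in OBLIGATORY_FIELDS:
        value = dumped.get(field)
        if value is None:
            # Directly missing value.
            missing.append(field)
        elif isinstance(value, list) and value and isinstance(value[0], dict):
            # Assumption: the first list entry is the primary language.
            if not value[0].get("languageText"):
                missing.append(field)
    return missing


dumped = {
    "name": [{"languageCode": "nb", "languageText": ""}],
    "description": [{"languageCode": "nb", "languageText": "Persondata"}],
    "owner": None,
}
print(missing_obligatory_fields(dumped))  # ['name', 'owner']
```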
- get_missing_obligatory_variables_fields(variables)[source]¶
Identify obligatory variable fields that are missing values for each variable.
This function checks for obligatory fields that are either directly missing (i.e., set to None) or have multilanguage values with empty content.
- Parameters:
variables (list) – A list of variable objects to check for missing obligatory fields.
- Returns:
A list of dictionaries with variable short names as keys and lists of missing obligatory variable fields as values. This includes:
- Fields that are directly None and are listed as obligatory metadata.
- Multilanguage fields (listed as obligatory metadata) where the value exists but the primary language text is empty.
- Return type:
list[dict]
- get_missing_obligatory_variables_pseudo_fields(variables)[source]¶
Identify obligatory variable pseudonymization fields that are missing values for each variable.
This function checks for obligatory fields that are directly missing (i.e., set to None).
- Parameters:
variables (list[Variable]) – A list of variable objects to check for missing obligatory pseudonymization fields.
- Returns:
A list of dictionaries with variable short names as keys and lists of missing obligatory variable pseudonymization fields as values. This includes:
- Fields that are directly None and are listed as obligatory metadata.
- Return type:
list[dict]
- incorrect_date_order(date_from, date_until)[source]¶
Evaluate the chronological order of two dates.
This function checks if ‘date until’ is earlier than ‘date from’. If so, it indicates an incorrect date order.
- Parameters:
date_from (Optional[date]) – The start date of the time period.
date_until (Optional[date]) – The end date of the time period.
- Returns:
True if ‘date_until’ is earlier than ‘date_from’ or if only ‘date_from’ is None, False otherwise.
- Return type:
bool
Example
>>> import datetime
>>> incorrect_date_order(datetime.date(1980, 1, 1), datetime.date(1967, 1, 1))
True
>>> incorrect_date_order(datetime.date(1967, 1, 1), datetime.date(1980, 1, 1))
False
>>> incorrect_date_order(None, datetime.date(2024, 7, 1))
True
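The documented return rule, including the cases the examples do not cover (both dates missing, or only the end date missing), can be written out as a small sketch. The function name is hypothetical and the body is derived only from the description above, not from the library source.

```python
import datetime
from typing import Optional


def incorrect_date_order_sketch(
    date_from: Optional[datetime.date], date_until: Optional[datetime.date]
) -> bool:
    """Return True when the period is in the wrong order, per the rule above."""
    if date_from is None:
        # Only the start date missing counts as incorrect; both missing does not.
        return date_until is not None
    # With a start date present, the order is wrong only if the end precedes it.
    return date_until is not None and date_until < date_from
```

Under this reading, a period with no dates at all, or with only a start date, is not reported as incorrectly ordered.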
- num_obligatory_dataset_fields_completed(dataset)[source]¶
Count the number of completed obligatory dataset fields.
This function returns the total count of obligatory fields in the dataset that have values (are not None).
- Parameters:
dataset (Union[Dataset, Dataset]) – The dataset object for which to count the fields.
- Returns:
The number of obligatory dataset fields that have been completed (not None).
- Return type:
int
- num_obligatory_pseudo_fields_missing(variables)[source]¶
Count the number of obligatory pseudonymization fields that are missing.
- Parameters:
variables (list[Variable]) – The variables to count obligatory fields for.
- Returns:
The number of obligatory pseudonymization fields that are missing.
- Return type:
int
- num_obligatory_variable_fields_completed(variable)[source]¶
Count the number of obligatory fields completed for one variable.
This function calculates the total number of obligatory fields that have values (are not None) for the given variable.
- Parameters:
variable (Variable) – The variable to count obligatory fields for.
- Returns:
The total number of obligatory variable fields that have been completed (not None) for one variable.
- Return type:
int
- num_obligatory_variables_fields_completed(variables)[source]¶
Count the number of obligatory fields completed for all variables.
This function calculates the total number of obligatory fields that have values (are not None), summed across all variables in the list.
- Parameters:
variables (list) – A list with variable objects.
- Returns:
The total number of obligatory variable fields that have been completed (not None) for all variables.
- Return type:
int
- set_dataset_owner(dataset)[source]¶
Set the owner of the dataset from the DAPLA_GROUP_CONTEXT environment variable.
- Parameters:
dataset (Union[Dataset, Dataset]) – The dataset object to set default values on.
- Return type:
None
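Reading an owner from the environment might look like the sketch below. SketchDataset, its owner field and the value "example-team-developers" are hypothetical, and using the variable's value unchanged is an assumption; the library may derive the owner from it differently.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchDataset:
    """Hypothetical stand-in for the dataset model."""

    owner: Optional[str] = None  # hypothetical field name


def set_owner_from_environment(dataset: SketchDataset) -> None:
    # Assumption: the variable's value is used as-is when it is set.
    group = os.environ.get("DAPLA_GROUP_CONTEXT")
    if group:
        dataset.owner = group


os.environ["DAPLA_GROUP_CONTEXT"] = "example-team-developers"  # made-up value
dataset = SketchDataset()
set_owner_from_environment(dataset)
```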
- set_default_values_dataset(dataset)[source]¶
Set default values on dataset.
- Parameters:
dataset (Union[Dataset, Dataset]) – The dataset object to set default values on.
- Return type:
None
Example
>>> dataset = all_optional_model.Dataset(id=None)
>>> set_default_values_dataset(dataset)
>>> dataset.id is not None
True
- set_default_values_pseudonymization(variable, pseudonymization)[source]¶
Populate pseudonymization fields with defaults based on the encryption algorithm.
Updates the encryption key reference and encryption parameters if they are not set, handling both PAPIS and DAED algorithms. Leaves unknown algorithms unchanged.
- Parameters:
variable (Variable | Variable)
pseudonymization (Pseudonymization | Pseudonymization | None)
- Return type:
None
- set_default_values_variables(variables)[source]¶
Set default values on variables.
- Parameters:
variables (Union[list[Variable], list[Variable]]) – A list of variable objects to set default values on.
- Return type:
None
Example
>>> variables = [
...     all_optional_model.Variable(short_name="pers", id=None, is_personal_data=None),
...     all_optional_model.Variable(short_name="fnr", id='9662875c-c245-41de-b667-12ad2091a1ee', is_personal_data=True),
... ]
>>> set_default_values_variables(variables)
>>> isinstance(variables[0].id, uuid.UUID)
True
>>> variables[1].is_personal_data == True
True
>>> variables[0].is_personal_data == False
True
- set_variables_inherit_from_dataset(dataset, variables)[source]¶
Set specific dataset values on a list of variable objects.
This function populates ‘data source’, ‘temporality type’, ‘contains data from’, and ‘contains data until’ fields in each variable if they are not set (None). The values are inherited from the corresponding fields in the dataset.
- Parameters:
dataset (Union[Dataset, Dataset]) – The dataset object from which to inherit values.
variables (list) – A list of variable objects to update with dataset values.
- Return type:
None
Example
>>> dataset = all_optional_model.Dataset(short_name='person_data_v1', id='9662875c-c245-41de-b667-12ad2091a1ee', contains_data_from="2010-09-05", contains_data_until="2022-09-05")
>>> variables = [all_optional_model.Variable(short_name="pers", data_source=None, temporality_type=None, contains_data_from=None, contains_data_until=None)]
>>> set_variables_inherit_from_dataset(dataset, variables)
>>> variables[0].contains_data_from == dataset.contains_data_from
True
>>> variables[0].contains_data_until == dataset.contains_data_until
True
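The fill-only-when-unset behaviour described above can be sketched with plain dataclasses. SketchRecord and inherit_from_dataset are hypothetical names and string-typed fields are a simplification; only the four field names come from the example above.

```python
from dataclasses import dataclass
from typing import List, Optional

# The four fields the description says are inherited from the dataset.
INHERITED_FIELDS = (
    "data_source",
    "temporality_type",
    "contains_data_from",
    "contains_data_until",
)


@dataclass
class SketchRecord:
    """Hypothetical stand-in used for both the dataset and its variables."""

    short_name: str
    data_source: Optional[str] = None
    temporality_type: Optional[str] = None
    contains_data_from: Optional[str] = None
    contains_data_until: Optional[str] = None


def inherit_from_dataset(dataset: SketchRecord, variables: List[SketchRecord]) -> None:
    for variable in variables:
        for field in INHERITED_FIELDS:
            # Only fill fields that are unset; explicit values are kept.
            if getattr(variable, field) is None:
                setattr(variable, field, getattr(dataset, field))


dataset = SketchRecord(
    "person_data_v1", contains_data_from="2010-09-05", contains_data_until="2022-09-05"
)
variables = [
    SketchRecord("pers"),
    SketchRecord("fnr", contains_data_from="2015-01-01"),
]
inherit_from_dataset(dataset, variables)
```

Here "pers" picks up both dates from the dataset, while "fnr" keeps its own start date and inherits only the end date.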