ssb_utdanning.format package

ssb_utdanning.format.formats module

class UtdFormat(start_dict=None)

Bases: dict[Any, Any]

Custom dictionary class designed to handle specific formatting conventions.

Parameters:

start_dict (dict[str | int, Any] | dict[str, Any] | None) – Initial contents of the format. Defaults to None.

static check_if_na(key)

Checks if the specified key represents an NA (Not Available) value.

Parameters:

key (str | Any) – Key to be checked for NA value.

Returns:

True if the key represents NA, False otherwise.

Return type:

bool
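A minimal sketch of what an NA check of this kind can look like, using only the standard library. The helper name is_na_key and the marker set NA_MARKERS are hypothetical: the values UtdFormat actually treats as NA are not listed in these docs, and pandas.NA would need pandas.isna, which this sketch leaves out.

```python
import math
from typing import Any

# Hypothetical marker set: the real NA values recognised by UtdFormat are not documented here.
NA_MARKERS = {".", "", "nan", "na", "none", "<na>"}


def is_na_key(key: Any) -> bool:
    """Return True if key looks like a missing value (illustrative helper, not the package API)."""
    if key is None:
        return True
    if isinstance(key, float) and math.isnan(key):
        return True
    # Compare strings case-insensitively after stripping surrounding whitespace.
    return isinstance(key, str) and key.strip().lower() in NA_MARKERS


print(is_na_key("."), is_na_key(float("nan")), is_na_key("01"))  # True True False
```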

int_str_confuse(key)

Looks up a key, handling conversion between integer and string keys.

Parameters:

key (str | int | float | NAType | None) – Key to be converted or checked for existence in the dictionary.

Return type:

None | Any

Returns:

The value associated with the key (if found) or None.
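One plausible reading of the int/str handling is a lookup that falls back to the other representation of the key, so that 1, 1.0 and "1" all find the same entry. The sketch below assumes that behaviour; lookup_int_or_str is a hypothetical name, not the package's method.

```python
from __future__ import annotations

from typing import Any


def lookup_int_or_str(mapping: dict[Any, Any], key: Any) -> Any | None:
    """Look up key directly, then via its int/str counterpart (assumed behaviour)."""
    if key in mapping:
        return mapping[key]
    alternative: Any
    if isinstance(key, str):
        try:
            alternative = int(key)  # "01" -> 1
        except ValueError:
            return None
    elif isinstance(key, float) and key.is_integer():
        alternative = str(int(key))  # 1.0 -> "1"
    elif isinstance(key, int) and not isinstance(key, bool):
        alternative = str(key)  # 1 -> "1"
    else:
        return None  # None, NaN and other keys have no counterpart here
    return mapping.get(alternative)


codes = {"1": "Mann", 2: "Kvinne"}
print(lookup_int_or_str(codes, 1), lookup_int_or_str(codes, "2"))  # Mann Kvinne
```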

look_in_ranges(key)

Looks for the specified key within the stored ranges.

Parameters:

key (str | int | float | NAType | None) – Key to search within the stored ranges.

Return type:

None | str

Returns:

The value associated with the range containing the key, if found; otherwise, None.
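A sketch of how range keys could be collected and searched, assuming ranges are written as "low-high" string keys with inclusive bounds (for example "10-19", as in SAS VALUE statements). parse_ranges and find_in_ranges are hypothetical helpers; SAS keywords such as LOW and HIGH, negative numbers and exclusive bounds like 10-<20 are not handled.

```python
from __future__ import annotations

from typing import Any


def parse_ranges(fmt: dict[Any, str]) -> list[tuple[float, float, str]]:
    """Collect (low, high, value) for keys shaped like "10-19" (assumed convention)."""
    ranges = []
    for key, value in fmt.items():
        if not isinstance(key, str) or key.count("-") != 1:
            continue  # not a simple low-high key
        low_text, high_text = key.split("-")
        try:
            low, high = float(low_text), float(high_text)
        except ValueError:
            continue  # e.g. "a-b" or "10-<20"
        ranges.append((low, high, value))
    return ranges


def find_in_ranges(ranges: list[tuple[float, float, str]], key: Any) -> str | None:
    """Return the value of the first range containing key (inclusive), else None."""
    try:
        number = float(key)
    except (TypeError, ValueError):
        return None  # None and non-numeric keys cannot fall in a range
    for low, high, value in ranges:
        if low <= number <= high:
            return value
    return None


age_groups = parse_ranges({"0-9": "barn", "10-19": "ungdom", "x": "annet"})
print(find_in_ranges(age_groups, 12), find_in_ranges(age_groups, "25"))  # ungdom None
```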

set_na_value()

Sets the value for NA (Not Available) keys in the UtdFormat.

Returns:

True if NA value is successfully set, False otherwise.

Return type:

bool

set_other_as_lowercase()

Sets the key ‘other’ to lowercase if it is found in another casing, such as ‘Other’ or ‘OTHER’.

Return type:

None
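A minimal sketch of the renaming step, assuming "mixed cases" means any spelling of the key that differs from ‘other’ only in case. The helper name lowercase_other_key is hypothetical. If both ‘Other’ and ‘other’ are present, this sketch lets the renamed key overwrite the existing one, which may differ from the package.

```python
from typing import Any


def lowercase_other_key(fmt: dict[Any, Any]) -> None:
    """Rename keys such as "Other" or "OTHER" to "other", in place."""
    for key in list(fmt):  # copy the keys, since the dict is modified in the loop
        if isinstance(key, str) and key != "other" and key.lower() == "other":
            fmt["other"] = fmt.pop(key)


codes = {"1": "Mann", "OTHER": "Ukjent"}
lowercase_other_key(codes)
print(codes)  # {'1': 'Mann', 'other': 'Ukjent'}
```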

store(format_name, output_path='/ssb/stamme01/utd/utd-felles/formater/', force=False)

Stores the UtdFormat instance at the specified output path.

Parameters:
  • format_name (str) – Name of the format to be stored.

  • output_path (str) – Path where the format will be stored.

  • force (bool) – Flag to force storing even for cached instances.

Raises:

ValueError – If storing a cached UtdFormat might lead to an unexpectedly large number of keys.

Return type:

None

store_ranges()

Stores ranges based on specified keys in the dictionary.

Return type:

None

update_format()

Updates the format by setting its special instance attributes.

Return type:

None

get_format(name='', date='latest', filepath='')

Retrieves a format from a JSON format file, selected by name (the start of the filename).

Parameters:
  • name (str) – Name of the format.

  • date (str) – Date string to find the format for. Defaults to “latest”. If a datetime string is given, the format with the closest date is returned.

  • filepath (str) – Full path to the format file. If given, the name and date arguments are ignored.

Returns:

The format as a dict or defaultdict for the specified name and date. If the format contains an “other” key, a defaultdict is returned. If the format contains the SAS value for missing (“.”) or another recognized empty-value type, many known keys for empty values are inserted into the dict in an attempt to map these values correctly.

Return type:

dict or defaultdict

Raises:

ValueError – If no name or filepath is specified.
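The documented return behaviour (a defaultdict when an “other” key exists, extra empty-value keys when “.” is present) can be sketched with collections.defaultdict. as_lookup and EMPTY_KEYS are hypothetical; the package's own list of empty-value keys is not given in these docs.

```python
from collections import defaultdict
from typing import Any

# Hypothetical set of keys that should map like the SAS missing value ".".
EMPTY_KEYS = ["", " ", "nan", "NaN", "None", "<NA>"]


def as_lookup(fmt: dict[str, Any]) -> dict[str, Any]:
    """Return a plain dict, or a defaultdict falling back to fmt["other"]."""
    result = dict(fmt)
    if "." in result:
        for empty in EMPTY_KEYS:
            result.setdefault(empty, result["."])  # never overwrite explicit keys
    if "other" in result:
        fallback = result["other"]
        return defaultdict(lambda: fallback, result)
    return result


kjonn = as_lookup({"1": "Mann", "2": "Kvinne", ".": "Ukjent", "other": "Annet"})
print(kjonn["nan"], kjonn["9"])  # Ukjent Annet
```

Note that looking up a missing key in a defaultdict also inserts that key, so the dictionary grows as unknown codes are mapped.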

get_path(name, date='latest')

Retrieves the path for a specific format on a given date.

Parameters:
  • name (str) – Name of the format.

  • date (str) – Date string to find the path for. Defaults to “latest”. If a datetime string is given, the path of the format with the closest date is returned.

Returns:

The path associated with the specified format and date, if found; otherwise, None.

Return type:

str | None
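A sketch of the date selection, assuming the available versions are already known as a mapping from datetime to path and that "closest" means the smallest absolute difference in either direction. pick_path is a hypothetical helper, and the date string is parsed with datetime.fromisoformat, which may not match the formats get_path accepts.

```python
from __future__ import annotations

from datetime import datetime


def pick_path(stored: dict[datetime, str], date: str = "latest") -> str | None:
    """Return the newest path for "latest", otherwise the path closest to date."""
    if not stored:
        return None
    if date == "latest":
        return stored[max(stored)]
    target = datetime.fromisoformat(date)
    closest = min(stored, key=lambda stamp: abs(stamp - target))
    return stored[closest]


versions = {datetime(2023, 1, 1): "old.json", datetime(2024, 6, 1): "new.json"}
print(pick_path(versions), pick_path(versions, "2023-03-01"))  # new.json old.json
```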

info_stored_formats(select_name='', path_prod='/ssb/stamme01/utd/utd-felles/formater/')

In Prodsone, lists all JSON format files in the format folder.

Does not look at the file content, only at what can be extracted from the filesystem. The date is parsed from the filename, and datetime strings are converted to true datetimes. Results are sorted in descending order by name and date.

Parameters:
  • select_name (str) – Name of the specific format to select information for.

  • path_prod (str) – Path to the directory containing stored format files. Defaults to “/ssb/stamme01/utd/utd-felles/formater/”.

Returns:

Information extracted from the path names.

Return type:

pd.DataFrame

Raises:

OSError – If the specified path_prod directory does not exist.
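Since only the filesystem is consulted, the listing can be sketched with pathlib. The filename pattern <name>_<YYYY-MM-DDTHH-MM-SS>.json is an assumption made for this example, not the documented convention, and the sketch returns a list of dicts instead of the pd.DataFrame the real function returns. list_stored_formats is a hypothetical name.

```python
import tempfile
from datetime import datetime
from pathlib import Path

# Assumed filename pattern: "<name>_<YYYY-MM-DDTHH-MM-SS>.json".
STAMP_FORMAT = "%Y-%m-%dT%H-%M-%S"


def list_stored_formats(folder: str, select_name: str = "") -> list[dict]:
    """List format files, parsing name and date from each filename."""
    path = Path(folder)
    if not path.is_dir():
        raise OSError(f"Format folder does not exist: {folder}")
    rows = []
    for file in path.glob("*.json"):
        name, _, stamp = file.stem.rpartition("_")
        if not name:
            continue  # filename does not follow the assumed pattern
        try:
            date = datetime.strptime(stamp, STAMP_FORMAT)
        except ValueError:
            continue
        if select_name and name != select_name:
            continue
        rows.append({"name": name, "date": date, "path": str(file)})
    rows.sort(key=lambda row: (row["name"], row["date"]), reverse=True)
    return rows


with tempfile.TemporaryDirectory() as folder:
    for filename in ("kjonn_2023-01-01T00-00-00.json",
                     "kjonn_2024-01-01T00-00-00.json",
                     "fylke_2022-05-05T12-00-00.json",
                     "notes.txt"):
        (Path(folder) / filename).touch()
    all_rows = list_stored_formats(folder)
    fylke_rows = list_stored_formats(folder, select_name="fylke")

print([(row["name"], row["date"].year) for row in all_rows])
# [('kjonn', 2024), ('kjonn', 2023), ('fylke', 2022)]
```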

is_different_from_last_time(format_name, format_content)

Checks if the current format content differs from the last saved version.

Parameters:
  • format_name (str) – The short name of the format (the first part of the JSON filename).

  • format_content (UtdFormat) – Content of the format, as a dictionary, to compare against the content stored on disk.

Returns:

True if the current format content is different from the last saved version; otherwise, False.

Return type:

bool
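The comparison is simplest if the in-memory format is round-tripped through JSON first, because JSON object keys are always strings: an int key 1 is written as "1" on disk and would otherwise never compare equal. This sketch compares against JSON text rather than locating and reading the latest file itself; differs_from_stored is a hypothetical name.

```python
from __future__ import annotations

import json
from typing import Any


def differs_from_stored(current: dict[Any, Any], stored_json: str | None) -> bool:
    """Return True if current differs from the stored JSON (or nothing is stored)."""
    if stored_json is None:
        return True  # no earlier version, so treat the format as new
    # Round-trip so keys are converted exactly as json.dumps would write them (1 -> "1").
    return json.loads(json.dumps(current)) != json.loads(stored_json)


last_saved = json.dumps({"1": "Mann", "2": "Kvinne"})
print(differs_from_stored({1: "Mann", 2: "Kvinne"}, last_saved))  # False
```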

store_format_prod(formats, output_path='/ssb/stamme01/utd/utd-felles/formater/')

Saves a nested or unnested dictionary of formats to the Prodsone folder as timestamped JSON files.

Parameters:
  • formats (dict[str, dict[str, str]] | dict[str, str]) – Dictionary containing format information. A nested dictionary is expected if multiple formats are passed; the first layer of keys should then be the format names, and the values the dict contents of the formats. If unnested, the dictionary is assumed to be a single format, and the name is requested using input().

  • output_path (str) – Path to store the format data. Not including the filename itself, only the base folder. Defaults to FORMATS_PATH.

Raises:

NotImplementedError – If the provided formats structure is neither nested nor unnested dictionaries of strings.

Return type:

None

ssb_utdanning.format.sas_format_parsing module

batch_process_folder_sasfiles(sas_files_path, output_path='/ssb/stamme01/utd/utd-felles/formater/')

Finds all .sas files in the folder and tries to extract formats from them.

Parameters:
  • sas_files_path (str) – The path to the folder containing the .sas files.

  • output_path (str) – The path to the folder where the formats will be stored. Not including the filename itself, only the base folder.

Return type:

None
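A sketch of the folder walk, assuming a non-recursive search for lowercase .sas extensions and that a failure in one file should not stop the batch (the docs say it "tries" to extract formats). The per-file work is passed in as a callback; process_sas_folder and fake_handler are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Callable


def process_sas_folder(folder: str, handle_file: Callable[[Path], None]) -> list[Path]:
    """Call handle_file on each .sas file in folder and return the ones that failed."""
    failed = []
    for sas_file in sorted(Path(folder).glob("*.sas")):
        try:
            handle_file(sas_file)
        except Exception as error:  # report and continue with the next file
            print(f"Could not extract formats from {sas_file.name}: {error}")
            failed.append(sas_file)
    return failed


seen: list[str] = []


def fake_handler(path: Path) -> None:
    """Stand-in for the real per-file extraction."""
    seen.append(path.name)
    if path.name == "broken.sas":
        raise ValueError("no VALUE statement found")


with tempfile.TemporaryDirectory() as folder:
    for filename in ("kjonn.sas", "broken.sas", "readme.txt"):
        (Path(folder) / filename).write_text("", encoding="utf-8")
    failed_names = [path.name for path in process_sas_folder(folder, fake_handler)]

print(seen, failed_names)  # ['broken.sas', 'kjonn.sas'] ['broken.sas']
```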

parse_sas_script(sas_script_content)

Extract a format as a Python dictionary from a SAS script.

Parameters:

sas_script_content (str) – The content of the SAS script.

Returns:

A nested dictionary with the format name as key and the format content as value.

Return type:

dict[str, dict[str, str]]
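A regex-based sketch of extracting VALUE statements from a PROC FORMAT script. It assumes quoted keys and labels, no semicolons inside labels and only /* */ comments; it also strips the "$" prefix of character formats and lowercases names, which may not match what parse_sas_script does. parse_formats is a hypothetical name.

```python
import re

# One VALUE statement, up to its terminating semicolon.
VALUE_STATEMENT = re.compile(r"\bvalue\s+(\$?\w+)(.*?);", re.IGNORECASE | re.DOTALL)
# A quoted 'key' = 'label' pair; single or double quotes, matched pairwise.
QUOTED_PAIR = re.compile(r"""(['"])(.*?)\1\s*=\s*(['"])(.*?)\3""", re.DOTALL)


def parse_formats(script: str) -> dict[str, dict[str, str]]:
    """Return {format_name: {key: label}} for every VALUE statement in script."""
    script = re.sub(r"/\*.*?\*/", "", script, flags=re.DOTALL)  # drop /* comments */
    formats = {}
    for name, body in VALUE_STATEMENT.findall(script):
        pairs = QUOTED_PAIR.findall(body)  # tuples of (quote, key, quote, label)
        formats[name.lstrip("$").lower()] = {key: label for _, key, _, label in pairs}
    return formats


SCRIPT = """
proc format;
  /* kjønn */
  value $kjonn
    '1' = 'Mann'
    '2' = 'Kvinne';
  value fylke "03" = "Oslo";
run;
"""
print(parse_formats(SCRIPT))
# {'kjonn': {'1': 'Mann', '2': 'Kvinne'}, 'fylke': {'03': 'Oslo'}}
```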

parse_value_part(value_part)

Parse a single “format value part” of a SAS script.

Parameters:

value_part (str) – The value part to parse.

Returns:

A tuple containing the format-name and the format-content.

Return type:

tuple[str, dict[str, str]]
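A sketch of parsing one value part, assumed here to be the text between the VALUE keyword and the closing semicolon, for example $kjonn '1'='Mann' other='Ukjent'. It accepts both quoted keys and bare keys such as other, ranges like 10-19 and the missing value ".". The function name split_value_part and the choice to strip "$" and lowercase the name are assumptions; SAS doubled quotes inside labels ('O''Brien') are not handled.

```python
import re

# One "key = label" pair. The key is either quoted ('1', "03") or bare (other, 10-19, .).
PAIR_PATTERN = re.compile(
    r"""(?:(['"])(?P<qkey>.*?)\1|(?P<bare>[^\s='"]+))\s*=\s*(['"])(?P<label>.*?)\4""",
    re.DOTALL,
)


def split_value_part(value_part: str) -> tuple[str, dict[str, str]]:
    """Return (format_name, {key: label}) for one VALUE statement body."""
    parts = value_part.split(maxsplit=1)
    name = parts[0].lstrip("$").lower() if parts else ""
    body = parts[1] if len(parts) > 1 else ""
    content: dict[str, str] = {}
    for match in PAIR_PATTERN.finditer(body):
        key = match.group("qkey") if match.group("qkey") is not None else match.group("bare")
        if key.lower() == "other":
            key = "other"  # SAS keywords are case-insensitive
        content[key] = match.group("label")
    return name, content


print(split_value_part("$kjonn '1'='Mann' '2' = 'Kvinne' 10-19 = 'Ungdom' OTHER = 'Ukjent'"))
# ('kjonn', {'1': 'Mann', '2': 'Kvinne', '10-19': 'Ungdom', 'other': 'Ukjent'})
```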

process_single_sasfile(file, output_path='/ssb/stamme01/utd/utd-felles/formater/')

Reads a single .sas file from storage, extracts its formats and stores them on disk as timestamped JSON files.

Parameters:
  • file (str) – The path to the .sas file.

  • output_path (str) – The path to the folder where the formats will be stored. Not including the filename itself, only the base folder.

Raises:

ValueError – If the file path passed in is not a .sas file.

Return type:

None
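A sketch of the input check and file read that precede format extraction. is_sas_file and read_sas_file are hypothetical helpers; accepting ".SAS" in any case and defaulting to Latin-1 encoding are assumptions, since the docs state neither.

```python
import tempfile
from pathlib import Path


def is_sas_file(file: str) -> bool:
    """True if the path ends in .sas (case-insensitive)."""
    return Path(file).suffix.lower() == ".sas"


def read_sas_file(file: str, encoding: str = "latin-1") -> str:
    """Return the script text, raising ValueError for non-.sas paths."""
    if not is_sas_file(file):
        raise ValueError(f"Not a .sas file: {file}")
    return Path(file).read_text(encoding=encoding)


with tempfile.TemporaryDirectory() as folder:
    sas_path = Path(folder) / "kjonn.sas"
    sas_path.write_text("value $kjonn '1' = 'Mann';", encoding="latin-1")
    script_text = read_sas_file(str(sas_path))

print(script_text)  # value $kjonn '1' = 'Mann';
```

In the documented function, the formats extracted from this text are then stored on disk as timestamped JSON files.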