nudb_use.metadata.nudb_config package

nudb_use.metadata.nudb_config.find_var_missing module

Lookup helpers for NUDB variable metadata defined in config.

find_var(var_name)

Retrieve configuration and KLASS metadata for a single variable.

Parameters:

var_name (str) – Variable name (current or historical). Comparison is case-insensitive.

Return type:

VariableMetadata | None

find_vars(var_names)

Look up multiple variables and return their configuration metadata.

Parameters:

var_names (Iterable[str]) – Iterable of variable identifiers to resolve.

Returns:

Mapping of requested names to their resolved metadata. Missing entries map to None.

Return type:

dict[str, VariableMetadata | None]
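The lookup behaviour described for find_var and find_vars can be sketched as follows. This is a minimal illustration, not the package's implementation: the CONFIG dictionary, its renamed_from field, the variable names, and the *_sketch functions are all hypothetical stand-ins.

```python
from __future__ import annotations

from collections.abc import Iterable

# Hypothetical stand-in for the variables section of the config:
# current name -> metadata, including any historical names.
CONFIG = {
    "var_a": {"dtype": "STRING", "renamed_from": ["old_a"]},
    "var_b": {"dtype": "DATETIME", "renamed_from": []},
}


def find_var_sketch(var_name: str) -> dict | None:
    """Resolve a current or historical name, ignoring case."""
    wanted = var_name.lower()
    for name, meta in CONFIG.items():
        old_names = [old.lower() for old in meta["renamed_from"]]
        if wanted == name.lower() or wanted in old_names:
            return {"name": name, **meta}
    return None


def find_vars_sketch(var_names: Iterable[str]) -> dict[str, dict | None]:
    """Map each requested name to its metadata, or None when unknown."""
    return {name: find_var_sketch(name) for name in var_names}


result = find_vars_sketch(["VAR_A", "old_a", "unknown"])
```

In this sketch both "VAR_A" and the historical "old_a" resolve to the same entry, while "unknown" maps to None.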

get_list_of_columns_for_dataset(dataset_name)

Return the names of the columns the config expects in a given dataset.

Parameters:

dataset_name (str) – The name of the dataset according to the config.

Returns:

A list of all the column names.

Return type:

list[str]

Raises:

KeyError – If the dataset_name does not exist under datasets in the config settings.
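A minimal sketch of the documented contract, assuming the config exposes a datasets mapping whose entries list their variables. The DATASETS dictionary, the dataset name, and the function name below are hypothetical.

```python
# Hypothetical stand-in for the datasets section of the config.
DATASETS = {
    "example_dataset": {"variables": ["var_a", "var_b"]},
}


def columns_for_dataset_sketch(dataset_name: str) -> list[str]:
    """Return the expected column names, or raise KeyError if unknown."""
    if dataset_name not in DATASETS:
        raise KeyError(f"Dataset {dataset_name!r} is not defined in the config.")
    # Return a copy so callers cannot mutate the config by accident.
    return list(DATASETS[dataset_name]["variables"])


columns = columns_for_dataset_sketch("example_dataset")
```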

look_up_dtype_length_for_dataset(dataset_name, display_markdown=True)

Build a string listing the dtype and length of each field in the named dataset.

Parameters:
  • dataset_name (str) – The name of the dataset according to the config.

  • display_markdown (bool) – Whether to display the produced string as Markdown (works best in notebooks).

Returns:

A string with a line break after each variable. If display_markdown is True, None is returned instead.

Return type:

str | None

variables_missing_from_config(col_list)

Identify variables that are not defined in the settings.

Parameters:

col_list (Iterable[str]) – An iterable of variable names to check against the settings.

Returns:

Variable names that are not defined in the settings. Returns an empty list if all variables are found.

Return type:

list[str]
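The check can be pictured as a simple membership filter. The sketch below assumes an exact, case-sensitive comparison against a hypothetical set of known names; the real function may normalise names differently.

```python
from collections.abc import Iterable

# Hypothetical set of variable names defined in the settings.
KNOWN_VARIABLES = {"var_a", "var_b"}


def missing_from_config_sketch(col_list: Iterable[str]) -> list[str]:
    """Return the names that are not defined, preserving input order."""
    return [col for col in col_list if col not in KNOWN_VARIABLES]


missing = missing_from_config_sketch(["var_a", "extra_col", "var_b"])
nothing_missing = missing_from_config_sketch(["var_a"])
```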

nudb_use.metadata.nudb_config.get_variable_info module

Helpers for reading NUDB variable metadata from configuration.

get_toml_field(toml, field)

Return a field from a TOML object or None if it is missing.

Parameters:
  • toml (Mapping[str, Any]) – Parsed TOML object.

  • field (str) – Field name to retrieve.

Returns:

Field value when present, otherwise None.

Return type:

object | None
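For a mapping-like TOML object, the described behaviour matches a plain Mapping.get call. The sketch below assumes the parsed TOML is an ordinary dictionary; the function name and the sample fields are hypothetical.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Any


def get_field_sketch(toml: Mapping[str, Any], field: str) -> Any | None:
    """Return the field value, or None when the field is absent."""
    return toml.get(field)


# Hypothetical parsed TOML table for one variable.
parsed = {"dtype": "STRING", "length": [11]}

present = get_field_sketch(parsed, "dtype")
absent = get_field_sketch(parsed, "klass_codelist")
```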

get_var_metadata(variables=None)

Get a pandas DataFrame of the variable data from the config.

Parameters:

variables (list[str] | None) – Variables to return data on. If None, all variables are returned.

Returns:

The metadata for the requested variables.

Return type:

pd.DataFrame

nudb_use.metadata.nudb_config.map_get_dtypes module

Utilities for mapping NUDB variable types to concrete dtype strings.

get_dtype_from_dict(dtype, mapping, datetimes_as_string=False)

Resolve a dtype string through a mapping with optional datetime override.

Parameters:
  • dtype (str) – Logical dtype name from configuration.

  • mapping (dict[str, str] | dict[Literal['STRING', 'DATETIME', 'INTEGER', 'FLOAT', 'BOOLEAN'], Literal['datetime64[s]', 'string[pyarrow]', 'Int64', 'Float64', 'bool[pyarrow]']]) – Mapping of logical names to engine-specific dtype strings.

  • datetimes_as_string (bool) – When True, map datetime fields to string equivalents.

Returns:

Engine-specific dtype string.

Return type:

str

Raises:

ValueError – If dtype is not defined in mapping.
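A minimal sketch of this resolution step. The key and value names are taken from the type annotation above, but pairing them up by name (for example DATETIME with datetime64[s]) is an assumption, as is treating the datetime override as a switch to the STRING entry.

```python
# Assumed pairing of logical names to pandas dtype strings.
PANDAS_MAPPING = {
    "STRING": "string[pyarrow]",
    "DATETIME": "datetime64[s]",
    "INTEGER": "Int64",
    "FLOAT": "Float64",
    "BOOLEAN": "bool[pyarrow]",
}


def dtype_from_dict_sketch(
    dtype: str,
    mapping: dict[str, str],
    datetimes_as_string: bool = False,
) -> str:
    """Resolve a logical dtype, optionally treating datetimes as strings."""
    if datetimes_as_string and dtype == "DATETIME":
        dtype = "STRING"
    if dtype not in mapping:
        raise ValueError(f"dtype {dtype!r} is not defined in the mapping.")
    return mapping[dtype]


as_datetime = dtype_from_dict_sketch("DATETIME", PANDAS_MAPPING)
as_string = dtype_from_dict_sketch("DATETIME", PANDAS_MAPPING, datetimes_as_string=True)
```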

get_dtypes(vars_map, engine='pandas', datetimes_as_string=False)

Build a dtype mapping for a set of variables based on config metadata.

Parameters:
  • vars_map (list[str]) – Variable names to map, including historical names.

  • engine (str) – Mapping preset to use.

  • datetimes_as_string (bool) – When True, convert datetime variables to string dtypes.

Returns:

Mapping of requested variables to dtype strings. Variables not found in config map to None.

Return type:

dict[str, str | None]
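The per-variable mapping, including the None result for unknown variables, might look like the sketch below. The VARIABLE_DTYPES table, the variable names, and the reduced mapping are hypothetical, and historical-name resolution is left out for brevity.

```python
from __future__ import annotations

# Hypothetical logical dtypes per variable, as they might appear in the config.
VARIABLE_DTYPES = {"var_a": "STRING", "var_b": "DATETIME"}

# Assumed subset of the pandas preset.
MAPPING = {"STRING": "string[pyarrow]", "DATETIME": "datetime64[s]"}


def get_dtypes_sketch(
    vars_map: list[str],
    datetimes_as_string: bool = False,
) -> dict[str, str | None]:
    """Map each variable to a dtype string, or None if it is not in the config."""
    result: dict[str, str | None] = {}
    for name in vars_map:
        logical = VARIABLE_DTYPES.get(name)
        if logical is None:
            result[name] = None
            continue
        if datetimes_as_string and logical == "DATETIME":
            logical = "STRING"
        result[name] = MAPPING[logical]
    return result


dtypes = get_dtypes_sketch(["var_a", "var_b", "not_in_config"])
```

A dictionary of this shape, with the None entries filtered out, is the kind of argument DataFrame.astype accepts.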

map_dtype_datadoc(dtype, engine='pandas', datetimes_as_string=False)

Map a logical dtype using a named engine mapping.

Parameters:
  • dtype (str) – Logical dtype value from the NUDB config.

  • engine (str) – Name of the mapping preset to use.

  • datetimes_as_string (bool) – When True, map datetimes to string dtypes.

Returns:

Concrete dtype string for the target engine.

Return type:

str

Raises:

KeyError – If engine is not defined in DTYPE_MAPPINGS.
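The two failure modes (unknown engine versus unknown dtype) can be sketched like this. DTYPE_MAPPINGS is named in the documentation above, but its contents here are a hypothetical stand-in, so the sketch uses its own name.

```python
# Hypothetical stand-in for DTYPE_MAPPINGS: engine name -> dtype mapping.
DTYPE_MAPPINGS_SKETCH = {
    "pandas": {"STRING": "string[pyarrow]", "DATETIME": "datetime64[s]"},
}


def map_dtype_sketch(
    dtype: str,
    engine: str = "pandas",
    datetimes_as_string: bool = False,
) -> str:
    """Pick the engine preset first, then resolve the logical dtype."""
    mapping = DTYPE_MAPPINGS_SKETCH[engine]  # KeyError for an unknown engine
    if datetimes_as_string and dtype == "DATETIME":
        dtype = "STRING"
    if dtype not in mapping:
        raise ValueError(f"dtype {dtype!r} is not defined for engine {engine!r}.")
    return mapping[dtype]


mapped = map_dtype_sketch("STRING")
```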

nudb_use.metadata.nudb_config.set_options module

set_option(setting_name, value)

Set an option in the options part of the nudb_config package.

Parameters:
  • setting_name (str) – The name of the setting to set.

  • value (Any) – The value to assign to the setting.

Returns:

The updated config settings object.

Return type:

NudbConfig

nudb_use.metadata.nudb_config.variable_names module

Tools for sorting and renaming NUDB variables based on config metadata.

get_cols2drop(data, name=None)

Return column names to drop from a dataset based on settings_use.

Parameters:
  • data (DataFrame) – DataFrame to check.

  • name (str | None) – Name of the dataset to compare against settings_use.

Returns:

Columns present in the DataFrame but not defined in settings_use.

Return type:

pd.Index

get_cols2keep(data, name=None)

Get column names to keep in a dataset based on settings_use.

Parameters:
  • data (DataFrame) – DataFrame to check.

  • name (str | None) – Name of the dataset to compare against settings_use.

Returns:

Columns present in the dataset that are defined in settings_use.

Return type:

pd.Index
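get_cols2drop and get_cols2keep split a dataset's columns into two complementary groups. The sketch below shows that split on plain lists of column names rather than on a DataFrame and pd.Index. The configured column names and function names are hypothetical.

```python
# Hypothetical columns defined for a dataset in the config.
CONFIG_COLUMNS = {"var_a", "var_b", "var_c"}


def cols2keep_sketch(columns: list[str]) -> list[str]:
    """Columns present in the data that are defined in the config."""
    return [col for col in columns if col in CONFIG_COLUMNS]


def cols2drop_sketch(columns: list[str]) -> list[str]:
    """Columns present in the data that are not defined in the config."""
    return [col for col in columns if col not in CONFIG_COLUMNS]


data_columns = ["var_a", "temp_col", "var_c"]
keep = cols2keep_sketch(data_columns)
drop = cols2drop_sketch(data_columns)
```

With a DataFrame, the same idea would apply to list(df.columns).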

get_cols_in_config(name)

Retrieve column (variable) names from settings_use.

Parameters:

name (str | None) – Name of a dataset. If None, returns all variable names across datasets.

Returns:

List of column or variable names defined in settings_use.

Return type:

list[str]

Raises:

KeyError – If the provided dataset name is not defined in settings_use.

handle_dataset_specific_renames(df, dataset_name)

Apply dataset-specific rename overrides defined in configuration.

Parameters:
  • df (DataFrame) – DataFrame whose columns should be updated in place.

  • dataset_name (str) – Dataset key used to look up override rules.

Returns:

DataFrame with overrides applied if they exist in the config.

Return type:

pd.DataFrame

sort_cols_after_config_order(data)

Sort DataFrame columns according to the config-defined order.

Parameters:

data (DataFrame) – DataFrame whose columns should be reordered.

Returns:

DataFrame with columns reordered per config definition.

Return type:

pd.DataFrame
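Reordering by config order amounts to sorting column names by their position in the config. The sketch below works on a list of names and assumes that columns missing from the config are placed last in their original order. That placement, the names, and CONFIG_ORDER are assumptions.

```python
# Hypothetical variable order as defined in the config.
CONFIG_ORDER = ["var_a", "var_b", "var_c"]


def sort_by_config_order_sketch(columns: list[str]) -> list[str]:
    """Sort column names by config position; unknown names go last."""
    position = {name: index for index, name in enumerate(CONFIG_ORDER)}
    # sorted() is stable, so unknown columns keep their relative order.
    return sorted(columns, key=lambda col: position.get(col, len(CONFIG_ORDER)))


ordered = sort_by_config_order_sketch(["extra", "var_c", "var_a"])
```

With a DataFrame, the resulting list could be used to select columns, as in df[ordered].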

sort_cols_after_config_order_and_unit(data)

Sort columns by the order defined in the config and by unit.

Nested function that combines the functionality of the sort_cols_by_unit and sort_cols_after_config_order functions.

Parameters:

data (DataFrame) – DataFrame to sort.

Returns:

DataFrame with columns reordered by the column and unit order defined in the config.

Return type:

pd.DataFrame

sort_cols_by_unit(data)

Sort DataFrame columns based on the unit of each variable.

Parameters:

data (DataFrame) – Input DataFrame with columns representing variables to sort.

Returns:

DataFrame with columns reordered by their units’ sort order.

Return type:

pd.DataFrame

Raises:

ValueError – If no config is found for the sort unit.

update_colnames(data, dataset_name='', lowercase=True)

Rename columns in a DataFrame based on metadata mappings.

Parameters:
  • data (DataFrame) – Input DataFrame whose column names should be updated.

  • dataset_name (str) – Dataset identifier for applying dataset-specific overrides.

  • lowercase (bool) – Whether to lowercase column names before renaming.

Returns:

Copy of the DataFrame with columns renamed according to metadata.

Return type:

pd.DataFrame

Raises:

KeyError – If the renaming results in duplicate column names.
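The renaming and the duplicate check can be sketched on a list of column names. The RENAMES table and function name are hypothetical, and dataset-specific overrides are omitted.

```python
# Hypothetical mapping of historical names to current names.
RENAMES = {"old_a": "var_a"}


def update_colnames_sketch(columns: list[str], lowercase: bool = True) -> list[str]:
    """Rename columns and fail if two columns end up with the same name."""
    if lowercase:
        columns = [col.lower() for col in columns]
    renamed = [RENAMES.get(col, col) for col in columns]
    duplicates = sorted({col for col in renamed if renamed.count(col) > 1})
    if duplicates:
        raise KeyError(f"Renaming produced duplicate column names: {duplicates}")
    return renamed


new_names = update_colnames_sketch(["OLD_A", "var_b"])
```

A collision arises when the data contains both a historical name and its current name, for example "old_a" and "var_a" in this sketch.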