nudb_use.metadata.nudb_config package

nudb_use.metadata.nudb_config.find_var_missing module

Lookup helpers for NUDB variable metadata defined in config.

find_var(var_name)

Retrieve configuration and KLASS metadata for a single variable.

Parameters: var_name (str) – Variable name (current or historical). Comparison is case-insensitive.
Return type: VariableMetadata | None

find_vars(var_names)

Look up multiple variables and return their configuration metadata.

Parameters: var_names (Iterable[str]) – Iterable of variable identifiers to resolve.
Returns: Mapping of requested names to their resolved metadata. Missing entries map to None.
Return type: dict[str, VariableMetadata | None]
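
The lookup behaviour described for find_var and find_vars can be pictured with a small stand-alone sketch. Everything in it is an assumption made for illustration: the `CONFIG` dictionary, its `renamed_from` field, the variable names, and the `lookup_var` / `lookup_vars` helpers are hypothetical and are not how nudb_use stores or resolves metadata internally.

```python
from typing import Iterable, Optional

# Hypothetical stand-in for the variable section of the config.
# The real functions return VariableMetadata objects; plain dicts are used here.
CONFIG = {
    "var_a": {"dtype": "STRING", "renamed_from": ["old_var_a"]},
    "var_b": {"dtype": "INTEGER", "renamed_from": []},
}


def lookup_var(var_name: str) -> Optional[dict]:
    """Resolve a current or historical name, ignoring case."""
    wanted = var_name.lower()
    for current_name, meta in CONFIG.items():
        old_names = [old.lower() for old in meta["renamed_from"]]
        if wanted == current_name.lower() or wanted in old_names:
            return meta
    return None


def lookup_vars(var_names: Iterable[str]) -> dict:
    """Map each requested name to its metadata, or None when unresolved."""
    return {name: lookup_var(name) for name in var_names}


result = lookup_vars(["VAR_A", "old_var_a", "not_a_variable"])
print(result["not_a_variable"])  # None
```

Note that the returned mapping is keyed by the names exactly as requested, so a historical name and its current name can both appear as keys pointing at the same metadata.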

get_list_of_columns_for_dataset(dataset_name)

Return the column names expected in a given dataset according to the config.

Parameters: dataset_name (str) – The name of the dataset as defined in the config.
Returns: A list of all the column names.
Return type: list[str]
Raises: KeyError – If dataset_name does not exist under datasets in the config settings.

look_up_dtype_length_for_dataset(dataset_name, display_markdown=True)

Build a string listing the dtype and length fields of each variable in a dataset.

Parameters:

dataset_name (str) – The name of the dataset as defined in the config.
display_markdown (bool) – Whether to display the resulting string as markdown (works best in notebooks).

Returns:

A string with one line per variable. If display_markdown is True, the string is displayed and None is returned.

Return type:

str | None

variables_missing_from_config(col_list)

Identify variables that are not defined in the settings.

Parameters: col_list (Iterable[str]) – An iterable of variable names to check against the settings.
Returns: Variable names that are not defined in the settings. Returns an empty list if all variables are found.
Return type: list[str]
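
As a rough illustration of the check described above, the following sketch filters a list of column names against a set of known variables. The `DEFINED_VARIABLES` set and the `missing_from_config` helper are hypothetical, and exact (case-sensitive) matching is an assumption; the real function may compare names differently.

```python
from typing import Iterable

# Hypothetical set of variable names defined in the settings.
DEFINED_VARIABLES = {"var_a", "var_b", "var_c"}


def missing_from_config(col_list: Iterable[str]) -> list[str]:
    """Return the names in col_list that the settings do not define, in input order."""
    return [col for col in col_list if col not in DEFINED_VARIABLES]


print(missing_from_config(["var_a", "extra_col"]))  # ['extra_col']
print(missing_from_config(["var_a", "var_b"]))      # []
```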

nudb_use.metadata.nudb_config.get_variable_info module

Helpers for reading NUDB variable metadata from configuration.

get_toml_field(toml, field)

Return a field from a TOML object or None if it is missing.

Parameters:

toml (Mapping[str, Any]) – Parsed TOML object.
field (str) – Field name to retrieve.

Returns:

Field value when present, otherwise None.

Return type:

object | None
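
A minimal sketch of the "field or None" behaviour, assuming the parsed TOML object behaves like an ordinary mapping. The helper name `toml_field_or_none` and the table contents are hypothetical; a plain dict stands in for a parsed TOML table.

```python
from typing import Any, Mapping, Optional


def toml_field_or_none(toml: Mapping[str, Any], field: str) -> Optional[Any]:
    """Return toml[field] when the key exists, otherwise None."""
    return toml.get(field)


# A dict standing in for one parsed TOML table (hypothetical content).
variable_table = {"dtype": "STRING", "length": [11]}

print(toml_field_or_none(variable_table, "dtype"))  # STRING
print(toml_field_or_none(variable_table, "unit"))   # None
```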

get_var_metadata(variables=None)

Get a pandas DataFrame of the variable metadata from the config.

Parameters: variables (list[str] | None) – Variables to return metadata for. If None, all variables are returned.
Returns: The variable metadata from the config.
Return type: pd.DataFrame

nudb_use.metadata.nudb_config.map_get_dtypes module

Utilities for mapping NUDB variable types to concrete dtype strings.

get_dtype_from_dict(dtype, mapping, datetimes_as_string=False)

Resolve a dtype string through a mapping with optional datetime override.

Parameters:

dtype (str) – Logical dtype name from configuration.
mapping (dict[str, str] | dict[Literal['STRING', 'DATETIME', 'INTEGER', 'FLOAT', 'BOOLEAN'], Literal['datetime64[s]', 'string[pyarrow]', 'Int64', 'Float64', 'bool[pyarrow]']]) – Mapping of logical names to engine-specific dtype strings.
datetimes_as_string (bool) – When True, map datetime fields to string equivalents.

Returns:

Engine-specific dtype string.

Return type:

str

Raises:

ValueError – If dtype is not defined in mapping.
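
The resolution step can be sketched as follows. The pairing of logical names to dtype strings in `PANDAS_MAPPING` is inferred from the type hints in the signature and should be treated as an assumption, as should the choice to implement the datetime override by reusing the mapping's STRING entry. The helper name `resolve_dtype` is hypothetical.

```python
# Pairing inferred from the Literal type hints; an assumption, not confirmed by the docs.
PANDAS_MAPPING = {
    "STRING": "string[pyarrow]",
    "DATETIME": "datetime64[s]",
    "INTEGER": "Int64",
    "FLOAT": "Float64",
    "BOOLEAN": "bool[pyarrow]",
}


def resolve_dtype(dtype: str, mapping: dict, datetimes_as_string: bool = False) -> str:
    """Look up a logical dtype, optionally treating datetimes as strings."""
    if dtype not in mapping:
        raise ValueError(f"dtype {dtype!r} is not defined in the mapping")
    # Assumed override: datetime fields fall back to the string dtype.
    if datetimes_as_string and dtype == "DATETIME":
        return mapping["STRING"]
    return mapping[dtype]


print(resolve_dtype("INTEGER", PANDAS_MAPPING))                             # Int64
print(resolve_dtype("DATETIME", PANDAS_MAPPING, datetimes_as_string=True))  # string[pyarrow]
```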

get_dtypes(vars_map, engine='pandas', datetimes_as_string=False)

Build a dtype mapping for a set of variables based on config metadata.

Parameters:

vars_map (list[str]) – Variable names to map, including historical names.
engine (str) – Mapping preset to use.
datetimes_as_string (bool) – When True, convert datetime variables to string dtypes.

Returns:

Mapping of requested variables to dtype strings. Variables not found in config map to None.

Return type:

dict[str, str | None]

map_dtype_datadoc(dtype, engine='pandas', datetimes_as_string=False)

Map a logical dtype using a named engine mapping.

Parameters:

dtype (str) – Logical dtype value from the NUDB config.
engine (str) – Name of the mapping preset to use.
datetimes_as_string (bool) – When True, map datetimes to string dtypes.

Returns:

Concrete dtype string for the target engine.

Return type:

str

Raises:

KeyError – If engine is not defined in DTYPE_MAPPINGS.
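
To show how an engine preset and a per-variable lookup might fit together, here is a hedged sketch. Only the name `DTYPE_MAPPINGS` and the `'pandas'` default come from the documentation above; the preset contents, the `VARIABLE_DTYPES` table, and the helpers `map_dtype` and `dtypes_for` are hypothetical stand-ins.

```python
from typing import Iterable, Optional

# Hypothetical preset contents keyed by engine name.
DTYPE_MAPPINGS = {
    "pandas": {
        "STRING": "string[pyarrow]",
        "DATETIME": "datetime64[s]",
        "INTEGER": "Int64",
        "FLOAT": "Float64",
        "BOOLEAN": "bool[pyarrow]",
    },
}

# Hypothetical logical dtypes per variable, as they might appear in the config.
VARIABLE_DTYPES = {"var_a": "STRING", "var_date": "DATETIME"}


def map_dtype(dtype: str, engine: str = "pandas", datetimes_as_string: bool = False) -> str:
    """Pick the preset for the engine, then resolve the logical dtype."""
    mapping = DTYPE_MAPPINGS[engine]  # raises KeyError for an unknown engine
    if datetimes_as_string and dtype == "DATETIME":
        dtype = "STRING"
    return mapping[dtype]


def dtypes_for(
    variables: Iterable[str], engine: str = "pandas", datetimes_as_string: bool = False
) -> dict[str, Optional[str]]:
    """Map each variable to a dtype string, or None when it is not in the config."""
    out: dict[str, Optional[str]] = {}
    for name in variables:
        logical = VARIABLE_DTYPES.get(name)
        out[name] = None if logical is None else map_dtype(logical, engine, datetimes_as_string)
    return out


print(dtypes_for(["var_a", "var_date", "unknown"]))
```

A mapping of this shape could be passed to pandas `DataFrame.astype` once the None entries have been filtered out.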

nudb_use.metadata.nudb_config.set_options module

set_option(setting_name, value)

Set an option in the options section of the nudb_config package.

Parameters:

setting_name (str) – The name of the setting to set.
value (Any) – The value to assign to the setting.

Returns:

The updated config settings object.

Return type:

NudbConfig

nudb_use.metadata.nudb_config.variable_names module

Tools for sorting and renaming NUDB variables based on config metadata.

get_cols2drop(data, name=None)

Return column names to drop from a dataset based on settings_use.

Parameters:

data (DataFrame) – DataFrame to check.
name (str | None) – Name of the dataset to compare against settings_use.

Returns:

Columns present in the DataFrame but not defined in settings_use.

Return type:

pd.Index

get_cols2keep(data, name=None)

Get column names to keep in a dataset based on settings_use.

Parameters:

data (DataFrame) – DataFrame to check.
name (str | None) – Name of the dataset to compare against settings_use.

Returns:

Columns present in the dataset that are defined in settings_use.

Return type:

pd.Index
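
The keep/drop split performed by get_cols2keep and get_cols2drop amounts to partitioning a DataFrame's columns by membership in the config. The sketch below uses plain lists of column names in place of a DataFrame and a pd.Index; `CONFIG_COLUMNS` and `split_columns` are hypothetical names introduced for illustration.

```python
# Hypothetical column names defined for one dataset in the config.
CONFIG_COLUMNS = ["var_a", "var_b", "var_c"]


def split_columns(columns: list[str]) -> tuple[list[str], list[str]]:
    """Split columns into (keep, drop) while preserving their original order."""
    keep = [col for col in columns if col in CONFIG_COLUMNS]
    drop = [col for col in columns if col not in CONFIG_COLUMNS]
    return keep, drop


keep, drop = split_columns(["var_b", "temp_flag", "var_a"])
print(keep)  # ['var_b', 'var_a']
print(drop)  # ['temp_flag']
```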

get_cols_in_config(name)

Retrieve column (variable) names from settings_use.

Parameters: name (str | None) – Name of a dataset. If None, returns all variable names across datasets.
Returns: List of column or variable names defined in settings_use.
Return type: list[str]
Raises: KeyError – If the provided dataset name is not defined in settings_use.
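
The two modes of this lookup (one dataset versus all datasets) can be sketched like this. The `DATASETS` table and the `cols_in_config` helper are hypothetical, and removing duplicates when `name` is None is an assumption about the real behaviour.

```python
from typing import Optional

# Hypothetical dataset definitions: dataset name -> column names.
DATASETS = {
    "dataset_x": ["var_a", "var_b"],
    "dataset_y": ["var_b", "var_c"],
}


def cols_in_config(name: Optional[str] = None) -> list[str]:
    """Return columns for one dataset, or all columns when name is None."""
    if name is None:
        seen: list[str] = []
        for cols in DATASETS.values():
            for col in cols:
                if col not in seen:  # assumed: each name is listed once
                    seen.append(col)
        return seen
    return list(DATASETS[name])  # raises KeyError for an unknown dataset


print(cols_in_config("dataset_x"))  # ['var_a', 'var_b']
print(cols_in_config())             # ['var_a', 'var_b', 'var_c']
```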

handle_dataset_specific_renames(df, dataset_name)

Apply dataset-specific rename overrides defined in configuration.

Parameters:

df (DataFrame) – DataFrame whose columns should be updated in place.
dataset_name (str) – Dataset key used to look up override rules.

Returns:

DataFrame with overrides applied if they exist in the config.

Return type:

pd.DataFrame

sort_cols_after_config_order(data)

Sort DataFrame columns according to the config-defined order.

Parameters: data (DataFrame) – DataFrame whose columns should be reordered.
Returns: DataFrame with columns reordered per config definition.
Return type: pd.DataFrame

sort_cols_after_config_order_and_unit(data)

Sort columns by the order defined in the config and by unit.

Combines the functionality of the sort_cols_by_unit and sort_cols_after_config_order functions.

Parameters: data (DataFrame) – DataFrame to sort.
Returns: DataFrame with columns reordered by the column and unit order defined in the config.
Return type: pd.DataFrame

sort_cols_by_unit(data)

Sort DataFrame columns based on the unit of each variable.

Parameters: data (DataFrame) – Input DataFrame with columns representing variables to sort.
Returns: DataFrame with columns reordered by their units’ sort order.
Return type: pd.DataFrame
Raises: ValueError – If no config is found for the sort unit.

update_colnames(data, dataset_name='', lowercase=True)

Rename columns in a DataFrame based on metadata mappings.

Parameters:

data (DataFrame) – Input DataFrame whose column names should be updated.
dataset_name (str) – Dataset identifier for applying dataset-specific overrides.
lowercase (bool) – Whether to lowercase column names before renaming.

Returns:

Copy of the DataFrame with columns renamed according to metadata.

Return type:

pd.DataFrame

Raises:

KeyError – If the renaming results in duplicate column names.
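
The duplicate check that leads to the KeyError can be illustrated with column names alone. This is a sketch under stated assumptions: the `RENAMES` table (old name to new name) and the `renamed_columns` helper are hypothetical, and the real function works on a DataFrame and may also apply dataset-specific overrides.

```python
from collections import Counter

# Hypothetical mapping from historical column names to current ones.
RENAMES = {"old_var_a": "var_a"}


def renamed_columns(columns: list[str], lowercase: bool = True) -> list[str]:
    """Rename columns and fail if two columns end up with the same name."""
    if lowercase:
        columns = [col.lower() for col in columns]
    new_names = [RENAMES.get(col, col) for col in columns]
    duplicates = sorted(name for name, count in Counter(new_names).items() if count > 1)
    if duplicates:
        raise KeyError(f"Renaming produced duplicate column names: {duplicates}")
    return new_names


print(renamed_columns(["OLD_VAR_A", "var_b"]))  # ['var_a', 'var_b']
```

A collision arises, for instance, when a dataset contains both a historical column and its current counterpart, since both would map to the same new name.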