nudb_use.metadata.nudb_config package
nudb_use.metadata.nudb_config.find_var_missing module
Lookup helpers for NUDB variable metadata defined in config.
- find_var(var_name)
Retrieve configuration and KLASS metadata for a single variable.
- Parameters:
var_name (str) – Variable name (current or historical). Comparison is case-insensitive.
- Return type:
VariableMetadata | None
- find_vars(var_names)
Look up multiple variables and return their configuration metadata.
- Parameters:
var_names (Iterable[str]) – Iterable of variable identifiers to resolve.
- Returns:
Mapping of requested names to their resolved metadata. Missing entries map to None.
- Return type:
dict[str, VariableMetadata | None]
- get_list_of_columns_for_dataset(dataset_name)
Get the column names expected in a given dataset according to the config.
- Parameters:
dataset_name (str) – The name of the dataset according to the config.
- Returns:
A list of all the column names.
- Return type:
list[str]
- Raises:
KeyError – If the dataset_name does not exist under datasets in the config settings.
- look_up_dtype_length_for_dataset(dataset_name, display_markdown=True)
Build a string of the dtype and length fields for the variables in a dataset, given its dataset_name.
- Parameters:
dataset_name (str) – The name of the dataset according to the config.
display_markdown (bool) – Whether to display the produced string as Markdown (works best in notebooks).
- Returns:
A string with one line per variable. If display_markdown is True, None is returned instead.
- Return type:
str | None
- variables_missing_from_config(col_list)
Identify variables that are not defined in the settings.
- Parameters:
col_list (Iterable[str]) – An iterable of variable names to check against the settings.
- Returns:
Variable names that are not defined in the settings. Returns an empty list if all variables are found.
- Return type:
list[str]
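As an illustration of the contract above, the following sketch checks a list of column names against a set of defined names. It is not the package's implementation: the set `_DEFINED`, the column names, and the helper `variables_missing_sketch` are hypothetical, and exact (case-sensitive) matching is an assumption that the reference text does not confirm.

```python
from __future__ import annotations

from collections.abc import Iterable

# Hypothetical set of variable names defined in the settings.
_DEFINED = {"person_id", "start_date", "school_code"}


def variables_missing_sketch(col_list: Iterable[str]) -> list[str]:
    """Return the names in col_list that are not defined, in input order."""
    return [col for col in col_list if col not in _DEFINED]


missing = variables_missing_sketch(["person_id", "extra_col", "start_date"])
all_found = variables_missing_sketch(["person_id"])
```

Because an empty list is returned when every name is found, the result can be used directly in a truthiness check before further processing.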
nudb_use.metadata.nudb_config.get_variable_info module
Helpers for reading NUDB variable metadata from configuration.
- get_toml_field(toml, field)
Return a field from a TOML object or None if it is missing.
- Parameters:
toml (Mapping[str, Any]) – Parsed TOML object.
field (str) – Field name to retrieve.
- Returns:
Field value when present, otherwise None.
- Return type:
object | None
- get_var_metadata(variables=None)
Get a pandas DataFrame of the variable metadata from the config.
- Parameters:
variables (list[str] | None) – Variables to return data on. If None, all variables are returned.
- Returns:
The information from the metadata.
- Return type:
pd.DataFrame
nudb_use.metadata.nudb_config.map_get_dtypes module
Utilities for mapping NUDB variable types to concrete dtype strings.
- get_dtype_from_dict(dtype, mapping, datetimes_as_string=False)
Resolve a dtype string through a mapping with optional datetime override.
- Parameters:
dtype (str) – Logical dtype name from configuration.
mapping (dict[str, str] | dict[Literal['STRING', 'DATETIME', 'INTEGER', 'FLOAT', 'BOOLEAN'], Literal['datetime64[s]', 'string[pyarrow]', 'Int64', 'Float64', 'bool[pyarrow]']]) – Mapping of logical names to engine-specific dtype strings.
datetimes_as_string (bool) – When True, map datetime fields to string equivalents.
- Returns:
Engine-specific dtype string.
- Return type:
str
- Raises:
ValueError – If dtype is not defined in mapping.
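The resolution step described above can be sketched as follows. This is an illustration under stated assumptions, not the package's code: the pairing of logical names with dtype strings in `PANDAS_MAPPING` is inferred from the type hint (which lists keys and values without saying which goes with which), and the helper name `get_dtype_sketch` is hypothetical.

```python
from __future__ import annotations

# Assumed pairing of the logical names and dtype strings listed in the
# type hint; the reference text does not state which key maps to which value.
PANDAS_MAPPING = {
    "STRING": "string[pyarrow]",
    "DATETIME": "datetime64[s]",
    "INTEGER": "Int64",
    "FLOAT": "Float64",
    "BOOLEAN": "bool[pyarrow]",
}


def get_dtype_sketch(
    dtype: str,
    mapping: dict[str, str],
    datetimes_as_string: bool = False,
) -> str:
    """Resolve a logical dtype name, optionally treating datetimes as strings."""
    if dtype not in mapping:
        raise ValueError(f"dtype {dtype!r} is not defined in the mapping")
    # Optional override: report datetime fields with the string dtype instead.
    if datetimes_as_string and dtype == "DATETIME":
        return mapping["STRING"]
    return mapping[dtype]


as_datetime = get_dtype_sketch("DATETIME", PANDAS_MAPPING)
as_string = get_dtype_sketch("DATETIME", PANDAS_MAPPING, datetimes_as_string=True)
```

Treating the override as "use whatever the mapping defines for STRING" keeps the sketch engine-agnostic; whether the real function does the same is not stated in this reference.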
- get_dtypes(vars_map, engine='pandas', datetimes_as_string=False)
Build a dtype mapping for a set of variables based on config metadata.
- Parameters:
vars_map (list[str]) – Variable names to map, including historical names.
engine (str) – Mapping preset to use.
datetimes_as_string (bool) – When True, convert datetime variables to string dtypes.
- Returns:
Mapping of requested variables to dtype strings. Variables not found in config map to None.
- Return type:
dict[str, str | None]
- map_dtype_datadoc(dtype, engine='pandas', datetimes_as_string=False)
Map a logical dtype using a named engine mapping.
- Parameters:
dtype (str) – Logical dtype value from the NUDB config.
engine (str) – Name of the mapping preset to use.
datetimes_as_string (bool) – When True, map datetimes to string dtypes.
- Returns:
Concrete dtype string for the target engine.
- Return type:
str
- Raises:
KeyError – If engine is not defined in DTYPE_MAPPINGS.
nudb_use.metadata.nudb_config.set_options module
- set_option(setting_name, value)
Set an option in the options part of the nudb_config package.
- Parameters:
setting_name (str) – The name of the setting to set.
value (Any) – The value to assign to the setting.
- Returns:
The updated config settings object.
- Return type:
NudbConfig
nudb_use.metadata.nudb_config.variable_names module
Tools for sorting and renaming NUDB variables based on config metadata.
- get_cols2drop(data, name=None)
Return column names to drop from a dataset based on settings_use.
- Parameters:
data (DataFrame) – DataFrame to check.
name (str | None) – Name of the dataset to compare against settings_use.
- Returns:
Columns present in the DataFrame but not defined in settings_use.
- Return type:
pd.Index
- get_cols2keep(data, name=None)
Get column names to keep in a dataset based on settings_use.
- Parameters:
data (DataFrame) – DataFrame to check.
name (str | None) – Name of the dataset to compare against settings_use.
- Returns:
Columns present in the dataset that are defined in settings_use.
- Return type:
pd.Index
- get_cols_in_config(name)
Retrieve column (variable) names from settings_use.
- Parameters:
name (str | None) – Name of a dataset. If None, returns all variable names across datasets.
- Returns:
List of column or variable names defined in settings_use.
- Return type:
list[str]
- Raises:
KeyError – If the provided dataset name is not defined in settings_use.
- handle_dataset_specific_renames(df, dataset_name)
Apply dataset-specific rename overrides defined in configuration.
- Parameters:
df (DataFrame) – DataFrame whose columns should be updated in place.
dataset_name (str) – Dataset key used to look up override rules.
- Returns:
DataFrame with overrides applied if they exist in the config.
- Return type:
pd.DataFrame
- sort_cols_after_config_order(data)
Sort DataFrame columns according to the config-defined order.
- Parameters:
data (DataFrame) – DataFrame whose columns should be reordered.
- Returns:
DataFrame with columns reordered per config definition.
- Return type:
pd.DataFrame
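The idea of ordering columns by their position in the config can be sketched on a plain list of column names. This is a simplified illustration rather than the package's implementation: `CONFIG_ORDER`, the column names, and the helper `sort_names_after_config_order` are hypothetical, and placing columns that are absent from the config last is an assumption.

```python
from __future__ import annotations

# Hypothetical order of variable names as they appear in the config.
CONFIG_ORDER = ["person_id", "start_date", "school_code"]


def sort_names_after_config_order(columns: list[str]) -> list[str]:
    """Order column names by their position in CONFIG_ORDER.

    Assumption: columns missing from the config are placed last and keep
    their relative input order; the real function may treat them differently.
    """
    rank = {name: position for position, name in enumerate(CONFIG_ORDER)}
    # sorted() is stable, so unknown columns (all ranked len(rank)) keep input order.
    return sorted(columns, key=lambda col: rank.get(col, len(rank)))


ordered = sort_names_after_config_order(
    ["extra", "school_code", "person_id", "other"]
)
```

With a pandas DataFrame, a list produced this way could be applied by selecting the columns in the new order, for example `data[ordered]`.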
- sort_cols_after_config_order_and_unit(data)
Sort columns by the order defined in the config and by unit.
Combines the functionality of the sort_cols_by_unit and sort_cols_after_config_order functions.
- Parameters:
data (DataFrame) – DataFrame to sort.
- Returns:
DataFrame with columns reordered by the column and unit order defined in the config.
- Return type:
pd.DataFrame
- sort_cols_by_unit(data)
Sort DataFrame columns based on the unit of each variable.
- Parameters:
data (DataFrame) – Input DataFrame with columns representing variables to sort.
- Returns:
DataFrame with columns reordered by their units’ sort order.
- Return type:
pd.DataFrame
- Raises:
ValueError – If no config is found for the sort unit.
- update_colnames(data, dataset_name='', lowercase=True)
Rename columns in a DataFrame based on metadata mappings.
- Parameters:
data (DataFrame) – Input DataFrame whose column names should be updated.
dataset_name (str) – Dataset identifier for applying dataset-specific overrides.
lowercase (bool) – Whether to lowercase column names before renaming.
- Returns:
Copy of the DataFrame with columns renamed according to metadata.
- Return type:
pd.DataFrame
- Raises:
KeyError – If the renaming results in duplicate column names.
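The renaming and duplicate check described above can be sketched on a plain list of column names. This is a minimal illustration, not the package's implementation: the rename table `RENAMES`, the column names, and the helper `update_names_sketch` are hypothetical, and dataset-specific overrides are left out.

```python
from __future__ import annotations

# Hypothetical rename table: historical name -> current name.
RENAMES = {"pid_old": "person_id", "startdato": "start_date"}


def update_names_sketch(columns: list[str], lowercase: bool = True) -> list[str]:
    """Rename column names and fail if two columns end up with the same name."""
    if lowercase:
        columns = [col.lower() for col in columns]
    # Names without an entry in the rename table are kept unchanged.
    renamed = [RENAMES.get(col, col) for col in columns]
    duplicates = sorted({name for name in renamed if renamed.count(name) > 1})
    if duplicates:
        raise KeyError(f"Renaming produced duplicate column names: {duplicates}")
    return renamed


new_names = update_names_sketch(["PID_OLD", "StartDato", "grade"])
```

A duplicate arises, for instance, when a dataset contains both a historical name and its current replacement; in this sketch that case raises KeyError, mirroring the documented behaviour.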