nudb_use.variables package

Subpackages

nudb_use.variables.checks module

Validation utilities for ensuring variable schemas match expectations.

check_cols_against_klass_codelists(df, col_codelist=None)

Validate DataFrame values against KLASS codelists.

Return type:

None

Parameters:
  • df (DataFrame)

  • col_codelist (dict[str, list[str] | dict[str, str]] | None)
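
The kind of check described above can be sketched as follows. This is a minimal illustration, not the package's implementation: find_values_outside_codelist is a hypothetical helper, the column names and codes are invented, and it returns its findings, whereas the documented function returns None. It assumes that when a codelist is given as a dict, the keys are the valid codes.

```python
import pandas as pd


def find_values_outside_codelist(df, col_codelist):
    """Hypothetical helper: map each column to the values not in its codelist."""
    unexpected = {}
    for col, codes in col_codelist.items():
        if col not in df.columns:
            continue  # this sketch silently skips columns absent from the frame
        valid = set(codes)  # a list yields its items, a dict yields its keys
        observed = set(df[col].dropna().astype(str).unique())
        extra = observed - valid
        if extra:
            unexpected[col] = extra
    return unexpected


# Invented example data: "9" is not a valid code for the "sex" column.
df = pd.DataFrame(
    {
        "sex": ["1", "2", "9", None],
        "county": ["03", "11", "03", "46"],
    }
)
col_codelist = {
    "sex": ["1", "2"],
    "county": {"03": "Oslo", "11": "Rogaland", "46": "Vestland"},
}
result = find_values_outside_codelist(df, col_codelist)
```

Missing values are dropped before the comparison, so NA never counts as an invalid code in this sketch.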

check_column_presence(df, dataset_name=None, check_for=None, raise_errors=True)

Validate columns against config or a supplied list.

Return type:

list[Exception]

Parameters:
  • df (DataFrame)

  • dataset_name (str | None)

  • check_for (None | list[str])

  • raise_errors (bool)
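
The return type and the raise_errors flag suggest a collect-then-optionally-raise pattern. The sketch below shows that pattern for the supplied-list case only. collect_missing_column_errors is a hypothetical name, the exception types are assumptions, and the lookup against config by dataset_name is not covered.

```python
import pandas as pd


def collect_missing_column_errors(df, check_for, raise_errors=True):
    """Hypothetical helper: one exception per expected column that is absent."""
    errors = [
        KeyError(f"Missing column: {name}")
        for name in check_for
        if name not in df.columns
    ]
    if errors and raise_errors:
        # Assumption: report all problems in one exception, chained to the first.
        raise ValueError(f"{len(errors)} expected column(s) missing") from errors[0]
    return errors


# Invented column names, for illustration only.
df = pd.DataFrame({"person_id": ["a1"], "school_year": ["2020"]})
errors = collect_missing_column_errors(
    df, ["person_id", "grade"], raise_errors=False
)
```

With raise_errors=False the caller receives the list and can decide how to report it; with raise_errors=True the same list triggers an exception instead.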

identify_cols_not_in_keep_drop_in_paths(paths, cols_keep, cols_drop, raise_error_found=False)

Identify columns present in data files that are missing from keep/drop lists.

Return type:

set[str]

Parameters:
  • paths (list[Path])

  • cols_keep (list[str])

  • cols_drop (list[str])

  • raise_error_found (bool)
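
At its core this is a set difference: every column seen in the files, minus the union of the keep and drop lists. The sketch below shows that logic under stated assumptions. find_unclassified_columns is a hypothetical helper that takes already-read column names per file instead of paths, and the error type is an assumption.

```python
def find_unclassified_columns(
    columns_per_file, cols_keep, cols_drop, raise_error_found=False
):
    """Hypothetical helper: columns seen in any file but in neither list."""
    known = set(cols_keep) | set(cols_drop)
    seen = set()
    for columns in columns_per_file.values():
        seen.update(columns)
    unclassified = seen - known
    if unclassified and raise_error_found:
        raise ValueError(
            f"Columns in neither keep nor drop list: {sorted(unclassified)}"
        )
    return unclassified


# Invented file names and columns, for illustration only.
columns_per_file = {
    "2020.parquet": ["a", "b"],
    "2021.parquet": ["a", "b", "c"],
}
unclassified = find_unclassified_columns(
    columns_per_file, cols_keep=["a"], cols_drop=["b"]
)
```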

pyarrow_columns_from_metadata(path)

Read column names from a Parquet file via metadata only.

Return type:

list[str]

Parameters:
  • path (str | Path)

nudb_use.variables.cleanup module

Utilities for reorganizing and trimming NUDB datasets.

move_col_after_col(df, col_anchor, col_move_after)

Move a specified column in a DataFrame to immediately follow another specified column.

Parameters:
  • df (DataFrame) – Input pandas DataFrame.

  • col_anchor (str) – Name of the column after which the specified column will be moved.

  • col_move_after (str) – Name of the column to move.

Returns:

New DataFrame with the specified column moved to follow the anchor column.

Return type:

pd.DataFrame
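
A minimal sketch of this kind of reordering, assuming it amounts to rebuilding the column list and selecting with it. reorder_column_after is a hypothetical name, not the package's function, and the example frame is invented.

```python
import pandas as pd


def reorder_column_after(df, col_anchor, col_move_after):
    """Hypothetical helper: place col_move_after directly after col_anchor."""
    # Remove the moving column first so the anchor's index is computed
    # against the remaining columns.
    cols = [c for c in df.columns if c != col_move_after]
    cols.insert(cols.index(col_anchor) + 1, col_move_after)
    return df[cols]  # selecting with a list returns a new DataFrame


df = pd.DataFrame({"a": [1], "b": [2], "c": [3], "d": [4]})
moved = reorder_column_after(df, col_anchor="a", col_move_after="d")
```

The original frame keeps its column order, since the selection produces a new object.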

move_content_from_col_to(df, from_col, to_col)

Fill empty (NA) values in to_col with the corresponding values from from_col.

Parameters:
  • df (DataFrame) – Input pandas DataFrame.

  • from_col (str) – Name of the column to take values from.

  • to_col (str) – Name of the column whose empty values are filled.

Returns:

DataFrame with the empty values in to_col filled in.

Return type:

pd.DataFrame
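
The fill step can be sketched with pandas' Series.fillna, which accepts another Series and aligns on the index. fill_missing_from is a hypothetical name and the data is invented. The text above does not say whether the source column is cleared or dropped afterwards, so this sketch leaves it untouched.

```python
import pandas as pd


def fill_missing_from(df, from_col, to_col):
    """Hypothetical helper: fill NA in to_col with values from from_col."""
    out = df.copy()  # work on a copy so the caller's frame is not modified
    out[to_col] = out[to_col].fillna(out[from_col])
    return out


# Invented example: row 0 is filled, row 1 keeps its value,
# row 2 stays empty because both columns are NA there.
df = pd.DataFrame(
    {
        "old_code": ["A", "B", None],
        "new_code": [None, "X", None],
    }
)
filled = fill_missing_from(df, from_col="old_code", to_col="new_code")
```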