nudb_use.variables.specific_vars package

nudb_use.variables.specific_vars.kommuner module

correct_kommune_single_values(df, col_name='utd_skolekom')

Correct a kommune column in cases where a known incorrect value can map to only one correct value.

Parameters:

df (DataFrame) – The dataframe whose kommune column should be corrected.
col_name (str) – The name of the kommune column to correct.

Returns:

The dataframe with the corrected kommune column.

Return type:

DataFrame

Raises:

ValueError – If the column contains kommune codes that are not 4 digits long.
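
The one-to-one correction described above can be sketched without the library. This is a minimal illustration, not the nudb_use implementation: the mapping `SINGLE_VALUE_FIXES` and its "incorrect" codes are made up for the example, a plain Python list stands in for the dataframe column, and running the four-digit check after the mapping is applied is an assumption.

```python
import re

# Hypothetical corrections: each incorrect code has exactly one correct target.
# These pairs are placeholders for illustration, not the library's real mapping.
SINGLE_VALUE_FIXES = {"9999": "0301", "0000": "1103"}

FOUR_DIGITS = re.compile(r"\d{4}")


def correct_single_values(codes):
    """Return a new list with known-bad codes replaced; None marks a missing value."""
    corrected = [SINGLE_VALUE_FIXES.get(code, code) for code in codes]
    # Collect every non-missing value that is not exactly four digits.
    bad = sorted({c for c in corrected if c is not None and not FOUR_DIGITS.fullmatch(c)})
    if bad:
        raise ValueError(f"Kommune codes that are not 4 digits: {bad}")
    return corrected
```

Codes without an entry in the mapping pass through unchanged, so the step is safe to run on columns that are already clean.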

fix_kommune_codes(df, col_name='utd_skolekom', from_date='1960-01-01', to_date=None)

Run the kommune-fixing functions in the correct order: first correct to single values, then keep only valid codes.

Parameters:

df (DataFrame) – The dataframe whose kommune column should be corrected.
col_name (str) – The name of the kommune column to correct.
from_date (str) – The date from which kommune codes count as valid.
to_date (str | None) – The date until which kommune codes count as valid. If set to None, defaults to today's date.

Returns:

The modified dataframe with the corrected column.

Return type:

DataFrame

keep_only_valid_kommune_codes(komm_col, from_date='1960-01-01', to_date=None)

Filter a column of kommune codes down to those that have existed between from_date and to_date.

Parameters:

komm_col (Series) – A pandas series that we should modify to only contain valid codes.
from_date (str) – The date from which kommune codes count as valid.
to_date (str | None) – The date until which kommune codes count as valid. If set to None, defaults to today's date.

Returns:

The modified column that only contains valid kommune codes.

Return type:

Series
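
The date-window logic can be illustrated as follows. This is a sketch under stated assumptions rather than the library's code: the `VALIDITY` table and its dates are made-up placeholders (the source does not say where valid codes are looked up), a plain list stands in for the Series, and replacing invalid codes with None instead of dropping them is an assumption.

```python
from datetime import date

# Hypothetical validity periods: code -> (valid_from, valid_to or None if still valid).
VALIDITY = {
    "0301": (date(1960, 1, 1), None),
    "1601": (date(1964, 1, 1), date(2017, 12, 31)),
    "5001": (date(2018, 1, 1), None),
}


def codes_valid_between(from_date="1960-01-01", to_date=None):
    """Return the set of codes whose validity period overlaps [from_date, to_date]."""
    start = date.fromisoformat(from_date)
    # A missing to_date falls back to today's date, as in the documented default.
    end = date.fromisoformat(to_date) if to_date is not None else date.today()
    return {
        code
        for code, (valid_from, valid_to) in VALIDITY.items()
        if valid_from <= end and (valid_to is None or valid_to >= start)
    }


def keep_only_valid(codes, from_date="1960-01-01", to_date=None):
    """Replace codes outside the valid set with None (assumed behavior)."""
    valid = codes_valid_between(from_date, to_date)
    return [code if code in valid else None for code in codes]
```

Narrowing the window changes which codes survive: with the placeholder table, a window starting in 2018 excludes "1601" but keeps "5001".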

nudb_use.variables.specific_vars.orgnr module

cleanup_orgnr_bedrift_foretak(df, time_col_name='utd_skoleaar_start', extra_orgnr_cols_split_prio=None)

Clean up orgnr values into the columns orgnrbed and orgnr_foretak using datasets from BOF.

Parameters:

df (DataFrame) – The data we should fix.
time_col_name (str) – The name of the time column used to date the joins against BOF.
extra_orgnr_cols_split_prio (list[str] | None) – Names of any extra columns containing orgnr in your dataset that are not in the default list: orgnr, utd_orgnr, orgnrbed, bof_orgnrbed, orgnr_foretak.

Returns:

The modified dataframe.

Return type:

DataFrame

Raises:

TypeError – If the format or dtype of the time column cannot be determined.

nudb_use.variables.specific_vars.snr module

generate_uuid_for_snr_with_fnr_catalog(df, fnr_catalog_path, snr_col='snr', fnr_col='fnr')

Fill missing SNR values using a persisted FNR-to-UUID catalog.

Loads an existing catalog from fnr_catalog_path (if present), uses it to fill missing snr_col values based on fnr_col, then generates new UUIDs for any remaining missing SNRs via generate_uuid_for_snr_with_fnr_col. Newly created FNR/SNR pairs are appended to the catalog and written back to disk.

Parameters:

df (DataFrame) – Input DataFrame to update (modified in place).
fnr_catalog_path (str | Path) – Path to a parquet file holding FNR/SNR mappings.
snr_col (str) – Name of the SNR column to fill.
fnr_col (str) – Name of the FNR column used as the key.

Returns:

The same DataFrame instance with filled SNR values.

Return type:

DataFrame
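
The load, fill, extend, and write-back cycle described above might look roughly like this. It is a sketch, not the library's implementation: a JSON file replaces the parquet catalog so the example needs only the standard library, a list of dicts stands in for the DataFrame, the function name `fill_snr_with_catalog` is hypothetical, and rows without an FNR are assumed to get a one-off UUID that is not stored in the catalog.

```python
import json
import uuid
from pathlib import Path


def fill_snr_with_catalog(rows, catalog_path, snr_col="snr", fnr_col="fnr"):
    """Fill missing SNRs from a persisted FNR -> SNR catalog, then extend the catalog."""
    catalog_path = Path(catalog_path)
    # Load the existing catalog if the file is present, otherwise start empty.
    catalog = json.loads(catalog_path.read_text()) if catalog_path.exists() else {}

    for row in rows:
        if row.get(snr_col) is not None:
            continue  # existing SNRs are left untouched
        fnr = row.get(fnr_col)
        if fnr is None:
            # No key to look up or store, so the row gets a one-off UUID4.
            row[snr_col] = str(uuid.uuid4())
            continue
        if fnr not in catalog:
            # New FNR: generate a UUID4 and remember it for later runs.
            catalog[fnr] = str(uuid.uuid4())
        row[snr_col] = catalog[fnr]

    # Write the catalog, including any new pairs, back to disk.
    catalog_path.write_text(json.dumps(catalog))
    return rows
```

Because the catalog persists between calls, the same FNR receives the same generated SNR across separate runs, which is the point of keeping the mapping on disk.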

generate_uuid_for_snr_with_fnr_col(df, snr_col='snr', fnr_col='fnr', subset=None)

Fill missing SNR values using FNR-based UUIDs, then per-row UUIDs.

For rows where snr_col is missing and fnr_col is present, the function generates a UUID4 per unique fnr_col value and fills those SNRs. If any SNRs remain missing (typically due to missing FNRs), it assigns a unique UUID4 per remaining row.

Parameters:

df (DataFrame) – Input DataFrame to update (modified in place).
snr_col (str) – Name of the SNR column to fill.
fnr_col (str) – Name of the FNR column used as the grouping key.
subset (list[str] | None) – Names of the subsetting variables within which unique FNRs are found.

Returns:

The same DataFrame instance with filled SNR values.

Return type:

DataFrame
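
The two-step fill can be sketched in plain Python. This is an illustration of the described behavior, not the library's code: a list of dicts stands in for the DataFrame, the function name `fill_missing_snr` is hypothetical, and the `subset` parameter is not modeled.

```python
import uuid


def fill_missing_snr(rows, snr_col="snr", fnr_col="fnr"):
    """Fill missing SNRs in a list of dict rows, in place, and return the same list."""
    uuid_per_fnr = {}
    for row in rows:
        if row.get(snr_col) is not None:
            continue  # existing SNRs are left untouched
        fnr = row.get(fnr_col)
        if fnr is not None:
            # Step 1: one UUID4 per unique FNR, reused for every row with that FNR.
            if fnr not in uuid_per_fnr:
                uuid_per_fnr[fnr] = str(uuid.uuid4())
            row[snr_col] = uuid_per_fnr[fnr]
        else:
            # Step 2: no FNR to group on, so the row gets its own UUID4.
            row[snr_col] = str(uuid.uuid4())
    return rows
```

Rows that share an FNR end up with the same generated SNR, while rows without an FNR can never be linked to each other, since each receives a distinct UUID.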

update_snr_with_snrkat(df, update_fnr=False, create_snr_mrk=None, return_dupes=False, snr_col_name='snr', fnr_col_name='fnr')

Update snr and possibly fnr using snrkat.

Parameters:

df (DataFrame) – The pandas dataframe you want updated with personal idents.
update_fnr (bool) – Set to True to also update fnr, after which it is no longer considered “as it came in”. Warning: in most cases we want the original FNR as reported, so consider carefully before updating fnr.
create_snr_mrk (None | bool) – Set to True to create or re-derive snr_mrk; set to False if you don't want the function to re-derive an existing snr_mrk column.
return_dupes (bool) – Set to True if you want to inspect the duplicates that arise from the first operation.
snr_col_name (str) – The name of your snr column, if it should stay named something other than “snr”.
fnr_col_name (str) – The name of your fnr column, if it should stay named something other than “fnr”.

Returns:

The dataframe with a modified snr column and, optionally, an updated fnr.

Return type:

DataFrame