Reference¶
fagfunksjoner package¶
Subpackages¶
- fagfunksjoner.api package
- Submodules
- fagfunksjoner.api.statistikkregisteret module
Contact
FuturePublishingError
LangText
MultiplePublishings
Name
Owningsection
PublishingSpecifics
PublishingSpecifics.desk_flow
PublishingSpecifics.has_changed
PublishingSpecifics.is_cancelled
PublishingSpecifics.is_period
PublishingSpecifics.name
PublishingSpecifics.period_from
PublishingSpecifics.period_until
PublishingSpecifics.precision
PublishingSpecifics.publish_id
PublishingSpecifics.revision
PublishingSpecifics.statistic
PublishingSpecifics.status
PublishingSpecifics.time
PublishingSpecifics.time_changed
PublishingSpecifics.title
PublishingSpecifics.variant
SinglePublishing
SinglePublishing.annual_reporting
SinglePublishing.approved
SinglePublishing.changed
SinglePublishing.changes
SinglePublishing.contacts
SinglePublishing.continuation
SinglePublishing.created_date
SinglePublishing.default_lang
SinglePublishing.desk_flow
SinglePublishing.dir_flow
SinglePublishing.firstpublishing
SinglePublishing.name
SinglePublishing.old_subjectcodes
SinglePublishing.owner_code
SinglePublishing.owner_name
SinglePublishing.owningsection
SinglePublishing.publish_id
SinglePublishing.publishings
SinglePublishing.regional_levels
SinglePublishing.short_name
SinglePublishing.start_year
SinglePublishing.status
SinglePublishing.triggerwords
SinglePublishing.variants
StatisticPublishingShort
Variant
etree_to_dict()
find_latest_publishing()
find_publishings()
find_stat_shortcode()
get_contacts()
get_singles_publishings()
get_statistics_register()
handle_children()
kwargs_specifics()
parse_contact_single()
parse_contacts()
parse_data_single()
parse_eierseksjon_single()
parse_lang_text_single()
parse_name_single()
parse_single_stat_from_englishjson()
parse_triggerord_single()
parse_variant_single()
raise_on_missing_future_publish()
sections_publishings()
single_stat()
specific_publishing()
time_until_publishing()
- fagfunksjoner.api.valuta module
- Module contents
- fagfunksjoner.dapla package
- fagfunksjoner.data package
- Submodules
- fagfunksjoner.data.datadok_extract module
ArchiveData
CodeList
ContextVariable
Metadata
add_dollar_or_nondollar_path()
add_pii_paths()
bumpcheck_file_years_back()
codelist_to_df()
codelist_to_dict()
convert_dates()
convert_to_pathlib()
date_formats()
date_parser()
downcast_ints()
extract_codelist()
extract_context_variables()
extract_parameters()
get_path_combinations()
get_yr_char_ranges()
go_back_in_time()
handle_decimals()
import_archive_data()
look_for_filepath()
metadata_to_df()
open_path_datadok()
open_path_metapath_datadok()
replace_dollar_stamme()
test_url()
test_url_combos()
url_from_path()
- fagfunksjoner.data.dicts module
- fagfunksjoner.data.pandas_combinations module
- fagfunksjoner.data.pandas_dtypes module
- fagfunksjoner.data.pyarrow module
- fagfunksjoner.data.view_dataframe module
- Module contents
- fagfunksjoner.paths package
- fagfunksjoner.prodsone package
Submodules¶
fagfunksjoner.fagfunksjoner_logger module¶
- class ColoredFormatter(*args, colors=None, **kwargs)¶
Bases:
Formatter
Colored log formatter.
- Parameters:
args (Any)
colors (dict[str, str] | None)
kwargs (Any)
- format(record)¶
Format the specified record as text.
- Return type:
str
- Parameters:
record (LogRecord)
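As an illustrative sketch (not the package's actual implementation), a colored formatter along these lines can be built on logging.Formatter; the ANSI codes and level mapping below are assumptions:

```python
import logging

# Hypothetical ANSI color map; the real ColoredFormatter takes a
# `colors` dict mapping level names to escape codes.
DEFAULT_COLORS = {
    "WARNING": "\033[33m",  # yellow
    "ERROR": "\033[31m",    # red
}
RESET = "\033[0m"


class SimpleColoredFormatter(logging.Formatter):
    """Wrap the formatted record in an ANSI color for its level."""

    def __init__(self, *args, colors=None, **kwargs):
        super().__init__(*args, **kwargs)
        self.colors = colors or DEFAULT_COLORS

    def format(self, record):
        text = super().format(record)
        color = self.colors.get(record.levelname, "")
        return f"{color}{text}{RESET}" if color else text


formatter = SimpleColoredFormatter("%(levelname)s: %(message)s")
record = logging.LogRecord("demo", logging.ERROR, __file__, 1, "boom", None, None)
colored = formatter.format(record)
```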
- silence_logger(func, *args, **kwargs)¶
Silences INFO and WARNING logs for the duration of the function call.
- Return type:
Any
- Parameters:
func (Callable[[...], Any])
args (Any)
kwargs (Any)
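A minimal sketch of how such a silencer can be written (an illustration, not the package's implementation), using logging.disable to suppress WARNING and below for the duration of the call:

```python
import logging


def silence_below_error(func, *args, **kwargs):
    """Run func with INFO/WARNING records suppressed, then restore logging."""
    logging.disable(logging.WARNING)  # drop WARNING and everything below
    try:
        return func(*args, **kwargs)
    finally:
        logging.disable(logging.NOTSET)  # re-enable all levels


# Collect emitted messages so we can see what got through.
messages = []
handler = logging.Handler()
handler.emit = lambda record: messages.append(record.getMessage())
logger = logging.getLogger("demo_silence")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(handler)


def noisy():
    logger.info("progress")  # suppressed during the wrapped call
    logger.error("kept")     # ERROR still passes through
    return 42


result = silence_below_error(noisy)
```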
Module contents¶
Fagfunksjoner is a place for “loose, small functionality” produced at Statistics Norway in Python.
Often created by subject-matter staff (“fag”) rather than IT, these are typically small helper functions that many might find useful.
- class ProjectRoot¶
Bases:
object
Context manager for importing local modules “with” a single line.
As in:
with ProjectRoot(): from src.functions.local_functions import local_function
The class navigates to the project root on entry and back to the original location on exit.
- static load_toml(config_file)¶
Looks for a .toml file to load the contents from.
Looks in the current folder, at the specified path, and in the project root.
- Parameters:
config_file (str) – The path or filename of the config file to load.
- Returns:
The contents of the toml file.
- Return type:
dict[str, Any]
- all_combos_agg(df, groupcols, aggargs, fillna_dict=None, keep_empty=False, grand_total='')¶
Generate all aggregation levels for a set of columns in a dataframe.
- Parameters:
df (DataFrame) – Dataframe to aggregate.
groupcols (list[str]) – List of columns to group by.
aggargs (Callable[[Any], Any] | str | ufunc | Mapping[str, Callable[[Any], Any] | str | ufunc] | dict[str, list[str]]) – How to aggregate; passed to the agg function in pandas, look at its documentation.
fillna_dict (dict[str, Any] | None) – Fills “totals” in the groupcols by filling their NA values. Send a dict with column names as keys and the strings to put in the cells as values.
keep_empty (bool) – Keep groups without observations through the process. Removing them is the default behaviour of pandas.
grand_total (dict[str, str] | str) – Fill this value if you want a grand total in your aggregations. If you use a string, it is inserted in the fields of the groupcol columns. If you send a dict, like for the fillna_dict parameter, the cells of the grand-total row reflect the values in the dict.
- Returns:
A dataframe with all the group-by columns, all the aggregation columns combined with the aggregation functions, a column called aggregation_level which separates the different aggregation levels, and a column called aggregation_ways which counts the number of group columns used for the aggregation.
- Return type:
pd.DataFrame
- Known problems:
You should not use dataframes with multi-index columns, as they cause trouble.
Examples:
import pandas as pd
from fagfunksjoner.data.pandas_combinations import all_combos_agg

data = {
    'alder': [20, 60, 33, 33, 20],
    'kommune': ['0301', '3001', '0301', '5401', '0301'],
    'kjonn': ['1', '2', '1', '2', '2'],
    'inntekt': [1000000, 120000, 220000, 550000, 50000],
    'formue': [25000, 50000, 33000, 44000, 90000],
}
pers = pd.DataFrame(data)

agg1 = all_combos_agg(pers, groupcols=['kjonn'], keep_empty=True, aggargs={'inntekt': ['mean', 'sum']})
display(agg1)

agg2 = all_combos_agg(pers, groupcols=['kjonn', 'alder'], aggargs={'inntekt': ['mean', 'sum']})
display(agg2)

agg3 = all_combos_agg(pers, groupcols=['kjonn', 'alder'], grand_total='Grand total', aggargs={'inntekt': ['mean', 'sum']})
display(agg3)

agg4 = all_combos_agg(pers, groupcols=['kjonn', 'alder'],
                      fillna_dict={'kjonn': 'Total kjønn', 'alder': 'Total alder'},
                      aggargs={'inntekt': ['mean', 'sum'], 'formue': ['count', 'min', 'max']},
                      grand_total="Total")
display(agg4)

pers['antall'] = 1
groupcols = pers.columns[0:3].tolist()
func_dict = {'inntekt': ['mean', 'sum'], 'formue': ['sum', 'std', 'count']}
fillna_dict = {'kjonn': 'Total kjønn', 'alder': 'Total alder', 'kommune': 'Total kommune'}
agg5 = all_combos_agg(pers, groupcols=groupcols, aggargs=func_dict, fillna_dict=fillna_dict, grand_total=fillna_dict)
display(agg5)
- auto_dtype(df, cardinality_threshold=0, copy_df=True, show_memory=True)¶
Clean up a dataframe’s dtypes.
First lowercases all column names. Tries to decode byte strings to utf8. Runs pandas’ convert_dtypes(). Tries to convert object columns to string and strips empty spaces. Downcasts ints to smaller int types. If cardinality_threshold is set above 0, converts object and string columns to categoricals when the number of unique values in the column is below the threshold.
- Parameters:
df (DataFrame) – The dataframe to manipulate.
cardinality_threshold (int) – Columns with fewer unique values than this threshold are converted to categoricals. Defaults to 0, meaning no conversion to categoricals.
copy_df (bool) – The reverse of inplace: make a copy in memory. This may have a memory impact, but is safer. Defaults to True.
show_memory (bool) – Show the user how much memory was saved by the conversion; requires some processing. Defaults to True.
- Returns:
The cleaned-up dataframe with converted dtypes.
- Return type:
pd.DataFrame
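Two of the steps described above, int downcasting and low-cardinality categoricals, can be sketched like this (an illustration only; auto_dtype in fagfunksjoner also renames columns, decodes bytes, and reports memory savings):

```python
import pandas as pd


def shrink_dtypes(df, cardinality_threshold=0):
    """Downcast integer columns and categorize low-cardinality object columns."""
    df = df.copy()
    for col in df.columns:
        if pd.api.types.is_integer_dtype(df[col]):
            # Pick the smallest integer type that fits the values.
            df[col] = pd.to_numeric(df[col], downcast="integer")
        elif df[col].dtype == object and cardinality_threshold > 0:
            if df[col].nunique() < cardinality_threshold:
                df[col] = df[col].astype("category")
    return df


raw = pd.DataFrame({"aar": [2020, 2021, 2022], "fylke": ["03", "30", "03"]})
small = shrink_dtypes(raw, cardinality_threshold=10)
```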
- check_env(raise_err=True)¶
Check if you are on Dapla or in prodsone.
- Parameters:
raise_err (bool) – Set to False if you don’t want the code to raise an error on an unrecognized environment.
- Returns:
“DAPLA” if on Dapla, “PROD” if in prodsone, otherwise “UNKNOWN”.
- Return type:
str
- Raises:
OSError – If no environment indications match (Dapla or Prod), and raise_err is set to True.
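The detection logic can be imagined along these lines (a sketch; the environment-variable names checked here are assumptions, not necessarily what check_env actually inspects):

```python
def detect_env(environ, raise_err=True):
    """Classify the runtime environment from a mapping of env variables.

    The marker variables below are hypothetical illustrations.
    """
    if "DAPLA_REGION" in environ:        # assumed Dapla marker
        return "DAPLA"
    if "SAS_EXECUTABLEPATH" in environ:  # assumed prodsone marker
        return "PROD"
    if raise_err:
        raise OSError("Unrecognized environment")
    return "UNKNOWN"


env = detect_env({"DAPLA_REGION": "BIP"})
fallback = detect_env({}, raise_err=False)
```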
- get_latest_fileversions(glob_list_path)¶
Receives a list of filenames with multiple versions and returns the latest versions of the files.
Recommend using a glob operation to create the input list. See the docs for glob operations:
- GCS: https://gcsfs.readthedocs.io/en/latest/api.html#gcsfs.core.GCSFileSystem.glob
- Locally: https://docs.python.org/3/library/glob.html
- Parameters:
glob_list_path (list[str] | str) – List of strings or a single string that represents a filepath. Recommend that the list is created with a glob operation.
- Returns:
List of strings with unique filepaths and their latest versions.
- Return type:
list[str]
- Raises:
TypeError – If the parameter does not fit with type-narrowing to a list of strings.
Example:
import dapla as dp
fs = dp.FileClient.get_gcs_file_system()
all_files = fs.glob("gs://dir/statdata_v*.parquet")
latest_files = get_latest_fileversions(all_files)
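The core version-picking idea can be sketched like this (an illustration assuming the `_v<number>` naming convention shown above, not the library's actual code):

```python
import re


def latest_versions(paths):
    """Keep only the highest _v<N> variant of each file stem."""
    best = {}
    pattern = re.compile(r"^(?P<stem>.*)_v(?P<ver>\d+)(?P<suffix>\.\w+)$")
    for path in paths:
        match = pattern.match(path)
        if not match:
            continue  # no version marker; skip in this sketch
        key = (match["stem"], match["suffix"])
        version = int(match["ver"])
        if key not in best or version > best[key][0]:
            best[key] = (version, path)
    return [path for _, path in best.values()]


files = [
    "gs://dir/statdata_v1.parquet",
    "gs://dir/statdata_v3.parquet",
    "gs://dir/statdata_v2.parquet",
]
latest = latest_versions(files)
```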
- linux_shortcuts(insert_environ=False)¶
Manually load the “linux-forkortelser” (Linux path shortcuts) in as a dict.
Works if the function can find the file where they are shared.
- Parameters:
insert_environ (bool) – Set to True if you want the dict inserted into the environment variables (os.environ).
- Returns:
The “linux-forkortelser” as a dict.
- Return type:
dict[str, str]
- Raises:
ValueError – If the stamme_variabel file is wrongly formatted.
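As a purely hypothetical sketch of turning a shortcuts file into a dict and optionally exporting it (the KEY=value line format assumed here may not match the real stamme_variabel file):

```python
import os


def parse_shortcuts(text, insert_environ=False):
    """Parse KEY=value lines into a dict; optionally export to os.environ.

    The line format is an assumption for illustration purposes.
    """
    shortcuts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        if "=" not in line:
            raise ValueError(f"Wrongly formatted line: {line!r}")
        key, _, value = line.partition("=")
        shortcuts[key.strip()] = value.strip()
    if insert_environ:
        os.environ.update(shortcuts)
    return shortcuts


mapping = parse_shortcuts("STAMME01=/ssb/stamme01\n# comment\nSTAMME02=/ssb/stamme02")
```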
- next_version_path(filepath)¶
Generates a new file path with an incremented version number.
Constructs a filepath for a new version of a file, based on the latest existing version found in a specified folder. That is, it skips to one after the highest version it finds, incrementing the version number by one to ensure the new file path is unique.
- Parameters:
filepath (str) – The address of the file.
- Returns:
The new file path with an incremented version number and specified suffix.
- Return type:
str
Example:
next_version_path('gs://my-bucket/datasets/data_v1.parquet')
'gs://my-bucket/datasets/data_v2.parquet'
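The string manipulation alone can be sketched as follows (the real function also scans the folder for existing versions; this sketch only bumps the number in the name):

```python
import re


def bump_version(filepath):
    """Increment the _v<N> marker in a file name by one."""
    match = re.search(r"_v(\d+)(\.\w+)$", filepath)
    if match is None:
        raise ValueError(f"No version marker found in {filepath!r}")
    next_ver = int(match.group(1)) + 1
    return f"{filepath[:match.start()]}_v{next_ver}{match.group(2)}"


new_path = bump_version("gs://my-bucket/datasets/data_v1.parquet")
```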
- open_path_datadok(path, **read_fwf_params)¶
Get archive data based only on the path of the .dat or .txt file.
This function attempts to correct and test options, to try to track down the file and metadata mentioned.
- Parameters:
path (str | Path) – The path to the archive file in prodsonen to attempt to get metadata for and open.
read_fwf_params (Any) – Remaining parameters to pass to pd.read_fwf; dtype, widths, names and na_values are overwritten, so don’t use those.
- Returns:
An ArchiveData object containing the imported data, metadata, and code lists.
- Return type:
ArchiveData
- Raises:
ValueError – If no datadok-api endpoint is found for the given path.
- open_path_metapath_datadok(path, metapath, **read_fwf_params)¶
If open_path_datadok doesn’t work, specify both the path on Linux AND the path in Datadok.
- Parameters:
path (str) – Path to the archive file on Linux.
metapath (str) – Path as described in Datadok.
read_fwf_params (Any) – Remaining parameters to pass to pd.read_fwf; dtype, widths, names and na_values are overwritten, so don’t use those.
- Returns:
An ArchiveData object containing the imported data, metadata, and code lists.
- Return type:
ArchiveData
- saspy_df_from_path(path)¶
Use df_from_sasfile instead; this is the old (bad) name for the function.
- Parameters:
path (str) – The full path to the SAS file you want to open with SAS.
- Returns:
The raw content of the SAS file straight from saspy.
- Return type:
pandas.DataFrame
- saspy_session()¶
Get an initialized saspy.SASsession object.
Use the default config, getting your password if you’ve set one.
- Returns:
An initialized saspy-session
- Return type:
saspy.SASsession
- view_dataframe(dataframe, column, operator='==', unique_limit=100)¶
Display an interactive widget for filtering and viewing data in a DataFrame based on a selection of values in one column.
- Parameters:
dataframe (DataFrame) – The DataFrame containing the data to be filtered.
column (str) – The column in the DataFrame to filter on.
operator (str) – The comparison operator for filtering (may be altered in the widget). Options: ‘==’, ‘!=’, ‘>=’, ‘>’, ‘<’, ‘<=’. Default: ‘==’.
unique_limit (int) – The maximum number of unique values in the column for which the ‘==’ or ‘!=’ operators are allowed. Default: 100.
- Returns:
An interactive widget for filtering and viewing data based on the specified criteria. The ‘==’ and ‘!=’ operators use a dropdown list for multiple selection; the other (interval) operators use a slider.
- Return type:
widgets.interactive