ssb_konjunk package
ssb_konjunk.data_formating module
Functions for data formatting.
- bytte_koder(df, kode_dict, kolonnenavn)
Replace codes.
Replaces the codes in a column.
- Parameters:
df (DataFrame) – Pandas DataFrame to pass in.
kode_dict (dict[str, str]) – Dictionary mapping old codes to new codes.
kolonnenavn (str) – Name of the column whose codes are replaced.
- Return type:
DataFrame
- Returns:
DataFrame with the new codes.
ssb_konjunk.fame module
ssb_konjunk.prompts module
Functions for date prompts in Python.
The functions should be used to prompt for which period to read files from and which period to run a statistic for.
- bump_quarter(year, quarter)
Advance the period by one quarter.
E.g. year 2023 and quarter 4 as input returns year 2024 and quarter 1.
- Parameters:
year (int) – The year.
quarter (int) – The quarter.
- Returns:
The year and quarter with an “added” quarter.
- Return type:
tuple
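The year rollover described for bump_quarter can be sketched in plain Python. This is a minimal illustration and not the library's implementation; the name bump_quarter_sketch, the (year, quarter) return order and the error on an invalid quarter are assumptions.

```python
def bump_quarter_sketch(year: int, quarter: int) -> tuple[int, int]:
    """Return (year, quarter) advanced by one quarter (hypothetical helper)."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError(f"Invalid quarter: {quarter}")
    # Quarter 4 rolls over to quarter 1 of the following year.
    if quarter == 4:
        return year + 1, 1
    return year, quarter + 1


print(bump_quarter_sketch(2023, 4))
```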
- check_publishing_date(date)
Validate the publishing date.
- Parameters:
date (str) – the date to check for a valid format.
- Returns:
the corrected date.
- Return type:
str
- days_in_month(year, month)
Get the number of days in a month.
- Parameters:
year (int) – Year.
month (int) – Month.
- Returns:
List with days in month.
- Return type:
list
- extract_start_end_dates(file_name)
Extract the start and end dates from a file name.
- Parameters:
file_name (str) – String value with the name of the file.
- Returns:
Tuple with two datetime objects.
- Return type:
tuple
- get_previous_month(year, month)
Turn e.g. month 01 of year 2023 into month 12 of year 2022.
- Parameters:
year (str|int) – the current year, YYYY.
month (str|int) – the current month, MM.
- Returns:
the previous month with year.
- Return type:
list[int]
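A minimal sketch of the same idea, assuming the result is ordered as [year, month]; the documentation above does not state the order, and previous_month_sketch is a hypothetical name rather than the library function.

```python
def previous_month_sketch(year, month) -> list[int]:
    """Return [year, month] for the month before the given one (assumed order)."""
    # Accept both str and int input, as the documented function does.
    year, month = int(year), int(month)
    if not 1 <= month <= 12:
        raise ValueError(f"Invalid month: {month}")
    # January rolls back to December of the previous year.
    if month == 1:
        return [year - 1, 12]
    return [year, month - 1]


print(previous_month_sketch("2023", "01"))
```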
- input_month()
Input function for month.
- Returns:
month
- Return type:
int
- input_quarter()
Input function for quarter.
- Returns:
quarter
- Return type:
int
- input_term()
Input function for term.
- Returns:
term
- Return type:
int
- input_trimester()
Input function for trimester.
- Returns:
trimester
- Return type:
int
- input_week()
Input function for week.
- Returns:
week
- Return type:
int
- input_year()
Input function for year.
- Returns:
Year as int
- Return type:
int
- iterate_years_months(start_year, end_year, start_month, end_month)
Iterate over years and months.
Allows you to select start year, start month, end year and end month.
- Parameters:
start_year (int) – Int for start year.
start_month (int) – Int for start month.
end_year (int) – Int for end year.
end_month (int) – Int for end month.
- Yields:
Any – A tuple containing the year and month for each combination.
- Raises:
ValueError – If the start year is later than the end year.
ValueError – If a month is not a valid month number.
ValueError – If the start month is later than the end month when iterating within a single year.
- Return type:
Any
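The iteration and its three checks can be sketched as a generator. This is an illustration under assumptions, not the library's code: iterate_years_months_sketch is a hypothetical name, and it assumes every month from the start month of the start year through the end month of the end year is yielded.

```python
from collections.abc import Iterator


def iterate_years_months_sketch(
    start_year: int, end_year: int, start_month: int, end_month: int
) -> Iterator[tuple[int, int]]:
    """Yield (year, month) from the start period to the end period, inclusive."""
    if start_year > end_year:
        raise ValueError("Start year must not be later than end year.")
    if not (1 <= start_month <= 12 and 1 <= end_month <= 12):
        raise ValueError("Months must be between 1 and 12.")
    if start_year == end_year and start_month > end_month:
        raise ValueError("Within one year, start month must not be later than end month.")
    for year in range(start_year, end_year + 1):
        # Only the first and last year are cut short.
        first = start_month if year == start_year else 1
        last = end_month if year == end_year else 12
        for month in range(first, last + 1):
            yield year, month


print(list(iterate_years_months_sketch(2023, 2024, 11, 2)))
```

Because this is a generator, the checks run when iteration starts, not when the function is called.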
- months_in_quarter(quarter)
Return the three months in the quarter.
- Parameters:
quarter (int|str) – the relevant quarter.
- Returns:
a list with the months in the quarter.
- Return type:
list
- Raises:
ValueError – If invalid quarter.
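As a rough sketch of the mapping, assuming the months are returned as ints (the documentation only says "list"); months_in_quarter_sketch is a hypothetical stand-in for the library function.

```python
def months_in_quarter_sketch(quarter) -> list[int]:
    """Return the three month numbers of a quarter (hypothetical helper)."""
    quarter = int(quarter)  # accept both int and str input
    if quarter not in (1, 2, 3, 4):
        raise ValueError(f"Invalid quarter: {quarter}")
    # Quarter q ends in month 3 * q, so it starts two months earlier.
    last = 3 * quarter
    return [last - 2, last - 1, last]


print(months_in_quarter_sketch(1))
print(months_in_quarter_sketch("4"))
```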
- months_in_term(term)
Return the months, as ints, for a term given as an int.
- Parameters:
term (int) – the term.
- Returns:
the months in the term.
- Return type:
tuple
- publishing_date()
Set the publishing date in the format YYYY-MM-DD.
- Returns:
the date.
- Return type:
str
- quarter_for_month(month)
Find the corresponding quarter for a month.
- Parameters:
month (str|int) – Month to find the corresponding quarter for.
- Returns:
The corresponding quarter.
- Return type:
int
- Raises:
ValueError – If invalid month
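The month-to-quarter mapping reduces to integer division. A minimal sketch, with quarter_for_month_sketch as a hypothetical name (not the library function itself):

```python
def quarter_for_month_sketch(month) -> int:
    """Return the quarter (1-4) that a month (1-12) belongs to."""
    month = int(month)  # accept both int and str input
    if not 1 <= month <= 12:
        raise ValueError(f"Invalid month: {month}")
    # Months 1-3 map to 1, 4-6 to 2, 7-9 to 3 and 10-12 to 4.
    return (month - 1) // 3 + 1


print(quarter_for_month_sketch("10"))
```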
- set_publishing_date()
Set the date for publication of tables.
Used for loading to Statbank.
- Returns:
a date.
- Return type:
str
- validate_day(day)
Ensure the day has a leading zero if it is less than 10.
- Parameters:
day (int|str) – the number of the day.
- Returns:
the number of the day, with a leading zero if relevant.
- Return type:
str
- validate_month(month)
Ensure the month has a leading zero if it is before October.
- Parameters:
month (int|str) – the number of the month.
- Returns:
the number of the month, with a leading zero if relevant.
- Return type:
str
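The leading-zero behaviour of validate_day and validate_month can be illustrated with str.zfill. This is a sketch of the idea only; zero_pad_sketch is a hypothetical helper, and how the library handles out-of-range input is not covered here.

```python
def zero_pad_sketch(value) -> str:
    """Return a day or month number as a two-character string."""
    # int() normalises input such as 3, "3" or "03" before padding.
    return str(int(value)).zfill(2)


print(zero_pad_sketch(3))
print(zero_pad_sketch("11"))
```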
ssb_konjunk.rounding module
- round_half_up(df, column, digits='1.')
Round a pandas column half up.
The “normal” (half up) rounding should be used.
- Parameters:
df (DataFrame) – data frame containing the column whose values will be rounded.
column (str) – name of the column to round values in.
digits (str) – the number of digits after the ‘.’ sets the number of decimals to round to. Default: ‘1.’ (no decimals).
- Returns:
a column in a data frame where all values are rounded off
- Return type:
Series
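The digits default '1.' resembles a quantization pattern for Python's decimal module. Assuming the string is interpreted that way (the documentation above does not confirm it), half-up rounding of a single value could look like this; quantize_half_up is a hypothetical helper, not part of ssb_konjunk.

```python
from decimal import ROUND_HALF_UP, Decimal


def quantize_half_up(value: str, digits: str = "1.") -> Decimal:
    """Round a decimal string half up to the precision given by `digits`."""
    # Decimal("1.") has no decimals; Decimal("1.00") keeps two decimals.
    return Decimal(value).quantize(Decimal(digits), rounding=ROUND_HALF_UP)


print(quantize_half_up("2.5"))
print(quantize_half_up("2.675", "1.00"))
```

The value is passed as a string so that it is represented exactly before rounding.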
- round_half_up_float(n, decimals=0)
Round a float half up.
Function from https://realpython.com/python-rounding/.
- Parameters:
n (float) – the float to round off.
decimals (int) – the number of decimals to keep.
- Returns:
the rounded off number.
- Return type:
float|int
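Python's built-in round rounds ties to the nearest even number, which is why a separate half-up function is useful. The sketch below uses the common shift, floor and unshift approach; it is an illustration under that assumption and may differ in detail from the library's function.

```python
import math


def round_half_up_float_sketch(n: float, decimals: int = 0) -> float:
    """Round n half up by shifting the decimal point, flooring, and shifting back."""
    multiplier = 10**decimals
    # Adding 0.5 before flooring pushes ties upwards.
    return math.floor(n * multiplier + 0.5) / multiplier


print(round(2.5))                        # built-in: ties go to the even neighbour
print(round_half_up_float_sketch(2.5))   # half up
```

Values that are not exactly representable as binary floats can still round unexpectedly with this approach.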
ssb_konjunk.saving module
Functions for reading your files and saving them.
Follows the standard for versioning and naming.
- read_ssb_file(periode, frequency, bucket, kortnavn, file_name, datatilstand='', undermappe=None, filetype='parquet', version_number=None, fs=None, seperator=';', encoding='latin1')
Read a saved file stored in SSB format.
Gets the latest version saved in the specified datatilstand (klargjorte-data, statistikk, utdata), at the correct bucket path and with the specified name. If it is a year table, the filename is adjusted automatically.
- Parameters:
periode (tuple[int, ...]) – Up to six int arguments to create the timestamp from. E.g. (2022,2,2023,4) gives p2022-02_p2023-04 when frequency = ‘M’.
frequency (str) – monthly (M), daily (D), quarterly (Q), tertial (T), weekly (W).
bucket (str) – GCP bucket passed with a FileClient object, or path in prodsonen.
kortnavn (str) – Name of the statistic or data product; temp and oppdrag are also valid.
file_name (str) – Name of the file.
datatilstand (str) – Datatilstand following SSB standards, except when temp or oppdrag is the kortnavn.
undermappe (str|None) – Optional folder under ‘datatilstand’.
version_number (int|None) – option to get a version other than the newest (i.e. highest version number). Default: None.
filetype (str) – the filetype to read. Default: ‘parquet’.
fs (GCSFileSystem|None) – the filesystem; pass a GCS filesystem if on Dapla. Default: None.
seperator (str) – the separator to use if filetype is csv. Default: ‘;’.
encoding (str) – Encoding of the file. Default: latin1.
- Raises:
FileNotFoundError – If no files matching the file path and filetype are found.
- Returns:
file as a data frame.
- Return type:
pd.DataFrame
- write_ssb_file(df, periode, frequency, bucket, kortnavn, file_name, datatilstand='', undermappe=None, stable_version=True, filetype='parquet', fs=None, seperator=';', encoding='latin1')
Write and save a dataframe in SSB format.
- Parameters:
df (DataFrame) – The dataframe to save.
periode (tuple[int, ...]) – Up to six int arguments to create the timestamp from. E.g. (2022,2,2023,4) gives p2022-02_p2023-04 when frequency = ‘M’.
frequency (str) – monthly (M), daily (D), quarterly (Q), tertial (T), weekly (W).
bucket (str) – GCP bucket passed with a FileClient object, or path in prodsonen.
kortnavn (str) – Name of the statistic or data product; temp and oppdrag are also valid.
file_name (str) – Name of the file.
datatilstand (str) – Datatilstand following SSB standards, except when temp or oppdrag is the kortnavn.
undermappe (str|None) – Optional folder under ‘datatilstand’.
stable_version (bool) – Whether checks should be in place to guard against overwriting.
filetype (str) – the filetype to save as. Default: ‘parquet’.
fs (GCSFileSystem|None) – the filesystem; pass a GCS filesystem if on Dapla. Default: None.
seperator (str) – the separator to use if filetype is csv. Default: ‘;’.
encoding (str) – Encoding of the file. Default: latin1.
- Raises:
ValueError – if df has no rows.
- Return type:
None
ssb_konjunk.statbank_format module
- format_time_period(df, year, quarter='', col_name='periode', month='')
Add a column with the time period.
- Parameters:
df (DataFrame) – dataframe.
year (int) – the year.
quarter (int|str) – optional, default ‘’.
col_name (str) – optional, default ‘periode’. The name of the column for the time period.
month (int|str) – optional, default ‘’.
- Returns:
dataframe with a column with time period.
- Return type:
pd.DataFrame
- remove_suppressed_numbers(df, colname_value, colname_suppressed='prikka')
Remove values in a column if they are marked as suppressed (‘prikka’, code 04).
- Parameters:
df (DataFrame) – dataframe.
colname_value (str) – the name of the column with potential values to remove.
colname_suppressed (str) – the name of the column that contains the code ‘04’ if the row should be suppressed.
- Returns:
dataframe with removed values if suppressed.
- Return type:
pd.DataFrame
ssb_konjunk.timestamp module
Functions to create timestamps according to the SSB standard.
- check_periodic_year(year, cycle_year, period)
Check if a year is part of a periodic cycle.
An example of use: a functionality should be performed every third year, starting in year 2021, i.e. not in 2022 and 2023, but in 2024. This function then returns True when 2024 is passed as the year argument, 2021 (or 2015, 2018, 2024 and so on) is passed as the cycle year, and 3 is passed as the period (triennial period).
- Parameters:
year (int) – the year to check.
cycle_year (int) – a year in the cycle.
period (int) – the number of years in a period.
- Returns:
whether or not the year is part of the periodic cycle.
- Return type:
bool
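The cycle check described above amounts to a modulo test on the distance between the two years. A minimal sketch, with check_periodic_year_sketch as a hypothetical name rather than the library's implementation:

```python
def check_periodic_year_sketch(year: int, cycle_year: int, period: int) -> bool:
    """Return True if `year` falls on a cycle that contains `cycle_year`."""
    # Years on the same cycle differ by a whole number of periods.
    return (year - cycle_year) % period == 0


print(check_periodic_year_sketch(2024, 2021, 3))
print(check_periodic_year_sketch(2022, 2021, 3))
```

Python's modulo returns a non-negative result for a positive period, so a cycle year later than the checked year works too.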
- get_ssb_timestamp(*args, frequency='M')
Create a string in SSB timestamp format.
- Parameters:
args (int) – Up to six int arguments to create the timestamp from.
frequency (str) – Letter for the frequency of the data, Y for year etc.
- Returns:
Returns time stamp in ssb format.
- Return type:
string|None
- Raises:
ValueError – Raises error for wrong values in args.
Example
>>> get_ssb_timestamp(2024,8,1, frequency='D')
'p2024-08-01'
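To illustrate how such a timestamp is assembled, here is a sketch limited to monthly periods. It follows the 'pYYYY-MM' pattern of the example above and assumes that a start and an end period are joined with an underscore; monthly_timestamp_sketch is a hypothetical helper that does not reproduce the library's handling of other frequencies.

```python
def monthly_timestamp_sketch(*args: int) -> str:
    """Build a monthly period string from (year, month) or (year, month, year, month)."""
    if len(args) not in (2, 4):
        raise ValueError("Pass year and month, optionally followed by end year and end month.")
    # Each (year, month) pair becomes one 'pYYYY-MM' part.
    parts = [f"p{args[i]}-{args[i + 1]:02d}" for i in range(0, len(args), 2)]
    return "_".join(parts)


print(monthly_timestamp_sketch(2022, 2, 2023, 4))
```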
ssb_konjunk.xml_handling module
A collection of functions to make handling xml files easier, both in Dapla and in prodsonen.
- dump_element(element, indent=0)
Print xml in a pretty format.
- Parameters:
element (Element) – the ET.Element you want to print.
indent (int) – the level of indent you want.
- Return type:
None
- read_xml(xml_file, fs=None)
Get the xml root from disk.
- Parameters:
xml_file (str) – String value for the xml filepath.
fs (GCSFileSystem|None) – the filesystem.
- Returns:
Root of xml file.
- Return type:
ET.Element
- return_txt_xml(root, child)
Return the text value from a child element in an xml file.
- Parameters:
root (Element) – Root with all data stored in a branch-like structure.
child (str) – String value used to find the child element that contains a value.
- Returns:
Returns string value from child element.
- Return type:
str
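Looking up a child element's text can be sketched with the standard library's xml.etree.ElementTree. The helper name first_child_text, the sample document and the choice to take the first match at any depth are assumptions for illustration; the library's lookup may work differently.

```python
import xml.etree.ElementTree as ET


def first_child_text(root: ET.Element, child: str) -> str:
    """Return the text of the first element named `child` anywhere under root."""
    # Element.iter walks the whole tree in document order.
    for element in root.iter(child):
        return element.text or ""
    raise ValueError(f"No element named {child!r} found.")


# Hypothetical sample document, parsed from a string instead of a file.
root = ET.fromstring("<report><meta><period>2024-08</period></meta></report>")
print(first_child_text(root, "period"))
```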