ssb_konjunk package

ssb_konjunk.data_formating module

Functions for data formatting.

bytte_koder(df, kode_dict, kolonnenavn)

Replace codes.

Function to replace the codes in a column.

Parameters:
  • df (DataFrame) – The pandas DataFrame to pass in.

  • kode_dict (dict[str, str]) – Dictionary mapping old codes to new codes.

  • kolonnenavn (str) – Name of the column in which the codes are replaced.

Return type:

DataFrame

Returns:

DataFrame with the new codes.

ssb_konjunk.fame module

ssb_konjunk.prompts module

Functions for date prompts in Python.

The functions should be used to prompt for which period to read files from or which period to run a statistic for.

bump_quarter(year, quarter)

Advance the period by one quarter.

E.g. year 2023 and quarter 4 as input is returned as year 2024 and quarter 1.

Parameters:
  • year (int) – The year.

  • quarter (int) – The quarter.

Returns:

The year and quarter with an “added” quarter.

Return type:

tuple
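The roll-over described above can be sketched as follows. This is a minimal illustration rather than the package's implementation; the name `bump_quarter_sketch` and the (year, quarter) ordering of the returned tuple are assumptions.

```python
def bump_quarter_sketch(year: int, quarter: int) -> tuple[int, int]:
    """Hypothetical helper: advance (year, quarter) by one quarter."""
    if quarter == 4:
        # Quarter 4 rolls over into quarter 1 of the following year.
        return year + 1, 1
    return year, quarter + 1


print(bump_quarter_sketch(2023, 4))
print(bump_quarter_sketch(2023, 2))
```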

check_publishing_date(date)

Validate the publishing date.

Parameters:

date (str) – the date to check for a valid format.

Returns:

the validated and corrected date.

Return type:

str

days_in_month(year, month)

Function to get number of days in month.

Parameters:
  • year (int) – Year.

  • month (int) – Month.

Returns:

List with days in month.

Return type:

list
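One way to produce such a list with the standard library is sketched here. It assumes that the returned list holds the day numbers 1..n of the month; the name `days_in_month_sketch` is hypothetical.

```python
import calendar


def days_in_month_sketch(year: int, month: int) -> list[int]:
    """Hypothetical helper: day numbers of the given month."""
    # monthrange returns (weekday of the first day, number of days in the month).
    _, n_days = calendar.monthrange(year, month)
    return list(range(1, n_days + 1))


# February in a leap year has 29 days.
print(len(days_in_month_sketch(2024, 2)))
```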

extract_start_end_dates(file_name)

Function to extract start and end dates from file name.

Parameters:

file_name (str) – String value with name of file.

Returns:

Tuple with two datetime objects

Return type:

tuple

get_previous_month(year, month)

Turn e.g. month 01 year 2023 into month 12 and year 2022.

Parameters:
  • year (str | int) – the current year YYYY.

  • month (str | int) – the current month MM.

Returns:

the previous month with year.

Return type:

list[int]
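The year boundary is the only special case here. A minimal sketch, assuming the list is ordered as [year, month] (the documentation above does not state the order) and using the hypothetical name `get_previous_month_sketch`:

```python
def get_previous_month_sketch(year, month) -> list[int]:
    """Hypothetical helper: step one month back, crossing the year boundary."""
    # Inputs may be str or int, so normalise to int first.
    year, month = int(year), int(month)
    if month == 1:
        # January steps back to December of the previous year.
        return [year - 1, 12]
    return [year, month - 1]


print(get_previous_month_sketch("2023", "01"))
print(get_previous_month_sketch(2023, 7))
```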

input_month()

Input function for month.

Returns:

month

Return type:

int

input_quarter()

Input function for quarter.

Returns:

quarter

Return type:

int

input_term()

Input function for term.

Returns:

term

Return type:

int

input_trimester()

Input function for trimester.

Returns:

trimester

Return type:

int

input_week()

Input function for week.

Returns:

week

Return type:

int

input_year()

Input function for year.

Returns:

Year as int

Return type:

int

iterate_years_months(start_year, end_year, start_month, end_month)

Function to iterate over years and months.

Allows you to select start year, start month, end year and end month.

Parameters:
  • start_year (int) – Int for start year.

  • start_month (int) – Int for start month.

  • end_year (int) – Int for end year.

  • end_month (int) – Int for end month.

Yields:

Any – A tuple containing the year and month for each combination.

Raises:
  • ValueError – If start year is bigger than end year.

  • ValueError – If month is invalid number.

  • ValueError – If end month is bigger than start and only iterating on one year.

Return type:

Any
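A generator of this kind might look like the sketch below. It assumes the iteration is continuous from the start month of the start year to the end month of the end year, and it only covers the first two validation rules listed above. The name `iterate_years_months_sketch` is hypothetical.

```python
from collections.abc import Iterator


def iterate_years_months_sketch(
    start_year: int, end_year: int, start_month: int, end_month: int
) -> Iterator[tuple[int, int]]:
    """Hypothetical generator yielding (year, month) tuples."""
    if start_year > end_year:
        raise ValueError("Start year is bigger than end year.")
    if not (1 <= start_month <= 12 and 1 <= end_month <= 12):
        raise ValueError("Month must be between 1 and 12.")
    for year in range(start_year, end_year + 1):
        # Only the first and last year are cut short; years in between run 1..12.
        first = start_month if year == start_year else 1
        last = end_month if year == end_year else 12
        for month in range(first, last + 1):
            yield year, month


for year, month in iterate_years_months_sketch(2023, 2024, 11, 2):
    print(year, month)
```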

months_in_quarter(quarter)

Return the three months in the quarter.

Parameters:

quarter (int | str) – the relevant quarter.

Returns:

a list with the months in the quarter.

Return type:

list

Raises:

ValueError – If invalid quarter.
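The mapping from quarter to months is simple arithmetic. A minimal sketch, with the hypothetical name `months_in_quarter_sketch`:

```python
def months_in_quarter_sketch(quarter) -> list[int]:
    """Hypothetical helper: the three month numbers in a quarter."""
    quarter = int(quarter)  # accept both "2" and 2
    if quarter not in (1, 2, 3, 4):
        raise ValueError(f"Invalid quarter: {quarter}")
    first = 3 * (quarter - 1) + 1  # 1, 4, 7 or 10
    return [first, first + 1, first + 2]


print(months_in_quarter_sketch("2"))
```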

months_in_term(term)

Return the months, as ints, for a term given as an int.

Parameters:

term (int) – term

Returns:

months

Return type:

tuple

publishing_date()

Set the publishing date in the format YYYY-MM-DD.

Returns:

the date.

Return type:

str

quarter_for_month(month)

Find corresponding quarter for a month.

Parameters:

month (str | int) – Month to find corresponding quarter for.

Returns:

The corresponding quarter.

Return type:

int

Raises:

ValueError – If invalid month
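The inverse mapping, from month to quarter, can be sketched with integer division. The name `quarter_for_month_sketch` is hypothetical.

```python
def quarter_for_month_sketch(month) -> int:
    """Hypothetical helper: quarter (1-4) that a month (1-12) belongs to."""
    month = int(month)  # accept both "01" and 1
    if not 1 <= month <= 12:
        raise ValueError(f"Invalid month: {month}")
    # Months 1-3 map to 1, 4-6 to 2, 7-9 to 3 and 10-12 to 4.
    return (month - 1) // 3 + 1


print(quarter_for_month_sketch("01"))
print(quarter_for_month_sketch(11))
```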

set_publishing_date()

Set the date for publication of tables.

Used for loading to Statbank.

Returns:

a date.

Return type:

str

validate_day(day)

Ensure the day has a leading zero if it is less than 10.

Parameters:

day (int | str) – the number of the day

Returns:

the number of the day with leading zero if relevant

Return type:

str

validate_month(month)

Ensure the month has a leading zero if it is before October.

Parameters:

month (int | str) – the number of the month

Returns:

the number of the month with leading zero if relevant

Return type:

str
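The leading-zero rule used for days and months can be sketched with `str.zfill`. This is an illustration only; the name `pad_two_digits` is hypothetical and the package may format the value differently.

```python
def pad_two_digits(value) -> str:
    """Hypothetical helper: format a day or month number with two digits."""
    # int() normalises inputs such as "7" and "07" before padding.
    return str(int(value)).zfill(2)


print(pad_two_digits(3))
print(pad_two_digits("12"))
```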

ssb_konjunk.rounding module

round_half_up(df, column, digits='1.')

Round a pandas column half up.

The “normal” (half up) rounding should be used.

Parameters:
  • df (DataFrame) – the data frame containing the column whose values will be rounded off.

  • column (str) – name of the column to round off values in.

  • digits (str) – the number of digits after the ‘.’ gives the number of decimals to round to. Default: ‘1.’ (no decimals).

Returns:

a column in a data frame where all values are rounded off

Return type:

Series
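A string such as ‘1.’ or ‘1.00’ has the same shape as the exponent argument of `Decimal.quantize` in the standard library. Whether the package uses `decimal` internally is an assumption; the sketch below only illustrates how such a string can control half-up rounding for single values. The name `round_half_up_decimal` is hypothetical.

```python
from decimal import ROUND_HALF_UP, Decimal


def round_half_up_decimal(value: float, digits: str = "1.") -> Decimal:
    """Hypothetical helper: round one value half up to the precision of `digits`."""
    # str(value) avoids carrying binary floating-point noise into the Decimal.
    return Decimal(str(value)).quantize(Decimal(digits), rounding=ROUND_HALF_UP)


values = [0.5, 1.5, 2.5, 2.675]
print([round_half_up_decimal(v) for v in values])
print([round_half_up_decimal(v, "1.00") for v in values])
```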

round_half_up_float(n, decimals=0)

Round a float half up.

Function from https://realpython.com/python-rounding/.

Parameters:
  • n (float) – the float to round off.

  • decimals (int) – the number of decimals to keep.

Returns:

the rounded off number.

Return type:

float|int
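Python's built-in `round` rounds ties to the nearest even number, so `round(2.5)` gives 2. A common floor-based formulation of half-up rounding is sketched below; it is an assumption that the package follows this exact form, and the name `round_half_up_float_sketch` is hypothetical.

```python
import math


def round_half_up_float_sketch(n: float, decimals: int = 0) -> float:
    """Hypothetical helper: round half up by shifting, adding 0.5 and flooring."""
    multiplier = 10**decimals
    # Ties go towards positive infinity, so -2.5 becomes -2.0 with this approach.
    return math.floor(n * multiplier + 0.5) / multiplier


print(round(2.5), round_half_up_float_sketch(2.5))
print(round_half_up_float_sketch(1.25, 1))
```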

ssb_konjunk.saving module

Functions for reading your files and saving them.

Follows the standard for versioning and naming.

read_ssb_file(periode, frequency, bucket, kortnavn, file_name, datatilstand='', undermappe=None, filetype='parquet', version_number=None, fs=None, seperator=';', encoding='latin1')

Function to read a saved file, stored at SSB-format.

Gets the latest version saved in the specified datatilstand (klargjorte-data, statistikk, utdata), at the correct bucket path and with the specified name. If it is a year table, the filename is adjusted automatically.

Parameters:
  • periode (tuple[int, ...]) – Up to six ints to build the timestamp from. E.g. (2022,2,2023,4) gives p2022-02_p2023-04 when frequency = ‘M’.

  • frequency (str) – monthly (M), daily (D), quarterly (Q), tertial (T), weekly (W).

  • bucket (str) – GCP bucket passed with a FileClient object or path in prodsonen.

  • kortnavn (str) – Name of statistic or data product; temp and oppdrag are also valid.

  • file_name (str) – Name of the file.

  • datatilstand (str) – Datatilstand following SSB standards, except when temp or oppdrag is the kortnavn.

  • undermappe (str | None) – Optional folder under ‘datatilstand’.

  • version_number (int | None) – option to read a version other than the newest (i.e. highest version number). Default: None.

  • filetype (str) – the filetype to read. Default: ‘parquet’.

  • fs (GCSFileSystem | None) – the filesystem; pass a GCS filesystem if on Dapla. Default: None.

  • seperator (str) – the separator to use if filetype is csv. Default: ‘;’.

  • encoding (str) – Encoding of the file. Default: latin1.

Raises:

FileNotFoundError – If no files matching the file path and filetype are found.

Returns:

file as a data frame.

Return type:

pd.DataFrame

write_ssb_file(df, periode, frequency, bucket, kortnavn, file_name, datatilstand='', undermappe=None, stable_version=True, filetype='parquet', fs=None, seperator=';', encoding='latin1')

Function to write and save a dataframe at SSB-format.

Parameters:
  • df (DataFrame) – The dataframe to save.

  • periode (tuple[int, ...]) – Up to six ints to build the timestamp from. E.g. (2022,2,2023,4) gives p2022-02_p2023-04 when frequency = ‘M’.

  • frequency (str) – monthly (M), daily (D), quarterly (Q), tertial (T), weekly (W).

  • bucket (str) – GCP bucket passed with a FileClient object or path in prodsonen.

  • kortnavn (str) – Name of statistic or data product; temp and oppdrag are also valid.

  • file_name (str) – Name of the file.

  • datatilstand (str) – Datatilstand following SSB standards, except when temp or oppdrag is the kortnavn.

  • undermappe (str | None) – Optional folder under ‘datatilstand’.

  • stable_version (bool) – Whether checks should be in place to guard against overwriting.

  • filetype (str) – the filetype to save as. Default: ‘parquet’.

  • fs (GCSFileSystem | None) – the filesystem; pass a GCS filesystem if on Dapla. Default: None.

  • seperator (str) – the separator to use if filetype is csv. Default: ‘;’.

  • encoding (str) – Encoding of the file. Default: latin1.

Raises:

ValueError – if df has no rows.

Return type:

None

ssb_konjunk.statbank_format module

format_time_period(df, year, quarter='', col_name='periode', month='')

Add column with time period.

Parameters:
  • df (DataFrame) – dataframe.

  • year (int) – the year.

  • quarter (int | str) – optional, default ‘’.

  • col_name (str) – optional, default ‘periode’. The name of the column for the time period.

  • month (int | str) – optional, default ‘’.

Returns:

dataframe with a column with time period.

Return type:

pd.DataFrame

remove_suppressed_numbers(df, colname_value, colname_suppressed='prikka')

Remove values in a column if the row is marked as prikka (suppressed, code 04).

Parameters:
  • df (DataFrame) – dataframe.

  • colname_value (str) – the name of the column with potential values to remove.

  • colname_suppressed (str) – the name of the column that contains the code ‘04’ if the row should be suppressed.

Returns:

dataframe with removed values if suppressed.

Return type:

pd.DataFrame

ssb_konjunk.timestamp module

Functions to create timestamp according to SSB standard.

check_periodic_year(year, cycle_year, period)

Check if a year is a part of a periodic cycle.

Example of use: a task should be performed every third year, starting in 2021, i.e. not in 2022 or 2023, but in 2024. This function then returns True when 2024 is passed as the year argument, 2021 (or 2015, 2018, 2024 and so on) as the cycle year, and 3 as the period (a triennial period).

Parameters:
  • year (int) – the year to check.

  • cycle_year (int) – a year in the cycle.

  • period (int) – the number of years in a period.

Returns:

whether or not the year is part of the periodic cycle.

Return type:

bool
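The check amounts to a modulo test on the distance between the two years. A minimal sketch, with the hypothetical name `check_periodic_year_sketch`:

```python
def check_periodic_year_sketch(year: int, cycle_year: int, period: int) -> bool:
    """Hypothetical helper: True if `year` is a whole number of periods from `cycle_year`."""
    # Python's % returns a non-negative result for a positive period,
    # so cycle years after `year` work as well.
    return (year - cycle_year) % period == 0


for year in (2021, 2022, 2023, 2024):
    print(year, check_periodic_year_sketch(year, 2021, 3))
```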

get_ssb_timestamp(*args, frequency='M')

Function to create a string in SSB timestamp format.

Parameters:
  • args (int) – Up to six ints to build the timestamp from.

  • frequency (str) – Letter for the frequency of the data, e.g. Y for year.

Returns:

The timestamp in SSB format.

Return type:

string|None

Raises:

ValueError – Raises error for wrong values in args.

Example

>>> get_ssb_timestamp(2024,8,1, frequency='D')
'p2024-08-01'
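The string layout in the example can be reproduced with plain formatting. The sketch below handles only monthly and daily frequencies and assumes that a start and end period are joined with an underscore; it is an illustration, not the package's implementation, and the names `FIELDS_PER_PERIOD` and `ssb_timestamp_sketch` are hypothetical.

```python
# Assumed number of ints per period: (year, month) and (year, month, day).
FIELDS_PER_PERIOD = {"M": 2, "D": 3}


def ssb_timestamp_sketch(*args: int, frequency: str = "M") -> str:
    """Hypothetical helper: build 'pYYYY-MM[-DD]' for one or two periods."""
    size = FIELDS_PER_PERIOD[frequency]
    if len(args) not in (size, 2 * size):
        raise ValueError("Expected one period, or a start and an end period.")
    periods = []
    for i in range(0, len(args), size):
        year, *rest = args[i : i + size]
        # Year uses four digits; month and day are zero-padded to two.
        periods.append("p" + "-".join([f"{year:04d}", *(f"{v:02d}" for v in rest)]))
    return "_".join(periods)


print(ssb_timestamp_sketch(2024, 8, 1, frequency="D"))
print(ssb_timestamp_sketch(2022, 2, 2023, 4, frequency="M"))
```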

ssb_konjunk.xml_handling module

A collection of functions to make XML file handling easier in both Dapla and prodsonen.

dump_element(element, indent=0)

Function to print xml in pretty format.

Parameters:
  • element (Element) – ET.Element you want to print.

  • indent (int) – Level of indent you want.

Return type:

None
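A recursive walk over the element tree is one way to get this kind of output. The sketch below builds the lines first and then prints them; the exact layout that the package prints is not documented here, so the format and the name `element_lines` are assumptions.

```python
import xml.etree.ElementTree as ET


def element_lines(element: ET.Element, indent: int = 0) -> list[str]:
    """Hypothetical helper: one line per element, indented by nesting depth."""
    text = (element.text or "").strip()
    lines = [" " * indent + element.tag + (f": {text}" if text else "")]
    for child in element:
        # Children are indented two spaces further than their parent.
        lines.extend(element_lines(child, indent + 2))
    return lines


root = ET.fromstring("<a><b>1</b><c><d>2</d></c></a>")
print("\n".join(element_lines(root)))
```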

read_xml(xml_file, fs=None)

Function to get xml root from disk.

Parameters:
  • xml_file (str) – String value for the XML file path.

  • fs (GCSFileSystem | None) – filesystem

Returns:

Root of xml file.

Return type:

ET.Element
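Reading the root with the standard library might look like the sketch below. It assumes that a filesystem object, when given, exposes an fsspec-style `open` method; the name `read_xml_sketch` and the file `example.xml` are hypothetical.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def read_xml_sketch(xml_file: str, fs=None) -> ET.Element:
    """Hypothetical helper: parse an XML file and return its root element."""
    if fs is None:
        # Local path: let ElementTree open the file itself.
        return ET.parse(xml_file).getroot()
    # Assumed fsspec-style filesystem: open a binary handle and parse from it.
    with fs.open(xml_file, "rb") as handle:
        return ET.parse(handle).getroot()


# Write a small file to a temporary directory and read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.xml")
    with open(path, "w", encoding="utf-8") as f:
        f.write("<root><navn>test</navn></root>")
    root = read_xml_sketch(path)

print(root.tag)
```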

return_txt_xml(root, child)

Function to return text value from child element in xml file.

Parameters:
  • root (Element) – Root with all data stored in a branch like structure.

  • child (str) – String value to find child element which contains a value.

Returns:

Returns string value from child element.

Return type:

str
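Looking up the text of a named child can be sketched with `Element.iter`, which searches the whole tree below the root. Returning an empty string when nothing is found is an assumption, as is the name `child_text_sketch`.

```python
import xml.etree.ElementTree as ET


def child_text_sketch(root: ET.Element, child: str) -> str:
    """Hypothetical helper: text of the first element named `child` under `root`."""
    for element in root.iter(child):
        # Return on the first match; empty elements give an empty string.
        return element.text or ""
    return ""  # assumed behaviour when no element matches


doc = ET.fromstring("<skjema><enhet><orgnr>123456789</orgnr></enhet></skjema>")
print(child_text_sketch(doc, "orgnr"))
```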