Reference

statbank package

statbank.client module

class StatbankClient(date=datetime.datetime(2024, 12, 7, 8, 16, 46, 507066, tzinfo=datetime.timezone(datetime.timedelta(seconds=3600))), shortuser='', cc='', bcc='', overwrite=True, approve=Approve.JIT, check_username_password=True)

Bases: StatbankAuth

This is the main interface towards the rest of the statbank-package.

An initialized client, an object of this class, contains data and parameters that are often shared among all transfers within a statistical production. Call methods on this client to:
  • transfer the data: .transfer()

  • only validate the data against a description: .validate()

  • get the transfer/data description (filbeskrivelse): .get_description()

  • set the publish date with a datepicker: .date_picker() + .set_publish_date()

  • get published data from the external or internal API of statbanken: .apidata_all() / .apidata()

Parameters:
  • date (str | dt.date | dt.datetime)

  • shortuser (str)

  • cc (str)

  • bcc (str)

  • overwrite (bool)

  • approve (int | str | Approve)

  • check_username_password (bool)

date

Date for publishing the transfer. Statbanken appears to allow publishing at most four months into the future (this limit is unconfirmed).

Type:

dt.datetime

shortuser

The abbreviation of your username at SSB. Three letters, like “cfc”. If not specified, we will try to get this from Dapla's environment variables.

Type:

str

cc

First person to be notified by email of transfer. Defaults to the same as “shortuser”

Type:

str

bcc

Second person to be notified by email of transfer. Defaults to the same as “cc”

Type:

str

overwrite

  • False = no overwrite

  • True = overwrite

Type:

bool

approve

  • 0 = MANUAL approval

  • 1 = AUTOMATIC approval at transfer-time (immediately)

  • 2 = JIT (Just In Time), approval right before publishing time

Type:

Approve | str | int
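Because approve accepts an enum member, a string or an integer, the input has to be normalized at some point. The following is a minimal sketch of such a normalization, not the package's own code: ApproveSketch and coerce_approve are hypothetical names, and only the three numeric codes are taken from the description above.

```python
from enum import IntEnum


class ApproveSketch(IntEnum):
    """Hypothetical stand-in that mirrors the three documented approval codes."""

    MANUAL = 0
    AUTOMATIC = 1
    JIT = 2


def coerce_approve(value):
    """Accept an enum member, an int or a str and return an ApproveSketch member."""
    if isinstance(value, ApproveSketch):
        return value
    if isinstance(value, int):
        return ApproveSketch(value)  # raises ValueError for unknown codes
    if isinstance(value, str):
        text = value.strip()
        if text.isdigit():
            return ApproveSketch(int(text))
        return ApproveSketch[text.upper()]  # raises KeyError for unknown names
    raise TypeError(f"Unsupported approve value: {value!r}")


print(coerce_approve(2).name)
print(coerce_approve("manual").name)
```

Whether the real Approve enum accepts member names as strings is an assumption of this sketch.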

log

Each “action” (method used) on the client is appended to the log. Useful for appending to your own logging after you are done, or for printing in a try-except block to see what the last actions were before an error was raised.

Type:

list[str]

static apicodelist(id_or_url='', codelist_name='')

Get one specific or all the codelists of a published statbank-table as a dict or nested dicts.

Parameters:
  • id_or_url (str) – The id of the STATBANK-table to get the total query for, or supply the total url, if the table is “internal”.

  • codelist_name (str) – The name of the specific codelist to get.

Returns:

The codelist of the table as a dict or a nested dict.

Return type:

dict[str, str] | dict[str, dict[str, str]]

static apidata(id_or_url='', payload=None, include_id=False)

Get the contents of a published statbank-table as a pandas DataFrame, specifying a query to limit the return.

Parameters:
  • id_or_url (str) – The id of the STATBANK-table to get the total query for, or supply the total url, if the table is “internal”.

  • payload (dict[str, str]|None) – a dict of the query to include with the request, can be copied from the statbank-webpage.

  • include_id (bool) – If you want to include “codes” in the dataframe, set this to True

Returns:

A pandas dataframe with the table-content

Return type:

pd.DataFrame

static apidata_all(id_or_url='', include_id=False)

Get ALL the contents of a published statbank-table as a pandas DataFrame.

Parameters:
  • id_or_url (str) – The id of the STATBANK-table to get the total query for, or supply the total url, if the table is “internal”.

  • include_id (bool) – If you want to include “codes” in the dataframe, set this to True

Returns:

A pandas dataframe with the table-content

Return type:

pd.DataFrame

static apidata_rotate(df, ind='year', val='value')

Rotate the dataframe so that time is used as the index.

Parameters:
  • df (pd.DataFrame) – dataframe (from <get_from_ssb> function)

  • ind (str) – string of column name denoting time

  • val (str) – string of column name denoting values

Returns:

pivoted dataframe

Return type:

pd.DataFrame

static apimetadata(id_or_url='')

Get the metadata of a published statbank-table as a dict.

Parameters:

id_or_url (str) – The id of the STATBANK-table to get the total query for, or supply the total url, if the table is “internal”.

Returns:

The metadata of the table as the json returned from the API-get-request.

Return type:

dict[str, Any]

date_picker()

Display a datepicker-widget.

Assign it to a variable that you pass into set_publish_date() after editing the date:

    date = client.date_picker()
    # Edit the date in the widget
    client.set_publish_date(date)

Returns:

A datepicker widget from ipywidgets, with its date set to what the client currently holds.

Return type:

widgets.DatePicker

get_description(tableid='00000')

Get the “uttrekksbeskrivelse” for the tableid.

It describes the shape of the data to be transferred, and holds metadata about the table itself in Statbanken's system, like ID, name and content of codelists.

Parameters:

tableid (str) – The tableid of the “hovedtabell” in statbanken, a 5 digit string.

Returns:

An instance of the class StatbankUttrekksBeskrivelse, which is comparable to the old “filbeskrivelse”.

Return type:

StatbankUttrekksBeskrivelse

static read_description_json(json_path_or_str)

Re-initializes a StatbankUttrekksBeskrivelse from a stored json file/string.

Checks if provided string exists on disk, if it does, tries to load it as json. Otherwise expects you to provide a json-string that works for json.loads. Inserts first layer in json as attributes under a blank StatbankUttrekksBeskrivelse-object.

Parameters:

json_path_or_str (str) – Either a path on local storage, or a loaded json-string

Returns:

An instance of the class StatbankUttrekksBeskrivelse, which is comparable to the old “filbeskrivelse”.

Return type:

StatbankUttrekksBeskrivelse
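The path-or-string behaviour described above can be sketched with the standard library alone. This is an illustration under assumptions, not the package's implementation: load_json_path_or_str, Blank and restore are hypothetical names, and the sample JSON keys are made up.

```python
import json
import os


def load_json_path_or_str(json_path_or_str):
    """Read JSON from disk if the string is an existing file, else parse the string."""
    if os.path.isfile(json_path_or_str):
        with open(json_path_or_str, encoding="utf-8") as handle:
            return json.load(handle)
    return json.loads(json_path_or_str)


class Blank:
    """Hypothetical empty object that receives the first JSON layer as attributes."""


def restore(json_path_or_str):
    """Insert the first layer of the JSON as attributes on a blank object."""
    obj = Blank()
    for key, value in load_json_path_or_str(json_path_or_str).items():
        setattr(obj, key, value)
    return obj


# Made-up sample content, only for illustration.
restored = restore('{"tableid": "00000", "tablename": "demo"}')
print(restored.tableid, restored.tablename)
```

Only the first layer becomes attributes; nested values stay plain dicts and lists, which matches the description of the method above.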

static read_transfer_json(json_path_or_str)

Checks if provided string exists on disk, if it does, tries to load it as json.

Otherwise expects you to provide a json-string that works for json.loads. Inserts first layer in json as attributes under a blank StatbankTransfer-object.

Parameters:

json_path_or_str (str) – Either a path on local storage, or a loaded json-string

Returns:

An instance of the class StatbankTransfer. The transferred data, and possibly some other attributes, are not restored.

Return type:

StatbankTransfer

set_publish_date(date)

Set the publishing date on the client.

Takes the widget from date_picker assigned to a variable, which is probably the intended use. If sending a string, use the format 2000-12-31. You can also send in a datetime. Hours, minutes and seconds are replaced with Statbanken's publish time: 08:00:00

Parameters:

date (datetime | str | widgets.DatePicker) – date-picker widget, a datetime, or a date-string formatted as 2000-12-31

Raises:

TypeError – If the date-parameter is of type other than datetime, string, or ipywidgets.DatePicker.

Return type:

None
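As a rough sketch of the normalization described for set_publish_date, the function below accepts a date-string or a datetime and forces the time to 08:00:00. The name normalize_publish_date is hypothetical, and the ipywidgets.DatePicker branch is left out so the example runs with the standard library only.

```python
import datetime as dt


def normalize_publish_date(date):
    """Return a datetime on the given day with the time set to 08:00:00."""
    if isinstance(date, str):
        parsed = dt.datetime.strptime(date, "%Y-%m-%d")
    elif isinstance(date, dt.datetime):
        parsed = date
    elif isinstance(date, dt.date):
        # Plain dates are accepted here as a convenience of this sketch.
        parsed = dt.datetime(date.year, date.month, date.day)
    else:
        raise TypeError(f"Unsupported date type: {type(date).__name__}")
    return parsed.replace(hour=8, minute=0, second=0, microsecond=0)


print(normalize_publish_date("2000-12-31"))  # 2000-12-31 08:00:00
```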

transfer(dfs, tableid='00000')

Transfers your data to Statbanken.

Make sure you’ve set the publish-date correctly before sending.

Parameters:
  • dfs (dict[str, pd.DataFrame]) – The data to transfer in a dictionary of deltabell-names as keys and pandas-dataframes as values.

  • tableid (str) – The tableid of the “hovedtabell” in statbanken, a 5 digit string.

Returns:

An instance of the class StatbankTransfer, which details the content of a successful transfer.

Return type:

StatbankTransfer

validate(dfs, tableid='00000', raise_errors=False)

Gets an “uttrekksbeskrivelse” and validates the data against this.

All validation happens locally, so no data is sent to Statbanken when using this method.

Parameters:
  • dfs (dict[str, pd.DataFrame]) – The data to validate in a dictionary of deltabell-names as keys and pandas-dataframes as values.

  • tableid (str) – The tableid of the “hovedtabell” in statbanken, a 5 digit string. Defaults to “00000”.

  • raise_errors (bool) – True/False based on if you want the method to raise its own errors or not. Defaults to False.

Returns:

A dictionary of the errors the validation wants to raise.

Return type:

dict[str, str]

statbank.apidata module

apicodelist(id_or_url='', codelist_name='')

Get one specific or all the codelists of a published statbank-table as a dict or nested dicts.

Parameters:
  • id_or_url (str) – The id of the STATBANK-table to get the total query for, or supply the total url, if the table is “internal”.

  • codelist_name (str) – The name of the specific codelist to get.

Returns:

The codelist of the table as a dict or a nested dict.

Return type:

dict[str, str] | dict[str, dict[str, str]]

Raises:

ValueError – If the specified codelist_name is not in the returned metadata.

apidata(id_or_url='', payload=None, include_id=False)

Get the contents of a published statbank-table as a pandas DataFrame, specifying a query to limit the return.

Parameters:
  • id_or_url (str) – The id of the STATBANK-table to get the total query for, or supply the total url, if the table is “internal”.

  • payload (QueryWholeType | None) – a dict in the shape of a QueryWhole, to include with the request, can be copied from the statbank-webpage.

  • include_id (bool) – If you want to include “codes” in the dataframe, set this to True

Returns:

The table-content

Return type:

pd.DataFrame

Raises:

ValueError – If the first parameter is not recognized as a statbank ID or a direct url.

apidata_all(id_or_url='', include_id=False)

Get ALL the contents of a published statbank-table as a pandas DataFrame.

Parameters:
  • id_or_url (str) – The id of the STATBANK-table to get the total query for, or supply the total url, if the table is “internal”.

  • include_id (bool) – If you want to include “codes” in the dataframe, set this to True

Returns:

Table-content

Return type:

pd.DataFrame

apidata_query_all(id_or_url='')

Builds a query for ALL THE DATA in a table based on a request for metadata on the table.

Parameters:

id_or_url (str) – The id of the STATBANK-table to get the total query for, or supply the total url, if the table is “internal”.

Returns:

The prepared query based on all the codes in the table.

Return type:

QueryWholeType
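To illustrate the idea of building a query for all data from table metadata, here is a minimal sketch. It assumes the metadata follows the common PxWeb layout (a "variables" list whose entries carry "code" and "values") and that the query uses "item" filters with a "json-stat2" response format. Those shapes, the function name build_query_all and the sample metadata are assumptions, and the real QueryWholeType may differ.

```python
def build_query_all(metadata):
    """Build a query that selects every code of every variable in the metadata."""
    return {
        "query": [
            {
                "code": variable["code"],
                "selection": {"filter": "item", "values": list(variable["values"])},
            }
            for variable in metadata["variables"]
        ],
        "response": {"format": "json-stat2"},
    }


# Made-up metadata in the assumed PxWeb-like shape.
sample_metadata = {
    "title": "Demo table",
    "variables": [
        {"code": "Region", "text": "region", "values": ["0", "03"]},
        {"code": "Tid", "text": "year", "values": ["2022", "2023"]},
    ],
}

query = build_query_all(sample_metadata)
print([part["code"] for part in query["query"]])  # ['Region', 'Tid']
```

A dict of this shape is what the payload parameter of apidata() is described as accepting, so a query built this way could be trimmed before use instead of being copied from the statbank-webpage.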

apidata_rotate(df, ind='year', val='value')

Rotate the dataframe so that years are used as the index.

Parameters:
  • df (pd.DataFrame) – dataframe (from <get_from_ssb> function)

  • ind (str) – string of column name denoting time

  • val (str) – string of column name denoting values

Returns:

pivoted dataframe

Return type:

pd.DataFrame

apimetadata(id_or_url='')

Get the metadata of a published statbank-table as a dict.

Parameters:

id_or_url (str) – The id of the STATBANK-table to get the total query for, or supply the total url, if the table is “internal”.

Returns:

The metadata of the table as the json returned from the API-get-request.

Return type:

dict[str, Any]

Raises:

ValueError – If the first parameter is not recognized as a statbank ID or a direct url.

statbank.transfer module

class StatbankTransfer(data, tableid='', shortuser='', date=None, cc='', bcc='', overwrite=True, approve=Approve.JIT, validation=True, delay=False, headers=None)

Bases: StatbankAuth

Class for talking with the “transfer-API”, which actually receives the data from the user and sends it to Statbank.

Parameters:
  • data (dict[str, pd.DataFrame])

  • tableid (str)

  • shortuser (str)

  • date (dt | str | None)

  • cc (str)

  • bcc (str)

  • overwrite (bool)

  • approve (int | str | Approve)

  • validation (bool)

  • delay (bool)

  • headers (dict[str, str] | None)

data

The data to transfer: a dict of DataFrames, with the name of “deltabell.dat” as keys. The number of DataFrames needs to match the number of “deltabeller” in the uttrekksbeskrivelse. The dict-shape can be retrieved and validated before transfer with the StatbankUttrekksBeskrivelse class.

Type:

dict[str, pd.DataFrame]

tableid

The numeric id of the table, matching the one found on the website. Should be a 5-length numeric-string. Alternatively it should be possible to send in the “hovedtabellnavn” instead of the tableid.

Type:

str

shortuser

The abbreviation of your username at SSB. Three letters, like “cfc”

Type:

str

date

Date for publishing the transfer. Shape should be “yyyy-mm-dd”, like “2022-01-01”. Statbanken appears to allow publishing at most four months into the future (this limit is unconfirmed).

Type:

str

cc

First person to be notified by email of transfer. Defaults to the same as “shortuser”

Type:

str

bcc

Second person to be notified by email of transfer. Defaults to the same as “cc”

Type:

str

overwrite
  • False = no overwrite

  • True = overwrite

Type:

bool

approve
  • 0 = MANUAL approval

  • 1 = AUTOMATIC approval at transfer-time (immediately)

  • 2 = JIT (Just In Time), approval right before publishing time

Type:

Approve | str | int

validation
  • True, if you want the python-validation code to run user-side.

  • False, if it's slow and unnecessary.

Type:

bool

boundary

String that defines the splitting of the body in the transfer-post-request. Kept here for uniform choice through the class.

Type:

str

urls

Urls for transfer, observing the result etc., built from environment variables.

Type:

dict[str, str]

headers

Might be deleted without warning. Temporarily holds the Authentication for the request.

Type:

dict[str, str]

params

This dict will be built into the post request. Keep it in this nice shape for later introspection.

Type:

dict[str, str]

body

The data parsed into the body-shape the Statbank-API expects in the transfer-post-request.

Type:

str

response

The resulting response from the transfer-request. Headers might be deleted without warning.

Type:

requests.Response

property delay: bool

Obfuscate the delay a bit from the user. We don't want a transfer to be sent again without the object being recreated.

to_json(path='')

Store a copy of the current state of the transfer-object as a json.

If path is provided, tries to write to it, otherwise will return a json-string for you to handle like you wish.

Parameters:

path (str) – if provided, will try to write a json to a local path.

Returns:

If path is provided, tries to write a json there and returns nothing.

If path is not provided, returns the json-string for you to handle as you wish.

Return type:

None | str

transfer(headers=None)

Transfers your data to Statbanken.

Make sure you’ve set the publish-date correctly before sending. Will only work if the transfer has not already been sent, meaning it was “delayed”.

Parameters:

headers (dict[str, str] | None) – Mostly for internal use by the package. Needs to be a finished compiled headers for a request including Authorization.

Raises:

ValueError – If the transfer is already transferred.

Return type:

None

statbank.uttrekk module

class StatbankUttrekksBeskrivelse(tableid, raise_errors=False, headers=None)

Bases: StatbankAuth, StatbankUttrekkValidators

Class for talking with the “uttrekksbeskrivelses-API”.

The API describes the shape of the data to be transferred, and holds metadata about the table itself in Statbanken's system, like ID, name of codelists etc.

Parameters:
  • tableid (str)

  • raise_errors (bool)

  • headers (dict[str, str] | None)

url

Main url for transfer

Type:

str

time_retrieved

Time of getting the Uttrekksbeskrivelse

Type:

str

tableid

Originally the ID of the main table to get the Uttrekksbeskrivelse for, but it is reset based on the info in the Uttrekksbeskrivelse, to compensate for the possibility of the user sending in the “tablename” as tableid.

Type:

str

tablename

The name of the main table in Statbanken, not numbers, like the ID is.

Type:

str

subtables

Names and descriptions of the individual “table-parts” that need to be sent in as different DataFrames.

Type:

dict

variables

Metadata about the columns in the different table-parts.

Type:

dict

codelists

Metadata about column-contents, like formatting on time, or possible values (“codes”).

Type:

dict

suppression

Details around extra columns which describe main column’s “prikking”, meaning their suppression-type.

Type:

dict

headers

The headers for the request, might be sent in from a StatbankTransfer-object.

Type:

dict

filbeskrivelse

The “raw” json returned from the API-get-request, loaded into a dict.

Type:

dict

get_totalcodes_dict()

Makes a dict from each codelist where a code for “totals” is included.

Keys being the name of the codelist, values being the code to put into categorical columns, that describes totals. This dict can be passed into the parameters “fillna_dict” and “grand_total” in the function “agg_all_combos” in the package ssb-fagfunksjoner.

Returns:

A dictionary with the codelist-names as keys, the total-codes as values.

Return type:

dict[str, str]

round_data(data, round_up=True)

Converts all decimal numbers to strings, with the correct number of decimals.

IMPORTANT: Rounds “real halves” (0.5) UP, instead of “to even numbers” like Python does by default. This is probably the behaviour statisticians are used to from Excel, SAS etc.

Parameters:
  • data (dict[str, pd.DataFrame]) – The data to round in a dictionary of deltabell-names as keys and pandas-dataframes as values.

  • round_up (bool) – Default behaviour is rounding halves up like Excel or SAS. Setting this to False will instead use Python's default “round half to even” / “Banker's rounding”

Returns:

A dictionary in the same shape as sent in, but with dataframes altered to correct for rounding.

Return type:

dict[str, pd.DataFrame]
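The difference between the two rounding modes can be shown on single numbers with the standard decimal module. This is only a sketch of the rounding rule, not of how round_data processes DataFrames. The helper name round_to_str is hypothetical.

```python
from decimal import ROUND_HALF_EVEN, ROUND_HALF_UP, Decimal


def round_to_str(value, decimals, round_up=True):
    """Round a number to a fixed number of decimals and return it as a string."""
    # ROUND_HALF_UP rounds exact halves away from zero, ROUND_HALF_EVEN to the even neighbour.
    mode = ROUND_HALF_UP if round_up else ROUND_HALF_EVEN
    quantum = Decimal(10) ** -decimals  # decimals=2 gives Decimal("0.01")
    # Going through str() avoids carrying binary float noise into the Decimal.
    return str(Decimal(str(value)).quantize(quantum, rounding=mode))


print(round_to_str(2.5, 0))                  # 3
print(round_to_str(2.5, 0, round_up=False))  # 2
print(round_to_str(0.125, 2))                # 0.13
```

Returning strings keeps trailing zeros, so a value like 1.0 rounded to two decimals stays "1.00" instead of collapsing back to a float.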

to_json(path='')

Store a copy of the current state of the uttrekk-object as a json.

If path is provided, tries to write to it, otherwise will return a json-string for you to handle like you wish.

Parameters:

path (str) – if provided, will try to write a json to a local path

Returns:

If path is provided, tries to write a json to a file and returns nothing.

If path is not provided, returns the json-string for you to handle as you wish.

Return type:

None | str
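The "write to path or return the string" pattern described for to_json can be sketched as follows. The function name to_json_sketch and the sample state dict are hypothetical, and how the real method serializes its attributes is not shown here.

```python
import json
from pathlib import Path


def to_json_sketch(state, path=""):
    """Write the state as JSON to path if given, otherwise return the JSON string."""
    text = json.dumps(state, ensure_ascii=False, indent=2)
    if path:
        Path(path).write_text(text, encoding="utf-8")
        return None
    return text


# Made-up state, only for illustration.
print(to_json_sketch({"tableid": "00000", "tablename": "demo"}))
```

The returned string, or the written file, is the kind of input that read_description_json() and read_transfer_json() are documented to accept.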

transferdata_template(dfs=None)

Get the shape the data should have to name the “deltabeller”.

If we didn't use a dictionary we would have to rely on the order of a list of “deltabeller”. Instead we chose to explicitly name the deltabeller in this package, and make you check this after creation.

Parameters:
  • dfs – If provided, the method will try to use the pandas dataframes sent in to populate the returned dict. Send in one dataframe, several, a list of dataframes or similar. ORDER IS IMPORTANT: make sure the result is what you expect.

Returns:

A dictionary with correct keys, but with placeholders where the dataframes should go if no DataFrames are passed. A dict with dataframes as values if a list of DataFrames is sent in, or dataframes are passed as individual parameters.

Return type:

dict[str, str] | dict[str, DataFrame]

validate(data, raise_errors=False)

Uses the contents of itself to validate the data against.

All validation happens locally, so no data is sent to Statbanken when using this method.

Parameters:
  • data (dict[str, pd.DataFrame]) – The data to validate in a dictionary of deltabell-names as keys and pandas-dataframes as values.

  • raise_errors (bool) – True/False based on if you want the method to raise its own errors or not.

Returns:

A dictionary of the errors the validation wants to raise.

Return type:

dict[str, ValueError]

Raises:

StatbankValidateError – if raise_errors is set to True and there are validation errors.
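The "collect errors in a dict, raise only on request" behaviour can be sketched like this. The check itself (comparing deltabell names) is an invented illustration, and ValidateErrorSketch and validate_keys are hypothetical names standing in for the package's own error class and validators.

```python
class ValidateErrorSketch(Exception):
    """Hypothetical stand-in for StatbankValidateError."""


def validate_keys(data, expected_names, raise_errors=False):
    """Collect one ValueError per mismatched deltabell name, keyed by a short label."""
    errors = {}
    for name in expected_names:
        if name not in data:
            errors[f"missing_{name}"] = ValueError(f"Missing deltabell: {name}")
    for name in data:
        if name not in expected_names:
            errors[f"unexpected_{name}"] = ValueError(f"Unexpected deltabell: {name}")
    if raise_errors and errors:
        raise ValidateErrorSketch(list(errors.values()))
    return errors


# Plain placeholders are used instead of DataFrames to keep the sketch dependency-free.
found = validate_keys({"a.dat": None}, ["a.dat", "b.dat"])
print(sorted(found))  # ['missing_b.dat']
```

With raise_errors left at False the caller can inspect every problem in one pass. Setting it to True turns the same findings into a single exception.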