Reference

ssb_datafangst_person_fagfunksjoner package

ssb_datafangst_person_fagfunksjoner.functions module

example_function(number1, number2)

Compare two integers.

This is merely an example function and can be deleted. It is used to show and test generating documentation from code, type hinting, testing, and testing examples in the code.

Parameters:
  • number1 (int) – The first number.

  • number2 (int) – The second number, which will be compared to number1.

Returns:

A string describing which number is the greatest.

Return type:

str

Examples

Examples should be written in doctest format, and should illustrate how to use the function.

>>> example_function(1, 2)
1 is less than 2
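A minimal sketch of a function that matches the description and doctest above. Only the "is less than" wording is confirmed by this reference; the name example_function_sketch and the message for the other branch are assumptions made for illustration.

```python
def example_function_sketch(number1: int, number2: int) -> str:
    """Hypothetical stand-in for example_function, not the packaged code."""
    if number1 < number2:
        # Wording taken from the doctest in this reference.
        return f"{number1} is less than {number2}"
    # Assumed wording: the reference does not show this branch.
    return f"{number1} is greater than or equal to {number2}"


result = example_function_sketch(1, 2)
```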

file_concat_pd(InstrumentId, start_dato=None, slutt_dato=None)

Retrieves dialhistory data for a specified instrument for a given period.

Parameters:
  • InstrumentId (str) – The ID of the instrument to retrieve data for.

  • start_dato (datetime.date) – The start date of the range for filtering data. Example: datetime.date(2024, 10, 29)

  • slutt_dato (datetime.date) – The end date of the range for filtering data. Example: datetime.date(2024, 10, 29)

Returns:

A Pandas DataFrame containing the dialhistory data for the specified instrument.

Return type:

pd.DataFrame
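The start_dato and slutt_dato arguments are datetime.date values that default to None. This reference does not say whether the bounds are inclusive or how None is treated, so the following is only a sketch under the assumption that both bounds are inclusive and that None leaves that side of the range open. The helper in_period is hypothetical and not part of the package.

```python
import datetime
from typing import Optional


def in_period(
    day: datetime.date,
    start_dato: Optional[datetime.date] = None,
    slutt_dato: Optional[datetime.date] = None,
) -> bool:
    """Hypothetical helper: is `day` inside the (assumed inclusive) range?"""
    if start_dato is not None and day < start_dato:
        return False
    if slutt_dato is not None and day > slutt_dato:
        return False
    return True


days = [
    datetime.date(2024, 10, 28),
    datetime.date(2024, 10, 29),
    datetime.date(2024, 10, 30),
]
# Only a start date is given, so the end of the range stays open.
selected = [d for d in days if in_period(d, start_dato=datetime.date(2024, 10, 29))]
```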

file_concat_pl(InstrumentId, start_dato=None, slutt_dato=None)

Retrieves dialhistory data for a specified instrument for a given period.

Parameters:
  • InstrumentId (str) – The ID of the instrument to retrieve data for.

  • start_dato (datetime.date) – The start date of the range for filtering data. Example: datetime.date(2024, 10, 29)

  • slutt_dato (datetime.date) – The end date of the range for filtering data. Example: datetime.date(2024, 10, 29)

Returns:

A Polars DataFrame containing the dialhistory data for the specified instrument.

Return type:

pl.DataFrame

fill_all_para_pl(table_df)

Prepares table_df for data analysis. Input: a Polars DataFrame. Output: the prepared Polars DataFrame.

The function:

  • fills PageIndex downward

  • fills FieldName downward

  • creates a new variable with VariableName

  • creates the variable diff_time, which represents the time spent on each observation/action for an IO

  • fills LayoutSetName by session id

  • creates a new column with the bolk name using the function make_bolk()

Return type:

DataFrame

Parameters:

table_df (DataFrame)

fill_para_pd(table_df)

Prepares table_df for data analysis. Input: a Pandas DataFrame. Output: the prepared Pandas DataFrame.

The function:

  • fills PageIndex downward

  • fills FieldName downward

  • creates a new variable with VariableName

  • creates the variable diff_time, which represents the time spent on each observation/action for an IO

  • fills LayoutSetName by session id

  • creates a new column with the bolk name using the function make_bolk()

Return type:

DataFrame

Parameters:

table_df (DataFrame)
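The preparation steps listed for fill_para_pd can be sketched with standard pandas calls. This is an illustration, not the packaged implementation: the column names SessionId and TimeStamp, the sample rows, and the choice to compute diff_time as the gap to the previous row within a session are all assumptions. The VariableName and make_bolk() steps are left out because this reference does not describe their logic.

```python
import pandas as pd


def prepare_sketch(table_df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for some of the steps in fill_para_pd."""
    out = table_df.copy()
    # Fill PageIndex and FieldName downward (forward fill).
    out[["PageIndex", "FieldName"]] = out[["PageIndex", "FieldName"]].ffill()
    # Assumed definition: time elapsed since the previous row in the session.
    out["diff_time"] = out.groupby("SessionId")["TimeStamp"].diff()
    # Spread the known LayoutSetName to every row of the same session.
    out["LayoutSetName"] = out.groupby("SessionId")["LayoutSetName"].transform(
        lambda s: s.ffill().bfill()
    )
    return out


raw = pd.DataFrame(
    {
        "SessionId": ["s1", "s1", "s1", "s2", "s2"],
        "TimeStamp": pd.to_datetime(
            [
                "2024-10-29 10:00:00",
                "2024-10-29 10:00:05",
                "2024-10-29 10:00:20",
                "2024-10-29 11:00:00",
                "2024-10-29 11:00:30",
            ]
        ),
        "PageIndex": [1, None, 2, 1, None],
        "FieldName": ["skjema.a.x", None, "skjema.a.y", "skjema.b.z", None],
        "LayoutSetName": ["web", None, None, None, "cati"],
    }
)
prepared = prepare_sketch(raw)
```

The first row of each session has no previous row, so its diff_time is missing (NaT) in this sketch.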

fill_para_pl(table_df)

Prepares table_df for data analysis. Input: a Polars DataFrame. Output: the prepared Polars DataFrame.

The function:

  • fills PageIndex downward

  • fills FieldName downward

  • creates a new variable with VariableName

  • creates the variable diff_time, which represents the time spent on each observation/action for an IO

  • fills LayoutSetName by session id

  • creates a new column with the bolk name using the function make_bolk()

Return type:

DataFrame

Parameters:

table_df (DataFrame)

get_union_schema(files)

Creates a union of all schemas for the given list of Parquet files.

Parameters:

files (list[str]) – A list of file paths for the Parquet files.

Returns:

A PyArrow schema representing the union of all schemas.

Return type:

Schema

get_union_schema_para(files)

Creates a union of all schemas for the given list of Parquet files.

Parameters:

files (list[str]) – A list of file paths for the Parquet files.

Returns:

A PyArrow schema representing the union of all schemas.

Return type:

Schema

hent_status_pd(instrument_id, start_dato=None, slutt_dato=None)

Retrieves status data from GCS for a specified instrument within a given date range.

Parameters:
  • instrument_id (str) – The ID of the instrument to retrieve data for.

  • start_dato (datetime.date) – The start date of the range for filtering data. Example: datetime.date(2024, 10, 29)

  • slutt_dato (datetime.date) – The end date of the range for filtering data. Example: datetime.date(2024, 10, 29)

Returns:

A Pandas DataFrame containing the status information for the specified instrument within the defined date range.

Return type:

pd.DataFrame

hent_status_pl(instrument_id, start_dato=None, slutt_dato=None)

Retrieves status data from GCS for a specified instrument within a given date range.

Parameters:
  • instrument_id (str) – The ID of the instrument to retrieve data for.

  • start_dato (datetime.date) – The start date of the range for filtering data. Example: datetime.date(2024, 10, 29)

  • slutt_dato (datetime.date) – The end date of the range for filtering data. Example: datetime.date(2024, 10, 29)

Returns:

A Polars DataFrame containing the status information for the specified instrument within the defined date range.

Return type:

pl.DataFrame

hent_utvalg_pd(instrument_id)

Retrieves utvalg data from GCS for a specified instrument.

Parameters:

instrument_id (str) – The ID of the instrument to retrieve data for.

Returns:

A Pandas DataFrame containing the utvalg information for the specified instrument.

Return type:

pd.DataFrame

hent_utvalg_pl(instrument_id)

Retrieves utvalg data from GCS for a specified instrument.

Parameters:

instrument_id (str) – The ID of the instrument to retrieve data for.

Returns:

A Polars DataFrame containing the utvalg information for the specified instrument.

Return type:

pl.DataFrame

make_bolk(row)

A function that can be used with map or apply. It takes a string, FieldName, and returns the block (bolk) name. The function also handles the case where an IO has a repeated block nested inside another block.

Example: skjema.bolk2[1].field and skjema.bolk2[2].field become bolk1.bolk2.

Example: skjema.bolk2.bolk3.bolk4 becomes bolk2.bolk3

Return type:

str

Parameters:

row (str)
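A rough sketch of one possible reading of the make_bolk examples: drop repetition indices such as [1], then drop the first dotted segment (the form name) and the last segment (the field). The name make_bolk_sketch and this rule are assumptions; the packaged make_bolk may apply different logic. The indexed input used below, skjema.bolk1.bolk2[1].field, is also an assumed input chosen because it yields bolk1.bolk2 under this rule.

```python
import re

# Matches a repetition index such as "[1]" or "[12]".
_INDEX = re.compile(r"\[\d+\]")


def make_bolk_sketch(field_name: str) -> str:
    """Hypothetical stand-in for make_bolk, based on the documented examples."""
    parts = _INDEX.sub("", field_name).split(".")
    # Keep everything between the form name and the final field name.
    return ".".join(parts[1:-1])


names = [
    "skjema.bolk1.bolk2[1].field",
    "skjema.bolk1.bolk2[2].field",
    "skjema.bolk2.bolk3.bolk4",
]
bolks = list(map(make_bolk_sketch, names))
```

Because the function takes a single string, it can be applied element by element, for example with map over a list as shown or with apply on a DataFrame column.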

para_concat_pd(InstrumentId, start_dato=None, slutt_dato=None)

Retrieves paradata for a specified instrument for a given period.

Parameters:
  • InstrumentId (str) – The ID of the instrument to retrieve data for.

  • start_dato (datetime.date) – The start date of the range for filtering data. Example: datetime.date(2024, 10, 29)

  • slutt_dato (datetime.date) – The end date of the range for filtering data. Example: datetime.date(2024, 10, 29)

Returns:

A Pandas DataFrame containing the paradata for the specified instrument.

Return type:

pd.DataFrame

para_concat_pl(InstrumentId, dager=None, start_dato=None, slutt_dato=None)

Retrieves paradata for a specified instrument for a given period.

Parameters:
  • InstrumentId (str) – The ID of the instrument to retrieve data for.

  • dager (int) – The number of days (dager) back in time to retrieve data for.

  • start_dato (datetime.date) – The start date of the range for filtering data. Example: datetime.date(2024, 10, 29)

  • slutt_dato (datetime.date) – The end date of the range for filtering data. Example: datetime.date(2024, 10, 29)

Returns:

A Polars DataFrame containing the paradata for the specified instrument.

Return type:

pl.DataFrame

question_sorting(x)

Retrieves utvalg data from GCS for a specified instrument.

Parameters:
  • instrument_id (str) – The ID of the instrument to retrieve data for.

  • x (DataFrame)

Returns:

A Polars DataFrame containing the utvalg information for the specified instrument.

Return type:

pl.DataFrame