Reference
ssb_datafangst_person_fagfunksjoner package
ssb_datafangst_person_fagfunksjoner.functions module
- example_function(number1, number2)
Compare two integers. This is merely an example function and can be deleted. It is used to show and test generating documentation from code, type hinting, testing, and testing examples in the code.
- Parameters:
number1 (int) – The first number.
number2 (int) – The second number, which will be compared to number1.
- Returns:
A string describing which number is greater.
- Return type:
str
Examples
Examples should be written in doctest format, and should illustrate how to use the function.
>>> example_function(1, 2)
1 is less than 2
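The comparison described for example_function can be sketched as below. This is a minimal illustration, not the package's actual source: the name example_function_sketch is hypothetical, and the wording returned when number1 is not less than number2 is an assumption, since the reference only shows the "less than" case.

```python
def example_function_sketch(number1: int, number2: int) -> str:
    """Describe how number1 compares to number2 (hypothetical sketch)."""
    if number1 < number2:
        # This wording matches the doctest shown in the reference.
        return f"{number1} is less than {number2}"
    # Assumption: the reference does not show the text for this branch.
    return f"{number1} is greater than or equal to {number2}"


print(example_function_sketch(1, 2))
```

A doctest for such a function would show the returned string exactly as the interpreter echoes it, so the example doubles as a test.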
- file_concat_pd(InstrumentId, start_dato=None, slutt_dato=None)
Retrieves dialhistory data for a specified instrument for a given period.
- Parameters:
InstrumentId (str) – The ID of the instrument to retrieve data for.
start_dato (datetime.date) – The start date of the range for filtering data. Example: datetime.date(2024, 10, 29)
slutt_dato (datetime.date) – The end date of the range for filtering data. Example: datetime.date(2024, 10, 29)
- Returns:
A Pandas DataFrame containing the dialhistory data for the specified instrument.
- Return type:
pd.DataFrame
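The start_dato and slutt_dato arguments both default to None. One plausible reading, sketched below, is that a missing bound leaves that side of the period open and that both bounds are inclusive. Both points are assumptions, as is the helper name in_period; the reference does not state how the package treats the bounds.

```python
import datetime
from typing import Optional


def in_period(
    day: datetime.date,
    start_dato: Optional[datetime.date] = None,
    slutt_dato: Optional[datetime.date] = None,
) -> bool:
    """Return True if day falls inside the period (hypothetical helper).

    Assumptions: bounds are inclusive, and None means "no bound".
    """
    if start_dato is not None and day < start_dato:
        return False
    if slutt_dato is not None and day > slutt_dato:
        return False
    return True


start = datetime.date(2024, 10, 1)
slutt = datetime.date(2024, 10, 29)
print(in_period(datetime.date(2024, 10, 15), start, slutt))
```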
- file_concat_pl(InstrumentId, start_dato=None, slutt_dato=None)
Retrieves dialhistory data for a specified instrument for a given period.
- Parameters:
InstrumentId (str) – The ID of the instrument to retrieve data for.
start_dato (datetime.date) – The start date of the range for filtering data. Example: datetime.date(2024, 10, 29)
slutt_dato (datetime.date) – The end date of the range for filtering data. Example: datetime.date(2024, 10, 29)
- Returns:
A Polars DataFrame containing the dialhistory data for the specified instrument.
- Return type:
pl.DataFrame
- fill_all_para_pl(table_df)
Prepares table_df, a Polars DataFrame, for data analysis and returns the prepared Polars DataFrame. The function:
  - fills PageIndex downward
  - fills FieldName downward
  - creates a new variable with VariableName
  - creates the variable diff_time, which represents the time spent on each observation/action for an IO
  - fills LayoutSetName by session ID
  - creates a new column with the bolk name using the function make_bolk()
- Return type:
DataFrame
- Parameters:
table_df (DataFrame)
- fill_para_pd(table_df)
Prepares table_df, a pandas DataFrame, for data analysis and returns the prepared pandas DataFrame. The function:
  - fills PageIndex downward
  - fills FieldName downward
  - creates a new variable with VariableName
  - creates the variable diff_time, which represents the time spent on each observation/action for an IO
  - fills LayoutSetName by session ID
  - creates a new column with the bolk name using the function make_bolk()
- Return type:
DataFrame
- Parameters:
table_df (DataFrame)
- fill_para_pl(table_df)
Prepares table_df, a Polars DataFrame, for data analysis and returns the prepared Polars DataFrame. The function:
  - fills PageIndex downward
  - fills FieldName downward
  - creates a new variable with VariableName
  - creates the variable diff_time, which represents the time spent on each observation/action for an IO
  - fills LayoutSetName by session ID
  - creates a new column with the bolk name using the function make_bolk()
- Return type:
DataFrame
- Parameters:
table_df (DataFrame)
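Two of the preparation steps listed for the fill_* functions, filling a column downward and deriving diff_time, can be illustrated without any DataFrame library. The sketch below uses plain lists. The helper names forward_fill and time_diffs are hypothetical, and measuring diff_time as the seconds from one action to the next (with no value for the last action) is an assumption; the reference does not say how the package computes it.

```python
from datetime import datetime
from typing import List, Optional


def forward_fill(values: list) -> list:
    """Replace each None with the most recent non-None value before it."""
    filled = []
    last = None
    for value in values:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def time_diffs(timestamps: List[datetime]) -> List[Optional[float]]:
    """Seconds from each timestamp to the following one.

    Assumption: the last action has no following action, so it gets None.
    """
    diffs: List[Optional[float]] = []
    for current, following in zip(timestamps, timestamps[1:]):
        diffs.append((following - current).total_seconds())
    if timestamps:
        diffs.append(None)
    return diffs


page_index = forward_fill([1, None, None, 2, None])
stamps = [
    datetime(2024, 10, 29, 12, 0, 0),
    datetime(2024, 10, 29, 12, 0, 5),
    datetime(2024, 10, 29, 12, 0, 15),
]
diff_time = time_diffs(stamps)
print(page_index, diff_time)
```

In pandas or Polars the same steps would normally be done with the library's own forward-fill and difference operations rather than explicit loops.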
- get_union_schema(files)
Creates a union of all schemas for the given list of Parquet files.
- Parameters:
files (list[str]) – A list of file paths for the Parquet files.
- Returns:
A PyArrow schema representing the union of all schemas.
- Return type:
Schema
- get_union_schema_para(files)
Creates a union of all schemas for the given list of Parquet files.
- Parameters:
files (list[str]) – A list of file paths for the Parquet files.
- Returns:
A PyArrow schema representing the union of all schemas.
- Return type:
Schema
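The idea of a union schema can be shown with plain dictionaries standing in for PyArrow schemas: every column that appears in any file ends up in the result. This is a conceptual sketch only. The helper name union_schema is hypothetical, and raising an error on conflicting types is an assumption, not documented behaviour of get_union_schema. PyArrow itself provides pyarrow.unify_schemas for merging real schemas; whether the package uses it is not stated here.

```python
from typing import Dict, List


def union_schema(schemas: List[Dict[str, str]]) -> Dict[str, str]:
    """Merge column-name -> type mappings, keeping first-seen column order."""
    merged: Dict[str, str] = {}
    for schema in schemas:
        for name, dtype in schema.items():
            if name in merged and merged[name] != dtype:
                # Assumption: conflicting types are treated as an error.
                raise TypeError(
                    f"Column {name!r} has types {merged[name]} and {dtype}"
                )
            merged.setdefault(name, dtype)
    return merged


file_a = {"SessionId": "string", "PageIndex": "int64"}
file_b = {"SessionId": "string", "FieldName": "string"}
combined = union_schema([file_a, file_b])
print(combined)
```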
- hent_status_pd(instrument_id, start_dato=None, slutt_dato=None)
Retrieves status data from GCS for a specified instrument within a given date range.
- Parameters:
instrument_id (str) – The ID of the instrument to retrieve data for.
start_dato (datetime.date) – The start date of the range for filtering data. Example: datetime.date(2024, 10, 29)
slutt_dato (datetime.date) – The end date of the range for filtering data. Example: datetime.date(2024, 10, 29)
- Returns:
A Pandas DataFrame containing the status information for the specified instrument within the defined date range.
- Return type:
pd.DataFrame
- hent_status_pl(instrument_id, start_dato=None, slutt_dato=None)
Retrieves status data from GCS for a specified instrument within a given date range.
- Parameters:
instrument_id (str) – The ID of the instrument to retrieve data for.
start_dato (datetime.date) – The start date of the range for filtering data. Example: datetime.date(2024, 10, 29)
slutt_dato (datetime.date) – The end date of the range for filtering data. Example: datetime.date(2024, 10, 29)
- Returns:
A Polars DataFrame containing the status information for the specified instrument within the defined date range.
- Return type:
pl.DataFrame
- hent_utvalg_pd(instrument_id)
Retrieves utvalg data from GCS for a specified instrument.
- Parameters:
instrument_id (str) – The ID of the instrument to retrieve data for.
- Returns:
A Pandas DataFrame containing the utvalg information for the specified instrument.
- Return type:
pd.DataFrame
- hent_utvalg_pl(instrument_id)
Retrieves utvalg data from GCS for a specified instrument.
- Parameters:
instrument_id (str) – The ID of the instrument to retrieve data for.
- Returns:
A Polars DataFrame containing the utvalg information for the specified instrument.
- Return type:
pl.DataFrame
- make_bolk(row)
A function, usable with map or apply, that takes a FieldName string and returns the block (bolk) name. It also handles the case where an IO has a repeating bolk nested inside another bolk. Example: skjema.bolk2[1].field and skjema.bolk2[2].field become bolk1.bolk2. Example: skjema.bolk2.bolk3.bolk4 becomes bolk2.bolk3.
- Return type:
str
- Parameters:
row (str)
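One way to get the behaviour described for make_bolk is sketched below. It is not the package's implementation. The name make_bolk_sketch is hypothetical, and the rule it applies (strip repetition indices such as [1], then drop the first and last dot-separated components) is an assumption inferred from the examples. The input skjema.bolk1.bolk2[1].field is likewise an assumed input chosen to show the repeated-bolk case.

```python
import re


def make_bolk_sketch(field_name: str) -> str:
    """Derive a bolk name from a FieldName string (hypothetical sketch)."""
    # Remove repetition indices such as "[1]" or "[2]".
    without_index = re.sub(r"\[\d+\]", "", field_name)
    parts = without_index.split(".")
    # Assumption: the first part is the form name and the last part is
    # the field itself, so the bolk name is everything in between.
    return ".".join(parts[1:-1])


for name in [
    "skjema.bolk2.bolk3.bolk4",
    "skjema.bolk1.bolk2[1].field",
    "skjema.bolk1.bolk2[2].field",
]:
    print(name, "->", make_bolk_sketch(name))
```

Because the function takes a single string and returns a string, it can be passed directly to map or to a column-wise apply.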
- para_concat_pd(InstrumentId, start_dato=None, slutt_dato=None)
Retrieves paradata for a specified instrument for a given period.
- Parameters:
InstrumentId (str) – The ID of the instrument to retrieve data for.
start_dato (datetime.date) – The start date of the range for filtering data. Example: datetime.date(2024, 10, 29)
slutt_dato (datetime.date) – The end date of the range for filtering data. Example: datetime.date(2024, 10, 29)
- Returns:
A Pandas DataFrame containing the paradata for the specified instrument.
- Return type:
pd.DataFrame
- para_concat_pl(InstrumentId, dager=None, start_dato=None, slutt_dato=None)
Retrieves paradata for a specified instrument for a given period.
- Parameters:
InstrumentId (str) – The ID of the instrument to retrieve data for.
dager (int) – The number of days back in time.
start_dato (datetime.date) – The start date of the range for filtering data. Example: datetime.date(2024, 10, 29)
slutt_dato (datetime.date) – The end date of the range for filtering data. Example: datetime.date(2024, 10, 29)
- Returns:
A Polars DataFrame containing the paradata for the specified instrument.
- Return type:
pl.DataFrame
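The dager argument counts days back in time. A possible way to turn such a count into an explicit date range is sketched below. The helper name period_from_dager is hypothetical, and the choice to end the period on a given reference day is an assumption; the reference does not describe how para_concat_pl combines dager with start_dato and slutt_dato.

```python
import datetime
from typing import Tuple


def period_from_dager(
    dager: int, today: datetime.date
) -> Tuple[datetime.date, datetime.date]:
    """Return (start_dato, slutt_dato) covering the last `dager` days.

    Assumption: the period ends on `today` and starts `dager` days earlier.
    """
    start_dato = today - datetime.timedelta(days=dager)
    return start_dato, today


start_dato, slutt_dato = period_from_dager(7, datetime.date(2024, 10, 29))
print(start_dato, slutt_dato)
```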
- question_sorting(x)
Retrieves utvalg data from GCS for a specified instrument.
- Parameters:
instrument_id (str) – The ID of the instrument to retrieve data for.
x (DataFrame)
- Returns:
A Polars DataFrame containing the utvalg information for the specified instrument.
- Return type:
pl.DataFrame