control_framework_base module

class ControlFrameworkBase(partitions, partitions_skjema, conn)

Bases: object

Base class for running control checks.

Designed to work on partitioned data following the recommended altinn3 data structure. Manages inserts and updates to the ‘kontrollutslag’ table via a connection interface.

To use this class, subclass it as follows:

    class MyControls(ControlFrameworkBase):

        def __init__(self, partitions: list[int | str], partitions_skjema: dict[str, int | str], conn: object) -> None:
            super().__init__(partitions, partitions_skjema, conn)

        def a_control_func(self):
            # Your code here
            return dataframe

The flow of updating the control table works like this:

  1. Call ‘execute_controls’, which starts the entire process.

  2. ‘control_updates’ is run. It checks the existing controls, runs all controls and creates a dataframe with all results.

    As part of this step, ‘run_all_controls’ is run, which in turn calls ‘run_control’ for each individual control. The results from ‘control_updates’ are used to check whether anything has changed since the controls were last executed. If there are no changes, the process stops here.

  3. Based on the results from ‘control_updates’, an update query is generated that updates the ‘kontrollutslag’ table for every observation where the result of a control has changed.

  4. The update query is run, and the process is complete.
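The four steps above can be sketched with a small stand-in class. This is only an illustration of the documented call order, not the library's implementation: `FlowSketch`, `FakeConn`, its `query` method, and the column names `ident`, `kontrollid` and `utslag` are all assumptions made for the example.

```python
class FakeConn:
    """Hypothetical connection stub that records the queries it receives."""

    def __init__(self):
        self.queries = []

    def query(self, sql):
        self.queries.append(sql)


class FlowSketch:
    """Stand-in that mirrors the documented flow, not the real base class."""

    def __init__(self, conn, changed_rows):
        self.conn = conn
        # Pretend result of comparing fresh control results with stored ones.
        self._changed_rows = changed_rows

    def control_updates(self):
        # Step 2: the real method checks existing controls, runs all
        # controls and compares the results.
        return list(self._changed_rows)

    def generate_update_query(self, updates):
        # Step 3: one UPDATE statement per changed result (illustrative SQL).
        return ";\n".join(
            f"UPDATE kontrollutslag SET utslag = {row['utslag']} "
            f"WHERE ident = '{row['ident']}' AND kontrollid = '{row['kontrollid']}'"
            for row in updates
        )

    def execute_controls(self):
        # Step 1: entry point for the whole process.
        updates = self.control_updates()
        if not updates:
            return 0  # no changes, so the process stops here
        # Step 4: run the generated query.
        self.conn.query(self.generate_update_query(updates))
        return len(updates)
```

Calling `execute_controls()` on an instance with one changed row sends a single query to the stub connection and returns 1; with no changed rows it returns 0 without touching the connection.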

Parameters:
  • partitions (list[int | str])

  • partitions_skjema (dict[str, int | str])

  • conn (object)

control_new_rows()

Identifies new rows that are not already present in ‘kontrollutslag’.

Returns:

DataFrame of new rows to insert.

Return type:

pd.DataFrame
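One common way to find rows that are not yet stored is a pandas anti-join. The sketch below assumes, purely for illustration, that a result row is identified by the columns `ident` and `kontrollid`; the actual key columns used by the framework are not stated here, and `find_new_rows` is a hypothetical helper.

```python
import pandas as pd

# Assumed key columns; the real 'kontrollutslag' key may differ.
KEY_COLS = ["ident", "kontrollid"]


def find_new_rows(results: pd.DataFrame, existing: pd.DataFrame) -> pd.DataFrame:
    """Return the rows of `results` whose key is absent from `existing`."""
    existing_keys = existing[KEY_COLS].drop_duplicates()
    # indicator=True adds a '_merge' column telling where each row came from.
    merged = results.merge(existing_keys, on=KEY_COLS, how="left", indicator=True)
    new_rows = merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")
    return new_rows.reset_index(drop=True)
```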

control_updates()

Identifies rows in ‘kontrollutslag’ where the control output has changed.

Returns:

DataFrame of rows that need to be updated.

Return type:

pd.DataFrame
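Detecting changed control output can be sketched as an inner join on the key columns followed by a comparison of old and new values. The column names `ident`, `kontrollid` and `utslag` and the helper `find_changed_rows` are assumptions for this example, not names confirmed by the framework.

```python
import pandas as pd

# Assumed key columns; the real 'kontrollutslag' key may differ.
KEY_COLS = ["ident", "kontrollid"]


def find_changed_rows(results: pd.DataFrame, existing: pd.DataFrame) -> pd.DataFrame:
    """Return rows of `results` whose 'utslag' differs from the stored value."""
    merged = results.merge(
        existing[KEY_COLS + ["utslag"]],
        on=KEY_COLS,
        how="inner",              # only rows that already exist can be updated
        suffixes=("", "_old"),    # stored value ends up in 'utslag_old'
    )
    changed = merged.loc[merged["utslag"] != merged["utslag_old"]]
    return changed.drop(columns="utslag_old").reset_index(drop=True)
```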

execute_controls()

Executes control checks and updates existing rows in ‘kontrollutslag’ if needed.

Returns:

Number of rows updated.

Return type:

int

generate_update_query(df_updates)

Generates a SQL UPDATE query for updating rows in ‘kontrollutslag’.

Parameters:

df_updates (pd.DataFrame) – DataFrame with updates to apply.

Returns:

SQL query string.

Return type:

str
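As a rough illustration of turning a DataFrame of updates into SQL, the sketch below emits one UPDATE statement per row. The column names and the helper `build_update_query` are hypothetical, and the query the framework actually generates may look quite different. Values are interpolated directly into the string to keep the example short; code that handles untrusted input should use parameterized queries instead.

```python
import pandas as pd


def build_update_query(df_updates: pd.DataFrame) -> str:
    """Build one UPDATE statement per row of `df_updates` (illustrative only)."""
    statements = []
    for row in df_updates.itertuples(index=False):
        new_value = str(bool(row.utslag)).upper()  # TRUE or FALSE
        statements.append(
            f"UPDATE kontrollutslag SET utslag = {new_value} "
            f"WHERE ident = '{row.ident}' AND kontrollid = '{row.kontrollid}';"
        )
    return "\n".join(statements)
```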

insert_new_rows()

Inserts any new control results that are not already in ‘kontrollutslag’.

Returns:

Number of rows inserted.

Return type:

int

Raises:

AttributeError – If ‘conn’ does not have an ‘insert’ method.
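A check of this kind can be written in a few lines. The helper name `require_insert` and the error message are assumptions for the example; the framework may perform the check differently.

```python
def require_insert(conn: object) -> None:
    """Raise AttributeError unless `conn` exposes a callable 'insert'."""
    if not callable(getattr(conn, "insert", None)):
        raise AttributeError("conn must provide an 'insert' method")
```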

run_all_controls()

Runs all control methods named ‘control_<kontrollid>’, where <kontrollid> is in self.controls.

Returns:

Combined DataFrame with all control results.

Return type:

pd.DataFrame

Raises:

TypeError – If the ‘df’ variable to be returned is not a pd.DataFrame.
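Name-based dispatch of this kind can be sketched with `getattr`. The class below is a stand-in, not the real base class: the control ids `K1` and `K2`, the way `controls` is populated, and the result columns are all assumptions for illustration.

```python
import pandas as pd


class DispatchSketch:
    """Stand-in showing name-based dispatch to 'control_<kontrollid>' methods."""

    controls = ["K1", "K2"]  # hypothetical control ids

    def control_K1(self) -> pd.DataFrame:
        return pd.DataFrame({"ident": ["a"], "kontrollid": ["K1"], "utslag": [True]})

    def control_K2(self) -> pd.DataFrame:
        return pd.DataFrame({"ident": ["a"], "kontrollid": ["K2"], "utslag": [False]})

    def run_control(self, control: str) -> pd.DataFrame:
        # Look the method up by name and call it.
        result = getattr(self, control)()
        if not isinstance(result, pd.DataFrame):
            raise TypeError(f"{control} must return a pd.DataFrame")
        return result

    def run_all_controls(self) -> pd.DataFrame:
        # Run each control and stack the results into one DataFrame.
        frames = [self.run_control(f"control_{cid}") for cid in self.controls]
        return pd.concat(frames, ignore_index=True)
```

With the two example controls, `DispatchSketch().run_all_controls()` returns a two-row DataFrame, one row per control.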

run_control(control)

Runs a single control.

Parameters:

control (str) – Name of the control method to run, as implemented in the control class that subclasses ControlFrameworkBase.

Returns:

DataFrame containing the results from the control.

Return type:

pd.DataFrame

Raises:

TypeError – If the control method does not return a pd.DataFrame.