Reference

statstruk package

statstruk.basemodel module

class BaseModel(pop_data, sample_data, id_nr, verbose=1, logger_level='warning')

Bases: object

Class for model estimation.

Parameters:
  • pop_data (DataFrame)

  • sample_data (DataFrame)

  • id_nr (str)

  • verbose (int)

  • logger_level (str)

change_logging_level(logger_level)

Change the logging print level.

Parameters:

logger_level (str) – Detail level for information output. Choose between ‘debug’, ‘info’, ‘warning’, ‘error’ and ‘critical’.

Return type:

None

change_verbose(verbose)

Change the verbosity level for printed output.

Return type:

None

Parameters:

verbose (int)

property get_logging_level: str

Retrieves the name of the current logging level set in the logger.

Returns:

The name of the current logging level, or a message if none is set.

Return type:

str

property logging_dict: dict[str, int]

Returns a dictionary mapping standard logging level names to their corresponding numeric values.

Returns:

A dictionary of logging level names and their numeric values.

Return type:

dict[str, int]
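The logging helpers above can be illustrated with the standard library alone. The following is a minimal sketch, not the package's implementation: the function names, the lowercase keys and the logger name are assumptions made for illustration. It builds a name-to-number mapping like the one logging_dict describes, sets a level by name, and reads the level name back.

```python
import logging

# Level names listed in the change_logging_level description.
LEVEL_NAMES = ("debug", "info", "warning", "error", "critical")


def build_logging_dict():
    """Map each level name to the numeric value defined by the logging module."""
    return {name: getattr(logging, name.upper()) for name in LEVEL_NAMES}


def set_level(logger, logger_level):
    """Set a logger's level from its name, rejecting unknown names."""
    levels = build_logging_dict()
    if logger_level not in levels:
        raise ValueError(f"logger_level must be one of {sorted(levels)}")
    logger.setLevel(levels[logger_level])


def current_level_name(logger):
    """Return the name of the level currently set on the logger."""
    return logging.getLevelName(logger.level)


# Hypothetical logger name, used only for this demonstration.
demo_logger = logging.getLogger("statstruk_demo")
set_level(demo_logger, "info")
print(build_logging_dict())
print(current_level_name(demo_logger))
```

Whether the package stores level names in lower or upper case is not stated in this reference, so the keys above should be read as an assumption.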

statstruk.homogenmodel module

class HomogenModel(pop_data, sample_data, id_nr, verbose=1, logger_level='warning')

Bases: StratifiedModel

Class for estimating statistics for business surveys using a homogeneous model.

Parameters:
  • pop_data (DataFrame)

  • sample_data (DataFrame)

  • id_nr (str)

  • verbose (int)

  • logger_level (str)

fit(y_var, strata_var='', exclude=None, remove_missing=True)

Run and fit a homogeneous model within strata.

Parameters:
  • y_var (str) – The target variable to estimate from the survey.

  • strata_var (str | list[str]) – The stratification variable.

  • exclude (list[str | int] | None) – List of ID numbers for observations to exclude.

  • remove_missing (bool) – Whether to automatically remove units in the sample that are missing x or y values.

Return type:

None
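As a rough illustration of what fitting a homogeneous model within strata usually amounts to, the sketch below estimates each stratum total as the number of population units in the stratum times the sample mean of the target variable. It uses plain dictionaries rather than DataFrames, and the function name, the column names and the data are hypothetical. It is a textbook sketch under those assumptions, not the code HomogenModel runs.

```python
from collections import defaultdict
from statistics import mean


def homogeneous_totals(population, sample, y_var, strata_var):
    """Estimate stratum totals as N_h times the sample mean of y in stratum h."""
    # Count population units per stratum (N_h).
    pop_counts = defaultdict(int)
    for unit in population:
        pop_counts[unit[strata_var]] += 1

    # Collect the observed y values per stratum.
    sample_y = defaultdict(list)
    for unit in sample:
        sample_y[unit[strata_var]].append(unit[y_var])

    return {s: pop_counts[s] * mean(ys) for s, ys in sample_y.items()}


# Hypothetical data: four population units in stratum A, two in stratum B.
population = [
    {"id": 1, "stratum": "A"}, {"id": 2, "stratum": "A"},
    {"id": 3, "stratum": "A"}, {"id": 4, "stratum": "A"},
    {"id": 5, "stratum": "B"}, {"id": 6, "stratum": "B"},
]
sample = [
    {"id": 1, "stratum": "A", "y": 10},
    {"id": 3, "stratum": "A", "y": 20},
    {"id": 5, "stratum": "B", "y": 5},
]

totals = homogeneous_totals(population, sample, y_var="y", strata_var="stratum")
print(totals)
```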

statstruk.ratiomodel module

class RatioModel(pop_data, sample_data, id_nr, verbose=1, logger_level='warning')

Bases: StratifiedModel

Class for estimating statistics for business surveys using a ratio model.

Parameters:
  • pop_data (DataFrame)

  • sample_data (DataFrame)

  • id_nr (str)

  • verbose (int)

  • logger_level (str)

fit(y_var, x_var, strata_var='', control_extremes=True, exclude=None, remove_missing=True)

Run and fit a ratio model within strata.

Parameters:
  • y_var (str) – The target variable to estimate from the survey.

  • x_var (str) – The variable to use as the explanatory variable in the model.

  • strata_var (str | list[str]) – The stratification variable.

  • control_extremes (bool) – Whether the model should be fitted in a way that allows for extreme value controls.

  • exclude (list[str | int] | None) – List of ID numbers for observations to exclude.

  • remove_missing (bool) – Whether to automatically remove units in the sample that are missing x or y values.

Return type:

None
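The idea behind a ratio model within strata can be sketched as follows: in each stratum, the coefficient is the sum of the target variable divided by the sum of the explanatory variable in the sample, and the estimated total is that coefficient times the population sum of the explanatory variable. The function name, column names and data are hypothetical, and the sketch ignores exclusions, missing values and extreme value controls. It is not the package's implementation.

```python
from collections import defaultdict


def ratio_totals(population, sample, y_var, x_var, strata_var):
    """Per stratum: beta = sum(y) / sum(x) in the sample, total = beta * X_h."""
    # Population sum of the explanatory variable per stratum (X_h).
    pop_x = defaultdict(float)
    for unit in population:
        pop_x[unit[strata_var]] += unit[x_var]

    # Sample sums of y and x per stratum.
    sum_y = defaultdict(float)
    sum_x = defaultdict(float)
    for unit in sample:
        sum_y[unit[strata_var]] += unit[y_var]
        sum_x[unit[strata_var]] += unit[x_var]

    result = {}
    for stratum in sum_x:
        beta = sum_y[stratum] / sum_x[stratum]
        result[stratum] = {"beta": beta, "total": beta * pop_x[stratum]}
    return result


# Hypothetical data: one stratum with four population units, two of them sampled.
population = [
    {"id": 1, "stratum": "A", "x": 10},
    {"id": 2, "stratum": "A", "x": 20},
    {"id": 3, "stratum": "A", "x": 30},
    {"id": 4, "stratum": "A", "x": 40},
]
sample = [
    {"id": 1, "stratum": "A", "x": 10, "y": 5},
    {"id": 3, "stratum": "A", "x": 30, "y": 15},
]

fit = ratio_totals(population, sample, y_var="y", x_var="x", strata_var="stratum")
print(fit)
```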

get_extremes(threshold_type='both', rbound=2, gbound=2)

Get observations with extreme values based on their studentized residual value or G value.

Parameters:
  • threshold_type (str) – Which threshold type to use. Choose between ‘rstud’ for studentized residuals, ‘G’ for dffits/G-value or ‘both’ (default) for both.

  • rbound (float) – Multiplicative value to determine the extremity of the studentized residual values. (Default = 2)

  • gbound (float) – Multiplicative value to determine the extremity of the G values. (Default = 2)

Return type:

DataFrame

Returns:

A pd.DataFrame containing units with extreme values beyond a set boundary.

ratemodel

alias of RatioModel

statstruk.stratifiedmodel module

class StratifiedModel(pop_data, sample_data, id_nr, verbose=1, logger_level='warning')

Bases: BaseModel

Class for estimating statistics for business surveys using a stratified model.

Parameters:
  • pop_data (DataFrame)

  • sample_data (DataFrame)

  • id_nr (str)

  • verbose (int)

  • logger_level (str)

property get_coeffs: DataFrame

Get the model coefficients for each stratum.

get_estimates(domain='', uncertainty_type='CV', variance_type='robust', return_type='unbiased', output_type='table')

Get estimates for previously run model within strata or domains. Variance and CV estimates are returned for each domain.

Parameters:
  • domain (str) – Name of the variable to use for estimation. Should be in the population data.

  • uncertainty_type (str) – Which uncertainty measures to return. Choose between ‘CV’ (default) for coefficient of variation, ‘VAR’ for variance, ‘SE’ for standard errors, ‘CI’ for confidence intervals. Multiple measures can be returned with combinations of these, for example ‘CV_SE’ returns both the coefficient of variation and the standard error.

  • variance_type (str) – Choose between ‘robust’ and ‘standard’ estimation of variance. Currently, robust estimation is only calculated for strata and for domains that are aggregations of strata; standard estimation is used for other domains.

  • return_type (str) – String for which robust estimates to return. Choose ‘unbiased’ to return only the unbiased robust variance estimate or ‘all’ to return all three.

  • output_type (str) – String for output type to return. Default ‘table’ returns a table with estimates per strata or domain group. Alternatively choose ‘weights’ to return the sample file with weights and estimates or ‘imputed’ to return a population file with mass imputed values and estimates.

Return type:

DataFrame

Returns:

A pd.DataFrame containing estimates and variance/coefficient of variation estimates for each domain.
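To make the uncertainty_type options concrete, the sketch below splits a combined string such as ‘CV_SE’ on the underscore and derives each requested measure from an estimate and its variance. The function name is hypothetical, the coefficient of variation is expressed as a proportion, and the confidence interval uses a normal approximation with 1.96 as the multiplier. All three are assumptions: the package may scale or compute these measures differently.

```python
import math


def uncertainty_measures(estimate, variance, uncertainty_type="CV"):
    """Derive the requested uncertainty measures from an estimate and its variance."""
    se = math.sqrt(variance)
    measures = {}
    for name in uncertainty_type.upper().split("_"):
        if name == "VAR":
            measures["VAR"] = variance
        elif name == "SE":
            measures["SE"] = se
        elif name == "CV":
            # Assumption: CV as a proportion of the estimate, not a percentage.
            measures["CV"] = se / estimate
        elif name == "CI":
            # Assumption: 95 % interval under a normal approximation.
            measures["CI"] = (estimate - 1.96 * se, estimate + 1.96 * se)
        else:
            raise ValueError(f"Unknown uncertainty measure: {name}")
    return measures


# Hypothetical estimate of 100 with variance 25.
cv_se = uncertainty_measures(100.0, 25.0, "CV_SE")
print(cv_se)
```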

get_imputed()

Get population data with imputed values based on ratio or homogeneous model.

Return type:

DataFrame

Returns:

Pandas data frame with all units in the population and their imputed values.
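Mass imputation under a ratio model can be sketched as follows, for a single stratum: units observed in the sample keep their reported value, and every other population unit receives the fitted coefficient times its explanatory variable. Whether get_imputed treats sampled units this way is an assumption, and the function name, the "imputed" flag column and the data are hypothetical.

```python
def mass_impute(population, sample, id_nr, y_var, x_var):
    """Return population rows with y observed where sampled, else beta * x."""
    observed = {unit[id_nr]: unit[y_var] for unit in sample}
    # Ratio coefficient from the sample: sum(y) / sum(x).
    beta = sum(observed.values()) / sum(unit[x_var] for unit in sample)

    rows = []
    for unit in population:
        row = dict(unit)
        is_sampled = unit[id_nr] in observed
        row[y_var] = observed[unit[id_nr]] if is_sampled else beta * unit[x_var]
        row["imputed"] = not is_sampled  # hypothetical flag column
        rows.append(row)
    return rows


# Hypothetical single-stratum data.
population = [
    {"id": 1, "x": 10}, {"id": 2, "x": 20},
    {"id": 3, "x": 30}, {"id": 4, "x": 40},
]
sample = [
    {"id": 1, "x": 10, "y": 5},
    {"id": 3, "x": 30, "y": 15},
]

imputed = mass_impute(population, sample, id_nr="id", y_var="y", x_var="x")
print(sum(row["y"] for row in imputed))
```

In this sketch, summing the imputed column gives the same total as multiplying the ratio coefficient by the population sum of x.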

property get_obs: dict[str, Any]

Get the details for observations from the model.

get_weights()

Get sample data with weights based on ratio or homogeneous model.

Return type:

DataFrame

Returns:

Pandas data frame with sample data and weights.
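One common way to express model-based weights is sketched below. Under a homogeneous model, each sampled unit in a stratum gets the population count divided by the sample count; under a ratio model, it gets the population sum of the explanatory variable divided by the sample sum. The function name, the "weight" column and the data are hypothetical, and the weights get_weights returns may be defined differently.

```python
from collections import defaultdict


def model_weights(population, sample, strata_var, x_var=None):
    """Attach a weight to each sample row.

    With x_var=None the weight is N_h / n_h (homogeneous model);
    otherwise it is X_h / x_h, the population and sample sums of x (ratio model).
    """
    def size(unit):
        return unit[x_var] if x_var is not None else 1

    pop_size = defaultdict(float)
    for unit in population:
        pop_size[unit[strata_var]] += size(unit)

    sample_size = defaultdict(float)
    for unit in sample:
        sample_size[unit[strata_var]] += size(unit)

    return [
        dict(unit, weight=pop_size[unit[strata_var]] / sample_size[unit[strata_var]])
        for unit in sample
    ]


# Hypothetical data: one stratum with four population units, two of them sampled.
population = [
    {"id": 1, "stratum": "A", "x": 10}, {"id": 2, "stratum": "A", "x": 20},
    {"id": 3, "stratum": "A", "x": 30}, {"id": 4, "stratum": "A", "x": 40},
]
sample = [
    {"id": 1, "stratum": "A", "x": 10, "y": 5},
    {"id": 3, "stratum": "A", "x": 30, "y": 15},
]

ratio_rows = model_weights(population, sample, "stratum", x_var="x")
homogeneous_rows = model_weights(population, sample, "stratum")
print(sum(row["weight"] * row["y"] for row in ratio_rows))
```

In this sketch the weighted sum of y reproduces the corresponding stratum total estimate, which is what makes a weights file usable for estimation in its own right.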