Reference
statstruk package
statstruk.ratemodel module
- class ratemodel(pop_data, sample_data, id_nr, verbose=1)
Bases: ssbmodel
Class for estimating statistics for business surveys using a rate model.
- Parameters:
pop_data (DataFrame)
sample_data (DataFrame)
id_nr (str)
verbose (int)
- fit(y_var, x_var, strata_var='', control_extremes=True, exclude=None, exclude_auto=0, remove_missing=True, rbound=2, gbound=2, count=0)
Fit a rate model within strata.
- Parameters:
y_var (str) – The target variable to estimate from the survey.
x_var (str) – The explanatory variable in the model.
strata_var (str | list[str]) – The stratification variable or variables.
control_extremes (bool) – Whether the model should be fitted in a way that allows for extreme value controls.
exclude (list[str | int] | None) – List of ID numbers for observations to exclude.
exclude_auto (int) – Whether extreme values should be automatically excluded from the model. Default 0 (no automatic exclusion); 1 means extreme values are removed once and the model is run again.
remove_missing (bool) – Whether to automatically remove units in the sample that are missing x or y values.
rbound (float) – Multiplicative value used to determine the extremity of the studentized residual values.
gbound (float) – Multiplicative value used to determine the extremity of the G values.
count (int) – Round count when automatic exclusion of outliers is used.
- Return type:
None
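The rate model is assumed here to take the usual form y_i = β_h·x_i + ε_i within each stratum h, with error variance proportional to x_i. Under that assumption the weighted least-squares coefficient is simply the ratio of the stratum's sample sums of y and x. The sketch below illustrates this on plain Python dictionaries rather than DataFrames. `fit_rate_model` and the column names are hypothetical, and this is not statstruk's implementation: it mimics only `remove_missing` and `exclude`, and ignores `control_extremes` and `exclude_auto`.

```python
from collections import defaultdict


def fit_rate_model(sample, y_var, x_var, strata_var, id_var, exclude=None):
    """Fit y = beta_h * x within each stratum h (hypothetical sketch).

    With error variance proportional to x, the weighted least-squares
    estimate of beta_h reduces to sum(y) / sum(x) over the stratum sample.
    """
    excluded = set(exclude or [])
    sums = defaultdict(lambda: [0.0, 0.0])  # stratum -> [sum of y, sum of x]
    for row in sample:
        # Like remove_missing=True: skip units missing x or y.
        if row.get(y_var) is None or row.get(x_var) is None:
            continue
        # Like exclude=[...]: skip units whose ID is listed.
        if row[id_var] in excluded:
            continue
        totals = sums[row[strata_var]]
        totals[0] += row[y_var]
        totals[1] += row[x_var]
    return {h: sum_y / sum_x for h, (sum_y, sum_x) in sums.items()}


# Hypothetical sample: unit 4 is missing turnover and is dropped.
sample = [
    {"id": 1, "stratum": "A", "employees": 10, "turnover": 200.0},
    {"id": 2, "stratum": "A", "employees": 30, "turnover": 500.0},
    {"id": 3, "stratum": "B", "employees": 5, "turnover": 40.0},
    {"id": 4, "stratum": "B", "employees": 15, "turnover": None},
]
print(fit_rate_model(sample, "turnover", "employees", "stratum", "id"))
# {'A': 17.5, 'B': 8.0}
```

Passing `exclude=[2]` drops unit 2, so stratum A's coefficient becomes 200 / 10 = 20.0.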
- property get_coeffs: DataFrame
Get the model coefficients for each stratum.
- get_estimates(domain='', uncertainty_type='CV', variance_type='robust', return_type='unbiased')
Get estimates from a previously fitted model within strata or domains. The estimate and the requested uncertainty measures (see uncertainty_type) are returned for each domain.
- Parameters:
domain (str) – Name of the variable defining the estimation domains. It should be present in the population data.
uncertainty_type (str) – Which uncertainty measures to return. Choose between ‘CV’ (default) for coefficient of variation, ‘VAR’ for variance, ‘SE’ for standard errors and ‘CI’ for confidence intervals. Multiple measures can be returned by combining these; for example, ‘CV_SE’ returns both the coefficient of variation and the standard error.
variance_type (str) – Choose ‘robust’ or ‘standard’ estimation of variance. Currently, robust variance is calculated only for strata and for domains that are aggregates of strata; standard variance is used for other domains.
return_type (str) – Which robust variance estimates to return. Choose ‘unbiased’ to return only the unbiased robust variance estimate or ‘all’ to return all three.
- Return type:
DataFrame
- Returns:
A pd.DataFrame containing estimates and the requested variance/coefficient of variation measures for each domain.
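To make the estimate and its uncertainty concrete for a single stratum, the sketch below computes the rate-model total β̂_h·X_h, where X_h is the stratum's population x total. It then applies the textbook model-based ("standard") prediction variance σ̂²·X_r·X_h / X_s, with X_s and X_r the x totals of sampled and non-sampled units. This assumes Var(ε_i) = σ²·x_i. It is not the robust estimator that `get_estimates` uses for strata by default, and `estimate_stratum_total` is a hypothetical helper. The CV is expressed in percent (an assumption), and only ‘CV’, ‘SE’ and ‘VAR’ are handled.

```python
import math


def estimate_stratum_total(y_s, x_s, x_pop_total, uncertainty_type="CV"):
    """Rate-model total and uncertainty for one stratum (hypothetical sketch).

    y_s, x_s: values for the sampled units; x_pop_total: sum of x over all
    units in the stratum. Uses the textbook model-based ("standard")
    variance under Var(e_i) = sigma^2 * x_i, not a robust estimator.
    """
    n = len(y_s)
    sum_x_s = sum(x_s)
    beta = sum(y_s) / sum_x_s
    total = beta * x_pop_total
    # sigma^2 from residuals scaled by x_i, with n - 1 degrees of freedom.
    sigma2 = sum((y - beta * x) ** 2 / x for y, x in zip(y_s, x_s)) / (n - 1)
    x_rest = x_pop_total - sum_x_s  # x total of the non-sampled units
    variance = sigma2 * x_rest * x_pop_total / sum_x_s
    se = math.sqrt(variance)
    measures = {"VAR": variance, "SE": se, "CV": 100 * se / total}
    # Mirror the 'CV_SE' naming: return each requested measure.
    result = {"total": total}
    for name in uncertainty_type.split("_"):
        result[name] = measures[name]
    return result


est = estimate_stratum_total([20.0, 42.0, 58.0], [10.0, 20.0, 30.0], 120.0, "CV_SE")
print(est)  # total 240.0, SE ≈ 4.47, CV ≈ 1.86 (%)
```

Here β̂ = 120 / 60 = 2, so the total is 2 × 120 = 240. Requesting ‘CI’ would raise a KeyError in this sketch.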
- get_extremes(threshold_type='both', rbound=2, gbound=2)
Get observations with extreme values based on their studentized residual value or G value.
- Parameters:
threshold_type (str) – Which threshold type to use. Choose between ‘rstud’ for studentized residuals, ‘G’ for DFFITS/G values or ‘both’ (default) for both.
rbound (float) – Multiplicative value used to determine the extremity of the studentized residual values. (Default = 2)
gbound (float) – Multiplicative value used to determine the extremity of the G values. (Default = 2)
- Return type:
DataFrame
- Returns:
A pd.DataFrame containing units with extreme values beyond a set boundary.
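As an illustration of the two measures, the sketch below treats the rate model as a weighted regression through the origin with weights 1/x_i. In that setting each unit's leverage is h_i = x_i / Σx. The externally studentized residual t_i and the G value (DFFITS, t_i·√(h_i / (1 − h_i))) then follow the standard regression-diagnostics formulas. The cutoffs are assumptions, not statstruk's documented rules: a unit is flagged when |t_i| > rbound or |G_i| > gbound·√(1/n), and "both" is read as "either". The helper names are hypothetical.

```python
import math


def rate_model_diagnostics(y_s, x_s):
    """Externally studentized residuals and DFFITS (G) for the rate model
    y_i = beta * x_i + e_i with Var(e_i) = sigma^2 * x_i (hypothetical sketch)."""
    n = len(y_s)
    sum_x = sum(x_s)
    beta = sum(y_s) / sum_x
    # Residuals on the variance-stabilised scale: (y_i - beta * x_i) / sqrt(x_i).
    resid = [(y - beta * x) / math.sqrt(x) for y, x in zip(y_s, x_s)]
    lev = [x / sum_x for x in x_s]  # leverage h_i of each unit
    sse = sum(r ** 2 for r in resid)
    rstud, g = [], []
    for r, h in zip(resid, lev):
        # Leave-one-out residual variance with p = 1 parameter.
        s2_without_i = (sse - r ** 2 / (1 - h)) / (n - 2)
        t = r / math.sqrt(s2_without_i * (1 - h))
        rstud.append(t)
        g.append(t * math.sqrt(h / (1 - h)))  # DFFITS
    return rstud, g


def flag_extremes(ids, y_s, x_s, rbound=2, gbound=2):
    """Return IDs flagged by either measure; both cutoffs are assumptions."""
    rstud, g = rate_model_diagnostics(y_s, x_s)
    g_cutoff = gbound * math.sqrt(1 / len(ids))  # gbound * sqrt(p / n), p = 1
    return [i for i, t, gi in zip(ids, rstud, g)
            if abs(t) > rbound or abs(gi) > g_cutoff]


ids = [1, 2, 3, 4, 5, 6]
x = [10.0] * 6
y = [20.0, 21.0, 19.0, 20.0, 21.0, 40.0]  # unit 6 is far from the others
print(flag_extremes(ids, y, x))  # [6]
```

With gbound = 2 and one parameter, the assumed G cutoff coincides with the conventional DFFITS rule of thumb 2·√(p/n).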
- get_imputed()
Get population data with imputed values based on model.
- Return type:
DataFrame
- Returns:
A pandas DataFrame with all units in the population, including imputed values.
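Under a rate model, a natural way to impute is to keep observed y values for sampled units and predict β̂_h·x_i for the non-sampled units. The sketch below shows that idea on dictionaries. `impute_population`, the `imputed` flag column and the example column names are hypothetical and are not taken from statstruk's output format.

```python
def impute_population(population, observed_y, coeffs, y_var, x_var, strata_var, id_var):
    """Fill in y for every population unit (hypothetical sketch).

    observed_y maps ID -> observed y; coeffs maps stratum -> fitted beta_h.
    Sampled units keep their observed value; the rest get beta_h * x_i.
    """
    imputed = []
    for unit in population:
        row = dict(unit)  # copy so the input is left unchanged
        if row[id_var] in observed_y:
            row[y_var] = observed_y[row[id_var]]
            row["imputed"] = False
        else:
            row[y_var] = coeffs[row[strata_var]] * row[x_var]
            row["imputed"] = True
        imputed.append(row)
    return imputed


population = [
    {"id": 1, "stratum": "A", "employees": 10},
    {"id": 2, "stratum": "A", "employees": 20},
    {"id": 3, "stratum": "B", "employees": 4},
]
result = impute_population(population, {1: 200.0}, {"A": 17.5, "B": 8.0},
                           "turnover", "employees", "stratum", "id")
print([(r["id"], r["turnover"], r["imputed"]) for r in result])
# [(1, 200.0, False), (2, 350.0, True), (3, 32.0, True)]
```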
- property get_obs: dict[str, Any]
Get the details for observations from the model.
- get_weights()
Get sample data with weights based on model.
- Return type:
DataFrame
- Returns:
Pandas data frame with sample data and weights.
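For a rate model, one weighting that reproduces the model-based total gives every sampled unit in stratum h the weight w_i = X_h / x_h. Here X_h is the population x total and x_h is the sample x total, so Σ w_i·y_i = β̂_h·X_h. This is a sketch of that textbook relationship, assumed rather than confirmed to match statstruk's weights, and `rate_model_weights` is hypothetical.

```python
from collections import defaultdict


def rate_model_weights(sample, population, x_var, strata_var):
    """Attach w_i = X_h / x_h (population x total over sample x total in
    stratum h) to each sampled unit (hypothetical sketch)."""
    pop_x = defaultdict(float)
    for unit in population:
        pop_x[unit[strata_var]] += unit[x_var]
    sample_x = defaultdict(float)
    for unit in sample:
        sample_x[unit[strata_var]] += unit[x_var]
    return [dict(unit, weight=pop_x[unit[strata_var]] / sample_x[unit[strata_var]])
            for unit in sample]


population = [{"stratum": "A", "employees": e} for e in (10, 30, 20, 40)]
sample = [
    {"id": 1, "stratum": "A", "employees": 10, "turnover": 200.0},
    {"id": 2, "stratum": "A", "employees": 30, "turnover": 500.0},
]
weighted = rate_model_weights(sample, population, "employees", "stratum")
print([u["weight"] for u in weighted])  # [2.5, 2.5]
print(sum(u["weight"] * u["turnover"] for u in weighted))  # 1750.0
```

The weighted sum 1750.0 equals β̂_A·X_A = 17.5 × 100, so the weights and the model-based estimate agree.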