GaussSuppressionFromData, or one of its wrappers, is run and decimal numbers are added to the output by executing SuppressDec.
Usage
GaussSuppressDec(
  data,
  ...,
  fun = GaussSuppressionFromData,
  output = NULL,
  use_freqVar = NA,
  digits = 9,
  nRep = NULL,
  rmse = pi/3,
  sparseLimit = 500,
  rndSeed = 123,
  runIpf = FALSE,
  eps = 0.01,
  iter = 100,
  mismatchWarning = TRUE,
  whenDuplicatedInner = NULL,
  whenMixedDuplicatedInner = warning
)
Arguments
- data
Input data as a data frame.
- ...
Further parameters to GaussSuppressionFromData.
- fun
A function: GaussSuppressionFromData or one of its wrappers, such as SuppressSmallCounts and SuppressDominantCells.
- output
NULL (default), "publish", "inner", "publish_inner", or "publish_inner_x" (x also).
- use_freqVar
Logical (TRUE/FALSE) with a default value of NA. Determines whether the variable freqVar is used as the basis for generating decimal numbers. If NA, the parameter is set to TRUE, except in the following cases, where it is set to FALSE:
  - If freqVar is not available.
  - If runIpf is FALSE and fun is one of the functions SuppressFewContributors or SuppressDominantCells.
When use_freqVar is FALSE, only zeros are used instead. This approach is more robust in practice, as decimal numbers can then be stored more accurately. The default value is chosen to ensure compatibility with existing code and to allow for the use of freqVar when dealing with frequency tables, which may be useful.
- digits
Parameter to RoundWhole. Values close to whole numbers will be rounded.
- nRep
NULL or an integer. When >1, several decimal numbers will be generated.
- rmse
Desired root mean square error of the decimal numbers, that is, their variability around the inner frequencies expected according to the linear model. The expected frequencies are calculated from the non-suppressed publishable frequencies.
- sparseLimit
Limit for the number of rows of a reduced x-matrix within the algorithm. When exceeded, a new sparse algorithm is used.
- rndSeed
If non-NULL, a random generator seed to be used locally within the function without affecting the random value stream in R.
- runIpf
When TRUE, additional frequencies are generated by iterative proportional fitting using Mipf.
- eps
Parameter to Mipf.
- iter
Parameter to Mipf.
- mismatchWarning
Whether to produce the warning "Mismatch between whole numbers and suppression", when relevant. When nRep>1, all replicates must satisfy the whole number requirement for non-suppressed cells. When mismatchWarning is integer (>0), this will be used as parameter digits to RoundWhole when doing mismatch checking (can be quite low when nRep>1).
- whenDuplicatedInner
Function to be called when the default output is used and cells marked as inner correspond to several input cells (aggregated) because they also correspond to published cells.
- whenMixedDuplicatedInner
Function to be called in the case above when some inner cells correspond to published cells (aggregated) and some do not (not aggregated).
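The rndSeed description says the seed is used locally, without affecting R's global random value stream. A minimal base R sketch of that idea follows. The helper name with_local_seed is hypothetical and the code is not taken from the package source; it only illustrates one common way to save and restore .Random.seed around a seeded computation.

```r
# Hypothetical helper (not part of the package): evaluate `expr` under a
# local seed, then restore the caller's random number state on exit.
with_local_seed <- function(seed, expr) {
  if (is.null(seed)) {
    return(expr)                      # NULL seed: use the ambient stream
  }
  had_seed <- exists(".Random.seed", envir = globalenv(), inherits = FALSE)
  if (had_seed) {
    old_seed <- get(".Random.seed", envir = globalenv())
  }
  on.exit({
    if (had_seed) {
      assign(".Random.seed", old_seed, envir = globalenv())
    } else {
      rm(list = ".Random.seed", envir = globalenv())
    }
  }, add = TRUE)
  set.seed(seed)
  force(expr)                         # promise is evaluated here, after set.seed()
}

set.seed(1)
u1 <- runif(1)
d1 <- with_local_seed(123, rnorm(3))  # reproducible draw under the local seed
u2 <- runif(1)                        # continues the stream started by set.seed(1)
```

Under this sketch, u2 equals the second value that runif would have produced after set.seed(1) had the seeded call not been made, and repeating the seeded call returns the same d1.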
Value
A data frame where inner cells and cells to be published are combined or output according to parameter output.
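As a small illustration of the combined output, the sketch below rebuilds three rows of the first result shown under Examples (the rows with geo equal to "Iceland") and separates them using the isPublish and isInner columns. The object names are made up for this illustration, and whether the two subsets match exactly what output = "publish" and output = "inner" return is an assumption, not something stated here.

```r
# Three rows copied from the first example output (geo == "Iceland").
a_iceland <- data.frame(
  age        = c("Total", "old", "young"),
  geo        = "Iceland",
  freq       = c(13, 10, 3),
  suppressed = c(FALSE, TRUE, TRUE),
  freqDec    = c(13, 10.226401, 2.773599),
  isPublish  = c(TRUE, TRUE, TRUE),
  isInner    = c(FALSE, TRUE, TRUE)
)

# A cell can be both publishable and inner, so the two subsets may overlap.
publish_part <- a_iceland[a_iceland$isPublish, ]  # cells to be published
inner_part   <- a_iceland[a_iceland$isInner, ]    # inner cells
```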
Examples
a <- GaussSuppressDec(data = SSBtoolsData("example1"),
                      fun = SuppressSmallCounts,
                      dimVar = c("age", "geo"),
                      preAggregate = TRUE,
                      freqVar = "freq", maxN = 3)
#> [preAggregate 18*5->6*3]
#> [extend0 6*3->6*3]
#> GaussSuppression_anySum: ..........
a
#> age geo freq primary suppressed freqDec isPublish isInner
#> 1 Total Total 59 FALSE FALSE 59.000000 TRUE FALSE
#> 2 Total Iceland 13 FALSE FALSE 13.000000 TRUE FALSE
#> 3 Total Portugal 12 FALSE FALSE 12.000000 TRUE FALSE
#> 4 Total Spain 34 FALSE FALSE 34.000000 TRUE FALSE
#> 5 old Total 38 FALSE FALSE 38.000000 TRUE FALSE
#> 6 old Iceland 10 FALSE TRUE 10.226401 TRUE TRUE
#> 7 old Portugal 11 FALSE TRUE 10.773599 TRUE TRUE
#> 8 old Spain 17 FALSE FALSE 17.000000 TRUE TRUE
#> 9 young Total 21 FALSE FALSE 21.000000 TRUE FALSE
#> 10 young Iceland 3 TRUE TRUE 2.773599 TRUE TRUE
#> 11 young Portugal 1 TRUE TRUE 1.226401 TRUE TRUE
#> 12 young Spain 17 FALSE FALSE 17.000000 TRUE TRUE
b <- GaussSuppressDec(data = SSBtoolsData("magnitude1"),
                      fun = SuppressDominantCells,
                      numVar = "value",
                      formula = ~sector2 * geo + sector4 * eu,
                      contributorVar = "company", k = c(80, 99))
#> [preAggregate 20*6->20*7]
#> [extraAggregate 20*7->10*7] Checking .....
#> GaussSuppression_numttHTT: .........:::::
b
#> geo sector4 freq value primary suppressed freqDec isPublish
#> 1 Total Total 20 462.3 FALSE FALSE 0.0000000 TRUE
#> 2 Total private 16 429.5 FALSE FALSE 0.0000000 TRUE
#> 3 Total public 4 32.8 FALSE FALSE 0.0000000 TRUE
#> 4 Iceland Total 4 37.1 FALSE FALSE 0.0000000 TRUE
#> 5 Portugal Total 8 162.5 FALSE FALSE 0.0000000 TRUE
#> 6 Spain Total 8 262.7 FALSE FALSE 0.0000000 TRUE
#> 7 Total Agriculture 4 240.2 TRUE TRUE -0.2855034 TRUE
#> 8 Total Entertainment 6 131.5 FALSE FALSE 0.0000000 TRUE
#> 9 Total Governmental 4 32.8 FALSE FALSE 0.0000000 TRUE
#> 10 Total Industry 6 57.8 FALSE TRUE 0.2855034 TRUE
#> 11 EU Total 16 425.2 FALSE FALSE 0.0000000 TRUE
#> 12 nonEU Total 4 37.1 FALSE FALSE 0.0000000 TRUE
#> 13 Iceland private 4 37.1 FALSE FALSE 0.0000000 TRUE
#> 14 Portugal private 6 138.9 FALSE TRUE 1.0581688 TRUE
#> 15 Spain private 6 253.5 FALSE TRUE -1.0581688 TRUE
#> 16 Portugal public 2 23.6 TRUE TRUE -1.0581688 TRUE
#> 17 Spain public 2 9.2 TRUE TRUE 1.0581688 TRUE
#> 18 EU Agriculture 4 240.2 TRUE TRUE -0.2855034 TRUE
#> 19 EU Entertainment 5 114.7 FALSE TRUE 0.5859539 TRUE
#> 20 nonEU Entertainment 1 16.8 TRUE TRUE -0.5859539 TRUE
#> 21 EU Governmental 4 32.8 FALSE FALSE 0.0000000 TRUE
#> 22 EU Industry 3 37.5 FALSE TRUE -0.3004505 TRUE
#> 23 nonEU Industry 3 20.3 FALSE TRUE 0.5859539 TRUE
#> 24 Portugal Agriculture 2 100.4 NA NA -0.5054764 FALSE
#> 25 Spain Agriculture 2 139.8 NA NA 0.2199730 FALSE
#> 26 Portugal Entertainment 2 9.4 NA NA 0.9375680 FALSE
#> 27 Spain Entertainment 3 105.3 NA NA -0.3516141 FALSE
#> 28 Portugal Governmental 2 23.6 NA NA -1.0581688 FALSE
#> 29 Spain Governmental 2 9.2 NA NA 1.0581688 FALSE
#> 30 Portugal Industry 2 29.1 NA NA 0.6260773 FALSE
#> 31 Spain Industry 1 8.4 NA NA -0.9265277 FALSE
#> 32 Iceland Entertainment 1 16.8 NA NA -0.5859539 FALSE
#> 33 Iceland Industry 3 20.3 NA NA 0.5859539 FALSE
#> isInner sector2 eu nUnique company
#> 1 FALSE <NA> <NA> NA <NA>
#> 2 FALSE <NA> <NA> NA <NA>
#> 3 FALSE <NA> <NA> NA <NA>
#> 4 FALSE <NA> <NA> NA <NA>
#> 5 FALSE <NA> <NA> NA <NA>
#> 6 FALSE <NA> <NA> NA <NA>
#> 7 FALSE <NA> <NA> NA <NA>
#> 8 FALSE <NA> <NA> NA <NA>
#> 9 FALSE <NA> <NA> NA <NA>
#> 10 FALSE <NA> <NA> NA <NA>
#> 11 FALSE <NA> <NA> NA <NA>
#> 12 FALSE <NA> <NA> NA <NA>
#> 13 FALSE <NA> <NA> NA <NA>
#> 14 FALSE <NA> <NA> NA <NA>
#> 15 FALSE <NA> <NA> NA <NA>
#> 16 FALSE <NA> <NA> NA <NA>
#> 17 FALSE <NA> <NA> NA <NA>
#> 18 FALSE <NA> <NA> NA <NA>
#> 19 FALSE <NA> <NA> NA <NA>
#> 20 FALSE <NA> <NA> NA <NA>
#> 21 FALSE <NA> <NA> NA <NA>
#> 22 FALSE <NA> <NA> NA <NA>
#> 23 FALSE <NA> <NA> NA <NA>
#> 24 TRUE private EU 2 <NA>
#> 25 TRUE private EU 2 <NA>
#> 26 TRUE private EU 2 <NA>
#> 27 TRUE private EU 3 <NA>
#> 28 TRUE public EU 2 <NA>
#> 29 TRUE public EU 2 <NA>
#> 30 TRUE private EU 2 <NA>
#> 31 TRUE private EU 1 C
#> 32 TRUE private nonEU 1 B
#> 33 TRUE private nonEU 3 <NA>
# FormulaSelection() works on this output as well
FormulaSelection(b, ~sector2 * geo)
#> geo sector4 freq value primary suppressed freqDec isPublish isInner
#> 1 Total Total 20 462.3 FALSE FALSE 0.000000 TRUE FALSE
#> 2 Total private 16 429.5 FALSE FALSE 0.000000 TRUE FALSE
#> 3 Total public 4 32.8 FALSE FALSE 0.000000 TRUE FALSE
#> 4 Iceland Total 4 37.1 FALSE FALSE 0.000000 TRUE FALSE
#> 5 Portugal Total 8 162.5 FALSE FALSE 0.000000 TRUE FALSE
#> 6 Spain Total 8 262.7 FALSE FALSE 0.000000 TRUE FALSE
#> 13 Iceland private 4 37.1 FALSE FALSE 0.000000 TRUE FALSE
#> 14 Portugal private 6 138.9 FALSE TRUE 1.058169 TRUE FALSE
#> 15 Spain private 6 253.5 FALSE TRUE -1.058169 TRUE FALSE
#> 16 Portugal public 2 23.6 TRUE TRUE -1.058169 TRUE FALSE
#> 17 Spain public 2 9.2 TRUE TRUE 1.058169 TRUE FALSE
#> sector2 eu nUnique company
#> 1 <NA> <NA> NA <NA>
#> 2 <NA> <NA> NA <NA>
#> 3 <NA> <NA> NA <NA>
#> 4 <NA> <NA> NA <NA>
#> 5 <NA> <NA> NA <NA>
#> 6 <NA> <NA> NA <NA>
#> 13 <NA> <NA> NA <NA>
#> 14 <NA> <NA> NA <NA>
#> 15 <NA> <NA> NA <NA>
#> 16 <NA> <NA> NA <NA>
#> 17 <NA> <NA> NA <NA>
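The printed output of b can be used to check, in base R, how the decimal numbers behave when only zeros are used as the basis (the case for SuppressDominantCells with runIpf = FALSE, as described under use_freqVar). The sketch below copies the ten inner rows (24 to 33) from the printout and sums freqDec by sector4 and by geo. The data frame is retyped by hand from the rounded output, so agreement is only expected up to about 1e-6.

```r
# Inner cells of `b` (rows 24-33), retyped from the printed output above.
inner_b <- data.frame(
  geo     = c("Portugal", "Spain", "Portugal", "Spain", "Portugal", "Spain",
              "Portugal", "Spain", "Iceland", "Iceland"),
  sector4 = c("Agriculture", "Agriculture", "Entertainment", "Entertainment",
              "Governmental", "Governmental", "Industry", "Industry",
              "Entertainment", "Industry"),
  freqDec = c(-0.5054764, 0.2199730, 0.9375680, -0.3516141, -1.0581688,
              1.0581688, 0.6260773, -0.9265277, -0.5859539, 0.5859539)
)

# Aggregate the inner decimal numbers to two sets of published margins.
by_sector <- tapply(inner_b$freqDec, inner_b$sector4, sum)
by_geo    <- tapply(inner_b$freqDec, inner_b$geo, sum)

round(by_sector, 6)  # compare with freqDec in rows 7-10 of `b`
round(by_geo, 6)     # compare with freqDec in rows 4-6 of `b`
```

The sector sums reproduce the published freqDec values: nonzero for the suppressed cells (Agriculture, Industry) and zero for the non-suppressed ones (Entertainment, Governmental). The geo totals, none of which is suppressed, all come out as zero up to rounding.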