Provides alternatives to global protection for linked tables through methods that may reduce the computational burden.
Usage
SuppressLinkedTables(
data = NULL,
fun,
...,
withinArg = NULL,
linkedGauss = "consistent",
recordAware = TRUE,
iterBackTracking = Inf,
whenEmptyUnsuppressed = NULL,
lpPackage = NULL
)
Arguments
- data
The data argument to fun. When NULL, data must be included in withinArg.
- fun
A function: GaussSuppressionFromData or one of its wrappers, such as SuppressSmallCounts and SuppressDominantCells.
- ...
Arguments to fun that are kept constant.
- withinArg
A list of named lists. Arguments to fun that are not kept constant. If withinArg is named, the names will be used as names in the output list.
- linkedGauss
Specifies the strategy for protecting linked tables. Possible values are:
  - "consistent" (default): All linked tables are protected by a single call to GaussSuppression(). The algorithm internally constructs a block diagonal model matrix and handles common cells consistently across tables.
  - "local": Each table is protected independently by a separate call to GaussSuppression().
  - "back-tracking": Iterative approach where each table is protected via GaussSuppression(), and primary suppressions are adjusted based on secondary suppressions from other tables across iterations.
  - "local-bdiag": Produces the same result as "local", but uses a single call to GaussSuppression() with a block diagonal matrix. It does not apply the linked-table methodology.
- recordAware
If TRUE (default), the suppression procedure will ensure consistency across cells that aggregate the same underlying records, even when their variable combinations differ. When TRUE, data cannot be included in withinArg.
- iterBackTracking
Maximum number of back-tracking iterations.
- whenEmptyUnsuppressed
Parameter to GaussSuppression. This controls the helpful message "Cells with empty input will never be secondary suppressed. Extend input data with zeros?" Here, the default is set to NULL (no message), since preprocessing of the model matrix may invalidate the assumptions behind this message.
- lpPackage
Currently ignored. If specified, a warning will be issued.
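As a small illustration of the withinArg naming rule described above, the following sketch repeats the first call from the Examples section with names added to withinArg. It assumes that the GaussSuppression and SSBtools packages are installed; the names table_1, table_2 and table_3 are arbitrary labels chosen here for illustration.

```r
library(GaussSuppression)

# Same call as the first example below, except that the elements of
# withinArg are named (the names are illustrative, not required).
out <- SuppressLinkedTables(
  data = SSBtools::SSBtoolsData("magnitude1"),
  fun = SuppressDominantCells,
  withinArg = list(
    table_1 = list(formula = ~(geo + eu) * sector2 + sector4 - sector4),
    table_2 = list(formula = ~eu:sector4 - 1 + geo - geo),
    table_3 = list(formula = ~geo + eu + sector4 - 1)),
  dominanceVar = "value",
  pPercent = 10,
  contributorVar = "company",
  linkedGauss = "consistent")

# According to the argument description, the output list carries the names
names(out)
```

Changing linkedGauss to "local", "back-tracking" or "local-bdiag" in the same call should be enough to switch between the documented strategies.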
Details
The reason for introducing the new method "consistent", which has not yet been extensively tested in practice, is to provide something that works better than "back-tracking", while still offering equally strong protection.
Note that for singleton methods of the elimination type (see SSBtools::NumSingleton()), "back-tracking" may lead to the creation of a large number of redundant secondary cells. This is because, during the method's iterations, all secondary cells are eventually treated as primary. As a result, protection is applied to prevent a singleton contributor from inferring a secondary cell that was only included to protect that same contributor.
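The iterative idea behind "back-tracking", and the role of a cap such as iterBackTracking, can be pictured with a deliberately simplified toy in base R. This is not the package's algorithm: protect_table() and back_track() are hypothetical helpers, and the per-table rule (if exactly one cell in a table is suppressed, also suppress the smallest remaining cell) is only a stand-in for GaussSuppression(). What the sketch does share with the description above is that suppressions found in one table are carried into the other tables as if they were primary.

```r
# Toy stand-in for a per-table suppression routine (NOT GaussSuppression()):
# if exactly one cell is suppressed, also suppress the smallest open cell.
protect_table <- function(value, suppressed) {
  if (sum(suppressed) == 1) {
    open <- which(!suppressed)
    suppressed[open[which.min(value[open])]] <- TRUE
  }
  suppressed
}

# tables:  list of named numeric vectors (cell id -> value)
# primary: character vector of primary suppressed cell ids
back_track <- function(tables, primary, max_iter = Inf) {
  suppressed <- primary
  iter <- 0
  repeat {
    iter <- iter + 1
    before <- suppressed
    for (tab in tables) {
      # Suppressions from all tables are treated as given in this table
      flag <- names(tab) %in% suppressed
      flag <- protect_table(tab, flag)
      suppressed <- union(suppressed, names(tab)[flag])
    }
    # Stop when nothing changed or the iteration cap is reached
    if (setequal(before, suppressed) || iter >= max_iter) break
  }
  list(suppressed = suppressed, iterations = iter)
}

# Three linked toy tables: cell "b" is shared by t1 and t2, "e" by t2 and t3
tables <- list(
  t1 = c(a = 5, b = 2, c = 9),
  t2 = c(b = 2, d = 7, e = 1),
  t3 = c(e = 1, f = 3, g = 4)
)

res <- back_track(tables, primary = "g")
res$suppressed   # "g" "e" "b" "a"
res$iterations   # 4

res_capped <- back_track(tables, primary = "g", max_iter = 2)
res_capped$suppressed   # "g" "e" "b"
```

In this toy, the suppression of "g" in t3 travels back through t2 to t1 over several passes, and capping the number of passes stops before t1 has been revisited.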
Note that the frequency singleton methods "subSpace", "anySum0", and "anySumNOTprimary" are currently not implemented and will result in an error. As a result, the singletonZeros parameter in the SuppressDominantCells() function cannot be set to TRUE, and the SuppressKDisclosure() function is not available for use.
Also note that automatic forcing of "anySumNOTprimary" is disabled. That is, SSBtools::GaussSuppression() is called with auto_anySumNOTprimary = FALSE. See the parameter documentation for an explanation of why FALSE is required.
The combination of intervals with the various linked table strategies is not yet implemented, so the lpPackage parameter is currently ignored.
Note
Note on differences between SuppressLinkedTables() and alternative approaches. By alternatives, we refer to using the linkedGauss parameter via GaussSuppressionFromData(), its wrappers, or through tables_by_formulas(), as shown in the examples below.
- Alternatives can be used when only the formula parameter varies between the linked tables.
- SuppressLinkedTables() creates several smaller model matrices, which may be combined into a single block-diagonal matrix. A large overall matrix is never created.
- With the alternatives, a large overall matrix is created first. Smaller matrices are then derived from it. If the size of the full matrix is a bottleneck, SuppressLinkedTables() is the better choice.
- The "global" method is available with the alternatives, but not with SuppressLinkedTables().
- Due to differences in candidate ordering, the two methods may not always produce identical results. With the alternatives, candidate order is constructed globally across all cells (as with the global method). In contrast, SuppressLinkedTables() uses a locally determined candidate order within each table. The ordering across tables is coordinated to ensure the method works, but it is not based on a strictly defined global order. This may lead to some differences.
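To make the idea of combining several smaller model matrices into one block-diagonal matrix more concrete, here is a toy sketch in base R. It is only an illustration under stated assumptions: the data frame d is made up, block_diag() is a hypothetical helper written for this sketch, and stats::model.matrix() stands in for the model matrix construction that the package actually uses.

```r
# Made-up toy data: each row is an inner cell
d <- data.frame(
  geo    = c("Iceland", "Portugal", "Spain", "Spain"),
  sector = c("private", "private", "private", "public")
)

# One small dummy matrix per table
# (rows = inner cells, columns = cells to be published)
x_geo    <- model.matrix(~ geo - 1, d)     # 4 x 3
x_sector <- model.matrix(~ sector - 1, d)  # 4 x 2

# Hypothetical helper: place the matrices along the diagonal, zeros elsewhere
block_diag <- function(...) {
  mats <- list(...)
  out <- matrix(0, sum(sapply(mats, nrow)), sum(sapply(mats, ncol)))
  r0 <- 0
  c0 <- 0
  for (m in mats) {
    out[r0 + seq_len(nrow(m)), c0 + seq_len(ncol(m))] <- m
    r0 <- r0 + nrow(m)
    c0 <- c0 + ncol(m)
  }
  out
}

x_block <- block_diag(x_geo, x_sector)
dim(x_block)  # 8 5
```

Each table keeps its own rows and columns in x_block, so the tables only interact through whatever extra handling of common cells is applied on top of this structure.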
Examples
### The first example can be performed in three ways
### Alternatives are possible since only the formula parameter varies between the linked tables
# With the tricks "sector4 - sector4" and "geo - geo" to ensure same names in output
a <- SuppressLinkedTables(data = SSBtoolsData("magnitude1"),
                          fun = SuppressDominantCells,
                          withinArg = list(list(formula = ~(geo + eu) * sector2 + sector4 - sector4),
                                           list(formula = ~eu:sector4 - 1 + geo - geo),
                                           list(formula = ~geo + eu + sector4 - 1)),
                          dominanceVar = "value",
                          pPercent = 10,
                          contributorVar = "company",
                          linkedGauss = "consistent")
#> [preAggregate 20*13->20*14]
#> [extraAggregate 20*14->10*14] Checking .....
#> [preAggregate 20*13->20*13]
#> [extraAggregate 20*13->10*13] Checking .....
#> [preAggregate 20*13->20*13]
#> [extraAggregate 20*13->10*13] Checking .....
#>
#> ====== Linked GaussSuppression by "consistent" algorithm:
#>
#> GaussSuppression_numttHTT: ................
print(a)
#> [[1]]
#> geo sector4 freq value primary suppressed
#> 1 Total Total 20 462.3 FALSE FALSE
#> 2 Iceland Total 4 37.1 TRUE TRUE
#> 3 Portugal Total 8 162.5 TRUE TRUE
#> 4 Spain Total 8 262.7 FALSE FALSE
#> 5 EU Total 16 425.2 FALSE TRUE
#> 6 nonEU Total 4 37.1 TRUE TRUE
#> 7 Total private 16 429.5 FALSE FALSE
#> 8 Total public 4 32.8 FALSE FALSE
#> 9 Iceland private 4 37.1 TRUE TRUE
#> 10 Portugal private 6 138.9 TRUE TRUE
#> 11 Portugal public 2 23.6 TRUE TRUE
#> 12 Spain private 6 253.5 FALSE TRUE
#> 13 Spain public 2 9.2 TRUE TRUE
#> 14 EU private 12 392.4 FALSE TRUE
#> 15 EU public 4 32.8 FALSE FALSE
#> 16 nonEU private 4 37.1 TRUE TRUE
#>
#> [[2]]
#> sector4 geo freq value primary suppressed
#> 1 Agriculture EU 4 240.2 TRUE TRUE
#> 2 Entertainment EU 5 114.7 FALSE FALSE
#> 3 Governmental EU 4 32.8 FALSE FALSE
#> 4 Industry EU 3 37.5 FALSE FALSE
#> 5 Entertainment nonEU 1 16.8 TRUE TRUE
#> 6 Industry nonEU 3 20.3 FALSE FALSE
#>
#> [[3]]
#> geo sector4 freq value primary suppressed
#> 1 Iceland Total 4 37.1 TRUE TRUE
#> 2 Portugal Total 8 162.5 TRUE TRUE
#> 3 Spain Total 8 262.7 FALSE FALSE
#> 4 EU Total 16 425.2 FALSE TRUE
#> 5 nonEU Total 4 37.1 TRUE TRUE
#> 6 Total Agriculture 4 240.2 TRUE TRUE
#> 7 Total Entertainment 6 131.5 FALSE FALSE
#> 8 Total Governmental 4 32.8 FALSE FALSE
#> 9 Total Industry 6 57.8 FALSE FALSE
#>
# Alternatively, SuppressDominantCells() can be run directly using the linkedGauss parameter
a1 <- SuppressDominantCells(SSBtoolsData("magnitude1"),
                            formula = list(table_1 = ~(geo + eu) * sector2,
                                           table_2 = ~eu:sector4 - 1,
                                           table_3 = ~(geo + eu) + sector4 - 1),
                            dominanceVar = "value",
                            pPercent = 10,
                            contributorVar = "company",
                            linkedGauss = "consistent")
#> [preAggregate 20*6->20*7]
#> [extraAggregate 20*7->10*7] Checking .....
#>
#> ====== Linked GaussSuppression by "consistent" algorithm:
#>
#> GaussSuppression_numttHTT: ................
print(a1)
#> geo sector4 freq value primary suppressed
#> 1 Total Total 20 462.3 FALSE FALSE
#> 2 Iceland Total 4 37.1 TRUE TRUE
#> 3 Portugal Total 8 162.5 TRUE TRUE
#> 4 Spain Total 8 262.7 FALSE FALSE
#> 5 EU Total 16 425.2 FALSE TRUE
#> 6 nonEU Total 4 37.1 TRUE TRUE
#> 7 Total private 16 429.5 FALSE FALSE
#> 8 Total public 4 32.8 FALSE FALSE
#> 9 Total Agriculture 4 240.2 TRUE TRUE
#> 10 Total Entertainment 6 131.5 FALSE FALSE
#> 11 Total Governmental 4 32.8 FALSE FALSE
#> 12 Total Industry 6 57.8 FALSE FALSE
#> 13 Iceland private 4 37.1 TRUE TRUE
#> 14 Portugal private 6 138.9 TRUE TRUE
#> 15 Portugal public 2 23.6 TRUE TRUE
#> 16 Spain private 6 253.5 FALSE TRUE
#> 17 Spain public 2 9.2 TRUE TRUE
#> 18 EU private 12 392.4 FALSE TRUE
#> 19 EU public 4 32.8 FALSE FALSE
#> 20 nonEU private 4 37.1 TRUE TRUE
#> 21 EU Agriculture 4 240.2 TRUE TRUE
#> 22 EU Entertainment 5 114.7 FALSE FALSE
#> 23 EU Governmental 4 32.8 FALSE FALSE
#> 24 EU Industry 3 37.5 FALSE FALSE
#> 25 nonEU Entertainment 1 16.8 TRUE TRUE
#> 26 nonEU Industry 3 20.3 FALSE FALSE
# In fact, tables_by_formulas() is also a possibility
a2 <- tables_by_formulas(SSBtoolsData("magnitude1"),
                         table_fun = SuppressDominantCells,
                         table_formulas = list(table_1 = ~region * sector2,
                                               table_2 = ~region1:sector4 - 1,
                                               table_3 = ~region + sector4 - 1),
                         substitute_vars = list(region = c("geo", "eu"), region1 = "eu"),
                         collapse_vars = list(sector = c("sector2", "sector4")),
                         dominanceVar = "value",
                         pPercent = 10,
                         contributorVar = "company",
                         linkedGauss = "consistent")
#> [preAggregate 20*6->20*7]
#> [extraAggregate 20*7->10*7] Checking .....
#>
#> ====== Linked GaussSuppression by "consistent" algorithm:
#>
#> GaussSuppression_numttHTT: ................
print(a2)
#> region sector freq value primary suppressed table_1 table_2 table_3
#> 1 Total Total 20 462.3 FALSE FALSE TRUE FALSE FALSE
#> 2 Iceland Total 4 37.1 TRUE TRUE TRUE FALSE TRUE
#> 3 Portugal Total 8 162.5 TRUE TRUE TRUE FALSE TRUE
#> 4 Spain Total 8 262.7 FALSE FALSE TRUE FALSE TRUE
#> 5 EU Total 16 425.2 FALSE TRUE TRUE FALSE TRUE
#> 6 nonEU Total 4 37.1 TRUE TRUE TRUE FALSE TRUE
#> 7 Total private 16 429.5 FALSE FALSE TRUE FALSE FALSE
#> 8 Total public 4 32.8 FALSE FALSE TRUE FALSE FALSE
#> 9 Total Agriculture 4 240.2 TRUE TRUE FALSE FALSE TRUE
#> 10 Total Entertainment 6 131.5 FALSE FALSE FALSE FALSE TRUE
#> 11 Total Governmental 4 32.8 FALSE FALSE FALSE FALSE TRUE
#> 12 Total Industry 6 57.8 FALSE FALSE FALSE FALSE TRUE
#> 13 Iceland private 4 37.1 TRUE TRUE TRUE FALSE FALSE
#> 14 Portugal private 6 138.9 TRUE TRUE TRUE FALSE FALSE
#> 15 Portugal public 2 23.6 TRUE TRUE TRUE FALSE FALSE
#> 16 Spain private 6 253.5 FALSE TRUE TRUE FALSE FALSE
#> 17 Spain public 2 9.2 TRUE TRUE TRUE FALSE FALSE
#> 18 EU private 12 392.4 FALSE TRUE TRUE FALSE FALSE
#> 19 EU public 4 32.8 FALSE FALSE TRUE FALSE FALSE
#> 20 nonEU private 4 37.1 TRUE TRUE TRUE FALSE FALSE
#> 21 EU Agriculture 4 240.2 TRUE TRUE FALSE TRUE FALSE
#> 22 EU Entertainment 5 114.7 FALSE FALSE FALSE TRUE FALSE
#> 23 EU Governmental 4 32.8 FALSE FALSE FALSE TRUE FALSE
#> 24 EU Industry 3 37.5 FALSE FALSE FALSE TRUE FALSE
#> 25 nonEU Entertainment 1 16.8 TRUE TRUE FALSE TRUE FALSE
#> 26 nonEU Industry 3 20.3 FALSE FALSE FALSE TRUE FALSE
#### The second example cannot be handled using the alternative methods.
#### This is similar to the (old) LazyLinkedTables() example.
z1 <- SSBtoolsData("z1")
z2 <- SSBtoolsData("z2")
z2b <- z2[3:5] # As in ChainedSuppression example
names(z2b)[1] <- "region"
# As 'f' and 'e' in ChainedSuppression example.
# 'A' 'annet'/'arbeid' suppressed in b[[1]], since suppressed in b[[3]].
b <- SuppressLinkedTables(fun = SuppressSmallCounts,
                          linkedGauss = "consistent",
                          recordAware = FALSE,
                          withinArg = list(
                            list(data = z1, dimVar = 1:2, freqVar = 3, maxN = 5),
                            list(data = z2b, dimVar = 1:2, freqVar = 3, maxN = 5),
                            list(data = z2, dimVar = 1:4, freqVar = 5, maxN = 1)))
#> [extend0 32*3->32*3]
#> [extend0 44*3->44*3]
#> [extend0 44*5->44*5]
#>
#> ====== Linked GaussSuppression by "consistent" algorithm:
#>
#> GaussSuppression_anySum: ............................
print(b)
#> [[1]]
#> region hovedint ant primary suppressed
#> 1 Total Total 596 FALSE FALSE
#> 2 Total annet 72 FALSE FALSE
#> 3 Total arbeid 52 FALSE FALSE
#> 4 Total soshjelp 283 FALSE FALSE
#> 5 Total trygd 189 FALSE FALSE
#> 6 A Total 113 FALSE FALSE
#> 7 A annet 11 FALSE TRUE
#> 8 A arbeid 11 FALSE TRUE
#> 9 A soshjelp 55 FALSE FALSE
#> 10 A trygd 36 FALSE FALSE
#> 11 B Total 55 FALSE FALSE
#> 12 B annet 7 FALSE TRUE
#> 13 B arbeid 1 TRUE TRUE
#> 14 B soshjelp 29 FALSE FALSE
#> 15 B trygd 18 FALSE FALSE
#> 16 C Total 73 FALSE FALSE
#> 17 C annet 5 TRUE TRUE
#> 18 C arbeid 8 FALSE TRUE
#> 19 C soshjelp 35 FALSE FALSE
#> 20 C trygd 25 FALSE FALSE
#> 21 D Total 45 FALSE FALSE
#> 22 D annet 13 FALSE TRUE
#> 23 D arbeid 2 TRUE TRUE
#> 24 D soshjelp 17 FALSE FALSE
#> 25 D trygd 13 FALSE FALSE
#> 26 E Total 138 FALSE FALSE
#> 27 E annet 9 FALSE FALSE
#> 28 E arbeid 14 FALSE FALSE
#> 29 E soshjelp 63 FALSE FALSE
#> 30 E trygd 52 FALSE FALSE
#> 31 F Total 67 FALSE FALSE
#> 32 F annet 12 FALSE FALSE
#> 33 F arbeid 9 FALSE FALSE
#> 34 F soshjelp 24 FALSE FALSE
#> 35 F trygd 22 FALSE FALSE
#> 36 G Total 40 FALSE FALSE
#> 37 G annet 6 FALSE TRUE
#> 38 G arbeid 4 TRUE TRUE
#> 39 G soshjelp 22 FALSE FALSE
#> 40 G trygd 8 FALSE FALSE
#> 41 H Total 65 FALSE FALSE
#> 42 H annet 9 FALSE TRUE
#> 43 H arbeid 3 TRUE TRUE
#> 44 H soshjelp 38 FALSE FALSE
#> 45 H trygd 15 FALSE FALSE
#>
#> [[2]]
#> region hovedint ant primary suppressed
#> 1 Total Total 706 FALSE FALSE
#> 2 Total annet 88 FALSE FALSE
#> 3 Total arbeid 54 FALSE FALSE
#> 4 Total soshjelp 342 FALSE FALSE
#> 5 Total trygd 222 FALSE FALSE
#> 6 300 Total 596 FALSE FALSE
#> 7 300 annet 72 FALSE TRUE
#> 8 300 arbeid 52 FALSE TRUE
#> 9 300 soshjelp 283 FALSE FALSE
#> 10 300 trygd 189 FALSE FALSE
#> 11 400 Total 110 FALSE FALSE
#> 12 400 annet 16 FALSE TRUE
#> 13 400 arbeid 2 TRUE TRUE
#> 14 400 soshjelp 59 FALSE FALSE
#> 15 400 trygd 33 FALSE FALSE
#>
#> [[3]]
#> region hovedint ant primary suppressed
#> 1 1 Total 127 FALSE FALSE
#> 2 1 annet 14 FALSE FALSE
#> 3 1 arbeid 11 FALSE FALSE
#> 4 1 soshjelp 64 FALSE FALSE
#> 5 1 trygd 38 FALSE FALSE
#> 6 10 Total 96 FALSE FALSE
#> 7 10 annet 13 FALSE TRUE
#> 8 10 arbeid 2 FALSE TRUE
#> 9 10 soshjelp 50 FALSE FALSE
#> 10 10 trygd 31 FALSE FALSE
#> 11 300 Total 596 FALSE FALSE
#> 12 300 annet 72 FALSE TRUE
#> 13 300 arbeid 52 FALSE TRUE
#> 14 300 soshjelp 283 FALSE FALSE
#> 15 300 trygd 189 FALSE FALSE
#> 16 4 Total 55 FALSE FALSE
#> 17 4 annet 7 FALSE TRUE
#> 18 4 arbeid 1 TRUE TRUE
#> 19 4 soshjelp 29 FALSE FALSE
#> 20 4 trygd 18 FALSE FALSE
#> 21 400 Total 110 FALSE FALSE
#> 22 400 annet 16 FALSE TRUE
#> 23 400 arbeid 2 FALSE TRUE
#> 24 400 soshjelp 59 FALSE FALSE
#> 25 400 trygd 33 FALSE FALSE
#> 26 5 Total 118 FALSE FALSE
#> 27 5 annet 18 FALSE FALSE
#> 28 5 arbeid 10 FALSE FALSE
#> 29 5 soshjelp 52 FALSE FALSE
#> 30 5 trygd 38 FALSE FALSE
#> 31 6 Total 205 FALSE FALSE
#> 32 6 annet 21 FALSE FALSE
#> 33 6 arbeid 23 FALSE FALSE
#> 34 6 soshjelp 87 FALSE FALSE
#> 35 6 trygd 74 FALSE FALSE
#> 36 8 Total 105 FALSE FALSE
#> 37 8 annet 15 FALSE FALSE
#> 38 8 arbeid 7 FALSE FALSE
#> 39 8 soshjelp 60 FALSE FALSE
#> 40 8 trygd 23 FALSE FALSE
#> 41 Total Total 706 FALSE FALSE
#> 42 Total annet 88 FALSE FALSE
#> 43 Total arbeid 54 FALSE FALSE
#> 44 Total soshjelp 342 FALSE FALSE
#> 45 Total trygd 222 FALSE FALSE
#> 46 A Total 113 FALSE FALSE
#> 47 A annet 11 FALSE TRUE
#> 48 A arbeid 11 FALSE TRUE
#> 49 A soshjelp 55 FALSE FALSE
#> 50 A trygd 36 FALSE FALSE
#> 51 B Total 55 FALSE FALSE
#> 52 B annet 7 FALSE TRUE
#> 53 B arbeid 1 TRUE TRUE
#> 54 B soshjelp 29 FALSE FALSE
#> 55 B trygd 18 FALSE FALSE
#> 56 C Total 73 FALSE FALSE
#> 57 C annet 5 FALSE TRUE
#> 58 C arbeid 8 FALSE TRUE
#> 59 C soshjelp 35 FALSE FALSE
#> 60 C trygd 25 FALSE FALSE
#> 61 D Total 45 FALSE FALSE
#> 62 D annet 13 FALSE TRUE
#> 63 D arbeid 2 FALSE TRUE
#> 64 D soshjelp 17 FALSE FALSE
#> 65 D trygd 13 FALSE FALSE
#> 66 E Total 138 FALSE FALSE
#> 67 E annet 9 FALSE FALSE
#> 68 E arbeid 14 FALSE FALSE
#> 69 E soshjelp 63 FALSE FALSE
#> 70 E trygd 52 FALSE FALSE
#> 71 F Total 67 FALSE FALSE
#> 72 F annet 12 FALSE FALSE
#> 73 F arbeid 9 FALSE FALSE
#> 74 F soshjelp 24 FALSE FALSE
#> 75 F trygd 22 FALSE FALSE
#> 76 G Total 40 FALSE FALSE
#> 77 G annet 6 FALSE TRUE
#> 78 G arbeid 4 FALSE TRUE
#> 79 G soshjelp 22 FALSE FALSE
#> 80 G trygd 8 FALSE FALSE
#> 81 H Total 65 FALSE FALSE
#> 82 H annet 9 FALSE TRUE
#> 83 H arbeid 3 FALSE TRUE
#> 84 H soshjelp 38 FALSE FALSE
#> 85 H trygd 15 FALSE FALSE
#> 86 I Total 14 FALSE FALSE
#> 87 I annet 3 FALSE TRUE
#> 88 I arbeid 0 TRUE TRUE
#> 89 I soshjelp 9 FALSE FALSE
#> 90 I trygd 2 FALSE FALSE
#> 91 J Total 61 FALSE FALSE
#> 92 J annet 9 FALSE TRUE
#> 93 J arbeid 0 TRUE TRUE
#> 94 J soshjelp 32 FALSE FALSE
#> 95 J trygd 20 FALSE FALSE
#> 96 K Total 35 FALSE FALSE
#> 97 K annet 4 FALSE FALSE
#> 98 K arbeid 2 FALSE FALSE
#> 99 K soshjelp 18 FALSE FALSE
#> 100 K trygd 11 FALSE FALSE
#>
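One way to check that common cells received the same suppression status in different tables is to merge the outputs on the dimensional variables. The sketch below does this in base R on a handful of rows copied by hand from the printed b[[1]] and b[[3]] above, so that it runs without the package; t1 and t3 are illustrative names, not objects created by SuppressLinkedTables().

```r
# Rows copied from the printed output of b[[1]]
t1 <- data.frame(
  region     = c("A", "A", "A", "B", "C"),
  hovedint   = c("annet", "arbeid", "soshjelp", "arbeid", "annet"),
  ant        = c(11, 11, 55, 1, 5),
  primary    = c(FALSE, FALSE, FALSE, TRUE, TRUE),
  suppressed = c(TRUE, TRUE, FALSE, TRUE, TRUE)
)

# Rows copied from the printed output of b[[3]]
t3 <- data.frame(
  region     = c("A", "A", "A", "B", "C", "I"),
  hovedint   = c("annet", "arbeid", "soshjelp", "arbeid", "annet", "arbeid"),
  ant        = c(11, 11, 55, 1, 5, 0),
  primary    = c(FALSE, FALSE, FALSE, TRUE, FALSE, TRUE),
  suppressed = c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE)
)

# Keep only cells present in both tables
common <- merge(t1, t3, by = c("region", "hovedint"), suffixes = c("_1", "_3"))

# Common cells agree on the final suppression status
all(common$suppressed_1 == common$suppressed_3)  # TRUE

# Primary status may still differ, since maxN differs between the tables
common[common$primary_1 != common$primary_3, c("region", "hovedint")]
```

With the real output, the same merge could presumably be applied directly, for instance merge(b[[1]], b[[3]], by = c("region", "hovedint"), suffixes = c("_1", "_3")).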