Model matrix representing crossed hierarchies
Source: R/Hierarchies2ModelMatrix.R
Make a model matrix, x, that corresponds to data and represents all hierarchies crossed. This means that aggregates corresponding to numerical variables can be computed as t(x) %*% y, where y is a matrix with one column for each numerical variable.
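As a rough illustration of what "all hierarchies crossed" means, the sketch below builds a tiny model matrix by hand in base R, without calling Hierarchies2ModelMatrix. The toy data, the codes and the helper `cross_dummies` are hypothetical. The sketch assumes that each crossed column is the elementwise product of one column from each single-variable dummy matrix. The aggregates then follow from t(x) %*% y.

```r
# Hypothetical toy data: one numerical variable and two grouping variables
age <- c("Y15-29", "Y15-29", "Y30-64", "Y30-64")
geo <- c("Spain", "Iceland", "Spain", "Iceland")
y <- matrix(c(10, 2, 30, 4), ncol = 1, dimnames = list(NULL, "value"))

# Single-variable dummy matrices: one column per output code (total and input codes)
x_age <- cbind("Y15-64" = rep(1, length(age)),
               "Y15-29" = as.numeric(age == "Y15-29"),
               "Y30-64" = as.numeric(age == "Y30-64"))
x_geo <- cbind(Europe = rep(1, length(geo)),
               EU     = as.numeric(geo == "Spain"),    # toy hierarchy: EU = Spain only
               nonEU  = as.numeric(geo == "Iceland"))

# Hypothetical helper: cross two dummy matrices by taking the elementwise
# product of every pair of columns (the second variable varies fastest)
cross_dummies <- function(a, b) {
  pairs <- expand.grid(j = seq_len(ncol(b)), i = seq_len(ncol(a)))
  x <- a[, pairs$i, drop = FALSE] * b[, pairs$j, drop = FALSE]
  colnames(x) <- paste(colnames(a)[pairs$i], colnames(b)[pairs$j], sep = ":")
  x
}

x <- cross_dummies(x_age, x_geo)  # 4 x 9 crossed model matrix
agg <- crossprod(x, y)            # same result as t(x) %*% y
agg
```

The three Y15-64 columns aggregate to 46, 40 and 6. The column order, with the last variable varying fastest, was chosen to resemble the example output below. It is not a statement about how the package orders columns internally.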
Usage
Hierarchies2ModelMatrix(
  data,
  hierarchies,
  inputInOutput = TRUE,
  crossTable = FALSE,
  total = "Total",
  hierarchyVarNames = c(mapsFrom = "mapsFrom", mapsTo = "mapsTo", sign = "sign", level = "level"),
  unionComplement = FALSE,
  reOrder = TRUE,
  select = NULL,
  removeEmpty = FALSE,
  selectionByMultiplicationLimit = 10^7,
  makeColnames = TRUE,
  verbose = FALSE,
  ...
)
Arguments
- data
Matrix or data frame with data containing codes of the relevant variables.
- hierarchies
List of hierarchies, which can be converted by AutoHierarchies. Thus, the variables can also be coded by "rowFactor" or "", which correspond to using the categories in the data.
- inputInOutput
Logical vector (possibly recycled) for each element of hierarchies. TRUE means that codes from the input are included in the output. Values corresponding to "rowFactor" or "" are ignored. See also Note.
- crossTable
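When TRUE, a cross table is included in the output, which then becomes a list with elements modelMatrix and crossTable (see the first example).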
- total
See AutoHierarchies.
- hierarchyVarNames
Variable names in the hierarchy tables, as in HierarchyFix.
- unionComplement
Logical vector (possibly recycled) for each element of hierarchies. When TRUE, sign means union and complement instead of addition or subtraction. Values corresponding to "rowFactor" and "colFactor" are ignored.
- reOrder
When TRUE (default), output codes are ordered in a way similar to the ordering of a usual model matrix.
- select
Data frame specifying variable combinations for output or a named list specifying code selections for each variable (see details).
- removeEmpty
When TRUE and when select is not a data frame, empty columns (only zeros) are not included in the output.
- selectionByMultiplicationLimit
With non-NULL select and when the number of elements in the model matrix exceeds this limit, the computation is performed by a slower but more memory-efficient algorithm.
- makeColnames
Column names are included when TRUE (default).
- verbose
Whether to print information during calculations. FALSE is default.
- ...
Extra, unused parameters.
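The difference controlled by unionComplement can be pictured with two dummy columns. The base R sketch below reflects an assumption about the semantics, not the package's implementation. With ordinary addition, a data row that belongs to both EU and Spain gets the value 2. With a union, it stays at 1. Only the union case is sketched, and the variable names are hypothetical. The values are the country totals for Y15-64 from the examples.

```r
# Hypothetical data rows: one per country, with Y15-64 totals from the examples
geo <- c("Spain", "Portugal", "Iceland")
y <- c(561.4, 108.8, 10.6)

eu    <- as.numeric(geo %in% c("Spain", "Portugal"))  # dummy column for EU
spain <- as.numeric(geo == "Spain")                   # dummy column for Spain

# sign = 1 read as addition: rows in both sets are counted twice
added <- eu + spain
# sign = 1 read as union: a row is either in the combined set (1) or not (0)
unioned <- pmax(eu, spain)

sum(added * y)    # EUandSpain with addition
sum(unioned * y)  # EUandSpain with union
```

The two sums are 1231.6 and 670.2, matching the EUandSpain rows of t(m5) %*% ths_per and t(m6) %*% ths_per in the examples, since Spain is contained in EU.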
Details
This function makes use of AutoHierarchies and HierarchyCompute via HierarchyComputeDummy.
Since the dummy matrix is transposed in comparison to HierarchyCompute, the parameter rowSelect is renamed to select and makeRownames is renamed to makeColnames.
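The renaming follows from a plain transpose. In the base R sketch below, a hypothetical dummy matrix d has one row per output code. This is the orientation the Details attribute to HierarchyCompute. Its transpose is the model matrix orientation used here, so selecting rows of d corresponds to selecting columns of t(d). The objects and the selection are illustrative only.

```r
geo <- c("Spain", "Portugal", "Iceland")

# Hypothetical dummy matrix with output codes as rows and data rows as columns
d <- rbind(Europe = c(1, 1, 1),
           EU     = as.numeric(geo %in% c("Spain", "Portugal")),
           nonEU  = as.numeric(geo == "Iceland"))

# Model matrix orientation: data rows as rows, output codes as columns
x <- t(d)

wanted <- c("nonEU", "EU")                  # hypothetical selection of output codes
row_selected <- d[wanted, , drop = FALSE]   # rowSelect-style selection of rows
col_selected <- x[, wanted, drop = FALSE]   # select-style selection of columns

identical(t(row_selected), col_selected)
```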
The select parameter as a list can be partially specified in the sense that not all hierarchy names have to be included.
The parameter inputInOutput will only apply to hierarchies that are not in the select list (see Note).
Note
When select is a list, it is run via a special coding of the inputInOutput parameter. This parameter is converted into a list (as.list), and the select elements are inserted into this list.
This list coding of inputInOutput is also available as an option for users of the function.
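The base R sketch below gives a rough picture of the coding described in the Note. It assumes that inputInOutput is first recycled to one element per hierarchy. The hierarchy names and the selection are hypothetical, and this is not the package's internal code.

```r
hierarchyNames <- c("age", "geo")            # hypothetical names of the hierarchies
inputInOutput  <- FALSE                      # recycled to one value per hierarchy
select <- list(geo = c("nonEU", "Portugal")) # partially specified: no age element

iio <- as.list(rep_len(inputInOutput, length(hierarchyNames)))
names(iio) <- hierarchyNames
iio[names(select)] <- select  # code selections replace the logical values

str(iio)
```

Here age keeps its logical inputInOutput value, while geo now carries a code selection. This illustrates why inputInOutput only applies to hierarchies that are not named in select.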
Examples
# Create some input
z <- SSBtoolsData("sprt_emp_withEU")
ageHier <- SSBtoolsData("sprt_emp_ageHier")
geoDimList <- FindDimLists(z[, c("geo", "eu")], total = "Europe")[[1]]
# The first example has list output (crossTable = TRUE)
Hierarchies2ModelMatrix(z, list(age = ageHier, geo = geoDimList), inputInOutput = FALSE,
  crossTable = TRUE)
#> $modelMatrix
#> 18 x 3 sparse Matrix of class "dgCMatrix"
#> Y15-64:Europe Y15-64:EU Y15-64:nonEU
#> [1,] 1 1 .
#> [2,] 1 . 1
#> [3,] 1 1 .
#> [4,] 1 1 .
#> [5,] 1 . 1
#> [6,] 1 1 .
#> [7,] 1 1 .
#> [8,] 1 . 1
#> [9,] 1 1 .
#> [10,] 1 1 .
#> [11,] 1 . 1
#> [12,] 1 1 .
#> [13,] 1 1 .
#> [14,] 1 . 1
#> [15,] 1 1 .
#> [16,] 1 1 .
#> [17,] 1 . 1
#> [18,] 1 1 .
#>
#> $crossTable
#> age geo
#> 1 Y15-64 Europe
#> 2 Y15-64 EU
#> 3 Y15-64 nonEU
#>
m1 <- Hierarchies2ModelMatrix(z, list(age = ageHier, geo = geoDimList), inputInOutput = FALSE)
m2 <- Hierarchies2ModelMatrix(z, list(age = ageHier, geo = geoDimList))
m3 <- Hierarchies2ModelMatrix(z, list(age = ageHier, geo = geoDimList, year = ""),
  inputInOutput = FALSE)
m4 <- Hierarchies2ModelMatrix(z, list(age = ageHier, geo = geoDimList, year = "allYears"),
  inputInOutput = c(FALSE, FALSE, TRUE))
# Illustrate the effect of unionComplement, geoHier2 as in the examples of HierarchyCompute
geoHier2 <- rbind(data.frame(mapsFrom = c("EU", "Spain"), mapsTo = "EUandSpain", sign = 1),
  SSBtoolsData("sprt_emp_geoHier")[, -4])
m5 <- Hierarchies2ModelMatrix(z, list(age = ageHier, geo = geoHier2, year = "allYears"),
  inputInOutput = FALSE) # Spain is counted twice
m6 <- Hierarchies2ModelMatrix(z, list(age = ageHier, geo = geoHier2, year = "allYears"),
  inputInOutput = FALSE, unionComplement = TRUE)
# Compute aggregates
ths_per <- as.matrix(z[, "ths_per", drop = FALSE]) # matrix with the values to be aggregated
t(m1) %*% ths_per # crossprod(m1, ths_per) is equivalent and faster
#> 3 x 1 Matrix of class "dgeMatrix"
#> ths_per
#> Y15-64:Europe 680.8
#> Y15-64:EU 670.2
#> Y15-64:nonEU 10.6
t(m2) %*% ths_per
#> 18 x 1 Matrix of class "dgeMatrix"
#> ths_per
#> Y15-64:Europe 680.8
#> Y15-64:EU 670.2
#> Y15-64:nonEU 10.6
#> Y15-64:Iceland 10.6
#> Y15-64:Portugal 108.8
#> Y15-64:Spain 561.4
#> Y15-29:Europe 243.5
#> Y15-29:EU 237.9
#> Y15-29:nonEU 5.6
#> Y15-29:Iceland 5.6
#> Y15-29:Portugal 38.5
#> Y15-29:Spain 199.4
#> Y30-64:Europe 437.3
#> Y30-64:EU 432.3
#> Y30-64:nonEU 5.0
#> Y30-64:Iceland 5.0
#> Y30-64:Portugal 70.3
#> Y30-64:Spain 362.0
t(m3) %*% ths_per
#> 9 x 1 Matrix of class "dgeMatrix"
#> ths_per
#> Y15-64:Europe:2014 222.3
#> Y15-64:Europe:2015 225.0
#> Y15-64:Europe:2016 233.5
#> Y15-64:EU:2014 219.0
#> Y15-64:EU:2015 221.5
#> Y15-64:EU:2016 229.7
#> Y15-64:nonEU:2014 3.3
#> Y15-64:nonEU:2015 3.5
#> Y15-64:nonEU:2016 3.8
t(m4) %*% ths_per
#> 12 x 1 Matrix of class "dgeMatrix"
#> ths_per
#> Y15-64:Europe:allYears 680.8
#> Y15-64:Europe:2014 222.3
#> Y15-64:Europe:2015 225.0
#> Y15-64:Europe:2016 233.5
#> Y15-64:EU:allYears 670.2
#> Y15-64:EU:2014 219.0
#> Y15-64:EU:2015 221.5
#> Y15-64:EU:2016 229.7
#> Y15-64:nonEU:allYears 10.6
#> Y15-64:nonEU:2014 3.3
#> Y15-64:nonEU:2015 3.5
#> Y15-64:nonEU:2016 3.8
t(m5) %*% ths_per
#> 4 x 1 Matrix of class "dgeMatrix"
#> ths_per
#> Y15-64:EUandSpain:allYears 1231.6
#> Y15-64:EU:allYears 670.2
#> Y15-64:Europe:allYears 680.8
#> Y15-64:nonEU:allYears 10.6
t(m6) %*% ths_per
#> 4 x 1 Matrix of class "dgeMatrix"
#> ths_per
#> Y15-64:EUandSpain:allYears 670.2
#> Y15-64:EU:allYears 670.2
#> Y15-64:Europe:allYears 680.8
#> Y15-64:nonEU:allYears 10.6
# Example using the select parameter as a data frame
select <- data.frame(age = c("Y15-64", "Y15-29", "Y30-64"), geo = c("EU", "nonEU", "Spain"))
m2a <- Hierarchies2ModelMatrix(z, list(age = ageHier, geo = geoDimList), select = select)
# Same result by a slower alternative
m2B <- Hierarchies2ModelMatrix(z, list(age = ageHier, geo = geoDimList), crossTable = TRUE)
m2b <- m2B$modelMatrix[, Match(select, m2B$crossTable), drop = FALSE]
t(m2b) %*% ths_per
#> 3 x 1 Matrix of class "dgeMatrix"
#> ths_per
#> Y15-64:EU 670.2
#> Y15-29:nonEU 5.6
#> Y30-64:Spain 362.0
# Examples using the select parameter as a list
Hierarchies2ModelMatrix(z, list(age = ageHier, geo = geoDimList),
  inputInOutput = FALSE,
  select = list(geo = c("nonEU", "Portugal")))
#> 18 x 2 sparse Matrix of class "dgCMatrix"
#> Y15-64:nonEU Y15-64:Portugal
#> [1,] . .
#> [2,] 1 .
#> [3,] . 1
#> [4,] . .
#> [5,] 1 .
#> [6,] . 1
#> [7,] . .
#> [8,] 1 .
#> [9,] . 1
#> [10,] . .
#> [11,] 1 .
#> [12,] . 1
#> [13,] . .
#> [14,] 1 .
#> [15,] . 1
#> [16,] . .
#> [17,] 1 .
#> [18,] . 1
Hierarchies2ModelMatrix(z, list(age = ageHier, geo = geoDimList),
  select = list(geo = c("nonEU", "Portugal"), age = c("Y15-64", "Y15-29")))
#> 18 x 4 sparse Matrix of class "dgCMatrix"
#> Y15-64:nonEU Y15-64:Portugal Y15-29:nonEU Y15-29:Portugal
#> [1,] . . . .
#> [2,] 1 . 1 .
#> [3,] . 1 . 1
#> [4,] . . . .
#> [5,] 1 . . .
#> [6,] . 1 . .
#> [7,] . . . .
#> [8,] 1 . 1 .
#> [9,] . 1 . 1
#> [10,] . . . .
#> [11,] 1 . . .
#> [12,] . 1 . .
#> [13,] . . . .
#> [14,] 1 . 1 .
#> [15,] . 1 . 1
#> [16,] . . . .
#> [17,] 1 . . .
#> [18,] . 1 . .