This function computes aggregates by crossing several hierarchical specifications and factorial variables.
Usage
HierarchyCompute(
data,
hierarchies,
valueVar,
colVar = NULL,
rowSelect = NULL,
colSelect = NULL,
select = NULL,
inputInOutput = FALSE,
output = "data.frame",
autoLevel = TRUE,
unionComplement = FALSE,
constantsInOutput = NULL,
hierarchyVarNames = c(mapsFrom = "mapsFrom", mapsTo = "mapsTo", sign = "sign", level =
"level"),
selectionByMultiplicationLimit = 10^7,
colNotInDataWarning = TRUE,
useMatrixToDataFrame = TRUE,
handleDuplicated = "sum",
asInput = FALSE,
verbose = FALSE,
reOrder = FALSE,
reduceData = TRUE,
makeRownames = NULL
)
Arguments
- data
The input data frame
- hierarchies
A named list of hierarchies, with names matching variables in data. Variables can also be coded by "rowFactor" and "colFactor".
- valueVar
Name of the variable(s) to be aggregated.
- colVar
When non-NULL, the function HierarchyCompute2 is called. See its documentation for more information.
- rowSelect
Data frame specifying variable combinations for output. The colFactor variable is not included. In addition, rowSelect="removeEmpty" removes combinations corresponding to empty rows (only zeros) of dataDummyHierarchy.
- colSelect
Vector specifying categories of the colFactor variable for output.
- select
Data frame specifying variable combinations for output. The colFactor variable is included.
- inputInOutput
Logical vector (possibly recycled) for each element of hierarchies. TRUE means that codes from input are included in output. Values corresponding to "rowFactor" and "colFactor" are ignored.
- output
One of "data.frame" (default), "dummyHierarchies", "outputMatrix", "dataDummyHierarchy", "valueMatrix", "fromCrossCode", "toCrossCode", "crossCode" (as toCrossCode), "outputMatrixWithCrossCode", "matrixComponents", "dataDummyHierarchyWithCodeFrame", "dataDummyHierarchyQuick". The latter two do not require valueVar (reduceData set to FALSE).
- autoLevel
Logical vector (possibly recycled) for each element of hierarchies. When TRUE, level is computed by an automatic method as in HierarchyFix. Values corresponding to "rowFactor" and "colFactor" are ignored.
- unionComplement
Logical vector (possibly recycled) for each element of hierarchies. When TRUE, sign means union and complement instead of addition or subtraction as in DummyHierarchy. Values corresponding to "rowFactor" and "colFactor" are ignored.
- constantsInOutput
A single row data frame to be combined with the other output.
- hierarchyVarNames
Variable names in the hierarchy tables as in HierarchyFix.
- selectionByMultiplicationLimit
With non-NULL rowSelect and when the number of elements in dataDummyHierarchy exceeds this limit, the computation is performed by a slower but more memory efficient algorithm.
- colNotInDataWarning
When TRUE, a warning is produced when elements of colSelect are not in data.
- useMatrixToDataFrame
When TRUE (default) special functionality for saving time and memory is used.
- handleDuplicated
Handling of duplicated code rows in data. One of: "sum" (default), "sumByAggregate", "sumWithWarning", "stop" (error), "single" or "singleWithWarning". With no colFactor, "sum" and "sumByAggregate"/"sumWithWarning" differ (original values or aggregates in "valueMatrix"). With "single", only one of the values is used (by matrix subsetting).
- asInput
When TRUE (FALSE is default) output matrices match input data. Thus valueMatrix = Matrix(data[, valueVar], ncol=1). Only possible when no colFactor.
- verbose
Whether to print information during calculations. FALSE is default.
- reOrder
When TRUE (FALSE is default) output codes are ordered differently, more similar to a usual model matrix ordering.
- reduceData
When TRUE (default) unnecessary (for the aggregated result) rows of valueMatrix are allowed to be removed.
- makeRownames
When TRUE, dataDummyHierarchy contains rownames. By default, this is decided based on the parameter output.
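The difference between addition and union under unionComplement can be sketched in base R. This is only an illustration with hypothetical vectors (eu, spain, cellValues), not the code used by DummyHierarchy; the numbers are the 2014 values of ths_per that appear in the examples further down.

```r
# Hypothetical dummy rows over six input cells, ordered as
# Y15-29: Iceland, Portugal, Spain; Y30-64: Iceland, Portugal, Spain
eu    <- c(0, 1, 1, 0, 1, 1)
spain <- c(0, 0, 1, 0, 0, 1)

# 2014 values of ths_per for the same six cells (copied from the examples)
cellValues <- c(1.8, 11.6, 66.9, 1.5, 20.2, 120.3)

# sign = 1 read as addition: Spain ends up with weight 2
addRow <- eu + spain

# sign = 1 read as union: a cell is either included or not
unionRow <- as.numeric(eu | spain)

sumAdd   <- sum(addRow * cellValues)    # 406.2, Spain counted twice
sumUnion <- sum(unionRow * cellValues)  # 219.0, same as EU alone
```

The two sums match the EUandSpain rows for 2014 in the examples with unionComplement = FALSE and unionComplement = TRUE, respectively.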
Details
A key element of this function is the matrix multiplication: outputMatrix = dataDummyHierarchy %*% valueMatrix.
The matrix valueMatrix is a re-organized version of the valueVar vector from input. In particular, if a variable is selected as colFactor, there is one column for each level of that variable.
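As a rough base R sketch of this re-organization, the long-format values can be spread to one column per year. The names long and rowCode are hypothetical, the values are copied from the examples, and the function itself works with sparse matrices from the Matrix package rather than the dense matrix built here.

```r
# Input cells in long format (values copied from the examples)
long <- data.frame(
  age = rep(rep(c("Y15-29", "Y30-64"), each = 3), times = 3),
  geo = rep(c("Spain", "Iceland", "Portugal"), times = 6),
  year = rep(c("2014", "2015", "2016"), each = 6),
  ths_per = c(66.9, 1.8, 11.6, 120.3, 1.5, 20.2,
              63.4, 1.9, 14.2, 119.6, 1.6, 24.3,
              69.1, 1.9, 12.7, 122.1, 1.9, 25.8)
)

# One row per age:geo combination, one column per level of year
rowCode <- paste(long$age, long$geo, sep = ":")
valueMatrix <- tapply(long$ths_per, list(rowCode, long$year), sum)

dim(valueMatrix)                     # 6 3
valueMatrix["Y15-29:Spain", "2015"]  # 63.4
```

Using sum as the aggregating function also means that duplicated code rows would be added together, which is in the spirit of handleDuplicated = "sum".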
The matrix dataDummyHierarchy is constructed by crossing dummy coding of hierarchies (DummyHierarchy) and factorial variables in a way that matches valueMatrix. The code combinations corresponding to rows and columns of dataDummyHierarchy can be obtained as toCrossCode and fromCrossCode. In the default data frame output, the outputMatrix is stacked to one column and combined with the code combinations of all variables.
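The multiplication and the stacking can be reproduced on a small scale in base R. The sketch below uses dense matrices typed in by hand from the examples, so it illustrates the idea rather than the internal implementation; the names cells, stacked, blockDummy and stackedOutput are hypothetical, and the Kronecker product is just one way to mimic crossing with a row factor.

```r
# Dummy matrix for the colFactor layout: rows are output codes,
# columns are input cells (age:geo), with 1 where a cell contributes
cells <- c("Y15-29:Iceland", "Y15-29:Portugal", "Y15-29:Spain",
           "Y30-64:Iceland", "Y30-64:Portugal", "Y30-64:Spain")
dataDummyHierarchy <- rbind(
  "Y15-64:Europe" = c(1, 1, 1, 1, 1, 1),
  "Y15-64:nonEU"  = c(1, 0, 0, 1, 0, 0),
  "Y15-64:EU"     = c(0, 1, 1, 0, 1, 1)
)
colnames(dataDummyHierarchy) <- cells

# One column per year (values copied from the examples)
valueMatrix <- cbind(
  "2014" = c(1.8, 11.6, 66.9, 1.5, 20.2, 120.3),
  "2015" = c(1.9, 14.2, 63.4, 1.6, 24.3, 119.6),
  "2016" = c(1.9, 12.7, 69.1, 1.9, 25.8, 122.1)
)

outputMatrix <- dataDummyHierarchy %*% valueMatrix

# Stack the matrix to one column and attach the code combinations
stacked <- data.frame(
  year = rep(colnames(outputMatrix), each = nrow(outputMatrix)),
  code = rep(rownames(outputMatrix), times = ncol(outputMatrix)),
  ths_per = as.vector(outputMatrix)
)

# With year treated as a row variable instead, the same aggregates follow
# from a block-diagonal dummy matrix and the values stacked year by year
blockDummy <- kronecker(diag(3), dataDummyHierarchy)
stackedOutput <- blockDummy %*% as.vector(valueMatrix)
```

Here stacked$ths_per and stackedOutput hold the same nine aggregates, which mirrors how the rowFactor and colFactor examples give the same result through differently shaped matrices. The ordering of input cells within each year is simplified compared with the printed rowFactor output.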
Examples
# Data and hierarchies used in the examples
x <- SSBtoolsData("sprt_emp") # Employment in sport in thousand persons from Eurostat database
geoHier <- SSBtoolsData("sprt_emp_geoHier")
ageHier <- SSBtoolsData("sprt_emp_ageHier")
# Two hierarchies and year as rowFactor
HierarchyCompute(x, list(age = ageHier, geo = geoHier, year = "rowFactor"), "ths_per")
#> age geo year ths_per
#> 1 Y15-64 Europe 2014 222.3
#> 2 Y15-64 nonEU 2014 3.3
#> 3 Y15-64 EU 2014 219.0
#> 4 Y15-64 Europe 2015 225.0
#> 5 Y15-64 nonEU 2015 3.5
#> 6 Y15-64 EU 2015 221.5
#> 7 Y15-64 Europe 2016 233.5
#> 8 Y15-64 nonEU 2016 3.8
#> 9 Y15-64 EU 2016 229.7
# Same result with year as colFactor (but columns ordered differently)
HierarchyCompute(x, list(age = ageHier, geo = geoHier, year = "colFactor"), "ths_per")
#> year age geo ths_per
#> 1 2014 Y15-64 Europe 222.3
#> 2 2014 Y15-64 nonEU 3.3
#> 3 2014 Y15-64 EU 219.0
#> 4 2015 Y15-64 Europe 225.0
#> 5 2015 Y15-64 nonEU 3.5
#> 6 2015 Y15-64 EU 221.5
#> 7 2016 Y15-64 Europe 233.5
#> 8 2016 Y15-64 nonEU 3.8
#> 9 2016 Y15-64 EU 229.7
# Internally the computations are different as seen when output='matrixComponents'
HierarchyCompute(x, list(age = ageHier, geo = geoHier, year = "rowFactor"), "ths_per",
output = "matrixComponents")
#> $dataDummyHierarchy
#> 9 x 18 sparse Matrix of class "dgCMatrix"
#>
#> Y15-64:Europe:2014 1 1 1 1 1 1 . . . . . . . . . . . .
#> Y15-64:nonEU:2014 . 1 . . 1 . . . . . . . . . . . . .
#> Y15-64:EU:2014 1 0 1 1 0 1 . . . . . . . . . . . .
#> Y15-64:Europe:2015 . . . . . . 1 1 1 1 1 1 . . . . . .
#> Y15-64:nonEU:2015 . . . . . . . 1 . . 1 . . . . . . .
#> Y15-64:EU:2015 . . . . . . 1 0 1 1 0 1 . . . . . .
#> Y15-64:Europe:2016 . . . . . . . . . . . . 1 1 1 1 1 1
#> Y15-64:nonEU:2016 . . . . . . . . . . . . . 1 . . 1 .
#> Y15-64:EU:2016 . . . . . . . . . . . . 1 0 1 1 0 1
#>
#> $valueMatrix
#> 18 x 1 Matrix of class "dgeMatrix"
#> ths_per
#> [1,] 66.9
#> [2,] 1.8
#> [3,] 11.6
#> [4,] 120.3
#> [5,] 1.5
#> [6,] 20.2
#> [7,] 63.4
#> [8,] 1.9
#> [9,] 14.2
#> [10,] 119.6
#> [11,] 1.6
#> [12,] 24.3
#> [13,] 69.1
#> [14,] 1.9
#> [15,] 12.7
#> [16,] 122.1
#> [17,] 1.9
#> [18,] 25.8
#>
#> $fromCrossCode
#> age geo year
#> 1 Y15-29 Spain 2014
#> 2 Y15-29 Iceland 2014
#> 3 Y15-29 Portugal 2014
#> 4 Y30-64 Spain 2014
#> 5 Y30-64 Iceland 2014
#> 6 Y30-64 Portugal 2014
#> 7 Y15-29 Spain 2015
#> 8 Y15-29 Iceland 2015
#> 9 Y15-29 Portugal 2015
#> 10 Y30-64 Spain 2015
#> 11 Y30-64 Iceland 2015
#> 12 Y30-64 Portugal 2015
#> 13 Y15-29 Spain 2016
#> 14 Y15-29 Iceland 2016
#> 15 Y15-29 Portugal 2016
#> 16 Y30-64 Spain 2016
#> 17 Y30-64 Iceland 2016
#> 18 Y30-64 Portugal 2016
#>
#> $toCrossCode
#> age geo year
#> 1 Y15-64 Europe 2014
#> 2 Y15-64 nonEU 2014
#> 3 Y15-64 EU 2014
#> 4 Y15-64 Europe 2015
#> 5 Y15-64 nonEU 2015
#> 6 Y15-64 EU 2015
#> 7 Y15-64 Europe 2016
#> 8 Y15-64 nonEU 2016
#> 9 Y15-64 EU 2016
#>
HierarchyCompute(x, list(age = ageHier, geo = geoHier, year = "colFactor"), "ths_per",
output = "matrixComponents")
#> $dataDummyHierarchy
#> 3 x 6 sparse Matrix of class "dgCMatrix"
#>
#> Y15-64:Europe 1 1 1 1 1 1
#> Y15-64:nonEU 1 . . 1 . .
#> Y15-64:EU 0 1 1 0 1 1
#>
#> $valueMatrix
#> 6 x 3 sparse Matrix of class "dgCMatrix"
#> 2014 2015 2016
#> [1,] 1.8 1.9 1.9
#> [2,] 11.6 14.2 12.7
#> [3,] 66.9 63.4 69.1
#> [4,] 1.5 1.6 1.9
#> [5,] 20.2 24.3 25.8
#> [6,] 120.3 119.6 122.1
#>
#> $fromCrossCode
#> age geo
#> 1 Y15-29 Iceland
#> 2 Y15-29 Portugal
#> 3 Y15-29 Spain
#> 4 Y30-64 Iceland
#> 5 Y30-64 Portugal
#> 6 Y30-64 Spain
#>
#> $toCrossCode
#> age geo
#> 1 Y15-64 Europe
#> 2 Y15-64 nonEU
#> 3 Y15-64 EU
#>
# Include input age groups by setting inputInOutput = TRUE for this variable
HierarchyCompute(x, list(age = ageHier, geo = geoHier, year = "colFactor"), "ths_per",
inputInOutput = c(TRUE, FALSE))
#> year age geo ths_per
#> 1 2014 Y15-29 Europe 80.3
#> 2 2014 Y30-64 Europe 142.0
#> 3 2014 Y15-64 Europe 222.3
#> 4 2014 Y15-29 nonEU 1.8
#> 5 2014 Y30-64 nonEU 1.5
#> 6 2014 Y15-64 nonEU 3.3
#> 7 2014 Y15-29 EU 78.5
#> 8 2014 Y30-64 EU 140.5
#> 9 2014 Y15-64 EU 219.0
#> 10 2015 Y15-29 Europe 79.5
#> 11 2015 Y30-64 Europe 145.5
#> 12 2015 Y15-64 Europe 225.0
#> 13 2015 Y15-29 nonEU 1.9
#> 14 2015 Y30-64 nonEU 1.6
#> 15 2015 Y15-64 nonEU 3.5
#> 16 2015 Y15-29 EU 77.6
#> 17 2015 Y30-64 EU 143.9
#> 18 2015 Y15-64 EU 221.5
#> 19 2016 Y15-29 Europe 83.7
#> 20 2016 Y30-64 Europe 149.8
#> 21 2016 Y15-64 Europe 233.5
#> 22 2016 Y15-29 nonEU 1.9
#> 23 2016 Y30-64 nonEU 1.9
#> 24 2016 Y15-64 nonEU 3.8
#> 25 2016 Y15-29 EU 81.8
#> 26 2016 Y30-64 EU 147.9
#> 27 2016 Y15-64 EU 229.7
# Only input age groups by switching to rowFactor
HierarchyCompute(x, list(age = "rowFactor", geo = geoHier, year = "colFactor"), "ths_per")
#> year age geo ths_per
#> 1 2014 Y15-29 Europe 80.3
#> 2 2014 Y30-64 Europe 142.0
#> 3 2014 Y15-29 nonEU 1.8
#> 4 2014 Y30-64 nonEU 1.5
#> 5 2014 Y15-29 EU 78.5
#> 6 2014 Y30-64 EU 140.5
#> 7 2015 Y15-29 Europe 79.5
#> 8 2015 Y30-64 Europe 145.5
#> 9 2015 Y15-29 nonEU 1.9
#> 10 2015 Y30-64 nonEU 1.6
#> 11 2015 Y15-29 EU 77.6
#> 12 2015 Y30-64 EU 143.9
#> 13 2016 Y15-29 Europe 83.7
#> 14 2016 Y30-64 Europe 149.8
#> 15 2016 Y15-29 nonEU 1.9
#> 16 2016 Y30-64 nonEU 1.9
#> 17 2016 Y15-29 EU 81.8
#> 18 2016 Y30-64 EU 147.9
# Select some years (colFactor) including a year not in input data (zeros produced)
HierarchyCompute(x, list(age = ageHier, geo = geoHier, year = "colFactor"), "ths_per",
colSelect = c("2014", "2016", "2018"))
#> Warning: Items in colSelect not in data[,'year'] set to zero: 2018
#> year age geo ths_per
#> 1 2014 Y15-64 Europe 222.3
#> 2 2014 Y15-64 nonEU 3.3
#> 3 2014 Y15-64 EU 219.0
#> 4 2016 Y15-64 Europe 233.5
#> 5 2016 Y15-64 nonEU 3.8
#> 6 2016 Y15-64 EU 229.7
#> 7 2018 Y15-64 Europe 0.0
#> 8 2018 Y15-64 nonEU 0.0
#> 9 2018 Y15-64 EU 0.0
# Select combinations of geo and age including a code not in data or hierarchy (zeros produced)
HierarchyCompute(x, list(age = ageHier, geo = geoHier, year = "colFactor"), "ths_per",
rowSelect = data.frame(geo = "EU", age = c("Y0-100", "Y15-64", "Y15-29")))
#> year geo age ths_per
#> 1 2014 EU Y0-100 0.0
#> 2 2014 EU Y15-64 219.0
#> 3 2014 EU Y15-29 78.5
#> 4 2015 EU Y0-100 0.0
#> 5 2015 EU Y15-64 221.5
#> 6 2015 EU Y15-29 77.6
#> 7 2016 EU Y0-100 0.0
#> 8 2016 EU Y15-64 229.7
#> 9 2016 EU Y15-29 81.8
# Select combinations of geo, age and year
HierarchyCompute(x, list(age = ageHier, geo = geoHier, year = "colFactor"), "ths_per",
select = data.frame(geo = c("EU", "Spain"), age = c("Y15-64", "Y15-29"), year = 2015))
#> year age geo ths_per
#> 1 2015 Y15-64 EU 221.5
#> 2 2015 Y15-29 Spain 63.4
# Extend the hierarchy table to illustrate the effect of unionComplement
# Omit level since this is handled by autoLevel
geoHier2 <- rbind(data.frame(mapsFrom = c("EU", "Spain"), mapsTo = "EUandSpain", sign = 1),
geoHier[, -4])
# Spain is counted twice
HierarchyCompute(x, list(age = ageHier, geo = geoHier2, year = "colFactor"), "ths_per")
#> year age geo ths_per
#> 1 2014 Y15-64 Europe 222.3
#> 2 2014 Y15-64 nonEU 3.3
#> 3 2014 Y15-64 EU 219.0
#> 4 2014 Y15-64 EUandSpain 406.2
#> 5 2015 Y15-64 Europe 225.0
#> 6 2015 Y15-64 nonEU 3.5
#> 7 2015 Y15-64 EU 221.5
#> 8 2015 Y15-64 EUandSpain 404.5
#> 9 2016 Y15-64 Europe 233.5
#> 10 2016 Y15-64 nonEU 3.8
#> 11 2016 Y15-64 EU 229.7
#> 12 2016 Y15-64 EUandSpain 420.9
# Can be seen in the dataDummyHierarchy matrix
HierarchyCompute(x, list(age = ageHier, geo = geoHier2, year = "colFactor"), "ths_per",
output = "matrixComponents")
#> $dataDummyHierarchy
#> 4 x 6 sparse Matrix of class "dgCMatrix"
#>
#> Y15-64:Europe 1 1 1 1 1 1
#> Y15-64:nonEU 1 . . 1 . .
#> Y15-64:EU 0 1 1 0 1 1
#> Y15-64:EUandSpain 0 1 2 0 1 2
#>
#> $valueMatrix
#> 6 x 3 sparse Matrix of class "dgCMatrix"
#> 2014 2015 2016
#> [1,] 1.8 1.9 1.9
#> [2,] 11.6 14.2 12.7
#> [3,] 66.9 63.4 69.1
#> [4,] 1.5 1.6 1.9
#> [5,] 20.2 24.3 25.8
#> [6,] 120.3 119.6 122.1
#>
#> $fromCrossCode
#> age geo
#> 1 Y15-29 Iceland
#> 2 Y15-29 Portugal
#> 3 Y15-29 Spain
#> 4 Y30-64 Iceland
#> 5 Y30-64 Portugal
#> 6 Y30-64 Spain
#>
#> $toCrossCode
#> age geo
#> 1 Y15-64 Europe
#> 2 Y15-64 nonEU
#> 3 Y15-64 EU
#> 4 Y15-64 EUandSpain
#>
# With unionComplement=TRUE Spain is not counted twice
HierarchyCompute(x, list(age = ageHier, geo = geoHier2, year = "colFactor"), "ths_per",
unionComplement = TRUE)
#> year age geo ths_per
#> 1 2014 Y15-64 Europe 222.3
#> 2 2014 Y15-64 nonEU 3.3
#> 3 2014 Y15-64 EU 219.0
#> 4 2014 Y15-64 EUandSpain 219.0
#> 5 2015 Y15-64 Europe 225.0
#> 6 2015 Y15-64 nonEU 3.5
#> 7 2015 Y15-64 EU 221.5
#> 8 2015 Y15-64 EUandSpain 221.5
#> 9 2016 Y15-64 Europe 233.5
#> 10 2016 Y15-64 nonEU 3.8
#> 11 2016 Y15-64 EU 229.7
#> 12 2016 Y15-64 EUandSpain 229.7
# With constantsInOutput
HierarchyCompute(x, list(age = ageHier, geo = geoHier, year = "colFactor"), "ths_per",
constantsInOutput = data.frame(c1 = "AB", c2 = "CD"))
#> c1 c2 year age geo ths_per
#> 1 AB CD 2014 Y15-64 Europe 222.3
#> 2 AB CD 2014 Y15-64 nonEU 3.3
#> 3 AB CD 2014 Y15-64 EU 219.0
#> 4 AB CD 2015 Y15-64 Europe 225.0
#> 5 AB CD 2015 Y15-64 nonEU 3.5
#> 6 AB CD 2015 Y15-64 EU 221.5
#> 7 AB CD 2016 Y15-64 Europe 233.5
#> 8 AB CD 2016 Y15-64 nonEU 3.8
#> 9 AB CD 2016 Y15-64 EU 229.7
# More than one valueVar
x$y <- 10*x$ths_per
HierarchyCompute(x, list(age = ageHier, geo = geoHier), c("y", "ths_per"))
#> age geo y ths_per
#> 1 Y15-64 Europe 6808 680.8
#> 2 Y15-64 nonEU 106 10.6
#> 3 Y15-64 EU 6702 670.2