This function computes aggregates by crossing several hierarchical specifications and factorial variables.
Usage
HierarchyCompute(
data,
hierarchies,
valueVar,
colVar = NULL,
rowSelect = NULL,
colSelect = NULL,
select = NULL,
inputInOutput = FALSE,
output = "data.frame",
autoLevel = TRUE,
unionComplement = FALSE,
constantsInOutput = NULL,
hierarchyVarNames = c(mapsFrom = "mapsFrom", mapsTo = "mapsTo", sign = "sign", level =
"level"),
selectionByMultiplicationLimit = 10^7,
colNotInDataWarning = TRUE,
useMatrixToDataFrame = TRUE,
handleDuplicated = "sum",
asInput = FALSE,
verbose = FALSE,
reOrder = FALSE,
reduceData = TRUE,
makeRownames = NULL
)
Arguments
- data
The input data frame
- hierarchies
A named list with hierarchies (the names are variables in data). Variables can also be coded by "rowFactor" and "colFactor".
- valueVar
Name of the variable(s) to be aggregated.
- colVar
When non-NULL, the function HierarchyCompute2 is called. See its documentation for more information.
- rowSelect
Data frame specifying variable combinations for output. The colFactor variable is not included. In addition, rowSelect="removeEmpty" removes combinations corresponding to empty rows (only zeros) of dataDummyHierarchy.
- colSelect
Vector specifying categories of the colFactor variable for output.
- select
Data frame specifying variable combinations for output. The colFactor variable is included.
- inputInOutput
Logical vector (possibly recycled) for each element of hierarchies. TRUE means that codes from input are included in output. Values corresponding to "rowFactor" and "colFactor" are ignored.
- output
One of "data.frame" (default), "dummyHierarchies", "outputMatrix", "dataDummyHierarchy", "valueMatrix", "fromCrossCode", "toCrossCode", "crossCode" (as toCrossCode), "outputMatrixWithCrossCode", "matrixComponents", "dataDummyHierarchyWithCodeFrame", "dataDummyHierarchyQuick". The latter two do not require valueVar (reduceData set to FALSE).
- autoLevel
Logical vector (possibly recycled) for each element of hierarchies. When TRUE, level is computed by an automatic method as in HierarchyFix. Values corresponding to "rowFactor" and "colFactor" are ignored.
- unionComplement
Logical vector (possibly recycled) for each element of hierarchies. When TRUE, sign means union and complement instead of addition or subtraction, as in DummyHierarchy. Values corresponding to "rowFactor" and "colFactor" are ignored.
- constantsInOutput
A single-row data frame to be combined with the other output.
- hierarchyVarNames
Variable names in the hierarchy tables as in HierarchyFix.
- selectionByMultiplicationLimit
With non-NULL rowSelect and when the number of elements in dataDummyHierarchy exceeds this limit, the computation is performed by a slower but more memory-efficient algorithm.
- colNotInDataWarning
When TRUE, a warning is produced when elements of colSelect are not in data.
- useMatrixToDataFrame
When TRUE (default) special functionality for saving time and memory is used.
- handleDuplicated
Handling of duplicated code rows in data. One of: "sum" (default), "sumByAggregate", "sumWithWarning", "stop" (error), "single" or "singleWithWarning". With no colFactor sum and sumByAggregate/sumWithWarning are different (original values or aggregates in "valueMatrix"). When single, only one of the values is used (by matrix subsetting).
- asInput
When TRUE (FALSE is default) output matrices match input data. Thus valueMatrix = Matrix(data[, valueVar], ncol = 1). Only possible when no colFactor.
- verbose
Whether to print information during calculations. FALSE is default.
- reOrder
When TRUE (FALSE is default) output codes are ordered differently, more similar to a usual model matrix ordering.
- reduceData
When TRUE (default) unnecessary (for the aggregated result) rows of valueMatrix are allowed to be removed.
- makeRownames
When TRUE, dataDummyHierarchy contains rownames. By default, this is decided based on the parameter output.
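To make the hierarchy table format more concrete, the following base R sketch builds a small table with the default column names from hierarchyVarNames and turns it into a dummy coding by plain addition. The table contents (including the level values) are a hypothetical illustration, and the loop is a simplified stand-in, not the DummyHierarchy implementation.

```r
# Hypothetical hierarchy table using the default column names
hier <- data.frame(
  mapsFrom = c("EU", "nonEU", "Portugal", "Spain", "Iceland"),
  mapsTo   = c("Europe", "Europe", "EU", "EU", "nonEU"),
  sign     = c(1, 1, 1, 1, 1),
  level    = c(2, 2, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Input codes are those that are never an aggregate themselves
inputCodes <- sort(setdiff(hier$mapsFrom, hier$mapsTo))

# Start from an identity coding of the input codes
coding <- diag(length(inputCodes))
dimnames(coding) <- list(inputCodes, inputCodes)

# Process levels in increasing order so that lower aggregates exist
# before they are used to build higher ones
for (lev in sort(unique(hier$level))) {
  rowsAtLevel <- hier[hier$level == lev, ]
  for (target in unique(rowsAtLevel$mapsTo)) {
    parts <- rowsAtLevel[rowsAtLevel$mapsTo == target, ]
    # Signed sum of the codings of the contributing codes
    newRow <- colSums(parts$sign * coding[parts$mapsFrom, , drop = FALSE])
    coding <- rbind(coding,
                    matrix(newRow, nrow = 1, dimnames = list(target, inputCodes)))
  }
}

# Keep only the aggregate codes as rows
dummy <- coding[unique(hier$mapsTo), , drop = FALSE]
print(dummy)
```

Each row of dummy tells which input codes contribute to an aggregate code. A sign of -1 in the table would subtract the corresponding contribution instead of adding it.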
Details
A key element of this function is the matrix multiplication:
outputMatrix = dataDummyHierarchy %*% valueMatrix.
The matrix valueMatrix is a reorganized version of the valueVar vector from the input. In particular,
if a variable is selected as colFactor, there is one column for each level of that variable.
The matrix dataDummyHierarchy is constructed by crossing the dummy coding of hierarchies (DummyHierarchy) and factorial variables
in a way that matches valueMatrix. The code combinations corresponding to the rows and columns of dataDummyHierarchy
can be obtained as toCrossCode and fromCrossCode, respectively. In the default data frame output, outputMatrix is stacked
into one column and combined with the code combinations of all variables.
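The multiplication and the stacking step can be reproduced by hand. The sketch below uses ordinary dense base R matrices (the function itself returns sparse Matrix objects) and the numbers shown in the colFactor example of the Examples section. It only illustrates the idea and is not the internal code.

```r
# Codes for the rows of dataDummyHierarchy (toCrossCode)
toCrossCode <- data.frame(age = "Y15-64", geo = c("Europe", "nonEU", "EU"))

# Columns follow fromCrossCode: (Y15-29, Y30-64) x (Iceland, Portugal, Spain)
dataDummyHierarchy <- rbind(
  c(1, 1, 1, 1, 1, 1),
  c(1, 0, 0, 1, 0, 0),
  c(0, 1, 1, 0, 1, 1)
)
rownames(dataDummyHierarchy) <- paste(toCrossCode$age, toCrossCode$geo, sep = ":")

# One column per level of the colFactor variable (year)
valueMatrix <- cbind(
  `2014` = c(1.8, 11.6, 66.9, 1.5, 20.2, 120.3),
  `2015` = c(1.9, 14.2, 63.4, 1.6, 24.3, 119.6),
  `2016` = c(1.9, 12.7, 69.1, 1.9, 25.8, 122.1)
)

outputMatrix <- dataDummyHierarchy %*% valueMatrix

# Stack the columns into one and attach the code combinations
stacked <- data.frame(
  year = rep(colnames(outputMatrix), each = nrow(outputMatrix)),
  age = rep(toCrossCode$age, times = ncol(outputMatrix)),
  geo = rep(toCrossCode$geo, times = ncol(outputMatrix)),
  ths_per = as.vector(outputMatrix)
)
print(stacked)
```

The stacked values agree with the default data frame output of the colFactor example (222.3, 3.3, 219.0 for 2014, and so on).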
Examples
# Data and hierarchies used in the examples
x <- SSBtoolsData("sprt_emp") # Employment in sport in thousand persons from Eurostat database
geoHier <- SSBtoolsData("sprt_emp_geoHier")
ageHier <- SSBtoolsData("sprt_emp_ageHier")
# Two hierarchies and year as rowFactor
HierarchyCompute(x, list(age = ageHier, geo = geoHier, year = "rowFactor"), "ths_per")
#> age geo year ths_per
#> 1 Y15-64 Europe 2014 222.3
#> 2 Y15-64 nonEU 2014 3.3
#> 3 Y15-64 EU 2014 219.0
#> 4 Y15-64 Europe 2015 225.0
#> 5 Y15-64 nonEU 2015 3.5
#> 6 Y15-64 EU 2015 221.5
#> 7 Y15-64 Europe 2016 233.5
#> 8 Y15-64 nonEU 2016 3.8
#> 9 Y15-64 EU 2016 229.7
# Same result with year as colFactor (but columns ordered differently)
HierarchyCompute(x, list(age = ageHier, geo = geoHier, year = "colFactor"), "ths_per")
#> year age geo ths_per
#> 1 2014 Y15-64 Europe 222.3
#> 2 2014 Y15-64 nonEU 3.3
#> 3 2014 Y15-64 EU 219.0
#> 4 2015 Y15-64 Europe 225.0
#> 5 2015 Y15-64 nonEU 3.5
#> 6 2015 Y15-64 EU 221.5
#> 7 2016 Y15-64 Europe 233.5
#> 8 2016 Y15-64 nonEU 3.8
#> 9 2016 Y15-64 EU 229.7
# Internally the computations are different as seen when output='matrixComponents'
HierarchyCompute(x, list(age = ageHier, geo = geoHier, year = "rowFactor"), "ths_per",
output = "matrixComponents")
#> $dataDummyHierarchy
#> 9 x 18 sparse Matrix of class "dgCMatrix"
#>
#> Y15-64:Europe:2014 1 1 1 1 1 1 . . . . . . . . . . . .
#> Y15-64:nonEU:2014 . 1 . . 1 . . . . . . . . . . . . .
#> Y15-64:EU:2014 1 0 1 1 0 1 . . . . . . . . . . . .
#> Y15-64:Europe:2015 . . . . . . 1 1 1 1 1 1 . . . . . .
#> Y15-64:nonEU:2015 . . . . . . . 1 . . 1 . . . . . . .
#> Y15-64:EU:2015 . . . . . . 1 0 1 1 0 1 . . . . . .
#> Y15-64:Europe:2016 . . . . . . . . . . . . 1 1 1 1 1 1
#> Y15-64:nonEU:2016 . . . . . . . . . . . . . 1 . . 1 .
#> Y15-64:EU:2016 . . . . . . . . . . . . 1 0 1 1 0 1
#>
#> $valueMatrix
#> 18 x 1 Matrix of class "dgeMatrix"
#> ths_per
#> [1,] 66.9
#> [2,] 1.8
#> [3,] 11.6
#> [4,] 120.3
#> [5,] 1.5
#> [6,] 20.2
#> [7,] 63.4
#> [8,] 1.9
#> [9,] 14.2
#> [10,] 119.6
#> [11,] 1.6
#> [12,] 24.3
#> [13,] 69.1
#> [14,] 1.9
#> [15,] 12.7
#> [16,] 122.1
#> [17,] 1.9
#> [18,] 25.8
#>
#> $fromCrossCode
#> age geo year
#> 1 Y15-29 Spain 2014
#> 2 Y15-29 Iceland 2014
#> 3 Y15-29 Portugal 2014
#> 4 Y30-64 Spain 2014
#> 5 Y30-64 Iceland 2014
#> 6 Y30-64 Portugal 2014
#> 7 Y15-29 Spain 2015
#> 8 Y15-29 Iceland 2015
#> 9 Y15-29 Portugal 2015
#> 10 Y30-64 Spain 2015
#> 11 Y30-64 Iceland 2015
#> 12 Y30-64 Portugal 2015
#> 13 Y15-29 Spain 2016
#> 14 Y15-29 Iceland 2016
#> 15 Y15-29 Portugal 2016
#> 16 Y30-64 Spain 2016
#> 17 Y30-64 Iceland 2016
#> 18 Y30-64 Portugal 2016
#>
#> $toCrossCode
#> age geo year
#> 1 Y15-64 Europe 2014
#> 2 Y15-64 nonEU 2014
#> 3 Y15-64 EU 2014
#> 4 Y15-64 Europe 2015
#> 5 Y15-64 nonEU 2015
#> 6 Y15-64 EU 2015
#> 7 Y15-64 Europe 2016
#> 8 Y15-64 nonEU 2016
#> 9 Y15-64 EU 2016
#>
HierarchyCompute(x, list(age = ageHier, geo = geoHier, year = "colFactor"), "ths_per",
output = "matrixComponents")
#> $dataDummyHierarchy
#> 3 x 6 sparse Matrix of class "dgCMatrix"
#>
#> Y15-64:Europe 1 1 1 1 1 1
#> Y15-64:nonEU 1 . . 1 . .
#> Y15-64:EU 0 1 1 0 1 1
#>
#> $valueMatrix
#> 6 x 3 sparse Matrix of class "dgCMatrix"
#> 2014 2015 2016
#> [1,] 1.8 1.9 1.9
#> [2,] 11.6 14.2 12.7
#> [3,] 66.9 63.4 69.1
#> [4,] 1.5 1.6 1.9
#> [5,] 20.2 24.3 25.8
#> [6,] 120.3 119.6 122.1
#>
#> $fromCrossCode
#> age geo
#> 1 Y15-29 Iceland
#> 2 Y15-29 Portugal
#> 3 Y15-29 Spain
#> 4 Y30-64 Iceland
#> 5 Y30-64 Portugal
#> 6 Y30-64 Spain
#>
#> $toCrossCode
#> age geo
#> 1 Y15-64 Europe
#> 2 Y15-64 nonEU
#> 3 Y15-64 EU
#>
# Include input age groups by setting inputInOutput = TRUE for this variable
HierarchyCompute(x, list(age = ageHier, geo = geoHier, year = "colFactor"), "ths_per",
inputInOutput = c(TRUE, FALSE))
#> year age geo ths_per
#> 1 2014 Y15-29 Europe 80.3
#> 2 2014 Y30-64 Europe 142.0
#> 3 2014 Y15-64 Europe 222.3
#> 4 2014 Y15-29 nonEU 1.8
#> 5 2014 Y30-64 nonEU 1.5
#> 6 2014 Y15-64 nonEU 3.3
#> 7 2014 Y15-29 EU 78.5
#> 8 2014 Y30-64 EU 140.5
#> 9 2014 Y15-64 EU 219.0
#> 10 2015 Y15-29 Europe 79.5
#> 11 2015 Y30-64 Europe 145.5
#> 12 2015 Y15-64 Europe 225.0
#> 13 2015 Y15-29 nonEU 1.9
#> 14 2015 Y30-64 nonEU 1.6
#> 15 2015 Y15-64 nonEU 3.5
#> 16 2015 Y15-29 EU 77.6
#> 17 2015 Y30-64 EU 143.9
#> 18 2015 Y15-64 EU 221.5
#> 19 2016 Y15-29 Europe 83.7
#> 20 2016 Y30-64 Europe 149.8
#> 21 2016 Y15-64 Europe 233.5
#> 22 2016 Y15-29 nonEU 1.9
#> 23 2016 Y30-64 nonEU 1.9
#> 24 2016 Y15-64 nonEU 3.8
#> 25 2016 Y15-29 EU 81.8
#> 26 2016 Y30-64 EU 147.9
#> 27 2016 Y15-64 EU 229.7
# Only input age groups by switching to rowFactor
HierarchyCompute(x, list(age = "rowFactor", geo = geoHier, year = "colFactor"), "ths_per")
#> year age geo ths_per
#> 1 2014 Y15-29 Europe 80.3
#> 2 2014 Y30-64 Europe 142.0
#> 3 2014 Y15-29 nonEU 1.8
#> 4 2014 Y30-64 nonEU 1.5
#> 5 2014 Y15-29 EU 78.5
#> 6 2014 Y30-64 EU 140.5
#> 7 2015 Y15-29 Europe 79.5
#> 8 2015 Y30-64 Europe 145.5
#> 9 2015 Y15-29 nonEU 1.9
#> 10 2015 Y30-64 nonEU 1.6
#> 11 2015 Y15-29 EU 77.6
#> 12 2015 Y30-64 EU 143.9
#> 13 2016 Y15-29 Europe 83.7
#> 14 2016 Y30-64 Europe 149.8
#> 15 2016 Y15-29 nonEU 1.9
#> 16 2016 Y30-64 nonEU 1.9
#> 17 2016 Y15-29 EU 81.8
#> 18 2016 Y30-64 EU 147.9
# Select some years (colFactor) including a year not in input data (zeros produced)
HierarchyCompute(x, list(age = ageHier, geo = geoHier, year = "colFactor"), "ths_per",
colSelect = c("2014", "2016", "2018"))
#> Warning: Items in colSelect not in data[,'year'] set to zero: 2018
#> year age geo ths_per
#> 1 2014 Y15-64 Europe 222.3
#> 2 2014 Y15-64 nonEU 3.3
#> 3 2014 Y15-64 EU 219.0
#> 4 2016 Y15-64 Europe 233.5
#> 5 2016 Y15-64 nonEU 3.8
#> 6 2016 Y15-64 EU 229.7
#> 7 2018 Y15-64 Europe 0.0
#> 8 2018 Y15-64 nonEU 0.0
#> 9 2018 Y15-64 EU 0.0
# Select combinations of geo and age including a code not in data or hierarchy (zeros produced)
HierarchyCompute(x, list(age = ageHier, geo = geoHier, year = "colFactor"), "ths_per",
rowSelect = data.frame(geo = "EU", age = c("Y0-100", "Y15-64", "Y15-29")))
#> year geo age ths_per
#> 1 2014 EU Y0-100 0.0
#> 2 2014 EU Y15-64 219.0
#> 3 2014 EU Y15-29 78.5
#> 4 2015 EU Y0-100 0.0
#> 5 2015 EU Y15-64 221.5
#> 6 2015 EU Y15-29 77.6
#> 7 2016 EU Y0-100 0.0
#> 8 2016 EU Y15-64 229.7
#> 9 2016 EU Y15-29 81.8
# Select combinations of geo, age and year
HierarchyCompute(x, list(age = ageHier, geo = geoHier, year = "colFactor"), "ths_per",
select = data.frame(geo = c("EU", "Spain"), age = c("Y15-64", "Y15-29"), year = 2015))
#> year age geo ths_per
#> 1 2015 Y15-64 EU 221.5
#> 2 2015 Y15-29 Spain 63.4
# Extend the hierarchy table to illustrate the effect of unionComplement
# Omit level since this is handled by autoLevel
geoHier2 <- rbind(data.frame(mapsFrom = c("EU", "Spain"), mapsTo = "EUandSpain", sign = 1),
geoHier[, -4])
# Spain is counted twice
HierarchyCompute(x, list(age = ageHier, geo = geoHier2, year = "colFactor"), "ths_per")
#> year age geo ths_per
#> 1 2014 Y15-64 Europe 222.3
#> 2 2014 Y15-64 nonEU 3.3
#> 3 2014 Y15-64 EU 219.0
#> 4 2014 Y15-64 EUandSpain 406.2
#> 5 2015 Y15-64 Europe 225.0
#> 6 2015 Y15-64 nonEU 3.5
#> 7 2015 Y15-64 EU 221.5
#> 8 2015 Y15-64 EUandSpain 404.5
#> 9 2016 Y15-64 Europe 233.5
#> 10 2016 Y15-64 nonEU 3.8
#> 11 2016 Y15-64 EU 229.7
#> 12 2016 Y15-64 EUandSpain 420.9
# Can be seen in the dataDummyHierarchy matrix
HierarchyCompute(x, list(age = ageHier, geo = geoHier2, year = "colFactor"), "ths_per",
output = "matrixComponents")
#> $dataDummyHierarchy
#> 4 x 6 sparse Matrix of class "dgCMatrix"
#>
#> Y15-64:Europe 1 1 1 1 1 1
#> Y15-64:nonEU 1 . . 1 . .
#> Y15-64:EU 0 1 1 0 1 1
#> Y15-64:EUandSpain 0 1 2 0 1 2
#>
#> $valueMatrix
#> 6 x 3 sparse Matrix of class "dgCMatrix"
#> 2014 2015 2016
#> [1,] 1.8 1.9 1.9
#> [2,] 11.6 14.2 12.7
#> [3,] 66.9 63.4 69.1
#> [4,] 1.5 1.6 1.9
#> [5,] 20.2 24.3 25.8
#> [6,] 120.3 119.6 122.1
#>
#> $fromCrossCode
#> age geo
#> 1 Y15-29 Iceland
#> 2 Y15-29 Portugal
#> 3 Y15-29 Spain
#> 4 Y30-64 Iceland
#> 5 Y30-64 Portugal
#> 6 Y30-64 Spain
#>
#> $toCrossCode
#> age geo
#> 1 Y15-64 Europe
#> 2 Y15-64 nonEU
#> 3 Y15-64 EU
#> 4 Y15-64 EUandSpain
#>
# With unionComplement=TRUE Spain is not counted twice
HierarchyCompute(x, list(age = ageHier, geo = geoHier2, year = "colFactor"), "ths_per",
unionComplement = TRUE)
#> year age geo ths_per
#> 1 2014 Y15-64 Europe 222.3
#> 2 2014 Y15-64 nonEU 3.3
#> 3 2014 Y15-64 EU 219.0
#> 4 2014 Y15-64 EUandSpain 219.0
#> 5 2015 Y15-64 Europe 225.0
#> 6 2015 Y15-64 nonEU 3.5
#> 7 2015 Y15-64 EU 221.5
#> 8 2015 Y15-64 EUandSpain 221.5
#> 9 2016 Y15-64 Europe 233.5
#> 10 2016 Y15-64 nonEU 3.8
#> 11 2016 Y15-64 EU 229.7
#> 12 2016 Y15-64 EUandSpain 229.7
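The difference between the two EUandSpain results can be traced to a single row of dataDummyHierarchy. The base R sketch below uses the 2014 column of valueMatrix shown above and mimics union by capping the coefficients at 1. The capping is an assumption made for illustration, not necessarily how DummyHierarchy implements union.

```r
# 2014 values in fromCrossCode order:
# (Y15-29, Y30-64) x (Iceland, Portugal, Spain)
values2014 <- c(1.8, 11.6, 66.9, 1.5, 20.2, 120.3)

euRow    <- c(0, 1, 1, 0, 1, 1)  # EU = Portugal + Spain
spainRow <- c(0, 0, 1, 0, 0, 1)  # Spain alone

# Addition: Spain gets coefficient 2, as in the matrix printed above
additiveRow <- euRow + spainRow

# Union (illustrative): a code is either included or not
unionRow <- pmin(euRow + spainRow, 1)

additiveTotal <- sum(additiveRow * values2014)
unionTotal    <- sum(unionRow * values2014)

print(c(additive = additiveTotal, union = unionTotal))
```

The additive total matches the 406.2 reported without unionComplement, and the union total matches the 219.0 reported with unionComplement = TRUE.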
# With constantsInOutput
HierarchyCompute(x, list(age = ageHier, geo = geoHier, year = "colFactor"), "ths_per",
constantsInOutput = data.frame(c1 = "AB", c2 = "CD"))
#> c1 c2 year age geo ths_per
#> 1 AB CD 2014 Y15-64 Europe 222.3
#> 2 AB CD 2014 Y15-64 nonEU 3.3
#> 3 AB CD 2014 Y15-64 EU 219.0
#> 4 AB CD 2015 Y15-64 Europe 225.0
#> 5 AB CD 2015 Y15-64 nonEU 3.5
#> 6 AB CD 2015 Y15-64 EU 221.5
#> 7 AB CD 2016 Y15-64 Europe 233.5
#> 8 AB CD 2016 Y15-64 nonEU 3.8
#> 9 AB CD 2016 Y15-64 EU 229.7
# More than one valueVar
x$y <- 10*x$ths_per
HierarchyCompute(x, list(age = ageHier, geo = geoHier), c("y", "ths_per"))
#> age geo y ths_per
#> 1 Y15-64 Europe 6808 680.8
#> 2 Y15-64 nonEU 106 10.6
#> 3 Y15-64 EU 6702 670.2
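In the last example year is not part of hierarchies, so each age and geo combination occurs three times in the data and the values are summed (the default handleDuplicated = "sum"). The base R sketch below reproduces the totals from the yearly values shown earlier. The two-column matrix is an assumption made for illustration and says nothing about the internal layout used by the function.

```r
# Rows: Europe, nonEU, EU; columns: (Y15-29, Y30-64) x (Iceland, Portugal, Spain)
dataDummyHierarchy <- rbind(
  Europe = c(1, 1, 1, 1, 1, 1),
  nonEU  = c(1, 0, 0, 1, 0, 0),
  EU     = c(0, 1, 1, 0, 1, 1)
)

# Yearly values per input code combination
yearly <- cbind(
  `2014` = c(1.8, 11.6, 66.9, 1.5, 20.2, 120.3),
  `2015` = c(1.9, 14.2, 63.4, 1.6, 24.3, 119.6),
  `2016` = c(1.9, 12.7, 69.1, 1.9, 25.8, 122.1)
)

# Duplicated code rows (one per year) are summed
ths_per <- rowSums(yearly)

# One column per value variable, with y defined as 10 * ths_per
valueMatrix <- cbind(y = 10 * ths_per, ths_per = ths_per)

outputMatrix <- dataDummyHierarchy %*% valueMatrix
print(outputMatrix)
```

The rows of outputMatrix correspond to the three output rows above: 6808 and 680.8 for Europe, 106 and 10.6 for nonEU, 6702 and 670.2 for EU.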
