A common interface to Hierarchies2ModelMatrix, Formula2ModelMatrix and HierarchiesAndFormula2ModelMatrix
Usage
ModelMatrix(
data,
hierarchies = NULL,
formula = NULL,
inputInOutput = TRUE,
crossTable = FALSE,
sparse = TRUE,
viaOrdinary = FALSE,
total = "Total",
removeEmpty = !is.null(formula) & is.null(hierarchies),
modelMatrix = NULL,
dimVar = NULL,
select = NULL,
...
)
NamesFromModelMatrixInput(
data = NULL,
hierarchies = NULL,
formula = NULL,
dimVar = NULL,
...
)
Arguments
- data
Matrix or data frame with data containing codes of relevant variables
- hierarchies
List of hierarchies, which can be converted by AutoHierarchies. Thus, the variables can also be coded by "rowFactor" or "", which correspond to using the categories in the data.
- formula
A model formula
- inputInOutput
Logical vector (possibly recycled) for each element of hierarchies. TRUE means that codes from input are included in output. Values corresponding to "rowFactor" or "" are ignored.
- crossTable
Cross table in output when TRUE
- sparse
Sparse matrix in output when TRUE (default)
- viaOrdinary
When TRUE, output is generated by model.matrix or sparse.model.matrix. Since these functions omit a factor level, an empty factor level is first added.
- total
String(s) used to name totals
- removeEmpty
When TRUE, empty columns (only zeros) are not included in output. Default is TRUE with formula input without hierarchy and otherwise FALSE (see details).
- modelMatrix
The model matrix as input (same as output)
- dimVar
The main dimensional variables and additional aggregating variables. This parameter can be useful when hierarchies and formula are unspecified.
- select
Data frame specifying variable combinations for output or a named list specifying code selections for each variable (see details).
- ...
Further arguments to Hierarchies2ModelMatrix, Formula2ModelMatrix or HierarchiesAndFormula2ModelMatrix
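The reason for the empty factor level mentioned under viaOrdinary can be illustrated in base R. This is a minimal sketch, not the package's internal code. It assumes that the added empty level is placed first, so that it becomes the reference level that model.matrix drops under the default treatment contrasts. The column names obtained this way match those in the viaOrdinary examples below.

```r
# Base-R sketch: why an empty factor level is added before model.matrix().
d <- data.frame(age = c("young", "old", "young", "old"))

# Default treatment contrasts: the first level ("old") is the reference
# level and gets no column of its own.
d$age <- factor(d$age)
m1 <- model.matrix(~age, data = d)
colnames(m1)  # "(Intercept)" "ageyoung"

# Assumption for illustration: prepend an empty level "" so that it becomes
# the dropped reference level. Every observed level then gets a column.
d$age <- factor(d$age, levels = c("", levels(d$age)))
m2 <- model.matrix(~age, data = d)
colnames(m2)  # "(Intercept)" "ageold" "ageyoung"
```

Because no row has the empty level, each row of m2 has exactly one 1 among the age columns.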
Details
The default value of removeEmpty corresponds to the default settings of the underlying functions. The functions Hierarchies2ModelMatrix and HierarchiesAndFormula2ModelMatrix have removeEmpty as an explicit parameter with FALSE as default. The function Formula2ModelMatrix is a wrapper for FormulaSums, which has a parameter includeEmpty with FALSE as default. Thus, ModelMatrix makes a call to Formula2ModelMatrix with includeEmpty = !removeEmpty.
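With formula input and no hierarchies, removeEmpty defaults to TRUE, and setting it to FALSE should keep all-zero columns (via includeEmpty = TRUE in Formula2ModelMatrix). The sketch below checks this on a subset of the example data where one age-by-geo cell is deliberately left empty. The subset z2 is constructed here for illustration only.

```r
library(SSBtools)
z <- SSBtoolsData("sp_emp_withEU")

# Remove all "old" rows for Iceland so that the old/Iceland cell is empty
z2 <- z[!(z$age == "old" & z$geo == "Iceland"), ]

# Default for formula input without hierarchies: removeEmpty = TRUE,
# so the empty interaction column is left out
mmDefault <- ModelMatrix(z2, formula = ~age * geo)

# removeEmpty = FALSE is forwarded as includeEmpty = TRUE,
# so the all-zero interaction column is kept
mmAll <- ModelMatrix(z2, formula = ~age * geo, removeEmpty = FALSE)

ncol(mmAll) - ncol(mmDefault)
```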
NamesFromModelMatrixInput returns the names of the data columns involved in creating the model matrix. Note that when dimVar is given as column indices, data must be non-NULL so that the indices can be converted to names.
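A short usage sketch of NamesFromModelMatrixInput, using the argument names from the Usage section. It assumes that a formula alone is enough input (data left NULL) and that numeric dimVar values index the columns of data; the exact order of the returned names is not relied on.

```r
library(SSBtools)
z <- SSBtoolsData("sp_emp_withEU")

# Variables involved according to a formula; data is not needed here
fromFormula <- NamesFromModelMatrixInput(formula = ~age * year + geo)

# dimVar given as column indices: data must be supplied to look up the names
fromDimVar <- NamesFromModelMatrixInput(z, dimVar = c(1, 2))

fromFormula
fromDimVar
```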
The select parameter is forwarded to Hierarchies2ModelMatrix unless removeEmpty = TRUE is combined with select as a data frame. In all other cases, select is handled outside the underlying functions by making selections in the result.
Empty columns can be added to the model matrix when removeEmpty = FALSE (with warning).
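Whichever route is taken, selecting codes with select as a list should give the same columns as computing the full model matrix and keeping the matching columns afterwards. The sketch below compares the two, reusing ageHier and geoDimList as defined in the Examples. The comparison is by column name, so it does not depend on column order.

```r
library(SSBtools)
z <- SSBtoolsData("sp_emp_withEU")
ageHier <- data.frame(mapsFrom = c("young", "old"), mapsTo = "Total", sign = 1)
geoDimList <- FindDimLists(z[, c("geo", "eu")], total = "Europe")[[1]]
hi <- list(age = ageHier, geo = geoDimList)

# Full output and output restricted to two geo codes
full <- ModelMatrix(z, hi, crossTable = TRUE)
sel <- ModelMatrix(z, hi, select = list(geo = c("nonEU", "Portugal")), crossTable = TRUE)

# Columns of the full output whose geo code was selected
keep <- full$crossTable$geo %in% c("nonEU", "Portugal")
fm <- as.matrix(full$modelMatrix)[, keep, drop = FALSE]
sm <- as.matrix(sel$modelMatrix)
```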
Examples
# Create some input
z <- SSBtoolsData("sp_emp_withEU")
ageHier <- data.frame(mapsFrom = c("young", "old"), mapsTo = "Total", sign = 1)
geoDimList <- FindDimLists(z[, c("geo", "eu")], total = "Europe")[[1]]
# Small dataset example. Two dimensions.
s <- z[z$geo == "Spain" & z$year != 2016, ]
rownames(s) <- NULL
s
#> age geo eu year value
#> 1 young Spain EU 2014 66.9
#> 2 old Spain EU 2014 120.3
#> 3 young Spain EU 2015 63.4
#> 4 old Spain EU 2015 119.6
# via Hierarchies2ModelMatrix() and converted to an ordinary matrix (not sparse)
ModelMatrix(s, list(age = ageHier, year = ""), sparse = FALSE)
#> Total:2014 Total:2015 old:2014 old:2015 young:2014 young:2015
#> [1,] 1 0 0 0 1 0
#> [2,] 1 0 1 0 0 0
#> [3,] 0 1 0 0 0 1
#> [4,] 0 1 0 1 0 0
# Hierarchies generated automatically. Then via Hierarchies2ModelMatrix()
ModelMatrix(s[, c(1, 4)])
#> 4 x 9 sparse Matrix of class "dgCMatrix"
#> Total:Total Total:2014 Total:2015 old:Total old:2014 old:2015 young:Total
#> [1,] 1 1 . . . . 1
#> [2,] 1 1 . 1 1 . .
#> [3,] 1 . 1 . . . 1
#> [4,] 1 . 1 1 . 1 .
#> young:2014 young:2015
#> [1,] 1 .
#> [2,] . .
#> [3,] . 1
#> [4,] . .
# via Formula2ModelMatrix()
ModelMatrix(s, formula = ~age + year)
#> 4 x 5 sparse Matrix of class "dgCMatrix"
#> Total-Total old-Total young-Total Total-2014 Total-2015
#> [1,] 1 . 1 1 .
#> [2,] 1 1 . 1 .
#> [3,] 1 . 1 . 1
#> [4,] 1 1 . . 1
# via model.matrix() after adding empty factor levels
ModelMatrix(s, formula = ~age + year, sparse = FALSE, viaOrdinary = TRUE)
#> (Intercept) ageold ageyoung year2014 year2015
#> 1 1 0 1 1 0
#> 2 1 1 0 1 0
#> 3 1 0 1 0 1
#> 4 1 1 0 0 1
#> attr(,"assign")
#> [1] 0 1 1 2 2
#> attr(,"contrasts")
#> attr(,"contrasts")$age
#> [1] "contr.treatment"
#>
#> attr(,"contrasts")$year
#> [1] "contr.treatment"
#>
# via sparse.model.matrix() after adding empty factor levels
ModelMatrix(s, formula = ~age + year, viaOrdinary = TRUE)
#> 4 x 5 sparse Matrix of class "dgCMatrix"
#> (Intercept) ageold ageyoung year2014 year2015
#> 1 1 . 1 1 .
#> 2 1 1 . 1 .
#> 3 1 . 1 . 1
#> 4 1 1 . . 1
# via HierarchiesAndFormula2ModelMatrix() and using different data and parameter settings
ModelMatrix(s, list(age = ageHier, geo = geoDimList, year = ""), formula = ~age * geo + year,
inputInOutput = FALSE, removeEmpty = TRUE, crossTable = TRUE)
#> $modelMatrix
#> 4 x 4 sparse Matrix of class "dgCMatrix"
#> Total-Europe-2014 Total-Europe-2015 Total-Europe-Total Total-EU-Total
#> [1,] 1 . 1 1
#> [2,] 1 . 1 1
#> [3,] . 1 1 1
#> [4,] . 1 1 1
#>
#> $crossTable
#> age geo year
#> 1 Total Europe 2014
#> 2 Total Europe 2015
#> 3 Total Europe Total
#> 4 Total EU Total
#>
ModelMatrix(s, list(age = ageHier, geo = geoDimList, year = ""), formula = ~age * geo + year,
inputInOutput = c(TRUE, FALSE), removeEmpty = FALSE, crossTable = TRUE)
#> $modelMatrix
#> 4 x 11 sparse Matrix of class "dgCMatrix"
#> [[ suppressing 11 column names ‘Total-Europe-2014’, ‘Total-Europe-2015’, ‘Total-Europe-Total’ ... ]]
#>
#> [1,] 1 . 1 1 . . . . 1 1 .
#> [2,] 1 . 1 1 . 1 1 . . . .
#> [3,] . 1 1 1 . . . . 1 1 .
#> [4,] . 1 1 1 . 1 1 . . . .
#>
#> $crossTable
#> age geo year
#> 1 Total Europe 2014
#> 2 Total Europe 2015
#> 3 Total Europe Total
#> 4 Total EU Total
#> 5 Total nonEU Total
#> 6 old Europe Total
#> 7 old EU Total
#> 8 old nonEU Total
#> 9 young Europe Total
#> 10 young EU Total
#> 11 young nonEU Total
#>
ModelMatrix(z, list(age = ageHier, geo = geoDimList, year = ""), formula = ~age * year + geo,
inputInOutput = c(FALSE, TRUE), crossTable = TRUE)
#> $modelMatrix
#> 18 x 10 sparse Matrix of class "dgCMatrix"
#> [[ suppressing 10 column names ‘Total-Total-Europe’, ‘Total-Total-Europe’, ‘Total-Total-EU’ ... ]]
#>
#> [1,] 1 1 1 . . . 1 1 . .
#> [2,] 1 1 . 1 1 . . 1 . .
#> [3,] 1 1 1 . . 1 . 1 . .
#> [4,] 1 1 1 . . . 1 1 . .
#> [5,] 1 1 . 1 1 . . 1 . .
#> [6,] 1 1 1 . . 1 . 1 . .
#> [7,] 1 1 1 . . . 1 . 1 .
#> [8,] 1 1 . 1 1 . . . 1 .
#> [9,] 1 1 1 . . 1 . . 1 .
#> [10,] 1 1 1 . . . 1 . 1 .
#> [11,] 1 1 . 1 1 . . . 1 .
#> [12,] 1 1 1 . . 1 . . 1 .
#> [13,] 1 1 1 . . . 1 . . 1
#> [14,] 1 1 . 1 1 . . . . 1
#> [15,] 1 1 1 . . 1 . . . 1
#> [16,] 1 1 1 . . . 1 . . 1
#> [17,] 1 1 . 1 1 . . . . 1
#> [18,] 1 1 1 . . 1 . . . 1
#>
#> $crossTable
#> age year geo
#> 1 Total Total Europe
#> 2 Total Total Europe
#> 3 Total Total EU
#> 4 Total Total nonEU
#> 5 Total Total Iceland
#> 6 Total Total Portugal
#> 7 Total Total Spain
#> 8 Total 2014 Europe
#> 9 Total 2015 Europe
#> 10 Total 2016 Europe
#>
# via Hierarchies2ModelMatrix() using an unnamed list element. See AutoHierarchies.
colnames(ModelMatrix(z, list(age = ageHier, c(Europe = "geo", Allyears = "year", "eu"))))
#> [1] "Total:Europe:Allyears" "Total:Europe:2014"
#> [3] "Total:Europe:2015" "Total:Europe:2016"
#> [5] "Total:EU:Allyears" "Total:EU:2014"
#> [7] "Total:EU:2015" "Total:EU:2016"
#> [9] "Total:nonEU:Allyears" "Total:nonEU:2014"
#> [11] "Total:nonEU:2015" "Total:nonEU:2016"
#> [13] "Total:Iceland:Allyears" "Total:Iceland:2014"
#> [15] "Total:Iceland:2015" "Total:Iceland:2016"
#> [17] "Total:Portugal:Allyears" "Total:Portugal:2014"
#> [19] "Total:Portugal:2015" "Total:Portugal:2016"
#> [21] "Total:Spain:Allyears" "Total:Spain:2014"
#> [23] "Total:Spain:2015" "Total:Spain:2016"
#> [25] "old:Europe:Allyears" "old:Europe:2014"
#> [27] "old:Europe:2015" "old:Europe:2016"
#> [29] "old:EU:Allyears" "old:EU:2014"
#> [31] "old:EU:2015" "old:EU:2016"
#> [33] "old:nonEU:Allyears" "old:nonEU:2014"
#> [35] "old:nonEU:2015" "old:nonEU:2016"
#> [37] "old:Iceland:Allyears" "old:Iceland:2014"
#> [39] "old:Iceland:2015" "old:Iceland:2016"
#> [41] "old:Portugal:Allyears" "old:Portugal:2014"
#> [43] "old:Portugal:2015" "old:Portugal:2016"
#> [45] "old:Spain:Allyears" "old:Spain:2014"
#> [47] "old:Spain:2015" "old:Spain:2016"
#> [49] "young:Europe:Allyears" "young:Europe:2014"
#> [51] "young:Europe:2015" "young:Europe:2016"
#> [53] "young:EU:Allyears" "young:EU:2014"
#> [55] "young:EU:2015" "young:EU:2016"
#> [57] "young:nonEU:Allyears" "young:nonEU:2014"
#> [59] "young:nonEU:2015" "young:nonEU:2016"
#> [61] "young:Iceland:Allyears" "young:Iceland:2014"
#> [63] "young:Iceland:2015" "young:Iceland:2016"
#> [65] "young:Portugal:Allyears" "young:Portugal:2014"
#> [67] "young:Portugal:2015" "young:Portugal:2016"
#> [69] "young:Spain:Allyears" "young:Spain:2014"
#> [71] "young:Spain:2015" "young:Spain:2016"
colnames(ModelMatrix(z, list(age = ageHier, c("geo", "year", "eu")), total = c("t1", "t2")))
#> [1] "Total:t2:t2" "Total:t2:2014" "Total:t2:2015"
#> [4] "Total:t2:2016" "Total:EU:t2" "Total:EU:2014"
#> [7] "Total:EU:2015" "Total:EU:2016" "Total:nonEU:t2"
#> [10] "Total:nonEU:2014" "Total:nonEU:2015" "Total:nonEU:2016"
#> [13] "Total:Iceland:t2" "Total:Iceland:2014" "Total:Iceland:2015"
#> [16] "Total:Iceland:2016" "Total:Portugal:t2" "Total:Portugal:2014"
#> [19] "Total:Portugal:2015" "Total:Portugal:2016" "Total:Spain:t2"
#> [22] "Total:Spain:2014" "Total:Spain:2015" "Total:Spain:2016"
#> [25] "old:t2:t2" "old:t2:2014" "old:t2:2015"
#> [28] "old:t2:2016" "old:EU:t2" "old:EU:2014"
#> [31] "old:EU:2015" "old:EU:2016" "old:nonEU:t2"
#> [34] "old:nonEU:2014" "old:nonEU:2015" "old:nonEU:2016"
#> [37] "old:Iceland:t2" "old:Iceland:2014" "old:Iceland:2015"
#> [40] "old:Iceland:2016" "old:Portugal:t2" "old:Portugal:2014"
#> [43] "old:Portugal:2015" "old:Portugal:2016" "old:Spain:t2"
#> [46] "old:Spain:2014" "old:Spain:2015" "old:Spain:2016"
#> [49] "young:t2:t2" "young:t2:2014" "young:t2:2015"
#> [52] "young:t2:2016" "young:EU:t2" "young:EU:2014"
#> [55] "young:EU:2015" "young:EU:2016" "young:nonEU:t2"
#> [58] "young:nonEU:2014" "young:nonEU:2015" "young:nonEU:2016"
#> [61] "young:Iceland:t2" "young:Iceland:2014" "young:Iceland:2015"
#> [64] "young:Iceland:2016" "young:Portugal:t2" "young:Portugal:2014"
#> [67] "young:Portugal:2015" "young:Portugal:2016" "young:Spain:t2"
#> [70] "young:Spain:2014" "young:Spain:2015" "young:Spain:2016"
# Example using the select parameter as a data frame
select <- data.frame(age = c("Total", "young", "old"), geo = c("EU", "nonEU", "Spain"))
ModelMatrix(z, list(age = ageHier, geo = geoDimList),
select = select, crossTable = TRUE)$crossTable
#> age geo
#> 1 Total EU
#> 2 young nonEU
#> 3 old Spain
# Examples using the select parameter as a list
ModelMatrix(z, list(age = ageHier, geo = geoDimList), inputInOutput = FALSE,
select = list(geo = c("nonEU", "Portugal")), crossTable = TRUE)$crossTable
#> age geo
#> 1 Total nonEU
#> 2 Total Portugal
ModelMatrix(z, list(age = ageHier, geo = geoDimList),
select = list(geo = c("nonEU", "Portugal"), age = c("Total", "young")),
crossTable = TRUE)$crossTable
#> age geo
#> 1 Total nonEU
#> 2 Total Portugal
#> 3 young nonEU
#> 4 young Portugal
# Using the NAomit parameter available in Formula2ModelMatrix()
s$age[1] <- NA
ModelMatrix(s, formula = ~age + year)
#> 4 x 5 sparse Matrix of class "dgCMatrix"
#> Total-Total old-Total young-Total Total-2014 Total-2015
#> [1,] 1 . . 1 .
#> [2,] 1 1 . 1 .
#> [3,] 1 . 1 . 1
#> [4,] 1 1 . . 1
ModelMatrix(s, formula = ~age + year, NAomit = FALSE)
#> 4 x 6 sparse Matrix of class "dgCMatrix"
#> Total-Total old-Total young-Total NA-Total Total-2014 Total-2015
#> [1,] 1 . . 1 1 .
#> [2,] 1 1 . . 1 .
#> [3,] 1 . 1 . . 1
#> [4,] 1 1 . . . 1