Usage

ModelMatrix(
  data,
  hierarchies = NULL,
  formula = NULL,
  inputInOutput = TRUE,
  crossTable = FALSE,
  sparse = TRUE,
  viaOrdinary = FALSE,
  total = "Total",
  removeEmpty = !is.null(formula) & is.null(hierarchies),
  modelMatrix = NULL,
  dimVar = NULL,
  select = NULL,
  ...
)

NamesFromModelMatrixInput(
  data = NULL,
  hierarchies = NULL,
  formula = NULL,
  dimVar = NULL,
  ...
)

Arguments

data

Matrix or data frame with data containing codes of relevant variables

hierarchies

List of hierarchies, which can be converted by AutoHierarchies. A variable can therefore also be coded as "rowFactor" or "", which means that the categories found in the data are used.

formula

A model formula

inputInOutput

Logical vector (possibly recycled) for each element of hierarchies. TRUE means that codes from input are included in output. Values corresponding to "rowFactor" or "" are ignored.

crossTable

Cross table in output when TRUE

sparse

Sparse matrix in output when TRUE (default)

viaOrdinary

When TRUE, output is generated by model.matrix or sparse.model.matrix. Since these functions omit a factor level, an empty factor level is first added.
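
The effect of adding an empty factor level can be illustrated with base R alone. The sketch below is an assumption about the general idea, not the package's internal code: the helper `add_empty_level` and the placeholder name `"_empty_"` are hypothetical. With treatment contrasts, model.matrix drops the first level of each factor, so placing an unused level first means every real category keeps its own column.

```r
# Toy data mirroring the small example (two ages, two years)
d <- data.frame(age  = c("young", "old", "young", "old"),
                year = c("2014", "2014", "2015", "2015"))

# Ordinary treatment coding omits the first level of each factor
plain <- model.matrix(~age + year, data = d)
colnames(plain)
# "(Intercept)" "ageyoung" "year2015"

# Hypothetical helper: put an unused placeholder level first, so that
# the omitted level is the empty one rather than a real category
add_empty_level <- function(x, empty = "_empty_") {
  factor(x, levels = c(empty, sort(unique(x))))
}

d2 <- data.frame(age  = add_empty_level(d$age),
                 year = add_empty_level(d$year))
full <- model.matrix(~age + year, data = d2)
colnames(full)
# "(Intercept)" "ageold" "ageyoung" "year2014" "year2015"
```

The column names of `full` match those shown for the viaOrdinary examples in the Examples section.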

total

String(s) used to name totals

removeEmpty

When TRUE, empty columns (only zeros) are not included in output. The default is TRUE when formula is given without hierarchies, and FALSE otherwise (see details).

modelMatrix

The model matrix as input (same as output)

dimVar

The main dimensional variables and additional aggregating variables. This parameter can be useful when hierarchies and formula are unspecified.

select

Data frame specifying variable combinations for output or a named list specifying code selections for each variable (see details).

...

Further arguments to Hierarchies2ModelMatrix, Formula2ModelMatrix or HierarchiesAndFormula2ModelMatrix

Value

A (sparse) model matrix or a list of two elements (model matrix and cross table)
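
A typical use of a matrix of this kind is to compute aggregates as a cross product with a value vector. The sketch below assumes that use. It does not call ModelMatrix but hand-copies the 4 x 5 matrix and the `value` column from the formula example in the Examples section into an ordinary dense matrix, so it runs with base R only.

```r
# Values of the four rows in the small Spain example
value <- c(66.9, 120.3, 63.4, 119.6)

# Dense copy of the model matrix printed for formula = ~age + year
x <- matrix(c(1, 0, 1, 1, 0,
              1, 1, 0, 1, 0,
              1, 0, 1, 0, 1,
              1, 1, 0, 0, 1),
            nrow = 4, byrow = TRUE,
            dimnames = list(NULL, c("Total-Total", "old-Total", "young-Total",
                                    "Total-2014", "Total-2015")))

# Each column of x marks the input rows contributing to one output cell,
# so t(x) %*% value gives one aggregate per column
agg <- crossprod(x, value)
agg[, 1]
# Total-Total is 370.2 (all rows), old-Total 239.9, young-Total 130.3,
# Total-2014 187.2 and Total-2015 183.0
```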

Details

The default value of removeEmpty corresponds to the default settings of the underlying functions. The functions Hierarchies2ModelMatrix and HierarchiesAndFormula2ModelMatrix have removeEmpty as an explicit parameter with FALSE as default. The function Formula2ModelMatrix is a wrapper for FormulaSums, which has a parameter includeEmpty with FALSE as default. Thus, ModelMatrix makes a call to Formula2ModelMatrix with includeEmpty = !removeEmpty.
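
The default expression from the Usage section can be evaluated in isolation to see which input combinations give TRUE. The function name `default_remove_empty` is hypothetical and exists only for this illustration; its body is the default expression copied from the signature.

```r
# Hypothetical stand-alone copy of the default expression for removeEmpty
default_remove_empty <- function(hierarchies = NULL, formula = NULL) {
  !is.null(formula) & is.null(hierarchies)
}

default_remove_empty(formula = ~age + year)                  # TRUE: formula only
default_remove_empty(hierarchies = list(age = "rowFactor"))  # FALSE: hierarchies only
default_remove_empty(list(age = "rowFactor"), ~age + year)   # FALSE: both given
default_remove_empty()                                       # FALSE: neither given
```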

NamesFromModelMatrixInput returns the names of the data columns involved in creating the model matrix. Note that data must be non-NULL for dimVar given as indices to be converted to names.

The select parameter is forwarded to Hierarchies2ModelMatrix unless removeEmpty = TRUE is combined with select as a data frame. In all other cases, select is handled outside the underlying functions by making selections in the result. Empty columns can be added to the model matrix when removeEmpty = FALSE (with warning).
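
The two forms of select can be pictured as two ways of picking rows of a cross table (and hence columns of the model matrix). The base R sketch below is only an illustration of that idea under assumed toy data, not the package's implementation. A named list keeps every combination whose codes are listed, while a data frame keeps exactly the listed combinations.

```r
# Toy cross table: all combinations of three age codes and three geo codes
crossTable <- expand.grid(age = c("Total", "old", "young"),
                          geo = c("EU", "nonEU", "Spain"),
                          stringsAsFactors = FALSE)

# select as a named list: keep rows whose code is listed for each named variable
sel_list <- list(geo = c("nonEU", "Spain"))
keep <- Reduce(`&`, lapply(names(sel_list), function(v) {
  crossTable[[v]] %in% sel_list[[v]]
}))
crossTable[keep, ]   # 6 rows: every age code combined with nonEU and Spain

# select as a data frame: keep exactly the listed combinations, in that order
sel_df <- data.frame(age = c("Total", "young"), geo = c("EU", "nonEU"))
key <- function(d) paste(d$age, d$geo, sep = ":")
idx <- match(key(sel_df), key(crossTable))
crossTable[idx, ]    # 2 rows: Total/EU and young/nonEU
```

In this picture, a combination in `sel_df` that is absent from `crossTable` would give NA in `idx`, which is the situation where the text above says an empty column can be added.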

Author

Øyvind Langsrud

Examples

# Create some input
z <- SSBtoolsData("sp_emp_withEU")
ageHier <- data.frame(mapsFrom = c("young", "old"), mapsTo = "Total", sign = 1)
geoDimList <- FindDimLists(z[, c("geo", "eu")], total = "Europe")[[1]]

# Small dataset example. Two dimensions.
s <- z[z$geo == "Spain" & z$year != 2016, ]
rownames(s) <- NULL
s
#>     age   geo eu year value
#> 1 young Spain EU 2014  66.9
#> 2   old Spain EU 2014 120.3
#> 3 young Spain EU 2015  63.4
#> 4   old Spain EU 2015 119.6

# via Hierarchies2ModelMatrix() and converted to ordinary matrix (not sparse)
ModelMatrix(s, list(age = ageHier, year = ""), sparse = FALSE)
#>      Total:2014 Total:2015 old:2014 old:2015 young:2014 young:2015
#> [1,]          1          0        0        0          1          0
#> [2,]          1          0        1        0          0          0
#> [3,]          0          1        0        0          0          1
#> [4,]          0          1        0        1          0          0

# Hierarchies generated automatically. Then via Hierarchies2ModelMatrix()
ModelMatrix(s[, c(1, 4)])
#> 4 x 9 sparse Matrix of class "dgCMatrix"
#>      Total:Total Total:2014 Total:2015 old:Total old:2014 old:2015 young:Total
#> [1,]           1          1          .         .        .        .           1
#> [2,]           1          1          .         1        1        .           .
#> [3,]           1          .          1         .        .        .           1
#> [4,]           1          .          1         1        .        1           .
#>      young:2014 young:2015
#> [1,]          1          .
#> [2,]          .          .
#> [3,]          .          1
#> [4,]          .          .

# via Formula2ModelMatrix()
ModelMatrix(s, formula = ~age + year)
#> 4 x 5 sparse Matrix of class "dgCMatrix"
#>      Total-Total old-Total young-Total Total-2014 Total-2015
#> [1,]           1         .           1          1          .
#> [2,]           1         1           .          1          .
#> [3,]           1         .           1          .          1
#> [4,]           1         1           .          .          1

# via model.matrix() after adding empty factor levels
ModelMatrix(s, formula = ~age + year, sparse = FALSE, viaOrdinary = TRUE)
#>   (Intercept) ageold ageyoung year2014 year2015
#> 1           1      0        1        1        0
#> 2           1      1        0        1        0
#> 3           1      0        1        0        1
#> 4           1      1        0        0        1
#> attr(,"assign")
#> [1] 0 1 1 2 2
#> attr(,"contrasts")
#> attr(,"contrasts")$age
#> [1] "contr.treatment"
#> 
#> attr(,"contrasts")$year
#> [1] "contr.treatment"
#> 

# via sparse.model.matrix() after adding empty factor levels
ModelMatrix(s, formula = ~age + year, viaOrdinary = TRUE)
#> 4 x 5 sparse Matrix of class "dgCMatrix"
#>   (Intercept) ageold ageyoung year2014 year2015
#> 1           1      .        1        1        .
#> 2           1      1        .        1        .
#> 3           1      .        1        .        1
#> 4           1      1        .        .        1

# via HierarchiesAndFormula2ModelMatrix() and using different data and parameter settings
ModelMatrix(s, list(age = ageHier, geo = geoDimList, year = ""), formula = ~age * geo + year, 
            inputInOutput = FALSE, removeEmpty = TRUE, crossTable = TRUE)
#> $modelMatrix
#> 4 x 4 sparse Matrix of class "dgCMatrix"
#>      Total-Europe-2014 Total-Europe-2015 Total-Europe-Total Total-EU-Total
#> [1,]                 1                 .                  1              1
#> [2,]                 1                 .                  1              1
#> [3,]                 .                 1                  1              1
#> [4,]                 .                 1                  1              1
#> 
#> $crossTable
#>     age    geo  year
#> 1 Total Europe  2014
#> 2 Total Europe  2015
#> 3 Total Europe Total
#> 4 Total     EU Total
#> 
ModelMatrix(s, list(age = ageHier, geo = geoDimList, year = ""), formula = ~age * geo + year, 
            inputInOutput = c(TRUE, FALSE), removeEmpty = FALSE, crossTable = TRUE)
#> $modelMatrix
#> 4 x 11 sparse Matrix of class "dgCMatrix"
#>   [[ suppressing 11 column names ‘Total-Europe-2014’, ‘Total-Europe-2015’, ‘Total-Europe-Total’ ... ]]
#>                           
#> [1,] 1 . 1 1 . . . . 1 1 .
#> [2,] 1 . 1 1 . 1 1 . . . .
#> [3,] . 1 1 1 . . . . 1 1 .
#> [4,] . 1 1 1 . 1 1 . . . .
#> 
#> $crossTable
#>      age    geo  year
#> 1  Total Europe  2014
#> 2  Total Europe  2015
#> 3  Total Europe Total
#> 4  Total     EU Total
#> 5  Total  nonEU Total
#> 6    old Europe Total
#> 7    old     EU Total
#> 8    old  nonEU Total
#> 9  young Europe Total
#> 10 young     EU Total
#> 11 young  nonEU Total
#> 
ModelMatrix(z, list(age = ageHier, geo = geoDimList, year = ""), formula = ~age * year + geo, 
            inputInOutput = c(FALSE, TRUE), crossTable = TRUE)
#> $modelMatrix
#> 18 x 10 sparse Matrix of class "dgCMatrix"
#>   [[ suppressing 10 column names ‘Total-Total-Europe’, ‘Total-Total-Europe’, ‘Total-Total-EU’ ... ]]
#>                          
#>  [1,] 1 1 1 . . . 1 1 . .
#>  [2,] 1 1 . 1 1 . . 1 . .
#>  [3,] 1 1 1 . . 1 . 1 . .
#>  [4,] 1 1 1 . . . 1 1 . .
#>  [5,] 1 1 . 1 1 . . 1 . .
#>  [6,] 1 1 1 . . 1 . 1 . .
#>  [7,] 1 1 1 . . . 1 . 1 .
#>  [8,] 1 1 . 1 1 . . . 1 .
#>  [9,] 1 1 1 . . 1 . . 1 .
#> [10,] 1 1 1 . . . 1 . 1 .
#> [11,] 1 1 . 1 1 . . . 1 .
#> [12,] 1 1 1 . . 1 . . 1 .
#> [13,] 1 1 1 . . . 1 . . 1
#> [14,] 1 1 . 1 1 . . . . 1
#> [15,] 1 1 1 . . 1 . . . 1
#> [16,] 1 1 1 . . . 1 . . 1
#> [17,] 1 1 . 1 1 . . . . 1
#> [18,] 1 1 1 . . 1 . . . 1
#> 
#> $crossTable
#>      age  year      geo
#> 1  Total Total   Europe
#> 2  Total Total   Europe
#> 3  Total Total       EU
#> 4  Total Total    nonEU
#> 5  Total Total  Iceland
#> 6  Total Total Portugal
#> 7  Total Total    Spain
#> 8  Total  2014   Europe
#> 9  Total  2015   Europe
#> 10 Total  2016   Europe
#> 
            
# via Hierarchies2ModelMatrix() using unnamed list element. See AutoHierarchies.             
colnames(ModelMatrix(z, list(age = ageHier, c(Europe = "geo", Allyears = "year", "eu"))))
#>  [1] "Total:Europe:Allyears"   "Total:Europe:2014"      
#>  [3] "Total:Europe:2015"       "Total:Europe:2016"      
#>  [5] "Total:EU:Allyears"       "Total:EU:2014"          
#>  [7] "Total:EU:2015"           "Total:EU:2016"          
#>  [9] "Total:nonEU:Allyears"    "Total:nonEU:2014"       
#> [11] "Total:nonEU:2015"        "Total:nonEU:2016"       
#> [13] "Total:Iceland:Allyears"  "Total:Iceland:2014"     
#> [15] "Total:Iceland:2015"      "Total:Iceland:2016"     
#> [17] "Total:Portugal:Allyears" "Total:Portugal:2014"    
#> [19] "Total:Portugal:2015"     "Total:Portugal:2016"    
#> [21] "Total:Spain:Allyears"    "Total:Spain:2014"       
#> [23] "Total:Spain:2015"        "Total:Spain:2016"       
#> [25] "old:Europe:Allyears"     "old:Europe:2014"        
#> [27] "old:Europe:2015"         "old:Europe:2016"        
#> [29] "old:EU:Allyears"         "old:EU:2014"            
#> [31] "old:EU:2015"             "old:EU:2016"            
#> [33] "old:nonEU:Allyears"      "old:nonEU:2014"         
#> [35] "old:nonEU:2015"          "old:nonEU:2016"         
#> [37] "old:Iceland:Allyears"    "old:Iceland:2014"       
#> [39] "old:Iceland:2015"        "old:Iceland:2016"       
#> [41] "old:Portugal:Allyears"   "old:Portugal:2014"      
#> [43] "old:Portugal:2015"       "old:Portugal:2016"      
#> [45] "old:Spain:Allyears"      "old:Spain:2014"         
#> [47] "old:Spain:2015"          "old:Spain:2016"         
#> [49] "young:Europe:Allyears"   "young:Europe:2014"      
#> [51] "young:Europe:2015"       "young:Europe:2016"      
#> [53] "young:EU:Allyears"       "young:EU:2014"          
#> [55] "young:EU:2015"           "young:EU:2016"          
#> [57] "young:nonEU:Allyears"    "young:nonEU:2014"       
#> [59] "young:nonEU:2015"        "young:nonEU:2016"       
#> [61] "young:Iceland:Allyears"  "young:Iceland:2014"     
#> [63] "young:Iceland:2015"      "young:Iceland:2016"     
#> [65] "young:Portugal:Allyears" "young:Portugal:2014"    
#> [67] "young:Portugal:2015"     "young:Portugal:2016"    
#> [69] "young:Spain:Allyears"    "young:Spain:2014"       
#> [71] "young:Spain:2015"        "young:Spain:2016"       
colnames(ModelMatrix(z, list(age = ageHier, c("geo", "year", "eu")), total = c("t1", "t2")))
#>  [1] "Total:t2:t2"         "Total:t2:2014"       "Total:t2:2015"      
#>  [4] "Total:t2:2016"       "Total:EU:t2"         "Total:EU:2014"      
#>  [7] "Total:EU:2015"       "Total:EU:2016"       "Total:nonEU:t2"     
#> [10] "Total:nonEU:2014"    "Total:nonEU:2015"    "Total:nonEU:2016"   
#> [13] "Total:Iceland:t2"    "Total:Iceland:2014"  "Total:Iceland:2015" 
#> [16] "Total:Iceland:2016"  "Total:Portugal:t2"   "Total:Portugal:2014"
#> [19] "Total:Portugal:2015" "Total:Portugal:2016" "Total:Spain:t2"     
#> [22] "Total:Spain:2014"    "Total:Spain:2015"    "Total:Spain:2016"   
#> [25] "old:t2:t2"           "old:t2:2014"         "old:t2:2015"        
#> [28] "old:t2:2016"         "old:EU:t2"           "old:EU:2014"        
#> [31] "old:EU:2015"         "old:EU:2016"         "old:nonEU:t2"       
#> [34] "old:nonEU:2014"      "old:nonEU:2015"      "old:nonEU:2016"     
#> [37] "old:Iceland:t2"      "old:Iceland:2014"    "old:Iceland:2015"   
#> [40] "old:Iceland:2016"    "old:Portugal:t2"     "old:Portugal:2014"  
#> [43] "old:Portugal:2015"   "old:Portugal:2016"   "old:Spain:t2"       
#> [46] "old:Spain:2014"      "old:Spain:2015"      "old:Spain:2016"     
#> [49] "young:t2:t2"         "young:t2:2014"       "young:t2:2015"      
#> [52] "young:t2:2016"       "young:EU:t2"         "young:EU:2014"      
#> [55] "young:EU:2015"       "young:EU:2016"       "young:nonEU:t2"     
#> [58] "young:nonEU:2014"    "young:nonEU:2015"    "young:nonEU:2016"   
#> [61] "young:Iceland:t2"    "young:Iceland:2014"  "young:Iceland:2015" 
#> [64] "young:Iceland:2016"  "young:Portugal:t2"   "young:Portugal:2014"
#> [67] "young:Portugal:2015" "young:Portugal:2016" "young:Spain:t2"     
#> [70] "young:Spain:2014"    "young:Spain:2015"    "young:Spain:2016"   

# Example using the select parameter as a data frame
select <- data.frame(age = c("Total", "young", "old"), geo = c("EU", "nonEU", "Spain"))
ModelMatrix(z, list(age = ageHier, geo = geoDimList), 
            select = select, crossTable = TRUE)$crossTable
#>     age   geo
#> 1 Total    EU
#> 2 young nonEU
#> 3   old Spain
            
# Examples using the select parameter as a list
ModelMatrix(z, list(age = ageHier, geo = geoDimList), inputInOutput = FALSE, 
            select = list(geo = c("nonEU", "Portugal")), crossTable = TRUE)$crossTable
#>     age      geo
#> 1 Total    nonEU
#> 2 Total Portugal
ModelMatrix(z, list(age = ageHier, geo = geoDimList), 
            select = list(geo = c("nonEU", "Portugal"), age = c("Total", "young")), 
            crossTable = TRUE)$crossTable
#>     age      geo
#> 1 Total    nonEU
#> 2 Total Portugal
#> 3 young    nonEU
#> 4 young Portugal

# Using the NAomit parameter available in Formula2ModelMatrix()
s$age[1] <- NA
ModelMatrix(s, formula = ~age + year)
#> 4 x 5 sparse Matrix of class "dgCMatrix"
#>      Total-Total old-Total young-Total Total-2014 Total-2015
#> [1,]           1         .           .          1          .
#> [2,]           1         1           .          1          .
#> [3,]           1         .           1          .          1
#> [4,]           1         1           .          .          1
ModelMatrix(s, formula = ~age + year, NAomit = FALSE)
#> 4 x 6 sparse Matrix of class "dgCMatrix"
#>      Total-Total old-Total young-Total NA-Total Total-2014 Total-2015
#> [1,]           1         .           .        1          1          .
#> [2,]           1         1           .        .          1          .
#> [3,]           1         .           1        .          .          1
#> [4,]           1         1           .        .          .          1