Introduction to PxWebApiData
Øyvind Langsrud, Jan Bruusgaard, Solveig Bjørkholt and Susie Jentoft
2024-11-02
Preface
An introduction to the R package PxWebApiData is given below. Six calls to the main function, ApiData, are demonstrated.
First, two calls for reading datasets are shown. The third call captures metadata, although in practice one may want to look at the metadata first. Then three more examples and some background are given.
Specification by variable indices and variable ids
The dataset below has three variables: Region, ContentsCode and Tid. The variables can be used as input parameters. Here, two of the parameters are specified by value ids and one parameter is specified by indices. Negative values specify reversed indices, counting from the end (-1 is the last value). Thus, we obtain the first two and the last two years in the data.
A list of two data frames is returned: the label version and the id version.
ApiData("http://data.ssb.no/api/v0/en/table/04861",
Region = c("1103", "0301"), ContentsCode = "Bosatte", Tid = c(1, 2, -2, -1))
$`04861: Area and population of urban settlements, by region, contents and year`
region contents year value
1 Oslo municipality Number of residents 2000 504348
2 Oslo municipality Number of residents 2002 508134
3 Oslo municipality Number of residents 2023 705945
4 Oslo municipality Number of residents 2024 714630
5 Stavanger Number of residents 2000 106804
6 Stavanger Number of residents 2002 108271
7 Stavanger Number of residents 2023 140012
8 Stavanger Number of residents 2024 142897
$dataset
Region ContentsCode Tid value
1 0301 Bosatte 2000 504348
2 0301 Bosatte 2002 508134
3 0301 Bosatte 2023 705945
4 0301 Bosatte 2024 714630
5 1103 Bosatte 2000 106804
6 1103 Bosatte 2002 108271
7 1103 Bosatte 2023 140012
8 1103 Bosatte 2024 142897
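The index rule can be illustrated with a small sketch. The helper resolve_index() below is hypothetical and not part of PxWebApiData; it assumes, as the output above suggests, that positive indices count from the start of the value list and negative indices from its end. The year vector is an illustrative excerpt, not the full Tid list.

```r
# Hypothetical helper, not part of PxWebApiData:
# positive indices count from the start, negative from the end (-1 = last).
resolve_index <- function(values, idx) {
  n <- length(values)
  pos <- ifelse(idx > 0, idx, n + idx + 1)  # -1 -> n, -2 -> n - 1
  values[pos]
}

# Illustrative excerpt of year values
years <- c("2000", "2002", "2003", "2004", "2022", "2023", "2024")
resolve_index(years, c(1, 2, -2, -1))
# returns "2000" "2002" "2023" "2024"
```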
To return a single dataset with only labels, use the function ApiData1. The function ApiData2 returns only ids. To return a dataset with both labels and ids in one data frame, use ApiData12.
ApiData12("http://data.ssb.no/api/v0/en/table/04861",
Region = c("1103", "0301"), ContentsCode = "Bosatte", Tid = c(1, 2, -2, -1))
region contents year Region ContentsCode Tid value
1 Oslo municipality Number of residents 2000 0301 Bosatte 2000 504348
2 Oslo municipality Number of residents 2002 0301 Bosatte 2002 508134
3 Oslo municipality Number of residents 2023 0301 Bosatte 2023 705945
4 Oslo municipality Number of residents 2024 0301 Bosatte 2024 714630
5 Stavanger Number of residents 2000 1103 Bosatte 2000 106804
6 Stavanger Number of residents 2002 1103 Bosatte 2002 108271
7 Stavanger Number of residents 2023 1103 Bosatte 2023 140012
8 Stavanger Number of residents 2024 1103 Bosatte 2024 142897
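The following sketch is an illustration, not the package's implementation: the combined layout above corresponds to binding the label version and the id version column-wise while keeping a single value column. The two small data frames are built from rows of the output above.

```r
# Two rows copied from the label version and the id version above
labels <- data.frame(region = c("Oslo municipality", "Stavanger"),
                     contents = "Number of residents",
                     year = "2024",
                     value = c(714630, 142897),
                     stringsAsFactors = FALSE)
ids <- data.frame(Region = c("0301", "1103"),
                  ContentsCode = "Bosatte",
                  Tid = "2024",
                  value = c(714630, 142897),
                  stringsAsFactors = FALSE)

# Drop the duplicated value column from the label version, then bind
combined <- cbind(labels[names(labels) != "value"], ids)
names(combined)
# "region" "contents" "year" "Region" "ContentsCode" "Tid" "value"
```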
Specification by TRUE, FALSE and imaginary values (e.g. 3i).
All possible values are obtained by TRUE, which corresponds to the filter "all": "*" in the API query. A variable is eliminated by FALSE. An imaginary value corresponds to the filter "top" in the API query; below, Tid = 3i gives the three most recent years.
x <- ApiData("http://data.ssb.no/api/v0/en/table/04861",
Region = FALSE, ContentsCode = TRUE, Tid = 3i)
To show either the label version or the id version:
x[[1]]
contents year value
1 Area of urban settlements (km²) 2022 2250.94
2 Area of urban settlements (km²) 2023 2266.99
3 Area of urban settlements (km²) 2024 2279.97
4 Number of residents 2022 4485236.00
5 Number of residents 2023 4554562.00
6 Number of residents 2024 4619969.00
x[[2]]
ContentsCode Tid value
1 Areal 2022 2250.94
2 Areal 2023 2266.99
3 Areal 2024 2279.97
4 Bosatte 2022 4485236.00
5 Bosatte 2023 4554562.00
6 Bosatte 2024 4619969.00
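The rules above can be summarised in a small sketch. The function to_selection() is hypothetical and not PxWebApiData's internal code; it only mirrors the mapping described in this section for TRUE, FALSE, imaginary values and character ids, and leaves numeric indices out.

```r
# Hypothetical sketch, not PxWebApiData's internal code:
# maps an R value to a PxWebApi selection as described above.
to_selection <- function(x) {
  if (isTRUE(x)) {
    list(filter = "all", values = "*")                  # TRUE -> all values
  } else if (isFALSE(x)) {
    NULL                                                # FALSE -> eliminated
  } else if (is.complex(x)) {
    list(filter = "top", values = as.character(Im(x)))  # 3i -> top 3
  } else if (is.character(x)) {
    list(filter = "item", values = x)                   # explicit ids
  } else {
    stop("numeric indices are not covered by this sketch")
  }
}

str(to_selection(3i))
str(to_selection(c("0301", "1103")))
```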
Show additional information
The function comment lists additional dataset information: title, latest update, source, table id and contents.
comment(x)
label
"04861: Area and population of urban settlements, by contents and year"
source
"Statistics Norway"
updated
"2024-10-01T06:00:00Z"
tableid
"04861"
contents
"04861: Area and population of urban settlements,"
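comment() is a base R function that reads the "comment" attribute, which R does not display when the object itself is printed. A single field can be picked out by name. In the sketch below, the values are copied from the output above and attached to a placeholder data frame, since no live query is assumed; the timestamp format is assumed to stay as shown.

```r
# Placeholder object carrying a comment like the one shown above
x <- data.frame(value = 1)
comment(x) <- c(label = "04861: Area and population of urban settlements, by contents and year",
                source = "Statistics Norway",
                updated = "2024-10-01T06:00:00Z")

info <- comment(x)
info[["source"]]

# Parse the ISO 8601 timestamp (assumed to always end in "Z", i.e. UTC)
updated <- as.POSIXct(info[["updated"]], format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
format(updated, "%Y-%m-%d")
```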
Obtaining meta data
Meta information about the dataset can be obtained by returnMetaFrames = TRUE.
ApiData("http://data.ssb.no/api/v0/en/table/04861", returnMetaFrames = TRUE)
$Region
values valueTexts
1 3101 Halden
2 3103 Moss
3 3105 Sarpsborg
4 3107 Fredrikstad
5 3110 Hvaler
6 3112 Råde
7 3114 Våler (Østfold)
8 3116 Skiptvet
9 3118 Indre Østfold
10 3120 Rakkestad
11 3122 Marker
12 3124 Aremark
13 3201 Bærum
14 3203 Asker
15 3205 Lillestrøm
16 3207 Nordre Follo
17 3209 Ullensaker
18 3212 Nesodden
19 3214 Frogn
20 3216 Vestby
21 3218 Ås
22 3220 Enebakk
[ reached 'max' / getOption("max.print") -- omitted 914 rows ]
$ContentsCode
values valueTexts
1 Areal Area of urban settlements (km²)
2 Bosatte Number of residents
$Tid
values valueTexts
1 2000 2000
2 2002 2002
3 2003 2003
4 2004 2004
5 2005 2005
6 2006 2006
7 2007 2007
8 2008 2008
9 2009 2009
10 2011 2011
11 2012 2012
12 2013 2013
13 2014 2014
14 2015 2015
15 2016 2016
16 2017 2017
17 2018 2018
18 2019 2019
19 2020 2020
20 2021 2021
21 2022 2022
22 2023 2023
[ reached 'max' / getOption("max.print") -- omitted 1 rows ]
attr(,"text")
Region ContentsCode Tid
"region" "contents" "year"
attr(,"elimination")
Region ContentsCode Tid
TRUE FALSE FALSE
attr(,"time")
Region ContentsCode Tid
FALSE FALSE TRUE
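The attributes can also be used programmatically. The sketch below builds a mock object with the same structure as the output above (two rows per variable, attribute values copied from that output). Judging from the earlier example where Region = FALSE was accepted, the "elimination" attribute appears to mark the variables that may be eliminated; treat this reading as an assumption.

```r
# Helper to build one metadata frame like those printed above
mf <- function(v, t) data.frame(values = v, valueTexts = t, stringsAsFactors = FALSE)

# Mock object: only two rows per variable are kept
meta <- list(
  Region = mf(c("3101", "3103"), c("Halden", "Moss")),
  ContentsCode = mf(c("Areal", "Bosatte"),
                    c("Area of urban settlements (km\u00b2)", "Number of residents")),
  Tid = mf(c("2000", "2002"), c("2000", "2002"))
)
attr(meta, "text") <- c(Region = "region", ContentsCode = "contents", Tid = "year")
attr(meta, "elimination") <- c(Region = TRUE, ContentsCode = FALSE, Tid = FALSE)
attr(meta, "time") <- c(Region = FALSE, ContentsCode = FALSE, Tid = TRUE)

names(which(attr(meta, "elimination")))  # variables that may be set to FALSE
names(which(attr(meta, "time")))         # the time variable
meta$ContentsCode$values                 # ids that can be used in a query
```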
Aggregations using filter agg:
PxWebApi offers two more filters for groupings, agg: and vs:. You can see these filters in the code “API Query for this table” when you have made a table in PxWeb.
agg: is used for readymade aggregation groupings. This example shows the use of aggregation in age groups and aggregated time series for the new Norwegian municipality structure from 2020. Also note the URL, where /en is replaced by /no. This returns labels in Norwegian instead of English.
ApiData("http://data.ssb.no/api/v0/no/table/07459",
Region = list("agg:KommSummer", c("K-3101", "K-3103")),
Tid = 4i,
Alder = list("agg:TodeltGrupperingB", c("H17", "H18")),
Kjonn = TRUE)
$`07459: Befolkning, etter region, kjønn, alder, statistikkvariabel og år`
region kjønn alder statistikkvariabel år value
1 Halden Kvinner 0-17 år Personer 2021 2978
2 Halden Kvinner 0-17 år Personer 2022 2937
3 Halden Kvinner 0-17 år Personer 2023 2933
4 Halden Kvinner 0-17 år Personer 2024 2944
5 Halden Kvinner 18 år eller eldre Personer 2021 12587
6 Halden Kvinner 18 år eller eldre Personer 2022 12626
7 Halden Kvinner 18 år eller eldre Personer 2023 12787
[ reached 'max' / getOption("max.print") -- omitted 25 rows ]
$dataset
Region Kjonn Alder ContentsCode Tid value
1 K-3101 2 H17 Personer1 2021 2978
2 K-3101 2 H17 Personer1 2022 2937
3 K-3101 2 H17 Personer1 2023 2933
4 K-3101 2 H17 Personer1 2024 2944
5 K-3101 2 H18 Personer1 2021 12587
6 K-3101 2 H18 Personer1 2022 12626
7 K-3101 2 H18 Personer1 2023 12787
[ reached 'max' / getOption("max.print") -- omitted 25 rows ]
There are two limitations in PxWebApi when using these filters:
- The name of the filter and the ids are not shown in the metadata, only in the code “API Query for this table”.
- The filters agg: and vs: can only take single elements as input. The filter "all": "*", i.e. TRUE, does not work with agg: and vs:.
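The limitations above can be checked before a query is sent. The helper check_grouping() is hypothetical and not part of PxWebApiData; it assumes a grouping is given as a two-element list, a filter name followed by a character vector of ids, as in the agg: example.

```r
# Hypothetical helper, not part of PxWebApiData: checks an agg:/vs:
# specification against the limitations listed above.
check_grouping <- function(spec) {
  stopifnot(is.list(spec), length(spec) == 2)
  filter <- spec[[1]]
  values <- spec[[2]]
  if (!is.character(filter) || !grepl("^(agg|vs):", filter))
    stop("filter must start with 'agg:' or 'vs:'")
  if (!is.character(values) || length(values) == 0)
    stop("agg:/vs: need explicit ids; TRUE (\"all\": \"*\") is not supported")
  invisible(TRUE)
}

check_grouping(list("agg:KommSummer", c("K-3101", "K-3103")))  # passes
try(check_grouping(list("agg:KommSummer", TRUE)))               # error
```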
The other filter, vs:, specifies grouping value sets, which are part of the value pool. As only single elements can be given as input, it is easier to query the value pool directly. This means that vs: is redundant.
For example, if Region is the value pool and Fylker (counties) is the value set, vs:Fylker is redundant: querying the county codes through vs:Fylker and querying the same codes directly in Region return the same data.
Return the API query as JSON
In PxWebApi, the original query is formulated as JSON. The parameter returnApiQuery returns this JSON query, which is useful for debugging.
ApiData("http://data.ssb.no/api/v0/en/table/04861", returnApiQuery = TRUE)
{
"query": [
{
"code": "Region",
"selection": {
"filter": "item",
"values": ["3101", "2399", "9999"]
}
},
{
"code": "ContentsCode",
"selection": {
"filter": "item",
"values": ["Areal", "Bosatte"]
}
},
{
"code": "Tid",
"selection": {
"filter": "item",
"values": ["2000", "2023", "2024"]
}
}
],
"response": {
"format": "json-stat2"
}
}
To convert an original JSON API query to a PxWebApiData call, there is also a simple webpage, PxWebApiData call creator.
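To see how the parts of such a query fit together, the base R sketch below writes query elements in the layout shown by returnApiQuery = TRUE. selection_json() is a hypothetical helper; ApiData builds the JSON itself, and the sketch does not escape quotes inside ids.

```r
# Hypothetical helper: one element of the "query" array as a JSON string
selection_json <- function(code, filter, values) {
  vals <- paste0('"', values, '"', collapse = ", ")  # ids assumed to contain no quotes
  sprintf('{"code": "%s", "selection": {"filter": "%s", "values": [%s]}}',
          code, filter, vals)
}

q <- c(selection_json("ContentsCode", "item", c("Areal", "Bosatte")),
       selection_json("Tid", "top", "3"))

# Wrap the elements in the outer structure with the json-stat2 response format
query <- sprintf('{"query": [%s], "response": {"format": "json-stat2"}}',
                 paste(q, collapse = ", "))
cat(query, "\n")
```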
Readymade datasets by GetApiData
Statistics Norway also provides an API with readymade datasets, available by HTTP GET. The data are most easily retrieved with the GetApiData function, which is the same as using the parameter getDataByGET = TRUE in the ApiData function. This dataset is from Statistics Norway’s Economic trends forecasts.
x <- GetApiData("https://data.ssb.no/api/v0/dataset/934516.json?lang=en")
x[[1]]
year contents value
1 2024 Gross domestic product 1.0
2 2024 GDP Mainland Norway 0.7
3 2024 Employed persons 0.5
4 2024 Unemployment rate (level) 4.1
5 2024 Wages per standard man-year 5.3
6 2024 Consumer price index (CPI) 3.4
7 2024 CPI-ATE 3.9
8 2024 Housing prices 2.5
9 2024 Money market rate (level) 4.7
10 2024 Import-weighted NOK exchange rate (44 countries) 1.0
11 2025 Gross domestic product 1.3
12 2025 GDP Mainland Norway 2.1
13 2025 Employed persons 0.7
14 2025 Unemployment rate (level) 4.1
15 2025 Wages per standard man-year 4.6
[ reached 'max' / getOption("max.print") -- omitted 25 rows ]
comment(x)
label
"12880: Main economic indicators. Accounts and forecasts, by year and contents"
source
"Statistics Norway"
updated
"2024-09-13T06:00:00Z"
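The returned data are in long format, with one row per year and indicator. A wide table with one column per year can be made with base R's reshape(). The rows below are a subset copied from the output above; with a live connection, df would be x[[1]].

```r
# Subset of the rows shown above
df <- data.frame(
  year = c("2024", "2024", "2025", "2025"),
  contents = c("Gross domestic product", "Unemployment rate (level)",
               "Gross domestic product", "Unemployment rate (level)"),
  value = c(1.0, 4.1, 1.3, 4.1),
  stringsAsFactors = FALSE
)

# One row per indicator, one value column per year (missing cells become NA)
wide <- reshape(df, idvar = "contents", timevar = "year", direction = "wide")
wide
```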
Eurostat data
The Eurostat REST API offers JSON-stat version 2. This package can therefore be used to obtain data from Eurostat with GetApiData or the similar functions GetApiData1, GetApiData2 and GetApiData12.
This example shows the all-items HICP (moving 12-month average rate of change) for the latest two periods, for the EU and Norway. See Eurostat for more.
urlEurostat <- paste0( # Here the long url is split into several lines using paste0
"https://ec.europa.eu/eurostat/api/dissemination/statistics/1.0/data/prc_hicp_mv12r",
"?format=JSON&lang=EN&lastTimePeriod=2&coicop=CP00&geo=NO&geo=EU")
urlEurostat
[1] "https://ec.europa.eu/eurostat/api/dissemination/statistics/1.0/data/prc_hicp_mv12r?format=JSON&lang=EN&lastTimePeriod=2&coicop=CP00&geo=NO&geo=EU"
GetApiData12(urlEurostat)
No encoding supplied: defaulting to UTF-8.
Time frequency Unit of measure
1 Monthly Moving 12 months average rate of change
2 Monthly Moving 12 months average rate of change
3 Monthly Moving 12 months average rate of change
4 Monthly Moving 12 months average rate of change
Classification of individual consumption by purpose (COICOP)
1 All-items HICP
2 All-items HICP
3 All-items HICP
4 All-items HICP
Geopolitical entity (reporting)
1 European Union (EU6-1958, EU9-1973, EU10-1981, EU12-1986, EU15-1995, EU25-2004, EU27-2007, EU28-2013, EU27-2020)
2 European Union (EU6-1958, EU9-1973, EU10-1981, EU12-1986, EU15-1995, EU25-2004, EU27-2007, EU28-2013, EU27-2020)
3 Norway
4 Norway
Time freq unit coicop geo time value
1 2024-08 M RCH_MV12MAVR CP00 EU 2024-08 3.1
2 2024-09 M RCH_MV12MAVR CP00 EU 2024-09 2.8
3 2024-08 M RCH_MV12MAVR CP00 NO 2024-08 3.4
4 2024-09 M RCH_MV12MAVR CP00 NO 2024-09 3.4
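The Eurostat URL above is a base address followed by query parameters, where geo is repeated once per country. The hypothetical helper eurostat_url() below, which is not part of PxWebApiData, rebuilds the same string from a named vector; it does not URL-encode the values, so it assumes simple codes like those shown.

```r
# Hypothetical helper, not part of PxWebApiData
eurostat_url <- function(dataset, params) {
  base <- "https://ec.europa.eu/eurostat/api/dissemination/statistics/1.0/data/"
  query <- paste(names(params), params, sep = "=", collapse = "&")  # no URL encoding
  paste0(base, dataset, "?", query)
}

# Repeated names (geo) give repeated query parameters
url <- eurostat_url("prc_hicp_mv12r",
                    c(format = "JSON", lang = "EN", lastTimePeriod = "2",
                      coicop = "CP00", geo = "NO", geo = "EU"))
url  # same string as urlEurostat above
```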
Practical example
We would like to extract the number of female R&D personnel in the services industries of the Norwegian business enterprise sector for the years 2021 and 2022.
Locate the table containing information on R&D personnel at https://www.ssb.no. Having found the relevant table, table 07964, we create the link https://data.ssb.no/api/v0/no/table/07964/
- Load the package.
library(PxWebApiData)
- Check which variables exist in the data.
variables <- ApiData("https://data.ssb.no/api/v0/no/table/07964/",
returnMetaFrames = TRUE)
names(variables)
[1] "NACE2007" "ContentsCode" "Tid"
- Check which values each variable contains.
values <- ApiData("https://data.ssb.no/api/v0/no/table/07964/",
returnMetaData = TRUE)
values[[1]]$values
[1] "A-N" "A03" "B05-B09" "B06_B09.1" "C" "C10-C11"
[7] "C13" "C14-C15" "C16" "C17" "C18" "C19-C20"
[13] "C21" "C22" "C23" "C24" "C25" "C26"
[19] "C26.3" "C26.5" "C27" "C28" "C29" "C30"
[25] "C30.1" "C31" "C32" "C32.5" "C33" "D35"
[31] "E36-E39" "F41-F43" "G-N" "G46" "H49-H53" "J58"
[37] "J58.2" "J59-J60" "J61" "J62" "J63" "K64-K66"
[43] "M70" "M71" "M72"
[ reached getOption("max.print") -- omitted 2 entries ]
values[[2]]$values
[1] "EnhetTot" "EnheterFoU" "FoUpersonale"
[4] "KvinneligFoUpers" "FoUPersonaleUoHutd" "FoUPersonaleDoktor"
[7] "FoUArsverk" "FoUArsverkPers" "FoUArsverkUtd"
values[[3]]$values
[1] "2007" "2008" "2009" "2010" "2011" "2012" "2013" "2014" "2015" "2016"
[11] "2017" "2018" "2019" "2020" "2021" "2022"
- Specify these variables in the query to select the values we want.
mydata <- ApiData("https://data.ssb.no/api/v0/en/table/07964/",
Tid = c("2021", "2022"), # Select the years 2021 and 2022
NACE2007 = "G-N", # Select the services sector
ContentsCode = c("KvinneligFoUpers")) # Select female R&D personnel
mydata <- mydata[[1]] # Extract the first list element, which contains full variable names.
head(mydata)
industry (SIC2007) contents year value
1 Services total Female R&D personnel 2021 4904
2 Services total Female R&D personnel 2022 5449
- Show additional information.
comment(mydata)
label
"07964: O07964: R&D personnel and R&D full-time equivalents (FTE) in Business Enterprise sector, by industry (SIC2007), contents and year"
source
"Statistics Norway"
updated
"2024-02-16T07:00:00Z"
tableid
"07964"
contents
"07964: O07964: R&D personnel and R&D full-time equivalents (FTE) in Business Enterprise sector,"
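Before running a query like the one above, the chosen codes can be checked against the metadata so that a typo is caught early. The vectors below are excerpts copied from the metadata output above; with a live connection they would be values[[1]]$values, values[[2]]$values and values[[3]]$values. The check uses only base R.

```r
# Excerpts of the metadata values shown above
nace <- c("A-N", "C", "G-N", "M72")
contents <- c("EnhetTot", "EnheterFoU", "FoUpersonale", "KvinneligFoUpers")
years <- as.character(2007:2022)

wanted <- list(NACE2007 = "G-N", ContentsCode = "KvinneligFoUpers",
               Tid = c("2021", "2022"))
available <- list(NACE2007 = nace, ContentsCode = contents, Tid = years)

# Codes in the query that do not exist in the table, per variable
not_found <- Map(setdiff, wanted, available)
not_found
stopifnot(all(lengths(not_found) == 0))
```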
Background
PxWeb and its API, PxWebApi, are used as the output database (Statbank) by many statistical agencies in the Nordic countries and elsewhere, e.g. Statistics Norway, Statistics Finland and Statistics Sweden. See the list of installations.
For hints on using PxWebApi in general, see the PxWebApi User Guide.