Find equivalent sets of codes in a Klass classification
Source: R/find_equivalents.R
Find which codes should be combined to reconstruct older or newer codes across multiple dates. Creates groups dynamically to fit the provided date range, with flexible labelling.
Usage
find_equivalent_codes(
  classification,
  dates,
  labels = TRUE,
  graph = klass_graph(classification),
  date_format = "%Y"
)

find_equivalents(
  classification,
  dates,
  labels = TRUE,
  graph = klass_graph(classification),
  date_format = "%Y"
)
Arguments
- classification
The Klass classification to be used.
- dates
The dates for which equivalent sets of codes should be found.
- labels
TRUE, FALSE or a named list of functions. This parameter controls whether group labels are added to groups of equivalent sets.
If TRUE, labels are constructed from the codes valid at the latest provided date, comma separated as in the example below. By default, labels are placed in a column named label.
If FALSE, no labels are applied.
This parameter also accepts a named list of labelling functions. The resulting dataset then contains one label column for each supplied function, named after the corresponding element of the list.
The functions provided in this parameter can accept any of the following parameters: date, code, name, validFrom and validTo, representing the corresponding values of each code in a group. The functions must also accept a ... parameter unless they use all of the above. All input variables have the same length, which is 1 or greater. The functions should return a character vector of length one or of the same length as the input variables.
The following list, when supplied to this parameter, creates two label columns: one containing the codes and names (label1), and another containing only the codes (label2). In this example, label1 creates the same labels as the default labelling used when labels = TRUE.
- graph
Optional. Generating the graph manually beforehand with klass_graph and providing it in this parameter can save time when running find_equivalent_codes multiple times in sequence.
- date_format
Optional. Passed directly to format, this specifies the output format of the date column. The default keeps just the year ("YYYY"). To get the full date in "YYYY-MM-DD" format, use "%Y-%m-%d". See strptime for the full list of format codes.
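The labels argument accepts a named list of labelling functions. The sketch below shows what such functions might look like. The list and its element names (code_and_name, code_only, latest_codes) are hypothetical, and code_and_name is not guaranteed to reproduce the exact default label format. latest_codes assumes that the date values passed in can be compared with max(), which holds for year strings such as those produced by the default date_format.

```r
# Hypothetical labelling functions for the labels argument (illustrative
# names, not taken from klassR). Each function receives the values for the
# codes in one group; `...` absorbs the arguments it does not use.
my_labels <- list(
  # every code followed by its name, comma separated
  code_and_name = function(code, name, ...) {
    paste(unique(paste(code, name)), collapse = ", ")
  },
  # only the codes, comma separated
  code_only = function(code, ...) {
    paste(unique(code), collapse = ", ")
  },
  # only the codes valid at the latest date in the group
  # (assumes date values compare correctly with max(), e.g. "2020" < "2024")
  latest_codes = function(date, code, ...) {
    paste(unique(code[date == max(date)]), collapse = ", ")
  }
)

# The functions can be tried on their own with toy inputs of equal length
my_labels$code_only(code = c("b", "c"), name = c("Bee", "Cee"))   # "b, c"
my_labels$latest_codes(date = c("2020", "2024", "2024"),
                       code = c("a", "b", "c"))                   # "b, c"
```

Supplied as labels = my_labels, this list would give one label column per element: code_and_name, code_only and latest_codes.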
Value
A data.frame with columns:
- date, containing the input dates
- code, containing the set of equivalent codes at each date
- name, containing the name of each code
- validFrom and validTo, containing the validity dates of each code returned
- by default, label, giving a unique group label for each group of equivalent sets (or one column per labelling function when labels is a named list of functions)
Details
This function addresses the problem of split or combined codes in Klass
classifications. When update_klass is used to ask "what does this code from
one version of the classification correspond to in another version?", the
answer is sometimes that the code has been split into two or more codes (or,
when back-dating a code, combined from two or more codes), so the code
cannot be updated.
find_equivalent_codes instead answers the question: "in these versions of
the classification, which codes are equivalent to this code in this other
version?".
Consider the following example of two codes combining into one. Here, "a"
and "b" are valid at t1 and are combined into "c" at t2.
update_klass would tell us that "a" can be updated to "c" at t2, unless we
specified combine = FALSE, in which case the result would be NA.
find_equivalent_codes would tell us that the codes "a" and "b" at t1 are
together equivalent to "c" at t2.
We can also consider a code splitting into two. In this example, "a" is
valid at t1 and splits into "b" and "c" at t2.
update_klass cannot provide an updated code because of the split, and would
return NA. find_equivalent_codes would tell us that "a" at t1 is equivalent
to "b" and "c" together at t2.
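The contrast between the two questions in these examples can be made concrete with a toy model in base R. This is a hypothetical illustration only: the change tables and the functions toy_update and toy_equivalents are made up for this sketch, and are neither klassR's API nor how update_klass and find_equivalent_codes are implemented. toy_equivalents follows only one step of changes in each direction, which is enough for these two-date examples.

```r
# Toy model (hypothetical): a data.frame of changes from codes valid at t1
# (`old`) to codes valid at t2 (`new`).

# Stand-in for the update-style question: the single t2 code that `x`
# becomes, or NA when there is no single answer.
toy_update <- function(x, changes, combine = TRUE) {
  new <- unique(changes$new[changes$old == x])
  if (length(new) != 1) return(NA_character_)  # split, or no change found
  if (!combine && sum(changes$new == new) > 1) {
    return(NA_character_)                      # target was combined from several codes
  }
  new
}

# Stand-in for the equivalence question: the t1 and t2 codes that belong
# together with `x`, following one step of changes in each direction.
toy_equivalents <- function(x, changes) {
  new <- unique(changes$new[changes$old == x])
  old <- unique(changes$old[changes$new %in% new])
  list(t1 = old, t2 = new)
}

# "a" and "b" at t1 combine into "c" at t2
combined <- data.frame(old = c("a", "b"), new = c("c", "c"),
                       stringsAsFactors = FALSE)
toy_update("a", combined)                   # "c"
toy_update("a", combined, combine = FALSE)  # NA
toy_equivalents("a", combined)              # list(t1 = c("a", "b"), t2 = "c")

# "a" at t1 splits into "b" and "c" at t2
split_codes <- data.frame(old = c("a", "a"), new = c("b", "c"),
                          stringsAsFactors = FALSE)
toy_update("a", split_codes)                # NA
toy_equivalents("a", split_codes)           # list(t1 = "a", t2 = c("b", "c"))
```

In both cases the update-style question has a single answer only when the mapping is one-to-one, while the equivalence question returns a set of codes at each date.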
find_equivalent_codes can handle more than two dates. In the following
example, "a" splits into "b" and "c" at t2, and "b" and "c"
combine into "d" at t3. find_equivalent_codes can inform us that "a" is
equivalent to "b" and "c" at t2, and "d" at t3.
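Following a code across more than two dates means chaining the changes between consecutive dates. The sketch below is a hypothetical toy model: the change tables and toy_forward_sets are made up for illustration and are not klassR data or functions. Codes that do not change between two dates would need rows mapping them to themselves. Because this simplified walk only moves forward from the starting code, it would miss codes that join the set by combining with a code inside it.

```r
# Toy model (hypothetical): one change table per pair of consecutive dates.
# t1 -> t2: "a" splits into "b" and "c"; t2 -> t3: "b" and "c" combine into "d".
# Unchanged codes would need rows mapping them to themselves.
changes <- list(
  data.frame(old = c("a", "a"), new = c("b", "c"), stringsAsFactors = FALSE),
  data.frame(old = c("b", "c"), new = c("d", "d"), stringsAsFactors = FALSE)
)

# Walk forward from the starting codes, recording the set reached at each date.
# This only moves forward, so codes merging in from elsewhere are not found.
toy_forward_sets <- function(start, changes) {
  sets <- list(start)
  current <- start
  for (step in changes) {
    current <- unique(step$new[step$old %in% current])
    sets[[length(sets) + 1]] <- current
  }
  setNames(sets, paste0("t", seq_along(sets)))
}

toy_forward_sets("a", changes)
# list(t1 = "a", t2 = c("b", "c"), t3 = "d")
```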
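find_equivalent_codes only searches within the time range we specify. As a
consequence, sets of equivalent codes generated over longer time spans are
generally larger than those generated over shorter spans, because every
additional split or combination inside the range can join sets that would
otherwise be separate.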
To illustrate this behavior, we can add a new code "e", valid at t1, t2 and
t3, to the previous example, and have "d" and "e" combine into "f" at t4.
Finding the equivalents of "a" at t1, t2 and t3 returns the same sets as
before:
- t1: "a"
- t2: "b" and "c"
- t3: "d"
However, if we also ask for the equivalent set at t4, the result is:
- t1: "a" and "e"
- t2: "b", "c" and "e"
- t3: "d" and "e"
- t4: "f"
find_equivalents is a legacy alias for find_equivalent_codes.
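The effect of the time range in the example with "e" and "f" can be reproduced with a toy model that treats each "date:code" pair as a node and each change (or unchanged code) between consecutive dates as an undirected edge; sets of equivalent codes then correspond to connected components. This is a hypothetical illustration: the edge list and toy_components are made up for this sketch, and it does not describe how klass_graph or find_equivalent_codes work internally.

```r
# Toy model (hypothetical): group "date:code" nodes into the connected
# components of an undirected graph given as parallel `from`/`to` vectors.
toy_components <- function(from, to) {
  nodes <- unique(c(from, to))
  group <- setNames(seq_along(nodes), nodes)
  for (i in seq_along(from)) {
    a <- group[[from[i]]]
    b <- group[[to[i]]]
    # merge the two components by giving all their members the smaller id
    group[group == a | group == b] <- min(a, b)
  }
  unname(split(names(group), group))
}

# Changes in the example: "a" splits into "b" and "c", which combine into
# "d"; "e" is unchanged until "d" and "e" combine into "f" at t4.
edges <- data.frame(
  from = c("t1:a", "t1:a", "t1:e", "t2:b", "t2:c", "t2:e", "t3:d", "t3:e"),
  to   = c("t2:b", "t2:c", "t2:e", "t3:d", "t3:d", "t3:e", "t4:f", "t4:f"),
  stringsAsFactors = FALSE
)

# Range t1..t3: edges into t4 are dropped, so "e" stays in its own set
up_to_t3 <- edges[!grepl("^t4:", edges$to), ]
toy_components(up_to_t3$from, up_to_t3$to)
# two sets: t1:a t2:b t2:c t3:d, and t1:e t2:e t3:e

# Range t1..t4: the combination into "f" joins everything into one set
toy_components(edges$from, edges$to)
```

toy_components merges components by relabelling, which is quadratic in the worst case but keeps the sketch in base R; a union-find structure or a graph library would scale better.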