
Applies the decision tree to the potential duplicate pairs identified by dup_find_fuzzy_adj().

Decisions are recorded in a "decision" column added to id_dup_pair. Each decision takes one of three values: "duplicate", "not duplicate", or "check". When the decision is "not duplicate", the "match" column in df is modified.

Usage

decision_tree_adj(df, id_dup_pair)

Arguments

df

A data frame (i.e., output #1 of dup_find_fuzzy_adj())

id_dup_pair

A data frame listing the ids of potential duplicate pairs (i.e., output #2 of dup_find_fuzzy_adj())

Value

Two data frames: (1) the input df with the "match" column modified according to the decision tree; (2) the input id_dup_pair with a "decision" column added.

Details

See manuscript //TODO for details of the decision tree.

Examples

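if (FALSE) {
# c(a, b) %<-% ... unpacks the two returned data frames; this form of
# %<-% comes from the zeallot package
library(zeallot)
c(df, id_dup_pair) %<-% decision_tree_adj(df, id_dup_pair)
}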
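
Once the call returns, the "decision" column can be summarised to see how many pairs fall into each level. The following is a minimal sketch on a hand-made stand-in for id_dup_pair, not actual output of decision_tree_adj(). The id1 and id2 column names and all values are hypothetical, and treating "check" as "needs manual review" is an assumption based on the label alone.

```r
# Toy stand-in for the second returned data frame. Only the "decision"
# column is documented; id1 and id2 are hypothetical column names.
id_dup_pair <- data.frame(
  id1 = c(1, 2, 3, 4),
  id2 = c(5, 6, 7, 8),
  decision = c("duplicate", "not duplicate", "check", "duplicate"),
  stringsAsFactors = FALSE
)

# Count the pairs at each decision level.
decision_counts <- table(id_dup_pair$decision)
print(decision_counts)

# Keep the pairs labelled "check" (assumed here to need manual review).
to_review <- id_dup_pair[id_dup_pair$decision == "check", ]
print(to_review)
```

The same subsetting applied to the real id_dup_pair would give the list of pairs to inspect by hand, assuming "check" carries that meaning.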