Calculate string similarity between adjacent rows
simi_order_adj.Rd
The function calculates string similarity, based on Levenshtein edit distance, for the columns "title_norm" and "abstract_norm" between adjacent rows. Similarity ranges over [0, 1]: a similarity of 1 means the two strings are identical, while a similarity of 0 means they are completely different.
Arguments
- df
A data frame with bibliographic information that has gone through text normalization. df must have the following columns: c("title_norm", "abstract_norm").
- order_by
Quoted name of the column by which to order the rows. Defaults to "title_norm".
Value
Two data frames: (1) the ordered df; (2) a data frame with string similarity results for "title_norm" and "abstract_norm". Both data frames have a matched id column.
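
As a rough illustration of the computation described above, the following base R sketch orders a small data frame and scores each row against the row that follows it. It is not the package's implementation. The helper adjacent_similarity, the output column names title_simi and abstract_simi, and the choice to attach each score to the id of the first row in the pair are all assumptions made for this example. The normalization (1 minus edit distance divided by the length of the longer string) is one common way to map Levenshtein distance onto [0, 1] and may differ from what the function does internally.

```r
# Hypothetical helper (not part of the package): similarity between each
# element of x and the element that follows it.
adjacent_similarity <- function(x) {
  n <- length(x)
  if (n < 2) return(numeric(0))
  a <- x[-n]   # rows 1 .. n-1
  b <- x[-1]   # rows 2 .. n
  # Levenshtein edit distance for each adjacent pair (base R, utils::adist)
  d <- mapply(function(s, t) as.numeric(utils::adist(s, t)), a, b,
              USE.NAMES = FALSE)
  # Assumed normalization: divide by the length of the longer string
  longest <- pmax(nchar(a), nchar(b))
  ifelse(longest == 0, 1, 1 - d / longest)
}

# Toy input with the two required columns plus an id column
df <- data.frame(
  id = 1:3,
  title_norm = c("deep learning for text",
                 "graph theory basics",
                 "deep learning for texts"),
  abstract_norm = c("we study neural text models",
                    "an introduction to graphs",
                    "we study neural text models"),
  stringsAsFactors = FALSE
)

# (1) Order the rows by the chosen column
ordered <- df[order(df[["title_norm"]]), ]

# (2) Similarity of each row to the next one, keyed by id
#     (column names here are illustrative only)
n <- nrow(ordered)
simi <- data.frame(
  id = ordered$id[-n],
  title_simi = adjacent_similarity(ordered$title_norm),
  abstract_simi = adjacent_similarity(ordered$abstract_norm)
)
```

Ordering first places near-duplicate titles next to each other, so a single pass over adjacent pairs can surface likely duplicates without comparing every row to every other row.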