Calculate string similarity between adjacent rows
The function calculates similarity based on Levenshtein edit distance for the columns "title_norm" and "abstract_norm" between adjacent rows. Similarity ranges over [0, 1]: a value of 1 means the two strings are identical, and a value of 0 means they are completely different.
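As a rough illustration of how an edit distance can be mapped onto [0, 1], the sketch below divides the Levenshtein distance by the length of the longer string and subtracts the result from 1. This normalization is an assumption and the package may scale the distance differently. `lev_similarity()` is a hypothetical helper, not a function exported by the package; `utils::adist()` is the base R edit distance routine.

```r
# Hypothetical helper (not part of the package): Levenshtein-based
# similarity in [0, 1]. The normalization by the longer string's length
# is an assumption made for this sketch.
lev_similarity <- function(a, b) {
  # utils::adist() returns a distance matrix; take the single cell per pair
  d <- mapply(function(s, t) utils::adist(s, t)[1, 1], a, b, USE.NAMES = FALSE)
  longest <- pmax(nchar(a), nchar(b))
  # Two empty strings are treated as identical to avoid dividing by zero
  ifelse(longest == 0, 1, 1 - d / longest)
}

lev_similarity("abc", "abc")        # identical strings
lev_similarity("abc", "xyz")        # every character differs
lev_similarity("kitten", "sitting") # 3 edits over 7 characters
```

Under this normalization, identical strings score 1 and strings of equal length with no characters in common score 0, which matches the range described above.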
Arguments
- df
A data frame with bibliographic information that has gone through text normalization.
df must have the following columns: c("title_norm", "abstract_norm").
- order_by
Quoted name of the column by which to order the rows. Defaults to
"title_norm".
Value
Two data frames: (1) the ordered df; (2) a data frame with string similarity results for "title_norm" and "abstract_norm". Both data frames share a matching id column, so their rows can be linked.
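The shape of this result can be approximated in base R. The sketch below is not the package's implementation: it orders a toy data frame by "title_norm", assigns an id, and scores each row against the row before it. The helper `adjacent_similarity()`, the output column names `title_simi` and `abstract_simi`, the choice to attach each score to the later row of a pair, and the normalization by the longer string's length are all assumptions.

```r
# Toy input; the two column names follow the documented requirement.
df <- data.frame(
  title_norm    = c("deep learning for text", "a survey of graphs",
                    "deep learning for texts"),
  abstract_norm = c("we study text models", "graphs are surveyed",
                    "we study text model"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: similarity between each element and the next one.
# Expects a character vector with at least two elements.
adjacent_similarity <- function(x) {
  a <- x[-length(x)]  # elements 1 .. n-1
  b <- x[-1]          # elements 2 .. n
  d <- mapply(function(s, t) utils::adist(s, t)[1, 1], a, b, USE.NAMES = FALSE)
  longest <- pmax(nchar(a), nchar(b))
  ifelse(longest == 0, 1, 1 - d / longest)
}

# (1) Ordered data frame with an id column
ordered <- df[order(df[["title_norm"]]), , drop = FALSE]
ordered$id <- seq_len(nrow(ordered))

# (2) Similarity results, keyed by the id of the later row in each pair
simi <- data.frame(
  id            = ordered$id[-1],
  title_simi    = adjacent_similarity(ordered$title_norm),
  abstract_simi = adjacent_similarity(ordered$abstract_norm)
)
```

Sorting first places near-duplicate titles next to each other, so only n - 1 comparisons are needed instead of comparing every pair of rows.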