Clean and normalize the entire data frame with bibliographic information
Bibliographic information is normally imported as a data frame rather than as individual character vectors. The function norm_df() wraps (1) all text normalization and (2) all helper-field extraction needed for downstream reference deduplication.
Value
A normalized data frame: the original data frame with new columns holding the normalized and extracted data appended. By default, the result has 8 more columns than the input (see Details).
Details
In detail, it normalizes:
- doi (converted to lowercase; adds column doi_norm)
- title (see sub-function norm_title(); adds column title_norm)
- author (see sub-function norm_author(); adds column author_norm)
- journal (see sub-function norm_journal(); adds column journal_norm)
- year (data type converted from character to integer in place; no column added)
- abstract (see sub-function norm_abstract(); adds column abstract_norm)
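For the two steps above that need no sub-function, a minimal base-R sketch shows the intended effect. The toy data frame `refs` and its values are hypothetical, made up for illustration; this is not the code inside norm_df(), which may treat edge cases differently.

```r
# Hypothetical toy input; column names follow the fields listed above
refs <- data.frame(
  doi   = c("10.1000/ABC.123", NA),
  title = c("A Study", "Another Study"),
  year  = c("2019", "2021"),
  stringsAsFactors = FALSE
)

# doi: lowercase into a new column, keeping the original column intact
refs$doi_norm <- tolower(refs$doi)

# year: convert character to integer in place (no new column is added)
refs$year <- as.integer(refs$year)

str(refs)
```

Note that as.integer() turns a non-numeric year string such as "n.d." into NA with a coercion warning; whether norm_df() behaves the same way is an assumption.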
Additionally, it extracts:
- first_author_last_name and first_author_last_name_norm
- journal_initialism (by applying the function extract_initialism() to column journal_norm)
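The journal initialism can be pictured with the hypothetical helper initialism_sketch() below, which simply takes the first letter of each whitespace-separated word of an already normalized journal name. It is an illustration only, not the implementation of extract_initialism(), which may, for example, skip stop words or handle punctuation differently.

```r
# Hypothetical helper: NOT the package's extract_initialism(), only an
# illustration that takes the first character of each word
initialism_sketch <- function(x) {
  words <- strsplit(x, "\\s+")
  vapply(words, function(w) {
    # keep missing journal names missing
    if (length(w) == 1 && is.na(w)) return(NA_character_)
    # drop empty strings produced by leading whitespace
    w <- w[nzchar(w)]
    paste(substr(w, 1, 1), collapse = "")
  }, character(1))
}

journal_norm <- c("journal of clinical epidemiology", "plos one")
initialism_sketch(journal_norm)
```

Because the input is assumed to be journal_norm (already lowercased), the initialisms are lowercase too, e.g. "joce" and "po" here.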
Examples
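# Load the small example dataset shipped with the package
data(bib_example_small)
# Normalize it; the result gains the normalized and extracted columns
df_new <- norm_df(bib_example_small)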