Clean and normalize the entire data frame with bibliographic information
norm_df.Rd
Bibliographic information is usually imported as a data frame rather than as individual character vectors. The function norm_df() wraps all of the (1) text normalization and (2) helper field extraction steps that are needed for downstream reference deduplication.
Value
A normalized data frame. New columns containing the normalized and extracted data are added to the original data frame. By default, expect 8 more columns than in the original data frame.
Details
In detail, it normalizes
doi (simply convert to lowercase; add column doi_norm)
title (see sub-function norm_title(); add column title_norm)
author (see sub-function norm_author(); add column author_norm)
journal (see sub-function norm_journal(); add column journal_norm)
year (simply convert data type from character to integer; do not add any column)
abstract (see sub-function norm_abstract(); add column abstract_norm)
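The two "simple" cases in the list above (doi and year) can be sketched in base R. This is a minimal illustration on a made-up toy data frame, not the package's actual implementation. The other fields rely on the norm_*() sub-functions and are not reproduced here.

```r
# Toy data frame (made-up values); column names follow the fields listed above.
df <- data.frame(
  doi  = c("10.1000/ABC.123", "10.1000/Xyz.456"),
  year = c("2019", "2021"),
  stringsAsFactors = FALSE
)

# doi: store a lowercase copy in a new column; the original column is kept.
df$doi_norm <- tolower(df$doi)

# year: convert in place from character to integer; no new column is added.
df$year <- as.integer(df$year)

str(df)
```

Keeping the original doi column next to doi_norm makes it possible to compare records on the normalized value while still reporting the value as imported.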
Additionally, it extracts
first_author_last_name and first_author_last_name_norm
journal_initialism (apply function extract_initialism() to column journal_norm).
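To give a rough idea of what an initialism of a normalized journal name looks like, here is a base R sketch. The helper initialism_sketch() is hypothetical and assumes the simplest possible rule (first letter of every word, uppercased). The package's extract_initialism() may behave differently, for example in how it treats short words such as "of".

```r
# Hypothetical helper, NOT the package's extract_initialism():
# take the first letter of each whitespace-separated word and uppercase it.
initialism_sketch <- function(x) {
  words <- strsplit(trimws(x), "\\s+")
  vapply(
    words,
    function(w) toupper(paste(substr(w, 1, 1), collapse = "")),
    character(1)
  )
}

# Made-up, already normalized journal names.
journal_norm <- c("journal of applied ecology", "nature")
initialism_sketch(journal_norm)
# [1] "JOAE" "N"
```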
Examples
data(bib_example_small)
df_new <- norm_df(bib_example_small)