
Bibliographic information is usually imported as a data frame rather than as individual character vectors. norm_df() wraps both (1) the text normalization and (2) the helper-field extraction needed for downstream reference deduplication.

Usage

norm_df(df)

Arguments

df

A data frame with bibliographic information

Value

A normalized data frame: the original columns plus new columns holding the normalized and extracted values. By default, the result has 8 more columns than the input.

Details

It normalizes the following fields:

  • doi (converted to lowercase; adds column doi_norm)

  • title (see sub-function norm_title(); adds column title_norm)

  • author (see sub-function norm_author(); adds column author_norm)

  • journal (see sub-function norm_journal(); adds column journal_norm)

  • year (converted in place from character to integer; no new column is added)

  • abstract (see sub-function norm_abstract(); adds column abstract_norm)
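The doi and year steps above are specified completely enough to sketch in base R. The following is a minimal illustration, not the package's own code. It assumes the input columns are literally named doi and year, and the helper sketch_norm_doi_year() is hypothetical.

```r
# Hypothetical sketch of the doi and year steps; not the package's code.
# Assumes the input data frame has columns named `doi` and `year`.
sketch_norm_doi_year <- function(df) {
  # doi: store a lowercase copy in a new column and keep the original
  df$doi_norm <- tolower(df$doi)
  # year: convert the existing column from character to integer in place;
  # values such as "n.d." would become NA with a coercion warning
  df$year <- as.integer(df$year)
  df
}

# Toy input for illustration only
toy <- data.frame(
  doi  = c("10.1000/ABC.123", NA),
  year = c("2019", "2021"),
  stringsAsFactors = FALSE
)
out <- sketch_norm_doi_year(toy)
out$doi_norm  # "10.1000/abc.123" NA
out$year      # 2019 2021 (stored as integer)
```

With this pair of steps, only the doi step adds a column. That is consistent with year contributing nothing to the 8 extra columns listed under Value.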

It also extracts the following helper fields:

  • first_author_last_name and its normalized form, first_author_last_name_norm

  • journal_initialism (obtained by applying extract_initialism() to column journal_norm)
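The page does not say how the author string is parsed or which rule extract_initialism() uses. The sketch below shows one plausible approach under loudly stated assumptions: authors are separated by " and " and written "Last, First" (a BibTeX-style convention), the normalized last name is the lowercase name with non-letters removed, and an initialism is the first letter of each whitespace-separated word. The helpers sketch_first_author_last_name() and sketch_initialism() are hypothetical and are not the package's functions.

```r
# Hypothetical sketches of the helper fields; the package's own author
# parsing and extract_initialism() may follow different rules.

# Assumes authors separated by " and ", each written "Last, First".
sketch_first_author_last_name <- function(author) {
  first_author <- vapply(
    strsplit(author, " and ", fixed = TRUE),
    function(x) if (length(x)) x[1] else NA_character_,
    character(1)
  )
  # Keep the text before the first comma, i.e. the last name
  trimws(sub(",.*$", "", first_author))
}

# Assumed rule: first letter of each whitespace-separated word.
sketch_initialism <- function(journal_norm) {
  vapply(
    strsplit(journal_norm, "\\s+"),
    function(w) {
      if (length(w) == 0 || anyNA(w)) NA_character_
      else paste(substr(w, 1, 1), collapse = "")
    },
    character(1)
  )
}

authors <- c("Smith, John and Doe, Jane", "O'Brien, Pat")
last <- sketch_first_author_last_name(authors)        # "Smith" "O'Brien"
last_norm <- gsub("[^[:alpha:]]", "", tolower(last))  # "smith" "obrien"
sketch_initialism("journal of clinical epidemiology") # "joce"
```

Deriving the initialism from journal_norm rather than the raw journal string means that abbreviated and full journal titles can only match after both have passed through the same normalization.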

Examples

data(bib_example_small)

df_new <- norm_df(bib_example_small)