Skip to contents

RefDeduR is an R package that supports accurate and high-throughput reference deduplication. It is especially useful for large datasets and operates on standard bibliographic information (i.e., it does not require information that cannot be retrieved from a mainstream search engine such as PMID).

The deduplication pipeline is modularized into finely-tuned text normalization, three-step exact matching, and two-step fuzzy matching processes. The package features a decision-tree algorithm and considers preprints and conference proceedings when they co-exist with a peer-reviewed version.

Author

Jiaxian Shen

Department of Civil and Environmental Engineering, Northwestern University

Installation

You can install RefDeduR from GitHub with:

# install.packages("devtools")
devtools::install_github("jxshen311/RefDeduR")

Tutorial, website and publication

Citation

If you use RefDeduR, please cite: https://www.biorxiv.org/content/10.1101/2022.09.29.510210v1

Acknowledgement

We thank Yutong Wu for the illuminating discussions about the design of RefDeduR. We are also grateful to Ruochen Jiao and Alexander G. McFarland for their help in coding.

We thank Ahmad Roaayala, Eko Purnomo, and Vectors Point from Noun Project for allowing us to use the following icons Research Paper, Report Paper, report, and Stats Report to create the logo.