# Technical Report: Refcat

* 2021-08-08

To be uploaded to [arXiv](https://arxiv.org/) soon.

> As part of its scholarly data efforts, the Internet Archive releases a first version of a citation graph dataset, named refcat, derived from scholarly publications and additional data sources. It is composed of data gathered by the fatcat cataloging project, related web-scale crawls targeting primary and secondary scholarly outputs, as well as metadata from the Open Library project and Wikipedia. This first version of the graph consists of 1,323,423,672 citations. We release this dataset under a CC0 Public Domain Dedication, accessible through an archive item. All code used in the derivation process is released under an MIT license.