It is composed of data gathered by the \href{https://fatcat.wiki}{fatcat cataloging project} and related web-scale crawls targeting primary and secondary scholarly outputs. In addition, relations are established between scholarly publications, web pages and their archived copies, books from the Open Library project, and Wikipedia articles. As of version ``20210810'', the graph consists of over X nodes and over Y edges. We release this dataset under a Z open license in the collection at \href{https://archive.org/details/citation\_graph}{https://archive.org/details/citation\_graph}, and all code used for its derivation under an MIT license.
\end{abstract}

% keywords can be removed
\keywords{Citation Graph Dataset \and Scholarly Communications \and Web Archiving}

\section{Introduction}

The Internet Archive releases a first version of a citation graph dataset derived from a raw corpus of about 2.5B references, gathered from metadata and from data obtained by PDF extraction tools such as GROBID~\citep{lopez2009grobid}. The goal of this report is to briefly describe the current contents and the derivation of the Internet Archive Scholar Citation Graph Dataset (IASCG). We expect this dataset to be iterated upon, with changes both in content and in processing.

Modern citation indexes can be traced back to the early computing age, when projects like the Science Citation Index (1955)~\citep{garfield2007evolution} were first devised; they live on in commercial knowledge bases today. Open alternatives followed, such as the Open Citations Corpus (OCC), started in 2010, the first version of which contained 6,325,178 individual references~\citep{shotton2013publishing}.