# DOAJ Citation Graph

This dataset contains the subset of edges of the Internet Archive (IA)
Scholar Citation Graph (refcat, v1, 2021-07-28) in which the citing work, the
cited work, or both are part of DOAJ.

Basic numbers:

* DOAJ DOIs used for matching edges: 4,886,099
* Catalog entries via DOI in fatcat: 4,773,245
* We find 124,760,397 edges. Of these, 98,616,033 have a source belonging to
  DOAJ and 34,910,769 have an article in DOAJ as target; 8,766,405 are
  intra-DOAJ (both source and target in DOAJ).
* How do we find these edges? By identifier: 118,314,316; via fuzzy matching:
  6,446,081 (5.17%)
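
The numbers above are consistent with each other, which can be checked with a
short sketch. The counts are copied from the list; the variable and function
names are made up for illustration and are not part of refcat or fatcat.

```python
# Counts copied from the "Basic numbers" list; names are illustrative only.
total_edges = 124_760_397
source_in_doaj = 98_616_033   # citing work is in DOAJ
target_in_doaj = 34_910_769   # cited work is in DOAJ
intra_doaj = 8_766_405        # both ends are in DOAJ
by_id = 118_314_316           # edges found via identifier match
by_fuzzy = 6_446_081          # edges found via fuzzy matching


def union_size(a: int, b: int, both: int) -> int:
    """Inclusion-exclusion: items in A or B, counting the overlap once."""
    return a + b - both


def share_percent(part: int, whole: int) -> float:
    """Share of `part` in `whole`, as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


# Edges with a DOAJ source or a DOAJ target (or both) add up to the total.
assert union_size(source_in_doaj, target_in_doaj, intra_doaj) == total_edges

# Identifier and fuzzy matches partition the edge set.
assert by_id + by_fuzzy == total_edges

print(share_percent(by_fuzzy, total_edges))  # 5.17
```

The first check works because intra-DOAJ edges are counted once in the source
count and once in the target count, so they have to be subtracted once.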

The IA Scholar citation graph is documented in various places:

* https://blog.archive.org/2021/10/19/internet-archive-releases-refcat-the-ia-scholar-index-of-over-1-3-billion-scholarly-citations/
* https://guide.fatcat.wiki/reference_graph.html
* https://arxiv.org/abs/2110.06595