author | Martin Czygan <martin.czygan@gmail.com> | 2021-10-26 12:08:22 +0200 |
---|---|---|
committer | Martin Czygan <martin.czygan@gmail.com> | 2021-10-26 12:08:22 +0200 |
commit | 5ca24bf4bbf6b131a77e9dcaa69ce51965d0e51b (patch) | |
tree | 6121ba78a35b4d026a9a4e43ca77cbe82029193b /notes | |
parent | a394a7db6fa890dfe484ba53dbaa248076816a1d (diff) | |
doaj: graph notes
Diffstat (limited to 'notes')
-rw-r--r-- | notes/doaj_graph.md | 20 |
1 files changed, 20 insertions, 0 deletions
diff --git a/notes/doaj_graph.md b/notes/doaj_graph.md
new file mode 100644
index 0000000..449220b
--- /dev/null
+++ b/notes/doaj_graph.md
@@ -0,0 +1,20 @@
+# DOAJ Citation Graph
+
+This dataset contains a subset of the edges of the Internet Archive (IA)
+Scholar Citation Graph (v1, 2021-07-28, named: refcat) where either the citing
+or the cited work (or both) are part of DOAJ.
+
+Basic numbers:
+
+* DOAJ DOI used for matching edges: 4,886,099
+* Catalog entries via DOI in fatcat: 4,773,245
+* We find 124,760,397 edges, of these; 98,616,033 have a source belonging to
+  DOAJ; 34,910,769 have an article in DOAJ as target; intra-DOAJ: 8,766,405
+* How do we find these edges? By id: 118,314,316; via fuzzy matching:
+  6,446,081 (5.17%)
+
+The IA Scholar citation graph is documented in various places:
+
+* https://blog.archive.org/2021/10/19/internet-archive-releases-refcat-the-ia-scholar-index-of-over-1-3-billion-scholarly-citations/
+* https://guide.fatcat.wiki/reference_graph.html
+* https://arxiv.org/abs/2110.06595
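The edge counts in the note can be cross-checked against each other. The sketch below is not part of the commit. It assumes the total is the union of "source in DOAJ" and "target in DOAJ" edges, with intra-DOAJ edges counted in both, and that the id and fuzzy matches partition the total. The variable names are illustrative.

```python
# Counts copied from notes/doaj_graph.md; variable names are hypothetical.
total_edges = 124_760_397
source_in_doaj = 98_616_033
target_in_doaj = 34_910_769
intra_doaj = 8_766_405      # both source and target in DOAJ
matched_by_id = 118_314_316
matched_fuzzy = 6_446_081

# Inclusion-exclusion: intra-DOAJ edges appear in both the source and
# the target count, so subtract them once (assumed reading of the note).
union = source_in_doaj + target_in_doaj - intra_doaj

# Assumed: every edge was found either by id or by fuzzy matching.
by_method = matched_by_id + matched_fuzzy

# Share of fuzzy-matched edges, in percent, rounded to two decimals.
fuzzy_share = round(100 * matched_fuzzy / total_edges, 2)

print(union == total_edges, by_method == total_edges, fuzzy_share)
```

Under these assumptions both sums reproduce the reported total, and the fuzzy share matches the 5.17% given in the note.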