From cefdde667a4169bbde6b8cf2bde8eec8cb589c98 Mon Sep 17 00:00:00 2001
From: Martin Czygan
Date: Mon, 29 Mar 2021 22:47:46 +0200
Subject: update README

---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 7aefb35..a99ca6e 100644
--- a/README.md
+++ b/README.md
@@ -15,7 +15,7 @@ Context: [fatcat](https://fatcat.wiki), "Mellon Grant" (20/21).
 
 * [ ] Link PID or DOI to archived versions
 * [ ] URLs in corpus linked to best possible timestamp (GWB)
-* [ ] Harvest all URLs in citation corpus
+* [ ] Harvest all URLs in citation corpus (maybe do a sample first)
 * [ ] Links between records w/o DOI (fuzzy matching)
 * [ ] Publication of augmented citation graph, explore data mining, etc.
 * [ ] Interlinkage with other source, monographs, commercial publications, etc.
@@ -31,7 +31,7 @@ $ refcat.pyz BiblioRefV2
 
 * schema: [https://git.archive.org/webgroup/fatcat/-/blob/10eb30251f89806cb7a0f147f427c5ea7e5f9941/proposals/2021-01-29_citation_api.md#schemas](https://git.archive.org/webgroup/fatcat/-/blob/10eb30251f89806cb7a0f147f427c5ea7e5f9941/proposals/2021-01-29_citation_api.md#schemas)
 * matches via: doi, arxiv, pmid, pmcid, fuzzy title matches
-* 785,569,011 edges (~103% of open citation/crossref), 39G compressed, ~260G uncompressed
+* 785,569,011 edges (~103% of 12/2020 OCI/crossref release), ~39G compressed, ~288G uncompressed
 
 # Rough Notes
 
-- 
cgit v1.2.3