author Martin Czygan <martin.czygan@gmail.com> 2021-03-29 22:47:46 +0200
committer Martin Czygan <martin.czygan@gmail.com> 2021-03-29 22:47:46 +0200
commit cefdde667a4169bbde6b8cf2bde8eec8cb589c98
tree f6cc44b475c219142b69549250a416f0a7ecdf46
parent c4d16fa9f7d27425de0bbe9e1a56ca0d3b3e297a
update README
-rw-r--r-- README.md 4
1 files changed, 2 insertions, 2 deletions
diff --git a/README.md b/README.md
index 7aefb35..a99ca6e 100644
--- a/README.md
+++ b/README.md
@@ -15,7 +15,7 @@ Context: [fatcat](https://fatcat.wiki), "Mellon Grant" (20/21).
* [ ] Link PID or DOI to archived versions
* [ ] URLs in corpus linked to best possible timestamp (GWB)
-* [ ] Harvest all URLs in citation corpus
+* [ ] Harvest all URLs in citation corpus (maybe do a sample first)
* [ ] Links between records w/o DOI (fuzzy matching)
* [ ] Publication of augmented citation graph, explore data mining, etc.
* [ ] Interlinkage with other source, monographs, commercial publications, etc.
@@ -31,7 +31,7 @@ $ refcat.pyz BiblioRefV2
* schema: [https://git.archive.org/webgroup/fatcat/-/blob/10eb30251f89806cb7a0f147f427c5ea7e5f9941/proposals/2021-01-29_citation_api.md#schemas](https://git.archive.org/webgroup/fatcat/-/blob/10eb30251f89806cb7a0f147f427c5ea7e5f9941/proposals/2021-01-29_citation_api.md#schemas)
* matches via: doi, arxiv, pmid, pmcid, fuzzy title matches
-* 785,569,011 edges (~103% of open citation/crossref), 39G compressed, ~260G uncompressed
+* 785,569,011 edges (~103% of 12/2020 OCI/crossref release), ~39G compressed, ~288G uncompressed
# Rough Notes