From c4d16fa9f7d27425de0bbe9e1a56ca0d3b3e297a Mon Sep 17 00:00:00 2001
From: Martin Czygan
Date: Mon, 29 Mar 2021 20:54:24 +0200
Subject: update README

---
 README.md | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/README.md b/README.md
index 046ce7a..7aefb35 100644
--- a/README.md
+++ b/README.md
@@ -1,14 +1,18 @@
 # cgraph
 
-Scholarly citation graph related code; maintained by [martin@archive.org](mailto:martin@archive.org).
+Scholarly citation graph related code; maintained by
+[martin@archive.org](mailto:martin@archive.org); multiple subsproject to keep
+all relevant code close:
 
-* python: mostly luigi tasks
-* skate: various Go tools
+* python: mostly luigi tasks (using [shiv](https://github.com/linkedin/shiv) for single-file deployments)
+* skate: various Go command line tools (wrapped in a deb packaged)
 
-Context: [fatcat](https://fatcat.wiki), "Mellon Grant" (20/21)
+Context: [fatcat](https://fatcat.wiki), "Mellon Grant" (20/21).
 
 # Grant related tasks
 
+3/4 phases of the grant contain citation graph related tasks.
+
 * [ ] Link PID or DOI to archived versions
 * [ ] URLs in corpus linked to best possible timestamp (GWB)
 * [ ] Harvest all URLs in citation corpus
@@ -27,7 +31,7 @@ $ refcat.pyz BiblioRefV2
 
 * schema: [https://git.archive.org/webgroup/fatcat/-/blob/10eb30251f89806cb7a0f147f427c5ea7e5f9941/proposals/2021-01-29_citation_api.md#schemas](https://git.archive.org/webgroup/fatcat/-/blob/10eb30251f89806cb7a0f147f427c5ea7e5f9941/proposals/2021-01-29_citation_api.md#schemas)
 * matches via: doi, arxiv, pmid, pmcid, fuzzy title matches
-* 717,435,777 edges (94% of open citation/crossref), 37G compressed, ~260G uncompressed
+* 785,569,011 edges (~103% of open citation/crossref), 39G compressed, ~260G uncompressed
 
 # Rough Notes
 
-- 
cgit v1.2.3