path: root/README.md
author Martin Czygan <martin.czygan@gmail.com> 2021-04-01 17:41:47 +0200
committer Martin Czygan <martin.czygan@gmail.com> 2021-04-01 17:41:50 +0200
commit 36656afe5fc4373aee7c1917c9a203b0998d9286
tree b5281643141df1d428df7e4e5563a0f22f7bd491
parent 495adf9976d11a8abc49080abdb1fab7fcaeb85a
update README
Diffstat (limited to 'README.md')
-rw-r--r-- README.md 19
1 files changed, 18 insertions, 1 deletions
diff --git a/README.md b/README.md
index e485661..ed23bbf 100644
--- a/README.md
+++ b/README.md
@@ -9,17 +9,34 @@ all relevant code close:
Context: [fatcat](https://fatcat.wiki), "Mellon Grant" (20/21).
+We use informal, internal versioning: currently v2, with v3 next.
+
# Grant related tasks
3/4 phases of the grant contain citation graph related tasks.
-* [ ] Link PID or DOI to archived versions
+* [x] Link PID or DOI to archived versions
+
+As of v2, we link fatcat release entities by DOI, PMID, PMCID, and arXiv ID.
+
* [ ] URLs in corpus linked to best possible timestamp (GWB)
* [ ] Harvest all URLs in citation corpus (maybe do a sample first)
+
+A seed list (from refs, not from the full text) is done; we still need to prepare a crawl and lookups in GWB.
+
* [ ] Links between records w/o DOI (fuzzy matching)
+
+As of v2, we do have a fuzzy matching procedure (yielding about 5-10% of the total results).
+
* [ ] Publication of augmented citation graph, explore data mining, etc.
* [ ] Interlinkage with other sources, monographs, commercial publications, etc.
+
+As of v3, we have a minimal linkage with Wikipedia.
+
* [ ] Wikipedia (en) references metadata or archived record
+
+This is ongoing and should be part of v3.
+
* [ ] Metadata records for often cited non-scholarly web publications
* [ ] Collaborations: I4OC, wikicite