Diffstat (limited to 'projects/oai_harvest_md/README.md')
-rw-r--r-- projects/oai_harvest_md/README.md | 10
1 files changed, 10 insertions, 0 deletions
diff --git a/projects/oai_harvest_md/README.md b/projects/oai_harvest_md/README.md
index bbaa915..5f2b655 100644
--- a/projects/oai_harvest_md/README.md
+++ b/projects/oai_harvest_md/README.md
@@ -1,5 +1,7 @@
# OAI metadata matching
+Goal: end-to-end data workflow (acquisition, harvest, matching, new release entities).
+
## Plan
* [ ] get JSON version, via [oai_harvest_20200215](https://archive.org/details/oai_harvest_20200215)
@@ -8,3 +10,11 @@
* [ ] (b) for items w/o doi, get a list of (id, title)
* [ ] run fuzzy matching over title list to find out which one we have
+## Get data
+
+```
+$ make
+```
+
+* compressed 12G, around 100G uncompressed
+
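The plan items above mention collecting `(id, title)` pairs for records without a DOI and running fuzzy matching over the titles. The README does not say how that matching is done, so the following is only a minimal sketch of one possible approach, using Python's standard-library `difflib`. The function names (`normalize_title`, `match_titles`), the `0.9` threshold, and the sample records are all hypothetical and not part of this project.

```python
import difflib
import re


def normalize_title(title):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    title = re.sub(r"[^\w\s]", " ", title.lower())
    return " ".join(title.split())


def match_titles(harvested, known, threshold=0.9):
    """Return (harvested_id, known_id, score) for each best match >= threshold.

    Both arguments are lists of (id, title) tuples. The comparison is
    quadratic, so it suits a small sample, not the full harvest.
    The threshold is an assumed value and would need tuning.
    """
    known_norm = [(kid, normalize_title(t)) for kid, t in known]
    matches = []
    for hid, htitle in harvested:
        hnorm = normalize_title(htitle)
        best_id, best_score = None, 0.0
        for kid, knorm in known_norm:
            score = difflib.SequenceMatcher(None, hnorm, knorm).ratio()
            if score > best_score:
                best_id, best_score = kid, score
        if best_id is not None and best_score >= threshold:
            matches.append((hid, best_id, best_score))
    return matches


# Hypothetical sample data, for illustration only.
harvested = [
    ("oai:1", "Deep Learning for Metadata Matching."),
    ("oai:2", "Completely unrelated"),
]
known = [
    ("rel_a", "deep learning for metadata matching"),
    ("rel_b", "A survey of harvest protocols"),
]
matches = match_titles(harvested, known)
```

At the scale noted above (around 100G uncompressed), an all-pairs comparison like this would likely be too slow; some form of blocking or indexing on normalized titles would probably be needed first.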