Diffstat (limited to 'projects/oai_harvest_md/README.md')
-rw-r--r-- | projects/oai_harvest_md/README.md | 10 |
1 files changed, 10 insertions, 0 deletions
diff --git a/projects/oai_harvest_md/README.md b/projects/oai_harvest_md/README.md
index bbaa915..5f2b655 100644
--- a/projects/oai_harvest_md/README.md
+++ b/projects/oai_harvest_md/README.md
@@ -1,5 +1,7 @@
 # OAI metadata matching
 
+Goal: end-to-end data workflow (acquisition, harvest, matching, new release entities).
+
 ## Plan
 
 * [ ] get JSON version, via [oai_harvest_20200215](https://archive.org/details/oai_harvest_20200215)
@@ -8,3 +10,11 @@
 * [ ] (b) for items w/o doi, get a list of (id, title)
 * [ ] run fuzzy matching over title list to find out which one we have
 
+## Get data
+
+```
+$ make
+```
+
+* compressed 12G, around 100G uncompressed
+