-rw-r--r-- | projects/.gitkeep (renamed from datasets/.gitkeep) | 0
-rw-r--r-- | projects/README.md (renamed from datasets/README.md) | 0
-rw-r--r-- | projects/fuzzycat.png (renamed from datasets/fuzzycat.png) | bin | 28757 -> 28757 bytes
-rw-r--r-- | projects/oai_harvest_md/README.md | 10
4 files changed, 10 insertions, 0 deletions
diff --git a/datasets/.gitkeep b/projects/.gitkeep
index e69de29..e69de29 100644
--- a/datasets/.gitkeep
+++ b/projects/.gitkeep
diff --git a/datasets/README.md b/projects/README.md
index bfbbaef..bfbbaef 100644
--- a/datasets/README.md
+++ b/projects/README.md
diff --git a/datasets/fuzzycat.png b/projects/fuzzycat.png
Binary files differ
index 27f6ed4..27f6ed4 100644
--- a/datasets/fuzzycat.png
+++ b/projects/fuzzycat.png
diff --git a/projects/oai_harvest_md/README.md b/projects/oai_harvest_md/README.md
new file mode 100644
index 0000000..bbaa915
--- /dev/null
+++ b/projects/oai_harvest_md/README.md
@@ -0,0 +1,10 @@
+# OAI metadata matching
+
+## Plan
+
+* [ ] get JSON version, via [oai_harvest_20200215](https://archive.org/details/oai_harvest_20200215)
+* [ ] filter out out of scope data
+* [ ] (a) for items that have a doi, figure out, whether we already have md for this doi via API
+* [ ] (b) for items w/o doi, get a list of (id, title)
+* [ ] run fuzzy matching over title list to find out which one we have
+
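Steps (a), (b) and the fuzzy-matching step of the plan in the new README could be sketched roughly as follows. This is only an illustration, not the project's implementation: the record keys `id`, `doi` and `title`, the helper names, and the 0.9 similarity threshold are all assumptions, and the standard-library `difflib` stands in for whatever matcher the project ends up using. The DOI lookup via API is left out, since the commit does not specify it.

```python
import difflib
import re


def normalize_title(title):
    """Lowercase a title and collapse non-alphanumeric runs to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def split_by_doi(records):
    """Split records into those with a DOI and (id, title) pairs for the rest.

    The keys "id", "doi" and "title" are assumed, not taken from the dataset.
    """
    with_doi, without_doi = [], []
    for rec in records:
        if rec.get("doi"):
            with_doi.append(rec)
        else:
            without_doi.append((rec["id"], rec.get("title") or ""))
    return with_doi, without_doi


def fuzzy_match_titles(candidates, known_titles, threshold=0.9):
    """Return (id, known_title) pairs whose normalized titles are close enough.

    `threshold` is an arbitrary illustrative cutoff for difflib's ratio.
    """
    known = {normalize_title(t): t for t in known_titles}
    matches = []
    for rec_id, title in candidates:
        close = difflib.get_close_matches(
            normalize_title(title), list(known), n=1, cutoff=threshold
        )
        if close:
            matches.append((rec_id, known[close[0]]))
    return matches
```

Comparing every candidate against every known title is quadratic, so a sketch like this only suits small samples; a harvest of this size would likely need some form of blocking or indexing first.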