-rw-r--r--  projects/.gitkeep (renamed from datasets/.gitkeep)          0
-rw-r--r--  projects/README.md (renamed from datasets/README.md)        0
-rw-r--r--  projects/fuzzycat.png (renamed from datasets/fuzzycat.png)  bin 28757 -> 28757 bytes
-rw-r--r--  projects/oai_harvest_md/README.md                           10
4 files changed, 10 insertions, 0 deletions
diff --git a/datasets/.gitkeep b/projects/.gitkeep
index e69de29..e69de29 100644
--- a/datasets/.gitkeep
+++ b/projects/.gitkeep
diff --git a/datasets/README.md b/projects/README.md
index bfbbaef..bfbbaef 100644
--- a/datasets/README.md
+++ b/projects/README.md
diff --git a/datasets/fuzzycat.png b/projects/fuzzycat.png
index 27f6ed4..27f6ed4 100644
--- a/datasets/fuzzycat.png
+++ b/projects/fuzzycat.png
Binary files differ
diff --git a/projects/oai_harvest_md/README.md b/projects/oai_harvest_md/README.md
new file mode 100644
index 0000000..bbaa915
--- /dev/null
+++ b/projects/oai_harvest_md/README.md
@@ -0,0 +1,10 @@
+# OAI metadata matching
+
+## Plan
+
+* [ ] get the JSON version via [oai_harvest_20200215](https://archive.org/details/oai_harvest_20200215)
+* [ ] filter out out-of-scope data
+* [ ] (a) for items with a DOI, check via the API whether we already have metadata for that DOI
+* [ ] (b) for items without a DOI, get a list of (id, title) pairs
+* [ ] run fuzzy matching over the title list to find out which items we already have
+
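The DOI split and title matching steps in the plan above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the project's implementation: it assumes each harvested record is a dict with hypothetical `id`, `doi` and `title` keys (the actual schema of the oai_harvest_20200215 JSON is not described here). It leaves out the out-of-scope filter and the DOI lookup API, and it uses the standard library's `difflib` as a stand-in for whatever fuzzy matcher the project ends up using. The helper names `normalize_title`, `split_by_doi` and `fuzzy_match_titles` are illustrative only.

```python
from __future__ import annotations

import difflib
import re
import unicodedata
from typing import Iterable


def normalize_title(title: str) -> str:
    """Strip accents and punctuation, lowercase, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(c for c in text if not unicodedata.combining(c))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def split_by_doi(records: Iterable[dict]) -> tuple[list[dict], list[tuple[str, str]]]:
    """Split records into (a) those with a DOI and (b) (id, title) pairs without one.

    Assumes hypothetical "id", "doi" and "title" keys; the real schema may differ.
    Records with neither a DOI nor a title are dropped.
    """
    with_doi: list[dict] = []
    without_doi: list[tuple[str, str]] = []
    for rec in records:
        if rec.get("doi"):
            with_doi.append(rec)  # candidates for the DOI lookup, step (a)
        elif rec.get("title"):
            without_doi.append((rec["id"], rec["title"]))  # step (b)
    return with_doi, without_doi


def fuzzy_match_titles(
    candidates: list[tuple[str, str]],
    known_titles: Iterable[str],
    threshold: float = 0.9,  # arbitrary cutoff, chosen for illustration
) -> list[tuple[str, str, str, float]]:
    """Return (id, title, best_known_title, score) for candidates scoring >= threshold."""
    # Map normalized form -> original known title, so results show the original text.
    known = {normalize_title(t): t for t in known_titles}
    matches = []
    for rec_id, title in candidates:
        norm = normalize_title(title)
        best = difflib.get_close_matches(norm, list(known), n=1, cutoff=threshold)
        if best:
            score = difflib.SequenceMatcher(None, norm, best[0]).ratio()
            matches.append((rec_id, title, known[best[0]], score))
    return matches
```

For example, a record without a DOI titled `Fuzzy Matching of Titles!` matches a known title `Fuzzy matching of titles` with a score of 1.0, because both normalize to the same string, while an unrelated title falls below the cutoff. Comparing every candidate against every known title is quadratic, so at the scale of a full OAI harvest some form of blocking or indexing would likely be needed before the pairwise comparison.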