author | Martin Czygan <martin.czygan@gmail.com> | 2020-08-27 16:18:08 +0200 |
---|---|---|
committer | Martin Czygan <martin.czygan@gmail.com> | 2020-08-27 16:18:08 +0200 |
commit | ce6a2ee453d29d0521c1dc3672363ec8934d2f2a (patch) | |
tree | 6ad7ca90649dc99672cdcfce25a7450eda6eabd3 /projects/oai_harvest_md | |
parent | 190e60c95898e105444a398523c24b7656acd660 (diff) | |
move datasets to projects
Diffstat (limited to 'projects/oai_harvest_md')
-rw-r--r-- | projects/oai_harvest_md/README.md | 10 |
1 files changed, 10 insertions, 0 deletions
diff --git a/projects/oai_harvest_md/README.md b/projects/oai_harvest_md/README.md
new file mode 100644
index 0000000..bbaa915
--- /dev/null
+++ b/projects/oai_harvest_md/README.md
@@ -0,0 +1,10 @@
+# OAI metadata matching
+
+## Plan
+
+* [ ] get JSON version, via [oai_harvest_20200215](https://archive.org/details/oai_harvest_20200215)
+* [ ] filter out out of scope data
+* [ ] (a) for items that have a doi, figure out, whether we already have md for this doi via API
+* [ ] (b) for items w/o doi, get a list of (id, title)
+* [ ] run fuzzy matching over title list to find out which one we have
+
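
The plan in the README above could be sketched roughly as follows. This is a minimal illustration and not the code in this repository. The record fields (`id`, `title`, `doi`), the helper names, and the 0.9 similarity threshold are all assumptions, and the standard library's `difflib` stands in for whatever fuzzy matching the project ends up using. The DOI lookup via API from step (a) is left out because the plan does not say which API is meant.

```python
import difflib
import re


def split_by_doi(records):
    """Partition records into those with a DOI (step a) and those without (step b)."""
    with_doi, without_doi = [], []
    for rec in records:
        (with_doi if rec.get("doi") else without_doi).append(rec)
    return with_doi, without_doi


def normalize_title(title):
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    title = re.sub(r"[^\w\s]", " ", title.lower())
    return " ".join(title.split())


def match_titles(candidates, known_titles, threshold=0.9):
    """Yield (id, title, known title) for candidates close to an already known title."""
    known = {normalize_title(t): t for t in known_titles}
    for ident, title in candidates:
        hits = difflib.get_close_matches(
            normalize_title(title), list(known), n=1, cutoff=threshold
        )
        if hits:
            yield ident, title, known[hits[0]]


# Hypothetical sample records; real input would come from the JSON harvest.
records = [
    {"id": "oai:example:1", "title": "Deep Learning for Cats", "doi": "10.1234/abc"},
    {"id": "oai:example:2", "title": "Fuzzy matching of titles.", "doi": None},
    {"id": "oai:example:3", "title": "An unrelated report"},
]

with_doi, without_doi = split_by_doi(records)
pairs = [(r["id"], r["title"]) for r in without_doi]  # the (id, title) list from step (b)
matches = list(match_titles(pairs, ["Fuzzy Matching of Titles", "Something else"]))
```

Comparing every candidate against every known title is quadratic, so for a full harvest some form of blocking or key-based grouping would likely be needed before a pairwise comparison like this one.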