author | Martin Czygan <martin.czygan@gmail.com> | 2020-08-25 19:17:56 +0200
committer | Martin Czygan <martin.czygan@gmail.com> | 2020-08-25 19:17:56 +0200
commit | ff20a5a9ef621364b45625d0c42ee42fda5bff52
tree | f7779cfb75d1dc397ad334c14614aecd6c02bf21
parent | 7c5d6a600b4fb620881cd5c32b5947462d9cf6b3
start datasets section
Datasets to run fuzzy matching over, including a way to download all
inputs, run with various parameters, etc.
-rw-r--r-- | datasets/.gitkeep | 0
-rw-r--r-- | datasets/README.md | 16
2 files changed, 16 insertions, 0 deletions
diff --git a/datasets/.gitkeep b/datasets/.gitkeep
new file mode 100644
index 0000000..e69de29
--- /dev/null
+++ b/datasets/.gitkeep
diff --git a/datasets/README.md b/datasets/README.md
new file mode 100644
index 0000000..cb0f24e
--- /dev/null
+++ b/datasets/README.md
@@ -0,0 +1,16 @@
+# Datasets
+
+These are example datasets to run fuzzy matching over. The data is too large to
+be committed in the repository, but the example inputs are kept in an archive
+item.
+
+## Grobid References (grobid_refs)
+
+## Title list (titlelist)
+
+## Name only containers (name_only_containers)
+
+## OAI harvest metadata
+
+* [https://archive.org/details/oai_harvest_20200215](https://archive.org/details/oai_harvest_20200215)
+* [oai.ndjson.zst](https://archive.org/download/oai_harvest_20200215/oai.ndjson.zst)
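
Going by its extension, the `oai.ndjson.zst` file linked in the README is zstd-compressed, newline-delimited JSON. The commit does not include any reader code, so the following is only a minimal sketch of how such a file could be iterated once decompressed. The helper name `iter_ndjson` and the sample field names are hypothetical; the actual record schema is not shown in this commit.

```python
import io
import json


def iter_ndjson(lines):
    """Yield one decoded JSON value per non-blank line of an NDJSON stream."""
    for line in lines:
        line = line.strip()
        if not line:
            # Tolerate blank lines between records.
            continue
        yield json.loads(line)


# Hypothetical two-record sample standing in for the decompressed file;
# the real field names in oai.ndjson.zst are an assumption here.
sample = io.StringIO('{"id": 1, "title": "a"}\n\n{"id": 2, "title": "b"}\n')
records = list(iter_ndjson(sample))
```

For the real file, one option would be to decompress on the command line (for example `zstd -dc oai.ndjson.zst`) and pass `sys.stdin` to the helper, which keeps the sketch free of third-party dependencies.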