Diffstat (limited to 'projects/grobid_refs/README.md')
-rw-r--r-- | projects/grobid_refs/README.md | 22 |
1 files changed, 0 insertions, 22 deletions
diff --git a/projects/grobid_refs/README.md b/projects/grobid_refs/README.md
deleted file mode 100644
index 498e68b..0000000
--- a/projects/grobid_refs/README.md
+++ /dev/null
@@ -1,22 +0,0 @@
-# Grobid refs
-
-References extracted from [grobid](https://grobid.readthedocs.io).
-
-## TODO
-
-* For a given reference string in grobid, find a matching release in fatcat.
-
-## Approach
-
-Two general ways:
-
-* do queries against elasticsearch, which would max out at a few hundred queries/s
-* offline compute a key (e.g. title, title ngram plus authors, etc.); then do comparisons
-
-## Misc
-
-Example grobid outputs:
-
-* [grobid.tei.xml](grobid.tei.xml), [pdf](http://dss.in.tum.de/files/brandt-research/me.pdf) -- here grobid does not extract many refs; GS looks ok
-* [](), [pdf](https://ia803202.us.archive.org/21/items/jstor-1064270/1064270.pdf)
-
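
The second option in the removed README's "Approach" section (compute a key offline, then compare) could look roughly like the sketch below. It is only an illustration under stated assumptions, not the project's actual implementation: the key uses the normalized title alone, the function names are made up, and the `ident` and `title` fields on the release records are hypothetical stand-ins for whatever the fatcat data provides.

```python
import re
import unicodedata
from collections import defaultdict


def title_key(title: str) -> str:
    """Hypothetical key: strip accents, lowercase, keep only ASCII letters and digits."""
    decomposed = unicodedata.normalize("NFKD", title)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]", "", ascii_only.lower())


def build_index(releases):
    """Map each title key to the identifiers of all releases sharing that key."""
    index = defaultdict(list)
    for release in releases:
        key = title_key(release["title"])
        if key:  # skip releases whose title normalizes to nothing
            index[key].append(release["ident"])
    return index


def match_reference(ref_title: str, index) -> list:
    """Return candidate release identifiers for a reference title (may be empty)."""
    return index.get(title_key(ref_title), [])


# Made-up example data; the identifiers and titles are purely illustrative.
releases = [
    {"ident": "release-1", "title": "Deep Learning for Citation Parsing"},
    {"ident": "release-2", "title": "Über Graphs and Their Uses"},
]
index = build_index(releases)
candidates = match_reference("Deep learning for citation parsing.", index)
```

Because exact-key matching only yields candidates, a real pipeline would presumably still compare authors, year, or title n-grams before accepting a match, as the README hints.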