From 6681ee7d699fc481b3dc0e1e8f905395a0b42a3b Mon Sep 17 00:00:00 2001
From: Bryan Newbold
Date: Wed, 12 Sep 2018 15:32:27 -0700
Subject: TODO updates

---
 TODO | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/TODO b/TODO
index 5c57a98..1f1c2b9 100644
--- a/TODO
+++ b/TODO
@@ -1,7 +1,18 @@
 
+- paper match heuristic: include 10.1007%2F978-3-319-49304-6_18 (URL-escaped slash)
+- catch EOFFail fetching from wayback
 - "author counts match" in scoring
 - refactor "scorable" to "matchable"
 - look at refactoring to reduce JSON serializations
+- QA tool for matches (PDF + Crossref JSON + landing page?)
+  => python; talks directly to HBase
+- author counts should match (+/- one?)
+
+match strategies (hbase columns)
+- legacy_doi
+- url_doi
+- grobid_crossref (doi)
+- grobid_fatcat (fatcat ID)
 
 scalding:
 - better JSON library
-- 
cgit v1.2.3