author | Bryan Newbold <bnewbold@archive.org> | 2020-02-13 15:28:48 -0800 |
---|---|---|
committer | Bryan Newbold <bnewbold@archive.org> | 2020-02-13 15:28:48 -0800 |
commit | 3370f203c3652ace357eeb69bb8828d830b3596a | |
tree | e283ade7600932b84605b84c852da01c2cd2dbdb /proposals | |
parent | 4aec6410c2318972240ded2bce5f68706aae18df | |
move pdf_trio results back under key in JSON/Kafka
Diffstat (limited to 'proposals')
-rw-r--r-- | proposals/20200207_pdftrio.md | 33 |
1 files changed, 18 insertions, 15 deletions
diff --git a/proposals/20200207_pdftrio.md b/proposals/20200207_pdftrio.md
index 7ad5142..31a2db6 100644
--- a/proposals/20200207_pdftrio.md
+++ b/proposals/20200207_pdftrio.md
@@ -58,24 +58,27 @@ Basically just like GROBID client for now. Requests, JSON.
 Output that goes in Kafka topic:
 
     key (sha1hex)
-    status
-    status_code
-    ensemble_score
-    bert_score
-    image_score
-    linear_score
-    versions
-        pdftrio_version (string)
-        models_date (string, ISO date)
-        git_rev (string)
-        bert_model (string)
-        image_model (string)
-        linear_model (string)
-    timing (might be added?)
-    ...
+    pdf_trio
+        status
+        status_code
+        ensemble_score
+        bert_score
+        image_score
+        linear_score
+        versions
+            pdftrio_version (string)
+            models_date (string, ISO date)
+            git_rev (string)
+            bert_model (string)
+            image_model (string)
+            linear_model (string)
+        timing (optional/future: as reported by API)
+        ...
     file_meta
         sha1hex
         ...
+    timing
+        ...
 
 
 ## SQL Schema
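
For illustration, the nested layout in this change could be assembled roughly as follows. This is a minimal sketch, not the sandcrawler worker code: the helper name `build_pdftrio_message`, the example scores, the `status`/`status_code` values, and the `total_sec` timing field are all hypothetical. Only the key names (`key`, `pdf_trio`, `file_meta`, `timing`, and the fields under `pdf_trio`) come from the proposal.

```python
import json


def build_pdftrio_message(sha1hex, pdf_trio, file_meta, timing=None):
    """Hypothetical helper: wrap classifier results under a 'pdf_trio' key.

    The classifier fields (status, scores, versions) are nested under
    'pdf_trio' rather than sitting at the top level, next to 'file_meta'.
    """
    msg = {
        "key": sha1hex,
        "pdf_trio": pdf_trio,
        "file_meta": file_meta,
    }
    # Top-level timing is only included when the caller supplies it.
    if timing is not None:
        msg["timing"] = timing
    return msg


# Placeholder values throughout; none of these are real results.
example_sha1hex = "0" * 40
example = build_pdftrio_message(
    sha1hex=example_sha1hex,
    pdf_trio={
        "status": "success",      # assumed status string
        "status_code": 200,       # assumed HTTP-style code
        "ensemble_score": 0.9,
        "bert_score": 0.9,
        "image_score": 0.9,
        "linear_score": 0.9,
        "versions": {
            "pdftrio_version": "0.0.0",
            "models_date": "2020-01-01",
        },
    },
    file_meta={"sha1hex": example_sha1hex},
    timing={"total_sec": 0.5},    # hypothetical timing field
)

# Serialize as it might be published to a Kafka topic.
payload = json.dumps(example, sort_keys=True)
```

Keeping the classifier output under one key means a consumer can read `msg["pdf_trio"]` as a unit without it colliding with `file_meta` or the top-level `timing` fields.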