author | Bryan Newbold <bnewbold@archive.org> | 2020-02-12 20:33:31 -0800 |
---|---|---|
committer | Bryan Newbold <bnewbold@archive.org> | 2020-02-12 20:33:34 -0800 |
commit | 4aec6410c2318972240ded2bce5f68706aae18df | |
tree | 1c723f7ff91205073031a5046ad33dc20da28d02 | |
parent | f269709baea5d6e95ab101eb8d030ecae9de7e77 | |
pdftrio JSON object as top-level in Kafka results
To match the GROBID results.
-rw-r--r-- | proposals/20200207_pdftrio.md | 32 |
1 files changed, 16 insertions, 16 deletions
diff --git a/proposals/20200207_pdftrio.md b/proposals/20200207_pdftrio.md
index 78d2d6c..7ad5142 100644
--- a/proposals/20200207_pdftrio.md
+++ b/proposals/20200207_pdftrio.md
@@ -57,22 +57,22 @@ Basically just like GROBID client for now. Requests, JSON.
 
 Output that goes in Kafka topic:
 
-    pdftrio
-        status
-        status_code
-        ensemble_score
-        bert_score
-        image_score
-        linear_score
-        versions
-            pdftrio_version (string)
-            models_date (string, ISO date)
-            git_rev (string)
-            bert_model (string)
-            image_model (string)
-            linear_model (string)
-        timing (might be added?)
-            ...
+    key (sha1hex)
+    status
+    status_code
+    ensemble_score
+    bert_score
+    image_score
+    linear_score
+    versions
+        pdftrio_version (string)
+        models_date (string, ISO date)
+        git_rev (string)
+        bert_model (string)
+        image_model (string)
+        linear_model (string)
+    timing (might be added?)
+        ...
     file_meta
         sha1hex
         ...
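
As a rough illustration of the layout change in this diff, the sketch below lifts the fields of the old nested `pdftrio` object to the top level of the message and adds a `key`. The function name `flatten_pdftrio_result`, the sample values, and the choice to copy `key` from `file_meta.sha1hex` are assumptions made for the example; the proposal only says that `key` is a sha1hex.

```python
import json


def flatten_pdftrio_result(old_message: dict) -> dict:
    """Hypothetical helper: move the nested "pdftrio" fields to the top level."""
    # Shallow copy of the nested object, so its fields become top-level fields.
    new_message = dict(old_message.get("pdftrio", {}))
    file_meta = old_message.get("file_meta")
    if file_meta is not None:
        # file_meta stays at the top level, as in the unchanged context lines.
        new_message["file_meta"] = file_meta
        # Assumption: the message key is the sha1hex of the file.
        new_message["key"] = file_meta["sha1hex"]
    return new_message


# Made-up sample message in the old layout (values are placeholders).
old_message = {
    "pdftrio": {
        "status": "success",
        "status_code": 200,
        "ensemble_score": 0.9,
        "versions": {"git_rev": "abc1234"},
    },
    "file_meta": {"sha1hex": "0" * 40},
}

new_message = flatten_pdftrio_result(old_message)
print(json.dumps(new_message, indent=2, sort_keys=True))
```

With this shape, a consumer reads `status` and the scores directly from the message root instead of from a `pdftrio` sub-object.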