author: Bryan Newbold <bnewbold@archive.org> 2020-02-12 20:33:31 -0800
committer: Bryan Newbold <bnewbold@archive.org> 2020-02-12 20:33:34 -0800
commit: 4aec6410c2318972240ded2bce5f68706aae18df
tree: 1c723f7ff91205073031a5046ad33dc20da28d02 /proposals/20200207_pdftrio.md
parent: f269709baea5d6e95ab101eb8d030ecae9de7e77
pdftrio JSON object as top-level in Kafka results
To be same as GROBID results
Diffstat (limited to 'proposals/20200207_pdftrio.md')
-rw-r--r-- proposals/20200207_pdftrio.md 32
1 file changed, 16 insertions, 16 deletions
diff --git a/proposals/20200207_pdftrio.md b/proposals/20200207_pdftrio.md
index 78d2d6c..7ad5142 100644
--- a/proposals/20200207_pdftrio.md
+++ b/proposals/20200207_pdftrio.md
@@ -57,22 +57,22 @@ Basically just like GROBID client for now. Requests, JSON.
Output that goes in Kafka topic:
- pdftrio
- status
- status_code
- ensemble_score
- bert_score
- image_score
- linear_score
- versions
- pdftrio_version (string)
- models_date (string, ISO date)
- git_rev (string)
- bert_model (string)
- image_model (string)
- linear_model (string)
- timing (might be added?)
- ...
+ key (sha1hex)
+ status
+ status_code
+ ensemble_score
+ bert_score
+ image_score
+ linear_score
+ versions
+ pdftrio_version (string)
+ models_date (string, ISO date)
+ git_rev (string)
+ bert_model (string)
+ image_model (string)
+ linear_model (string)
+ timing (might be added?)
+ ...
file_meta
sha1hex
...
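
For illustration, the flattened result described by the added lines could be assembled roughly as follows. This is a minimal sketch, not the sandcrawler implementation: the helper name `build_pdftrio_result`, the `"success"` status string, and all example values are hypothetical. Because the diff view does not preserve indentation, the nesting of the six version fields under `versions` and of `sha1hex` under `file_meta` is also an assumption. The optional `timing` field and the elided `...` entries are left out.

```python
import json


def build_pdftrio_result(sha1hex, scores, versions, status="success", status_code=200):
    """Build a result dict with the pdftrio fields at the top level.

    Hypothetical helper for illustration only. The field names follow the
    proposal; the status value and the nesting are assumptions.
    """
    return {
        "key": sha1hex,  # top-level key, as in the "+ key (sha1hex)" line
        "status": status,
        "status_code": status_code,
        "ensemble_score": scores.get("ensemble_score"),
        "bert_score": scores.get("bert_score"),
        "image_score": scores.get("image_score"),
        "linear_score": scores.get("linear_score"),
        "versions": versions,  # assumed to hold the six version fields
        "file_meta": {"sha1hex": sha1hex},  # unchanged context lines of the diff
    }


# Example values are made up; none of them come from the proposal.
example = build_pdftrio_result(
    "da39a3ee5e6b4b0d3255bfef95601890afd80709",
    scores={
        "ensemble_score": 0.91,
        "bert_score": 0.88,
        "image_score": 0.95,
        "linear_score": 0.90,
    },
    versions={
        "pdftrio_version": "0.1.0",
        "models_date": "2020-01-01",
        "git_rev": "abc1234",
        "bert_model": "example-bert",
        "image_model": "example-image",
        "linear_model": "example-linear",
    },
)

# Serialize to bytes, as would be needed for a Kafka message value.
message_value = json.dumps(example).encode("utf-8")
```

Compared with the removed lines, there is no enclosing `pdftrio` object: consumers read `status` and the scores directly from the top level, which is what the commit message means by matching the GROBID results.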