author Bryan Newbold <bnewbold@archive.org> 2020-02-13 15:28:48 -0800
committer Bryan Newbold <bnewbold@archive.org> 2020-02-13 15:28:48 -0800
commit 3370f203c3652ace357eeb69bb8828d830b3596a
tree e283ade7600932b84605b84c852da01c2cd2dbdb /proposals
parent 4aec6410c2318972240ded2bce5f68706aae18df
move pdf_trio results back under key in JSON/Kafka
Diffstat (limited to 'proposals')
-rw-r--r-- proposals/20200207_pdftrio.md 33
1 files changed, 18 insertions, 15 deletions
diff --git a/proposals/20200207_pdftrio.md b/proposals/20200207_pdftrio.md
index 7ad5142..31a2db6 100644
--- a/proposals/20200207_pdftrio.md
+++ b/proposals/20200207_pdftrio.md
@@ -58,24 +58,27 @@ Basically just like GROBID client for now. Requests, JSON.
Output that goes in Kafka topic:
     key (sha1hex)
-    status
-    status_code
-    ensemble_score
-    bert_score
-    image_score
-    linear_score
-    versions
-        pdftrio_version (string)
-        models_date (string, ISO date)
-        git_rev (string)
-        bert_model (string)
-        image_model (string)
-        linear_model (string)
-    timing (might be added?)
-        ...
+    pdf_trio
+        status
+        status_code
+        ensemble_score
+        bert_score
+        image_score
+        linear_score
+        versions
+            pdftrio_version (string)
+            models_date (string, ISO date)
+            git_rev (string)
+            bert_model (string)
+            image_model (string)
+            linear_model (string)
+        timing (optional/future: as reported by API)
+            ...
     file_meta
         sha1hex
         ...
+    timing
+        ...
## SQL Schema
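
For illustration, the message layout after this change can be sketched in Python roughly as follows. This is a minimal sketch, not the sandcrawler implementation: the helper name `build_pdftrio_message`, every field value, and the choice to omit `timing` when absent are assumptions made for the example. Only the field names and the nesting of the classifier results under `pdf_trio` are taken from the hunk above.

```python
import json


def build_pdftrio_message(sha1hex, pdf_trio, file_meta, timing=None):
    """Assemble a message body with classifier results nested under 'pdf_trio'.

    Hypothetical helper: the name and signature are not from the proposal.
    """
    msg = {
        "key": sha1hex,
        "pdf_trio": pdf_trio,
        "file_meta": file_meta,
    }
    # Top-level timing is included only when supplied (an assumption of this sketch).
    if timing is not None:
        msg["timing"] = timing
    return msg


# Placeholder values throughout; none of these come from a real pdf_trio response.
example = build_pdftrio_message(
    sha1hex="da39a3ee5e6b4b0d3255bfef95601890afd80709",
    pdf_trio={
        "status": "success",  # assumed status string
        "status_code": 200,
        "ensemble_score": 0.9,
        "bert_score": 0.8,
        "image_score": 0.7,
        "linear_score": 0.6,
        "versions": {
            "pdftrio_version": "0.0.0",
            "models_date": "2020-01-01",
            "git_rev": "0000000",
            "bert_model": "placeholder",
            "image_model": "placeholder",
            "linear_model": "placeholder",
        },
    },
    file_meta={"sha1hex": "da39a3ee5e6b4b0d3255bfef95601890afd80709"},
)

# Serialize as it might be published to a Kafka topic (JSON-encoded bytes).
payload = json.dumps(example, sort_keys=True).encode("utf-8")
```

With this shape, a consumer reads scores via `msg["pdf_trio"]["ensemble_score"]` rather than from the top level, which keeps the classifier output separate from `file_meta` and any top-level `timing`.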