author Bryan Newbold <bnewbold@archive.org> 2021-11-01 20:05:16 -0700
committer Bryan Newbold <bnewbold@archive.org> 2021-11-04 17:19:52 -0700
commit 4315b44a93ca31725b9b0a2a55c310725ac55efe
tree b8d7018b639651f3e241d098962866f41d2c6cd6 /proposals
parent 2e9cd60819531ad73ce71f3a84109ad164624a40
sql: grobid_refs table JSON as 'JSON' not 'JSONB'
I keep flip-flopping on this, but our disk usage is really large, and if 'JSON' is smaller than 'JSONB' in postgresql at all it is worth it.
Diffstat (limited to 'proposals')
-rw-r--r-- proposals/2021-10-28_grobid_refs.md | 4
1 files changed, 2 insertions, 2 deletions
diff --git a/proposals/2021-10-28_grobid_refs.md b/proposals/2021-10-28_grobid_refs.md
index ff835d4..3f87968 100644
--- a/proposals/2021-10-28_grobid_refs.md
+++ b/proposals/2021-10-28_grobid_refs.md
@@ -27,7 +27,7 @@ The overall output schema matches that of the `grobid_refs` SQL table:
source_id: string, eg '10.1145/3366650.3366668'
source_ts: optional timestamp (full ISO datetime with timezone (eg, `Z`
suffix), which identifies version of upstream metadata
- refs_json: JSONB, list of `GrobidBiblio` JSON objects
+ refs_json: JSON, list of `GrobidBiblio` JSON objects
References are re-processed on a per-article (or per-release) basis. All the
references for an article are handled as a batch and output as a batch. If
@@ -74,7 +74,7 @@ comparing, etc.
source_id TEXT NOT NULL CHECK (octet_length(source_id) >= 1),
source_ts TIMESTAMP WITH TIME ZONE,
updated TIMESTAMP WITH TIME ZONE DEFAULT now() NOT NULL,
- refs_json JSONB NOT NULL,
+ refs_json JSON NOT NULL,
PRIMARY KEY(source, source_id)
);
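
A note on what this change means for writers of the table: PostgreSQL's `JSON` type stores an exact copy of the input text, whitespace included, whereas `JSONB` re-encodes it into a binary form. With a `JSON` column, the serialization chosen by the client therefore affects on-disk size directly. The following is a minimal sketch, not the actual sandcrawler code: the helper name `make_grobid_refs_row`, the `'crossref'` source value, and the fields inside the example reference object are all hypothetical and are used only for illustration.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


def make_grobid_refs_row(
    source: str,
    source_id: str,
    refs: List[Dict[str, Any]],
    source_ts: Optional[datetime] = None,
) -> Dict[str, Optional[str]]:
    """Build a dict shaped like one `grobid_refs` row (hypothetical helper).

    `refs_json` is serialized compactly because a PostgreSQL `JSON` column
    keeps the input text verbatim, including any whitespace.
    """
    # Mirrors the CHECK (octet_length(source_id) >= 1) constraint in the diff.
    if len(source_id.encode("utf-8")) < 1:
        raise ValueError("source_id must be at least one byte long")
    return {
        "source": source,
        "source_id": source_id,
        # Full ISO datetime with timezone, or None when not known.
        "source_ts": source_ts.isoformat() if source_ts else None,
        # Compact separators: no space after ',' or ':'.
        "refs_json": json.dumps(refs, separators=(",", ":"), sort_keys=True),
    }


# Illustrative input only; real `GrobidBiblio` objects may have other fields.
example_refs = [{"index": 0, "unstructured": "Example ref"}]
example_row = make_grobid_refs_row(
    "crossref",  # assumed source name, not confirmed by the proposal
    "10.1145/3366650.3366668",
    example_refs,
    source_ts=datetime(2021, 11, 1, tzinfo=timezone.utc),
)
```

Whether this saves a meaningful amount of space relative to `JSONB` depends on the data and on TOAST compression, so it would need to be measured on the real table (for example with `pg_column_size`).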