author Bryan Newbold <bnewbold@robocracy.org> 2020-06-04 14:01:34 -0700
committer Bryan Newbold <bnewbold@robocracy.org> 2020-06-04 14:12:30 -0700
commit a42d5f0d00e76bf8474647fae4e1d9d61693a7d9
tree f2556c2e40212da192517d0abd7c4f9e47e82cbb /extra/elasticsearch
parent 71e5662365892d32a5f92e2733b7ae804c833f57
ES schema: add best_url to file schema
This will increase index size (URLs are often long in our corpus, and we have many file entities), but it seems worth it. I initially added `ia_url` as a second field, guaranteed to always be an *.archive.org URL, but `best_url` defaults to that anyway, so a separate field didn't seem worthwhile.
Diffstat (limited to 'extra/elasticsearch')
-rw-r--r-- extra/elasticsearch/file_schema.json 1
1 files changed, 1 insertions, 0 deletions
diff --git a/extra/elasticsearch/file_schema.json b/extra/elasticsearch/file_schema.json
index 9c8ee64c..0fa25c3a 100644
--- a/extra/elasticsearch/file_schema.json
+++ b/extra/elasticsearch/file_schema.json
@@ -44,6 +44,7 @@
"rels": { "type": "keyword", "normalizer": "default" },
"in_ia": { "type": "boolean" },
"in_ia_petabox": { "type": "boolean" },
+ "best_url": { "type": "keyword", "normalizer": "default" },
"release_id": { "type": "alias", "path": "release_ids" },
"sha1hex": { "type": "alias", "path": "sha1" },
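
Because `best_url` is mapped as a `keyword` field, it can be matched exactly or by prefix rather than analyzed as full text. Below is a minimal sketch of what such query bodies might look like, built as plain Python dicts for the Elasticsearch search API. The helper names and the example URLs are hypothetical and not part of fatcat; the behavior of the `default` normalizer is defined elsewhere in the schema and is not assumed here.

```python
import json


def best_url_query(url: str, size: int = 10) -> dict:
    """Build a search body that exact-matches the `best_url` keyword field.

    Hypothetical helper for illustration only; not part of fatcat.
    """
    return {
        "size": size,
        "query": {
            "bool": {
                # Filter context: documents either match or they don't,
                # and no relevance score is computed.
                "filter": [{"term": {"best_url": url}}],
            }
        },
    }


def best_url_prefix_query(prefix: str = "https://web.archive.org/", size: int = 10) -> dict:
    """Build a search body matching `best_url` values that start with `prefix`.

    Hypothetical helper; the default prefix is only an example value.
    """
    return {
        "size": size,
        "query": {"prefix": {"best_url": {"value": prefix}}},
    }


if __name__ == "__main__":
    # Placeholder URL, not a real file entity.
    body = best_url_query("https://archive.org/download/example/example.pdf")
    print(json.dumps(body, indent=2))
```

Either dict could be sent as the request body of a search against whichever index this schema is loaded into. A prefix query of this kind is one way to select archive.org URLs without the separate `ia_url` field mentioned in the commit message.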