path: root/extra/elasticsearch/file_schema.json
Commit message | Author | Age | Files | Lines

* content_scope: include in file ES schema and transform | Bryan Newbold | 2021-11-17 | 1 | -0/+1

* ES schemas: add doc_index_ts to all mappings | Bryan Newbold | 2021-04-06 | 1 | -0/+1

* elasticsearch schema, docs, docker: update from ES 6.x to ES 7.x | Bryan Newbold | 2021-04-06 | 1 | -1/+3
    Including removing index document names (use '_doc' instead during transition)

* ES schema: add best_url to file schema | Bryan Newbold | 2020-06-04 | 1 | -0/+1
    This will increase index size (URLs are often long in our corpus, and we have many file entities), but seems worth it. Initially added `ia_url` as a second field, guaranteed to always be an *.archive.org URL, but `best_url` defaults to that anyways so didn't seem worthwhile.

* ES schemas: make keywords case-insensitive by default | Bryan Newbold | 2020-01-30 | 1 | -11/+23
    But not applying asciifolding; don't see any need to do so?

* tweak file ES archive.org domain tracking | Bryan Newbold | 2020-01-30 | 1 | -0/+1

* elastic schema fixes | Bryan Newbold | 2020-01-29 | 1 | -6/+6

* first implementation of ES file schema | Bryan Newbold | 2020-01-29 | 1 | -0/+46
    Includes a trivial test and transform, but not any workers or doc updates.