path: root/extra/elasticsearch
Commit message | Author | Age | Files | Lines
* commit example of an elasticsearch SQL query | Bryan Newbold | 2020-07-01 | 1 | -0/+8
* ES schema: add best_url to file schema | Bryan Newbold | 2020-06-04 | 1 | -0/+1
    This will increase index size (URLs are often long in our corpus, and we have many file entities), but seems worth it. Initially added `ia_url` as a second field, guaranteed to always be an *.archive.org URL, but `best_url` defaults to that anyways so didn't seem worthwhile.
* ES README: really need to limit to 1k esbulk batches | Bryan Newbold | 2020-02-26 | 1 | -3/+3
* update ES transform README | Bryan Newbold | 2020-02-26 | 1 | -2/+3
    - smaller batch sizes to prevent esbulk errors
    - file transform/index
* ES container last tweaks | Bryan Newbold | 2020-02-26 | 1 | -3/+4
* ES release: last minor tweaks | Bryan Newbold | 2020-02-26 | 1 | -3/+5
* release schema: do doc_value on DOIs | Bryan Newbold | 2020-02-13 | 1 | -1/+1
    Because DOIs are pseudo-structured (prefix, and often structure within the publisher-controlled area), I suspect we will in fact be wanting to do analytics over these strings.
* ES release: actually do want doc_values for work_id | Bryan Newbold | 2020-02-05 | 1 | -1/+1
    Eg, for fast "unique count"
* fix axiv/arxiv typo in release schema | Bryan Newbold | 2020-02-04 | 1 | -1/+1
* ES release schema: fix typo | Bryan Newbold | 2020-01-31 | 1 | -1/+1
* fix json typos in changelog schema | Bryan Newbold | 2020-01-30 | 1 | -2/+2
* add upper-case work-around from kibana map join | Bryan Newbold | 2020-01-30 | 1 | -0/+1
* JSON typo in release mapping | Bryan Newbold | 2020-01-30 | 1 | -1/+0
* ES schemas: make keywords case-insensitive by default | Bryan Newbold | 2020-01-30 | 4 | -66/+115
    But not applying asciifolding; don't see any need to do so?
* tweak file ES archive.org domain tracking | Bryan Newbold | 2020-01-30 | 1 | -0/+1
* elastic schema fixes | Bryan Newbold | 2020-01-29 | 2 | -7/+7
* add country to v03b release schema | Bryan Newbold | 2020-01-29 | 1 | -0/+1
* update ES docs and proposal | Bryan Newbold | 2020-01-29 | 1 | -0/+2
* actually implement changelog transform | Bryan Newbold | 2020-01-29 | 1 | -1/+10
* ES release schema updates | Bryan Newbold | 2020-01-29 | 1 | -23/+46
* container ES schema changes | Bryan Newbold | 2020-01-29 | 1 | -13/+20
* first implementation of ES file schema | Bryan Newbold | 2020-01-29 | 1 | -0/+46
    Includes a trivial test and transform, but not any workers or doc updates.
* elasticsearch index alias howto | Bryan Newbold | 2019-06-06 | 1 | -1/+16
* add work-in-progress elastic index notes | Bryan Newbold | 2019-05-30 | 1 | -0/+11
* add 'superceded' release extra flag to elastic schema | Bryan Newbold | 2019-05-23 | 1 | -0/+1
* also track work_id in release elasticsearch table | Bryan Newbold | 2019-05-22 | 1 | -0/+1
* count linked refs (not just raw refs) in elasticsearch | Bryan Newbold | 2019-05-22 | 1 | -0/+1
* include creator_ids in release elastic schema | Bryan Newbold | 2019-05-20 | 1 | -0/+1
    Intent is to allow fast creator search/lookup
* elastic release schema update | Bryan Newbold | 2019-05-20 | 1 | -1/+6
* faster elasticsearch imports | Bryan Newbold | 2019-04-30 | 1 | -1/+1
* fix wild elastic schema typo | Bryan Newbold | 2019-04-12 | 1 | -1/+1
* more integration of transform refactor | Bryan Newbold | 2019-03-11 | 1 | -2/+2
* elastic schema indentation | Bryan Newbold | 2019-03-06 | 1 | -6/+6
* include container_id in release ES schema | Bryan Newbold | 2019-02-22 | 1 | -0/+1
* minor typo in esbulk container import | Bryan Newbold | 2019-01-28 | 1 | -1/+1
* more ES index name updates | Bryan Newbold | 2019-01-28 | 1 | -2/+3
* transform and import fixes/tweaks | Bryan Newbold | 2019-01-25 | 3 | -8/+122
* tweak elastic schemas (again) | Bryan Newbold | 2019-01-25 | 2 | -6/+4
* initial changelog and container ES schemas | Bryan Newbold | 2019-01-23 | 2 | -0/+113
* start changes to release ES schema | Bryan Newbold | 2019-01-23 | 1 | -22/+39
* state in elasticsearch (and deleted/redirects) | Bryan Newbold | 2019-01-18 | 1 | -0/+1
* remove redundant transform_release.py ES script | Bryan Newbold | 2018-12-24 | 2 | -88/+1
* implement release_year (and rustfmt) | Bryan Newbold | 2018-12-24 | 1 | -0/+2
* updated docker for elastic (with plugin) | Bryan Newbold | 2018-11-07 | 3 | -44/+10
    Still need to install the maps (aka, schemas) manually.
* note elastic plugin needed | Bryan Newbold | 2018-11-07 | 2 | -0/+52
* for now, store is_longtail_oa in container_is_longtail_oa | Bryan Newbold | 2018-10-12 | 1 | -0/+2
* document need to LC_ALL=C.UTF-8 for ES import | Bryan Newbold | 2018-09-28 | 1 | -1/+2
* fix typo in elastic load script | Bryan Newbold | 2018-09-26 | 1 | -1/+1
* ignore more files | Bryan Newbold | 2018-09-25 | 1 | -0/+1
* fix typos in es/transform script | Bryan Newbold | 2018-09-25 | 1 | -3/+3