path: root/extra/elasticsearch
Commit message | Author | Age | Files | Lines
* ES container schema: add 'sim_pubid' and `ia_sim_collection` fields | Bryan Newbold | 2021-12-03 | 1 | -0/+2
* codespell fixes to various other docs | Bryan Newbold | 2021-11-24 | 1 | -1/+1
* content_scope: include in file ES schema and transform | Bryan Newbold | 2021-11-17 | 1 | -0/+1
* elasticsearch schema changes | Bryan Newbold | 2021-10-13 | 2 | -3/+13
* fatcat_ref ES schema: more doc_values; source_year not source_release_year | Bryan Newbold | 2021-06-17 | 1 | -5/+2
* elasticsearch ref schema: 6 shards, not 12 | Bryan Newbold | 2021-05-18 | 1 | -1/+1
* update elasticsearch bootstrap indexing notes | Bryan Newbold | 2021-04-09 | 1 | -8/+16
* ES: rename fatcat_ref.json to ref_schema.json for consistency; add to README | Bryan Newbold | 2021-04-08 | 2 | -1/+4
* release ES schema: fix typo with shard/replica configuration | Bryan Newbold | 2021-04-08 | 1 | -1/+1
* container search schema: preservation stats, new fields | Bryan Newbold | 2021-04-06 | 1 | -8/+9
    Includes transform code updates and partial test coverage.
* release ES: add discipline field | Bryan Newbold | 2021-04-06 | 1 | -0/+1
* ES schemas: add doc_index_ts to all mappings | Bryan Newbold | 2021-04-06 | 5 | -0/+9
* elasticsearch schema, docs, docker: update from ES 6.x to ES 7.x | Bryan Newbold | 2021-04-06 | 7 | -125/+24
    Including removing index document names (use '_doc' instead during transition)
* add es draft schema for references | Martin Czygan | 2021-03-30 | 1 | -0/+106
* elasticsearch: simple new dblp and doaj fields | Bryan Newbold | 2021-01-20 | 1 | -0/+3
* commit example of an elasticsearch SQL query | Bryan Newbold | 2020-07-01 | 1 | -0/+8
* ES schema: add best_url to file schema | Bryan Newbold | 2020-06-04 | 1 | -0/+1
    This will increase index size (URLs are often long in our corpus, and we have many file entities), but seems worth it.
    Initially added `ia_url` as a second field, guaranteed to always be an *.archive.org URL, but `best_url` defaults to that anyways so didn't seem worthwhile.
* ES README: really need to limit to 1k esbulk batches | Bryan Newbold | 2020-02-26 | 1 | -3/+3
* update ES transform README | Bryan Newbold | 2020-02-26 | 1 | -2/+3
    - smaller batch sizes to prevent esbulk errors
    - file transform/index
* ES container last tweaks | Bryan Newbold | 2020-02-26 | 1 | -3/+4
* ES release: last minor tweaks | Bryan Newbold | 2020-02-26 | 1 | -3/+5
* release schema: do doc_value on DOIs | Bryan Newbold | 2020-02-13 | 1 | -1/+1
    Because DOIs are pseudo-structured (prefix, and often structure within the publisher-controlled area), I suspect we will in fact be wanting to do analytics over these strings.
* ES release: actually do want doc_values for work_id | Bryan Newbold | 2020-02-05 | 1 | -1/+1
    Eg, for fast "unique count"
* fix axiv/arxiv typo in release schema | Bryan Newbold | 2020-02-04 | 1 | -1/+1
* ES release schema: fix typo | Bryan Newbold | 2020-01-31 | 1 | -1/+1
* fix json typos in changelog schema | Bryan Newbold | 2020-01-30 | 1 | -2/+2
* add upper-case work-around from kibana map join | Bryan Newbold | 2020-01-30 | 1 | -0/+1
* JSON typo in release mapping | Bryan Newbold | 2020-01-30 | 1 | -1/+0
* ES schemas: make keywords case-insensitive by default | Bryan Newbold | 2020-01-30 | 4 | -66/+115
    But not applying asciifolding; don't see any need to do so?
* tweak file ES archive.org domain tracking | Bryan Newbold | 2020-01-30 | 1 | -0/+1
* elastic schema fixes | Bryan Newbold | 2020-01-29 | 2 | -7/+7
* add country to v03b release schema | Bryan Newbold | 2020-01-29 | 1 | -0/+1
* update ES docs and proposal | Bryan Newbold | 2020-01-29 | 1 | -0/+2
* actually implement changelog transform | Bryan Newbold | 2020-01-29 | 1 | -1/+10
* ES release schema updates | Bryan Newbold | 2020-01-29 | 1 | -23/+46
* container ES schema changes | Bryan Newbold | 2020-01-29 | 1 | -13/+20
* first implementation of ES file schema | Bryan Newbold | 2020-01-29 | 1 | -0/+46
    Includes a trivial test and transform, but not any workers or doc updates.
* elasticsearch index alias howto | Bryan Newbold | 2019-06-06 | 1 | -1/+16
* add work-in-progress elastic index notes | Bryan Newbold | 2019-05-30 | 1 | -0/+11
* add 'superceded' release extra flag to elastic schema | Bryan Newbold | 2019-05-23 | 1 | -0/+1
* also track work_id in release elasticsearch table | Bryan Newbold | 2019-05-22 | 1 | -0/+1
* count linked refs (not just raw refs) in elasticsearch | Bryan Newbold | 2019-05-22 | 1 | -0/+1
* include creator_ids in release elastic schema | Bryan Newbold | 2019-05-20 | 1 | -0/+1
    Intent is to allow fast creator search/lookup
* elastic release schema update | Bryan Newbold | 2019-05-20 | 1 | -1/+6
* faster elasticsearch imports | Bryan Newbold | 2019-04-30 | 1 | -1/+1
* fix wild elastic schema typo | Bryan Newbold | 2019-04-12 | 1 | -1/+1
* more integration of transform refactor | Bryan Newbold | 2019-03-11 | 1 | -2/+2
* elastic schema indentation | Bryan Newbold | 2019-03-06 | 1 | -6/+6
* include container_id in release ES schema | Bryan Newbold | 2019-02-22 | 1 | -0/+1
* minor typo in esbulk container import | Bryan Newbold | 2019-01-28 | 1 | -1/+1