| | Commit message | Author | Date | Files | Lines (-removed/+added) |
---|---|---|---|---|---|
* | ES container schema: add `sim_pubid` and `ia_sim_collection` fields | Bryan Newbold | 2021-12-03 | 1 | -0/+2 |
* | codespell fixes to various other docs | Bryan Newbold | 2021-11-24 | 1 | -1/+1 |
* | content_scope: include in file ES schema and transform | Bryan Newbold | 2021-11-17 | 1 | -0/+1 |
* | elasticsearch schema changes | Bryan Newbold | 2021-10-13 | 2 | -3/+13 |
* | fatcat_ref ES schema: more doc_values; source_year not source_release_year | Bryan Newbold | 2021-06-17 | 1 | -5/+2 |
* | elasticsearch ref schema: 6 shards, not 12 | Bryan Newbold | 2021-05-18 | 1 | -1/+1 |
* | update elasticsearch bootstrap indexing notes | Bryan Newbold | 2021-04-09 | 1 | -8/+16 |
* | ES: rename fatcat_ref.json to ref_schema.json for consistency; add to README | Bryan Newbold | 2021-04-08 | 2 | -1/+4 |
* | release ES schema: fix typo with shard/replica configuration | Bryan Newbold | 2021-04-08 | 1 | -1/+1 |
* | container search schema: preservation stats, new fields | Bryan Newbold | 2021-04-06 | 1 | -8/+9 |
| | Includes transform code updates and partial test coverage. | | | | |
* | release ES: add discipline field | Bryan Newbold | 2021-04-06 | 1 | -0/+1 |
* | ES schemas: add doc_index_ts to all mappings | Bryan Newbold | 2021-04-06 | 5 | -0/+9 |
* | elasticsearch schema, docs, docker: update from ES 6.x to ES 7.x | Bryan Newbold | 2021-04-06 | 7 | -125/+24 |
| | Includes removing index document type names (use `_doc` instead during the transition). | | | | |
* | add es draft schema for references | Martin Czygan | 2021-03-30 | 1 | -0/+106 |
* | elasticsearch: simple new dblp and doaj fields | Bryan Newbold | 2021-01-20 | 1 | -0/+3 |
* | commit example of an elasticsearch SQL query | Bryan Newbold | 2020-07-01 | 1 | -0/+8 |
* | ES schema: add best_url to file schema | Bryan Newbold | 2020-06-04 | 1 | -0/+1 |
| | This will increase index size (URLs are often long in our corpus, and we have many file entities), but seems worth it. Initially added `ia_url` as a second field, guaranteed to always be a `*.archive.org` URL, but `best_url` defaults to that anyway, so it didn't seem worthwhile. | | | | |
* | ES README: really need to limit to 1k esbulk batches | Bryan Newbold | 2020-02-26 | 1 | -3/+3 |
* | update ES transform README | Bryan Newbold | 2020-02-26 | 1 | -2/+3 |
| | Smaller batch sizes to prevent esbulk errors; file transform/index. | | | | |
* | ES container last tweaks | Bryan Newbold | 2020-02-26 | 1 | -3/+4 |
* | ES release: last minor tweaks | Bryan Newbold | 2020-02-26 | 1 | -3/+5 |
* | release schema: do doc_value on DOIs | Bryan Newbold | 2020-02-13 | 1 | -1/+1 |
| | Because DOIs are pseudo-structured (a prefix, and often further structure within the publisher-controlled suffix), I suspect we will in fact want to run analytics over these strings. | | | | |
* | ES release: actually do want doc_values for work_id | Bryan Newbold | 2020-02-05 | 1 | -1/+1 |
| | E.g., for a fast "unique count". | | | | |
* | fix axiv/arxiv typo in release schema | Bryan Newbold | 2020-02-04 | 1 | -1/+1 |
* | ES release schema: fix typo | Bryan Newbold | 2020-01-31 | 1 | -1/+1 |
* | fix json typos in changelog schema | Bryan Newbold | 2020-01-30 | 1 | -2/+2 |
* | add upper-case work-around from kibana map join | Bryan Newbold | 2020-01-30 | 1 | -0/+1 |
* | JSON typo in release mapping | Bryan Newbold | 2020-01-30 | 1 | -1/+0 |
* | ES schemas: make keywords case-insensitive by default | Bryan Newbold | 2020-01-30 | 4 | -66/+115 |
| | But not applying asciifolding; don't see any need to do so? | | | | |
* | tweak file ES archive.org domain tracking | Bryan Newbold | 2020-01-30 | 1 | -0/+1 |
* | elastic schema fixes | Bryan Newbold | 2020-01-29 | 2 | -7/+7 |
* | add country to v03b release schema | Bryan Newbold | 2020-01-29 | 1 | -0/+1 |
* | update ES docs and proposal | Bryan Newbold | 2020-01-29 | 1 | -0/+2 |
* | actually implement changelog transform | Bryan Newbold | 2020-01-29 | 1 | -1/+10 |
* | ES release schema updates | Bryan Newbold | 2020-01-29 | 1 | -23/+46 |
* | container ES schema changes | Bryan Newbold | 2020-01-29 | 1 | -13/+20 |
* | first implementation of ES file schema | Bryan Newbold | 2020-01-29 | 1 | -0/+46 |
| | Includes a trivial test and transform, but not any workers or doc updates. | | | | |
* | elasticsearch index alias howto | Bryan Newbold | 2019-06-06 | 1 | -1/+16 |
* | add work-in-progress elastic index notes | Bryan Newbold | 2019-05-30 | 1 | -0/+11 |
* | add 'superceded' release extra flag to elastic schema | Bryan Newbold | 2019-05-23 | 1 | -0/+1 |
* | also track work_id in release elasticsearch table | Bryan Newbold | 2019-05-22 | 1 | -0/+1 |
* | count linked refs (not just raw refs) in elasticsearch | Bryan Newbold | 2019-05-22 | 1 | -0/+1 |
* | include creator_ids in release elastic schema | Bryan Newbold | 2019-05-20 | 1 | -0/+1 |
| | Intent is to allow fast creator search/lookup. | | | | |
* | elastic release schema update | Bryan Newbold | 2019-05-20 | 1 | -1/+6 |
* | faster elasticsearch imports | Bryan Newbold | 2019-04-30 | 1 | -1/+1 |
* | fix wild elastic schema typo | Bryan Newbold | 2019-04-12 | 1 | -1/+1 |
* | more integration of transform refactor | Bryan Newbold | 2019-03-11 | 1 | -2/+2 |
* | elastic schema indentation | Bryan Newbold | 2019-03-06 | 1 | -6/+6 |
* | include container_id in release ES schema | Bryan Newbold | 2019-02-22 | 1 | -0/+1 |
* | minor typo in esbulk container import | Bryan Newbold | 2019-01-28 | 1 | -1/+1 |
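
Several entries above describe mapping conventions: case-insensitive keywords without asciifolding (2020-01-30), `doc_values` on `work_id` for a fast unique count (2020-02-05), and ES 7.x mappings without document type names (2021-04-06). The following is a minimal Python sketch of an index body that follows those conventions. It is not taken from the actual schema files: the normalizer name `default`, the two-field mapping, the shard/replica defaults, and the helper names `index_settings`, `unique_works_query`, and `normalize_keyword` are all hypothetical.

```python
import json

# Hypothetical normalizer name; the real schema files may use a different one.
CASE_INSENSITIVE = "default"


def index_settings(shards: int = 6, replicas: int = 0) -> dict:
    """Build a sketch of an ES 7.x index body (hypothetical helper).

    Keywords are lowercased via a custom normalizer, but no asciifolding
    filter is applied, matching the 2020-01-30 commit note.
    """
    return {
        "settings": {
            "index": {"number_of_shards": shards, "number_of_replicas": replicas},
            "analysis": {
                "normalizer": {
                    CASE_INSENSITIVE: {
                        "type": "custom",
                        "char_filter": [],
                        "filter": ["lowercase"],  # deliberately no "asciifolding"
                    }
                }
            },
        },
        # ES 7.x: "properties" sits directly under "mappings", with no type name.
        "mappings": {
            "properties": {
                "doi": {"type": "keyword", "normalizer": CASE_INSENSITIVE},
                # Keyword fields have doc_values on by default in ES; set
                # explicitly here because the log notes toggling it for work_id.
                "work_id": {"type": "keyword", "doc_values": True},
            }
        },
    }


def unique_works_query() -> dict:
    """Aggregation-only search body: approximate count of distinct work_id values."""
    return {"size": 0, "aggs": {"unique_works": {"cardinality": {"field": "work_id"}}}}


def normalize_keyword(value: str) -> str:
    """Rough local stand-in for the lowercase filter (Python str.lower, not Lucene)."""
    return value.lower()


if __name__ == "__main__":
    print(json.dumps(index_settings(), indent=2))
    print(json.dumps(unique_works_query()))
    # Two DOI spellings that differ only in case map to the same indexed term.
    print(normalize_keyword("10.1234/ABC.Def") == normalize_keyword("10.1234/abc.def"))
```

The `cardinality` aggregation is approximate, which is usually acceptable for a quick "unique count" over work identifiers; it relies on the field being aggregatable, which is why `doc_values` matters here.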